All content in this area was uploaded by Emil O. W. Kirkegaard on Jul 01, 2023
MANKIND QUARTERLY 2023 63:4 527-600
A Genetic Hypothesis for American Race/Ethnic
Differences in Mean g:
A Reply to Warne (2021) with Fifteen New Empirical Tests
Using the ABCD Dataset
John G. R. Fuerst*
Cleveland State University and University of Maryland, USA
Vladimir Shibaev
Vladivostok State University of Economics and Service, Russia
Emil O. W. Kirkegaard
Ulster Institute of Social Research, London, UK
*Corresponding author: j122177@hotmail.com
Intelligence tests are excellent predictors of school and job performance,
and racial/ethnic differences in mean IQ are common. Based on five
lines of evidence, Warne (2021) builds a case for partly genetic causes
of differences in general intelligence (g) across American
self/parental-identified race or ethnicity (SIRE). Based on a careful
reading of Warne (2021) and the authors he cites, we generated 15
predictions flowing from a partial genetic hypothesis. These predictions
concern (1) mean cognitive differences and Spearman’s hypothesis, (2)
measurement invariance across European genetic ancestry, (3) high
within-group heritability and low shared environmentality, (4) admixture
regression for g, (5) polygenic scores, (6) brain volume, and (7) results
based on the method of correlated vectors. We used the Adolescent
Brain Cognitive Development Study sample (N = 10,245) to test these
hypotheses using classical and state-of-the-art statistical techniques.
Decomposition of variance using twins showed that the heritability estimates
for intelligence and for brain/intracranial volume were, respectively,
moderate and high for both the White and the non-White subsamples,
while the variance attributable to shared environment was low.
SIRE groups, both genetic ancestry and education-related polygenic
scores (eduPGS) predicted both brain volume and g. Moreover, brain
volume was weakly but statistically significantly related to g (r = .14 to
.25). Path and mediation analysis showed that total brain volume
explained approximately 15% of the association between European
ancestry and g and also explained approximately 8% of that between
eduPGS and g. Finally, based on the method of correlated vectors
(MCV), a positive manifold was found for genetic, brain volume, and
ancestry/SIRE-related variables. We conclude that the results support
the hypotheses tested and are in line with a partial genetic hypothesis.
Keywords: Brain volume, g, Ancestry, Admixture, Heritability, Polygenic
scores, SIRE
IQ tests are statistically robust predictors of school and job performance,
and average cognitive score differences between self/parental-identified race or
ethnicity (SIRE) groups in the US such as Blacks, Whites, and Hispanics have
been well documented (C. Murray, 2021; Roth et al., 2017). These differences
are not due to psychometric bias, as evidenced by the finding of measurement
invariance across American SIRE groups (e.g., Frisby & Beaujean, 2015;
Scheiber, 2016a,b). Rather, these score differences represent real average
differences in the underlying construct of general intelligence. Among
intelligence researchers, there is disagreement about the degree to which these
differences are due to genes. Notably, when surveyed anonymously, an
overwhelming majority of experts (i.e., people who have published on this
specialist topic) attributed some part of the differences to genetics; most of the
disagreement concerned the size of the genetic contribution, with estimates
ranging from 0% to 100% (Rindermann et al., 2020). Determining the source of these differences is
important in order to better understand and address them and their social
implications (Flynn, 2018; Pesta et al., 2021).
Warne’s lines of evidence
Recently, Warne (2021, based on Warne, 2020b) outlined five lines of
converging evidence favoring a partially genetic model for the cause of SIRE
group differences. In summary, Warne’s five lines of evidence are:
1. The consistent finding of measurement invariance,
2. The constraints that high within-group heritability places on between-group
environmentality,
3. Findings of genetics-based studies applying the admixture regression
methodology,
4. Findings of genetics-based studies using education-related polygenic scores
(eduPGS),
5. Large-scale, consistent support based on the method of correlated vectors
(MCV).
The evidence and rationale for these five lines have been
discussed by Warne (2021). Our analyses in this respect constitute an
attempted replication of previous work.
Brain volume
An additional line of evidence frequently cited by proponents of a partially
genetic hypothesis is the finding of average differences in brain volume
(Rushton & Jensen, 2005). Warne (2021) did not find this sixth line convincing,
and concluded:
I call upon psychologists to have an open mind and to investigate the
evidence for themselves, starting with the sources I have cited in this
article. I also encourage social scientists to make research contributions
that can address this question.
Regarding brain volume, numerous meta-analyses have confirmed that
brain volume/size correlates with intelligence (Plomin & von Stumm, 2018). In a
recent meta-analysis of over 194 studies, Pietschnig et al. (2022) found a
correlation of r = .24 to .29 between brain volume and full-scale IQ. The
association was found to be higher in samples which used highly g-loaded tests,
that is, tests of high cognitive complexity. As for causes, twin studies indicate
that the relation between brain volume and intelligence is predominantly genetic
(Betjemann et al., 2010; van Leeuwen et al., 2009; Vuoksimaa et al., 2015).
Moreover, recent research shows that polygenic scores for education positively
relate to brain volume (Elliott et al., 2019; Judd et al., 2020), confirming that the
association between brain volume and intelligence among individuals has a
genetic etiology. Similarly, Lee et al. (2019) found that the association between
brain volume and intelligence exists within sibships; moreover, they found that
the genetic variants associated with intracranial volume overlapped with those
associated with eduPGS.
A large number of studies based on autopsy, MRI, and craniometric data
confirmed differences in brain volume/cranial capacity between continental
ancestry groups (Beals et al., 1984; Rindermann, 2018; Rushton & Rushton,
2003), so global craniometric variation, including variation in mean brain size, is
well established (Relethford, 2010). Generally, groups whose recent
evolutionary past was near the equator have, on average, smaller brains than groups
that evolved farther from the equator (Beals et al., 1984; Rindermann, 2018).
In line with evolutionary interpretations of these geographic differences, recent
studies have shown that brain morphology is related to continental genetic
ancestry in admixed populations (Fan et al., 2015; Mehta et al., 2017).
Because the relation between brain volume and intelligence is
predominantly genetic in origin within populations, and because certain ancestry
groups differ in both brain volume and intelligence, an obvious scientific
conjecture is that ancestry-related differences in intelligence may be partially
explained by ancestry-related differences in brain volume (Cochran &
Harpending, 2009; Rushton & Jensen, 2005). Thus Rushton and Rushton
(2003) claim: “[B]rain size-related variables provide the most likely biological
mediators of the race differences in intelligence” (p. 139).
Hypotheses
In this paper, we follow Warne’s call for research and test the overarching
hypothesis that American race/ethnic differences have a partial genetic basis.
To do this, we first carefully read through Warne’s (2021) arguments and those
of the authors cited by Warne (e.g., Lasker et al., 2019; Rushton & Jensen,
2005; Warne, 2020a). From this reading, we then generated the following seven
key hypotheses and tested them on a large database. When testing these
hypotheses, we focused on individuals who were primarily of European, African,
and Amerindian ancestry, and not those of primarily East Asian, South Asian, or
Pacific Islander descent. One reason was that power analyses
indicated that in our database the sample sizes for the latter groups were too
small to detect the predicted effects in a number of the analyses. Another
reason was that it was difficult to obtain reliable and interpretable East Asian,
South Asian, and Pacific Islander ancestry estimates. For example, Pacific
Islanders in the US are primarily of Polynesian ancestry, but there were no
Polynesian samples to use as reference samples in either the 1000 Genomes or
HapMap datasets. Because we did not have a Pacific Islander component, the
subsample from Hawaii, which is known to contain admixed European,
indigenous Pacific Islander, and East Asian subpopulations (Sun et al., 2021),
was assigned East Asian ancestry in place of Pacific Islander ancestry. But
because Pacific Islander and East Asian genetic ancestry are differently
associated with cognitive ability (Kirkegaard et al., 2019, Table 7), including
these groups might introduce confounds into our analyses, so we decided to
exclude them.
Measurement of group differences and Spearman's hypothesis
Warne reviews the literature showing group differences on various
measures. Whites generally have higher scores than Hispanics and Blacks. Our
Hypothesis 1 therefore states: Between the relevant SIRE groups, there are
medium to large differences in g, educational polygenic scores (eduPGS), and
brain/intracranial volume. To determine the magnitude of differences in g, we
employed multigroup confirmatory factor analysis (MGCFA) as recommended in
the literature (e.g., Dolan, 2000). Related to this, we also tested Spearman’s
hypothesis, which states that group differences in cognitive ability are primarily
on the g factor. A strong confirmation of Spearman’s hypothesis has been taken
as support for a genetic hypothesis because general intelligence is “related to
many types of biological markers and is highly heritable” within groups
(Gottfredson, 2007). Note that while between-group mean differences in g and
brain/intracranial volume do not in and of themselves entail trait-relevant genetic
differences, the existence of phenotypic differences is obviously a prerequisite for
these differences to have partial genetic causes, as Warne (2021) argues.
Measurement invariance across European genetic ancestry
Several studies have shown that the differences across SIRE groups have
the same psychometric meaning as, and are caused by a subset of the same
sources as, the differences within SIRE groups and ancestry deciles (Lasker
et al., 2019; Warne, 2021). This leads to our Hypothesis 2: Intelligence
differences across European genetic ancestry will also exhibit strict
measurement invariance.
Within-group heritability and shared environmentality
Warne (2021) correctly points out that the between-groups heritability of a
trait is a mathematical function of both the quantitative genetic variation between
groups and the heritability of the trait within groups (DeFries, 1972a; McClearn
& DeFries, 1973). If, for example, the within-group heritability of the trait is zero,
the between-groups heritability will be zero, even if there are large trait-related
quantitative genetic differences between groups. So, the within-group heritability
matters for Warne’s (2021) hypothesis. Of course, this relation applies to
quantitative traits whose within-group phenotypic variance is
non-zero; otherwise, the between-groups heritability is undefined
(DeFries, 1972a, p. 10).
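To make this relation concrete, write t for the share of total phenotypic variance lying between groups and r for the share of total genetic variance lying between groups. From these definitions alone, the between-group heritability can be written as h²_B = h²_W · r(1 − t) / [t(1 − r)], the form usually attributed to DeFries (1972a). The Python sketch below implements that expression; the function name and example values are ours, chosen only for illustration.

```python
def between_group_heritability(h2_within: float, r: float, t: float) -> float:
    """Between-group heritability h2_B = h2_W * r * (1 - t) / (t * (1 - r)).

    h2_within: heritability of the trait within groups
    r: share of total genetic variance lying between groups
    t: share of total phenotypic variance lying between groups
    """
    # t = 1 means no within-group phenotypic variance and t = 0 means no
    # between-group phenotypic variance; h2_B is undefined in both cases.
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    if not 0.0 <= r < 1.0:
        raise ValueError("r must lie in [0, 1)")
    return h2_within * r * (1.0 - t) / (t * (1.0 - r))


# Zero within-group heritability gives zero between-group heritability,
# however much genetic variance lies between groups.
print(between_group_heritability(0.0, r=0.3, t=0.2))  # 0.0
```

When r equals t, the expression reduces to h²_B = h²_W, which is a convenient sanity check on any numerical inputs.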
Warne also argues that when the values of within-group environmentalities
(or 1 minus heritability) are modest, between-group differences are unlikely to
be explained by environment-only models. This is because when environmental
factors have only modest explanatory power within groups, environmental
differences between groups would need to be fairly large to account for the
typically observed moderate-to-large phenotypic differences. Of course, this
assumes that the sources of the between-group variance are a subset of the
sources of the within-group variance (Rushton & Jensen, 2005; Sesardic, 2005;
Warne, 2021). With regard to environmental sources of variance, shared
environmental influences, the ones that make siblings similar to one another
(e.g., parental income, neighborhood and school quality, and common family
experiences), are of particular interest. This is because SIRE differences in
intelligence are typically hypothesized as being caused by these sorts of
variables (e.g., Weiss & Saklofske, 2020) and because ancillary evidence, such
as from differential regression to the mean and sibling intraclass correlation
studies (Hu et al., 2019), implies that differences are due to inter-generationally
transmitted factors which impact siblings similarly. So, the within-group
environmentality of traits also matters for Warne’s (2021) hypothesis.
It has also been hypothesized that genetic influences might be substantially
lower and environmental influences substantially higher in poorer-performing
race/ethnic groups (e.g., Greenspan, 2022; Guo & Wang, 2002;
Scarr-Salapatek, 1971; van den Oord & Rowe, 1997). If so, this would allow
environmental factors that vary within groups to be more potent sources of
between-group differences.
This review of the literature leads to our Hypothesis 3a: Within American
SIRE groups, most of the variance in intelligence and in brain/intracranial
volume is attributable to genes, while little is attributable to shared environment,
and our Hypothesis 3b: American SIRE groups exhibit similar heritabilities and
environmentalities.
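As a rough illustration of how twin data partition variance into additive genetic (A), shared environmental (C), and non-shared environmental (E) components, the sketch below uses Falconer's classical approximations from monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a simplification, not necessarily the biometric model-fitting procedure behind the estimates reported here, and the twin correlations in the usage line are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer approximations of additive-genetic (a2), shared-environment (c2)
    and non-shared-environment (e2) variance shares from twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # MZ pairs share twice the additive genetic similarity of DZ pairs
    c2 = 2.0 * r_dz - r_mz     # twin similarity not explained by shared genes
    e2 = 1.0 - r_mz            # whatever makes MZ twins differ (includes measurement error)
    return a2, c2, e2


a2, c2, e2 = falconer_ace(r_mz=0.80, r_dz=0.50)  # hypothetical twin correlations
print(round(a2, 2), round(c2, 2), round(e2, 2))  # 0.6 0.2 0.2
```

Under Hypothesis 3a, the pattern to look for is a large a2 together with a small c2; the three shares always sum to one by construction.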
Admixture regression for g
Warne (2021) describes admixture regression in detail and we refer the
interested reader to that discussion. This method has been frequently used in
genetic epidemiology over the last twenty years. It has been applied to
numerous traits including obesity (Fernández & Shiver, 2004), diabetes (Cheng
et al., 2012), sleep depth (Halder et al., 2015), dopamine receptor availability
(Wiers et al., 2018), cigarette smoking behaviors (Choquet et al., 2021), and
metabolomics (Mehanna et al., 2022). In essence, admixture regression cleverly
uses the natural experiment of mating between individuals with different
ancestral backgrounds to infer the environmental and genetic components of
trait variation across racial and ethnic groups (Connor & Fuerst, 2022).
Because it is well recognized that there can be ancestry-correlated
environmental influences (Fernández & Shiver, 2004), variables which may
capture these influences are included in the regression models (Connor &
Fuerst, 2022). Typically, SES is included because European ancestry is fairly
consistently related to SES within admixed American populations (Kirkegaard et
al., 2017). Given this, and given that research indicates that SES can
statistically explain a large portion of White-Hispanic and White-Black
intelligence gaps (Weiss & Saklofske, 2020), we add to the regression model a
general factor of SES, which is based on the education, income, finances,
marital status, and employment status of the parents, along with a rating of
neighborhood crime and a sixteen-item measure of neighborhood deprivation.
This compound SES variable is included alongside SIRE, child and family
migrant status, and recruitment-site and family effects, which control for,
respectively, race- and ethnicity-related culture, migrant background,
geography, and family-related factors such as the number of children in the
family.
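A minimal sketch of the admixture-regression logic, assuming ordinary least squares with an intercept, one ancestry proportion, and a matrix of controls. The data are synthetic with arbitrary coefficients; they are not ABCD data and imply nothing about the empirical results. Comparing fits with and without the SES control shows why ancestry-correlated environmental variables are included: when such a control is omitted, its effect is absorbed into the ancestry slope.

```python
import numpy as np


def admixture_regression(y, ancestry, covariates):
    """OLS of an outcome on one ancestry proportion plus control variables.

    y:          (n,) outcome, e.g. a standardized score
    ancestry:   (n,) proportion (0-1) of the focal ancestry component
    covariates: (n, p) controls such as SIRE fractions or an SES factor
    Returns [intercept, ancestry slope, covariate slopes...].
    """
    X = np.column_stack([np.ones(len(y)), ancestry, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Synthetic data with arbitrary coefficients, used only to check the fit.
rng = np.random.default_rng(1)
n = 20_000
anc = rng.uniform(0.0, 1.0, n)
ses = 0.5 * anc + rng.normal(0.0, 1.0, n)             # control correlated with ancestry
y = 0.2 + 0.4 * anc + 0.3 * ses + rng.normal(0.0, 0.1, n)

with_ses = admixture_regression(y, anc, ses[:, None])
without_ses = admixture_regression(y, anc, np.empty((n, 0)))
print(with_ses[1], without_ses[1])  # roughly 0.40 vs 0.55: omitting SES inflates the ancestry slope
```

Because ancestry proportions sum to one, at most K − 1 of them can enter a model with an intercept; using a single focal proportion, as here, pools the remaining components into the reference.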
The findings from admixture regression research lead to our Hypothesis 4:
The mean intelligence differences between SIRE groups are partly explained by
genetic ancestry, and as a corollary, European genetic ancestry will be positively
associated with g (relative to African and Amerindian ancestry). Note that the
predictions will differ depending on the specific groups. For example, because
Northeast Asians typically score somewhat above individuals of European
descent, European ancestry is predicted to be negatively correlated with g in
mixed European and Northeast Asian ancestry groups (Warne, 2020a).
Polygenic scores and g
Warne (2021) reviews the literature showing that polygenic scores (PGSs)
are genetically related to g within SIRE groups. This leads to our Hypothesis 5:
Educationally-related polygenic scores correlate positively with intelligence
within SIRE groups and among siblings within families.
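One common way to test the within-family part of Hypothesis 5 is a sibling-difference regression: differencing both the outcome and the polygenic score between siblings removes anything the siblings share exactly, such as household SES. The sketch below uses synthetic families with a family-level confound correlated with the PGS; the variable names and coefficients are hypothetical, and this is not necessarily the exact specification used in our analyses.

```python
import numpy as np


def ols_slope(y, x):
    """Slope from an OLS regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def sibling_difference_slope(pgs_a, pgs_b, y_a, y_b):
    """Within-family estimate: regress sibling differences in the outcome on
    sibling differences in the polygenic score, so that anything the two
    siblings share exactly (e.g. family SES) cancels out."""
    return ols_slope(np.asarray(y_a) - np.asarray(y_b),
                     np.asarray(pgs_a) - np.asarray(pgs_b))


# Synthetic families with a family-level confound correlated with the PGS.
rng = np.random.default_rng(2)
n = 20_000
fam_pgs = rng.normal(size=n)
fam_env = 0.5 * fam_pgs + rng.normal(size=n)
pgs_a = fam_pgs + rng.normal(0.0, 0.7, n)
pgs_b = fam_pgs + rng.normal(0.0, 0.7, n)
y_a = 0.3 * pgs_a + fam_env + rng.normal(size=n)
y_b = 0.3 * pgs_b + fam_env + rng.normal(size=n)

naive = ols_slope(y_a, pgs_a)                               # inflated by fam_env
within = sibling_difference_slope(pgs_a, pgs_b, y_a, y_b)   # near the simulated 0.3
```

The naive between-person slope picks up the family-level confound, while the sibling-difference slope recovers the simulated within-family effect.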
We used the eduPGS in two distinct ways. First, we included
them in path models relating ancestry, brain volume, and g. When doing so, we
explicitly assumed that polygenic score differences corresponded to genetic
differences causative of trait differences, and we estimated the amount of the
between-group phenotypic differences in g and brain volume that the eduPGS
can account for. While this causal assumption may not hold for a variety of
reasons (e.g., Yair & Coop, 2022), we consider it very plausible because
(1) the eduPGS used here causally explain differences within ancestries,
(2) they statistically explain differences between ancestries, and
(3) ancestry-associated eduPGS differences have yet to be accounted for in terms
of specific forms of population structure-related confounding (Fuerst et al., 2021;
Lawson et al., 2020).
Second, we used the eduPGS to estimate the effects of g-related genes on
tests within SIRE groups (controlling for ancestry). We then correlated these
effects with the magnitude of cognitive differences between SIRE groups. These
latter analyses avoid any assumptions regarding the meaning of eduPGS
differences between ancestries.
Brain volume, g, ancestry, and eduPGS
Much has been written about the link between group differences in
intelligence and brain volume (see Rushton & Jensen, 2005). This rich literature
allowed us to test a series of related hypotheses that could provide indirect
evidence for a genetic hypothesis of differences in brain volume.
Hypothesis 6a is that brain volume correlates with intelligence within SIRE
groups and among siblings within families. Confirmation of this hypothesis
implies that brain volume is genetically related to g within SIRE groups.
Hypothesis 6b is that brain/intracranial volume differences between SIRE
groups are also explained by genetic ancestry. Correspondingly, Hypothesis 6c
is that within SIRE groups primarily of European, African, and Amerindian
ancestry, European genetic ancestry will be positively associated with
brain/intracranial volume relative to African and Amerindian ancestry (see
Rushton & Jensen, 2005).
Hypothesis 6d is that SES explains a portion of the relation between
ancestry and g and between ancestry and brain volume, and that these relations
are substantially genetic in origin. This hypothesis is based on Warne’s (2021b,
p. 488) statement that “most people live in environments that are at least
partially influenced by their genes, which means that even an ‘environmental’
variable is often partially caused by genetics.” It is also based on the finding that
parental SES is genetically related to child intelligence within populations (e.g.,
Trzaskowski et al., 2014). A finding that the relations of SES with both g and
brain volume have a genetic basis in this sample would weaken explanations
that rely on SES to account for differences environmentally, as it would
undermine the frequently made assumption that the correlation between traits
and SES is environmental in origin (i.e., “the sociologist’s fallacy”; Sesardic,
2005).
In our last brain-related hypotheses, we focus on the roles that polygenic
scores and brain volume play if the genetic hypothesis is correct. Hypothesis 6e
is that brain volume differences are predicted by eduPGS in all SIRE groups.
Hypothesis 6f is that the relation between European genetic ancestry and g is
mediated both by eduPGS and brain volume; moreover, Hypothesis 6g is that
the relation between eduPGS and g is mediated by brain volume. The last two
hypotheses, which are tested at the same time, are logical extensions of the
prediction that eduPGS and brain volume account for the association between
genetic ancestry and g.
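Hypotheses 6f and 6g rest on a mediation decomposition. The sketch below shows the single-mediator, product-of-coefficients version with ordinary least squares: a is the effect of the predictor on the mediator, b is the effect of the mediator on the outcome holding the predictor constant, and the share mediated is a·b divided by the total effect. It is a simplified stand-in for the path models used here; the data are synthetic, and x, m, and y merely play the roles of a predictor (for example, eduPGS), a mediator (brain volume), and an outcome (g).

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Slopes (intercept dropped) from an OLS fit of y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]


def mediation(x, m, y):
    """Single-mediator decomposition via the product of coefficients."""
    a = ols_slopes(m, x)[0]               # path x -> m
    direct, b = ols_slopes(y, x, m)       # paths x -> y and m -> y, each controlling the other
    total = ols_slopes(y, x)[0]           # total effect x -> y
    return {"a": a, "b": b, "direct": direct, "indirect": a * b,
            "total": total, "share_mediated": a * b / total}


rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(size=n)                       # predictor
m = 0.5 * x + rng.normal(size=n)             # mediator
y = 0.2 * x + 0.3 * m + rng.normal(size=n)   # outcome
res = mediation(x, m, y)
print(round(res["share_mediated"], 2))       # near 0.15 / 0.35, i.e. about 0.43
```

With OLS fits on the same sample, the total effect equals the direct effect plus a·b exactly, so the share mediated is well defined as long as the total effect is not close to zero.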
Method of correlated vectors
Warne (2021) reviews the extensive literature showing that group
differences on subtests of IQ batteries are strongly related to the cognitive
complexity of these same subtests. Because the cognitive complexity of IQ
subtests is strongly linked to their heritability, this is indirect evidence that group
differences in IQ have a genetic component. Because several of our variables are
hypothesized to be genetic, our Hypothesis 7a was that the vectors of effects of
these putative genetic variables across cognitive tests would be positively
correlated. Rushton (1999) showed that all genetic variables in his study correlated,
so that a strong higher-order genetic factor could be computed in a factor
analysis and all putative genetic variables loaded on it. This leads to our
Hypothesis 7b that there would be a higher-order genetic factor resulting from
the intercorrelation among genetic variables in our study.
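The method of correlated vectors reduces to correlating two vectors indexed by the same subtests, for example subtest g-loadings and standardized group differences (Hypothesis 7a). For Hypothesis 7b, one simple approximation of a higher-order factor is the first principal component of the correlation matrix among several such vectors. The sketch below assumes Pearson correlations and uses made-up five-subtest vectors; it is not necessarily the factoring method used in the paper.

```python
import numpy as np


def correlated_vectors(v1, v2):
    """Pearson correlation between two vectors indexed by the same subtests."""
    return float(np.corrcoef(np.asarray(v1, float), np.asarray(v2, float))[0, 1])


def first_factor_loadings(vectors):
    """Loadings of several subtest-level vectors on the first principal component
    of their correlation matrix (a simple stand-in for a higher-order factor)."""
    R = np.corrcoef(np.asarray(vectors, float))        # rows are the variables
    eigvals, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])   # largest component
    return loadings if loadings.sum() >= 0 else -loadings  # fix the arbitrary sign


# Hypothetical per-subtest vectors for five subtests (illustrative numbers only).
g_loadings = [0.50, 0.60, 0.70, 0.80, 0.40]
group_diffs = [0.20, 0.30, 0.35, 0.40, 0.10]
pgs_effects = [0.10, 0.25, 0.30, 0.45, 0.05]

print(correlated_vectors(g_loadings, group_diffs))
print(first_factor_loadings([g_loadings, group_diffs, pgs_effects]))
```

With only a handful of subtests, such vector correlations are estimated from very few points, so they are best read alongside the other analyses rather than on their own.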
Methods
Dataset
The ABCD Study is a collaborative longitudinal project involving 21 sites across
the US. Supported by the National Institutes of Health (NIH), ABCD is the largest
longitudinal study of brain development conducted in the US to date.
Approximately 11,000 children aged 9 to 10 years were sampled at baseline,
between 2016 and 2018, using a probabilistic sampling strategy. The ABCD
subjects were limited to children who were fluent in English and who did not
have severe medical, neurological, or psychiatric conditions. The children are
broadly representative of healthy US children in this age range. Informed
consent was provided by the parents.
For all analyses, we utilized the ABCD 3.01 dataset. Because our focus
was on groups who are primarily of African, European, and Amerindian
ancestry, we excluded any child who was identified as being either Pacific
Islander or Asian.
Variables for the analyses
The following variables were used for the analyses:
1. Eleven age-corrected cognitive tests
The following cognitive measures were given at baseline: the seven NIH
Toolbox® (NIHTBX) neuropsychological battery tests, NIHTBX Wechsler
Intelligence Scale for Children’s Matrix Reasoning, the Little Man Test (efficiency
score), the Rey Auditory Verbal Learning Test (RAVLT) immediate recall, and
RAVLT delayed recall.
Regarding the first seven of these, the NIHTBX neuropsychological battery
was designed to measure a broad range of cognitive abilities. It consists of
seven tasks which index attention (Flanker Inhibitory Control and Attention
Task), episodic memory (Picture Sequence Memory Task), language abilities
(Picture Vocabulary Task & Oral Reading Recognition Task), executive function
(Dimensional Change Card Sort Task & Flanker Inhibitory Control and Attention
Task), processing speed (Pattern Comparison Processing Speed Task), and
working memory (List Sorting Working Memory Task) (Akshoomoff et al., 2014;
Weintraub et al., 2014; Thompson et al., 2019).
For the seven NIHTBX subtests, the ABCD precomputed age-corrected
scores were used. For the remaining four tests, scores were adjusted for age by
applying regression with a cubic spline of age to the full sample. Subsequently,
scores for all eleven tests were standardized.
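A minimal sketch of the age adjustment for the four non-NIHTBX tests, assuming a truncated-power cubic spline basis with interior knots at age quantiles; the number and placement of knots are our assumptions, since they are not specified here. Scores are regressed on the spline of age, and the residuals are then standardized.

```python
import numpy as np


def cubic_spline_basis(x, knots):
    """Truncated-power cubic spline basis: 1, x, x^2, x^3 and (x - k)^3_+ per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def age_adjust_and_standardize(score, age, n_knots=3):
    """Residualize a test score on a cubic spline of age, then z-score the residuals."""
    x = age - age.mean()                                            # centring improves conditioning
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])    # hypothetical interior quantile knots
    X = cubic_spline_basis(x, knots)
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    resid = score - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


rng = np.random.default_rng(0)
age = rng.uniform(108, 132, 5_000)                                  # age in months (hypothetical)
score = 0.05 * (age - 120) - 0.002 * (age - 120) ** 2 + rng.normal(size=age.size)
z = age_adjust_and_standardize(score, age)
```

Because the basis contains an intercept and a linear age term, the adjusted scores are exactly uncorrelated with age in the sample, and the spline terms also absorb curvature in the age trend.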
2. General intelligence (g) scores
Multi-group confirmatory factor analysis (MGCFA) had previously been
conducted on the 11 cognitive tests noted above (Fuerst, Hu, & Connor,
2021). We repeated the analysis using the same model and specifications.
When doing so, we first assessed whether outliers and missing data had any impact,
and whether our results remained robust after imputation of missing data, removal
of outliers, and adjustments for age and sex. Next, we conducted both
exploratory factor analysis and multi-group confirmatory factor analysis. A
three-broad-factor model (memory, complex cognition, and executive function)
with a general factor (g) fit the data well. As previously found, strict
measurement invariance held across the SIRE groups (and across sex groups
and age measured in months). The complete results are detailed in Appendix A.
In the best-fitting and also most parsimonious model of SIRE group
differences, g alone explains the differences. The g factor scores from this MGCFA
model were standardized (M = 0.00, SD = 1.00) in the full sample of 10,245
children.
3. NIH Toolbox® (NIHTBX) Cognition Battery
As the g scores used for the admixture regression analyses were
dependent on MGCFA model specification, we additionally used the
age-corrected NIH Toolbox® Cognition Battery (NIHTBX) summary composite
scores (“nihtbx_totalcomp_agecorrected”) precomputed by NIH as an
alternative measure of general cognitive ability. Our choice was made for the
sake of replicability. The NIHTBX was normed for samples between ages 3 and
85; tasks correlate highly with comparable ability assessments (Weintraub et al.,
2014). The NIHTBX cognition battery had previously been found to be
measurement invariant across American Black, Hispanic, and White SIRE
groups (Lasker et al., 2019). Prior to analysis, we standardized scores on the
full sample.
4. Unadjusted and adjusted brain and intracranial volume
ABCD provides summary brain (“smri_vol_scs_wholeb”) and intracranial
volume (“smri_vol_scs_intracranialv”) variables, which give volume in mm³. We
standardized these unadjusted variables.
We additionally created sex-, age-, and MRI assessment site-adjusted brain
variables to use in the path analysis models and biometric analyses. We adjusted
for collection site to account for differences in MRI protocol across collection
sites. Brain and intracranial volume were adjusted for sex and for the cubic spline
of age-, height-, and site-related fixed effects. These adjusted brain variables
were then standardized.
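The site adjustment can be sketched as a regression on sex, one indicator per collection site (site fixed effects), and continuous covariates such as height, keeping the standardized residuals. The function name, the simulated site offsets, and the use of plain linear terms for the covariates are assumptions for illustration; spline terms for age or height could be passed in the covariates matrix instead.

```python
import numpy as np


def adjust_for_site_and_covariates(y, sex, site, covariates):
    """Residualize y on sex, site fixed effects and continuous covariates; return z-scores.

    site: (n,) array of site labels; one indicator column per site except the
          first (reference) level, so every site gets its own intercept.
    covariates: (n, k) continuous controls, e.g. age or height (or spline terms).
    """
    levels = np.unique(site)
    site_dummies = (site[:, None] == levels[None, 1:]).astype(float)
    X = np.column_stack([np.ones(len(y)), sex, site_dummies, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return (resid - resid.mean()) / resid.std(ddof=1)


rng = np.random.default_rng(4)
n = 3_000
site = rng.choice(np.array(["site01", "site02", "site03"]), n)   # hypothetical labels
sex = rng.integers(0, 2, n).astype(float)
height = rng.normal(size=n)
site_shift = {"site01": 0.0, "site02": 0.8, "site03": -0.5}     # e.g. protocol offsets
volume = (np.array([site_shift[s] for s in site]) + 0.6 * sex + 0.3 * height
          + rng.normal(size=n))
z = adjust_for_site_and_covariates(volume, sex, site, height[:, None])
```

With an intercept plus one indicator per non-reference site, the adjusted values average exactly zero within every site, which is what removes scanner- or protocol-level offsets.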
5. Child US born & immigrant family
Parents were asked if the child was born in the “United States”. This
variable is recoded as “1” for “United States” and “0” for all other responses.
Parents were also asked if anyone in the child’s family, including maternal or
paternal grandparents, was born outside of the United States. This variable is
also coded as “1” for “Yes” and “0” for all other responses.
6. General socioeconomic status (SES)
A general factor of SES was computed by applying principal components
analysis (PCA) to seven indicators of SES. To analyze the data, we used the
PCAmixdata R package (Chavent et al., 2014), which allows handling mixed
categorical and continuous data. The first unrotated component explained 42%
of the variance, indicating a strong general factor. The loadings of the seven
indicators on the first principal component were: financial adversity (.31), area deprivation
index (.49), neighborhood safety protocol (.31), parental education (.54),
parental income (.66), parental marital status (.43), and parental employment
status (.23). The resulting SES factor was standardized in the full sample.
Details are provided in Appendix B.
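PCAmixdata handles mixed categorical and continuous indicators. As a simplified stand-in that covers only continuous indicators, the sketch below runs an ordinary PCA on z-scored columns and returns the first component's scores, its loadings (here, correlations between each indicator and the component), and its share of variance. The simulated indicators are hypothetical, and because PCAmix treats categorical variables differently, the output is not directly comparable with the loadings reported above.

```python
import numpy as np


def first_principal_component(X):
    """PCA on z-scored columns: returns standardized PC1 scores, PC1 loadings
    (indicator-component correlations) and PC1's share of total variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    eigvals = s**2 / (Z.shape[0] - 1)          # eigenvalues of the correlation matrix
    share = eigvals[0] / eigvals.sum()
    loadings = vt[0] * np.sqrt(eigvals[0])
    scores = Z @ vt[0]
    if loadings.sum() < 0:                     # PCA sign is arbitrary; orient toward positive loadings
        loadings, scores = -loadings, -scores
    scores = (scores - scores.mean()) / scores.std(ddof=1)
    return scores, loadings, share


rng = np.random.default_rng(5)
n, k = 2_000, 7
factor = rng.normal(size=(n, 1))                        # hypothetical latent SES
X = factor @ rng.uniform(0.3, 0.8, size=(1, k)) + rng.normal(size=(n, k))
scores, loadings, share = first_principal_component(X)
print(round(share, 2), np.round(loadings, 2))
```

The first unrotated component is used because, when all indicators correlate positively, it captures the variance they have in common.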
7. SIRE fraction & Hispanic
Based on the 18 questions inquiring about the child’s race, we created four
dummy SIRE variables: Black, White, Native American, and Not Otherwise
Classified (NOC). The latter category includes those identified as: “Other race,”
“Refused to answer,” or “Don’t know”. These variables were then converted into
interval variables, calculated as the indicator for each of the four groups (0
or 1) divided by the total number of categories selected (1 to 4). As a result, individuals were
assigned four SIRE fractions (frac_White SIRE, frac_Black SIRE,
frac_Native_American SIRE, & frac_NOC SIRE), each ranging from 0 to 1. For
example, a Hispanic individual who chose both White and Black SIRE receives
a ½ weighting on each of the associated group variables; an individual who
chose all three of Black, Native American and White SIRE has 1/3 weight for
each. This ensures that the sum across the group variables equals one for
every individual. The White category is used as the base or benchmark group
and the associated variable is dropped from the regression. This SIRE coding is
used because interval SIRE variables had previously been found to be most
predictive when included in models alongside genetic ancestry (Kirkegaard et
al., 2019). These variables were left unstandardized, so that the
unstandardized beta coefficient for a SIRE fraction can be interpreted as the expected
difference, in standard-deviation units of the dependent variable, associated with a
change from 0 to 100 percent identification with that group (relative to White SIRE).
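The fractional SIRE coding can be sketched as follows, assuming one 0/1 indicator per category in the order White, Black, Native American, NOC. Each row is divided by the number of categories endorsed, so the fractions sum to one, and the White column is then dropped as the reference. The function name and the array layout are ours.

```python
import numpy as np

GROUPS = ["White", "Black", "Native_American", "NOC"]   # column order (White = reference)


def sire_fractions(selected):
    """Convert 0/1 indicators of each SIRE category into fractions summing to one.

    selected: (n, 4) array; 1 if the category was endorsed, in GROUPS order.
    """
    selected = np.asarray(selected, dtype=float)
    counts = selected.sum(axis=1, keepdims=True)
    if np.any(counts == 0):
        raise ValueError("each row must endorse at least one category")
    return selected / counts


fractions = sire_fractions([
    [1, 1, 0, 0],   # White and Black         -> 1/2 each
    [1, 1, 1, 0],   # White, Black and Native -> 1/3 each
    [0, 1, 0, 0],   # Black only              -> 1
])
design = fractions[:, 1:]   # drop White: remaining fractions are read relative to White
```

Dropping one column is required because the four fractions always sum to one and would otherwise be perfectly collinear with the regression intercept.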
We also created a dummy variable for Hispanic ethnicity, coded as “1” for
“Hispanic” and “0” for non-Hispanic.
8. Height
ABCD provided a summary variable (“anthroheightcalc”), which records the
child’s height (in inches) based on an average of up to three measures. We
divide the values by 12 to give height in feet and then standardize the results in
the full sample.
9. Genetic ancestry
Subjects were genotyped using Illumina XX, with 516,598 variants directly
genotyped and surviving the quality control carried out by the data provider. We
used the 3.0 release of the genotypic dataset, which also includes an edition
with imputed variants using TOPMED and the Eagle 2.4 software. All our work
was done on build 38. Files in hg17/37 were lifted to hg38 using liftOver
(https://github.com/sritchie73/liftOverPlink) and the GRC chain file at
ftp://ftp.ensembl.org/pub/assembly_mapping/homo_sapiens/
(GRCh37_to_GRCh38.chain.gz).
Before global admixture estimation, we applied quality control analyses
using plink 1.9. We used only directly genotyped, bi-allelic, autosomal SNP
variants (N = 494,433 and 493,196, before and after lifting, respectively). We
pruned variants for linkage disequilibrium at the 0.1 R² level using plink 1.9
(--indep-pairwise 10000 100 0.1), as recommended in the admixture
documentation
(https://vcru.wisc.edu/simonlab/bioinformatics/programs/admixture/admixture-manual.pdf). This variant filtering was carried out in the reference population
dataset to reduce bias due to sample representativeness. After pruning, we
were left with 99,642 variants. To ensure a reasonable balance of populations in
the estimation dataset, we merged the target samples from ABCD with
reference population data for the populations of interest. We desired a K = 5
solution (European, Amerindian, African, East Asian, and South Asian), so we
merged with relevant samples from the 1000 Genomes database and from the
HGDP database. We used all populations in the 1000 Genomes database and
from the HGDP database, except Adygei, Balochi, Bedouin, Bougainville,
Brahui, Burusho, Druze, Hazara, Makrani, Mozabite, Palestinian, Papuan, San,
Sindhi, Uygur, and Yakut. These reference populations were excluded because
they were overly admixed or because, in the case of Melanesians and San, the
individuals in the ABCD sample lacked significant portions of these ancestries.
Because the estimation sample would still be highly skewed towards
European ancestry when using this joint sample, we used repeated subsetting
to achieve balance. Specifically, we split the ABCD target samples into 50
random subsets, each with approximately 222 persons, and merged them one
at a time with the reference data, followed by running admixture k=5 on each
merged subset. We verified that these subsets produced stable results by
examining the stability of the estimates for the reference samples. There was
very little variation across runs, e.g., for the reference sample with the most
variance (European, NA12342), the mean estimate was 98.3% with SD = 0.17%
across the 50 runs. Because the Admixture software does not label the resulting
clusters, we used five reference samples to index the populations so the data
would be merged correctly. In no case did this produce inconsistencies.
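The subsetting and cluster-labeling steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the random seed, and the rule that each unlabeled cluster takes the name of the reference individual loading most heavily on it are our assumptions, and the actual ADMIXTURE runs on each merged subset are not shown.

```python
import random

def split_into_subsets(sample_ids, n_subsets=50, seed=1):
    """Randomly partition target-sample IDs into n_subsets near-equal groups."""
    ids = list(sample_ids)
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(ids)
    # Round-robin slicing keeps subset sizes within one of each other.
    return [ids[k::n_subsets] for k in range(n_subsets)]

def label_clusters(q_rows, index_samples):
    """Map unlabeled admixture clusters to population names.

    q_rows: dict sample_id -> list of K ancestry proportions from one run.
    index_samples: dict population name -> ID of a reference individual
        assumed to be nearly unadmixed for that population.
    Returns dict cluster column -> population name.
    """
    labels = {}
    for pop, sid in index_samples.items():
        q = q_rows[sid]
        k = max(range(len(q)), key=q.__getitem__)  # column this sample loads on most
        if k in labels:
            # Two index samples claiming one cluster signals an inconsistent run.
            raise ValueError(f"{labels[k]} and {pop} map to the same cluster")
        labels[k] = pop
    return labels
```

With an illustrative 11,100 target IDs, `split_into_subsets(range(11100))` yields 50 subsets of 222 persons each, each of which would then be merged with the reference data before running admixture.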
A reviewer stated that the reliability of our admixture estimates was so low
that it would invalidate all our conclusions. However, we had validated our
genetic ancestry percentages against those in the ABCD dataset, which are
based on a K= 4 solution (African, European, East Asian, and American
ancestry). For these, ABCD researchers used 1000 Genomes populations as
the reference samples and fastStructure as the algorithm (Hatton, 2018). Note,
we computed our own genetic ancestry percentages for two reasons: 1) We
desired a K= 5 solution to separate South Asian and European ancestry; 2) We
were concerned with ABCD’s Amerindian admixture estimates as these were
computed using only reference samples from 1000 Genomes (which had, for
Amerindian ancestry, only Mexicans from Los Angeles and Peruvians from
Lima, Peru). Figure 1 shows the correlation matrix for our percentages and
those provided by ABCD, and it can be seen that the Amerindian, African, and
European ancestry percentages are perfectly correlated and the East Asian
percentages are nearly perfectly correlated (r= .98). So, both sets of ancestry
estimates can be considered virtually perfectly reliable.
By design, the estimated ancestry proportions for an individual i, $A_{i1}, A_{i2}, \ldots, A_{i5}$, are non-negative and sum to one for every individual:

$$\sum_{h=1}^{5} A_{ih} = 1 \quad \text{for every } i$$
In order to use all ancestry variables in a regression context, one category must
be chosen as the base group and the associated variable dropped from the
regression, otherwise the unit-sum condition across the variables produces a
singularity in the regression model. When using multiple ancestries, we used the
European ancestry category as the base, because it has the largest number of
nonzero observations. For the predictor effect plots, which depict the effect of a
single ancestry predictor on an outcome, we used European ancestry as that
single predictor.
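These ancestry variables were left unstandardized (as proportions from 0 to 1).
This allows the unstandardized coefficient for a genetic ancestry to be
interpreted as the expected change, in standardized units of the dependent
variable, associated with a change from 0 to 100 percent of that ancestry
(relative to the base ancestry when several ancestries are entered).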
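For concreteness, the full-sample specification with European ancestry as the base can be written as below. This is a sketch in our own notation: $\beta_h$, $\boldsymbol{\gamma}$, $\mathbf{x}_i$ (standing for the covariates listed in the text), and $\varepsilon_i$ are introduced here for illustration, and the site and family random effects are omitted.

$$A_{i,\mathrm{EUR}} = 1 - \sum_{h \neq \mathrm{EUR}} A_{ih}$$

$$g_i = \beta_0 + \sum_{h \neq \mathrm{EUR}} \beta_h A_{ih} + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_i$$

Because the European share is fully determined by the other four, adding $A_{i,\mathrm{EUR}}$ as a fifth regressor alongside the intercept would make the design matrix singular. Each $\beta_h$ is then the expected change in $g$ (in standard deviation units) when $A_{ih}$ rises from 0 to 1 and European ancestry falls by the same amount, holding $\mathbf{x}_i$ constant.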
Figure 1. Matrix of correlations between genetic ancestry percentages used in
this paper (e.g., European) and those provided in the ABCD dataset (e.g., NIH
European)
10. MTAG-eduPGS & poly-eduPGS
For the main analyses, we created two eduPGS variables. First, we used
the genome-wide association study (GWAS) results from Lee et al.’s (2018)
meta-analysis, based on European-descent individuals. Specifically, we used the
multi-trait analysis of GWAS (MTAG) eduPGS SNPs (N = 8,898 variants in the
ABCD sample). MTAG is a method for jointly analyzing summary statistics from
GWAS of different but genetically correlated traits (e.g., education and
intelligence). Lee et al. (2018) applied
MTAG to four traits: education (n= 1,131,881), intelligence (n= 257,841),
hardest math class taken (n= 430,445), and mathematical ability (n= 564,698);
we used Lee et al.’s MTAG results for education. This predictor, denoted
MTAG-eduPGS, was used because it was previously shown to predict g in
European, Hispanic, and African-American populations (Lasker et al., 2019) and
because it was assessed for some commonly proposed forms of
population-structure-related confounding (e.g., biasing effects of discovery
population-specific variants and of derived versus ancestral variant status,
biasing effect of SNP sign discordance between populations, and the
discrepancy between population-GWAS and within family coefficients) (Fuerst et
al., 2021).
Second, we created an eduPGS, denoted poly-eduPGS, using the new
PolyFun predictor detailed by Weissbrod et al. (2022) (using N~ 750,000
variants in the ABCD sample). Weissbrod et al. (2022) trained this predictor on
334,353 unrelated individuals of British descent from the UK Biobank. The
relevant trait was having a college education or not. This predictor uses
genome-wide functionally-informed fine-mapping to precisely estimate causal
effects. In doing so, this method is known to circumvent linkage disequilibrium
differences, which can confound trans-ancestral comparisons (Weissbrod et al.,
2020, 2022). Moreover, as the training sample only includes the UK Biobank
data, it is relatively homogeneous; this characteristic should reduce
population-structure-related confounding. While this predictor does not provide
an accurate estimate of the effect size, it provides a precise estimate of causal
effects, at least given a sufficiently large training sample. Weissbrod et al.
(2022) provide the SNP set and SNP weights, based on the UK Biobank, and
the code for generating this eduPGS. Weissbrod et al. (2022) also created
additional predictors called PolyPred and PolyPred+. These combined PolyFun
with other predictors that do not use functionally-informed fine-mapping (e.g.,
BOLT-LMM). The combined predictors, which can incorporate non-European
training samples, have been shown to increase the predictive validity of the
polygenic scores. We did not use the combined predictors because we wanted
a predictor based on fine-mapping only, given our particular concern with
confounding due to differences in linkage disequilibrium.
While an updated meta-analysis of educational GWAS has since been
published (Okbay et al., 2022), only about 10,000 variants have been made public.
This is because results based on the 23andMe sample, which accounted for
most of the sample size increase since Lee et al.’s (2018) meta-analysis, were
not published. Moreover, this SNP set is made available by the Social Science
Genetic Association Consortium (SSGAC), and the SSGAC currently proscribes
the use of eduPGS for making comparisons between ancestry groups. So, it is
not possible to assess population-related confounding with these eduPGS.
Given this, the eduPGS above are the best currently available ones for the
purposes of the present analyses.
11. First 20 genetic principal components
For several of the analyses involving eduPGS, we took
population-structure-related effects into account by controlling for the first twenty
ancestry principal components generated by PLINK v1.90b6.8. We used all
imputed SNPs to compute these PCs.
12. Pseudo eduPGS
Using PLINK v1.90b6.8, we selected ten random sets of 8,898 variants. We
randomly assigned the MTAG-eduPGS’s beta weights to these sets of SNPs.
We used these individually, and we also averaged them to create an averaged
pseudo eduPGS.
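The logic of the pseudo eduPGS can be sketched as follows. The paper used PLINK for this step; the stdlib code below only illustrates the idea, and the function names, the seed, and the use of a plain unweighted mean of raw scores (rather than, say, averaging standardized scores) are our assumptions.

```python
import random

def pgs(dosages, weights):
    """Polygenic score: sum over variants of allele dosage (0, 1, 2) times weight."""
    return sum(dosages[v] * w for v, w in weights.items())

def pseudo_pgs_weights(all_variants, real_betas, n_sets=10, seed=0):
    """Build n_sets pseudo weight maps: random variants carrying permuted real betas."""
    rng = random.Random(seed)
    k = len(real_betas)
    sets = []
    for _ in range(n_sets):
        chosen = rng.sample(list(all_variants), k)  # random set of k distinct variants
        betas = list(real_betas)
        rng.shuffle(betas)                          # randomly reassign the real weights
        sets.append(dict(zip(chosen, betas)))
    return sets

def averaged_pseudo_pgs(dosages, weight_sets):
    """Unweighted mean of one person's pseudo scores across all weight sets."""
    scores = [pgs(dosages, w) for w in weight_sets]
    return sum(scores) / len(scores)
```

Because the pseudo scores reuse the real weight distribution on arbitrary variants, any residual prediction of g they show indicates how much a score of this size can predict through genome-wide structure alone.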
Analyses
Analyses related to measurement of group differences and Spearman's
hypothesis
We computed means, standard deviations, and effect sizes for g, the two
eduPGS, and brain/intracranial volume. For effect sizes, we computed Cohen’s
d values. This commonly used metric is computed as the difference between the
two group means divided by the pooled standard deviation of the two groups.
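A minimal sketch of the computation, assuming the usual pooled standard deviation with n − 1 weighting within each group (the function name is ours):

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d: (mean(x) - mean(y)) divided by the pooled SD of the two groups."""
    nx, ny = len(x), len(y)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)
```

For example, `cohens_d([1, 2, 3], [3, 4, 5])` returns -2.0: the means differ by 2 and the pooled SD is 1.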
Analyses related to measurement invariance
We repeated the multi-group confirmatory factor analysis (MGCFA) of
Fuerst et al. (2021) as detailed in Appendix A. We additionally ran local
structural equation models (LSEM) to determine if measurement invariance also
held with respect to European genetic ancestry deciles in addition to the four
SIRE groups. This method was applied to the 10,245 White, Hispanic, Black,
and Other individuals with European genetic ancestry, g, and brain volume
scores.
The LSEM method has been detailed by Hildebrandt et al. (2016) and was
previously applied to genetic ancestry and intelligence by Lasker et al. (2019)
and Lasker (2019) using two different samples. As with Fuerst et al. (2021),
when conducting confirmatory factor analysis we elected to use a higher-order
factor (HOF) model based on theoretical grounds (Decker, 2021; Hood, 2010)
and on the absence of empirical disconfirmation of the HOF model (Murray &
Johnson, 2013).
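The core of LSEM is to reweight observations around each focal value of the moderator (here, European ancestry) and fit the model to the resulting weighted covariance matrix. The sketch below shows only the weighting step for a single pair of variables. The Gaussian kernel with bandwidth h · SD · N^(−1/5) and the factor h = 1.1 are assumptions on our part (a form used as a default in some LSEM software), not settings reported here; the function names are hypothetical.

```python
from math import exp, sqrt
from statistics import pstdev

def kernel_weights(moderator, focal, h=1.1):
    """Gaussian kernel weights for one focal point (assumed bandwidth rule)."""
    n = len(moderator)
    bw = h * pstdev(moderator) * n ** (-1 / 5)
    # The normalizing constant of the Gaussian density is dropped; it cancels below.
    return [exp(-0.5 * ((m - focal) / bw) ** 2) for m in moderator]

def weighted_cov(x, y, w):
    """Weighted covariance with weights normalized to sum to one."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw

def weighted_corr(x, y, w):
    return weighted_cov(x, y, w) / sqrt(weighted_cov(x, x, w) * weighted_cov(y, y, w))
```

Repeating this at a grid of ancestry values and fitting the factor model to each weighted covariance matrix yields parameter trajectories along ancestry, which is what the invariance checks examine.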
To assess invariance across European genetic ancestry, the following
model-fit indices are used: chi-square, comparative fit index (CFI), root mean
square error of approximation (RMSEA), and Bayesian information criterion
(BIC). CFI estimates the discrepancy between the proposed model and the null
model: larger values indicate better fit. RMSEA estimates the discrepancy
related to the approximation, or the amount of unexplained variance (residual),
or the lack of fit compared to the saturated model: smaller values indicate better
fit. BIC is a comparative measure of fit used in the comparison of two or more
models; it combines model misfit (based on the likelihood) with a penalty for the
number of estimated parameters that grows with sample size: smaller values
indicate better fit.
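The three indices can be written compactly from the model and null chi-square statistics and the log-likelihood. These are the standard textbook forms; SEM software may differ in details (e.g., N versus N − 1 in RMSEA, or robust corrections), so this is a sketch rather than a replica of the lavaan output.

```python
from math import log, sqrt

def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 form)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_null, df_null):
    """Comparative fit index: misfit of the model relative to the null model."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_null - df_null, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def bic(loglik, n_params, n):
    """Bayesian information criterion: -2 log-likelihood plus a log(N) parameter penalty."""
    return -2.0 * loglik + n_params * log(n)
```

With illustrative values χ² = 150 on 50 df, N = 101, and a null model χ² = 1050 on 60 df, RMSEA ≈ 0.141 and CFI = 1 − 100/990 ≈ 0.899.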
We selected European ancestry as the ancestry variable because there
was no sparsity of data along this ancestral component.
Analyses related to within-group heritability and shared environmentality
A same-sex monozygotic and dizygotic twin sample was nested within the
ABCD sample. Details for this twin sample are provided by Iacono et al. (2018).
We used the SEM functionality of the lavaan (Rosseel et al., 2012) and umx
(Bates et al., 2019) packages to decompose the variance using an A (additive
genetic), C (shared environment), and E (non-shared environment) model. For
both NIHTBX/g and brain/intracranial volume, this ACE model fits better than
either an alternative ADE model, which replaces the C component with D
(dominance), or an alternative AE model, which excludes shared environment. This assessment
is based on a combination of the following fit indices: root mean square error of
approximation (RMSEA), Tucker–Lewis index, and χ²/df. For the SEM, we used
the theoretical genetic correlation of 1.0 for MZ and 0.5 for DZ. The correlations
for C and E, for both kinship classes, were set to 1 and 0, respectively. In
describing variance components, we adopted Chen et al.’s (2022) guidelines,
which indicate that low, moderate, and high genetic variance components
correspond to less than 30% of the variance, between 30% and 60%, and
greater than 60%, respectively.
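The twin correlations fixed above imply expected standardized covariances of A + C for MZ pairs and 0.5A + C for DZ pairs. Solving these two equations gives the classical Falconer moment estimates, sketched below together with the variance-component bands just described. This is only an illustration of the model's logic: the paper's estimates come from fitted SEMs, which, unlike these formulas, constrain components to be non-negative. The function names and the handling of the exact 30% and 60% boundaries are our assumptions.

```python
def falconer_ace(r_mz, r_dz):
    """Moment estimates implied by the ACE model with standardized phenotypes.

    Expected correlations: r_MZ = A + C and r_DZ = 0.5*A + C.
    Results are not constrained to [0, 1], unlike a fitted SEM.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2

def label_component(share):
    """Bands from the text: <30% low, 30-60% moderate, >60% high (boundaries assumed)."""
    if share < 0.30:
        return "low"
    if share <= 0.60:
        return "moderate"
    return "high"
```

For hypothetical twin correlations r_MZ = 0.70 and r_DZ = 0.40, this gives A ≈ 0.60, C ≈ 0.10, and E ≈ 0.30.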
For these analyses, twin pairs were identified and classified using the
ABCD’s precomputed family relationship and genetically inferred zygosity
variables. Variance component estimates were computed for the eleven
cognitive subtests, g scores, the NIHTBX composite scores, unadjusted brain
volume, unadjusted intracranial volume, adjusted brain volume, and adjusted
intracranial volume. For these analyses we did not use imputed data, except in
the case of the RAVLT memory trials. Specifically, in this case, we imputed
missing data for, and based on only, the six RAVLT trials
(pea_ravlt_sd_trial_i_tc, pea_ravlt_sd_trial_ii_tc, pea_ravlt_sd_trial_iii_tc,
pea_ravlt_sd_trial_iv_tc, pea_ravlt_sd_trial_v_tc, and pea_ravlt_ld_trial_vii_tc).
This was done because the short-term memory score is the average of the first
five trials, so missing trial-level data had to be handled before averaging.
To allow heritabilities for intelligence to be compared with previously
published results (e.g., Pesta et al., 2020, 2023), for these biometric analyses
we defined “Black” as including any non-Hispanic individual identified as being
African American. Thus “Black” was defined broadly and includes multi-racial
individuals. The “Other” group, correspondingly, was defined as everyone who
was not White, Hispanic, or Black as just defined. Because our primary
comparison, for the biometric analyses, was between the White and the
combined Hispanic, Black, and Other subgroups, this decision did not affect the
interpretation of our results. This definition of “Black” differs from that used for
the admixture regression analyses discussed below. In those, the “Black”
subgroups included only those individuals who were marked as being Black.
The latter was done for the purposes of minimizing possible sociocultural
variation which may confound admixture analyses.
We report the a’, c’, and e’ estimates, which are the square roots of the A,
C, and E estimates, because these represent the correlations between the trait and,
respectively, genes, shared environment, and non-shared environment. These
values are directly relevant to Warne’s (2021) argument, because the magnitude
of environmental variation within populations needed to cause phenotypic
differences between populations is calculated as the phenotypic difference
divided by the environmental-trait correlation.
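The arithmetic can be sketched as follows, using hypothetical numbers rather than estimates from this paper. The sketch assumes, as the argument does, that a single environmental factor with the same trait correlation between groups as within groups accounts for the entire gap.

```python
from math import sqrt

def required_env_difference(d_phenotypic, env_variance_share):
    """Between-group environmental difference (in SD units) needed to produce d_phenotypic.

    The environment-trait correlation is the square root of the variance share
    (e.g., c' = sqrt(C)); the required difference is d divided by that correlation.
    """
    return d_phenotypic / sqrt(env_variance_share)
```

For instance, with a hypothetical 1.0 SD phenotypic gap and C = 0.04 (so c' = 0.2), the groups would need to differ by 1.0 / 0.2 = 5 SD on the shared environment.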
Analyses related to admixture regression for g
The rationale for admixture regression analysis has been described in detail
previously (Connor & Fuerst, 2022; Kirkegaard et al., 2019; Lasker et al., 2019).
For these analyses, we used the pooled data with the ABCD baseline sample
and the twin sample. We first subset to cases which had admixture estimates, g
scores, and brain volume, leaving 10,245 cases. We then imputed missing data,
yielding the same number of cases. We ran a series of full-sample and also
SIRE-stratified (White, Black, Hispanic, and Other) regression analyses so as to
control for potential SIRE-related environmental confounds. This was because
“stratifying on SIRE has the potential benefits of reducing heterogeneity of these
non-genetic variables and decoupling the correlation between genetic and
non-genetic factors,” since SIRE “acts as a surrogate to an array of social,
cultural, behavioral, and environmental variables” (Fang et al., 2019, p. 764).
For these analyses, the SIRE subgroups were defined using the ABCD
race_ethnicity variable. There are four mutually exclusive groups in terms of
SIRE categorization – non-Hispanic White (White) only, Hispanic of any race
(Hispanic), non-Hispanic Black (Black) only, and any other (Other).
These were regression analyses with g as the dependent variable and
ancestries as the key predictors. These models additionally included: child US
born, family immigrant status, sex, age, and SES variables. For the total-sample
analyses, the Hispanic and SIRE fraction variables were also included.
Likewise, the SIRE fraction variables were included for the Hispanic subsample
analyses.
As advised by Heeringa and Berglund (2020), we used a linear
mixed-effects model rather than ordinary least squares. This involved partially
decomposing the residual term into linear random effects components linked to
the data collection site identifiers and same-family identifiers within the sample.
This allows for the possibility of error term correlations within data collection
sites or within families with multiple tested individuals (see Heeringa & Berglund,
2020). As Heeringa and Berglund (2020) note, this specification replicates that
which is used by the ABCD Data Exploration and Analysis Portal (DEAP). Thus,
the use of this multilevel model also aids in replication. For the mixed-effects
regression models, we employ the lmer command from the lme4 package
(Bates et al., 2015).
In addition to the regression results, we depicted the partial residual plots
for European ancestry with g as the dependent variable. Partial residual plots
are a form of predictor effect plot (see Fox & Weisberg, 2018). These plots show
the effect of European ancestry on g while holding everything else in the
regression model constant. These plots were created using the effect_plot
command from the jtools package in R (Long, 2020). This command uses the
output from the mixed-effects regression models of g on genetic ancestry.
Analyses related to polygenic scores and g
These were regression analyses with g as the dependent variable and
eduPGS as the main predictors. These models additionally included: sex, the
cubic spline of age, and the first twenty genetic PCs. For these analyses we
used a linear mixed-effects model as described above. We ran the models both
on the full sample and the SIRE-stratified subsamples. Associations between
intelligence and eduPGS may be confounded by demographic factors (Zaidi &
Mathieson, 2020). As such, we also run sibship analyses to assess the effects
of MTAG-eduPGS and Poly-eduPGS on intelligence within families. We then
compare the magnitude of these within-sibship effects to the effects found in the
full sample. We use the model detailed by Howe et al. (2021, 2022) and a
modified version of the R code provided by these authors
(https://github.com/LaurenceHowe/SiblingGWAS). This model gives the effect of
the predictor on the criterion within sibships (denoted PXCi) and between
sibships (denoted PFXi) in addition to within the full sample. The sibship
analyses are based on families with multiple children, with MZ twins excluded.
We use two criteria, g and NIHTBX composite scores. Following Howe et al.
(2021, 2022) all analyses include age and sex as covariates. We also include
the first 20 PCs as covariates, except when using European ancestry as the
predictor. When using European ancestry as the predictor, we instead include
East and South Asian ancestry as covariates. To maximize power for these
analyses we use all available White, Hispanic, Black, or Other individuals in a
pairwise deletion fashion. To be clear, we do not run within-sibship GWAS
analyses; rather, we determine whether population-wide eduPGS predict g among
sibships.
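The within/between split underlying the sibship model can be sketched as follows: each child's predictor is decomposed into the family mean (the between-sibship component) and the child's deviation from it (the within-sibship component), and both are then entered as predictors of g. Our reading is that these correspond to the PFXi and PXCi terms, respectively; the function names here are hypothetical, and the regression step itself (with its covariates) is omitted.

```python
from collections import Counter, defaultdict

def multi_child_rows(family_ids):
    """Indices of children whose family has at least two members in the data."""
    counts = Counter(family_ids)
    return [i for i, f in enumerate(family_ids) if counts[f] >= 2]

def within_between(family_ids, scores):
    """Split each child's score into a family mean and a within-family deviation."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for fam, s in zip(family_ids, scores):
        totals[fam] += s
        counts[fam] += 1
    fam_mean = [totals[f] / counts[f] for f in family_ids]
    deviation = [s - m for s, m in zip(scores, fam_mean)]
    return fam_mean, deviation
```

Because the deviations sum to zero within each family, their coefficient is identified only by sibling differences, which share parents, household, and population background.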
Analyses related to brain volume, g, ancestry, and eduPGS
Brain volume and g
These are regression analyses with g as the dependent variable and brain
volume (adjusted for sex, the cubic spline of age, and height) as the predictor. A
second set of regression models added the first twenty genetic PCs. For these
analyses we used a linear mixed-effects model as described above. We ran the
models both on the full sample and the SIRE-stratified subsamples. We
additionally ran the within-sibship analyses to assess the effects of brain volume
on intelligence within families.
Brain/ Intracranial volume, SIRE, and genetic ancestry
These were admixture regression analyses with unadjusted brain or
intracranial volume variables as the dependent variable and genetic ancestries
as the main predictors. The baseline model included SIRE variables but not
genetic ancestries. These models, which were run on the full sample of 10,245
children, additionally included: child US born, family immigrant status, sex, age,
height, and SES variables.
Brain/intracranial volume within SIRE groups
These analyses repeated the analysis discussed immediately above, but within
each SIRE subsample. As such, SIRE fraction variables were included only for the
Hispanic subsample analysis.
SES, brain volume, & g among adopted and biological children
We investigated the relation between SES, brain volume, and g by taking
advantage of the small adoption sample in the ABCD dataset. We defined
adopted children as children for whom both parents were reported to be
adoptive parents, and biological children as children for whom both parents
were reported to be biological parents. There were 126 adopted children and
6,400 biological children. We compared the associations for adopted children
with those for biological children. We fit a regression model with g
and, alternatively, brain volume as the dependent variables. In these models, we
included European ancestry, SIRE, the cubic spline of age, sex, child US born,
family migrant status, and site- and family-fixed effects. European ancestry was
included in this model because, as shown in Figure S14a of the Supplementary
File 1, it was also linearly related to g in the sample of adopted children.
Brain volume and eduPGS
These were linear mixed-effects regression analyses with unadjusted brain
volume as the dependent variable and eduPGS as the main predictors. These
models additionally included: sex, age, and the first twenty genetic PCs.
Mediation by brain volume
While path models cannot prove causal assumptions, they can provide
estimates of effects given these causal assumptions, for example, that
increased brain volume causes increased g, instead of vice versa, or that
eduPGS differences cause g differences, instead of being spuriously related
(Bollen & Pearl, 2013). As such, we modeled the relationship between genetic
ancestry, polygenic scores, adjusted brain volume, and g using lavaan in R and
plotted the fitted models with lavaanPlot (Long, 2020). Because we used
cross-sectional data, the causal assumption embedded in the brain volume → g
path cannot be verified. Yet this path, with mental ability being dependent on
brain matter, is theoretically well grounded. In these models, brain volume is
adjusted for the effects of sex, the cubic spline of age, and height.
We additionally ran mediation analysis using the mediation R package
(Tingley et al., 2013). This package estimates the proportion of the effect that is
causally mediated, using a different set of assumptions than those used in path
analysis (see Imai et al., 2014). For these models we used the poly-eduPGS,
because it is supposed to estimate causal effects and be less confounded by
population structure. In these models both the outcome and mediator variables
were residualized for demographic confounds.
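For linear mediator and outcome models without a predictor-by-mediator interaction, the average causal mediation effect reduces to the product of coefficients, which makes the "proportion mediated" easy to interpret. The sketch below shows that arithmetic only; the mediation package itself estimates these quantities by simulation and adds confidence intervals, and the coefficient values in the usage note are hypothetical.

```python
def proportion_mediated(a, b, c_prime):
    """Product-of-coefficients mediation for linear models.

    a: effect of the predictor (e.g., poly-eduPGS) on the mediator (brain volume)
    b: effect of the mediator on the outcome (g), holding the predictor fixed
    c_prime: direct effect of the predictor on the outcome
    """
    indirect = a * b
    total = c_prime + indirect
    return indirect, total, indirect / total
```

With hypothetical values a = 0.20, b = 0.25, and c' = 0.15, the indirect effect is 0.05, the total effect 0.20, and the proportion mediated 0.25.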
Analyses related to the method of correlated vectors
Method of correlated vectors
Finally, we applied Jensen’s method of correlated vectors (MCV) to
examine the patterns of the genetic, brain volume, and admixture variables’
effects on cognitive test scores. To be clear, we do not use MCV to test
Spearman’s hypothesis, the hypothesis that group differences are mostly on g.
For that, we use multi-group confirmatory factor analysis (MGCFA) as
recommended in the literature. Rather, we use MCV to assess if there is a
relation between the effects of various variables on tests, as is frequently done
(e.g., Kan et al., 2013; Flynn et al., 2014).
So, we computed standardized SIRE differences for each test as described
below. We also ran the admixture regression analyses for the genetic, brain
volume, and admixture variables with cognitive test scores as the dependent
variables. The effects on the eleven cognitive tests formed the vectors. These
vectors were then correlated with and without corrections for test reliability. A
genetic hypothesis for SIRE and ancestry differences predicts a positive
manifold of genetic, brain volume, and SIRE/ancestry effects. The reason is that
if genes related to g and brain volume are largely causative of differences both
within and between groups, then these genes will cause similar patterns of
effects across cognitive tests (te Nijenhuis et al., 2019). For example, if eduPGS
effects are more pronounced on Matrix Reasoning than on Card Sorting, then
we would expect ancestry effects also to be more pronounced on Matrix Reasoning
if these ancestry effects on intelligence are due to eduPGS differences.
For these analyses, we computed the variables in such a way that they
were conceptually and statistically independent:
1. Cognitive test g loadings were based on factor analysis using the
Schmid-Leiman transformation. The Schmid-Leiman transformation
separates g from non-g variance. Detailed results are provided in
Supplementary File 1, Table S11.
2. The effects of MTAG-eduPGS and poly-eduPGS on cognitive test
scores were the betas for the eduPGS predictor from the mixed-effects
models. The mixed-effects models also included the cubic spline of age, sex,
and the first twenty genetic PCs in addition to site and family fixed effects.
3. The effects of European admixture on cognitive test scores were the betas for
the European ancestry predictor from the mixed-effects models which also
include the cubic spline of age, sex, US born status, immigrant family status,
SIRE fraction, and Hispanic ethnicity in addition to site and family fixed
effects.
4. The effects of brain volume on cognitive test scores were the betas for the
brain volume predictor from the mixed-effects models which also included the
cubic spline of age, sex, height, US born status, immigrant family status,
SIRE fraction, Hispanic ethnicity, and the first twenty genetic PCs in addition
to site and family fixed effects.
5. Heritability estimates for the full sample were from the biometric analyses.
6. Black-White and Hispanic-White mean differences are computed as the sex-
and age- corrected mean differences divided by the pooled standard
deviations of both groups.
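Item 1 above relies on the Schmid-Leiman transformation. For a higher-order model in which each test loads on one first-order factor, a test's g loading is its first-order loading times that factor's loading on g, and its residualized group-factor loading is the first-order loading times the square root of one minus the squared second-order loading. A minimal sketch under that simple-structure assumption (test and factor names are hypothetical):

```python
from math import sqrt

def schmid_leiman(first_order, second_order):
    """Schmid-Leiman loadings for a simple-structure higher-order model.

    first_order: dict test -> (factor, loading on that first-order factor)
    second_order: dict factor -> loading of that factor on g
    Returns dict test -> (g loading, residualized group-factor loading).
    """
    out = {}
    for test, (factor, lam) in first_order.items():
        gamma = second_order[factor]
        out[test] = (lam * gamma, lam * sqrt(1 - gamma ** 2))
    return out
```

For a test loading 0.8 on a factor that loads 0.6 on g, the g loading is 0.48 and the group-factor loading is 0.64; their squares sum to the original communal variance of 0.64.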
This procedure produced eight variables. These vectors were corrected for
cognitive test reliabilities, based on validation, standardization, and other
samples. See Supplementary File 1, Table S12a for the list of sources. Note that
no reliability data were available for the Little Man Test. For this reason, we ran
the analyses for both vectors uncorrected for reliability (N= 11) and for vectors
corrected for reliability (N= 10). The uncorrected and corrected vectors were
then correlated using Pearson’s correlation (with results based on Spearman’s
correlation reported in Supplementary File 1, Table S12c).
A possible explanation for the association between the vectors of
eduPGS effects and both the vectors of ancestry effects and SIRE differences
is that these variables are all related to g loadings. According to this argument,
these relations merely indicate that the relevant variables are related to
phenotypic g (Kan, 2012, Chapter 4). To test this conjecture, we use partial
correlations and control alternatively for the vector of g loadings and the vector
of eduPGS effects.
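The partial correlations between vectors can be computed from the three pairwise Pearson correlations across the subtests. A minimal sketch using the standard first-order formula (function names are ours):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (e.g., effects across subtests)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr_vectors(x, y, z):
    """Partial correlation of vectors x and y, controlling for vector z."""
    return partial_corr(pearson(x, y), pearson(x, z), pearson(y, z))
```

Here x might be the vector of ancestry effects, y the vector of SIRE differences, and z the vector of g loadings; if the x–y association were entirely due to shared relations with g loadings, the partial correlation would be near zero.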
Factor analysis
As an alternative analysis, we factor analyzed the eight vectors using
principal factor analysis. This was done to determine if the SIRE/ancestry
variables loaded strongly on the same factor that genetic variables did.
Relation between effects of eduPGS on test scores within SIRE groups and
phenotypic differences between SIRE groups
We further examined the relation between the mean differences on tests
between SIRE groups and the effects of eduPGS on tests within SIRE groups.
We computed the betas for MTAG- and Poly-eduPGS for each SIRE group
separately. We used mixed-effects models, which included the cubic spline of
age, sex, and the first twenty genetic PCs in addition to site and family fixed
effects, as above. We averaged the eduPGS effects for White and the
respective non-White groups and then correlated the vector of average eduPGS
effects with the standardized differences in cognitive ability.
Summary showing how statistical analyses are related
We carefully investigated a collection of closely interrelated hypotheses
with seven sets of statistical analyses, the last two of which statistically
connect the results of the first five. As some readers may not be able
to see the forest for the trees due to the large number of statistical analyses, we
make explicit how the statistical analyses are related to each other.
Set 1: We carefully investigated cognitive differences, using multi-group
confirmatory factor analysis, and assessed the extent to which these
differences are measurement invariant and on g.
Set 2: We assessed the environmental and genetic influences on g and brain
volume within groups. We incorporated the subtest heritability estimates
into the final analysis using MCV.
Set 3: We ran global admixture analyses for g and brain volume. We explored
these results carefully because the subsequent path, mediation, and MCV
analyses depended on them.
Set 4 and Set 5: We carefully examined to what extent eduPGS/brain volume
predict g within SIRE groups and within families.
Set 6: We incorporated our g, brain volume, and eduPGS variables in path and
mediation analyses.
Set 7: We incorporated our g, brain volume, and eduPGS variables along with
heritability estimates in MCV analyses.
Supplementary analyses
We included two sets of supplementary analyses. The first provides
additional information regarding possible PGS bias across ancestries. The
second provides additional information regarding the explanatory power of
MRI-based brain differences. It made more sense to report these analyses
separately than under the heading of any of the hypotheses listed above.
Group-level analyses for PGS and cognitive ability
Population-level analyses have frequently been employed to assess the
cross-population validity of PGS (e.g., Chande et al., 2020; Martin et al., 2017).
It is sometimes found that PGS differences are inconsistent with observed trait
differences across populations. When this is the case, the cross-population
validity of the PGS is called into question (e.g., Martin et al., 2017). So, we
examined the correlation between several educational and intelligence PGS, on
the one hand, and SIRE and national mean cognitive scores, on the other hand.
High positive correlations between PGS differences and trait differences can
increase our confidence that PGS differences correspond with causally-relevant
genetic differences (Warne, 2021).
To maximize the number of groups, we used the dataset with all individuals,
even those with missing data on admixture and g. We then decomposed the
sample into 30 ethnocultural groups based on SIRE and nationality variables. For
comparison, we alternatively decomposed the sample by the child's religion,
which can overlap with ancestry (e.g., most Hindus are Asian Indians). We then
correlated mean PGS and NIHTBX scores. We used seven different PGSs,
based on different methods (GWAS, MTAG, and functional mapping) and on
different criteria (education, cognitive ability/intelligence, and mathematical
skills) because it is plausible that scores will vary by groups due to PGS
construction and criteria. We focused on lead or functionally-informed SNPs
because these are generally less influenced by confounding related to
differences in linkage disequilibrium. The PGS used are as follows:
1. MTAG-eduPGS: These PGSs were used in the main analysis and are detailed above.
2. Poly-eduPGS: These PGSs were used in the main analysis and are detailed above.
3. Lee_GWASlead-eduPGS: Scores computed from the lead GWAS SNPs for educational attainment reported by Lee et al. (2018) (N = 9,336 SNPs available in the ABCD sample).
4. MTAG-cognitivePGS: Scores computed from the MTAG SNPs for cognitive ability from Lee et al. (2018) (N = 8,898 SNPs available in the ABCD sample). These were constructed in the same way as the MTAG-eduPGS, except that cognitive ability, not educational attainment, was the criterion.
5. MTAG-mathclassPGS: Scores computed from the MTAG SNPs for highest math class taken from Lee et al. (2018) (N = 8,898 SNPs available in the ABCD sample). These were constructed in the same way as the MTAG-eduPGS, except that highest math class taken, not educational attainment, was the criterion.
6. MTAG-mathabilityPGS: Scores computed from the MTAG SNPs for self-reported math ability from Lee et al. (2018) (N = 8,898 SNPs available in the ABCD sample). These were constructed in the same way as the MTAG-eduPGS, except that self-reported math ability, not educational attainment, was the criterion.
7. Hill_MTAG-IQPGS: Scores based on the lead independent SNPs from Hill et al. (2019), as identified using FUMA (functional mapping and annotation of genetic associations) (N = 512 variants available in the ABCD sample).
We correlated each of the seven PGSs with NIHTBX scores across groups. In doing so, we weighted each group by the square root of its sample size, so that the largest groups did not dominate the results. We then produced regression plots of NIHTBX on each PGS. Details about the construction of the SIRE/national groups are provided in Supplemental File 2, Tab 1.
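The group-level weighting can be sketched as follows. This is a minimal illustration, assuming that each group is summarized by its mean PGS, its mean NIHTBX score, and its size n, and that the weights enter an ordinary weighted Pearson correlation with w = √n; the function names and the group values in the usage lines are hypothetical, not ABCD results.

```python
import math

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation between paired values x and y."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

def group_level_correlation(groups):
    """groups: list of (mean_pgs, mean_nihtbx, n) tuples, one per group.
    Each group is weighted by the square root of its sample size."""
    pgs = [g[0] for g in groups]
    score = [g[1] for g in groups]
    weights = [math.sqrt(g[2]) for g in groups]
    return weighted_pearson(pgs, score, weights)

# Hypothetical groups: (mean PGS, mean NIHTBX, n)
groups = [(0.45, 0.25, 400), (-0.20, -0.05, 100), (-1.10, -0.60, 25), (0.10, 0.10, 9)]
print(round(group_level_correlation(groups), 3))
```

With √n weights, a group of 400 counts 20 times as much as a group of one rather than 400 times, a compromise between weighting groups equally and weighting them by n.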
Multimodal MRI-based brain predictor, cognitive ability, and ancestry
In the main analyses, we focus on brain volume as a mediator of the relation between genetic ancestry and cognitive ability. While brain volume could mediate the relation between European ancestry and ability, other brain variables could also mediate or moderate this relation. To check the robustness of the results, we therefore ran analyses using a multimodal MRI-based predictor of intelligence. This variable, derived by applying machine learning to over 50,000 neurological variables, including both structural and functional MRI data, is detailed by Kirkegaard & Fuerst (2023). To avoid confounding by ancestry, the brain predictor was trained to predict cognitive ability only in the non-Hispanic White sample. In these supplementary analyses, we examine the relation between the MRI-based brain predictor of intelligence and genetic ancestry in the subsample of N = 7464 participants who had both MGCFA-based g scores and MRI-based brain predictor scores. For these analyses, g, the brain predictor, and height were re-standardized within this subsample to a mean of 0 and a standard deviation of 1. We repeated the analyses described in Section 2.2.6.6, replacing brain volume with the brain predictor variable.
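Re-standardizing within the complete-case subsample can be sketched as below. This is a minimal illustration, assuming the sample (n − 1) standard deviation; the record fields (g, brain_pred, height) are hypothetical names, not ABCD variable names.

```python
import statistics

def restandardize(records, keys):
    """Keep records with non-missing values on every key, then z-score each
    key within that subsample so it has mean 0 and SD 1."""
    complete = [r for r in records if all(r.get(k) is not None for k in keys)]
    result = [dict(r) for r in complete]  # copy so the input is not modified
    for k in keys:
        values = [r[k] for r in complete]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
        for r in result:
            r[k] = (r[k] - mean) / sd
    return result

# Hypothetical records; the last one lacks a brain-predictor score
data = [
    {"g": 0.8, "brain_pred": 1.2, "height": 140.0},
    {"g": -0.3, "brain_pred": 0.1, "height": 135.0},
    {"g": 1.5, "brain_pred": 0.9, "height": 150.0},
    {"g": 0.2, "brain_pred": None, "height": 142.0},
]
sub = restandardize(data, ["g", "brain_pred", "height"])
print(len(sub))
```

Standardizing after the complete-case filter, rather than before, is what makes the means exactly 0 and the SDs exactly 1 in the analyzed subsample.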
Code and data
The R code used for these analyses is provided in Supplemental File 1. Supplemental files and appendices can be found at https://osf.io/wudtq/. The complete data set is available to qualified researchers at https://nda.nih.gov/abcd.
Results
Measurement of group differences
Descriptive statistics for the total sample and the four SIRE subsamples are shown in Table 1, along with the Cohen's d values between the White and the non-White groups. By conventional standards (Cohen, 1988), there are medium to large differences between Whites and the non-White groups in g, eduPGS, and adjusted brain/intracranial volume. The White-Black difference in mean g (d = 1.02) was of typical size, whereas the White-Hispanic difference (d = 0.38) was smaller than is typically found. For context, C. Murray (2021) estimated the Black-White and Hispanic-White Full-Scale IQ differences at d = 0.85 and d = 0.62, respectively, based on four IQ test standardizations conducted since the 2000s. Regarding brain volume, Jensen (1998, p. 438) notes a Black-White difference of d = 0.76 to 0.78 in brain weight, based on autopsy data from individuals without brain pathologies. Rushton & Ankney (2009) also report Black-White and Hispanic-White brain weight differences of 63 g and 32 g, respectively; because standard deviations were not provided, we could not convert these differences into standardized units. For comparison, the Black-White difference in adjusted brain volume in this sample of healthy adolescents is approximately one standard deviation (d = 1.12).
Table 1. Descriptive statistics for the total sample and the four SIRE subsamples.

| Variable | Full sample M ± SD | White M ± SD | Hispanic M ± SD | Black M ± SD | Other M ± SD | W-H d | W-B d | W-O d |
|---|---|---|---|---|---|---|---|---|
| Age | 9.91 ± 0.62 | 9.93 ± 0.63 | 9.88 ± 0.63 | 9.91 ± 0.61 | 9.89 ± 0.62 | 0.08 | 0.03 | 0.06 |
| European ancestry | 0.75 ± 0.33 | 0.98 ± 0.05 | 0.60 ± 0.21 | 0.16 ± 0.11 | 0.62 ± 0.25 | 3.32 | 12.09 | 3.75 |
| African ancestry | 0.18 ± 0.31 | 0.01 ± 0.02 | 0.10 ± 0.14 | 0.82 ± 0.11 | 0.32 ± 0.26 | -1.24 | -14.88 | -3.48 |
| Amerindian ancestry | 0.06 ± 0.14 | 0.01 ± 0.03 | 0.28 ± 0.19 | 0.01 ± 0.02 | 0.04 ± 0.09 | -2.72 | 0.00 | -0.73 |
| South Asian ancestry | 0.00 ± 0.02 | 0.00 ± 0.01 | 0.01 ± 0.01 | 0.00 ± 0.01 | 0.01 ± 0.05 | -1.00 | 0.00 | -0.52 |
| East Asian ancestry | 0.01 ± 0.03 | 0.00 ± 0.02 | 0.01 ± 0.02 | 0.01 ± 0.02 | 0.01 ± 0.07 | -0.50 | -0.50 | -0.33 |
| Frac. White SIRE | 0.73 ± 0.43 | 1.00 ± 0.03 | 0.67 ± 0.45 | 0.00 ± 0.00 | 0.39 ± 0.25 | 1.44 | 37.72 | 6.90 |
| Frac. Black SIRE | 0.19 ± 0.38 | 0.00 ± 0.00 | 0.07 ± 0.23 | 1.00 ± 0.04 | 0.29 ± 0.26 | -0.60 | -53.43 | -3.33 |
| Frac. Native Amer. SIRE | 0.02 ± 0.11 | 0.00 ± 0.00 | 0.03 ± 0.13 | 0.00 ± 0.00 | 0.18 ± 0.28 | -0.46 | NA | -1.92 |
| frac_NOC SIRE | 0.06 ± 0.23 | 0.00 ± 0.03 | 0.23 ± 0.42 | 0.00 ± 0.04 | 0.13 ± 0.34 | -1.08 | 0.00 | -1.11 |
| Hispanic | 0.20 ± 0.40 | 0.00 ± 0.00 | 1.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | NA | NA | NA |
| g | 0.00 ± 1.00 | 0.24 ± 0.86 | -0.10 ± 0.99 | -0.69 ± 1.07 | -0.08 ± 1.07 | 0.38 | 1.02 | 0.36 |
| SES | 0.00 ± 1.00 | 0.45 ± 0.75 | -0.38 ± 0.90 | -0.98 ± 0.91 | -0.39 ± 0.99 | 1.05 | 1.82 | 1.08 |
| MTAG-eduPGS | 0.00 ± 1.00 | 0.49 ± 0.77 | -0.21 ± 0.75 | -1.34 ± 0.57 | -0.36 ± 0.86 | 0.92 | 2.50 | 1.09 |
| Poly-eduPGS | 0.00 ± 1.00 | 0.48 ± 0.78 | -0.21 ± 0.78 | -1.30 ± 0.60 | -0.36 ± 0.85 | 0.88 | 2.39 | 1.07 |
| Brain vol. | 0.00 ± 1.00 | 0.26 ± 0.94 | -0.18 ± 0.95 | -0.64 ± 0.90 | -0.13 ± 1.00 | 0.47 | 0.97 | 0.41 |
| Brain vol. adj. | 0.00 ± 1.00 | 0.29 ± 0.92 | -0.21 ± 0.92 | -0.74 ± 0.91 | -0.13 ± 0.99 | 0.54 | 1.12 | 0.45 |
| Intracranial vol. | 0.00 ± 1.00 | 0.20 ± 0.96 | -0.19 ± 1.01 | -0.40 ± 0.94 | -0.21 ± 1.01 | 0.40 | 0.63 | 0.42 |
| Intracranial vol. adj. | 0.00 ± 1.00 | 0.23 ± 0.94 | -0.21 ± 1.01 | -0.47 ± 0.95 | -0.22 ± 1.00 | 0.46 | 0.74 | 0.48 |
| Height | 0.00 ± 1.00 | -0.04 ± 0.97 | -0.12 ± 0.97 | 0.26 ± 1.09 | 0.05 ± 0.97 | 0.08 | -0.30 | -0.09 |
| Child US born | 0.98 ± 0.15 | 0.99 ± 0.11 | 0.94 ± 0.24 | 0.98 ± 0.15 | 0.98 ± 0.14 | 0.32 | 0.08 | 0.09 |
| Immigrant family | 0.28 ± 0.45 | 0.18 ± 0.38 | 0.73 ± 0.44 | 0.14 ± 0.35 | 0.20 ± 0.40 | -1.39 | 0.11 | -0.05 |
| N | 10,245 | 5,858 | 2,005 | 1,642 | 740 | | | |

Notes: W-H d, W-B d, and W-O d are the Cohen's d values between, respectively, the White and Hispanic, the White and Black, and the White and Other SIRE groups. A positive value indicates a higher mean in the White group.
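The d values in Table 1 match the standard pooled-standard-deviation form of Cohen's d. As a check, the sketch below recomputes two of the g gaps from the tabled means, SDs, and Ns; because the inputs are rounded to two decimals, recomputed values for other rows may differ slightly in the last digit. The function name is ours, for illustration.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation (n - 1 weighting)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# g in Table 1: White 0.24 ± 0.86 (n = 5,858); Black -0.69 ± 1.07 (n = 1,642);
# Hispanic -0.10 ± 0.99 (n = 2,005)
print(round(cohens_d(0.24, 0.86, 5858, -0.69, 1.07, 1642), 2))  # 1.02
print(round(cohens_d(0.24, 0.86, 5858, -0.10, 0.99, 2005), 2))  # 0.38
```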
We find substantially larger SES gaps than reported by Warne (2021). Whereas Warne (2021) reports a Black-White gap of d = 0.66 in a composite SES score based on parental income, occupation, and educational level, we find a Black-White gap of d = 1.82 in a general factor of SES. Our SES variable, however, was computed by factor analysis from a larger and more diverse set of variables, so the values we report are not directly comparable to those reported by Warne (2021). In Supplementary File 2, we additionally report means, standard deviations, and Cohen's d values for the individual SES variables, so that interested readers can compare the SES-related differences in this dataset to those in other datasets. For example, the average Black-White gaps in parental education and family income are d = 1.14 and d = 1.18, respectively. Thus, in this sample, the Black-White educational and income differences are also larger than those reported by Warne (2021).
Regarding Spearman's hypothesis, Appendix A, Table A2 shows the MGCFA model fit statistics for the SIRE groups. The best-fitting models were M6A, the strong version of Spearman's hypothesis, in which g alone explained the group differences, and M6B, a weak version, in which the g factor explained most of the differences but the memory and executive function group factors also contributed.
Measurement invariance across genetic ancestry
Using the same three-broad-factor model as in the SIRE-group analyses, we tested for strict measurement invariance (MI) across European ancestry by applying LSEM to the full sample (n = 10,245) with 11 focal points, representing 10% increments from 0% to 100% European ancestry. Across the full range of European ancestry, there was no deviation from strict MI. These results agree with those of Lasker et al. (2019), who found that strict MI held across European genetic ancestry in a large sample of African and European American adolescents, and with those of Lasker (2019), who additionally reports results from a smaller pediatric dataset including Blacks, Whites, and Hispanics. This indicates that the observed differences in g cannot be attributed to group-specific influences that bias measurement, such as stereotype threat (Lubke et al., 2003).
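LSEM re-estimates the factor model at each focal point, weighting each participant by how close their European ancestry lies to that point. The weighting step is sketched below as a minimal illustration that assumes a Gaussian kernel and a bandwidth of h · SD(x) · N^(−1/5) with h = 1.1; neither the kernel nor the bandwidth is reported in this section, the ancestry values are hypothetical, and the weighted CFA fitted at each focal point is not shown.

```python
import math
import statistics

def focal_points(step=0.1):
    """Focal values of the moderator: 0.0, 0.1, ..., 1.0 (11 points)."""
    n_steps = round(1 / step)
    return [round(i * step, 10) for i in range(n_steps + 1)]

def lsem_weights(x, focal, h=1.1):
    """Gaussian kernel weights for moderator values x at one focal point.
    The bandwidth rule h * SD(x) * N**(-1/5) is an assumed default."""
    bandwidth = h * statistics.stdev(x) * len(x) ** (-1 / 5)
    return [math.exp(-0.5 * ((xi - focal) / bandwidth) ** 2) for xi in x]

# Hypothetical European-ancestry proportions for ten participants
ancestry = [0.05, 0.12, 0.20, 0.45, 0.55, 0.60, 0.80, 0.95, 0.98, 0.99]
for f in focal_points():
    w = lsem_weights(ancestry, f)
    # Each weight vector would feed a weighted CFA fitted at focal point f
    print(f, round(sum(w), 2))
```

A participant whose ancestry equals the focal value gets weight 1, and weights fall off smoothly with distance, so each focal-point model borrows strength from nearby ancestry levels.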
Within-group heritability and shared environmentality
Table 2 shows the numbers of DZ and MZ twin pairs along with the a′, c′, e′ path estimates and the ACE variance proportions for adjusted brain and intracranial volume. The a′c′e′ estimates for the unadjusted variables are additionally provided in the Supplementary Material. For adjusted brain and intracranial volume, heritability estimates were high for all groups (h² = .75 to .95), while the shared environmental variance was low (c² = .00 to .17), consistent with previously reported values for this age group (Jansen et al., 2015). For brain volume, heritability was somewhat higher and shared environmentality lower in the non-White sample (h² = .94; c² = .00) than in the White sample (h² = .79; c² = .11), but this difference was not statistically significant. For intracranial volume, heritability and shared environmentality were very similar in the combined non-White sample (h² = .88; c² = .07) and in the White sample (h² = .83; c² = .10).
Table 2. Variance component estimates for brain volume and intracranial volume.

| Group | N DZ pairs | N MZ pairs | a′ (SE) | c′ (SE) | e′ (SE) | a² | c² | e² |
|---|---|---|---|---|---|---|---|---|
| Brain volume | | | | | | | | |
| All available | 443 | 333 | .87 (.04) | .42 (.08) | .28 (.01) | .75 | .17 | .08 |
| White-Hisp.-Black-Other | 443 | 333 | .87 (.04) | .42 (.08) | .28 (.01) | .75 | .17 | .08 |
| White | 426 | 323 | .87 (.04) | .41 (.08) | .27 (.01) | .76 | .16 | .08 |
| Hispanic-Black-Other | 280 | 214 | .89 (.05) | .34 (.12) | .30 (.02) | .79 | .11 | .09 |
| Hispanic | 146 | 109 | .97 (.00) | .00 (NA) | .24 (.02) | .94 | .00 | .06 |
| Black | 43 | 34 | .98 (.01) | .00 (.00) | .21 (.03) | .95 | .00 | .04 |
| Other | 84 | 56 | .93 (.08) | .27 (.29) | .24 (.03) | .87 | .07 | .06 |
| Intracranial volume | | | | | | | | |
| All available | 443 | 333 | .89 (.04) | .39 (.08) | .24 (.01) | .79 | .15 | .06 |
| White-Hisp.-Black-Other | 426 | 323 | .89 (.04) | .39 (.08) | .25 (.01) | .78 | .15 | .06 |
| White | 280 | 214 | .91 (.05) | .32 (.13) | .26 (.01) | .83 | .10 | .07 |
| Hispanic-Black-Other | 146 | 109 | .94 (.06) | .26 (.23) | .24 (.02) | .88 | .07 | .06 |
| Hispanic | 43 | 34 | .95 (.11) | .24 (.45) | .21 (.03) | .90 | .06 | .04 |
| Black | 84 | 56 | .90 (.08) | .39 (.19) | .21 (.02) | .81 | .15 | .05 |
| Other | 19 | 19 | .93 (.03) | .00 (.00) | .36 (.07) | .87 | .00 | .13 |

Notes: a′ and a² represent, respectively, the genetic-phenotypic correlations and the proportion of variance due to additive genetic effects; c′ and c² represent, respectively, the shared environment-phenotypic correlations and the proportion of variance due to shared environmental effects; e′ and e² represent, respectively, the non-shared environment-phenotypic correlations and the proportion of variance due to non-shared environmental effects. N DZ pairs is the number of dizygotic twin pairs; N MZ pairs is the number of monozygotic twin pairs. SE = standard error.
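The a², c², and e² columns are consistent with squaring the path estimates and rescaling them to sum to one. For the "All available" rows this reproduces the tabled proportions exactly; other rows can differ by .01 because the path estimates are themselves rounded. A minimal sketch, assuming this rescaling is how the proportions were obtained:

```python
def variance_proportions(a, c, e):
    """Convert ACE path estimates (a', c', e') into proportions of variance."""
    total = a ** 2 + c ** 2 + e ** 2
    return a ** 2 / total, c ** 2 / total, e ** 2 / total

# 'All available' rows of Table 2
brain = variance_proportions(0.87, 0.42, 0.28)
icv = variance_proportions(0.89, 0.39, 0.24)
print([f"{p:.2f}" for p in brain])  # ['0.75', '0.17', '0.08']
print([f"{p:.2f}" for p in icv])    # ['0.79', '0.15', '0.06']
```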
Table 3 shows the ACE estimates for the NIHTBX and g scores. More detailed results are provided in Supplementary File 1. Consistent with previously reported results (Pesta et al., 2020, 2023), the heritability estimates of the NIHTBX scores were moderate to high for the total, the White, and the combined non-White samples (h² = .56 to .64), while the shared environmental variances were low (c² = .00 to .20). For NIHTBX, heritability was somewhat higher and shared environmentality lower in the White sample (h² = .64; c² = .00) than in the combined non-White sample (h² = .56; c² = .20), though these differences were not statistically significant. The same pattern held for g scores: heritability was somewhat higher and shared environmentality lower in the White sample (h² = .67; c² = .00) than in the non-White sample (h² = .59; c² = .16), though, again, these differences were not statistically significant. For comparison, in their meta-analysis, Pesta et al. (2020; Table 4) found that heritabilities and shared environmentalities were approximately the same among Hispanics, Blacks, and Whites.
Supplementary File 1, Table 21, additionally reports the heritabilities for each of the 11 subtests used to create our g scores. These subtest heritabilities were used in the MCV analysis.
Table 3. Variance component estimates for NIH Toolbox cognitive scores.

| Group | N DZ pairs | N MZ pairs | a′ (SE) | c′ (SE) | e′ (SE) | a² | c² | e² |
|---|---|---|---|---|---|---|---|---|
| NIHTBX | | | | | | | | |
| All available | 443 | 332 | .76 (.06) | .35 (.11) | .55 (.02) | .57 | .13 | .30 |
| White-Hisp.-Black-Other | 441 | 330 | .76 (.06) | .35 (.11) | .55 (.02) | .57 | .12 | .30 |
| White | 291 | 219 | .80 (.02) | .00 (.00) | .60 (.03) | .64 | .00 | .36 |
| Hispanic-Black-Other | 150 | 111 | .75 (.09) | .45 (.14) | .50 (.04) | .56 | .20 | .25 |
| Hispanic | 43 | 34 | .82 (.14) | .40 (.28) | .41 (.06) | .67 | .16 | .17 |
| Black | 87 | 58 | .73 (.13) | .40 (.22) | .55 (.05) | .54 | .16 | .30 |
| Other | 20 | 19 | .76 (.27) | .32 (.58) | .56 (.10) | .58 | .10 | .32 |
| g | | | | | | | | |
| All available | 434 | 319 | .78 (.05) | .36 (.11) | .52 (.02) | .60 | .13 | .27 |
| White-Hisp.-Black-Other | 430 | 319 | .78 (.05) | .35 (.11) | .52 (.02) | .61 | .12 | .27 |
| White | 291 | 220 | .82 (.02) | .00 (.00) | .57 (.03) | .67 | .00 | .33 |
| Hispanic-Black-Other | 139 | 99 | .77 (.09) | .41 (.16) | .50 (.04) | .59 | .16 | .25 |
| Hispanic | 43 | 34 | .32 (.25) | .82 (.08) | .48 (.06) | .10 | .67 | .23 |
| Black | 87 | 56 | .80 (.12) | .30 (.29) | .52 (.05) | .64 | .09 | .27 |
| Other | 9 | 9 | .80 (.11) | .00 (NA) | .60 (.15) | .64 | .00 | .36 |

Notes: a′ and a² represent, respectively, the genetic-phenotype correlations and the proportion of variance due to additive genetic effects; c′ and c² represent, respectively, the shared environment-phenotype correlations and the proportion of variance due to shared environmental effects; e′ and e² represent, respectively, the non-shared environment-phenotype correlations and the proportion of variance due to non-shared environmental effects. N DZ pairs is the number of dizygotic twin pairs; N MZ pairs is the number of monozygotic twin pairs. SE = standard error.
Here we comment on the low heritability estimate for g scores in the Hispanic group. ACE estimates for g were unstable in this small twin sample: the SEM model had difficulty distinguishing between the A and C components, which is unsurprising given that there were only 43 DZ and 34 MZ twin pairs. When Black and Other twins were added, the heritability estimate became moderate to high and the ACE estimates stabilized. Among Hispanics, the discrepancy between the ACE estimates for g and for NIHTBX resulted both from the difference in scores and from a difference in the twin pairs analyzed: although the numbers of pairs were the same, the pairs themselves differed owing to missing data on each variable. When we computed ACE estimates for each subtest, the averages were similar for Hispanics (A = .28, C = .17, E = .54) and for Whites (A = .33, C = .04, E = .63). The ACE estimates by test are provided in Supplementary File 2. Generally, ACE estimates based on small samples can be very unstable, which is why we focused on the estimates for the White and combined Hispanic-Black-Other groups.
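Why 43 DZ and 34 MZ pairs cannot reliably separate A from C can be illustrated with Falconer's closed-form approximation, a² ≈ 2(r_MZ − r_DZ) and c² ≈ 2r_DZ − r_MZ, a simpler stand-in for the SEM twin models actually fitted here. The sketch uses hypothetical twin correlations (not ABCD values) and a Fisher-z confidence interval for r_DZ: moving r_DZ across its interval shifts variance almost wholesale between A and C, while e² = 1 − r_MZ is unaffected.

```python
import math

def falconer(r_mz, r_dz):
    """Falconer's closed-form approximation of the ACE proportions."""
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2

def fisher_ci(r, n_pairs, z_crit=1.96):
    """Approximate 95% CI for a correlation via Fisher's z transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n_pairs - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical twin correlations; 43 DZ pairs as in the Hispanic subsample
r_mz, r_dz = 0.70, 0.45
lo, hi = fisher_ci(r_dz, 43)
for r in (lo, r_dz, hi):
    a2, c2, e2 = falconer(r_mz, r)
    # Falconer estimates are unbounded, so a2 or c2 can fall outside [0, 1]
    print(f"r_DZ = {r:.2f}: a2 = {a2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

SEM parameterizations that square the path coefficients keep each variance component non-negative, which plausibly explains why boundary estimates such as c′ = .00 appear in Tables 2 and 3 rather than negative values.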
Admixture regression for g
The admixture triangle plot for the full sample is shown in Figure 2. Hispanics showed high variability along the Amerindian-European axis and moderate variability along the African-European axis. The African American and Other groups showed, respectively, moderate and high variability along the African-European axis, whereas White Americans showed little variability in ancestry. This pattern of admixture is typical for the US (Bryc et al., 2015).
Figure 2. Admixture triangle plot for SIRE groups in the ABCD sample.
The correlation matrices for the full sample and the SIRE subsamples are
provided in Supplemental File 1 (Tables S3a-S3e). Table 4 shows the results of
the admixture regression analyses for g.
Table 4. Regression results for the effect of genetic ancestry on g in the full sample.

| Predictor | Model 1 | Model 2a | Model 2b | Model 3a | Model 3b |
|---|---|---|---|---|---|
| (Intercept) | 0.75 (0.38); 0.051 | 0.62 (0.38); 0.100 | 0.43 (0.37); 0.247 | -1.12 (0.18); <0.001 | -0.76 (0.17); <0.001 |
| frac_Black SIRE | -0.99 (0.03); <0.001 | 0.03 (0.09); 0.733 | 0.06 (0.08); 0.501 | 0.04 (0.07); 0.566 | 0.01 (0.06); 0.825 |
| frac_Native Amer. SIRE | -0.36 (0.09); <0.001 | -0.07 (0.09); 0.463 | 0.02 (0.09); 0.791 | -0.06 (0.09); 0.533 | 0.03 (0.09); 0.717 |
| frac_NOC SIRE | -0.42 (0.05); <0.001 | -0.09 (0.05); 0.079 | -0.01 (0.05); 0.760 | -0.08 (0.05); 0.102 | -0.01 (0.05); 0.846 |
| Hispanic | -0.30 (0.03); <0.001 | 0.01 (0.04); 0.791 | 0.06 (0.04); 0.135 | -0.04 (0.03); 0.223 | 0.03 (0.03); 0.336 |
| Child US Born | 0.12 (0.06); 0.049 | 0.14 (0.06); 0.027 | 0.18 (0.06); 0.002 | 0.12 (0.06); 0.046 | 0.18 (0.06); 0.003 |
| Immigrant Family | 0.10 (0.03); <0.001 | 0.15 (0.03); <0.001 | 0.08 (0.03); 0.001 | 0.16 (0.03); <0.001 | 0.09 (0.02); <0.001 |
| Sex [M] | -0.01 (0.02); 0.483 | -0.01 (0.02); 0.696 | -0.01 (0.02); 0.468 | -0.01 (0.02); 0.721 | -0.01 (0.02); 0.491 |
| Age | -0.06 (0.04); 0.118 | -0.05 (0.04); 0.233 | -0.05 (0.04); 0.234 | | |
| Age′ | 0.08 (0.05); 0.120 | 0.06 (0.05); 0.222 | 0.06 (0.05); 0.247 | | |
| African ancestry | | -1.34 (0.11); <0.001 | -0.87 (0.11); <0.001 | | |
| Amerindian ancestry | | -1.56 (0.12); <0.001 | -0.90 (0.11); <0.001 | | |
| East Asian ancestry | | 0.13 (0.33); 0.703 | 0.21 (0.32); 0.516 | | |
| South Asian ancestry | | 0.42 (0.56); 0.450 | 0.76 (0.54); 0.161 | | |
| SES | | | 0.33 (0.01); <0.001 | | 0.33 (0.01); <0.001 |
| European ancestry | | | | 1.35 (0.08); <0.001 | 0.81 (0.08); <0.001 |
| Age (linear) | | | | -0.00 (0.01); 0.891 | -0.00 (0.01); 0.767 |
| Random effects | | | | | |
| σ² | 0.41 | 0.41 | 0.41 | 0.41 | 0.41 |
| τ00 site_id_l:rel_family_id | 0.46 | 0.44 | 0.36 | 0.44 | 0.37 |
| τ00 site_id_l | 0.02 | 0.03 | 0.05 | 0.03 | 0.05 |
| ICC | 0.54 | 0.53 | 0.50 | 0.53 | 0.50 |
| N site_id_l | 22 | 22 | 22 | 22 | 22 |
| N rel_family_id | 8600 | 8600 | 8600 | 8600 | 8600 |
| Observations | 10245 | 10245 | 10245 | 10245 | 10245 |
| Marginal R² / conditional R² | 0.145 / 0.606 | 0.178 / 0.615 | 0.247 / 0.625 | 0.175 / 0.615 | 0.246 / 0.625 |

Notes: Cells show b (SE); p. Beta coefficients (b) and p-values (p) are from mixed-effects models in which recruitment site and family are treated as random effects; values in parentheses are standard errors. The marginal and conditional R² of each mixed-effects model are shown at the bottom. ICC = intraclass correlation coefficient. Models 3a and 3b, which use European ancestry, include only linear age, for reasons discussed in the Methods section.
In this table, Model 1 includes the SIRE variables but not the genetic ancestry variables, Model 2a adds the four non-European genetic ancestries, and Model 2b adds SES to Model 2a. Models 3a and 3b repeat Models 2a and 2b, replacing the four non-European ancestries with European ancestry. In these last two models, linear age is used instead of a cubic spline of age, so that effect plots could be produced.
As seen by comparing Model 1 with Models 2a and 2b, the SIRE variables are unrelated to g once genetic ancestry is taken into account. Additionally, African, Amerindian, and European ancestry had large effects on g even when the general factor of SES was added to the model.
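The ICC row can be reproduced from the random-effects rows. With random intercepts for family (nested within site) and for site, ICC = (τ_family + τ_site) / (τ_family + τ_site + σ²); for Model 1 of Table 4 this gives 0.48/0.89 ≈ 0.54, and the same formula reproduces the ICCs in Table 11. The sketch also includes the Nakagawa-Schielzeth marginal and conditional R², which we assume is the decomposition behind the reported R² values; the fixed-effect variance it needs is not reported, so that function is exercised only with hypothetical inputs.

```python
def icc(tau_family, tau_site, sigma2):
    """Intraclass correlation: share of variance due to the random intercepts
    for family-within-site and site."""
    tau = tau_family + tau_site
    return tau / (tau + sigma2)

def nakagawa_r2(var_fixed, tau_total, sigma2):
    """Marginal and conditional R2 for a random-intercept model."""
    total = var_fixed + tau_total + sigma2
    marginal = var_fixed / total
    conditional = (var_fixed + tau_total) / total
    return marginal, conditional

# Random-effects rows of Table 4, Model 1
print(round(icc(tau_family=0.46, tau_site=0.02, sigma2=0.41), 2))  # 0.54
```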
The effects of European, African, and Amerindian ancestry on g, stratified by SIRE, are summarized in Table 5. The betas are from Models 1b to 2b shown in Tables S4a to S4e. Statistically significant (p < .05) values are shown in bold. The complete model results are provided in Supplementary File 1. The relation between European, African, and Amerindian ancestry and g is robust to SIRE stratification.
Table 5. Validities (b) from the mixed-effects regression models for genetic ancestry and g in the full sample (All) and in the SIRE subsamples.

| Predictor | Criterion | Controls | All | White | Hispanic | Black | Other |
|---|---|---|---|---|---|---|---|
| European ancestry | g | | 1.35 | 1.33 | 1.33 | 1.06 | 0.96 |
| European ancestry | g | w/ SES | 0.81 | 0.79 | 0.82 | 0.84 | 0.54 |
| African ancestry | g | | -1.34 | -1.72 | -1.10 | -1.10 | -1.21 |
| African ancestry | g | w/ SES | -0.87 | -0.81 | -0.65 | -0.88 | -0.79 |
| Amerindian ancestry | g | | -1.56 | -1.86 | -1.40 | -3.84 | -1.32 |
| Amerindian ancestry | g | w/ SES | -0.90 | -1.11 | -0.86 | -3.12 | -0.79 |
| N | | | 10245 | 5858 | 2005 | 1642 | 740 |

Note: Statistically significant results (p < .05) are presented in bold.
The partial residual plot for European ancestry and g is depicted in Figure 3. This plot shows the effect of European ancestry on g, holding all other variables in the regression model constant. Supplementary File 1, Figures S4a-S4e, provide the corresponding plots for each SIRE subsample. A linear relationship is evident in each SIRE group and in the full sample.
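A point in a partial residual plot is the observation's residual plus the fitted contribution of the focal predictor, r_i = e_i + b_j·x_ij, so a line through the points has slope b_j. A minimal sketch, assuming the coefficients have already been estimated and using residuals from the fixed-effects part only (the site and family random effects of the actual models are ignored here); the coefficients and data are hypothetical.

```python
def partial_residuals(y, X, coefs, j):
    """Partial residuals for predictor j: (y - fitted) + b_j * x_j.
    Each row of X must include a leading 1 for the intercept."""
    result = []
    for yi, row in zip(y, X):
        fitted = sum(b * x for b, x in zip(coefs, row))
        result.append((yi - fitted) + coefs[j] * row[j])
    return result

# Hypothetical fit: g = b0 + b1 * European ancestry + b2 * SES
coefs = [-0.8, 1.3, 0.3]
X = [[1, 0.1, -1.0], [1, 0.5, 0.0], [1, 0.9, 1.0]]
y = [-0.9, -0.1, 0.9]
print([round(r, 3) for r in partial_residuals(y, X, coefs, 1)])
```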
Additionally, Table 6 reports the beta coefficients for the sibling analyses and the comparable full-sample analyses, with either g or NIHTBX as the dependent variable. The complete model results are provided in Supplementary File 1, Table S15c. Within sibships, European ancestry was not significantly related to intelligence, though the standard errors were large. The shrinkage of the within-sibship coefficients, relative to the full-sample ones, was 33%. Generally, as Conley & Fletcher (2017) note, within-sibship ancestry analyses are highly underpowered and, as such, unreliable.
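Within- and between-sibship coefficients of this kind are commonly estimated by splitting each sibling's predictor into the sibship mean and the sibling's deviation from it, and entering both terms in one regression. Because full siblings differ in ancestry (and in PGS) only through random segregation, the within-sibship term is largely protected from confounding by family-level factors. A minimal sketch of the split, assuming this is the decomposition behind Table 6 (and Tables 8 and 10); the family IDs and values are hypothetical, and the regression step is not shown.

```python
from collections import defaultdict

def sibship_decompose(family_ids, values):
    """Split a predictor into its sibship mean (between-family part) and each
    sibling's deviation from that mean (within-family part)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fam, v in zip(family_ids, values):
        sums[fam] += v
        counts[fam] += 1
    between = [sums[fam] / counts[fam] for fam in family_ids]
    within = [v - m for v, m in zip(values, between)]
    return between, within

# Hypothetical sibships: European ancestry proportions of five siblings
fams = ["A", "A", "B", "B", "B"]
anc = [0.40, 0.50, 0.70, 0.75, 0.80]
between, within = sibship_decompose(fams, anc)
print([round(b, 3) for b in between], [round(w, 3) for w in within])
```

Because the deviations sum to zero within each family, the within-sibship coefficient is driven only by differences between siblings.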
Figure 3. Partial residuals and estimated regression fits for European ancestry
in the admixture regression with g as the dependent variable.
Table 6. Validities (b) for the within- and between-sibship and full sample analyses with g or NIHTBX as the dependent variable and European ancestry as the predictor. Controls include sex, age, and East Asian and South Asian ancestry.

| Predictor | Criterion | Within sibships | Between sibships | Full sample | Shrinkage |
|---|---|---|---|---|---|
| European ancestry | g | 0.62 (0.55; 2291) | 1.02 (0.06; 2291) | 0.93 (0.02; 10370) | 33% |
| European ancestry | NIHTBX | 0.83 (0.53; 2222) | 1.09 (0.06; 2222) | 1.23 (0.03; 10040) | 33% |

Notes: Cells show b (SE; N). Statistically significant results (p < .05) are presented in bold. Shrinkage % = 1 − (within-sibship b / full-sample b).
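The shrinkage column follows directly from the formula in the notes. Applied to the rounded coefficients, the sketch below reproduces the reported percentages for Table 6 and, with the corresponding coefficients, for Tables 8 and 10; it is an arithmetic check, not part of the original analysis code.

```python
def shrinkage_pct(b_within, b_full):
    """Shrinkage % = 1 - (within-sibship b / full-sample b), in percent."""
    return 100 * (1 - b_within / b_full)

# European ancestry predicting g (Table 6): within 0.62, full sample 0.93
print(round(shrinkage_pct(0.62, 0.93)))  # 33
```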
Polygenic scores
Next, we report the relation between eduPGS and g. For these analyses, g is the dependent variable and the polygenic scores are the main predictors. The regression plots for eduPGS and g in the Black (red), Hispanic (green), Other (blue), and White (purple) samples are shown in Figure 4.
Figure 4. Regression plot for the predictive validity of eduPGS with respect to g in the Black (red), Hispanic (green), Other (blue), and White (purple) samples. Panels: (a) MTAG-eduPGS; (b) Poly-eduPGS.
Tables S5a-S5e show the complete results of the regression models for eduPGS and g, and Table 7 summarizes them. eduPGS is statistically significantly related to g in all groups. The validities for Black Americans are reduced by around one-third relative to those for Whites, consistent with previously reported results (Lasker et al., 2019). Even among Whites, the validities are lower than those reported for European-ancestry discovery samples (e.g., Lee et al., 2019). This is theoretically expected: validities estimated within a discovery sample are inflated by overfitting, whereas the validities reported here are out-of-sample predictions (Choi et al., 2020).
Table 7. Validities (b) from the mixed-effects regression models for eduPGS and g in the full sample and in the SIRE subsamples.

| Predictor | Criterion | Controls | All | White | Hispanic | Black | Other |
|---|---|---|---|---|---|---|---|
| MTAG-eduPGS | g | 20 PCs, sex, age | 0.26 | 0.26 | 0.26 | 0.16 | 0.32 |
| Poly-eduPGS | g | 20 PCs, sex, age | 0.23 | 0.22 | 0.20 | 0.16 | 0.42 |
| N | | | 10245 | 5858 | 2005 | 1642 | 740 |

Notes: Statistically significant results (p < .05) are presented in bold.
Finally, Table 8 reports the beta coefficients for the sibling analyses and the comparable full-sample analyses, with either g or NIHTBX as the dependent variable. The complete model results are provided in Supplementary File 1, Table S15b. Within sibships, both eduPGSs were statistically significant predictors of intelligence. The shrinkage of the within-sibship coefficients, relative to the full-sample ones, ranged from 20% to 27%. This is in line with the within-sibship shrinkage of 22% (95% CI: 6%, 37%) reported by Howe et al. (2022) for cognitive ability.
Table 8. Validities (b) for the within- and between-sibship and full sample analyses with g or NIHTBX as the dependent variable and the eduPGSs as predictors. Controls include sex, age, and 20 principal components.

| Predictor | Criterion | Within sibships | Between sibships | Full sample | Shrinkage |
|---|---|---|---|---|---|
| MTAG-eduPGS | g | 0.20 (0.05; 2291) | 0.28 (0.03; 2291) | 0.25 (0.01; 10369) | 20% |
| MTAG-eduPGS | NIHTBX | 0.21 (0.05; 2323) | 0.30 (0.03; 2323) | 0.27 (0.01; 10039) | 22% |
| Poly-eduPGS | g | 0.17 (0.05; 2302) | 0.19 (0.03; 2302) | 0.22 (0.01; 10369) | 23% |
| Poly-eduPGS | NIHTBX | 0.16 (0.05; 2323) | 0.22 (0.03; 2323) | 0.22 (0.01; 10039) | 27% |

Notes: Cells show b (SE; N). Statistically significant results (p < .05) are presented in bold. Shrinkage % = 1 − (within-sibship b / full-sample b).
Brain volume
Brain volume and g
Table 9 summarizes the effects (b) of brain volume on g for the full sample and the SIRE subsamples. For these analyses, the adjusted brain volume variable is used. This variable was entered into the mixed-effects regression models both without and with the first twenty genetic PCs as covariates. The complete results are provided in Supplementary Tables S8a-S8e. As can be seen, brain volume has a weak relation to g in the full sample and in all SIRE subsamples (bs = .12 to .25). Adjusting for genetic PCs attenuated the effect somewhat in the full sample and in the three non-White samples; the effects remained statistically significant in all cases.
Table 9. Validities (b) from mixed-effects regression models for brain volume and g in the full sample and in the SIRE subsamples.

| Predictor | Criterion | Controls | All | White | Hispanic | Black | Other |
|---|---|---|---|---|---|---|---|
| Brain volume adj. | g | | 0.22 | 0.13 | 0.18 | 0.15 | 0.25 |
| Brain volume adj. | g | 20 PCs | 0.14 | 0.13 | 0.12 | 0.13 | 0.20 |
| N | | | 10245 | 5858 | 2005 | 1642 | 740 |

Note: Statistically significant results (p < .05) are presented in bold.
Table 10 reports the beta coefficients for the sibling analyses and the comparable full-sample analyses, with either g or NIHTBX as the dependent variable. The complete model results are provided in Supplementary File 1, Table S15a. Within sibships, brain volume was a statistically significant predictor of intelligence. The shrinkage of the within-sibship coefficients, relative to the full-sample coefficients, ranged from 0% to 6%.
Table 10. Validities (b) for the within- and between-sibship and full sample analyses with g or NIHTBX as the dependent variable and brain volume as the predictor.

| Predictor | Criterion | Controls | Within sibships | Between sibships | Full sample | Shrinkage |
|---|---|---|---|---|---|---|
| Brain volume | g | 20 PCs, sex, age | 0.15 (0.04; 2258) | 0.16 (0.03; 2258) | 0.15 (0.01; 1024) | 0% |
| Brain volume | NIHTBX | 20 PCs, sex, age | 0.15 (0.04; 2189) | 0.15 (0.03; 2189) | 0.16 (0.01; 9920) | 6% |

Notes: Cells show b (SE; N). Statistically significant results (p < .05) are presented in bold. Shrinkage % = 1 − (within-sibship b / full-sample b).
Brain volume, SIRE, and genetic ancestry
Table 11 shows the results from the admixture regression analyses for brain volume and intracranial volume. Models 1a and 1b have brain volume as the dependent variable, while Models 2a and 2b have intracranial volume as the dependent variable. In Models 1b and 2b, European ancestry is the only ancestry predictor.
Table 11. Regression results for the effect of genetic ancestry on brain and intracranial volume in the full sample.

| Predictor | Model 1a (brain volume) | Model 1b (brain volume) | Model 2a (intracranial volume) | Model 2b (intracranial volume) |
|---|---|---|---|---|
| (Intercept) | 0.54 (0.33); 0.098 | -0.26 (0.16); 0.095 | 0.25 (0.32); 0.437 | -0.41 (0.17); 0.016 |
| African ancestry | -0.89 (0.09); <0.001 | | -0.68 (0.09); <0.001 | |
| Amerindian ancestry | -0.96 (0.10); <0.001 | | -0.71 (0.09); <0.001 | |
| East Asian ancestry | -0.39 (0.29); 0.175 | | -0.22 (0.27); 0.419 | |
| South Asian ancestry | -1.73 (0.48); <0.001 | | -1.68 (0.45); <0.001 | |
| frac_Black SIRE | -0.00 (0.08); 0.969 | 0.01 (0.06); 0.820 | -0.00 (0.07); 0.998 | 0.01 (0.05); 0.913 |
| frac_Native American SIRE | 0.05 (0.08); 0.551 | 0.04 (0.08); 0.598 | 0.00 (0.08); 0.999 | -0.01 (0.07); 0.924 |
| frac_NOC SIRE | 0.03 (0.04); 0.464 | 0.03 (0.04); 0.456 | 0.01 (0.04); 0.799 | 0.01 (0.04); 0.792 |
| Hispanic | -0.00 (0.03); 0.894 | -0.01 (0.03); 0.627 | 0.01 (0.03); 0.692 | 0.01 (0.03); 0.761 |
| Child US Born | -0.02 (0.05); 0.720 | -0.02 (0.05); 0.679 | -0.06 (0.05); 0.209 | -0.06 (0.05); 0.192 |
| Immigrant Family | 0.03 (0.02); 0.161 | 0.03 (0.02); 0.184 | 0.02 (0.02); 0.329 | 0.02 (0.02); 0.372 |
| Sex [M] | 0.94 (0.02); <0.001 | 0.94 (0.02); <0.001 | 0.86 (0.01); <0.001 | 0.86 (0.01); <0.001 |
| Age′ | -0.08 (0.03); 0.020 | | -0.05 (0.03); 0.114 | |
| Age″ | -0.01 (0.05); 0.736 | | -0.00 (0.04); 0.950 | |
| Height | 0.16 (0.01); <0.001 | 0.16 (0.01); <0.001 | 0.19 (0.01); <0.001 | 0.19 (0.01); <0.001 |
| SES | 0.10 (0.01); <0.001 | 0.11 (0.01); <0.001 | 0.10 (0.01); <0.001 | 0.10 (0.01); <0.001 |
| European ancestry | | 0.91 (0.07); <0.001 | | 0.68 (0.07); <0.001 |
| Age (linear) | | -0.09 (0.01); <0.001 | | -0.05 (0.01); <0.001 |
| Random effects | | | | |
| σ² | 0.27 | 0.27 | 0.24 | 0.24 |
| τ00 site_id_l:rel_family_id | 0.36 | 0.36 | 0.30 | 0.30 |
| τ00 site_id_l | 0.01 | 0.01 | 0.17 | 0.17 |
| ICC | 0.58 | 0.58 | 0.66 | 0.66 |
| N site_id_l | 22 | 22 | 22 | 22 |
| N rel_family_id | 8600 | 8600 | 8600 | 8600 |
| Observations | 10245 | 10245 | 10245 | 10245 |
| Marginal R² / conditional R² | 0.368 / 0.732 | 0.367 / 0.731 | 0.289 / 0.761 | 0.289 / 0.761 |

Notes: Cells show b (SE); p. Beta coefficients (b) and p-values (p) are from mixed-effects models in which recruitment site and family are treated as random effects; values in parentheses are standard errors. The marginal and conditional R² of each mixed-effects model are shown at the bottom. ICC = intraclass correlation coefficient. Models 1b and 2b, which use European ancestry, include only linear age, for reasons discussed in the Methods section.
As seen in all models, there was no association between SIRE and brain volume independent of genetic ancestry. Moreover, the effects of African (b = -.89) and Amerindian (b = -.96) ancestry, or conversely of European ancestry (b = .91), on brain volume were largely independent of sex, age, height, migrant status, SIRE, the general factor of SES, and the site- and family-level random effects. For intracranial volume, the ancestry effects were moderate: European (b = .68), African (b = -.68), and Amerindian (b = -.71).
The associations with European, African, and Amerindian genetic ancestry were stronger for brain volume than for intracranial volume. The reason for this is unclear, as the two variables correlate at r = .86 in the full sample. Both height and SES were statistically significantly related to brain and intracranial volume. These associations could reflect genetic as well as environmental factors.
Brain/intracranial volume within SIRE groups
The effects of European, African, and Amerindian ancestry on brain and intracranial volume, stratified by SIRE, are summarized in Table 12. The betas are from Models 1a to 2b in Tables S6a to S6e, which report the model results equivalent to those in Table 11 for each SIRE group. Statistically significant (p < .05) values are shown in bold. The complete model results are provided in Supplementary File 1. Generally, the finding of a positive effect of European ancestry and a negative effect of both African and Amerindian ancestry is robust to SIRE stratification.
The negative coefficients for African ancestry are consistent with the anthropological record, according to which sub-Saharan African populations have smaller cranial capacities than European populations (Clark & Henneberg, 2021; Coon, 1982; Hambly, 1947; Lewis et al., 2011; Smith & Beals, 1990).
While some dat