All content in this area was uploaded by Emil O. W. Kirkegaard on Mar 14, 2016
MANKIND QUARTERLY 2016 56:3 255-373
Admixture in the Americas: Regional and National
Differences
John Fuerst, Emil O. W. Kirkegaard¹
We conducted novel analyses regarding the association between
continental racial ancestry, cognitive ability and socioeconomic
outcomes across 6 datasets: states of Mexico, states of the United
States, states of Brazil, departments of Colombia, sovereign nations and
all units together. We find that European ancestry is consistently and
usually strongly positively correlated with cognitive ability and
socioeconomic outcomes (mean r for cognitive ability = .708; for
socioeconomic well-being = .643) (Sections 3-8). In most cases,
including another ancestry component, in addition to European
ancestry, did not increase predictive power (Section 9). At the national
level, the association between European ancestry and outcomes was
robust to controls for natural-environmental factors (Section 10). This
was not always the case at the regional level (Section 18). It was found
that genetic distance did not have predictive power independent of
European ancestry (Section 10). Automatic modeling using best subset
selection and lasso regression agreed in most cases that European
ancestry was a non-redundant predictor (Section 11). Results were
robust across 4 different ways of weighting the analyses (Section 12). It
was found that the effect of European ancestry on socioeconomic
outcomes was mostly mediated by cognitive ability (Section 13). We
failed to find evidence of international colorism or culturalism (i.e.,
neither skin reflectance nor self-reported race/ethnicity showed
incremental predictive ability once genomic ancestry had been taken
into account) (Section 14). The association between European ancestry
and cognitive outcomes was robust across a number of alternative
measures of cognitive ability (Section 15). It was found that the general
¹ Corresponding author. E-mail address: the.dfx@gmail.com (E. Kirkegaard)
socioeconomic factor was not structurally different in the American
sample as compared to the worldwide sample, thus justifying the use of
that measure. Using Jensen's method of correlated vectors, it was found
that the association between European ancestry and socioeconomic
outcomes was stronger on more S factor loaded outcomes, r = .75
(Section 16). There was some evidence that tourist expenditure helped
explain the relatively high socioeconomic performance of Caribbean
states (Section 17).
Key words: race, admixture, ancestry, admixture mapping,
cognitive ability, intelligence, IQ, academic achievement, general
socioeconomic factor, Americas.
Contents
1. Introduction
2. Methods
3. Regional racial admixture in Mexico
4. Regional racial admixture in the United States
5. Regional racial admixture in Brazil
6. Regional racial admixture in Colombia
7. Regional racial admixture in the Americas
8. Sovereign nations and regional racial admixture together
9. Taking into account all ancestry, multiple regression
10. Adding non-admixture predictors: theory-driven approach
11. Adding non-admixture predictors: automatic approach
12. Units of unequal size: using weights
FUERST, J. & KIRKEGAARD, E.O.W. ADMIXTURE IN THE AMERICAS
13. Race-Cognitive ability-S: does cognitive ability mediate the relationship between racial ancestry and S?
14. International colorism?
15. Other measures of cognitive ability and human capital
16. The S factor in the American sample
17. The West Indies
18. Parasite prevalence and European ancestry in the US
19. Spatial autocorrelation
20. Discussion and conclusion
21. Appendix A – Main data table
22. Appendix B – Path model results
1. Introduction
The existence of large cross-national differences in both socioeconomic
outcomes and measured cognitive ability is well established. Lynn & Vanhanen
(2012) have shown that cognitive differences can statistically explain the
socioeconomic ones to a large extent. Using a cross-lagged panel design,
Rindermann (2012) found support for a model in which cognitive differences had
a larger causal effect on socioeconomic ones than vice versa. Taken together the
results suggest that international differences in cognitive ability are, to a
significant degree, antecedent to socioeconomic ones. These cognitive
differences themselves have been found to be associated with numerous historic,
biological, genetic and evolutionary variables, suggesting that they have deep
roots. Some of these variables are listed below in Table 1. Of particular relevance
to this paper, it has been found that genetic ancestry is strongly associated with
cognitive variation (e.g., Christainsen, 2013; Kodila-Tedika & Asongu, 2015).
Correspondingly, Putterman & Weil (2010) found that geographic ancestry
accounts for a substantial portion of the international socioeconomic variation. All
of these findings, taken together, confirm the frequent observation that, on the
global level, major human biological races² (just “races” from now on), such as
Europeans, West Africans and Amerindians, differ in their mean levels of
cognitive ability (Galton, 1869; Price, 1934; Baker, 1974; Lynn, 2006, 2008). The
findings are consistent with the hypothesis that differences in cognitive ability are
passed on along lineages and that they explain some of the global variation in
socioeconomic outcomes (Lynn, 2008). We dub this hypothesis the racial-
cognitive ability-socioeconomic (R~CA-S) hypothesis.
R~CA-S hypotheses tend to be genetic ones. They typically propose that the
observed associations between cognitive ability and race are mediated by genes
(Lynn, 2006, 2008). According to typical R~CA-S models, over evolutionary
history environmental factors led to differential selection for cognitive ability. Since
cognition-related genetic differences are transmitted across generations, and
since biological races are defined by ancestry, a genetic R~CA-S hypothesis
predicts a robust association between racial ancestry and cognitive ability that is
independent of geography. While R~CA-S hypotheses tend to be genetic ones,
they need not be. Indeed, pre-Darwinian racial hypotheses were frequently
epigenetic³ (Fuerst, 2015). Apart from genes and epigenetic marks, other factors
could potentially mediate a geographically dependent association between race
and ability, such as cultural factors which are inter-generationally transmitted.
Alternatively, the apparent association between racial ancestry and cognitive
ability could be non-robust and simply a function of covarying natural-
environmental factors.
Though debate about differences between races often gets bogged down on
semantic issues (Fuerst, 2015), there are basic empirical facts to be explained
and some which are in need of further exploration. In need of explanation is the
association, on the global level, between racial ancestry and both cognitive and
socioeconomic outcomes. In need of further investigation is the geographical
robustness of these associations and the extent to which differences in cognitive
² By “races” we mean descent-based divisions of a species (cf. Kant, 1777; Darwin, 1903;
Hooton, 1946). For an adroit elaboration of the concept, see: Brues (1990). These
divisions are alternatively called “geographical ancestry” groups, “genetic clusters”, or
genetic “populations” (Fullwiley, 2014; Kitcher, 2007; Williams, 2015).
³ Epigenetic effects involve changes in gene function which are not due to changes in
DNA sequences. Most pre-Darwinian race theorists were species realists. For them, all
individuals of the same species had the same essence or structural program. For those
who conceptualized races as intraspecific lineages, enduring between-group
differences were typically attributed to environmentally induced changes which had
become imprinted on genealogical lines. This is analogous to an epigenetic model.
ability can account for the association between racial ancestry and
socioeconomic outcomes. Discussions of race differences are complicated
because, owing to extensive population flows over the last 500 years, most
regions of the world contain peoples of non-indigenous ancestry and because the
individuals in many regions exhibit substantial racial admixture with respect to the
traditionally recognized major geographic races of mankind. The Americas, for
example, contain racial tribrid (three-part) populations, with individuals having
varying degrees of West African, European and Amerindian biogeographic
ancestry (Salzano & Sans, 2014).
Table 1. Historic, biological, genetic and evolutionary variables associated with
international differences in measured cognitive ability.
| Correlate | Reference |
|---|---|
| IQ and edu. attainment associated SNP frequencies | (Piffer, 2013, 2015a) |
| Cognitive functioning associated SNP frequencies | (Minkov, Blagoev, & Bond, 2015) |
| Immunology associated SNP frequencies | (Woodley et al., 2014) |
| Immunology associated SNP frequencies | (Fedderke et al., 2014) |
| Racial classifications (based on genetic clusters) | (Christainsen, 2013) |
| Genetic proximity | (Becker & Rindermann, 2014) |
| Genetic proximity | (Piffer & Kirkegaard, 2015) |
| Genetic distance from native South Africans | (León & Burga-León, 2015) |
| Genetic distance from the US and the UK | (Kodila-Tedika & Asongu, 2015) |
| Spatial proximity of nations to each other | (Gelade, 2008) |
| Haplogroups | (Rindermann, Woodley, & Stratford, 2012) |
| Haplogroups | (Rodriguez-Arana, 2010) |
| Cranial capacity | (Meisenberg & Woodley, 2013b) |
| Nasal Index | (Templer & Stephens, 2014) |
| Time since the origin of agriculture | (Meisenberg & Woodley, 2013b) |
| Technological development in 1000 B.C. | (Lynn, 2012) |
| Skin color | (Templer & Arikawa, 2006) |
| Skin reflectance | (Templer, 2008) |
| Temperature: annual mean | (Vanhanen, 2009) |
| Average winter temperature | (Meisenberg & Woodley, 2013b) |
| Latitude | (Dama, 2013) |
Discussion of differences is also semantically complicated because “race” in
the form of self/socially-identified race/ethnicity (SIRE) often does not correspond
well with race in the biological sense of divisions delineated by descent (or now
by ancestrally informative molecular markers). This is particularly true for
populations with long histories of admixture. For example, Ruiz-Linares et al.
(2014) found a correlation of only 0.48 between European and Amerindian SIRE
and European and Amerindian racial ancestry in a sample from five Latin
American countries. The connection between SIRE and race is, of course,
obvious. Groups such as Amerindian and European Brazilians were originally
relatively unadmixed with respect to major traditionally recognized races (e.g.,
Amerindian and European Caucasoid). Thus they constituted separate races in
the sense of “biogeographic ancestry” groups. Over time they admixed to some
degree, yet the admixed groups retained some variant of the original race names,
thus leading to discordance between SIRE and race in the form of “genetic
ancestry.”⁴ This situation lends itself to semantic confusion, with the term “race”
being used at times to refer to SIRE and at times to refer to biological race, even
when the two do not correspond well. This situation has led a number of
researchers to varyingly use terms such as “genetic ancestry”, “geographic
ancestry”, “biogeographic ancestry”, “genetic populations”, “genetic structure” or
“genetic clusters” to denote what historically was called, and what we call, race
(e.g., Mersha & Abebe, 2015).
While the discordance between SIRE and race complicates discussions of
differences, it allows for the testing of certain hypotheses concerning them
(Dalliard, 2014; Rowe & Rodgers, 2005). The presence of racial admixture within
SIRE groups allows one to disentangle the statistical effects of genetic ancestry
from those of SIRE cultural identity (a categorical variable). Put more simply, one
can see if genetically assessed racial ancestry is associated with outcomes
between individuals within SIRE cultural groups. Just as admixture within SIRE
groups allows one to disentangle SIRE associated cultural effects from
genealogical ones, post-1500 population flows and the resultant peopling of
regions with non-indigenous populations enables one, to a degree, to see if the
global associations between race and outcomes transfer across environments.
This allows one to better determine whether the associations are independent of
the natural environment (i.e., tied to biological descent, rather than geography),
as the R~CA-S model would predict.
In this paper, we take advantage of the American regional variation in
European, Amerindian and West African admixture to investigate whether there
is a robust association between ancestry and outcomes. Our goal is modest: we
merely wish to determine if racial ancestry, with respect to three major races, is
⁴ To note, in Brazil, “color” classifications are not officially conceived racially (i.e., in terms
of ancestry) but rather morphologically, in terms of phenotype. This point is often
overlooked by researchers.
associated with cognitive and socioeconomic outcomes in the Americas and if the
ancestry-socioeconomic association is mediated by cognitive ability as has been
claimed (e.g., Lynn, 2008). We focus on the Americas because two of our
ancestral lineages are not indigenous to the region. As such, for one,
multicollinearity between evolutionary and contemporaneous environments is
attenuated, thus allowing us to better assess the contemporaneous
environmental effects on outcomes. For another, we can see if European and
West African ancestry is similarly associated with outcome differences across the
Americas as between the regions of origin. We focus on major racial ancestries
(European, West African and Amerindian) because, for these divisions, genomic
values are readily available. In Section 3, we examine state-level admixture and
outcomes within Mexico. In Sections 4, 5 and 6, we do the same for the US, Brazil
and Colombia. In Section 7, we broaden the analysis to the pan-American
national level. In Section 8, we combine our national and intranational estimates.
In Section 9, we use multiple regressions to see if using more than one racial
ancestry predictor improves predictive power. In Section 10, we test various
theories and hypotheses, specifically the climatic and parasite load ones. In
Section 11, we use automatic modeling to determine which predictors are non-
redundant when predicting outcomes. Specifically, we include geographical-
environmental and institutional predictors. In Section 12, we conduct robustness
analyses by looking at different weighting methods. In Section 13, we test if
cognitive ability mediates the association between ancestry and socioeconomic
outcomes as predicted. In Section 14, we test the hypotheses of international
colorism and international culturalism. In Section 15, as a robustness test, we
examine whether racial ancestry is related to other measures of cognitive
outcomes. In Section 16, as a robustness test, we check whether the factor
structure of our primary socioeconomic outcome measure is similar to that found
when analyzing all countries. Furthermore, we employ Jensen's method of
correlated vectors to examine if the observed correlation between European
ancestry and socioeconomic outcomes loads strongly on our general factor of
socioeconomic outcomes. In Section 17, we take a closer look at the West Indies.
There, we test Lynn & Vanhanen's (2012) tourist hypothesis, we attempt to
validate our cognitive measure, and we re-analyze the data after including non-
sovereign overseas territories. In Section 18, we take a closer look at racial
ancestry and socioeconomic outcomes in the US. In Section 19, we look into the
issue of spatial autocorrelation. Finally, in Section 20, we discuss the overall study
findings and note study limitations.
1.1. R~CA-S model
In our analysis, the primary outcomes of interest are measured cognitive
ability and a summary measure of socioeconomic outcomes, as according to the
R~CA-S model the main causal route runs from cognitive ability (CA) to
socioeconomic outcomes (S). Thus the statistical link between race and
socioeconomic outcomes is proposed to run mostly through cognitive ability,
though other pathways are allowed. Figure 1 shows an overview of the model.
Cognitive ability and socioeconomic outcomes are modeled as having reciprocal
causal effects (bidirectional arrow). Direct effects of culture and/or genes on
socioeconomic outcomes are allowed. These effects could be mediated through
traits such as personality, risk aversion, aggressiveness, superstitiousness,
wisdom or creativity. Such causes are often posited (e.g., Sternberg, 2013), but
since the proposed causal factors are not easily measured (Meisenberg, 2015),
little research has been done on them. To note, we are not positing that variance
in genomic ancestry (race) directly causes variance in cognitive ability and, by
way of this, in socioeconomic outcomes. Rather, according to the model, ancestry
covaries with causal factors (e.g., genes and/or cultural practices), which lead to
differences in cognitive ability and other traits. Thus we use a tilde (~) and a dash
(-) in the abbreviation of the model: the relationship between race and cognitive
ability is thought to be statistical, while the relationship between cognitive ability
and socioeconomic outcomes is thought to be causal.
Figure 1. R~CA-S model.
We note that the model could easily be expanded to include more pathways.
One could, for example, add a bidirectional relationship between culture and
cognitive ability, e.g., having a culture of reading could increase cognitive ability
(Harlaar, Hayiou-Thomas, & Plomin, 2005; Ritchie, Bates, & Plomin, 2015), which
could further stimulate the development of a reading culture, perhaps by
increasing intellectual curiosity (Stumm, Hell, & Chamorro-Premuzic, 2011).
However, since we are not in a position to test these pathways, we do not include
them in our stylized model.
2. Methods
The general method employed for this project is as follows:
Stage 1. Compile and aggregate genomic admixture estimates. When
possible validate them using other data.
Stage 2. Compile and aggregate cognitive ability, socioeconomic outcome
and geographic environmental data.
Stage 3. Examine the relationship between genomic ancestry and outcomes
using scatter plots, correlations, semi-partial correlations, multiple regressions
and path analysis.
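As an illustration of one Stage 3 tool, a semi-partial correlation can be sketched as below. This is a minimal sketch using the textbook formula, not the authors' code; the function names and the toy numbers are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Plain (unweighted) Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def semipartial(y, x, control):
    """Correlation of y with the part of x that is not linearly
    predictable from `control` (control is removed from x only)."""
    r_yx = pearson(y, x)
    r_yc = pearson(y, control)
    r_xc = pearson(x, control)
    return (r_yx - r_yc * r_xc) / math.sqrt(1 - r_xc ** 2)

# Toy values (hypothetical, not data from the paper).
outcome = [2.0, 1.0, 4.0, 3.0]
ancestry = [1.0, 2.0, 3.0, 4.0]
covariate = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with `ancestry` by construction
sr = semipartial(outcome, ancestry, covariate)
```

When the control variable is uncorrelated with the predictor, as in the toy data, the semi-partial correlation reduces to the zero-order correlation.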
In selecting countries for regional analysis, to allow for reliable associations,
four conditions had to be met:
Condition 1. There must be substantial inter-regional variation in admixture.
Condition 2. There must be a decent number of cases (n > 12).
Condition 3. Either genomic estimates must be available for the
states/districts or SIRE admixture estimates must be available along with SIRE
state/district percentages.
Condition 4. Reliable cognitive ability and socioeconomic outcome data must
be available.
Regarding Stage 2, we conducted a series of studies to obtain good
summary measures of socioeconomic well-being. There are many different
socioeconomic outcomes that one can look at. When such socioeconomic
variables are factor analyzed, though, a general socioeconomic factor (S factor)
emerges such that, most of the time, desirable outcomes load positively and
undesirable outcomes load negatively on it. Previous research has found S
factors at the national level (Kirkegaard, 2014b), the state/region/department level
(Carl, 2015; Kirkegaard, 2015b, 2015d, 2015e, 2015f, 2015g, 2015h, 2015j,
2015k, 2015l) and the city district level (Kirkegaard, 2015a). Analyses of national
and state level data showed that Human Development Index (HDI) scores
correlated strongly with S factor scores, typically at r > .9. Justified by the very strong
correlation between HDI and S, when we could not obtain S scores for particular
units, we employed HDI scores as reasonable proxies.
Also regarding Stage 2, humans have many cognitive abilities, yet factor
analysis demonstrates the existence of a general factor – which has been called
(general) intelligence, general mental ability, general cognitive ability, or simply g
(Kirkegaard, 2014a). This general factor has been found to be the most predictive
cognitive ability for many outcomes (Carroll, 1997; Jensen, 1998; Ree, Carretta,
& Green, 2003). It is measured by virtually all cognitive tests (Jensen, 1998),
though not all tests measure it equally well. Large and diversified IQ batteries are
good measures of it (Johnson, Nijenhuis, & Bouchard, 2008); however, high-quality
IQ estimates are not available for many countries, let alone for provinces
within them (Rosas, 2004). For this reason, we use scholastic tests which are
known to correlate strongly with IQ, especially at the group level (Condon &
Revelle, 2014; Frey & Detterman, 2004; Rindermann, 2007), and for which there
are high-quality data, e.g., from the OECD's PISA program. We note that it is not
clear whether differences in our academic measures actually index differences in
general cognitive ability.⁵ Undoubtedly, though, they index differences in some
type of cognitive ability. Since we are unsure about the psychometric nature of
the differences being discussed, we will simply refer to them as “cognitive ability”
differences, without the implication that we are necessarily dealing with variation
in general intelligence.
Indeed, we think that it is implausible that the cognitive differences being
discussed – see Appendix A – solely represent average general intelligence. For
example, given the performance of first and second generation Surinamese in the
Netherlands (e.g., Lynn, 2008), individuals who were not particularly migrant
selected, it is highly unlikely that Surinamese in Suriname have an average
general cognitive ability score of 74 (on the standard IQ metric). Our default model
would be that there are average individual-level general cognitive ability
differences between nations and regions. These induce socioeconomic
differences, such as differences in the quality of schooling, which in turn lead to
expanded cognitive differences, broadly conceived. These latter differences are
then indexed by our measures of national and subnational ability.
3. Regional racial admixture in Mexico
Mexico is a racially admixed country which exhibits substantial regional
variation in mean admixture, cognitive ability and socioeconomic outcomes.
There are 31 states and a federal district. Since federal districts are often outliers
(Kirkegaard, 2015d, 2015e), we excluded the federal district from all analyses
except the admixture plot.
⁵ Measurement invariance needs to be examined for cross-national data, or at least by
proxy using Jensen's method (Jensen, 1998).
3.1. Data sources
We created regional admixture, cognitive ability and socioeconomic variables
as discussed below. Data with sources and computations are available in
Supplementary File 1.
3.1.1. Admixture estimates
Admixture estimates were copied from Salzano and Sans (2014), Moreno-Estrada
et al. (2014) and Salazar-Flores et al. (2015). Salzano and Sans (2014)
provided a review of older studies; Moreno-Estrada et al. (2014) and Salazar-Flores
et al. (2015) provided new results based on multi-state studies. The unweighted
intercorrelations for the three sources were determined. For European and
Amerindian ancestry the Pearson correlations were above 0.83; for African
ancestry they ranged from -0.60 to 0.46. Regarding European and Amerindian
ancestry, the values exhibited a high reliability, thus justifying their combination.
The African ancestry values were unreliable due to the noisiness of the measures
in conjunction with the limited range of African ancestry (2.04% to 11.00% in the
final estimates). The minimal variance ensures that African ancestry itself will
have little effect on state-level outcomes and makes it likely that any found
association is spurious.
Admixture values were averaged for each state. This provided data for 18 of
Mexico's 31 states. Missing data points were then estimated based on the
measured admixture of adjacent regions. Details pertaining to how this was done
for each region are provided in Supplementary File 1B. Figure 2 shows a ternary
plot of the state racial admixture estimates (Hamilton, 2015).
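The averaging and gap-filling steps described above can be sketched as follows. This is only an illustration: the actual per-state rules are documented in Supplementary File 1B, and the unweighted neighbor mean, the function names and the toy values used here are assumptions.

```python
from statistics import mean

def average_sources(estimates):
    """Average the available European% estimates for each state.

    `estimates` maps state -> list of values from different sources,
    with None where a source has no estimate for that state.
    """
    averaged = {}
    for state, values in estimates.items():
        present = [v for v in values if v is not None]
        averaged[state] = mean(present) if present else None
    return averaged

def impute_from_neighbors(averaged, neighbors):
    """Fill states without data using the mean of their measured neighbors.

    Assumption: a simple unweighted neighbor mean; the paper's own
    state-by-state rules may differ.
    """
    filled = dict(averaged)
    for state, value in averaged.items():
        if value is None:
            known = [averaged[n] for n in neighbors.get(state, [])
                     if averaged.get(n) is not None]
            filled[state] = mean(known) if known else None
    return filled

# Toy values for three hypothetical states (not data from the paper).
toy = {"A": [40.0, 44.0, None], "B": [30.0, None, 34.0], "C": [None, None, None]}
adjacency = {"C": ["A", "B"]}
state_means = impute_from_neighbors(average_sources(toy), adjacency)
```

Only measured values feed the imputation, so an estimated state never serves as a donor for another estimated state in this sketch.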
Figure 2. Admixture estimates for Mexican states. Admixture proportions are
read counterclockwise from each corner: % African on the left side, % Amerindian
on the right side and % European at the base of the triangle.
We see that Mexico is mainly European and Amerindian; the amount of
African ancestry is low. This situation implies that multiple regressions, which take
into account all three ancestries, should not show substantial incremental
predictive power over a zero-order correlation using just European or Amerindian
ancestry. We return to this issue in Section 9.
3.1.2. Cognitive ability estimates
PISA scores averaged across math and reading tests and across the years
2003, 2006, 2009 and 2012 were computed for each state. The original values
can be found in Supplementary File 1B. State scores were highly correlated
across years with an average correlation of 0.81. This justified the use of cross-
year average scores. Since there were missing values for certain years and since
average PISA scores varied by year, point scores could not be directly averaged.
Instead, deviation scores relative to the Mexican national mean were computed
for each year and then averaged. To calculate these deviation scores, we used
the international average PISA standard deviations for the years and tests used.
These scores were then transformed into achievement quotient scores with a
standard deviation of 15 set relative to the national achievement quotient
(NACHQ) score of Mexico. The Mexican NACHQ was, in turn, set relative to a
UK mean of 99 following Lynn & Vanhanen's (2012) equalization of means and
standard deviations method.⁶ For validation of the cognitive estimates, 2002 and
2005 average state level short form Raven’s Matrices scores, from the Mexican
Family Life Survey, were also computed and correlated with the cross year
average PISA scores. The original Raven's scores came from Salomón &
Briseño (n.d.). The state level Raven-PISA correlation was 0.66 (N=14). This
correlation was likely attenuated by the relatively poor reliability of the Raven's
scores; the correlation between the 2002 and 2005 state average Raven's scores
was only .69.
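The conversion from yearly PISA scores to an achievement quotient described above can be sketched as follows. This is a hedged illustration of the deviation-score logic only: the function name, the input values and the national quotient of 88 are hypothetical, and the equalization step that places Mexico relative to the UK is taken as given rather than reproduced.

```python
from statistics import mean

def state_achq(state_scores, national_means, international_sds, national_achq):
    """Convert one state's yearly PISA scores to an achievement quotient.

    All dictionaries are keyed by year. Years in which the state has no
    score (None) are skipped, which is why deviations from the national
    mean, rather than raw points, are averaged.
    """
    deviations = [
        (score - national_means[year]) / international_sds[year]
        for year, score in state_scores.items()
        if score is not None
    ]
    # Rescale the mean deviation to a metric with SD 15 around the national value.
    return national_achq + 15 * mean(deviations)

# Hypothetical inputs (not values from the paper).
scores = {2003: 410.0, 2006: None, 2009: 430.0, 2012: 425.0}
nat_means = {2003: 400.0, 2006: 410.0, 2009: 420.0, 2012: 415.0}
int_sds = {2003: 100.0, 2006: 100.0, 2009: 100.0, 2012: 100.0}
achq = state_achq(scores, nat_means, int_sds, national_achq=88.0)
```

In this toy case the state sits a tenth of an international standard deviation above the national mean in every available year, so its quotient lands 1.5 points above the assumed national value.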
3.1.3. Socioeconomic outcomes
As discussed in Section 16, when analyzing socioeconomic outcomes, a
general factor tends to emerge. Since no Mexican state S factor study existed,
one of us conducted a thorough study (Kirkegaard, 2015d), using outcome data
from approximately 2005 to 2015. We found that year 2010 HDI correlated very
strongly (r = .93) with the S scores based on 23 diverse indicators.
⁶ The UK IQ was set to 100. The ACHQ came out to one point below that. We did not
adjust upwards.
3.2. Zero-order correlations and scatter plots
The zero-order correlations are shown in Table 2. Below the diagonal,
weighted correlations are presented. For this and subsequent analyses, we used
the square root of the regions' populations as weights. We discuss the matter of
weighting in Section 12 and show results generated using alternative weighting
methods.
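A weighted correlation of the kind reported below the diagonal can be sketched as follows, with the square root of population as the weight. This is a minimal sketch of a standard weighted Pearson correlation, not the authors' code; the function name and all numbers are illustrative assumptions.

```python
import math

def weighted_corr(x, y, weights):
    """Weighted Pearson correlation between x and y."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / math.sqrt(var_x * var_y)

# Toy regional data (hypothetical, not from the paper).
european_pct = [30.0, 45.0, 50.0, 62.0]
cognitive = [95.0, 98.0, 97.0, 103.0]
population = [1_000_000, 4_000_000, 9_000_000, 250_000]

# Square-root weights damp the influence of very large units
# relative to weighting by raw population.
weights = [math.sqrt(p) for p in population]
r_weighted = weighted_corr(european_pct, cognitive, weights)
r_unweighted = weighted_corr(european_pct, cognitive, [1.0] * len(population))
```

Passing equal weights recovers the ordinary unweighted correlation, which makes it easy to compare the two halves of a table such as Table 2.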
Table 2. Zero-order correlations for Mexican states. Weighted correlations below
the diagonal. N=31 for all cases. CA, cognitive ability (PISA score); S,
socioeconomic (S) factor; HDI, human development index.
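| | CA | S | HDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.77 | 0.74 | 0.36 | -0.57 | 0.51 |
| S | 0.80 | | 0.93 | 0.21 | -0.69 | 0.64 |
| HDI | 0.78 | 0.94 | | 0.22 | -0.65 | 0.60 |
| African% | 0.42 | 0.24 | 0.28 | | 0.08 | -0.22 |
| Amerindian% | -0.59 | -0.71 | -0.65 | 0.03 | | -0.99 |
| European% | 0.52 | 0.67 | 0.60 | -0.18 | -0.99 | |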
Unsurprisingly, given the ternary plot above, European% and Amerindian%
were almost perfectly negatively related. To facilitate comparability across
countries, we employed European% in our analyses. European%, cognitive ability
scores and socioeconomic outcomes were all substantially positively related, as
expected, given the R~CA-S model. Using S instead of HDI produced somewhat
stronger results, but, as expected, HDI acted as an acceptable proxy for S.
Regarding African%, the results were unexpected and are likely to be flukes.
Figures 3 and 4 show the scatter plots for European% and cognitive ability scores
and European% and S factor scores, respectively.
Figure 3. European ancestry% and cognitive ability scores for Mexican states.
Figure 4. European ancestry% and S factor scores for Mexican states.
4. Regional racial admixture in the United States
We now turn to the United States. For this country, there are high-quality data
concerning state-level cognitive ability scores and socioeconomic outcomes, but
state-level genomic admixture data are wanting. Bryc et al. (2015) did provide
some SIRE group admixture estimates by states, but data points were missing
for many states and the samples (from the personal genomics company 23andMe)
were not particularly representative. Thus, we estimated state-level admixture
data using SIRE rates in conjunction with SIRE genomic admixture data.
4.1. Data sources
We created regional admixture, cognitive and socioeconomic variables as
discussed below. The raw data are available in Supplementary File 2.
4.1.1. Admixture estimates
We computed the state racial ancestry estimates using 2010 census SIRE
data, in conjunction with the SIRE admixture estimates provided by Shriver et al.
(2003) and Klimentidis et al. (2009). Table 3 depicts the racial admixture for the
SIRE groups.
Our state-level estimates are crude, as there is regional variation in SIRE
admixture. For example, Hispanics in the Northeast are more ancestrally
European than are those in the Southwest (Bryc et al., 2015). Nonetheless, these
estimates are reasonable approximations. The US has non-trivial Asian
American, mixed ethnic and Pacific Islander populations. This is problematic
for three reasons: first, no good admixture data are available for these groups;
second, the Asian and Pacific Islander SIRE groups have unique cognitive and
outcome profiles relative to African, White, Hispanic and Native American SIRE
groups (see, for example, Fuerst, 2014); and third, the Asian and Pacific Islander
SIRE groups largely belong to major races different from the three being
discussed in this paper. Taken together, these three problems complicate state
comparisons.
Table 3. SIRE admixture estimates.

| Ethnorace | European% | African% | Amerindian% | Source |
|---|---|---|---|---|
| White | 96.1 | 3.2 | 0.7 | Shriver et al. |
| Black | 18.0 | 82.0 | 0.0 | Shriver et al. |
| Native American | 25.3 | 2.9 | 71.8 | Klimentidis et al. |
| Hispanic | 61.6 | 5.7 | 32.7 | Klimentidis et al. |

We decided that the soundest method was to exclude these groups. This
was done by using only White, Black, Native American and Hispanic SIRE
percentages to compute state admixture percentages and by dividing the
resultant admixture percentages by the sum of the White, Black, Native American
and Hispanic SIRE percentages. Thus, the state formulas were:
European genomic% = (White*96.1 + Black*18 + Native*25.3 + Hispanic*61.6) / (White + Black + Native + Hispanic)
African genomic% = (White*3.2 + Black*82 + Native*2.9 + Hispanic*5.7) / (White + Black + Native + Hispanic)
Amerindian genomic% = (White*0.7 + Black*0 + Native*71.8 + Hispanic*32.7) / (White + Black + Native + Hispanic)
By this method, outcomes are modeled as varying due only to factors related
to European, African and Amerindian ancestry. We excluded Hawaii from the
analyses since the majority (51%, or 68% when including individuals who
reported two or more races) of Hawaiians reported East Asian and Pacific
Islander ethnicity.
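
The three state formulas can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the function name `state_admixture` and the example SIRE shares are hypothetical, and only the coefficients come from Table 3.

```python
# SIRE-group admixture percentages from Table 3: (European, African, Amerindian).
SIRE_ADMIXTURE = {
    "White": (96.1, 3.2, 0.7),
    "Black": (18.0, 82.0, 0.0),
    "Native": (25.3, 2.9, 71.8),
    "Hispanic": (61.6, 5.7, 32.7),
}


def state_admixture(sire_shares):
    """Return (European%, African%, Amerindian%) for one state.

    sire_shares maps each of the four SIRE groups to its share of the state
    population. Groups outside these four are left out, so the result is
    renormalized by the sum of the four included shares.
    """
    total = sum(sire_shares[group] for group in SIRE_ADMIXTURE)
    return tuple(
        sum(sire_shares[g] * SIRE_ADMIXTURE[g][k] for g in SIRE_ADMIXTURE) / total
        for k in range(3)
    )


# Hypothetical state: these shares are made up and are not census figures.
example = {"White": 60.0, "Black": 15.0, "Native": 1.0, "Hispanic": 20.0}
european, african, amerindian = state_admixture(example)
```

Because every row of Table 3 sums to 100, the three estimates returned for a state also sum to 100 regardless of how large the excluded groups are.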
Figure 5 shows the ternary plot of the state admixture estimates. We see that
there is substantial admixture, and unlike with Mexico, it is not solely along one
axis. If African and Amerindian ancestry is associated with different levels of
cognitive ability and S, then multiple regressions should give additional predictive
power for this dataset.
Figure 5. Ternary plot for admixture in the US.
4.1.2. Cognitive ability
PISA scores are not available for all US states. One alternative is to
estimate cognitive ability scores from NAEP achievement scores. McDaniel
(2006) used data from multiple years to estimate state IQs, scaled to a national
mean of 100. These estimates have subsequently been employed in a number of
analyses. In addition to McDaniel’s scores, we computed average 2009 and 2013
NAEP scores based on those provided by science blogger The Audacious
Epigone (see footnote 7). The scores provided by The Audacious Epigone were
adjusted upward so as to be set relative to a national IQ of 100. The average of
The Audacious Epigone's scores was then averaged with McDaniel’s and set
relative to the national US NACHQ score.
4.1.3. Socioeconomic outcomes
There is no official set of HDI scores for the US, but there is the American
Human Development Index (AHDI, see http://www.measureofamerica.org/).
While it is not set on the same scale as the HDI values and thus is not very useful
for international comparisons, we nonetheless included the 2010 AHDI scores in
our US analysis. Additionally, one of us undertook an S factor study of the US
(Kirkegaard, 2015e) and found an S factor using 24 diverse indicators. We
excluded Washington DC in line with Kirkegaard (2015e) and the Mexican
analysis in Section 3. Due to some facts which will be discussed later, one of us
undertook a new and larger S factor analysis for the US (81 indicators based on
2010 data, Kirkegaard, 2015b). Between the datasets, the S factor scores
correlated .961. We used the scores from the second paper as it was based on
more indicators.
4.2. Zero-order correlations and scatter plots
Zero-order correlations are shown in Table 4. Regarding social outcomes,
the association between European% and S scores was substantially larger than
that between European% and AHDI scores, even though our S scores
correlated at .94 with AHDI scores. The discrepancy is likely due to the relative
homogeneity between states in the few variables that were used to compute the
AHDI scores. We also note that AHDI correlated more weakly with CA (.52) than
did S (.70), which also supports the hypothesis that something is amiss with the
AHDI numbers.
Table 4. Zero-order correlations for the US. Weighted correlations below the
diagonal. N=49 for ancestry variables, N=50 for cognitive ability (CA),
socioeconomic (S factor) and American Human Development Index (AHDI).

| | CA | S | AHDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.75 | 0.57 | -0.50 | -0.32 | 0.67 |
| S factor | 0.70 | | 0.94 | -0.39 | -0.11 | 0.44 |
| AHDI | 0.52 | 0.94 | | -0.21 | 0.08 | 0.16 |
| African% | -0.40 | -0.37 | -0.21 | | -0.25 | -0.84 |
| Amerindian% | -0.40 | -0.06 | 0.15 | -0.27 | | -0.31 |
| European% | 0.64 | 0.39 | 0.12 | -0.81 | -0.35 | |

7. The relevant blog posts can be found here:
http://anepigone.blogspot.com/2006/07/better-state-iq-estimates.html
http://anepigone.blogspot.com/2010/05/state-iq-estimates-2009.html
http://anepigone.blogspot.co.at/2015/01/state-iq-estimates-2013.html

It is notable that despite the weak correlation between racial ancestry and
AHDI between states, there were substantial intrastate differences between SIRE
groups. The AHDI differences between African and Hispanic Americans, on the
one hand, and White Americans, on the other, were roughly three times the
magnitude of the average AHDI difference between states. Moreover, the
magnitude of these intrastate differences was fairly constant across states. The
substantial association between SIRE and AHDI on the intrastate level lends itself
to the prediction that racial ancestry will be a major predictor at the state level.
However, as we have seen, this is not the case. More research is needed on this
issue. We return to it briefly in Section 18.
As before, we examine the scatter plots to visualize the results. These are
shown in Figures 6 and 7.
Figure 6. Scatter plot of European% and cognitive ability scores for the US.
Figure 7. Scatter plot of European% and S factor scores for the US.
With regard to S, Maryland (MD) and West Virginia (WV) represent two major
outliers. The values for these two states appear to be correct. The higher than
expected S score for Maryland could be due to its proximity to the capital district,
which is an enclave between Maryland and Virginia. Many affluent individuals
commute from Maryland to the capital district. The weighted correlation between
S scores and European ancestry without these two outliers is .50 instead of .39.
Generally, it seems that state-level socioeconomic outcomes are driven largely
by factors different from those correlated with the major racial lineages analyzed
here. It is possible that intra-European ancestry is associated with state-level
outcome differences. European ethnic groups tended to settle in different parts of
the US (“American Nations Series,” 2013; Fischer, 1989; Woodard, 2012) and
there is some evidence that regional European ancestry is associated with
regional outcome differences in the US (Fulford, Petkov, & Schiantarelli, 2015).
5. Regional racial admixture in Brazil
Brazil has 26 states and a federal district. The number of states and the
amount of variation in ancestry between them is sufficient for the type of analysis
which we are conducting.
5.1. Data sources
As with the other studies, we compiled data from multiple sources. The data
are available in Supplementary File 3.
5.1.1. Admixture estimates
For the Brazilian estimates, we averaged the state admixture values reported
in Rodriguez de Moura et al.'s (2015) meta-analysis. This provided estimates for
16 of Brazil's 26 states. The values for the 10 remaining states were filled in with
the respective average values of the five major Brazilian regions (North,
Northeast, Central-West, Southeast and South). This was justified because state
variation in admixture clusters regionally. To validate these estimates, we
correlated the European state ancestry estimates with White (Branco) state
SIRE%. The state SIRE percentages were obtained from the 2012 Brazilian
Institute of Geography and Statistics (IBGE) survey
(http://www.sidra.ibge.gov.br/bda/tabela/listabl.asp?z=t&c=262). The correlation was 0.79 (N=26,
weighted). The ternary plot is shown in Figure 8.
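
The regional fill-in step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name `fill_with_regional_means`, the state labels and the numbers are hypothetical, and an unweighted mean of the observed states in each region is assumed.

```python
def fill_with_regional_means(estimates, region_of):
    """Fill missing state values with the mean of observed states in the region.

    estimates maps state -> ancestry percentage, or None when no study exists.
    region_of maps state -> region label.
    """
    sums, counts = {}, {}
    for state, value in estimates.items():
        if value is not None:
            region = region_of[state]
            sums[region] = sums.get(region, 0.0) + value
            counts[region] = counts.get(region, 0) + 1

    filled = {}
    for state, value in estimates.items():
        if value is None:
            region = region_of[state]
            # Raises KeyError if the region has no observed state at all.
            value = sums[region] / counts[region]
        filled[state] = value
    return filled


# Made-up European% values for four hypothetical states in two regions.
estimates = {"State A": 40.0, "State B": 60.0, "State C": None, "State D": 80.0}
region_of = {"State A": "North", "State B": "North",
             "State C": "North", "State D": "South"}
filled = fill_with_regional_means(estimates, region_of)
```

Observed values are passed through unchanged; only the missing state receives the regional average.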
Figure 8. Ternary plot for admixture in Brazilian states.
5.1.2. Cognitive ability
For cognitive ability scores, we used the average of the math and reading
PISA 2012 scores (OECD, 2014).
5.1.3. Socioeconomic outcomes
Both S factor and HDI scores are available for Brazilian states for the years
1991, 2000 and 2010 (Kirkegaard, 2015k). These all correlate very strongly
(range .90 to .98). We used the S scores from 2010, because they were based on the
largest number of indicators (26) and our socioeconomic data for our other
countries came from around 2010. We also used the 2010 HDI values.
5.2. Zero-order correlations and scatter plots
The correlation matrix is shown in Table 5. The results are similar to those
for Mexico, in that European ancestry is strongly related to cognitive ability
scores, S scores and HDI scores. In this bivariate analysis, African ancestry was
not strongly negatively associated with outcomes. As will be seen in Section 9,
the negative associations become more robust when the covariance between
African and Amerindian ancestry is taken into account. The scatter plots for
European ancestry and cognitive scores and European ancestry and S factor
scores are shown in Figures 9 and 10, respectively.
Table 5. Zero-order correlations for Brazil. N=26 in all cases.

| | CA | S | HDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.81 | 0.71 | -0.12 | -0.65 | 0.74 |
| S | 0.84 | | 0.95 | -0.17 | -0.65 | 0.77 |
| HDI | 0.78 | 0.97 | | -0.30 | -0.47 | 0.67 |
| African% | -0.17 | -0.19 | -0.28 | | -0.34 | -0.31 |
| Amerindian% | -0.65 | -0.67 | -0.54 | -0.29 | | -0.79 |
| European% | 0.73 | 0.76 | 0.70 | -0.44 | -0.73 | |

It is notable that Ceará seems to be a major outlier with regard to both
cognitive and socioeconomic outcomes. If it is excluded, the correlations become
.72 and .80 for CA and S, respectively. As shown in Table 5, Amerindian% is
more negatively associated with outcomes than is African%. This could be
because states with high Amerindian ancestry tend to be the more remote
ones.
Figure 9. Scatter plot of European% and cognitive ability scores for Brazilian
states.
Figure 10. Scatter plot of European% and S factor scores for Brazilian states.
6. Regional racial admixture in Colombia
Colombia is located in the northern region of South America. It has 32
departments and a capital district with a total population of approximately 50
million. Like the three previously discussed countries, Colombia shows significant
spatial variation in admixture.
6.1. Data sources
The data are available in Supplementary File 4.
6.1.1. Admixture estimates
Estimating regional admixture for Colombia’s 32 departments is not without
difficulty, since existent studies provide admixture data for only half of the
departments. Problematically, specific estimates for the eastern and
southeastern departments, which are reported to have high Amerindian
components, are not available. Nonetheless, we were able to construct a set of
admixture estimates. First, 18 departmental estimates were copied from Salzano
and Sans’ compilation (Salzano & Sans, 2014). The ancestry ratios from Salzano
and Sans’ two main sources correlated at 0.9, justifying the use of the combined
estimates. Second, missing values were filled in based on regional values and
based on Ruiz-Linares et al.’s and Rodriguez-Palau et al.’s admixture/SIRE maps
(Rodriguez-Palau et al., 2007; Ruiz-Linares et al., 2014). For example, estimates
for Caribbean-Pacific departments were averaged and used to fill in missing data
for other departments in this region. Specific computations are provided and
explained in Supplementary File 4C and 4E. To validate these estimates we
computed ones using SIRE (Afro descent, Indigenous and “no ethnic” plus Roma)
data from the 2005 census (“Censo Nacional,” 2005), in conjunction with average
SIRE admixture percentages as reported in all locatable studies. The correlations
between the two estimates for African, Amerindian and European ancestry were,
respectively, 0.81, 0.79 and 0.67. The relatively low correlation between our SIRE
admixture derived European estimates and our district genomic ones likely
relates to the imprecise nature of the SIRE categories. We ran the analysis using
both sets of estimates and came up with comparable results. Below, we report
results based on the genomic estimated district ancestry data (not the SIRE x
admixture based estimates). Figure 11 shows the admixture plot.
Figure 11. Ternary plot for admixture in Colombian states.
6.1.2. Cognitive ability
For cognitive scores, grade 5 and 8 SABER ("knowledge") math and reading
exam scores were used (sources given in Supplementary File 4F). For each year,
these were transformed into deviation scores. The average of the 2012 and 2014
exam scores correlated at about 0.85 with the average of the 2003 and 2005
scores. The 2012/2014 and 2003/2005 scores were on different metrics, and
yearly standard deviations were not available for the 2003 and 2005 scores (given
the source used), so, in the end, only the 2012 and 2014 average scores were
employed. Following the previously discussed method, scores were converted to
ACHQs relative to a UK mean of 99.
6.1.3. Socioeconomic outcomes
No Colombian S factor study had previously been conducted. For this
reason, one of us carried out such a study (Kirkegaard, 2015j). The study
extracted an S factor from 16 diverse socioeconomic variables. The variables
were based on data from 2005. Results were generally in line with previous
studies from other countries. In addition to S scores, we included 2010 HDI
scores.
6.2. Zero-order correlations and scatter plots
Table 6 shows the zero-order correlations. As in the other countries, we see
strong correlations between European% and outcomes. The cognitive ability
association is driven by a strong negative relation between African% and ability.
The S and HDI associations are driven by a negative association split between
Amerindian% and African%. Figures 12 and 13 show the scatter plots for
cognitive ability and S factor scores, respectively.
Table 6. Zero-order correlations for Colombian departments. N=33 in all cases.
CA, cognitive ability; S, socioeconomic (S) factor; HDI, Human Development
Index.

| | CA | S | HDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.70 | 0.67 | -0.62 | 0.04 | 0.82 |
| S | 0.65 | | 0.83 | -0.35 | -0.14 | 0.62 |
| HDI | 0.64 | 0.84 | | -0.20 | -0.25 | 0.51 |
| African% | -0.70 | -0.34 | -0.23 | | -0.70 | -0.76 |
| Amerindian% | 0.26 | -0.05 | -0.19 | -0.76 | | 0.07 |
| European% | 0.81 | 0.53 | 0.47 | -0.89 | 0.39 | |

Figure 12. Scatter plot of European% and cognitive ability scores for Colombian
Departments.
Figure 13. Scatter plot of European% and S factor scores for Colombian
departments.
7. National racial admixture in the Americas
Previously, we used divisions within countries as units of analysis. Now it is
time to zoom out and take a look at all countries in the Americas. Doing so does
not drastically change the sample size because there are only 35 sovereign
nations. It is worth noting upfront that many of these 35 are small island nations
for which good data points are hard to come by.
7.1. Data sources
7.1.1. Admixture estimates
Estimating and validating admixture at the national level is more complex
than at the intranational level, and thus, the discussion of the procedure and the
results necessitates more space than used in the previous sections.
7.1.1.1. Genomic estimates
Average genomic ancestry percentages were created for the 35 sovereign
American nations, based on the data available as of September 2014. Most
admixture studies decomposed racial ancestry into three components: European,
African and Amerindian. For some countries, a significant fraction of the
population had other major ancestral components, such as South Asian, East
Asian or Oceanian. As such, an “other” category was included. Not all possible
studies were used in creating the national estimates. Rather, only estimates from the
most methodologically sound and nationally representative studies were used. To
avoid problems with sex-biased dispersion and mating we used only autosomal
based estimates, omitting results obtained with Y-chromosomal and
mitochondrial DNA. Roughly 70 different study estimates were employed to
create the 35 national ones. For some countries up to four sets of estimates were
averaged, while for others only one was available. For Belize and Paraguay, no
national level data were available. Estimates instead were calculated based on
those of the surrounding nations and the regions within those nations. This was
justified given a number of historical facts related to the peopling of these two
countries. Data were also unavailable for Guyana and Antigua and Barbuda. CIA-based
SIRE estimates were used instead for these two countries.
For Trinidad and Tobago, values were available only for the SIRE Black
population, which constitutes approximately 40% of the total. In making a Trinidad
and Tobago estimate, it was assumed that SIRE South Asian Indians had a
similar level of European ancestry as did SIRE Blacks. The “mixed” group was
treated as one half SIRE Black and one half SIRE South Asian Indian. US national
admixture estimates were created by weighting the national SIRE percentages by
the admixture percentages for each SIRE group. Asians (~4.5% of the population)
were treated as 100% other. Pacific Islanders and Mixed race individuals (~1.5%)
were discounted. For Canada, the national estimate was made using US SIRE
admixture values in conjunction with Canadian SIRE percentages. To make
ancestry percentages more comparable across countries, national admixture was
expressed in terms of the three main source populations: European/West
Caucasian, African and Amerindian. Computations and sources are provided in
Supplementary File 5B.
7.1.1.2. Self-identified race
CIA World Factbook SIRE data were used to create national racial averages,
except in the case of Canada, for which the 2011 Canadian census data were
used. As with genomic ancestry, European, African, Amerindian and other
percentages were computed. Specific ethnic identities such as “Spanish” and
“Aymara” were grouped into major regional racial identities. With regard to hybrid
identities such as Mestizo and Mulatto, percentages were split by parental group
(e.g., one half European and one half Amerindian). For tribrid identities such as
Montubio, percentages were split three ways. Assumptions had to be made for a
number of nations. For example, Costa Rica was said to be 83.6% “White and
Mestizo”; this was treated as 83.6 percent Mestizo (i.e., as 41.8% European and
41.8% Amerindian). St. Lucia was said to be 85.3% Black, 3.9% White and 10.9%
Mixed; it was assumed that the “mixed” group was mixed Black and White.
Judgment calls such as these are noted in Supplementary File 5D.
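
The splitting rule for hybrid and tribrid identities can be sketched as below. This is a hedged illustration: the `SPLITS` table and the function `sire_to_components` are hypothetical names, the mapping covers only the identities mentioned in the text, and the assumption that a tribrid identity divides equally among European, African and Amerindian is ours.

```python
# Hypothetical mapping from a reported identity to parental-group fractions.
SPLITS = {
    "White": {"European": 1.0},
    "Black": {"African": 1.0},
    "Amerindian": {"Amerindian": 1.0},
    "Mestizo": {"European": 0.5, "Amerindian": 0.5},
    "Mulatto": {"European": 0.5, "African": 0.5},
    # Tribrid identity, assumed here to split equally three ways.
    "Montubio": {"European": 1 / 3, "African": 1 / 3, "Amerindian": 1 / 3},
}


def sire_to_components(percentages):
    """Convert identity percentages into broad ancestry-group percentages.

    Identities missing from SPLITS are counted entirely as "Other".
    """
    totals = {"European": 0.0, "African": 0.0, "Amerindian": 0.0, "Other": 0.0}
    for identity, pct in percentages.items():
        for group, fraction in SPLITS.get(identity, {"Other": 1.0}).items():
            totals[group] += pct * fraction
    return totals


# The Costa Rican "White and Mestizo" share, treated as Mestizo as in the text.
costa_rica_part = sire_to_components({"Mestizo": 83.6})
```

For the Costa Rican example this reproduces the 41.8% European and 41.8% Amerindian split given in the text.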
7.1.1.3. Putterman and Weil’s World Migration Matrix
Ancestry components were also computed based on Putterman and Weil’s
World Migration Matrix for 165 countries (Putterman & Weil, 2010). For each
nation, the matrix gives the percentages of ancestors hailing from every nation in
the year 1500. Putterman and Weil based their estimates on a mix of genetic
studies, immigration data and other sources. The four ancestral components were
created by summing the year 1500 national ancestry components into the broad
categories of European, African, Amerindian and other (which includes Middle
Easterner and North African). Computations are provided in Supplementary File
5C.
7.1.1.4. Skin reflectance
National skin reflectance data were provided by Gerhard Meisenberg
(Personal Communication, 2014). These estimates have previously been used in
a number of analyses (e.g., Meisenberg & Woodley, 2013a). For this variable,
higher values correspond with brighter skin. Values are provided in
Supplementary File 5A.
7.1.1.5. Results
The genomic admixture estimates, SIRE estimates, Putterman and Weil's
ancestry estimates and skin reflectance scores are available in Supplementary
File 5. Results are shown in Tables 7 through 9 below. Note that for this validation
check, we excluded the two purely SIRE imputed genomic values (Guyana and
Antigua-Barbuda) from the genomic% variable. European, African and
Amerindian genomic estimates strongly correlate with estimates based on racial
identification and with Putterman and Weil’s World Migration Matrix. As expected,
White/European ancestry is a strong positive predictor of national reflectance,
while Black/African ancestry is a strong negative one.
All correlations are substantially positive. The lowest are those between the
skin reflectance variable and the others. This is because skin reflectance does
not discriminate well between European and Amerindian in relation to African
ancestry. When we statistically control for Amerindian ancestry with a semi-partial
correlation, the correlation is .73.
Table 7. Validation correlations for European ancestry. Weighted below the
diagonal. N's=25-33.

| | Genomic Euro% | CIA White% | Putterman Euro% | Skin reflectance |
|---|---|---|---|---|
| Genomic Euro% | | 0.89 | 0.89 | 0.73 |
| CIA White% | 0.88 | | 0.96 | 0.85 |
| Putterman Euro% | 0.90 | 0.96 | | 0.75 |
| Skin reflectance | 0.66 | 0.75 | 0.66 | |

Table 8 shows the analogous correlations for African ancestry. Again, the
correlations are very strong. The skin reflectance correlations are strong because
African ancestry is being contrasted with both European and Amerindian ancestry
and because both of the latter are associated with relatively high reflectance
levels.
Table 8. Validation correlations for African ancestry. Weighted below the
diagonal. N's=25-33.

| | Genomic Afri% | CIA Black% | Putterman Afri% | Skin reflectance |
|---|---|---|---|---|
| Genomic Afri% | | 0.96 | 0.96 | -0.94 |
| CIA Black% | 0.95 | | 0.93 | -0.94 |
| Putterman Afri% | 0.95 | 0.92 | | -0.94 |
| Skin reflectance | -0.89 | -0.86 | -0.91 | |

Finally, we turn to Amerindian ancestry which is shown in Table 9. As before,
the ethnicity and genomic variables are very strongly correlated but, as expected
for the reasons noted above, skin reflectance is not. The weighted semi-partial
correlation between Amerindian genomic ancestry and skin reflectance
controlling for European genomic ancestry (and thus relative to African ancestry)
is .70. Figure 14 shows the admixture plot.
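
A weighted semi-partial correlation of the kind reported above can be sketched from three weighted Pearson correlations. This is an illustration under assumptions: the text does not say which variable was residualized or which weights were used, so the sketch removes the control variable `z` from the predictor `x` only and accepts arbitrary unit weights `w`. All names and numbers are hypothetical.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation between x and y with unit weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(vx * vy)


def semipartial_corr(x, y, z, w):
    """Correlation of y with the part of x that is linearly independent of z."""
    r_xy = weighted_corr(x, y, w)
    r_xz = weighted_corr(x, z, w)
    r_yz = weighted_corr(y, z, w)
    return (r_xy - r_xz * r_yz) / sqrt(1.0 - r_xz ** 2)


# Toy data only: x could stand for Amerindian%, y for skin reflectance,
# z for European% and w for population weights.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
z = [1.0, -1.0, -1.0, 1.0]
w = [1.0, 1.0, 1.0, 1.0]
sr = semipartial_corr(x, y, z, w)
```

When the control variable is uncorrelated with the predictor, as in this toy data, the semi-partial correlation reduces to the ordinary correlation between `x` and `y`.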
Table 9. Validation correlations for Amerindian ancestry. Weighted below the
diagonal. N's=25-33.

| | Genomic Amer% | CIA Amer% | Putterman Amer% | Skin reflectance |
|---|---|---|---|---|
| Genomic Amer% | | 0.87 | 0.90 | 0.48 |
| CIA Amer% | 0.88 | | 0.94 | 0.49 |
| Putterman Amer% | 0.91 | 0.96 | | 0.35 |
| Skin reflectance | 0.11 | 0.07 | 0.09 | |

Figure 14. Ternary plot for admixture in sovereign states.
7.1.2. Cognitive ability
For cognitive ability scores, we used Gerhard Meisenberg's (2014/2015)
achievement estimates (in preparation). These were based on international test
scores, regional test scores and national GMAT/GRE scores. These scores came
from tests given between 1997 and 2013. As to these, Meisenberg (personal
communication, 2015) noted that:
[The strategy] was to form the averages of each TIMSS and PISA
assessment first. PISA had to be adjusted for the changing standards in different
assessments (published scores are based on performance in participating OECD
countries, but different OECD countries participated in different years). After
adjusting TIMSS and PISA to a common metric (500/50 for countries participating
in both TIMSS and PISA), the weighted average was formed. Then minor
adjustments were made for % not in school. Then gaps were filled with results
from several regional and older assessments (SACMEQ in Africa, SERCE in Latin
America etc). Finally the last remaining gaps were filled with data from Graduate
Management Admission Test, Graduate Record Exam and International
Mathematics Olympiad.
These scores were transformed into achievement quotient scores relative to
a UK mean of 99 following Lynn and Vanhanen’s (2012) equalization of means
and standard deviations method.
7.1.3. Socioeconomic outcomes
S scores were available for 142 countries from Kirkegaard (2014b), but for
only 25 of the 35 sovereign American nations. These scores were based on 2012
and 2013 data. For each country there were also 2010 HDI data. As the country-level
correlation between HDI2010 and the S factor scores was .96 (weighted and
unweighted), we felt justified in using HDI scores as proxies when S scores were
missing. We scaled the country S factor scores to the HDI2010 metric using the
following formula:
S on HDI2010 metric = S score * sd(HDI2010 in all country sample) + mean(HDI2010 in American sample).
For example, the formula for Canada was:
S score Canada = 1.434 * 0.146 + 0.725 = 0.934
The HDI2010 score for Canada is 0.896, so the socioeconomic score for that
country is increased somewhat by the use of the rescaled S factor score. For
countries with no S factor scores (N=10), we filled in data with the countries’
HDI2010 values. We refer to this variable simply as “S” in what follows.
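
The rescaling and fill-in rule can be written out as a short sketch. The function names are hypothetical; the default constants 0.146 and 0.725 are simply the standard deviation and mean that appear in the Canadian example above.

```python
def s_on_hdi_metric(s_score, hdi_sd_all, hdi_mean_americas):
    """Rescale a standardized S score to the HDI2010 metric."""
    return s_score * hdi_sd_all + hdi_mean_americas


def composite_s(s_score, hdi_2010, hdi_sd_all=0.146, hdi_mean_americas=0.725):
    """Return the rescaled S score, or the HDI2010 value when S is missing."""
    if s_score is None:
        return hdi_2010
    return s_on_hdi_metric(s_score, hdi_sd_all, hdi_mean_americas)


# Canada: S score 1.434 and HDI2010 0.896, as given in the text.
canada = composite_s(1.434, 0.896)
```

Rounded to three decimals, `canada` matches the 0.934 worked out in the text; a country without an S score simply keeps its HDI2010 value.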
7.2. Zero-order correlations and scatter plots
Table 10 shows the zero-order correlations between cognitive ability, S and
HDI2010 scores, as well as the three ancestry variables. The results are
consistent with those from the previous four analyses. Figures 15 and 16 show
the scatter plots for cognitive ability and S scores.
Table 10. Zero-order correlations for sovereign nations. Weighted correlations
below the diagonal. N=35. CA, cognitive ability; S, socioeconomic (S) factor; HDI,
Human Development Index.

| | CA | S | HDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.69 | 0.64 | -0.60 | 0.06 | 0.74 |
| S | 0.87 | | 0.92 | -0.22 | -0.21 | 0.48 |
| HDI | 0.89 | 0.96 | | -0.15 | -0.27 | 0.44 |
| African% | -0.48 | -0.34 | -0.36 | | -0.66 | -0.72 |
| Amerindian% | -0.35 | -0.41 | -0.40 | -0.44 | | -0.05 |
| European% | 0.77 | 0.70 | 0.71 | -0.46 | -0.59 | |

Figure 15. Scatter plot of cognitive ability scores and European ancestry.
Figure 16. Scatter plot of socioeconomic (S factor) score and European
ancestry.
Regarding cognitive ability, there are no major outliers – not even the smaller
island states. Regarding S, we see a substantial difference between the results
depending on whether or not we use weighted correlations. As noted previously, this is
due to a number of smaller island states doing fairly well despite having very low
levels of European ancestry. We will return to the question of how to explain this
pattern in Section 17.
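
The gap between weighted and unweighted results can be illustrated with a toy example. The sketch below assumes weights proportional to population, which is only one of several possible weighting schemes, and all numbers are made up: three large units lie on a rising line, while two very small units combine low ancestry percentages with high outcome scores.

```python
from math import sqrt


def weighted_corr(x, y, w):
    """Weighted Pearson correlation; equal weights give the ordinary r."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(vx * vy)


# Hypothetical units: three large ones, then two small island-like ones.
ancestry = [20.0, 50.0, 80.0, 5.0, 10.0]
outcome = [0.6, 0.7, 0.8, 0.8, 0.8]
population = [100.0, 100.0, 100.0, 1.0, 1.0]

r_unweighted = weighted_corr(ancestry, outcome, [1.0] * len(ancestry))
r_weighted = weighted_corr(ancestry, outcome, population)
```

In this toy data the two small units pull the unweighted correlation toward zero, while the weighted correlation stays close to the pattern among the large units.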
8. Sovereign nations and regional racial admixture together
We may wonder if the intranational associations more or less fall on the
international regression line and if plotting all the data together greatly affects the
overall association. If a racial model is correct, the result should hold when
analyzed together. Statistically this is not necessary; it is possible to obtain a
negative correlation when combining several datasets each of which contains
data with a positive correlation. This situation is visualized in Figure 17.
Figure 17. Simpson's paradox for continuous data.
From http://emilkirkegaard.dk/understanding_statistics/?app=Simpson_paradox
The general phenomenon is called Simpson's paradox. This situation may
seem strange, but it has in fact happened in important cases in medicine and
higher education (Kievit et al., 2013). When it happens, we have evidence that
the relationship between x and y is either not causal or at least is not simple. With
respect to the present analyses, Simpson's paradox would suggest that the
ancestry-outcome relations do not scale up in a simple fashion.
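
A small numerical illustration of Simpson's paradox for continuous data follows. The two groups and their values are invented for the purpose: within each group x and y rise together, yet the pooled correlation is negative because the group with higher x values has much lower y values.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary (unweighted) Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(vx * vy)


# Two made-up datasets, each with a perfectly positive relation.
group_a_x, group_a_y = [1.0, 2.0, 3.0], [10.0, 11.0, 12.0]
group_b_x, group_b_y = [6.0, 7.0, 8.0], [1.0, 2.0, 3.0]

r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)
```

Checking the pooled correlation against the within-group correlations in this way is the kind of comparison the combined analysis in this section performs.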
8.1.1. Admixture estimates
The admixture estimates from the above analyses were used without
modifications. Figure 18 shows the admixture plot.
8.1.2. Cognitive ability
The state-level ACHQ deviations, which were already scaled on the international
metric, were added to the sovereign nation NACHQs, which were computed as
discussed previously.
Figure 18. Ternary plot for admixture in countries and states/districts.
8.1.3. Socioeconomic outcomes
When extracting an S factor, the scores will be standardized in the dataset
(i.e., with a mean of 0 and SD of 1). This however means that one cannot directly
combine data when dealing with countries with different mean levels of the
construct. To overcome this problem, we rescaled the state/district level S scores
using the within country HDI score standard deviations and the countries' mean
S scores (Kirkegaard, 2014b).
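
One way to read this rescaling is sketched below. The linear form is an assumption on our part, chosen to mirror the formula in Section 7.1.3: each within-country standardized S score is multiplied by the standard deviation of HDI across that country's states and added to the country's mean S score. The function name, the use of the population standard deviation and the numbers are all hypothetical.

```python
from statistics import mean, pstdev


def rescale_state_scores(state_s, state_hdi, country_s):
    """Place within-country standardized S scores on the international metric.

    state_s: standardized S scores of the states (mean 0, SD 1 within country).
    state_hdi: HDI values of the same states, used only for their spread.
    country_s: the country's mean S score on the international metric.
    """
    spread = pstdev(state_hdi)  # assumed: population SD across states
    return [country_s + z * spread for z in state_s]


# Made-up country with three states.
state_s = [-1.0, 0.0, 1.0]
state_hdi = [0.70, 0.75, 0.80]
rescaled = rescale_state_scores(state_s, state_hdi, country_s=0.76)
```

Under this reading, the rescaled state scores keep the country mean and inherit the between-state spread of HDI, so units from different countries can be plotted on one axis.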
8.2. Zero-order correlations and scatter plots
Table 11 shows the zero-order correlations. Note that for these we have, as
before, excluded capitals. Additionally, we excluded countries which had
intranational units (Mexico, US, Brazil and Colombia), as otherwise we would be
double-counting them.
The results are comparable to those which we found before. European% is
strongly associated with better outcomes, while both Amerindian% and African%
are negatively associated with them. Weighting had little effect on these results.
Figures 19 and 20 show the scatter plots.
Table 11. Zero-order correlations. Weighted correlations below the diagonal. N
= 169-170. CA, cognitive ability; S, socioeconomic (S) factor; HDI, Human
Development Index.

| | CA | S | HDI | African% | Amerindian% | European% |
|---|---|---|---|---|---|---|
| CA | | 0.89 | 0.76 | -0.50 | -0.43 | 0.82 |
| S | 0.91 | | 0.84 | -0.24 | -0.63 | 0.79 |
| HDI | 0.85 | 0.90 | | -0.27 | -0.46 | 0.66 |
| African% | -0.40 | -0.25 | -0.28 | | -0.37 | -0.47 |
| Amerindian% | -0.51 | -0.63 | -0.49 | -0.35 | | -0.65 |
| European% | 0.78 | 0.80 | 0.70 | -0.30 | -0.78 | |

Figure 19. Scatter plot of European ancestry and cognitive ability scores for
countries and states/districts.
Figure 20. Scatter plot for European ancestry and S factor scores for countries
and states/districts.
In general, we find that results hold when countries and states/districts are
analyzed together. Moreover, the intranational regression lines are not very
divergent from the international one.
9. Taking into account all ancestry: multiple regression
In the previous sections, we simply looked at the correlations between
ancestries and outcomes. However, it is possible that combining two ancestries
in multiple regression would improve the predictive power. In this section we
present standardized betas for all three components. Since the three ancestry
values add up to 1, it is not possible to insert all three at once into a regression
model (perfect multicollinearity). As such, the betas for two at a time are
presented. We have retained models with one predictor for comparison. We also
report adjusted R as a measure of model fit, a metric akin to correlation that is
calculated as the square root of adjusted R². We caution that standardized betas,
especially when weighted in multiple regression models, are not as easy to
interpret as correlations. We will see examples of this below. For these analyses,
the federal districts are excluded.
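
Both points in this paragraph can be made concrete with a short sketch. The function `adjusted_r` uses the standard adjusted R² formula for n units and p predictors and returns NaN when adjusted R² is negative, which is our convention rather than one stated in the text. The ancestry fractions are made up.

```python
from math import sqrt


def adjusted_r(r_squared, n, p):
    """Square root of adjusted R^2 for n units and p predictors.

    Returns NaN when adjusted R^2 is negative (no real square root).
    """
    adj_r2 = 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)
    return sqrt(adj_r2) if adj_r2 >= 0 else float("nan")


# Made-up ancestry fractions (African, Amerindian, European) that sum to 1.
rows = [(0.10, 0.30, 0.60), (0.05, 0.55, 0.40), (0.20, 0.10, 0.70)]

# With an intercept column of ones, intercept - African - Amerindian - European
# is zero in every row, so the four columns are linearly dependent.
dependency = [1.0 - a - m - e for a, m, e in rows]

example = adjusted_r(0.5, n=32, p=1)
```

Because the dependency is exact, a model with an intercept and all three ancestry components has no unique solution, which is why the tables report at most two components per model.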
9.1. Mexico
Tables 12 and 13 show the beta matrix, generated using the method
presented in Kirkegaard (2015c), for the regression results.
Table 12. Multiple regression results for cognitive ability in Mexico. Each row
represents one model.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| 0.42 | | | 0.38 |
| | -0.61 | | 0.57 |
| | | 0.54 | 0.50 |
| 0.44 | -0.62 | | 0.71 |
| 0.53 | | 0.64 | 0.71 |
| | -3.51 | -2.95 | 0.71 |

Table 13. Multiple regression results for S factor scores in Mexico.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| 0.24 | | | 0.16 |
| | -0.75 | | 0.70 |
| | | 0.70 | 0.65 |
| 0.26 | -0.76 | | 0.74 |
| 0.38 | | 0.77 | 0.74 |
| | -2.50 | -1.78 | 0.74 |

For cognitive ability, the results show that we can improve our prediction
slightly by adding a second ancestry variable: from .57 (Amerindian) to .71.
However, for the S factor scores, there seems to be little gain from using more
than one predictor, as the gain is a mere .04. It is notable that African ancestry is
positively associated with outcomes, meaning that the strong positive
associations between European ancestry and outcomes are driven solely by the
strong negative ones between Amerindian ancestry and outcomes. As noted in
Section 3, not much can be concluded from these results as the African ancestry
estimates were unreliable and as the range and variance in African ancestry was
small.
The results in the last row of both tables seem strange: how can all predictors
be negative, and why is European ancestry negative when it is positive in the
other models? This is something that can happen when there is very little variation
in the predictor not included in the model (in this case African ancestry); refer
back to Figure 2. Note that one can compare the relative value of the betas across
the three 1-predictor models and within the three 2-predictor models. From this,
one can note their relative order. With regards to positive associations between
outcomes and ancestry, the order is consistently: European > African >
Amerindian. Not much should be made out of the finding that African ancestry is
associated with better outcomes than is Amerindian because of the unreliability
of the African ancestry estimates.
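The sign pattern in the last row of Tables 12 and 13 can be illustrated with a short derivation, under the simplifying assumption of an unweighted linear model in the three ancestry shares A (African), M (Amerindian) and E (European), with A + M + E = 1. The symbols α_A, α_M and α_E are introduced here for illustration only.

```latex
\begin{align}
y &= \alpha_A A + \alpha_M M + \alpha_E E + \varepsilon, \qquad A = 1 - M - E \\
  &= \alpha_A + (\alpha_M - \alpha_A)\,M + (\alpha_E - \alpha_A)\,E + \varepsilon
\end{align}
```

Under this assumption, the two slopes in the Amerindian + European model are contrasts against the omitted African component. Both can be negative when the fitted α_A exceeds α_M and α_E, and because M and E are nearly collinear when A varies little (M ≈ 1 − E), the estimates can become large in magnitude. Standardizing rescales each slope by a positive factor and so does not change its sign.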
9.2. The US
Now we repeat the same procedure as before for the US. Tables 14 and 15 show
the beta matrices.
Table 14. Multiple regression results for cognitive ability in the US.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.42 | | | 0.38 |
| | -0.37 | | 0.37 |
| | | 0.65 | 0.63 |
| -0.57 | -0.50 | | 0.64 |
| 0.32 | | 0.91 | 0.64 |
| | -0.18 | 0.58 | 0.64 |
Table 15. Multiple regression results for S factor scores in the US.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.38 | | | 0.34 |
| | -0.06 | | |
| | | 0.41 | 0.37 |
| -0.43 | -0.16 | | 0.35 |
| -0.15 | | 0.29 | 0.35 |
| | 0.08 | 0.44 | 0.35 |
The adj. R value is missing in the second row of Table 15 because the
adjusted R² is negative, and one cannot take the square root of a negative number
(without using imaginary numbers). Both with regards to cognitive ability and S
scores, there seems to be no gain from using more than just European ancestry
as a predictor. With regards to positive associations between outcomes and
ancestry, the order is consistently: European > Amerindian > African.
9.3. Brazil
We repeat the same procedure for Brazilian states. Tables 16 and 17 show
the beta matrices. As above, the results show that using multiple ancestry
variables provides no incremental predictive power. Using European ancestry
alone is sufficient. As in the Mexican analysis, the order in terms of positive
associations is consistently European > African > Amerindian.
Table 16. Multiple regression results for cognitive ability in Brazil.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.16 | | | |
| | -0.70 | | 0.63 |
| | | 0.74 | 0.72 |
| -0.37 | -0.83 | | 0.72 |
| 0.17 | | 0.82 | 0.72 |
| | -0.27 | 0.56 | 0.72 |
Table 17. Multiple regression results for S factor scores in Brazil.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.19 | | | |
| | -0.76 | | 0.65 |
| | | 0.82 | 0.75 |
| -0.42 | -0.90 | | 0.76 |
| 0.17 | | 0.90 | 0.76 |
| | -0.27 | 0.63 | 0.76 |
9.4. Colombia
Again, we repeat the same procedure for Colombian states. Tables 18 and
19 show the beta matrices. As in the other analyses, using two ancestry predictors
does not seem to give additional predictive power. It is notable that the order is
not consistent: European > Amerindian > African is the norm, but one model has
African > Amerindian, though only by .08 (Model 4, S regression).
Table 18. Multiple regression results for cognitive ability in Colombia.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.74 | | | 0.69 |
| | 0.36 | | 0.19 |
| | | 0.88 | 0.80 |
| -1.28 | -0.93 | | 0.80 |
| 0.12 | | 0.98 | 0.80 |
| | -0.10 | 0.91 | 0.80 |
Table 19. Multiple regression results for S factor scores in Colombia.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.35 | | | 0.30 |
| | -0.07 | | |
| | | 0.55 | 0.51 |
| -0.93 | -1.01 | | 0.56 |
| 0.64 | | 1.14 | 0.57 |
| | -0.41 | 0.67 | 0.56 |
9.5. Sovereign states
We repeat the same procedure for sovereign states. Tables 20 and 21 show
the beta matrices. Again, we see that using European ancestry alone is as good
as using two predictors. Also, the order is consistently European > Amerindian >
African.
Table 20. Multiple regression results for cognitive ability in sovereign nations.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.73 | | | 0.45 |
| | -0.34 | | 0.31 |
| | | 0.79 | 0.76 |
| -1.19 | -0.67 | | 0.77 |
| -0.23 | | 0.72 | 0.77 |
| | 0.16 | 0.90 | 0.77 |
Table 21. Multiple regression results for S factor scores in sovereign nations.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.67 | | | 0.30 |
| | -0.52 | | 0.37 |
| | | 0.94 | 0.69 |
| -1.27 | -0.87 | | 0.68 |
| -0.03 | | 0.93 | 0.68 |
| | 0.02 | 0.95 | 0.68 |
9.6. Sovereign nations and regions
Tables 22 and 23 show the beta matrices for the combined samples of
states/districts and sovereign countries. This final analysis shows no surprises.
Using European ancestry alone is sufficient for predicting the outcomes. Here,
though, we see that the order, in terms of positive associations, is not consistent.
In Table 22, we see that it is European > Amerindian > African, but in Table 23, it
is either European > African > Amerindian or European > Amerindian > African.
However, the differences between the actual betas are small.
Table 22. Multiple regression results for cognitive ability for countries and states/districts.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.48 | | | 0.39 |
| | -0.47 | | 0.51 |
| | | 0.77 | 0.78 |
| -0.81 | -0.68 | | 0.80 |
| -0.22 | | 0.72 | 0.80 |
| | 0.25 | 0.98 | 0.80 |
Table 23. Multiple regression results for S factor scores for countries and states/districts.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.33 | | | 0.24 |
| | -0.63 | | 0.62 |
| | | 0.87 | 0.80 |
| -0.72 | -0.82 | | 0.80 |
| -0.01 | | 0.86 | 0.80 |
| | 0.01 | 0.88 | 0.80 |
10. Adding non-admixture predictors: theory-driven approach
We now come to the issue of modeling outcomes using a combination of
admixture variables and non-admixture ones. There are a very large number of
ways to do this. Perhaps the most commonly used approach is the theory-driven
one: researchers select variables for the models based on prior beliefs they have
about causal relationships. They may either add them all into the model at once,
or in a stepwise fashion showing that some particular variable retains predictive
power while controlling for other variables. There are some problems with this
approach. First, researchers commonly select predictors that are causally related
to each other and try to have the model treat them as independent variables.
Second, researchers only try some possible models and may also not report all
the models they tried. This makes it possible for them to bias the results, perhaps
inadvertently (Kirkegaard, 2015c; Zigerell, 2015).
Due to these problems, we include another approach: automatic modeling.
This approach is not without its own problems. For one, there are a number of
methods to pick from: forward selection, backward, best subset, Bayes Factor
selection (Etz, 2015), ridge/lasso regression and more. For another, there is no
consensus as to which model fit criterion one should use: AIC, BIC, adjusted R²
and so on. Applying all of these approaches to every modeling question would
result in hundreds of tables which could not be presented succinctly. Our solution
to the aforementioned problems is to use both general approaches. We will fit
some theory driven models and report some automatic modeling results, leaving
the curious reader to explore the data for himself. In this section, we report some
theory-driven results; in the next, we will report automatic modeling results.
10.1. What sort of predictors are we looking for?
A large set of potential predictors for cognitive and socioeconomic outcomes
exist. Due to the nature of the S factor, a great number of these would actually be
part of the S factor and so would have a part-whole relationship with one of our
dependent variables. Given this, we will focus on what might be called geographic
variables: those related to geospatially specific climatic and ecological factors that
are under minimal human control. We will apply these theory-driven analyses
only to sovereign nations. Specifically, we will look at the independent effects of
cold weather (“cold demand”) and parasite load on outcomes. Previous research
has centered on these geographical variables in predicting cognitive ability scores
and has shown that these variables are robustly associated with national-level
outcomes (Kanazawa, 2008; Lynn, 2006; Templer & Arikawa, 2006). We will also
include a measure of institutional effects (“Anglo”) in the form of historic British
versus Iberian colonial rule (where British rule is operationalized by English as an
official language and Iberian rule is indexed by Spanish or Portuguese as an
official language).
Proponents of genetic models have interpreted such associations from an
evolutionary perspective. The association of cognitive ability with both latitude
and temperature, for example, has been interpreted in line with cold winters
theory (e.g., Hart, 2007; Kanazawa, 2012; Lynn, 2006; Templer & Rushton,
2011). According to this theory, as humans spread across the globe, some
populations ended up in cold climate regions in which survival was more difficult
than in warm climates. This led to increased selection for cognitive ability, which
accounts for some of the correlation between national ability, climate and latitude.
Similar models have been proposed to explain the association between parasite
load and outcomes (e.g., Woodley et al., 2014). In this case, increased parasite
load is said to be associated with an investment in immune defenses at the
expense of cognitive development. Over evolutionary time, this situation is said
to have depressed selection for cognitive ability in high parasite load regions. In
line with this model, Fedderke et al. (2014) found that ACP1 alleles associated
with immunological function predicted national outcomes independent of
contemporaneous disease/parasite burden. Regarding institutions, a genetic
explanation would point to intra-European genetic differences. One recent paper
showed that regional outcomes across the US are associated with European
regional ancestry (Fulford, Petkov & Schiantarelli, 2015). The same logic could
apply across countries.
It is not clear to what extent geographic variables index evolutionary effects
and to what extent they index the effects of contemporaneous environmental
factors. It might be thought that our analysis could disentangle causality, as two
of our geographic lineages (Africans and Europeans) hail from outside of the
Americas and, as such, could not be evolutionarily adapted to American
environments. Unfortunately, this is only partially the case, as our ancestry
variables are correlated with our geographic ones. This is the result of various
historical contingencies. For example, owing to their genetic adaptation to
parasite-ridden environments, West Africans were disproportionately imported
into American regions with high parasite and disease loads. Thus African%
correlates with parasite load, which in turn correlates with latitude and warm
weather. To some extent this tangled causality can be demonstrated. If we
assume that the association between European ancestry and skin reflectance
owes predominately to genes, then the degree to which geographic variables
mediate the ancestry-skin reflectance association indicates the degree to which
they covary with genetic effects. Regression results are shown in Table 24. As
can be seen in Model 2, parasite load and cold demand explain some of the
ancestry-reflectance association. Our measure of institutions also explains some
of the association.
To note, the cold demand variable comes from Van de Vliert (2013), who
argued that climate is a key determinant of outcomes. Canada was a major
outlier, so the Canadian value was reduced to 3 SD above the pan-American
average, in line with the recommendation of Field (2013). The parasite load
values come from the World Health Organization's (2004) The Global Burden of
Disease (Mathers, Fat & Boerma, 2008). To create the values, we used the
average of the parasite disease rates: numbers 8 (malaria) to 14 (intestinal
nematode infections) in Table 6 of “Age-standardized DALYs per 100,000 by
cause, and member state” (DALY = disability-adjusted life years). The
Anglophone variable was dichotomously coded 1 where English is an official
language and 0 otherwise. For all analyses in this section, N = 35.
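The outlier treatment applied to the Canadian cold demand value can be sketched as follows. This is a minimal illustration, assuming that the mean and standard deviation are computed over all units including the outlier and that the sample standard deviation is used; neither detail is specified above. The function name and the example values are hypothetical.

```python
import statistics


def cap_at_sd(values, n_sd=3.0):
    """Reduce any value above mean + n_sd * SD to that upper limit.

    Assumption: mean and SD are taken over all values, outlier included.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    upper = mean + n_sd * sd
    return [min(v, upper) for v in values]


# Hypothetical cold demand scores for 35 units with one extreme value.
scores = [10.0] * 34 + [1000.0]
capped = cap_at_sd(scores)
print(max(scores), max(capped))
```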
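Table 24. Regression results for skin reflectance. Standardized betas are presented.

| Model # | Euro% | Parasites | Cold demand | Anglo | adj. R² |
|---|---|---|---|---|---|
| 1 | 0.475 | | | | 0.410 |
| 2 | | -0.212 | | | 0.061 |
| 3 | | | 0.377 | | 0.353 |
| 4 | | | | 0.060 | -0.031 |
| 5 | 0.465 | -0.023 | | | 0.391 |
| 6 | 0.330 | | 0.219 | | 0.481 |
| 7 | 0.530 | | | -0.344 | 0.434 |
| 8 | | 0.116 | 0.443 | | 0.348 |
| 9 | | -0.335 | | -0.426 | 0.075 |
| 10 | | | 0.540 | -0.731 | 0.485 |
| 11 | 0.341 | 0.146 | 0.296 | | 0.491 |
| 12 | 0.490 | -0.178 | | -0.572 | 0.453 |
| 13 | 0.344 | | 0.382 | -0.762 | 0.636 |
| 14 | | -0.049 | 0.523 | -0.776 | 0.470 |
| 15 | 0.342 | -0.020 | 0.376 | -0.780 | 0.623 |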
10.2. Parasites, cold weather and Anglo institutions
In this analysis (Table 25), we look at the independent effect of European
genomic ancestry on cognitive ability scores. Model 1 includes just European
ancestry, while Model 15 includes all of the predictors. It can be seen that
European ancestry remains a robust predictor (Model 15). Parasite load is the
most significant other predictor. Table 26 shows the same models predicting S
factor scores. Again, European origin remains a robust predictor.
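Table 25. Regression results for cognitive ability scores. Standardized betas are presented.

| Model # | Euro% | Parasites | Cold demand | Anglo | adj. R² |
|---|---|---|---|---|---|
| 1 | 0.794 | | | | 0.581 |
| 2 | | -0.759 | | | 0.554 |
| 3 | | | 0.676 | | 0.576 |
| 4 | | | | 1.246 | 0.285 |
| 5 | 0.574 | -0.532 | | | 0.815 |
| 6 | 0.509 | | 0.429 | | 0.738 |
| 7 | 0.680 | | | 0.756 | 0.675 |
| 8 | | -0.448 | 0.426 | | 0.687 |
| 9 | | -0.686 | | 0.251 | 0.548 |
| 10 | | | 0.586 | 0.413 | 0.588 |
| 11 | 0.483 | -0.415 | 0.210 | | 0.839 |
| 12 | 0.570 | -0.502 | | 0.107 | 0.811 |
| 13 | 0.505 | | 0.345 | 0.391 | 0.753 |
| 14 | | -0.451 | 0.427 | -0.015 | 0.677 |
| 15 | 0.483 | -0.415 | 0.210 | -0.002 | 0.834 |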
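Table 26. Regression results for S scores. Standardized betas are presented.

| Model # | Euro% | Parasites | Cold demand | Anglo | adj. R² |
|---|---|---|---|---|---|
| 1 | 0.939 | | | | 0.477 |
| 2 | | -1.035 | | | 0.614 |
| 3 | | | 0.885 | | 0.586 |
| 4 | | | | 2.115 | 0.509 |
| 5 | 0.611 | -0.793 | | | 0.787 |
| 6 | 0.518 | | 0.633 | | 0.681 |
| 7 | 0.695 | | | 1.614 | 0.749 |
| 8 | | -0.655 | 0.519 | | 0.731 |
| 9 | | -0.726 | | 1.063 | 0.683 |
| 10 | | | 0.615 | 1.241 | 0.706 |
| 11 | 0.479 | -0.622 | 0.304 | | 0.817 |
| 12 | 0.577 | -0.540 | | 0.917 | 0.842 |
| 13 | 0.507 | | 0.373 | 1.219 | 0.803 |
| 14 | | -0.481 | 0.445 | 0.785 | 0.765 |
| 15 | 0.483 | -0.444 | 0.228 | 0.799 | 0.857 |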
10.3. Parasites, HIV and cognitive ability
Parasite load is a particularly problematic “geographic environmental” factor
because it significantly correlates with STD and HIV rates (at 0.47 for our 35
nations). Yet the spread of HIV throughout the Americas, in the 1970s and 1980s,
was subsequent to the origin of cognitive ability differences. Thus we can infer
that, if anything, HIV rate differences are a consequence of cognitive ability
differences. The correlation between HIV rates and parasite load suggests that
this may also be the case for some of the parasite load differences. To put the
point in simpler terms, countries with smarter populations might just do a better
job of controlling parasites. We can attempt to control for this reverse causation
if we allow for some assumptions. If we grant that cognitive ability protects against
HIV and parasite load to the same extent, we can regress out the effect of HIV
from parasite load to gain an estimate of parasite load without the causal influence
of cognitive ability on it. Finally, we can enter this corrected parasite load measure
into our regression model above. Table 27 shows the results.
Table 27. Parasite regression models for cognitive ability. Parasites cor is parasite load, corrected for reverse effects of cognitive ability on parasite load.

| Model # | Euro% | Parasites | Parasites cor | adj. R² |
|---|---|---|---|---|
| 1 | 0.794 | | | 0.581 |
| 2 | | -0.759 | | 0.554 |
| 3 | | | -0.530 | 0.211 |
| 4 | 0.574 | -0.532 | | 0.815 |
| 5 | 0.745 | | -0.435 | 0.734 |
| 6 | | -1.317 | 0.702 | 0.653 |
| 7 | 0.520 | -0.720 | 0.210 | 0.817 |
Alone, European ancestry and the original parasite load measure are about
equally good predictors (Models 1-2). The corrected parasite measure is still a
good predictor, but somewhat weaker than the original version (Model 2 vs. 3).
The predictive ability of European ancestry and the original parasite measure
overlap to some extent, as their combined adj. R² is less than the sum of
their individual adj. R² values (Model 4 vs. Models 1 + 2; .815 vs. 1.135). Interestingly, there is little overlap
between European ancestry and the corrected parasite measure, as their combined adj. R²
is nearly equal to the sum of the parts (Model 5 vs. Models 1 + 3; .734 vs. .792). The results are consistent
with a model in which European ancestry has an effect on parasite load by
way of cognitive ability and in which parasite load has a direct effect on cognitive
ability. Of course, proponents of a strict parasite model would argue that parasite
prevalence is antecedent to differences in cognitive ability, which are, in turn,
antecedent to differences in HIV rates. We cannot rule out this possibility; we can
merely show that causality is tangled.
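The correction step used in this section, regressing HIV rate out of parasite load and keeping the residual, can be sketched as below. This is a minimal unweighted illustration using simple least squares with an intercept; the function name and the example values are hypothetical, and the weighting used in our analyses is not reproduced here.

```python
def residualize(y, x):
    """Residuals of y after a simple OLS regression of y on x (with intercept)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical values: parasite load (y) and HIV rate (x) for five units.
parasite_load = [1.0, 3.0, 2.0, 5.0, 4.0]
hiv_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
parasites_cor = residualize(parasite_load, hiv_rate)
print(parasites_cor)
```

By construction, the residual is uncorrelated with the HIV rate, so any remaining association with cognitive ability cannot run through the part of parasite load that covaries with HIV.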
10.4. Genetic distance controlling for European ancestry
Recent research has shown that measures of genetic distance from Africa
correlate with cognitive and socioeconomic outcomes at the global level (León &
Burga-León, 2015). We might wonder if such an index better explains outcomes
than our racial ancestry categories. Table 28 shows the results for cognitive
ability, using semi-partial correlations to control for the effect of European
ancestry on genetic distance. For comparison, we also did the same for cold
demand and parasite load.
Table 28. Weighted zero-order correlations and semi-partial correlations (European ancestry controlled) of non-ancestry variables with cognitive ability. Genetic distance is the extent of genetic differences from South Africans. N=35 countries. Correlation of European ancestry with cognitive ability is .77.

| Secondary variable | Orig. cor | Semi-partial cor |
|---|---|---|
| Cold | 0.767 | 0.517 |
| Parasite | -0.753 | -0.692 |
| Genetic distance | 0.322 | 0.113 |
We see that genetic distance from South Africans is a much worse predictor
than European ancestry, and that once European ancestry is taken into account,
the correlation between cognitive ability scores and genetic distance is strongly
reduced. In contrast, our geographic environmental variables do show sizable
correlations after European ancestry has been taken into account. This could
mean that they truly have independent effects. Table 29 shows the same as the
above, but for S. The results for S mirror those for cognitive ability.
Table 29. Weighted zero-order correlations and semi-partial correlations (European ancestry controlled) of non-ancestry variables with the socioeconomic factor (S). Genetic distance is the extent of genetic differences from South Africans. N=35. Correlation of European ancestry with S is .70.

| Secondary variable | Orig. cor | Semi-partial cor |
|---|---|---|
| Cold | 0.774 | 0.526 |
| Parasite | -0.791 | -0.711 |
| Genetic distance | 0.240 | 0.017 |
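The semi-partial correlations in Tables 28 and 29 remove European ancestry from the secondary variable only, not from the outcome. A minimal unweighted sketch, assuming the standard three-correlation formula, is given below; the function names are illustrative, and the weighting used for the reported values is omitted.

```python
import math


def pearson(a, b):
    """Unweighted Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def semipartial(y, x, z):
    """Correlation of y with the part of x that is independent of z.

    y: outcome, x: secondary variable, z: control (here, European ancestry).
    """
    r_yx = pearson(y, x)
    r_yz = pearson(y, z)
    r_xz = pearson(x, z)
    return (r_yx - r_yz * r_xz) / math.sqrt(1.0 - r_xz ** 2)
```

When the secondary variable is uncorrelated with the control, the semi-partial correlation equals the zero-order correlation; the more the two overlap, the more the value can shrink, as it does for genetic distance above.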
10.5. Path diagram
In Figure 21 below, we depict a weighted path analysis for the sovereign
national analysis. This is our proposed model. Since we lacked cross-temporal
data, we were unable to test causal pathways. Standardized path coefficients are
shown. For parasite load, we use our corrected value from Section 10.3. In the
model, European Ancestry has a strong direct effect on cognitive ability
(βEU→CA = .55) and a smaller effect on S (βEU→S = .22). Cognitive ability has
a modest effect on S (βCA→S = .33) and a strong effect on HIV (βCA→HIV = -.66).
Parasite load (corrected) has direct effects on cognitive ability
(βParCor→CA = -.28) and S (βParCor→S = -.15). Likewise, cold demand has
direct effects on cognitive ability (βCold→CA = .34) and S (βCold→S = .15).
Tourist expenditure had a negligible effect on S (βTourist→S = .05). Anglo had
no effect on cognitive ability (βAnglo→CA = -.02) but a modest one on S
(βAnglo→S = .31). To keep the figure readable, we excluded residuals. More
detailed results are shown in Appendix B. Regarding the relative effects of the
geographic and ancestry variables, it is worth keeping in mind that the former are
measured much more precisely than the latter. In this situation, regression and
path models will assign independent effects to the more precisely measured
variables even if they have no causal effects, because these geographic variables
capture some of the variance that is not captured by the ancestry variable owing
to measurement error.
Figure 21. Path diagram for European Ancestry. N=35. Fit measures: CFI .93,
GFI .935, SRMR .077, indicating acceptable fit.
11. Adding non-admixture predictors: automatic approach
We now turn to automatic selection methods. They are called selection
methods because they select which variables to include in the model. Since many
readers are probably unfamiliar with these methods, we will describe them first
(James et al., 2013). The simplest idea is best subset selection. Here we simply
fit every possible model and then assess them by some model fit criterion. Adjusted
R² is a common choice, but alternatives include AIC, BIC and many others.
The reason for using the adjusted version is that R² never decreases when adding
variables, even if they have no real predictive power aside from that which
happens by chance from sampling error (overfitting). The adjusted version
penalizes models by the number of variables they include to avoid overfitting. AIC
and BIC also include a penalty for the number of predictors (James et al., 2013).
We use adjusted R² as our primary model fit measure because it is the one readers
are most familiar with and because it has a natural interpretation (percent
variance accounted for).
Results were very similar using the two alternatives. One of the problems
with best subset selection is that it is computationally demanding. This is because
the number of possible models is 2^p, where p is the number of variables. So, for
example, if we have 10 predictor variables, we have to fit 1024 models. Another
problem is that it tends to overfit the models, capitalizing on random patterns in
the dataset. Lasso regression is similar to best subset selection in that it involves
all the predictors initially. It differs in that it assigns a penalty for the sizes of the
betas, which results in the betas estimated being generally smaller. Due to the
way the penalizing works, many predictors are shrunken to exactly zero, which
means that they have been excluded from the model entirely (James et al., 2013).
The shrinkage parameter is found through cross-validation (i.e., through splitting
the dataset into parts and using one part to fit the model and using the other part
to test it). Because the cross-validation procedure is based on resampling
methods, the results are not deterministic and will vary somewhat each time the
algorithm is run. To stabilize the results, we ran the lasso regression 500 times
and calculated summary statistics for the results. For these analyses, the
cognitive ability and S factor variables were the same as those used earlier.
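To illustrate why the lasso penalty can set betas to exactly zero, consider the special case of uncorrelated, standardized predictors, which James et al. (2013) also use for illustration. In that case, each lasso beta is the ordinary least squares beta moved toward zero by a fixed threshold and set to zero if it does not exceed that threshold. The sketch below assumes this special case; the function name is illustrative, and the exact relation between the threshold and the shrinkage parameter depends on how the penalty is scaled.

```python
def soft_threshold(beta_ols, threshold):
    """Lasso estimate for one predictor in the uncorrelated-predictors case.

    beta_ols: least squares beta; threshold: non-negative shrinkage amount.
    """
    if beta_ols > threshold:
        return beta_ols - threshold
    if beta_ols < -threshold:
        return beta_ols + threshold
    return 0.0  # weak predictors are removed from the model entirely


# Hypothetical betas: a strong predictor is shrunk, a weak one is dropped.
print(soft_threshold(0.5, 0.2))
print(soft_threshold(-0.1, 0.2))
```

With correlated predictors, as in our datasets, no such closed form applies and the betas are found iteratively, but the same mechanism produces the exact zeros reported in the tables below.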
11.1. Mexico
11.1.1. Cognitive ability
A recent analysis (Cabeza de Baca & Figueredo, 2014) found that cold
weather (estimated based on latitude, altitude and temperate zone) predicted
regional cognitive ability in Mexico. We obtained the data from this study. We
used best subset selection by testing all possible models and selecting among
them based on the adjusted R² fit measure. It is easy to run all the regressions
using a function developed by one of us (Kirkegaard, 2015c). Standardized betas
and adjusted R² values are shown in Table 30 for all 31 models (5 predictors give
2^5 = 32 possible models, from which we exclude the null model).
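The model count can be checked by enumerating every non-empty subset of the predictors. The sketch below orders the subsets by size and then by predictor order; this numbering is an assumption made for illustration and is not a description of the function in Kirkegaard (2015c), although it agrees with Model 1 containing only European% and Model 6 containing European% and temperature.

```python
from itertools import combinations


def enumerate_models(predictors):
    """All non-empty predictor subsets, ordered by size, then by position."""
    models = []
    for size in range(1, len(predictors) + 1):
        models.extend(combinations(predictors, size))
    return models


predictors = ["Euro%", "Temperature", "Latitude", "Temperate", "Altitude"]
models = enumerate_models(predictors)
print(len(models))  # 31 models, i.e. 2^5 minus the null model
print(models[5])    # sixth model under this ordering: ('Euro%', 'Temperature')
```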
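Table 30. Beta coefficients for models predicting cognitive ability scores of Mexican states.

| Model # | Euro% | Temperature | Latitude | Temperate | Altitude | adj. R² |
|---|---|---|---|---|---|---|
| 1 | 0.544 | | | | | 0.247 |
| 2 | | -0.484 | | | | 0.172 |
| 3 | | | 0.476 | | | 0.190 |
| 4 | | | | -0.018 | | -0.034 |
| 5 | | | | | 0.134 | -0.018 |
| 6 | 0.497 | -0.425 | | | | 0.383 |
| 7 | 0.519 | | 0.028 | | | 0.220 |
| 8 | 0.629 | | | 0.195 | | 0.262 |
| 9 | 0.634 | | | | 0.318 | 0.310 |
| 10 | | -0.416 | 0.416 | | | 0.316 |
| 11 | | -0.731 | | -0.369 | | 0.260 |
| 12 | | -0.909 | | | -0.542 | 0.261 |
| 13 | | | 0.538 | 0.156 | | 0.189 |
| 14 | | | 0.563 | | 0.305 | 0.244 |
| 15 | | | | -0.136 | 0.227 | -0.039 |
| 16 | 0.558 | -0.429 | -0.068 | | | 0.362 |
| 17 | 0.438 | -0.508 | | -0.113 | | 0.369 |
| 18 | 0.446 | -0.539 | | | -0.137 | 0.366 |
| 19 | 0.586 | | 0.048 | 0.196 | | 0.235 |
| 20 | 0.558 | | 0.086 | | 0.322 | 0.286 |
| 21 | 0.648 | | | 0.053 | 0.285 | 0.286 |
| 22 | | -0.549 | 0.326 | -0.176 | | 0.311 |
| 23 | | -0.588 | 0.333 | | -0.202 | 0.301 |
| 24 | | -0.998 | | -0.282 | -0.414 | 0.298 |
| 25 | | | 0.565 | 0.009 | 0.300 | 0.216 |
| 26 | 0.524 | -0.521 | -0.101 | -0.124 | | 0.347 |
| 27 | 0.550 | -0.582 | -0.135 | | -0.180 | 0.345 |
| 28 | 0.385 | -0.626 | | -0.115 | -0.140 | 0.350 |
| 29 | 0.572 | | 0.086 | 0.053 | 0.290 | 0.261 |
| 30 | | -0.746 | 0.230 | -0.186 | -0.223 | 0.297 |
| 31 | 0.513 | -0.696 | -0.177 | -0.133 | -0.197 | 0.331 |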
The best model according to adj. R² is Model 6 (adj. R² = .383). This
is the model with just two predictors: European% and temperature. Generally,
when presenting the results, we will not present the entire beta matrix, as it is too
lengthy. We present it above for illustrative purposes. It is worth noting that the
top 8 models (by adjusted R²) all include European% as a predictor (Models 6, 17,
18, 16, 28, 26, 27 and 31).
For lasso regression, we included all of the predictors and used cross-validation,
as recommended by James et al. (2013), to select the most appropriate
shrinkage parameter value. Table 31 shows the mean beta for each predictor, as
well as how often lasso regression shrank it to exactly 0 (i.e., treated it as not a
useful predictor at all).
Table 31. Lasso regression results for cognitive ability scores and Mexican states. 500 runs.

| Statistic | Euro% | Temperature | Latitude | Temperate | Altitude |
|---|---|---|---|---|---|
| mean | 0.199 | -0.118 | 0 | 0 | 0 |
| median | 0.218 | -0.135 | 0 | 0 | 0 |
| sd | 0.048 | 0.040 | 0 | 0 | 0 |
| fraction zero | 0.028 | 0.030 | 1 | 1 | 1 |
We see that of the geographic environmental predictors, only temperature
was non-redundant. European% was about twice as important a predictor as was
temperature. Both predictors failed to be identified as non-redundant in about 3%
of the runs.
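The summary statistics reported for the repeated lasso runs can be computed from the list of betas a predictor received across runs. The following is a minimal sketch; the function name is illustrative, and the use of the sample standard deviation is an assumption.

```python
import statistics


def summarize_runs(betas):
    """Summary of one predictor's lasso betas across repeated runs."""
    return {
        "mean": statistics.mean(betas),
        "median": statistics.median(betas),
        "sd": statistics.stdev(betas),  # sample SD (n - 1 denominator)
        # Share of runs in which the predictor was shrunk to exactly zero.
        "fraction_zero": sum(1 for b in betas if b == 0) / len(betas),
    }


# Hypothetical betas for one predictor over four runs.
print(summarize_runs([0.0, 0.2, 0.2, 0.4]))
```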
11.1.2. S factor scores
As before, we attempt to predict general socioeconomic performance using
our set of predictors. Table 32 shows the top 5 models.
Table 32. Top 5 models from best subset selection for predicting Mexican socioeconomic (S factor) scores.

| Model # | Euro% | Temperature | Latitude | Temperate | Altitude | adj. R² |
|---|---|---|---|---|---|---|
| 18 | 0.534 | -0.436 | | | -0.434 | 0.442 |
| 28 | 0.469 | -0.528 | | -0.122 | -0.437 | 0.430 |
| 1 | 0.705 | | | | | 0.428 |
| 27 | 0.495 | -0.420 | 0.050 | | -0.418 | 0.421 |
| 7 | 0.507 | | 0.219 | | | 0.418 |
We see that all 5 have European% as a strong positive predictor. In fact, all
top 14 models do. The inclusion of the other predictors was inconsistent across
the top 5 models.
Results for lasso regression predicting S factor scores are shown in Table
33. Lasso regression retains only European%, and does so in only about
90% of the runs.
Table 33. Lasso regression results for prediction of socioeconomic (S factor) scores of Mexican states. 500 runs.

| Statistic | Euro% | Temperature | Latitude | Temperate | Altitude |
|---|---|---|---|---|---|
| mean | 0.123 | 0 | 0 | 0 | 0 |
| median | 0.120 | 0 | 0 | 0 | 0 |
| sd | 0.058 | 0 | 0 | 0 | 0 |
| fraction zero | 0.108 | 1 | 1 | 1 | 1 |
11.2. The US
11.2.1. Data sources
We collected a dataset of climatic and parasite variables from, respectively,
the website Currentresults.com and Thornhill & Fincher (2014, p. 164).
11.2.2. Cognitive ability
Results for best subset selection are shown in Table 34. European% has a
negative beta in one of the models and is absent from the others, which is odd. A
finding of this sort suggests multicollinearity or model misspecification. Lasso
regression results are shown in Table 35.
Table 34. Top 5 models from best subset selection for predicting US cognitive ability scores. Hum. mor. = morning humidity, Hum. after. = afternoon humidity, Para. = parasites.

| Temp | Rain | Hum. mor. | Hum. after. | Sun % | Sun hours | Clear days | Para. | Euro % | adj. R² | Model # |
|---|---|---|---|---|---|---|---|---|---|---|
| -0.302 | 0.320 | 0.239 | -0.450 | | -0.395 | | -0.745 | | 0.634 | 387 |
| -0.328 | 0.251 | 0.299 | -0.471 | 0.685 | -1.067 | | -0.743 | | 0.634 | 467 |
| -0.248 | 0.375 | | -0.347 | | -0.448 | | -0.710 | | 0.627 | 276 |
| -0.367 | 0.260 | 0.329 | -0.507 | 0.781 | -1.183 | | -0.841 | -0.119 | 0.627 | 504 |
| -0.300 | 0.220 | 0.283 | -0.465 | 0.737 | -1.042 | -0.109 | -0.728 | | 0.626 | 502 |
Table 35. Lasso regression results for models predicting cognitive ability scores of states in the US. 500 runs.

| Statistic | Temp | Rain | Hum. mor. | Hum. after. | Sun % | Sun hours | Clear days | Para. | Euro % |
|---|---|---|---|---|---|---|---|---|---|
| mean | -0.047 | 0 | 0 | 0 | 0 | 0 | -0.028 | -0.006 | 0.248 |
| median | -0.055 | 0 | 0 | 0 | 0 | 0 | -0.016 | 0 | 0.257 |
| sd | 0.025 | 0 | 0 | 0 | 0 | 0 | 0.034 | 0.019 | 0.043 |
| fraction zero | 0.110 | 1 | 1 | 1 | 1 | 1 | 0.426 | 0.888 | 0 |
The lasso results contrast strongly with those from best subset selection.
Only two environmental predictors are found to be non-redundant; for one of
them, it is found so in only 57% of the runs. Parasite load, the strongest predictor
in best subset selection, was considered redundant in 89% of the runs, while
European% was redundant in 0 out of 500. The nearly opposite results, with
regards to parasite load and European%, suggest that there is something going
on with these two variables and that further analysis is needed.
11.2.3. The sun factor
To see if we could identify the cause of the problem, we inspected the
predictor intercorrelations. These revealed that some variables had very strong
correlations:
Sun% and Sun hours r=.99.
Clear.days and the Sun variables r=.93 and .92.
Euro% and Para r=-.87.
The first three arguably measure the same construct: the amount of time the
sun is shining, whether this is measured in days, hours, or days without
precipitation. Thus, a new variable was created by factor analyzing the sun
variables (all loadings >.92). This approach is somewhat akin to principal
component regression (James et al., 2013, p. 230). We reran the best subset
selection analysis for cognitive ability scores. Table 36 shows the results.
Table 36. Top 5 models from best subset selection for predicting US cognitive ability scores. Sun factor introduced.

| Temp. | Rain | Hum. mor. | Hum. after. | Sun | Para. | Euro % | adj. R² | Model # |
|---|---|---|---|---|---|---|---|---|
| -0.307 | 0.339 | 0.226 | -0.430 | -0.373 | -0.739 | | 0.630 | 120 |
| -0.247 | 0.388 | | -0.339 | -0.437 | -0.708 | | 0.625 | 105 |
| -0.321 | 0.347 | 0.234 | -0.442 | -0.380 | -0.776 | -0.045 | 0.621 | 127 |
| -0.246 | 0.387 | | -0.339 | -0.436 | -0.704 | 0.005 | 0.616 | 124 |
| | 0.269 | | -0.419 | -0.648 | -0.796 | | 0.615 | 90 |
The results are substantially the same with regards to parasite load and
European%. Table 37 shows the lasso results. The lasso results are also
substantially the same as before. Both the new sun factor and parasite load are
found to be redundant in nearly all runs (89% and 96%), while European% is
found to be non-redundant in almost all runs (499 of 500). This matter requires
further analysis.
Table 37. Lasso regression results for cognitive ability scores and the US. 500 runs. Sun factor introduced.

| Statistic | Temp. | Rain | Hum. mor. | Hum. after. | Sun | Para. | Euro % |
|---|---|---|---|---|---|---|---|
| mean | -0.039 | 0 | 0 | 0 | -0.004 | -0.002 | 0.224 |
| median | -0.040 | 0 | 0 | 0 | 0 | 0 | 0.234 |
| sd | 0.035 | 0 | 0 | 0 | 0.018 | 0.017 | 0.053 |
| fraction zero | 0.246 | 1 | 1 | 1 | 0.884 | 0.964 | 0.002 |
11.2.4. S factor
For the S factor analyses, we used the same sun factor as mentioned before.
Results for best subset selection are shown in Table 38. The results for
European% are stranger yet. Surely, no plausible model of state differences in
socioeconomic well-being posits European ancestry as a strong negative
determinant. Lasso regression results are shown in Table 39.
Table 38. Top 5 models from best subset selection for predicting socioeconomic (S factor) scores for the states of the US.

| Temp | Rain | Hum. mor. | Hum. after. | Sun | Para. | Euro % | adj. R² | Model # |
|---|---|---|---|---|---|---|---|---|
| -0.647 | 0.450 | -0.251 | 0.284 | | -1.032 | -0.739 | 0.662 | 122 |
| -0.708 | 0.481 | -0.231 | 0.330 | 0.107 | -1.008 | -0.724 | 0.656 | 127 |
| -0.696 | 0.386 | | 0.140 | | -1.128 | -0.804 | 0.653 | 107 |
| -0.782 | 0.441 | | 0.228 | 0.161 | -1.079 | -0.773 | 0.651 | 124 |
| -0.751 | 0.488 | | | | -1.173 | -0.840 | 0.647 | 73 |
Table 39. Lasso regression results for socioeconomic (S factor) scores among
the states of the US. 500 runs. Sun factor introduced.
Statistic      Temp.   Rain   Hum. mor.  Hum. after.  Sun  Para.   Euro%
mean           -0.434  0.136  -0.061     0.215        0    -0.612  -0.268
median         -0.448  0.148  -0.066     0.222        0    -0.639  -0.293
sd             0.056   0.064  0.034      0.027        0    0.110   0.112
fraction zero  0       0.104  0.108      0            1    0       0.102
In contrast to the analysis for cognitive ability, lasso regression did not show
that parasite load was a redundant predictor. In fact, the only predictor found to
be consistently redundant was the sun factor. Strangely, European% still had a
negative beta. The unexpected results with regard to European ancestry in this
section are the reason why we undertook a second study of the S factor across
the US. This, however, did nothing to change the results. Since the reliability of
the S factor across datasets is so high (r=.961), the strange results are unlikely
to be due to measurement error with respect to the S factor. More likely, the
results have to do with the strong relationship between European% and parasite
load (r=-.87). This matter will be discussed further in a later part of this paper.
11.3. Sovereign nations
We repeat the sovereign national analyses from Section 10 using automatic
modeling.
11.3.1. Data sources
The variables were explained in Section 10.
11.3.2. Cognitive ability
Table 40 shows the best subset results for cognitive ability scores. Both
European% and parasite load were found to be consistently important predictors.
In fact, the top 124 of 511 models included both European% and parasite load.
The adjusted R² values are very high, suggesting that the distribution of cognitive
ability is well predicted by these variables. Lasso regression results are shown in
Table 41.
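Best subset selection fits every non-empty subset of the candidate predictors and ranks the resulting models. With 9 candidate predictors this gives 2^9 - 1 = 511 models, which matches the count mentioned above. The following is a minimal, unweighted Python sketch of that idea. The helper names and toy data are hypothetical, the ranking criterion is taken to be adjusted R², and the weighting used in the reported analyses is omitted.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def adjusted_r2(columns, y):
    """Fit OLS with an intercept via the normal equations; return adjusted R^2."""
    n, k = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k + 1)]
           for a in range(k + 1)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)

def best_subsets(predictors, y, top=5):
    """Rank every non-empty subset of predictors by adjusted R^2."""
    names = list(predictors)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cols = [predictors[name] for name in subset]
            results.append((adjusted_r2(cols, y), subset))
    results.sort(reverse=True)
    return results[:top], len(results)

# Hypothetical toy data: y depends almost entirely on x1.
toy_predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 7, 2, 6, 4],
}
toy_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
top_models, n_models = best_subsets(toy_predictors, toy_y)
for score, subset in top_models:
    print(round(score, 3), subset)
```

Exhaustive enumeration is feasible here only because the number of candidate predictors is small; the model count doubles with each added predictor.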
Table 40. Automatic modeling results for cognitive ability scores and sovereign
nations. Top 5 models. Cold dem. = cold demand, hot dem. = hot demand, Infec.
dis. = infectious diseases.
Cold
dem.
Hot
dem.
Infec
dis.
Para.
Tourism
Anglo
Tax
haven
Euro
%
Gen. dist.
SA
adj R²
Model
#
0.167
-0.472
-0.578
0.455
0.845
173
0.189
-0.477
-0.249
0.464
0.844
168
0.160
-0.105
-0.416
-0.562
0.423
0.844
298
0.168
-0.492
-0.154
0.453
0.842
314
0.129
-0.165
-0.345
0.214
0.401
0.842
423
Table 41. Lasso regression results for cognitive ability scores and sovereign
nations. 500 runs.
Statistic      Cold dem.  Hot dem.  Infec. dis.  Para.   Tourism  Anglo  Tax haven  Euro%  Gen. dist.
mean           0.153      0         -0.090       -0.258  0        0      0          0.336  0
median         0.154      0         -0.090       -0.258  0        0      0          0.336  0
sd             0.008      0         0.004        0.016   0        0      0.003      0.018  0
fraction zero  0          1         0            0       1        1      0.996      0      1
The results for lasso regression are similar to those for best subset, at least
with respect to European ancestry. A fairly large number of predictors were found
to be non-redundant for most analyses. For tax haven status, however, the methods
produced substantially divergent results: best subset selection identifies it as a
strong negative predictor, while lasso finds it redundant in 99.6% of the
runs.
11.3.3. Socioeconomic (S factor) scores
Table 42 shows the best subset results for S. In contrast to the results for
cognitive ability, genetic distance from South Africa seems to be a useful
predictor. Likewise, Anglo seems to be associated with higher S, while strangely,
tourism is not. Both European% and parasite load were useful predictors as
before. Lasso regression results are shown in Table 43.
Table 42. Automatic modeling results for socioeconomic (S factor) scores and
sovereign nations. Top 5 models according to adjusted R².
Cold
dem.
Hot
dem
Infec.
Dis.
Para.
Tourism
Anglo
Tax
haven
Euro
%
Gen.
dist.
SA
adj.
R2
Model
#
0.173
0.228
-0.528
0.990
0.320
0.450
0.888
410
0.186
0.228
-0.508
0.147
0.930
0.329
0.465
0.885
483
0.179
0.228
-0.079
-0.472
1.012
0.299
0.407
0.884
474
0.193
-0.558
1.203
0.372
0.555
0.884
354
0.182
0.226
-0.519
0.977
0.107
0.326
0.446
0.884
485
Table 43. Lasso regression results for socioeconomic (S factor) scores and
sovereign nations. 500 runs.
Statistic      Cold dem.  Hot dem.  Infec. dis.  Para.   Tourism  Anglo  Tax haven  Euro%  Gen. dist. SA
mean           0.190      0.028     -0.140       -0.317  0.003    0.788  0          0.348  0.037
median         0.187      0         -0.141       -0.304  0        0.788  0          0.350  0
sd             0.007      0.045     0.017        0.029   0.011    0.060  0          0.012  0.073
fraction zero  0          0.528     0            0       0.936    0      1          0      0.682
The lasso results show a stark contrast for the genetic distance predictor,
which had a mean beta of only .04 and was found to be redundant in about 70%
of runs. Tourism was not found to be a useful predictor for S using lasso
regression, while Anglo was. Both infectious disease and parasite load were
useful predictors, which is not too surprising as these variables have a part-whole
relation with respect to the S factor. Cold demand had positive but somewhat
weak predictive power. European% continued to be a useful predictor in all runs.
12. Units of unequal size: using weights
So far, we have used weighted correlations/regressions whenever possible.
The rationale is that countries/states/departments with populations of, let's say,
500,000 should not be treated as equally important as ones with populations ten
times the size (Hunt & Sternberg, 2006). However, there is a question of which
weighting method should be utilized. Clearly, it should be based on either
population or some transformation of population. We have used the square root
of population so far in this paper, but one could also use a logarithmic
transformation or no transformation at all. Since only the relative size matters,
one can rescale the weights using each method to have a mean of 1. Figure 22
shows density curves of the relative weights using three methods for assigning
weights in the dataset of sovereign nations. Similarly, Table 44 shows some
descriptive statistics for the weights by method.
Figure 22. Density plot of relative weights by method.
Table 44. Descriptive statistics of weighting methods by population size.
method         mean  median  sd    min   max    range  skew   kurtosis  max/min
no weights     1     1       0     1     1      0                       1
log            1     1.04    0.16  0.72  1.29   0.58   3.34   -1.03     1.80
sqrt           1     0.72    1.13  0.06  5.10   5.04   -0.20  3.99      78.67
untransformed  1     0.23    2.33  0.00  11.61  11.61  2.01   11.08     6189
In short, using log transformed values makes the weights relatively
egalitarian: the ratio of largest to smallest value is only 1.80. Using untransformed
weights makes the weights extremely unequal: a max-min ratio of over 6,000.
The largest country is the United States, with a population of about 320 million,
and the smallest is Saint Kitts and Nevis with a population of about 50,000.
Square root weighting produces intermediate results with a max-min ratio of about
80. We decided to use this latter method as our primary one, since it strikes a
reasonable balance between taking the effect of population size into account and
not obscuring the effects of individual units, and because it is similar to the
standard error often used in meta-analysis. Nonetheless, in Tables 45 and 46 we
present the main results – the correlations with European ancestry – for cognitive
ability and S respectively for each method of weighting.
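The weighting schemes described above can be sketched in a few lines of Python. The populations below are hypothetical apart from the two approximate extremes quoted in the text (about 50,000 and about 320 million), the natural logarithm is an assumption, and `relative_weights` is an illustrative name rather than a function from the authors' code.

```python
import math

def relative_weights(populations, method="sqrt"):
    """Transform populations, then rescale so that the weights average 1."""
    transforms = {
        "none": lambda p: 1.0,    # every unit counts equally
        "log": math.log,          # natural log (an assumption)
        "sqrt": math.sqrt,
        "untransformed": float,
    }
    raw = [transforms[method](p) for p in populations]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

# Hypothetical populations; only the two extremes follow the text.
toy_pops = [50_000, 600_000, 9_000_000, 45_000_000, 320_000_000]

ratios = {}
for method in ("none", "log", "sqrt", "untransformed"):
    weights = relative_weights(toy_pops, method)
    ratios[method] = max(weights) / min(weights)
    print(f"{method:>13}: max/min = {ratios[method]:.2f}")
```

Because rescaling to a mean of 1 leaves ratios unchanged, the max/min ratio depends only on the largest and smallest populations. With the rounded extremes used here the ratios come out near 1.8, 80 and 6,400, close to the max/min column of Table 44.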
Table 45. Correlations between European ancestry and cognitive ability scores
by different methods of weighting.
Method         Mexico  USA    Brazil  Colombia  Sovereign  All units  Mean
(no weights)   0.510   0.668  0.736   0.824     0.742      0.819      0.716
log            0.514   0.663  0.736   0.822     0.750      0.813      0.716
sqrt           0.522   0.635  0.729   0.808     0.770      0.781      0.708
untransformed  0.491   0.610  0.701   0.795     0.736      0.759      0.682
Table 46. Correlations between European ancestry and S by different methods
of weighting.
Method         Mexico  USA    Brazil  Colombia  Sovereign  All units  Mean
(no weights)   0.642   0.437  0.767   0.615     0.485      0.794      0.623
log            0.647   0.431  0.768   0.605     0.525      0.797      0.629
sqrt           0.669   0.393  0.763   0.531     0.702      0.803      0.643
untransformed  0.666   0.367  0.739   0.425     0.743      0.795      0.623
As can be seen, the exact method chosen does not matter much except for
the S analysis with regard to sovereign nations. The reason for this was discussed
earlier.
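A weighted Pearson correlation of the kind compared in Tables 45 and 46 can be sketched as follows. This is the generic textbook formula written in Python, not the authors' code; the function name and the toy values are hypothetical.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)

# Hypothetical units: ancestry share, outcome score, population.
toy_euro = [0.30, 0.45, 0.60, 0.80, 0.90]
toy_score = [-0.8, -0.1, 0.2, 0.6, 1.1]
toy_pop = [4_000_000, 900_000, 12_000_000, 250_000, 30_000_000]

r_unweighted = weighted_corr(toy_euro, toy_score, [1.0] * len(toy_pop))
r_sqrt = weighted_corr(toy_euro, toy_score, [math.sqrt(p) for p in toy_pop])
print(round(r_unweighted, 3), round(r_sqrt, 3))
```

Multiplying all weights by a constant leaves the result unchanged, which is why only the relative sizes of the weights matter.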
13. Race~Cognitive ability-S: does cognitive ability mediate the relationship
between racial ancestry and S?
As mentioned in Section 2, according to the R~CA-S model, the primary
route of statistical relationship between racial ancestry and socioeconomic
outcomes runs through cognitive ability. Stated in simpler language, according to
this model, there is nothing special about Europeans compared to Africans and
Amerindians with regards to building better societies, except that the former are
smarter on average. One can test this model with our dataset by checking if there
is a relationship between S and ancestry, controlling for cognitive ability.
According to the R~CA-S model, this relationship should be small or nonexistent.
Is it?
Technically, there are various ways one could try to determine this. One
could enter both cognitive ability and European ancestry into multiple regressions,
with S as the dependent variable, but because the predictors correlate so highly
and are causally related, doing so would not produce readily interpretable results.
Another option is to use partial correlations. However, using partial correlations
would regress out the effect of cognitive ability on S and on European ancestry.
Doing the latter does not make sense. A third option is to use semi-partial
correlations. Here, one regresses out the effect of the controlling variable only on
one variable of a pair and then correlates the residuals of the one variable with
the other variable. In our case, we want to regress out the effect of cognitive ability
on S and then see if European ancestry can predict the remaining variance in S.
This is the method we will use. Table 47 shows the results.
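The semi-partial correlation described above can be sketched in Python as follows: cognitive ability is regressed out of S only, and European ancestry is then correlated with the residuals. The sketch is unweighted for brevity, whereas the reported analyses use weights, and all names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residualize(y, control):
    """Residuals of y after a simple OLS regression on one control variable."""
    n = len(y)
    mc, my = sum(control) / n, sum(y) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(control, y))
             / sum((c - mc) ** 2 for c in control))
    intercept = my - slope * mc
    return [v - (intercept + slope * c) for c, v in zip(control, y)]

def semipartial_corr(x, y, control):
    """Correlate x with the part of y that control does not linearly predict."""
    return pearson(x, residualize(y, control))

# Hypothetical units: European ancestry share, cognitive score, S score.
toy_euro = [0.20, 0.35, 0.50, 0.60, 0.75, 0.90]
toy_ca = [82, 85, 90, 91, 96, 99]
toy_s = [-1.1, -0.6, 0.1, 0.0, 0.9, 1.2]

r_zero_order = pearson(toy_euro, toy_s)
r_semipartial = semipartial_corr(toy_euro, toy_s, toy_ca)
print(round(r_zero_order, 2), round(r_semipartial, 2))
```

Only S is residualized; European ancestry is left untouched, which is what distinguishes this from a partial correlation.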
Table 47. Weighted (semi-partial) correlations of European ancestry (Euro%)
and socioeconomic development (S), without and with controlling for cognitive
ability scores (CA). The capital districts were excluded from all datasets. 95%
confidence intervals in brackets. Confidence intervals generated by bootstrapping
with 1000 replications.
Dataset                        rEuro%-S  rEuro%-S, CA controlled  Sample size
US                             .39       -.09 [-.36; .21]         49
Mexico                         .67       .42 [.13; .69]           31
Brazil                         .76       .28 [.05; .51]           26
Colombia                       .53       .01 [-.28; .31]          32
Sovereign nations              .70       .07 [-.14; .28]          35
Countries and states/districts .80       .22 [.11; .34]           169
The correlations in the left column merely recapitulate those presented in
earlier sections. As shown in the center column, except in the case of Mexico,
cognitive ability scores explain the major portion of the positive association
between European ancestry and S.
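The bootstrapped confidence intervals in Table 47 can be illustrated with a percentile bootstrap, sketched below in Python for a plain correlation. The percentile method, the seed, the function names and the toy data are all assumptions; the source states only that bootstrapping with 1000 replications was used.

```python
import math
import random

def pearson(x, y):
    """Ordinary Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_ci(x, y, reps=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the correlation of x and y."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < reps:
        idx = [rng.randrange(n) for _ in range(n)]  # resample units with replacement
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with no variance
        stats.append(pearson(xs, ys))
    stats.sort()
    lo = stats[int((1 - level) / 2 * reps)]
    hi = stats[int((1 + level) / 2 * reps) - 1]
    return lo, hi

# Hypothetical data for eight units.
toy_x = [0.20, 0.30, 0.45, 0.50, 0.60, 0.70, 0.80, 0.90]
toy_y = [-1.0, -0.4, -0.5, 0.1, 0.0, 0.6, 0.5, 1.1]
ci_low, ci_high = bootstrap_ci(toy_x, toy_y)
print(round(ci_low, 2), round(ci_high, 2))
```

Whole units are resampled, so each replicate keeps the pairing between the two variables intact.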
14. International colorism?
Within countries, it has often been found that cognitive ability, income and
other socioeconomic measures correlate positively with lighter skin color (M.
Hunter, 2007), which our measure of skin reflectance indexes. The usual
explanation offered for this state of affairs is color based discrimination: the
colorism hypothesis. Kinship studies have disconfirmed a strong version of this
explanation. The predominant portion of outcome variance, in mixed populations,
has been found to be between families. Little variance has been found between
biological siblings within families, even though such siblings differ substantially in
color (e.g., Mill & Stein, 2012). This indicates that family background is the major
cause of the color-outcome association. An alternative test of colorism would be
to look at the semi-partial correlations between color or skin reflectance and
outcomes, controlling for genomic ancestry. While this has been done on the
individual level in conjunction with Ruiz-Linares et al. (2014), results have yet to
be published (K. Adhikari, personal communication, November 4, 2014).
One might wonder if color or skin reflectance is associated with outcomes
net of genomic ancestry on the national level. Our data for sovereign countries
have the three necessary components to test this hypothesis: skin reflectance,
genomic ancestry, and some relevant dependent variable (in our case S and
cognitive ability). The correlations are shown in Table 48. We used semi-partial
correlations to control for European ancestry. As can be seen, regional-level
genomic ancestry mediates most of the association between reflectance and
outcomes.
Table 48. Results for international colorism hypothesis: correlation of skin
reflectance (light skin) with outcome variables. N=35 sovereign countries.
Variable           Skin reflectance  Skin reflectance controlling for European ancestry
Cognitive ability  .62               .18
S                  .60               .19
We might likewise ask if the associations of self-identified race and ethnicity
(SIRE) with outcomes are also mediated by genomic ancestry. Individual level
results, with respect to socioeconomic outcomes, have been reported (Ruiz-
Linares et al., 2014) for five Latin American countries. They indicate that
independent of genomic ancestry SIRE is only weakly associated with outcomes.
Results for the national level analysis are shown below in Table 49.
Table 49. Results for international culturalism hypothesis: correlations of self-
identified ethnic European identity with outcome variables. N=35 sovereign
countries.
Variable           European identity  European identity controlling for European ancestry
Cognitive ability  .70                .08
S                  .52                .08
Generally, genomic ancestry statistically explains most of the associations
between skin reflectance and outcomes and between SIRE and outcomes.
15. Other measures of cognitive ability and human capital
We have high confidence in our socioeconomic variable since it is based on
solid sources and numerous variables. This is not the case for our cognitive index
on the national level. While many of the scores were based on well-vetted
international achievement tests, others, particularly for the small Caribbean
nations, were based on less reliable indexes, such as results from US university
exams. As a robustness check, we examine whether the previously documented
positive relationship between European ancestry and cognitive ability scores
holds when using other measures. This section is concerned with sovereign
countries only as these other measures are not available for states or districts
within countries.
15.1. Measures
A wide variety of measures were sought, to preclude claims that we
employed only narrow measures of cognitive ability and skills:
15.1.1. Academic achievement (ACH)
This is the same as that used throughout the paper. See 7.1.2 for more
details.
15.1.2. Lynn and Meisenberg’s (2015) IQ dataset (IQ_L15)
This is an early version of Lynn and Meisenberg’s (2015) IQ dataset (G.
Meisenberg, personal communication, April 6, 2015). Three points are worth
noting: First, Richard Lynn has rejected some of the values calculated by Gerhard
Meisenberg for various reasons, so some of the scores presented in Lynn and
Meisenberg's future compendium may be different from the ones we use here.
Second, we altered several of G. Meisenberg’s (April 6, 2015) values. These
were Peru (94 to 90, added new data and removed some scores calculated based
on a Mexican sample), Cuba (84 to 86; removed a mentally disabled sample), El
Salvador (deleted, sample size below 100), Bolivia (deleted, sample size below
100), Barbados (93 to 87, removed a redundant source), Dominican Republic
(deleted, sample size below 100). Third, this dataset is still under construction.
Supplementary File 7 contains the national IQs provided to us along with our final
2015 national IQs.
15.1.3. Average achievement and IQ (ACH+IQ)
This is the average of the academic achievement scores and Lynn and
Meisenberg’s (2015) IQ scores (i.e., the previous two variables).
15.1.4. Lynn and Vanhanen’s (2012) IQ dataset (IQ_LV12)
This is the widely used and latest published list of national IQs by Richard
Lynn (Lynn & Vanhanen, 2012).
15.1.5. Altinok’s educational quality dataset (Altinok)
This is an alternative international achievement dataset calculated using a
different method than that employed by Lynn and Meisenberg (2015). The data are
from Altinok, Diebolt, and Demeulemeester (2014).
15.1.6. GRE scores (GRE_Total)
These were the GRE (Graduate Record Examination) by citizenship scores
from ETS’s report, Snapshot of the Individuals Who Took the GRE® revised
General Test (2011-2012; 2012-2013; 2014-2015). We used the sum of
quantitative and verbal scores.
15.1.7. GMAT scores (GMAT)
These were the GMAT (Graduate Management Admission Test) by
citizenship scores from GMAC’s 2001 to 2012 Profile of GMAT Candidate
Executive Summary reports. We computed n-weighted average GMAT scores.
Note that the GMAT is used by 5,900 business programs at 2,100 universities
worldwide. While the test is given in English, it is designed to depend on English
only as much as is necessary to predict successful completion of business
programs taught in English.
15.1.8. Mean years of schooling in 2013 (YearsofSchool13)
This is the average number of years of education which individuals aged ≥
25 years are estimated to have. The data came from the Human Development
Index dataset (http://hdr.undp.org/en/content/mean-years-schooling-adults-years).
15.1.9. Scientific papers per capita (SciPapers)
This is based on the World Bank's reported number of scientific papers
published per country, between 2005 and 2014. We divided the numbers by the
national populations to derive a per capita estimate. The data were extremely
skewed and some countries had 0. To get a more normal distribution, we changed
the 0 values to the smallest value and took the log which gave a satisfactory
result. The original dataset is available at:
http://data.worldbank.org/indicator/IP.JRN.ARTC.SC/countries.
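The transformation described above can be sketched in Python as follows. The sketch assumes that "the smallest value" means the smallest non-zero per capita value and that the natural logarithm was used; neither detail is stated in the text, and the function name and toy numbers are hypothetical.

```python
import math

def log_per_capita(counts, populations):
    """Log of per capita counts, with zeros replaced by the smallest non-zero value."""
    per_capita = [c / p for c, p in zip(counts, populations)]
    floor = min(v for v in per_capita if v > 0)  # assumed replacement value
    return [math.log(v if v > 0 else floor) for v in per_capita]

# Hypothetical paper counts and populations for four countries.
toy_papers = [120, 0, 15, 3400]
toy_pops = [2_000_000, 50_000, 300_000, 40_000_000]
toy_logged = log_per_capita(toy_papers, toy_pops)
print([round(v, 2) for v in toy_logged])
```

Replacing zeros before taking the log avoids undefined values, at the cost of treating countries with no recorded papers as tied with the lowest non-zero country.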
15.1.10. Fraction of GDP spent on research and development (R&D)
This is the fraction of each country's GDP that is spent on research and
development (R&D). The data are from the Democracy Ranking dataset
(http://democracyranking.org/).
15.1.11. Math Olympics (MathOlympiad)
These are the average rankings based on International Math Olympiads.
2000 to 2014 national rankings were averaged. Because a smaller ranking is
better, we reversed this variable (multiplying by -1).
15.2. Factor analysis (G)
A country-level general cognitive factor (Rindermann, 2007) was extracted
from the non-overlapping variables. The factor was extracted using the least
squares method and scored using Bartlett's method. Figure 23 shows the
loadings plot.
Figure 23. International general cognitive factor, G.
15.3. Correlations
Table 50 shows the correlations among the cognitive measures. They all
show medium to strong intercorrelations, which forms the basis of the
international G factor (Rindermann, 2007), and all cognitive measures have
substantial correlations with European%. One of the weakest correlates, with
regards to both European% and the other cognitive measures, is R&D as a
fraction of GDP. This is probably because this variable is influenced more by
national policy than by mean cognitive ability. Generally, the results seem to be
robust across measures of cognitive ability.
16. The S factor in the American sample
As a robustness check, we examined whether the S factor previously
extracted from samples of N=132 and N=115 countries was structurally similar in
the American sample (N=35). We did this by extracting the first factor from three
different datasets: the combined Social Progress Index and Democracy Ranking
datasets (96 indicators in total, N=18 with complete data), the Social Progress
Index dataset (54 indicators, N=18 with complete data) and the Democracy
Ranking dataset (42 indicators, N=23 with complete data). Analysis showed that
results were unstable across scoring methods (using the FA_all_methods()
function from the kirkegaard package).⁸ Since a previous study had shown that
Bartlett's method works even when there are many more indicators than cases
(Kirkegaard, 2015d), we used this method in all analyses. Table 51 shows the
intercorrelations of the factor scores.
Table 51. Intercorrelations between S factor scores. SocProgr, score from 54
variables in Social Progress dataset; Democracy, score from 42 variables in
Democracy dataset; SP + D, SocProgr and Democracy combined, 96 variables.
Weighted correlations below the diagonal. N's = 18-23 sovereign countries.
            S_rescaled  SP + D  SocProgr  Democracy
S_rescaled              0.98    0.96      0.99
SP + D      0.97                0.98      0.99
SocProgr    0.94        0.98              0.95
Democracy   0.99        0.98    0.93
The correlations were very strong, especially for the unweighted results.
There was a slight difference between weighted and unweighted results. This
could have been due to the factor analysis process which does not use weights
to derive the covariance/correlation matrix. Regardless, there was no evidence
that the S factor structure was substantially different in the American sample as
compared to the worldwide sample.
Comparing factor scores is a more indirect method of comparing factor
structure similarity. Different sets of loadings can theoretically lead to the same
scores. Thus, we also investigated the factor loadings across samples. We
extracted the S factors using the full dataset for this purpose. The correlations
between loadings across datasets were .91, .93 and .84 for the 96, 54 and 42
indicator datasets respectively. It is often said that the most appropriate method
for comparing loadings is to use the factor congruence coefficient (Jensen, 1998,
p. 99; Lorenzo-Seva & Ten Berge, 2006). For the same datasets, in all cases, the
coefficient was .93. A common rule of thumb for factor identity is ≥.95, which is
not quite met here. However, the samples were fairly small, so it is probable that
sampling error decreased the congruence coefficients (and the correlations)
somewhat.
⁸ The package is on GitHub: https://github.com/Deleetdk/kirkegaard
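The factor congruence coefficient discussed above (Tucker's coefficient, as treated by Lorenzo-Seva and Ten Berge) is the uncentered cosine between two loading vectors. A minimal Python sketch follows; the function name and the toy loadings are hypothetical.

```python
import math

def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two vectors of factor loadings."""
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm_a = math.sqrt(sum(a * a for a in loadings_a))
    norm_b = math.sqrt(sum(b * b for b in loadings_b))
    return cross / (norm_a * norm_b)

# Hypothetical loadings of the same three indicators in two samples.
toy_sample_1 = [0.8, 0.7, 0.9]
toy_sample_2 = [0.9, 0.7, 0.8]
phi = congruence(toy_sample_1, toy_sample_2)
print(round(phi, 3))
```

Unlike the Pearson correlation, the coefficient does not subtract the mean loading, so two sets of uniformly high loadings count as similar even when their rank orders differ.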
16.1. Jensen's method applied to the European ancestry x S factor correlation
Our results show medium to strong correlations between European ancestry
and S scores at the international level. It is possible, however, that European
ancestry is not related to the latent S factor, but is solely related to one or more
group factors found in the data or to indicator specific (unique) variance. One can
use Jensen's method of correlated vectors (Jensen, 1998) to examine whether
European ancestry correlates more strongly with more S loaded variables. If so,
this would suggest that general factor differences explain the association. The
Jensen coefficients (MCV correlations) were .75, .69 and .85 across the three
datasets. The strength of these associations suggests that European ancestry is
substantially related to the underlying S factor. Figure 24 shows the scatter plot
for the analysis with all 96 S indicators.
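Jensen's method of correlated vectors can be sketched in Python as below: each indicator contributes one pair of values, its loading on the general factor and its correlation with the criterion variable, and the two vectors are then correlated. The sketch assumes that "reversing" means flipping the sign of both values for indicators with negative loadings; the function names and the toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_vectors(loadings, criterion_corrs, reverse=True):
    """Correlate indicator loadings with indicator-criterion correlations."""
    pairs = list(zip(loadings, criterion_corrs))
    if reverse:
        # Flip indicators with negative loadings so that all loadings are positive.
        pairs = [(-l, -r) if l < 0 else (l, r) for l, r in pairs]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])

# Hypothetical indicators: loading on S and correlation with European ancestry.
toy_loadings = [0.9, 0.7, -0.8, 0.3]
toy_criterion_r = [0.7, 0.5, -0.6, 0.2]
mcv = correlated_vectors(toy_loadings, toy_criterion_r)
print(round(mcv, 3))
```

Without reversing, a mix of positively and negatively loaded indicators stretches both vectors across zero and can inflate the resulting correlation.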
Figure 24. Jensen's method scatter plot for 96 socioeconomic indicators and
European ancestry as the criterion variable. Reversing is used. Variables with higher
loadings on the general socioeconomic factor tend to have higher correlations
with European ancestry.
17. The West Indies
In Section 7 we saw that West Indian countries were outliers when it came
to the association between S and European ancestry. Lynn and Vanhanen (2012)
had previously found similar results with respect to the association between
National IQ and various socioeconomic indicators. They attributed the large
positive residuals of West Indian countries to wealth gained from tourism. We test
this explanation by including a measure of per capita tourist spending in our
regression analysis. This measure was calculated using World Bank values for
tourist expenditure in 2010. We divided the expenditure values by the national
populations to compute per capita estimates. Because the island states in
question have small populations, we ran the regression analyses both with and
without weights. We also included a dichotomously coded European Union
classified tax haven variable. Table 52 shows the unweighted and Table 53 the
weighted results. In line with Lynn and Vanhanen’s conjecture, we find that when
adding per capita tourist expenditure and tax haven status, the model improves.
Table 52. Regression results for sovereign nations: tourism, tax haven and
European ancestry. N=35.
Model #  Euro%  Tax Haven  Tourist spending  adj. R²
1        0.48                                0.212
2               0.16                         -0.026
3                          0.19              0.007
4        0.66   0.88                         0.303
5        0.64              0.43              0.355
6               -0.23      0.25              -0.018
7        0.69   0.40       0.33              0.350
Table 53. Weighted regression results for sovereign nations: tourism, tax haven
and European ancestry. N=35.
Model #  Euro%  Tax Haven  Tourist spending  adj. R²
1        0.939                               0.477
2               -0.157                       -0.030
3                          0.775             0.040
4        1.006  1.196                        0.493
5        0.965             0.911             0.560
6               -1.890     1.293             0.065
7        0.959  -0.138     0.948             0.546
17.1. West Indies territories
So far we have only presented results for sovereign nations and states.
However, the Caribbean is host to over a dozen tiny island territories.⁹ These
territories do not have official HDI values, but more or less reliable estimates can
be constructed. For these territories, we used Avakov's (2012) HDI 2010
estimates, except for Puerto Rico, in which case we used Fuentes-Ramírez's
(2014) HDI 2012 estimate. Most of these territories lack genomic admixture data,
yet it was possible to create crude ancestry estimates using CIA Factbook SIRE
and ancillary data. More problematic was the poor quality of the cognitive data.
Estimates based on a combination of (1986 to 2014) GMAT¹⁰ scores, CXC scores
(discussed below), Lynn's (2012/2015) IQ scores and other sources are shown in
Table 54.¹¹ Detailed computations are provided in Supplementary File 6. The
same file provides alternative estimates for the Caribbean nation states. It needs
to be emphasized again that, except in the case of the US Virgin Islands and
Puerto Rico, the quality of data is very poor. This was one of the reasons why
these territories were not included in the previous analyses – another being that
the populations are often minute.
Table 54. ACHQ scores for West Indian territories.
Nation                 ACHQ  Data sources
Anguilla               74.3  CXCQ
British Virgin Islands 76.1  GMATQadj + CXCQ
Cayman Islands         85.7  GMATQadj + IQ + GCSE/CXCQ
Montserrat             79.9  CXCQ
Netherlands Antilles   84.4  GMATQadj + Lynn + OtherQ (Cito) + CXCQ
Turks & Caicos         79.3  Lynn + CXCQ
US Virgin Islands      72.3  GMATQadj + Lynn + Other (SAT+NAEP)
Bermuda                87.3  GMATQadj + Lynn + OtherQ (TerraNova+GED scores)
Martinique             84.9  GMATQadj + OtherQ (French Literacy exam)
Guadeloupe             83.7  GMATQadj + OtherQ (French Literacy exam)
French Guiana          86.6  OtherQ (Based on French Literacy exam)
Aruba                  76.8  GMATQadj
Puerto Rico            78.9  ACHQ
⁹ The non-sovereign Caribbean islands usually have the status of “overseas territory” or
something similar. Sometimes they constitute departments or comparable entities.
¹⁰ For several of the territories, we had to use older GMAT scores since they ceased to
be listed in later Profile reports.
¹¹ For Puerto Rico, the ACHQ value was 78.9 and the ACHQ+IQ value was 81.4. We
used the former value since it was already in our dataset.
To get an idea of where these territories are located in the American context,
we show the scatter plots for European% and cognitive ability scores (as
operationalized above) and European% and S scores (Figs. 25 and 26). For
cognitive ability, the weighted correlation became negative. This result is mainly
driven by Puerto Rico, which has a relatively large population as compared to the
other territories, a relatively low cognitive score, and yet a relatively high
percentage of European admixture (64%).
Figure 25. European ancestry and cognitive ability scores for countries and
states/districts. Most of the “other non-sovereign” territories are Caribbean
islands.
Figure 26. European ancestry and socioeconomic (S factor) scores for countries
and states/districts.
Clearly, the territories have higher S scores than expected from a simple
R~CA-S model. We might wonder if including per capita tourist expenditure and
tax haven status can, as before, salvage the model. Thus we repeat the analyses
above for the combined sample of territories and sovereign nations. Table 55
shows the beta matrix for results without weights. Table 56 shows the same but
with weighted results.
Table 55. Regression models for European ancestry, Tax haven status and
tourist spending for sovereign nations and territories. Dependent: socioeconomic
(S-factor) scores.
Model  Euro%  Tax Haven  Tourist spending  adj. R²  N
1      0.394                               0.140    48
2             0.201                        -0.013   48
3                        0.311             0.052    47
4      0.559  0.764                        0.226    48
5      0.492             0.459             0.273    47
6             -0.164     0.362             0.034    47
7      0.557  0.408      0.352             0.278    47
Table 56. Weighted regression models for European ancestry, Tax haven status
and tourist spending for sovereign nations and territories. Dependent:
socioeconomic (S-factor) scores.
Model
Euro%
Tax Haven
Tourist spending
adj. R2
N
1
0.808
0.440