MANKIND QUARTERLY 2017 57:4 542-580
Admixture in Argentina
Emil O.W. Kirkegaard
Ulster Institute for Social Research, London, UK
John Fuerst
Ulster Institute for Social Research, London, UK
*Corresponding author, email: emil@emilkirkegaard.dk
Analyses of the relationships between cognitive ability,
socioeconomic outcomes, and European ancestry were carried out
at multiple levels in Argentina: individual (max. n = 5,920), district (n
= 437), municipal (n = 299), and provincial (n = 24). Socioeconomic
outcomes correlated in expected ways such that there was a general
socioeconomic factor (S factor). The structure of this factor
replicated across four levels of analysis, with a mean congruence
coefficient of .96. Cognitive ability and S were moderately to strongly
correlated at the four levels of analyses: individual r=.55 (.44 before
disattenuation), district r=.52, municipal r=.66, and provincial r=.88.
European biogeographic ancestry (BGA) for the provinces was
estimated from 25 genomics papers. These estimates were
validated against European ancestry estimated from self-identified
race/ethnicity (SIRE; r=.67) and interviewer-rated skin brightness
(r=.33). On the provincial level, European BGA correlated strongly
with scholastic achievement-based cognitive ability and composite
S-factor scores (r's .48 and .54, respectively). These relationships
were not due to confounding with latitude or mean temperature when
analyzed in multivariate analyses. There were no BGA data for the
other levels, so we relied on %White, skin brightness, and SIRE-
based ancestry estimates instead, all of which were related to
cognitive ability and S at all levels of analysis. At the individual level,
skin brightness was related to both cognitive ability and S.
Regression analyses showed that SIRE had little detectable
predictive validity when skin brightness was included in models.
Similarly, the correlations between skin brightness, cognitive ability,
and S were also found inside SIRE groups. The results were similar
when analyzed within provinces. In general, results were congruent
with a familial model of individual and regional outcome differences.
Key Words: Cognitive ability, Socioeconomic outcomes, Inequality,
SES, General socioeconomic factor, S-factor, Race, Ethnicity,
Ancestry, Biogeographical ancestry, Admixture, Skin color,
Argentina.
The causes of differences in social outcomes between countries, and
between ethnic groups within countries, represent two topics of active sociological
inquiry. Frequently, the two are analyzed independently with, for example,
climatic and environmental factors such as UV radiation (León & Burga-León,
2015), cold climate (Van de Vliert, 2013), and parasite load (Eppig, Fincher &
Thornhill, 2010) being investigated as causes of the former and with social
processes, such as racial discrimination and colorism (Rangel, 2014; Telles,
2014), being investigated as causes of the latter.
Some have proposed and explored a theoretically more parsimonious
familial model. According to this model individual-level cognitive ability (CA) or
human capital differences are transmitted across generations within families, and
are the cause of both inter-country and inter-ethnic differences (Easterly & Levine,
2016; Lynn, 2008, 2015; Putterman & Weil, 2010; Spolaore & Wacziarg, 2013).
Because the members of these countries and ethnic groups frequently do or
recently did form nations in the traditional sense (Latin Natio: stock, race, tribe),
this familial model predicts that ancestry will predict, to some degree, both inter-
country and inter-ethnic differences within multi-ethnic countries. Support for this
familial model was found by Putterman and Weil (2010), who concluded:
The overall finding of our paper is that the origins of a country’s population,
more specifically, where the ancestors of the current population lived some 500
years ago, matter for economic outcomes today. Having ancestors who lived in
places with early agricultural and political development is good for income today,
both at the level of country averages and in terms of an individual’s position within
a country’s income distribution. Exactly why the origins of the current population
matter is a question on which we can only speculate at this point. People who
moved across borders brought with them human capital, cultures, genes,
institutions, and languages. People who came from areas that developed early
evidently brought with them versions of one or more of these things that were
conducive to higher income. Future research will have to sort out which ones were
the most significant.
While Putterman and Weil looked at the effect of national-level ancestry on
outcomes both between and within countries, Fuerst and Kirkegaard (2016a,
2016b) and Kirkegaard, Wang and Fuerst (2017) examined the effect of
continental biogeographic ancestry (BGA) in the Americas. Mirroring Putterman
and Weil's results, the authors found that, at both the individual and regional
levels, European BGA, relative to African and Amerindian, was a robust predictor
of cognitive and/or socioeconomic outcomes (regional: mean correlations of .71
and .64, individual: .18). Furthermore, mediation analyses showed that the
relationship between regional European ancestry and better social outcomes was
to a large degree mediated by CA.
These findings are congruent with those of a large number of other studies
which have found robust and often strong relationships between CA and better
social outcomes at both the individual and aggregate levels (Carl, 2016;
Herrnstein & Murray, 1994; Kirkegaard, 2014, 2016a; Lynn & Vanhanen, 2012;
Strenze, 2007). On the other hand, and in agreement with León & Burga-León
(2015) and Van de Vliert (2013), respectively, the authors of the BGA studies above also found that absolute
latitude and/or cold climate were ecological predictors, which were robust to
controls for BGA. Given the results, Fuerst and Kirkegaard (2016b) proposed a
co-occurring model, and concluded that an accurate estimation of climatic effects
necessitates including ancestry as a control variable in analyses to avoid omitted
variable bias.
Fuerst and Kirkegaard (2016b) noted that their results needed further
replication, such as by analyzing patterns of regional differences within countries
not previously studied. While Fuerst & Kirkegaard (2016a) included Argentina as
a country in their pan-American analysis, they did not conduct a regional analysis.
Argentina, though, is suitable for this type of analysis, given the reasonable
number of major administrative units (24, including the capital district), the size of
the population (about 43 million as of 2015), and the diversity in ancestral
backgrounds. The primary purpose of this study was to examine how CA,
socioeconomic outcomes, and indexes of ancestry relate to each other in
Argentina, on both the individual and administrative unit level. The predictions of
the specific model being tested are that European ancestry and indicators of this,
including skin brightness and self-identified race/ethnicity (SIRE), will be
positively associated with cognitive and socioeconomic outcomes. Noting a
dearth of data on genetic ancestry, Fuerst and Kirkegaard (2016b) suggested that
SIRE and skin brightness data from the Latin American Public Opinion Project
(LAPOP; http://www.vanderbilt.edu/lapop/) and related surveys could be used in
lieu of genetic data. Thus, a secondary aim of this study was to determine whether
LAPOP CA, S, SIRE, and skin brightness data can be used to test the predictions
of the familial model. In this regard, it is expected that the ancestry, CA, and S
indices constructed from the LAPOP surveys will correlate with ones constructed
from other sources.
1. Individual level analysis
We begin by presenting the individual-level analyses. This is because some
of the aggregate-level analyses rely on aggregated data from the individual-level
analyses. All analyses were carried out in R. The scripts can be found in the
supplementary materials. Data sources are discussed below.
1.1. Cognitive variable
While the LAPOP surveys contain no traditional IQ test items (e.g., Raven’s
figures or synonym questions), individuals are asked several geopolitical
knowledge questions, which can be used to create knowledge factor scores. The
questions are as follows:
gi1: What is the name of the president of the United States?
gi2: What is the name of the president of the Senate of Argentina?
gi3: How many provinces does Argentina have?
gi4: How long does the presidential term last in Argentina?
gix4: On which continent is Nigeria located?
gi5: What is the name of the president of Brazil?
The answers to some of the questions vary over time; e.g., the president of
the US was not the same person in 2008 (George Bush) as in 2014 (Barack
Obama). It is possible that this changed the difficulty and discriminative ability of
the items. Not all items were given in all surveys. Table 1 shows the item pass
rates by year.
Table 1. Item Pass Rates by Year (- = item not given that year).
Year   gi1    gi2    gi3    gi4    gix4   gi5    mean
2008   0.83   0.39   0.50   0.86   -      0.80   0.67
2010   0.80   -      0.58   0.75   -      -      0.71
2012   0.76   -      -      0.86   -      -      0.81
2014   0.77   -      -      0.78   0.47   -      0.67
Except for gi2 (.39) and gix4 (.47), all items had pass rates of .50 or higher,
which indicates that there was a lack of difficult items. Scoring such data so
that it is comparable across survey
years is riddled with difficulty. For instance, a simple method such as taking the
mean score for each participant is problematic, as someone who participated in
the 2012 survey had to answer two items with a mean pass rate of .81, while a
person who participated in the 2008 survey had to answer 5 items with a mean
pass rate of .67. Thus, a mean score of .50 would not represent the same level
of ability for the two different years. Also, due to the distribution of items across
years, some item pairs never appear together. The correlation matrix, calculated
using hetcor from the polycor package (Fox, 2016), is shown in Table 2.
Table 2. Item correlation matrix using latent correlations (- = item pair never
given together).
       gi1    gi3    gi5    gix4
gi1    1.00   0.53   0.65   0.60
gi2    0.64   0.62   0.59   -
gi3    0.53   1.00   0.47   -
gi4    0.55   0.48   0.60   0.42
gi5    0.65   0.47   1.00   -
gix4   0.60   -      -      1.00
The holes in the correlation matrix mean that one cannot factor analyze it.
One solution to this problem is to combine items to remove these holes. Notice in
Table 2 that items gi5 and gix4 never appear together, but gi5 appears together
with every other item. Because of this, we can combine these two items without
any data loss and thereby remove all the holes in the matrix. This approach is
not quite optimal, because it combines data from items with dissimilar pass rates
(.47 vs. .80) and somewhat different inter-item correlations (.65 vs. .60; .60
vs. .42). Combining the items results in a new correlation matrix, which is shown
in Table 3.
Table 3. Latent correlations after item combination.
       gi1    gi2    gi3    gi4    gi45
gi1    1.00   0.64   0.52   0.58   0.65
gi2    0.64   1.00   0.62   0.52   0.59
gi3    0.52   0.62   1.00   0.62   0.47
gi4    0.58   0.52   0.62   1.00   0.60
gi45   0.65   0.59   0.47   0.60   1.00
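The item-combination step can be illustrated with a minimal Python sketch (the analyses in this paper were carried out in R; the helper name combine_items and the two toy respondents below are hypothetical). Because gix4 and gi5 were never given to the same respondent, taking whichever response is present loses no data.

```python
def combine_items(record, first="gix4", second="gi5", new_name="gi45"):
    """Merge two items that are never answered together into one item.

    `record` maps item names to 1 (correct), 0 (incorrect) or None
    (item not given). Illustrative helper, not the authors' code.
    """
    a, b = record.get(first), record.get(second)
    if a is not None and b is not None:
        # The trick only works without data loss if the items never co-occur.
        raise ValueError("items co-occur; combining would discard a response")
    merged = {k: v for k, v in record.items() if k not in (first, second)}
    merged[new_name] = a if a is not None else b
    return merged


# Toy respondents: one from a survey with gi5, one from a survey with gix4.
resp_a = {"gi1": 1, "gi2": 0, "gi3": 1, "gi4": 1, "gix4": None, "gi5": 1}
resp_b = {"gi1": 1, "gi2": None, "gi3": None, "gi4": 0, "gix4": 0, "gi5": None}

combined = [combine_items(r) for r in (resp_a, resp_b)]
```

After this step every remaining item pair has at least some respondents in common, which is what allows the correlation matrix to be filled in.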
As expected, the correlations between the new item, gi45, and the other items are
similar to those for the original items (gix4 and gi5). The new dataset was
analyzed using item
response theory factor analysis. The 2-parameter normal (2PN) model was used.
In this model, item responses are modeled as being a function of item difficulties,
item discriminative abilities, and each person's latent ability score. There are both
more and less complicated models, but this one is the default used by the psych
package employed (Revelle, 2015). The advantage of this method is that it takes
into account the relative discriminative power (akin to factor loadings) and
difficulty (as indicated by pass rates) of items. For instance, a person answering
2 hard items correctly is given a higher score than a person answering 2 easy
items correctly (DeMars, 2010; Revelle, 2016). Figure 1 shows the discriminative
ability of items when the entire dataset is analyzed together. All items were useful
measures of latent CA, with gi2 being the best. The lack of items on the right side
means that the test has limited ability to discriminate between persons with above
average CA.
Figure 1. Discriminative ability of cognitive items.
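As a rough illustration of the 2PN model described above, the sketch below uses the generic textbook form P(correct) = Phi(a * (theta - b)), where a is the item's discriminative ability and b its difficulty. The exact parameterization and scoring method used by the psych package may differ; the grid-search estimator with a standard normal prior and the item parameters are assumptions chosen only to show why two hard items answered correctly imply a higher score than two easy ones.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, SD 1


def p_correct(theta, a, b):
    """2-parameter normal ogive: probability of a correct answer."""
    return _STD_NORMAL.cdf(a * (theta - b))


def map_ability(responses, items, grid=None):
    """Grid-search posterior mode of theta under a standard normal prior.

    `items` is a list of (a, b) pairs; `responses` holds 1/0 per item.
    """
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]
    best_theta, best_logpost = None, -math.inf
    for theta in grid:
        logpost = math.log(_STD_NORMAL.pdf(theta))
        for (a, b), x in zip(items, responses):
            p = min(max(p_correct(theta, a, b), 1e-12), 1 - 1e-12)
            logpost += math.log(p if x == 1 else 1 - p)
        if logpost > best_logpost:
            best_theta, best_logpost = theta, logpost
    return best_theta


# Hypothetical item parameters: two hard items (b = 1) vs. two easy items (b = -1).
theta_hard = map_ability([1, 1], [(1.0, 1.0), (1.0, 1.0)])
theta_easy = map_ability([1, 1], [(1.0, -1.0), (1.0, -1.0)])
```

With these assumed parameters, theta_hard comes out higher than theta_easy, which mirrors the scoring property described in the text.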
Combining individual scores across years was difficult because of the
uneven distribution of items. After trying a number of methods, we found that
analyzing the data within years, treating missing data as incorrect answers, and
standardizing the scores within survey years produced the best results. We
experimented with methods by using the S scores (discussed in the next section)
as the criterion variable and by seeing which method resulted in the highest
criterion correlations. The benefit of standardizing the data within survey years is
that each survey has the same mean and standard deviation despite the uneven
item distribution. Still, the combined score distribution had a long left tail
(skew: -.79). This was mostly due to the lack of harder items and to the extremely
easy item set used in the 2012 survey.
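The two scoring decisions described above (missing answers counted as incorrect, scores standardized within survey year) can be sketched as follows. This is a minimal illustration with made-up scores, not the supplementary R code; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def score_missing_as_incorrect(responses):
    """Recode None (no answer) as 0 so that it counts as an incorrect answer."""
    return [0 if r is None else r for r in responses]


def standardize_within_year(scores_by_year):
    """Z-score ability estimates separately within each survey year."""
    standardized = {}
    for year, scores in scores_by_year.items():
        m, s = mean(scores), stdev(scores)
        standardized[year] = [(x - m) / s for x in scores]
    return standardized


# Made-up ability estimates for two survey years with different item sets.
raw = {2008: [0.2, 0.8, 1.4, -0.4], 2012: [-1.0, 0.1, 0.1, 0.4]}
z = standardize_within_year(raw)
```

After standardization each year has mean 0 and standard deviation 1, so scores from years with easy and hard item sets are placed on a common relative scale.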
1.2. Socioeconomic variables
The LAPOP questionnaires asked numerous questions about persons'
socioeconomic outcomes. From those, a set of 16 variables was identified:
Whether they feel safe [Likert]
Life satisfaction [Likert]
Educational attainment [number of years]
Family income [continuous]
Victim of crime in the last 12 months [dichotomous]
Ownership or availability of:
TV [dichotomous]
Refrigerator [dichotomous]
Landline phone [dichotomous]
Cell phone [dichotomous]
Cars [number]
Washing machine [dichotomous]
Microwave [dichotomous]
Motorcycle [dichotomous]
Drinking water in the home [dichotomous]
Indoor bathroom [dichotomous]
Computer [dichotomous]
The variables covered a fairly broad spectrum of socioeconomic outcomes,
though some topics were not covered, such as crimes committed and health.
About 3% of cells contained missing data. These were imputed using the VIM
package without noise (single imputation; Templ et al., 2015). Since some of the
items were dichotomous (owns a TV, phone etc.), it is preferable to analyze them
using a correlation matrix composed of latent (for pairs with one or two
dichotomous indicators) and Pearson correlations (for pairs with two continuous
indicators). However, using latent correlations means that one cannot regress out
the effect of age/sex/survey year on the indicators, though one can do this for the
final composite score (S score). The downside to this is that if age/sex/survey
year have differential impacts on the indicators, this will not be adjusted for.
Because method variance may be large, we tried both methods, that is, using
regular factor analysis on the age/sex/survey year controlled indicators, and using
latent correlations + IRT on the discrete data, and then controlling the S scores
for age/sex/survey year.
To control for age/sex/survey year, we first fit a local regression (LOESS)
model for age to take non-linear effects into account. The residuals were then
regressed on sex and survey year in an OLS model. The residuals from the
second model were saved for further analysis. The reason to control for survey
year is that ownership of some items became more common over time (e.g.,
computer) for persons of the same socioeconomic level. Figure 2 shows the
loadings for the two analyses.
Figure 2. Socio-economic (S) factor loadings.
The loadings were stronger when using latent correlations. This is expected
because the use of Pearson correlations results in a downwards bias due to
dichotomization. The relative order of variables was quite similar, with a factor
congruence coefficient of .94 (Lorenzo-Seva & Ten Berge, 2006). Most loadings
were in the expected direction. However, feeling safe in one's residential area
and being a victim of crime had near-zero loadings. Based on the results above,
we decided to use the Pearson-based, age/sex/survey year corrected scores for
further analysis.
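The factor congruence coefficient used to compare the two sets of loadings is Tucker's coefficient, the sum of cross-products of the loadings divided by the square root of the product of their sums of squares. A minimal sketch follows; the loading vectors are invented for illustration and are not the values plotted in Figure 2.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings for the same four indicators under two methods.
pearson_loadings = [0.60, 0.45, 0.30, 0.05]
latent_loadings = [0.75, 0.60, 0.35, 0.02]
phi = congruence(pearson_loadings, latent_loadings)
```

Unlike a Pearson correlation, the coefficient is not centered on the mean loading, so it is sensitive to the overall sign and size pattern of the loadings and not only to their rank order.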
1.2.1. Robustness checks
Usually, factor analytic methods give near identical results, but in some rare
cases they diverge substantially. To check whether this was the case, we factor
analyzed the data using all possible combinations of method parameters (30
combinations) (Kirkegaard, 2016b). The median factor score correlation across
these was .999, indicating negligible method variance. A general factor extracted
from a dataset will be ‘flavored’ if the indicators in the sample are not
representative of all possible indicators (Carroll, 1993; Jensen, 1998). One can
measure the extent to which the indicator sampling impacts the scores by splitting
the indicators into two subsets at random, extracting the factors in each subset
and correlating the scores (Kirkegaard, 2016b). This was done 100 times. The
median intercorrelation was .66, indicating that it mattered to some degree which
indicators were used for the analysis, but not to the point where using the present
indicators would be indefensible. It is possible that the factor structure
substantially varies by year, perhaps due to changing social conditions. Thus, as
a robustness check, we analyzed the S factor structure within years as well. The
factor loadings are shown in Figure 3. As can be seen, there was very little
variation between years. The mean factor similarity coefficient was .98.
Figure 3. Socio-economic indicator loadings across years.
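The indicator sampling check described above (random split of the indicators into two halves, scores from each half correlated, repeated 100 times) can be sketched as follows. Two simplifications are assumed here and should not be read as the authors' method: a unit-weighted mean of z-scored indicators stands in for extracted factor scores, and the data are simulated with one common factor.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscore(column):
    n = len(column)
    m = sum(column) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in column) / (n - 1))
    return [(v - m) / sd for v in column]


def composite(columns):
    """Unit-weighted mean of z-scored indicators (stand-in for factor scores)."""
    z = [zscore(c) for c in columns]
    return [sum(vals) / len(vals) for vals in zip(*z)]


def split_half_correlations(columns, n_splits=100, seed=1):
    """Correlate composites from random halves of the indicator set."""
    rng = random.Random(seed)
    idx = list(range(len(columns)))
    half = len(idx) // 2
    results = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        first = [columns[i] for i in idx[:half]]
        second = [columns[i] for i in idx[half:]]
        results.append(pearson(composite(first), composite(second)))
    return results


# Simulated data: 8 indicators sharing one common factor (not LAPOP data).
rng = random.Random(0)
factor = [rng.gauss(0, 1) for _ in range(500)]
indicators = [[f + rng.gauss(0, 1) for f in factor] for _ in range(8)]
rs = sorted(split_half_correlations(indicators))
median_r = (rs[49] + rs[50]) / 2
```

A median split-half correlation well below 1 indicates that the scores depend to some degree on which indicators happen to be available.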
1.3. Demographic variables
The LAPOP questionnaire contained two relevant demographic variables:
SIRE: Blanca (White), Mestiza (Mestizo/Mixed White-Indigenous), Indígena
(Indigenous), Negra (Negro/Black), Mulata (Mulatto/Mixed White-Black), Otra
(Other).
Interviewer-rated skin brightness: 0-9 scale. (We reversed the scoring so that 9 =
lightest.)
The relationship between SIRE and skin brightness is shown in Table 4.
Despite the small sample sizes in some cases, the order was as one would
expect. Whites were the most light-skinned (7.93). Mestizos and Mulattoes fell
intermediate (6.73 and 6.29, respectively) along with “Others” (who presumably
are mostly admixed East Asians; 6.55). And finally, Indigenous (5.73) were
darker, followed by Blacks (4.73).
Table 4. Summary statistics for skin brightness by self-identified race/ethnicity
(SIRE) group.
SIRE           n      Mean   SD
Blanca         2845   7.93   1.26
Indigena       51     5.73   1.64
Mestiza        1246   6.73   1.48
Mulata         7      6.29   2.29
Negra          37     4.73   1.61
Otra           60     6.55   1.29
(not stated)   173    7.25   1.66
1.4. Relationship between CA and S
We expected there to be a strong relationship between cognitive ability (CA)
and socioeconomic status (S). This relationship should be stronger for the years
with higher quality CA data. The CA scores for 2008 were based on 5 items, while
those from 2012 were based on only 2 easy items. Figure 4 shows a scatterplot
of the relationships between CA and S by year. As can be seen, the regression
line was very similar across years. This does not necessarily mean that the
correlations were also similar, however. Table 5 shows the correlations between
the two variables by year. The size of the relationship is worth noting. Despite
being based on a 5-item test about political knowledge, the correlation between
CA and S still reached .52 in 2008. The correlations were lower for the other years
as expected due to the fewer items used, but even using 2 items resulted in a .33
correlation. We estimated test reliability from the IRT analyses and used this to
calculate the estimated correlation between CA and S (CA * S) if there was no
measurement error in CA. As expected, the reliabilities of the tests were fairly
low, especially for the 2-item version (.43). When corrected for the estimated
measurement error, the true score correlations were fairly strong, r’s .50 to .64.
The mean true score correlation (CA * S true) was .55, which was the same as
the estimate from the pooled data when that was corrected for measurement error
(based on the mean reliability).
Figure 4. Scatterplot of cognitive ability and socio-economic (S) factor. Total n =
3,902. Error bands = 95% analytic confidence intervals.
Table 5. Correlations between cognitive ability (CA) and socio-economic status
(S) by Year.
Year   CA * S   Reliability   CA * S true
2008   0.52     0.67          0.64
2010   0.44     0.69          0.53
2012   0.33     0.43          0.50
2014   0.48     0.82          0.53
all    0.44     0.65          0.55
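The correction for measurement error in Table 5 is consistent with the standard disattenuation formula, r_true = r_observed / sqrt(reliability_CA * reliability_S), with the reliability of S taken as 1 (as on the diagonal of Table 6). The sketch below applies that formula to the observed correlations and reliabilities from Table 5; the function name is hypothetical, and the paper's own R scripts may compute this differently.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y=1.0):
    """Correct a correlation for measurement error in one or both variables."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


# (observed CA * S correlation, CA reliability) copied from Table 5.
table5 = {
    "2008": (0.52, 0.67),
    "2010": (0.44, 0.69),
    "2012": (0.33, 0.43),
    "2014": (0.48, 0.82),
    "all": (0.44, 0.65),
}
corrected = {
    year: round(disattenuate(r, rel), 2) for year, (r, rel) in table5.items()
}
```

Rounded to two decimals, the results match the "CA * S true" column of Table 5.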
1.5. Relationship with demographic variables
As mentioned in the introduction, it is well established that SIRE and
measures of racial ancestry are associated with both CA and S. A proposed
conceptual path model of these relationships is shown in Figure 5.
Figure 5. Conceptual path model.
In this model, factors associated with BGA are the ultimate (uncaused) cause
of social inequality (S) through three proposed mechanisms:
1. Through cognitive ability (BGA→CA→S path).
2. Through other human capital traits that aren't cognitive ability such as
personality (BGA→S path).
3. Through racial phenotype-based social processes of favoritism and
discrimination (paths involving Social processes node).
While (1) and (2) are human capital models based on meritocracy, (3) is not.
It would be possible to disentangle the causal network if one had cognitive,
socioeconomic, racial phenotype, and genomic ancestry variables. However, the
present dataset lacks the fourth. As such, social processes models cannot be
disentangled from human capital ones. Nonetheless, SIRE-mediated social
process models, specifically, can be tested by including SIRE and racial
phenotype jointly as predictors of CA and S. If there are social processes related
to SIRE independent of racial phenotypes, SIRE should have incremental
predictive validity (assuming no measurement error in the other predictors;
Westfall & Yarkoni, 2016). If, on the other hand, SIRE is a non-causal proxy for
factors related to racial phenotype and genetic ancestry, SIRE should have none.
1.6. Relationship between skin brightness, SIRE, CA, and S
There are two main methods one can use to analyze the relative predictive
validity of skin brightness and SIRE. First: subsetting the data by SIRE and
examining the relationships within each subset. Second: including both in a
regression model. The correlations between the variables across all years are
shown in Table 6. As expected, skin brightness was weakly to moderately
associated with both CA and S, as had been found in other studies (Lynn, 2002).
This finding is consistent with both classes of models. If such relationships are
due to social processes related to SIRE, they should disappear when data are
analyzed within SIRE groups. Table 7 shows the within SIRE correlations.
Table 6. Correlations between cognitive ability, S, and skin brightness.
Correlations corrected for measurement error in cognitive ability are shown above
the diagonal, with reliabilities shown in the diagonal.
                  CA     S      Skin brightness
CA                0.65   0.55   0.20
S                 0.44   1.00   0.26
Skin brightness   0.16   0.26   1.00
Table 7. Correlations [with 95% confidence intervals] between cognitive ability
(CA), socio-economic status (S) and skin brightness (SB) within SIREs.
Correlations were corrected for measurement error in cognitive ability.
SIRE           SB * CA                SB * S                CA * S                 n
Blanca         0.21 [0.17 - 0.24]     0.20 [0.17 - 0.24]    0.55 [0.53 - 0.57]     3869
Indigena       -0.09 [-0.36 - 0.19]   0.34 [0.07 - 0.56]    0.58 [0.41 - 0.71]     77
Mestiza        0.12 [0.06 - 0.17]     0.22 [0.17 - 0.27]    0.52 [0.48 - 0.55]     1596
Mulata         -0.29 [-0.86 - 0.59]   0.65 [-0.21 - 0.94]   -0.21 [-0.65 - 0.34]   15
Negra          0.03 [-0.30 - 0.35]    0.09 [-0.24 - 0.41]   0.38 [0.12 - 0.59]     54
Otra           0.20 [-0.05 - 0.43]    0.49 [0.26 - 0.66]    0.47 [0.27 - 0.63]     73
(not stated)   0.22 [0.08 - 0.36]     0.35 [0.21 - 0.47]    0.56 [0.47 - 0.65]     236
Only three of the SIRE groups had sufficient data to allow for a reasonable
estimation of effect size. Within all three, the relationships between skin
brightness and the two outcomes were positive, as was the relationship between
CA and S. We furthermore calculated the weighted mean correlations across
SIRE groups. These were .18, .22 and .54 for SB * CA, SB * S, and CA * S,
respectively. Thus, these relationships were not due to confounding with SIRE or
any associated social processes. To establish the baseline validity of SIRE, a
model with only that predictor was fit. Secondly, to examine whether SIRE and
SIRE-related effects had incremental validity above skin brightness, a model was
fit with both predictors. Both models were fit with both CA and S as outcome
variables. Finally, a model was fit with S ~ SIRE + skin brightness + CA. In this
last scenario, skin brightness reflects social processing effects related to racial
phenotypes and non-cognitive ability human capital traits. Unfortunately, due to
the large measurement error in the CA variable, the results are not strongly
informative because measurement error in one predictor results in its validity
being spread to correlated predictors (Westfall & Yarkoni, 2016). Results are
shown in Tables 8-12. The R2 values reported, denoted as R2-cv, for the
regression models were obtained by 10-fold cross-validation to protect against
overfitting (James et al., 2013).
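The weighted mean correlations across SIRE groups reported above can be reproduced from Table 7 if each group is weighted by its n. The weighting by n is an assumption, since the text does not state the weights, but with it the rounded results agree with the reported .18, .22 and .54.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# (SB * CA, SB * S, CA * S, n) per SIRE group, copied from Table 7.
table7 = {
    "Blanca": (0.21, 0.20, 0.55, 3869),
    "Indigena": (-0.09, 0.34, 0.58, 77),
    "Mestiza": (0.12, 0.22, 0.52, 1596),
    "Mulata": (-0.29, 0.65, -0.21, 15),
    "Negra": (0.03, 0.09, 0.38, 54),
    "Otra": (0.20, 0.49, 0.47, 73),
    "(not stated)": (0.22, 0.35, 0.56, 236),
}
ns = [row[3] for row in table7.values()]
means = [
    round(weighted_mean([row[i] for row in table7.values()], ns), 2)
    for i in range(3)
]
```

Because the Blanca and Mestiza groups make up most of the sample, the small groups with wide confidence intervals have little influence on these means.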
Table 8. Regression model for cognitive ability with dummy-coded SIRE groups
as a predictor. Blanca is the omitted control. N = 5,920; R2 = .017; R2-cv = .014;
CI = 95% analytic confidence intervals.
Predictor          Beta    SE     CI lower   CI upper
SIRE: Indigena     -0.56   0.11   -0.78      -0.33
SIRE: Mestiza      -0.15   0.03   -0.21      -0.09
SIRE: Mulata       -0.66   0.26   -1.17      -0.16
SIRE: Negra        -0.68   0.14   -0.95      -0.42
SIRE: not stated   -0.38   0.07   -0.51      -0.25
SIRE: Otra         -0.34   0.12   -0.57      -0.11
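The cross-validated R2 (R2-cv) reported with the regression tables can be illustrated with a minimal k-fold sketch. Several definitions of cross-validated R2 exist; the one assumed here is 1 - SS_residual / SS_total computed over held-out predictions, and it may not be the definition used in the supplementary R scripts. The single-predictor model and the simulated data are likewise only for illustration.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r2_cv(x, y, k=10, seed=1):
    """Out-of-fold R2: each case is predicted by a model fit without it."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(x)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = intercept + slope * x[i]
    mean_y = sum(y) / len(y)
    ss_res = sum((b - p) ** 2 for b, p in zip(y, preds))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Simulated data with a weak true effect (not LAPOP data).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.25 * a + rng.gauss(0, 1) for a in x]
cv_r2 = r2_cv(x, y)
```

Because every prediction comes from a model that did not see the case being predicted, this estimate is typically a little lower than the in-sample R2, as in the table captions.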
Table 9. Regression model for cognitive ability with dummy-coded SIRE groups
and skin brightness as predictors. Blanca is the omitted control. N = 4,419; R2 =
.032; R2-cv = .027; CI = 95% analytic confidence intervals.
Predictor          Beta    SE     CI lower   CI upper
SIRE: Indigena     -0.27   0.14   -0.55      0.00
SIRE: Mestiza      0.03    0.04   -0.05      0.10
SIRE: Mulata       -0.57   0.37   -1.30      0.16
SIRE: Negra        -0.19   0.17   -0.52      0.14
SIRE: not stated   -0.29   0.08   -0.44      -0.14
SIRE: Otra         -0.24   0.13   -0.50      0.01
Skin brightness    0.15    0.02   0.12       0.18
By comparing Tables 8 and 9 it can be seen that SIRE alone has some
validity (6/6 CIs don't overlap 0), which corresponds to a combined standardized
beta of .13. (These equivalent effect sizes were calculated by converting the
regression into an analysis of variance, and then calculating eta squared for the
predictors. Eta squared corresponds to r2, and thus one can take the square root
to get a measure akin to the standardized beta.) However, when SIRE was
entered together with skin brightness, it had little detectable validity (5/6 CIs
overlap 0) with a combined effect size equivalent to a standardized beta of .08.
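The conversion described in the parenthesis above (eta squared from an analysis of variance, then the square root as a measure akin to a standardized beta) can be sketched as follows. The function names and the toy groups are hypothetical. For a model with SIRE as the only predictor, eta squared equals the model R2, so the reported R2 values of .017 (cognitive ability) and .043 (S) can be used as a check against the reported equivalents of .13 and .21.

```python
import math


def eta_squared(groups):
    """One-way ANOVA effect size: between-group SS divided by total SS."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def beta_equivalent(eta_sq):
    """Square root of eta squared, a measure akin to a standardized beta."""
    return math.sqrt(eta_sq)


# Toy example with two groups of three cases each.
toy_eta = eta_squared([[1, 2, 3], [4, 5, 6]])

# Check against the single-predictor models (eta squared = R2).
beta_ca = round(beta_equivalent(0.017), 2)
beta_s = round(beta_equivalent(0.043), 2)
```

For the models that also include skin brightness, eta squared for SIRE has to be computed from the ANOVA decomposition of the fitted model and cannot be read off R2 in this way.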
Table 10. Regression model for socio-economic status (S) with dummy-coded
SIRE groups as predictor. Blanca is the omitted control. N = 5,920; R2 = .043;
R2-cv = .040; CI = 95% analytic confidence intervals.
Predictor          Beta    SE     CI lower   CI upper
SIRE: Indigena     -0.71   0.11   -0.93      -0.49
SIRE: Mestiza      -0.39   0.03   -0.45      -0.34
SIRE: Mulata       -0.43   0.25   -0.93      0.06
SIRE: Negra        -0.74   0.13   -1.00      -0.47
SIRE: not stated   -0.45   0.07   -0.57      -0.32
SIRE: Otra         -0.45   0.12   -0.67      -0.22
Table 11. Regression model for socio-economic status (S) with dummy-coded
SIRE groups and skin brightness as predictors. Blanca is the omitted control. N
= 4,419; R2 = .074; R2-cv = .071; CI = 95% analytic confidence intervals.
Predictor          Beta    SE     CI lower   CI upper
SIRE: Indigena     -0.14   0.14   -0.41      0.13
SIRE: Mestiza      -0.13   0.04   -0.20      -0.06
SIRE: Mulata       0.15    0.36   -0.56      0.87
SIRE: Negra        -0.19   0.16   -0.51      0.13
SIRE: not stated   -0.19   0.08   -0.34      -0.04
SIRE: Otra         -0.23   0.13   -0.48      0.02
Skin brightness    0.24    0.02   0.21       0.27
Likewise, by comparing Tables 10 and 11 it can be seen that SIRE has some
validity for S (5/6 CIs don't overlap 0, equivalent to a std. beta of .21), but much
less when entered together with skin brightness (4/6 CIs overlap 0, equivalent
std. beta of .07). The two SIRE effects that kept some detectable validity had their
betas strongly reduced (by 67% and 58%, respectively for Mestiza and ‘not
stated’). Finally, in Table 12, it can be seen that the results hold when CA is also
entered as a predictor. A pure CA human capital model predicts the effect of skin
brightness should be zero when CA is controlled. However, as mentioned before,
the CA measure was poor, which results in imperfect control and thus some
residual validity of non-causal correlates is expected. In this case, the validity of
skin brightness was reduced by 25% (.24 to .18). As above, the CI for Mestiza
continued to exclude 0. Aggregating over the different SIREs, the effect size
attributable to SIRE, skin brightness and CA was equivalent to correlations of .06,
.16 and .38, respectively. Thus, whatever social processes related to SIRE exist,
their direct effect on social inequality is very small.
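The percentage reductions quoted in this section follow directly from the betas in Tables 10 to 12. A small worked check, with a hypothetical helper name:

```python
def percent_reduction(before, after):
    """Percentage by which the absolute size of a coefficient shrinks."""
    return 100 * (abs(before) - abs(after)) / abs(before)


# Betas copied from Tables 10, 11 and 12.
mestiza = round(percent_reduction(-0.39, -0.13))       # Table 10 vs. Table 11
not_stated = round(percent_reduction(-0.45, -0.19))    # Table 10 vs. Table 11
skin_brightness = round(percent_reduction(0.24, 0.18)) # Table 11 vs. Table 12
```

These evaluate to 67, 58 and 25, matching the figures given in the text.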
Table 12. Regression model for socio-economic status (S) with dummy-coded
SIRE groups, skin brightness and cognitive ability as predictors. Blanca is the
omitted control. N = 4,419; R2 = .220; R2-cv = .210; CI = 95% analytic confidence
intervals.
Predictor          Beta    SE     CI lower   CI upper
SIRE: Indigena     -0.03   0.13   -0.28      0.22
SIRE: Mestiza      -0.14   0.03   -0.20      -0.08
SIRE: Mulata       0.37    0.34   -0.28      1.03
SIRE: Negra        -0.11   0.15   -0.41      0.18
SIRE: not stated   -0.08   0.07   -0.21      0.06
SIRE: Otra         -0.13   0.12   -0.36      0.09
Skin brightness    0.18    0.01   0.15       0.21
CA                 0.39    0.01   0.36       0.41
1.7. Robustness check: Within-province relations between skin brightness and
outcomes
As a robustness check, we calculated the within-province correlations for skin
brightness, CA, and S. The correlations were somewhat smaller, with weighted
mean correlations of .14, .19 and .51 for CA * SB, S * SB, and CA * S, respectively
(corrected for measurement error). These reductions do not necessarily mean
that provincial effects were responsible because there was likely reduced
variation within provinces for the variables. This analysis was not a primary
interest and so the matter was not explored further, though more detailed results
can be found in the supplementary files.
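The two computations behind these figures can be sketched as follows. This is a minimal illustration, assuming the correction for measurement error is the usual Spearman disattenuation formula and that the weighted mean uses sample sizes as weights; the reliability value, the within-province correlations and the sample sizes below are hypothetical and are not taken from the study.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y=1.0):
    """Correct an observed correlation for unreliability in one or both measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


def weighted_mean(values, weights):
    """Weighted average, e.g. of within-province correlations."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical reliability of 0.64, chosen so that an observed r of .44
# is corrected to roughly .55.
r_corrected = disattenuate(0.44, reliability_x=0.64)

# Hypothetical within-province correlations, with sample sizes as weights.
within_province_r = [0.10, 0.20, 0.15]
province_n = [400, 100, 250]
mean_within_r = weighted_mean(within_province_r, province_n)

print(round(r_corrected, 2), round(mean_within_r, 2))
```

The supplementary files remain the authoritative record of how the within-province values were actually weighted and corrected.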
2. Province-level analyses
Argentina has 24 provinces. LAPOP reports the province for each person, so
it is possible to aggregate the individual-level data by province to form additional
province-level variables. In the following, as a weighting scheme, we used the
square root of the province’s population size unless otherwise noted. For a
statistical justification for this weighting scheme, see Fuerst and Kirkegaard
(2016a). Results based on other weighting schemes can be found in the
supplementary files.
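As a rough illustration of the weighting scheme, a weighted Pearson correlation with square-root-of-population weights could be computed as below. The function names and the example populations are assumptions made for this sketch; the authors' own R code is in the supplementary files.

```python
import math


def sqrt_population_weights(populations):
    """Weights proportional to the square root of each unit's population."""
    return [math.sqrt(p) for p in populations]


def weighted_corr(x, y, weights):
    """Weighted Pearson correlation between x and y."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y)) / total
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x)) / total
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Hypothetical populations for four units.
weights = sqrt_population_weights([100, 400, 900, 1600])
r_weighted = weighted_corr([1, 2, 3, 4], [1, 3, 2, 4], weights)
r_equal = weighted_corr([1, 2, 3, 4], [1, 3, 2, 4], [1, 1, 1, 1])
print(round(r_weighted, 3), round(r_equal, 3))
```

With equal weights the function reduces to the ordinary Pearson correlation, which makes it easy to compare weighted and unweighted results side by side.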
2.1. S factor analysis
As in previous studies (Fuerst & Kirkegaard, 2016a; Kirkegaard, 2016a), we
collected a broad set of socioeconomic variables concerning provinces from a
statistical agency and subjected them to factor analysis. This was done both to
see whether it was empirically possible to speak about provinces with generally
better or worse social outcomes (i.e., whether the S factor exists in the data), and
because it is convenient to analyze a single aggregate socioeconomic variable
instead of many (as with the Human Development Index).
2.1.1. Knoema data
Province-level socioeconomic data were copied from several sources. Most
data came from Knoema’s Argentina Regional Dataset, October 2013. See
datafiles for details. To validate the S-factor scores, Human Development Index
(HDI) scores were used. HDI scores for 1996 and 2011 were copied from
Argentina, P.N.U.D. (2013). Additional S data were used from the LAPOP
surveys.
Including many very similar or even identical variables in a factor analysis
tends to 'color' the general factor by these variables. To guard against this, an
algorithm was used that excluded variables from the dataset until no pair of
variables correlated at |r| > .9. This resulted in the exclusion of 2 variables.
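The text does not spell out the exclusion algorithm, so the following is only one plausible sketch: repeatedly find the most strongly correlated pair above the threshold and drop whichever member is, on average, more correlated with the remaining variables. The tie-breaking rule, the function names and the toy data are all assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_redundant(data, threshold=0.9):
    """Drop variables until no pair has |r| above the threshold.

    data maps variable names to lists of values. Returns (kept, dropped).
    """
    kept = dict(data)
    dropped = []
    while True:
        pairs = [(abs(pearson(kept[a], kept[b])), a, b) for a, b in combinations(kept, 2)]
        pairs = [p for p in pairs if p[0] > threshold]
        if not pairs:
            return kept, dropped

        def mean_abs_r(name):
            others = [o for o in kept if o != name]
            return sum(abs(pearson(kept[name], kept[o])) for o in others) / len(others)

        _, a, b = max(pairs)
        victim = a if mean_abs_r(a) >= mean_abs_r(b) else b
        dropped.append(victim)
        del kept[victim]


# Hypothetical indicators: two near-duplicates and one distinct variable.
toy = {
    "income": [1, 2, 3, 4, 5],
    "income_copy": [1.1, 2.0, 3.2, 3.9, 5.1],
    "crime": [3, 1, 4, 1, 5],
}
kept, dropped = prune_redundant(toy)
print(sorted(kept), dropped)
```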
Factor analytic method variance was analyzed by extracting a single factor
using 30 combinations of scoring and estimation methods. There was near-zero
method variance because the scores from these 30 methods had a mean
correlation of .99. Factor analytic indicator sampling error was examined using
the split-half method, see Kirkegaard (2016b). The analysis indicated only a small
amount of indicator sampling error (median absolute correlation between scores
= .85). The primary factor analyses were carried out using Bartlett's scoring
method, and combinations of weighted (by population size)/unweighted and
interval/rank-level data. The factor loadings are shown in Figure 6. As can be
seen, the pattern of loadings was robust across analyses (median factor similarity
congruence coefficient = .98). Scores were extracted from the weighted interval-
level analysis, as this was the theoretically superior method.
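For readers unfamiliar with the congruence coefficient used to compare loading patterns, a minimal sketch of Tucker's coefficient is given below. The loading vectors are hypothetical and are not the loadings shown in Figure 6.

```python
import math


def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    numerator = sum(a * b for a, b in zip(loadings_a, loadings_b))
    denominator = math.sqrt(
        sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b)
    )
    return numerator / denominator


# Hypothetical loadings of the same three indicators in two analyses.
weighted_loadings = [0.8, 0.7, -0.5]
unweighted_loadings = [0.4, 0.35, -0.25]
phi = congruence(weighted_loadings, unweighted_loadings)
print(round(phi, 3))
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so two patterns that are proportional to each other give a value of 1 regardless of their overall size.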
Figure 6. Factor loadings for the socio-economic (S) factor in the province-level
analysis. N = 24 provinces. Based on unweighted values.
2.1.2. LAPOP data.
There are two ways to utilize the LAPOP data to calculate province-level S
factor scores. First, one can factor analyze the data at the individual level, and
then average the scores within provinces. Second, one can average the indicator
scores within provinces, and then factor analyze the data at the province-level.
There is no guarantee that these methods will produce similar results (ecological
fallacy/Simpson's paradox; Kievit et al., 2013); however, previous studies have
found that they did (Kirkegaard, 2015b, 2016a). The second method has the
bonus that one can combine the indicators from it with those from Knoema to
produce a larger province-level dataset of indicators. Therefore, indicators were
averaged within province and the dataset factor analyzed. We also averaged the
indicators for other administrative divisions for comparison (districts and
municipalities), which will be discussed later. The factor structure was very similar
across levels with a median factor congruence coefficient of .97. In general,
loadings were stronger at the aggregate level.
2.1.3. Combined S factor
Given the lack of irregularities, a composite dataset was created from the
Knoema indicators and the LAPOP indicators. The loadings from this analysis are
shown in Figure 7. Note that because LAPOP had no data for the Santa Cruz
province, these data were imputed based on the data available from Knoema. As
before, there were no surprising loadings. Crime rate’s loading was positive as
has been found previously when aggregate data with large units were analyzed
(Kirkegaard, 2015a; Lynn, 1979). Curiously, though, at the same time crime
victimization was negative. As additional SES measures, we included Human
Development Index (HDI) scores for 1996 and 2011. We had previously found
that HDI and S factor scores typically correlated at >.90.
Figure 7. Socio-economic (S) factor loadings for the combined dataset analyzed
at the province level, N = 24 provinces. Based on unweighted values.
2.2. Cognitive data.
The analysis of the cognitive data is similar to that of the socioeconomic
status above in that we have a province-level dataset as well as the ability to
create aggregate-level datasets from the LAPOP data. We used two cognitive
data sources, Operativo Nacional de Evaluación (ONE) achievement scores and
LAPOP political knowledge scores.
As for the ONE scores, we used those from the math, reading, social science
and natural science tests administered in 2005, 2006, 2007 and 2010. The 2005
ONE data were based on 6th and 9th graders (primary and secondary school
students), while the 2007 and 2010 ONE data were based on 9th graders
(secondary school students). Results can be accessed at the Dirección Nacional
de Información y Estadística de la Educativa website:
http://portales.educacion.gov.ar/diniece/2014/05/22/evaluacion-de-la-calidad-educativa-documentos/.
The data provided are unfortunately not mean or median scores,
but proportions for three groups: low scoring, medium scoring, and high scoring.
These are non-linear transformations of the underlying continuous data (La Griffe
du Lion, 2001). We estimated a single best score for each unit by calculating the
Z-scores within each combination of grade, year, and subject (32 subdatasets in
total), for low and low+medium scoring proportions. The reason to use the
combined low and medium group instead of the medium group is that the
proportion of students in the medium group is not a good measure of mean level
of achievement. This is because a small medium group can mean either that there
were many students in the high scoring group or in the low, or in some
combination (bimodal). The reason not to use the high scoring group proportion
is that it is perfectly negatively correlated with the low+medium proportion and
thus redundant. Since it is worse to have more pupils in the lower categories, we
reversed the scores. Finally, we averaged these scores to produce a single best
estimate for each province across years, grades, and subjects. The median
correlation across subdatasets was .86.
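The scoring procedure described above can be sketched roughly as follows: within each subdataset, standardize the low and the low+medium proportions across provinces, reverse the sign, and average everything per province. The data layout, the use of the sample standard deviation and the toy proportions are assumptions of this sketch, not details confirmed by the text.

```python
import statistics


def zscores(values):
    """Standardize values across provinces (sample SD assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def achievement_scores(subdatasets):
    """Average reversed z-scores of the low and low+medium proportions.

    Each subdataset is a dict with 'low' and 'medium' lists of proportions,
    one entry per province, in the same province order throughout.
    """
    n_provinces = len(subdatasets[0]["low"])
    totals = [0.0] * n_provinces
    count = 0
    for sub in subdatasets:
        low = sub["low"]
        low_medium = [l + m for l, m in zip(low, sub["medium"])]
        for proportions in (low, low_medium):
            for i, z in enumerate(zscores(proportions)):
                totals[i] -= z  # reversed: fewer low scorers means a higher score
            count += 1
    return [t / count for t in totals]


# Two hypothetical subdatasets (grade x year x subject) for three provinces.
toy_subdatasets = [
    {"low": [0.2, 0.4, 0.6], "medium": [0.5, 0.4, 0.3]},
    {"low": [0.1, 0.3, 0.5], "medium": [0.6, 0.5, 0.4]},
]
scores = achievement_scores(toy_subdatasets)
print([round(s, 2) for s in scores])
```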
As with S scores, there are two ways of estimating mean scores for provinces
based on the LAPOP data. First, one can score the persons and then calculate
the mean score by province. Second, one can calculate the mean item score by
province, factor analyze the data at the province-level and then score the
provinces. In this case, the second option should be better because it gets around
the problem of having to score persons based on only 2 tests. (As before, we tried
both methods to estimate the method variance.)
2.3. Demographic variables
With the aid of a native Argentinean, 25 studies reporting provincial and
regional European, African and Amerindian biogeographic ancestry (BGA) were
located. Provincial ancestry percentages were calculated as detailed in
Supplementary File 1. When provinces had multiple data points, median values
were used. No data for either Tierra del Fuego or Entre Rios were available. For
these provinces, values were estimated based on the average values of the
bordering regions. Figure 8 shows the spatial map of European genetic ancestry.
It needs to be noted that these estimates should be viewed with caution, since
they are based on studies which often had small, unrepresentative samples, and
since estimates were often based on few markers; the latter point is relevant since
the number of markers used has been found to substantially influence admixture
estimates (Russo et al., 2016).
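The aggregation and imputation steps can be illustrated with a short sketch: take the median across studies for each province, then fill a missing province with the mean of its bordering units. The province labels, adjacency lists and ancestry values here are placeholders; the actual calculations are documented in Supplementary File 1.

```python
import statistics


def provincial_estimate(study_values):
    """Median of the ancestry estimates reported for one province."""
    return statistics.median(study_values)


def impute_from_neighbours(estimates, neighbours):
    """Fill missing provinces with the mean estimate of bordering provinces."""
    filled = dict(estimates)
    for province, adjacent in neighbours.items():
        if province not in estimates:
            known = [estimates[n] for n in adjacent if n in estimates]
            filled[province] = statistics.fmean(known)
    return filled


# Hypothetical European ancestry fractions from several studies.
studies = {"A": [0.55, 0.60, 0.70], "B": [0.80]}
estimates = {name: provincial_estimate(vals) for name, vals in studies.items()}

# Province "C" has no genetic data and borders A and B (hypothetical adjacency).
filled = impute_from_neighbours(estimates, {"C": ["A", "B"]})
print(filled)
```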
Figure 8. Map of European genetic ancestry.
It has been previously found that SIRE-based ancestry estimates correlate
strongly with genetic-based ones in the case of Brazilian states and all American
nations (Fuerst & Kirkegaard, 2016a). As such, SIRE-based provincial ancestry
estimates were computed using self-reported race/ethnicity (White, Black,
Indigenous, Mestizo, and Mulatto) following the same procedure as before
(Fuerst & Kirkegaard, 2016a). The formula was:
SIRE-based European = %White + ½ %Mulatto + ½ %Mestizo
The SIRE data came from the LAPOP surveys and were based on over 5,800
individuals, aged 16 to 89. No data were available for Santa Cruz, so n_European_SIRE
= 23.
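The formula above translates directly into code. In this sketch the category labels follow the Spanish SIRE labels used in Tables 11 and 12, and all respondents, including those with other or unstated SIRE, are kept in the denominator; both choices are assumptions, and the responses are invented for illustration.

```python
from collections import Counter

# Assumed European share per SIRE category, following the formula:
# %White + 1/2 %Mulatto + 1/2 %Mestizo. All other categories contribute 0.
EUROPEAN_SHARE = {"Blanca": 1.0, "Mulata": 0.5, "Mestiza": 0.5}


def sire_european_pct(responses):
    """SIRE-based European ancestry estimate (in percent) for one province."""
    counts = Counter(responses)
    weighted = sum(EUROPEAN_SHARE.get(label, 0.0) * n for label, n in counts.items())
    return 100.0 * weighted / len(responses)


# Hypothetical province with ten respondents.
toy_responses = ["Blanca"] * 6 + ["Mestiza"] * 3 + ["Indigena"]
estimate = sire_european_pct(toy_responses)
print(estimate)
```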
Fuerst & Kirkegaard (2016a) likewise found that skin reflectance strongly
correlated with national European ancestry estimates. Those findings were
replicated using national LAPOP interviewee color scores (results not shown).
Thus, provincial color estimates were also created using the LAPOP data. Again,
no data were available for Santa Cruz, so n_skin_brightness = 23.
2.4. Geographic variables
In line with Fuerst and Kirkegaard (2016a; 2016b), latitude and yearly mean
temperature (in Celsius) of provincial capitals were used as climatic variables. We
obtained these from the English language Wikipedia.
2.5. Analysis
The correlations between the demographic variables are shown in Table 13.
The correlation between European BGA and SIRE was strong, though lower than
found in the case of Brazil (r = .77; Fuerst & Kirkegaard, 2016b). The correlation
between skin reflectance and European BGA was surprisingly low. Both Tierra
del Fuego, for which we had no genetic data, and Santiago del Estero, which had
substantial African ancestry (and so low brightness scores relative to other
provinces, which are inhabited by mostly European and Amerindian hybrid
populations), were major outliers. The relatively low correlations suggest that our
ancestry indices are somewhat poor indicators of provincial admixture.
The primary variables of interest were correlated; results are shown in Table
14. For the LAPOP CA and S scores, we report aggregate-level factor analysis
scores. As expected, the correlations between cognitive ability (CA),
socioeconomic development (S), and the three indicators of European ancestry
were moderate to strong and all positive, though this was not the case for HDI
2011 and European BGA. Figures 9 to 11 show the relationships between CA
(CA_One), S (S_Comb), and European BGA, with values weighted by the square
root of the provincial population size.
Table 13. Validation correlations for European ancestry, N = 23 provinces. BGA
= biogeographic ancestry; SIRE = self-identified race/ethnicity. Unweighted
correlations above the diagonal, correlations weighted by square root of
population below.

| | Euro BGA | Euro SIRE | Skin reflectance |
|---|---|---|---|
| Euro BGA | | 0.69 | 0.22 |
| Euro SIRE | 0.67 | | 0.48 |
| Skin reflectance | 0.33 | 0.48 | |

Figure 9. Relationship between cognitive ability and general socio-economic
factor.
Table 14. Correlations between the primary variables at the province level. CA,
cognitive ability; S, socio-economic development; HDI, Human Development
Index; BGA, biogeographic ancestry; SIRE, self-identified race/ethnicity. N = 23-24
provinces. Unweighted correlations below the diagonal, correlations weighted
by square root of population above.
Figure 10. Relationship between European biogeographic ancestry and cognitive
ability.
Figure 11. Relationship between European biogeographic ancestry and general
socio-economic factor.
2.6. OLS regression
To examine how multiple predictors work together to predict CA and S, we
used OLS regression as was done in previous studies (Fuerst & Kirkegaard,
2016a). For predicting CA, we included variables that plausibly could be taken to
be causally prior to CA. We included temperature but not latitude, both
because these predictors are highly correlated (r = -.93, weighted), which leads to
very large standard errors when both variables are included, and because if
latitude has any causal effect, it is likely mediated by a geoclimatic factor such as
temperature. Results are shown in Table 15. The model was extremely overfitted
as indicated by the negative cross-validated R2. This is likely because of the small
sample size and the use of weights. However, the results without weights were
effectively the same (European = 0.48; temperature = -0.53). Despite this,
European BGA did predict higher CA at above chance levels.
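A negative cross-validated R2 can look odd next to a positive in-sample R2, so a small sketch may help. It uses leave-one-out cross-validation with a single predictor and no weights, which is an assumption for illustration only; the authors' cross-validation scheme is not described here, and the data are invented.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(y, predictions):
    """1 - SS_res / SS_tot; negative when predictions are worse than the mean."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - p) ** 2 for b, p in zip(y, predictions))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def in_sample_r2(x, y):
    intercept, slope = fit_line(x, y)
    return r_squared(y, [intercept + slope * a for a in x])


def loo_cv_r2(x, y):
    """Leave-one-out R2: each case is predicted from a model fit without it."""
    predictions = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(intercept + slope * x[i])
    return r_squared(y, predictions)


# Hypothetical tiny dataset where the fitted trend does not generalize.
x_toy = [1, 2, 3, 4]
y_toy = [1, -1, 1, -1]
print(round(in_sample_r2(x_toy, y_toy), 2), round(loo_cv_r2(x_toy, y_toy), 2))
```

In this toy case the in-sample fit explains some variance, but the held-out predictions are worse than simply predicting the mean, which is what a negative cross-validated R2 indicates.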
Table 15. OLS full model results predicting cognitive ability. N = 24 provinces,
weighted by square root of population size. R2 = .43, R2-cv = -2.3.

| Predictor | Beta | SE | CI lower | CI upper |
|---|---|---|---|---|
| European BGA | 0.50 | 0.17 | 0.15 | 0.85 |
| Temperature | -0.60 | 0.22 | -1.06 | -0.13 |

In the OLS regression for socioeconomic status (S), as above we entered
the predictor variables that plausibly could be taken to be causally prior to S.
Results are shown in Table 16. Despite the inclusion of cognitive ability (CA),
which had a sizable beta, European biogeographic ancestry kept much of its zero-
order validity. Temperature also had a strong negative beta.
Table 16. OLS full model results for S (general socioeconomic factor). N = 24
provinces, weighted by square root of population size. R2 = .84, R2-cv = .40.

| Predictor | Beta | SE | CI lower | CI upper |
|---|---|---|---|---|
| CA | 0.65 | 0.12 | 0.41 | 0.90 |
| European BGA | 0.24 | 0.11 | 0.01 | 0.47 |
| Temperature | -0.39 | 0.14 | -0.68 | -0.09 |

OLS regression tends to overfit models when too many predictors are used;
this makes the results difficult to interpret (James et al., 2013). To overcome this,
penalized regression methods such as LASSO regression were developed.
These methods systematically shrink the betas of the predictors towards 0 and
use within-sample cross-validation to select an optimal amount of shrinkage. The
result is that the models contain fewer predictors and perform better in out-of-
sample cross-validation tests. In line with some previous studies (Fuerst &
Kirkegaard, 2016a; Kirkegaard, 2016a), we employed LASSO regression to
estimate the best predictive model for the present dataset. As LASSO is a
conservative method, null results cannot be taken as strong evidence of no
validity unless sample sizes are large. LASSO regression for CA using the same
predictors as above indicated that no predictor was reliable in the present dataset
(500 runs). We repeated the above analyses for S. Table 17 shows the summary
statistics from the LASSO regressions with S as the dependent variable.
Table 17. LASSO regression betas summary statistics with S (general
socioeconomic factor) as the dependent variable. CA, cognitive ability; BGA,
biogeographic ancestry. N = 24 provinces, weighted by square root of population
size. 500 runs.

| Statistic | CA | European BGA | Temperature |
|---|---|---|---|
| Mean | 0.64 | 0.13 | -0.24 |
| Median | 0.64 | 0.14 | -0.25 |
| SD | 0.01 | 0.05 | 0.06 |
| MAD | 0.00 | 0.04 | 0.05 |
| Fraction zero | 0.00 | 0.01 | 0.00 |

We see that CA, European BGA, and temperature were useful predictors
across (nearly) all runs. In this case, the LASSO essentially confirmed the findings
from OLS. In the best model both CA and European BGA had moderate to large
positive betas while mean temperature was strongly negative.
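To make the shrinkage idea concrete, here is a minimal LASSO sketch solved by coordinate descent with a fixed penalty. It is not the authors' procedure: the solver is a choice made for this sketch, the penalty is not selected by cross-validation, no weights are used, the 500 repeated runs are not reproduced, and the data are invented.

```python
def soft_threshold(value, penalty):
    """Shrink value towards 0 by penalty, setting it to exactly 0 if small."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(columns, y, penalty, sweeps=200):
    """Minimize (1/2n) * sum(residual^2) + penalty * sum(|beta|).

    columns is a list of centered predictor columns; y is centered.
    """
    n = len(y)
    betas = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Residual of y after removing every predictor except j.
            partial = [
                y[i] - sum(betas[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(c * r for c, r in zip(col, partial)) / n
            scale = sum(c * c for c in col) / n
            betas[j] = soft_threshold(rho, penalty) / scale
    return betas


# Hypothetical data: y depends strongly on x1 and only weakly on x2.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
y_toy = [2.1, -1.9, 1.9, -2.1]

betas_ols = lasso([x1, x2], y_toy, penalty=0.0)
betas_lasso = lasso([x1, x2], y_toy, penalty=0.5)
print([round(b, 3) for b in betas_ols], [round(b, 3) for b in betas_lasso])
```

With the penalty switched on, the strong predictor is shrunk and the weak one is set to exactly zero, which is the behavior summarized by the "Fraction zero" row in Table 17.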
2.7. Path analysis
Regression methods rest on the assumption that all predictors are causally
independent and each causes or at least correlates with some cause of the
outcome variable. This is rarely the case, especially in analyses of sociological
data. Instead, one can posit explicit causal models using path analysis. Using this
method, one can estimate both the direct and indirect effects of variables given a
particular assumed model. We created a model incorporating temperature,
European ancestry, CA, and S. While a climatological variable cannot cause
someone to become more genetically European during their lifetime, it can lead
to migration patterns that do affect the average European ancestry of the
provinces; for this reason, this variable was modeled as being causally prior to
European BGA. The results are shown in Figure 12. The results show that the
estimated relationships between European ancestry and CA/S hold when taking
into account a more complicated causal network.
Figure 12. Path model for European biogeographic ancestry (BGA), cognitive
ability (CA), socioeconomic outcomes (S), and temperature. N = 24 provinces,
weighted by square root of population size.
2.8. Robustness check
Owing to the unexpectedly low provincial-level correlations between skin
brightness and European ancestry (r = .33, weighted), we reran the path analysis
with European SIRE and skin brightness as independents instead of European
BGA. As expected, based on the correlations reported in Table 14, these
indicators of ancestry showed robust associations with outcomes. That is, using
different indicators of European ancestry did not substantially change the results.
Figure 13 shows the path diagram with skin brightness as a predictor. Latitude is
used instead of temperature as it is more plausibly causally related to brightness.
Figure 13. Path model for skin brightness, cognitive ability (CA), socioeconomic
outcomes (S), and Latitude.
2.9. Taking into account all ancestries
In the previous sections, we looked at the association between European
ancestry and outcomes. However, it is possible that combining two ancestries in
multiple regression would improve the predictive power. Below, we present
standardized betas for all three components. It is not possible to insert all three
at once into a regression model, since the three ancestry values add up to 1
(perfect multicollinearity). As such, the betas for two at a time are presented. We
have retained models with one predictor for comparison. We also report adjusted
R as a measure of model fit. We caution that standardized betas, especially when
weighted in multiple regression models, are not as easy to interpret as
correlations. Results are shown in Tables 18 and 19. Generally, both African and
Amerindian ancestry were associated with worse provincial outcomes. As the
variance in African ancestry was low, those results need to be read with caution.
Table 18. Multiple regression results for cognitive ability in Argentina. Each row
represents one model. N = 24 provinces, weighted by square root of population
size.
African%
Amerindian%
European%
adj. R
-0.46
.38
-0.40
.34
-0.42
0.49
.44
-0.31
-0.36
0.38
.48
1.43
.48
0.99
.48
Table 19. Multiple regression results for socioeconomic development (S) in
Argentina. Each row represents one model. N = 24 provinces, weighted by square
root of population size.

| African% | Amerindian% | European% | adj. R |
|---|---|---|---|
| -0.38 | | | .29 |
| | -0.49 | | .43 |
| | | 0.55 | .50 |
| -0.33 | -0.46 | | .50 |
| -0.19 | | 0.48 | .50 |
| | 0.62 | 1.14 | .50 |

3. Municipal and district-level analyses
Argentina has multiple levels of administrative divisions: the first-level
administrative divisions are the 24 provinces, below which are municipalities (n =
376), below which are departments and districts. We do not have extensive data
for municipalities and districts, but with the LAPOP data it is possible to aggregate
the scores to these levels. Since not all units at this level were sampled, we do
not have complete coverage of the country. Furthermore, since the aggregate-
level factor analyses seemed to generate somewhat more reliable scores, we
used this scoring method. We included the two demographical variables which
were calculated as before and added %White, which refers to the percentage of
individuals who identified as “White”. We weighted by the square root of the
LAPOP unit sample size to take into account the uneven distribution of cases by
administrative units. Results are shown in Tables 20 and 21.
Table 20. Correlations for municipalities. S, socio-economic development; CA,
cognitive ability; SIRE, self-identified race/ethnicity. N = 299, weighted by the
square root of the LAPOP municipal sample size.

| | S | CA | Euro SIRE | White SIRE | Skin bright. |
|---|---|---|---|---|---|
| S | 1.00 | | | | |
| CA | 0.66 | 1.00 | | | |
| Euro SIRE | 0.33 | 0.20 | 1.00 | | |
| White SIRE | 0.33 | 0.19 | 0.90 | 1.00 | |
| Skin brightness | 0.53 | 0.46 | 0.45 | 0.46 | 1.00 |

Table 21. Correlations for districts. S, socio-economic development; CA,
cognitive ability; SIRE, self-identified race/ethnicity. N = 437, weighted by the
square root of the LAPOP district sample size.

| | S | CA | Euro SIRE | White SIRE | Skin bright. |
|---|---|---|---|---|---|
| S | 1.00 | | | | |
| CA | 0.52 | 1.00 | | | |
| Euro SIRE | 0.28 | 0.14 | 1.00 | | |
| White SIRE | 0.32 | 0.13 | 0.89 | 1.00 | |
| Skin brightness | 0.43 | 0.28 | 0.44 | 0.48 | 1.00 |

In general, the lower-level results replicated those seen for provinces
although with some decrease in effect sizes. This presumably is due to the
decreased reliability of the measurements. Many of the samples were quite small,
being based only on a few persons.
4. Discussion and Conclusion
As in previous studies, for aggregate-level data we found strong, positive
relationships between cognitive ability (CA), the general factor of socioeconomic
outcomes (S), and measures of European ancestry. The general pattern of results
held for three different levels of aggregate analysis: provinces, municipalities, and
districts (n = 24, n = 299, n = 437). The provincial relationships generally
replicated in more complex analyses. However, while the mediation model for
European BGA, CA, and S has generally been supported by previous analyses
(Fuerst & Kirkegaard, 2016b), in the present study, much of the relationship
between European BGA and S was not mediated by CA, despite the fair quality
of cognitive data. As suggested below, this could be due to the relative crudeness
of both the BGA and CA indicators and to the fact that our CA measure was of
scholastic achievement, which is likely influenced by regional SES (thus, reverse
causality). We will have to wait for better data to become available before
investigating this issue.
We attempted to estimate European BGA at the municipal and district level
using weighted SIREs. Our estimates only produced weak to moderate
correlations with CA/S. Some of the weakness of the relationship was likely due
to sampling error, and to measurement error for CA. Another possibility is that
there is regional variation in the BGA proportions of the same SIRE groups,
making SIRE an unreliable predictor of BGA at these levels of analysis. For
instance, in Brazil we found that Whites (Branco) in Ceará (northeast) had a mean
European BGA of 65% while Whites in Minas Gerais (southeast) had a mean of
89%. Unfortunately, the BGA data for Argentina were very sparse and so we were
unable to examine this possibility.
At the individual level, we observed correlations between skin brightness,
CA, and S, just as have been found in many previous studies (e.g., Lynn, 2002).
These relationships were not due to SIRE-related social processing effects such
as discrimination. We could not investigate whether the observed relationships
involving skin brightness were due to racial phenotype-based social processes
with this dataset. To determine this, one must have genomic ancestry data or use
a between-sibling design (Dalliard, 2014). In general, the results are congruent
with models that involve robust links between persons' ancestry and outcomes,
and they are congruent with previous studies (Fuerst & Kirkegaard, 2016a,
2016b; Kirkegaard et al., 2017).
Interestingly, the correlation between European BGA and skin brightness
was weak, despite both European BGA and skin brightness being robustly related
to outcomes and despite our national-level analysis showing a strong relationship
between European BGA and skin brightness (using the same methods and, for
skin brightness, the same dataset). The reason is unclear. The admixture estimates
were based on unrepresentative samples; additionally, the estimates were likely
somewhat unreliable owing to the number of molecular markers used. The
LAPOP surveys similarly were not representative on the provincial level;
moreover, interviewer assessed skin color is likely an unreliable measure of true
color. As such, both indexes are likely very imperfect indexes of “true” provincial
BGA; thus, it might be expected that the correlations between the ancestry
indexes would only be modest.
One possible reason why the correlations were lower than expected is that in
Argentina the source populations were predominately Europeans and
Amerindians and thus the color contrast is not high, as in cases where Africans
were a major source; in Argentina, African admixture is relatively low (unweighted
African BGA: provincial mean = 6%, provincial range = 1-17%). An alternative possibility is
that one or the other measure is a poor index of actual provincial BGA. Skin
brightness is substantially influenced by environmental factors (e.g., exposure to
the sun), but also by genetic variance within continental races. As to this latter
point, among indigenous Argentinians, as among Europeans, there is an
evolutionary pigmentary cline which runs north to south (Chaplin, 2004). The non-
random settlement of different European groups (e.g., south versus central/north)
in Argentina would have increased the variance unrelated to continental level
BGA.
Regardless, it is unlikely that this intra-continental BGA related variance
could account for much of the provincial level associations between skin
brightness and outcomes, which holds when including latitude, European BGA,
and African BGA in path models (analyses not shown). On the provincial level,
skin brightness is perhaps better thought to be indexing sociological factors (e.g.,
sun exposure related to occupation). It is possible that the same holds on the
individual level, within a given province and given SIRE group. Given this
concern, future studies should ideally use data with individual-level BGA, instead
of relying on ancestry proxies such as SIRE and color.
5. Limitations
5.1. Individual analyses
First, the cognitive ability data were very limited, consisting of just a few items.
5/6 of the items concerned political knowledge (the last concerned geography),
so there is likely a strong coloring of any general factor. In some analyses, we
corrected for the estimated measurement error using the standard formula.
However, such corrections are not easy to do for multivariate analyses and so
were not done for these. Second, although the socioeconomic data were diverse,
they were not entirely satisfactory. There was a lack of crime and health indicators
and an abundance of property ownership indicators. This is likely to have colored
the general factor to some degree, a claim supported by the moderate indicator
sampling reliability observed. Third, we lacked individual level BGA data and so
had to rely on skin brightness and SIRE as ancestry proxies.
5.2. Aggregate level analyses
First, there were only 24 units in the province-level analyses. Small data
analyses tend to give unreliable results and important associations may go
unnoticed due to lack of power/precision. For this reason, it is important to compile
larger datasets, as done in Fuerst and Kirkegaard (2016a). Second, the quality of
the estimates of BGA was moderate. There are likely to be fairly large estimation
errors for multiple provinces that cause unknown errors in the results. Third, the
study relied on scholastic test data instead of traditional cognitive ability (IQ test-
based) data. Since scholastic ability is presumably more influenced by
socioeconomic variables such as the quality of schooling (Branigan, McCallum &
Freese, 2013), this limits the causal interpretations of the results, as there may
be causation back and forth between scholastic ability and the socioeconomic
outcomes. It is not possible to examine this issue with the present data. Fourth,
due to the moderate quality of the predictors, in particular the European BGA
variable, there is a high probability that some true validity of the causal variables
was 'displaced' to other causal or non-causal predictors in the models. This is a
direct statistical implication when one has correlated predictors with
measurement error (Westfall & Yarkoni, 2016). It is likely that higher quality
ancestry estimates will become available in the next couple of years, which can
be used for an updated study. Fifth, the present dataset is cross-sectional. This
makes it difficult to control for fixed-effects of provinces (e.g., latitude/average
temperature) that are correlated with changing circumstances of the provinces
(e.g., demographics). A better approach is to use a longitudinal design to control
for fixed effects as done by Deryugina and Hsiang (2014) and Fulford et al. (2016).
Unfortunately, such datasets are hard to come by.
Supplementary material and acknowledgments
The R code, full datasets and high quality figures can be found at
https://osf.io/etuy8/. Rpub can be found at
http://rpubs.com/EmilOWK/Argentina_admixture
We thank the Latin American Public Opinion Project (LAPOP) and its major
supporters (the United States Agency for International Development, the Inter-
American Development Bank, and Vanderbilt University) for making the data
publicly available.
References
Branigan, A.R., McCallum, K.J. & Freese, J. (2013). Variation in the heritability of
educational attainment: An international meta-analysis. Social Forces 92: 109-140.
Carl, N. (2016). IQ and socio-economic development across local authorities of the UK.
Intelligence 55: 90-94.
Carroll, J.B. (1993). Human Cognitive Abilities: A Survey of Factor-Analytic Studies.
Cambridge University Press.
Chaplin, G. (2004). Geographic distribution of environmental factors influencing human
skin coloration. American Journal of Physical Anthropology 125: 292-302.
Dalliard, M. (2014). The elusive X-factor: A critique of J. M. Kaplan’s model of race and
IQ. Open Differential Psychology.
DeMars, C. (2010). Item Response Theory. Oxford, New York: Oxford University Press.
Deryugina, T. & Hsiang, S. (2014). Does the environment still matter? Daily temperature
and income in the United States. NBER Working Paper 20750.
Easterly, W. & Levine, R. (2016). The European origins of economic development.
Journal of Economic Growth 21: 225-257.
Eppig, C., Fincher, C.L. & Thornhill, R. (2010). Parasite prevalence and the worldwide
distribution of cognitive ability. Proceedings of the Royal Society of London B: Biological
Sciences 277(1701): 3801-3808.
Fox, J. (2016). polycor: Polychoric and polyserial correlations (Version 0.7-9). Retrieved
from https://cran.r-project.org/web/packages/polycor/index.html
Fuerst, J. & Kirkegaard, E.O.W. (2016a). Admixture in the Americas: Regional and
national differences. Mankind Quarterly 56: 255-373.
Fuerst, J. & Kirkegaard, E.O.W. (2016b). The genealogy of differences in the Americas.
Mankind Quarterly 56: 425-481.
Fulford, S.L., Petkov, I. & Schiantarelli, F. (2016). Does it matter where you came from?
Ancestry composition and economic performance of U.S. counties, 1850-2010. SSRN
Scholarly Paper 2608567. Retrieved from http://papers.ssrn.com/abstract=2608567
Herrnstein, R.J. & Murray, C.A. (1994). The Bell Curve: Intelligence and Class Structure
in American Life. New York: Free Press.
James, G., Witten, D., Hastie, T. & Tibshirani, R. (eds.) (2013). An introduction to
statistical learning: with applications in R. New York: Springer.
Jensen, A.R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kievit, R., Frankenhuis, W.E., Waldorp, L. & Borsboom, D. (2013). Simpson’s paradox in
psychological science: a practical guide. Frontiers in Psychology 4: 513.
Kirkegaard, E.O.W. (2014). The international general socioeconomic factor: Factor
analyzing international rankings. Open Differential Psychology.
Kirkegaard, E.O.W. (2015a). IQ and socioeconomic development across regions of the
UK: A reanalysis. The Winnower. Retrieved from
https://thewinnower.com/papers/1419-iq-and-socioeconomic-development-across-regions-of-the-uk-a-reanalysis
Kirkegaard, E.O.W. (2015b, September 23). The general religious factor among Muslims:
A multi-level factor analysis. Retrieved from http://emilkirkegaard.dk/en/?p=5485
Kirkegaard, E.O.W. (2016a). Inequality across US counties: An S factor analysis. Open
Quantitative Sociology & Political Science.
Kirkegaard, E.O.W. (2016b). Some new methods for exploratory factor analysis of
socioeconomic data. Open Quantitative Sociology & Political Science.
Kirkegaard, E.O.W., Wang, M.R. & Fuerst, J. (2017). Biogeographic ancestry and
socioeconomic outcomes in the Americas: A meta-analysis of American studies. Mankind
Quarterly 57(3): 398-427.
Knoema. (2013). Argentina regional dataset, October 2013. Retrieved from
https://knoema.com/aboybld/argentina-regional-dataset-october-2013?region=1000250-binacional
La Griffe du Lion (2001, July). Pearbotham’s Law on the persistence of achievement
gaps. Retrieved August 16, 2015, from http://www.lagriffedulion.f2s.com/adverse.htm
León, F.R. & Burga-León, A. (2015). How geography influences complex cognitive ability.
Intelligence 50: 221-227.
Lorenzo-Seva, U. & Ten Berge, J.M. (2006). Tucker’s congruence coefficient as a
meaningful index of factor similarity. Methodology 2(2): 57-64.
Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of
Social and Clinical Psychology 18: 1-12.
https://doi.org/10.1111/j.2044-8260.1979.tb00297.x
Lynn, R. (2002). Skin color and intelligence in African Americans. Population and
Environment 23: 365-375.
Lynn, R. (2008). The Global Bell Curve: Race, IQ, and Inequality Worldwide. Augusta,
GA: Washington Summit.
Lynn, R. (2015). Race Differences in Intelligence, revised edition. Augusta, GA:
Washington Summit.
Lynn, R. & Vanhanen, T. (2012). Intelligence: A Unifying Construct for the Social
Sciences. London: Ulster Institute for Social Research.
Programa de las Naciones Unidas para el Desarrollo. (2013). Informe nacional sobre
desarrollo humano 2013. Argentina en un mundo incierto: Asegurar el desarrollo humano
en el siglo XXI. Buenos Aires: Programa de las Naciones Unidas para el Desarrollo.
Retrieved from http://hdr.undp.org/sites/default/files/pnudindh2013.pdf
Putterman, L. & Weil, D.N. (2010). Post-1500 population flows and the long-run
determinants of economic growth and inequality. Quarterly Journal of Economics 125:
1627-1682.
Rangel, M.A. (2014). Is parental love colorblind? Human capital accumulation within
mixed families. Review of Black Political Economy 42: 57-86.
Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality
Research (Version 1.5.4). Retrieved from
http://cran.r-project.org/web/packages/psych/index.html
Revelle, W. (2016). An Introduction to Psychometric Theory with Applications in R.
Retrieved from http://www.personality-project.org/r/book/
Russo, M.G., Di Fabio Rocca, F., Doldán, P., Cardozo, D.G., Dejean, C.B., Seldes, V. &
Avena, S.A. (2016). Evaluación del número mínimo de marcadores para estimar
ancestría individual en una muestra de la población argentina. Revista del Museo de
Antropología 9(1): 49-56.
Spolaore, E. & Wacziarg, R. (2013). How deep are the roots of economic development?
Journal of Economic Literature 51: 325-369.
Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of
longitudinal research. Intelligence 35: 401-426.
Telles, E.E. (2014). Pigmentocracies: Ethnicity, Race, and Color in Latin America, 1st
edition. Chapel Hill, NC: University of North Carolina Press.
Templ, M., Alfons, A., Kowarik, A. & Prantner, B. (2015, February 19). VIM: Visualization
and imputation of missing values. CRAN. Retrieved from
http://cran.r-project.org/web/packages/VIM/index.html
Van de Vliert, E. (2013). Climato-economic habitats support patterns of human needs,
stresses, and freedoms. Behavioral and Brain Sciences 36: 465-480.
Westfall, J. & Yarkoni, T. (2016). Statistically controlling for confounding constructs is
harder than you think. PLoS ONE 11(3): e0152719.
Papers for the admixture estimates
Alfaro, E.L., Dipierri, J.E., Gutierrez, N.I. & Vullo, C.M. (2005). Genetic structure and
admixture in urban populations of the Argentine North-West. Annals of Human Biology
32(6): 724-737.
Avena, S.A., Goicoechea, A.S., Bartomioli, M., Fernández, V., Cabrera, A., Dugoujon,
J.M., ... & Carnese, F.R. (2007). Mestizaje en el sur de la región pampeana (Argentina).
Revista Argentina de Antropología Biológica 9(2): 59-76.
Avena, S.A., Goicoechea, A.S., Rey, J., Dugoujon, J.M., Dejean, C.B. & Carnese, F.R.
(2006). Mezcla génica en una muestra poblacional de la ciudad de Buenos Aires.
Medicina (Buenos Aires) 66(2): 113-118.
Avena, S., Via, M., Ziv, E., Pérez-Stable, E.J., Gignoux, C.R., Dejean, C., ... & Beckman,
K. (2012). Heterogeneity in genetic admixture across different regions of Argentina. PLoS
ONE 7(4): e34695.
Avena, S.A., Parolín, M.L., Dejean, C.B., Ríos Part, M.C., Fabrykant, G., Goicoechea,
A.S., ... & Carnese, F.R. (2009). Mezcla génica y linajes uniparentales en Comodoro
Rivadavia (provincia de Chubut, Argentina). Revista Argentina de Antropología Biológica
9(1).
Avena, S.A., Parolin, M.L., Boquet, M., Dejean, C.B., Postillone, M.B., Alvarez Trentini,
Y., ... & Carnese, F.R. (2010). Mezcla génica y linajes uniparentales en Esquel (Pcia. de
Chubut): Su comparación con otras muestras poblacionales argentinas. BAG. Journal of
Basic and Applied Genetics 21(1): 1-14.
Bobillo, C., Navoni, J.A., Olmos, V., Merini, L.J., Lepori, E.V. & Corach, D. (2014). Ethnic
characterization of a population of children exposed to high doses of arsenic via drinking
water and a possible correlation with metabolic processes. International Journal of
Molecular Epidemiology and Genetics 5(1): 1-10.
Cano, H., Ginart, S., Caputo, M., Corach, D. & Sala, A. (2015). Analysis of the genetic
structure of Santa Cruz province and its comparison with the other Southern Patagonian
provinces of Argentina. Forensic Science International: Genetics Supplement Series 5:
e114-e115.
Corach, D., Lao, O., Bobillo, C., van Der Gaag, K., Zuniga, S., Vermeulen, M., ... & De
Knijff, P. (2010). Inferring continental ancestry of Argentineans from autosomal, Y
chromosomal and mitochondrial DNA. Annals of Human Genetics 74: 65-76.
García, A., Dermarchi, D.A., Tovo-Rodrigues, L., Pauro, M., Callegari-Jacques, S.M.,
Salzano, F.M. & Hutz, M.H. (2015). High interpopulation homogeneity in Central
Argentina as assessed by ancestry informative markers (AIMs). Genetics and Molecular
Biology 38: 324-331.
Godinho, N.M.O., Gontijo, C.C., Diniz, M.E.C.G., Falcão-Alencar, G., Dalton, G.C.,
Amorim, C.E.G., ... & Oliveira, S.F. (2008). Regional patterns of genetic admixture in
South America. Forensic Science International: Genetics Supplement Series 1(1): 329-330.
Gómez-Pérez, L., Alfonso-Sánchez, M.A., Dipierri, J.E., Alfaro, E., García-Obregón, S.,
De Pancorbo, M.M., ... & Peña, J.A. (2011). Microevolutionary processes due to
landscape features in the Province of Jujuy (Argentina). American Journal of Human
Biology 23: 177-184.
Marino, M., Furfuro, S. & Corach, D. (2009). Genetic structure of Mendoza province
population inferred from autosomal and Y-chromosome STRs analysis. Forensic Science
International: Genetics Supplement Series 2(1): 433-434.
Martinez Marignac, V.L., Bertoni, B., Parra, E.J. & Bianchi, N.O. (2004). Characterization
of admixture in an urban sample from Buenos Aires, Argentina, using uniparentally and
biparentally inherited genetic markers. Human Biology 76: 543-557.
Morales, J.O., Dipierri, J.E., Alfaro, E. & Bejarano, I.F. (2000). Distribution of the ABO
system in the Argentine Northwest: Miscegenation and genetic diversity. Interciencia
25(9): 432-435.
Parolin, M.L., Avena, S.A., Fleischer, S., Pretell, M., Rocca, F.D.F., Rodríguez, D.A., ... &
Manera, G. (2013). Análisis de la diversidad biológica y mestizaje en la ciudad de Puerto
Madryn (Prov. de Chubut, Argentina). Revista Argentina de Antropología Biológica 15(1):
61-75.
Parolin, M.L., Carreras-Torres, R., Sambuco, L.A., Jaureguiberry, S.M. & Iudica, C.E.
(2014). Analysis of 15 autosomal STR loci from Mar del Plata and Bahia Blanca (Central
Region of Argentina). International Journal of Legal Medicine 128: 457-459.
Resano, M., Esteban, E., González-Pérez, E., Vía, M., Athanasiadis, G., Avena, S., ... &
Dejean, C. (2007). How many populations set foot through the Patagonian door? Genetic
composition of the current population of Bahía Blanca (Argentina) based on data from 19
Alu polymorphisms. American Journal of Human Biology 19: 827-835.
Rocca, F.D.F., Albeza, M.V., Postillone, M.B., Acreche, N., Lafage, L., Parolín, M.L., ... &
Avena, S. (2016). Historia poblacional y análisis antropogenético de la Ciudad de Salta.
Andes (27).
Rocca, F.D.F., de la Vega, D., Russo, G. & Avena, S. (2013). El aporte africano al acervo
génico de la población de Rosario, Provincia de Santa Fe. Conference: III Jornadas del
Grupo de Estudios Afrolatinoamericanos. Buenos Aires: Ediciones del CCC Centro
Cultural de la Cooperación Floreal Gorini.
Seldin, M.F., Tian, C., Shigeta, R., Scherbarth, H.R., Silva, G., Belmont, J.W., ... &
Alvarellos, A. (2007). Argentine population genetic structure: Large variance in
Amerindian contribution. American Journal of Physical Anthropology 132: 455-462.
Sevini, F., Yao, D.Y., Lomartire, L., Barbieri, A., Vianello, D., Ferri, G., ... & Franceschi,
C. (2013). Analysis of population substructure in two sympatric populations of Gran
Chaco, Argentina. PLoS ONE 8(5): e64054.
Toscanini, U., Gusmão, L., Berardi, G., Gómez, A., Pereira, R. & Raimondi, E. (2011).
Ancestry proportions in urban populations of Argentina. Forensic Science International:
Genetics Supplement Series 3(1): e387-e388.
Trinks, J., Nishida, N., Hulaniuk, M.L., Caputo, M., Tsuchiura, T., Marciano, S., ... & Frías,
S.E. (2017). Role of HLA-DP and HLA-DQ on the clearance of hepatitis B virus and the
risk of chronic infection in a multiethnic population. Liver International 00: 1-12.
https://doi.org/10.1111/liv.13405
Vieira-Machado, C.D., Carvalho, F.M., Silva, L.C., Santos, S.E., Martins, C., Poletta, F.A.,
... & Orioli, I.M. (2016). Analysis of the genetic ancestry of patients with oral clefts from
South American admixed populations. European Journal of Oral Sciences 124(4): 406-411.
... This is related to aggregation issues with higher crime in wealthy urban areas. Aside from these, the pattern of loadings was much as expected and found in prior studies (Kirkegaard, 2014(Kirkegaard, , 2016Kirkegaard & Fuerst, 2017;Kirkegaard & Tranberg, 2015). Generally speaking, then, provinces that are better on one indicator, e.g. ...
Article
Full-text available
Italy shows a strong north-south gradient in measures of well-being, with the northern areas being far wealthier than the southern. Less well known is that there is also a latitudinal gradient in intelligence. We combined numeracy scores based on age heaping data for Italian provinces from the censuses of 1861, 1871, and 1881 with modern data about scholastic ability from the INVALSI, and important social outcomes such as mortality and income (up to 107 provinces in analyses). We show that there is a strong stability of the intelligence differences across 150 years for the overlapping set of 69 provinces. Intelligence measured in the 1800s predicts overall well-being just as well as modern data, r’s .78 and .82, for age heaping and INVALSI, respectively. We discuss the findings in light of recent evidence of genetic differences in regional intelligence levels. Keywords: Age heaping, Intelligence, Italy, 19th century
... In their rejoinder, Fuerst and Kirkegaard (2016b) questioned predictions from UVR theory on theoretical grounds and showed between-country data from the Americas indicating direct effects of European ancestry on cognitive ability controlling for absolute latitude. Subsequently, Kirkegaard and Fuerst (2017) utilized Argentinian data to show that skin brightness predicts cognitive ability controlling for latitude. ...
Article
Full-text available
A debate in Mankind Quarterly positing racial categorization of populations vis-à-vis biological effects of UV radiation was based on data from a single country, used absolute latitude instead of UV radiation, and limited the analysis to path analysis. To overcome limitations of the studies, we utilized measurements of UV radiation for 26 Brazilian and 48 USA states instead of absolute latitude and performed seemingly unrelated regressions in addition to path analysis. NAEP scores and infectious disease rate were collected in USA and PISA scores and infant mortality in Brazil. Significant cognitive effects of European ancestry were replicated, but showed spuriousness, disappearing when the effects of UV radiation were controlled. Our evidence strongly suggests that UV radiation is a consistent antecedent of cognitive ability directly and through income in the USA and Brazil and through infant mortality in Brazil, whereas European ancestry only influences cognitive ability positively by reducing infectious diseases in the USA or infant mortality in Brazil. The between-country consistency of our findings compensates for methodological weaknesses that took place especially in the Brazil study. Psychologists and economists should be aware of these findings to avoid making erroneous inferences based on genetic or cultural variables.
... It is well-established that cognitive ability varies across geopolitical divisions such as nations, states, and counties (e.g., nations: [1][2] ; Vietnamese provinces: [3] ; U.S. states: [4] ; U.S. counties: [5] ; Argentinian provinces: [6] ). These cognitive ability differences have frequently been quite large. ...
Conference Paper
Full-text available
Using a sample of ~3,100 U.S. counties, we tested geoclimatic explanations for why cognitive ability varies across geography. These models posit that geoclimatic factors will strongly predict cognitive ability across geography, even when a variety of common controls appear in the regression equations. Our results generally do not support UV radiation (UVR) based or other geoclimatic models. Specifically, although UVR alone predicted cognitive ability at the U.S. county-level (β = -.33), its validity was markedly reduced in the presence of climatic and demographic covariates (β = -.16), and was reduced even further with a spatial lag (β = -.10). For climate models, average temperature remained a significant predictor in the regression equation containing a spatial lag (β = .35). However, the effect was in the wrong direction relative to typical cold weather hypotheses. Moreover, when we ran the analyses separately by race/ethnicity, no consistent pattern appeared in the models containing the spatial lag. Analyses of gap sizes across counties were also generally inconsistent with predictions from the UVR model. Instead, results seemed to provide support for compositional models.
... This allows for a simple interpretation of the resulting scores as a clean measure of social well-being or generalized socioeconomic status, termed the S factor (see e.g. Kirkegaard, 2014a;Kirkegaard & Fuerst, 2017 for other examples of such factor analyses). Figure 6 shows the scatterplot between Muslim% and the S factor, while Figure 7 shows a map of Brussels with the S factor score. The supplementary materials have maps of Brussels with each variable. ...
Article
Full-text available
We examined regional inequality in Belgium, both in the 19 communes of Brussels and in the country as a whole (n = 589 communes). We find very strong relationships between Muslim% of the population and a variety of social outcomes such as crime rate, educational attainment, and median income. For the 19 communes of Brussels, we find a correlation of-.94 between Muslim% and a general factor of socioeconomic variables (S factor) based on 22 diverse indicators. The slope for this relationship is-7.52, meaning that a change in S going from 0% to 100% Muslim corresponds to a worsening of overall social well-being by 7.52 (commune-level) standard deviations. For the entire country, we have data for 8 measures of social inequality. Analysis of the indicators shows an S factor which is very similar to the one from the Brussels data only based on the full set of indicators (r's = .98). In the full dataset, the correlation between S and Muslim% is-.52, with a slope of-8.05. Adding covariates for age, population density, and spatial autocorrelation changes this slope to-8.77. Thus, the expected change going from a 0% to 100% Muslim population is-8.77 standard deviations in general social well-being. We discuss our findings in relation to other research on immigration and social inequality, with a focus on the causal influence of intelligence on life outcomes in general.
... The complete dataset from the website was obtained with an automated downloading tool. Social status was conceptualized as a general factor akin to general intelligence (Howe et al., 2012;Kirkegaard, 2014;Kirkegaard & Fuerst, 2017;Vyas & Kumaranayake, 2006) and extracted using factor analysis from the five available numerical indicators (the nonnumerical indicators were not used as they were not amenable to factor analysis). These were first adjusted for age using a polynomial regression model. ...
Article
Full-text available
It is well established that general intelligence varies in the population and is causal for variation in later life outcomes, in particular for social status and education. We linked IQ-test scores from the Danish draft test (Børge Prien Prøven, BPP) to social status for a list of 265 relatively common names in Denmark (85% male). Intelligence at the level of first name was strongly related to social status, r = .64. Ten names in the dataset were non-western, Muslim names. These names averaged an IQ of 81 (range 76-87) compared with 98 for the western, mostly Danish ones. Nonwestern names were also lower in social status, with a mean SES score of 2.66 standard deviations below that of western names. Mediation analysis showed that 30% of this very large gap can be explained by the IQ gap. Reasons for this relatively low level of mediation are discussed.
Article
Full-text available
We investigated how genetically measured ancestry relates to social status in Chile. Our study is based on a dataset of 1,805 subjects previously analyzed in another study. Ancestry was measured using genetic analysis based on microarray data. Overall we find that compared to European ancestry (44%), the Amerindian ancestries Mapuche (central Chile, 36%) and Aymara (northern, 17%) both predict lower social status (standardized betas =-1.77 and-0.97, p's < .001). The amount of African ancestry was relatively minor in this sample (3%), but tentatively was associated with lower social status (beta =-2.15, p = .03). These differences held controlling for age, gender, and region of residence. Our analyses of the regional-level data (n=13) did not produce any findings. The sample size is probably too small and coarse-grained for this analysis to be viable.
Article
Full-text available
Aiming to determine their ancestry diagnostic potential, we selected two sets of nuclear deletion/insertion polymorphisms (DIPs), including 30 located on autosomal chromosomes and 33 on the X chromosome. We analysed over 200 unrelated Argentinean individuals living in urban areas of Argentina. As in most American countries, the extant Argentinean population is the result of tricontinental genetic admixture. The peopling process within the continent was characterised by mating bias involving Native American and enslaved African females and European males. Differential results were detected between autosomal DIPs and X-DIPs. The former showed that the European component was the largest (77.8%), followed by the Native American (17.9%) and African (4.2%) components, in good agreement with the previously published results. In contrast, X-DIPs showed that the European genetic contribution was also predominant but much smaller (52.9%) and considerably larger Native American and African contributions (39.6% and 7.5%, respectively). Genetic analysis revealed continental genetic contributions whose associated phenotypic traits have been mostly lost. The observed differences between the estimated continental genetic contribution proportions based on autosomal DIPs and X-DIPs reflect the effects of autosome and X-chromosome transmission behaviour and their different recombination patterns. This work shows the ability of the tested DIP panels to infer ancestry and confirm mating bias. To the best of our knowledge, this is the first study focusing on ancestry-informative autosomal DIP and X-DIP comparisons performed in a sample representing the entire Argentinean population.
Article
Full-text available
That human ancestry predicts average IQ and socioeconomic outcomes is amongst the most thoroughly replicated findings of the social sciences. Since human ethnic and cultural descent is usually represented on national flags, it was hypothesized herein that national flag symbolism and colors would be predictive of a nation's average IQ and socioeconomic development. In order to test this hypothesis, national flag symbols and colors were coded, quantified, and correlated with country IQ and Human Development Index (HDI). Both country-level IQ and HDI are positively associated with Christian symbolism, and negatively associated with symbols representing celestial bodies. The color green predicts lower IQ and HDI, while the color white predicts higher IQ and HDI. The color red predicts higher IQ, but not higher HDI, and the color yellow predicts lower HDI, but not lower IQ. The correlations are generally higher for HDI than IQ. With the exception of the color yellow, the correlations with HDI are significant even when controlling for the correlation between HDI and IQ. The present study suggests national flag symbolism and colors as yet another correlate of average group intelligence.
Preprint
Full-text available
That human ancestry predicts average IQ and socioeconomic outcomes is amongst the most thoroughly replicated findings of the social sciences. Since human ethnic and cultural descent is usually represented on national flags, it was hypothesized herein that national flag symbolism and colors would be predictive of a nation's average IQ and socioeconomic development. In order to test this hypothesis, national flag symbols and colors were coded, quantified, and correlated with country IQ and Human Development Index (HDI). Both country-level IQ and HDI are positively associated with Christian symbolism, and negatively associated with symbols representing celestial bodies. The color green predicts lower IQ and HDI, while the color white predicts higher IQ and HDI. The color red predicts higher IQ, but not higher HDI, and the color yellow predicts lower HDI, but not lower IQ. The correlations are generally higher for HDI than IQ. With the exception of the color yellow, the correlations with HDI are significant even when controlling for the correlation between HDI and IQ. The present study suggests national flag symbolism and colors as yet another correlate of average group intelligence.
Article
Full-text available
Using data from the Philadelphia Neurodevelopmental Cohort, we examined whether European ancestry predicted cognitive ability over and above both parental socioeconomic status (SES) and measures of eye, hair, and skin color. First, using multi-group confirmatory factor analysis, we verified that strict factorial invariance held between self-identified African and European-Americans. The differences between these groups, which were equivalent to 14.72 IQ points, were primarily (75.59%) due to difference in general cognitive ability (g), consistent with Spearman’s hypothesis. We found a relationship between European admixture and g. This relationship existed in samples of (a) self-identified monoracial African-Americans (B = 0.78, n = 2,179), (b) monoracial African and biracial African-European-Americans, with controls added for self-identified biracial status (B = 0.85, n = 2407), and (c) combined European, African-European, and African-American participants, with controls for self-identified race/ethnicity (B = 0.75, N = 7,273). Controlling for parental SES modestly attenuated these relationships whereas controlling for measures of skin, hair, and eye color did not. Next, we validated four sets of polygenic scores for educational attainment (eduPGS). MTAG, the multi-trait analysis of genome-wide association study (GWAS) eduPGS (based on 8442 overlapping variants) predicted g in both the monoracial African-American (r = 0.111, n = 2179, p < 0.001), and the European-American (r = 0.227, n = 4914, p < 0.001) subsamples. We also found large race differences for the means of eduPGS (d = 1.89). Using the ancestry-adjusted association between MTAG eduPGS and g from the monoracial African-American sample as an estimate of the transracially unbiased validity of eduPGS (B = 0.124), the results suggest that as much as 20%–25% of the race difference in g can be natively explained by known cognitive ability-related variants. 
Moreover, path analysis showed that the eduPGS substantially mediated associations between cognitive ability and European ancestry in the African-American sample. Subtest differences, together with the effects of both ancestry and eduPGS, had near-identity with subtest g-loadings. This finding confirmed a Jensen effect acting on ancestry-related differences. Finally, we confirmed measurement invariance along the full range of European ancestry in the combined sample using local structural equation modeling. Results converge on genetics as a partial explanation for group mean differences in intelligence.
Article
Full-text available
A dataset of socioeconomic, demographic and geographic data for US counties (N≈3,100) was created by merging data from several sources. A suitable subset of 28 socioeconomic indicators was chosen for analysis. Factor analysis revealed a clear general socioeconomic factor (S factor) which was stable across extraction methods and different samples of indicators (absolute split-half sampling reliability = .85). Self-identified race/ethnicity (SIRE) population percentages were strongly, but non-linearly, related to cognitive ability and S. In general, the effect of White% and Asian% were positive, while those for Black%, Hispanic% and Amerindian% were negative. The effect was unclear for Other/mixed%. The best model consisted of White%, Black%, Asian% and Amerindian% and explained 41/43% of the variance in cognitive ability/S among counties. SIRE homogeneity had a non-linear relationship to S, both with and without taking into account the effects of SIRE variables. Overall, the effect was slightly negative due to low S, high White% areas. Geospatial (latitude, longitude, and elevation) and climatological (temperature, precipitation) predictors were tested in models. In linear regression, they had little incremental validity. However, there was evidence of non-linear relationships. When models were fitted that allowed for non-linear effects of the environmental predictors, they were able to add a moderate amount of incremental validity. LASSO regression, however, suggested that much of this predictive validity was due to overfitting. Furthermore, it was difficult to make causal sense of the results. Spatial patterns in the data were examined using multiple methods, all of which indicated strong spatial autocorrelation for cognitive ability, S and SIRE (k nearest spatial neighbor regression [KNSNR] correlations of .62 to .89). 
Model residuals were also spatially autocorrelated, and for this reason the models were re-fit controlling for spatial autocorrelation using KNSNR-based residuals and spatial local regression. The results indicated that the effects of SIREs were not due to spatially autocorrelated confounds except possibly for Black% which was about 50% weaker in the controlled analyses. Pseudo-multilevel analyses of both the factor structure of S and the SIRE predictive model showed results consistent with the main analyses. Specifically, the factor structure was similar across levels of analysis (states and counties) and within states. Furthermore, the SIRE predictors had similar betas when examined within each state compared to when analyzed across all states. It was tested whether the relationship between SIREs and S was mediated by cognitive ability. Several methods were used to examine this question and the results were mixed, but generally in line with a partial mediation model. Jensen's method (method of correlated vectors) was used to examine whether the observed relationship between cognitive ability and S scores was plausibly due to the latent S factor. This was strongly supported (r = .91, Nindicators=28). Similarly, it was examined whether the relationship between SIREs and S scores was plausibly due to the latent S factor. This did not appear to be the case.
Article
Full-text available
Some new methods for factor analyzing socioeconomic data are presented, discussed and illustrated with analyses of new and old datasets. A general socioeconomic factor (S) was found in a dataset of 47 French-speaking Swiss provinces from 1888. It was strongly related (r’s .64 to .70) to cognitive ability as measured by an army examination. Fertility had a strong negative loading (r -.44 to -.67). Results were similar when using rank-transformed data. The S factor of international rankings data was found to have a split-half factor reliability of .93, that of the general factor of personality extracted from 25 OCEAN items .55, and that of the general cognitive ability factor .68 based on 16 items from the International Cognitive Ability Resource.
Article
Narrative reports suggest that socioeconomic status (SES) is associated with biogeographic ancestry (BGA) in the Americas. If so, SES potentially acts as a confound that needs to be taken into account when evaluating the relation between medical outcomes and BGA. To explore how systematic BGA-SES associations are, a meta-analysis of American studies was conducted. 40 studies were identified, yielding a total of 64 independent samples with directions of associations, including 48 independent samples with effect sizes. An analysis of association directions found a high degree of consistency. The square root n-weighted directions were 0.83 (K = 36), -0.81 (K = 41) and -0.82 (K = 39) for European, Amerindian and African BGA, respectively. An analysis of effect size magnitudes found that European BGA was positively associated with SES, with a meta-analytic effect size of r = .18 [95% CI: .13 to .24, K = 28, n = 35,476.5], while both Amerindian and African BGA were negatively associated with SES, having meta-analytic effect sizes of -.14 [-.18 to -.10, K = 31, n = 28,937.5] and -.11 [-.15 to -.07, K = 28, n = 32,710.5], respectively. There was considerable cross-sample variation in effect sizes (mean I² = 92%), but the sample size was not enough for performing credible moderator analysis. Implications for future studies are discussed.
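The square root n-weighted direction statistic reported above can be sketched as a weighted mean of association signs. This is a minimal sketch under stated assumptions: each sample's direction is coded +1 (positive association) or -1 (negative association), and each sample is weighted by the square root of its size. The coding scheme, the function name, and the sample values are hypothetical, not taken from the meta-analysis.

```python
import math


def sqrt_n_weighted_direction(samples):
    """Weighted mean of association directions.

    `samples` is a list of (n, direction) pairs, where direction is +1 or -1
    (assumed coding) and each sample is weighted by sqrt(n).
    Returns a value in [-1, 1]; values near +1 or -1 indicate consistent direction.
    """
    weights = [math.sqrt(n) for n, _ in samples]
    weighted_sum = sum(w * d for w, (_, d) in zip(weights, samples))
    return weighted_sum / sum(weights)


# Hypothetical samples: (sample size, direction of the BGA-SES association).
example_samples = [(100, +1), (400, +1), (25, -1)]

direction = sqrt_n_weighted_direction(example_samples)
print(round(direction, 3))
```

Weighting by the square root of n rather than n itself keeps a single very large sample from dominating the summary, while still giving larger samples more influence than small ones.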
Article
In the present study the genetic composition of Salta capital city was estimated in a population sample. A total of 223 unrelated blood donors from the Centro Privado de Hemoterapia were included, who provided written informed consent and genealogical information. Twelve autosomal markers, GM allotypes, and the continental origin of mtDNA and Y-chromosome lineages were analysed; genetic admixture was estimated with the ADMIX program. Autosomal markers showed an Amerindian component of 50.02%, a European component of 46.29% and an African component of 3.51%. Amerindian mitochondrial haplogroups accounted for 93.75%, European haplogroups for 3.85% and African haplogroups for 2.40%; 17.1% of the males analysed exhibited the aboriginal variant Q*M3. The data were compared to those obtained previously in other cities, and the genetic admixture of Salta showed the highest values of the Amerindian and African components. Intraregional immigration is much more pronounced than interregional or foreign immigration. These studies reinforce the idea that the Argentine population should not be considered a homogeneous whole; its variability must be taken into account.
Book
Pigmentocracies--the fruit of the multiyear Project on Ethnicity and Race in Latin America (PERLA)--is a richly revealing analysis of contemporary attitudes toward ethnicity and race in Brazil, Colombia, Mexico, and Peru, four of Latin America's most populous nations. Based on extensive, original sociological and anthropological data generated by PERLA, this landmark study analyzes ethnoracial classification, inequality, and discrimination, as well as public opinion about Afro-descended and indigenous social movements and policies that foster greater social inclusiveness, all set within an ethnoracial history of each country. A once-in-a-generation examination of contemporary ethnicity, this book promises to contribute in significant ways to policymaking and public opinion in Latin America. Edward Telles, PERLA's principal investigator, explains that profound historical and political forces, including multiculturalism, have helped to shape the formation of ethnic identities and the nature of social relations within and across nations. One of Pigmentocracies's many important conclusions is that unequal social and economic status is at least as much a function of skin color as of ethnoracial identification. Investigators also found high rates of discrimination by color and ethnicity widely reported by both targets and witnesses. Still, substantial support across countries was found for multicultural-affirmative policies--a notable result given that in much of modern Latin America race and ethnicity have been downplayed or ignored as key factors despite their importance for earlier nation-building. © 2014 The University of North Carolina Press. All rights reserved.
Article
The United States provides a unique laboratory for understanding how the cultural, institutional, and human capital endowments of immigrant groups shape economic outcomes. In this paper, we use census micro-sample information to reconstruct the country-of-ancestry distribution for US counties from 1850 to 2010. We also develop a county-level measure of GDP per capita over the same period. Using this novel panel data set, we investigate whether changes in the ancestry composition of a county matter for local economic development and the channels through which the cultural, institutional, and educational legacy of the country of origin affects economic outcomes in the US. Our results show that the evolution of the country-of-origin composition of a county matters. Moreover, the culture, institutions, and human capital that the immigrant groups brought with them and pass on to their children are positively associated with local development in the US. Among these factors, measures of culture that capture attitudes towards cooperation play the most important and robust role. Finally, our results suggest that while fractionalization of ancestry groups is positively related with county GDP, fractionalization in attributes such as trust is negatively related to local economic performance.
Article
Background & aims: HBV infection exhibits geographical variation in its distribution in South America. While HBV rates are low in Central Argentina, the North-western region exhibits intermediate HBV rates. Unfortunately, the reasons that could explain this difference are still unknown. Methods: 1440 Argentines were recruited and grouped into HBV patients, HBV resolved individuals and healthy controls. Genetic ancestry was assessed by analysis of bi-parental lineages and ancestry autosomal typing. SNPs of HLA-DPA1 (rs3077), HLA-DPB1 (rs9277542), HLA-DQB1 (rs2856718) and HLA-DQB2 (rs7453920) were determined and HBV genotyping was performed by phylogenetic analysis in HBV patients. Results: Native American ancestry prevailed in the North-western region when compared with Central Argentina (p<0.0001). However, no differences were observed among the three groups of each region. The distribution of HBV genotypes revealed significant differences (p<0.0001). Three SNPs (rs3077, rs9277542, and rs7453920) showed a significant association with protection against chronic HBV and viral clearance in both regions. The remaining SNP showed a significant association with susceptibility to chronic HBV. The frequency rates of rs3077-T, related to protection against chronic HBV and viral clearance, were lower in North-western Argentina when compared with Central Argentina. The same uneven frequency rates were observed for SNP rs9277542. Conclusions: This is the first study addressing the associations between the HLA-DP and HLA-DQ loci and the protection against chronic HBV and viral clearance in a multiethnic South American population. The uneven distribution of HLA-DP and HLA-DQ supports the HBV epidemiological differences observed in these two regions of Argentina with dissimilar ancestry genetic background.