The S factor in Brazilian states

Authors:
  • Ulster Institute for Social Research

The S factor in Brazil
Emil O. W. Kirkegaard, University of Aarhus, Denmark
Abstract
Sizeable S factors were found across 3 different datasets (from years 1991, 2000 and 2010), which
explained 56 to 71% of the variance. Correlations of extracted S factors with cognitive ability were
strong ranging from .69 to .81 depending on which year, analysis and dataset is chosen. Method of
correlated vectors supported the interpretation that the latent S factor was primarily responsible for the
association (r's .71 to .81).
Introduction
Many recent studies have examined within-country regional correlates of (general) cognitive ability
(also known as (general) intelligence, general mental ability, g). This has been done for the British
Isles (Lynn, 1979; Kirkegaard, 2015g), France (Lynn, 1980), Italy (Lynn, 2010; Kirkegaard, 2015e),
Spain (Lynn, 2012), Portugal (Almeida, Lemos, & Lynn, 2011), India (Kirkegaard, 2015d; Lynn &
Yadav, 2015), China (Kirkegaard, 2015f; Lynn & Cheng, 2013), Japan (Kura, 2013), the US
(Kirkegaard, 2015b; McDaniel, 2006; Templer & Rushton, 2011), Mexico (Kirkegaard, 2015a) and
Turkey (Lynn, Sakar, & Cheng, 2015). This paper examines data for Brazil.
Data
Cognitive data
Data from PISA was used as a substitute for IQ test data. PISA and IQ correlate very strongly (>.9;
Rindermann, 2007) across nations and presumably also across regions, altho this hasn't been thoroly
investigated to my knowledge.
Socioeconomic data
As opposed to some of my prior analyses, there was no dataset to build on top of. For this reason, I
tried to find an English-language database for Brazil with a comprehensive selection of variables. Altho
I found some resources, they did not allow for easy download and compilation of state-level data,
which I needed. Instead, I relied upon the Portuguese-language site, Atlasbrasil.org.br, which has a
comprehensive data explorer with a convenient download function for state-level data. I used Google
Translate to find my way around the site.
Using the data explorer, I selected a broad range of variables. The goal was to cover most important
areas of socioeconomic development and avoid variables of little importance or which are heavily
influenced by local climate factors (e.g. amount of rainforest). The following variables were selected:
1. Gini coefficient
2. Activity rate age 25-29
3. Unemployment rate age 25-29
4. Public sector workers%
5. Farmers%
6. Service sector workers%
7. Girls age 10-17 with child%
8. Life expectancy
9. Households without electricity%
10. Infant mortality rate
11. Child mortality rate
12. Survive to 40%
13. Survive to 60%
14. Total fertility rate
15. Dependency ratio
16. Aging rate
17. Illiteracy age 11-14 %
18. Illiteracy age 25 and above %
19. Age 6-17 in school %
20. Attendance higher education %
21. Income per capita
22. Mean income lowest quintile
23. Pct extremely poor
24. Richest 10 pct income
25. Bad walls%
26. Bad water and sanitation%
27. HDI
28. HDI income
29. HDI life expectancy
30. HDI education
31. Population
32. Population rural
Variables were available only for three time points: 1991, 2000 and 2010. I selected all three with
the intention of checking the stability of results over different time periods.
Most data was already in an appropriate per unit measure so it was not necessary to do extensive
conversions as with the Mexican data (Kirkegaard, 2015a). I calculated fraction of the population living
in rural areas by dividing the rural population by the total population.
Note that the data explorer also has data at a lower level, that of municipalities. It could be used in the
future to see if the S factor holds for a lower level of aggregate analysis.
S factor loadings
I split the data into three datasets, one each for 1991, 2000 and 2010.
I extracted S factors using the fa() function with default parameters from the psych package (Revelle,
2015).
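As a rough illustration of what extracting a general factor involves, here is a minimal Python sketch. It is not the code used for this paper (that is R, using fa() from psych, whose default is minimum residual factoring). The sketch swaps in a simpler technique, the first principal component of the correlation matrix found by power iteration, so its loadings will be similar to but not identical with fa() output. The three indicators and all their values are made up, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlation matrix of a list of indicator columns."""
    return [[pearson(a, b) for b in columns] for a in columns]

def first_component(corr, iterations=200):
    """Dominant eigenvector of a correlation matrix by power iteration.

    Returns (loadings, share_of_variance). This is a principal-component
    stand-in, NOT the minimum residual method that fa() uses by default.
    """
    p = len(corr)
    v = list(corr[0])  # start from the first column of the matrix
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    if sum(v) < 0:  # the sign of an eigenvector is arbitrary; orient it
        v = [-x for x in v]
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return loadings, eigenvalue / p

# Made-up data for six hypothetical states (not from Atlasbrasil).
income = [10, 12, 15, 18, 22, 30]
life_expectancy = [68, 69, 71, 72, 74, 75]
illiteracy = [30, 26, 20, 15, 10, 6]

loadings, share = first_component(
    correlation_matrix([income, life_expectancy, illiteracy]))
print([round(l, 2) for l in loadings], round(share, 2))
```

With these toy numbers the two desirable indicators load positively and illiteracy loads negatively, which is the pattern referred to as "the expected direction" in the sections that follow.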
S factor in 1991
Due to missing data, there were only 21 indicators available for this year. The data could not be
imputed since it was missing for all cases for these variables. The loadings plot is shown in Figure 1.
All indicators were in the expected direction aside from perhaps “aging rate”, which is somewhat
unclear and would perhaps be expected to have a positive loading.
Figure 1: Loadings plot for S factor for the data from 1991
S factor in 2000
There was less missing data for this year, leaving 26 variables. The loadings plot is shown in Figure 2.
All indicators were in the expected direction.
Figure 2: Loadings plot for S factor for the 2000 data
S factor for 2010
There were 27 variables for this year. Factor analysis gave an error for this dataset (see footnote 1),
which means that I had to remove at least one variable. This left me with the question of which
variable(s) to exclude. Similar to the previous analysis for Mexican states (Kirkegaard, 2015a), I used
an automatic method. After removing one variable, the factor analysis worked and gave no warning. The
excluded variable was child mortality, which was correlated near perfectly with another variable
(infant mortality, r=.992), so little indicator sampling error should be introduced because of this
deletion. The loadings plot is shown in Figure 3.
Oddly, survival to 60 and 40 now have negative loadings, altho one would expect them to correlate
highly with life expectancy, which has a loading near 1. In fact, the correlations between life expectancy
and the survival variables were -.06 and -.21, which makes one wonder what these variables are
measuring. Excluding them does not substantially change results, but does increase the amount of
variance explained to .60.
Figure 3: Loadings plot for S factor for the 2010 data, minus one variable
1 Error in min(eigens$values) : invalid 'type' (complex) of argument.
Out of curiosity, I also tried versions where I deleted 5 and 10 variables, but this did not change much
in the plots, so I won't show them. Interested readers can consult the source code.
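A minimal sketch of one possible automatic rule for picking a redundant variable, written in Python rather than the R used for the analyses. The rule itself is an assumption for illustration (find the most strongly correlated pair, then drop the member with the higher mean absolute correlation with the other variables); it is not taken from this paper or from Kirkegaard (2015a). Of the numbers below, only r = .992 comes from the text; the two correlations with life expectancy are made up.

```python
def most_redundant_variable(corr, names):
    """Pick one variable from the most strongly correlated pair.

    Hypothetical tie-break: of the two variables in that pair, return the
    one with the higher mean absolute correlation with all other variables.
    """
    p = len(names)
    best_pair, best_r = None, -1.0
    for i in range(p):
        for j in range(i + 1, p):
            if abs(corr[i][j]) > best_r:
                best_pair, best_r = (i, j), abs(corr[i][j])

    def mean_abs(i):
        return sum(abs(corr[i][k]) for k in range(p) if k != i) / (p - 1)

    i, j = best_pair
    drop = i if mean_abs(i) >= mean_abs(j) else j
    return names[drop], best_r

names = ["infant mortality", "child mortality", "life expectancy"]
corr = [
    [1.000, 0.992, -0.90],   # -0.90 is a made-up value
    [0.992, 1.000, -0.92],   # -0.92 is a made-up value
    [-0.90, -0.92, 1.000],
]
dropped, pair_r = most_redundant_variable(corr, names)
print(dropped, pair_r)
```

Repeating the call after deleting the returned variable would give the versions with 5 or 10 variables removed.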
Mixed cases
To examine whether there are any cases with strong mixedness -- cases that are incongruent with the
factor structure in the data -- I developed two methods which are presented elsewhere (Kirkegaard,
2015c). Briefly, the first method measures the mixedness of the case by quantifying how predictable
indicator scores are from the factor score for each case (mean absolute residual, MAR). The second
quantifies how much the size of the general factor changes after exclusion of each individual case
(improvement in proportion of variance, IPV). Both methods were found to be useful at finding a
strongly mixed case in simulation data.
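The first method (MAR) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation from Kirkegaard (2015c): I assume each indicator is standardized and predicted from the standardized factor score by simple linear regression, and that a case's MAR is the mean of its absolute residuals over indicators. The data are made up; case 2 is deliberately given an incongruent value on the third indicator.

```python
import math

def standardize(values):
    """Rescale to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]

def mean_abs_residuals(indicators, factor_scores):
    """Mean absolute residual per case across all indicators."""
    s = standardize(factor_scores)
    n = len(s)
    totals = [0.0] * n
    for column in indicators:
        z = standardize(column)
        # For standardized variables the regression slope equals Pearson r.
        slope = sum(a * b for a, b in zip(z, s)) / (n - 1)
        for case in range(n):
            totals[case] += abs(z[case] - slope * s[case])
    return [t / len(indicators) for t in totals]

# Made-up factor scores and indicators for six hypothetical cases.
scores = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
indicators = [
    [1, 2, 3, 4, 5, 6],      # perfectly predictable from the score
    [2, 4, 6, 8, 10, 12],    # perfectly predictable from the score
    [1, 2, 9, 4, 5, 6],      # case 2 is far from what its score predicts
]
mar = mean_abs_residuals(indicators, scores)
most_mixed = mar.index(max(mar))
print(most_mixed, [round(v, 2) for v in mar])
```

The second method (IPV) would instead rerun the factor analysis once per excluded case and compare the proportion of variance explained, which is why it is more sensitive to analyses that fail to run.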
I applied both methods to the Brazilian datasets. For the second method, I had to create two additional
reduced datasets since the factor analysis could not run with the resulting combinations of cases and
indicators.
There are two ways one can examine the results: 1) by looking at the top (or bottom) most mixed cases
for each method; 2) by looking at the correlations between results from the methods. The first is
interesting if Brazilian state-level inequality in S has particular interest, while the second is more
relevant for checking that the methods really work -- they should give congruent results if mixed cases
are present.
Top mixed cases
For each method and each analysis, I extracted the names of the top 5 mixed states. They are shown in
Table 1.
Analysis    Position_1        Position_2        Position_3        Position_4        Position_5
m1.1991     Amapá             Acre              Distrito Federal  Roraima           Rondônia
m1.2000     Amapá             Roraima           Acre              Distrito Federal  Rondônia
m1.2010.1   Roraima           Distrito Federal  Amapá             Amazonas          Acre
m1.2010.5   Roraima           Distrito Federal  Amapá             Acre              Amazonas
m1.2010.10  Roraima           Distrito Federal  Amapá             Acre              Amazonas
m2.1991     Amapá             Rondônia          Acre              Roraima           Amazonas
m2.2000.1   Amapá             Rondônia          Roraima           Paraíba           Ceará
m2.2010.2   Amapá             Roraima           Distrito Federal  Pernambuco        Sergipe
m2.2010.5   Amapá             Roraima           Distrito Federal  Piauí             Bahia
m2.2010.10  Distrito Federal  Roraima           Amapá             Ceará             Tocantins
Table 1: Top 5 mixed cases by method and dataset
As can be seen, there is quite a bit of agreement across years, datasets, and methods. If one were to do a
more thoro investigation of socioeconomic differences across Brazilian states, one should examine
these states for unusual patterns. One could do this using the residuals for each indicator by case from
the first method (these are available from the FA.residuals() in psych2). A quick look at the 2010.1 data
for Amapá shows that the state is roughly in the middle regarding state-level S (score = -.26, rank 15 of
27). Yet farmers do not constitute a large fraction of its population (only 9.9%, the 4th lowest share,
behind only the units with large cities: Distrito Federal, Rio de Janeiro, and São Paulo). Given that
farmers% has a strong negative loading (-.77) and given the state's middling S score, one would expect
the state to have relatively more farmers than it has; the mean of all states for that dataset is 17.2%.
Much more could be said along these lines, but I would rather refrain since I don't know much about the
country and can't read the language very well. Perhaps a researcher who is a Brazilian native could
use the data to make a more detailed analysis.
Correlations between methods and datasets
To test whether the results were stable across years, data reductions, and methods, I correlated all the
mixedness metrics. Results are in Table 2.
            m1.1991    m1.2000    m1.2010.1  m1.2010.5  m1.2010.10 m2.1991    m2.2000.1  m2.2010.2  m2.2010.5
m1.1991
m1.2000     0.88
m1.2010.1   0.81       0.85
m1.2010.5   0.77       0.87       0.98
m1.2010.10  0.70       0.79       0.93       0.96
m2.1991     0.48       0.64       0.45       0.48       0.40
m2.2000.1   0.41       0.58       0.34       0.39       0.27       0.87
m2.2010.2   0.53       0.63       0.66       0.66       0.51       0.58       0.68
m2.2010.5   0.32       0.49       0.60       0.64       0.51       0.49       0.59       0.86
m2.2010.10  0.42       0.44       0.66       0.65       0.59       0.32       0.44       0.75       0.76
Table 2: Correlation table for mixedness metrics across datasets and methods.
There is method-specific variance, since the correlations within method (top-left and bottom-right
squares) are stronger than those across methods. Still, all correlations are positive, Cronbach's alpha
is .87, Guttman lambda 6 is .98 and the mean correlation is .61.
S and HDI correlations
HDI
Previous S factor studies have found that HDI (Human Development Index) is basically a non-linear
proxy for the S factor (Kirkegaard, 2014, 2015a). This is not surprising since the HDI is calculated
from longevity, education and income, all three of which are known to have strong S factor loadings.
The actual derivation of HDI values is somewhat complex. One might expect them simply to average
the three indicators, or extract the general factor, but no. Instead, the indicators are first capped
and rescaled, and then combined with a geometric mean (WHO, 2014).
For longevity (life expectancy at birth), they introduce artificial limits of 25 and 85 years. According
to data from WHO (WHO, 2012), no country has values outside these limits, altho Japan is close to the
upper one (84 years).
For education, it is actually an average of two measures: years of education by 25 year olds and
expected years of schooling for children entering school age. These measures also have artificial limits
of 0-18 and 0-15 respectively.
For gross national income, they use the log values and also artificial limits of 100-75,000 USD.
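Moreover, these are not simply combined by standardizing (i.e. rescaling so the mean is 0 and standard
deviation is 1) the values and adding them or taking the mean. Instead, a value is calculated for every
indicator using the following formula (Equation 1):
$$ I_{\text{dimension}} = \frac{\text{actual value} - \text{minimum value}}{\text{maximum value} - \text{minimum value}} $$
Note that for education, this formula is used twice and the results averaged.
Finally, the three dimensions are combined using a geometric mean (Equation 2):
$$ \text{HDI} = \left( I_{\text{health}} \cdot I_{\text{education}} \cdot I_{\text{income}} \right)^{1/3} $$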
The consequence of using a geometric mean, as opposed to the normal arithmetic mean, is that if a single
indicator is low, the overall level is strongly reduced, whereas with the arithmetic mean only the sum
of the indicators matters, not their spread. If the indicators have the same value, then the geometric
and arithmetic mean have the same value.
For instance, if indicators are .7, .7, .7, the arithmetic mean is (.7+.7+.7)/3 = 2.1/3 = .7 and the
geometric mean is (.7^3)^(1/3) = 0.343^(1/3) = .7. However, if indicators are 1, .7, .4, then the
arithmetic mean is (1+.7+.4)/3 = 2.1/3 = .7, but the geometric mean is (1*.7*.4)^(1/3) = 0.28^(1/3) =
0.654, which is a bit lower than .7.
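The HDI arithmetic described above can be checked with a short Python sketch. The rescaling function is the min-max formula, (actual - minimum) / (maximum - minimum), with the limits as stated in the text; the function names and the GNI value of 10,000 USD are made up for illustration.

```python
import math

def dimension_index(actual, minimum, maximum):
    """Min-max rescaling of one indicator onto the 0-1 range."""
    return (actual - minimum) / (maximum - minimum)

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

# Limits as given in the text: 25-85 years for life expectancy and
# 100-75,000 USD (log scale) for income.
life_index = dimension_index(84, 25, 85)
income_index = dimension_index(
    math.log(10_000),          # 10,000 USD is a made-up example value
    math.log(100),
    math.log(75_000),
)

equal = [0.7, 0.7, 0.7]
unequal = [1.0, 0.7, 0.4]
print(round(arithmetic_mean(equal), 3), round(geometric_mean(equal), 3))      # 0.7 0.7
print(round(arithmetic_mean(unequal), 3), round(geometric_mean(unequal), 3))  # 0.7 0.654
```

The second line of output reproduces the worked example: the same arithmetic mean, but a lower geometric mean once the indicators are unequal.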
S and HDI correlations
I used the previously extracted factor scores and the HDI data. I also extracted S factors from the HDI
datasets (3 variables; see footnote 2) to see how these compared with the complex HDI value derivation.
Finally, I correlated the S factors from non-HDI data, S factors from HDI data, HDI values and cognitive
ability scores. Results are shown in Table 3.
2 Factor loadings for HDI factor analysis were very strong, always >.9.
Equation 1: HDI index formula
Equation 2: HDI index combination formula
            HDI.1991   HDI.2000   HDI.2010   HDI.S.1991 HDI.S.2000 HDI.S.2010 S.1991     S.2000     S.2010.1   S.2010.5   S.2010.10  CA2012
HDI.1991    -          0.95       0.92       0.98       0.95       0.93       0.96       0.93       0.86       0.89       0.90       0.59
HDI.2000    0.97       -          0.97       0.94       0.99       0.96       0.94       0.98       0.93       0.95       0.96       0.66
HDI.2010    0.94       0.98       -          0.93       0.98       0.99       0.93       0.97       0.94       0.97       0.98       0.65
HDI.S.1991  0.98       0.96       0.94       -          0.95       0.94       0.98       0.92       0.84       0.88       0.90       0.54
HDI.S.2000  0.97       1.00       0.98       0.97       -          0.97       0.95       0.98       0.92       0.95       0.97       0.65
HDI.S.2010  0.95       0.98       0.99       0.95       0.98       -          0.94       0.96       0.94       0.96       0.97       0.66
S.1991      0.96       0.96       0.94       0.97       0.97       0.96       -          0.92       0.86       0.90       0.91       0.60
S.2000      0.95       0.98       0.96       0.93       0.99       0.97       0.97       -          0.96       0.98       0.98       0.69
S.2010.1    0.89       0.94       0.94       0.86       0.94       0.95       0.92       0.97       -          0.99       0.96       0.76
S.2010.5    0.91       0.95       0.96       0.89       0.96       0.97       0.93       0.98       0.99       -          0.98       0.72
S.2010.10   0.93       0.96       0.98       0.93       0.97       0.98       0.93       0.97       0.96       0.98       -          0.71
CA2012      0.67       0.73       0.71       0.60       0.72       0.74       0.69       0.78       0.81       0.79       0.75       -
Table 3: Correlation matrix for S, HDI and cognitive ability scores. Pearson's below the diagonal,
rank-order above.
All results were very strongly correlated no matter which dataset or scoring method was used.
Cognitive ability scores were strongly correlated with all S factor measures. The best estimate of the
relationship between S factor and cognitive ability is probably the correlation with S.2010.1, since this
is the dataset closest in time to the cognitive dataset and the S factor is extracted from the most
variables. This is also the highest value (.81), but that may be a coincidence.
It is worth noting that the rank-order correlations were somewhat weaker. This usually indicates that an
outlier case is increasing the Pearson correlation. To investigate this, I plotted the S.2010.1 and CA2012
variables, see Figure 4.
The scatter plot, however, does not seem to reveal any outliers inflating the correlation.
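To show why a gap between Pearson and rank-order correlations points towards outliers, here is a small Python sketch on made-up data (not the Brazilian data). A single extreme case pushes the Pearson correlation close to 1, while the rank-order (Spearman) correlation, which only uses the ordering of the cases, stays lower. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up data: five ordinary cases plus one extreme case at the end.
x = [1, 2, 3, 4, 5, 20]
y = [2, 1, 4, 3, 5, 30]
pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
print(round(pearson_r, 2), round(spearman_rho, 2))
```

A gap of this kind is only a hint, which is why a scatter plot is the natural follow-up check.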
Figure 4: Scatter plot of S factor and cognitive ability
Method of correlated vectors
To examine whether the S factor was plausibly the cause of the pattern seen with the S factor scores (it
is not necessarily), I used the method of correlated vectors with reversing. Results are shown in Figures
5-7.
Figure 5: MCV for the 1991 dataset
Figure 6: MCV for the 2000 dataset
Figure 7: MCV for the 2010 dataset
The first result seems to be driven by a few outliers, but the second and third seem decent enough. The
numerical results were fairly consistent (.71, .75, .81).
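A minimal Python sketch of the method of correlated vectors with reversing, on made-up numbers. The idea is to correlate the vector of indicator loadings on S with the vector of each indicator's correlation with cognitive ability. I take "reversing" to mean that indicators with negative loadings have both values multiplied by -1 first, so that the result is not inflated merely by the spread between positively and negatively loading indicators; that reading, the function names and all values below are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mcv(loadings, criterion_correlations, reverse=True):
    """Correlate factor loadings with indicator-criterion correlations.

    With reverse=True, every indicator with a negative loading is flipped
    (loading and criterion correlation both multiplied by -1).
    """
    pairs = list(zip(loadings, criterion_correlations))
    if reverse:
        pairs = [(-l, -r) if l < 0 else (l, r) for l, r in pairs]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Made-up loadings on S and correlations with cognitive ability
# for five hypothetical indicators.
s_loadings = [0.90, 0.80, -0.85, 0.50, -0.40]
ca_correlations = [0.70, 0.55, -0.65, 0.35, -0.20]

r_reversed = mcv(s_loadings, ca_correlations, reverse=True)
r_unreversed = mcv(s_loadings, ca_correlations, reverse=False)
print(round(r_reversed, 2), round(r_unreversed, 2))
```

With only a handful of indicators such a correlation is easily driven by one or two points, which matches the caution expressed above about the 1991 result.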
Discussion and conclusion
Generally, the results were in line with earlier studies. Sizeable S factors were found across 3 (or 6 if
one counts the mini-HDI ones) different datasets, which explained 56 to 71% of the variance. There
seems to be a decrease over time, which is intriguing as it may eventually lead to the 'destruction' of
the S factor. It may also be due to differences between the datasets across the years, since they were not
entirely comparable. I did not examine the issue in depth.
Correlations of S factors and HDIs with cognitive ability were strong, ranging from .60 to .81
depending on which year, analysis and dataset is chosen, and on whether one uses the HDI values.
Correlations were stronger when they were from the larger datasets, which is perhaps because they
were better measures of latent S. MCV supported the interpretation that the latent S factor was
primarily responsible for the association (r's .71 to .81).
Future studies should examine to which degree cognitive ability and S factor differences are
explainable by ethnoracial factors such as racial ancestry, as done by e.g. Kirkegaard (2015b).
Limitations
There are some problems with this paper:
I cannot read Portuguese and this may have resulted in including some incorrect variables.
There was a lack of crime variables in the datasets, altho these have central importance for
sociology. None were available in the data source I used.
Supplementary material
R source code, data and figures can be found in the Open Science Framework repository.
References
Almeida, L. S., Lemos, G., & Lynn, R. (2011). Regional Differences in Intelligence and per Capita
Incomes in Portugal. Mankind Quarterly, 52(2), 213.
Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing
international rankings. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from
https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from
https://thewinnower.com/papers/examining-the-s-factor-in-us-states
Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower.
Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis
Kirkegaard, E. O. W. (2015d). Indian states: G and S factors. The Winnower. Retrieved from
https://thewinnower.com/papers/indian-states-g-and-s-factors
Kirkegaard, E. O. W. (2015e). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The
Winnower. Retrieved from https://thewinnower.com/papers/s-and-g-in-italian-regions-re-analysis-of-lynn-s-data-and-new-data
Kirkegaard, E. O. W. (2015f). The S factor in China. The Winnower. Retrieved from
https://thewinnower.com/papers/the-s-factor-in-china
Kirkegaard, E. O. W. (2015g). The S factor in the British Isles: A reanalysis of Lynn (1979). The
Winnower. Retrieved from https://thewinnower.com/papers/the-s-factor-in-the-british-isles-a-reanalysis-of-lynn-1979
Kura, K. (2013). Japanese north–south gradient in IQ predicts differences in stature, skin color, income,
and homicide rate. Intelligence, 41(5), 512–516. http://doi.org/10.1016/j.intell.2013.07.001
Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and
Clinical Psychology, 18(1), 1–12. http://doi.org/10.1111/j.2044-8260.1979.tb00297.x
Lynn, R. (1980). The social ecology of intelligence in France. British Journal of Social and Clinical
Psychology, 19(4), 325–331. http://doi.org/10.1111/j.2044-8260.1980.tb00360.x
Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant
mortality, stature, and literacy. Intelligence, 38(1), 93–100.
http://doi.org/10.1016/j.intell.2009.07.004
Lynn, R. (2012). North-South Differences in Spain in IQ, Educational Attainment, per Capita Income,
Literacy, Life Expectancy and Employment. Mankind Quarterly, 52(3/4), 265.
Lynn, R., & Cheng, H. (2013). Differences in intelligence across thirty-one regions of China and their
economic and demographic correlates. Intelligence, 41(5), 553–559.
http://doi.org/10.1016/j.intell.2013.07.009
Lynn, R., Sakar, C., & Cheng, H. (2015). Regional differences in intelligence, income and other
socio-economic variables in Turkey. Intelligence, 50, 144–149.
http://doi.org/10.1016/j.intell.2015.03.006
Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality,
fertility and latitude across the states of India. Intelligence, 49, 179–185.
http://doi.org/10.1016/j.intell.2015.01.009
McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about
SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601–606.
http://doi.org/10.1016/j.intell.2006.07.005
Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research
(Version 1.5.4). Retrieved from http://cran.r-project.org/web/packages/psych/index.html
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity
of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of
Personality, 21(5), 667–706. http://doi.org/10.1002/per.634
Templer, D. I., & Rushton, J. P. (2011). IQ, skin color, crime, HIV/AIDS, and income in 50 U.S. states.
Intelligence, 39(6), 437–442. http://doi.org/10.1016/j.intell.2011.08.001
WHO. (2012). Life expectancy. Data by country. Retrieved from
http://apps.who.int/gho/data/node.main.688?lang=en
WHO. (2014). Human Development Report: Technical notes: Calculating the human development
indices—graphical presentation. WHO. Retrieved from
http://hdr.undp.org/sites/default/files/hdr14_technical_notes.pdf
... In Brazil, the economists Curi and Menezes (2014) investigated the association of income with mathematical abilities measured in an educational testing program (SAEB -Assessment of the Basic Education System) at the level of Brazilian states. Fuerst and Kirkegaard (2016) investigated ancestry factors of Latin American countries, including Brazil; Kirkegaard (2015) also investigated PISA scores and their associations, but focusing on socioeconomic development. Our study adds another piece to the puzzle of intelligence and its correlates in the provinces of a large country. ...
Article
Full-text available
In a number of countries, earlier studies have reported significant associations between regional differences in intelligence within countries and economic and social phenomena. Using scores on the Program of International Student Assessment (PISA) tests as indicator of intelligence, we find statistically significant correlations for the 27 states of Brazil between intelligence and nine indicators of socioeconomic development. Spatial analysis indicates that relationships are present both at the level of differences between adjacent states and over long-distance clines. Most of the relationships observed after initial analysis persisted after controlling for spatial autocorrelation. Among the socioeconomic variables, those that describe the standard of living of the less affluent sections of the population tend to correlate most with PISA scores.
... In Brazil, the economists Curi and Menezes (2014) investigated the association of income with mathematical abilities measured in an educational testing program (SAEB -Assessment of the Basic Education System) at the level of Brazilian states. Fuerst and Kirkegaard (2016) investigated ancestry factors of Latin American countries, including Brazil; Kirkegaard (2015) also investigated PISA scores and their associations, but focusing on socioeconomic development. Our study adds another piece to the puzzle of intelligence and its correlates in the provinces of a large country. ...
Article
Full-text available
In a number of countries, earlier studies have reported significant associations between regional differences in intelligence within countries and economic and social phenomena. Using scores on the Program of International Student Assessment (PISA) tests as indicator of intelligence, we find statistically significant correlations for the 27 states of Brazil between intelligence and nine indicators of socioeconomic development. Spatial analysis indicates that relationships are present both at the level of differences between adjacent states and over long-distance clines. Most of the relationships observed after initial analysis persisted after controlling for spatial autocorrelation. Among the socioeconomic variables, those that describe the standard of living of the less affluent sections of the population tend to correlate most with PISA scores.
... I downloaded only the three most recent datapoints almost all of which were from between 2000 and 2013, and most were from 2005 and 2010. However, they were not consistently available in any particular years, so conducting a longitudinal study akin to the study of Brazilian states (Kirkegaard, 2015f) was not easy and was not done. ...
Article
Full-text available
Two datasets of Japanese socioeconomic data for Japanese prefectures (N=47) were obtained and merged. After quality control, there were 44 variables for use in a factor analysis. Indicator sampling reliability analysis revealed poor reliability (54% of the correlations were |r| > .50). Inspection of the factor loadings revealed no clear S factor with many indicators loading in opposite than expected directions. A cognitive ability measure was constructed from three scholastic ability measures (all loadings > .90). On first analysis, cognitive ability was not strongly related to 'S' factor scores, r = -.19 [CI95: -.45 to .19; N=47]. Jensen's method did not support the interpretation that the relationship is between latent 'S' and cognitive ability (r = -.15; N=44). Cognitive ability was nevertheless related to some socioeconomic indicators in expected ways. A reviewer suggested controlling for population size or population density. When this was done, a relatively clear S factor emerged. Using the best control method (log population density), indicator sampling reliability was high (93% |r|>.50). The scores were strongly related to cognitive ability r = .67 [CI95: .48 to .80]. Jensen's method supported the interpretation that cognitive ability was related to the S factor (r = .78) and not just to the non-general factor variance.
... Both S factor and HDI scores are available for Brazilian states for the years 1991, 2000 and 2010 (Kirkegaard, 2015k). These all correlate very strongly (range .90 to .98). ...
Article
Full-text available
We conducted novel analyses regarding the association between continental racial ancestry, cognitive ability and socioeconomic outcomes across 6 datasets: states of Mexico, states of the United States, states of Brazil, departments of Colombia, sovereign nations and all units together. We find that European ancestry is consistently and usually strongly positively correlated with cognitive ability and socioeconomic outcomes (mean r for cognitive ability = .708; for socioeconomic well-being = .643) (Sections 3-8). In most cases, including another ancestry component, in addition to European ancestry, did not increase predictive power (Section 9). At the national level, the association between European ancestry and outcomes was robust to controls for natural-environmental factors (Section 10). This was not always the case at the regional level (Section 18). It was found that genetic distance did not have predictive power independent of European ancestry (Section 10). Automatic modeling using best subset selection and lasso regression agreed in most cases that European ancestry was a non-redundant predictor (Section 11). Results were robust across 4 different ways of weighting the analyses (Section 12). It was found that the effect of European ancestry on socioeconomic outcomes was mostly mediated by cognitive ability (Section 13). We failed to find evidence of international colorism or culturalism (i.e., neither skin reflectance nor self-reported race/ethnicity showed incremental predictive ability once genomic ancestry had been taken into account) (Section 14). The association between European ancestry and cognitive outcomes was robust across a number of alternative measures of cognitive ability (Section 15). It was found that the general socioeconomic factor was not structurally different in the American sample as compared to the worldwide sample, thus justifying the use of that measure. 
Using Jensen's method of correlated vectors, it was found that the association between European ancestry and socioeconomic outcomes was stronger on more S factor loaded outcomes, r = .75 (Section 16). There was some evidence that tourist expenditure helped explain the relatively high socioeconomic performance of Caribbean states (Section 17).
... The farmer% is puzzling. In the Brazilian study (Kirkegaard, 2015i), farmers% had a strong negative loading of about -.75. However, Brazilian farmers are quite unlike French farmers. ...
Article
Full-text available
Two sets of socioeconomic data for 90-96 French departements were analyzed. One dataset was found in Lynn (1980) and contained four socioeconomic variables. Mixed results were found for this dataset, both with regards to the factor structure and the relationship to cognitive ability. Another dataset with 53 variables was created by compiling variables from the official French statistics bureau (Insee). This dataset contained an impure general socioeconomic (S) factor (some undesirable variables loaded positively), but after controlling for the presence of immigrants, the S factor became purer. This was especially salient for crime, unemployment and poverty variables. The two S factors correlated at r = 0.66 [CI95:0.52-0.76; N = 88]. The IQ scores from the 1950s dataset correlated at 0.33 [CI95:0.13-0.51, N = 88] with the S factor from the 2010-2015 dataset.
... Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a(Kirkegaard, , 2015b(Kirkegaard, , 2015d, I also created two further datasets without London: one with the redundant variables, one without. Thus, there were 4 datasets: 1. ...
Article
Full-text available
A reanalysis of (Carl, 2015) revealed that the inclusion of London had a strong effect on the S loading of crime and poverty variables. S factor scores from a dataset without London and redundant variables was strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.
... Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a(Kirkegaard, , 2015b(Kirkegaard, , 2015d, I also created two further datasets without London. One with the redundant variables, one without. ...
Article
Full-text available
A dataset of 127 variables concerning socioeconomic outcomes for US states was analyzed. Of these, 81 were used in a factor analysis. The analysis revealed a general socioeconomic factor. This factor correlated .961 with one from a previous analysis of socioeconomic data for US states.
... By now, S factors have been found between countries (Kirkegaard, 2014b), twice between country-oforigin groups within countries (Kirkegaard, 2014a), numerous times within countries (reviewed in Kirkegaard, 2015c), and at the level of first names (Kirkegaard & Tranberg, 2015). This paper analyses data for 33 Colombian departments including the capital district. ...
Article
Full-text available
A dataset was compiled with 17 diverse socioeconomic variables for 32 departments of Colombia and the capital district. Factor analysis revealed an S factor. Results were robust to data imputation and removal of a redundant variable. 14 of 17 variables loaded in the expected direction. Extracted S factors correlated about .50 with the cognitive ability estimate. The Jensen coefficient for the S factor for this relationship was .60.
... When it has, it has repeatedly been found that there is a general socioeconomic factor to which good outcomes nearly always have positive loadings and bad outcomes have negative loadings. 3 Recent studies have examined S factors at the national, regional/state and country of origin-level; see (Kirkegaard, 2015c) for a review of regional/state-level studies, and (Kirkegaard, 2014a) for country of originlevel studies. In this paper we exploit a unique dataset to examine the S factor at the name-level in Denmark. ...
We present and analyze data from a dataset of 2358 Danish first names and socioeconomic outcomes not previously made available to the public (“Navnehjulet”, the Name Wheel). We visualize the data and show that there is a general socioeconomic factor with indicator loadings in the expected directions (positive: income, owning your own place; negative: having a criminal conviction, being without a job). This result holds after controlling for age and for each gender alone. It also holds when analyzing the data in age bins. The factor loading of being married depends on the analysis method, so it is more difficult to interpret. A pseudofertility is calculated based on the population size for the names for the years 2012 and 2015. This value is negatively correlated with the S factor score, r = -.35 [95% CI: -.39; -.31], but the relationship seems to be somewhat non-linear and there is an upward trend at the very high end of the S factor. The relationship is strongly driven by relatively uncommon names that have high pseudofertility and low to very low S scores. The n-weighted correlation is -.21 [95% CI: -.25; -.17]. This dysgenic pseudofertility was mostly driven by Arabic and African names. All data and R code are freely available.
Two methods are presented that allow for identification of mixed cases in the extraction of general factors. Simulated data is used to illustrate them.
I analyze the S factor in Italian states by reanalyzing data published by Lynn (2010) as well as new data compiled from the Italian statistics agency (7 and 10 socioeconomic variables, respectively). The S factors from the datasets are highly correlated (.92) and both are strongly correlated with a G factor from PISA scores (.93 and .88).
I analyzed the S factor in US states by compiling a dataset of 25 diverse socioeconomic indicators. Results show that Washington DC is a strong outlier, but if it is excluded, then the S factor correlated strongly with state IQ at .75. Ethnoracial demographics of the states are related to the state's IQ and S in the expected order (White>Hispanic>Black).
I reanalyze data published by Lynn and Yadav (2015) for Indian states. I find both G and S factors, which correlate at .61. The statistical language R is used thruout the paper and the code is explained. The paper thus is both an analysis and a walkthru of how to conduct this type of study.
Two datasets of socioeconomic data were obtained from different sources. Both were factor analyzed and revealed a general factor (S factor). These factors were highly correlated with each other (.79 to .95), with HDI (.68 to .93) and with cognitive ability (PISA; .70 to .78). The federal district was a strong outlier and excluding it improved results. Method of correlated vectors was strongly positive for all 4 analyses (r’s .78 to .92 with reversing).
I reanalyze data reported by Richard Lynn in a 1979 paper concerning IQ and socioeconomic variables in 12 regions of the United Kingdom as well as Ireland. I find a substantial S factor across regions (66% of variance with MinRes extraction). I produce a new best estimate of the G scores of regions. The correlation of this with the S scores is .79. The MCV with reversal correlation is .47.
IQs are presented for fifteen regions of Spain showing a north-south gradient with IQs highest in the north and lowest in the south. The regional differences in IQ are significantly correlated with educational attainment, per capita income, literacy, employment and life expectancy, and are associated with the percentages of Near Eastern and North African genes in the population.
Data are presented for intelligence in twelve regions in Turkey showing that intelligence is highest in the west and lowest in the east. The west–east intelligence gradient is significantly correlated with regional differences in educational attainment and per capita income and negatively correlated with fertility, infant mortality and the percentage of Kurds.
This study reports the differences in intelligence across thirty-one regions of the People's Republic of China. It was found that regional IQs were significantly associated with the percentage of Han in the population (r = .59), GDP per capita (r = .42), and the percentage of those with higher education (r = .38, p < .05), and non-significantly with years of education (r = .32). The results of the multiple regression showed that both the percentage of Han in the region and the GDP per capita were significant predictors of regional IQs, accounting for 39% of the total variance.