The Winnower
Originally published April 19th, 2015. Modified March 22nd, 2017.
Examining the S factor in Mexican states
Emil O. W. Kirkegaard1
Abstract
Two datasets of socioeconomic data were obtained from different sources. Both were factor analyzed and revealed a general factor (S factor). These factors were highly correlated with each other (.79 to .95), with HDI (.68 to .93) and with cognitive ability (PISA; .70 to .78). The federal district was a strong outlier and excluding it improved results. The method of correlated vectors was strongly positive for all 4 analyses (r’s .77 to .92 with reversing).
Key words: intelligence, IQ, general socioeconomic factor, S factor, inequality, Mexico, Mexican states
1. Introduction
In a number of recent articles (Kirkegaard, 2015c, 2015b, 2015a, 2015d, 2015e), I have analyzed within-country
regional data to examine the general socioeconomic factor, if it exists in the dataset (for the origin of the term,
see e.g. Kirkegaard (2014)). This work was inspired by Lynn (2010) whose datasets I have also reanalyzed.
While doing work on another project (Fuerst & Kirkegaard, 2016), I needed an S factor for Mexican states, if
such exists. Since I was not aware of any prior analysis of this country in this fashion, I decided to do it myself.
The first problem was obtaining data for the analysis. For this, one needs a number of diverse indicators that
measure important economic and social matters for each Mexican state. Mexico has 31 states and a federal
district, so one can use a decent number of indicators to examine the S factor. Mexico is a Spanish-speaking
country and English proficiency is fairly low. According to Wikipedia, only 13% of people speak English,
compared with 86% for Denmark, 64% for Germany and 35% for Egypt.2
2. S factor analysis 1 – Wikipedian data
2.1. Data source and treatment
Unlike for the previous countries, I could not easily find good data available in English. As a substitute, I used
data from Wikipedia:
en.wikipedia.org/wiki/List_of_Mexican_states_by_unemployment
en.wikipedia.org/wiki/List_of_Mexican_states_by_fertility_rate
en.wikipedia.org/wiki/List_of_Mexican_states_by_homicides
en.wikipedia.org/wiki/List_of_Mexican_states_by_infant_mortality
en.wikipedia.org/wiki/List_of_Mexican_states_by_life_expectancy
en.wikipedia.org/wiki/List_of_Mexican_states_by_literacy_rate
en.wikipedia.org/wiki/List_of_Mexican_states_by_GDP
en.wikipedia.org/wiki/List_of_Mexican_states_by_poverty_rate
en.wikipedia.org/wiki/List_of_Mexican_states_by_Human_Development_Index
en.wikipedia.org/wiki/List_of_Mexican_states_by_population
en.wikipedia.org/wiki/Ranked_list_of_Mexican_states
1 Ulster Institute for Social Research. Email: emil@emilkirkegaard.dk
2 https://en.wikipedia.org/wiki/List_of_countries_by_English-speaking_population
These come from various years, are sometimes not given per person, and often have no useful source given. So
they are of unknown veracity, but they are probably fine for a first look. The HDI is best thought of as a proxy
for the S factor, so we can use it to examine construct validity.
The variables that had data for multiple time-points were averaged.
For the variables reporting numbers in absolute numbers, I calculated per capita versions.
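The two data treatment steps above (averaging over time-points and converting absolute counts to per capita values) can be sketched as follows. This is a minimal illustration in Python, not the code used for the paper (which is in R and available in the supplementary materials); the function names and all numbers are hypothetical.

```python
from statistics import mean


def average_over_years(values_by_year):
    """Mean of the available (non-missing) yearly values for one state."""
    observed = [v for v in values_by_year.values() if v is not None]
    return mean(observed) if observed else None


def per_capita(total, population):
    """Convert an absolute count into a per-person rate."""
    return total / population


# Hypothetical numbers for a single state, for illustration only.
homicides_by_year = {2010: 900, 2011: 1100, 2012: None}  # None = missing year
population = 2_000_000

avg_homicides = average_over_years(homicides_by_year)
homicide_rate = per_capita(avg_homicides, population)
```

Averaging first and dividing afterwards assumes a roughly constant population over the period; with yearly population figures one could instead compute the rate per year and average the rates.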
2.2. Results
The variables above minus HDI and population size were factor analyzed using minimum residuals to extract 1
factor. The loadings plot is shown in Figure 1.
The literacy variables had a near-perfect loading on S (.99). Unemployment unexpectedly loaded positively and so did homicides per capita, although only slightly. The unemployment loading could be because unemployment benefits exist only in the higher S states, such that becoming unemployed elsewhere would mean starvation. The homicide loading is possibly related to the drug war in the country.
Figure 1: S loadings in the Wikipedia data.
3. Analysis 2 – INEG data
3.1. Data source and treatment
Since the results based on the Wikipedia data were dubious, I searched for more data. I found it in the Spanish-language statistical database of the Instituto Nacional De Estadística Y Geografía3, which offers poorly done English translations. This is not optimal, as there are many translation errors that may result in choosing the wrong variable for analysis. If any Spanish speaker reads this, I would be happy if they would go over my chosen variables and confirm that they are correct. I ended up with the following variables:
1. Cost of crime against individuals and households
2. Cost of crime on economic units
3. Annual percentage change of GDP at 2008 prices
4. Crime prevalence rate per 10,000 economic units
5. Crime prevalence rate per hundred thousand inhabitants aged 18 years and over, by state
6. Dark figure of crime on economic units
7. Dark figure (crimes not reported and crimes reported that were not investigated)
8. Doctors per 100 000 inhabitants
9. Economic participation of population aged 12 to 14 years
10. Economic participation of population aged 65 and over
11. Economic units.
12. Economically active population. Age 15 and older
13. Economically active population. Unemployed persons. Age 15 and older
14. Electric energy users
15. Employed population by income level. Up to one minimum wage. Age 15 and older
16. Employed population by income level. More than 5 minimum wages. Age 15 and older
17. Employed population by income level. Do not receive income. Age 15 and older
18. Fertility rate of adolescents aged 15 to 19 years
19. Female mortality rate for cervical cancer
20. Global rate of fertility
21. Gross rate of women participation
22. Hospital beds per 100 thousand inhabitants
23. Inmates in state prisons at year end
24. Life expectancy at birth
25. Literacy rate of women 15 to 24 years
26. Literacy rate of men 15 to 24 years
27. Median age
28. Nurses per 100 000 inhabitants
29. Percentage of households victims of crime
30. Percentage of births at home
31. Percentage of population employed as professionals and technicians
32. Prisoners rate (per 10,000 inhabitants age 18 and over)
33. Rate of maternal mortality (deaths per 100 thousand live births)
34. Rate of inhabitants aged 18 years and over that consider their neighborhood or locality as unsafe, per hundred thousand inhabitants aged 18 years and over
35. Rate of inhabitants aged 18 years and over that consider their state as unsafe, per hundred thousand inhabitants aged 18 years and over
36. Rate sentenced to serve a sentence (per 1,000 population age 18 and over)
37. State Gross Domestic Product (GDP) at constant prices of 2008
38. Total population
39. Total mortality rate from respiratory diseases in children under 5 years
40. Total mortality rate from acute diarrheal diseases (ADD) in population under 5 years
41. Unemployment rate of men
42. Unemployment rate of women
43. Households
44. Inhabited housings with available computer
45. Inhabited housings that have toilet
46. Inhabited housings that have a refrigerator
47. Inhabited housings with available water from public net
48. Inhabited housings that have drainage
49. Inhabited housings with available electricity
50. Inhabited housings that have a washing machine
51. Inhabited housings with television
52. Percentage of housing with piped water
53. Percentage of housing with electricity
54. Proportion of population with access to improved sanitation, urban and rural
55. Proportion of population with sustainable access to improved sources of water supply, in urban and rural areas
3 http://www3.inegi.org.mx/sistemas/biinegi/Default.aspx?ii=i
There were data for multiple years for most of them. I used all data from the last 10 years approximately. For all
data with multiple years, I calculated the mean value.
For data given in raw numbers, I calculated the appropriate per unit measures (per person, per economically
active person (?), per household).
A matrix plot for all the S factor relevant data (e.g. not population size) is shown in Figure 2. It shows missing
data in red, as well as the relative difference between datapoints. Thus, cells that are completely white or black
are outliers compared to the other data.
One variable (inmates per person) had a few missing datapoints.
Multiple other variables had strong outliers. I examined these to determine whether they were real or due to data error. Inspection revealed that the GDP per person data were clearly incorrect for one state (Campeche), but I could not find the source of the error. The data match those on the website but not those on Wikipedia. I deleted the variable to be safe.
The GDP change outlier (Campeche, which has negative growth) seems to be real. According to a Mexican blog4, it is due to the closing of its oil fields.
The rest of the outliers were either hard to assess due to the odd nature of the data (“dark crime”?) or plausible. E.g. Mexico City (Federal District, the capital) was an outlier on nurses and doctors per capita, but this is presumably due to many large hospitals being located there.
3.2. Factor analysis
Since there were only 32 cases (31 states + federal district) and 47 variables (excluding the bogus GDP per capita), factor analysis is problematic. There are various recommendations for the minimum sample size, but almost none of them are met by this dataset (Zhao, 2009). To test the limits, I decided to try factor analyzing all of the variables. This produced warnings such as:
  The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.
  In factor.scores, the correlation matrix is singular, an approximation is used
  In cor.smooth(R) : Matrix was not positive definite, smoothing was done
  In cor.smooth(R) : Matrix was not positive definite, smoothing was done
  In cor.smooth(r) : Matrix was not positive definite, smoothing was done
  In cor.smooth(r) : Matrix was not positive definite, smoothing was done
4 http://geo-mexico.com/?p=12070
Figure 2: Matrixplot of S data from INEG.
Such warnings do not always mean that the result is nonsense, but they often do. For that reason, I wanted to
extract an S factor with a smaller number of variables. From the 47, I selected the following 21 variables as
generally representative and interpretable. These were:
1. GDP.change, #Economic
2. Unemploy.men.rate,
3. Unemploy.women.rate,
4. Low.income.peap,
5. High.income.peap,
6. Prof.tech.employ.pct,
7. crime.rate.per.adult, #crime
8. Inmates.per.pers,
9. Unsafe.neighborhood.percept.rate,
10. Has.water.net.per.hh, #material goods
11. Elec.pct,
12. Has.wash.mach.per.hh,
13. Doctors.per.pers, #Health
14. Nurses.per.pers,
15. Hospital.beds.per.pers,
16. Total.fertility,
17. Home.births.pct,
18. Maternal.death.rate,
19. Life.expect,
20. Women.participation, #Gender equality
21. Lit.young.women #education
Note that peap = per economically active person, hh = household.
The selection was a judgment call on my part, and others may choose different variables.
3.3. Automatic reduction of dataset
As a robustness check and evidence against a possible claim that I picked the variables such as to get an S factor
that most suited my prior beliefs, I decided to find an automatic method of selecting a subset of variables for
factor analysis. I noticed that in the original dataset, some variables overlapped near perfectly. This would mean
that whatever they measure, it would get measured twice or more when extracting a factor. Highly correlated
variables can also create nonsense solutions, especially when extracting more than 1 factor.
Another piece of insight comes from the fact that for cognitive data, general factors extracted from a less broad
selection of subtests are worse measures of general cognitive ability than those from broader selections (Johnson,
te Nijenhuis, & Bouchard, 2008).
Lastly, subtests from different domains tend to be less correlated than those from the same domain (hence the
existence of group factors).
Combining all this, a decent way to reduce a dataset by 1 variable is to calculate all the intercorrelations, find the highest one, and remove one of the two variables responsible for it. One can do this repeatedly to remove more than 1 variable from a dataset. Concerning the question of which of the two variables to remove, I can think of three ways: always removing the first, always removing the second, or choosing at random. I implemented all three settings and chose the second as the default. This is because in many datasets the first of a set of highly correlated variables is usually the ‘primary one’, e.g. unemployment, unemployment men, unemployment women. The algorithm also outputs step-by-step information concerning which variables were removed and what their correlations were.
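The reduction procedure described above can be sketched roughly as follows. This is a Python illustration written for this text, not the R implementation from the supplementary materials. The function and variable names are hypothetical, the toy data are invented, and ranking pairs by the absolute value of the correlation is an assumption (the text only says "the highest one").

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # constant variable: treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def remove_redundant(data, n_remove, which="second", seed=None):
    """Repeatedly drop one variable from the most strongly correlated pair.

    data: dict mapping variable name -> list of values (column order matters).
    which: "first", "second" or "random", i.e. which member of the pair to drop.
    Returns the reduced dict and a step-by-step log.
    """
    rng = random.Random(seed)
    remaining = dict(data)
    log = []
    for _ in range(n_remove):
        if len(remaining) < 2:
            break
        best_pair, best_r = None, -1.0
        for a, b in combinations(remaining, 2):
            r = abs(pearson(remaining[a], remaining[b]))  # assumption: |r|
            if r > best_r:
                best_pair, best_r = (a, b), r
        first, second = best_pair
        if which == "first":
            drop = first
        elif which == "second":
            drop = second
        else:
            drop = rng.choice(best_pair)
        del remaining[drop]
        log.append((first, second, round(best_r, 3), drop))
    return remaining, log


# Invented toy data: the two unemployment variables overlap almost perfectly.
toy = {
    "unemploy": [5, 6, 7, 8, 9],
    "unemploy_men": [5.1, 6.2, 6.9, 8.1, 9.0],
    "life_expect": [74, 71, 75, 72, 73],
}
kept, steps = remove_redundant(toy, n_remove=1)
```

With the default setting the later of the two variables in column order is dropped, so in the toy data the 'primary' unemployment variable is retained.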
Having written the R code for the algorithm, I ran it on the Mexican dataset. I wanted to obtain a solution using
the largest possible number of variables without getting a warning from the factor extraction function. So I first
removed 1 variable, and then ran the factor analysis. When I received an error, I removed another, and so on.
After having removed 20 variables, I no longer received an error. This left the analysis with 27 variables, or 6
more than my chosen selection.
Analysis of the output from the algorithm shows that the function works. In most cases, the pair of variables
found was either a (near-)double measure e.g. percent of population with electricity and percent of households
with electricity, or closely related e.g. literacy in men and women. Sometimes however, the pair did not seem to
be closely related, e.g. women’s participation and percent of households with a computer.
Since this selection included the variable with missing data, I used the irmi() function from the VIM package to impute the missing data (Templ, Alfons, Kowarik, & Prantner, 2015).
3.4. Factor loadings: stability
The factor loading plots are shown in Figures 3-5.
Figure 3: S factor loadings in the INEG data. All variables.
Each analysis relied upon a unique but overlapping selection of variables. Thus, it is possible to correlate the
loadings of the overlapping parts for each analysis. This is a measure of loading stability in different factor
analytic environments, as also done by Ree and Earles (1991) for general cognitive ability factor (g factor). The
correlations were .98, 1.00, .98 (n’s 21, 27, 12), showing very high stability across datasets. Note that it was not
possible to use the loadings from the Wikipedian data factor analysis because the variables were not strictly
speaking overlapping.
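The loading stability check described above amounts to correlating two loading vectors over the variables they share. A minimal Python sketch follows; it is not the paper's R code, and the function name and loading values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def loading_stability(loadings_a, loadings_b):
    """Correlate the loadings of the variables present in both analyses.

    Returns (r, n) where n is the number of overlapping variables.
    """
    shared = [name for name in loadings_a if name in loadings_b]
    r = pearson([loadings_a[v] for v in shared],
                [loadings_b[v] for v in shared])
    return r, len(shared)


# Invented loadings from two hypothetical analyses with partial overlap.
analysis_a = {"Life.expect": 0.80, "Total.fertility": -0.70,
              "Elec.pct": 0.60, "GDP.change": 0.10}
analysis_b = {"Life.expect": 0.75, "Total.fertility": -0.65,
              "Elec.pct": 0.70, "Nurses.per.pers": 0.50}

r, n_shared = loading_stability(analysis_a, analysis_b)
```

Reporting n alongside r matters because the number of overlapping variables differs between pairs of analyses, as with the n's given above.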
Figure 4: S factor loadings in the INEG data. Chosen variables.
Figure 5: S factor loadings in the INEG data. Automatically chosen variables.
3.5. Factor loadings: interpretation
Examining the factor loadings reveals some things of interest. In all analyses, whatever is generally considered good tends to load positively, and whatever is considered bad tends to load negatively.
Unemployment (together, men, women) has positive loadings, whereas it ‘should’ have negative loadings. This is perhaps because the lower S factor states have more dysfunctional social security nets or none at all, such that not working means starvation, which keeps people in work. This is merely a conjecture because I don’t know much about Mexico. Hopefully someone more knowledgeable than me will read this and have a better answer.
Crime variables (crime rate, victimization, inmates/prisoners per capita, sentencing rate) load positively, whereas they ‘should’ have negative loadings. This pattern has been found before; see Kirkegaard (2015e) for a review of S factor studies and crime variables.
3.6. Factor scores
Next I correlated the factor scores from all 4 analyses with each other as well as with HDI and cognitive ability as measured by PISA tests (the cognitive data is from Fuerst & Kirkegaard, 2016; the HDI data is from Wikipedia). The correlation matrix is shown in Table 1.
                   S all  S chosen  S automatic  S wiki  HDI mean  Cognitive ability
S all               1.00     -0.08        -0.04    0.08     -0.17              -0.12
S chosen           -0.08      1.00         0.94    0.83      0.93               0.65
S automatic        -0.04      0.94         1.00    0.91      0.89               0.74
S wiki              0.08      0.83         0.91    1.00      0.76               0.78
HDI mean           -0.17      0.93         0.89    0.76      1.00               0.53
Cognitive ability  -0.12      0.65         0.74    0.78      0.53               1.00
Table 1: Primary results – original scoring.
Strangely, despite the similar factor loadings, the factor scores from the S factor extracted from all the variables (S.all) had almost no relation to the others. This probably indicates that the factor scoring method could not handle this type of odd case. The default scoring method for the factor analysis is “regression”, but there are a few others. Bartlett’s method yielded results for S.all that fit with the other factors, while none of the other scoring methods did. See the psych package documentation for details about this method (Revelle, 2015). I changed the scoring method for all the other analyses to Bartlett’s to remove method-specific variance. The new correlation matrix is shown in Table 2:
                   S all  S chosen  S automatic  S wiki  HDI mean  Cognitive ability
S all               1.00      0.96         0.99    0.93      0.86               0.76
S chosen            0.96      1.00         0.96    0.86      0.93               0.70
S automatic         0.99      0.96         1.00    0.91      0.88               0.76
S wiki              0.93      0.86         0.91    1.00      0.75               0.78
HDI mean            0.86      0.93         0.88    0.75      1.00               0.53
Cognitive ability   0.76      0.70         0.76    0.78      0.53               1.00
Table 2: Primary results – Bartlett scoring.5
Intriguingly, the correlations are now generally stronger. Perhaps Bartlett’s method is better for handling this type of extraction involving general factors from datasets with low case-to-variable ratios. It certainly deserves empirical investigation, including reanalysis of prior datasets. I reran the earlier parts of this paper with the Bartlett method. It did not substantially change the results. The correlations between loadings across analyses increased a bit (to .98, 1.00, .99).
One possibility, however, is that the stronger results are just due to Bartlett’s method creating outliers that happen to lie on the regression line. Examination of the scatterplots revealed that this was not the case.
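For reference, the two scoring methods compared above can be written out for a single-factor model. These are the standard textbook forms, not formulas taken from the paper or its code, and the symbols (z for the standardized indicators of one case, λ for the loading vector, Ψ for the diagonal matrix of uniquenesses, R for the indicator correlation matrix) are notation introduced here.

```latex
% Single-factor model for p standardized indicators:
%   z = \lambda f + e, \quad \mathrm{Cov}(e) = \Psi \ (\text{diagonal}), \quad R = \lambda\lambda^{\top} + \Psi
\hat{f}_{\text{regression}} = \lambda^{\top} R^{-1} z
\qquad
\hat{f}_{\text{Bartlett}} = \left(\lambda^{\top} \Psi^{-1} \lambda\right)^{-1} \lambda^{\top} \Psi^{-1} z
```

The regression form requires the inverse of the full sample correlation matrix, which does not exist when there are more indicators than cases (47 vs. 32 here), whereas the Bartlett form only inverts the diagonal Ψ. This may be part of why Bartlett's method behaved better for S.all, but that is a conjecture and not something tested in this paper.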
3.7. S factor scores and cognitive ability
The next question is to what degree the within-country differences in Mexico can be explained by cognitive ability. The correlations are also given in Table 2; they are in the region of .70 to .78 for the various S factors, i.e. fairly high. One could plot all of them vs. cognitive ability, but that would give us 4 plots. Instead, I plot only the S factor from my chosen variables, shown in Figure 6, since this has the highest correlation with HDI and thus the best claim for construct validity. It is also the most conservative option because, of the 4 S factors, it has the lowest correlation with cognitive ability.
5 These results originally had somewhat weaker S intercorrelations, e.g. S.all x S.chosen = .79. I could not replicate this result, but it
matters little because the interpretation is about the same. It may be an error in the original write-up.
We see that the federal district is a strong outlier, just like in the study with US states and Washington DC
(Kirkegaard, 2015a). One should then remove it and rerun all the analyses. This includes the S factor extractions
because the presence of a strong ‘mixed case’ (to be explained further in a future publication) affects the S factor
extracted (ibid.).
4. Analyses without Federal District
I reran all the analyses without the federal district. The loading correlations increased slightly to 1.00.
The factor score correlations, shown in Table 3 below, increased, meaning that the Federal District outlier was a source of discrepancy between the S factors from the different variable selections. Once it is removed, the S factors from the INEG dataset are in near-perfect agreement (1.00, .99, .99), while the one from the Wikipedia data agrees less well but still respectably (.93, .93, .91). Correlations with cognitive ability also improved for the S factors from the INEG data.
Figure 6: Cognitive ability and S factor (based on chosen variables).
                   S all  S chosen  S automatic  S wiki  HDI mean  Cognitive ability
S all               1.00      1.00         0.99    0.93      0.85               0.78
S chosen            1.00      1.00         0.99    0.93      0.87               0.81
S automatic         0.99      0.99         1.00    0.91      0.89               0.79
S wiki              0.93      0.93         0.91    1.00      0.75               0.77
HDI mean            0.85      0.87         0.89    0.75      1.00               0.56
Cognitive ability   0.78      0.81         0.79    0.77      0.56               1.00
Table 3: Primary results. Federal District excluded. Bartlett's scoring.
Figure 7 shows the scatterplot of cognitive ability and S (chosen).
4.1. Method of correlated vectors (MCV)
In line with earlier studies, I examined whether the indicators that are better measures of the latent S factor are also correlated more highly with the criterion variable, cognitive ability. Indicators with negative loadings were reversed to avoid inflating the correlation. Figure 8 shows the MCV scatterplot for the analysis with the chosen variables.
Figure 7: Cognitive ability and S (chosen variables). Federal District excluded.
The MCV results are strong: .90, .77, .92 and .92 for the analysis with all variables, chosen variables, automatically chosen variables and Wikipedian variables respectively. Note that these are for the analyses without the federal district, but they were similar with it too.
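The MCV computation with reversing, as described above, can be sketched as follows. This is an illustrative Python version, not the R code used for the paper; the function name, indicator names and all numbers are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mcv(loadings, criterion_r, reverse=True):
    """Method of correlated vectors.

    loadings: dict indicator -> S factor loading.
    criterion_r: dict indicator -> correlation with the criterion variable.
    reverse: if True, indicators with negative loadings are reversed, i.e.
             both the loading and the criterion correlation change sign.
    """
    xs, ys = [], []
    for name, loading in loadings.items():
        r_crit = criterion_r[name]
        if reverse and loading < 0:
            loading, r_crit = -loading, -r_crit
        xs.append(loading)
        ys.append(r_crit)
    return pearson(xs, ys)


# Invented example values for four hypothetical indicators.
s_loadings = {"Life.expect": 0.9, "Elec.pct": 0.5,
              "Total.fertility": -0.8, "GDP.change": 0.2}
r_with_cognitive = {"Life.expect": 0.7, "Elec.pct": 0.4,
                    "Total.fertility": -0.6, "GDP.change": 0.1}

mcv_r = mcv(s_loadings, r_with_cognitive)
```

Reversing puts all indicators on the same side of zero, so the result reflects whether stronger loadings go with stronger criterion correlations and is not driven simply by the contrast between positively and negatively loading indicators.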
5. Discussion and conclusion
Generally, the present analysis produced findings similar to those of earlier studies, especially the study of US states (Kirkegaard, 2015a). Cognitive ability was a very strong correlate of the S factors, especially once the federal district outlier was excluded. Further work is needed to find out why unemployment and crime variables sometimes load positively in S factor analyses with regions or states as the unit of analysis.
MCV analysis supported the idea that cognitive ability is related to the S factor itself, not just to some non-S factor source of variance also present in the dataset.
Supplementary material and acknowledgments
Supplementary materials including code, high quality figures and data can be found at https://osf.io/zk3yx/.
Figure 8: MCV scatterplot of S (chosen) and cognitive ability. Federal District excluded.
References
Fuerst, J., & Kirkegaard, E. O. W. (2016). Admixture in the Americas: Regional and national differences. Mankind Quarterly.
Johnson, W., te Nijenhuis, J., & Bouchard, T. J. (2008). Still just 1 g: Consistent results from five test batteries.
Intelligence, 36(1), 81–95.
Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
Kirkegaard, E. O. W. (2015a). Examining the S factor in US states. The Winnower. Retrieved from
https://thewinnower.com/papers/examining-the-s-factor-in-us-states
Kirkegaard, E. O. W. (2015b). Indian states: G and S factors. The Winnower. Retrieved from
https://thewinnower.com/papers/indian-states-g-and-s-factors
Kirkegaard, E. O. W. (2015c). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower. Retrieved from
https://thewinnower.com/papers/s-and-g-in-italian-regions-re-analysis-of-lynn-s-data-and-new-data
Kirkegaard, E. O. W. (2015d). The S factor in China. The Winnower. Retrieved from
https://thewinnower.com/papers/the-s-factor-in-china
Kirkegaard, E. O. W. (2015e). The S factor in the British Isles: A reanalysis of Lynn (1979). The Winnower. Retrieved from
https://thewinnower.com/papers/the-s-factor-in-the-british-isles-a-reanalysis-of-lynn-1979
Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant mortality,
stature, and literacy. Intelligence, 38(1), 93–100. https://doi.org/10.1016/j.intell.2009.07.004
Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3),
271–278. https://doi.org/10.1016/0160-2896(91)90036-D
Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version
1.5.4). Retrieved from http://cran.r-project.org/web/packages/psych/index.html
Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and Imputation of
Missing Values. CRAN. Retrieved from http://cran.r-project.org/web/packages/VIM/index.html
Zhao, N. (2009, March 23). The Minimum Sample Size in Factor Analysis. Retrieved November 16, 2016, from
https://www.encorewiki.org/display/~nzhao/The+Minimum+Sample+Size+in+Factor+Analysis
Page 14 of 14.
... When such socioeconomic variables are factor analyzed, though, a general socioeconomic factor (S factor) emerges such that, most of the time, desirable outcomes load positively and undesirable outcomes load negatively on it. Previous research has found S factors at the national level (Kirkegaard, 2014b), the state/region/department level (Carl, 2015;Kirkegaard, 2015bKirkegaard, , 2015d and the city district level (Kirkegaard, 2015a). Analyses of national and state level data showed that Human Development Index (HDI) scores correlated strongly with S factor scores at typically >.9. ...
... There are 31 states and a federal district. Since federal districts are often outliers (Kirkegaard, 2015d, we excluded the federal district from all analyses except the admixture plot. ...
... As discussed in Section 16, when analyzing socioeconomic outcomes, a general factor tends to emerge. Since no Mexican state S factor study existed, one of us conducted a thorough study (Kirkegaard, 2015d), using outcome data from approximately 2005 to 2015. We found that year 2010 HDI correlated very strongly (r = .93) ...
Article
Full-text available
We conducted novel analyses regarding the association between continental racial ancestry, cognitive ability and socioeconomic outcomes across 6 datasets: states of Mexico, states of the United States, states of Brazil, departments of Colombia, sovereign nations and all units together. We find that European ancestry is consistently and usually strongly positively correlated with cognitive ability and socioeconomic outcomes (mean r for cognitive ability = .708; for socioeconomic well-being = .643) (Sections 3-8). In most cases, including another ancestry component, in addition to European ancestry, did not increase predictive power (Section 9). At the national level, the association between European ancestry and outcomes was robust to controls for natural-environmental factors (Section 10). This was not always the case at the regional level (Section 18). It was found that genetic distance did not have predictive power independent of European ancestry (Section 10). Automatic modeling using best subset selection and lasso regression agreed in most cases that European ancestry was a non-redundant predictor (Section 11). Results were robust across 4 different ways of weighting the analyses (Section 12). It was found that the effect of European ancestry on socioeconomic outcomes was mostly mediated by cognitive ability (Section 13). We failed to find evidence of international colorism or culturalism (i.e., neither skin reflectance nor self-reported race/ethnicity showed incremental predictive ability once genomic ancestry had been taken into account) (Section 14). The association between European ancestry and cognitive outcomes was robust across a number of alternative measures of cognitive ability (Section 15). It was found that the general socioeconomic factor was not structurally different in the American sample as compared to the worldwide sample, thus justifying the use of that measure. 
Using Jensen's method of correlated vectors, it was found that the association between European ancestry and socioeconomic outcomes was stronger on more S factor loaded outcomes, r = .75 (Section 16). There was some evidence that tourist expenditure helped explain the relatively high socioeconomic performance of Caribbean states (Section 17).
... Many recent studies have examined within-country regional correlates of (general) cognitive ability (also known as (general) intelligence, general mental ability, g),. This has been done for the British Isles (Lynn, 1979;Kirkegaard, 2015g), France (Lynn, 1980), Italy (Lynn, 2010;Kirkegaard, 2015e), Spain (Lynn, 2012), Portugal (Almeida, Lemos, & Lynn, 2011), India (Kirkegaard, 2015d;Lynn & Yadav, 2015), China (Kirkegaard, 2015f;Lynn & Cheng, 2013), Japan (Kura, 2013), the US (Kirkegaard, 2015b;McDaniel, 2006;Templer & Rushton, 2011), Mexico (Kirkegaard, 2015a) and Turkey (Lynn, Sakar, & Cheng, 2015). This paper examines data for Brazil. ...
... amount of rainforest). The following variables were selected: Most data was already in an appropriate per unit measure so it was not necessary to do extensive conversions as with the Mexican data (Kirkegaard, 2015a). I calculated fraction of the population living in rural areas by dividing the rural population by the total population. ...
... 1 This left me with the question of which variable(s) to exclude. Similar to the previous analysis for Mexican states (Kirkegaard, 2015a), I used an automatic method. After removing one variable, the factor analysis worked and gave no warning. ...
Article
Full-text available
Sizeable S factors were found across 3 different datasets (from years 1991, 2000 and 2010), which explained 56 to 71% of the variance. Correlations of extracted S factors with cognitive ability were strong ranging from .69 to .81 depending on which year, analysis and dataset is chosen. Method of correlated vectors supported the interpretation that the latent S factor was primarily responsible for the association (r’s .71 to .81).
... This method was first used in Kirkegaard (2015a). ...
... This improves the coverage of countries somewhat, but decreases the number of indicators in each analysis. because it has been found to work well in datasets with low case n cases /n indicators ratios, even ratios <1 (Kirkegaard, 2015a). The correlation between the new and previously published S scores was .997, ...
Article
Full-text available
Some new methods for factor analyzing socioeconomic data are presented, discussed and illustrated with analyses of new and old datasets. A general socioeconomic factor (S) was found in a dataset of 47 French-speaking Swiss provinces from 1888. It was strongly related (r’s .64 to .70) to cognitive ability as measured by an army examination. Fertility had a strong negative loading (r -.44 to -.67). Results were similar when using rank-transformed data. The S factor of international rankings data was found to have a split-half factor reliability of .93, that of the general factor of personality extracted from 25 OCEAN items .55, and that of the general cognitive ability factor .68 based on 16 items from the International Cognitive Ability Resource.
... Because the S factor is an aggregate of such outcomes, it is not surprising that S scores have been found to have strong positive correlations with cognitive ability as well, e.g. (Kirkegaard, 2015b(Kirkegaard, , 2015c. ...
Article
Full-text available
Two datasets of Japanese socioeconomic data for Japanese prefectures (N=47) were obtained and merged. After quality control, there were 44 variables for use in a factor analysis. Indicator sampling reliability analysis revealed poor reliability (54% of the correlations were |r| > .50). Inspection of the factor loadings revealed no clear S factor with many indicators loading in opposite than expected directions. A cognitive ability measure was constructed from three scholastic ability measures (all loadings > .90). On first analysis, cognitive ability was not strongly related to 'S' factor scores, r = -.19 [CI95: -.45 to .19; N=47]. Jensen's method did not support the interpretation that the relationship is between latent 'S' and cognitive ability (r = -.15; N=44). Cognitive ability was nevertheless related to some socioeconomic indicators in expected ways. A reviewer suggested controlling for population size or population density. When this was done, a relatively clear S factor emerged. Using the best control method (log population density), indicator sampling reliability was high (93% |r|>.50). The scores were strongly related to cognitive ability r = .67 [CI95: .48 to .80]. Jensen's method supported the interpretation that cognitive ability was related to the S factor (r = .78) and not just to the non-general factor variance.
... S factors can be unstable across methods of extraction (10). For this reason an S factor was extracted using every combination of extraction and scoring method available in the fa() function in the psych package (11). ...
Article
A dataset of 30 diverse socioeconomic variables was collected covering 32 London boroughs. Factor analysis of the data revealed a general socioeconomic factor. This factor was strongly related to GCSE (General Certificate of Secondary Education) scores (r's .683 to .786) and had weak to medium sized negative relationships to demographic variables related to immigrants (r's -.295 to -.558). Jensen's method indicated that these relationships were related to the underlying general factor, especially for GCSE (coefficients |.48| to |.69|). In multiple regression, about 60% of the variance in S outcomes could be accounted for using GCSE and one variable related to immigrants.
... Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a, 2015b, 2015d), I also created two further datasets without London: one with the redundant variables, one without. Thus, there were 4 datasets: 1. ...
Article
A reanalysis of Carl (2015) revealed that the inclusion of London had a strong effect on the S loading of crime and poverty variables. S factor scores from a dataset without London and redundant variables were strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.
... Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a, 2015b, 2015d), I also created two further datasets without London: one with the redundant variables, one without. ...
Article
A dataset of 127 variables concerning socioeconomic outcomes for US states was analyzed. Of these, 81 were used in a factor analysis. The analysis revealed a general socioeconomic factor. This factor correlated .961 with one from a previous analysis of socioeconomic data for US states.
... Very highly correlated variables cause problems for factor analysis and result in 'double weighing' of some variables. For this reason, an algorithm developed previously was used to find the most highly correlated pairs of variables and remove one of them automatically (Kirkegaard, 2015a). A threshold of r = .90 ...
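The pruning step described in the excerpt above can be illustrated with a short sketch. This is not the author's algorithm (which was written in R and is not reproduced in the excerpt); it is a minimal Python version under stated assumptions: the function name `drop_redundant` is hypothetical, and the rule for which member of a pair to drop (here, the later column) is an arbitrary choice made for illustration. Only the threshold of r = .90 comes from the excerpt.

```python
import numpy as np

def drop_redundant(data, names, threshold=0.90):
    """Repeatedly find the most highly correlated pair of variables and
    drop one of them, until no pair has |r| above the threshold.

    data: 2-D array-like, rows = cases (e.g. states), columns = variables.
    names: list of column names, same order as the columns.
    Returns the names of the variables that are kept.
    """
    data = np.asarray(data, dtype=float)
    keep = list(range(data.shape[1]))
    while len(keep) > 1:
        # Absolute correlations among the variables still in the set.
        corr = np.abs(np.corrcoef(data[:, keep], rowvar=False))
        np.fill_diagonal(corr, 0.0)  # ignore self-correlations
        i, j = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[i, j] <= threshold:
            break
        # Assumption: drop the later of the two columns in the pair.
        del keep[max(i, j)]
    return [names[k] for k in keep]

# Toy example: "b" is nearly a copy of "a"; "c" is only moderately related.
toy = np.column_stack([
    [1, 2, 3, 4, 5, 6],
    [1.1, 2.0, 3.1, 3.9, 5.0, 6.1],
    [3, 1, 4, 1, 5, 9],
])
kept = drop_redundant(toy, ["a", "b", "c"])
```

Dropping one variable per pass and then recomputing the correlations avoids removing both members of a pair when a third variable is involved. Other tie-breaking rules are equally defensible, for example dropping the variable with the higher mean correlation with the rest.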
Article
A dataset was compiled with 17 diverse socioeconomic variables for 32 departments of Colombia and the capital district. Factor analysis revealed an S factor. Results were robust to data imputation and removal of a redundant variable. 14 of 17 variables loaded in the expected direction. Extracted S factors correlated about .50 with the cognitive ability estimate. The Jensen coefficient for the S factor for this relationship was .60.
Article
Two sets of socioeconomic data for 90-96 French departements were analyzed. One dataset was found in Lynn (1980) and contained four socioeconomic variables. Mixed results were found for this dataset, both with regards to the factor structure and the relationship to cognitive ability. Another dataset with 53 variables was created by compiling variables from the official French statistics bureau (Insee). This dataset contained an impure general socioeconomic (S) factor (some undesirable variables loaded positively), but after controlling for the presence of immigrants, the S factor became purer. This was especially salient for crime, unemployment and poverty variables. The two S factors correlated at r = 0.66 [CI95:0.52-0.76; N = 88]. The IQ scores from the 1950s dataset correlated at 0.33 [CI95:0.13-0.51, N = 88] with the S factor from the 2010-2015 dataset.
Article
We present and analyze data from a dataset of 2358 Danish first names and socioeconomic outcomes not previously made available to the public (“Navnehjulet”, the Name Wheel). We visualize the data and show that there is a general socioeconomic factor with indicator loadings in the expected directions (positive: income, owning your own place; negative: having a criminal conviction, being without a job). This result holds after controlling for age and for each gender alone. It also holds when analyzing the data in age bins. The factor loading of being married depends on analysis method, so it is more difficult to interpret. A pseudofertility is calculated based on the population size for the names for the years 2012 and 2015. This value is negatively correlated with the S factor score r = -.35 [95CI: -.39; -.31], but the relationship seems to be somewhat non-linear and there is an upward trend at the very high end of the S factor. The relationship is strongly driven by relatively uncommon names that have high pseudofertility and low to very low S scores. The n-weighted correlation is -.21 [95CI: -.25; -.17]. This dysgenic pseudofertility was mostly driven by Arabic and African names. All data and R code are freely available.
Article
We conducted novel analyses regarding the association between continental racial ancestry, cognitive ability and socioeconomic outcomes across 6 datasets: states of Mexico, states of the United States, states of Brazil, departments of Colombia, sovereign nations and all units together. We find that European ancestry is consistently and usually strongly positively correlated with cognitive ability and socioeconomic outcomes (mean r for cognitive ability = .708; for socioeconomic well-being = .643) (Sections 3-8). In most cases, including another ancestry component, in addition to European ancestry, did not increase predictive power (Section 9). At the national level, the association between European ancestry and outcomes was robust to controls for natural-environmental factors (Section 10). This was not always the case at the regional level (Section 18). It was found that genetic distance did not have predictive power independent of European ancestry (Section 10). Automatic modeling using best subset selection and lasso regression agreed in most cases that European ancestry was a non-redundant predictor (Section 11). Results were robust across 4 different ways of weighting the analyses (Section 12). It was found that the effect of European ancestry on socioeconomic outcomes was mostly mediated by cognitive ability (Section 13). We failed to find evidence of international colorism or culturalism (i.e., neither skin reflectance nor self-reported race/ethnicity showed incremental predictive ability once genomic ancestry had been taken into account) (Section 14). The association between European ancestry and cognitive outcomes was robust across a number of alternative measures of cognitive ability (Section 15). It was found that the general socioeconomic factor was not structurally different in the American sample as compared to the worldwide sample, thus justifying the use of that measure. Using Jensen's method of correlated vectors, it was found that the association between European ancestry and socioeconomic outcomes was stronger on more S factor loaded outcomes, r = .75 (Section 16). There was some evidence that tourist expenditure helped explain the relatively high socioeconomic performance of Caribbean states (Section 17).
Article
I analyze the S factor in Italian states by reanalyzing data published by Lynn (2010) as well as new data compiled from the Italian statistics agency (7 and 10 socioeconomic variables, respectively). The S factors from the datasets are highly correlated (.92) and both are strongly correlated with a G factor from PISA scores (.93 and .88).
Article
I analyzed the S factor in US states by compiling a dataset of 25 diverse socioeconomic indicators. Results show that Washington DC is a strong outlier, but if it is excluded, then the S factor correlated strongly with state IQ at .75. Ethnoracial demographics of the states are related to the state's IQ and S in the expected order (White>Hispanic>Black).
Article
I reanalyze data published by Lynn and Yadav (2015) for Indian states. I find both G and S factors which correlate at .61. The statistical language R is used thruout the paper and the code is explained. The paper thus is both an analysis and a walkthru of how to conduct this type of study.
Article
I reanalyze data reported by Richard Lynn in a 1979 paper concerning IQ and socioeconomic variables in 12 regions of the United Kingdom as well as Ireland. I find a substantial S factor across regions (66% of variance with MinRes extraction). I produce a new best estimate of the G scores of regions. The correlation of this with the S scores is .79. The MCV with reversal correlation is .47.
Article
Many studies have examined the correlations between national IQs and various country-level indexes of well-being. The analyses have been unsystematic and not gathered in one single analysis or dataset. In this paper I gather a large sample of country-level indexes and show that there is a strong general socioeconomic factor (S factor) which is highly correlated (.86-.87) with national cognitive ability using either Lynn and Vanhanen's dataset or Altinok's. Furthermore, the method of correlated vectors shows that the correlations between variable loadings on the S factor and cognitive measurements are .99 in both datasets using both cognitive measurements, indicating that it is the S factor that drives the relationship with national cognitive measurements, not the remaining variance.
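The method of correlated vectors mentioned in the abstract above can be sketched as a correlation between two vectors with one entry per indicator: each indicator's loading on the S factor, and each indicator's correlation with the external variable (here, cognitive ability). The Python below is a minimal illustration, not the code used in any of the papers listed. The function name `correlated_vectors` is hypothetical, and the "reversing" step reflects one common reading of the term (flipping indicators with negative loadings so that all loadings are positive), which is an assumption here.

```python
import numpy as np

def correlated_vectors(loadings, indicator_r, reverse=True):
    """Correlate indicators' factor loadings with their correlations
    with an external criterion (one entry per indicator).

    reverse=True flips the sign of both entries for indicators with a
    negative loading (assumed meaning of "with reversing").
    """
    loadings = np.asarray(loadings, dtype=float)
    indicator_r = np.asarray(indicator_r, dtype=float)
    if reverse:
        sign = np.where(loadings < 0, -1.0, 1.0)
        loadings = loadings * sign
        indicator_r = indicator_r * sign
    return float(np.corrcoef(loadings, indicator_r)[0, 1])

# Made-up numbers for illustration only (not from any dataset above).
s_loadings = [0.9, 0.5, -0.8, 0.3]
r_with_criterion = [0.7, 0.4, -0.6, 0.1]
mcv = correlated_vectors(s_loadings, r_with_criterion)
```

Reversing matters because indicators that load in opposite directions stretch both vectors across zero, which can inflate the correlation. After reversal, the result reflects only whether more strongly loaded indicators relate more strongly to the criterion.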
Article
In a recent paper, Johnson, Bouchard, Krueger, McGue, and Gottesman (2004) addressed a long-standing debate in psychology by demonstrating that the g factors derived from three test batteries administered to a single group of individuals were completely correlated. This finding provided evidence for the existence of a unitary higher-level general intelligence construct whose measurement is not dependent on the specific abilities assessed. In the current study we constructively replicated this finding utilizing five test batteries. The replication is important because there were substantial differences in both the sample and the batteries administered from those in the original study. The current sample consisted of 500 Dutch seamen of very similar age and somewhat truncated range of ability. The batteries they completed included many tests of perceptual ability and dexterity, and few verbally oriented tests. With the exception of the g correlations involving the Cattell Culture Fair Test, which consists of just four matrix reasoning tasks of very similar methodology, all of the g correlations were at least .95. The lowest g correlation was .77. We discuss the implications of this finding.
Article
Multiple methods were used to estimate g (general cognitive ability) from a representative multiple-aptitude test battery. These methods included unrotated principal components, unrotated principal factors, and hierarchical factor analysis. Several variants of the hierarchical factor analyses were used ranging from three to eight factors. Fourteen estimates of g were made and computed in the normative sample for the test. The correlations of these estimates were high, ranging from .930 to .999. For this test, all other multiple-aptitude batteries, and any other set of variables that displays positive manifold, it is argued that the methods are equivalent. This is not due to similarity of factoring techniques, but rather to the positive intercorrelations of the variables as demonstrated by Wilks (1938).