IQ and socioeconomic development across Regions of the UK: a reanalysis

  • Ulster Institute for Social Research

Abstract

A reanalysis of Carl (2015) revealed that the inclusion of London had a strong effect on the S loadings of the crime and poverty variables. S factor scores from a dataset without London and without redundant variables were strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.
Carl (2015) analyzed socioeconomic inequality across 12 regions of the UK. In my reading of his
paper, I thought of several analyses that Carl had not done. I therefore asked him for the data and he
shared it with me. For a fuller description of the data sources, refer back to his article.
Redundant variables and London
Including (nearly) perfectly correlated variables can skew an extracted factor. For this reason, I created
an alternative dataset where variables that correlated above |.90| were removed. The following pairs of
strongly correlated variables were found:
1. median.weekly.earnings and log.weekly.earnings, r = .999
2. GVA.per.capita and log.GVA.per.capita, r = .997
3. R.D.workers.per.capita and log.weekly.earnings, r = .955
4. log.GVA.per.capita and log.weekly.earnings, r = .925
5. economic.inactivity and children.workless.households, r = .914
In each case, the first variable of the pair was removed from the dataset. However, this resulted in a
dataset with 11 cases and 11 variables, which is impossible to factor analyze. For this reason, I kept
both variables of the last pair.
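The screening step described above can be sketched as follows. This is a minimal Python illustration, not the author's R code: the function names (`pearson`, `redundant_pairs`, `drop_first_of_each_pair`) and the toy values are hypothetical and are not taken from Carl's dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(data, threshold=0.90):
    """Return (first, second, r) for every pair of variables with |r| > threshold.

    `data` maps a variable name to a list of values, one per region.
    """
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs


def drop_first_of_each_pair(data, pairs):
    """Drop the first variable of each flagged pair, as described in the text."""
    to_drop = {first for first, _second, _r in pairs}
    return {name: values for name, values in data.items() if name not in to_drop}


# Toy data (hypothetical, NOT Carl's dataset): 5 regions, 3 variables.
toy = {
    "earnings": [400, 450, 500, 550, 700],
    "log_earnings_like": [4.00, 4.49, 5.02, 5.50, 7.01],
    "crime": [30, 10, 25, 12, 28],
}
flagged = redundant_pairs(toy)
reduced = drop_first_of_each_pair(toy, flagged)
```

In this toy example only the two earnings variables exceed the threshold, so `reduced` keeps `log_earnings_like` and `crime`. Keeping a particular pair intact, as was done for the last pair above, would amount to leaving that pair out of the `pairs` argument.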
Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a,
2015b, 2015d), I also created two further datasets without London: one with the redundant variables,
one without. Thus, there were 4 datasets:
1. A dataset with London and with redundant variables.
2. A dataset with redundant variables but without London.
3. A dataset with London but without redundant variables.
4. A dataset without London and without redundant variables.
Factor analysis
Each of the four datasets was factor analyzed. Figure 1 shows the loadings.
Removing London strongly affected the loading of the crime variable, which changed from moderately
positive to moderately negative. The poverty variable also saw a large change, from slightly negative to
strongly negative. Both changes are in the direction towards a purer S factor (desirable outcomes with
positive loadings, undesirable outcomes with negative loadings). Removing the redundant variables did
not have much effect.
As a check, I investigated whether these results were stable across 30 different factor analytic
methods (see footnote 1). They were: all loadings and scores correlated near 1.00 across methods. For my
analysis, I used those extracted with the combination of minimum residuals extraction and regression scoring.
Due to London's strong effect on the loadings, one should check that the two methods developed for
finding such cases can identify it (Kirkegaard, 2015c). Figure 2 shows the results from these two
methods (mean absolute residual and change in factor size):
1 There are 6 different extraction methods and 5 scoring methods supported by the fa() function from the
psych package (Revelle, 2015). Thus, there are 6 × 5 = 30 combinations.
Figure 1: S factor loadings in four analyses.
As can be seen, London was identified as a far outlier using both methods.
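The two metrics can be sketched roughly as follows. This is an assumption-laden Python illustration of the general idea, not the implementation from Kirkegaard (2015c): the first principal component stands in for the factor, the function names are hypothetical, and the toy data are invented so that one "Capital" case combines high income and education with high crime and poverty.

```python
from math import sqrt


def standardize(col):
    """Z-score a list of values (sample SD, n - 1)."""
    n = len(col)
    mean = sum(col) / n
    sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def first_component(zcols, iters=500):
    """Leading eigenvalue and eigenvector of the correlation matrix (power iteration)."""
    n, p = len(zcols[0]), len(zcols)
    R = [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in zcols] for ci in zcols]
    v = R[0][:]  # arbitrary nonzero starting vector
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * R[i][j] * v[j] for i in range(p) for j in range(p))
    return eigval, v


def factor_size(columns):
    """Proportion of total variance captured by the first component."""
    zcols = [standardize(c) for c in columns]
    eigval, _ = first_component(zcols)
    return eigval / len(zcols)


def mean_abs_residuals(columns):
    """Per-case mean |residual| after predicting each indicator from the scores."""
    zcols = [standardize(c) for c in columns]
    n = len(zcols[0])
    _, v = first_component(zcols)
    raw = [sum(v[j] * zcols[j][i] for j in range(len(zcols))) for i in range(n)]
    scores = standardize(raw)
    total = [0.0] * n
    for z in zcols:
        # With both variables standardized, the regression slope equals the loading.
        slope = sum(a * b for a, b in zip(z, scores)) / (n - 1)
        for i in range(n):
            total[i] += abs(z[i] - slope * scores[i])
    return [t / len(zcols) for t in total]


def change_in_factor_size(columns):
    """Factor size with each case left out, minus factor size with all cases."""
    full = factor_size(columns)
    n = len(columns[0])
    return [factor_size([c[:i] + c[i + 1:] for c in columns]) - full for i in range(n)]


# Toy data (hypothetical): five ordinary regions plus one mixed "Capital" case.
names = ["A", "B", "C", "D", "E", "Capital"]
toy = [
    [20, 22, 25, 27, 30, 35],  # income
    [10, 13, 14, 17, 20, 24],  # education
    [50, 44, 41, 35, 30, 55],  # crime
    [25, 22, 20, 17, 15, 24],  # poverty
]
mar = dict(zip(names, mean_abs_residuals(toy)))
delta = dict(zip(names, change_in_factor_size(toy)))
```

In this toy example the mixed case has both the largest mean absolute residual and the largest increase in factor size when it is left out, which is the pattern reported for London above.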
S scores and IQ
Carl's dataset also contains IQ scores for the regions. These correlate .87 with the S factor scores from
the dataset without London and redundant variables. Figure 3 shows the scatter plot.
However, it is possible that IQ is not really related to the latent S factor, but only to the remaining
(non-S) variance in the extracted S scores. For this reason, I used Jensen's method (method of correlated
vectors; Jensen, 1998). Figure 4 shows the results.
Figure 2: Mixedness metrics for the complete dataset.
Figure 3: Scatter plot of S and IQ scores for regions of the UK.
Jensen's method thus supported the claim that IQ scores and the latent S factor are related.
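As a rough illustration, Jensen's method correlates two vectors over the indicators: each indicator's loading on S, and each indicator's correlation with IQ. The Python sketch below uses hypothetical loadings and invented toy data (not the UK regional data), and the function name `jensen_coefficient` is my own.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jensen_coefficient(loadings, indicators, iq):
    """Correlate the indicators' S loadings with their correlations with IQ.

    `loadings[j]` is the S loading of indicator j; `indicators[j]` holds that
    indicator's values across regions; `iq` holds the regional IQ scores.
    """
    iq_correlations = [pearson(col, iq) for col in indicators]
    return pearson(loadings, iq_correlations)


# Toy data (hypothetical): 5 regions, 4 indicators.
iq = [98, 99, 100, 101, 102]
indicators = [
    [20, 22, 25, 27, 30],  # income
    [10, 14, 13, 17, 20],  # education
    [50, 30, 45, 35, 40],  # crime
    [25, 22, 20, 17, 15],  # poverty
]
loadings = [0.9, 0.8, -0.3, -0.9]  # assumed S loadings, for illustration only

coefficient = jensen_coefficient(loadings, indicators, iq)
```

A strongly positive coefficient means that indicators with higher S loadings also correlate more strongly (in the same direction) with IQ, which is the pattern that supports a relationship with the latent factor rather than with indicator-specific variance.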
Figure 4: Jensen's method for the S factor's relationship to IQ scores.
Discussion and conclusion
My reanalysis revealed some interesting results regarding the effect of London on the loadings. This
was made possible by data sharing, which demonstrates the importance of this practice (Wicherts & Bakker,
2012).
Supplementary material
R source code and datasets are available at the OSF.
References
Carl, N. (2015). IQ and socioeconomic development across Regions of the UK. Journal of Biosocial
Science, 1–12.
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from
Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from
Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower.
Retrieved from
Kirkegaard, E. O. W. (2015d). The S factor in Brazilian states. The Winnower. Retrieved from
Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research
(Version 1.5.4). Retrieved from
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish
your data too? Intelligence, 40(2), 73–76.