The Winnower
Published 2015-06-16
The general socioeconomic factor
among Colombian departments
Emil O. W. Kirkegaard1
A dataset was compiled with 17 diverse socioeconomic variables for 32 departments of Colombia and
the capital district. Factor analysis revealed an S factor. Results were robust to data imputation and
removal of a redundant variable. 14 of 17 variables loaded in the expected direction. Extracted S
factors correlated about .50 with the cognitive ability estimate. The Jensen coefficient for the S factor
for this relationship was .60.
Key words: Colombia, departments, social inequality, S factor, general socioeconomic factor, IQ,
intelligence, cognitive ability, cognitive sociology
1. Introduction
The general socioeconomic factor is the mathematical construct associated with the idea that positive
outcomes tend to go along with other positive outcomes, and likewise for the negative. Mathematically,
this shows up as a factor where the desirable outcomes load positively and where the undesirable
outcomes load negatively. As far as I know, Kirkegaard (2014b) was the first to report such a factor,
although Lynn (1979) came close to the same idea. The factor is called s at the individual level and S
when found in aggregated data, following the analogous terminology proposed for the general factor of
cognitive ability (Rindermann, 2007).
By now, S factors have been found between countries (Kirkegaard, 2014b), twice between country-of-
origin groups within countries (Kirkegaard, 2014a), numerous times within countries (reviewed in
Kirkegaard, 2015c), and at the level of first names (Kirkegaard & Tranberg, 2015). This paper analyses
data for 33 Colombian units: the 32 departments and the capital district.
1 University of Aarhus, Denmark. Email:
2. Data sources
Most of the data were found via the English-language website which is an aggregator of
statistical information concerning countries and their divisions. A second source was a Spanish-
language report (DANE, 2011). One variable had to be found on Wikipedia (“List of Colombian
departments by GDP,” 2015). Finally, HDI 2010 (the Human Development Index for 2010) was found in a
Spanish-language UN report (United Nations Development Programme & UNDP Colombia, 2011).
Variables were selected according to two criteria: 1) they must be socioeconomically important and 2)
they must not be strongly dependent on local climatic conditions. For instance, fishermen per capita
would be a variable that fails both criteria, since it is not generally seen as socioeconomically important
and is dependent on having access to a body of water.
The included variables were:
SABER, verbal scores
SABER, math scores
Acute malnutrition, %
Chronic malnutrition, %
Low birth weight, %
Access to clean water, %
The presence of a sewerage system, %
Immunization coverage, %
Child mortality, rate
Infant mortality, rate
Life expectancy at birth
Total fertility rate
Births that occur in a health clinic, %
Unemployment, %
GDP per capita
Poverty, %
Domestic violence, rate
Urbanicity, %
Population, absolute number
HDI 2010 (Human Development Index)
SABER (ICFES exam) is a local academic achievement test similar to the SAT used in the United States
(Hattie & Anderman, 2012, sec. 9.7).
3. Analyses
3.1. Missing data
When collecting the data, it was noticed that quite a number of the variables have missing data. A plot
of the missing data is shown in Figure 1.
The red cells indicate missing data. The greyscale cells indicate high (dark) and low (light) values in each
variable. Thus we can see that missing data tend to cluster within departments.
3.2. Redundant variables and imputation
Very highly correlated variables cause problems for factor analysis and result in 'double weighting' of
some variables. For this reason, a previously developed algorithm was used to find the most highly
correlated pairs of variables and remove one member of each pair automatically (Kirkegaard, 2015a). A
threshold of |r| = .90 was used. Only one pair of variables was correlated above this threshold (infant
mortality and child mortality, r = .92; infant mortality was removed).
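The pruning step described above can be sketched roughly as follows. This is not the algorithm from Kirkegaard (2015a), only a minimal illustration of the idea in plain Python. The function names, the toy values and the rule of dropping the second member of a pair are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_redundant(data, threshold=0.90):
    """Repeatedly find the most highly correlated pair (by |r|) and drop
    one member until no pair exceeds the threshold.

    `data` maps variable name -> list of values without missing entries.
    Dropping the second member of the pair is an arbitrary choice here.
    """
    kept = dict(data)
    dropped = []
    while len(kept) > 1:
        pairs = [(abs(pearson(kept[a], kept[b])), a, b)
                 for a, b in combinations(kept, 2)]
        r, a, b = max(pairs)
        if r <= threshold:
            break
        dropped.append(b)
        del kept[b]
    return kept, dropped


# Hypothetical toy values, not the Colombian data.
toy = {
    "child_mortality": [10.0, 20.0, 30.0, 40.0, 50.0],
    "infant_mortality": [8.0, 17.0, 25.0, 33.0, 41.0],
    "poverty": [30.0, 10.0, 50.0, 20.0, 40.0],
}
kept, dropped = drop_redundant(toy)
print("dropped:", dropped)
```

Re-checking the correlations after every removal means that a variable involved in several near-duplicate pairs is only counted against the variables that are still in the dataset.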
The missing data was imputed using the irmi function from the VIM package (Templ, Alfons,
Kowarik, & Prantner, 2015). This was done without noise to make the results reproducible. No attempt
was made to estimate standard errors, so multiple imputation was unnecessary (Donders, van der
Heijden, Stijnen, & Moons, 2006).
To check whether results were comparable across methods, datasets were saved with every
combination of imputation and removal of the redundant variable, thus creating 4 datasets.
Figure 1: Matrix plot for the dataset.
3.3. Factor analysis
Factor analysis was carried out on the 4 datasets. The factor loadings plot is shown in Figure 2.
Results were similar across methods. Per S factor theory, the desirable variables should have
positive loadings and the undesirable ones negative loadings. This was not entirely the case. Three
variables that are generally considered undesirable loaded positively: unemployment rate, low birth
weight and domestic violence.
Unemployment rate and crime have been found to load in the wrong direction before when analyzing
state-like units. The unemployment result may be due to the welfare systems being better in the higher S
departments, making it possible to survive without working.
It is said that cities breed crime and since urbanicity has a very high positive S loading, the crime result
may be a side-effect of that (Lynn, 1979). Alternatively, the legal system may be better (e.g. less
corrupt) in the higher S departments making it more likely for crimes to be reported. This is perhaps
especially so for crimes against women.
The result for low birth weight is more puzzling, given that higher birth weight is a known correlate of
higher educational levels and cognitive ability (Shenkin, Starr, & Deary, 2004). One of the other
variables suggests an answer: in the lower S departments, a large fraction (30-40%) of births are
home births, and it seems likely that this would result in fewer reports of low birth weights.
Generally, the results are consistent with those from other countries; 14 of 17 variables loaded in the
expected direction.
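The direction check can be illustrated with a small sketch. As a stand-in for the factor analysis used in the paper, it extracts the first principal component of the correlation matrix by power iteration, which is a different technique, and then counts how many variables load in the expected direction. The variable names, the toy values and the `expected_sign` table are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_component_loadings(data, reference, iterations=200):
    """Loadings on the first principal component of the correlation
    matrix, oriented so that `reference` loads positively."""
    names = list(data)
    k = len(names)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    # Start from the reference variable's row of the correlation matrix.
    vector = corr[names.index(reference)][:]
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(c * c for c in product))
        vector = [c / norm for c in product]
    product = [sum(corr[i][j] * vector[j] for j in range(k)) for i in range(k)]
    eigenvalue = sum(v * p for v, p in zip(vector, product))
    loadings = {name: vector[i] * sqrt(eigenvalue) for i, name in enumerate(names)}
    if loadings[reference] < 0:
        loadings = {name: -value for name, value in loadings.items()}
    return loadings


# Hypothetical toy values, not the Colombian data.
toy = {
    "life_expectancy": [60.0, 65.0, 70.0, 72.0, 75.0, 78.0],
    "gdp_per_capita": [2.0, 3.0, 5.0, 6.0, 8.0, 9.0],
    "poverty": [60.0, 50.0, 45.0, 35.0, 30.0, 20.0],
}
expected_sign = {"life_expectancy": 1, "gdp_per_capita": 1, "poverty": -1}

loadings = first_component_loadings(toy, reference="life_expectancy")
as_expected = [n for n, v in loadings.items() if v * expected_sign[n] > 0]
print(f"{len(as_expected)} of {len(loadings)} variables loaded in the expected direction")
```

The sign of an extracted factor is arbitrary, so the sketch fixes it by requiring a clearly desirable reference variable to load positively before counting.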
Figure 2: Factor loadings plot.
3.4. Mixed cases
Mixed cases are cases that do not fit the factor structure of a dataset. Previously I developed two
methods for detecting such cases (Kirkegaard, 2015b). Neither method indicated any strong mixed
cases in the unimputed, unreduced dataset or the imputed, reduced dataset. Removing the least
congruent case increased the factor size by only 1.2 percentage points, and the case with the greatest
mean absolute residual had a value of only .89.
Unlike in previous analyses, the capital district was kept because it did not appear to be a structural outlier.
3.5. Cognitive ability, S and HDI
The two cognitive (SABER) variables correlated at .84, indicating the presence of the aggregate
general cognitive ability factor (G factor; Rindermann, 2007). They were averaged to form an estimate
of the G factor.
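A minimal sketch of forming such a composite is given below. The text only says that the two variables were averaged; standardizing each variable before averaging is an assumption of this sketch, as are the function names and the toy scores.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(*variables):
    """Average the standardized variables case by case."""
    standardized = [zscores(v) for v in variables]
    return [mean(case) for case in zip(*standardized)]


# Hypothetical toy scores for three units, not the SABER data.
verbal = [45.0, 50.0, 55.0]
math_scores = [40.0, 50.0, 60.0]
g_estimate = composite(verbal, math_scores)
print(g_estimate)
```

Standardizing first gives both tests equal weight even if their raw scales differ.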
The correlations between S factors, HDI and cognitive ability are shown in Table 1.
       S      S.ri   HDI    CA
S      -      0.99   0.84   0.54
S.ri   0.99   -      0.85   0.49
HDI    0.84   0.87   -      0.44
CA     0.51   0.58   0.60   -
Table 1: Correlation matrix for cognitive ability, S factor and HDI. Correlations below diagonal are
weighted by the square root of population size. S.ri = reduced and imputed dataset.
Weighted and unweighted correlations were approximately the same. The S factor from the reduced and
imputed dataset correlated strongly with the HDI values (.85 unweighted, .87 weighted), even though the
HDI values are from 2010 and the data the S factor is based on are from 2005. Results are fairly similar
to those found in other countries.
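For readers unfamiliar with weighted correlations, the following sketch shows one common way to compute a Pearson correlation with case weights, here the square root of population size as in Table 1. It is an illustration under that definition, not the R code used for the paper. The function name and the toy numbers are assumptions.

```python
from math import sqrt


def weighted_pearson(x, y, w):
    """Pearson correlation with case weights w (weighted means,
    weighted covariance and weighted variances)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values for four units, not the Colombian data.
s_scores = [-1.0, -0.2, 0.4, 1.3]
cognitive = [44.0, 47.0, 46.0, 52.0]
population = [100_000, 400_000, 900_000, 1_600_000]
weights = [sqrt(p) for p in population]

r_weighted = weighted_pearson(s_scores, cognitive, weights)
r_unweighted = weighted_pearson(s_scores, cognitive, [1.0] * len(s_scores))
print(round(r_weighted, 3), round(r_unweighted, 3))
```

Taking the square root dampens the influence of the largest units compared with weighting by raw population size.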
Figure 3 shows a scatter plot of S factor (reduced, imputed dataset) and cognitive ability.
3.6. Jensen's method
Finally, as a robustness test, Jensen's method (the method of correlated vectors; Frisby & Beaujean, 2015;
Jensen, 1998) was used to see if cognitive ability’s association with the S factor scores was due to the
latent S factor. Figure 4 shows the Jensen plot.
The correlation was .60, which is satisfactory given the relatively few variables (N=16).
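The logic of Jensen's method can be sketched as follows: take each indicator's loading on the S factor, take each indicator's correlation with cognitive ability, and correlate the two vectors across indicators. The sketch below is a bare-bones illustration with made-up loadings and data. It does not reverse any indicators, and the function and variable names are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jensen_coefficient(loadings, data, criterion):
    """Correlate the vector of factor loadings with the vector of each
    indicator's correlation with the criterion variable."""
    names = list(loadings)
    loading_vector = [loadings[n] for n in names]
    criterion_vector = [pearson(data[n], criterion) for n in names]
    return pearson(loading_vector, criterion_vector)


# Hypothetical toy loadings and indicator values, not the paper's data.
loadings = {"a": 0.9, "b": 0.5, "c": -0.7}
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],
}
cognitive_ability = [1.0, 2.0, 3.0, 4.0, 5.0]

coefficient = jensen_coefficient(loadings, data, cognitive_ability)
print(round(coefficient, 2))
```

A high positive coefficient means that indicators with stronger S loadings are also the ones more strongly related to cognitive ability. The units of analysis here are the indicators, so with few indicators the coefficient is imprecise.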
4. Discussion
The present results were similar to previously reported results.
Some limitations of the present study include:
I don't speak Spanish, so I may have overlooked some variables that should have been included in the analysis. There may also be translation errors, as I had to rely on the translations found on the websites I used.
No educational attainment variables were included despite these often having very strong loadings. None were available in the data sources I consulted.
Data was missing for many cases and had to be imputed.
Figure 3: Scatter plot of S factor scores and cognitive ability.
Figure 4: Jensen plot for S factor loadings and cognitive ability.
Supplementary material
Data files, R source code and high quality figures are available at
This paper was updated 2016-11-16. The update concerned language and figures, but not results.
References
DANE. (2011). Pobreza Monetaria por Departamentos: Resultados. Retrieved from
Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A
gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10),
Frisby, C. L., & Beaujean, A. A. (2015). Testing Spearman’s hypotheses using a bi-factor model with
WAIS-IV/WMS-IV standardization data. Intelligence, 51, 79–97.
Hattie, J., & Anderman, E. M. (Eds.). (2012). International Guide to Student Achievement (1 edition).
New York, NY: Routledge.
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2014a). Crime, income, educational attainment and employment among
immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from
Kirkegaard, E. O. W. (2014b). The international general socioeconomic factor: Factor analyzing
international rankings. Open Differential Psychology. Retrieved from
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from
Kirkegaard, E. O. W. (2015b). Finding mixed cases in exploratory factor analysis. The Winnower.
Retrieved from
Kirkegaard, E. O. W. (2015c). The S factor in Brazilian states. The Winnower. Retrieved from
Kirkegaard, E. O. W., & Tranberg, B. (2015). What is a good name? The S factor in Denmark at the
name-level. The Winnower. Retrieved from
List of Colombian departments by GDP. (2015, March 19). In Wikipedia, the free encyclopedia.
Retrieved from
Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and
Clinical Psychology, 18(1), 1–12.
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity
of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of
Personality, 21(5), 667–706.
Shenkin, S. D., Starr, J. M., & Deary, I. J. (2004). Birth Weight and Cognitive Ability in Childhood: A
Systematic Review. Psychological Bulletin, 130(6), 989–1013.
Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and
Imputation of Missing Values. CRAN. Retrieved from http://cran.r-
United Nations Development Programme, & UNDP Colombia. (2011). Informe nacional de desarrollo
humano 2011: Resumen ejecutivo. Bogotá, Colombia: PNUD. Retrieved from