The general socioeconomic factor
among Colombian departments
Emil O. W. Kirkegaard¹
A dataset was compiled with 17 diverse socioeconomic variables for 32 departments of Colombia and
the capital district. Factor analysis revealed an S factor. Results were robust to data imputation and
removal of a redundant variable. 14 of 17 variables loaded in the expected direction. Extracted S
factors correlated about .50 with the cognitive ability estimate. The Jensen coefficient for the S factor
for this relationship was .60.
Key words: Colombia, departments, social inequality, S factor, general socioeconomic factor, IQ,
intelligence, cognitive ability, cognitive sociology
The general socioeconomic factor is the mathematical construct associated with the idea that positive
outcomes tend to go along with other positive outcomes, and likewise for the negative. Mathematically,
this shows up as a factor where the desirable outcomes load positively and where the undesirable
outcomes load negatively. As far as I know, Kirkegaard (2014b) was the first to report such a factor,
although Lynn (1979) came close to the same idea. The factor is called s at the individual level and S
when found in aggregated data, following the analogous terminology proposed for the general factor of
cognitive ability (Rindermann, 2007).
By now, S factors have been found between countries (Kirkegaard, 2014b), twice between country-of-
origin groups within countries (Kirkegaard, 2014a), numerous times within countries (reviewed in
Kirkegaard, 2015c), and at the level of first names (Kirkegaard & Tranberg, 2015). This paper analyses
data for the 32 Colombian departments and the capital district (33 units in total).
¹ University of Aarhus, Denmark. Email: email@example.com
2. Data sources
Most of the data were found via the English-language website Knoema.com which is an aggregator of
statistical information concerning countries and their divisions. A second source was a Spanish-
language report (DANE, 2011). One variable had to be found on Wikipedia (“List of Colombian
departments by GDP,” 2015). Finally, HDI2010² was found in a Spanish-language UN report (United
Nations Development Programme & UNDP Colombia, 2011).
Variables were selected according to two criteria: 1) they must be socioeconomically important and 2)
they must not be strongly dependent on local climatic conditions. For instance, fishermen per capita
would be a variable that fails both criteria, since it is not generally seen as socioeconomically important
and is dependent on having access to a body of water.
The included variables were:
•SABER, verbal scores
•SABER, math scores
•Acute malnutrition, %
•Chronic malnutrition, %
•Low birth weight, %
•Access to clean water, %
•The presence of a sewerage system, %
•Immunization coverage, %
•Child mortality, rate
•Infant mortality, rate
•Life expectancy at birth
•Total fertility rate
•Births that occur in a health clinic, %
•GDP per capita
•Domestic violence, rate
•Population, absolute number
² Human Development Index.
SABER (ICFES exam) is a local academic achievement test similar to the SAT used in the United States
(Hattie & Anderman, 2012, sec. 9.7).
3.1. Missing data
When collecting the data, it was noticed that many of the variables had missing values. A plot
of the missing data is shown in Figure 1.
The red cells indicate missing data. The greyscale cells indicate high (dark) and low (light) values in
each variable. The plot shows that missing data tend to cluster within departments.
3.2. Redundant variables and imputation
Very highly correlated variables cause problems for factor analysis and result in 'double weighting' of
some variables. For this reason, an algorithm developed previously was used to find the most highly
correlated pairs of variables and remove one of them automatically (Kirkegaard, 2015a). A threshold of
|r| = .90 was used. Only one pair of variables was correlated above this threshold (infant
mortality and child mortality, r = .92); infant mortality was removed.
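The pruning step described above can be illustrated with a minimal sketch. This is not the algorithm from Kirkegaard (2015a), only an assumed reading of it: find the pair with the largest absolute correlation, drop one member if the value exceeds the threshold, and repeat. The function names and the rule for which member to drop (the later-listed one) are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_redundant(data, threshold=0.90):
    """Repeatedly drop one variable from the most highly correlated pair.

    `data` maps variable name -> list of values (no missing values).
    Dropping the later-listed member of the pair is an arbitrary choice
    made for this sketch, not a rule taken from the paper.
    """
    kept = dict(data)
    dropped = []
    while len(kept) > 1:
        # Pair with the largest absolute correlation among kept variables.
        r, first, second = max(
            (abs(pearson(kept[a], kept[b])), a, b)
            for a, b in combinations(kept, 2)
        )
        if r <= threshold:
            break
        dropped.append(second)
        del kept[second]
    return kept, dropped
```

Calling `drop_redundant(data)` on a dictionary of complete variables returns the retained variables and the names that were removed, so the removal can be reported alongside the results.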
The missing data was imputed using the irmi function from the VIM package (Templ, Alfons,
Kowarik, & Prantner, 2015). This was done without noise to make the results reproducible. No attempt
was made to estimate standard errors, so multiple imputation was unnecessary (Donders, van der
Heijden, Stijnen, & Moons, 2006).
Figure 1: Matrix plot for the dataset.
To check whether results were comparable across methods, datasets were saved with every
combination of imputation and removal of the redundant variable, thus creating 4 datasets.
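As a rough illustration of the 2 x 2 design (imputed or not, reduced or not), the sketch below builds the four dataset variants. Mean imputation is used here purely as a stand-in and is not the irmi method used in the paper; the function names and labels are hypothetical.

```python
from itertools import product


def impute_means(data):
    """Stand-in imputation: replace None with the variable's mean (NOT irmi)."""
    out = {}
    for name, values in data.items():
        observed = [v for v in values if v is not None]
        mean = sum(observed) / len(observed)
        out[name] = [mean if v is None else v for v in values]
    return out


def drop_variable(data, name):
    """Return a copy of the dataset without the named variable."""
    return {k: v for k, v in data.items() if k != name}


def build_variants(data, redundant):
    """Build every combination of (imputed?, reduced?) as a labelled dataset."""
    variants = {}
    for imputed, reduced in product([False, True], repeat=2):
        d = impute_means(data) if imputed else data
        d = drop_variable(d, redundant) if reduced else d
        label = ("imputed" if imputed else "unimputed") + ", " + (
            "reduced" if reduced else "unreduced"
        )
        variants[label] = d
    return variants
```

Running the same analysis over each entry of the returned dictionary makes it easy to compare results across the four variants.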
3.3. Factor analysis
Factor analysis was carried out on the 4 datasets. The factor loadings plot is shown in Figure 2.
Results were similar across methods. Per S factor theory, the desirable variables should have
positive loadings and the undesirable negative loadings. This was not entirely the case. 3 variables that
are generally considered undesirable loaded positively: unemployment rate, low birth weight and
domestic violence.
Unemployment rate and crime have been found to load in the wrong direction before when analyzing
state-like units. It may be due to the welfare systems being better in the higher S departments, making it
possible to survive without working.
It is said that cities breed crime and since urbanicity has a very high positive S loading, the crime result
may be a side-effect of that (Lynn, 1979). Alternatively, the legal system may be better (e.g. less
corrupt) in the higher S departments making it more likely for crimes to be reported. This is perhaps
especially so for crimes against women.
The result with low birth weight is stranger given that higher birth weight is a known correlate of
higher educational levels and cognitive ability (Shenkin, Starr, & Deary, 2004). One of the other
variables suggests an answer: in the lower S departments, a large fraction (30-40%) of births are home-
births, and it seems likely that this would result in fewer reports of low birth weights.
Generally, the results are consistent with those from other countries; 14 of 17 variables loaded in the
expected direction.
Figure 2: Factor loadings plot.
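To make the idea of loading direction concrete, the following sketch extracts loadings on the first principal component of the correlation matrix by power iteration. This is a simplification: principal components are used as a stand-in for the factor analysis reported here, and the `anchor` argument (a variable assumed to be desirable, used only to fix the sign) is a hypothetical convenience.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_component_loadings(data, anchor, iterations=200):
    """Loadings on the first principal component of the correlation matrix.

    `data` maps variable name -> list of values (no missing values).
    The sign is oriented so that `anchor` loads positively.
    """
    names = list(data)
    k = len(names)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    # Power iteration, started from the anchor's row of correlations.
    vec = corr[names.index(anchor)][:]
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient: eigenvalue belonging to the unit-length vector.
    eigenvalue = sum(
        vec[i] * corr[i][j] * vec[j] for i in range(k) for j in range(k)
    )
    loadings = [v * sqrt(eigenvalue) for v in vec]
    if loadings[names.index(anchor)] < 0:
        loadings = [-v for v in loadings]
    return dict(zip(names, loadings))
```

With the sign fixed this way, desirable variables are expected to load positively and undesirable ones negatively, so any variable loading against expectation can be flagged by inspecting the sign of its entry.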
3.4. Mixed cases
Mixed cases are cases that do not fit the factor structure of a dataset. Previously I developed two
methods for detecting such cases (Kirkegaard, 2015b). Neither method indicated any strong mixed
cases in the unimputed, unreduced dataset or the imputed, reduced dataset. Removing the least
congruent case only increased the factor size by 1.2 percentage points, and the case with the greatest
mean absolute residual had a value of only .89.
Unlike in previous analyses, the capital district was kept because it did not appear to be a structural outlier.
3.5. Cognitive ability, S and HDI
The two cognitive (SABER) variables correlated at .84, indicating the presence of the aggregate
general cognitive ability factor (G factor; Rindermann, 2007). They were averaged to form an estimate
of the G factor.
The correlations between S factors, HDI and cognitive ability are shown in Table 1.
        S      S.ri   HDI    CA
S       -      0.99   0.84   0.54
S.ri    0.99   -      0.85   0.49
HDI     0.84   0.87   -      0.44
CA      0.51   0.58   0.60   -
Table 1: Correlation matrix for cognitive ability, S factor and HDI. Correlations below diagonal are
weighted by the square root of population size. S.ri = reduced and imputed dataset.
Weighted and unweighted correlations were approximately the same. The imputed and reduced S
factor correlated strongly with the HDI values (r = .85 unweighted, .87 weighted), even though the HDI
values are from 2010 and the data the S factor is based on are from 2005. Results are fairly similar to
those found in other countries.
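For readers unfamiliar with weighted correlations, here is a minimal sketch of a Pearson correlation weighted by the square root of population size, as used for the values below the diagonal of Table 1. The function names are my own and the code is illustrative only, not the R code used for the analysis.

```python
from math import sqrt


def sqrt_population_weights(population):
    """Weights equal to the square root of each unit's population."""
    return [sqrt(p) for p in population]


def weighted_pearson(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    # Weighted means.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (common factor 1/total cancels).
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / sqrt(sxx * syy)
```

With equal weights the function reduces to the ordinary Pearson correlation, which is a convenient sanity check. Square-root weights give populous units more influence without letting the largest ones dominate.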
Figure 3 shows a scatter plot of S factor (reduced, imputed dataset) and cognitive ability.
3.6. Jensen's method
Finally, as a robustness test, Jensen's method (the method of correlated vectors; Frisby & Beaujean, 2015;
Jensen, 1998) was used to see if cognitive ability’s association with the S factor scores was due to the
latent S factor. Figure 4 shows the Jensen plot.
The correlation was .60, which is satisfactory given the relatively few variables (N=16).
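The logic of Jensen's method can be sketched as follows: for each indicator, take its S loading and its correlation with the criterion (here cognitive ability), then correlate the two resulting vectors across indicators. The function and argument names are hypothetical, and details such as which indicators are included are left to the caller.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jensen_coefficient(indicators, loadings, criterion):
    """Correlation between factor loadings and indicator-criterion correlations.

    `indicators` maps variable name -> values across units,
    `loadings` maps variable name -> loading on the factor,
    `criterion` holds the criterion values for the same units.
    """
    names = list(loadings)
    # Vector 1: each indicator's loading on the factor.
    loading_vector = [loadings[name] for name in names]
    # Vector 2: each indicator's correlation with the criterion.
    criterion_vector = [pearson(indicators[name], criterion) for name in names]
    return pearson(loading_vector, criterion_vector)
```

A high positive value indicates that the indicators which load most strongly on the factor are also those most strongly related to the criterion.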
The present results were similar to previously reported results.
Some limitations of the present study include:
•I don't speak Spanish, so I may have overlooked some variables that should have been included
in the analysis. There may also be translation errors, as I had to rely on the translations found on the
websites I used.
•No educational attainment variables were included despite these often having very strong
loadings. None were available in the data sources I consulted.
•Data were missing for many cases and had to be imputed.
Figure 3: Scatter plot of S factor scores and cognitive ability.
Figure 4: Jensen plot for S factor loadings and cognitive ability.
Data files, R source code and high quality figures are available at https://osf.io/92vqd/.
This paper was updated 2016-11-16. The update concerned language and figures, but not results.
DANE. (2011). Pobreza Monetaria por Departamentos: Resultados. Retrieved from
Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A
gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10),
Frisby, C. L., & Beaujean, A. A. (2015). Testing Spearman’s hypotheses using a bi-factor model with
WAIS-IV/WMS-IV standardization data. Intelligence, 51, 79–97.
Hattie, J., & Anderman, E. M. (Eds.). (2012). International Guide to Student Achievement (1 edition).
New York, NY: Routledge.
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2014a). Crime, income, educational attainment and employment among
immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from
Kirkegaard, E. O. W. (2014b). The international general socioeconomic factor: Factor analyzing
international rankings. Open Differential Psychology. Retrieved from
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from
Kirkegaard, E. O. W. (2015b). Finding mixed cases in exploratory factor analysis. The Winnower.
Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-
Kirkegaard, E. O. W. (2015c). The S factor in Brazilian states. The Winnower. Retrieved from
Kirkegaard, E. O. W., & Tranberg, B. (2015). What is a good name? The S factor in Denmark at the
name-level. The Winnower. Retrieved from https://thewinnower.com/papers/what-is-a-good-
List of Colombian departments by GDP. (2015, March 19). In Wikipedia, the free encyclopedia.
Retrieved from https://en.wikipedia.org/w/index.php?
Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and
Clinical Psychology, 18(1), 1–12. https://doi.org/10.1111/j.2044-8260.1979.tb00297.x
Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity
of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of
Personality, 21(5), 667–706. https://doi.org/10.1002/per.634
Shenkin, S. D., Starr, J. M., & Deary, I. J. (2004). Birth Weight and Cognitive Ability in Childhood: A
Systematic Review. Psychological Bulletin, 130(6), 989–1013. https://doi.org/10.1037/0033-
Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and
Imputation of Missing Values. CRAN. Retrieved from http://cran.r-
United Nations Development Programme, & UNDP Colombia. (2011). Informe nacional de desarrollo
humano 2011: Resumen ejecutivo. Bogotá, Colombia: PNUD. Retrieved from