All content in this area was uploaded by Emil O. W. Kirkegaard on Apr 02, 2018
What is a good name? The S factor
in Denmark at the name-level
Emil O. W. Kirkegaard1
Bo Tranberg2
Abstract
We present and analyze data from a dataset of 2358 Danish first names and socioeconomic
outcomes not previously made available to the public (Navnehjulet, the Name Wheel). We visualize
the data and show that there is a general socioeconomic factor with indicator loadings in the
expected directions (positive: income, owning your own place; negative: having a criminal
conviction, being without a job). This result holds after controlling for age and for each gender
alone. It also holds when analyzing the data in age bins. The factor loading of being married
depends on analysis method, so it is more difficult to interpret.
A pseudofertility is calculated based on the population size for the names for the years 2012 and
2015. This value is negatively correlated with the S factor score r = -.35 [95CI: -.39; -.31], but the
relationship seems to be somewhat non-linear and there is an upward trend at the very high end of
the S factor. The relationship is strongly driven by relatively uncommon names that have high
pseudofertility and low to very low S scores. The n-weighted correlation is -.21 [95CI: -.25; -.17].
This dysgenic pseudofertility seems to be mostly driven by Arabic and African names.
All data and R code are freely available.
Key words: names, Denmark, Danish, social status, crime, income, education, age, scraping, S
factor, general socioeconomic factor
Introduction
It has been noted that good outcomes tend to go together, but to our knowledge, the factor structure
of such relationships had not been examined until recently (Kirkegaard, 2014c). When it has,
it has repeatedly been found that there is a general socioeconomic factor to which good outcomes
nearly always have positive loadings and bad outcomes have negative loadings.3 Recent studies
have examined S factors at the national, regional/state and country of origin-level; see (Kirkegaard,
2015c) for a review of regional/state-level studies, and (Kirkegaard, 2014a) for country of origin-
level studies. In this paper we exploit a unique dataset to examine the S factor at the name-level in
Denmark.
1 University of Aarhus, Department of Culture and Society. Email: emil@emilkirkegaard.dk
2 University of Aarhus, Department of Physics and Astronomy. Email: bo@tberg.dk
3 Note that sometimes a factor is reversed such that the good outcomes have negative loadings, and the bad outcomes
have positive loadings. This reversing is quite arbitrary and depends on the balance of good and bad variables
included in the analysis. A preponderance of bad variables means that the factor will be reversed. If the factor is
thus reversed, one can just multiply all loadings by -1 to unreverse it.
The dataset
Last year the Danish newspaper Ugebrevet A4 published an interactive infographic called
"Navnehjulet" ("the Name Wheel"). It's simple: you just enter a first name and it shows you some
numbers about that name. The data was initially bought from Statistics Denmark and is based on
2012 data. There is no option available to download the dataset. A screenshot of the Name Wheel is
shown in Figure 1.
The more technical aspects of the scraping (“automatic downloading of the data”) are covered
elsewhere (Tranberg, 2015); here we focus on the data and the statistical analyses.
The statistical information shown for each name varies (presumably due to data availability), but in
the cases with full data, it includes:
1. Number of persons with the name.
2. 3 most common job types.
3. 3 most common living areas.
4. Average age.
5. Percents who rent and own their home. Note that this does not always sum to 100%.
6. Percentage with at least one conviction in the last 5 years.
7. Average monthly income in DKK.
8. Marital status (married, cohabiting, registered partner4, single).
9. Employee rate.
10. Student rate.
11. Outside the job market rate.
12. Independent rate.
13. Unemployment rate.
14. Chief executive rate.
4 This is a pre-2012 category as an alternative to marriage for same sex couples. One can no longer attain this legal
status, but one can retain it if one acquired it before 2012. See Borger.dk (Danish).
Figure 1: A screenshot of the Name Wheel with "Emil" entered.
Of note is that the unemployment variable includes only those who spent at least half the year
without work or who received dagpenge (a kind of unemployment benefit). The outside the job
market variable includes heterogeneous groups: førtidspensionister (pre-time retirees),
folkepensionister (ordinary retirees), efterlønsmodtagere (another type of pre-time retirement),
kontanthjælpsmodtagere (another type of unemployment benefit), and andre (others). As such, this
last variable is a mixture of situations that are normal (ordinary pension, efterløn) and some which
are used by unproductive members of society (førtidspension, kontanthjælp). Thus, interpretation of
that variable is not straightforward. There is a more detailed description of the variables available at
the website. We have taken a copy of this in case the site goes down (see supplementary material; in
Danish).
We downloaded the data for all variables for each of the 2358 names in the database. The gender of
the names was usually not marked, but because they were sorted by gender, we could easily assign
them genders. The gender distribution is 1266 females and 1092 males, or 54% female. This is a
higher female percentage than the actual population (50.3%5). This seems to be due to females
simply having a greater diversity of names. Table 1 shows the top 20 most common names by
gender.
5 Data from table FOLK1, year 2015Q1. Danish Statistics Agency.
http://www.statistikbanken.dk/statbank5a/SelectVarVal/Define.asp?
MainTable=FOLK1&PLanguage=0&PXSId=0&wsid=cftree
Rank Name (F) Thousands Name (M) Thousands
1 Anne 46.690 Peter 49.550
2 Kirsten 43.405 Jens 48.506
3 Hanne 39.680 Lars 45.507
4 Mette 39.007 Michael 45.322
5 Anna 34.995 Henrik 42.775
6 Helle 34.346 Thomas 42.134
7 Susanne 31.593 Søren 41.616
8 Lene 31.270 Jan 38.903
9 Maria 28.651 Niels 38.050
10 Marianne 27.366 Christian 37.528
11 Inge 26.186 Martin 37.151
12 Karen 25.974 Jørgen 35.608
13 Lone 25.695 Hans 35.400
14 Bente 24.845 Anders 34.613
15 Camilla 24.712 Morten 34.230
16 Pia 24.424 Jesper 34.092
17 Louise 23.847 Ole 32.746
18 Charlotte 23.804 Per 32.576
19 Jette 23.775 Mads 31.055
20 Tina 23.320 Erik 30.769
Sum 603.585 768.131
Table 1: Top 20 most common names by gender.
As can be seen, the top 20 most common female names have a smaller sum than the male sum, by
21%.
A few names had their gender marked because they were unisex names. Such names were
quite rare (36 pairs).
Missing data
There is quite a bit of missing data: 20% of names have at least some missing data. For this
reason we examined the distribution of missing data to see if some of it could fruitfully be imputed
(Donders, van der Heijden, Stijnen, & Moons, 2006). The matrix plot is shown in Figure 2.6
6 This plot is made using the matrixplot() function from the VIM package (Templ, Alfons, Kowarik, & Prantner,
2015). The 5 character/string variables are left out because due to a bug in the function, such variables are always
shown as missing all data, whereas in fact in this case none of them had any missing data.
Note: Not all cases are shown due to insufficient resolution of the image.
We see that data is not missing at random but that some cases tend to have a lot of missing data. We
also see that some variables have no missing data (unisex, number, age, conviction).
Which kind of cases have missing data? It cannot be seen from the above, but the missingness is
strongly related to the number of persons with that name, which is not surprising. The data is
limited to names where there are 100 or more persons. To see the relationship, we sort the data by
number of persons and replot the matrix plot; Figure 3.
Another way to examine missingness is to examine the distribution of cases by the number of
missing datapoints. A histogram of this is shown in Figure 4.
Figure 2: Matrix plot for missing data.
Figure 3: Matrix plot of missing data, cases sorted by number of
persons with the name.
While about 20% of the cases have 13 missing datapoints, a small number of cases (71) have
only 2 missing datapoints. These can be imputed to slightly increase the sample size.
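The counting behind Figure 4, and the selection of cases for imputation, can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the rows and the variable values are hypothetical, and the threshold of at most 2 missing datapoints is the one stated later in the paper.

```python
from collections import Counter

# Hypothetical rows: one dict per name; None marks a missing datapoint.
rows = [
    {"name": "A", "number": 5000, "age": 41.0, "income": 25000, "married": 48.0, "no.job": 3.1},
    {"name": "B", "number": 300, "age": 33.5, "income": None, "married": 30.0, "no.job": None},
    {"name": "C", "number": 120, "age": 28.0, "income": None, "married": None, "no.job": None},
]


def count_missing(row):
    """Number of missing (None) datapoints in one case."""
    return sum(value is None for value in row.values())


# Distribution of cases by number of missing datapoints (the histogram input).
missing_hist = Counter(count_missing(row) for row in rows)

# Cases with at most `max_missing` gaps are kept for imputation; the rest are dropped.
max_missing = 2
keep = [row["name"] for row in rows if count_missing(row) <= max_missing]

print(dict(missing_hist))
print(keep)
```

The imputation step itself is not shown, since the paper does not say here which method was applied.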
Getting an overview of the data
Before running numerical analyses on data, it is important to get a solid overview of it. This is
because one can rapidly identify patterns by eye that may go unnoticed by numerical analyses. For
instance, relying on correlations can miss important non-linear patterns, which can easily be
identified by eye if data are plotted using a moving average or similar (Lubinski & Humphreys,
1996).
The classic example of this is Anscombe's quartet (“Anscombe’s quartet,” 2016), four bivariate
datasets which have (almost) the same mean of x and y, variance/standard deviation,
correlation and regression coefficients (intercept and slope). However, plotting the data reveals that
they are very different.
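The point can be checked numerically with the first two datasets of the quartet. The sketch below is in Python for illustration (the authors' own code is in R); the helper names are ours. It prints the mean of y and the correlation for each dataset, which agree closely even though the first dataset is roughly linear and the second is a smooth curve.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# First two datasets of Anscombe's quartet; they share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r1, r2 = pearson(x, y1), pearson(x, y2)
print(round(mean(y1), 2), round(mean(y2), 2))
print(round(r1, 3), round(r2, 3))
```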
Histograms
Histograms are the easiest way to get a quick overview of the data structure. We plot selected
histograms in Figures 5-8. The rest are available in the supplementary material.
Figure 4: Histogram of cases by number of missing datapoints.
We see a power law distribution in that most names are borne by only a few persons, while a
few are borne by many thousands. The top 20 by gender were shown in Table 1. The mean and median
number of persons per name are 2209 and 316, respectively. Since the data is limited to names with
at least 100 persons, showing the least common names is not particularly interesting. The curious reader can
consult the supplementary material (results/number_ranks.csv).
The distribution of the mean age of names is a fat normal distribution. Top 5 youngest: Elliot,
Milas, Noam, Storm, Mynte (MMMMF); oldest: Valborg, Hertha, Dagny, Magna, Erna (all F).
Figure 5: Histogram of number of persons per name. Note that the x-
axis is log-scale.
Figure 6: Histogram of ages.
The income distribution is fairly normal with a long right tail. Presumably, a few very rich people
with uncommon names result in those names having very high incomes. The top scores are: Renè
(M), Leise (F), Frants (M), Heine (M) and Thorleif (M). The bottom scorers are dominated by
names whose bearers are very young and thus have very low incomes, e.g. Alberte (mean age 8, mean income
4893 DKK). These are of little interest so we shall not mention them.
It is clear that some names have much higher conviction rates than others; the top scorers are: Alaa, Ferhat,
Walid, Rachid, Fadi (all male). The female top scorer is Vesna (top #51). These names are all
foreign, mostly Arabic, except the female name which is Slovenian according to
http://www.meaning-of-names.com. This result is expected because persons from Muslim countries
are highly overrepresented in crime statistics (Kirkegaard & Fuerst, 2014).
Variables by age and gender
Since the mean age of the names has central importance to the other variables (e.g. income) and
since gender is a suitable dichotomous variable, we plot the other variables by age and gender.
These are shown in Figures 9 to 17.
Figure 7: Histogram of incomes.
Figure 8: Histogram of mean convictions past 5 years.
We see the familiar pattern in that men earn more money than women. The difference is stable until
about age 45 where it increases. Interpretation is difficult because the data is cross-sectional, not
longitudinal and hence there are both age and cohort differences between the names. Still, one
would expect something to happen at about that age that increases the difference.
It is well known that crime tends to be committed by younger males, and we see the same pattern here.
Recall that this is the percentage of persons with the name who have at least one conviction in the last 5
years. Thus, it has a bit of lag, which is probably why it is fairly high even for men in their 40s:
they could have gotten their conviction at age 35.
Figure 9: Income by age and gender.
Figure 10: Convictions by age and gender.
The outside-the-job-market variable (Figure 11) is the odd one, comprising both regular pensions as well as some unemployment
benefits and other benefits given to people who cannot/won't work (e.g. who had a work accident,
have severe psychological problems, are just lazy). As expected, it goes up heavily with age as
people go on pension.
There are known gender differences in rates of self-employment, and we see them here as well at all
ages (Figure 12). The rate seems to increase a bit over the lifespan, reaching its maximum perhaps around age 45-50.
Figure 11: Being outside the job market by age and gender.
Figure 12: Being independently employed by age and gender.
Marital status (Figure 13) is interesting in that it has an odd pattern at old age. Our guess is that the men who are
married tend to live longer which explains the male pattern, while the female pattern is explained by
the fact that women live longer than men and their husbands die off before them, leaving them
widowed (unmarried). In discussion with EOWK, A. J. Figueredo suggested that it may be due to
serial monogamy. Simply put, some men divorce their aging wives and marry a younger one. This
would tend to keep men married at older ages as well as decreasing the marriage rate of older
women.
Home ownership (Figure 14) is odd in that around age 30 more women have their own home, but men catch
up later. One could think of it as men making an earlier investment of their resources into career,
while women are more interested in getting a home. And when men’s careers get going at age 45
and above, they acquire their homes. Again, due to the cross-sectional data, it is difficult to say.
Figure 14: Owning a home by age and gender.
Figure 13: Marital status by age and gender.
The no.job variable (Figure 15) measures 'pure' unemployment. The gender difference is only slight at early to mid
ages, while it reverses in direction at older ages. It is somewhat odd that it is highest around age 35.
Girls and women generally acquire more formal education than boys and men, and we see it in the data here
as well (Figure 16).
Figure 16: Being a student by age and gender.
Figure 15: No job by age and gender.
Finally, there is a clear gender and age pattern in being an executive. Males are more likely at all
ages, but there is an increase around age 45, especially for men. This is presumably the explanation
of the pattern seen for income in Figure 9.
Is there an S factor among names?
Some of the variables are (almost) linearly dependent on each other. Own.place and rent.place sum
to nearly 100, so using both in an analysis would perhaps cause problems. The same is true for the 4
civil status variables (married, cohabiting, reg. partnership, single), and the 6 employment
variables (no.job, employee, out.of.work.market, student, independent, executive). To be safe, one
should probably not pick more than one from each of these three sets.
To do a factor analysis we must however pick some of them. We decided on the following: no.job,
own.place, married, conviction and income. The expectation is that no.job and conviction will have
negative loadings, while own.place and income will have positive, and marriage perhaps somewhat
positive (Herrnstein & Murray, 1994).
What we want to measure is the general socioeconomic status factor (if it exists). However, gender
can disrupt the analysis. This is because men earn more money but are also more criminal. This may
lead to gender specific variance, which is error in the factor analysis. One could regress out the
effect of gender, but we instead divide the dataset into two which also allows for easier
interpretation of the results.
Age has a strong influence on the variables which may disrupt results. For instance, a very young
name will have lower income and a low conviction rate, which will result in high mixedness
(Kirkegaard, 2015b). For this reason, we use both the original variables for analysis and a version of
them where the effect of age has been partialed out. To do this, we regress every variable on age, age²
and age³ and use the residuals.
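The age correction can be sketched as an ordinary least squares fit of each variable on a cubic polynomial of age, keeping the residuals. The code below is an illustrative Python version under that assumption, not the authors' R code; the function names and the data are hypothetical. Age is centred and rescaled before the powers are taken, which leaves the residuals unchanged but keeps the arithmetic stable.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residualize_on_age(values, ages, degree=3):
    """Residuals of an OLS regression of `values` on age, age^2, ..., age^degree."""
    centre = sum(ages) / len(ages)
    # Centring and rescaling spans the same polynomial space as raw age,
    # so the residuals are identical but the normal equations are better conditioned.
    z = [(a - centre) / 10.0 for a in ages]
    design = [[zi ** p for p in range(degree + 1)] for zi in z]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, values)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return [v - f for v, f in zip(values, fitted)]


# Hypothetical name-level data: mean age and mean monthly income (DKK).
ages = [8, 15, 22, 30, 38, 45, 52, 60, 70, 80]
income = [4900, 6000, 14000, 24000, 29000, 31000, 30000, 27000, 19000, 16000]
income_no_age = residualize_on_age(income, ages)
```

By construction the residuals are uncorrelated with age, which is why the age column above the diagonal in Tables 2-4 is 0.00.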
Some cases had some missing datapoints (refer back to Figure 2). We imputed the cases with 2 or
fewer missing datapoints and excluded the rest.
Correlation matrices
Figure 17: Being an executive by age and gender.
Before looking at the factor analysis results, we will look at the correlation matrices by gender and
together, as well as with and without partialing out the effects of age; Tables 2-4.
            no.job  own.place  married  conviction  income   age
no.job           -      -0.43     0.42        0.28   -0.27  0.00
own.place    -0.50          -     0.08       -0.31    0.68  0.00
married       0.18       0.43        -        0.00    0.09  0.00
conviction    0.42      -0.44    -0.13           -   -0.06  0.00
income       -0.21       0.73     0.45       -0.12       -  0.00
age          -0.28       0.58     0.57       -0.26    0.48     -
Table 2: Correlation matrix of S variables for both genders. Above diag., age partialed out.
            no.job  own.place  married  conviction  income   age
no.job           -      -0.46     0.34        0.51   -0.37  0.00
own.place    -0.53          -     0.13       -0.39    0.75  0.00
married       0.03       0.54        -        0.02    0.03  0.00
conviction    0.63      -0.59    -0.29           -   -0.35  0.00
income       -0.29       0.78     0.46       -0.40       -  0.00
age          -0.21       0.63     0.71       -0.35    0.60     -
Table 3: Correlation matrix of S variables for men. Above diag., age partialed out.
            no.job  own.place  married  conviction  income   age
no.job           -      -0.43     0.55        0.26   -0.21  0.00
own.place    -0.47          -    -0.12       -0.31    0.69  0.00
married       0.34       0.27        -        0.00   -0.04  0.00
conviction    0.42      -0.41    -0.06           -   -0.23  0.00
income       -0.12       0.74     0.39       -0.18       -  0.00
age          -0.33       0.56     0.46       -0.28    0.45     -
Table 4: Correlation matrix of S variables for women. Above diag., age partialed out.
Below the diagonal, one can see that the linear effect of age is often substantial, while above the
diagonal, the linear effect of age is zero, meaning that generally the partialization worked, at least
linearly speaking. Generally, the relationships were similar across gender. There are some
exceptions. To make them easier to see, Table 5 shows the delta (difference) correlation matrix.
            no.job  own.place  married  conviction  income   age
no.job           -      -0.03    -0.20        0.25   -0.16  0.00
own.place    -0.06          -     0.25       -0.09    0.06  0.00
married      -0.31       0.28        -        0.03    0.07  0.00
conviction    0.20      -0.18    -0.23           -   -0.13  0.00
income       -0.17       0.04     0.07       -0.22       -  0.00
age           0.12       0.06     0.26       -0.07    0.15     -
Table 5: Delta correlation matrix for genders. Higher values mean men's correlations are stronger.
The largest difference for the age-partialed data is the relationship between being married and
having no job (recall that this does not include those pensioned). Among female names, there is a
strong relationship between unemployment and being married. This is perhaps because women are more
often reliant on their husbands (being a homemaker) than the reverse, although both correlations were
positive. It could also have something to do with Muslim immigrants (about 10% of the population)
who are often married and where a large fraction of the women are unemployed.
Factor analyses
The loadings plots are shown in Figure 18.
The factors were not particularly strong, as shown in Table 6.
Factor analysis Var%
S.Both 0.43
S.Male 0.51
S.Female 0.40
S.BothNA 0.35
S.MaleNA 0.39
S.FemaleNA 0.35
Table 6: Variance explained by S factors.
The factors decreased in size after correcting for age, which could be because age was inflating the
factor size, or because the correction was too strong. The gender difference in the marriage indicator
is strong: about 0 vs. about .5 after age correction. Notice that own.place has loadings near 1, so
the S factor is about equal to that variable in these datasets. It is probably an indicator sampling error
that would be corrected if more indicators of greater diversity were available.7 Some previous S
factor studies have found the same when only a few indicators were used, e.g. Kirkegaard (2015a,
first analysis).
Still, the factor loadings are in the expected directions for all variables in all analyses.
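How a general factor is pulled out of the five indicators can be illustrated roughly from the published correlations. The sketch below is in Python rather than the authors' R code, and it swaps in the first principal component (found by power iteration) for the factor analysis used in the paper, so its loadings and variance share are not expected to match Table 6 or Figure 18. The input is the below-diagonal (not age-adjusted) part of Table 2; the function name is ours.

```python
from math import sqrt

variables = ["no.job", "own.place", "married", "conviction", "income"]
# Below-diagonal (not age-adjusted) correlations for both genders, from Table 2.
R = [
    [1.00, -0.50, 0.18, 0.42, -0.21],
    [-0.50, 1.00, 0.43, -0.44, 0.73],
    [0.18, 0.43, 1.00, -0.13, 0.45],
    [0.42, -0.44, -0.13, 1.00, -0.12],
    [-0.21, 0.73, 0.45, -0.12, 1.00],
]


def first_component(corr, iterations=500):
    """Largest eigenvalue and unit eigenvector of a correlation matrix (power iteration)."""
    n = len(corr)
    v = [1.0 / sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sqrt(sum(x * x for x in w))
        v = [x / eigenvalue for x in w]
    return eigenvalue, v


eigenvalue, vector = first_component(R)
# Orient the component so that income loads positively (the sign is arbitrary).
if vector[variables.index("income")] < 0:
    vector = [-x for x in vector]
loadings = {name: sqrt(eigenvalue) * x for name, x in zip(variables, vector)}
share = eigenvalue / len(variables)  # proportion of variance on the first component

for name in variables:
    print(name, round(loadings[name], 2))
print("variance share", round(share, 2))
```

The sign flip in the code is the same operation as multiplying all loadings by -1 to unreverse a factor.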
Given the similar factor loadings, one would also expect the extracted factor scores to be similar,
which Table 7 shows them to be.
                S.both  S.both.no.age  S.men  S.men.no.age  S.women  S.women.no.age
S.both               -           0.75   1.00          0.67     1.00            0.76
S.both.no.age     0.75              -   0.82          0.92     0.65            0.98
S.men             1.00           0.82      -          0.67       NA              NA
S.men.no.age      0.67           0.92   0.67             -       NA              NA
S.women           1.00           0.65     NA            NA        -            0.76
S.women.no.age    0.76           0.98     NA            NA     0.76               -
Table 7: Correlations between S factors across analyses.
Note: The apparently missing values (NA) are because the data does not overlap. There are no scores for
men in the S factor analyses with only women.
7 Indicator sampling error is meant to be a generalized version of Jensen's psychometric sampling error, see e.g.
(Kranzler & Jensen, 1991).
Figure 18: Loadings plot for factor analyses.
Using age bins instead
In the above analyses, we have analyzed data for all ages both with and without partialing the
effects of age out. However, age may be insufficiently dealt with by the chosen correction method,
and its effect may be so strong that not correcting for it also leads to spurious results. Hence we
employed a third method, that of age bins. The dataset is large enough that we can split it up into
age groups as well as gender and analyze each subgroup separately. While this does not entirely
remove the age effect, it is more likely to not introduce any spurious over-correction effects.
Concretely, we analyzed subgroups within 5 year brackets starting at age 20-25 and stopping at age
50-55. We do this for both genders together and each separately. The analysis procedure is the same
as above, namely extracting the general factor and examining the loadings and the factor sizes.
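The subgrouping step can be sketched as below, again in Python for illustration rather than the authors' R code. The names and ages are hypothetical, and treating each bracket as including its lower bound and excluding its upper bound is our assumption; the paper does not say how boundary cases were assigned.

```python
def age_bin(mean_age, start=20, stop=55, width=5):
    """Label of the 5-year bracket containing `mean_age`, or None if outside the range."""
    if not start <= mean_age < stop:
        return None
    lower = start + width * int((mean_age - start) // width)
    return f"{lower}-{lower + width}"


# Hypothetical names with gender and mean age.
names = [("A", "M", 23.4), ("B", "F", 24.9), ("C", "M", 37.0), ("D", "F", 54.2), ("E", "F", 61.0)]

# Group names by (age bin, gender); names outside 20-55 are left out.
groups = {}
for name, gender, mean_age in names:
    label = age_bin(mean_age)
    if label is not None:
        groups.setdefault((label, gender), []).append(name)

print(groups)
```

Each group would then be analyzed separately with the same factor extraction as before.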
Figures 19-21 show the factor loadings by age bin for both genders together and each separately.
Figure 19: Factor loadings by age bins, both genders together.
Figure 20: Factor loadings by age bins, males only
The most conspicuous finding is the marriage loadings which are now negative! Apparently, the
positive loadings from before were an age confound. The exception is the last two age groups where
the marriage indicator is positive, especially for the last group. The odd finding that for 50-55 year
olds, crime has a loading around 0 is presumably sampling error as well as reflecting the fact that
crime among people in their 50s is fairly rare. When the base rate is low, correlations become
weaker and factor loadings are based on the correlation patterns in the data (Ferguson, 2009). The
sample sizes are not terribly impressive, 126 to 257, and are smallest for the last two groups. The ones
for each gender separately are about half that.
For the male data, the marriage loadings are about 0. The two last age bins are again positive. The
other four loadings are somewhat stronger in males with criminality actually having stronger
(negative) correlations than unemployment. This is presumably because crime is more common
among males which means the correlations are stronger.
Finally, for the female data, marriage loadings are more strongly negative except for the last two
age bins, same as with the male data.
Figure 22 shows the factor strength by age bin and gender, together and separate.
Figure 21: Factor loadings by age bins, females only.
Generally the male-only analyses had the strongest S factors (6/7), with the female-only analyses
being above the one with both (5/7). One might interpret this as being due to the lower base rate of
crime making the correlations with the crime variable smaller for females which makes the factor
size smaller. The mixed-gender analyses usually had smaller factors, perhaps because of the
mixedness that results from combining the genders, as discussed earlier.
Pseudofertility and the S factor
Since the Name Wheel data contains the count of persons with each name in 2012, if we could find
some data for a later year for the same names, we could calculate a name-wise 'fertility', which we
shall call pseudofertility. It is the growth (or decrease) in number of persons with each name in
Denmark. This may be due to actual births, immigration or name-changes. This pseudofertility can
then be compared to the S factor score for each name to see if there is any relationship. A somewhat
negative relationship is expected due to low S immigrant names increasing their number via higher
than average fertility (at least in the first generation, (Kirkegaard, 2014b)) and immigration.
The Danish Statistics agency (Danish Statistics) maintains a web page where one can look up any
first or last name and see how many people have that name in the current year and last year. Using a
similar method to that used to scrape the data from the Name Wheel, we scraped the count data for
the years 2014 and 2015 for every name in our dataset. From these data, we calculated the
pseudofertility by the fractional increase (or decrease) of each name over both the period 2012-2015
and 2014-2015. The first should give a more reliable number since it's over a few years as opposed
to the second which is over 1 year only. Their correlation is .95 (no outliers), so reliability was very
high.
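As a concrete reading of "fractional increase (or decrease)", the calculation might look like the sketch below. It is written in Python for illustration, the counts are hypothetical, and dividing by the count in the starting year is our assumption about the exact definition.

```python
def pseudofertility(count_start, count_end):
    """Fractional change in the number of persons with a name between two years."""
    return (count_end - count_start) / count_start


# Hypothetical counts per name for the years (2012, 2014, 2015).
counts = {"A": (1000, 1040, 1060), "B": (200, 190, 185), "C": (150, 150, 150)}

pf_2012_2015 = {name: pseudofertility(c[0], c[2]) for name, c in counts.items()}
pf_2014_2015 = {name: pseudofertility(c[1], c[2]) for name, c in counts.items()}

print(pf_2012_2015)
print(pf_2014_2015)
```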
Figure 23 shows the scatter plot of pseudofertility 2012-2015 and S factor score (age adjusted, both
genders together).
Figure 22: Factor sizes by age bin and gender
Overall, there is a medium-sized negative relationship, r = -.35 [95CI: -.39; -.31], between
pseudofertility and S factor score (age-controlled). As can be seen in the plot, this is mainly due to
the names left of 0 S (the below average). There appears to be an upward trend at the other end, but
there are relatively few datapoints, so it may be a fluke. The point sizes show that the names
creating the trend are relatively uncommon (few people have those names, relatively speaking). The
largest names cluster around S [0-1.5]. For this reason, we also calculated the weighted correlation
which is -.21 [95CI: -.25; -.17], so the effect is still reliable but substantially smaller as expected
from the inspection of the plot.
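The weighted correlation can be sketched as a Pearson correlation in which each name counts in proportion to a weight. The Python code below is an illustration only; the data are hypothetical, and using the raw number of persons as the weight (rather than, say, its square root) is our assumption about what "n-weighted" means.

```python
from math import sqrt


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with non-negative case weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / sqrt(sxx * syy)


# Hypothetical data: S score, pseudofertility and number of persons per name.
s_score = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.0]
pseudofert = [0.30, 0.20, 0.02, 0.00, 0.01, -0.01]
n_persons = [120, 150, 9000, 30000, 25000, 12000]

# Equal weights reproduce the ordinary (unweighted) correlation.
r_unweighted = weighted_pearson(s_score, pseudofert, [1] * len(s_score))
r_weighted = weighted_pearson(s_score, pseudofert, n_persons)

print(round(r_unweighted, 2), round(r_weighted, 2))
```

Weighting reduces the influence of uncommon names, which is the reason the paper gives for reporting both values.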
We plotted the figure in very high resolution using vector graphics so that one can zoom in on any
given region. The reader can examine the pseudofertility_names.svg file in the supplementary
material to explore the figure. Looking at the names in the region creating the negative slope reveals
them to be almost exclusively immigrant names from Arabic or African countries, e.g.: Mohammad,
Hossein, Mostafa, Sayed, Malika, Mana, Slawomir, Omar (names from the region north of the
moving average near S = -1.5). Unfortunately the dataset does not contain information about the
immigration status of each name, so we could not exclude them to see if the 'dysgenic'
relationship holds without immigrants.
Thus, the name data reveals a small 'dysgenic' effect on S in line with modeling by (Kirkegaard &
Tranberg, 2015). If the trend were to continue, and assuming that everything else is equal, then the
average level of socioeconomic status would fall in Denmark and there would be increasing
socioeconomic inequality.
Discussion and conclusion
Despite being a new level of analysis (at least to us), the results were generally in line with those
from more 'traditional' country, regional/state-level and origin country-level analyses.
This dataset contained first names, but one could also analyze last names which are more familial in
nature. Such data was not available at the Name Wheel website, but it could probably be acquired
from the statistical agency if one is willing to pay.
Figure 23: Pseudofertility 2012-2015 and S factor scores. Point
sizes are proportional to the number of persons with the name.
The dataset is especially useful for researchers wishing to investigate the (in)accuracy of
stereotypes of names, see e.g. (Jussim, Cain, Crawford, Harber, & Cohen, 2009; Jussim, 2012).
Limitations
As mentioned earlier, the data are an odd kind of cross-sectional data which makes it difficult to
infer causality. A given difference observed between names with a mean age of 20 and 40, could be
either an effect of age (being 20 versus 40), a cohort effect (being born in 1995 versus 1975), or
something more complicated.
The mean age of the names is tricky to interpret since the distribution of age of persons with the
name is not shown. This could be a normal distribution if the name was fashionable at some point
but then faded out. However, it could also be bi-modal. For instance, if a name was fashionable in
1965 and in 1995, there would be two groups of persons. One aged about 50 and one aged about 20.
If they are about evenly distributed the mean age of the name would be about 35 despite few people
with the name being that age.
Aside from the extra population data from Danish Statistics, the dataset only has data from one year
(2012). It would be better if data for more than one year were available, both to avoid fluke effects
and to examine e.g. the effects of macroeconomics on the relationships between the variables.
To our knowledge, this is a new kind of grouped data and so methods for analyzing it have not been
well-tested. This warrants some extra caution about the inferences drawn from it.
Supplementary material
Data, source code and figures are available at https://osf.io/t2h9c/.
References
Anscombe’s quartet. (2016, November 7). In Wikipedia. Retrieved from
https://en.wikipedia.org/w/index.php?title=Anscombe%27s_quartet&oldid=748318997
Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A
gentle introduction to imputation of missing values. Journal of Clinical Epidemiology,
59(10), 1087–1091. https://doi.org/10.1016/j.jclinepi.2006.01.014
Ferguson, C. J. (2009). Is psychological research really as good as medical research? Effect size
comparisons between psychology and medicine. Review of General Psychology, 13(2), 130.
Herrnstein, R. J., & Murray, C. (1994). The Bell Curve: Intelligence and Class Structure in
American Life. New York: Free Press.
Jussim, L. (2012). Social Perception and Social Reality: Why Accuracy Dominates Bias and Self-
Fulfilling Prophecy. Oxford University Press.
Jussim, L., Cain, T. R., Crawford, J. T., Harber, K., & Cohen, F. (2009). The Unbearable Accuracy
of Stereotypes. In Handbook of Prejudice, Stereotyping, and Discrimination (p. 608). Taylor
& Francis Group, LLC.
Kirkegaard, E. O. W. (2014a). Crime, income, educational attainment and employment among
immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/10/crime-income-educational-attainment-and-employment-
among-immigrant-groups-in-norway-and-finland/
Kirkegaard, E. O. W. (2014b). Criminality and fertility among Danish immigrant populations. Open
Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/03/criminality-and-
fertility-among-danish-immigrant-populations/
Kirkegaard, E. O. W. (2014c). The international general socioeconomic factor: Factor analyzing
international rankings. Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-
analyzing-international-rankings/
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved
from https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
Kirkegaard, E. O. W. (2015b). Finding mixed cases in exploratory factor analysis. The Winnower.
Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-
analysis
Kirkegaard, E. O. W. (2015c). The S factor in Brazilian states. The Winnower. Retrieved from
https://thewinnower.com/papers/the-s-factor-in-brazilian-states
Kirkegaard, E. O. W., & Fuerst, J. (2014). Educational attainment, income, use of social benefits,
crime rate and the general socioeconomic factor among 71 immigrant groups in Denmark.
Open Differential Psychology. Retrieved from
http://openpsych.net/ODP/2014/05/educational-attainment-income-use-of-social-benefits-
crime-rate-and-the-general-socioeconomic-factor-among-71-immmigrant-groups-in-
denmark/
Kirkegaard, E. O. W., & Tranberg, B. (2015). Increasing inequality in general intelligence and
socioeconomic status as a result of immigration in Denmark 1980-2014. Retrieved from
http://openpsych.net/ODP/2015/03/increasing-inequality-in-general-intelligence-and-
socioeconomic-status-as-a-result-of-immigration-in-denmark-1980-2014/
Kranzler, J. H., & Jensen, A. R. (1991). Unitary g: Unquestioned postulate or Empirical fact?
Intelligence, 15(4), 437–448. https://doi.org/10.1016/0160-2896(91)90005-X
Lubinski, D., & Humphreys, L. G. (1996). Seeing the forest from the trees: When predicting the
behavior or status of groups, correlate means. Psychology, Public Policy, and Law, 2(2),
363–376. https://doi.org/10.1037/1076-8971.2.2.363
Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and
Imputation of Missing Values. CRAN. Retrieved from http://cran.r-
project.org/web/packages/VIM/index.html
Tranberg, B. (2015, May 1). Data mining: “Navnehjulet.” Retrieved from http://tberg.dk/post/data-
mining-navnehjulet/