Are there Complex Assortative Mating Patterns for Humans? Analysis of 340 Spanish Couples

Assortative mating for both physical and psychological traits is well-established in many animal species, including humans. Most studies, however, only compute linear measures of mate similarity, typically Pearson correlations. However, it is possible that trait similarity, or dissimilarity, has complex patterns missed by the correlation metric. We investigated a dataset of 340 Spanish couples for evidence of relationships across 7 traits: age, educational attainment, intelligence, and the scales of the Eysenck Personality Questionnaire: Extroversion, Psychoticism, Neuroticism, and the Lie scale. We replicated well known linear assortative mating for age, intelligence and education. Like most studies, we find weak to no assortative mating for the personality traits. Analysis of nonlinear patterns using regression splines failed to reveal anything beyond the linear relations. Finally, we examined cross-trait variation for couples but we found little of note. Overall, it does not appear that there are complex patterns for traits in human couples.
MANKIND QUARTERLY 2021 61:3 578-598
Emil O. W. Kirkegaard*
Ulster Institute for Social Research, London, UK
* Corresponding author:
Key Words: Assortative mating, Mate similarity, Nonlinear, Trade-off, Human,
Cross-trait association
Large meta-analyses show that in nature, mates tend to be more
phenotypically similar to each other than random pairs of individuals. This is
referred to as assortative mating (Janicke et al., 2019; Vandenberg, 1972). It
contrasts with disassortative or negatively assortative mating, where mates are
less similar than chance for some phenotype (e.g. coloration in wolves, Hedrick
et al., 2016). Both patterns are found in nature, but the former is more common.
Evidence is abundant that humans also mate assortatively (Luo, 2017). Indeed,
most human languages contain sayings about this phenomenon such as “birds
of a feather flock together”, “krage søger mage” (crow seeks mate, in Danish), or
“rybak rybaka vidit izdaleka” (fisherman sees another fisherman from afar, in
Russian).
Assortative mating is a special case of the more general phenomenon of
social similarity. In other words, it’s not just mates but also neighbors, friends,
colleagues, and any other network connections, that tend to be similar. Many or
most of these network links are genetically unrelated, at least for recent family
history, and thus the phenotypic similarity cannot be entirely explained with genes
shared by recent descent. Instead, social similarity mainly reflects self-selection
and various assortment processes in human networking behavior. For instance,
people who are similar in vocational interests often end up studying the same
topic in college, and end up friends. Similarly, people who share an interest in
dancing may meet at a club or dance and become friends. Unsurprisingly,
research also indicates that people are more favorably disposed toward people who are
like themselves, a tendency called social homophily (love of the same, in Greek) (Aiello et al.,
2012; Huber & Malhotra, 2017; McPherson et al., 2001).
Social similarity, including assortative mating, differs importantly in strength
by phenotype. Age is probably the strongest sorter, as evidenced by the existence
of widespread terms used to refer to the ‘age-outgroup’ (“old people”, “boomers”,
“kids”, etc.). Most countries enforce age-groupings during childhood: children
are largely segregated by year of birth in the first 10 or so years of school,
so to some extent the strong age similarity in networks is socially imposed,
not merely self-selected. Some typically found values for
assortative mating correlations in humans are (findings summarized from Luo,
2017): age .70s to .90s, educational attainment or general social status .40s to
.60s, attitudes (e.g. political or religious opinions) .40s to .70s, and .10s to .40s
for personal values (e.g. risk taking), general intelligence .40s, mental health or
well-being .20s to .50s (depending on disorder), habitual behaviors (e.g. smoking,
exercise) .20s to .50s. Personality (mostly self-rated OCEAN/Big Five, i.e.
Openness, Conscientiousness, Extroversion, Agreeableness, and
Neuroticism/Emotional Stability) shows surprisingly weak assortative mating,
with typical coefficients from 0 to .30s. Even various physical traits show
some association (e.g. hand length r = .18; Vandenberg, 1972), typically .10s
to .20s for specific traits, and .30s to .40s for physical attractiveness.
Height is a particularly well-studied trait (k = 154, Stulp et al., 2017),
probably due to ease of accurate measurement and strong role in human social
perception, and shows an overall association of r = .23. There did
not seem to be any historical changes in the degree of similarity. There are also
strong patterns of assortative mating for race and ethnicity, decreasing in recent
decades. However, because these are measured as nominal traits in most
studies, Pearson correlations depend strongly on the sample proportions, so the
effect sizes are not comparable to the other traits listed, which are based on
continuous or quasi-continuous traits.
The existing literature suffers from some limitations. First, most studies only
examine the strength of a linear relationship, usually using Pearson correlations,
or similar effect size metrics appropriate for ordinal or nominal data (such as race
or religion). Thus, it is possible that they have missed nonlinear and especially
non-monotonic or threshold relationships. For instance, the review of effect sizes
summarized above (Luo, 2017) does not mention the word “linear” at all. There is
however at least one study that examined nonlinear assortative mating and its
importance for marital satisfaction (Luo & Klohnen, 2005; see also
Nikoloulopoulos & Moffatt, 2019). This study found mixed results, and thus there
is a need for more and larger studies.
Second, while assortative mating for height is well-established, it is also very
common to see women advertise in their dating profiles that the man must be
taller (the so-called male-taller norm). This female demand should result in a
particular nonlinear pattern for height relationships such that there is a curve
break close to the height parity line. Specifically, there should be a relative
lack of males just below their partner’s height compared to what would be
expected based on the linear measure of association (e.g. Pearson correlation).
This pattern would result if there was a specific female (or male, or both)
aversion to female > male height pairs. Not many studies have actually examined
this, but one that did found that this effect was relatively weak compared to what
might have been expected from the extensive talk about this in the online dating
literature (Stulp et al., 2013). Height was also found to be positively related to
fertility (and thus fitness) for men, but not women (Stulp et al., 2015). Both sexes
were subject to stabilizing selection (i.e. deviation from an optimal height was
associated with lower fertility), but men were also subject to some directional
selection, with highest fertility at approximately 1 standard deviation in height
above the mean. Patterns such as these might exist for other traits as well, and
should be explored in detail to understand ongoing human sexual selection.
Similarly, a recent study using self-reported preferences found that very high
levels of a trait appear intimidating to potential mates and may be selected
against. This could potentially induce some nonlinear patterns, depending on the
existence of assortative mating for the trait in question and how much is too
much (Gignac & Starbuck, 2019).
Third, phenotypes are usually examined one by one, with scant research on
cross-trait similarity or trade-offs. Trade-offs are expected based on some models
of human mate preferences. There is indeed some evidence of this for height and
education (Ponzo & Scoppa, 2015). Such models posit a composite score,
termed sexual market value or mate value (Fisher et al., 2008), for an individual
based on their phenotypes (and assumed genotypes due to genetic influence on
traits). Hence, a partner who is relatively poor in one phenotype can make up for
it by having a good ranking in another phenotype. Thus, we would expect to see
selection-induced negative correlations across differences on traits in pairs, such
that partners who are well matched on phenotype A show larger gaps on
phenotype B, where both are phenotypes that are sought. This is an example of
the more general phenomenon of collider bias (Rohrer, 2018). Note that net
negative correlations only result if the collider bias is strong enough to offset
any positive association between the traits that might exist in the general
population. Figure 1 illustrates this concept.
Figure 1. Illustrative example of collider bias for two components of a composite
mate value score. Black line shows total sample fit (no association), while the
blue lines show the fit for the two subsamples (negative associations). Smoothing
lines by generalized additive model.
In the figure we see things from the perspective of a woman who is unwilling
to date a partner below composite mate value of 1, perhaps because she
perceives herself to be at this value and prefers hypergamy (Esteve et al., 2012;
Harinam & Henderson, 2020; Tuckfield, 2019). In this case, the men in the upper
right part (turquoise color) are in her potential dating pool. Among these men, her
two favored traits of attractiveness and niceness are negatively associated
(r = -.66, top blue line) despite being unassociated in the total male population (black
line, r = .00). There is also a weaker induced negative association among the men
with too low composite scores (r = -.22, bottom blue line).
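The selection effect described above can be reproduced with a small simulation. The following is a minimal sketch, not the code behind Figure 1: it assumes two independent standard normal traits, an unweighted sum as the composite mate value, and a cutoff of 1, so the induced correlations will be in the same direction as, but not identical to, the values reported for the figure. The population size and seed are arbitrary.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # arbitrary seed, for reproducibility only
n = 10_000              # hypothetical population size

# Two mate-value components that are independent in the full population.
attractiveness = [rng.gauss(0, 1) for _ in range(n)]
niceness = [rng.gauss(0, 1) for _ in range(n)]

# Assumed composite: a simple unweighted sum, with a cutoff at 1.
cutoff = 1.0
pairs = list(zip(attractiveness, niceness))
above = [(a, b) for a, b in pairs if a + b >= cutoff]
below = [(a, b) for a, b in pairs if a + b < cutoff]

r_all = pearson(attractiveness, niceness)
r_above = pearson(*zip(*above))  # men inside the dating pool
r_below = pearson(*zip(*below))  # men outside the dating pool

print(f"all: {r_all:.2f}  above cutoff: {r_above:.2f}  below cutoff: {r_below:.2f}")
```

Selecting on the sum forces a trade-off within each subsample: among men above the cutoff, a low score on one trait must be offset by a high score on the other, which is what produces the negative association.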
Thus, bringing the above together, the purpose of the current study was to
examine possible nonlinear cross-mate similarity or dissimilarity across
phenotypes in a suitably large dataset.
Data and methods
We use archival data from Colom et al. (2002). This study was published in
Spanish. First author Roberto Colom kindly shared the anonymized data with me
(personal communication, 2015). The dataset contains trio data on 342 parent
duos and one child. The child was a student enrolled in a class at university, so
there was some indirect selection for parental education and intelligence levels.
The dataset includes measures of personality measured using Eysenck’s
Personality Questionnaire-Revised (Spanish adaptation, Aguilar et al., 1990),
which contains scales for Extroversion, Neuroticism, Psychoticism, and the Lie
scale. The first two are similar to the existing scales from other batteries such as
NEO-PI-R, IPIP and HEXACO tests among others. Psychoticism is unique to
Eysenck’s test and is plausibly seen as a combination of the OCEAN traits of
Agreeableness and Conscientiousness. It has been found to be related to
creative achievement and various forms of socially undesirable behavior
(Eysenck, 1995). The Lie scale is somewhat unusual. It consists of items that
relate to overly positive self-presentation, intended to measure social desirability
bias (Eysenck & Eysenck, 2011; Ferrando et al., 1997). Aside from personality,
the dataset includes age, educational attainment (coded on a 1-4 scale, 1 =
primary education level, 2 = secondary education level, 3 = technical studies, and
4 = university level), and intelligence. The latter was measured using the TIG-2
(Test De Inteligencia General, test of general intelligence, 2nd edition) (Gough &
Domino, 1963; publisher site TEA, 2020). This is a Spanish-developed,
dominos-based abstract reasoning test, similar to the Raven Progressive Matrices test.
The appendix contains some example items from the test manual.
We used regression splines to detect and estimate the effect size of nonlinear
associations. Since the reader may be unfamiliar with this approach, and question
its utility, we carried out a simple simulation to test whether it worked as expected.
Specifically, we simulated data based on a sine function in the region 0 to 3.14
(π), which has an inverted U-shaped pattern, and a correlation of 0.00 with a
uniform variable. We simulated datasets that varied in sample size and amount
of random normal error added to see how this affected results. For each simulated
dataset, we fit a linear and a natural spline model and computed the model
adjusted R² values. From these we computed the gain in adj. R² as the measure
of nonlinearity in the data, i.e., the extra ability of the natural spline to predict or
explain the data, adjusted for overfitting (James et al., 2013, Chapter 7). For each
parameter combination, we repeated the simulation 10 times. Figure 2 shows the
results.
Figure 2. Results of simulation study for detection of nonlinear patterns in regression
models. The R² adjusted gain is based on comparison of linear model fit with a
natural spline model. The grey ribbon shows the variation across the 10
repetitions.
We see that without noise, the spline model always fits near-perfectly. The
R² gain is slightly above 1.00, showing that there is some bias in this metric, or at
least, bias in the correction when the pattern in the data is perfect. For the
realistic cases with noise, we see that the spline is able to tell the pattern from the
noise, at least once n exceeds about 100. When the noise amount is very large (e.g. SD
= 2), no amount of data helps (R² gain of approximately 0).
Figure 3 shows the simulation data when n = 340, as in this study, and the noise
SD is 0.5. In this figure, the R² gain is 0.25 (R² adj. linear = -.003, R² adj. spline =
.245). Thus, if there are any nonlinear patterns in the data that are similar to those
in our simulation, we should be able to find them with this approach.
Figure 3. Sine function pattern with n = 340 and noise standard deviation = 0.5.
The red line shows the linear model fit, and the blue line shows the spline fit.
For the no-noise case, the theoretically expected values are R² = 1.00 and an R² gain
of 1.00, since the linear model has r = .00 and thus R² = .00.
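The logic of the simulation can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the study's R code: it uses only the Python standard library, fits both models by ordinary least squares via the normal equations, and builds the natural (restricted cubic) spline from the truncated-power basis with 4 knots placed at the 5th, 35th, 65th and 95th percentiles. The knot count, knot placement and seed are assumptions, so the numbers will differ somewhat from those reported for Figure 3.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def adj_r2(columns, y):
    """Fit OLS of y on an intercept plus the given columns; return adjusted R²."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def natural_spline_columns(x, knots):
    """Restricted cubic spline basis: x itself plus (number of knots - 2) nonlinear terms."""
    def pos3(v):
        return max(v, 0.0) ** 3
    t_last, t_prev = knots[-1], knots[-2]
    cols = [list(x)]
    for t in knots[:-2]:
        cols.append([
            pos3(v - t)
            - pos3(v - t_prev) * (t_last - t) / (t_last - t_prev)
            + pos3(v - t_last) * (t_prev - t) / (t_last - t_prev)
            for v in x
        ])
    return cols

rng = random.Random(42)  # arbitrary seed
n, noise_sd = 340, 0.5
x = [rng.uniform(0, math.pi) for _ in range(n)]
y = [math.sin(v) + rng.gauss(0, noise_sd) for v in x]

xs = sorted(x)
knots = [xs[int(q * (n - 1))] for q in (0.05, 0.35, 0.65, 0.95)]  # assumed placement

r2_linear = adj_r2([x], y)
r2_spline = adj_r2(natural_spline_columns(x, knots), y)
gain = r2_spline - r2_linear
print(f"adj. R² linear = {r2_linear:.3f}, spline = {r2_spline:.3f}, gain = {gain:.3f}")
```

Because the sine curve is symmetric over 0 to π, the linear model explains almost nothing, and essentially all of the explainable variance shows up as gain for the spline model.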
Results
Figure 4 shows the heatmap of the correlation matrix for the included
quantitative variables. We used latent correlations for the ordinal educational
attainment variables, based on the assumption of an underlying continuous
educational attainment trait expressed as one of the four ordinal categories.
Figure 4. Heat map of correlation matrix of study variables. Correlations are
Pearson correlations except for education (ed) where latent (polychoric)
correlations are used. Correlations of |r| > .10 have p < .05. E = Extroversion, ed =
educational attainment, intel = intelligence, Lies = Lie scale (social desirability), N
= Neuroticism, P = Psychoticism.
We see evidence of assortative mating, replicating the findings of the original
Spanish study: .81 for age, .65 for education, .48 for intelligence, and .26 for the Lie
scale (socially desirable responding). The main personality dimensions of the
Eysenck test, however, show scant evidence of assortative mating: -.01 for
Extroversion, .07 for Neuroticism, and .13 for Psychoticism. These findings are in
line with most of the literature on self-reported personality data. We also see some
expected cross-trait correlations within person, and cross-trait assortative mating
correlations, chiefly for education and intelligence. The within-person correlations
between education and intelligence are relatively weak compared to typical
values in the literature, which may be due to the student sampling procedure.
Generally speaking, the cross-trait assortative mating correlations are similar to
the within-self correlations. For instance, father’s educational attainment correlates
-.20 with mother’s Neuroticism, and so does mother’s own educational attainment.
To further explore possible nonlinear relationships, we plotted the six
(quasi-)continuous phenotypes across the couples. Prior to this, they were standardized
to allow plotting using the same x and y axis scales. The results are shown in
Figure 5.
Figure 5. Scatterplots with smoothing fits for 6 phenotypes (standardized within
sex). Smooth fit via LOESS. E = Extroversion, ed = educational attainment, intel
= intelligence, Lies = Lie scale (social desirability), N = Neuroticism, P =
Psychoticism.
Aside from a slight indication of a nonlinear pattern for Psychoticism (P),
there appears to be no evidence of nonlinearity in these figures. To quantitatively
confirm this result, we fit regression models for each phenotype. There were 4
models for each, 2 linear models in each direction (i.e., predict male phenotype
from female and vice versa), as well as 2 nonlinear models with a natural spline.
For the resulting models, we compared the adjusted R² values to look for
incremental validity. Table 1 shows the results. We see that there is essentially
no evidence of any nonlinearity: the gain columns are near zero, as was expected
from inspection of the scatterplots.
Table 1. Model fit results (adjusted R²) for models estimating nonlinear patterns.
To examine the possibility of cross-trait trade-offs, we computed the distance
between the partners in each couple based on their sex-normed standardized values. A value of
0 thus means partners have the same relative phenotypic standing (e.g., both are
average, or both are 1 standard deviation above average). A value of 1 means
that the female is 1 standard deviation higher in the phenotype, though not
necessarily in the natural unit (e.g. females who are +1 SD above female average
height are still below the average male norm). Figure 6 shows the correlation
matrix of the resulting distance scores.
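As a concrete illustration of the distance score, the sketch below standardizes a trait within each sex and subtracts the male partner's z-score from the female partner's. The heights are made-up numbers for five hypothetical couples, and the sign convention (female minus male) follows the description above; the study's own code may differ in detail.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize values to mean 0 and SD 1 within one sex."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical heights in cm for five couples (illustrative numbers only).
female_height = [160.0, 165.0, 170.0, 158.0, 172.0]
male_height = [175.0, 178.0, 181.0, 170.0, 190.0]

female_z = zscores(female_height)
male_z = zscores(male_height)

# Positive values: the female partner stands higher within her sex than the
# male partner does within his, even if she is shorter in centimeters.
distance = [f - m for f, m in zip(female_z, male_z)]

for f_cm, m_cm, d in zip(female_height, male_height, distance):
    print(f"female {f_cm:.0f} cm, male {m_cm:.0f} cm, distance {d:+.2f}")
```

Computing one such distance vector per trait and correlating the vectors across couples gives a matrix of the kind shown in Figure 6.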
The correlations among distance scores revealed a few patterns. Chiefly, we
see that couples more distant in Neuroticism and Psychoticism were closer in the
Lie scale. Conversely, couples more distant in education were also more distant in
intelligence, and the same was seen for Psychoticism and Neuroticism,
Extroversion and Psychoticism, as well as Lie scale and intelligence.
Figure 6. Heat map of correlation matrix of distance scores. Correlations of |r| >
.10 have p < .05. E = Extroversion, ed = educational attainment, intel =
intelligence, Lies = Lie scale (social desirability), N = Neuroticism, P =
Psychoticism.
Variance effects
A reviewer suggested looking into possible variance effects. The idea here is
that although trait levels might not predict the mean values of trait levels in mates,
they might nonetheless predict the variance. For instance, it is well known that
older people’s partners have a wider age span (and thus variance) than younger
people. In particular, older men may sometimes date women several decades
younger than themselves. Statistically, this would then result in heteroscedasticity
(HS), i.e., unequal variance of the residuals across the levels of the predictor
variable (in this case, male age). To test for this, we examined popular methods
for detecting and quantifying HS. However, we found that they lacked robustness
against outliers and the ability to detect nonlinear HS. Figure 7 illustrates HS using
simulated data.
Figure 7. Illustrations of heteroscedasticity.
We follow the popular approach of using the squared residuals for testing for
HS. We add the twist that we employ rank transformed data to increase the
robustness, and we employ natural splines to detect nonlinear HS (Harrell, 2015).
We carried out a simulation study to test the properties of our approach. The
results indicated that it works approximately as expected with regard to rates of
false positives, but that the test sometimes had trouble distinguishing between
linear and nonlinear HS. The details of this are given in the appendix.
Applying the method to the present dataset produces 48 results since there
are 6 traits, 2 directions (male to female trait, and the reverse), and 4 tests for each
(linear vs. nonlinear, raw vs. rank transformation). Nominally, 38% of these found
HS (p < .05). However, a larger share of these concern the raw data approach, which
is not robust (42% for raw, 33% for ranks). HS for age is the most common finding, but
the least interesting (75% for age but 30% for other traits). The supplementary
materials contain detailed output for some of these tests, though all in all, the
findings were not of much interest. Figure 8 shows an example. The results show
that while we detect some HS, it seems to be of either little importance
(intelligence, Extroversion), dubious reality (Psychoticism), or unsurprising (age).
Figure 8. Observed heteroscedasticity in cross-mate trait association for 4 traits.
Discussion
The present study examined assortative mating in a relatively large Spanish
dataset of parents (n = 340 pairs). We found evidence of assortative mating of
approximately typical magnitudes as reported in the literature (Luo, 2017). We
examined the data for possible nonlinear mate trait associations but found
little. Simulations indicated that we probably had good power to detect nonlinear
patterns if they had existed and were large enough to care about. Thus, nonlinear
mating patterns do not seem to exist for the traits examined in this study. We
furthermore examined the data for evidence of trade-offs between traits. We did
find some correlations that are larger than chance, but no particular model was
suggested by the overall pattern of results. Generally speaking, the associations
were weak across traits that were not correlated within persons, thus suggesting
a general lack of trade-offs. At a reviewer’s request, we examined the data for
heteroscedasticity (i.e. non-constant residual variance across the predictor’s range).
We found evidence of such heteroscedasticity, but it was generally not interesting.
The most robust finding was that for age, mothers’ age varied more with
increasing fathers’ age, a well-known phenomenon (Buss, 2019).
The primary limitations of the study are as follows. First, the personality data
were based on self-report measures. Self-report measures have been found to
be problematic in multiple domains. In psychiatry, other-reports (and combined
self + other reports) produce higher heritability estimates (Faraone & Larsson,
2019), as do other-reports in personality psychology, with heritabilities around
80% compared to the usual 40-50% (Riemann et al., 1997). Multi-informant data
of anti-social behavior reached a heritability of 96% whereas typical single-
measure data only find about 30-50% (Baker et al., 2007). Other-reports of
personality also show stronger correlations to outcomes such as academic
achievement and job performance (Connelly & Ones, 2010). Other-reported job
performance (compared to self-reported) shows stronger correlations to
objectively measured job performance (Jaramillo et al., 2005). Taken together,
this casts doubt on some of the low or null findings in the present study. It may
be that self-rating biases are (negatively) confounded such that spouses rate
themselves unrelated in personality, perhaps using each other as reference
groups (Heine et al., 2002). Future studies should attempt to get third party-
reported, or at least, spouse-reported personality data for further analysis.
Second, the sample was recruited from students’ parents at the university,
causing some restriction of range in parental education and intelligence. This
seemed to cause somewhat lower than expected correlations between
education and intelligence, but was unlikely to cause notable range issues for the
other traits examined.
Third, the collection of traits examined was less than ideal. It would be
preferable to have a broader personality scale (e.g. based on item data from IPIP)
in future studies, as well as measures of attractiveness, BMI/obesity, humor
ability, psychopathology, political & religious views, and other traits known to
show strong relations between mates. Future studies should attempt to collect a
broad set of traits for analysis.
Supplementary materials
Data, R code, code output (R notebook), and figures are available at The R notebook is also available at For the study of
heteroscedasticity, see
References
Aguilar, Á., Tous, J.M. & Andrés Pueyo, A. (1990). Adaptación y estudio psicométrico
del EPQ-R. [Adaptation and psychometric study of EPQ-R.]. Anuario de Psicología
3(46): 101-118.
Aiello, L.M., Barrat, A., Schifanella, R., Cattuto, C., Markines, B. & Menczer, F. (2012).
Friendship prediction and homophily in social media. ACM Transactions on the Web
6(2): 9:1-9:33.
Baker, L.A., Jacobson, K.C., Raine, A., Lozano, D.I. & Bezdjian, S. (2007). Genetic and
environmental bases of childhood antisocial behavior: A multi-informant twin study.
Journal of Abnormal Psychology 116(2): 219-235.
Buss, D. (2019). Evolutionary Psychology: The New Science of the Mind, 6th ed.
Colom, R., Fabregat, A.A. & López, O.G. (2002). Tendencias de emparejamiento
selectivo en inteligencia, dureza de carácter, extraversión e inestabilidad emocional.
Psicothema 14(1): 154-158.
Connelly, B.S. & Ones, D.S. (2010). An other perspective on personality: Meta-analytic
integration of observers’ accuracy and predictive validity. Psychological Bulletin 136(6):
Esteve, A., García-Román, J. & Permanyer, I. (2012). The gender-gap reversal in
education and its effect on union formation: The end of hypergamy? Population and
Development Review 38: 535-546.
Eysenck, H.J. (1995). Genius: Natural History of Creativity, 1st ed. Cambridge Univ. Press.
Eysenck, H.J. & Eysenck, S.B.G. (2011). Eysenck Personality Questionnaire-Revised
[Data set]. American Psychological Association.
Faraone, S.V. & Larsson, H. (2019). Genetics of attention deficit hyperactivity disorder.
Molecular Psychiatry 24: 562-575.
Ferrando, P.J., Chico, E. & Lorenzo, U. (1997). Dimensional analysis of the EPQ-R lie
scale with a Spanish sample: Gender differences and relations to N, E, and P.
Personality and Individual Differences 23: 631-637.
Fisher, M., Cox, A., Bennett, S. & Gavric, D. (2008). Components of self-perceived mate
value. Journal of Social, Evolutionary, and Cultural Psychology 2(4): 156-168.
Gignac, G.E. & Starbuck, C.L. (2019). Exceptional intelligence and easygoingness may
hurt your prospects: Threshold effects for rated mate characteristics. British Journal of
Psychology 110: 151-172.
Gough, H.G. & Domino, G. (1963). The D 48 test as a measure of general ability among
grade school children. Journal of Consulting Psychology 27: 344-349.
Harinam, V. & Henderson, R. (2020, January 16). All the Single Ladies. Quillette.
Harrell, F.E. (2015). Regression Modeling Strategies: With Applications to Linear
Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd ed. Springer.
Harrell, F.E. (2019). rms: Regression Modeling Strategies (5.1-3.1) [Computer
software].
Hedrick, P.W., Smith, D.W. & Stahler, D.R. (2016). Negative-assortative mating for color
in wolves. Evolution 70: 757-766.
Heine, S.J., Lehman, D.R., Peng, K. & Greenholtz, J. (2002). What’s wrong with cross-
cultural comparisons of subjective Likert scales? The reference-group effect. Journal of
Personality and Social Psychology 82: 903-918.
Huber, G.A. & Malhotra, N. (2017). Political homophily in social relationships: Evidence
from online dating behavior. Journal of Politics 79: 269-283.
James, G., Witten, D., Hastie, T. & Tibshirani, R. (eds.) (2013). An Introduction to
Statistical Learning: With Applications in R. Springer.
Janicke, T., Marie-Orleach, L., Aubier, T.G., Perrier, C. & Morrow, E.H. (2019).
Assortative mating in animals and its role for speciation. American Naturalist 194: 865-
Jaramillo, F., Carrillat, F.A. & Locander, W.B. (2005). A meta-analytic comparison of
managerial ratings and self-evaluations. Journal of Personal Selling & Sales
Management 25(4): 315-328.
Luo, S. (2017). Assortative mating and couple similarity: Patterns, mechanisms, and
consequences. Social and Personality Psychology Compass 11(8): e12337.
Luo, S. & Klohnen, E.C. (2005). Assortative mating and marital quality in newlyweds: A
couple-centered approach. Journal of Personality and Social Psychology 88: 304-326.
McPherson, M., Smith-Lovin, L. & Cook, J.M. (2001). Birds of a feather: Homophily in
social networks. Annual Review of Sociology 27: 415-444.
Nikoloulopoulos, A.K. & Moffatt, P.G. (2019). Coupling couples with copulas: Analysis of
assortative matching on risk attitude. Economic Inquiry 57(1): 654-666.
Ponzo, M. & Scoppa, V. (2015). Trading height for education in the marriage market.
American Journal of Human Biology 27: 164-174.
Riemann, R., Angleitner, A. & Strelau, J. (1997). Genetic and environmental influences
on personality: A study of twins reared together using the self- and peer report NEO-FFI
scales. Journal of Personality 65: 449-475.
Rohrer, J.M. (2018). Thinking clearly about correlations and causation: Graphical causal
models for observational data. Advances in Methods and Practices in Psychological
Science 1(1): 27-42.
Stulp, G., Buunk, A.P., Pollet, T.V., Nettle, D. & Verhulst, S. (2013). Are human mating
preferences with respect to height reflected in actual pairings? PLoS One 8(1): e54186.
Stulp, G., Barrett, L., Tropf, F.C. & Mills, M. (2015). Does natural selection favour taller
stature among the tallest people on earth? Proceedings of the Royal Society B:
Biological Sciences 282(1806):
Stulp, G., Simons, M.J.P., Grasman, S. & Pollet, T.V. (2017). Assortative mating for
human height: A meta-analysis. American Journal of Human Biology 29: e22917.
Tuckfield, B. (2019, March 12). Attraction inequality and the dating economy. Quillette.
Vandenberg, S.G. (1972). Assortative mating, or who marries whom? Behavior
Genetics 2: 127-157.
Appendix
TIG domino test
Example items from the TIG domino test. Copied from the Spanish test manual.
Heteroscedasticity testing
We implemented a method based on the popular squared residuals approach
to detect and quantify heteroscedasticity (HS) in model residuals. The novel part
of our method is the use of both raw and rank transformed data for
robustness, and the use of natural splines (restricted cubic splines) to test for
nonlinear HS. Specifically, our approach consists of:
1. Fit a linear model with the predictor and outcome variable.
2. Square the residuals from the fitted model.
3. Create rank transformed variables.
4. Fit linear and natural spline models to predict the transformed residuals
from the predictor. These are fit using the rms package (Harrell, 2019).
5. Derive p values for these comparisons, as well as the effect sizes in
adjusted R². The p values come from the model comparisons, either against the
null (linear model) or the linear model (spline model). These are based on the
likelihood ratio as recommended by Frank Harrell.
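The first steps of this procedure can be sketched as follows. This is a simplified stand-in written with the Python standard library, not the rms-based implementation: it covers only the linear test (no spline model), reports the unadjusted R² as the squared correlation between the predictor and the squared residuals, and omits the likelihood ratio p values. The simulated data, sample size and seed are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_residuals(x, y):
    """Step 1: residuals from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def ranks(values):
    """Step 3: ranks 1..n; ties are broken by position, adequate for continuous data."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out

def hs_r2(x, y, use_ranks=True):
    """Steps 2-4, linear test only: R² for squared residuals predicted from x."""
    squared = [e * e for e in ols_residuals(x, y)]
    if use_ranks:
        return pearson(ranks(x), ranks(squared)) ** 2
    return pearson(x, squared) ** 2

rng = random.Random(3)  # arbitrary seed
n = 2000                # hypothetical sample size
x = [rng.uniform(0, 10) for _ in range(n)]
y_flat = [xi + rng.gauss(0, 1) for xi in x]                # constant error SD
y_hs = [xi + rng.gauss(0, 1) * (1 + 0.4 * xi) for xi in x]  # error SD rises from 1 to 5

r2_flat = hs_r2(x, y_flat)
r2_hs = hs_r2(x, y_hs)
print(f"rank-based R²: no HS = {r2_flat:.3f}, linear HS = {r2_hs:.3f}")
```

Working on ranks makes the statistic insensitive to a few very large squared residuals, which is the robustness motivation given above; passing `use_ranks=False` gives the raw-data variant.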
To see if the approach worked as expected, we simulated 30,000 datasets
at n = 200 with either no HS, linear HS, or nonlinear HS (10,000 for each). This is
the same approach as that for Figure 2 in the main text, except that Figure 2 used
n = 1000 instead of n = 200. The strength of the HS was set to 5. HS was induced
by multiplying the standard normal residuals by the HS strength as well as a factor
based on the X value. Thus the standard deviations of the residuals are up to 5
times larger at the designated points on the X axis. For the linear HS case, the
max value is at x=10, and for the nonlinear HS case, it is at both x=0 and x=10.
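The data-generating step can be illustrated with a short Python sketch. The exact formula is not given in the text, so the details here are assumptions: X is drawn uniformly from 0 to 10, the outcome has a slope of 1 on X, and the residual standard deviation rises from 1 to the HS strength according to a weight that is x/10 for linear HS and |x - 5|/5 for nonlinear HS. The function name `simulate_hs` is hypothetical.

```python
import numpy as np


def simulate_hs(n=200, kind="none", strength=5.0, rng=None):
    """Simulate (x, y) with no, linear, or nonlinear (U-shaped) HS.

    Assumed form: residual SD = 1 + (strength - 1) * weight(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.uniform(0.0, 10.0, size=n)
    if kind == "none":
        weight = np.zeros(n)
    elif kind == "linear":
        weight = x / 10.0                  # largest spread at x = 10
    elif kind == "nonlinear":
        weight = np.abs(x - 5.0) / 5.0     # largest spread at x = 0 and x = 10
    else:
        raise ValueError(f"unknown kind: {kind}")
    sd = 1.0 + (strength - 1.0) * weight
    y = x + rng.standard_normal(n) * sd    # slope of 1 is an arbitrary choice
    return x, y


# Usage example: compare residual spread at the low and high ends of X.
x_sim, y_sim = simulate_hs(n=200_000, kind="linear", rng=np.random.default_rng(1))
resid_sim = y_sim - x_sim
print("SD near x=0: ", resid_sim[x_sim < 0.5].std())
print("SD near x=10:", resid_sim[x_sim > 9.5].std())
```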
For each of the simulated datasets, we fitted the four regressions discussed
above, and saved the p values and adjusted R2 values. We then plotted the
distribution of the p values by simulation scenario, shown in Figure A1.
Figure A1. Distribution of p values from simulation tests.
Quantitatively, we can also examine how many p values are below a certain
threshold to examine the false positive and negative rates. These are shown in
Table A1.
Table A1. Quantitative results of p values from simulations.
Columns: p mean; Fraction p < .05; Fraction p < .01.
Row  Scenario      Test
1    no HS         linear rank
2    no HS         linear raw
3    no HS         spline rank
4    no HS         spline raw
5    linear HS     linear rank
6    linear HS     linear raw
7    linear HS     spline rank
8    linear HS     spline raw
9    nonlinear HS  linear rank
10   nonlinear HS  linear raw
11   nonlinear HS  spline rank
12   nonlinear HS  spline raw
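The three summary columns of Table A1 can be computed from a vector of simulated p values as in the sketch below. The function name `summarize_p` and the dictionary keys are hypothetical, and the example p values are made up for illustration.

```python
import numpy as np


def summarize_p(pvals):
    """Mean p value and the fractions of p values below .05 and .01."""
    p = np.asarray(pvals, dtype=float)
    return {
        "p_mean": float(p.mean()),
        "frac_p_lt_05": float((p < 0.05).mean()),
        "frac_p_lt_01": float((p < 0.01).mean()),
    }


# Usage example with four made-up p values.
print(summarize_p([0.001, 0.02, 0.2, 0.9]))
```

Under a true null the p values are uniformly distributed, so the two fractions should be close to .05 and .01, which is the benchmark used for the no-HS scenario.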
The results show multiple things of interest:
1. For the case with no HS (row 1 in Figure A1), the distribution of p values is
roughly uniform. The table results show the false positive rates are close to the
theoretically expected ones (rows 1-4 in Table A1).
2. For the case with linear HS (row 2 in Figure A1; rows 5-8 in Table A1), the
distribution of p values for linear tests is almost entirely at 0, indicating ~100%
power to detect the HS. For the spline tests, the p values are skewed towards
0, but not overwhelmingly so (12-13%).
3. For the case with nonlinear HS (row 3 in Figure A1; rows 9-12 in Table A1),
the distribution of p values from spline tests is almost entirely at 0 (i.e. ~100%
power), while the linear tests using raw data (but not ranks) are skewed
towards 0, but again not overwhelmingly so (14%).
Thus, to summarize, the false positive rates are close to the expected rates
(i.e., a threshold of p = .05 yields about 5% false positives when there is no
HS). For the true cases of linear and nonlinear HS, the tests have good power,
but sometimes detect the wrong kind of HS, flagging it as linear when it is
nonlinear or vice versa about 13% of the time at this sample size and HS
strength. It is strange that the spline false positive is only seen for the raw
data, but not the rank-transformed data. Further research might clarify this
anomaly.