A Simple Model of Homophily in
Social Networks
Sergio Currarini, University of Leicester
Jesse Matheson, University of Leicester
Fernando Vega Redondo, Bocconi University
Working Paper No. 16/05
Department of Economics
March 20, 2016
Biases in meeting opportunities have been recently shown to play a key role for the emergence
of homophily in social networks (see Currarini, Jackson and Pin 2009). The aim of this paper
is to provide a simple microfoundation of these biases in a model where the size and type-
composition of the meeting pools are shaped by agents’ socialization decisions. In particular,
agents either inbreed (direct search only to similar types) or outbreed (direct search to population
at large). When outbreeding is costly, this is shown to induce stark equilibrium behavior of a
threshold type: agents “inbreed” (i.e. mostly meet their own type) if, and only if, their group is
above a certain size. We show that this threshold equilibrium generates patterns of in-group and
cross-group ties that are consistent with empirical evidence of homophily in two paradigmatic
instances: high school friendships and interethnic marriages.
Keywords: Homophily, social networks, segregation.
JEL Classification: D7, D71, D85, Z13.
Department of Economics, University of Leicester and Università Ca’ Foscari di Venezia. Email:
This author wishes to acknowledge the support of the Ministry of Education and Science of the Russian Federation,
grant No. 14.U04.31.0002, administered through the NES CSDSI.
Department of Economics, University of Leicester. Email:
Bocconi University and IGIER. Email:
1 Introduction
A pervasive feature of social and economic networks is that contacts tend to be more frequent
among similar agents than among dissimilar ones. This pattern, usually referred to as “homophily”,
applies to many types of social interaction, and along many dimensions of similarity.¹ The presence
of homophily has important implications on how information flows along the social network (see, for
example, Golub and Jackson (2011)) and, more generally, on how agents’ characteristics impinge
on social behavior. It is therefore important to understand the generative process of homophilous
social networks, and how agents’ preferences and their meeting opportunities concur in determining
the observed mix of social ties.
The empirical evidence of many social networks shows that homophily is often in excess of
the “baseline” level that would be expected under a uniform random assortment that reflected
groups’ population shares, and that inbreeding (within-group interaction) occurs both in small and
large groups. In a seminal contribution, Currarini, Jackson and Pin (2009)—CJP henceforth—
investigate the extent of such biases in the context of American high school friendships. They show
that preferences which are biased in favor of same-type friendships help explain why members of
large ethnic groups enjoy more popularity—a higher number of friends—than members of small
ethnic groups. However, they find that the observed patterns of inbreeding homophily cannot be
explained by a process in which agents meet purely at random. They conclude, therefore, that
some kind of “meeting bias” must be at work.²
In this paper we study a micro-founded model of search which endogenously generates a meeting
bias, and whose equilibrium predictions are consistent with the observed non-linear relationship
between homophily and population shares common to both U.S. high school friendship nominations
and U.S. marriages. A characterizing, and novel, feature of our model is the role of absolute
group size in shaping agents’ incentives to either direct their search towards in-groups only, or
to open up to interactions with out-groups as well. This marks a stark difference between our
approach and the approach based on the role of population shares, central to all previous studies of
homophily in economics and to Blau (1977)’s structural approach. We will discuss this difference
in some detail in Sections 4 and 5. We test our model’s predictions on the role of group size using
micro data reflecting two different matching scenarios: friendship nominations and marriages. The
empirical results support our model. We now present in some detail the model, and then discuss
our contribution with respect to recent works on the subject.
¹ For an account of the pervasiveness of homophily, see the seminal work of Lazarsfeld and Merton (1954) and,
more recently, Marsden (1987, 1988), Moody (2001), or the survey by McPherson, Smith-Lovin and Cook (2001).
² In particular, random meetings are shown to be inconsistent with the nonlinear relation between an index of
homophily first proposed by Coleman (1958) and groups’ population shares.
The essential features of our theoretical framework can be outlined as follows. Agents derive
positive utility from the number of distinct ties they enjoy. Ties are formed from a fixed number of
meeting draws obtained from an endogenously chosen meeting pool. Agents affect the composition
of their meeting pool by choosing to either inbreed or outbreed. Inbreeding refers to the decision
to restrict search to one’s own group only; outbreeding refers to the decision to extend search to
the whole population. The decision to inbreed or outbreed involves weighing conflicting incentives.
Outbreeding is costly; we believe this reflects cultural, geographical, or linguistic barriers to
accessing other types. Inbreeding limits the size of the search pool and, therefore, the efficacy of search
by affecting the probability of novel draws.³
An agent’s breeding decision depends crucially on the size of her group, to the extent that this
affects the probability of redundancies in search. Specifically, there exists a threshold group size
above which the agent will inbreed and below which the agent will outbreed. We highlight two
paradigmatic scenarios that embody polar assumptions on how agents connect. The first scenario
involves a meeting mechanism where links and payoff flows are one-sided. This represents, for
example, web-based social networks (such as Twitter) where links are directed and information flows
in one direction. This scenario also captures, to some extent at least, the friendship nomination
process on which the National Longitudinal Study of Adolescent Health is based. The second
scenario involves a meeting mechanism where both connections and payoff flows are two-sided;
links require some form of bilateral agreement or coordination. Marriages (mutual consent being
required) are a natural example of this scenario.
We show in Section 4 that the threshold equilibrium, together with some small random noise in
meetings, predicts a qualitative pattern of the Coleman index which is consistent with the hump
shaped pattern found in CJP for friendship nomination and also arising for U.S. marriages (see
Figure 1 in the present paper). In addition, we show that focusing on the role of absolute group
size helps explain differences in the aggregate homophily patterns of small vs. large schools that
were identified in Currarini, Jackson and Pin (2010) but could not be explained in their framework
(in large schools, the degree of homophily is uniformly higher). Using microlevel data on friendship
nominations and marriages, in Section 5 we test other novel predictions of the model. One such
prediction is that, conditional on relative population share and both in the one-sided and two-sided
scenarios, inbreeding is more likely to occur in groups that are large in absolute size than in smaller
groups. Another interesting theoretical prediction for which we find empirical support pertains,
specifically, to the matching performance of small groups. It concerns the following contrast between
one- and two-sided contexts. If matching is one-sided, the matches of any small group will have all
other groups (large or small) represented according to their population shares. Instead, if matching
is two-sided, the prediction is that outbreeders will meet each other with frequencies that reflect
their population shares within the pool of outbreeders. As a result, outbreeding groups
will be over-represented in the matches of other outbreeders relative to their population shares. We
believe that all these findings provide strong empirical support for our model.
³ So, whereas in CJP the focus is on agents’ decision of how intensively (i.e. for how long) to search for social ties,
with meeting probabilities fixed exogenously (and hence outside of agents’ influence), in our case all agents
search with the same intensity, but are able to direct their search and thus affect their meeting probabilities.
The general idea that homophily patterns may stem from selection and assortative matching is
present in many theoretical constructs and has been extensively tested empirically since Kandel’s
(1978) work on adolescent friendships. In Tiebout’s “voting-by-feet” model, agents selectively
structure their social interactions by forming homogeneous clubs along the preference dimension.
The anticipation of future interaction is also at the heart of Baccara and Yariv (2013), where
homophilous peer groups form in connected intervals along the preferences dimension. Selection
may also result from information and opinion seeking, as in Suen’s (2010) mutual admiration clubs,
where similar agents communicate in a sort of self-confirming updating of information. Selection
of agents with similar preferences may also stem from the desire to avoid strategic manipulation of
information, as in Galeotti et al.’s (2013) model of cheap talk in networks. There is also a similarity
between the main feature of our process (the difference in the cost of linking with in-group versus
out-group agents) and the approach taken in Jackson and Rogers’ (2005) islands model of network
formation. However, while in that paper a key role is played by indirect benefits and the focus is
on the emergence of small world architectures, here the focus is on the effect of different group sizes
on homophily patterns in the absence of indirect benefits from connections.
The importance of absolute group size has been put forward before in the attempt to explain
why groups with small relative population shares may end up displaying high homophily. Moody
(2001)’s discussion of U.S. high school friendships contains in fact the main elements of our analysis:
“...(s)ince people have a finite capacity for relationships (van der Poel 1993; Zeggelink 1993), in
a school where there are many minority students, minority students may be able to find their
desired number of friends within the minority friendship pool.” In Section 3 we show that our
model predicts that minority groups members, sampling a limited number of friends, will stick to
their own pool as long as its absolute size does not impose too large inefficiencies on search. This
happens when minorities are large in absolute terms; that is, in schools with large total populations
(see Fig. 2 in the present paper). Similar insights are contained in the discussion of inter-religious
marriages in Fisher (1992), where it is reported how “...resident of small towns risk falling away
from their religious roots, presumably because co-religionists are less likely to be available, while
resident of larger cities are more likely to be enveloped in a religious sub-culture”. The empirical
finding that inbreeding is often used by small minorities to counteract the adverse effect of relative
population share on in-group ties (see McPherson et al., 2001) seems consistent with the results of
our regressions on micro-data, where we find that the marginal effect of releasing the constraint of
absolute size on the tendency to inbreed is larger the smaller is the relative population share of
the group, signalling an overall stronger tendency to inbreed of small minorities when this comes
at little cost in terms of search efficiency.
The remainder of the paper is organized as follows. In Section 2 we describe the model, including
the strategies and payoffs defining the underlying meeting game. In Section 3 we characterize the
equilibrium behaviour of agents, as a function of the size of their respective groups. In Section 4 we
discuss the aggregate empirical patterns of homophily for friendships and marriages, and derive the
main results that link our theory to the observed empirical patterns. In Section 5 we use micro-data
to test the novel predictions of the theory. Finally, in Section 6 we conclude the main body of the paper
with a summary. For the sake of exposition, all proofs are included in Appendix 1 and data
summaries are included in Appendix 2.
2 The model
2.1 Framework
Consider a set $N$ of $n$ agents. The set $N$ is partitioned into $q$ groups, defined by a specific
common trait (ethnic, linguistic, religious, etc.), which we call “type”. Groups are indexed by $l$,
and we denote by $n_l$ the (absolute) size of group $l = 1, 2, ..., q$. Let us also consider a network
defined on the set $N$, where we use the term “match” as synonymous with “link”. Now assume that
each agent $i \in N$ devotes a fixed amount of time to meet other agents in $N$. In this lapse of time he
obtains $\eta > 1$ random draws with replacement. Out of these draws, let $\nu(\eta)$ denote the number
of distinct agents he meets. In the end, not all of the distinct agents $i$ meets turn out to be suitable
partners. We assume that each of them is suitable, in a stochastically independent manner,
with probability $p$ ($0 < p < 1$).
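The search process just described can be simulated directly. The following is a minimal sketch and not part of the paper: the function name `simulate_search` and the parameter values are illustrative assumptions, and the meeting pool is represented simply as the integers 0, ..., pool_size - 1.

```python
import random

def simulate_search(pool_size, eta, p, rng):
    """Simulate one agent's search (hypothetical helper, for illustration only).

    The agent makes `eta` uniform draws with replacement from a pool of
    `pool_size` agents; each distinct agent met is independently a suitable
    partner with probability `p`. Returns (distinct, suitable).
    """
    draws = [rng.randrange(pool_size) for _ in range(eta)]
    distinct = len(set(draws))                      # repeated draws are redundant
    suitable = sum(rng.random() < p for _ in range(distinct))
    return distinct, suitable

rng = random.Random(0)
distinct, suitable = simulate_search(pool_size=50, eta=10, p=0.5, rng=rng)
print(distinct, suitable)
```

Shrinking `pool_size` while holding `eta` fixed makes repeated draws more likely, which is the redundancy effect that drives the inbreeding/outbreeding trade-off.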
In this context, the sole decision every agent must take is how to allocate time between meeting
agents of her own group and agents of the whole population (including her group). The first type of
activity is referred to as “inbreeding”, and the second as “outbreeding”. We assume that, in order
for outbreeding to be feasible, the agents must incur a fixed cost c. This cost can be interpreted
as reflecting some form of investment required to interact with people of different groups (e.g.,
travelling, learning a language, or changing one’s habits).
The inbreeding/outbreeding decisions taken by all agents constitute their strategies in the
matching game. They determine the meeting pool each of them accesses, which in turn shapes
the probability distribution over the number of distinct partners they face, and thus their expected
payoffs.
2.2 Shaping the meeting pool
In general, the meeting pool faced by any given agent is a consequence of her own breeding decision,
as well as that of all others. Denote by $I$ and $O$ the inbreeding and outbreeding decisions,
respectively. Then, in principle, the meeting pool of each agent is a set-valued function $\Theta_i(s)$ of
the profile $s \equiv (s_i)_{i \in N} \in \{I, O\}^N$ specifying the breeding decisions of all agents. The cardinality
of $\Theta_i(s)$, measuring the size of the meeting pool, is denoted by $\theta_i(s)$. Given any profile $s$, the
random variable $\tilde{\nu}(\eta, \theta_i(s))$ specifies the number of distinct partners obtained from $\eta$ uniform and
independent draws with replacement, when the size of the pool is $\theta_i(s)$.
As advanced, we shall distinguish two different scenarios (one-sided and two-sided) concerning
how the meeting pool of an agent is shaped by the strategy profile s. Each scenario is captured by
specific forms for the functions $\theta_i(\cdot)$ and $\tilde{\nu}(\cdot)$ that shape, respectively, the size of the meeting pool
and the meeting opportunities.
(a) One-sided Scenario
The simplest case is given by a meeting scenario that is one-sided, in the sense that the conditions
enjoyed by any given agent exclusively depend on her own choices and her own meeting draws. It
can be used to model situations in which the formation of a tie is fully determined by the initiator
of the tie, while the receiving agent has no control over it. As suggested before, this includes those
empirical setups where friendship is recorded through individual (independent) nominations and
two agents are identified as friends when at least one of them lists the other. One-sidedness is
also a feature displayed by those contexts where connections are established in order to acquire
information in a strictly unilateral endeavour. This happens, for example, in internet browsing
(where inbreeding may mean that an agent only connects to blogs of similar political orientation
or nationality) or in certain social networks (such as those supported by Twitter) where virtual
friends/followers cannot be refused.
To formalize matters, denote by $l(i)$ the index of the group to which agent $i$ belongs, and let
$\theta_i(s)$ denote the size of the meeting pool accessed in this context by that agent, given the strategy
profile $s$. Then we posit:⁴
$$\theta_i(s) = \begin{cases} n_{l(i)} & \text{if } s_i = I \\ n & \text{if } s_i = O, \end{cases} \tag{1}$$
where recall that $n_{l(i)}$ stands for the cardinality of group $l(i)$.
In line with the postulated one-sidedness of the meeting mechanism in this case, the payoff
flows will be assumed to be one-way as well.⁵ By this it is meant that payoffs accrue only to
the agent who actively finds a suitable partner but not in the opposite direction. Thus, ex ante,
the (uncertain) distribution of payoffs is governed by the random variables $\tilde{\nu}(\eta, \theta)$ that give the
number of distinct draws out of $\eta$ tries when the pool size is $\theta$.
⁴ For notational simplicity, agent $i$ is included in the pool, even though she obviously cannot meet herself. The
same simplification is applied below to the two-sided scenario.
(b) Two-sided Scenario
In contrast to the previous case, a two-sided context is one where the formation of a tie requires
the consent of both parties involved, in the sense that both of them have to choose to be part of the
same meeting pool. The case of marriages falls clearly into this class of situations. In some cases,
the outbreeding choice may be implemented by moving to some fixed location (“downtown”) where
individuals from different groups meet, or by switching to a common lingua franca that is different
from the group’s native language. Formally, as before, meeting pools are determined by agents’
inbreeding/outbreeding decisions. Now, however, outbreeding agents only access (besides those of
their own group)⁶ the agents of other groups that have themselves chosen to outbreed. This gives
rise to an alternative function $\theta_i(s)$ specifying the size of the meeting pool. Let $n_l^I(s)$ and $n_l^O(s)$
denote the number of agents of type $l$ that choose the strategy $I$ and $O$ in $s$, respectively. Then we
posit:
$$\theta_i(s) = \begin{cases} n_{l(i)}^I(s) & \text{if } s_i = I \\ \sum_{l=1}^{q} n_l^O(s) & \text{if } s_i = O. \end{cases} \tag{2}$$
When matching is two-sided, it is natural to assume that the payoff flows are two-way, i.e. a
link established by two suitable partners generates positive payoffs to both of them. The random
variable used to account for this must therefore be different from the one used for the one-sided
scenario. The present one, denoted by $\tilde{\nu}(\eta, \theta)$, gives the random number of distinct meetings
obtained in a pool of $\theta$ agents when: (a) each agent makes $\eta$ independent draws with replacement;
(b) a meeting is said to occur between two agents, $i$ and $j$, when either a draw by $i$ selects $j$ or vice
versa.
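The difference between the two meeting-pool rules can be made concrete with a small sketch. It is illustrative only: the function name `pool_sizes` and the example numbers are assumptions, and the code restricts attention to group-symmetric profiles (each group acts as a block), so that an outbreeder's two-sided pool is simply the total size of all outbreeding groups.

```python
def pool_sizes(group_sizes, profile, two_sided):
    """Meeting-pool size faced by a member of each group.

    group_sizes: list of group sizes n_l.
    profile: list of "I"/"O", one common choice per group (group symmetry
             is assumed here for simplicity).
    two_sided: if True, outbreeders only reach groups that also outbreed.
    """
    n = sum(group_sizes)
    outbreeders = sum(size for size, s in zip(group_sizes, profile) if s == "O")
    pools = []
    for size, s in zip(group_sizes, profile):
        if s == "I":
            pools.append(size)         # own group only, in both scenarios
        elif two_sided:
            pools.append(outbreeders)  # only agents who also chose to outbreed
        else:
            pools.append(n)            # whole population
    return pools

sizes = [500, 300, 40, 10]
profile = ["I", "I", "O", "O"]
one_sided = pool_sizes(sizes, profile, two_sided=False)
two_sided = pool_sizes(sizes, profile, two_sided=True)
print(one_sided, two_sided)
```

In the two-sided case a group that outbreeds alone gains nothing: its pool is just its own members, yet it still pays the cost of outbreeding.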
In both one-sided and two-sided scenarios, a larger pool obviously brings about richer meeting
possibilities – i.e. the range of distinct partners an agent can meet is wider. It is this simple feature
that introduces the basic tradeoff between the inbreeding and outbreeding decisions that is at the
core of our model.⁷ Mathematically, such richer possibilities are captured by the fact that the
two families of random variables $\{\tilde{\nu}(\eta, \theta)\}_{\theta \in \mathbb{N}}$, one for each scenario, can be suitably ranked when
parametrized by the size of the meeting pool $\theta$. Indeed, we will show that those random variables
are strongly ordered as follows:
• In the one-sided scenario, larger meeting pools yield probability distributions over the number
of distinct draws that dominate those of smaller pools in the First-Order Stochastic Dominance sense.
• In the two-sided scenario, larger meeting pools yield an expected number of distinct draws that
is higher than for smaller ones.
⁵ The distinction between one-sided and two-sided link formation and the (conceptually different) contrast between
one-way and two-way flows is discussed at length in Bala and Goyal (2000), one of the earliest papers of the network
formation literature in economics.
⁶ Given that our analysis focuses on symmetric equilibria (see Subsection 2.4), all agents of any given group must
belong to the same matching pool. This, however, could be easily generalized since, as the population gets large,
the relative size of a group whose size remains bounded becomes infinitesimal. Therefore, the situation faced by
outbreeders would be essentially the same whether all individuals of their group are outbreeders or not and hence
our main conclusions would still hold in the absence of group-based symmetry.
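The one-sided ordering can be checked numerically for particular pool sizes. The sketch below is not the paper's proof: it computes the exact distribution of the number of distinct draws by a simple recursion over draws and then compares upper tails. The names `distinct_pmf` and `dominates_fosd` are hypothetical, and the agent is counted as part of her own pool, following the notational simplification of footnote 4.

```python
from fractions import Fraction

def distinct_pmf(eta, theta):
    """Exact pmf of the number of distinct agents met in `eta` uniform draws
    with replacement from a pool of size `theta` (list index = number met)."""
    pmf = [Fraction(1)] + [Fraction(0)] * eta   # before any draw: 0 distinct
    for _ in range(eta):
        nxt = [Fraction(0)] * (eta + 1)
        for k, prob in enumerate(pmf):
            if prob == 0:
                continue
            nxt[k] += prob * Fraction(k, theta)                  # redundant draw
            if k < eta:
                nxt[k + 1] += prob * Fraction(theta - k, theta)  # novel draw
        pmf = nxt
    return pmf

def dominates_fosd(pmf_a, pmf_b):
    """True if pmf_a first-order stochastically dominates pmf_b,
    i.e. every upper-tail probability of a is at least that of b."""
    tail_a = tail_b = Fraction(0)
    for k in range(len(pmf_a) - 1, -1, -1):
        tail_a += pmf_a[k]
        tail_b += pmf_b[k]
        if tail_a < tail_b:
            return False
    return True

small_pool = distinct_pmf(eta=5, theta=8)
large_pool = distinct_pmf(eta=5, theta=20)
print(dominates_fosd(large_pool, small_pool))
```

Exact rational arithmetic is used so that the tail comparison is not affected by floating-point rounding.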
In general, we can conceive either the one- or the two-sided scenarios as a more appropriate
modeling choice depending on the characteristics of the situation. (For example, in Section 5 we
suggest that the first approach is more in line with our friendship data while the second is more
consistent with our data on marriages.) However, both modeling alternatives induce, under suitable
assumptions on preferences,⁸ a similarly sharp trade-off between the inbreeding and outbreeding
options – cf. Theorems 1 and 2.
2.3 Preferences
Now we describe agents’ preferences over their meeting outcomes. Denote the number of distinct
meetings enjoyed by any given agent $i$ by $\nu_i$. Given that each distinct partner happens to be
suitable with probability $p$, the induced number of suitable partners, denoted by $y_i$, is given by
the Binomial distribution $\mathrm{Bin}(\nu_i, p)$. We assume that agents evaluate that (uncertain) outcome
according to some common von Neumann-Morgenstern (vNM) utility $U : \mathbb{N} \cup \{0\} \to \mathbb{R}$, where we
normalize $U(0) = 0$ and posit that
$$U(y_i + 1) \ge U(y_i) \quad \text{for all } y_i \in \mathbb{N} \tag{3}$$
$$U(1) > U(0). \tag{4}$$
⁷ Other specifications of the matching mechanism that display this trade-off yield conclusions that are qualitatively
the same as in our postulated one- and two-sided meeting scenarios. By way of illustration, let us sketch two examples.
In the first one, agents enjoy partner variety, which is in line with standard assumptions on preferences made in
economic theory. If we then make the natural assumption that such a variety grows in expectation as the pool of
alternative partners expands, the desired effect of pool size follows. As a second possibility, suppose that new partners
are searched through existing partners (i.e. as friends of friends) in a random social network whose size depends (as
in our model) on the breeding decisions taken. Then, the effectiveness of such search depends on network clustering,
which in turn is well known to decrease with size in random networks (see e.g. Vega-Redondo (1997)).
⁸ Naturally, as we shall see, assumptions on preferences must be stronger in the second case since the corresponding
dominance criterion that is used is weaker.
Thus the assumption is that the utility does not fall as the number of suitable partners grows,
with a strict improvement only required when passing from a situation with no suitable partner to
one with some such partner. In general, therefore, the theoretical framework may accommodate
different applications, such as friendships or marriages. For example, in the former case, it would
be natural to posit that $U$ is strictly increasing throughout, while in the second case it may be
postulated to level off at one.⁹
Next, we can define the expected utility $V(\nu_i)$ induced by any given number $\nu_i$ of distinct
partners of agent $i$ as follows:
$$V(\nu_i) = \sum_{y_i=0}^{\nu_i} \binom{\nu_i}{y_i} p^{y_i} (1-p)^{\nu_i - y_i} U(y_i). \tag{5}$$
Finally, we take into account the fact that, given the pool size $\theta_i$ faced by an agent $i$, the number
$\nu_i$ of her distinct partners is uncertain from an ex ante viewpoint. It is determined by the random
variable $\tilde{\nu}(\eta, \theta_i)$, as particularized to the scenario under consideration (one- or two-sided). We thus
need to integrate (5) with the distribution over the number of distinct partners induced by pool
size $\theta_i$. The resulting expected utility $W(\theta_i)$ of a typical agent $i$ is defined as follows:¹⁰
$$W(\theta_i) \equiv E_{\tilde{\nu}(\theta_i)} V(\nu_i) = \sum_{\nu_i} P_{\theta_i}(\nu_i) V(\nu_i),$$
where $P_{\theta_i}(\cdot)$ denotes the probability distribution associated with the random variable $\tilde{\nu}(\theta_i)$ that
specifies the number of distinct meetings in the scenario under consideration.
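The two layers of expectation, first over suitability and then over the number of distinct meetings, can be sketched as follows. This is an illustration under stated assumptions: the distribution over distinct meetings is passed in as a plain list `pmf` (index = number of distinct meetings) rather than derived from a particular scenario, and the two utility functions are examples in the spirit of the friendship and marriage cases mentioned above.

```python
from math import comb

def V(nu, p, U):
    """Expected utility from `nu` distinct meetings, each independently
    suitable with probability `p` (binomial expectation of U)."""
    return sum(comb(nu, y) * p**y * (1 - p)**(nu - y) * U(y)
               for y in range(nu + 1))

def W(pmf, p, U):
    """Expected utility when the number of distinct meetings is random,
    with pmf[nu] the probability of exactly `nu` distinct meetings."""
    return sum(prob * V(nu, p, U) for nu, prob in enumerate(pmf))

def linear(y):        # example: every suitable partner adds the same utility
    return float(y)

def at_most_one(y):   # example: utility levels off at one suitable partner
    return float(min(y, 1))

# Distribution of distinct meetings for 3 draws from a pool of 2 agents:
# one distinct agent with probability 1/4, two with probability 3/4.
pmf = [0.0, 0.25, 0.75, 0.0]
print(W(pmf, 0.5, linear), W(pmf, 0.5, at_most_one))
```

With linear utility the inner expectation reduces to $p \nu_i$, so only the mean number of distinct meetings matters; with a utility that levels off, the whole distribution matters.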
2.4 The breeding game
We are now in a position to define the “breeding game.” This requires specifying both the strategy
sets and the payoff functions.
First, the strategy space of every player $i$ is simply identified with the set $\{I, O\}$ consisting of
the two possible breeding decisions she can take: inbreed and outbreed, respectively.
Second, the payoff of the agent is defined as follows:
$$\pi_i(s) = E_{\tilde{\nu}(\theta_i(s))} V(\nu_i) - c(s_i) = \sum_{\nu_i} P_{\theta_i(s)}(\nu_i) \sum_{y_i=0}^{\nu_i} \binom{\nu_i}{y_i} p^{y_i} (1-p)^{\nu_i - y_i} U(y_i) - c(s_i),$$
where (as explained in Subsection 2.1) $c(s_i) = c > 0$ if $s_i = O$ and $c(s_i) = 0$ if $s_i = I$, and the
notation needs to be particularized to the scenario being considered (one- or two-sided).
⁹ Note that our model assumes that the type composition of an agent’s meetings is inessential for utility (which only
depends on the total number of meetings). While this is meant to isolate the effect of size on meeting possibilities,
homophilous preferences could be considered in the model without losing the key mechanisms behind our results.
¹⁰ For notational simplicity, we henceforth dispense with the parameter $\eta$, since it will remain fixed throughout our
analysis.
Our equilibrium analysis will focus throughout on profiles $s$ that are group-symmetric,¹¹ i.e.
where $s_i = s_j$ whenever $l(i) = l(j)$. Within this class, the population behavior can be fully described
by the $q$-tuple $\gamma \equiv (\gamma_1, \gamma_2, ..., \gamma_q)$ that specifies the common choice $\gamma_l \in \{I, O\}$ for every agent in
each of the groups $l = 1, 2, ..., q$. Then, denoting by $\{\theta_l(\gamma)\}_{l=1,2,...,q}$ the induced meeting pools, the
payoff of any typical agent $i$ of group $l$ is given by:
$$\pi_l(\gamma) = E_{\tilde{\nu}(\theta_l(\gamma))} V(\nu_i) - c(\gamma_l) = \sum_{\nu_i} P_{\theta_l(\gamma)}(\nu_i) \sum_{y_i=0}^{\nu_i} \binom{\nu_i}{y_i} p^{y_i} (1-p)^{\nu_i - y_i} U(y_i) - c(\gamma_l).$$
3 Equilibrium
Here we characterize the group-symmetric Nash equilibria of the game. The key feature to highlight
is that the behavior of a group at equilibrium is fully dependent on whether the group is large or
small, relative to a certain threshold defined by the equilibrium. More specifically, we find that all
groups whose size is smaller than such a threshold outbreed, while larger groups inbreed. This is
the equilibrium pattern arising both in the one- and in the two-sided scenarios, but a significant
difference exists between them. In the one-sided case, we show that the equilibrium threshold is
unique. Instead, in the two-sided context, there is generally a range of possible thresholds that can
be supported at equilibrium, a reflection of the unavoidable “coordination problem” that agents
face in this case.
We start by characterizing the equilibrium for the one-sided scenario.
Theorem 1 (Threshold Equilibrium – one-sided scenario) Consider the one-sided scenario
and assume that the outbreeding cost satisfies $c < V(\eta)$. Then, there exists some $\hat{n}$ and a specific
(finite) $\tau \ge 2$ such that if $n \ge \hat{n}$, the strategy profile $\gamma = (\gamma_l)_{l=1}^{q}$ satisfying:
$$\gamma_l = I \iff n_l \ge \tau \quad (l = 1, ..., q) \tag{7}$$
defines the unique group-symmetric Nash equilibrium of the breeding game.
¹¹ The restriction to symmetric equilibria is natural in our case, where all agents in every group are homogeneous
(i.e. ex ante identical). Homogeneity and the nature of the strategic situation imply that asymmetric equilibria are
either non-generic (in the one-sided scenario) or unstable (in the two-sided scenario).
The previous result builds upon the fact that the smaller is a group, the higher the risk faced
by its members that, if they restrict to their own kind alone, their meetings may be wasteful (i.e.
redundant). This then leads to the conclusion that optimal behavior should be of threshold type in
group size. Indeed, a key step in the proof of Theorem 1 is showing that the higher redundancy risk
associated with smaller inbreeding groups is suitably captured by the strong criterion of First-Order
Stochastic Dominance (FOSD). More precisely, we shall prove the following auxiliary lemma.
Lemma 1 For any given $\theta, \theta'$, if $\theta \ge \theta'$ the random variable $\tilde{\nu}(\theta)$ dominates $\tilde{\nu}(\theta')$ in the FOSD
sense.
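To see how a threshold in group size arises in the one-sided scenario, the comparison between inbreeding and outbreeding can be carried out numerically. The sketch below rests on simplifying assumptions that are ours: utility is linear, $U(y) = y$, so that expected utility is $p$ times the expected number of distinct draws; the agent is counted in her own pool (footnote 4); indifferent agents inbreed; and the large-population condition of Theorem 1 is not checked. The function names and parameter values are illustrative.

```python
def W_linear(theta, eta, p):
    """Expected utility from a one-sided pool of size `theta` when U(y) = y:
    p times the expected number of distinct agents in `eta` draws."""
    return p * theta * (1.0 - (1.0 - 1.0 / theta) ** eta)

def inbreeding_threshold(n, eta, p, c):
    """Smallest group size whose members weakly prefer inbreeding (pool = own
    group, no cost) to outbreeding (pool = whole population, cost c)."""
    outbreed_payoff = W_linear(n, eta, p) - c
    for size in range(1, n + 1):
        if W_linear(size, eta, p) >= outbreed_payoff:
            return size
    return None  # not reached when c > 0, since size = n always qualifies

tau = inbreeding_threshold(n=1000, eta=10, p=0.5, c=0.5)
tau_cheap = inbreeding_threshold(n=1000, eta=10, p=0.5, c=0.1)
print(tau, tau_cheap)
```

Because `W_linear` is increasing in pool size, every group at least as large as the returned size also prefers inbreeding, and a lower outbreeding cost pushes the threshold up, in line with Corollary 1.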
One may worry, however, that the threshold $\tau$ established by Theorem 1 may be so low that
the maximum group size leading to outbreeding is very small. In general, of course, this must
depend on the cost $c$ of outbreeding. But it is straightforward to see that if the outbreeding cost $c$
is low enough, the equilibrium threshold can be made arbitrarily large. For completeness, we state
this conclusion in the following corollary:
Corollary 1 Under the assumptions made in Theorem 1, for any $\tau_0$ there is some $\hat{n}$ and $\bar{c} > 0$
such that if $n \ge \hat{n}$ and $c < \bar{c}$ then the equilibrium threshold $\tau \ge \tau_0$.
An idea analogous to that underlying the one-sided scenario applies to the two-sided case, but with
an important caveat already advanced: the benefit of outbreeding now depends on the endogenous
size of the outbreeders’ pool. Hence, in contrast with the one-sided scenario, the game now displays
equilibrium multiplicity. To see this, consider, for example, the situation where no group outbreeds,
independently of its size. Such a situation obviously defines an equilibrium. For, no matter how
small the outbreeding (positive) cost might be, no individual can find it optimal to pay it if the pool
of those who outbreed consists only of agents of their own type. Despite the possibility of
such a “deadlock,” the result below establishes that as long as (a) all the small groups (as identified
for the one-sided scenario) command in total a non-negligible share of the whole population, and
(b) the utility $U$ over the number of distinct meetings is linear, the existence of a positive-threshold
equilibrium holds as well in a two-sided scenario.
Theorem 2 (Threshold Equilibrium – two-sided scenario) Consider the two-sided scenario,
and assume that the utility function $U$ is linear and the outbreeding cost satisfies $c < V(\eta)$. Then,
every group-symmetric equilibrium $\tilde{\gamma} = (\tilde{\gamma}_l)_{l=1}^{q}$ is of the threshold type, i.e. there exists a $\tilde{\tau}$ such
that
$$\tilde{\gamma}_l = I \iff n_l \ge \tilde{\tau} \quad (l = 1, ..., q).$$
Moreover, given any $\alpha > 0$ and the threshold $\tau$ given in (7), there exists some $\hat{n}$ such that if $n \ge \hat{n}$
and $\sum_{l : n_l < \tau} n_l > \alpha n$, a threshold equilibrium exists with $\tilde{\tau} = \tau$.
As for the one-sided scenario, the present result builds upon the fact that, under the (stronger)
assumptions it contemplates, increasing group size introduces a well-defined ranking on the (stochas-
tic) prospects faced by the corresponding agents. We shall use, specifically, the following lemma.
Lemma 2 For any given $\theta, \theta'$, if $\theta \ge \theta'$ the expected values of the corresponding random variables,
$\tilde{\nu}(\theta)$ and $\tilde{\nu}(\theta')$, satisfy $E[\tilde{\nu}(\theta)] \ge E[\tilde{\nu}(\theta')]$.
In contrast with Lemma 1, the previous Lemma 2 weakens the criterion used to rank the
meeting distributions.¹² This weakening is motivated by the fact that, when matching is two-sided,
the size of the pool affects the (random) number of distinct meetings in two different ways. First, a
larger pool renders the search for new partners more effective by reducing the likelihood of wasteful
redundancies. This is just as in the one-sided model. There is, however, a second dimension of two-
sided matching that works in the opposite direction: as the matching pool grows, the probability
of being found by any other given agent decreases. These two conflicting considerations make it
difficult to analyse the overall effect of a varying pool size on expected utility unless preferences are
suitably restricted. A natural such restriction is to posit that agents’ risk aversion is limited – or,
as contemplated in Theorem 2, that agents are risk neutral and their utility function $U$ is linear.13
Risk neutrality is admittedly a strong assumption, and may fail to capture situations where agents
are willing to trade off a larger expected number of matches for a more “stable” distribution. With
strongly risk averse agents, for instance, we could not rule out the possibility that members of very
small groups prefer to inbreed in order to avoid the risk of very few matches due to the difficulty
of being found in larger pools, while members of larger groups may decide to outbreed. We refer,
however, to our discussion in footnote 13, which suggests that such a possibility should not occur.
To sum up, our analysis of the one- and two-sided scenarios shows qualitatively similar threshold
behavior. Nevertheless, as emphasized, a key difference between the two cases is that the latter
admits equilibrium multiplicity and, consequently, opens up the possibility of acutely inefficient
equilibria embodying miscoordination. Intuitively, it is quite clear that all equilibria associated
with thresholds $\tilde\tau < \tau$ embody a certain extent of coordination failure. That such a failure is
indeed a possibility is well illustrated by the fact that, as explained above, there always exists the
extreme equilibrium with full inbreeding, induced by a threshold $\tilde\tau = 0$. The entailed coordination
problem is outside the scope of this paper, so we choose to abstract from it by assuming throughout
that the highest threshold $\hat\tau$ consistent with equilibrium is played in the two-sided scenario. As
stated in Theorem 2, if the population is large enough, we have $\hat\tau \geq \tau$.
12 Note, of course, that when a distribution dominates another one in the FOSD sense, it also yields a higher
expected value.
13 We have investigated the possibility that a ranking based on FOSD, such as the one postulated in Lemma 1 for
one-sided matching, also applies to the two-sided model. While we could not obtain a general analytical result, we were
able to show using numerical analysis that the cumulative distribution of “passive” matches (those through which an
agent is found by other agents in the pool) decreases in the pool size $\theta$. Given that the distribution of total matches
is the convolution of passive and active matches, using results on the preservation of FOSD in convolutions (see
Lemma 2.1 in Aubrun and Nechita (2009)), we conclude that numerical simulations suggest that the stronger result
of Lemma 1 also applies to the two-sided model, and the stronger assumption of risk neutrality could be dispensed
with in Theorem 2.
4 Equilibrium implications for homophily
Empirical evidence on the patterns of homophily has traditionally focused on its variation across
groups making up different shares of a total population, or relative group size. This makes
sense as population shares are a benchmark measure of expected homophily when social contacts
are made at random. A departure of homophily from population share signals some form of bias
with respect to uniform assortment. So, although the novelty of the present model lies in the role
that absolute group size plays in determining incentives to outbreed and, therefore, the individual
and aggregate patterns of homophily, in this section we focus on some documented stylized facts
regarding the relationship between relative group size and homophily. These facts have been
interpreted in CJP’s (2009) analysis of friendships as evidence of substantial meeting biases at work,
and motivate the present exercise to micro-found such biases. We will show that, at the aggregate
level, the mechanism proposed in this paper is consistent with the observed pattern of homophily
both in friendships and in marriages. In Section 5 we shall undertake a complementary
micro-founded analysis of the problem and show that, also at the level of individual strategies, our
data on friendship nominations and marriages are consistent with our model.
We start our discussion in this section by defining the Coleman index, a measure to quantify
the phenomenon of homophily. We highlight some stylized facts on the Coleman index that are
observed in our data reflecting friendship nominations and marriages. Our analysis of homophily
focuses on race, a significant characteristic along which distinct groups can be suitably defined. As
we will discuss in greater detail in Section 5, friendship nominations plausibly reflect a one-sided
matching scenario, while marriages fit well with a two-sided scenario. A detailed discussion of each
of these data sets is provided in Section 5.14
Measuring homophily
Recall from Subsection 2.1 the basic framework: there is a set $N$ consisting of $n$ agents, who
are partitioned into $q$ groups (or “types”) indexed by $l$ and with respective cardinality $n_l$. We now
denote by $w_l$ the relative population share of group $l$: $w_l \equiv n_l / n$. The measures of homophily we are
about to discuss aim to quantify to what degree the type-distribution of matches is biased in favor
of same-type matches. To this end, we denote by $m_{ll'}$ the number of matches between agents of
type $l$ and agents of type $l'$, and by $m_l \equiv \sum_{l'=1}^{q} m_{ll'}$ the number of total matches of agents of type $l$.
A basic measure can be obtained by considering the ratio $m_{ll'} / m_l$, expressing the representation
of type $l'$ matches in the total matches of group $l$. The particular case where $l' = l$ gives rise to
what is called the homophily index of group $l$, $H_l \equiv m_{ll} / m_l$. This index is to be compared with the
expected proportion of same-type matches that would result if matches resulted from a uniform
random assortment process. Such a comparison is simply captured by what we shall call excess
homophily, which is defined as the difference $H_l - w_l$ for each group $l$.15
14 Further details on data and empirical procedures are also available in Appendix 2.
When it comes to comparing the homophily of different groups, the excess homophily index
may provide a distorted picture. Groups with a very large population share $w_l$ will never experience
large excess homophily, as the maximal potential value of $H_l - w_l$, namely $1 - w_l$, is small. The
index proposed by Coleman (1958) addresses the problem by normalizing the excess homophily of
group $l$ by its maximal value $1 - w_l$. This gives rise to what we shall call the Coleman (Homophily) Index,
which is defined as follows:
$$C_l \equiv \frac{H_l - w_l}{1 - w_l}.$$
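As a purely illustrative sketch (not taken from the paper's data or code), the three measures just described can be computed from a table of match counts. The function name, the match matrix and the group sizes below are hypothetical; the Coleman index is read here as excess homophily $H_l - w_l$ divided by its maximal value $1 - w_l$.

```python
def coleman_indices(matches, group_sizes):
    """Return one (H_l, excess homophily, Coleman index) tuple per group.

    matches[l][k] is the number of matches of group-l agents with group-k
    agents; group_sizes[l] is n_l. Both inputs are made up for illustration.
    """
    n = sum(group_sizes)
    results = []
    for l, row in enumerate(matches):
        m_l = sum(row)                  # total matches of group l
        w_l = group_sizes[l] / n        # relative population share
        h_l = row[l] / m_l              # homophily index H_l
        excess = h_l - w_l              # excess homophily H_l - w_l
        coleman = excess / (1.0 - w_l)  # normalised by its maximum 1 - w_l
        results.append((h_l, excess, coleman))
    return results


# Hypothetical population: a large group (80 agents) and a small one (20 agents).
example = coleman_indices([[90, 10], [10, 10]], [80, 20])
for h_l, excess, coleman in example:
    print(round(h_l, 3), round(excess, 3), round(coleman, 3))
```

In this made-up example the large group has the smaller excess homophily (about 0.1 against 0.3) but the larger Coleman index (about 0.5 against 0.375), which is the distortion the normalization is meant to correct.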
Empirical patterns for the Coleman index
Much attention has been devoted to the relationship between a group’s tendency to inbreed and
its relative population share. In Figure 1 we report plots of the Coleman homophily index against
relative population shares for U.S. high school friendships and U.S. marriages. Each observation
corresponds to a particular racial group in a corresponding population. The left panel is similar to
CJP: each dot refers to friendships for a specific ethnic group in a specific school. The right panel is
new, and each dot refers to marriages for a specific ethnic group in a specific city-year combination
(please refer to Section 5 for a detailed description of the datasets).
In both plots, high values of the index are found consistently for groups that cover approximately
half of the population.16 The non-linear, hump-shaped relationship between the Coleman index
and relative population share was used by CJP to highlight the role of meeting biases in the
process of friendship formation. In particular, positive levels of the index for all groups, and large
positive values for middle-sized groups, led to CJP’s conclusion that some bias must be at work
in the meeting process. Indeed, were such biases absent (so that agents would meet uniformly at
random), their model would imply that small groups should display negative values of the index,
while for groups that comprise half of the population the index should approach zero.
15 A positive difference between the index $H_l$ and the population share of group $l$ is usually referred to as “inbreeding
homophily” of group $l$. We do not use this terminology here in order to avoid confusion with what we refer to as the
“inbreeding” choice of agents in our model.
16 The main difference between the right and the left panels of Figure 1 is that in the right one for marriages the
regressed values of the Cq at zero and one are significantly different from zero. The intercept of the Cq locus was
used in Franz, Marsili and Pin (2008) to measure the bias in the meeting process.
Figure 1: Coleman homophily index: Friendship nominations (left) and marriages (right). [Both panels plot the Coleman index (vertical axis) against relative population share (horizontal axis).]
Matching the model to empirical patterns
The main qualitative features of Figure 1 can be summarised as follows (CI denotes the Coleman index):
1. When the population share is small, the CI is small and positive. As population shares
approach 0, friendships have, on average, a null CI and marriages have a strictly positive CI.
2. The CI is first increasing with population share.
3. The CI approaches 1 for groups with population shares close to half.
4. The CI decreases for groups with large population shares, becoming very small (and negative
for friendships) as population share approaches 1.
We now match each of these features to the model, distinguishing between the one-sided and the
two-sided scenarios.
First consider the two-sided scenario. The next proposition shows that, in this context, the
expected CI’s of outbreeding groups satisfy the following two properties: (a) they are strictly
positive and bounded away from zero; (b) they are increasing in the groups’ population shares.
Note that both of these conclusions are in line with the evidence for marriages depicted in the
right-hand panel of Figure 1, where the CI of very small groups is strictly positive and increasing.
To state the result formally, let us denote by $\tilde C_l$ the ex-ante random variable that determines the
CI of a group $l$ and by $E[\tilde C_l]$ its expected value.
Proposition 1 Consider the two-sided scenario with $\hat\tau$ denoting the maximum equilibrium threshold
and $n > 2q\hat\tau$. Let $l$ be any given outbreeding group. Then, $E[\tilde C_l]$ is strictly positive and bounded
away from zero, uniformly in the population size $n$. Furthermore, if $l$ and $l'$ are two outbreeding
groups with $n_l < n_{l'}$, then $E[\tilde C_l] < E[\tilde C_{l'}]$.
The one-sided scenario requires additional structure. For, in this context, the Expected Coleman
Index (the term $E[\tilde C_l]$, referred to as ECI henceforth in non-formal discussion) predicted at
equilibrium for an outbreeding group approaches zero when the overall population grows large, as
outbreeders randomly draw from the population at large. We next show, however, that by enriching
the model with a small noise term (reflecting an element of pure randomness which affects agents’
realized meetings) the increasing pattern that in Figure 1 applies to groups with small population
shares (and hence outbreeding) also characterizes the one-sided scenario. Specifically, we posit:
(F) Independently of their breeding choice, all agents obtain a certain number $r_I > 1$ of draws
from their own type as well as some number $r_O > 1$ from the population at large.
Since (F) is conceived as a “perturbation,” the numbers $r_I$ and $r_O$ are to be thought of as small. The
best way to think of these noise terms is to imagine that not all realized meetings are under the
full control of agents. In particular, even an inbreeder may end up meeting people from outside
her group, possibly due to chance or to social or institutional constraints. And, similarly, we also
allow for the possibility that outbreeders direct a small part of their search within their own group
only, be it for cultural, social, geographical or familial constraints. To repeat, however, these forces
are thought of as small.
The implications of such noise effects on the ECI of small outbreeding groups in the one-sided
scenario are the object of the next proposition. First, it asserts that small outbreeding groups
display a small ECI if frictions are small, in particular if $r_I$ is small relative to $\eta$. Second, it
indicates that the ECI is increasing in population shares for large enough populations.
Proposition 2 Consider the one-sided scenario under (F). There exists some $\hat n$ such that if $n \geq \hat n$,
the following applies:
(i) Let lbe an outbreeding group lof size nl. Then E[˜
Cl]is bounded above by rI
(ii) Let $l, l'$ be two outbreeding groups with $n_l < n_{l'}$. Then, $E[\tilde C_l] < E[\tilde C_{l'}]$.
Our next result concerns groups of intermediate relative population share, which are inbreeding
for large enough total population size. For these groups, as we next state formally, the ECI is
arbitrarily high if meeting frictions are small, in both the one-sided and two-sided scenarios.
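Proposition 3 Consider either the one- or the two-sided scenario under (F). Given any $\varepsilon > 0$,
there exist some positive $\delta_1$, $\delta_2$, $\delta_3$, and $\hat n$ such that if $n \geq \hat n$ and $r_O / \eta \leq \delta_3$ then any group $l$ with
relative population share $\delta_1 > w_l > \delta_2$ has $E[\tilde C_l] \geq 1 - \varepsilon$.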
Finally, the next two results complete the present analysis by establishing how the homophily
index changes with group size among relatively large (inbreeding) groups. First, Proposition 4
states that, among groups that inbreed and have a non-negligible relative population share, the
expected Coleman index decreases as size grows. Second, Proposition 5 indicates that as a group
approaches a situation of almost complete dominance (i.e. a fraction of the whole population that
is close to one), its Coleman index falls to the point of becoming negative.
Proposition 4 Consider either the one- or the two-sided scenario under (F). Let $l$ and $l'$ be two
groups whose relative population shares are bounded away from 0 and 1 (i.e. there exists some
$\vartheta > 0$ such that $1 - \vartheta \geq w_{l'} > w_l \geq \vartheta$). Then, for any $\varpi > 0$, there exists some $\hat n$ such that if
$n \geq \hat n$ and $w_{l'} - w_l \geq \varpi$, $E[\tilde C_l] > E[\tilde C_{l'}]$.
Proposition 5 Consider either the one- or the two-sided scenario under (F). There exist some $\hat n$
and $\delta_1 > \delta_2 > 0$ such that if $n \geq \hat n$, then any group $l$ with relative population share
$1 - \delta_2 \geq w_l \geq 1 - \delta_1$ has $E[\tilde C_l] < 0$.
We end this section by considering a remarkable implication of our threshold equilibrium for
the patterns of the Coleman Index in small vs. large populations. This part is motivated by
the observation, made by Currarini, Jackson and Pin (2010), that in the AddHealth dataset on
high school friendships, school size (in terms of total number of students) significantly affects the
homophilous (ethnic) bias in student friendships—see Figure 2. They report, in particular, that
larger schools (those with more than 1000 students) display larger Coleman indices than smaller
schools, which is a feature that their model cannot directly accommodate. Intuitively, such an
increase in homophily is in line with the main idea underlying our approach, i.e. that a minimal
group size is needed for “inbreeding” activities to be effective. In fact, as we now argue, our
theoretical setting provides a formal argument in support of this intuition.17
17 We are grateful to Matt Jackson for pointing out to us this property of our model.
Figure 2: Coleman index in friendship nominations: Small schools (<1000) vs. large schools (>1000). [Coleman index (vertical axis) against relative population share (horizontal axis); observations and fitted values are shown separately for large and small schools.]
Let $\tau$ be the equilibrium threshold, below which a group finds it profitable to outbreed. As
shown in Theorems 1 and 2, this threshold size refers to the absolute number of agents in the
group and is independent of the size of the network for large $n$. In particular, this threshold is
not defined in terms of the relative population share of groups (that is, their fraction of the total
population), which is measured on the horizontal axis of Figure 2. As the number $n$ of students
in the school increases, there is a larger absolute size (that is, a larger number of group members)
associated with any given group population share $w$. Thus, denoting by $w(\tau, n)$ the relative population
share that corresponds to the threshold $\tau$ for total population $n$ (that is, $w(\tau, n) = \tau / n$), $w(\tau, n)$ is
obviously decreasing in $n$. So, as population increases from $n$ to $n'$, those groups with population share $w$ such that
$w(\tau, n') < w < w(\tau, n)$ start inbreeding and experience an increase in their Coleman index, while
all other groups maintain their in/outbreeding strategy unaffected.
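The comparative statics in this paragraph can be sketched numerically. The snippet below is only an illustration under stated assumptions: the threshold value `TAU = 60`, the two school sizes and the list of population shares are hypothetical, and a group is taken to inbreed when its absolute size is at least the threshold.

```python
def threshold_share(tau, n):
    """Population share w(tau, n) = tau / n at which a group reaches absolute size tau."""
    return tau / n


def inbreeds(w, n, tau):
    """True if a group with share w in a population of n agents has size of at least tau."""
    return w * n >= tau


def switching_shares(shares, n_small, n_large, tau):
    """Shares whose groups outbreed in the small population but inbreed in the large one."""
    return [w for w in shares
            if not inbreeds(w, n_small, tau) and inbreeds(w, n_large, tau)]


TAU = 60                                   # hypothetical absolute threshold
shares = [0.02, 0.05, 0.10, 0.30, 0.60]    # hypothetical population shares
print(threshold_share(TAU, 500), threshold_share(TAU, 2000))
print(switching_shares(shares, 500, 2000, TAU))
```

With these made-up numbers the threshold share falls from 0.12 to 0.03 as the school grows from 500 to 2000 students, so only the groups with shares 0.05 and 0.10 switch to inbreeding; the smallest group keeps outbreeding and the two largest inbreed in both schools.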
This logic can be used to explain the shift in the relation between Coleman index and relative
population share that we observe in Figure 2. The shift is substantial for small and medium sized
groups, and vanishes for very large groups. Indeed, in the sample of smaller schools, observations
with a small population share are more likely to refer to groups with size below the threshold
$\tau$. Therefore, as we shift attention to the sample of larger schools, we should expect to find a
higher extent of inbreeding behavior. For observations corresponding to a medium population
share, the increase in inbreeding is less significant since many of the observations among small
schools correspond to groups that are already above the threshold. Finally, for observations with
large relative population share, no significant change is observed because essentially all of those
groups must be inbreeding, both in the sample of small and large schools.
5 Evidence from microlevel data
In Section 4 we demonstrate that the proposed theory replicates the aggregate behaviour of the
Coleman homophily index. In this section we further examine some implications of the theory
using microlevel data. The first implication concerns the strategic behaviour of individuals in both
the one-sided and two-sided matching scenarios: a) Conditional on population share, individuals
in large (absolute) groups will exhibit a greater tendency to inbreed than individuals in small
(absolute) groups; b) Conditional on absolute group size, individuals in large (relative) groups will
exhibit a greater tendency to inbreed than individuals in small (relative) groups. The microlevel
data allow us to observe equilibrium matches, but not strategies per se. We show that, using the
microlevel data, we can test a) but not b).
The second implication regards between-group matching in equilibrium, and differs between the
two scenarios: Consider a small (absolute) group. Excess representation of other small groups will
not significantly differ from zero when matching is one-sided, but excess representation of other
small groups will be positive if matching is two-sided.
We utilise microlevel data reflecting friendship nominations and marriages. Friendship nom-
ination provides a good example of one-sided matching. These data need not reflect consensual
friendships; we find that fewer than 40% of all nominations are reciprocated. Therefore, friendship
nominations provide useful information on one-sided matching, but should not be thought of as
shedding light on consensual friendship formation. Clearly, marriages fit with a two-sided scenario,
as both parties must agree to an observed match.
We emphasise that the purpose of this exercise is to test qualitative predictions of the model.
Structural estimates of the primitives corresponding to the theory, which would be an interesting
contribution, are beyond the scope of this paper.
5.1 Data
Here we provide an overview of the data used in our analysis. Summary statistics can be found in
Appendix 2.
Friendship nomination data come from the Add Health network structure files (Moody; 2005).18
These files are constructed from the In-School questionnaire, administered to 90 118 adolescents
in grades 7–12 in the United States, for the 1994–1995 school year. These data are extensively used
in sociological works on homophily (Moody (2001) for instance), and more recently by Currarini,
Jackson and Pin (2009, 2010) in their economic model of friendship. In this questionnaire, students
are asked to nominate up to 10 friends. The data record these friendship nominations and allow
us to map networks of friendship nominations, by race, within a given school.
For the purpose of our analysis, we define four different racial categories in the Add Health
data: White, Black, Hispanic and Asian. An observation is excluded if it does not include at least
one observable nomination19 or if the corresponding race is not identified by one of the four racial
categories. The resulting sample includes 55 676 students across 78 different schools.
18 Data files are available from Add Health, Carolina Population Center (
Marriage data come from four waves of the U.S. population census (Ruggles et al; 2015).20
As with the friendship nominations, data should reflect a plausible link between match type (an
in-group/out-group match is defined by race of husband relative to race of wife) and group charac-
teristics (i.e. population share and absolute size). For this reason we focus the microlevel analysis
on marital matches in cities identified in the public use population census.21 We define five dif-
ferent racial categories: White, Black, American Indian (Native henceforth), Hispanic and Pacific
Asian (Asian henceforth). The data include a linking rule that allows us to match spouses within
a given census. Observations are excluded if: either spouse is not present in the data; the wife does
not belong to one of the five racial groups; the age of either spouse is less than 20 or greater than
49; either spouse is identified as immigrating after age 15. The first two conditions are required
to identify the match-type. The third and fourth conditions increase the plausibility of the link
between the match-type and group characteristics. This results in a sample of 501 235 marriages
observed across 212 cities and 4 census-years.
We define the outcome of interest as follows. For an individual $i$ belonging to group $l$ in
population $m$, we define the empirical variable $s^\star_{ilm}$, where $s^\star_{ilm} = 1$ if all observed matches are
of race $l$ and $s^\star_{ilm} = 0$ if at least one match is of a different race than $l$. For the friendship
nominations, $i$, $l$ and $m$ correspond to student, race and school; a match corresponds to student
$i$’s friendship nominations. For the marriage data, $i$, $l$ and $m$ correspond to wife, race and city; a
match corresponds to wife $i$’s husband. We drop subscripts for the remainder of the document.
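To make the construction of the outcome variable concrete, here is a minimal sketch of how it could be coded from a list of observed matches. The function names and the toy inputs are hypothetical and are not the authors' code; the weight function follows the rule described in the paper's footnote on partially observed nominations (observed nominations divided by total nominations).

```python
def s_star(own_race, match_races):
    """Empirical outcome: 1 if every observed match shares own_race, 0 otherwise.

    Returns None when no match is observed, mirroring the exclusion of
    observations without at least one observable match.
    """
    if not match_races:
        return None
    return 1 if all(race == own_race for race in match_races) else 0


def observation_weight(observed_nominations, total_nominations):
    """Share of a student's nominations whose race is observed (in-school nominations)."""
    return observed_nominations / total_nominations


# Hypothetical students and one hypothetical marriage.
print(s_star("White", ["White", "White"]))   # all observed matches in-group
print(s_star("White", ["White", "Black"]))   # at least one out-group match
print(s_star("Asian", ["Asian"]))            # a marriage has a single match
print(observation_weight(2, 7))              # 2 of 7 nominations are in the same school
```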
5.2 Inbreeding and group size
The model implies that we should see a positive relationship between inbreeding and in-group size
in both the one-sided matching and two-sided matching scenarios. Here we test this relationship
using microlevel data.
19 We only observe the racial characteristics of nominated students in the same school. For this reason, regressions
weight observations by the proportion of a student's total nominations that we observe. For example, if a student has 7
nominations but only 2 are in that student’s school, they receive a weight of 2/7. This technique gives more weight
to observations with more information.
20We utilize 5% samples for the years 1980, 1990 and 2000, and a 1% sample for 2010.
21For confidentiality purposes not all cities can be identified in the public files.
Let $s$ denote the true strategy played by an individual, where $s = 1$ for inbreed and $s = 0$ for
outbreed. Let $s^\star$ denote the empirical variable (defined above), where $s^\star = 1$ if all observed matches
are in-group and $s^\star = 0$ if at least one observed match is out-group. We denote the probability of
a random variable $x$ taking a specific value, $\bar x$, by $P[x = \bar x]$.
Casual observation suggests that $P[s^\star = 1]$ is increasing in $n_l$. In Figures 3 and 4 we plot
$P[s^\star = 1]$ against $n_l$ by race, for each school and city-year respectively.22 For each group, in both
the one-sided and two-sided scenarios, the proportion of observations matched only to in-group
members is increasing in in-group size. The increase in $P[s^\star = 1]$ is sharper at smaller sizes of the in-group.
Figure 3: Proportion of students with all observed friendship nominations from in-group. [Panels: White, Black, Hispanic, Asian. Vertical axis: in-group nominations only (proportion); horizontal axis: in-group size.]
What can be learned about $s$ from the observation of $s^\star$? Recall: $\eta$ denotes the number of draws
from the strategy-determined pool (either in-group or full population); $r_O > 1$ and $r_I > 1$ denote
the number of draws, independent of strategy, made from the population, $N$, and the restricted
in-group pool, $n_l$, respectively; $p$ denotes the probability of matching with any given draw. $\eta$, $p$,
$r_O$ and $r_I$ can vary randomly across individuals but are independent of group size, population size
and strategy. These parameters are not observed by the econometrician.
The relationship between $s^\star$ and $s$ can be described as follows:
$$s^\star = \left(1 - \mathbb{1}[\text{outgroup match} \mid r_O]\right) s + \left(1 - \mathbb{1}[\text{outgroup match} \mid \eta + r_O]\right)(1 - s), \qquad (9)$$
where $\mathbb{1}[\,\cdot \mid k]$ is an indicator function taking a value of 1 if the argument is true, and 0 otherwise,
given $k$ draws from the full population. A couple of things are worth noting. First, when $s = 1$,
we observe $s^\star = 1$ only if there is no matching error: the $r_O$ draws do not yield a suitable out-group
match. This is independent of $\eta$ and $r_I$, which are composed of draws only from the in-group (and
without error relative to the strategy). Second, when $s = 0$, we observe $s^\star = 0$ only if at least one
of the $\eta + r_O$ draws is a suitable match from the out-group. This is independent of $r_I$, as these
draws will never be from the out-group.
22 For presentation, $n_l$ is restricted to less than 5000 for cities. Figures do not qualitatively change if the full domain
of values is considered.
Figure 4: Proportion of marriages between in-group members. [Panels: White, Black, Native, Asian, Hispanic. Vertical axis: in-group nominations only (proportion); horizontal axis: in-group size.]
We show in Appendix 2 that the probability of $s^\star = 1$, given Equation 9, can be written as:
$$P[s^\star = 1] = P[s = 1]\,[w_l + (1 - w_l)(1 - p)]^{r_O} + (1 - P[s = 1])\,[w_l + (1 - w_l)(1 - p)]^{\eta + r_O}.$$
This equation highlights a fundamental problem with using $s^\star$ to draw conclusions about strategy.
$P[s^\star = 1]$ is likely to provide a biased estimate of $P[s = 1]$. Further, the direction of this bias
cannot be signed based on casual observation.
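A small numerical sketch may help to see why the bias cannot be signed. It evaluates the expression for the probability of observing only in-group matches, reading the bracketed term in both summands as $w_l + (1 - w_l)(1 - p)$. All parameter values (`w`, `p`, `eta`, `r_out`) are hypothetical, since these quantities are not observed in the data.

```python
def zeta(w, p):
    """Probability that a single draw from the full population yields no out-group match."""
    return w + (1.0 - w) * (1.0 - p)


def prob_s_star_one(prob_inbreed, w, p, eta, r_out):
    """P[s* = 1] as a mixture over inbreeders (r_out draws) and outbreeders (eta + r_out draws)."""
    z = zeta(w, p)
    return prob_inbreed * z ** r_out + (1.0 - prob_inbreed) * z ** (eta + r_out)


# Hypothetical parameters: share 0.3, match probability 0.5, eta = 5, r_O = 2.
for true_share_inbreeding in (0.0, 0.5, 1.0):
    observed = prob_s_star_one(true_share_inbreeding, w=0.3, p=0.5, eta=5, r_out=2)
    print(true_share_inbreeding, round(observed, 4))
```

Under these assumed values, the observed proportion overstates inbreeding when nobody inbreeds (it is positive although $P[s = 1] = 0$) and understates it when half or all of the group inbreeds, so the direction of the bias depends on the unknown $P[s = 1]$ itself.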
However, it is relatively straightforward to show that:
$$\frac{\partial P[s^\star = 1 \mid w_l, n_l]}{\partial n_l} = \frac{\partial P[s = 1 \mid w_l, n_l]}{\partial n_l}\left(\zeta^{r_O} - \zeta^{\eta + r_O}\right),$$
where $\zeta = w_l + (1 - w_l)(1 - p)$. Notice that $\zeta^{r_O} - \zeta^{\eta + r_O} > 0$; this implies that the sign of
the observable behaviour of $s^\star$ is determined by the sign of the behaviour of $s$.23
23 However, the magnitude of the observed effect will under-estimate the magnitude of the strategy response. If
This has an important implication: observable matches ($s^\star$) can be used to test the qualitative
relationship between $s$ and $n_l$.
A similar inference cannot be made from the behaviour of $\partial P[s^\star = 1 \mid w_l, n_l] / \partial w_l$. Intuitively, if a positive
value is observed, we cannot determine whether this is due to an increase in $P[s = 1]$, or whether
it is due to an increase in the probability that out-breeders will only match with in-group members.
Therefore, we turn to regression analysis and focus on the relationship between $P[s^\star = 1]$ and $n_l$.
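The identification argument can be checked with a rough numerical experiment. The sketch below uses the mixture expression $P[s^\star = 1] = P[s = 1]\,\zeta^{r_O} + (1 - P[s = 1])\,\zeta^{\eta + r_O}$ with $\zeta = w_l + (1 - w_l)(1 - p)$; every parameter value is hypothetical and chosen only for illustration.

```python
def prob_s_star_one(prob_inbreed, w, p, eta, r_out):
    """P[s* = 1] given the share of inbreeders and the (unobserved) matching parameters."""
    z = w + (1.0 - w) * (1.0 - p)
    return prob_inbreed * z ** r_out + (1.0 - prob_inbreed) * z ** (eta + r_out)


P, ETA, R_OUT = 0.5, 5, 2   # hypothetical match probability and numbers of draws

base = prob_s_star_one(0.4, w=0.3, p=P, eta=ETA, r_out=R_OUT)

# Case 1: the share w is held fixed and more agents inbreed (as when n_l rises).
more_inbreeding = prob_s_star_one(0.5, w=0.3, p=P, eta=ETA, r_out=R_OUT)

# Case 2: strategies are held fixed and only the population share w rises.
larger_share = prob_s_star_one(0.4, w=0.4, p=P, eta=ETA, r_out=R_OUT)

print(round(more_inbreeding - base, 4))   # positive, driven by the change in strategy
print(round(larger_share - base, 4))      # also positive, with no change in strategy
```

In the first case the observed change has the same sign as the change in strategy, but is scaled down by the factor $\zeta^{r_O} - \zeta^{\eta + r_O}$. In the second case the observed proportion rises even though no agent changes strategy, which is why a positive response to $w_l$ is not informative about $s$.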
We use regression analysis to test the positive relationship between $P[s = 1]$ and $n_l$, conditioning
on $w_l$, implied by the model. We estimate a Probit model, for each race, regressing $s^\star_{ilm}$ on absolute
size, $n_l$, and relative population share, $w_l$, of group $l$ (as well as the interaction between the two
measures). In addition to group size, friendship nomination regressions include dummy variables
to control for student grade, and marriage regressions include dummy variables for year of census,
the age of each spouse and the education of each spouse. The estimated coefficients are reported
in Table 1, for friendship nominations in the top panel and for marriages in the bottom panel.
The estimates reported in Table 1 are generally consistent with the model. Two estimated coefficients
corresponding to $n_l$ (for White in friendship nominations and for Black in marriages) are
small in magnitude and statistically indistinguishable from 0.24 For both friendship nominations
and marriages we find: 1) the probability of observing $s^\star = 1$ is increasing with absolute in-group
size; 2) the probability of observing $s^\star = 1$ is increasing with relative population share; 3) the
positive relationship between absolute in-group size and $s^\star$ is smaller when population share is
large. Further, 1), 2) and 3) are all consistent with the model. 2) is consistent with a low return to
outbreeding (reflected by a small $w_l$ for a given $n_l$) leading to a higher probability of inbreeding. A
possible interpretation of 3) is the following: when $w_l$ is close to 1, inbreeding is high; there are
few individuals who will change strategy if $n_l$ were to increase. When $w_l$ is close to 0, outbreeding
is high; there are many individuals who may change strategy if $n_l$ were to increase. While both
2) and 3) are consistent with the model, we are cautious about interpreting this as evidence, for
the reasons discussed above. However, the estimates for $n_l$, in 1), provide strong support for the model.
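Finding 3) can be read off the interaction term. In a probit with regressors $n_l$, $w_l$ and $n_l \times w_l$, the derivative of the latent index with respect to $n_l$ is the coefficient on $n_l$ plus the interaction coefficient times $w_l$, and the marginal effect on the probability has the same sign. The sketch below applies this to the point estimates reported for Black students in the friendship panel of Table 1; it is our own back-of-the-envelope illustration, ignores sampling error, and the function names are hypothetical.

```python
def size_slope(beta_n, beta_nw, w):
    """Derivative of the probit latent index with respect to n_l at population share w."""
    return beta_n + beta_nw * w


def share_where_slope_vanishes(beta_n, beta_nw):
    """Population share at which the effect of n_l on the latent index changes sign."""
    return -beta_n / beta_nw


# Point estimates for Black students, one-sided panel of Table 1.
BETA_N, BETA_NW = 0.102, -0.180

for w in (0.1, 0.3, 0.5, 0.8):
    print(w, round(size_slope(BETA_N, BETA_NW, w), 3))
print(round(share_where_slope_vanishes(BETA_N, BETA_NW), 3))
```

On these point estimates the effect of absolute size is positive at small shares, shrinks as the share grows, and changes sign at a share of roughly 0.57, which is one way to visualise the statement that the size effect is weaker when the population share is large.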
values of $\eta$ and $r_O$ were observed we could derive the magnitude as well.
24 Note that these are two of the largest groups in the samples. This may reflect the fact that these groups are
generally of a size such that an increase in $n_l$ has little impact on observed strategic choices (a large portion of each
population is already playing an in-group strategy).
We can use the friendship nomination data to further explore a consequence of the proposed
model. In the model, the homophily observed in aggregate data results from the strategic choice
of individuals to inbreed or outbreed. Consider an alternative theory: All individuals outbreed,
$P[s = 1] = 0$, but they do so with an own-group matching bias. For example, this can be modelled
as $p$ taking a greater value for in-group draws than for out-group draws. We can test whether this
Table 1: Probit regression, one-sided and two-sided matching (outcome $s^\star$).

                    White        Black        Hispanic     Asian        Native
One-sided (Friendship nominations)
$n_l$               0.038        0.102        0.139        0.551
                    (0.066)      (0.041)**    (0.037)***   (0.062)***
$w_l$               2.231        2.012        2.872        3.883
                    (0.369)***   (0.200)***   (0.357)***   (1.440)***
$n_l \times w_l$    -0.068       -0.180       -0.140       -1.171
                    (0.078)      (0.052)***   (0.035)***   (0.320)***
Obs.                34 630       9 174        9 062        2 808
Pseudo $R^2$        0.061        0.069        0.276        0.159
Two-sided (Marriages)
$n_l$               0.052        -0.016       0.832        0.139        7.001
                    (0.025)**    (0.022)      (0.073)***   (0.024)***   (2.039)***
$w_l$               1.164        1.673        7.660        3.114        41.253
                    (0.113)***   (0.217)***   (0.768)***   (0.252)***   (6.786)***
$n_l \times w_l$    -0.073       -0.092       -5.526       -0.427       -353.237
                    (0.041)*     (0.084)      (0.586)***   (0.089)***   (65.285)***
Obs.                365 442      85 805       8 377        38 626       2 985
Pseudo $R^2$        0.0456       0.0849       0.0977       0.1764       0.0667

Notes: Robust standard errors, clustered by school (one-sided) or city (two-sided), reported in
parentheses. *, **, and *** denote statistical significance at 10%, 5% and 1%. Observations are
weighted by the proportion of total nominations observed.
Additional covariates include dummy variables for student’s grade at time of survey. Data from
Add Health Survey, see data appendix for details.
Group size scaled by 10 000. Additional covariates include dummy variables for: year of census,
husband’s and wife’s age, husband’s and wife’s education.
alternative theory can explain observed patterns in the data, by looking at excess representation
when $s^\star = 0$. If $p$ favours in-group draws then the representation of out-group matches relative to
population share will be negative when $s^\star = 0$.
We calculate the excess representation $\Delta_{ll'}$, defined by:
Recall from Section 4 that $m_l \equiv \sum_{l'=1}^{q} m_{ll'}$ denotes the number of total matches of agents of type
$l$. This measures the representation of a group $l'$ within the matches of group $l$ over and above
what we would expect if all matches are made randomly. For example, we may record the number
of Blacks, or Hispanics, or Asians among the total nominations reported by White individuals.
In Figure 5 we plot, for each racial group, out-group excess representation for the entire group
(represented by the solid dots) and for the subgroups for whom $s^{\star} = 0$ (represented by hollow
diamonds). The latter can be thought of as being largely composed of outbreeders. If there is
an own-group bias in $p$, then we expect to see negative excess representation of out-groups when
looking at $s^{\star} = 0$. Figure 5 is summarized in Table 2, where we report, by race, the estimated mean
(and standard error) of $\Delta_{ll'}$, for the full sample and conditional on $s^{\star} = 0$; we denote the estimated
means by $E(\Delta_{ll'})$ and $E(\Delta_{ll'} \mid s^{\star} = 0)$ respectively.
[Figure 5. Panels: White, Black, Hispanic, Asian. Vertical axis: outgroup excess representation (-1 to 1). Horizontal axis: in-group size (0 to 2000). Series: all nominations, outbreeder nominations.]
Figure 5: Excess representation for all students and $s^{\star} = 0$ only: Friendship nominations.
Both Figure 5 and the mean values reported in Table 2 suggest that the alternative theory
cannot fully explain the observed patterns in friendship nominations. This is particularly stark for
the White and Hispanic groups, for which $\Delta_{ll'}$ does not significantly differ from 0 in the $s^{\star} = 0$ subgroup.
5.3 Cross-group ties
Whether we are in the one-sided or the two-sided matching scenario has implications
for the predictions of our model with respect to the cross-group ties observed in equilibrium. The
Table 2: Excess representation for all students and $s^{\star} = 0$ only: Friendship nominations.

                                        White        Black        Hispanic     Asian
$E(\Delta_{ll'})$                       -0.119       -0.316       -0.173       -0.342
                                        (0.017)***   (0.044)***   (0.028)***   (0.047)***
$E(\Delta_{ll'} \mid s^{\star} = 0)$    0.006        -0.192       -0.045       -0.195
                                        (0.017)      (0.041)***   (0.030)      (0.0377)***

Notes: Robust standard errors in parentheses. *, **, and *** denote that excess
representation significantly differs from 0 at 10%, 5% and 1%.
two-sided scenario predicts that outbreeders meet agents in the restricted pool of outbreeders.
Thus, if meeting is uniform, outbreeding groups should display an excess representation of other
outbreeders. This simply follows from the fact that outbreeding groups are found with probabilities
that reflect the relative shares in the pool of outbreeders, and these shares exceed those in the overall
population. Since outbreeding groups are relatively small, it follows that cross-group matches are
primarily formed among agents of small groups.
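The share argument above can be illustrated with a small numerical sketch. The group labels and sizes below are hypothetical and are not taken from the paper's data; the function simply compares each outbreeding group's share of the whole population with its share of the restricted pool of outbreeders, under the uniform-meeting assumption stated in the text.

```python
def pool_shares(group_sizes, outbreeders):
    """For each outbreeding group, return (population share, share of outbreeder pool)."""
    n = sum(group_sizes.values())                        # total population
    n_out = sum(group_sizes[g] for g in outbreeders)     # size of the outbreeder pool
    return {g: (group_sizes[g] / n, group_sizes[g] / n_out) for g in outbreeders}


# Hypothetical example: one large inbreeding group and two small outbreeding groups.
sizes = {"A": 900, "B": 60, "C": 40}
shares = pool_shares(sizes, outbreeders={"B", "C"})

for group, (pop_share, pool_share) in sorted(shares.items()):
    print(group, pop_share, pool_share)
```

Whenever at least one group inbreeds, the outbreeder pool is strictly smaller than the population, so each outbreeding group's pool share exceeds its population share, which is the source of the predicted excess representation.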
To examine this prediction we look at the relative representation of small groups in the friendship
nominations and the marriages of other small groups.25 Figure 6 plots excess representation of small
groups (racial groups with fewer than 80 members in a school (nominations) and 500 members in
a city (marriages)) against own-group size (restricted to groups smaller than 5000). Formally, for
each group $l$, we calculate $\sum_{\{l' : n_{l'} \le x\}} \Delta_{ll'}$, where $x$ stands for the corresponding threshold ($x = 80$
25 As very small groups are relatively few in number, we do not analyze them separately by race.
[Figure 6. Two panels. Vertical axis: excess representation (0 to 1). Horizontal axis: in-group size (0 to 200, left; 0 to 5000, right).]
Figure 6: Excess representation of small groups: Friendship nominations (left) and marriages (right).
Table 3: Excess representation of small groups.

One-sided (Friendship nominations)                    In-group size
                                                      1–20         20–50        50–200
$E(\sum \Delta_{ll'} \mid n_{l'} \le 80)$             0.013        0.008        -0.000
                                                      (0.009)      (0.008)      (0.004)
Number of schools                                     68           51           56

Two-sided (Marriages)                                 In-group size
                                                      1–50         50–200       200–1 000
$E(\sum \Delta_{ll'} \mid n_{l'} \le 500)$            0.048        0.007        -0.006
                                                      (0.018)***   (0.006)      (0.002)**
Number of cities                                      112          148          155

Notes: Values reflect means. Robust standard errors, clustered by school (nominations) and city
(marriages), reported in parentheses. *, **, and *** denote that excess representation statistically
differs from 0 at 10%, 5% and 1%.
for nominations and $x = 500$ for marriages). These cases are shown for illustrative purposes, and
qualitatively similar pictures obtain when we fix different small thresholds.
Figure 6 suggests that a positive excess representation of "small" groups is a feature of the
marriages (right-hand panel) of small groups only, while larger groups tend to marry with these
small groups at rates below these groups' population shares. In particular, there seems to be some
very small critical group size above which the over-representation of small groups disappears.
This insight is supported in Table 3, where we report the mean excess representation of small
groups by different racial groups, denoted by $E(\sum \Delta_{ll'} \mid n_{l'} \le x)$, stratifying, for each $l$, by in-group
size from 1–20, 20–50 and 50–200 for friendship nominations and 1–50, 50–200 and 200–1 000 for
marriages26. Consistent with our theory, mean excess representation is significant and positive for
the smallest in-groups in two-sided matching, but not distinguishable from zero in one-sided matching. In the two-sided
scenario, mean excess representation decreases for medium-sized in-groups and is significant and
negative for large in-groups. These results are consistent with the predictions of the model for
two-sided and one-sided matching.
26Bin sizes are chosen to keep the number of schools/cities relatively constant.
6 Summary and concluding remarks
The paper has proposed a very stylized model of homophily, which may be applied to a diverse
range of alternative phenomena such as friendships and marriages. Our main purpose has been
to provide a behavioral foundation to the meeting biases that have been shown in previous works
to play a key role in the emergence of homophily in social networks. Our approach hinges upon
two key assumptions: (i) the establishment of ties with individuals that differ in some relevant
characteristics (e.g. race or language) implies a costly investment; (ii) the search for suitable ties
is more effective in larger pools. Under these assumptions, the induced game was shown to have
a threshold equilibrium where groups outbreed if, and only if, their size falls below a certain level.
This simple structure of the equilibrium has implications that match the empirical evidence found in
both friendship and marriage data. Specifically, it is consistent with the nonmonotonicity displayed
by the Coleman homophily index as well as with regularities observed on the pattern of in-group
and cross-group ties.
While homophily is a complex and multifaceted phenomenon, we believe that our model highlights
a very basic force underlying homophily that future analysis of the phenomenon may take
into account. Our extremely stylized model does not contain explicit elements of preferences, and
homophily is built in through the (fixed) cost of outbreeding. A more realistic model allowing for
preferences would contain additional interesting features, which we plan to integrate in the present
framework in future research. In fact, in the presence of preferences in favour of in-group contacts,
groups accounting for a small share of the population may face strong incentives to inbreed in order
to avoid mixes of realized contacts dominated by the out-group. Other issues that future research
should address include the consideration of flexible individual characteristics. In many social contexts,
these characteristics (language, religion, etc.) are not forever fixed in individuals and their
descendants but can be changed through interaction, which may mitigate differences
but also exacerbate them in some other cases. In this sense, cross-ties among different types could
breed convergence of characteristics (and thus integration), or possibly the opposite. In general,
one might anticipate that interesting nonlinear dynamics may arise under some circumstances.
To understand better such interplay between interaction/segmentation on the one hand and
homogenization/polarization on the other seems a crucial issue for future theoretical and empirical
research.
References
[1] Allport, G. W., 1954. The Nature of Prejudice. Cambridge, MA: Addison-Wesley.
[2] Aubrun, G. and I. Nechita, 2009. “Stochastic Ordering for Iterated Convolutions and Catalytic
Majorizations,” Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 45(3), 611–
[3] Baccara, M. and L. Yariv, 2013. “Homophily in Peer Groups,” American Economic Journal:
Microeconomics 5(3), 69–96.
[4] Bala, V. and S. Goyal, 2000. “A Noncooperative Model of Network Formation,” Econometrica
68(5), 1181–1229.
[5] Blau, P. M., 1977. Inequality and Heterogeneity: A Primitive Theory of Social Structure. New
York: Free Press.
[6] Bramoullé, Y. and B. Rogers, 2009. “Diversity and Popularity in Social Networks,” mimeo.
[7] Coleman, J., 1958. “Relational Analysis: The Study of Social Organizations With Survey
Methods,” Human Organization 17, 28–36.
[8] Currarini, S., M.O. Jackson and P. Pin, 2009. “An Economic Model of Friendship: Homophily,
Minorities and Segregation,” Econometrica 77(4), 1003–1045.
[9] Currarini, S., M.O. Jackson, and P. Pin, 2010. “Identifying the Roles of Choice and Chance
in Network Formation: Racial Biases in High School Friendships,” Proceedings of the National
Academy of Science 107, 4857–4861.
[10] Dixit, A., 2003. “Trade Expansion and Contract Enforcement,” Journal of Political Economy
111(6), 1293–1317.
[11] Fischer, C. S., 1982. To Dwell among Friends. Chicago: University of Chicago Press.
[12] Franz, S., M. Marsili and P. Pin, 2008. “Observed choices and underlying opportunities,”
Science and Culture 76, 471–476.
[13] Galeotti, A., F. Ghiglino and F. Squintani, 2013. “Strategic Information Transmission
Networks,” Journal of Economic Theory 148(5), 1751–1769.
[14] Giles, M. W., 1978. “White Enrolment Stability and School Desegregation: A Two-Level
Analysis,” American Sociological Review 43, 2448–2464.
[15] Golub, B. and M. O. Jackson, 2012. “How Homophily Affects the Speed of Learning and
Best-Response Dynamics,” Quarterly Journal of Economics 127(3), 1287–1338.
[16] Jackson, M. and B. Rogers, 2005. “The Economics of Small Worlds,” The Journal of the
European Economic Association (papers and proceedings) 3(2-3), 617–627.
[17] Kandel, D., 1978. “Homophily, Selection, and Socialization in Adolescent Friendships,” Amer-
ican Journal of Sociology 84(2), 427–436.
[18] Lazarsfeld, P.F. and R.K. Merton, 1954. “Friendship as a Social Process: A Substantive and
Methodological Analysis,” in M. Berger (ed.), Freedom and Control in Modern Society, New
York: Van Nostrand.
[19] Marsden, P.V., 1987. “Core Discussion Networks of Americans,” American Sociological Review
52, 122–313.
[20] Marsden, P.V., 1988. “Homogeneity in Confiding Relations,” Social Networks 10, 57–76.
[21] McPherson, M., L. Smith-Lovin and J. M. Cook, 2001. “Birds of a Feather: Homophily in
Social Networks,” Annual Review of Sociology 27, 415–444.
[22] Moody, J., 2001. “Race, School Integration, and Friendship Segregation in America,” The
American Journal of Sociology 107(3), 679–716.
[23] Moody, J., 2005. “Add Health Network Structure Files,” technical document: Carolina Popu-
lation Center University of North Carolina at Chapel Hill.
[24] van der Poel, M. 1993. Personal Networks. Netherlands: Swets & B. V. Zeitlinger.
[25] Ruggles, S., K. Genadek, R. Goeken, J. Grover and M. Sobek, 2015. Integrated Public Use
Microdata Series: Version 6.0 [Machine-readable database]. Minneapolis: University of Min-
[26] Stadje, W., 1990. “The Collector’s Problem with Group Drawings,” Advances in Applied
Probability 22(4), 866–882.
[27] Suen, W, 2010. “Mutual Admiration Clubs,” Economic Inquiry 48(1), 123–132.
[28] Vega-Redondo, F., 2007. Complex Social Networks, Econometric Society Monograph Series,
Cambridge: Cambridge University Press.
[29] Zeggelink, E. P. H. 1993. Strangers into Friends: The Evolution of Friendship Networks Using
an Individual Oriented Modeling Approach. Amsterdam: ICS
Appendix 1
Here, we provide the proof for the formal results stated in the main text.
Proof of Theorem 1
First we note that, in the one-sided model, the payoff of any player $i$ belonging to an outbreeding
group $l$ in a group-symmetric profile $\gamma$ is independent of the choice of groups different from $l$.
Specifically, the expected payoff $\pi_l(\gamma)$ for an individual $i$ of an outbreeding group $l$ is given by the
where $\delta(n) \to 0$ as $n \to \infty$. Similarly, we can write the payoff of any individual of group $l$ when
inbreeding as:
Take the extreme case where $n_l = 1$. Obviously, $\pi_I(n_l) = 0$ while, by virtue of the assumption that
$V(1) > c$,
we have $\pi_O(1) > 0$. This implies that outbreeding is always optimal for sufficiently small groups.
Next, we want to show that such outbreeding incentives decrease monotonically with group size.
To this end, we can invoke Lemma 1, already stated in Section 3, which claims that, as the pool
size becomes larger, the induced distributions over the number of distinct meetings improve in the
FOSD sense. Before proceeding with the proof of the Theorem, we provide a detailed proof of that
auxiliary result.
Proof of Lemma 1:
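Let us denote by $P_\theta(\nu;\eta)$ the probability of obtaining $\nu$ distinct elements from $\eta$ draws with replacement
out of a set of size $\theta$. It is enough to show that, for all $\eta$ and $\theta$, the probability distribution
$\{P_{\theta+1}(\nu;\eta)\}_{\nu=0,1,2,\dots}$ dominates the distribution $\{P_\theta(\nu;\eta)\}_{\nu=0,1,2,\dots}$ in the FOSD sense.
Following Stadje (1990), we can write:
\[ P_\theta(\nu;\eta) = \binom{\theta}{\nu}\,\frac{A(\nu,\eta)}{\theta^{\eta}}, \]
where $A(\nu,\eta) = \sum_{j=0}^{\nu} (-1)^{j}\binom{\nu}{j}(\nu-j)^{\eta}$ counts the sequences of $\eta$ draws that cover a given set of $\nu$ elements and does not depend on $\theta$.
Let us now consider the ratio of $P_\theta(\nu;\eta)$ to $P_{\theta+1}(\nu;\eta)$:
\[ \frac{P_\theta(\nu;\eta)}{P_{\theta+1}(\nu;\eta)} = \frac{\binom{\theta}{\nu}\left(\frac{1}{\theta}\right)^{\eta}}{\binom{\theta+1}{\nu}\left(\frac{1}{\theta+1}\right)^{\eta}} \tag{13} \]
which can be written as:
\[ \frac{\theta!}{(\theta-\nu)!\,\nu!}\cdot\frac{(\theta+1)^{\eta}}{\theta^{\eta}}\cdot\frac{(\theta+1-\nu)!\,\nu!}{(\theta+1)!} \]
or, equivalently:
\[ \frac{1}{\theta^{\eta}}\cdot\frac{\theta!}{(\theta-\nu)!\,\nu!}\cdot(\theta+1)^{\eta}\cdot\frac{(\theta+1-\nu)!\,\nu!}{(\theta+1)!} = \frac{(\theta+1)^{\eta-1}(\theta+1-\nu)}{\theta^{\eta}}. \]
Note that for $\nu = 1$ this yields:
\[ \frac{(\theta+1)^{\eta-1}}{\theta^{\eta-1}}, \]
which is at least 1.
Note also that for all admissible values of $\theta$ and $\nu$, the ratio $P_\theta(\nu;\eta)/P_{\theta+1}(\nu;\eta)$ is decreasing in $\nu$. Since these
are probability distributions, we conclude that there exists $\bar\nu$ such that $P_\theta(\nu;\eta)/P_{\theta+1}(\nu;\eta) < 1$ for all $\nu > \bar\nu$,
while the ratio is at least 1 for all $\nu \le \bar\nu$, so that the two distributions cross only once.
This implies that $P_{\theta+1}(\nu;\eta)$ First Order Stochastic Dominates $P_\theta(\nu;\eta)$, and thus completes the
proof of the Lemma.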
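As an informal check of Lemma 1, the distribution of the number of distinct elements can be computed exactly for small cases. The sketch below is not part of the original proof: it assumes the standard inclusion-exclusion count of sequences covering $\nu$ given elements, uses exact rational arithmetic, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def covering_sequences(eta, nu):
    """Number of length-eta sequences over nu given elements that use all nu of them."""
    return sum((-1) ** j * comb(nu, j) * (nu - j) ** eta for j in range(nu + 1))


def distinct_pmf(theta, eta):
    """P_theta(nu; eta) for nu = 0..eta: eta draws with replacement from theta items."""
    return [Fraction(comb(theta, nu) * covering_sequences(eta, nu), theta ** eta)
            for nu in range(eta + 1)]


def dominates(p_big, p_small):
    """True if p_big first-order stochastically dominates p_small (its CDF is never higher)."""
    cdf_big = cdf_small = Fraction(0)
    for a, b in zip(p_big, p_small):
        cdf_big += a
        cdf_small += b
        if cdf_big > cdf_small:
            return False
    return True


# Check the FOSD ordering for a grid of small pool sizes and numbers of draws.
ok = all(dominates(distinct_pmf(t + 1, e), distinct_pmf(t, e))
         for t in range(1, 12) for e in range(1, 8))
print(ok)
```

A finite grid check of this kind does not replace the proof; it only confirms that the closed-form ratio and the dominance claim agree on the cases computed.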
Returning now to the proof of the Theorem, recall that $U(y_i+1) \ge U(y_i)$ for all $y_i \ge 1$ and
$U(1) > U(0)$. Therefore, combining Lemma 1 with the monotonicity of $U$ we may conclude that,
for any group size $n_l$,
\[ \pi_I(n_l+1) - \pi_I(n_l) > 0. \tag{14} \]
Let now $\tau$ be the lowest integer such that
\[ \pi_I(\tau) \ge V(\eta) - c. \tag{15} \]
Then, whether (15) holds strictly or with equality, it is clear that by making $n$ large enough, we have
\[ \pi_I(\tau-1) < \pi_O(\tau) < \pi_I(\tau), \]
which proves that $\tau$ is the desired threshold, and completes the proof of the Theorem.
Proof of Theorem 2
In the present two-sided scenario, we find again that the threshold features of the equilibria
hinge upon the monotonically decreasing incentives to outbreeding resulting from increasing pool
size. Such monotonicity is the essential implication of Lemma 2, already stated in Section 3, which
claims that the expected number of distinct meetings grows with pool size. Before tackling the proof
of the Theorem itself, we provide a detailed proof of that Lemma.
Proof of Lemma 2
Given a set $\Theta$ and some $L \subset \Theta$, let us first derive (cf. Stadje (1990)) the expected number of
distinct meetings that agent $i$ obtains from the set $\Theta \setminus L$ by means of $\eta$ independent draws with
replacement out of the set $\Theta$. Denoting by $\theta$ and $l$ the cardinalities of the sets $\Theta$ and $L$ respectively,
that expected number is equal to
\[ (\theta - l)\cdot q_\theta(\eta), \tag{16} \]
where
\[ q_\theta(\eta) = 1 - \left(\frac{\theta-1}{\theta}\right)^{\eta} \tag{17} \]
is the probability that a given agent in the set $\Theta$ is found by means of $\eta$ draws with replacement from
that set. (In our two-sided scenario, the set $L$ is to be interpreted as the set of agents that find $i$
through search, and that should not be counted twice in the union of passive and active draws if
found also by agent $i$.)
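Expression (16) can be checked by brute force on a small pool. The following sketch enumerates every possible sequence of draws and averages the number of distinct elements found outside $L$; it assumes $q_\theta(\eta) = 1 - ((\theta-1)/\theta)^{\eta}$ and labels the members of $L$ as the first `ell` elements, which is only a convenience of the illustration.

```python
from fractions import Fraction
from itertools import product


def q(theta, eta):
    """Probability that a given agent is found at least once in eta draws from theta agents."""
    return 1 - Fraction(theta - 1, theta) ** eta


def expected_distinct_outside(theta, ell, eta):
    """Exact average number of distinct elements outside L hit by eta draws with replacement.

    Elements are labelled 0..theta-1 and L is taken to be the labels 0..ell-1.
    """
    total = 0
    for draws in product(range(theta), repeat=eta):
        total += len({d for d in draws if d >= ell})
    return Fraction(total, theta ** eta)


# Small example: pool of 4, one agent in L, three draws.
brute = expected_distinct_outside(4, 1, 3)
closed_form = (4 - 1) * q(4, 3)
print(brute, closed_form)
```

The enumeration grows as $\theta^{\eta}$, so it is only practical for very small pools, but it makes the linearity-of-expectation argument behind (16) concrete.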
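Consider now the random variable $\tilde\nu(\theta)$ considered in the statement of the Lemma, for some
given pool size $\theta \in \mathbb{N}$. Recall that this variable gives the number of distinct meetings an agent
obtains from a pool of size $\theta$ when meeting is two-sided and both this agent and all the others
obtain $\eta$ draws. Its expected value can be computed by adding the expected number of agents who
are met "passively" by this agent, i.e.
\[ \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l}\cdot l \]
and those that are found through "active" search, i.e.
\[ \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l}\cdot(\theta-l)\cdot q_\theta(\eta). \]
Thus, combining both expressions, we can write:
\[ E[\tilde\nu(\theta)] = \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l}\cdot l + \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l}\cdot(\theta-l)\cdot q_\theta(\eta). \tag{18} \]
Now note that by factoring the term $\theta\cdot q_\theta(\eta)$ in the second sum in (18) we can write this
sum as follows:
\[ \theta\, q_\theta(\eta) \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l} \;-\; q_\theta(\eta) \sum_{l=0}^{\theta} \binom{\theta}{l}\, q_\theta(\eta)^{l}\,(1-q_\theta(\eta))^{\theta-l}\, l. \tag{19} \]
Note that the second term of (19) is just $\theta\cdot q_\theta(\eta)^{2}$, while the first term is simply $\theta\cdot q_\theta(\eta)$. Integrating
all the former considerations into (18) we can write:
\[ E[\tilde\nu(\theta)] = q_\theta(\eta)\cdot\theta + q_\theta(\eta)\cdot\theta - q_\theta(\eta)^{2}\cdot\theta. \]
Let us define the function $f(\theta,\eta)$ by the right-hand side of the above expression. The derivative of
$f$ with respect to $\theta$ is given by:
\[ \frac{\partial f(\theta,\eta)}{\partial\theta} = \frac{1}{\theta-1}\left[(\theta-1)\left(1-\left(\frac{\theta-1}{\theta}\right)^{2\eta}\right) - 2\eta\left(\frac{\theta-1}{\theta}\right)^{2\eta}\right] \]
and the sign of $\frac{\partial f(\theta,\eta)}{\partial\theta}$ is the sign of the following expression:
\[ (\theta-1)\left(1-\left(\frac{\theta-1}{\theta}\right)^{2\eta}\right) - 2\eta\left(\frac{\theta-1}{\theta}\right)^{2\eta}. \]
This expression is positive if and only if $\theta - 1 > (2\eta+\theta-1)\left(\frac{\theta-1}{\theta}\right)^{2\eta}$. Taking logs we have that $\frac{\partial f(\theta,\eta)}{\partial\theta} > 0$ iff:
\[ \ln(\theta-1) > 2\eta\ln(\theta-1) - 2\eta\ln(\theta) + \ln(2\eta+\theta-1) \]
which rewrites as follows:
\[ 2\eta\,(\ln(\theta) - \ln(\theta-1)) > \ln(2\eta-1+\theta) - \ln(\theta-1). \]
The above condition is a direct consequence of the strict concavity of the logarithm function, which
establishes the Lemma.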
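The algebra in the proof of Lemma 2 can be verified numerically. The sketch below, which is an illustration rather than part of the original argument, evaluates the two binomial sums in (18) with exact fractions, compares them with the closed form $2\theta q_\theta(\eta) - \theta q_\theta(\eta)^2 = \theta\,(1 - ((\theta-1)/\theta)^{2\eta})$, and checks that the closed form increases with the pool size over a finite range. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def q(theta, eta):
    """Probability that a given agent is found in eta draws from a pool of size theta."""
    return 1 - Fraction(theta - 1, theta) ** eta


def expected_meetings_sums(theta, eta):
    """E[nu(theta)] computed directly from the passive and active sums in (18)."""
    qq = q(theta, eta)
    weights = [comb(theta, l) * qq ** l * (1 - qq) ** (theta - l) for l in range(theta + 1)]
    passive = sum(w * l for l, w in enumerate(weights))
    active = sum(w * (theta - l) * qq for l, w in enumerate(weights))
    return passive + active


def f(theta, eta):
    """Closed form: theta * (1 - ((theta - 1) / theta) ** (2 * eta))."""
    return theta * (1 - Fraction(theta - 1, theta) ** (2 * eta))


sums_match = all(expected_meetings_sums(t, e) == f(t, e)
                 for t in range(2, 9) for e in range(1, 5))
increasing = all(f(t + 1, e) > f(t, e) for t in range(1, 50) for e in range(1, 6))
print(sums_match, increasing)
```

The closed form follows because $2q - q^2 = 1 - (1-q)^2$ and $1 - q_\theta(\eta) = ((\theta-1)/\theta)^{\eta}$.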
Under the assumption that the utility function is linear, Lemma 2 readily implies that, in every
group-symmetric Nash equilibrium of the breeding game, if an agent of group $l$ inbreeds then every
other individual of a group $l'$ such that $n_{l'} > n_l$ must inbreed as well. The equilibrium, therefore,
must be of the threshold type. Finally, we argue that one such equilibrium is defined by the
same threshold $\tau$ established in Theorem 1 that defines the (unique) equilibrium in the one-sided
scenario. To see this, simply note that, if $\sum n_l > \alpha n$ we can still write
where $\delta(n) \to 0$ as $n \to \infty$. Therefore, for large enough $n$ the same argument as in the proof of
Theorem 1 establishes $\tau$ as the only equilibrium threshold in this case, which completes the proof
of the result.
Proof of Proposition 1 First, consider any given outbreeding group l. Its expected excess
homophily is Eh˜
wO1/n wl, where wOdenotes the equilibrium fraction of outbreeders
in the population. If n > ˆτ q, since nOqˆτ, then
wO1/n =nl1
for some δ > 0, uniformly in n. Therefore, we have that
δ > 0 (21)
as claimed in the first part of the proposition. As for its second part, it immediately follows from the
fact that, in the expression for $E[\tilde C_l]$, the numerator is increasing with $w_l$ while the denominator
decreases with it, for given $w_O$.
Proof of Proposition 2
(i) Consider an outbreeding group $l$ of given size $n_l$. First note that, as $n \to \infty$, we have
$n_l/n \searrow 0$. Thus, for $n$ large enough, the random variable $\tilde C_l$ can be approximated as follows:
\[ \tilde C_l \simeq \frac{\tilde\nu(r_I, n_l)}{\tilde\nu(r_I, n_l) + \tilde\nu(\eta + r_O, \infty)} \]
and, therefore, for some arbitrarily small $\epsilon$, one can write:
if $\epsilon$ is chosen small enough.
(ii) For simplicity, consider two outbreeding groups $l$ and $l'$ whose cardinalities differ in just one
individual, i.e. $n_l + 1 = n_{l'}$, and let $\Delta \equiv \{E[\tilde H_{l'}] - w_{l'}\} - \{E[\tilde H_l] - w_l\}$ denote the expected
change in excess homophily. Furthermore, let $m(x,y) \equiv E[\tilde\nu(x,y)]$ stand for the expected number
of distinct meetings obtained when the number of draws is $x$ and the pool size is $y$. Then, we can write
\[ \Delta = \frac{m(r_I, n_{l'})}{m(r_I, n_{l'}) + m(r_O+\eta, \infty)} - \frac{n_{l'}}{n} - \left[\frac{m(r_I, n_l)}{m(r_I, n_l) + m(r_O+\eta, \infty)} - \frac{n_l}{n}\right] \]
\[ = \frac{m(r_I, n_{l'})}{m(r_I, n_{l'}) + m(r_O+\eta, \infty)} - \frac{m(r_I, n_l)}{m(r_I, n_l) + m(r_O+\eta, \infty)} - \frac{1}{n}. \]
Since, by Lemma 2, $m(r_I, n_{l'}) - m(r_I, n_l) > 0$, the difference
\[ \frac{m(r_I, n_{l'})}{m(r_I, n_{l'}) + m(r_O+\eta, n_{l'})} - \frac{m(r_I, n_l)}{m(r_I, n_l) + m(r_O+\eta, n_l)} \]
is strictly positive and uniformly bounded away from zero. It follows, therefore, that, for $n$ large
enough, $\Delta$ is strictly positive. Recalling now the expression for the Coleman index, and since,
obviously, $1 - w_{l'} < 1 - w_l$, the fact that $\Delta$ is strictly positive implies that $E[C_l] < E[C_{l'}]$.
Proof of Proposition 3 A preliminary observation is that, if $n$ is large enough, then since
$n_l/n$ is bounded away from zero by $\delta_2$ it must be that $n_l \ge \tau$ (where $\tau$ is as in Theorem 1) and
therefore group $l$ must inbreed in any equilibrium, in either the one- or the two-sided scenario. The
same considerations indicate that, for large enough $n$, the group size $n_l$ can be made arbitrarily
large, in which case we can approximate its expected Coleman index as follows:
An appropriate choice of $\delta_3$ ensures that the term $\frac{\eta+r_I}{\eta+r_I+r_O}$ is arbitrarily close to 1. Thus, by
choosing $\delta_1$ small enough, the expected homophily $E[\tilde C_l]$ can be made arbitrarily close to 1, as
claimed.
Proof of Proposition 4 Consider two groups, $l$ and $l'$, whose relative population shares are
bounded away from 0 and 1, as formulated in the statement of the result. As $n$ becomes large, both
groups must exceed the threshold $\tau$ specified in Theorem 1, so both find it optimal to inbreed.
Then, by invoking the usual approximations for large $n$ to approximate the expected Coleman
index, the desired conclusion reads:
where we use the fact that the size of both groups grows unboundedly with $n$. In view of the fact
that
\[ \frac{m(\eta+r_I,\infty)}{m(\eta+r_I,\infty) + m(r_O,\infty)} = \frac{\eta+r_I}{\eta+r_I+r_O}, \]
it is immediate to see that (23) holds if, and only if, the difference $w_{l'} - w_l$ is bounded above zero,
as claimed.
Proof of Proposition 5 Given η,rO, and rI, choose δ1<1
m(η+rI,)+m(rO,). Now suppose
that 1 δ2wl1δ1for some arbitrarily given δ2< δ1. Then we claim that, if nis large, E[˜
is negative. To see this note that, if wlis bounded away from 1 and nis large enough, the sign of
C] is that of the term m(η+rI,)
m(η+rI,)+m(rO,)wl. Thus, since choice of δ1ensures that
wl1δ1>2m(η+rI,) + m(rO,)
m(η+rI,) + m(rO,).
the desired conclusion follows.
Appendix 2
Here we provide details regarding the link between observable matches and underlying strategy and
descriptive statistics for the sample used in the microlevel analysis.
The relationship between empirical matches and strategy
Let: $\eta$ denote the number of draws from the strategy-determined pool (either in-group or full
population); $r_O > 1$ and $r_I > 1$ denote the number of draws, independent of strategy, made from
the population, $N$, and the restricted in-group pool, $n_l$, respectively; $p$ denote the probability
of matching with any given draw. $\eta$, $p$, $r_O$ and $r_I$ can vary randomly across individuals but are
independent of group size, population size and strategy. These parameters are not observed by the
Let $s$ denote the strategy played by an individual, where $s = 1$ for inbreed and $s = 0$ for
outbreed. Let $s^{\star}$ denote the empirical variable where $s^{\star} = 1$ if all observed matches are from the
in-group and $s^{\star} = 0$ if at least one observed match is from the out-group. We want to know: What
can be learned about $s$ from the observation of $s^{\star}$?
$s^{\star}$ can be mapped to $s$ through the following relationship:
\[ s^{\star} = (1 - \mathbf{1}[\text{outgroup match} \mid r_O])\, s + (1 - \mathbf{1}[\text{outgroup match} \mid \eta + r_O])\,(1 - s) \]
where $\mathbf{1}[\,\cdot \mid k]$ is an indicator function taking a value of 1 if the argument is true, based on $k$ draws
from the full population, and 0 otherwise.
This implies that the probability of observing $s^{\star} = 1$ is:
\[ P[s^{\star} = 1] = (1 - P[\text{outgroup match} \mid r_O])\, P[s = 1] + (1 - P[\text{outgroup match} \mid \eta + r_O])\,(1 - P[s = 1]). \]
Clearly, we need to derive $P[\text{outgroup match} \mid r_O]$ and $P[\text{outgroup match} \mid \eta + r_O]$ in terms of the
model's parameters.
Let $n_l$ denote the (absolute) size of the in-group for an individual of race $l$, $n_O = \sum_{l' \ne l} n_{l'}$
denote the (absolute) size of the out-group and $w_l = n_l/(n_l + n_O)$ be its relative population share.
For any draw made from the full population, the probability of drawing an out-group member
is given by $1 - w_l$. The probability of any given draw being a suitable match is given by $p$. A given
individual will only be observed with in-group matches if one of the following is true:
1. For all draws from the full population, only in-group members are drawn.
2. Of the draws from the full population that result in out-group members, none are a suitable
match.
An inbreeder makes $r_O$ independent draws (with replacement) from the full population, and an
outbreeder makes $\eta + r_O$ independent draws (with replacement) from the full population. Therefore,
we can write
\[ 1 - P[\text{outgroup match} \mid r_O] = [w_l + (1 - w_l)(1 - p)]^{r_O} \]
\[ 1 - P[\text{outgroup match} \mid \eta + r_O] = [w_l + (1 - w_l)(1 - p)]^{\eta + r_O} \]
Letting $\zeta = w_l + (1 - w_l)(1 - p)$ we can write:
\[ P[s^{\star} = 1 \mid w_l, n_l] = P[s = 1 \mid w_l, n_l]\,\zeta^{r_O} + (1 - P[s = 1 \mid w_l, n_l])\,\zeta^{\eta + r_O} \]
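Taking the partial derivative of $P[s^{\star} = 1 \mid w_l, n_l]$ with respect to $n_l$, conditional on $w_l$, we can say the following:
\[ \frac{\partial P[s^{\star} = 1 \mid w_l, n_l]}{\partial n_l} = \frac{\partial P[s = 1 \mid w_l, n_l]}{\partial n_l}\left(\zeta^{r_O} - \zeta^{\eta + r_O}\right). \]
The object in parentheses on the right-hand side is strictly positive. It follows that the sign of
$\frac{\partial P[s^{\star} = 1 \mid w_l, n_l]}{\partial n_l}$ is determined by the sign of $\frac{\partial P[s = 1 \mid w_l, n_l]}{\partial n_l}$. However, the magnitude will be an underestimate.
A similar inference cannot be drawn from the behavior of $\frac{\partial P[s^{\star} = 1 \mid w_l, n_l]}{\partial w_l}$. To see this write:
\[ \frac{\partial P[s^{\star} = 1 \mid w_l, n_l]}{\partial w_l} = \frac{\partial P[s = 1 \mid w_l, n_l]}{\partial w_l}\left(\zeta^{r_O} - \zeta^{\eta + r_O}\right) + (\eta + r_O)\,p\,\zeta^{\eta + r_O - 1} + P[s = 1 \mid w_l, n_l]\;p\left(r_O\,\zeta^{r_O - 1} - (\eta + r_O)\,\zeta^{\eta + r_O - 1}\right) \]
The second right-hand-side term is positive, but the remaining terms are indeterminate. $\frac{\partial P[s = 1 \mid w_l, n_l]}{\partial w_l}$
cannot be signed based on $\frac{\partial P[s^{\star} = 1 \mid w_l, n_l]}{\partial w_l}$.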
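The mapping from the unobserved strategy to the observed indicator can be made concrete with a short numerical sketch. The parameter values below are purely hypothetical (the text states that $\eta$, $p$, $r_O$ and $r_I$ are not observed), and the function names are illustrative. The code evaluates $P[s^{\star} = 1]$ and the attenuation factor $\zeta^{r_O} - \zeta^{\eta + r_O}$ that scales the effect of group size.

```python
def zeta(w, p):
    """Probability that a single draw from the full population yields no out-group match."""
    return w + (1 - w) * (1 - p)


def prob_observed_inbreeding(prob_inbreed, w, p, eta, r_out):
    """P[s* = 1]: probability that all observed matches are in-group."""
    z = zeta(w, p)
    return prob_inbreed * z ** r_out + (1 - prob_inbreed) * z ** (eta + r_out)


def attenuation(w, p, eta, r_out):
    """Factor multiplying dP[s = 1]/dn_l in dP[s* = 1]/dn_l, holding w fixed."""
    z = zeta(w, p)
    return z ** r_out - z ** (eta + r_out)


# Hypothetical parameter values, for illustration only.
w, p, eta, r_out = 0.5, 0.5, 3, 2

factor = attenuation(w, p, eta, r_out)
# A change in P[s = 1] from 0.4 to 0.6 moves P[s* = 1] by only factor * 0.2.
change = (prob_observed_inbreeding(0.6, w, p, eta, r_out)
          - prob_observed_inbreeding(0.4, w, p, eta, r_out))
print(factor, change)
```

With these values the factor lies strictly between 0 and 1, so the observed response to group size has the same sign as the underlying one but a smaller magnitude.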
Descriptive statistics
Table 4: Racial matches: Add Health friendship nominations and U.S. population census marriages

Friendship nominations        Race of nominee
Race of nominator             White     Black     Hispanic  Asian     Other
White                         81.41     2.03      6.11      1.64      7.96
Black                         7.17      72.63     9.48      1.00      8.38
Hispanic                      25.72     9.96      52.28     3.68      7.40
Asian                         24.15     3.81      11.38     50.00     9.91

Marriages                     Race of husband
Race of wife                  White     Black     Native    Asian     Hispanic  Other
White                         93.69     1.06      0.43      0.73      2.58      1.51
Black                         1.64      96.66     0.10      0.16      0.73      0.71
Native                        53.40     6.23      25.80     1.81      6.65      6.11
Asian                         28.32     2.29      0.32      63.10     2.63      3.34
Hispanic                      19.89     2.21      0.28      0.76      73.97     2.89

Each cell reports the percent of total nominations (marriages) for nominee (husband) race by nominator
(wife) race.
Table 5: Descriptive statistics: Add Health friendship nominations and U.S. population census marriages

                                        Mean       Std. Dev.   Min.      Max.
Friendship nominations
Students (per school)                   1 178      552.54      20        2 284
Sex = female                            0.52       0.51        0         1
Race = White                            0.62       0.48        0         1
Race = Black                            0.16       0.37        0         1
Race = Hispanic                         0.16       0.37        0         1
Race = Asian                            0.05       0.22        0         1
Grade                                   9.55       1.73        0         12
Nominations (per student, total)        7.81       2.57        1         10
Nominations (per student, observed)     5.20       2.55        1         10
Out-group nominations per student       0.87       1.46        0         10
Schools (total)                         78
Students (total)                        55 676

Marriages
Race = White                            0.72       0.45        0         1
Race = Black                            0.17       0.37        0         1
Race = American Indian                  0.01       0.08        0         1
Race = Hispanic                         0.02       0.14        0         1
Race = Asian                            0.09       0.29        0         1
Educ. < High school                     0.11       0.31        0         1
Educ. = High school                     0.35       0.48        0         1
Educ. > High school                     0.54       0.50        0         1
Age                                     33.86      7.36        20        49
City population                         354 538    721 988     80 800    8 184 900
City-year observations                  620
Marriages (total)                       528 489
... Many studies have considered the impact of individual homophilic behaviors on emergent social structures [1][2][3][4][5][6]. Communities in which individuals are more likely to interact with others like them have been discussed widely as both positives (e.g., modularity potentially inhibiting disease spread [7][8][9]) and negatives (e.g., echo chambers into which important information may be less able to penetrate [10]). ...
... If we relax this assumption by arranging players spatially and setting a x,y = 0 for players more than a certain distance apart then, assuming players from each group are equally distributed spatially and so is the initial infection, this will yield identical results to our model. If we do not make those assumptions, or relax this restriction in other ways such as arranging players on a social network graph, then this is likely to create new dynamics, but is beyond the scope of this paper.2 If f is an unbounded function, this raises the concern of a degenerate case where socialization rates increase infinitely and the cost of near-constant infection is made up for by unbounded gains from socializing. ...
How self-organization leads to the emergence of structure in social populations remains a fascinating and open question in the study of complex systems. One frequently observed structure that emerges again and again across systems is that of self-similar community, i.e., homophily. We use a game theoretic perspective to explore a case in which individuals choose affiliation partnerships based on only two factors: the value they place on having social contacts, and their risk tolerance for exposure to threat derived from social contact (e.g., infectious disease, threatening ideas, etc.). We show how diversity along just these two influences is sufficient to cause the emergence of self-organizing homophily in the population. We further consider a case in which extrinsic social factors influence the desire to maintain particular social ties, and show the robustness of emergent homophilic patterns to these additional influences. These results demonstrate how observable population-level homophily may arise out of individual behaviors that balance the value of social contacts against the potential risks associated with those contacts. We present and discuss these results in the context of outbreaks of infectious disease in human populations. Complementing the standard narrative about how social division alters epidemiological risk, we here show how epidemiological risk may deepen social divisions in human populations.
For social animals, group social structure has important consequences for disease and information spread. While prior studies showed individual connectedness within a group has fitness consequences, less is known about the fitness consequences of group social structure for the individuals who comprise the group. Using a long-term dataset on a wild population of facultatively social yellow-bellied marmots ( Marmota flaviventer ), we showed social structure had largely no relationship with survival, suggesting consequences of individual social phenotypes may not scale to the group social phenotype. An observed relationship for winter survival suggests a potentially contrasting direction of selection between the group and previous research on the individual level; less social individuals, but individuals in more social groups experience greater winter survival. This work provides valuable insights into evolutionary implications across social phenotypic scales.
Online travel communities (OTCs) enable users to interact and share travel information voluntarily. Extant research has primarily focused on the content generated through user interactions but neglected how user interactions are structured. This study employed exponential random graph models to examine the formation of user interactions and the outcomes of homophily in terms of network structure across levels (actor, dyad, triad, and network). A dataset of 2,926 posts and 25,854 replies involving 9,712 users in an OTC was used. Results reveal that users’ question initiating and replying ties in OTCs exhibit significant positive structural dependencies in terms of reciprocity, activity spread, generalized transitive closure, and multiple connectivity. Homophily serves as the basis of dyadic interactions and homophilous ties evolve after formation. The study advances hospitality and tourism network research and methodology by going beyond traditional dyadic user interactions, and provides insights into user interactions in OTCs from the social network perspective.
Full-text available
Using a unique firm‐level data set from Asia, this paper examines what determined the robustness and resilience of supply chain links, i.e., the ability of maintaining links and recovering disrupted links by substitution, respectively, when firms faced economic shocks due to the spread of the coronavirus disease (COVID‐19). We find that a supply chain link was likely to be robust if the link was between a foreign‐owned firm and a firm located in the foreign‐owned firm's home country, implying that homophily on a certain dimension generates strong ties and thus supply chain robustness. We also find that firms with geographic diversity of customers and suppliers tended to increase their transaction volume with one partner while decreasing the volume with others. This evidence shows that firms with diversified customers and suppliers are resilient, mitigating the damage from supply chain disruption through the substitution of partners. Furthermore, the robustness and resilience of supply chains are found to have led to higher performance.
The frequency and type of dyadic social interactions individuals partake in has important fitness consequences. Social network analysis is an effective tool to quantify the complexity and consequences of these behaviors on the individual level. Less work has used social networks to quantify the social structure-specific attributes of the pattern of all social interactions in a network-of animal social groups, and its fitness consequences for those individuals who comprise the group. We studied the association between social structure, quantified via five network measures, and annual reproductive success in wild, free-living female yellow-bellied marmots (Marmota flaviventer). We quantified reproductive success in two ways: (1) if an individual successfully weaned a litter and (2) how many pups were weaned. Networks were constructed from 38 968 interactions between 726 unique individuals in 137 social groups across 19 years. Using generalized linear mixed models, we found largely no relationship between either measure of reproductive success and social structure. We found a modest relationship that females residing in more fragmentable social groups (i.e., groups breakable into two or more separate groups of two or more individuals) weaned larger litters. Prior work showed that yellow-bellied marmots residing in more fragmentable groups gained body mass faster-another important fitness correlate. Interestingly, we found no strong relationships between other attributes of social group structure, suggesting that in this facultatively social mammal, the position of individuals within their group, the individual social phenotype, may be more important for fitness than the emergent group social phenotype.
In this paper a second-order adaptive network model is introduced for presenting a user with content they like on a platform. The platform's method uses its so-called fake state. Simulations were performed with different scenarios for different starting values of the user and the platform. In all scenarios the platform's method can move towards the state of the user in order to achieve a form of bonding through faked homophily. After the user indeed connects to the platform, it becomes easier for the platform to move the user's preferences in a desired direction so that the platform can offer more adequate content more efficiently.
Homophily, the tendency of humans to attract each other when sharing similar features, traits, or opinions has been identified as one of the main driving forces behind the formation of structured societies. Here we ask to what extent homophily can explain the formation of social groups, particularly their size distribution. We propose a spin-glass-inspired framework of self-assembly, where opinions are represented as multidimensional spins that dynamically self-assemble into groups; individuals within a group tend to share similar opinions (intra-group homophily), and opinions between individuals belonging to different groups tend to be different (inter-group heterophily). We compute the associated non-trivial phase diagram by solving a self-consistency equation for 'magnetization' (combined average opinion). Below a critical temperature, there exist two stable phases: one ordered with non-zero magnetization and large clusters, the other disordered with zero magnetization and no clusters. The system exhibits a first-order transition to the disordered phase. We analytically derive the group-size distribution that successfully matches empirical group-size distributions from online communities.
Link prediction is one of the most widely studied problems in the area of complex network analysis, in which machine learning techniques can be applied to deal with it. The biggest drawback of the existing methods, however, is that in most cases they only consider the topological structure of the network, and therefore completely miss out on the great potential that stems from the nodal attributes. Both topological structure and nodes’ attributes are essential in predicting the evolution of attributed networks and can act as complements to each other. To bring out their full potential in solving the link prediction problem, a novel Robust Graph Regularization Nonnegative Matrix Factorization for Attributed Networks (RGNMF-AN) was proposed, which models not only the topology structure of networks but also their node attributes for direct link prediction. This model, in particular, combines two types of information, namely network topology, and nodal attributes information, and calculates high-order proximities between nodes using the Structure-Attribute Random Walk Similarity (SARWS) method. The SARWS score matrix is an indicator structural and attributed matrix that collects more useful attributed information in high-order proximities, whereas graph regularization technology combines the SARWS score matrix with topological and attribute information to collect more valuable attributed information in high-order proximities. Furthermore, the RGNMF-AN employs the ℓ2,1-norm to constrain the loss function and regularization terms, effectively removing random noise and spurious links. According to empirical findings on nine real-world complex network datasets, the use of a combination of attributed and topological information in tandem enhances the prediction performance significantly compared to the baseline and other NMF-based algorithms.
This paper considers a semiparametric model of dyadic network formation under nontransferable utilities (NTU). Such dyadic links arise frequently in real-world social interactions that require bilateral consent but by their nature induce additive non-separability. In our model we show how unobserved individual heterogeneity in the network formation model can be canceled out without requiring additive separability. The approach uses a new method we call logical differencing. The key idea is to construct an observable event involving the intersection of two mutually exclusive restrictions—derived based on weak multivariate monotonicity—on the fixed effects. Based on this identification strategy we provide consistent estimators of the network formation model under NTU. Finite-sample performance of our method is analyzed in a simulation study, and an empirical illustration using the risk-sharing network data from Nyakatoke demonstrates that our proposed method is able to obtain economically intuitive estimates.
Our societies are heterogeneous in many dimensions such as census, education, religion, ethnic and cultural composition. The links between individuals – e.g. by friendship, marriage or collaboration – are not evenly distributed, but rather tend to be concentrated within the same group. This phenomenon, called inbreeding homophily, has been related to either (social) preference for links with own-type individuals (choice-based homophily) or to the prevalence of same-type individuals in an individual's choice set (opportunity-based homophily). Choices determine the network of relations we observe, whereas opportunities pertain to the composition of the (unobservable) social network individuals are embedded in and out of which their network of relations is drawn. In this view, we propose a method that, in the presence of multiple data, allows one to distinguish between opportunity-based and choice-based homophily. The main intuition is that, with unbiased opportunities, the effect of choice-based homophily gets weaker and weaker as the size of the minority shrinks, because individuals of the minority rarely meet and have the chance to establish links together. The occurrence of homophily in the limit of very small minorities is therefore an indicator of opportunity bias. We test this idea across the dimensions of race and education on data on US marriages, and across race on friendships in US schools. Integration is a major concern of our societies, whose relevance has increased as an effect of globalization.
The prevalence of relations between individuals of the same type or community over links across types – a well known phenomenon called (inbreeding) homophily in sociology [1–8] – has been related to either opportunity-based or choice-based homophily [5,7]: while the former (also called induced homophily) refers to a prevalence of same-type neighbors in the underlying social network, the latter reflects a bias towards same-type links in the collective choice of mutual relations, among those possible in a given neighborhood of the social network. The relation between choice behavior and the underlying social network is a complex one. First, the latter is often inferred from choice behavior – friendship, marriage, co-authorship among scientists [9] – which is relatively accessible to empirical studies. Second, opportunities constrain choices to the extent that choice behavior can hardly be related to choices of the individual, but rather to the choices of the population as a whole. For example, Refs. 10, 11 show that individual choices influence in non-trivial ways the aggregate outcome, and Ref. 12 argues that biased mixing of a minority may be due to homophily of both majority and minority individuals.
We consider sampling with replacement of equiprobable groups of a fixed size m from a finite population S. Given a subset A ⊂ S, the distributions of (a) the number of distinct elements of A in a sample of size k and (b) the sample size necessary to obtain at least, say, n elements of A are given. Neat formulas are given especially for the expected values of these, as well as of some related random variables. Further, we derive an optimal strategy to collect all elements of S under the assumptions that sampling one group costs α monetary units and that it is possible to purchase the elements which are missing at the end of the sampling procedure at a price of β > α/m per element.
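To make quantity (a) concrete, the following is a minimal sketch and is not taken from the paper. It assumes each group is a uniformly random subset of m distinct elements of S, and that the k groups are drawn independently. Under that assumption a given element appears in a group with probability m/|S|, so by linearity of expectation the expected number of distinct elements of A seen after k groups is |A|·(1 − (1 − m/|S|)^k). The function and parameter names are hypothetical, chosen only for illustration.

```python
import random


def expected_distinct(a_size, pop_size, group_size, num_groups):
    """Expected number of distinct elements of A seen after num_groups draws.

    Assumes each group is a uniformly random group_size-subset of the
    population and that groups are drawn independently.
    """
    # Probability that one fixed element is missed by every group.
    p_missed = (1 - group_size / pop_size) ** num_groups
    return a_size * (1 - p_missed)


def simulate_distinct(a_size, pop_size, group_size, num_groups,
                      trials=2000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    population = range(pop_size)
    total = 0
    for _ in range(trials):
        seen = set()
        for _ in range(num_groups):
            # One group: group_size distinct elements, chosen uniformly.
            seen.update(rng.sample(population, group_size))
        # Without loss of generality, A is the first a_size elements.
        total += sum(1 for x in seen if x < a_size)
    return total / trials
```

Comparing `expected_distinct(10, 100, 5, 20)` with `simulate_distinct(10, 100, 5, 20)` should give values that agree up to sampling noise. The paper's own formulas may be stated in a different form.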
Integrated schools may still be substantively segregated if friendships fall within race. Drawing on contact theory, this study tests whether school organization affects friendship segregation in a national sample of adolescent friendship networks. The results show that friendship segregation peaks in moderately heterogeneous schools but declines at the highest heterogeneity levels. As suggested by contact theory, in schools where extracurricular activities are integrated, grades tightly bound friendship, and races mix within tracks, friendship segregation is less pronounced. The generally positive relation between heterogeneity and friendship segregation suggests that integration strategies built on concentrating minorities in large schools may accentuate friendship segregation.
The focus of this paper is the endogenous formation of peer groups. In our model, agents choose peers before making contributions to public projects, and they differ in how much they value one project relative to another. Thus, the group's preference composition affects the type of contributions made. We characterize stable groups and find that they must be sufficiently homogeneous. We also provide conditions for some heterogeneity to persist as the group size grows large. In an application in which the projects entail information collection and sharing within the group, stability requires more similarity among extremists than among moderate individuals. (JEL D03, D71, D82, D83).
Recent studies have provided conflicting evidence on the relationship between school desegregation and white enrollment stability. Pettigrew and Green (1976), Farley (1975), Rossell (1976) and Fitzgerald and Morgan (1977) have found desegregation to be unrelated to white enrollment stability. In contrast Bosco and Robin (1974), Lord (1975), Coleman et al. (1975), Munford (1976), Clotfelter (1976) and Giles (1977a; 1977b) have reported declines in white student enrollment concurrent with desegregation. The appropriate query therefore should probably not be whether school desegregation leads to white flight, but instead attention ought to focus on the conditions under which white enrollments decrease as a result of desegregation. Among the most frequently cited correlates of white withdrawal is the level of black concentration. There is disagreement, however, about the structure of this relationship. Some studies have found a linear relationship, whereas others have suggested the presence of a tipping point beyond which white withdrawal accelerates and schools become all-black. The present study reexamines the relationship between percent black enrollment and white enrollment change at both the district and the school levels. The analysis focuses on 60 districts and approximately 1,600 schools located in Southern SMSAs. Higher percent black enrollments are found to be associated with the rate of white withdrawal at both the district and the school levels. In both cases this relationship appears to be curvilinear with white withdrawals increasing exponentially with black enrollments over 30%.
Longitudinal sociometric data on adolescent friendship pairs, friends-to-be, and former friends are examined to assess levels of homophily on four attributes (frequency of current marijuana use, level of educational aspirations, political orientation, and participation in minor delinquency) at various stages of friendship formation and dissolution. In addition, estimates are developed of the extent to which observed homophily in friendship dyads results from a process of selection (assortative pairing), in which similarity precedes association, and the extent to which it results from a process of socialization, in which association leads to similarity. The implications of the results for interpreting estimates of peer influence derived from cross-sectional data are discussed.