Research

Cite this article: Reali F, Chater N, Christiansen MH. 2018 Simpler grammar, larger vocabulary: How population size affects language. Proc. R. Soc. B 285: 20172586. http://dx.doi.org/10.1098/rspb.2017.2586

Received: 16 November 2017
Accepted: 2 January 2018

Subject Category: Neuroscience and cognition
Subject Areas: computational biology, evolution, cognition
Keywords: cultural evolution, language change, social structure, population size, language complexity

Author for correspondence: Morten H. Christiansen (e-mail: christiansen@cornell.edu)

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3971847.
Simpler grammar, larger vocabulary: How population size affects language

Florencia Reali (1), Nick Chater (2) and Morten H. Christiansen (3,4)

(1) Department of Psychology, Universidad de los Andes, G230, Cra. 1 Nro. 18A-12, Bogotá 11001000, Colombia
(2) Behavioural Science Group, Warwick Business School, University of Warwick, Coventry CV4 7AL, UK
(3) Department of Psychology, Cornell University, Uris Hall, Ithaca, NY 14853, USA
(4) The Interacting Minds Centre and School for Culture and Communication, Aarhus University, 8000 Aarhus, Denmark

FR, 0000-0003-3524-3873; NC, 0000-0002-9745-0686; MHC, 0000-0002-3850-0655
Languages with many speakers tend to be structurally simple while small
communities sometimes develop languages with great structural complexity.
Paradoxically, the opposite pattern appears to be observed for non-structural
properties of language such as vocabulary size. These apparently opposite
patterns pose a challenge for theories of language change and evolution.
We use computational simulations to show that this inverse pattern can
depend on a single factor: ease of diffusion through the population. A popu-
lation of interacting agents was arranged on a network, passing linguistic
conventions to one another along network links. Agents can invent new con-
ventions, or replicate conventions that they have previously generated
themselves or learned from other agents. Linguistic conventions are either
Easy or Hard to diffuse, depending on how many times an agent needs to
encounter a convention to learn it. In large groups, only linguistic conventions
that are easy to learn, such as words, tend to proliferate, whereas small groups
where everyone talks to everyone else allow for more complex conventions,
like grammatical regularities, to be maintained. Our simulations thus suggest
that language, and possibly other aspects of culture, may become simpler at
the structural level as our world becomes increasingly interconnected.
1. Introduction
It has often been observed [1–4] that the properties of human languages appear
to be influenced by the size and degree of isolation of the linguistic community.
Small, isolated linguistic communities often develop languages with great
structural complexity, elaborate and opaque morphology, rich patterns of
agreement and many irregularities [1–5], and it has been argued that such ‘mature’ features of languages require long interactions in small, close-knit societies [6–8]. By contrast, languages with large communities of speakers,
such as Mandarin or English, appear to be structurally simpler. Language compositionality has been shown to be inversely correlated with irregularities and nonlinear morphology [3]: regular languages are more frequent in large-sized
communities, while irregular, morphologically complex languages tend to
arise in small-sized ones. Computer simulations have shown that linguistically
‘marked’, and hence complex, patterns arise more easily in small populations
[9,10] and that compositional structures tend to emerge more extensively for
larger groups [11]. The causal role of the size of the linguistic community is,
moreover, further indicated by the historical tendency towards structural
simplification as a language gains an ever-larger community of speakers [12].
But an apparently opposite pattern appears to be observed in relation to non-structural properties of language: languages with large linguistic communities tend to have larger vocabularies of content words. For example, the vocabulary of widespread languages, such as English, appears to have
grown rapidly in historical times, and is typically estimated
to have many hundreds of thousands of words, including
those with highly specialized and technical meanings [13].
Despite their frequent spectacular structural complexity,
languages spoken by small bands of hunter-gatherers are
typically assumed to have smaller vocabularies, although
reliable data for such languages are difficult to gather [14].
An analysis of Polynesian languages indicates, moreover,
that larger linguistic communities both create more new
words and lose fewer existing words over time [15]. These
contrasting patterns pose a challenge for theories based on
the cultural evolution of language. Recently, theorists have suggested that the erosion of complexity in larger language communities arises from the greater proportion of second
language learners [1,16]. But why do such arguments for sim-
plification not also apply to the lexicon?
One possibility is that structural and lexical aspects of
language might diffuse through different mechanisms. For
example, adult–child interactions might be the primary vehicle for regularizing morphology or syntax (see [17,18] for contrasting perspectives) and adult–adult interactions might be the primary vehicle for lexical innovations. Moreover, there may be differential impacts of language contact on structural and lexical aspects of language: lexical items diffuse across languages
more readily [19]. Such effects might be amplified to the
extent that structural and lexical aspects of language share a
fixed communicative burden, so that, for example, simple
morphology must be compensated by a larger vocabulary.
While such factors may play a role, we focus on a more par-
simonious alternative: that the opposite relationships between
population size and lexical versus structural complexity
depend on a single parameter: ease of diffusion. Structural
aspects of language diffuse slowly because they are difficult to
learn, are absorbed slowly and piecemeal by first language
learners, and often present persistent challenges for second
language learners [20]. Words, by contrast, can often be acquired
from just a few exposures [21]. An account based on ease-of-
learning suggests that an increasing population of speakers
should, even within the category of words, lead to an increasing
prevalence of easy-to-learn words (e.g. concrete words) over
hard-to-learn words (e.g. abstract words). A recent corpus
analysis of two centuries of American English does, indeed,
show an increasing proportion of concrete words [22].
To illustrate this scenario, we divide properties of language,
as a first approximation, into two basic categories—Easy and
Hard—requiring, respectively, few or many exposures to be
acquired by a new speaker. Easy properties of the language
can rapidly be transmitted across the linguistic community.
As the community grows in size, so does the number of
members who can spontaneously modify or invent new
Easy properties (such as lexical items) that can diffuse across
the community. Hence, large communities will end up with
large inventories of Easy features. Conversely, in large linguis-
tic communities, speakers will have minimal interactions with
many other speakers, so that typical interactions between indi-
viduals will be too limited to transmit the Hard linguistic
property successfully.
If correct, this simple mechanism should apply to cultural
evolution more broadly. Indeed, new and structurally complex,
and difficult to acquire, cultural forms develop in small, tight-
knit communities who interact intensely, as in the birth of
bebop in 1940s New York [23], or the lindy hop at the Savoy Ball-
room in 1930s Harlem [24]. By contrast, mass cultural forms
tend to be structurally simple and easily learned. For example,
we are now exposed to, and recall, a huge number of popular
tunes; but most are harmonically and melodically simple; and
statistical analysis suggests that modern popular music appears
to be gradually getting simpler over the decades [25].
Can these intuitions be made precise by computer simu-
lation? Building on prior preliminary work [26], we created
a novel innovate-and-propagate (IAP) process, operating
over populations of simulated agents. Agents are arranged
on a network, so that agents connected by a link on the net-
work can ‘converse’ and hence, potentially pass linguistic
‘conventions’ to one another. Each agent is not only able to
‘invent’ entirely new conventions but can also replicate con-
ventions that they have previously generated themselves or
learned from other agents (i.e. agents to which they are con-
nected by links in the network). When an agent produces a
convention (whether novel or a replication), it propagates
that convention to one of its neighbours.
Our simulations show that the size of the network can
potentially have opposite effects on the richness of different
aspects of the language. A simple quantitative change—the
ease of learning of an item—responds qualitatively in entirely
different ways to population size. Linguistic innovations that
are relatively easy to learn (such as new lexical items or modifi-
cations to existing ones) increase in number as a linguistic
community grows, because the number of potential innovators
increases and innovations can spread more rapidly. By contrast,
small linguistic communities favour linguistic innovations that
are hard to learn (such as, we suggest, structural changes in
the language), because they require multiple interactions
between individual speakers for their continued existence.
2. Simulations
(a) The model
To capture the dynamics of individuals interacting with one
another, either conversing by way of old conventions or
inventing new ones, we use a modified version of the Chinese
restaurant process [27], which we call the IAP process. The
Chinese restaurant process is a widely used probabilistic
model defining the frequency distribution over a potentially
limitless number of types (e.g. linguistic conventions,
words, categories). It embodies the assumption that the
‘rich-get-richer’—the probability of a token of an existing
type is proportional to its current frequency (i.e. the chance
of the new diner sitting down at a given table is proportional
to the number of diners already at that table), while also
allowing the creation of new types (i.e. a diner being seated
at a previously unoccupied table).
In our extension to the IAP, we view each agent as corresponding to a ‘restaurant’ with a finite, but infinitely extendable, number of ‘tables’, i.e. conventions. Each time
the agent generates a convention, it chooses an existing con-
vention with a probability proportional to the number of
previous tokens of that convention; this is equivalent to seating
each new customer in the restaurant at a table in proportion to
the number of customers already seated at that table. But it is
also possible that an entirely novel convention will be gener-
ated (a new table in the restaurant is created, and the new
customer becomes the first person sitting at that table). This
occurs with probability 1/(M + 1) (where M is the number of current restaurant customers, i.e. stored tokens).
As described thus far, each agent generates conventions
entirely independently, not sharing those conventions with
the rest of the linguistic community. IAP introduces a
simple extension of the Chinese restaurant process to deal
with this. At each iteration, every agent ‘utters’ a convention
and passes it to a randomly chosen immediate neighbour. For
each agent, the probability of generating an existing conven-
tion is determined by the sum of the number of times that it
has, itself, previously generated that convention added to the
sum of the number of times it has received that convention
from an immediately neighbouring agent (provided the
agent has already learnt that convention). Thus, in this
model, agents tend not merely to generate what they have
generated before; but also to generate what they have
‘heard’ (and learned) from neighbouring agents.
As the simulation progresses, agents will invent conven-
tions and pass them on to each other. Thus, initially the
number of conventions used by the agents (i.e. the complex-
ity of the language) will gradually increase. However, the
number of conventions is limited by restrictions in cultural
transmission. Two versions of information transmission are
implemented. In the horizontal transmission version, conven-
tions are passed among immortal but forgetful peers. Each
time an agent picks a convention (new or old), then, for
each of the M convention tokens that it currently stores,
there is a probability that this token is forgotten. In the verti-
cal transmission version, peers eventually ‘die-off’ (their
convention repertoire disappearing with them) and are
replaced by new peers who are initially ‘blank slates’.
So far, we have not distinguished between Easy conventions
(which can be learned from another agent by minimal
exposure—these correspond to lexical items) and Hard conven-
tions (which require multiple exposures—these correspond to
structural properties of the language). To get started, we make
the simplest of distinctions between them: Easy conventions
can be learned by an agent from a single exposure. Once a con-
vention has been generated by a neighbour, an agent can
immediately generate that convention. Hard conventions can
only be learned from two exposures: only when an agent has
encountered two examples of the exact same convention from
its neighbours (whether from the same or different neighbour),
will this convention be seated at a new table (representing that
convention in the agent).
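To make this learning rule concrete, here is a minimal sketch in R, the language used for the simulations (cf. [32]). The data structures and the function name are our own illustrative assumptions rather than the authors' code, and the exact bookkeeping of exposures for Hard conventions is one reasonable reading of the text.

```r
# Illustrative sketch of the Easy/Hard learning rule (not the authors' code).
# An agent is a list with two named integer vectors:
#   repertoire - token counts for conventions the agent has learned
#   pending    - exposure counts for Hard conventions not yet learned
receive_convention <- function(agent, convention, type = c("easy", "hard")) {
  type <- match.arg(type)
  if (convention %in% names(agent$repertoire)) {
    # Already known: the heard token simply reinforces the convention.
    agent$repertoire[convention] <- agent$repertoire[convention] + 1
  } else if (type == "easy") {
    # Easy conventions are acquired from a single exposure.
    agent$repertoire[convention] <- 1
  } else {
    # Hard conventions need two exposures before they enter the repertoire.
    prev <- if (convention %in% names(agent$pending)) agent$pending[[convention]] else 0
    if (prev + 1 >= 2) {
      agent$repertoire[convention] <- 1
      agent$pending <- agent$pending[names(agent$pending) != convention]
    } else {
      agent$pending[convention] <- prev + 1
    }
  }
  agent
}

# Example: a fresh agent needs two exposures to learn a Hard convention.
agent <- list(repertoire = integer(0), pending = integer(0))
agent <- receive_convention(agent, "h1", "hard")   # first exposure: not yet learned
agent <- receive_convention(agent, "h1", "hard")   # second exposure: now in repertoire
```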
(b) Networks
Agents are represented as nodes in a non-directed graph
(one in which edges have no orientation) and links between
neighbouring agents are represented by edges between
nodes. Networks are characterized by three parameters: n is the number of nodes (i.e. the population size), k is the mean nodal degree (i.e. the number of links (neighbours) that an agent can communicate with, averaged across agents) and C is the clustering coefficient (i.e. a measure of
the degree to which nodes (agents) in a graph tend to cluster
together).
The structure of our networks is inspired by real social
networks, based on recent work finding quantitative relations
between city size and the structure of human interaction net-
works from mobile communication records in Portugal and
the UK [28]. Mobile phone communication has been argued
to be a reliable proxy for the strength of individual-based
social interactions [29]. The results in [28] revealed that
the number of average contacts per mobile phone (nodal
degree, k) grows superlinearly with city population size,
according to the well-defined scaling relation: k ∝ n^(β−1), where n is population size. These results fit prior theoretical
work suggesting that superlinear scaling stems from the
nature of human interactions [30]. Interestingly, the prob-
ability that an individual’s contacts are also connected
with each other—i.e. the clustering coefficient C—remained
constant (C ≈ 0.25) across city sizes [28].
(c) Network sampling
Twenty-five networks were sampled using the method devel-
oped in [31], which consists of a graph Hamiltonian that
allows the creation of random networks close to specified
nodal degree and clustering coefficient values. Sampling con-
verges to networks with desired specified connectivity (details
on the algorithm and implementation can be found in [31]).
For sampling, values of k and C were set so that they matched real social networks described in [28]. For each value of population size (n = 30, 50, 100, 200 and 500), five networks were sampled using a target value of k so that k = n^(β−1), where β was set to a constant value of 1.677 for all population sizes n, yielding target mean degree k of 10, 14.1, 22.6, 36.1 and 67.1 for population sizes n = 30, 50, 100, 200 and 500, respectively. Note that, as n increases, so does the number of neighbours that an agent has on average. The value of β was set to be the minimum so that an agent has (at least) 10 neighbouring agents for the smallest population size (n = 30). The target value of C was set to a constant of 0.25 across all sampling—i.e. the invariable value in real social networks, regardless of population size [28].
Twenty networks were sampled for each population size n, from which five were selected that had
parameters close to the target values. Results are shown in
table 1. All simulations were implemented using R [32].
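As a quick check on the connectivity targets quoted above, the scaling relation can be evaluated directly; the following one-off R snippet is our own illustration, not part of the authors' pipeline.

```r
# Target mean nodal degree k = n^(beta - 1) with beta = 1.677,
# for the five population sizes used in the simulations.
n    <- c(30, 50, 100, 200, 500)
beta <- 1.677
round(n^(beta - 1), 1)   # compare with the quoted targets of 10, 14.1, 22.6, 36.1 and 67.1
```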
(d) Implementation
A single run of our simulation is composed of many iter-
ations. On a given iteration, each agent ‘utters’ one
convention to one of its neighbours, who is randomly
picked from the set of all its neighbours in the graph. The
convention produced by the agent can be either part of its
repertoire (conventions that have been previously generated
or learned by the agent) or invented anew. Conventions are
divided into two types: Easy and Hard to learn. Each time
an agent ‘invents’ a new convention, that convention is
randomly defined to belong to one of these two categories
with probability 0.5.
We use an extension of the Chinese restaurant stochastic
sampling process to model an agent’s selection of a convention
to generate. The probability of choosing a given convention, c,
is proportional to the number of c tokens that it has previously generated or heard from its neighbours. More precisely, the probability of selecting an already used convention is defined as

P(convention = c) = t_c / (M + 1),    (2.1)

where t_c is the number of tokens of convention c that are part of the agent’s repertoire and M is the number of convention tokens that the agent has stored in memory, thus Σ_c t_c = M. The probability of inventing a convention anew is defined as

P(convention = anew) = 1 / (M + 1).    (2.2)
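Read as a sampling rule, equations (2.1) and (2.2) amount to the following R sketch. The data layout and names (e.g. next_id, a counter used to label new conventions) are our own assumptions; the authors' implementation is available from the repository listed under Data accessibility.

```r
# Sketch of one production step defined by equations (2.1) and (2.2).
# `counts` is a named integer vector: counts[c] is t_c, the number of tokens of
# convention c in the agent's repertoire (own productions plus learned hearings),
# so M = sum(counts).
sample_convention <- function(counts, next_id) {
  M <- sum(counts)
  if (runif(1) < 1 / (M + 1)) {
    # Invent a brand-new convention (a new "table"), eq. (2.2);
    # its type is Easy or Hard with probability 0.5 each.
    list(convention = paste0("c", next_id), new = TRUE,
         type = sample(c("easy", "hard"), 1))
  } else {
    # Otherwise pick an existing convention c with probability t_c / M,
    # i.e. t_c / (M + 1) renormalized over the existing conventions, eq. (2.1).
    list(convention = sample(names(counts), 1, prob = counts), new = FALSE)
  }
}
```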
The value of M increases over subsequent iterations. However, conventions are eventually lost, either by token forgetting (Poisson forgetting in the horizontal transmission version) or by death of the agent (vertical transmission version). Poisson forgetting is defined at the level of tokens. Each time an agent picks a convention (new or old), then, for each of the M convention tokens that it currently stores, there is a probability p that this token is ‘forgotten’. This would imply that, on average, Mp tokens are forgotten each time a convention is updated. Given that each time a new convention token is generated, one new token is added to M, M will be in balance when, on average, Mp = 1. Forgetfulness of tokens
captures the idea that cognitive constraints affect the cultural
evolution of language [33].
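A minimal R sketch of this forgetting step, under our own assumptions about data layout: thinning each convention's token count binomially is equivalent to forgetting each of the M stored tokens independently with probability p.

```r
# Poisson (token-level) forgetting: each stored token survives with probability
# 1 - p, so on average M*p tokens are lost per production and the memory size
# M settles around 1/p (about 200 or 500 tokens for p = 1/200 or 1/500).
forget_tokens <- function(counts, p) {
  kept <- vapply(counts, function(t) rbinom(1, size = t, prob = 1 - p), integer(1))
  kept[kept > 0]   # conventions whose last token is forgotten drop out of the repertoire
}
```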
In the vertical transmission version of the model, each
time an agent conveys a convention, there is a probability p
that an agent ‘dies off’—i.e. all the ‘tokens’ in their ‘restau-
rant’ would disappear. That location in the network would
still exist, but is completely cleared, and the ‘dead agent’ is
just replaced by a ‘blank slate’ new agent at the same location
in the network (like being born into the social network).
Agents can learn conventions from neighbours. The learned
convention becomes part of the agent’s repertoire and can be
sampled during its own production. In the current simulations,
Easy conventions are defined as those that are learned from only
a single exposure, whereas Hard conventions require at least
two exposures to be learned.
When an agent uses a convention to ‘communicate’ with its
neighbours, what is the probability that this communication will
be successful? We take ‘successful’ communication to imply
only that the ‘receiving’ agent also knows that same
convention. We are interested in determining the number of
Easy and Hard conventions that are successfully used at the
population level. Thus, a convention is considered ‘successful’
when it has been learned or generated by one of the agent’s
neighbours at some point across iterations. Additionally, to get
a better sense of successful communication, we measured the
proportion of neighbours that share each agent’s conventions.
For vertical and horizontal transmission, five separate
runs of 1000 iterations were carried out across a range of the
parameters n (population size) and p (probability of forgetting or dying off). At the end of each run, three measures were taken and compared as a function of population size: (i) the (absolute and relative) number of Easy and Hard successful conventions that remained part of the agents’ memory (tables in the restaurant), (ii) the (absolute and relative) number of Easy and Hard
conventions that remained part of the memory of at least 10%
of the agents in the population, and (iii) the mean proportion
of neighbours sharing an agent’s conventions—that is, for
each Easy and Hard convention and for each agent, the pro-
portion of neighbouring agents who had that convention as
part of their repertoire was counted. This quantity was averaged
over all conventions-agents.
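A sketch of measure (iii) in R, again with our own illustrative data layout: for every agent and every convention in its repertoire, compute the fraction of neighbours that also have that convention, then average over all convention-agent pairs.

```r
# Measure (iii): mean proportion of neighbours sharing an agent's conventions.
# `repertoires` is a list of character vectors (conventions known by each agent);
# `neighbours`  is a list of integer vectors (each agent's neighbours in the graph).
mean_shared_proportion <- function(repertoires, neighbours) {
  props <- unlist(lapply(seq_along(repertoires), function(i) {
    vapply(repertoires[[i]], function(conv) {
      mean(vapply(neighbours[[i]], function(j) conv %in% repertoires[[j]], logical(1)))
    }, numeric(1))
  }))
  mean(props)   # average over all convention-agent pairs
}
```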
(e) Results
Absolute and relative values of Hard and Easy conventions
after 1000 iterations are shown in figure 1, reflecting a
general trend towards an increasing frequency of Easy
conventions compared to Hard conventions as the popu-
lation size increases, in both the vertical and horizontal
transmission cases. When the population is small, Hard con-
ventions represent a sizable proportion of the total number
of conventions. As population size increases and the overall
number of conventions grows, the absolute and relative
number of Hard conventions decreases. Both the absolute
and relative patterns remain the same across the different
conditions, suggesting a robust effect of population size on
the proportion of Hard versus Easy to learn conventions.
The predictions of the model are, we stress, qualitative: pre-
dicting a cross-over between the prevalence of Easy and
Hard conventions as population size increases. The popu-
lation size at which the cross-over occurs depends on
parameters, such as the difference between the number of
learning trials required for Easy and Hard items (see
electronic supplementary material, appendix).
3. Discussion
The results suggest that the differential effects of population
size on structural complexity and vocabulary size can be
accommodated within a parsimonious model of cultural trans-
mission constrained by one cognitive constraint: Ease of
Learning. Linguistic innovations that are easy to learn tend
to increase in number as a linguistic community grows,
because the number of potential innovators increases, and
innovations can spread more rapidly. By contrast, small
linguistic communities favour linguistic innovations that are
hard to learn because they require multiple interactions
between individual speakers. It is likely, of course, that
many additional forces have shaped the relative development
of different aspects of linguistic complexity [2]. One factor that
may partly underlie the Easy/Hard distinction considered
Table 1. Graph connectivity properties: mean connectivity values averaged across the five graphs selected for each value of population size, n = 30, 50, 100, 200 and 500 (s.d., standard deviations).

population size n | mean β in k = n^(β−1) | mean nodal degree k | nodal degree, s.d. | mean clustering coefficient, C | clustering coefficient, s.d.
30  | 1.676 | 9.9  | 0.08  | 0.251 | 0.007
50  | 1.685 | 14.8 | 0.2   | 0.256 | 0.002
100 | 1.684 | 23.4 | 0.089 | 0.242 | 0.001
200 | 1.681 | 36.9 | 0.9   | 0.250 | 0.001
500 | 1.655 | 62.4 | 2.2   | 0.246 | 0.009
here concerns the degree to which properties of language can
be learned independently. Perhaps an additional reason that
learning a lexical item is relatively easy is that word meanings
can, to a considerable degree, be learned independently of one
another. By contrast, structural aspects of language may inter-
lock in more complex ways, making the propagation of such
linguistic innovations more difficult.
More broadly, it is interesting to speculate whether other
aspects of cultural evolution may be subject to the pressures
described here. For example, perhaps an increase in commu-
nity size might be associated with a reduction in the
prevalence of complex dances, music, rituals, myths or reli-
gious beliefs, but an increase in the prevalence of simpler
variants (we leave aside skills relevant to survival, such as
tool use, whose diffusion will depend on objective measures
of efficacy, as well as direct person-to-person contact
[34–36]). Of course, such effects may, to some extent, be
counteracted by the ability of people to self-assemble into
Figure 1. Panels (a) and (b) display the results corresponding to the average number of successful conventions per agent—that is, conventions in the agent’s repertoire that can be understood by at least one of its neighbours. Panels (c) and (d) display the results corresponding to the average number of conventions that are shared by at least 10% of the population. Left panels (a,c) display absolute numbers, and right panels (b,d) display relative proportions of conventions after 1000 iterations, obtained for increasing values of population size (displayed on the x-axis). Panel (e) displays the mean proportion of neighbours that share an agent’s convention, averaged across all convention-agents. Blue lines correspond to Easy conventions, and red lines correspond to Hard conventions. Dashed lines correspond to results of the horizontal transmission version (circles correspond to the agent’s probability of Poisson forgetting p = 1/500, while squares correspond to a probability of p = 1/200). Solid lines correspond to results of the vertical transmission model (circles correspond to the agent’s probability of dying off p = 1/500, while squares correspond to p = 1/200).
small specialist groups, whether face-to-face or virtual, and formal (educational institutions) or informal (salons, discussion groups, artistic movements), to innovate and propagate
cultural forms of high complexity. In the absence of the abil-
ity for people to self-organize in this way, our simulations
raise the possibility that language and culture might
become unrelentingly simpler, at the structural level, as
human societies become increasingly interconnected.
Data accessibility. All source code, data and results are available from:
https://github.com/mhchristiansen/lang-paradox.
Authors’ contributions. F.R., N.C. and M.H.C. conceived and designed
the study. F.R. conducted the simulations and analyses and wrote
the first draft of the paper. N.C. and M.H.C. edited the paper. All
authors gave final approval for publication.
Competing interests. The authors have no competing interests.
Funding. N.C. was supported by ERC grant no. 295917-RATIONALITY, the ESRC Network for Integrated Behavioural Science
(grant no. ES/K002201/1), the Leverhulme Trust (grant no.
RP2012-V-022) and Research Councils UK Grant EP/K039830/1.
Acknowledgements. We thank Daniel Nettle and an anonymous reviewer
for valuable comments on this work.
References

1. Lupyan G, Dale R. 2010 Language structure is partly determined by social structure. PLoS ONE 5, e8559. (doi:10.1371/journal.pone.0008559)
2. Trudgill P. 2011 Sociolinguistic typology: social determinants of linguistic structure and complexity. Oxford, UK: Oxford University Press.
3. Wray A, Grace GW. 2007 The consequences of talking to strangers: evolutionary corollaries of socio-cultural influences on linguistic form. Lingua 117, 543–578. (doi:10.1016/j.lingua.2005.05.005)
4. Nettle D. 2012 Social scale and structural complexity in human language. Phil. Trans. R. Soc. B 367, 1829–1836. (doi:10.1098/rstb.2011.0216)
5. Haspelmath M, Dryer M, Gil D, Comrie B. 2008 The world atlas of language structures online. Munich, Germany: Max Planck Digital Library.
6. Wohlgemuth J. 2010 Language endangerment, community size and typological rarity. In Rethinking universals: how rarities affect linguistic theory (eds J Wohlgemuth, M Cysouw), pp. 255–277. Berlin, Germany: De Gruyter.
7. Trudgill P. 2015 Societies of intimates and linguistic complexity. In Language structure and environment: social, cultural, and natural factors (eds R de Busser, RJ LaPolla), pp. 133–147. Amsterdam, The Netherlands: John Benjamins.
8. Nettle D. 1999 Using social impact theory to simulate language change. Lingua 108, 95–117. (doi:10.1016/S0024-3841(98)00046-1)
9. Nettle D. 1999 Is the rate of linguistic change constant? Lingua 108, 119–136. (doi:10.1016/S0024-3841(98)00047-3)
10. Sampson G, Gil D, Trudgill P (eds). 2009 Language complexity as an evolving variable. Oxford, UK: Oxford University Press.
11. Vogt P. 2007 Group size effects on the emergence of compositional structures in language. In Advances in Artificial Life (eds F Almeida e Costa, LM Rocha, E Costa, I Harvey, A Coutinho). ECAL 2007. Lecture Notes in Computer Science, vol. 4648. Berlin, Germany: Springer.
12. McWhorter J. 2002 What happened to English? Diachronica 19, 217–272. (doi:10.1075/dia.19.2.02wha)
13. Goulden R, Nation P, Read J. 1990 How large can a receptive vocabulary be? Appl. Linguist. 11, 341–363. (doi:10.1093/applin/11.4.341)
14. Pawley A. 2006 On the size of the lexicon in preliterate language communities: comparing dictionaries of Australian, Austronesian and Papuan languages. In Favete linguis: studies in honour of Viktor Krupa (eds J Genzor, M Bucková), pp. 171–191. Bratislava, Slovakia: Institute of Oriental Studies.
15. Bromham L, Hua X, Fitzpatrick TG, Greenhill SJ. 2015 Rate of language evolution is affected by population size. Proc. Natl Acad. Sci. USA 112, 2097–2102. (doi:10.1073/pnas.1419704112)
16. Dale R, Lupyan G. 2012 Understanding the origins of morphological diversity: the linguistic niche hypothesis. Adv. Complex Syst. 15, 1150017. (doi:10.1142/S0219525911500172)
17. Lightfoot D. 1999 The development of language: acquisition, change, and evolution. Oxford, UK: Blackwell.
18. Bybee J. 2015 Language change. Cambridge, UK: Cambridge University Press.
19. King R. 2000 The lexical basis of grammatical borrowing. Amsterdam, The Netherlands: Benjamins.
20. Clahsen H, Felser C, Neubauer K, Sato M, Silva R. 2010 Morphological structure in native and nonnative language processing. Lang. Learn. 60, 21–43. (doi:10.1111/j.1467-9922.2009.00550.x)
21. Trueswell JC, Medina TN, Hafri A, Gleitman LR. 2013 Propose but verify: fast mapping meets cross-situational word learning. Cogn. Psychol. 66, 126–156. (doi:10.1016/j.cogpsych.2012.10.001)
22. Hills TT, Adelman JS. 2015 Recent evolution of learnability in American English from 1800 to 2000. Cognition 143, 87–92. (doi:10.1016/j.cognition.2015.06.009)
23. DeVeaux SK. 1997 The birth of bebop: a social and musical history. Berkeley, CA: University of California Press.
24. Miller N. 1996 Swingin’ at the Savoy: the memoir of a jazz dancer. Philadelphia, PA: Temple University Press.
25. Serrà J, Corral Á, Boguñá M, Haro M, Arcos JL. 2012 Measuring the evolution of contemporary western popular music. Sci. Rep. 2, 521. (doi:10.1038/srep00521)
26. Reali F, Chater N, Christiansen M. 2014 The paradox of linguistic complexity and community size. In The evolution of language (eds EA Cartmill, S Roberts, H Lyn, H Cornish), pp. 270–277. Singapore: World Scientific.
27. Pitman J. 2006 Combinatorial stochastic processes. Berlin, Germany: Springer.
28. Schläpfer M, Bettencourt LMA, Grauwin S, Raschke M, Claxton R, Smoreda Z, West GB, Ratti C. 2014 The scaling of human interactions with city size. J. R. Soc. Interface 11, 20130789. (doi:10.1098/rsif.2013.0789)
29. Saramäki J, Leicht EA, López E, Roberts SGB, Reed-Tsochas F, Dunbar RIM. 2014 Persistence of social signatures in human communication. Proc. Natl Acad. Sci. USA 111, 942–947. (doi:10.1073/pnas.1308540110)
30. Bettencourt LMA. 2013 The origin of scaling in cities. Science 340, 1438–1441. (doi:10.1126/science.1235823)
31. House T. 2014 Heterogeneous clustered random graphs. Europhys. Lett. 105, 68006. (doi:10.1209/0295-5075/105/68006)
32. R Development Core Team. 2008 R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
33. Christiansen MH, Chater N. 2008 Language as shaped by the brain. Behav. Brain Sci. 31, 489–558. (doi:10.1017/S0140525X08004998)
34. Henrich J. 2004 Demography and cultural evolution: why adaptive cultural processes produced maladaptive losses in Tasmania. Am. Antiq. 69, 197–214. (doi:10.2307/4128416)
35. Powell A, Shennan S, Thomas MG. 2009 Late Pleistocene demography and the appearance of modern human behavior. Science 324, 1298–1301. (doi:10.1126/science.1170165)
36. Vaesen K, Collard M, Cosgrove R, Roebroeks W. 2016 Population size does not explain past changes in cultural complexity. Proc. Natl Acad. Sci. USA 113, E2241–E2247. (doi:10.1073/pnas.1520288113)