
Words as alleles: connecting language evolution with Bayesian learners to models of genetic drift

Florencia Reali* and Thomas L. Griffiths

Department of Psychology, 3210 Tolman Hall, MC 1650, University of California at Berkeley,

Berkeley, CA 94720-1650, USA

Scientists studying how languages change over time often make an analogy between biological and

cultural evolution, with words or grammars behaving like traits subject to natural selection. Recent

work has exploited this analogy by using models of biological evolution to explain the properties of

languages and other cultural artefacts. However, the mechanisms of biological and cultural evolution are

very different: biological traits are passed between generations by genes, while languages and concepts

are transmitted through learning. Here we show that these different mechanisms can have the same results,

demonstrating that the transmission of frequency distributions over variants of linguistic forms by Bayesian

learners is equivalent to the Wright–Fisher model of genetic drift. This simple learning mechanism thus

provides a justification for the use of models of genetic drift in studying language evolution. In addition

to providing an explicit connection between biological and cultural evolution, this allows us to define a

‘neutral’ model that indicates how languages can change in the absence of selection at the level of linguistic

variants. We demonstrate that this neutral model can account for three phenomena: the s-shaped curve of

language change, the distribution of word frequencies, and the relationship between word frequencies and

extinction rates.

Keywords: language evolution; genetic drift; Bayesian inference; neutral models

1. INTRODUCTION

Natural languages, like species, evolve over time. The

mechanisms of language evolution are quite different

from those underlying biological evolution, with learning

being the primary mechanism by which languages are

transmitted between people. However, accounts of

language evolution often appeal to forces that have ana-

logues in biological evolution, such as selection or

directed mutation. Recent computational work has

emphasized the role of selective forces by focusing on

the consequences of a language for the ‘fitness’ of

its speakers in terms of communication success (Cavalli-

Sforza & Feldman 1983; Hurford 1989; Oliphant 1994;

Komarova & Nowak 2001). Other studies have empha-

sized the effects of differential learnability of competing

linguistic variants, with selection or directed mutation

operating at the level of sounds, words or grammatical

structures (Batali 1998; Pearl & Weinberg 2007; Christiansen & Chater 2008). These functional explanations provide an intuitive and appealing account of language evolution. However, it is possible that the changes we see in languages over time could be explained without appealing to such factors, resulting from processes analogous to genetic drift.

Evaluating the role of selective forces in language evolution requires developing neutral models for language evolution, characterizing how languages can be expected to change simply as a consequence of being passed from one learner to another in the absence of selection or

directed mutation. Neutral models have come to play a

significant role in the modern theory of biological evol-

ution, where they account for variation seen at the

molecular level and provide a tool for testing for the

presence of selection (Kimura 1983). The work men-

tioned in the previous paragraph illustrates that there

are at least two levels at which evolutionary forces can

operate in language evolution: at the level of entire

languages (through the fitness of speakers or directed

mutation when languages are passed from one speaker

to another), and at the level of individual linguistic

variants (with particular sounds, words or grammatical

structures being favoured over others by learners). In

this paper, we define a model that is neutral at the level

of linguistic variants, indicating how languages can

change in the absence of selection for particular variants.

Defining a model of language evolution that is neutral

at the level of linguistic variants requires an account

of learning that is explicit about the inductive biases of

learners—those factors that make some variants easier

to learn than others—so that it is clear that these biases

do not favour particular variants. We model learning as

statistical inference, with learners using Bayes’ rule to

combine the clues provided by a set of utterances with

inductive biases expressed through a prior distribution

over languages. We define a neutral model by using a

prior that assigns equal probability to different variants

of a linguistic form. While it is neutral at the level of

variants, this approach allows for the possibility that lear-

ners have more general expectations about the structure

of a language—such as the amount of probabilistic

variation in the language, and the tendency for new

* Author for correspondence (florencia.reali@gmail.com).

Electronic supplementary material is available at http://dx.doi.org/10.1098/rspb.2009.1513 or via http://rspb.royalsocietypublishing.org.

Proc. R. Soc. B (2010) 277, 429–436

doi:10.1098/rspb.2009.1513

Published online 7 October 2009

Received 21 August 2009

Accepted 21 September 2009


This journal is © 2009 The Royal Society


variants to arise—that can result in forces analogous to

directed mutation at the level of entire languages.

This allows us to explore the consequences of these

expectations in the context of language evolution, deter-

mining which phenomena can be explained as a result

of high-level inductive biases about the structure of

languages without appealing to selective forces at the

level of linguistic variants.

The basic learning problem we consider is estimation

of the frequencies of a set of linguistic variants. Learning

a language involves keeping track of the frequencies of

variants of a linguistic form at various levels of represen-

tation, including phonology, morphology and syntax.

The assumption that learners estimate the probabilities

of different variants is shared by many researchers in the

area of computational linguistics and sentence processing

(e.g. Bod et al. 2003) and is supported by a growing body

of experimental work in language acquisition (see Saffran

2003 for a review). From this perspective, a learner needs

to estimate a probability distribution over variants. We

assume priors on such distributions that differ in the

amount of variation expected in a language but remain

neutral between variants. We translate this Bayesian

model of individual learning into a model of language

evolution by considering what happens when learners

learn from frequencies generated by other learners. The

resulting process of ‘iterated learning’ (figure 1a) can

be studied to examine the dynamics of language evolution

and the properties of the languages that it produces

(Kirby 2001; Griffiths & Kalish 2007; Kirby et al. 2007).

We show that this simple model of language evolution

with Bayesian learners has two surprising consequences.

First, it is equivalent to a classic neutral model of allele

transmission that is well known in population genetics,

the Wright–Fisher model (Fisher 1930; Wright 1931).

This equivalence involves treating the linguistic variants

as different alleles of a gene, and establishes a mathemat-

ical connection between biological and cultural evolution.

Second, the model reproduces several basic regularities in

the structure and evolution of languages—the s-shaped

curve of language change, the power-law distribution of

word frequencies and the inverse power-law relationship

between word frequencies and extinction rates—

suggesting that these regularities can be explained without

needing to appeal to forces analogous to selection or

directed mutation at the level of linguistic variants.

The plan of the paper is as follows. First, we introduce

the Bayesian model of individual learning in more detail,

explaining how we define priors corresponding to differ-

ent expectations about the properties of languages.

Next, we show how this model can be used to make

predictions about language evolution via iterated learn-

ing, and outline the connections to the Wright–Fisher

model and the implications of these connections. Finally,

we present a series of analytical results and simulations

illustrating that the model reproduces the three basic

regularities seen in the structure and evolution of

human languages mentioned above.


2. MODELING INDIVIDUAL LEARNING

We model individual learning of frequency distributions

by assuming that our learners use Bayesian inference, a

rational procedure for belief updating that explicitly

represents the expectations of learners (Robert 1997).

Assume that a learner is exposed to N occurrences of a

linguistic form, such as a sound, word or grammatical

construction, partitioned over K different variants. Let

the vectors x = [x_1, x_2, ..., x_K] and u = [u_1, u_2, ..., u_K]

denote the observed frequencies and the estimated prob-

abilities of the K variants, respectively. The learner’s

expectations are expressed in a prior probability distri-

bution, p(u). After seeing the data x, the learner assigns

posterior probabilities p(u|x) specified by Bayes' rule,

p(u|x) = p(x|u) p(u) / ∫ p(x|u) p(u) du,   (2.1)

where p(x|u) is the likelihood, indicating the probability of

observing the frequencies x from the distribution u, being

the probability of obtaining the frequencies in x via N

draws from a multinomial distribution with parameters

u. Each draw is a statistically independent event. The

learner estimates the parameter u from a sample of N

tokens produced by a speaker before producing any utter-

ances himself. The posterior combines the learner’s

expectations—represented by the prior—with the


Figure 1. (a) Iterated learning: each learner sees data—i.e. utterances—produced by the previous learner, forms a hypothesis about the distribution from which the data were produced, and uses this hypothesis to produce the data that will be supplied to the next learner. (b) Prior distribution of u_1 for the case of two competing variants (K = 2), for values of (i) a/2 = 0.1, (ii) a/2 = 1, (iii) a/2 = 5. When a/2 = 1 the density function is simply a uniform distribution. When a/2 < 1 the prior is such that most of the probability mass is in the extremes of the distribution, favouring the 'regularization' of languages towards deterministic rules. When a/2 > 1, the learner tends to weight both variants equally, expecting languages to display probabilistic variation. (c) The effects of these expectations on the evolution of frequencies for values of a/2 indicated at the top of each column. Each panel shows changes in the probability distribution of one of the two variants (v_1) (horizontal axis) over five iterations (vertical axis). The frequency of v_1 was initialized at x_1 = 5 from a total frequency of N = 10. White cells have zero probability, darker grey indicates higher probability.



evidence about the underlying distribution provided by

the observed frequencies.

We select the prior distribution to be neutral between

linguistic variants, with no variant being favoured a

priori over the others. This assumption differs from

other computational models that emphasize selection or directed mutation at the level of linguistic variants, as

discussed above. However, being neutral between variants

is not enough to specify a prior distribution: learners can

also differ in their expectations about the amount of

probabilistic variation in a language. For example, lear-

ners facing unpredictable variation may either reproduce

this variability accurately or collapse it towards more

deterministic rules—a process referred to as regularization

(Hudson & Newport 2005; Reali & Griffiths 2009). A

way to capture these expectations, while maintaining

neutrality between variants, is to assume that the prior

is a K-dimensional Dirichlet distribution, a multivariate

generalization of the Beta distribution (Bernardo &

Smith 1994). This is a standard prior used in Bayesian

statistics (Bernardo & Smith 1994). In the context of

language, Dirichlet priors have been recently used in

models of iterated learning (Kirby et al. 2007) and

language acquisition (Goldwater et al. 2009; Reali &

Griffiths 2009).

More formally, we assume that the prior p(u) is a

symmetric K-dimensional Dirichlet distribution with

parameters a/K, giving

p(u) = [Γ(a) / Γ(a/K)^K] ∏_{k=1}^{K} u_k^{a/K - 1},   (2.2)

where Γ(·) is the generalized factorial function. By using a

distribution that is symmetric we maintain neutrality

between different variants. When K = 2, the prior reduces

to a Beta distribution—denoted as Beta(a/2, a/2). The

use of the same parameter, a/K, for all variants ensures

that the prior does not favour one variant over the

others, with the mean of the prior distribution being

the uniform distribution over variants for all values of a

and K. However, the value of a/K determines the expec-

tations that learners have about probabilistic variation.

When a/K < 1 the learner tends to assign high probability

to one of the K competing variants. This situation reflects

a tendency to regularize languages, with probabilistic vari-

ation being reduced towards more deterministic rules.

When a/K > 1, the learner tends to weight all competing

variants equally, producing distributions closer to the

uniform distribution over all variants (see figure 1b for

examples). Thus, despite the apparent complexity of the

formula, the Dirichlet prior captures a wide range of

biases that are intuitive from a psychological perspective.

Some intuitions for the consequences of using differ-

ent priors can be obtained by considering how they

affect the predictions that learners would make about

probability distributions. Under the model defined

above, the probability that a learner assigns to the next

observation being variant k after seeing x_k instances of that variant from a total of N is (x_k + a/K)/(N + a) (see

the electronic supplementary material for details). This

formula captures two aspects of the learners’ behaviour.

First, the probability that the learner assigns to a variant

is approximately proportional to its frequency x_k. This

means that individual variants get strengthened by use.

Second, the parameter a/K acts like a number of

additional observations of each variant. The largest

effect of these additional observations will be when

there are no actual observations, with x_k = 0. In this

case, a learner expecting a more deterministic language

(with a/K small) will assign a very small probability to

the unobserved variant, while a learner expecting prob-

abilistic variation (with a/K large) will assign it a much

higher probability. The prior thus expresses the willing-

ness of learners to consider unobserved variants part of

the language.
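This predictive rule is simple enough to state directly in code. The following is our own illustrative sketch (the function name is invented, not from the paper), computing the posterior predictive probability (x_k + a/K)/(N + a) for each variant under a symmetric Dirichlet prior:

```python
def predictive_probs(x, a):
    """Posterior predictive probability of each of K variants under a
    symmetric Dirichlet(a/K, ..., a/K) prior, after observing counts x."""
    K = len(x)
    N = sum(x)
    return [(xk + a / K) / (N + a) for xk in x]

# A regularizing learner (a/K < 1) gives an unseen variant little weight;
# a learner expecting probabilistic variation (a/K > 1) gives it more.
print(predictive_probs([10, 0], a=0.2))   # ≈ [0.990, 0.010]
print(predictive_probs([10, 0], a=10.0))  # = [0.75, 0.25]
```

The second call shows the pseudo-count interpretation: a/K = 5 acts like five prior observations of each variant.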

This model can be extended to cover learning a distri-

bution over an unbounded set, such as the vocabulary of a

language. In this case, word production can be viewed

intuitively as a cache model: each word in the language

is either retrieved from a cache or generated anew.

Using an infinite-dimensional analogue of the Dirichlet

prior (see the electronic supplementary material for

details), the probability of a variant that occurred with

frequency xkis xk/(N þ a), while the probability of a com-

pletely new variant is a/(N þ a). The parameter a thus

controls the tendency to produce new variants, as

before. There is also a two-parameter generalization of

the infinite-dimensional Dirichlet model, which gives a

variant that occurred with frequency x_k probability (x_k - d)/(N + a), while the probability of a completely new variant is (dK+ + a)/(N + a), where d ∈ (0, 1) is a second parameter allowing K+, the number of variants for which x_k > 0, to influence the probability of producing

a new variant (see the electronic supplementary material

for details).
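A minimal sketch of this cache-style production rule (ours, not the authors' code; the variant labels and function names are invented for illustration). Each token is either an existing variant k, chosen with probability (x_k - d)/(N + a), or a brand-new variant, with probability (dK+ + a)/(N + a):

```python
import random

def produce_token(counts, a, d=0.0, rng=random):
    """Sample one token from the two-parameter predictive distribution.
    counts: dict mapping existing variants to their frequencies x_k."""
    N = sum(counts.values())
    K_plus = len(counts)
    r = rng.uniform(0, N + a)
    for k, xk in counts.items():
        r -= xk - d  # mass (x_k - d) for each existing variant
        if r < 0:
            return k
    # remaining mass (d * K_plus + a) goes to an unseen variant
    return "new_%d" % K_plus  # placeholder label for a new variant

# Growing a vocabulary of N = 20 tokens from scratch:
rng = random.Random(0)
counts = {}
for _ in range(20):
    k = produce_token(counts, a=2.0, d=0.5, rng=rng)
    counts[k] = counts.get(k, 0) + 1
```

Setting d = 0 recovers the one-parameter infinite-dimensional Dirichlet model described above.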

3. MODELS OF CULTURAL AND BIOLOGICAL

TRANSMISSION

(a) Cultural transmission through

iterated learning

We can now consider the predictions that the model of

individual learning defined in the previous section

makes about language evolution. As shown in figure 1a,

we translate this model of frequency estimation into a

model of language evolution by assuming that each learn-

er not only observes the frequencies of a set of variants

and estimates their probabilities, but then produces a

sample of variants from the estimated distribution, with

this sample providing the frequencies given to the next

learner. This process of iterated learning provides a

simple way to examine the consequences of this form of

cultural transmission in isolation from other factors that

might be involved in language evolution, such as the

pressure to create a shared communication system

(Komarova & Nowak 2001, 2003; Steels 2003) or the

effects of learning from multiple teachers (Niyogi 2006).

This emphasis on a single factor is consistent with our

goal of providing a simple null hypothesis against which

other models can be compared: by identifying

which properties of human languages can be produced by

iterated learning alone, we can begin to understand when

explanations that appeal to other factors are necessary.

More formally, we assume that each learner selects a

hypothesis u based on the observed data, and then gener-

ates the data presented to the next learner—in this case a

frequency vector x—by sampling from the distribution

p(x|u) associated with that hypothesis. We take u to be




the distribution used in making predictions about the next

variant, with the probability of v_k being u_k = (x_k + a/K)/(N + a) (we motivate this choice and discuss the conse-

quences of using other estimators in the electronic

supplementary material). This results in a stochastic pro-

cess defined on estimates u and frequency vectors x. Since

the frequencies generated by a learner depend only on the

frequencies generated by the previous learner, this

stochastic process is a Markov chain. It is possible to

analyse the dynamics of the process by computing a tran-

sition matrix indicating the probability of moving from

one frequency value to another across generations, and

to characterize its asymptotic consequences by identifying

the stationary distribution to which the Markov chain

converges as the number of generations increases (for

other analyses of this kind, see Griffiths & Kalish 2007;

Kirby et al. 2007).

It is straightforward to calculate the transition matrix

on frequencies for particular values of a, K and N,

allowing us to examine how frequencies evolve over

generations. Figure 1c shows the changes in the frequency

of one variant (of two) over five generations, for three

different values of a/K. However, providing a general

analysis of the dynamics and asymptotic consequences of

this process—identifying how languages will change and

what languages will be produced by iterated learning—is

more challenging. In the remainder of this section, we

show that an equivalence between this model of cultural

transmission and a model of biological transmission of

alleles allows us to use standard results from population

genetics to investigate the consequences of iterated

learning.
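The iterated-learning Markov chain on frequencies takes only a few lines to simulate. This is a sketch under our reading of the model (the estimator u_k = (x_k + a/K)/(N + a) from the text, with the next learner's data drawn multinomially), not the authors' own code:

```python
import random

def next_generation(x, a, rng=random):
    """One step of iterated learning: form the predictive distribution
    u_k = (x_k + a/K)/(N + a), then sample N new tokens from it."""
    K, N = len(x), sum(x)
    u = [(xk + a / K) / (N + a) for xk in x]
    new_x = [0] * K
    for _ in range(N):  # N multinomial draws
        r = rng.random()
        for k, uk in enumerate(u):
            r -= uk
            if r < 0:
                new_x[k] += 1
                break
        else:  # guard against floating-point underflow of the total
            new_x[-1] += 1
    return new_x

rng = random.Random(1)
x = [5, 5]  # both variants equally frequent at the start, N = 10
for _ in range(50):
    x = next_generation(x, a=0.1, rng=rng)
# With small a (a regularizing prior) the chain tends to linger
# near the extremes x = [10, 0] or [0, 10], as in figure 1c.
```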

(b) Biological transmission and

the Wright–Fisher model

The Wright–Fisher model (Fisher 1930; Wright 1931)

describes the behaviour of alleles evolving under

random mating in the absence of selection. In the sim-

plest form of the model, we have just two alleles which

have frequencies x_1 and x_2. If the next generation is pro-

duced by a process equivalent to selecting an allele at

random from the previous generation and duplicating it,

then the values of x_1 in the next generation will come

from a binomial distribution in which N draws are taken

where the probability of v_1 is u_1 = x_1/N. Introducing mutation into the model modifies the value of u_1 (and thus implicitly u_2 = 1 - u_1). In the presence of symmetric mutation, allele types can mutate to any other allele type with equal probability θ. Then x_1 in the next generation follows a binomial distribution with N draws and u_1 = ((1 - θ)x_1 + θ(N - x_1))/N. This generalizes naturally to K variants, with the frequencies x being drawn from a multinomial distribution where the probabilities u are given by u_k = (x_k(1 - θ) + (N - x_k)θ/(K - 1))/N.
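Before mutation is added, the pure-drift form of this update already yields the neutral model's best-known prediction: a variant's probability of eventual fixation equals its current relative frequency, x_1/N. A sketch (ours; the function name is illustrative) that checks this from the transition matrix rather than by sampling:

```python
from math import comb

def fixation_probs(N, iters=5000):
    """P(allele v_1 eventually fixes | current count i), for the two-allele
    Wright-Fisher model without mutation, found by iterating h <- P h
    while holding the absorbing boundaries h[0] = 0 and h[N] = 1 fixed."""
    P = []
    for i in range(N + 1):
        u = i / N  # pure drift: reproduction probability is just i/N
        P.append([comb(N, j) * u**j * (1 - u)**(N - j)
                  for j in range(N + 1)])
    h = [0.0] * (N + 1)
    h[N] = 1.0
    for _ in range(iters):
        h = [sum(P[i][j] * h[j] for j in range(N + 1))
             for i in range(N + 1)]
        h[0], h[N] = 0.0, 1.0
    return h

# Neutral prediction: fixation probability equals relative frequency.
print(fixation_probs(10)[3])  # ≈ 0.3
```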

The Wright–Fisher model with symmetric mutation is

a neutral model, characterizing how the proportions of

alleles change over time as a result of genetic drift and

mutation. This model can be analysed by observing that

it defines a stochastic process on allele frequencies x

and their corresponding probabilities u. This stochastic

process can be reduced to a Markov chain on x or u,

with the transition matrix and stationary distribution

being used to characterize the dynamics and asymptotic


consequences of this process, respectively. Standard

results from population genetics show that the stationary

distribution of the vector u in the K variant case with

mutation is approximated by a Dirichlet distribution with parameters 2Nθ/(K - 1) (see Ewens 2004 for

details).

The Wright–Fisher model also extends beyond the

case where there are only finitely many allelic variants.

A significant amount of research in population genetics

has been directed at analysing the Wright–Fisher model

with infinitely many alleles (see Ewens 2004). As in the

symmetric version of the finite model, every allele can

mutate to any other allele with equal probability. Since

the number of possible alleles is infinite, the probability

of mutation to an existing allele is negligible. Thus, all

mutants are considered to represent new allelic types

not currently or previously seen in the population. An

analytical expression for the stationary distribution over

frequencies can also be obtained in this case, known as

the Ewens sampling formula (Ewens 1972; see the

electronic supplementary material for details).

(c) Equivalence between cultural and

biological transmission

The Markov chain produced by iterated learning is equiv-

alent to that produced by the Wright–Fisher model of

genetic drift, provided we equate linguistic variants with

alleles. The basic structure of the biological and linguistic

models of the evolution of frequency distributions is the

same: in both cases, a value of u is computed from the fre-

quencies x, and the next value of x is obtained by sampling

from a multinomial distribution with parameter u.

We can show that these two processes are equivalent

by demonstrating that the values taken for u are the

same in the two processes. With K variants, equivalence

to iterated learning by Bayesian learners can be shown by taking the mutation rate to be θ = a(K - 1)/(K(a + N)). In this case, the frequency estimate for

allele k is u_k = (x_k + a/K)/(N + a), identical to the esti-

mate of u derived in the iterated learning model. Note

that the equivalence holds in general and not for a

restricted set of parameters. This is because for each

value of the mutation rate θ, there exist values of a and K

satisfying the condition for equivalence. It is easy to see

that the converse is also true. We can thus use results

from population genetics to characterize the dynamics

and stationary distribution of the Markov chain defined

by iterated learning, indicating what kind of languages

will emerge over time.
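The equivalence is easy to check numerically. In this sketch (ours) we compare the Bayesian learner's predictive probabilities with the Wright–Fisher reproduction probabilities under the mutation rate θ = a(K - 1)/(K(a + N)) that makes the two estimators agree:

```python
def bayes_probs(x, a):
    """Predictive probabilities of the Bayesian learner."""
    K, N = len(x), sum(x)
    return [(xk + a / K) / (N + a) for xk in x]

def wf_probs(x, theta):
    """Wright-Fisher reproduction probabilities with symmetric mutation."""
    K, N = len(x), sum(x)
    return [(xk * (1 - theta) + (N - xk) * theta / (K - 1)) / N
            for xk in x]

# Any frequency vector gives the same u under the two models
# once theta = a(K - 1)/(K(a + N)).
x, a = [3, 7, 0], 1.5
K, N = len(x), sum(x)
theta = a * (K - 1) / (K * (a + N))
print(bayes_probs(x, a))
print(wf_probs(x, theta))  # identical up to floating point
```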

Other recent work has pointed out connections

between the Wright–Fisher model in population genetics

and theories of language change (Baxter et al. 2006). The

model presented here differs from this previous work in a

number of ways. First, our learning model provides a

way to explicitly relate language change to individual

inductive biases. This allows us to investigate the conse-

quences of iterated learning by manipulating the

parameters of the prior. For example, in the case of K

variants, the stationary distribution of each probability

u_k is approximated by a Dirichlet distribution with parameters 2a/(K(1 + a/N)). The stationary distribution can

now be interpreted as follows: languages in which a

single variant dominates are favoured when the prior





parameters meet the condition 2a/(K(1 + a/N)) < 1, while languages in which all variants are weighted equally will prevail when 2a/(K(1 + a/N)) > 1.

Another way in which our approach differs from pre-

vious related work is that our mathematical formulation

allows us to generalize the biological–linguistic equival-

ence to the case of an unbounded number of variants.

Following an argument similar to that for the finite

case, iterated learning with Bayesian learners considering

distributions over an unbounded vocabulary can be

shown to be equivalent to the Wright–Fisher model for

infinitely many alleles (see the electronic supplementary

material for a detailed proof). The stationary distribution

for this process is thus the Ewens sampling formula. The

two-parameter generalization discussed above has not

previously been proposed in population genetics, and its

stationary distribution is unknown. Consequently, we

assess the predictions of this model through computer

simulation, although a related analytic result is presented

in the electronic supplementary material.

4. IMPLICATIONS FOR THE PROPERTIES

OF LANGUAGES

Determining how languages change purely as a result of

iterated learning allows us to explore the forces that

drive language evolution. Our model is neutral with

respect to both selection and directed mutation at the

level of linguistic variants: no fitness-related factor is

associated with any of the competing linguistic variants,

and we assume symmetric mutation between them.

Thus, our analysis of iterated learning identifies a ‘null

hypothesis’ that is useful for evaluating claims about the

importance of selective pressures at this level in account-

ing for statistical regularities found in the form and

evolution of languages, playing a role similar to that of

neutral models of genetic drift such as the Wright–

Fisher model in biology (Kimura 1983). The model

also allows for a kind of directed mutation at the level

of entire languages, with expectations about the amount

of probabilistic variation in a language shaping the struc-

ture of that language over time. These expectations play

a role analogous to setting the mutation rate in the

Wright–Fisher model. The equivalence between these

models implies that the outcome of the transmission of

linguistic variants will be the same as that expected

under the neutral version of the Wright–Fisher model.

In this section, we show that this model can account for

three basic regularities in the form and evolution of

languages.

(a) S-shaped curves in language change

When old linguistic variants are replaced by new ones, an

s-shaped curve is typically observed in plots of frequency

against time (Yang 2001; Pearl & Weinberg 2007). This

phenomenon has been documented in numerous studies

of language change (e.g. Weinreich et al. 1968; Bailey 1973; Kroch 1989). An example is the way in which

modern verb–object (VO) word order gradually replaced

object–verb (OV) in English (Pearl & Weinberg 2007;

Clark et al. 2008). Speakers’ preferences shifted from

OV phrases such as you God’s commandment keep will in

Old English to modern VO phrases such as you will

keep God’s commandment (Clark et al. 2008). Existing

models of this phenomenon have typically assumed that

the data used by the learner are filtered in some way

(Pearl & Weinberg 2007) or explore the effect of the

different distributions with which sentences are presented

to the learner (Niyogi & Berwick 1997). Other models

have assumed that the emerging linguistic variant has

functional advantages over the replaced one (Yang

2001; Christiansen & Chater 2008; Clark et al. 2008).

By contrast, our neutral model assumes that the learner

uses the entire input and that variants carry no functional

advantages. Consider the specific case of two competing

variants v_1 and v_2, such as the VO and OV word

orders. When learners have a prior near or below the threshold for favouring regularization (a < N/(N - 1)), the model predicts that the frequency of v_1 should gradually converge to extreme values, meaning that v_1 can emerge

even when its initial frequency is zero (figure 2). Similar

to the case of biological genetic drift, the probability of

a variant appearing and going to fixation is very small

for large values of N. However, historical documentation

of the s-shaped curve comes from cases where a change is

observed, which corresponds to conditioning on fixation

taking place. We therefore restrict our analyses to such

cases. When we condition on v_1 eventually taking over

the language, the trajectory of its frequencies follows an

s-shaped curve, provided learners have priors favouring


Figure 2. Changes in the probability (vertical axis) of a new variant (v_1) over 50 iterations of learning (horizontal axis) as a function of the value of a. Total frequency of v_1 and v_2 was N = 50, but the same effects are observed with larger values of N. (a) Changes in the probability of v_1 using a = 0.05, corresponding to a prior that favours regularization: (i) probability changes when conditioning on initial frequency only (x_1 = 0); (ii) changes in the probability of v_1 when conditioning on both initial frequency (x_1 = 0) and final frequency (x_1 = 50), corresponding to the situation when the new variant (v_1) eventually takes over the language. Under these conditions, s-shaped curves are observed, consistent with historical linguistic data. (b) Changes in the probability of v_1 when a = 10 is used, corresponding to a prior that favours probabilistic variation: (ii) case conditioning on initial frequency only (x_1 = 0); (i) case of conditioning on both initial (x_1 = 0) and final (x_1 = 50) frequencies, illustrating that the appearance of the s-shaped curve depends on the expectations of the learners. White cells have zero probability, darker grey indicates higher probability.
