# Information theoretic models in language evolution.

**ABSTRACT** We study a model for language evolution that was introduced by Nowak and Krakauer ([M.A. Nowak and D.C. Krakauer, The evolution of language, PNAS 96 (14) (1999) 8028-8033]). We analyze discrete distance spaces and prove a conjecture of Nowak for all metrics with a positive semidefinite associated matrix. This natural class of metrics includes all metrics studied by different authors in this connection. In particular, it includes all ultra-metric spaces. Furthermore, the role of feedback is explored and multi-user scenarios are studied. In all models we give lower and upper bounds for the fitness.


Rudolf Ahlswede, Erdal Arikan, Lars Bäumer, Christian Deppe

Universität Bielefeld, Fakultät für Mathematik, Postfach 100131, 33501 Bielefeld, Germany


Supported in part by INTAS-00-738.

Electronic Notes in Discrete Mathematics 21 (2005) 97–100. 1571-0653/\$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.endm.2005.07.002. www.elsevier.com/locate/endm

Human language is used to store and transmit information. Therefore there is significant interest in mathematical models of language development. These models aim to explain how natural selection can lead to the gradual emergence of human language. Nowak and coworkers created such a mathematical model [2], [3]. A language $L$ in Nowak’s model is a system $L = (O, X^n, d, r)$ consisting of the following elements:

(i) $O$ is a finite set of objects, $O = \{o_1, \dots, o_N\}$.

(ii) $X$ is a finite set of phonemes, which model the elementary sounds in the spoken language. The set $X^n$ models the set of all possible words of length $n$.

(iii) Each object is mapped to a word by the function $r : O \to X^n$. Thus, the words for all objects have the same length $n$. The model allows several objects to be mapped to the same word. With some abuse of notation, we use $L$ to denote the set of all words in the language, $L = \{x^n : x^n = r(o_i) \text{ for some } 1 \le i \le N\}$.

(iv) $d : X \times X \to \mathbb{R}_+$ is a measure of distance between phonemes, i.e., a function that is symmetric, $d(x,y) = d(y,x)$, and non-negative, $d(x,y) \ge 0$, with $d(x,y) = 0$ if and only if $x = y$. The distance between two words is defined by

$$d_n(x^n, y^n) = \sum_{i=1}^{n} d(x_i, y_i),$$

where $x^n, y^n \in X^n$, $x^n = (x_1, \dots, x_n)$, $y^n = (y_1, \dots, y_n)$.

(v) The model postulates that the conditional probability of the event that the listener understands the word $y^n \in L$ given that the speaker utters the word $x^n \in L$ is given by

$$p(y^n \mid x^n) = \frac{\exp(-d_n(x^n, y^n))}{\sum_{v^n \in L} \exp(-d_n(x^n, v^n))}.$$

Nowak defined the fitness of a language $L$ with words over $X^n$ as

$$F(L, X^n) = \sum_{x^n \in L} p(x^n \mid x^n).$$
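The definitions in (iv), (v) and the fitness formula can be evaluated directly for small examples. The following is a minimal sketch, assuming a two-letter alphabet `"ab"` and the phoneme distance that is 0 for equal letters and 1 otherwise; the function names (`word_distance`, `understanding_prob`, `fitness`, `hamming`) are illustrative and not taken from the paper, and a language is represented as a list of distinct words.

```python
import math
from itertools import product

def word_distance(x, y, d):
    """d_n(x^n, y^n) = sum_i d(x_i, y_i) for two words of equal length."""
    return sum(d(a, b) for a, b in zip(x, y))

def understanding_prob(x, y, language, d):
    """p(y^n | x^n): exp(-d_n(x, y)) normalised over all words v^n of the language."""
    norm = sum(math.exp(-word_distance(x, v, d)) for v in language)
    return math.exp(-word_distance(x, y, d)) / norm

def fitness(language, d):
    """F(L, X^n): sum over the words x^n in L of p(x^n | x^n)."""
    return sum(understanding_prob(x, x, language, d) for x in language)

def hamming(a, b):
    # Assumed phoneme distance for this illustration: 0 if equal, 1 otherwise.
    return 0.0 if a == b else 1.0

full = list(product("ab", repeat=2))   # all four words of length 2
sparse = [("a", "a"), ("b", "b")]      # two words at word distance 2

print("fitness of the full language:  ", fitness(full, hamming))
print("fitness of the sparse language:", fitness(sparse, hamming))
```

In this toy case the full language has the larger fitness, even though its words are easier to confuse with one another than the two words of the sparse language.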

Nowak was interested in the maximum possible fitness for languages. So, he defined the fitness of the space $X^n$ as

$$F(X^n) = \sup\{F(L, X^n) : L \text{ is a language over } X^n\}$$

and he posed the determination of the quantity $F(X^n)$ for general spaces $(X,d)$ as an open problem. He conjectured that $F(X^n) = (F(X))^n$ when $(X,d)$ is a metric space, i.e., when the distance function $d$ satisfies the triangle inequality $d(x,y) + d(y,z) \ge d(x,z)$. We show that Nowak’s conjecture is true for a class of spaces defined by a certain condition on the distance function. Let us call a space $(X,d)$ a p.s.d. space if the matrix $[e^{-d(x,y)}]_{x \in X, y \in X}$ is positive semidefinite. The main result is the following.

Theorem 1. For any p.s.d. space $(X,d)$ where $X$ is a finite set, the fitness is given by

$$F(X^n) = F(X)^n = e^{nR_0} \tag{1}$$

where

$$R_0 = R_0(X,d) = -\log \min_{\lambda} \sum_{x} \sum_{y} \lambda_x \lambda_y e^{-d(x,y)} \tag{2}$$

and the minimum is over all probability distributions $\lambda = (\lambda_1, \dots, \lambda_{|X|})$ on $X$.
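Formula (2) can be checked numerically in the smallest case. The sketch below assumes a two-letter alphabet indexed by 0 and 1 with distance 1 between distinct letters, and approximates the minimum by a plain grid search over distributions $(t, 1-t)$; the grid size `steps` and all function names are illustrative choices, not part of the paper. It then compares $e^{R_0}$ with the fitness of the language that uses both letters, for which each $p(x \mid x)$ equals $1/(1+e^{-1})$.

```python
import math

def quadratic_form(lam, d):
    """sum_x sum_y lam_x * lam_y * exp(-d(x, y)) for a distribution lam on range(len(lam))."""
    k = len(lam)
    return sum(lam[x] * lam[y] * math.exp(-d(x, y))
               for x in range(k) for y in range(k))

def hamming(x, y):
    # Assumed distance for this illustration: 0 if equal, 1 otherwise.
    return 0.0 if x == y else 1.0

# Grid search over distributions (t, 1 - t) on the two-letter alphabet.
steps = 1000
best_value, best_t = min(
    (quadratic_form((i / steps, 1 - i / steps), hamming), i / steps)
    for i in range(steps + 1)
)
R0 = -math.log(best_value)

# Fitness of the language that uses both letters.
fitness_full = 2 / (1 + math.exp(-1))

print("minimising t:", best_t)
print("exp(R0):     ", math.exp(R0))
print("F(L, X) for L = X:", fitness_full)
```

For this space the minimum is attained at the uniform distribution, and $e^{R_0}$ coincides with the fitness of the full language, as formula (1) with $n = 1$ suggests. A grid search is only practical for very small alphabets; larger spaces would need a proper convex optimisation routine.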


In other words, for p.s.d. spaces Nowak’s conjecture holds and the fitness is given by powers of $e^{R_0}$. For any p.s.d. space, there exists a “channel” $[W(z|x)]_{x \in X, z \in Z}$ for some set $Z$ such that (i) $W(z|x) \ge 0$ for all $x, z$, (ii) $\sum_z W(z|x) = 1$ for all $x$, and (iii) $e^{-d(x,y)} = \sum_z \sqrt{W(z|x)W(z|y)}$ for all $x, y$. The parameter $R_0$ equals the cutoff rate of the channel $W$ in the standard information-theoretic sense. This indicates a connection between Nowak’s model and standard information-theoretic models. Indeed, the proof of the above result makes use of Gallager’s results on reliability exponents and specifically his “parallel channels theorem” [1, p. 149] to achieve the single-letterization demanded by Nowak’s conjecture.

Examples of spaces $(X,d)$ for which Nowak’s conjecture is settled by the above result are (i) the Hamming space, where $X$ is an arbitrary finite set and $d(x,y) = 1 - \delta_{x,y}$ (with $\delta_{x,y}$ the Kronecker delta) is the Hamming metric, (ii) $X$ is a finite set of reals and $d(x,y) = |x - y|$, and (iii) $X$ is a finite set of reals and $d(x,y) = (x - y)^2$. All of these spaces are p.s.d.

Some other partial results are as follows: (i) All finite ultra-metric spaces are p.s.d. (Recall that in an ultra-metric space any three points $a, b, c$ satisfy $d(a,b) \le \max\{d(a,c), d(c,b)\}$.) (ii) All metric spaces with 3 and 4 elements are p.s.d. (iii) There exist metric spaces with 5 elements which are not p.s.d. (iv) For every metric space $(X,d)$ where $X$ is a subset of reals, there exists a scaling $d_\alpha(x,y) = \alpha d(x,y)$ for some $\alpha > 0$ and for all $x, y \in X$ such that the space $(X, d_\alpha)$ is p.s.d. (v) Nowak’s conjecture does not hold if we do not allow multiplicity of words.

We have shown that the product conjecture is true in particular for the Hamming model. The optimal fitness is attained if one uses all possible words in the language. In general, the memory of the individuals is restricted. For this reason we look for languages which use only a fraction of all possible words but have large fitness.

We consider simple and perfect codes: the Hamming codes ([?]). With $F_H(n)$ we denote the fitness of a Hamming code of length $n$.

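Theorem 2. The fitness of the Hamming code approaches asymptotically the optimal fitness. Not only

$$\lim_{n \to \infty} \frac{1}{n} F_H(n) = \lim_{n \to \infty} \frac{1}{n} F(X^n) \quad \text{and} \quad \lim_{n \to \infty} \frac{F_H(n)}{F(X^n)} = 1,$$

but even the stronger condition

$$\lim_{n \to \infty} \bigl( F_H(n) - F(X^n) \bigr) = 0$$

holds.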
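The ratio in Theorem 2 can be examined for the first few Hamming codes by brute force. This is a minimal sketch under several assumptions that are not spelled out in the text: the alphabet is binary, the distance is 1 between distinct letters, the code is built from the parity-check matrix whose columns are the binary expansions of $1, \dots, n$, and $F(X^n)$ is taken as $(2/(1+e^{-1}))^n$, the fitness of the language using all binary words. Because the code is linear, every codeword sees the same distance profile, so the fitness reduces to $|C| / \sum_{c \in C} e^{-\mathrm{wt}(c)}$. All function names are illustrative.

```python
import math

def hamming_codewords(m):
    """Binary Hamming code of length n = 2**m - 1, as integer bitmasks.

    A word is a codeword iff the XOR of the 1-based positions of its set
    bits is 0 (parity-check columns are the binary expansions of 1..n).
    """
    n = 2 ** m - 1
    words = []
    for w in range(2 ** n):
        syndrome = 0
        for pos in range(n):
            if (w >> pos) & 1:
                syndrome ^= pos + 1
        if syndrome == 0:
            words.append(w)
    return n, words

def code_fitness(words):
    """Fitness of a linear code under the Hamming distance:
    |C| / sum over codewords c of exp(-weight(c))."""
    denom = sum(math.exp(-bin(c).count("1")) for c in words)
    return len(words) / denom

def full_space_fitness(n):
    """Assumed fitness of the language using all 2**n binary words."""
    return (2 / (1 + math.exp(-1))) ** n

for m in (2, 3, 4):
    n, words = hamming_codewords(m)
    ratio = code_fitness(words) / full_space_fitness(n)
    print(f"n = {n:2d}, codewords = {len(words):4d}, F_H(n) / F(X^n) = {ratio:.4f}")
```

The enumeration is exponential in $n$, so it is only meant for the lengths 3, 7 and 15 used here; for these lengths the ratio increases towards 1.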

Next we show that ratewise the fitness of the Hamming space is attained if we choose the middle level as a language.


Theorem 3. Let $L$ be the language in the Hamming space $X^n$ that consists of all words of weight $n/2$. Then the fitness of the language $L$ is ratewise optimal, i.e.,

$$\lim_{n \to \infty} \left( \frac{1}{n} \log F(L, X^n) - \frac{1}{n} \log F(X^n) \right) = 0.$$

These theoretical models of the fitness of a language enable the investigation of classical information-theoretic problems in this context. In particular, this is true for feedback problems, transmission problems for multiway channels, etc. In the feedback model we developed, we show that feedback increases the fitness of a language.
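The rate statement of Theorem 3 can be illustrated numerically. The sketch below assumes a binary alphabet with distance 1 between distinct letters and even $n$, and takes $\log F(X^n) = n \log\bigl(2/(1+e^{-1})\bigr)$ for this space. It uses the fact that two words of weight $n/2$ are at distance $2j$ exactly when $j$ ones are exchanged for $j$ zeros, so a fixed word has $\binom{n/2}{j}^2$ such neighbours and every word of the middle level sees the same distance profile. The function names are illustrative.

```python
import math

def middle_level_fitness_log(n):
    """log F(L, X^n) for L = all binary words of weight n/2 (n even)."""
    half = n // 2
    size = math.comb(n, half)
    # Normalising sum seen by any fixed word of the middle level.
    denom = sum(math.comb(half, j) ** 2 * math.exp(-2 * j)
                for j in range(half + 1))
    return math.log(size) - math.log(denom)

def full_space_fitness_log(n):
    """Assumed log F(X^n) = n * log(2 / (1 + e^-1)) for the binary Hamming space."""
    return n * math.log(2 / (1 + math.exp(-1)))

def rate_gap(n):
    """(1/n) log F(L, X^n) - (1/n) log F(X^n) for the middle-level language."""
    return (middle_level_fitness_log(n) - full_space_fitness_log(n)) / n

for n in (2, 20, 200):
    print(f"n = {n:3d}, rate gap = {rate_gap(n):.6f}")
```

The gap is negative, since the middle level cannot exceed the fitness of the space, and its magnitude shrinks as $n$ grows in these three cases.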

Acknowledgment: The authors would like to thank V. Blinovsky and E. Telatar for discussions on these problems and P. Harremoes for drawing their attention to the counter-example in the case without multiplicity.

References

[1] R.G. Gallager, Information Theory and Reliable Communication, New York, Wiley, 1968.

[2] M.A. Nowak and D.C. Krakauer, The evolution of language, PNAS 96, 14, 8028-8033, 1999.

[3] M.A. Nowak, D.C. Krakauer, and A. Dress, An error limit for the evolution of language, Proceedings of the Royal Society Biological Sciences Series B, 266, 1433, 2131-2136, 1999.
