INFORMATION AND CONTROL 26, 256-271 (1974)

The Number of Occurrences of Letters Versus

Their Distribution in Some E0L Languages

A. EHRENFEUCHT

Department of Computer Science, University of Colorado,

Boulder, Colorado 80302

AND

G. ROZENBERG

Department of Mathematics, Utrecht University, Utrecht-Uithof, The Netherlands

A characterization theorem is given for a class of developmental languages.

The theorem binds together the number of occurrences of letters in the words

of the given language with the distribution of these letters.

INTRODUCTION

This paper deals with a class of developmental languages. The theory

of developmental systems and languages originated in the works of

Lindenmayer (1968). This theory provided a useful theoretical framework

within which the nature of cellular behavior in development can be discussed,

computed and compared (see, e.g., Herman and Rozenberg, to appear,

Lindenmayer, 1968, and Lindenmayer and Rozenberg, 1972). It turned out

that developmental systems and languages are interesting and novel objects

from the formal language theory point of view. Especially in comparison

with Chomsky grammars and languages (see, e.g., Ginsburg, 1966) they

provided a lot of insight into the basic problems of formal language theory.

An important subclass of developmental systems are the so-called E0L

systems (see, e.g., Herman, 1974, or Herman, Lindenmayer, and Rozenberg),

which were devised to allow descriptions of development which take into

account the inaccuracy of our observations.

One of the basic open problems within the theory of E0L systems (and

in fact within the whole theory of developmental systems) is to find

characterization theorems that allow one, for example, to prove that some

languages are not E0L languages (i.e., languages generated by E0L systems).

Copyright © 1974 by Academic Press, Inc.

All rights of reproduction in any form reserved.

This paper provides such a characterization for a subclass of E0L

languages. The characterization theorem binds together the number of

occurrences (in the words of the given E0L language) of letters from a

given set of letters with the distribution of these letters.

The paper also discusses some applications of the main result.

1. PRELIMINARIES

We assume the reader to be familiar with the basics of formal language

theory (see, e.g., Ginsburg, 1966, whose notation and terminology we shall

mostly follow). In addition to this, we shall use the following notation:

(i) $N$ denotes the set of nonnegative integers and $N^+ = N - \{0\}$.
If $n$ is an integer, then $\mathrm{abs}(n)$ denotes its absolute value.

(ii) If $x$ is a word over an alphabet $\Sigma$, then $|x|$ denotes the length
of $x$ and $\mathrm{Min}(x)$ denotes the set of letters which occur in $x$. For $a$ in
$\Sigma$, $\#_a(x)$ denotes the number of occurrences of the letter $a$ in $x$, and if
$B$ is a subset of $\Sigma$ then $\#_B(x) = \sum_{a \in B} \#_a(x)$. If $k$ is a positive
integer, then $x^k$ denotes $x$ catenated $k$ times with itself.

(iii) If $A$ is a finite set, then $\#A$ denotes its cardinality. If $B \subseteq A$
and $\#B = 1$ then $B$ is called a singleton in $A$.

(iv) A coding is a letter-to-letter homomorphism. If $h$ is a homomorphism
from $\Sigma^*$ into $V^*$ and $L \subseteq V^*$ then
$h^{-1}(L) = \{x \in \Sigma^* : h(x) = y \text{ for some } y \text{ in } L\}$.

(v) If $\tau = s_1, s_2, s_3, \ldots$ is a sequence of objects and $i_1, i_2, i_3, \ldots$
are such that $i_1 < i_2 < i_3 < \cdots$, then $s_{i_1}, s_{i_2}, s_{i_3}, \ldots$ is called
a subsequence of $\tau$.

(vi) $\emptyset$ denotes the empty set and $\Lambda$ denotes the empty word.

(vii) If $\mathcal{A}$ is a (nondeterministic) finite automaton, then $L(\mathcal{A})$
denotes its language.

(viii) If $A$ is an ultimately periodic sequence (set) of nonnegative
integers then $\mathrm{thres}(A)$ denotes the smallest integer $j$ for which there
exists a positive integer $q$ such that, for every $i \geq j$, if $i$ is in $A$ then
$(i + q)$ is in $A$. The smallest positive integer $p$ such that, for every
$i \geq \mathrm{thres}(A)$, whenever $i$ is in $A$ then also $(i + p)$ is in $A$, is denoted
by $\mathrm{per}(A)$.
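As an illustration of the word notation in (ii), the following minimal Python sketch computes $|x|$, $\mathrm{Min}(x)$, $\#_B(x)$ and $x^k$ for words represented as strings. The function names `occurrences`, `letters` and `power` are ours and do not appear in the paper.

```python
from collections import Counter

def occurrences(x, B):
    """#_B(x): total number of occurrences in x of letters from the set B."""
    counts = Counter(x)
    return sum(counts[a] for a in B)

def letters(x):
    """Min(x): the set of letters that occur in x."""
    return set(x)

def power(x, k):
    """x^k: the word x catenated k times with itself."""
    return x * k

x = power("ab", 3) + "c"            # the word abababc
print(len(x))                       # |x|
print(sorted(letters(x)))           # Min(x)
print(occurrences(x, {"a"}))        # #_a(x)
print(occurrences(x, {"a", "c"}))   # #_B(x) for B = {a, c}
```

Note that $\#_B(x)$ is simply the sum of the single-letter counts, so letters of $B$ that do not occur in $x$ contribute zero.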

2. DEFINITIONS AND EXAMPLES OF E0L SYSTEMS

In this section we give the basic definitions concerning developmental

languages that are relevant for this paper.

DEFINITION 1. An E0L system is a construct $G = \langle V_N, V_T, P, \omega \rangle$
such that

$V_N$ is a finite alphabet (of nonterminal letters or symbols),

$V_T$ is a finite nonempty alphabet (of terminal letters or symbols), such
that $V_N \cap V_T = \emptyset$,

$\omega$ is an element of $(V_N \cup V_T)^+$ (called the axiom of $G$),

$P$ is a finite nonempty set (called the set of productions of $G$) each
element of which is of the form $a \to \alpha$, where the symbol "$\to$" is not in
$V_N \cup V_T$, $a$ is in $V_N \cup V_T$, and $\alpha$ is in $(V_N \cup V_T)^*$. Moreover, for
every $a$ in $V_N \cup V_T$ there exists a word $\alpha$ in $(V_N \cup V_T)^*$ such that
$a \to \alpha$ is in $P$.

In the sequel we shall often write "$a \to_P \alpha$" rather than "$a \to \alpha$ is in $P$."
Also a production of the form $a \to \alpha$ is called a production for $a$ in $P$.

DEFINITION 2. An E0L system $G = \langle V_N, V_T, P, \omega \rangle$ is called a
0L system if, and only if, $V_N = \emptyset$. (In this case we write $G$ as
$\langle V_T, P, \omega \rangle$.) 0L systems are investigated, for example, in Rozenberg
and Doucet (1971).

DEFINITION 3. Let $G = \langle V_N, V_T, P, \omega \rangle$ be an E0L system.

(i) Let $x \in (V_N \cup V_T)^+$, say $x = b_1 \cdots b_t$ for some $b_1, \ldots, b_t$ in
$V_N \cup V_T$, and let $y \in (V_N \cup V_T)^*$. We say that $x$ directly derives $y$ (in $G$),
denoted as $x \Rightarrow_G y$, if there exists a sequence $\pi_1, \ldots, \pi_t$ of productions
from $P$, such that, for every $i$ in $\{1, \ldots, t\}$, $\pi_i = b_i \to \alpha_i$ and
$y = \alpha_1 \cdots \alpha_t$.

(ii) As usual, $\Rightarrow_G^+$ denotes the transitive closure of the relation $\Rightarrow_G$
and $\Rightarrow_G^*$ denotes the reflexive and transitive closure of the relation $\Rightarrow_G$.
If $x \Rightarrow_G^* y$ then we say that $x$ derives $y$ in $G$.

(iii) A finite sequence $D = (x_0, x_1, \ldots, x_r)$ of words from $(V_N \cup V_T)^*$
such that $r \geq 1$ and, for each $i$ in $\{1, \ldots, r\}$, $x_{i-1} \Rightarrow_G x_i$, is called a
derivation (of $x_r$ from $x_0$) in $G$. If $x_0 = \omega$, then $D$ is called a derivation
of $x_r$ in $G$.

(iv) An infinite sequence $D = (x_0, x_1, \ldots)$ of words from $(V_N \cup V_T)^+$
such that, for each $i \geq 1$, $x_{i-1} \Rightarrow_G x_i$, is called an infinite derivation in $G$.

(v) If $D = (x_0, x_1, \ldots, x_r)$ is a derivation in $G$, then its control sequence

is any sequence $\tau = (T_1, \ldots, T_r)$ of subsets of $P$, such that, for each $i$ in
$\{1, \ldots, r\}$, $x_{i-1} \Rightarrow_G x_i$ "using" all and only productions from $T_i$.

(vi) For $x$ in $(V_N \cup V_T)^+$, $y$ in $(V_N \cup V_T)^*$ and a positive integer $r$ we
write $x \Rightarrow_G^r y$ if there exists a derivation $D = (x_0 = x, x_1, \ldots, x_r = y)$ in $G$.
We also write $x \Rightarrow_G^0 x$, for every $x$ in $(V_N \cup V_T)^+$.

(vii) $\max(G)$ is defined as $\max\{|\alpha| : a \to_P \alpha$ for some $a$ in $V_N \cup V_T$
and $\alpha$ in $(V_N \cup V_T)^*\}$.

DEFINITION 4. Let $G = \langle V_N, V_T, P, \omega \rangle$ be an E0L system. The
language of $G$, denoted as $L(G)$, is defined by
$L(G) = \{x \in V_T^* : \omega \Rightarrow_G^* x\}$.

DEFINITION 5. A language $K$ is called an E0L language (0L language) if,
and only if, there exists an E0L system (0L system) $G$ such that $L(G) = K$.

Remark 1. Given an E0L system $G = \langle V_N, V_T, P, \omega \rangle$, a derivation
$D = (x_0, \ldots, x_r)$ and its control sequence $\tau = (T_1, \ldots, T_r)$, the pair $(D, \tau)$,
in general, does not tell us which productions are used to rewrite the particular
occurrences of letters in the words $x_0, \ldots, x_{r-1}$. However (to avoid cumbersome
notation and to keep the size of this paper decent), we shall often
assume that the pair $(D, \tau)$ provides such information. This should not
lead to confusion.

Remark 2. The properties we are interested in (in this paper) are trivial
for finite languages, hence we consider only infinite languages. Thus in the
sequel if we write "a language" (or "an E0L language") we mean an infinite
one, unless explicitly stated otherwise. Also whenever we write "an E0L
system" we mean one generating an infinite language.

Remark 3. Given an E0L system $G = \langle V_N, V_T, P, \omega \rangle$ we shall
sometimes consider $P$ to be the "set of names for productions" rather than
the set of productions itself. In this sense we can talk about the words over $P$,
etc., and this should not lead to confusion.

We end this section with two examples of E0L systems.

EXAMPLE 1. $G = \langle V_N, V_T, P, \omega \rangle$, where $V_N = \{S\}$, $V_T = \{a, b\}$,
$P = \{S \to a, S \to b, a \to a^2, b \to b^3\}$ and $\omega = S$, is an E0L system such that
$L(G) = \{a^{2^n} : n \geq 0\} \cup \{b^{3^n} : n \geq 0\}$.

EXAMPLE 2. $G = \langle \Sigma, P, \omega \rangle$, where $\Sigma = \{a, b\}$,
$P = \{a \to (ab)^2, b \to \Lambda\}$ and $\omega = ab$, is a 0L system such that
$L(G) = \{(ab)^{2^n} : n \geq 0\}$.
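The two examples can be checked mechanically for short derivations. The following Python sketch is only an illustration under our own encoding: productions are a dictionary mapping each letter to the list of its right-hand sides, the empty word $\Lambda$ is the empty string, and the helper names `step` and `language` are hypothetical. It rewrites every letter of a word in parallel, as in Definition 3(i), and collects the terminal words reached within a bounded number of steps.

```python
from itertools import product

def step(word, productions):
    """All words y such that word directly derives y (parallel rewriting)."""
    if not word:
        return set()  # Definition 3(i) only rewrites nonempty words
    choices = [productions[b] for b in word]
    return {"".join(rhs) for rhs in product(*choices)}

def language(axiom, productions, terminals, steps):
    """Terminal words derivable from the axiom in at most `steps` steps."""
    level, found = {axiom}, set()
    for _ in range(steps + 1):
        found |= {w for w in level if set(w) <= terminals}
        level = set().union(*(step(w, productions) for w in level))
    return found

# Example 1: S -> a | b, a -> aa, b -> bbb, axiom S
ex1 = {"S": ["a", "b"], "a": ["aa"], "b": ["bbb"]}
# Example 2: a -> abab, b -> empty word, axiom ab
ex2 = {"a": ["abab"], "b": [""]}

words1 = language("S", ex1, {"a", "b"}, 3)
words2 = language("ab", ex2, {"a", "b"}, 3)
print(sorted(words1, key=lambda w: (len(w), w)))
print(sorted(len(w) for w in words2))  # lengths 2, 4, 8, 16
```

The number of candidate words can grow exponentially for nondeterministic systems, so this brute-force enumeration is meant for small experiments only.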

3. BASIC NOTIONS AND THEIR PROPERTIES

In this section we introduce basic notions describing the structure of the 0L

languages we are interested in, and we prove some properties of these notions.

DEFINITION 6. Let $L$ be a language over an alphabet $\Sigma$ and let $B$ be a
nonempty subset of $\Sigma$. Let $I_{L,B} = \{n \in N :$ there exists a word $w$ in $L$ such
that $\#_B(w) = n\}$.

(i) $B$ is numerically dispersed (in $L$) if, and only if, $I_{L,B}$ is infinite
and for every positive integer $k$ there exists a positive integer $n_k$ such that,
for every $u_1, u_2$ in $I_{L,B}$, if $u_1 \neq u_2$, $u_1 > n_k$ and $u_2 > n_k$ then
$\mathrm{abs}(u_1 - u_2) > k$.

(ii) $B$ is clustered (in $L$) if, and only if, $I_{L,B}$ is infinite and there exist
positive integers $k_1, k_2$ such that $k_1 > 1$, $k_2 > 1$ and, for every word $w$ in $L$,
if $\#_B(w) \geq k_1$, then $w$ contains at least two occurrences of symbols from $B$
which are distant less than $k_2$.
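To make Definition 6 concrete, consider the language of Example 1 with $B = \{a\}$. Then $I_{L,B} = \{0\} \cup \{2^n : n \geq 0\}$; two distinct elements larger than $k$ differ by more than $k$, so $B$ is numerically dispersed, and every word with at least two occurrences of $a$ has two adjacent ones, so $B$ is clustered with $k_1 = k_2 = 2$. The Python sketch below inspects a finite sample of that language. The helper names `count_set` and `min_gap` are ours, and we assume that the distance between two occurrences is the difference of their positions.

```python
def count_set(words, B):
    """Finite part of I_{L,B}: the values #_B(w) over the sampled words w."""
    return {sum(1 for c in w if c in B) for w in words}

def min_gap(w, B):
    """Smallest position difference between two occurrences of letters
    from B in w, or None if w has fewer than two such occurrences."""
    pos = [i for i, c in enumerate(w) if c in B]
    return min((q - p for p, q in zip(pos, pos[1:])), default=None)

# Finite sample of the language of Example 1
sample = ["a" * 2**n for n in range(6)] + ["b" * 3**n for n in range(6)]

counts = sorted(count_set(sample, {"a"}))
gaps = [v - u for u, v in zip(counts, counts[1:])]
print(counts)  # sampled elements of I_{L,B}
print(gaps)    # differences between consecutive elements

# Every sampled word with at least two a's has two a's at distance 1
print(all(min_gap(w, {"a"}) == 1 for w in sample if w.count("a") >= 2))
```

A finite sample can only suggest, not prove, that a set is numerically dispersed or clustered; the argument given before the code is what settles it for this language.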

DEFINITION 7. Let $L$ be a language over an alphabet $\Sigma$ and let $a$ be in $\Sigma$.
The symbol $a$ is said to be frequent (in $L$) if, and only if, for every positive
integer $n$ there exists a word $w$ in $L$ such that $\#_a(w) > n$; otherwise $a$ is
called nonfrequent (in $L$).

DEFINITION 8. Let $G = \langle \Sigma, P, \omega \rangle$ be a 0L system, let $B$ be a nonempty
subset of $\Sigma$ and let $a$ be in $\Sigma$.

(i) We define a $B$-characteristic sequence of $a$ (in $G$), denoted as
$\mathrm{Seq}(G, B, a)$, as an infinite sequence $Z_1, Z_2, \ldots$ of finite subsets of $N$ such
that, for each $i \geq 1$ and every nonnegative integer $n$, $n$ is in $Z_i$ if, and only if,
$a \Rightarrow_G^i w$ for some $w$ in $\Sigma^*$ such that $\#_B(w) = n$.

(ii) $\mathrm{Seq}(G, B, a) = Z_1, Z_2, \ldots$ is called unique if, and only if, for
every $i \geq 1$, $\#Z_i = 1$.

(iii) $\mathrm{Seq}(G, B, a) = Z_1, Z_2, \ldots$ is called bounded if, and only if, there
exists a constant $C$ such that, for every $i \geq 1$, $n < C$ for every $n$ in $Z_i$.
In this case we also say that $a$ is $B$-bounded (in $G$) and that $C$ bounds
$\mathrm{Seq}(G, B, a)$. We say that $a$ is $B$-unbounded (in $G$) otherwise.

(iv) $\mathrm{Seq}(G, B, a) = Z_1, Z_2, \ldots$ is called constant if, and only if, for
each $i, j \geq 1$, $Z_i = Z_j$. In this case we also say that $a$ is $B$-constant (in $G$).
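The first terms of a $B$-characteristic sequence can be computed by brute force. The Python sketch below is an illustration under our own encoding (productions as a dictionary from letters to lists of right-hand sides); the function name `char_sequence` and the system `toy` are hypothetical and not taken from the paper. A finite prefix can suggest whether a sequence is unique, bounded or constant, but it cannot decide these properties.

```python
from itertools import product

def char_sequence(productions, B, a, n):
    """First n terms Z_1, ..., Z_n of Seq(G, B, a) for a 0L system G."""
    level, seq = {a}, []
    for _ in range(n):
        nxt = set()
        for w in level:
            if w:  # only nonempty words are rewritten (Definition 3(i))
                choices = [productions[c] for c in w]
                nxt |= {"".join(r) for r in product(*choices)}
        level = nxt
        # Z_i collects #_B(w) over all words w derived from a in i steps
        seq.append({sum(1 for c in w if c in B) for w in level})
    return seq

# The 0L system of Example 2: a -> abab, b -> empty word
ex2 = {"a": ["abab"], "b": [""]}
print(char_sequence(ex2, {"a"}, "a", 4))  # one value per term, doubling

# A hypothetical nondeterministic 0L system: a -> a | ab, b -> b
toy = {"a": ["a", "ab"], "b": ["b"]}
print(char_sequence(toy, {"b"}, "a", 3))  # several values per term
print(char_sequence(toy, {"a"}, "a", 3))  # the same singleton every time
```

In the hypothetical system `toy`, every word derived from $a$ has the form $a b^j$, so $a$ is $\{a\}$-constant, while the number of $b$'s that can be reached grows with each step.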

LEMMA 1. Let $G = \langle \Sigma, P, \omega \rangle$ be a 0L system, let $B$ be a nonempty
subset of $\Sigma$ and let $a$ be a symbol in $\Sigma$. Let $\mathrm{Seq}(G, B, a) = Z_1, Z_2, \ldots$ and let