
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-29, NO. 1, JANUARY 1983

Algorithms for Sliding Block Codes: An Application of Symbolic Dynamics to Information Theory

ROY L. ADLER, DON COPPERSMITH, AND MARTIN HASSNER, MEMBER, IEEE

Abstract--Ideas which have origins in Shannon's work in information theory have arisen independently in a mathematical discipline called symbolic dynamics. These ideas have been refined and developed in recent years to a point where they yield general algorithms for constructing practical coding schemes with engineering applications. In this work we prove an extension of a coding theorem of Marcus and trace a line of mathematics from abstract topological dynamics to concrete logic network diagrams.

I. INTRODUCTION

A. The Problem

WE ADDRESS the problem of encoding and decoding digital data from one type of constraint to another by means of finite state automata. The data are long strings of symbols from a finite alphabet, usually zeros and ones or blocks of them. In this paper, we consider encoding arbitrary sequences of zeros and ones into a constrained format dictated by the data processor. The constraints may be due to physical limitations of a transmission or storage system or artificial limitations dictated by data processing procedures.

B. The Model

The appropriate mathematical models for dealing with the problem are symbolic dynamical systems, i.e., spaces, invariant under the shift transformation, of two-sided infinite sequences of symbols from finite alphabets. The term dynamical system is due to the fact that such spaces are composed of discrete time orbits, each orbit consisting of a succession of shifted sequences.

Practical encoders and decoders have short finite memories, but the strings they process are so long as to seem infinite. The proposed model is suitable to the problem because

i) the constraints are time independent (shift invariant);
ii) encoders and decoders can be constructed from mappings between systems which commute with some power of the shift (sliding block codes) [6], [25].

Manuscript received March 10, 1982; revised July 1, 1982. This work was supported in part by NSF Grant MCS81-07092. This work was presented in part at the IEEE International Symposium on Information Theory, Santa Monica, CA, February 1981.

The authors are with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598.

Constraints encountered in practice are standard ones in symbolic dynamics. Of particular importance are those specified by a finite list of forbidden blocks of symbols, e.g., upper and lower bounds on run lengths of zeros and ones (Sec. VII), [14], [15], [16], [29], [33], [49]. In symbolic dynamics such systems are called shifts of finite type or topological Markov shifts [4], [51]. More complex constraints are also important, such as ones involving the power spectrum of symbol sequences, e.g., no dc component in a signal representing the symbol sequence [14], [22], [30], [32], [34], [41], [46]. These can be described as outputs of a finite state automaton whose inputs are shifts of finite type. In symbolic dynamics such systems are called sofic (from the Hebrew word for finite) [50]. In engineering contexts, the above constraints have been described by the notion of a channel: namely, shifts of finite type are deterministic finite state channels with finite memory, and sofic systems are deterministic finite state channels with infinite memory. Some areas where these constraints are met are magnetic recording, fiber optics, and data protocols in communication networks.

C. Shannon Theory

Suppose we wish to encode in a decodable way every sequence (..., x_{-1}, x_0, x_1, ...) of a system X satisfying one set of constraints into another system Y of sequences (..., y_{-1}, y_0, y_1, ...) satisfying another. Each component x_n, y_n may itself consist of a finite block of symbols, say of length p and q, respectively, in which case we say that the coding rate is r = p/q. The concept of topological entropy governs when this is possible. Topological entropy is defined as the exponential growth rate, as n → ∞, of the number of different strings of length n appearing in the infinite sequences of a symbolic system. The term "topological" is used to distinguish this entropy from its probabilistic counterpart. It was defined in purely topological terms in [3].

In the present context where output symbols are of equal duration, Shannon's noiseless coding theorem [48, p. 28] amounts to the following obvious statement: coding of arbitrarily long finite strings is possible when the topological entropy of X is less than that of Y and impossible when the inequality is reversed, the case of equality being left unresolved. Shannon called the system Y a channel, and its topological entropy, the channel capacity. He called the system X the source, and endowed it with a probabilistic entropy (topological entropy maximizes probabilistic entropy supported by the source [12], [23]). The full content of Shannon's theorem applies to the situation where the topological entropy of the source is greater than that of the channel but its probabilistic entropy is less.

We sharpen Shannon's theorem for the special case where the source entropy is the topological entropy to show that coding is possible even in the case of equality. In addition, the method of proof provides efficient sequential encoding and decoding algorithms which do not depend on the length of strings processed, a feature absent from Shannon's original theorem. We treat the class of shifts of finite type (channels with finite memory) and leave the more general case of sofic systems (channels with infinite memory) to subsequent work. Actually, we deal with the case where the topological entropy of the source is the logarithm of an integer, the most common one in applications. The more general case where it is the logarithm of an algebraic integer can be handled by a slight extension of the "tableaux" method of [4], but then certain desirable error propagation properties usually must be forgone.

Franaszek's ideas are very interesting and may lead to simpler implementations. From a mathematician's point of view these works [19], [20] leave something to be desired, namely, precise statements on the scope of the method along with complete proofs. A. Lempel and M. Cohn [35] work some examples by Franaszek's method, but leave the same mathematical questions unsettled.

The main results of our work were presented at the IEEE International Symposium on Information Theory, Santa Monica, CA (Feb. 1981) [2]. The theme, which is the application of recent developments in symbolic dynamics to coding problems in information theory, was suggested by M. Hassner in [26].

II. ABSTRACT DYNAMICAL SYSTEMS

This paper is written for two worlds, engineering and mathematics, at the risk of satisfying neither. It runs the gamut from the sublimely abstract to the hard-nosed concrete. We start with notions of sets, mappings, and topology in Section II, supplant these with combinatorial ideas by Section V, and finish at the end of Section IX with logic circuit diagrams. The complete trip is hardly needed for constructing codes, but it is useful in organizing the flow of ideas and bringing order to the subject. For the less mathematically minded, interested only in making codes for some practical purpose, we suggest concentrating on the description of the symbol splitting process of Section VI (not the proof), the example of Section VIII, and the implementation of it in Section IX. Following the pattern there one should be able to construct encoders and decoders for any shift of finite type constraint. The method can be extended to cover sofic systems arising from the aforementioned spectral constraints, but this has not yet appeared in print [38] and its applicability is not fully assessed. The excessive number of tables included in Sections VIII and IX is there to indicate the labor involved in constructing codes.

Let X be a compact metric space and σ a homeomorphism, i.e., a continuous one-to-one map, of X onto itself. We call the pair (X, σ) an abstract dynamical system. For a comprehensive treatment of such systems see [11]. If X′ is a closed σ-invariant subset of X, then the system (X′, σ) is called a subsystem of (X, σ), and we write (X′, σ) ⊂ (X, σ). We define the orbit, future orbit, and past orbit of a point x by the respective sequences orb x ≡ {σ^n x}_{n∈Z}, orb⁺ x ≡ {σ^n x}_{n≥0}, orb⁻ x ≡ {σ^n x}_{n≤0}. In order to economize on notation we shall always use the following convention for metrics. We denote the distance between two points x, y by |x - y| even though subtraction and absolute value may not be defined.

Two orbits, orb x and orb y, are called positively (negatively) asymptotic if |σ^n x - σ^n y| → 0 as n → ∞ (n → -∞).

We have the following indecomposability conditions for a dynamical system and its higher iterates. A system is called nonwandering transitive if for every pair of neighborhoods there is a point in the first whose future orbit hits the second. A system is called aperiodic if (X, σ^n) is nonwandering transitive for all n.

Let (X, σ), (Y, τ) be two abstract dynamical systems. A continuous map φ of Y onto X such that φ ∘ τ = σ ∘ φ is called a topological homomorphism. If such a map exists we have the commutative diagram of Fig. 1.

For the paper as a whole, we assume knowledge of the Perron-Frobenius theory of nonnegative matrices [21], [47] and some basic elements of symbolic dynamics, which can be found in [4], [8], [11], [27], [28].

We present only one proof, that of the main theorem in Section VI. All others are standard and easy ones from symbolic dynamics. These are stated without proof. Where possible, references are cited in which proofs can be found; otherwise they should be treated as exercises. Actually, Sections II-V are to be regarded as a survey.

Methods for doing noiseless coding have also been developed by P. Franaszek [14]-[19]. In [19], [20] he gives a general one which is different from ours yet intriguingly based on the same inequality (6.1) from the Perron-Frobenius theory. We hope someday to return to this topic and clarify the relationship between the two methods.

We also refer to φ as a factor map and call (X, τ) a factor of (Y, σ) and (Y, σ) an extension of (X, τ). If, in addition, φ is one-to-one (hence invertible, φ⁻¹ being continuous by compactness) we call it an isomorphism and say that (X, τ) is topologically conjugate to (Y, σ) and write (X, τ) ≅ (Y, σ). Nonwandering transitivity and aperiodicity are preserved under isomorphism. Topological conjugacy is the strongest sense of equivalence of dynamical systems from the purely topological point of view and too strong for many practical applications (see Remark 8.1). Consequently, we introduce a weaker one.

Definition 2.1 (Parry [45]): We say (X, σ) and (Y, τ) are finitely equivalent, and write (X, σ) ~ (Y, τ), if there exists a common extension (Z, ρ) and boundedly finite-to-one factor maps φ, ψ of (Z, ρ) onto (X, σ) and (Y, τ), respectively. We represent this situation as in Fig. 2.

For coding applications we need a certain kind of invertibility condition for factor maps, which for abstract systems is expressed by the following.

Fig. 1. Commutative diagram of topological homomorphism.

Definition 2.4 (Kitchens [31]): A factor map φ of (Y, σ) onto (X, τ) is called right, left, or two-sided closing if φx ≠ φy whenever x ≠ y and orb x, orb y are, respectively, negatively, positively, or both negatively and positively asymptotic.

Fig. 2. Finite equivalence diagram.

Finite equivalence can be slightly strengthened, which is done in the next definition, and still be weaker than topological conjugacy.

Definition 2.2 (see [4], [24]): We say that (X, σ) and (Y, τ) are almost one-to-one finitely equivalent, and write (X, σ) ≈ (Y, τ), if, in addition to being finitely equivalent, the factor maps are one-to-one except on nondoubly transitive points. A point x is said to be nondoubly transitive if either orb⁺ x or orb⁻ x fails to be dense in its dynamical system; otherwise it is called doubly transitive.

Theorem 2.1 [4, p. 10]: ≅, ~, and ≈ are equivalence relations, and

(X, σ) ≅ (Y, τ) ⇒ (X, σ) ≈ (Y, τ) ⇒ (X, σ) ~ (Y, τ).

We remark that the set of doubly transitive points is an invariant subset of a dynamical system and that the factor maps in Definition 2.2 are isomorphisms between subsystems which, however, are not compact.

Definition 2.3 (Bowen-Dinaburg [7], [12]): The topological entropy h(X, σ) for abstract dynamical systems is defined as the largest growth rate possible, as n → ∞, of the number of ε-separated orbits of length n, i.e.,

h(X, σ) = sup_{ε>0} lim_{n→∞} (1/n) log N(ε, n),

where N(ε, n) denotes the number of ε-separated orbits of length n. Two orbits of length n, {σ^k x}_{0≤k≤n-1} and {σ^k y}_{0≤k≤n-1}, are said to be ε-separated if |σ^k x - σ^k y| ≥ ε > 0 for some k, 0 ≤ k ≤ n - 1.

An easy consequence of the definition is that, if (X, σ) is a factor of (Y, τ), then h(X, σ) ≤ h(Y, τ). Also, if (X′, σ) is a subsystem of (X, σ), then h(X′, σ) ≤ h(X, σ). We have Theorem 2.2.

Theorem 2.2 [4, p. 9]: If (X, σ) is a finite-to-one factor of (Y, τ), then h(X, σ) = h(Y, τ).

Corollary 2.3: If (X, σ) ≅ (Y, τ), then h(X, σ) = h(Y, τ).

Thus topological entropy is an invariant for all three equivalence relations. We also have the next theorem.

Theorem 2.4 [3, p. 321]: h(X, σ^n) = |n| h(X, σ), n ∈ Z.

III. SYMBOLIC SYSTEMS

Let 𝒜 be an alphabet (sometimes called a state space), by which we mean a finite set of symbols (also called states) with an ordering. We denote the cardinality of a set A by |A|. Examples of alphabets are 𝒜 = {0, 1}, ℬ = {(0,0), (0,1), (1,0), (1,1)}, etc. We freely abuse notation by using indexing symbols to represent interchangeably both an element of an alphabet and its ordinal number. Which is meant should be clear from context. This sloppiness is often compounded by the fact that numbers also appear as alphabet symbols and that a numerical symbol may not coincide with its ordinal number. The advantage of inconsistency here is that it keeps notation to a minimum.

As is customary, 𝒜^Z denotes the set of two-sided infinite sequences of elements of 𝒜. The space 𝒜^Z can be endowed with a metric, the distance between sequences x = {x_n}_{n∈Z}, y = {y_n}_{n∈Z} being defined by |x - y| = Σ_{n=-∞}^{∞} |x_n - y_n| / 2^{|n|}, where |x_n - y_n| is defined to be one when x_n ≠ y_n and zero otherwise. In this metric the more two sequences consecutively agree the closer they are, and we have a neighborhood basis which consists of the family of sets, called cylinder sets, of the form {x = (..., x_{-1}, x_0, x_1, ...): (x_{n+1}, ..., x_{n+k}) = (a_1, ..., a_k)}, where (a_1, ..., a_k) is some fixed k-tuple of symbols of 𝒜. In this topology 𝒜^Z is compact.

We define the shift transformation σ of 𝒜^Z onto itself by (σx)_n = x_{n+1} for x ∈ 𝒜^Z, n ∈ Z. In the above metric σ is a homeomorphism, and we form the dynamical system (𝒜^Z, σ), which is called the full N-shift, where N = |𝒜|. Any subsystem (X, σ) ⊂ (𝒜^Z, σ) is called a subshift. We use symbols 𝒜, ℬ, 𝒞 to denote alphabets. Occasionally we use 𝒜_X to denote the alphabet of a dynamical system (X, σ), which in the above is a subset of 𝒜. Any finite n-tuple (a_1, ..., a_n) which appears in any sequence of X is called an admissible n-block. The topological entropy of a subshift is given by

h(X, σ) = lim_{n→∞} (1/n) log N(n),     (3.1)

where N(n) is the number of admissible n-blocks. We observe that

h(𝒜^Z, σ) = log |𝒜|     (3.2)

and h(X, σ) ≤ log |𝒜| for (X, σ) ⊂ (𝒜^Z, σ). The reader can regard (3.1) as a definition [44], although it is an easy exercise to derive it from Definition 2.3.
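Formula (3.1) can be illustrated by brute-force block counting. The example below (ours) uses the run-length constraint "no two consecutive ones"; the block counts are Fibonacci numbers and (1/n) log N(n) approaches the logarithm of the golden ratio:

```python
from itertools import product
from math import log

def count_admissible(n, forbidden=("11",)):
    """N(n): number of n-blocks over {0,1} avoiding every forbidden word."""
    return sum(
        1 for bits in product("01", repeat=n)
        if not any(f in "".join(bits) for f in forbidden)
    )

# (1/n) log N(n) tends to log((1 + sqrt(5))/2) = 0.4812...
for n in (2, 4, 8, 12):
    print(n, count_admissible(n), log(count_admissible(n)) / n)
```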

We introduce the notation x|_m^n = (x_m, x_{m+1}, ..., x_n) for a sub-(n - m + 1)-tuple of an n-tuple or sequence x. Given a subshift (X, σ) we can form a subshift (X^[n], σ), called the higher n-block system of (X, σ), where (X^[n], σ) consists of sequences (..., x|_{-1}^{n-2}, x|_0^{n-1}, x|_1^n, ...), where x ∈ X. The higher n-block system (X^[n], σ) is canonically isomorphic to (X, σ) under the correspondence

(..., x|_{-1}^{n-2}, x|_0^{n-1}, x|_1^n, ...) ↔ (..., x_{-1}, x_0, x_1, ...).

(X^[n], σ) is a subshift of the full shift based on an alphabet of symbols consisting of all admissible n-blocks of X.
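The finite-window analogue of this recoding is straightforward; a sketch (ours):

```python
def higher_block(word, n):
    """Recode a string into its overlapping n-blocks, the finite analogue
    of the canonical correspondence x -> (..., x|_k^{k+n-1}, ...)."""
    return [word[k:k + n] for k in range(len(word) - n + 1)]

print(higher_block("010010", 2))
# ['01', '10', '00', '01', '10'] -- consecutive symbols overlap in n-1 places
```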

Definition 3.1: Let (X, σ), (Y, σ) be subshifts of two full shifts (𝒜^Z, σ), (ℬ^Z, σ), respectively. A mapping φ of (Y, σ) onto (X, σ) is called a k-block map requiring memory l and anticipation m if there exists a function φ: ℬ^k → 𝒜 such that if {x_n} = φ{y_n} then

x_n = φ(y_{n-l}, ..., y_{n+m}),

where k = m + l + 1, for n ∈ Z.

We use here the following abuse of notation. The same symbol φ is used to denote a mapping defined on sequences and the component function of several variables by which it is specified. What is meant will always be clear from context, hopefully.
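On finite words, a k-block map is just a sliding window. A minimal sketch (our illustration; the XOR map is an arbitrary choice, not from the paper):

```python
def sliding_block(word, block_fn, memory, anticipation):
    """Apply a k-block map (k = memory + anticipation + 1) to a finite
    word; the output is defined wherever the whole window fits."""
    return [
        block_fn(tuple(word[i - memory:i + anticipation + 1]))
        for i in range(memory, len(word) - anticipation)
    ]

# A 2-block map with memory 1, anticipation 0: x_n = y_{n-1} XOR y_n.
xor2 = lambda w: w[0] ^ w[1]
print(sliding_block([0, 1, 1, 0, 1], xor2, memory=1, anticipation=0))
# [1, 0, 1, 1]
```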

In dynamical systems only the finiteness of k, not its size, is important. We can often take advantage of the conceptual simplification of regarding a k-block map as a 1-block map on a system with a larger alphabet. This is done by replacing a k-block map φ by the 1-block map φθ⁻¹, where θ is the canonical isomorphism between (Y, σ) and (Y^[k], σ). (See Fig. 3.)

However, for construction of encoders and decoders in engineering applications it is important to have k and the alphabet size as small as possible; so the above artifice is of no advantage from this point of view.

Theorem 3.1 [11, p. 3]: An onto mapping φ between subshifts is a homomorphism if and only if it is a k-block map.

If a k-block map φ between subshifts is invertible, then its inverse φ⁻¹ is also a k-block map, perhaps with a different k. We define a weaker form of invertibility.

Definition 3.2: Let φ be as in Definition 3.1 with φ({y_n}) = {x_n}. φ is said to be right resolving with parameters p, q, r of memory and anticipation if each y_n is uniquely determined from the upcoming x_{n-p}, ..., x_{n+q} and preceding y_{n-r}, ..., y_{n-1}; in other words, there exists a function f: ℬ^r × 𝒜^{p+q+1} → ℬ such that

y_n = f(y_{n-r}, ..., y_{n-1}; x_{n-p}, ..., x_{n+q}).

A similar definition is given for left resolving by replacing {x_n}, {y_n} by {x_{-n}}, {y_{-n}} in the above definition.

We remark that Definition 3.2 is what Definition 2.4 becomes in the context of subshifts. Kitchens [31] used the term right (left) closing here. In [4] the definition of right and left resolving covered only the cases where r = 1, p = q = 0. The concept of right resolving is not new in information theory. It was called unifilar by McMillan [36, p. 216]. We could also give the definition of two-sided resolving [4], which is what two-sided closing becomes for symbolic systems, but that will not be needed in the present work. Suffice it to say that the importance of the concept of resolvability is put into evidence by the following theorem.

Fig. 3. Equivalence of a k-block map to a 1-block map.

Theorem 3.2 [4, p. 23]: A homomorphism is finite-to-one if it is right or left resolving. (Right or left resolving imply two-sided resolving but not conversely. For subshifts of finite type, which are defined in Section V, finite-to-one implies two-sided resolving.)

Closely associated with the concept of right and left resolving is the notion of a resolving block. Such blocks serve as a means for resetting an encoding automaton constructed from a right resolving map. Let φ in Definition 3.1 be a 1-block map, i.e., x_n = φ(y_n).

Definition 3.3 [1], [4]: An X-admissible m-block (a_1, ..., a_m) is called a resolving block if there exists an index i_0, 1 ≤ i_0 ≤ m, such that if (y_1, ..., y_m) and (y′_1, ..., y′_m) are two Y-admissible m-blocks such that φ(y_i) = φ(y′_i) = a_i, 1 ≤ i ≤ m, then y_{i_0} = y′_{i_0}. In other words, the block (a_1, ..., a_m) determines a unique preimage in the i_0th coordinate.

Remark 3.1: If φ is also right resolving with r = 1, p = q = 0, and x|_1^m = (a_1, ..., a_m), then y_n for n ≥ i_0 can be uniquely determined from x|_1^n whenever x = φ(y). Furthermore, if there are so many resolving blocks that in every k-block there exists at least one, then φ is invertible; in fact, φ⁻¹ can be seen to be a k-block map.

IV. ENCODERS AND DECODERS

Let (X, σ) and (W, σ) be two subshifts with alphabets 𝒜 and ℬ, respectively. In the vocabulary of engineering, let us call (X, σ) the source and (W, σ) the channel. Usually 𝒜 and ℬ consist, respectively, of admissible p-blocks and q-blocks of 0 and 1 for some fixed p and q. Furthermore, the source usually consists of unconstrained sequences and the channel of constrained ones. Thus 𝒜 is the set of all 2^p p-blocks, whereas ℬ is some subset of q-blocks.

Our problem is to construct two finite state automata: an encoder which converts source sequences {x_n} ∈ X to channel sequences {y_n} ∈ W, and a decoder which recovers {x_n} from {y_n}. The coding rate is r = p/q (p source symbols per q channel symbols), and we want this as large as feasible. At the same time we want q and p small to minimize complexity.

A finite state automaton, say the encoder, is given by two functions

y_n = e(x_{n-l}, ..., x_{n+m}, z_n),
z_n = f(x_{n-l}, ..., x_{n+m}, z_{n-1}),     (4.1)

where z_n belongs to some finite alphabet 𝒞, called the internal states of the automaton. The elements y_n are called the output and x_{n-l}, ..., x_{n+m} the inputs, with l, m parameters of memory and anticipation. The function e is called the output function and f the next state function. By substitution, y_n is a function of x_{n-l}, ..., x_{n+m}, z_{n-1}. The numbering of variables in (4.1) may be slightly off from standard usage. We adopt the present one to conform to our notation of subsequent constructions.
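Equations (4.1) are easy to run directly. The sketch below takes the special case l = m = 0; the particular e and f (a running-parity precoder, reminiscent of the precoders used in magnetic recording) are our toy choice, not a construction from the paper:

```python
def run_encoder(xs, e, f, z0):
    """Run the finite state encoder of (4.1) with l = m = 0:
    z_n = f(x_n, z_{n-1}) and y_n = e(x_n, z_n)."""
    ys, z = [], z0
    for x in xs:
        z = f(x, z)          # next state function
        ys.append(e(x, z))   # output function
    return ys

# Toy choice: state = running parity of the input, output = that parity.
f = lambda x, z: (x + z) % 2
e = lambda x, z: z
print(run_encoder([1, 0, 1, 1, 0], e, f, 0))
# [1, 1, 0, 1, 1]
```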

The output sequences {y_n} belong to a subsystem (Y, σ) ⊂ (W, σ). By virtue of (4.1) this subsystem is a factor of a subsystem of ((𝒜 × 𝒞)^Z, σ). A single error in the input will possibly propagate forever in the output. We take the point of view that the source is error free. However, errors may occur in the channel, and we want to limit the range of their propagation in the decoder. In order to do this, we should also make the decoder a finite state automaton, but one in which the internal state at a particular time does not depend on the input, but only on the previous state.

Thus the decoder is given by two functions:

x_n = d(y_{n-l′}, ..., y_{n+m′}; z′_n),     (4.2)

where z′_n belongs to some other finite alphabet 𝒞′. Since the set of states is finite, say |𝒞′| = v, we can label them so that

z′_n = n (mod v).

From this we see that we require the decoder to be a finite-block map satisfying d ∘ σ^v = σ^v ∘ d; that is, we have the diagram of Fig. 4.

Fig. 4. Commutative diagram of decoding map.

In designing a decoder, we try to make v as small as possible, hopefully v = 1. This condition can always be trivially achieved at the expense of increasing p and q by a multiplicative factor v, but this would not count as an improvement.

We would also like the encoder to be given by a finite block map of (X, σ) onto (Y, σ), which would mean (X, σ) ≅ (Y, σ). This is usually not possible, so we must be content with some weaker form of invertibility of the decoding map, like right resolvability, which is sufficient for constructing an encoding automaton.

V. SUBSHIFTS OF FINITE TYPE

We single out a special class of subshifts which go under a variety of names, two of which we shall use, the choice depending on the mode of description. The term subshift of finite type shall be used to designate a subshift (X, σ) when X is defined by specifying a finite list of forbidden finite blocks which do not occur anywhere in the sequences of X.

Let T = (t_{ij}) be an N × N matrix of zeros and ones, which we call a transition matrix. A k-tuple (x_1, ..., x_k) of symbols x_i ∈ 𝒜 is called a T-admissible k-block if t_{x_i x_{i+1}} = 1 for i = 1, ..., k - 1. A two-sided infinite sequence {x_n}_{n∈Z} is called T-admissible if t_{x_n x_{n+1}} = 1 for n ∈ Z. Let {T} denote the set of T-admissible sequences. We use the term topological Markov shift to describe a subshift (X, σ) when X = {T}.
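T-admissibility is a pairwise check along the block. A minimal sketch (ours), using the "no two consecutive ones" matrix as the running example:

```python
def is_admissible(block, T):
    """A block (x_1, ..., x_k) is T-admissible iff t_{x_i x_{i+1}} = 1
    for every consecutive pair of symbols."""
    return all(T[i][j] == 1 for i, j in zip(block, block[1:]))

# Transition matrix forbidding the 2-block (1, 1).
T = [[1, 1],
     [1, 0]]
print(is_admissible([0, 1, 0, 0, 1], T))  # True
print(is_admissible([0, 1, 1, 0], T))     # False
```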

The first description tells what is forbidden, the second what is allowed. Both definitions describe the same class of dynamical systems: one obtains a finite list of forbidden 2-blocks (i, j) from a transition matrix T whenever t_{ij} = 0. Conversely, if L is the length of the longest block in the forbidden list, then a new alphabet can be chosen to be the admissible (L - 1)-blocks. Tacking on the right single symbols from the original alphabet in such a way as to get admissible L-blocks defines a transition matrix between (L - 1)-blocks which overlap in L - 2 places. The system that results is isomorphic to the original one. For this reason subshifts of finite type could also be called (L - 1)-step Markov systems and topological Markov shifts, 1-step Markov.
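The recoding just described can be sketched mechanically; the example below (our construction, following the recipe in the text) takes forbidden words "11" and "000", so L = 3 and the states are admissible 2-blocks:

```python
from itertools import product

def markov_from_forbidden(alphabet, forbidden):
    """Recode a finite-type constraint as a 1-step topological Markov
    shift: states are admissible (L-1)-blocks; s -> t is allowed when
    s and t overlap in L-2 places and the joined L-block is admissible."""
    L = max(len(w) for w in forbidden)
    ok = lambda w: not any(f in w for f in forbidden)
    states = ["".join(p) for p in product(alphabet, repeat=L - 1)
              if ok("".join(p))]
    T = {(s, t): 1 if s[1:] == t[:-1] and ok(s + t[-1]) else 0
         for s in states for t in states}
    return states, T

states, T = markov_from_forbidden("01", ["11", "000"])
print(states)                             # ['00', '01', '10']
print(T[("00", "01")], T[("00", "00")])   # 1 0  ("000" is forbidden)
```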

A transition matrix T defines a directed graph: the symbols are nodes and the transitions edges. If we label the edges with a new alphabet, then a new transition matrix T^[2] is formed by specifying how the edges are connected. The topological Markov shift ({T^[2]}, σ) is merely the higher 2-block system of ({T}, σ). Similarly we can form still higher edge graphs to obtain all the higher block systems ({T^[n]}, σ).

Using the notion of directed graph we can also define a dynamical system (X, σ) for arbitrary nonnegative integral matrices T = (t_{ij}) in the following manner. From i to j draw t_{ij} directed edges and label each with a distinct symbol. Let us again use the notation T^[2] to designate the directed edge graph. Then T^[2] is a zero-one transition matrix, so we can form the dynamical system ({T^[2]}, σ), which serves as a definition of a topological Markov shift given by a matrix in which appear positive integers larger than one.
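The edge-graph construction can be sketched as follows (our code; the matrix [[1, 2], [1, 0]] is an arbitrary small example with an entry larger than one):

```python
def edge_graph(T):
    """Build the zero-one edge-graph matrix T^[2] of a nonnegative
    integer matrix T: one node per labeled edge (i, j, copy), with
    e -> e' allowed exactly when e ends where e' begins."""
    edges = [(i, j, c) for i, row in enumerate(T)
             for j, t in enumerate(row) for c in range(t)]
    T2 = [[1 if e[1] == e2[0] else 0 for e2 in edges] for e in edges]
    return edges, T2

edges, T2 = edge_graph([[1, 2],
                        [1, 0]])
print(len(edges))  # 4 labeled edges
print(T2)          # a 4 x 4 zero-one transition matrix
```

The spectral radius is preserved: the edge graph has the same nonzero spectrum as T, so the entropy of the resulting Markov shift is unchanged.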

Sometimes we must deal with dynamical systems ({T}, σ^p) involving a higher power of the shift. In order to apply the results as they are expressed in Section VI we must represent it as a first power, and to this end we have the following theorem.

Theorem 5.1: If the pth matrix power (T^[k])^p of the kth higher edge graph T^[k] for a transition matrix T is zero-one (which is always the case for k = p), then ({T}, σ^p) ≅ ({(T^[k])^p}, σ), with the conjugacy given by a canonical map as in Section III. Alternatively, if T^p is not zero-one, then its edge graph (T^p)^[2] defined above is zero-one and ({T}, σ^p) ≅ ({(T^p)^[2]}, σ).

A subshift which is a finite-to-one homomorphic image of a topological Markov shift is called a sofic system. A sofic system need not be a subshift of finite type (these systems were studied in [9], [10], [50]). However, a subshift which is an isomorphic image of a subshift of finite type is again a subshift of finite type. Sometimes, when the transition matrix specifying a topological Markov shift is large, we can take advantage of the above fact by specifying the system by an isomorphism (invertible k-block map) from a system given by a much smaller transition matrix, thus reducing the overall complexity of the description. Symbols in sequences in the domain of the above isomorphisms are sometimes called channel states and symbols in sequences of the range, channel symbols. An isomorphism of a shift of finite type is sometimes called a deterministic channel with finite memory, and a homomorphism with a sofic image which is not a shift of finite type, a deterministic channel with infinite memory. We remark that the relation of the above terminology to that in the engineering literature is a bit blurred. For example, whether the word channel should refer to a mapping, its image, or both is nebulous. We shall not dwell further on these pedantic difficulties, except to say that (W, σ) was called a channel in Section IV because the constraints on W are typically specified by defining it as the homomorphic image of a shift of finite type.

We introduce some useful terminology with regard to topological Markov shifts. We say j is a (T-admissible) successor of i, or equivalently the transition i to j is allowable (under T), and write i → j, if t_{ij} = 1. We also say in this case that i is a (T-admissible) predecessor of j. We denote the successors of i by the set T(i) = {j_1, ..., j_{|T(i)|}}. The transpose matrix T* defines another transition matrix in which the roles of predecessor and successor have been interchanged. Observe that ({T*}, σ) is isomorphic with ({T}, σ⁻¹).

Definition 5.1: T is said to be irreducible if for every i, j ∈ 𝒜 there exists a positive integer n (depending on i, j) such that t_{ij}^{(n)} > 0, i.e., there exist i_1, ..., i_{n-1} ∈ 𝒜 such that i = i_0 → i_1 → ... → i_{n-1} → i_n = j.

Definition 5.2: We shall call T aperiodic if there exists n > 0 such that T^n > 0, i.e., t_{ij}^{(n)} > 0 for all i, j (n being independent of i, j).

Definition 5.3: The greatest common divisor (gcd) of the set {n: t_{ii}^{(n)} > 0, i ∈ 𝒜, n = 1, 2, ...} of cycle lengths is called the period of T.

Theorem 5.2 [21, p. 65], [47]: T is aperiodic if and only if T is irreducible and has period 1.
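Definitions 5.1-5.3 are directly checkable on small matrices by forming matrix powers; by Theorem 5.2, aperiodicity is irreducibility plus period 1. A sketch (ours; the power bounds used are adequate for small examples, not tight in general):

```python
from math import gcd

def mat_mul(P, T):
    N = len(T)
    return [[sum(P[i][k] * T[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def is_irreducible(T):
    """T is irreducible iff for each i, j some (T^n)_{ij} > 0;
    accumulating T + T^2 + ... + T^(N+1) suffices."""
    N = len(T)
    reach, P = [row[:] for row in T], [row[:] for row in T]
    for _ in range(N):
        P = mat_mul(P, T)
        reach = [[reach[i][j] + P[i][j] for j in range(N)] for i in range(N)]
    return all(v > 0 for row in reach for v in row)

def period(T):
    """gcd of the cycle lengths {n : (T^n)_{ii} > 0}, scanning n <= N^2."""
    N, g, P = len(T), 0, [row[:] for row in T]
    for n in range(1, N * N + 1):
        if any(P[i][i] > 0 for i in range(N)):
            g = gcd(g, n)
        P = mat_mul(P, T)
    return g

T = [[0, 1], [1, 0]]                 # a 2-cycle
print(is_irreducible(T), period(T))  # True 2 -- irreducible but not aperiodic
```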

Theorem 5.3 [ 4, p. 191, [ 24, p. 151: A topological Markov

shift ({T}, a) is nonwandering transitive if and only if T is

irreducible.

Theorem 5.4 [4, p. 191: A topological Markov shift

({T}, a) is aperiodic if and only if T is aperiodic.

Definition 5.4: A subalphabet 𝒜′ ⊂ 𝒜, under transitions T′ inherited from T, is called an irreducible component if

i) i ∈ 𝒜′ ⇒ T(i) ⊂ 𝒜′;
ii) i, j ∈ 𝒜′ ⇒ there exist i_1, ..., i_{n-1} ∈ 𝒜′ such that i = i_0 → i_1 → ... → i_{n-1} → i_n = j.

Theorem 5.5 [4, p. 21]: If T(i) ≠ ∅ for every i ∈ 𝒜, then there exists an irreducible component.

The number N(n) of T-admissible n-blocks is given by

N(n) = Σ_{i,j=1}^{N} t_{ij}^{(n-1)},     n ≥ 2.     (5.1)

It follows from the Perron-Frobenius theory of nonnegative matrices that there exist positive constants a, b such that

a λ^n ≤ N(n) ≤ b λ^n,     (5.2)

where λ is the largest positive characteristic value (spectral radius) of T. Thus from (3.1),

h({T}, σ) = log λ.     (5.3)
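Equations (5.1)-(5.3) can be verified numerically: N(n) is the sum of the entries of T^(n-1), and (1/n) log N(n) converges to the logarithm of the spectral radius. A sketch (ours), again for the "no two consecutive ones" matrix, whose spectral radius is the golden ratio:

```python
import numpy as np

T = np.array([[1, 1],
              [1, 0]])
lam = max(abs(np.linalg.eigvals(T)))  # spectral radius = (1+sqrt(5))/2
ns = (2, 5, 10, 20)
counts = [np.linalg.matrix_power(T, n - 1).sum() for n in ns]  # N(n), eq. (5.1)

print(lam)
print([np.log(c) / n for c, n in zip(counts, ns)])
# the ratios approach log(lam) = h({T}, sigma), eq. (5.3)
```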

Let us address the problem of topological conjugacy. Besides the topological entropy invariant, some stronger ones are known, which are contained in the following theorems.

Theorem 5.6 [31]: If ({T_1}, σ) and ({T_2}, σ) are two equal entropy topological Markov shifts with the first a factor of the second, then the block of the Jordan canonical form of T_1 with nonzero characteristic values is a principal submatrix of that of T_2.

Corollary 5.7 [40]: In Theorem 5.6 the characteristic polynomial of T_1 divides that of T_2 when the monomial factors are deleted.

We also have an algebraic characterization of topological conjugacy.

Theorem 5.8 [51]: ({T_1}, σ) ≅ ({T_2}, σ) if and only if there exist nonnegative integral rectangular matrices A_i, B_i, i = 1, ..., n, for some n, such that A_{i+1}B_{i+1} = B_iA_i, i = 1, ..., n - 1, T_1 = A_1B_1, and T_2 = B_nA_n.

Using Theorem 5.8 it is easy to construct matrices T₁, T₂ with the same Jordan canonical form such that ({T₁}, σ) ≇ ({T₂}, σ). For example,
$$T_1 = \begin{pmatrix} 5 & 4 \\ 1 & 1 \end{pmatrix}, \qquad T_2 = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.$$
Assuming conjugacy, it would follow from Theorem 5.8 that there exists an integral 2 × 2 matrix S such that ST₁ = T₂S and det S = ±1. However, an easy computation shows that 2 divides det S, a contradiction. Thus the invariants presented above are not complete ones for topological conjugacy. In fact, the major unsolved problem in symbolic dynamics is to give a finite procedure for determining when two shifts of finite type are topologically conjugate. Possibly there is none. Also unsolved is the following conjecture, which is still a far cry from a finite procedure.
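The divisibility obstruction in the example can be confirmed by brute force. The following sketch (our own check, assuming the pair T₁ = (5 4; 1 1), T₂ = (5 2; 2 1) as reconstructed above) enumerates small integral matrices S satisfying ST₁ = T₂S and verifies that every determinant is even, ruling out det S = ±1:

```python
import itertools

import numpy as np

T1 = np.array([[5, 4], [1, 1]])
T2 = np.array([[5, 2], [2, 1]])

# Same trace and determinant, hence the same characteristic polynomial.
assert T1.trace() == T2.trace()
assert round(np.linalg.det(T1)) == round(np.linalg.det(T2))

# Every integral S intertwining them (S T1 = T2 S) has even determinant,
# so det S = +-1 is impossible and the shifts cannot be conjugate.
dets = set()
for a, b, c, d in itertools.product(range(-3, 4), repeat=4):
    S = np.array([[a, b], [c, d]])
    if (S @ T1 == T2 @ S).all():
        dets.add(int(round(np.linalg.det(S))))
print(sorted(dets))            # every value is even
```

Solving ST₁ = T₂S by hand forces b = 2c and d = 2a − 4c, so det S = 2(a² − 2ac − c²), which is always even; the enumeration merely witnesses this.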

Conjecture 5.1 [51]: ({T₁}, σ) ≅ ({T₂}, σ) if and only if there exist a positive integer l and nonnegative integral matrices A, B such that AT₁ = T₂A, T₁B = BT₂, T₂ˡ = AB, and T₁ˡ = BA.

The situation for determining finite equivalence or al-

most one-to-one finite equivalence is just the opposite. We

do have a finite procedure which comes down to checking

whether transition matrices have the same largest char-

acteristic value. The completeness of topological entropy

for finite equivalence and almost one-to-one finite equiva-

lence is revealed in the following theorems.

Theorem 5.9 [4], [45]: Let (X, σ), (Y, σ) be two nonwandering transitive subshifts of finite type, i.e., their transition matrices are irreducible. Then (X, σ) ∼ (Y, σ) if and only if h(X, σ) = h(Y, σ).


[Fig. 5. Finite equivalence diagram (same as Fig. 2, reproduced for convenience): a common extension with factor maps onto (Y, σ) and (X, σ).]

Theorem 5.10 [4]: Let (X, σ) and (Y, σ) be two aperiodic subshifts of finite type. Then (X, σ) ≈ (Y, σ) if and only if h(X, σ) = h(Y, σ). Furthermore, in [4], [45] methods are given for constructing the associated factor maps, which are depicted in Fig. 5.

In the constructions one of the factor maps is right resolving and the other is left resolving; one is free to choose which. We usually draw the right and left resolving maps on the corresponding sides of the diagram.

The special case where (X, σ) is the full N-shift and h(Y, σ) = h(X, σ) = log N, N an integer, was treated in [1]. Marcus [37] showed how to achieve an invertible map φ, which is not always possible if (X, σ) is not a full N-shift. If we select a right resolving ψ, then from Marcus' result we obtain a right resolving finite block map φ⁻¹ψ, the very thing needed to construct the encoder and decoder of Section IV.

In applications we generally have h(W, σ) > h(X, σ), so we must find a subsystem (Y, σ) ⊂ (W, σ) such that h(Y, σ) = h(X, σ). This problem was not addressed by Marcus, but it can be done by strengthening his result, which is the main theorem of this work.

VI. METHOD OF SYMBOL SPLITTING

Main Theorem 6.1: Let (X, σ) be the full N-shift for an integer N ≥ 2, i.e., X = {S} where S is an N × N matrix all of whose entries are one. If T is an M × M irreducible transition matrix such that h({T}, σ) ≥ h({S}, σ) = log N, then there exists by construction an irreducible transition matrix T̂ with row sum N, an invertible left resolving 1-block factor map (isomorphism) φ of ({T̂}, σ) onto a subshift of finite type (Y, σ) ⊂ ({T}, σ), and a right resolving 2-block factor map ψ of ({T̂}, σ) onto ({S}, σ). The composition φ⁻¹ψ is a right resolving factor map of (Y, σ) onto (X, σ).

Proof: The plan is to construct from T a matrix with row sums ≥ N, then delete excess transitions, that is, change some entries from 1 to 0, in order to get a matrix with row sum exactly N.

We have by hypothesis that h({T}, σ) ≥ log N, so λ ≥ N, where λ is the spectral radius of T. From the Perron-Frobenius theory of nonnegative matrices [21], [47] there exists a column vector v = (vᵢ)₁≤ᵢ≤ₘ, which we call an approximate characteristic vector, satisfying
$$Tv \ge Nv, \qquad v > 0, \tag{6.1}$$

[Fig. 6. Coding scheme.]

where inequality here means componentwise inequality. Since T has integer entries, in fact zeros and ones, we can satisfy (6.1) with integers vᵢ > 0 and furthermore with gcd(vᵢ) = 1. (See the Appendix for a method of solving this integer programming problem.)

If all vᵢ = 1, then T itself is the sought-after matrix; so we assume max vᵢ > 1, which also implies min vᵢ < max vᵢ. Let us call vᵢ the weight of i.
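An integer solution of (6.1) can be found in several ways. The sketch below is our own illustration of the well-known approximate-eigenvector iteration (not necessarily the Appendix's integer-programming method): start from a large constant vector and shrink it until Tv ≥ Nv holds, then normalize so that gcd(vᵢ) = 1.

```python
import numpy as np

def approximate_eigenvector(T, N, start=64):
    """Iterate v <- min(v, floor(T v / N)) until it stabilizes.
    If the spectral radius of T is >= N and `start` is large enough,
    the fixed point is a nonzero integer vector with T v >= N v."""
    T = np.asarray(T, dtype=np.int64)
    v = np.full(T.shape[0], start, dtype=np.int64)
    while True:
        w = np.minimum(v, (T @ v) // N)   # keep only weights T can support
        if (w == v).all():
            break
        v = w
    if (v > 0).any():
        v //= np.gcd.reduce(v[v > 0])     # normalize: gcd of entries = 1
    return v

# Example matrix (ours) with spectral radius 1 + sqrt(2) > 2, so N = 2 works.
T = [[1, 1, 1], [1, 1, 1], [1, 0, 0]]
v = approximate_eigenvector(T, 2)
print(v, (np.array(T) @ v >= 2 * v).all())
```

Here the iteration returns v = (2, 2, 1), and indeed Tv = (5, 5, 2) ≥ 2v componentwise.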

Consider the set T(i) = {j₁, ⋯, j₍|T(i)|₎} of successors of i. For each i, 1 ≤ i ≤ M, choose a disjoint partition α = αᵢ = {A₁, ⋯, A₍|αᵢ|₎} of T(i) where the following conditions hold:
$$\sum_{j \in A_k} v_j \equiv 0 \pmod{N}, \qquad 1 \le k \le |\alpha| - 1, \tag{6.2}$$
$$v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j / N \ge 0. \tag{6.3}$$
We dispense with the subscript on α when it is clear from context. Usually we pick partitions of T(i) with the largest possible |α|, but sometimes we must settle for |α| = 1.

Nevertheless, we show there exists an index i₀ for which |αᵢ₀| > 1: namely, take i₀ for which vᵢ₀ = max vᵢ and T(i₀) contains an index, say j₀, such that vⱼ₀ < vᵢ₀. Such an i₀ exists; for otherwise T would be reducible, because the indices of maximum weight would "circulate" only among themselves. We are free to order symbols so that i₀ = 1. From (6.1) follows
$$v_1 |T(1)| > \sum_{j \in T(1)} v_j \ge N v_1,$$
from which we conclude |T(1)| > N. Consider next the following sums modulo N:

$$v_{j_1},\quad v_{j_1} + v_{j_2},\quad \cdots,\quad v_{j_1} + \cdots + v_{j_N}.$$
Either there are N distinct values and one of them is ≡ 0 (mod N), or two repeat, in which case their difference $v_{j_{p+1}} + \cdots + v_{j_q} \equiv 0 \pmod{N}$, where 1 ≤ p < q ≤ N. Thus we can find a nonempty subset A₁ ⊂ T(1), |A₁| ≤ N < |T(1)|, such that
$$\sum_{j \in A_1} v_j \equiv 0 \pmod{N}$$
and
$$\sum_{j \in A_1} v_j < N v_1,$$


equivalently
$$v_1 > \sum_{j \in A_1} v_j / N.$$
This last inequality holds because state one is heaviest and either |A₁| < N, or |A₁| = N and j₀ ∈ A₁, j₀ being lighter than state 1.
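The pigeonhole step above is constructive: either some prefix sum of the first N successor weights is 0 mod N, or two prefixes share a residue and the gap between them is the desired block A₁. A minimal sketch (our own illustration):

```python
def zero_sum_block(weights, N):
    """Return a nonempty contiguous block of the first N weights whose
    sum is divisible by N; pigeonhole guarantees one exists."""
    seen = {0: 0}                    # residue of the empty prefix
    total = 0
    for q, w in enumerate(weights[:N], start=1):
        total = (total + w) % N
        if total in seen:            # collision: prefixes p and q agree mod N
            p = seen[total]
            return weights[p:q]      # v_{j(p+1)} + ... + v_{jq} == 0 (mod N)
        seen[total] = q
    raise AssertionError("unreachable: N prefixes, N residues")

print(zero_sum_block([3, 5, 7, 4], 4))
```

Seeding `seen` with the empty prefix folds both cases of the argument into one: a prefix that is itself 0 mod N collides with the empty prefix.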

Next we "split" i into new symbols i¹, ⋯, i^|αᵢ|, which we call offspring of i. With respect to the alphabet 𝔄′ = {iᵏ : 1 ≤ k ≤ |αᵢ|, 1 ≤ i ≤ M} of new symbols ordered in some fashion, we obtain a new transition matrix T′ specified by the transitions
$$i^k \to j^l, \qquad j \in A_k \subset T(i),$$
where the jˡ are offspring of j, i.e., T′(iᵏ) = {jˡ : j ∈ A_k}.

We define a 1-block map φ of {T′} to {T} by
$$\varphi\, i^k = i.$$
This map is obviously onto. Suppose i is a predecessor of j under transitions of T. Then, because no elements of αᵢ overlap, there is a unique offspring iᵏ of i such that T′(iᵏ) contains the offspring of j. This fact establishes that φ is left resolving and that every 2-block (i, j) is a resolvable one. Consequently, by Definition 3.2, φ is invertible, in other words, an isomorphism.

We form an approximate characteristic subvector v′ with components $v'_{i^k}$, 1 ≤ k ≤ |αᵢ|, 1 ≤ i ≤ M, for T′ as follows:
$$v'_{i^k} = \sum_{j \in A_k} v_j / N, \qquad 1 \le k \le |\alpha| - 1, \tag{6.10}$$
$$v'_{i^{|\alpha|}} = v_i - \sum_{k=1}^{|\alpha|-1} v'_{i^k} = v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j / N = \frac{N v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j}{N} \le \frac{\sum_{j \in T(i)} v_j - \sum_{j \in T(i) - A_{|\alpha|}} v_j}{N} = \sum_{j \in A_{|\alpha|}} v_j / N = \sum_{j' \in T'(i^{|\alpha|})} v'_{j'} / N, \tag{6.11}$$
the last equality holding because the weights of the offspring of j sum to vⱼ.
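One round of the splitting just described can be carried out mechanically. The sketch below is our own illustration with a hypothetical data layout and 0-based indices: `partitions[i]` lists the blocks A₁, ⋯, A₍|αᵢ|₎ of a partition αᵢ of T(i) satisfying (6.2) and (6.3). It builds the offspring weights v′ and the split matrix T′, and checks that T′v′ ≥ Nv′ survives the split.

```python
import numpy as np

def split(v, N, partitions):
    """One round of symbol splitting: offspring weights per (6.10)
    (leftover weight for the last offspring) and transitions
    i^k -> j^l for every j in A_k and every offspring j^l of j."""
    M = len(v)
    offspring = [(i, k) for i in range(M) for k in range(len(partitions[i]))]
    index = {s: n for n, s in enumerate(offspring)}
    vp = np.zeros(len(offspring), dtype=np.int64)   # offspring weights v'
    for i in range(M):
        blocks = partitions[i]
        used = 0
        for k, A in enumerate(blocks[:-1]):
            vp[index[i, k]] = sum(v[j] for j in A) // N   # exact by (6.2)
            used += vp[index[i, k]]
        vp[index[i, len(blocks) - 1]] = v[i] - used       # leftover, >= 0 by (6.3)
    Tp = np.zeros((len(offspring), len(offspring)), dtype=np.int64)
    for n, (i, k) in enumerate(offspring):
        for j in partitions[i][k]:
            for l in range(len(partitions[j])):
                Tp[n, index[j, l]] = 1
    return Tp, vp

# Hypothetical example: the 3-state matrix with weights v = (2, 2, 1), N = 2.
# Split state 0 into two offspring: A_1 = {0} (weight 2, divisible by 2)
# and A_2 = {1, 2}; states 1 and 2 stay unsplit (|alpha| = 1).
v = [2, 2, 1]
parts = [[[0], [1, 2]], [[0, 1, 2]], [[0]]]
Tp, vp = split(v, 2, parts)
print(list(vp), bool((Tp @ vp >= 2 * vp).all()))
```

The four offspring carry weights (1, 1, 2, 1), and the inequality T′v′ ≥ Nv′ established by (6.10)-(6.11) indeed holds for the split system.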

Next we take the subalphabet $\mathfrak{A}'' = \{ i^k \in \mathfrak{A}' : v'_{i^k} > 0 \}$ along with transitions, which we denote by T″, that are inherited from T′ by deleting transitions iᵏ → jˡ whenever $v'_{i^k} = 0$ or $v'_{j^l} = 0$, in other words, by crossing out the rows and columns of T′ corresponding to components of v′ that vanish. From (6.10), (6.11) we see that T″(iᵏ) ≠ ∅ for iᵏ ∈ 𝔄″. So by Theorem 5.5 there exists an irreducible component, i.e., a further subal