
Algorithms for sliding block codes---An application of symbolic dynamics to information theory



Ideas which have origins in C. E. Shannon's work in information theory have arisen independently in a mathematical discipline called symbolic dynamics. These ideas have been refined and developed in recent years to a point where they yield general algorithms for constructing practical coding schemes with engineering applications. In this work we prove an extension of a coding theorem of B. Marcus and trace a line of mathematics from abstract topological dynamics to concrete logic network diagrams.
A. The Problem
We address the problem of encoding and decoding
digital data from one type of constraint to
another by means of finite state automata. The data are
long strings of symbols from a finite alphabet, usually
zeros and ones or blocks of them. In this paper, we
consider encoding arbitrary sequences of zeros and ones
into a constrained format dictated by the data processor.
The constraints may be due to physical limitations of a
transmission or storage system or artificial limitations dic-
tated by data processing procedures.
B. The Model
The appropriate mathematical models for dealing with
the problem are symbolic dynamical systems, i.e., spaces,
invariant under the shift transformation, of two-sided in-
finite sequences of symbols from finite alphabets. The term
dynamical system is due to the fact that such spaces are
composed of discrete time orbits, each orbit consisting of a
succession of shifted sequences.
Practical encoders and decoders have short finite mem-
ories but strings they process are so long as to seem
infinite. The proposed model is suitable to the problem because:
i) the constraints are time independent (shift invariant);
ii) encoders and decoders can be constructed from map-
pings between systems which commute with some
power of the shift (sliding block codes) [6], [25].
Manuscript received March 10, 1982; revised July 1, 1982. This work
was supported in part by NSF Grant MCS81-07092. This work was
presented in part at the IEEE International Symposium on Information
Theory, Santa Monica, CA, February 1981.
The authors are with IBM Thomas J. Watson Research Center, York-
town Heights, NY 10598.
Constraints encountered in practice are standard ones in
symbolic dynamics. Of particular importance are those
specified by a finite list of forbidden blocks of symbols,
e.g., upper and lower bounds on run lengths of zeros and
ones (Sec. VII), [14], [15], [16], [29], [33], [49]. In symbolic
dynamics such systems are called shifts of finite type or
topological Markov shifts [4], [51]. More complex constraints
are also important, such as ones involving the
power spectrum of symbol sequences, e.g., no dc component
in a signal representing the symbol sequence [14], [22],
[30], [32], [34], [41], [46]. These can be described as outputs
of a finite state automaton whose inputs are shifts of finite
type. In symbolic dynamics such systems are called sofic
(from the Hebrew word for finite) [50]. In engineering
contexts, the above constraints have been described by the
notion of a channel: namely, shifts of finite type are
deterministic finite state channels with finite memory, and
sofic systems are deterministic finite state channels with
infinite memory. Some areas where these constraints are
met are magnetic recording, fiber optics, and data proto-
cols in communication networks.
C. Shannon Theory
Suppose we wish to encode in a decodable way every
sequence $(\ldots, x_{-1}, x_0, x_1, \ldots)$ of a system X satisfying one
set of constraints into another system Y of sequences
$(\ldots, y_{-1}, y_0, y_1, \ldots)$ satisfying another. Each component
$x_n$, $y_n$ may itself consist of a finite block of symbols, say of
length p and q, respectively, in which case we say that the
coding rate r = p/q. The concept of topological entropy
governs when this is possible. Topological entropy is defined
as the exponential growth rate, as $n \to \infty$, of the number of
different strings of length n appearing in the infinite se-
quences of a symbolic system. The term topological is
used to distinguish this entropy from its probabilistic coun-
terpart. It was defined in purely topological terms in [3].
In the present context where output symbols are of equal
duration, Shannon's noiseless coding theorem [48, p. 28]
amounts to the following obvious statement: coding of
arbitrarily long finite strings is possible when the topologi-
cal entropy of X is less than that of Y and impossible when
the inequality is reversed, the case of equality being left
unresolved. Shannon called the system Y, a channel, and its
topological entropy, the channel capacity. He called the
system X, the source, and endowed it with a probabilistic
0018-9448/83/0100-0005$01.00 © 1983 IEEE
entropy (topological entropy maximizes probabilistic entropy
supported by the source [12], [23]). The full content of
Shannon's theorem applies to the situation where the topological
entropy of the source is greater than that of the
channel but its probabilistic entropy is less.
We sharpen Shannon's theorem for the special case
where the source entropy is the topological entropy to
show that coding is possible even in the case of equality. In
addition, the method of proof provides efficient sequential
encoding and decoding algorithms which do not depend on
the length of strings processed, a feature absent from
Shannon's original theorem. We treat the class of shifts of
finite type (channels with finite memory) and leave the
more general case of sofic systems (channels with infinite
memory) to subsequent work. Actually, we deal with the
case where the topological entropy of the source is the
logarithm of an integer, the most common one in applica-
tions. The more general case where it is the logarithm of an
algebraic integer can be handled by a slight extension of
the tableaux method of [4], but then certain desirable
error propagation properties usually must be forgone.
and clarify the relationship between the two methods.
Franaszek's ideas are very interesting and may lead to
simpler implementations. From a mathematician's point of
view these works [19], [20] leave something to be desired,
namely, precise statements on the scope of the method
along with complete proofs. A. Lempel and M. Cohn [35]
work some examples by Franaszek's method, but leave the
same mathematical questions unsettled.
The main results of our work were presented at the IEEE
International Symposium on Information Theory, Santa
Monica, CA (Feb. 1981) [2]. The theme, which is the
application of recent developments in symbolic dynamics
to coding problems in information theory, was suggested
by M. Hassner in [26].
This paper is written for two worlds, engineering and
mathematics, at the risk of satisfying neither. It runs the
gamut from the sublimely abstract to the hard-nose con-
crete. We start with notions of sets, mappings, and topol-
ogy in Section II, supplant these with combinatorial ideas
by Section V, and finish at the end of Section IX with logic
circuit diagrams. The complete trip is hardly needed for
constructing codes, but it is useful in organizing the flow of
ideas and bringing order to the subject. For the less
mathematically minded, interested only in making codes
for some practical purpose, we suggest concentrating on
the description of the symbol splitting process of Section
VI (not the proof), the example of Section VIII, and the
implementation of it in Section IX. Following the pattern
there one should be able to construct encoders and de-
coders for any shift of finite type constraint. The method
can be extended to cover sofic systems arising from the
aforementioned spectral constraints, but this has not yet
appeared in print [38] and its applicability is not fully
assessed. The excessive number of tables included in Sec-
tions VIII and IX are there to indicate the labor involved
in constructing codes.
Let X be a compact metric space and $\sigma$ a homeomorphism
(i.e., a continuous one-to-one map) of X onto itself.
We call the pair $(X, \sigma)$ an abstract dynamical system. For a
comprehensive treatment of such systems see [11]. If $\bar{X}$ is a
closed $\sigma$-invariant subset of X then the system $(\bar{X}, \sigma)$ is
called a subsystem of $(X, \sigma)$, and we write
$(\bar{X}, \sigma) \subset (X, \sigma)$.
We define the orbit, future orbit, and past orbit of a point x
by the respective sequences $\operatorname{orb} x \equiv \{\sigma^n x\}_{n \in \mathbb{Z}}$,
$\operatorname{orb}^+ x \equiv \{\sigma^n x\}_{n \ge 0}$, $\operatorname{orb}^- x \equiv \{\sigma^n x\}_{n \le 0}$.
In order to economize on
notation we shall always use the following convention for
metrics. We denote the distance between two points x, y
by $|x - y|$ even though subtraction and absolute value
may not be defined.
Two orbits, orb x and orb y, are called positively (negatively)
asymptotic if $|\sigma^n x - \sigma^n y| \to 0$ as $n \to \infty$ ($n \to -\infty$).
We have the following indecomposability conditions for
a dynamical system and its higher iterates. A system is
called nonwandering transitive if for every pair of neighbor-
hoods there is a point in the first whose future orbit hits
the second. A system is called aperiodic if $(X, \sigma^n)$ is
nonwandering transitive for all n.
Let $(X, \sigma)$, $(Y, \tau)$ be two abstract dynamical systems. A
continuous map $\varphi$ of Y onto X such that $\varphi \circ \tau = \sigma \circ \varphi$ is
called a topological homomorphism. If such a map exists we
have the following commutative diagram (Fig. 1).
For the paper as a whole, we assume knowledge of the
Perron-Frobenius theory of nonnegative matrices [21], [47]
and some basic elements of symbolic dynamics which can
be found in [4], [8], [11], [27], [28].
We present only one proof, that of the main theorem in
Section VI. All others are standard and easy ones from
symbolic dynamics. These are stated without proof. Where
possible, references are cited in which proofs can be found;
otherwise they should be treated as exercises. Actually
Sections II-V are to be regarded as a survey.
Methods for doing noiseless coding have also been developed
by P. Franaszek [14]-[19]. In [19], [20] he gives a
general one which is different from ours yet intriguingly
based on the same inequality (6.1) from the Perron-
Frobenius theory. We hope someday to return to this topic
respectively. We represent this situation as in Fig. 2.
We also refer to $\varphi$ as a factor map and call $(X, \sigma)$ a
factor of $(Y, \tau)$ and $(Y, \tau)$ an extension of $(X, \sigma)$. If, in
addition, $\varphi$ is one-to-one (hence invertible, $\varphi^{-1}$ being
continuous by compactness) we call it an isomorphism and say
that $(X, \sigma)$ is topologically conjugate to $(Y, \tau)$ and write
$(X, \sigma) \simeq (Y, \tau)$. Nonwandering transitivity and aperiodicity
are preserved under isomorphism. Topological conjugacy
is the strongest sense of equivalence of dynamical
systems from the purely topological point of view and too
strong for many practical applications (see Remark 8.1).
Consequently, we introduce a weaker one.
Definition 2.1 (Parry [45]): We say $(X, \sigma)$ and $(Y, \tau)$
are finitely equivalent, and write $(X, \sigma) \sim (Y, \tau)$, if there
exists a common extension $(Z, \rho)$ and boundedly finite-to-one
factor maps $\varphi$, $\psi$ of $(Z, \rho)$ onto $(X, \sigma)$ and $(Y, \tau)$,
For coding applications we need a certain kind of invert-
ibility condition for factor maps which for abstract systems
is expressed by the following.
Fig. 1. Commutative diagram of topological homomorphism.
Definition 2.4 (Kitchens [31]): A factor map $\varphi$ of $(Y, \sigma)$
onto $(X, \tau)$ is called right, left, or two-sided closing if
$\varphi x \neq \varphi y$ whenever $x \neq y$ and orb x, orb y are, respectively,
negatively, positively, or both negatively and positively
asymptotic.
Fig. 2. Finite equivalence diagram.
Finite equivalence can be slightly strengthened, which is
done in the next definition, and still be weaker than
topological conjugacy.
Definition 2.2 (see [4], [24]): We say that $(X, \sigma)$ and
$(Y, \tau)$ are almost one-to-one finitely equivalent and write
$(X, \sigma) \approx (Y, \tau)$ if, in addition to being finitely equivalent,
the factor maps are one-to-one except on nondoubly tran-
sitive points. A point x is said to be nondoubly transitive if
either orb+ x or orb- x fails to be dense in its dynamical
system, otherwise it is called doubly transitive.
Theorem 2.1 [4, p. 10]: $\simeq$, $\sim$, and $\approx$ are equivalence
relations and
$(X, \sigma) \simeq (Y, \tau) \Rightarrow (X, \sigma) \approx (Y, \tau) \Rightarrow (X, \sigma) \sim (Y, \tau)$.
We remark that the set of doubly transitive points is an
invariant subset of a dynamical system and that factor
maps in Definition 2.2 are isomorphisms between subsys-
tems which, however, are not compact.
Definition 2.3 (Bowen-Dinaburg [7], [12]): The topological
entropy $h(X, \sigma)$ for abstract dynamical systems is
defined as the largest growth rate possible, as $n \to \infty$, of
the number of $\varepsilon$-separated orbits of length n, i.e.,
$$h(X, \sigma) = \sup_{\varepsilon > 0} \lim_{n \to \infty} \frac{1}{n} \log N(\varepsilon, n),$$
where $N(\varepsilon, n)$ denotes the number of $\varepsilon$-separated orbits of
length n. Two orbits of length n, $\{\sigma^k x\}_{0 \le k \le n-1}$ and
$\{\sigma^k y\}_{0 \le k \le n-1}$, are said to be $\varepsilon$-separated if
$|\sigma^k x - \sigma^k y| \ge \varepsilon > 0$ for some k, $0 \le k \le n - 1$.
An easy consequence of the definition is that, if $(X, \sigma)$ is
a factor of $(Y, \tau)$, then $h(X, \sigma) \le h(Y, \tau)$. Also, if $(\bar{X}, \sigma)$
is a subsystem of $(X, \sigma)$, then $h(\bar{X}, \sigma) \le h(X, \sigma)$.
We have Theorem 2.2.
Theorem 2.2 [4, p. 9]: If $(X, \sigma)$ is a finite-to-one factor
of $(Y, \tau)$ then $h(X, \sigma) = h(Y, \tau)$.
Corollary 2.3: If $(X, \sigma) \sim (Y, \tau)$ then $h(X, \sigma) = h(Y, \tau)$.
Thus topological entropy is an invariant for all three
equivalence relations. We also have the next theorem.
Theorem 2.4 [3, p. 321]: $h(X, \sigma^n) = |n| \, h(X, \sigma)$, $n \in \mathbb{Z}$.
Let $\mathcal{A}$ be an alphabet (sometimes called a state space), by
which we mean a finite set of symbols (also called states)
with an ordering. We denote the cardinality of a set $\mathcal{A}$ by
$|\mathcal{A}|$. Examples of alphabets are $\mathcal{A} = \{0, 1\}$,
$\mathcal{B} = \{(0,0), (0,1), (1,0), (1,1)\}$, etc. We freely abuse notation by
using indexing symbols to represent interchangeably both
an element of an alphabet and its ordinal number. Which is
meant should be clear from context. This sloppiness is often
compounded by the fact that numbers also appear as
alphabet symbols and that a numerical symbol may not
coincide with its ordinal number. The advantage of incon-
sistency here is that it keeps notation to a minimum.
As is customary, $\mathcal{A}^{\mathbb{Z}}$ denotes the set of two-sided infinite
sequences of elements of $\mathcal{A}$. The space $\mathcal{A}^{\mathbb{Z}}$ can be endowed
with a metric, the distance between sequences $x = \{x_n\}_{n=-\infty}^{\infty}$
and $y = \{y_n\}_{n=-\infty}^{\infty}$ being
$$|x - y| = \sum_{n=-\infty}^{\infty} \frac{|x_n - y_n|}{2^{|n|}},$$
where $|x_n - y_n|$ is defined to be one when $x_n \neq y_n$
and zero otherwise. In this metric the more two sequences
consecutively agree the closer they are, and we have a
neighborhood basis which consists of the family of sets,
called cylinder sets, of the form
$\{x = (\ldots, x_{-1}, x_0, x_1, \ldots) : (x_{n+1}, \ldots, x_{n+k}) = (a_1, \ldots, a_k)\}$,
where $(a_1, \ldots, a_k)$ is some fixed k-tuple of symbols of $\mathcal{A}$. In this
topology $\mathcal{A}^{\mathbb{Z}}$ is compact.
We define the shift transformation $\sigma$ of $\mathcal{A}^{\mathbb{Z}}$ onto itself by
$(\sigma x)_n = x_{n+1}$
for $x \in \mathcal{A}^{\mathbb{Z}}$, $n \in \mathbb{Z}$. In the above metric $\sigma$ is a
homeomorphism and we form the dynamical system $(\mathcal{A}^{\mathbb{Z}}, \sigma)$
which is called the full N-shift where $N = |\mathcal{A}|$. Any subsystem
$(X, \sigma) \subset (\mathcal{A}^{\mathbb{Z}}, \sigma)$ is called a subshift. We use symbols
$\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ to denote alphabets. Occasionally we use $\mathcal{A}_X$ to
denote the alphabet of a dynamical system $(X, \sigma)$ which in
the above is a subset of $\mathcal{A}$. Any finite n-tuple $(a_1, \ldots, a_n)$
which appears in any sequence of X is called an admissible
n-block. The topological entropy of a subshift is given by
$$h(X, \sigma) = \lim_{n \to \infty} \frac{1}{n} \log N(n), \tag{3.1}$$
where N(n) is the number of admissible n-blocks. We
observe that
$h(\mathcal{A}^{\mathbb{Z}}, \sigma) = \log |\mathcal{A}|$
and $h(X, \sigma) \le \log |\mathcal{A}|$ for $(X, \sigma) \subset (\mathcal{A}^{\mathbb{Z}}, \sigma)$. The reader
can regard (3.1) as a definition [44] although it is an easy
exercise to derive it from Definition 2.3.
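As a rough numerical illustration of the block-counting formula for the entropy of a subshift, the following Python sketch counts blocks by brute force and evaluates (1/n) log N(n). The function name `count_admissible_blocks`, the example constraint (the block 11 is forbidden), and the base-2 logarithm are assumptions made only for illustration. The sketch also assumes that every finite block avoiding the forbidden list extends to a two-sided sequence, which holds for this example but not for every forbidden list.

```python
from itertools import product
from math import log2

def count_admissible_blocks(alphabet, forbidden, n):
    """Count n-blocks over `alphabet` that contain no forbidden block
    as a contiguous sub-block (brute force, exponential in n)."""
    forbidden = [tuple(f) for f in forbidden]
    count = 0
    for block in product(alphabet, repeat=n):
        hit = any(block[i:i + len(f)] == f
                  for f in forbidden
                  for i in range(n - len(f) + 1))
        if not hit:
            count += 1
    return count

# Hypothetical example: binary sequences in which 11 never occurs.
counts = [count_admissible_blocks((0, 1), [(1, 1)], n) for n in range(1, 13)]
# (1/n) log2 N(n) for n = 1, ..., 12; it decreases slowly toward the entropy.
estimates = [log2(c) / n for n, c in enumerate(counts, start=1)]
```

The counts here are Fibonacci numbers, so the estimates approach the logarithm of the golden ratio, but convergence in n is slow; the matrix description of Section V gives the limit directly.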
We introduce the notation $x|_m^n = (x_m, x_{m+1}, \ldots, x_n)$ for
a sub-(n - m + 1)-tuple of an n-tuple or sequence x. Given
a subshift $(X, \sigma)$ we can form a subshift $(X^{[n]}, \sigma)$, called
the higher n-block system of $(X, \sigma)$, where $(X^{[n]}, \sigma)$ consists
of sequences $(\ldots, x|_{-1}^{n-2}, x|_0^{n-1}, x|_1^{n}, \ldots)$, where $x \in X$.
The higher n-block system $(X^{[n]}, \sigma)$ is canonically isomorphic
to $(X, \sigma)$ under the correspondence
$(\ldots, x|_{-1}^{n-2}, x|_0^{n-1}, x|_1^{n}, \ldots) \leftrightarrow (\ldots, x_{-1}, x_0, x_1, \ldots)$.
$(X^{[n]}, \sigma)$ is a subshift of the full shift based on an alphabet
of symbols consisting of all admissible n-blocks of X.
Definition 3.1: Let $(X, \sigma)$, $(Y, \sigma)$ be subshifts of two full
shifts $(\mathcal{A}^{\mathbb{Z}}, \sigma)$, $(\mathcal{B}^{\mathbb{Z}}, \sigma)$, respectively. A mapping $\varphi$ of $(Y, \sigma)$
onto $(X, \sigma)$ is called a k-block map requiring memory l and
anticipation m, where $k = l + m + 1$, if there exists a function
$\varphi: \mathcal{B}^k \to \mathcal{A}$ such that if $\{x_n\} = \varphi\{y_n\}$ then
$x_n = \varphi(y_{n-l}, \ldots, y_{n+m})$.
We use here the following abuse of notation. The same
symbol $\varphi$ is used to denote a mapping defined on sequences
and the component function of several variables
by which it is specified. What is meant will always be clear
from context, hopefully.
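To make the k-block map of Definition 3.1 concrete, here is a minimal Python sketch that slides a window of memory l and anticipation m over a finite stretch of a sequence. The names `sliding_block_map` and `xor2`, and the choice of exclusive-or as the component function, are hypothetical and chosen only for illustration; a finite list stands in for a two-sided infinite sequence, so outputs exist only where the whole window fits.

```python
def sliding_block_map(phi, memory, anticipation, y):
    """Apply x_n = phi(y[n - memory], ..., y[n + anticipation]) at every
    position n where the full window lies inside the finite list y."""
    k = memory + anticipation + 1  # window length of the k-block map
    return [phi(*y[i:i + k]) for i in range(len(y) - k + 1)]

def xor2(a, b):
    """Illustrative 2-block component function on the alphabet {0, 1}."""
    return a ^ b

y = [0, 1, 1, 0, 1, 0, 0]
x = sliding_block_map(xor2, memory=1, anticipation=0, y=y)

# Shifting the input by one place shifts the output by one place,
# which is the commutation with the shift that defines a homomorphism.
x_shifted = sliding_block_map(xor2, memory=1, anticipation=0, y=y[1:])
```

The same component function is applied at every coordinate, which is why the induced map on sequences commutes with the shift.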
In dynamical systems only the finiteness of k, not its
size, is important. We can often take advantage of the
conceptual simplification of regarding a k-block map as a
1-block map on a system with a larger alphabet. This is
done by replacing a k-block map $\varphi$ by the 1-block map
$\varphi\theta^{-1}$ where $\theta$ is the canonical isomorphism between $(Y, \sigma)$
and $(Y^{[k]}, \sigma)$. (See Fig. 3.)
However, for construction of encoders and decoders in
engineering applications it is important to have k and the
alphabet size as small as possible; so the above artifice is of
no advantage from this point of view.
Theorem 3.1 [11, p. 3]: An onto mapping $\varphi$ between
subshifts is a homomorphism if and only if it is a k-block
map.
If a k-block map $\varphi$ between subshifts is invertible, then
its inverse $\varphi^{-1}$ is also a k-block map, perhaps with a
different k. We define a weaker form of invertibility.
Definition 3.2: Let $\varphi$ be as in Definition 3.1 with
$\varphi(\{y_n\}) = \{x_n\}$. $\varphi$ is said to be right resolving with parameters
p, q, r of memory and anticipation if each $y_n$ is uniquely
determined from the upcoming $x_{n-p}, \ldots, x_{n+q}$ and preceding
$y_{n-r}, \ldots, y_{n-1}$; in other words, there exists a function
$f: \mathcal{B}^r \times \mathcal{A}^{p+q+1} \to \mathcal{B}$ such that
$y_n = f(y_{n-r}, \ldots, y_{n-1}; x_{n-p}, \ldots, x_{n+q})$.
A similar definition is given for left resolving by replacing
$\{x_n\}$, $\{y_n\}$ by $\{x_{-n}\}$, $\{y_{-n}\}$ in the above definition.
We remark that Definition 3.2 is what Definition 2.4
becomes in the context of subshifts. Kitchens [31] used the
term right (left) closing here. In [4] the definition of right
and left resolving covered only the cases where r = 1,
p = q = 0. The concept of right resolving is not new in
information theory. It was called unifilar by McMillan [36,
p. 216]. We could also give the definition of two-sided
resolving [4] which is what two-sided closing becomes for
symbolic systems, but that will not be needed in the
present work. Suffice it to say that the importance of the
concept of resolvability is put into evidence by the following
theorem.
Fig. 3. Equivalence of k-block to the 1-block map.
Theorem 3.2 [4, p. 23]: A homomorphism is finite-to-one
if it is right or left resolving. (Right or left resolving imply
two-sided resolving but not conversely. For subshifts of
finite type which are defined in Section V finite-to-one
implies two-sided resolving.)
Closely associated with the concept of right and left
resolving is the notion of resolving block. Such blocks serve
as a means for resetting an encoding automaton constructed
from a right resolving map. Let $\varphi$ in Definition 3.1
be a 1-block map, i.e., $x_n = \varphi(y_n)$.
Definition 3.3 [1], [4]: An X-admissible m-block
$(a_1, \ldots, a_m)$ is called a resolving block if there exists an
index $i_0$, $1 \le i_0 \le m$, such that if $(y_1, \ldots, y_m)$ and
$(y'_1, \ldots, y'_m)$ are two Y-admissible m-blocks such that
$\varphi(y_i) = \varphi(y'_i) = a_i$, $1 \le i \le m$, then $y_{i_0} = y'_{i_0}$.
In other words, the block $(a_1, \ldots, a_m)$ determines a unique
preimage in the $i_0$th coordinate.
Remark 3.1: If $\varphi$ is also right resolving with r = 1,
p = q = 0 and $x|_1^m = (a_1, \ldots, a_m)$ then the sequence $y|_{i_0}^{\infty}$
can be uniquely determined from $x|_1^{\infty}$ whenever $x = \varphi(y)$.
Furthermore, if there are so many resolving blocks that in
every k-block there exists at least one, then $\varphi$ is invertible,
in fact, $\varphi^{-1}$ can be seen to be a k-block map.
Let $(X, \sigma)$ and $(W, \sigma)$ be two subshifts with alphabets $\mathcal{A}$
and $\mathcal{B}$, respectively. In the vocabulary of engineering let us
call $(X, \sigma)$ the source and $(W, \sigma)$ the channel. Usually $\mathcal{A}$
and $\mathcal{B}$ consist, respectively, of admissible p-blocks and
q-blocks of 0 and 1 for some fixed p and q. Furthermore,
the source usually consists of unconstrained sequences and
the channel of constrained ones. Thus $\mathcal{A}$ is the set of all $2^p$
p-blocks whereas $\mathcal{B}$ is some subset of q-blocks.
Our problem is to construct two finite state automata:
an encoder which converts source sequences $\{x_n\} \in X$ to
channel sequences $\{y_n\} \in W$, and a decoder which recovers
$\{x_n\}$ from $\{y_n\}$. The coding rate is r = p/q (p source
symbols per q channel symbols) and we want this as large
as feasible. At the same time we want q and p small to
minimize complexity.
A finite state automaton, say the encoder, is given by two
functions:
$$y_n = e(x_{n-l}, \ldots, x_{n+m}, z_n), \qquad z_n = f(x_{n-l}, \ldots, x_{n+m}, z_{n-1}), \tag{4.1}$$
where $z_n$ belongs to some finite alphabet $\mathcal{C}$, called the
internal states of the automaton. The elements $y_n$ are called
the output and $x_{n-l}, \ldots, x_{n+m}$ the inputs, with l, m parameters
of memory and anticipation. The function e is called
the output function and f the next state function. By substitution,
$y_n$ is a function of $x_{n-l}, \ldots, x_{n+m}$, $z_{n-1}$. The
numbering of variables in (4.1) may be slightly off from
standard usage. We adopt the present one to conform to
our notation of subsequent constructions.
Fig. 4. Commutative diagram of decoding map.
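The two-function description of the encoder can be sketched in Python as a simple loop: the next state function is applied first, then the output function, matching the order in which the state and output are defined above. The driver `run_encoder` and the toy functions `toy_next_state` and `toy_output` are hypothetical names introduced for illustration; the toy automaton (a running parity with l = m = 0) is not one of the encoders constructed in this paper.

```python
def run_encoder(e, f, x, l, m, z_init):
    """Run a finite state automaton over the finite input list x.
    f is the next state function, e the output function; l and m are
    the memory and anticipation, z_init the state before the first step."""
    z = z_init
    y = []
    for n in range(l, len(x) - m):
        window = tuple(x[n - l:n + m + 1])  # inputs x_{n-l}, ..., x_{n+m}
        z = f(window, z)                    # z_n from the inputs and z_{n-1}
        y.append(e(window, z))              # y_n from the inputs and z_n
    return y

def toy_next_state(window, z_prev):
    """Illustrative next state: flip the state whenever the input is 1."""
    return z_prev ^ window[0]

def toy_output(window, z):
    """Illustrative output: emit the current internal state."""
    return z

y = run_encoder(toy_output, toy_next_state, [1, 0, 1, 1, 0], l=0, m=0, z_init=0)
```

In this toy case a single input error changes every later state, which illustrates why error propagation is a concern for automata whose state depends on the input.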
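The output sequences $\{y_n\}$ belong to a subsystem $(Y, \sigma) \subset
(W, \sigma)$. By virtue of (4.1) this subsystem is a factor of a
subsystem of $((\mathcal{B} \times \mathcal{C})^{\mathbb{Z}}, \sigma)$, which in turn is a factor of a
subsystem of $((\mathcal{A} \times \mathcal{C})^{\mathbb{Z}}, \sigma)$. A single error in the input will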
possibly propagate forever in the output. We take the point
of view that the source is error free. However, errors may
occur in the channel, and we want to limit the range of
their propagation in the decoder. In order to do this, we
should also make the decoder a finite state automaton, but
one in which the internal state at a particular time does not
depend on the input, but only on the previous state.
Thus the decoder is given by two functions:
$$x_n = d(y_{n-\bar{l}}, \ldots, y_{n+\bar{m}}; \bar{z}_n), \qquad \bar{z}_n = \bar{f}(\bar{z}_{n-1}),$$
where $\bar{z}_n$ belongs to some other finite alphabet $\bar{\mathcal{C}}$. Since the
set of states is finite, say $|\bar{\mathcal{C}}| = v$, we can label them so that
$\bar{z}_n \equiv n \pmod{v}$.
From this we see that we require the decoder to be a
finite-block map satisfying
$d\sigma^{v} = \sigma^{v} d$;
that is, we have the diagram of Fig. 4.
In designing a decoder, we try to make v as small as
possible, hopefully v = 1. This condition can always be
trivially achieved at the expense of increasing p and q by a
multiplicative factor v, but this would not count as an
We would also like the encoder to be given by a finite
block map of $(X, \sigma)$ onto $(Y, \sigma)$, which would mean
$(X, \sigma) \simeq (Y, \sigma)$. This is usually not possible, so we must be
content with some weaker form of invertibility of the
decoding map, like right resolvability, which is sufficient
for constructing an encoding automaton.
We single out a special class of subshifts which go under
a variety of names, two of which we shall use, the choice
depending on the mode of description. The term subshift of
finite type shall be used to designate a subshift (X, a) when
X is defined by specifying a finite list of forbidden finite
blocks which do not occur anywhere in the sequences of X.
Let $T = (t_{ij})$ be an $N \times N$ matrix of zeros and ones,
which we call a transition matrix. A k-tuple $(x_1, \ldots, x_k)$ of
symbols $x_i \in \mathcal{A}$ is called a T-admissible k-block if
$t_{x_i x_{i+1}} = 1$ for $i = 1, \ldots, k - 1$. A two-sided infinite sequence
$\{x_n\}_{n=-\infty}^{\infty}$ is called T-admissible if $t_{x_n x_{n+1}} = 1$ for $n \in \mathbb{Z}$. Let
{T} denote the set of T-admissible sequences. We use the
term topological Markov shift to describe a subshift $(X, \sigma)$
when X = {T}.
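The admissibility condition can be checked mechanically. The Python sketch below tests whether a finite block is T-admissible and lists all admissible k-blocks; the function names and the example matrix (which forbids the transition from symbol 1 to symbol 1) are assumptions for illustration, and symbols are identified with 0-based row and column indices.

```python
from itertools import product

def is_T_admissible(T, block):
    """True if every consecutive pair (a, b) in block has T[a][b] == 1."""
    return all(T[a][b] == 1 for a, b in zip(block, block[1:]))

def admissible_blocks(T, k):
    """List all T-admissible k-blocks over the symbols 0, ..., len(T) - 1."""
    return [b for b in product(range(len(T)), repeat=k)
            if is_T_admissible(T, b)]

# Hypothetical example: symbol 1 may not follow symbol 1.
T = [[1, 1],
     [1, 0]]
three_blocks = admissible_blocks(T, 3)
```

Note that a single symbol is trivially admissible here, since the condition only constrains consecutive pairs.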
The first description tells what is forbidden, the second
what is allowed. Both definitions describe the same class of
dynamical systems: for one obtains a finite list of forbidden
2-blocks (i, j) from a transition matrix T whenever
$t_{ij} = 0$. Conversely, if L is the length of the longest block
in the forbidden list, then a new alphabet can be chosen to
be the admissible (L - 1)-blocks. Tacking on the right
single symbols from the original alphabet in such a way as
to get admissible L-blocks defines a transition matrix between
(L - 1)-blocks which overlap in L - 2 places. The
system that results is isomorphic to the original one. For
this reason subshifts of finite type could also be called
(L - 1)-step Markov systems and topological Markov
shifts, 1-step Markov.
A transition matrix T defines a directed graph, the
symbols are nodes and the transitions edges. If we label
edges with a new alphabet, then a new transition matrix
$T^{[2]}$ is formed by specifying how the edges are connected.
The topological Markov shift $(\{T^{[2]}\}, \sigma)$ is merely the
higher 2-block system of $(\{T\}, \sigma)$. Similarly we can form
still higher edge graphs to obtain all the higher block
systems $(\{T^{[n]}\}, \sigma)$.
Using the notion of directed graph we can also define a
dynamical system $(X, \sigma)$ for arbitrary nonnegative integral
matrices $T = (t_{ij})$ in the following manner. From i to j
draw $t_{ij}$ directed edges and label each with a distinct
symbol. Let us again use the notation $T^{[2]}$ to designate the
directed edge graph. Then $T^{[2]}$ is a zero-one transition
matrix, so we can form the dynamical system $(\{T^{[2]}\}, \sigma)$
which serves as a definition of a topological Markov shift
given by a matrix in which appear positive integers larger
than one.
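The edge graph construction described in the last two paragraphs can be sketched as follows. The function name `edge_graph` and the labeling of each edge by a triple (start, end, copy number) are choices made here for illustration, not notation from the text; the rule that one edge may follow another exactly when the first ends where the second begins is the connection rule described above.

```python
def edge_graph(T):
    """Return (edges, T2) for a nonnegative integral matrix T.
    Each of the T[i][j] edges from i to j gets its own label (i, j, c);
    T2[a][b] == 1 exactly when edge a ends at the node where edge b starts."""
    n = len(T)
    edges = [(i, j, c)
             for i in range(n) for j in range(n) for c in range(T[i][j])]
    T2 = [[1 if e[1] == f[0] else 0 for f in edges] for e in edges]
    return edges, T2

# Zero-one case: the result describes the higher 2-block system.
edges_a, T2_a = edge_graph([[1, 1],
                            [1, 0]])

# Integral case: the 1 x 1 matrix (2) yields the full 2-shift.
edges_b, T2_b = edge_graph([[2]])
```

For a zero-one matrix the copy number is always 0, so the edge labels reduce to the admissible 2-blocks.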
Sometimes we must deal with dynamical systems
$(\{T\}, \sigma^p)$ involving a higher power of the shift. In order to
apply the results as they are expressed in Section VI we
must represent it as a first power and to this end we have
the following theorem.
Theorem 5.1: If the pth matrix power $(T^{[k]})^p$ of the kth
higher edge graph $T^{[k]}$ for a transition matrix T is zero-one
(which is always the case for k = p), then
$(\{T\}, \sigma^p) \simeq (\{(T^{[k]})^p\}, \sigma)$ with the conjugacy given by a canonical map
like in Section III. Alternatively if $T^p$ is not zero-one, then
its edge graph $(T^p)^{[2]}$ defined above is zero-one and
$(\{T\}, \sigma^p) \simeq (\{(T^p)^{[2]}\}, \sigma)$.
A subshift which is a finite-to-one homomorphic image
of a topological Markov shift is called a sofic system. A
sofic system need not be a subshift of finite type (these
systems were studied in [9], [10], [50]). However, a subshift
which is an isomorphic image of a subshift of finite type is
again a subshift of finite type. Sometimes, when the transi-
tion matrix specifying a topological Markov shift is large,
we can take advantage of the above fact by specifying the
system by an isomorphism (invertible k-block map) from a
system given by a much smaller transition matrix, thus
reducing the overall complexity of the description. Symbols
in sequences in the domain of the above isomorphisms are
sometimes called channel states and symbols in sequences
of the range, channel symbols. An isomorphism of a shift of
finite type is sometimes called a deterministic channel with
finite memory and a homomorphism with a sofic image
which is not a shift of finite type, a deterministic channel
with infinite memory. We remark that the relation of the
above terminology to that in engineering literature is a bit
blurred. For example, whether the word, channel, should
refer to a mapping, its image, or both is nebulous. We shall
not dwell further on these pedantic difficulties, except to
say that (W, a) was called a channel in Section IV because
the constraints on W are typically specified by defining it
as the homomorphic image of a shift of finite type.
We introduce some useful terminology with regard to
topological Markov shifts. We say j is a (T-admissible)
successor of i, or equivalently the transition i to j is
allowable (under T), and write $i \to j$, if $t_{ij} = 1$. We also say
in this case, i is a (T-admissible) predecessor of j. We
denote the successors of i by the set $T(i) = \{j_1, \ldots, j_{|T(i)|}\}$.
The transpose matrix $T^*$ defines another transition matrix
in which the roles of predecessor and successor have been
interchanged. Observe that $(\{T^*\}, \sigma)$ is isomorphic with
$(\{T\}, \sigma^{-1})$.
Definition 5.1: T is said to be irreducible if for every
$i, j \in \mathcal{A}$ there exists a positive integer n (depending on i, j)
such that $t_{ij}^{(n)} > 0$, i.e., there exist $i_1, \ldots, i_{n-1} \in \mathcal{A}$ such
that $i = i_0 \to i_1 \to \cdots \to i_{n-1} \to i_n = j$.
Definition 5.2: We shall call T aperiodic if there exists
n > 0 such that $T^n > 0$, i.e., $t_{ij}^{(n)} > 0$ for all i, j (n being
independent of i, j).
Definition 5.3: The greatest common divisor (gcd) of the
set $\{n : t_{ii}^{(n)} > 0,\ i \in \mathcal{A},\ n = 1, 2, \ldots\}$ of cycle lengths is
called the period of T.
Theorem 5.2 [21, pp. 65], [47]: T is aperiodic if and
only if T is irreducible and has period 1.
Theorem 5.3 [4, p. 19], [24, p. 15]: A topological Markov
shift $(\{T\}, \sigma)$ is nonwandering transitive if and only if T is
irreducible.
Theorem 5.4 [4, p. 19]: A topological Markov shift
$(\{T\}, \sigma)$ is aperiodic if and only if T is aperiodic.
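Definitions 5.1 to 5.3 and Theorem 5.2 translate into a short test on the zero pattern of the powers of T. The Python sketch below is one way to do this, under the following assumptions: the function names are hypothetical, only powers up to the size N of the matrix are examined (every simple path or simple cycle has length at most N, and every closed walk decomposes into simple cycles), and aperiodicity is decided through Theorem 5.2 rather than by searching for a strictly positive power.

```python
from math import gcd

def bool_mul(A, B):
    """Zero pattern of the matrix product A B."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

def is_irreducible(T):
    """Definition 5.1: every j is reachable from every i in n >= 1 steps."""
    n = len(T)
    reach = [[0] * n for _ in range(n)]
    P = [row[:] for row in T]          # zero pattern of T^m, starting at m = 1
    for _ in range(n):
        reach = [[int(reach[i][j] or P[i][j]) for j in range(n)]
                 for i in range(n)]
        P = bool_mul(P, T)
    return all(all(row) for row in reach)

def period(T):
    """Definition 5.3: gcd of the cycle lengths (0 if T has no cycle)."""
    n = len(T)
    g = 0
    P = [row[:] for row in T]
    for m in range(1, n + 1):
        if any(P[i][i] for i in range(n)):
            g = gcd(g, m)
        P = bool_mul(P, T)
    return g

def is_aperiodic(T):
    """Theorem 5.2: aperiodic iff irreducible with period 1."""
    return is_irreducible(T) and period(T) == 1
```

For example, the matrix with rows (1, 1) and (1, 0) is aperiodic, while the matrix with rows (0, 1) and (1, 0) is irreducible with period 2.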
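Definition 5.4: A subalphabet $\bar{\mathcal{A}} \subset \mathcal{A}$ under transitions
$\bar{T}$ inherited from T is called an irreducible component if
i) $i \in \bar{\mathcal{A}} \Rightarrow T(i) \subset \bar{\mathcal{A}}$;
ii) $i, j \in \bar{\mathcal{A}} \Rightarrow \exists\, i_1, \ldots, i_{n-1} \in \bar{\mathcal{A}}$ such that
$i = i_0 \to i_1 \to \cdots \to i_{n-1} \to i_n = j$.
Theorem 5.5 [4, p. 21]: If $T(i) \neq \emptyset$ for every $i \in \mathcal{A}$,
then there exists an irreducible component.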
The number N(n) of T-admissible n-blocks is given by
$$N(n) = \sum_{i,j=1}^{N} t_{ij}^{(n-1)}, \qquad n \ge 2.$$
It follows from the Perron-Frobenius theory of nonnegative
matrices that there exist positive constants a, b such
that
$$a\lambda^n \le N(n) \le b\lambda^n,$$
where $\lambda$ is the largest positive characteristic value (spectral
radius) of T. Thus from (3.1)
$$h(\{T\}, \sigma) = \log \lambda.$$
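The block count and the entropy formula can be checked numerically. The following Python sketch computes N(n) as the sum of the entries of the (n - 1)st power of T and uses the ratio of successive counts as an estimate of the spectral radius. The function names, the example matrix, and the base-2 logarithm are assumptions for illustration; the ratio estimate is only expected to converge when T is aperiodic, as it is in this example.

```python
from math import log2, sqrt

def mat_mul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size))
             for j in range(size)] for i in range(size)]

def count_blocks(T, n):
    """N(n): sum of the entries of T^(n-1) (the identity when n = 1)."""
    size = len(T)
    P = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n - 1):
        P = mat_mul(P, T)
    return sum(sum(row) for row in P)

# Hypothetical example: symbol 1 may not follow symbol 1.
T = [[1, 1],
     [1, 0]]
ratio = count_blocks(T, 21) / count_blocks(T, 20)   # estimate of lambda
entropy_estimate = log2(ratio)                      # estimate of log lambda
golden_ratio = (1 + sqrt(5)) / 2                    # exact lambda for this T
```

Exact integer arithmetic is used for the counts, so the only rounding occurs in the final division and logarithm.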
Let us address the problem of topological conjugacy.
Besides the topological entropy invariant some stronger
ones are known which are contained in the following
theorem.
Theorem 5.6 [31]: If $(\{T_1\}, \sigma)$ and $(\{T_2\}, \sigma)$ are two
equal entropy topological Markov shifts with the first a
factor of the second, then the block of the Jordan canonical
form of $T_1$ with nonzero characteristic values is a
principal submatrix of that of $T_2$.
Corollary 5.7 [40]: In Theorem 5.6 the characteristic
polynomial of $T_1$ divides that of $T_2$ when the monomial
factors are deleted.
We also have an algebraic characterization of topological
conjugacy.
Theorem 5.8 [51]: $(\{T_1\}, \sigma) \simeq (\{T_2\}, \sigma)$ if and only if
there exist nonnegative integral rectangular matrices $A_i$, $B_i$,
$i = 1, \ldots, n$, for some n such that $A_{i+1}B_{i+1} = B_iA_i$,
$i = 1, \ldots, n - 1$, $T_1 = A_1B_1$ and $T_2 = B_nA_n$.
Using Theorem 5.8 it is easy to construct matrices $T_1$, $T_2$
with the same Jordan canonical form such that
$(\{T_1^{[2]}\}, \sigma) \not\simeq (\{T_2^{[2]}\}, \sigma)$. For example
$$T_1 = \begin{pmatrix} 5 & 4 \\ 1 & 1 \end{pmatrix} \quad \text{and} \quad T_2 = \begin{pmatrix} 5 & 2 \\ 2 & 1 \end{pmatrix}.$$
Assuming conjugacy it would follow from Theorem 5.8
that there exists an integral $2 \times 2$ matrix S such
that $ST_1 = T_2S$ and det S = 1. However, an easy computation
shows that 2 divides det S, a contradiction. Thus the
invariants presented above are not complete ones for topological
conjugacy. In fact, the major unsolved problem in
symbolic dynamics is to give a finite procedure for de-
termining when two shifts of finite type are topologically
conjugate. Possibly there is none. Also unsolved is the
following conjecture which is still a far cry from a finite
procedure.
Conjecture 5.1 [51]: $(\{T_1\}, \sigma) \simeq (\{T_2\}, \sigma)$ if and only if
there exists a positive integer l and nonnegative integral
matrices A, B such that $AT_1 = T_2A$, $T_1B = BT_2$, $T_2^l = AB$,
and $T_1^l = BA$.
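The "easy computation" in the example preceding the conjecture can be spot-checked by brute force. The Python sketch below assumes that the example matrices read T_1 = (5 4; 1 1) and T_2 = (5 2; 2 1), which is our reading of the printed example, and it searches only integer matrices S with entries in a small range, so it illustrates the divisibility claim rather than proving it. All function names are hypothetical.

```python
from itertools import product

# Assumed reading of the example matrices (both have trace 6 and determinant 1).
T1 = [[5, 4], [1, 1]]
T2 = [[5, 2], [2, 1]]

def mul(A, B):
    """Product of two 2 x 2 integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(S):
    """Determinant of a 2 x 2 matrix."""
    return S[0][0] * S[1][1] - S[0][1] * S[1][0]

def intertwiners(bound):
    """All integer S with entries in [-bound, bound] and S T1 == T2 S."""
    found = []
    for a, b, c, d in product(range(-bound, bound + 1), repeat=4):
        S = [[a, b], [c, d]]
        if mul(S, T1) == mul(T2, S):
            found.append(S)
    return found

solutions = intertwiners(4)
determinants = sorted({det(S) for S in solutions})
```

Solving the four linear equations by hand gives S with rows (a, 2c) and (c, 2a - 4c), whose determinant 2(a^2 - 2ac - c^2) is even for all integers a, c, in agreement with the search.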
The situation for determining finite equivalence or al-
most one-to-one finite equivalence is just the opposite. We
do have a finite procedure which comes down to checking
whether transition matrices have the same largest char-
acteristic value. The completeness of topological entropy
for finite equivalence and almost one-to-one finite equiva-
lence is revealed in the following theorems.
Theorem 5.9 [4], [45]: Let $(X, \sigma)$, $(Y, \sigma)$ be two nonwandering
transitive subshifts of finite type, i.e., their
transition matrices are irreducible. Then $(X, \sigma)$ and $(Y, \sigma)$
are finitely equivalent, $(X, \sigma) \sim (Y, \sigma)$, if
and only if $h(X, \sigma) = h(Y, \sigma)$.
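The entropy test in Theorem 5.9 can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the exact finite procedure: it estimates the largest characteristic value of an irreducible nonnegative matrix by power iteration in floating point and compares entropies up to a tolerance. The function names and the tolerance are hypothetical; an exact test would compare the largest characteristic values as algebraic numbers.

```python
import math


def perron_value(T, iters=500):
    """Estimate the spectral radius of an irreducible nonnegative matrix T.

    The iteration multiplies by T + I instead of T, so that it also settles
    when T is periodic; the growth factor of T + I is the radius of T plus 1.
    """
    n = len(T)
    v = [1.0] * n
    growth = 1.0
    for _ in range(iters):
        w = [v[i] + sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = max(w)
        v = [x / growth for x in w]
    return growth - 1.0


def entropy(T):
    """Topological entropy of the shift {T}: log of the spectral radius."""
    return math.log(perron_value(T))


def same_entropy(T1, T2, tol=1e-9):
    """Numerical stand-in for 'same largest characteristic value'."""
    return abs(entropy(T1) - entropy(T2)) < tol


golden = [[1, 1], [1, 0]]                        # two-state golden mean shift
golden_edges = [[1, 1, 0], [0, 0, 1], [1, 1, 0]]  # its three-state edge presentation
full_2 = [[1, 1], [1, 1]]                        # full 2-shift
```

Here `same_entropy(golden, golden_edges)` holds while `same_entropy(golden, full_2)` does not, matching the entropy criterion for these irreducible examples.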
Fig. 5. Finite equivalence diagram (same as Fig. 2, reproduced for convenience).
Theorem 5.10 [4]: Let $(X, \sigma)$ and $(Y, \sigma)$ be two
aperiodic subshifts of finite type. Then $(X, \sigma)$ and $(Y, \sigma)$
are almost one-to-one finitely equivalent, $(X, \sigma) \approx (Y, \sigma)$, if
and only if $h(X, \sigma) = h(Y, \sigma)$.
Furthermore, in [4], [45]
methods are given for constructing the associated factor
maps, which are depicted in Fig. 5.
In the constructions one of the factor maps is right
resolving and the other is left resolving; one is free to choose which.
We usually draw the right and left resolving maps on the
corresponding side of the diagram.
The special case where $(X, \sigma)$ is the full $N$-shift and
$h(Y, \sigma) = h(X, \sigma) = \log N$, $N$ an integer, was treated in
[1]. Marcus [37] showed how to achieve an invertible map
$\varphi$, which is not always possible if $(X, \sigma)$ is not a full
$N$-shift. If we select a right resolving $\psi$, then from Marcus'
result we obtain a right resolving finite block map $\varphi^{-1}\psi$,
the very thing needed to construct the encoder and decoder
of Section IV.
In applications we generally have $h(W, \sigma) > h(X, \sigma)$, so
we must find a subsystem $(Y, \sigma) \subset (W, \sigma)$ such that
$h(Y, \sigma) = h(X, \sigma)$. This problem was not addressed by
Marcus, but it can be solved by strengthening his result;
the strengthened result is the main theorem of this work.
Main Theorem 6.1: Let $(X, \sigma)$ be the full $N$-shift for an
integer $N \ge 2$, i.e., $X = \{S\}$ where $S$ is an $N \times N$ matrix
all of whose entries are one. If $T$ is an $M \times M$ irreducible
transition matrix such that $h(\{T\}, \sigma) \ge h(\{S\}, \sigma) = \log N$,
then there exists by construction an irreducible transition
matrix $\hat{T}$ with row sum $N$, an invertible left resolving
1-block factor map (isomorphism) $\varphi$ of $(\{\hat{T}\}, \sigma)$ onto a
subshift of finite type $(Y, \sigma) \subset (\{T\}, \sigma)$, and a right resolving
2-block factor map $\psi$ of $(\{\hat{T}\}, \sigma)$ onto $(\{S\}, \sigma)$. The
composition $\varphi^{-1}\psi$ is a right resolving factor map of $(Y, \sigma)$
onto $(X, \sigma)$.
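To see why a transition matrix with row sum $N$ is what an encoder needs, the following sketch labels the $N$ outgoing transitions of each state with the symbols $0, \ldots, N-1$. It assumes the usual reading of "right resolving" (the current state together with the image symbol determines the next state); the matrix `T_hat` and the functions `edge_labels`, `encode`, and `decode` are hypothetical illustrations, not the construction given in the proof, and the map $\varphi$ into $(Y, \sigma)$ is not modeled.

```python
def edge_labels(T_hat, N):
    """Assign symbols 0..N-1 to the transitions leaving each state.

    T_hat is a 0/1 matrix whose rows each sum to N.  The returned dict maps a
    2-block (i, j) to a symbol, i.e. it plays the role of a 2-block map.
    """
    labels = {}
    for i, row in enumerate(T_hat):
        successors = [j for j, t in enumerate(row) if t]
        if len(successors) != N:
            raise ValueError("row %d does not have row sum N" % i)
        for symbol, j in enumerate(successors):
            labels[(i, j)] = symbol
    return labels


def encode(labels, start, data):
    """Follow the unique transition carrying each data symbol."""
    step = {(i, s): j for (i, j), s in labels.items()}
    path = [start]
    for s in data:
        path.append(step[(path[-1], s)])
    return path


def decode(labels, path):
    """Read the symbols back off consecutive pairs of states."""
    return [labels[(i, j)] for i, j in zip(path, path[1:])]


T_hat = [[1, 1, 0],
         [0, 1, 1],
         [1, 0, 1]]          # hypothetical matrix with row sum N = 2
labels = edge_labels(T_hat, 2)
data = [1, 0, 1, 1, 0]
path = encode(labels, 0, data)
```

Because every state offers each symbol exactly once, arbitrary data can always be encoded, and decoding needs only pairs of consecutive states.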
Proof: The plan is to construct from $T$ a matrix with
row sum $\ge N$, then delete excess transitions, that is, change
some entries from 1 to 0, in order to get a matrix with row
sum $N$.
We have by hypothesis that $h(\{T\}, \sigma) \ge \log N$, so $\lambda \ge N$
where $\lambda$ is the spectral radius of $T$. From the Perron-Frobenius
theory of nonnegative matrices [21], [47] there
exists a column vector $v = (v_i)_{1 \le i \le M}$, which we call an
approximate characteristic vector, satisfying
\[ Tv \ge Nv, \qquad v > 0, \tag{6.1} \]
where inequality here means componentwise inequality.
Fig. 6. Coding scheme.
Since $T$ has integer entries, in fact zeros and ones, we can
satisfy (6.1) with integers $v_i > 0$ and furthermore with
$\gcd(v_i) = 1$. (See Appendix for a method of solving this
integer programming problem.)
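One way to find such an integer vector is sketched below. It uses a standard iteration from the constrained-coding literature (often attributed to Franaszek): start from a constant vector and repeatedly cap each entry by $\lfloor (Tv)_i / N \rfloor$. This is not assumed to be the method of the Appendix; the function names and the search limit `max_bound` are hypothetical.

```python
from functools import reduce
from math import gcd


def largest_approx_vector(T, N, bound):
    """Largest integer v with 0 <= v_i <= bound and T v >= N v componentwise."""
    n = len(T)
    v = [bound] * n
    while True:
        w = [min(v[i], sum(T[i][j] * v[j] for j in range(n)) // N)
             for i in range(n)]
        if w == v:          # fixed point: N * v_i <= (T v)_i for every i
            return v
        v = w               # entries only decrease, so the loop terminates


def approx_characteristic_vector(T, N, max_bound=64):
    """Smallest-bound positive integer solution of (6.1), divided by its gcd."""
    for bound in range(1, max_bound + 1):
        v = largest_approx_vector(T, N, bound)
        if all(x > 0 for x in v):
            g = reduce(gcd, v)
            return [x // g for x in v]
    raise ValueError("no positive solution found up to max_bound")


# Hypothetical irreducible 0/1 matrix with spectral radius 1 + sqrt(2) > 2.
T = [[1, 1, 1],
     [1, 1, 1],
     [1, 0, 0]]
v = approx_characteristic_vector(T, 2)
```

For this matrix and $N = 2$ the search fails at bound 1 (the third weight drops to zero) and succeeds at bound 2 with weights 2, 2, 1, for which $Tv = (5, 5, 2) \ge 2v$.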
If all $v_i = 1$, then $T$ itself is the sought-after matrix; so
we assume $\max v_i > 1$, which also implies $\min v_i < \max v_i$.
Let us call $v_i$ the weight of $i$.
Consider the set $T(i) = \{j_1, \ldots, j_{|T(i)|}\}$ of successors of $i$.
For each $i$, $1 \le i \le M$, choose a disjoint partition
$\alpha = \alpha_i = \{A_1, \ldots, A_{|\alpha_i|}\}$ of $T(i)$ where the following conditions hold:
\[ \sum_{j \in A_k} v_j \equiv 0 \pmod{N}, \qquad 1 \le k \le |\alpha| - 1, \tag{6.2} \]
\[ v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j / N \ge 0. \]
We dispense with the subscript on $\alpha$ when it is clear from
context. Usually we pick partitions of $T(i)$ with largest
possible $|\alpha|$, but sometimes we must settle for $|\alpha| = 1$.
Nevertheless, we show there exists an index $i_0$ for which
$|\alpha_{i_0}| \ge 2$;
namely, take $i_0$ for which $v_{i_0} = \max v_i$ and $T(i_0)$ contains
an index, say $j_1$, such that $v_{j_1} < v_{i_0}$. Such an $i_0$ exists; for
otherwise $T$ would be reducible because the indices of
maximum weight would circulate only among themselves.
We are free to order symbols so that $i_0 = 1$. From
(6.1) follows
\[ v_1 |T(1)| > \sum_{j \in T(1)} v_j \ge N v_1, \]
from which we conclude $|T(1)| > N$. Consider next the
following sums modulo $N$:
\[ v_{j_1}, \quad v_{j_1} + v_{j_2}, \quad \ldots, \quad v_{j_1} + \cdots + v_{j_N}. \]
Either there are $N$ distinct values and one of them is 0
(mod $N$), or two repeat, in which case their difference
\[ v_{j_{p+1}} + \cdots + v_{j_q}, \qquad 1 \le p < q \le N, \]
is 0 (mod $N$). In either case
we can find a nonempty subset $A_1 \subset T(1)$, $|A_1| \le N < |T(1)|$, such that
\[ \sum_{j \in A_1} v_j \equiv 0 \pmod{N}, \qquad \sum_{j \in A_1} v_j < N v_1, \]
that is, $v_1 > \sum_{j \in A_1} v_j / N$.
This last inequality holds because state one is heaviest and
either $|A_1| < N$, or $|A_1| = N$ and $j_1 \in A_1$, $j_1$ being lighter
than state 1.
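The partial-sums argument above is constructive and can be sketched directly. The function below takes the weights of the first successors in the chosen order (lighter successor $j_1$ first) and returns positions of a consecutive run whose weights sum to 0 modulo $N$; the name `zero_sum_block` and the zero-based positions are illustrative assumptions.

```python
def zero_sum_block(weights, N):
    """Return positions p, ..., q-1 (zero-based) of a nonempty consecutive run
    among the first N weights whose sum is divisible by N.

    Prefix sums 0, w_1, w_1 + w_2, ... give N + 1 residues modulo N, so two
    of them must coincide; the run between them sums to 0 modulo N.
    """
    if len(weights) < N:
        raise ValueError("need at least N weights")
    first_seen = {0: 0}      # residue -> length of the prefix that produced it
    total = 0
    for q in range(1, N + 1):
        total = (total + weights[q - 1]) % N
        if total in first_seen:
            p = first_seen[total]
            return list(range(p, q))
        first_seen[total] = q
    raise AssertionError("unreachable by the pigeonhole principle")


# Successor weights of the heaviest state, lighter successor listed first.
block = zero_sum_block([1, 2, 2], 2)
```

With weights 1, 2, 2 and $N = 2$ the prefix residues are 1, 1, so the run consists of the second successor alone; its weight 2 is divisible by 2 and $2/2 = 1$ is less than the heaviest weight 2.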
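Next we split $i$ into new symbols $i^1, \ldots, i^{|\alpha_i|}$, which we
call offspring of $i$. With respect to the alphabet
$\tilde{\mathcal{C}} = \{i^k : 1 \le k \le |\alpha_i|,\ 1 \le i \le M\}$ of new symbols ordered in some
fashion, we obtain a new transition matrix $\tilde{T}$ specified by
the transitions
\[ i^k \to \tilde{j}, \qquad j \in A_k \subset T(i), \]
where $\tilde{j}$ are offspring of $j$, i.e., $\tilde{T}(i^k) = \{\tilde{j} : j \in A_k\}$.
We define a 1-block map $\varphi$ of $\{\tilde{T}\}$ to $\{T\}$ by
\[ \varphi\, i^k = i. \]
This map is obviously onto. Suppose $i$ is a predecessor of $j$
under transitions of $T$. Then, because no elements of $\alpha_i$
overlap, there is a unique offspring $i^k$ of $i$ such that $\tilde{T}(i^k)$
contains the offspring of $j$. This fact establishes that $\varphi$ is
left resolving and that every 2-block $(i, j)$ is a resolvable
one. Consequently by Definition 3.2, $\varphi$ is invertible, in
other words, an isomorphism.
We form an approximate characteristic subvector $\tilde{v}$ with
components $v_{i^k}$, $1 \le k \le |\alpha_i|$, $1 \le i \le M$, for $\tilde{T}$ as follows:
\[ v_{i^k} = \sum_{j \in A_k} v_j / N = \sum_{\tilde{j} \in \tilde{T}(i^k)} v_{\tilde{j}} / N, \qquad 1 \le k \le |\alpha| - 1, \]
\[ v_{i^{|\alpha|}} = v_i - \sum_{k=1}^{|\alpha|-1} v_{i^k} = v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j / N. \]
For the last offspring, (6.1) gives
\[ N v_{i^{|\alpha|}} = N v_i - \sum_{k=1}^{|\alpha|-1} \sum_{j \in A_k} v_j \le \sum_{j \in T(i)} v_j - \sum_{j \in T(i) - A_{|\alpha|}} v_j = \sum_{j \in A_{|\alpha|}} v_j. \]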
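One round of this splitting can be sketched as follows, under stated assumptions: states are numbered from 0, the offspring $i^k$ is represented by the pair `(i, k)` with `k` counted from 0, and the partitions are supplied by the caller. The function `split_states` and the sample data are hypothetical; the code only builds the split transitions and the offspring weights and does not remove zero-weight offspring.

```python
def split_states(successors, v, N, partitions):
    """Split every state i according to partitions[i] = [A_1, ..., A_m].

    Returns (transitions, weights), where transitions maps each offspring
    (i, k) to the offspring of the states in its block, and weights gives
    the new weights: sum(A_k) / N for all but the last block, and the
    remainder of v[i] for the last block.
    """
    weights = {}
    for i, blocks in partitions.items():
        # The blocks must partition the successor set T(i).
        assert sorted(j for A in blocks for j in A) == sorted(successors[i])
        used = 0
        for k, A in enumerate(blocks[:-1]):
            s = sum(v[j] for j in A)
            assert s % N == 0            # condition (6.2)
            weights[(i, k)] = s // N
            used += s // N
        assert v[i] - used >= 0          # the second partition condition
        weights[(i, len(blocks) - 1)] = v[i] - used
    transitions = {
        (i, k): [(j, m) for j in A for m in range(len(partitions[j]))]
        for i, blocks in partitions.items()
        for k, A in enumerate(blocks)
    }
    return transitions, weights


# Hypothetical example: weights (2, 2, 1) with N = 2.
successors = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0]}
v = [2, 2, 1]
partitions = {0: [[1], [2, 0]], 1: [[0], [1, 2]], 2: [[0]]}
transitions, weights = split_states(successors, v, 2, partitions)
```

In this example both heavy states split into two offspring of weight 1, every new weight equals 1, and every offspring has at least $N = 2$ successors, so only excess transitions remain to be deleted.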
Next we take the subalphabet $\bar{\mathcal{C}} = \{i^k \in \tilde{\mathcal{C}} : v_{i^k} > 0\}$
along with transitions, which we denote by $\bar{T}$, that are
inherited from $\tilde{T}$ by deleting transitions $i^k \to \tilde{j}$ whenever
$v_{i^k}$ or $v_{\tilde{j}} = 0$, in other words, by crossing out rows and
columns of $\tilde{T}$ corresponding to components of $\tilde{v}$ that
vanish. From (6.10), (6.11) we see that $\bar{T}(i^k) \ne \emptyset$, $i^k \in \bar{\mathcal{C}}$.
So by Definition 5.4 there exists an irreducible component,
i.e., a further subal