
arXiv:1911.06778v1 [math.PR] 15 Nov 2019

Limit theorems for chains with unbounded variable length memory which satisfy Cramer condition∗

A. Logachev, A. Mogulskii, A. Yambartsev

November 18, 2019

Abstract

Here we obtain the exact asymptotics for large and moderate deviations, the strong law of large numbers, and the central limit theorem for chains with unbounded variable length memory.

Key words: variable length memory chain, regeneration scheme, generalized renewal process, local limit theorem, large deviation principle, moderate deviation principle, rate function, Cramer condition.

1 Introduction

Let A = {0, 1, ..., d} be a set of symbols (characters), an alphabet. Here we consider a class of chains r = (r_i, i ∈ Z) ∈ A^Z which is a special case of the so-called chains with unbounded variable length memory. These chains began to be studied actively after they were first introduced by Jorma Rissanen [1] as an economical and universal way of data compression. A short and simple introduction to these processes can be found, for example, in [2]. They are used for modeling data in computer science [1], in biology [3], [4], in neurobiology [6], [7], and in linguistics [5]. We do not attempt any kind of full review on these chains, and we restrict ourselves to some papers known to us through the activity of the research group NeuroMat under the guidance of Prof. A. Galves.

If we interpret Z as discrete time, then one can imagine such chains as the successive attribution, in time, of a character from the alphabet A with a probability which depends on the past (the existing sequence of characters) or, more precisely, on a part of the past,

∗The research is supported by RSF research project 18-11-00129. AL and AY thank FAPESP grant 2017/20482-0; AY also thanks CNPq and FAPESP, grants 301050/2016-3 and 2017/10555-0, respectively. It is part of the USP project Mathematics, computation, language and the brain and of the FAPESP project Research, Innovation and Dissemination Center for Neuromathematics (grant 2013/07699-0).


a context. As a consequence, such dependence can be represented as a context tree, where each vertex represents a context and each vertex is associated with a probability distribution (on A) for the new character. A Markov chain with state space A is a particular case of these chains: its context tree has height 1, since we need to know only the last character in order to determine the distribution of the new character. A question which naturally arises is the existence of a stationary measure on A^Z compatible with a family of transition probabilities determined by a context tree. This question was answered, and a short review can be found in [2], where methods of statistical inference for context trees were also provided.

In [8] a perfect simulation scheme for such processes was constructed. The success of the algorithm (whether the perfect simulation stops in finite time) depends directly on the existence of a (finite) renewal time: a moment after which the successive attributions of characters do not depend on the past. In the same paper the connection between renewal processes and chains with variable length memory was established. This suggests that the large deviation results for this sort of chain can be obtained using the regeneration structure together with the recently published results on large deviations for renewal processes [9].

Although a complete definition and description of chains with unbounded variable length memory requires the notion of a context tree, in this article we restrict ourselves to an alternative description of the chain, because from the very beginning we consider a particular case of such chains. Let us fix an initial configuration

r(0) = {r_i(0)}_{−∞<i≤0},

where r_i(0), −∞ < i ≤ 0, take values in the alphabet A, which in our case is the binary alphabet A := {0, 1}. We now set the configuration change rule. Recall that at every time step we write exactly one character from the alphabet A at the end of the existing sequence, without changing it.

In order to set the transition rules, let us first fix a number v ∈ N (one of the parameters of the chain). Consider the set of all words of v characters ending with the character 1; the total number of such words is 2^{v−1}; to each such word we assign its own order number j, 1 ≤ j ≤ 2^{v−1}. On this set of words we fix a collection of positive numbers

p_{kj} ∈ (0, 1),  k ∈ {0} ∪ N,  1 ≤ j ≤ 2^{v−1}.

Now we are ready to describe the rules (transition probabilities) of adding a character on the right. Suppose we have a configuration at the n-th step,

r(n) = {r_i(n)}_{−∞<i≤n}.

Denote

m_n := sup{−∞ < i ≤ n : r_i(n) = 1}.

Then, at the next step, the configuration r(n) = {r_i(n)}_{−∞<i≤n} jumps to the configuration

r(n+1) = {r_i(n+1)}_{−∞<i≤n+1}

by writing on the right the character 1, r_{n+1}(n+1) = 1, with probability p_{k_n j_n}, where k_n = n − m_n and j_n is the order number of the word formed by the sequence

r_{m_n−v+1}(n), r_{m_n−v+2}(n), ..., r_{m_n}(n).


Thus, the character 0, r_{n+1}(n+1) = 0, is added with probability 1 − p_{k_n j_n}. Note that the previous sequence does not change:

{r_i(n+1)}_{−∞<i≤n} ≡ r(n).

Thus, the probability of the attributed character r_{n+1}(n+1) depends on

1) the distance to the nearest character 1;

2) the word of v − 1 letters which stands to the left of this 1.
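To make the rule concrete, here is a minimal Python sketch of one transition step for the binary alphabet. The function word_index and the table p are illustrative assumptions only (the paper assigns the order numbers j arbitrarily; here they are 0-based), and the sketch assumes the stored prefix contains at least one 1 with at least v − 1 characters before it.

```python
import random

def word_index(word):
    # Illustrative 0-based order number j of a length-v word ending in 1:
    # the binary number formed by its first v-1 letters.
    j = 0
    for a in word[:-1]:
        j = 2 * j + a
    return j

def step(seq, v, p):
    """Append one character to seq according to the transition rule.

    seq -- list of 0/1 characters (a finite suffix of the configuration,
           assumed to contain a 1 preceded by at least v-1 characters);
    p   -- p[(k, j)] = probability of writing 1 when the nearest 1 is
           k steps back and the v-word ending at it has index j.
    """
    n = len(seq) - 1
    m = max(i for i in range(len(seq)) if seq[i] == 1)  # nearest 1
    k = n - m                                           # distance to it
    j = word_index(seq[m - v + 1 : m + 1])              # word ending at the 1
    seq.append(1 if random.random() < p[(k, j)] else 0)
    return seq

def simulate_R(n, v, p, seq):
    # R(n): number of 1's appended to the initial configuration in n steps.
    start = len(seq)
    for _ in range(n):
        step(seq, v, p)
    return sum(seq[start:])
```

For instance, if every p[(k, j)] equals 1, each appended character is a 1 and simulate_R returns n.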

It is obvious that the random sequence {r_n(n)} is not a Markov chain, because the transition probability from r_n(n) to r_{n+1}(n+1) can depend, generally speaking, not only on the character r_n(n), but also on the values r_{n−j}(n−j) for arbitrarily large j ≥ 0.

For the process r(n) defined in this way, let R(n) be the number of ones added on the right of the initial configuration r(0) in n steps:

R(n) := Σ_{k=1}^{n} r_k(k).

We are interested in the behaviour of the process R(n) as n → ∞. In the next section we prove the law of large numbers, the central limit theorem, and a local limit theorem; we also establish the large and moderate deviation principles.

In what follows we suppose that the following condition [A] holds. Condition [A] consists of two items.

1. The initial configuration r(0) contains at least one 1.

2. There exist constants 1 > δ_1 > δ_2 > 0 such that for all k ∈ {0} ∪ N, 1 ≤ j ≤ 2^{v−1}, the following inequalities hold:

δ_1 ≥ p_{kj} ≥ δ_2.

Condition 1 is an obvious condition for the existence of the process and for the implementation of the transition probabilities. Note, however, that this condition can be omitted by adding a probability p_∞ ∈ (0, 1) of attributing the character 1 when the sequence consists only of zeros. Condition 2 gives us the possibility of constructing an arithmetic generalized renewal process which satisfies the Cramer moment condition [C0] and the arithmeticity condition [Z] (see Section 2).

The paper is organized as follows: in Section 2 we introduce our definitions and notations and state the main result, Theorem 2.2 (the law of large numbers, the central limit theorem, local theorems in the zones of normal, moderate and large deviations, and the moderate deviation principle for R(n)); in Section 3 we prove Theorem 2.2; in Section 4 auxiliary lemmas are proved.


2 Main results, definitions, notations

To formulate and prove the main result we need some auxiliary processes, which we define in this section.

To each state r(n) we associate the pair

Y(n) := (Y_1(n), Y_2(n)),

where Y_1(n) := n − m_n (the distance to the nearest 1 or, what is the same, the number of zeros before the nearest 1), and

Y_2(n) := (r_{m_n−v+1}(n), r_{m_n−v+2}(n), ..., r_{m_n}(n))

(the sequence formed by the nearest 1 and the v − 1 letters to its left).

Note that the pair Y(n) := (Y_1(n), Y_2(n)) passes with probability p_{k_n j_n} into the pair

Y(n+1) = (0, Y_2(n+1)),

where

Y_2(n+1) = (0, ..., 0, 1), if Y_1(n) ≥ v − 1,
Y_2(n+1) = (r_{m_n−v+Y_1(n)+2}(n), ..., r_{m_n}(n), 0, ..., 0, 1), if Y_1(n) < v − 1,

and with probability 1 − p_{k_n j_n} into the pair

Y(n+1) = (Y_1(n) + 1, Y_2(n)).

In this way Y(n+1) is a random function of Y(n). Thus, the sequence {Y(n)}, n ≥ 0, is a homogeneous Markov chain with state space

Y := {y = (y_1, y_2) : y_1 ∈ {0} ∪ N, y_2 = (a_1, ..., a_v), a_1 ∈ A, ..., a_{v−1} ∈ A, a_v = 1}.

Let us pick out the state

y_0 := (0, (0, ..., 0, 1)).

Note that the chain can jump in one step from any state (y_1, y_2) to the chosen state y_0 if the coordinate y_1 is not less than v − 1.

Denote

τ_1 := min{n > 0 : Y(n) = y_0},
τ_k := min{n > τ_1 + ··· + τ_{k−1} : Y(n) = y_0} − (τ_1 + ··· + τ_{k−1}), k ≥ 2.

Since {Y(n)} is a homogeneous Markov chain, the random variables τ_1, ..., τ_k, ... are independent and, moreover, the τ_k are identically distributed for k ≥ 2.

Let ζ_k be the number of ones added on the right during the time n ∈ {T_{k−1}+1, ..., T_k}, where

T_0 := 0, T_k := T_{k−1} + τ_k, k ∈ N.


In other words,

ζ_0 := 0, ζ_k := Σ_{n=T_{k−1}+1}^{T_k} r_n(n) for k ∈ N.

By construction the random vectors ξ_k := (τ_k, ζ_k), k ∈ N, are independent, and the ξ_k are identically distributed for k ≥ 2.
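On a simulated trajectory the regeneration structure is easy to read off: Y(n) = y_0 exactly when the character written at time n is 1 and the v − 1 characters before it are all 0. A minimal sketch, assuming the trajectory r_1, ..., r_N is given as a Python list (times whose v-window would reach back into the initial configuration r(0) are skipped):

```python
def regeneration_times(r, v):
    """1-based times n with Y(n) = y0, i.e. r_n = 1 preceded by v-1 zeros.
    r[0] is r_1(1); indices n < v-1 are skipped (their window lies in r(0))."""
    return [n + 1 for n in range(v - 1, len(r))
            if r[n] == 1 and all(c == 0 for c in r[n - v + 1 : n])]

def cycles(r, v):
    """Cycle lengths tau_k = T_k - T_{k-1} and the numbers of 1's
    zeta_k written during each cycle (T_{k-1}, T_k]."""
    taus, zetas, prev = [], [], 0
    for t in regeneration_times(r, v):
        taus.append(t - prev)
        zetas.append(sum(r[prev:t]))
        prev = t
    return taus, zetas
```

By construction ζ_k ≤ τ_k for every cycle, an inequality used repeatedly below.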

Let

ν(0) := 0, ν(n) := max{k ≥ 0 : T_k < n} for n ∈ N.

Define the generalized renewal process

Z(n) := Σ_{k=0}^{ν(n)} ζ_k.

Let the random vector ξ := (τ, ζ) have the distribution which coincides with the distribution of the vectors ξ_k = (τ_k, ζ_k) for k ≥ 2.

Since ζ_k ≤ τ_k a.s. for k ∈ N, it follows from Lemma 4.1 (see Section 4) that ξ_1 and ξ satisfy Cramer's condition [C0]:

E e^{δ|ξ_1|} ≤ E e^{2δτ_1} < ∞ and E e^{δ|ξ|} ≤ E e^{2δτ} < ∞, when δ < ρ/2,

where ρ > 0 is the constant from Lemma 4.1.

From Lemma 4.2 (see Section 4) we obtain that the vector ξ satisfies the arithmeticity condition [Z]:

For any u ∈ Z² the equality f(2πu) = 1 holds, and for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds, where for u = (u_1, u_2) ∈ R² the function

f(u) := E e^{i(u_1τ + u_2ζ)}

is the characteristic function of ξ.

We now give the notation we need from the paper [9]:

a := Eζ/Eτ, σ² := (1/a_τ) E(ζ − aτ)², a_τ := Eτ,

ψ_1(λ, μ) := E e^{λτ_1 + μζ_1}, ψ(λ, μ) := E e^{λτ + μζ},

A(λ, μ) := ln ψ(λ, μ), (λ, μ) ∈ R²,

D(θ, α) := sup_{(λ,μ)∈A_{≤0}} {λθ + μα}, A_{≤0} := {(λ, μ) : A(λ, μ) ≤ 0},

D(α) := D(1, α).
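The constants a and σ² can be estimated by Monte Carlo from the i.i.d. cycle variables (τ_k, ζ_k) by running the auxiliary chain Y(n) from the state y_0. The probability function p_fn below is an illustrative stand-in for the table p_{kj} (the paper only requires δ_2 ≤ p_{kj} ≤ δ_1):

```python
import random

def simulate_cycles(num_cycles, v, p_fn, rng):
    """Run Y(n) from y0 and collect the cycle variables (tau_k, zeta_k).
    p_fn(k, word) plays the role of p_kj (illustrative assumption)."""
    y0_word = (0,) * (v - 1) + (1,)
    k, word = 0, y0_word                  # start in the state y0
    out, tau, zeta = [], 0, 0
    while len(out) < num_cycles:
        tau += 1
        if rng.random() < p_fn(k, word):  # a 1 is written
            zeta += 1
            word = y0_word if k >= v - 1 else word[k + 1:] + (0,) * k + (1,)
            k = 0
            if word == y0_word:           # back at y0: the cycle ends
                out.append((tau, zeta))
                tau, zeta = 0, 0
        else:                             # a 0 is written
            k += 1
    return out

rng = random.Random(42)
cyc = simulate_cycles(2000, 2, lambda k, word: 0.5, rng)
taus = [t for t, _ in cyc]
zetas = [z for _, z in cyc]
a_hat = sum(zetas) / sum(taus)                                  # a = E(zeta)/E(tau)
s2_hat = sum((z - a_hat * t) ** 2 for t, z in cyc) / sum(taus)  # sigma^2 estimate
```

With the constant choice p_fn ≡ q, every appended character is a 1 with probability q independently of the past, so a_hat should be close to q (here 0.5).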

Denote by B the Borel σ-algebra of subsets of R. For an arbitrary set B ∈ B we denote by [B] and (B) its closure and interior, respectively.

Now we give the definition of the Large Deviation Principle (LDP).


Definition 2.1. The sequence of random variables s_n satisfies the LDP in R with rate function I = I(y) : R → [0, ∞] and normalizing function ϕ(n), lim_{n→∞} ϕ(n) = ∞, if for any c ≥ 0 the set {y ∈ R : I(y) ≤ c} is compact and for any set B ∈ B the following inequalities hold:

lim sup_{n→∞} (1/ϕ(n)) ln P(s_n ∈ B) ≤ −I([B]),

lim inf_{n→∞} (1/ϕ(n)) ln P(s_n ∈ B) ≥ −I((B)),

where I(B) = inf_{y∈B} I(y), I(∅) = ∞.

Denote by Φ_{0,σ²} the normal distribution with parameters (0, σ²), and by ⇒ we denote convergence in distribution.

Let us now state the main result of our work.

Theorem 2.2. Let condition [A] hold. Then:

1. (strong law of large numbers) As n → ∞,

R(n)/n → a a.s.

2. (central limit theorem) As n → ∞,

(R(n) − an)/√n ⇒ Φ_{0,σ²}.

3. (local theorem in the zones of normal, moderate and large deviations) There exists Δ > 0 such that if x ∈ {0} ∪ N, lim_{n→∞} x/n = α_0 and |α_0 − a| ≤ Δ, then

P(R(n) = x) = (1/√n) ψ_1(λ(α_0), μ(α_0)) C_H(1, α_0) I(α_0) e^{−nD(x/n)} (1 + o(1)),

where

I(α_0) = Σ_{l=1}^{∞} e^{λ(α_0)l} E(e^{μ(α_0)R(l)}, τ ≥ l | Y(0) = y_0),

and C_H(θ, α) is a positive function which is continuous in a neighborhood of the point (θ, α) = (1, α_0) and is known explicitly from Theorems 2.1 and 2.1A of [9].

4. (local theorem in the zones of normal and moderate deviations) If x ∈ {0} ∪ N and lim_{n→∞} x/n = a, then the following equality holds:

P(R(n) = x) = (1/(σ√(2πn))) e^{−nD(x/n)} (1 + o(1)).


5. (moderate deviation principle) Let the sequence κ := κ_n, κ ∈ R, satisfy the conditions

lim_{n→∞} κ/n = 0, lim_{n→∞} κ/√n = ∞.

Then the sequence of random variables R̃(n) := (R(n) − an)/κ satisfies the LDP with normalizing function ϕ(n) := κ²/n and rate function

I(y) := y²/(2σ²).

3 Proof of Theorem 2.2

P r o o f of statements 1) and 2). Since ζ_k ≤ τ_k a.s. for k ∈ N, the following inequality holds:

Z(n) ≤ R(n) ≤ Z(n) + τ_{ν(n)+1}. (1)

Using Lemma 4.4 and the Borel-Cantelli lemma, it is easy to see that, as n → ∞,

τ_{ν(n)+1}/√n → 0 a.s.

Thus statements 1) and 2) follow from inequality (1) and the corresponding results for Z(n) (see [12], Theorem 11.5.2, p. 332, and Theorem 10.6.2, p. 311).

✷
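Inequality (1) can be verified numerically along a simulated path. The sketch below uses the illustrative constant choice p_{kj} ≡ q, under which each appended character is 1 with probability q independently of the past; it starts the chain from a configuration ending in the word 0...01 (the state y_0) and checks Z(n) ≤ R(n) ≤ Z(n) + τ_{ν(n)+1} for every n up to the last observed regeneration.

```python
import random

def check_sandwich(N, v, q, seed):
    """Check Z(n) <= R(n) <= Z(n) + tau_{nu(n)+1} on one simulated path."""
    rng = random.Random(seed)
    r = [1 if rng.random() < q else 0 for _ in range(N)]  # r[i-1] = r_i(i)
    prefix = [0] * (v - 1) + [1]        # initial configuration ends in y0's word
    full = prefix + r                   # full[v + n - 1] = r_n(n)
    # regeneration times T_k: r_n = 1 and the v-1 preceding characters are 0
    T = [n for n in range(1, N + 1)
         if full[v + n - 1] == 1 and all(c == 0 for c in full[n : v + n - 1])]
    for n in range(1, N + 1):
        nu = sum(1 for t in T if t < n)     # nu(n) = max{k >= 0 : T_k < n}
        if nu >= len(T):
            break                           # tau_{nu(n)+1} not observed yet
        R_n = sum(r[:n])                    # R(n)
        Z_n = sum(r[:T[nu - 1]]) if nu >= 1 else 0  # Z(n): 1's up to T_{nu(n)}
        tau_next = T[nu] - (T[nu - 1] if nu >= 1 else 0)
        if not (Z_n <= R_n <= Z_n + tau_next):
            return False
    return True
```

The upper bound holds because the 1's counted by R(n) but not by Z(n) all lie in the current, uncompleted cycle of length τ_{ν(n)+1}.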

P r o o f of statement 3). Denote Z_k := Σ_{l=1}^{k} ζ_l and consider

L_n(x) := Σ_{k=1}^{∞} Σ_{l=1}^{[ln²n]} P(R(n) = x, T_k = n−l, τ_{k+1} ≥ l)

= Σ_{k=1}^{∞} Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(Z_k = x−s, R(n) − Z_k = s, T_k = n−l, τ_{k+1} ≥ l). (2)

(2)

Note that if Tk=n−l, then the random variable Zkuniquely deﬁned by the values of

Markov chain Y(m) when m < n −l, but the random variables R(n)−Zkand τk+1 depend

on the values of the chain Y(m) when m > n −l. Therefore, by the inclusion

{ω:Tk=n−l} ⊆ {ω:Y(n−l) = y0}

the following equality holds

P(Zk=x−s, R(n)−Zk=s, τk+1 ≥l|Tk=n−l)

=P(Zk=x−s|Tk=n−l)P(R(n)−Zk=s, τk+1 ≥l|Tk=n−l).(3)


Applying (2) and (3) we obtain

L_n(x) = Σ_{k=1}^{∞} Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(Z_k = x−s, R(n) − Z_k = s, T_k = n−l, τ_{k+1} ≥ l)

= Σ_{k=1}^{∞} Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(Z_k = x−s | T_k = n−l) P(R(n) − Z_k = s, τ_{k+1} ≥ l | T_k = n−l) P(T_k = n−l)

= Σ_{k=1}^{∞} Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(Z_k = x−s, T_k = n−l) P(R(n) − Z_k = s, τ_{k+1} ≥ l | T_k = n−l). (4)

Note that

P(R(n) − Z_k = s, τ_{k+1} ≥ l | T_k = n−l) = P(R(l) = s, τ ≥ l | Y(0) = y_0).

Thus, from equality (4) it follows that

L_n(x) = Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(R(l) = s, τ ≥ l | Y(0) = y_0) Σ_{k=1}^{∞} P(Z_k = x−s, T_k = n−l). (5)

Since P(T_0 = n−l) = 0 for 0 ≤ l ≤ [ln²n], from Theorem 2.2 of [9] it follows that, as n → ∞,

Σ_{k=1}^{∞} P(Z_k = x−s, T_k = n−l) = Σ_{k=0}^{∞} P(Z_k = x−s, T_k = n−l)

= (1/√n) ψ_1(λ(α̃/θ), μ(α̃/θ)) C_H(θ, α̃) e^{−nD(θ,α̃)} (1 + o(1)),

where

α̃ := (x−s)/n, θ := (n−l)/n.

Since the function ψ_1(λ(α), μ(α)) is continuous in a neighborhood of the point α = a, and the function C_H(θ, α) is continuous in a neighborhood of the point (θ, α) = (1, a), for sufficiently small Δ and n → ∞ the following equality holds:

Σ_{k=1}^{∞} P(Z_k = x−s, T_k = n−l) = (1/√n) ψ_1(λ(α_0), μ(α_0)) C_H(1, α_0) e^{−nD(θ,α̃)} (1 + o(1)). (6)

Applying Lemma 4.6 (see Section 4) and taking into account that 0 ≤ s ≤ l ≤ [ln²n] and |α_0 − a| < Δ, from equality (6) we obtain

Σ_{k=1}^{∞} P(Z_k = x−s, T_k = n−l)
= (1/√n) ψ_1(λ(α_0), μ(α_0)) C_H(1, α_0) e^{−nD(x/n) + (λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s} (1 + o(1)). (7)


Let us show that

lim_{n→∞} Σ_{l=1}^{[ln²n]} Σ_{s=0}^{l} P(R(l) = s, τ ≥ l | Y(0) = y_0) e^{(λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s}

= Σ_{l=1}^{∞} e^{λ(α_0)l} E(e^{μ(α_0)R(l)}, τ ≥ l | Y(0) = y_0) =: I(α_0). (8)

Due to the fact that

lim_{n→∞} e^{(λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s} = e^{λ(α_0)l + μ(α_0)s},

equality (8) will be proved if we show that the series

Σ_{l=1}^{∞} Σ_{s=0}^{l} P(R(l) = s, τ ≥ l | Y(0) = y_0) e^{(λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s} (9)

converges.

Note that if τ ≥ l, then ζ ≥ R(l); thus

Σ_{l=1}^{∞} Σ_{s=0}^{l} P(R(l) = s, τ ≥ l | Y(0) = y_0) e^{(λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s}

= Σ_{l=1}^{∞} e^{(λ(x/n)+ε_n)l} E(e^{(μ(x/n)+θ_n)R(l)}, τ ≥ l | Y(0) = y_0)

≤ Σ_{l=1}^{∞} e^{(λ(x/n)+ε_n)l} E(e^{(μ(x/n)+θ_n)ζ}, τ ≥ l | Y(0) = y_0). (10)

Due to Cramer's condition [C0], for sufficiently small Δ > 0 and sufficiently large n,

E(e^{2(μ(x/n)+θ_n)ζ} | Y(0) = y_0) < ∞,

and there exists ρ > 0 such that

E(e^{2(λ(x/n)+ε_n+ρ)τ} | Y(0) = y_0) < ∞.

Therefore, using the Cauchy-Bunyakovsky and Chebyshev inequalities, we obtain

E(e^{(μ(x/n)+θ_n)ζ}, τ ≥ l | Y(0) = y_0)

≤ (E(e^{2(μ(x/n)+θ_n)ζ} | Y(0) = y_0))^{1/2} (P(τ ≥ l | Y(0) = y_0))^{1/2}

≤ (E(e^{2(μ(x/n)+θ_n)ζ} | Y(0) = y_0))^{1/2} (E(e^{2(λ(x/n)+ε_n+ρ)τ} | Y(0) = y_0))^{1/2} e^{−(λ(x/n)+ε_n+ρ)l}

=: K e^{−(λ(x/n)+ε_n+ρ)l}. (11)

Using (10) and (11), we obtain

Σ_{l=1}^{∞} Σ_{s=0}^{l} P(R(l) = s, τ ≥ l | Y(0) = y_0) e^{(λ(x/n)+ε_n)l + (μ(x/n)+θ_n)s} ≤ K Σ_{l=1}^{∞} e^{−ρl} < ∞.


Thus, the series (9) converges, hence equality (8) holds.

From (5), (7) and (8) it follows that

L_n(x) = (1/√n) ψ_1(λ(α_0), μ(α_0)) C_H(1, α_0) I(α_0) e^{−nD(x/n)} (1 + o(1)). (12)

It is obvious that the following inequality holds:

L_n(x) ≤ P(R(n) = x) ≤ L_n(x) + P(R(n) = x, τ_{ν(n)+1} ≥ ln²n). (13)

From Lemma 4.5 (see Section 4) it follows that

P(R(n) = x, τ_{ν(n)+1} ≥ ln²n) ≤ C̃ e^{−nD(α) − γ̃ln²n}. (14)

From (12), (13) and (14) it follows that

P(R(n) = x) = (1 + o(1)) L_n(x) = (1/√n) ψ_1(λ(α_0), μ(α_0)) C_H(1, α_0) I(α_0) e^{−nD(x/n)} (1 + o(1)). (15)

✷

P r o o f of statement 4). Due to the fact that (λ(a), μ(a)) = (0, 0) and the function I(α) is continuous in a neighborhood of the point α = a, we have

I(a) = Σ_{l=1}^{∞} e^{λ(a)l} E(e^{μ(a)R(l)}, τ ≥ l | Y(0) = y_0) = Σ_{l=1}^{∞} P(τ ≥ l | Y(0) = y_0) = E(τ | Y(0) = y_0). (16)

From Lemma 2.1 (see [9]) it follows that

C_H(1, a) = 1/(E(τ | Y(0) = y_0) σ√(2π)).

Hence, from (15) and (16) we obtain

P(R(n) = x) = (1/(σ√(2πn))) e^{−nD(x/n)} (1 + o(1)).

✷

P r o o f of statement 5). From Corollary 3.2 (see [10]) it follows that the sequence of random variables Z̃(n) := (Z(n) − an)/κ satisfies the LDP with normalizing function ϕ(n) = κ²/n and rate function I(y).

Using Lemma 4.4, for any ε > 0 we have

lim_{n→∞} (n/κ²) ln P(|R̃(n) − Z̃(n)| > ε) ≤ lim_{n→∞} (n/κ²) ln P(τ_{ν(n)+1} > κε)

≤ lim_{n→∞} (n/κ²) ln e^{−(ρ/4)κε} = −lim_{n→∞} nρε/(4κ) = −∞.

Therefore from Theorem 4.2.13 (see [11]) we obtain that the sequences R̃(n) and Z̃(n) satisfy the same LDP. ✷


4 Auxiliary Results

Lemma 4.1. For any k, n ∈ N the following inequality holds:

P(τ_k ≥ n) ≤ C e^{−ρn}, (17)

where

C := 1/(1 − (1−δ_1)^{v−1} δ_2), ρ := (1/v) ln C.

P r o o f. Due to the fact that the process Y(n) is Markov, it suffices to prove Lemma 4.1 for τ_1 with an arbitrary initial condition. We fix some initial state Y(0) = (y_1, y_2). Since C > 1 and C e^{−ρn} = C^{1−n/v}, for n ≤ v the right-hand side of inequality (17) is not less than 1, and therefore (17) obviously holds.

We now prove inequality (17) for n ≥ v + 1. Denote k := [n/v] and for l = 0, 1, ..., k−1 consider the events

A_l := {ω : r_{vl+1}(n) = 0, ..., r_{vl+v−1}(n) = 0, r_{vl+v}(n) = 1}.

Denote by B the complement of the set ⋃_{l=0}^{k−1} A_l:

B := ⋂_{l=0}^{k−1} Ā_l.

Since it is obvious that

{τ_1 < n} ⊇ ⋃_{l=0}^{k−1} A_l,

we obtain

{τ_1 ≥ n} ⊆ B.

Hence we have

P_n := P(τ_1 ≥ n | Y(0) = (y_1, y_2))

≤ P(B | Y(0) = (y_1, y_2)) = P(⋂_{l=0}^{k−1} Ā_l | Y(0) = (y_1, y_2))

= P(Ā_0 | Y(0) = (y_1, y_2)) Π_{l=1}^{k−1} P(Ā_l | ⋂_{i=0}^{l−1} Ā_i, Y(0) = (y_1, y_2)).

Since by condition [A] each factor on the right-hand side is bounded from above by

1 − (1−δ_1)^{v−1} δ_2 = 1/C,

we have

P_n ≤ 1/C^k ≤ 1/C^{n/v−1} = C e^{−ρn}.

✷
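The constants in Lemma 4.1 are fully explicit, so a quick numerical sanity check is possible (the values of δ_1, δ_2 and v below are illustrative):

```python
import math

def lemma41_constants(delta1, delta2, v):
    # C and rho from Lemma 4.1; requires 1 > delta1 > delta2 > 0 and v >= 1.
    C = 1.0 / (1.0 - (1.0 - delta1) ** (v - 1) * delta2)
    rho = math.log(C) / v
    return C, rho

C, rho = lemma41_constants(0.9, 0.1, 3)
assert C > 1 and rho > 0
# C * e^{-rho*n} = C^{1 - n/v} >= 1 for n < v: the bound is trivial there,
# as noted at the start of the proof.
assert all(C * math.exp(-rho * n) >= 1 for n in range(3))
```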


Lemma 4.2. For any u ∈ Z² the equality f(2πu) = 1 holds, and for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds, where

f(u) := E e^{i(u_1τ + u_2ζ)}

is the characteristic function of ξ.

P r o o f. Since τ and ζ are integer-valued, it is obvious that for u ∈ Z² the equality f(2πu) = 1 holds. We show that for any u ∈ R² \ Z² the inequality |f(2πu)| < 1 holds.

Suppose this is not true; then there exists (u_1, u_2) ∈ R² \ Z² such that |E e^{2πi(u_1τ+u_2ζ)}| = 1. Note that the equality |E e^{2πi(u_1τ+u_2ζ)}| = 1 is equivalent to the existence of k ∈ R such that

2π(u_1τ + u_2ζ) = k mod (2π) a.s.

From condition [A] it follows that

P(ζ = 1, τ = s+1) ≥ (1−δ_1)^s δ_2 > 0,
P(ζ = 2, τ = s+1) ≥ δ_2 (1−δ_1)^{s−1} δ_2 > 0,
P(ζ = 1, τ = s+2) ≥ (1−δ_1)^{s+1} δ_2 > 0.

Thus, if our hypothesis is true, then there should exist k_1 ∈ Z, k_2 ∈ Z, k_3 ∈ Z such that the following equalities hold:

2π(u_1(s+1) + u_2) = k + 2πk_1,
2π(u_1(s+1) + 2u_2) = k + 2πk_2,
2π(u_1(s+2) + u_2) = k + 2πk_3.

Divide each equality by 2π. Subtracting the 1st equality from the 2nd, we obtain u_2 = k_2 − k_1 ∈ Z; subtracting the 1st from the 3rd, we obtain u_1 = k_3 − k_1 ∈ Z. The resulting contradiction completes the proof. ✷

For a vector (λ̃, μ̃) such that ψ(λ̃, μ̃) = 1, we consider the sequence of random vectors (τ̂_k, ζ̂_k), k ∈ N, whose joint distribution is given as follows:

P((τ̂_1, ζ̂_1) ∈ A_1, ..., (τ̂_k, ζ̂_k) ∈ A_k, ...) := (1/ψ_1(λ̃, μ̃)) Π_{k=1}^{∞} E(e^{λ̃τ_k + μ̃ζ_k}; (τ_k, ζ_k) ∈ A_k). (18)

Let τ̂_0 := 0, ζ̂_0 := 0, ν̂(0) := 0. Denote

T̂_k := Σ_{l=0}^{k} τ̂_l, ν̂(n) := max{k ≥ 0 : T̂_k < n}.

Lemma 4.3. Let γ + λ̃ + μ̃ < ρ. Then there exists a constant Ĉ > 0 such that for any n ∈ N the following inequality holds:

E e^{γτ̂_{ν̂(n)+1}} < Ĉ n.


P r o o f. Since the random variables τ̂_{k+1} and T̂_k are independent,

E_1 := E e^{γτ̂_{ν̂(n)+1}} = E(e^{γτ̂_1}; τ̂_1 ≥ n) + Σ_{k=1}^{∞} E(e^{γτ̂_{k+1}}; T̂_k < n ≤ T̂_{k+1})

≤ (1/ψ_1(λ̃, μ̃)) E e^{γτ_1} + E e^{γτ + λ̃τ + μ̃ζ} Σ_{k=1}^{∞} P(T̂_k < n).

By arithmeticity, T̂_k ≥ n a.s. when k ≥ n. Therefore, using Lemma 4.1 and the inequality τ_k ≥ ζ_k a.s., we obtain

E_1 ≤ (1/ψ_1(λ̃, μ̃)) E e^{γτ_1} + E e^{(γ+λ̃+μ̃)τ} n ≤ (1/ψ_1(λ̃, μ̃)) C Σ_{k=1}^{∞} e^{(γ−ρ)k} + Cn Σ_{k=1}^{∞} e^{(γ+λ̃+μ̃−ρ)k}

≤ ( (C/ψ_1(λ̃, μ̃)) e^{γ−ρ}/(1 − e^{γ−ρ}) + C e^{γ+λ̃+μ̃−ρ}/(1 − e^{γ+λ̃+μ̃−ρ}) ) n.

✷

Lemma 4.4. Let lim_{n→∞} κ_n = ∞. Then for sufficiently large n ∈ N the following holds:

P(τ_{ν(n)+1} ≥ κ_n) ≤ e^{−(ρ/4)κ_n}.

P r o o f. Since the random variables τ_{k+1} and T_k are independent,

P(τ_{ν(n)+1} ≥ κ_n) ≤ P(τ_1 ≥ κ_n) + Σ_{k=1}^{∞} P(τ_{k+1} ≥ κ_n, T_k < n ≤ T_{k+1})

≤ P(τ_1 ≥ κ_n) + P(τ ≥ κ_n) Σ_{k=1}^{∞} P(T_k < n).

By arithmeticity, T_k ≥ n almost surely when k ≥ n. By Lemma 4.1 and the Chebyshev inequality, for sufficiently large n we obtain

P(τ_{ν(n)+1} ≥ κ_n) ≤ E e^{(ρ/2)τ_1}/e^{(ρ/2)κ_n} + n E e^{(ρ/2)τ}/e^{(ρ/2)κ_n} ≤ e^{−(ρ/4)κ_n}.

✷

Lemma 4.5. There exist constants Δ > 0, C̃ > 0, γ̃ > 0 such that for x ∈ {0} ∪ N, α := x/n, n ≥ 1, |α − a| ≤ Δ, the following inequality holds:

P(R(n) = x, τ_{ν(n)+1} ≥ ln²n) ≤ C̃ e^{−nD(α) − γ̃ln²n}.


P r o o f. From Theorem 2.1 of [9] it follows that for sufficiently small Δ there exist λ(α) and μ(α) such that (λ(α), μ(α)) ∈ A_{≤0}, A(λ(α), μ(α)) = 0 and

D(α) = λ(α) + αμ(α).

Denote

B_n := {ω : τ_{ν(n)+1} ≥ ln²n}.

We have

P(R(n) = x, τ_{ν(n)+1} ≥ ln²n) = P(R(n) = x, B_n, ν(n) = 0) + P(R(n) = x, B_n, ν(n) ≥ 1) =: P_0 + P_1.

From Lemma 4.1 it follows that

P_0 ≤ P(τ_1 ≥ n) ≤ C e^{−ρn}.

Since the function D(α) is continuous in a neighborhood of the point α = a and D(a) = 0, for sufficiently small Δ > 0 and α with |α − a| ≤ Δ the following inequality holds:

P_0 ≤ C e^{−ρn} ≤ C e^{−D(α)n − c ln²n}. (19)

Denote Z_k := Σ_{l=1}^{k} ζ_l.

Let us estimate P_1 from above. For λ = λ(α), μ = μ(α) we obtain

P_1 = P(R(n) = x, B_n, ν(n) ≥ 1) = Σ_{k=1}^{∞} P(R(n) = x, B_n, ν(n) = k)

= Σ_{k=1}^{∞} P(R(n) = x, τ_{k+1} ≥ ln²n, T_k < n ≤ T_{k+1})

= e^{−D(α)n} Σ_{k=1}^{∞} E(e^{−λ(T_{k+1}−n) − μ(Z_{k+1}−x) + λT_{k+1} + μZ_{k+1}}; R(n) = x, τ_{k+1} ≥ ln²n, T_k < n ≤ T_{k+1}). (20)

Note that if T_{k+1} ≥ n and R(n) = x, then x + ζ_{k+1} ≥ Z_{k+1}; therefore from (20) it follows that

P_1 ≤ e^{−D(α)n} Σ_{k=1}^{∞} E(e^{|λ|τ_{k+1} + |μ|ζ_{k+1} + λT_{k+1} + μZ_{k+1}}; R(n) = x, τ_{k+1} ≥ ln²n, T_k < n ≤ T_{k+1})

≤ e^{−D(α)n} Σ_{k=1}^{∞} E(e^{|λ|τ̂_{k+1} + |μ|ζ̂_{k+1}}; τ̂_{k+1} ≥ ln²n, T̂_k < n ≤ T̂_{k+1}), (21)

where the joint distribution of the random vectors (τ̂_k, ζ̂_k), k ∈ N, has the form (compare with (18))

P((τ̂_1, ζ̂_1) ∈ A_1, ..., (τ̂_k, ζ̂_k) ∈ A_k, ...) := (1/ψ_1(λ(α), μ(α))) Π_{k=1}^{∞} E(e^{λ(α)τ_k + μ(α)ζ_k}; (τ_k, ζ_k) ∈ A_k). (22)


Carrying out the summation in inequality (21), we obtain

P_1 ≤ e^{−D(α)n} E(e^{|λ|τ̂_{ν̂(n)+1} + |μ|ζ̂_{ν̂(n)+1}}; τ̂_{ν̂(n)+1} ≥ ln²n).

Since τ_k ≥ ζ_k a.s., from (22) it follows that τ̂_k ≥ ζ̂_k a.s. Therefore, using the Cauchy-Bunyakovsky-Schwarz inequality, we obtain

P_1 ≤ e^{−D(α)n} (E e^{2(|λ|+|μ|)τ̂_{ν̂(n)+1}})^{1/2} (P(τ̂_{ν̂(n)+1} ≥ ln²n))^{1/2}. (23)

From Corollary 2.1 of [9] it follows that (λ(a), μ(a)) = (0, 0); therefore, due to the continuity of λ(α) and μ(α) at the point α = a, for sufficiently small Δ > 0 the inequality 3(|λ(α)| + |μ(α)|) < ρ holds.

Thus, from Lemma 4.3 it follows that there exists a constant C_1 > 0 such that

E e^{2(|λ|+|μ|)τ̂_{ν̂(n)+1}} ≤ C_1 n.

Using Lemma 4.3 and the Chebyshev inequality, for some C_2 > 0, γ > 0, we obtain

P(τ̂_{ν̂(n)+1} ≥ ln²n) < C_2 e^{−γln²n}.

From inequalities (19) and (23) it follows that for sufficiently large n

P(R(n) = x, τ_{ν(n)+1} ≥ ln²n) ≤ C e^{−D(α)n − c ln²n} + e^{−D(α)n} √(C_1C_2) √n e^{−(γ/2)ln²n}

≤ C e^{−D(α)n − c ln²n} + √(C_1C_2) e^{−D(α)n − (γ/4)ln²n}.

✷

Lemma 4.6. There exists Δ > 0 such that if lim_{n→∞} x/n = α_0, |a − α_0| ≤ Δ, then the following holds:

−nD(1 − m/n, x/n − y/n) = −nD(1, x/n) + (λ(x/n) + ε_n)m + (μ(x/n) + θ_n)y, (24)

where the functions ε_n = ε_n(m, y), θ_n = θ_n(m, y) satisfy

β_n := max_{(m,y)∈B_n} {|ε_n(m, y)| + |θ_n(m, y)|} = o(1) as n → ∞,

where B_n := {(m, y) ∈ Z² : 1 ≤ m ≤ [nκ_n], 1 ≤ |y| ≤ [nκ_n]}, κ_n = o(1) as n → ∞.

P r o o f. For sufficiently small Δ, if |a − α_0| ≤ Δ, then the function D(α) is analytic in some neighborhood of the point α_0. Therefore, due to the fact that D(θ, α) = θD(α/θ), the function D(θ, α) is analytic in some neighborhood of the point (1, α_0). It means that for sufficiently large n the function can be represented as a Taylor series in a neighborhood of the point (1, x/n). Therefore we have

−nD(1 − m/n, x/n − y/n) = −nD(1, x/n) + λ(x/n)m + μ(x/n)y − nM_1, (25)


where M_1 is the remainder term in Lagrange's form.

Denote

D''_{(1,1)}(x, y) := ∂²D(x, y)/∂x², D''_{(1,2)}(x, y) := ∂²D(x, y)/∂x∂y, D''_{(2,2)}(x, y) := ∂²D(x, y)/∂y².

Then there exist u ∈ [1 − m/n, 1], v ∈ [x/n − y/n, x/n] such that

|M_1| ≤ 2 max(|D''_{(1,1)}(u, v)|, |D''_{(1,2)}(u, v)|, |D''_{(2,2)}(u, v)|) (m²/n² + y²/n²) =: K (m²/n² + y²/n²).

Therefore, from (25) it follows that there exist ε_n(m, y), θ_n(m, y) such that

|ε_n(m, y)| ≤ Km/n, |θ_n(m, y)| ≤ Ky/n,

and equality (24) holds. ✷

References

[1] Rissanen, J. (1983). A universal data compression system. IEEE Transactions on Information Theory, 29(5), pp. 656–664.

[2] Galves, A. and Löcherbach, E. (2008). Stochastic chains with memory of variable length. In: Festschrift in Honour of the 75th Birthday of Jorma Rissanen, TICSP Series 38, pp. 117–133.

[3] Bejerano, G. and Yona, G. (2001). Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics, 17(1), pp. 23–43.

[4] Leonardi, F. G. (2006). A generalization of the PST algorithm: modeling the sparse nature of protein sequences. Bioinformatics, 22(11), pp. 1302–1307.

[5] Galves, A., Galves, C., Garcia, J. E., Garcia, N. L. and Leonardi, F. (2012). Context tree selection and linguistic rhythm retrieval from written texts. The Annals of Applied Statistics, 6(1), pp. 186–209.

[6] Duarte, A. and Ost, G. (2016). A model for neural activity in the absence of external stimuli. Markov Processes and Related Fields, 22, p. 37.

[7] Duarte, A., Galves, A., Löcherbach, E. and Ost, G. (2018). Estimating the interaction graph of stochastic neural dynamics. To appear in Bernoulli.

[8] Gallo, S. (2011). Chains with unbounded variable length memory: perfect simulation and a visible regeneration scheme. Advances in Applied Probability, 43(3), pp. 735–759.

[9] Mogulskii, A. A. (2018). Local theorems for arithmetic compound renewal processes when Cramer's condition holds. Siberian Electronic Mathematical Reports. In Russian; an English version is available as arXiv:1811.06299.

[10] Logachov, A. V. and Mogulskii, A. A. Anscombe-type theorem and moderate deviations for trajectories of a compound renewal process. Journal of Mathematical Sciences, 229(1), pp. 36–50.

[11] Dembo, A. and Zeitouni, O. (1998). Large Deviations Techniques and Applications. Springer, New York.

[12] Borovkov, A. A. (2013). Probability Theory. Springer.
