A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

By

AJAY JASRA

Department of Mathematics, Imperial College London, SW7 2AZ, London, UK

and

CHAO YANG¹

Department of Mathematics, University of Toronto, M5S 2E4, Toronto ON, Canada

November 10, 2005

Abstract

Let $(X_n)$ be a Markov chain on a measurable space $(E, \mathcal{E})$ with unique stationary distribution $\pi$. Let $h : E \to \mathbb{R}$ be a measurable function with finite stationary mean $\pi(h) := \int_E h(x)\,\pi(dx)$. Ibragimov and Linnik (1971) proved that if $(X_n)$ is geometrically ergodic, then a central limit theorem (CLT) holds for $h$ whenever $\pi(|h|^{2+\delta}) < \infty$, $\delta > 0$. Cogburn (1972) proved that if a Markov chain is uniformly ergodic, with $\pi(h^2) < \infty$, then a CLT holds for $h$. The first result was re-proved in Roberts and Rosenthal (2004) using a regeneration approach, thus removing many of the technicalities of the original proof. This raised an open problem: to provide a proof of the second result using a regeneration approach. In this paper we provide a solution to this problem.

Keywords: Markov chains; Central limit theorems

1 Introduction

Let $(X_n)$ be a Markov chain with transition kernel $P : E \times \mathcal{E} \to [0,1]$ and a unique stationary distribution $\pi$. Let $h : E \to \mathbb{R}$ be a real-valued measurable function. We say that $h$ satisfies a Central Limit Theorem (or $\sqrt{n}$-CLT) if there is some $\sigma^2 < \infty$ such that the normalized sum $n^{-1/2} \sum_{i=1}^{n} [h(X_i) - \pi(h)]$ converges weakly to a $N(0,\sigma^2)$ distribution, where $N(0,\sigma^2)$ is a Gaussian distribution with zero mean and variance $\sigma^2$ (we allow that $\sigma^2 = 0$), and (e.g. Chan and Geyer (1994), see also Bradley (1985) and Chen (1999))

\[ \sigma^2 = \pi(h^2) + 2 \int_E \sum_{n=1}^{\infty} h(x)\, P^n(h)(x)\, \pi(dx) \]

with $P^n(h)(x) = \int_E h(y)\, P^n(x,dy)$ and $P^n(x,dy)$ the $n$-step transition law for the Markov chain.

To further our discussion we provide the following definitions. Denote the class of probability measures on $(E,\mathcal{E})$ as $\mathcal{P}(E)$. The total variation distance between $\mu,\nu \in \mathcal{P}(E)$ is:

\[ \|\mu - \nu\| := \sup_{A \in \mathcal{E}} |\mu(A) - \nu(A)|. \]

We will be concerned with geometrically and uniformly ergodic Markov chains:

¹ Corresponding author, e-mail: chaoyang@math.toronto.edu


Definition 1.1. A Markov chain with stationary distribution $\pi \in \mathcal{P}(E)$ is geometrically ergodic if $\forall n \in \mathbb{N}$:

\[ \|P^n(x,\cdot) - \pi(\cdot)\| \le M(x)\rho^n \]

where $\rho < 1$ and $M(x) < \infty$ $\pi$-almost everywhere. If $M = \sup_{x \in E} |M(x)|$ is finite then the chain is uniformly ergodic.
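As an illustration of Definition 1.1 on a finite state space, the following minimal Python sketch computes the total variation distance and tracks $\sup_x \|P^n(x,\cdot) - \pi(\cdot)\|$ for the first few $n$. The 3-state transition matrix `P` and the helper names `tv_distance`, `step` and `worst_case_tv` are assumptions made for this example; they do not appear in the paper.

```python
def tv_distance(mu, nu):
    """Total variation distance sup_A |mu(A) - nu(A)| on a finite space.

    On a finite space the supremum is attained at A = {y : mu(y) > nu(y)},
    which equals half of the L1 distance between the two vectors.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))


def step(dist, P):
    """One step of the chain: the row vector dist multiplied by P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


# Hypothetical 3-state transition matrix (rows sum to 1), for illustration only.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

# Approximate the stationary distribution by iterating from a uniform start.
pi = [1 / 3, 1 / 3, 1 / 3]
for _ in range(200):
    pi = step(pi, P)


def worst_case_tv(n_steps):
    """sup over starting states x of ||P^n(x, .) - pi||."""
    worst = 0.0
    for x in range(len(P)):
        dist = [1.0 if y == x else 0.0 for y in range(len(P))]
        for _ in range(n_steps):
            dist = step(dist, P)
        worst = max(worst, tv_distance(dist, pi))
    return worst


# Worst-case distance to stationarity after n = 1, ..., 5 steps.
decay = [worst_case_tv(n) for n in range(1, 6)]
print(decay)
```

For this matrix every column has a strictly positive minimum, so the distance to stationarity is bounded by a geometric sequence that does not depend on the starting state, which is the uniformly ergodic case of the definition.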

Theorem 1.2 (Cogburn, 1972). If a Markov chain with stationary distribution $\pi \in \mathcal{P}(E)$ is uniformly ergodic, then a $\sqrt{n}$-CLT holds for $h$ whenever $\pi(h^2) < \infty$.

Ibragimov and Linnik (1971) proved a CLT for $h$ when the chain is geometrically ergodic and, for some $\delta > 0$, $\pi(|h|^{2+\delta}) < \infty$. Roberts and Rosenthal (2004) provided a simpler proof using regeneration arguments. In addition, Roberts and Rosenthal (2004) left an open problem: to provide a proof of Theorem 1.2 (originally proved by Cogburn (1972)) using regeneration.

Many of the recent developments of CLTs for Markov chains are related to the evolution of stochastic simulation algorithms such as Markov chain Monte Carlo (MCMC) (e.g. Roberts and Rosenthal (2004)). For example, Roberts and Rosenthal (2004) posed many open problems for CLTs, including that considered here; see Häggström (2005) for a solution to another open problem. Additionally, Jones (2004) discusses the link between mixing processes and CLTs, with MCMC algorithms a particular consideration. For an up-to-date review of CLTs for Markov chains see Bradley (1985), Chen (1999) and Jones (2004).

The proof of Theorem 1.2, using regeneration theory, provides an elegant framework for the

proof of CLTs for Markov chains. The approach may also be useful for alternative proofs of CLTs

for chains with different ergodicity properties; e.g. polynomial ergodicity (see Jarner and Roberts

(2002)).

The structure of this paper is as follows. In Section 2 we provide some background on small sets and the regeneration construction, and detail some technical results. In Section 3 we use the results of Section 2 to provide a proof of Theorem 1.2 using regenerations.

2 Small Sets and Regeneration Construction

2.1 Small Sets

We recall the notion of a small set:

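Definition 2.1. A set $C \in \mathcal{E}$ is small (or $(n_0,\epsilon,\nu)$-small) if there exist an $n_0 \in \mathbb{N}$, $\epsilon > 0$ and a non-trivial $\nu \in \mathcal{P}(E)$ such that the following minorization condition holds $\forall x \in C$:

\[ P^{n_0}(x,\cdot) \ge \epsilon\,\nu(\cdot). \qquad (1) \]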

It is known (e.g. Meyn and Tweedie (1993)) that if $P$ is uniformly ergodic, the whole state space $E$ is small. That is, we have the following lemma:

Lemma 2.1. If $(X_n)$ on $(E,\mathcal{E})$ with stationary distribution $\pi \in \mathcal{P}(E)$ is uniformly ergodic, then $E$ is small.
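On a finite state space with $n_0 = 1$, the minorization condition (1) over the whole space can be checked directly: the largest admissible constant is the sum of the column minima of the transition matrix, and $\nu$ is the normalized vector of column minima. The Python sketch below illustrates this under those assumptions; the matrix `P` and the function name `minorization` are hypothetical and introduced only for this example.

```python
def minorization(P):
    """Return (eps, nu) such that P[x][y] >= eps * nu[y] for all x, y.

    eps is the sum over y of min_x P[x][y]; nu is the normalized
    vector of column minima. Assumes n0 = 1 and a finite state space.
    """
    n = len(P)
    col_min = [min(P[x][y] for x in range(n)) for y in range(n)]
    eps = sum(col_min)
    if eps <= 0:
        raise ValueError("one-step minorization fails; a power of P may be needed")
    nu = [c / eps for c in col_min]
    return eps, nu


# Hypothetical 3-state transition matrix, for illustration only.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]

eps, nu = minorization(P)
print(eps, nu)
```

If some column minimum is zero for every column, the one-step condition fails and one would look at a power $P^{n_0}$ instead, which is why the definition allows $n_0 > 1$.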


2.2 Regeneration Construction and Some Related Technical Results

Now we consider the regeneration construction for the proof. Since $E$ is small we use the split chain construction (Nummelin, 1984): for any $x \in E$, $A \in \mathcal{E}$,

\[ P^{n_0}(x,A) = (1 - \epsilon) R(x,A) + \epsilon\,\nu(A) \]

where $R(x,A) = (1 - \epsilon)^{-1} [P^{n_0}(x,A) - \epsilon\,\nu(A)]$. That is, for a single chain $(X_n)$, with probability $\epsilon$ we choose $X_{n+n_0} \sim \nu$, while with probability $1 - \epsilon$ we choose $X_{n+n_0} \sim R(X_n,\cdot)$. If $n_0 > 1$, we fill in the missing values $X_{n+1}, \dots, X_{n+n_0-1}$ using the appropriate Markov kernel and conditionals.
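The split chain construction can be sketched in Python for a finite state space with $n_0 = 1$, so that no intermediate values need to be filled in. At each step the sketch regenerates from $\nu$ with probability $\epsilon$ and otherwise moves according to the residual kernel $R$. The matrix `P`, the values of `eps` and `nu`, and the function name `simulate_split_chain` are assumptions for this illustration only.

```python
import random


def simulate_split_chain(P, eps, nu, n_steps, rng):
    """Simulate X_0, ..., X_{n_steps} and record the regeneration times.

    Assumes n0 = 1 and P[x][y] >= eps * nu[y] for all x, y, so that
    P = (1 - eps) * R + eps * nu with R a transition matrix.
    """
    n = len(P)
    states = list(range(n))
    # Residual kernel R(x, .); clamp at 0 to absorb floating-point rounding.
    R = [[max(0.0, (P[x][y] - eps * nu[y]) / (1 - eps)) for y in range(n)]
         for x in range(n)]
    x = rng.choices(states, weights=nu)[0]  # X_0 ~ nu
    path, regen_times = [x], []
    for t in range(1, n_steps + 1):
        if rng.random() < eps:
            # Regeneration: the new state is drawn from nu, independent of the past.
            x = rng.choices(states, weights=nu)[0]
            regen_times.append(t)
        else:
            x = rng.choices(states, weights=R[x])[0]
        path.append(x)
    return path, regen_times


# Hypothetical example: same 3-state matrix, eps and nu from its column minima.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
eps = 0.7
nu = [2 / 7, 3 / 7, 2 / 7]
n_steps = 20000

path, regen_times = simulate_split_chain(P, eps, nu, n_steps, random.Random(0))
regen_fraction = len(regen_times) / n_steps
print(regen_fraction)
```

Marginally the simulated path still follows the kernel $P$, since mixing $\nu$ and $R(x,\cdot)$ with weights $\epsilon$ and $1-\epsilon$ recovers $P(x,\cdot)$; the only addition is the record of the times at which the draw came from $\nu$.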

We let $T_1, T_2, \dots$ be the regeneration times, i.e. the times such that $X_{T_i} \sim \nu$, clearly $T_i = i n_0$. Let $T_0 = 0$ and $r(n) = \sup\{i \ge 0 : T_i \le n\}$. Using the regeneration times, we can break up the sum $\sum_{i=0}^{n} [h(X_i) - \pi(h)]$ into sums over tours as follows:

\[ \sum_{i=0}^{n} [h(X_i) - \pi(h)] = \sum_{j=1}^{r(n)} \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)] + Q(n) \]

where

\[ Q(n) = \sum_{j=0}^{T_1-1} [h(X_j) - \pi(h)] + \sum_{j=T_{r(n)+1}}^{n} [h(X_j) - \pi(h)]. \]
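The tour decomposition can be illustrated with a small Python sketch. It uses one particular bookkeeping convention, stated here as an assumption: complete tours run between consecutive regeneration times that are at most $n$, and both the initial segment before $T_1$ and the final incomplete tour are collected in the remainder. The function name `tour_decomposition` and the sample data are hypothetical.

```python
def tour_decomposition(values, regen_times):
    """Split sum(values[0..n]) into complete-tour sums plus a remainder.

    values[i] stands for h(X_i) - pi(h); regen_times is an increasing list
    of regeneration times. Tours cover [T_j, T_{j+1} - 1] for consecutive
    regeneration times; everything else goes into the remainder.
    """
    n = len(values) - 1
    times = [t for t in regen_times if t <= n]
    if not times:
        return [], sum(values)
    tours = [sum(values[a:b]) for a, b in zip(times, times[1:])]
    remainder = sum(values[:times[0]]) + sum(values[times[-1]:])
    return tours, remainder


# Hypothetical centred values and regeneration times, for illustration only.
values = [1.0, -2.0, 0.5, 0.5, 3.0, -1.0, -1.0, 2.0, 0.0, -3.0]
regen_times = [2, 5, 8]

tours, remainder = tour_decomposition(values, regen_times)
print(tours, remainder)  # [4.0, 0.0] -4.0
```

The tour sums plus the remainder always add up to the full sum, which is the identity the decomposition relies on; only the remainder is affected by the choice of convention at the two ends.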

We begin our construction by noting the following result.

Lemma 2.1. Under the formulation above, we have that:

\[ \frac{Q(n)}{n^{1/2}} \to_p 0. \qquad (2) \]

Proof. Let

\[ Q_1^+(n) = \sum_{j=0}^{T_1-1} [h(X_j) - \pi(h)]^+, \qquad Q_1^-(n) = \sum_{j=0}^{T_1-1} [h(X_j) - \pi(h)]^- \]

and

\[ Q_2^+(n) = \sum_{j=T_{r(n)+1}}^{n} [h(X_j) - \pi(h)]^+, \qquad Q_2^-(n) = \sum_{j=T_{r(n)+1}}^{n} [h(X_j) - \pi(h)]^- \]

where $[h(X_j) - \pi(h)]^+ = \max\{h(X_j) - \pi(h), 0\}$ and $[h(X_j) - \pi(h)]^- = \max\{-[h(X_j) - \pi(h)], 0\}$.


The strategy of the proof is to show that $Q_i^{\pm}(n)/n^{1/2} \to_p 0$ as $n \to \infty$. Consider $Q_1^+(n)$,

\[ Q_1^+(n) = \sum_{j=0}^{s n_0 - 1} [h(X_j) - \pi(h)]^+ \quad \text{w.p. } \epsilon(1 - \epsilon)^{s-1} \qquad (3) \]

where $s \in \mathbb{N}$. Suppose that $Q_1^+(n)/n^{1/2} \not\to_p 0$, i.e. $P(\exists\, c > 0 : Q_1^+(n) > c\, n^{1/2}, \text{ i.o.}) = 1$ for all $n$. This means that $P(Q_1^+(n) = \infty, \text{ i.o.}) = 1$, which is impossible from (3). So $Q_1^+(n)/n^{1/2} \to_p 0$ as $n \to \infty$. Similarly $Q_1^-(n)/n^{1/2} \to_p 0$ as $n \to \infty$.

For $Q_2$ we have $Q_2^+(n) \le \sum_{j=r(n)+1}^{l(n)} [h(X_j) - \pi(h)]^+ = \tilde{Q}_2^+(n)$, where $l(n) = \inf\{i \ge 0 : T_i \ge n\}$. We know that $\tilde{Q}_2^+(n)$ has the same distribution as $Q_1^+(n)$, so $\tilde{Q}_2^+(n)/n^{1/2} \to_p 0$ as $n \to \infty$ and therefore $Q_2^+(n)/n^{1/2} \to_p 0$ as $n \to \infty$. Similarly $Q_2^-(n)/n^{1/2} \to_p 0$ as $n \to \infty$. From the above discussion, we conclude that $Q(n)/n^{1/2} \to_p 0$.

The above lemma indicates that our objective is to find the asymptotic distribution of $\sum_{j=1}^{r(n)} \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)]$. Given the definition of $T_i$, each random variable $s_j = \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)]$ has the same distribution. However, we know that $T_j$ depends on $X_{T_{j-1}+1}, \dots, X_{T_j - 1}$, but does not depend on the value of $X_{T_{j-1}}$. That is, we have the following lemma:

Lemma 2.2. For any $0 \le i < \infty$, $s_i$ and $s_{i+1}$ are not independent, but the two collections of random variables $\{s_i : 0 \le i \le m - 2\}$ and $\{s_i : i \ge m\}$ are independent for any $m \ge 2$. Therefore the random variable sequence $\{s_i\}_{i=0}^{\infty}$ is a one-dependent stationary stochastic process.

Proof. Clearly $s_{i+1}$ depends on the distribution of $T_{i+1}$, thus:

\[ P\big(X_{T_i+1} \in dx_1, \dots, X_{T_i+m} \in dy \,\big|\, X_{T_i} = x,\ T_{i+1} - T_i > m\big) = \frac{(1 - \epsilon) R(x,dy)}{P^m(x,dy)}\, P(x,dx_1) \cdots P(x_{m-1},dy) \]

and

\[ P\big(X_{T_i+1} \in dx_1, \dots, X_{T_i+m} \in dy \,\big|\, X_{T_i} = x,\ T_{i+1} - T_i = m\big) = \frac{\epsilon\,\nu(dy)}{P^m(x,dy)}\, P(x,dx_1) \cdots P(x_{m-1},dy). \]

Note that $s_i$ depends on $T_{i+1}$. Therefore $s_i$ and $s_{i+1}$ are not independent. However, for any $0 \le i \le m - 2 < m \le j < \infty$, we have $X_{T_i} \sim \nu(\cdot)$, and $X_{T_j}$ depends on $X_{T_{j-1}+1}, \dots, X_{T_j - 1}$ but is independent of all the $\{X_k : k \le T_{j-1}\}$. Thus, we have the result.

To prove Theorem 1.2 we follow the strategy:

Step 1: Prove that $I = E_\nu\Big[\sum_{i=0}^{T_1-1} [h(X_i) - \pi(h)]\Big] = 0$.

Step 2: Prove that $J = \int_E \nu(dx)\, E\Big[\Big(\sum_{i=0}^{T_1-1} [h(X_i) - \pi(h)]\Big)^2 \,\Big|\, X_0 = x\Big] < \infty$.

Step 3: Prove that a $\sqrt{n}$-CLT holds for a stationary, one-step dependent stochastic process.

3 Proof of Theorem 1.2

Lemma 3.1. $I = E_\nu\Big[\sum_{i=0}^{T_1-1} [h(X_i) - \pi(h)]\Big] = 0$.


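Proof. Denote $T_1 = \tau m$ and $H_k = \sum_{i=km}^{(k+1)m-1} [h(X_i) - \pi(h)]$; then we have:

\[ I = E_\nu\Big[\sum_{k=0}^{\infty} H_k\, I\{k < \tau\}\Big]. \]

Consider the split $m$-skeleton chain $\{\check{X}_{nm}\}$ as in Section 5.1.1 of Meyn and Tweedie (1993). We know that $\check{\alpha} = E \times \{1\}$ (written $\mathsf{X}_1$ in the notation of Meyn and Tweedie (1993)) is an accessible atom. Then we can apply Theorem 10.0.1 of Meyn and Tweedie (1993) to this split chain. That is:

\[
\begin{aligned}
\pi(B) = \check{\pi}(B_0 \cup B_1) &= \int_{\check{\alpha}} \check{\pi}(dw)\, E_w\Big[\sum_{k=1}^{\check{\tau}_{\check{\alpha}}} I\{\check{X}_{km} \in \check{B}\}\Big] \\
&= \int_{E \times \{1\}} \check{\pi}(dw)\, E_w\Big[\sum_{k=1}^{\check{\tau}_{\check{\alpha}}} I\{\check{X}_{km} \in \check{B}\}\Big].
\end{aligned}
\]

Let $\check{\tau}_{\check{\alpha}} = \min\{n \ge 1 : \check{X}_{nm} \in \check{\alpha}\}$. Since for any $w \in \check{\alpha}$, $\check{P}^m(w,\cdot) \sim \nu(\cdot)$, we have $\check{\tau}_{\check{\alpha}} = \tau$. Following Theorem 5.1.3 in Meyn and Tweedie (1993), we also have $P^{k n_0}(x,B) = \check{P}^{k n_0}(x,\check{B})$ for any $B \in \mathcal{E}$. Therefore we have:

\[ \pi(B) = \epsilon\, E_\nu\Big[\sum_{k=1}^{\tau} I\{X_{km} \in B\}\Big] = \epsilon\, E_\nu\Big[\sum_{k=1}^{\infty} I\{X_{km} \in B\}\, I\{\tau > k\}\Big]. \]

So we have:

\[
\begin{aligned}
I &= E_\nu\Big[\sum_{k=0}^{\infty} E\big[H_k\, I\{k < \tau\} \,\big|\, X_{km}\big]\Big] \\
&= \sum_{k=0}^{\infty} E_\nu\Big[E\big[H_k\, I\{k < \tau\} \,\big|\, X_{km}\big]\Big] \\
&= \sum_{k=0}^{\infty} E_\nu\Big[E\big[H_k \,\big|\, X_{km}\big]\, I\{k < \tau\}\Big].
\end{aligned}
\]

The last equation follows since the random variables $I\{\tau > k\}$ and $X_{km}$ are independent. In addition, given $\tau > k$ and $X_{km}$, the distribution of $H_k$ is equal to that of $H_0$ given $X_0$; therefore

\[ I = \sum_{k=0}^{\infty} E_\nu\Big[E\big[H_0 \,\big|\, X_0\big]\, I\{k < \tau\}\Big] = E_\pi\Big[E\big[H_0 \,\big|\, X_0\big]\Big] = E_\pi(H_0) = 0. \]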

Lemma 3.2. We have:

\[ J = E_\nu\Big[\Big(\sum_{i=0}^{T_1-1} [h(X_i) - \pi(h)]\Big)^2\Big] < \infty. \qquad (4) \]


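Proof.

\[
\begin{aligned}
J &= E_\nu\Big[\Big(\sum_{k=0}^{\tau-1} \sum_{i=km}^{(k+1)m-1} [h(X_i) - \pi(h)]\Big)^2\Big] \\
&\le E_\nu\Big[\Big(\sum_{k=0}^{\infty} I\{k < \tau\}\, |H_k|\Big)^2\Big] \\
&= E_\nu\Big[\sum_{k=0}^{\infty} \Big(|H_k|^2\, I\{k < \tau\} + 2 |H_k| \sum_{j=k+1}^{\infty} |H_j|\, I\{j < \tau\}\, I\{k < \tau\}\Big)\Big] \\
&= E_\nu\Big[\sum_{k=0}^{\infty} \Big(|H_k|^2 + 2 |H_k| \sum_{j=k+1}^{\infty} |H_j|\, I\{j < \tau\}\Big) I\{k < \tau\}\Big] \\
&= \sum_{k=0}^{\infty} E_\nu\Big[E\Big[\Big(|H_k|^2 + 2 |H_k| \sum_{j=k+1}^{\infty} |H_j|\, I\{j < \tau\}\Big) I\{k < \tau\} \,\Big|\, X_{km},\, I\{k < \tau\}\Big]\Big] \\
&= \sum_{k=0}^{\infty} E_\nu\Big[E\Big[|H_k|^2 + 2 |H_k| \sum_{j=k+1}^{\infty} |H_j|\, I\{j < \tau\} \,\Big|\, X_{km}\Big]\, I\{k < \tau\}\Big].
\end{aligned}
\]

In the last equation, we have used the fact that the random variables $I\{\tau > k\}$ and $X_{km}$ are independent. Since

\[ E\Big[|H_i|^2 + 2 |H_i| \sum_{j=i+1}^{\infty} |H_j|\, I\{j < \tau\} \,\Big|\, X_{im} = x\Big] = E\Big[|H_0|^2 + 2 |H_0| \sum_{j=1}^{\infty} |H_j|\, I\{j < \tau\} \,\Big|\, X_0 = x\Big], \]

define $f(x) = E\Big[|H_0|^2 + 2 |H_0| \sum_{j=1}^{\infty} |H_j|\, I\{j < \tau\} \,\Big|\, X_0 = x\Big]$; then we have:

\[
\begin{aligned}
J &\le \sum_{k=0}^{\infty} E_\nu\big[f(X_0)\, I\{k < \tau\}\big] \\
&= E_\nu\big[f(X_0)\, I\{0 < \tau\}\big] + \sum_{k=1}^{\infty} E_\nu\big[f(X_0)\, I\{k < \tau\}\big] \\
&\le E_\nu\big[f(X_0)\big] + E_\nu\big[f(X_0)\big] \sum_{k=1}^{\infty} E_\nu\big[I\{k < \tau\}\big].
\end{aligned}
\]

The last inequality follows since:

1. $f(X_0)\, I\{k < \tau\} \le f(X_0)$;

2. When $k \ge 1$, $I\{\tau > k\}$ is independent of $X_0$.

Note

\[ E_\nu\big[I\{k < \tau\}\big] = P_\nu(k < \tau) \le (1 - \epsilon)^k \]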


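and

\[ \pi(dy) = \int_E P^{n_0}(x,dy)\, \pi(dx) \ge \epsilon\,\nu(dy); \]

therefore we have $J \le \frac{1}{\epsilon} E_\nu[f(X_0)] \le \frac{1}{\epsilon^2} E_\pi[f(X_0)]$ and

\[ E_\pi[f(X_0)] \le E_\pi\Big[\sum_{i=0}^{m-1} |h(X_i) - \pi(h)|^2\Big] \le m\big(\pi(h^2) - \pi(h)^2\big) < \infty. \]

From the above arguments we conclude that $J < \infty$.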

Finally, we prove Theorem 1.2:

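Proof of Theorem 1.2. Following Lemma 2.1, we can obtain:

\[ \lim_{n \to \infty} \frac{\sum_{i=0}^{n} [h(X_i) - \pi(h)]}{n^{1/2}} = \lim_{n \to \infty} \frac{\sum_{j=1}^{r(n)} \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)]}{n^{1/2}}. \qquad (5) \]

Define $h_i = h(X_i) - \pi(h)$, $s_j = \sum_{i=T_j}^{T_{j+1}-1} h_i$ and $\eta_j = s_{jm+1} + \cdots + s_{(j+1)m-1}$ for an integer $m \ge 2$. Following Lemma 2.2 we know that the two collections of random variables $\{s_i : 0 \le i \le m - 2\}$ and $\{s_i : i \ge m\}$ are independent for any $m \ge 2$; thus

\[ \frac{1}{\sqrt{n}} \sum_{j=1}^{n} s_j = \frac{1}{\sqrt{n}} \sum_{j=0}^{[n/m]-1} \eta_j + \frac{1}{\sqrt{n}} \sum_{j=0}^{[n/m]-1} s_{mj} + \frac{1}{\sqrt{n}} \sum_{j=m[n/m]}^{n} s_j. \]

It should be noted that if $j - i > n_0$, then $X_i$ and $X_j$ are independent, the $\eta_j$ are i.i.d. random variables and the $s_{mj}$ are i.i.d., so we have:

\[ \frac{1}{\sqrt{n}} \sum_{j=0}^{[n/m]-1} \eta_j \to_d N\Big(0, \frac{\sigma_m^2}{m}\Big), \qquad \frac{1}{\sqrt{n}} \sum_{j=0}^{[n/m]} s_{mj} \to_d N\Big(0, \frac{\sigma_s^2}{m}\Big) \]

where $\sigma_m^2 = (m - 1) E(s_1^2) + 2(m - 2) E(s_1 s_2)$ and $\sigma_s^2 = E[s_1^2]$. Letting $m \to \infty$, we have $m^{-1}\sigma_m^2 \to E(s_1^2) + 2E(s_1 s_2)$ and $m^{-1}\sigma_s^2 \to 0$, so the CLT holds.

Let

\[ \sigma^2 = \lim_{n \to \infty} \frac{1}{n} E\Big[\Big(\sum_{i=1}^{n} [h(X_i) - \pi(h)]\Big)^2\Big] \]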


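then

\[
\begin{aligned}
\sigma^2 &= \lim_{n \to \infty} \frac{1}{n} E\Big[\Big(\sum_{i=1}^{n} [h(X_i) - \pi(h)]\Big)^2\Big] \\
&= \lim_{n \to \infty} \frac{1}{n} E\Big[\Big(\sum_{j=1}^{r(n)} s_j\Big)^2\Big] \\
&= \lim_{n \to \infty} \frac{1}{n} E\Big[r(n)\, s_1^2 + 2(r(n) - 2)\, s_1 s_2\Big].
\end{aligned}
\]

By the elementary renewal theorem (e.g. Feller (1968)), $\lim_{n \to \infty} r(n)/n = 1/E(T_2 - T_1)$. Since $P[T_2 - T_1 = n_0 s] = \epsilon(1 - \epsilon)^{s-1}$, we have $E(T_2 - T_1) = \sum_{s=1}^{\infty} [n_0 s\, \epsilon (1 - \epsilon)^{s-1}] = n_0/\epsilon < \infty$. Therefore if we denote $\tilde{\sigma}^2 = E[s_1^2 + 2 s_1 s_2]$, then

\[ \sigma^2 = \frac{\epsilon}{n_0} E[s_1^2 + 2 s_1 s_2] = \frac{\epsilon}{n_0}\, \tilde{\sigma}^2. \qquad (6) \]

As a result, we conclude that

\[ \frac{\sum_{j=1}^{r(n)} \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)]}{n^{1/2}} = \frac{\sum_{j=1}^{r(n)} \sum_{i=T_j}^{T_{j+1}-1} [h(X_i) - \pi(h)]}{r(n)^{1/2}} \cdot \Big(\frac{r(n)}{n}\Big)^{1/2} \to_d \Big(\frac{\epsilon}{n_0}\Big)^{1/2} N(0, \tilde{\sigma}^2) = N(0, \sigma^2) \]

as $n \to \infty$.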
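The renewal step can be checked numerically with a short Python sketch. It draws inter-regeneration gaps $T_2 - T_1 = n_0 s$ with $P[s = k] = \epsilon(1-\epsilon)^{k-1}$ and compares the empirical mean gap with $n_0/\epsilon$; the number of regenerations per unit time is the reciprocal of that mean. The values of `n0` and `eps`, the seed, and the function name `regeneration_gaps` are assumptions chosen only for this illustration.

```python
import random


def regeneration_gaps(n0, eps, count, rng):
    """Draw `count` gaps of the form n0 * s, with s geometric with success prob eps."""
    gaps = []
    for _ in range(count):
        s = 1
        # Each block of n0 steps regenerates independently with probability eps.
        while rng.random() >= eps:
            s += 1
        gaps.append(n0 * s)
    return gaps


# Hypothetical parameter values, for illustration only.
n0, eps = 2, 0.25
gaps = regeneration_gaps(n0, eps, 50000, random.Random(1))

mean_gap = sum(gaps) / len(gaps)   # should be close to n0 / eps
rate = len(gaps) / sum(gaps)       # regenerations per unit time
print(mean_gap, rate)
```

With these values the mean gap should be close to $n_0/\epsilon = 8$ and the regeneration rate close to $\epsilon/n_0 = 0.125$, up to Monte Carlo error.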

Acknowledgement

Both authors would like to thank Jeffrey Rosenthal for his assistance in writing this paper. The

first author was supported by an Engineering and Physical Sciences Research Council Studentship

and would like to thank Dave Stephens and Chris Holmes for their advice relating to this paper.

REFERENCES

Bradley, R. C. 1985. On the central limit question under absolute regularity. Ann. Prob.,

13, 1314–1325.

Chan, K. S. and Geyer, C. J. 1994. Discussion of Markov chains for exploring posterior

distributions. Ann. Statist., 22, 1747–1758.

Chen, X. 1999. Limit theorems for functionals of ergodic Markov chains with general state

space. Mem. Amer. Math. Soc., 139.

Cogburn, R. 1972. The central limit theorem for Markov processes. In Le Cam, L. E.,

Neyman, J. and Scott, E. L. (Eds.) Proc. Sixth Ann. Berkeley Symp. Math. Statist. and

Prob., 2, 485–512.


Feller, W. 1968. An Introduction to Probability Theory and its Applications. 3rd ed,

Wiley, Chichester.

Häggström, O. 2005. On the central limit theorem for geometrically ergodic Markov chains.

Prob. Theory and Rel. Fields, 132, 74–82.

Ibragimov, I. A. and Linnik, Y. V. 1971. Independent and stationary sequences of random

variables. Wolters-Noordhoff, Groningen.

Jarner, S. F. and Roberts, G. O. 2002. Polynomial convergence rates of Markov chains.

Ann. Appl. Prob., 12, 224–247.

Jones, G. L. 2004. On the Markov chain central limit theorem. Prob. Surveys, 1, 299–320.

Meyn, S. P. and Tweedie, R. L. 1993. Markov chains and stochastic stability. Springer,

New York. http://probability.ca/MT.

Nummelin, E. 1984. General irreducible Markov chains and non-negative operators. Cambridge University Press, Cambridge.

Roberts, G. O. and Rosenthal, J. S. 2004. General state space Markov chains and MCMC

algorithms. Prob. Surveys, 1, 20–71.
