
Approximately Exact Inference in Dynamic Panel Models

Simon A. Broda^{a,∗}

Marc S. Paolella^{a}

Yianna Tchopourian^{a}

^{a} Swiss Banking Institute, University of Zurich, Switzerland

December 7, 2005

Abstract

This paper develops a general method for conducting exact small-sample inference in

models which allow the estimator of the (scalar) parameter of interest to be expressed as

the root of an estimating equation. The method requires the evaluation of the distribution

function of the latter. When applied to dynamic panel models, the estimating equation works

out to be a sum of ratios in quadratic forms in normal variates, the distribution of which

cannot be straightforwardly computed. We overcome this obstacle by deriving a saddlepoint

approximation that is both readily evaluated and remarkably accurate. A simulation study

demonstrates the validity of the procedure.

Keywords: Dynamic Panel Data, Bias Correction, Estimating Equation, Saddlepoint Approximation.

∗ Corresponding author. E-mail address: broda@isb.unizh.ch. Part of the research of M. Paolella and Y. Tchopourian has been carried out within the National Centre of Competence in Research “Financial Valuation and Risk Management” (NCCR FINRISK), which is a research program supported by the Swiss National Science Foundation.


1 Introduction

Dynamic panel data (DPD) models receive a considerable amount of attention in both the theoretical

and applied literature (see, for example, the references in Arellano, 2003). Due to its

tractability and wide applicability, the first–order DPD model is by far the most popular. Since

the seminal work of Anderson and Hsiao (1981), the literature has mainly focused on generalized

method of moments (GMM) procedures for its estimation. The first–difference GMM estimator

introduced by Arellano and Bond (1991) is now one of the standard procedures used in empirical

applications. Ahn and Schmidt (1995) exploit additional moment conditions in the presence of

exogenous variables. More recently, Kruiniger (2002) examines various estimators, under different

specifications of the individual effects, and derives conditions of consistency. A recent addition

to this strand of literature is Moon and Phillips (2004).

One of the reasons why GMM procedures enjoy such popularity is the “incidental parameters”

problem and associated asymptotic bias, which was first discussed by Neyman and Scott (1948)

and is pertinent to the least squares or maximum likelihood estimation of (fixed effects) DPD

models (Nickell, 1981). Addressing this issue, Hsiao et al. (2002) propose a transformed likelihood

approach along with a minimum distance estimator, which they demonstrate to outperform other

commonly used estimators. Choi et al. (2004) propose a bias reduction technique based on

recursive mean adjustment that is applicable to panel AR(p) models.

In some recent publications, attempts have been made at correcting the bias of the least

squares estimator (Kiviet, 1995; Hahn and Kuersteiner, 2002; Bun and Carree, 2005). These

methods rely heavily on asymptotic results, typically as the number of individuals tends to infinity.

As Hahn and Kuersteiner (2002, p. 1647) note,

Unfortunately, our bias-corrected estimator does not completely remove the bias. This

suggests that an even more careful small sample analysis based on higher order expansions

of the distribution might be needed to account for the entire bias.

The present manuscript aims to provide such an analysis. We develop a general method

for median unbiased point estimation and the construction of exact confidence intervals, which

we subsequently apply to the first–order DPD model. A saddlepoint approximation for the tail

probabilities of the requisite estimating equations obviates the need for burdensome simulations

as used by Phillips and Sul (2003). The work of these latter authors is in line with our paper in

that it relies on median–bias correction. However, their methodology is based on the least squares

estimator of the model, whereas our approach relies on exact maximum likelihood estimation, and

enables us to allow both for a more general set of exogenous regressors and for non-homogeneous

individual error variances. Use of our otherwise exact inferential procedure in conjunction with

the proposed saddlepoint approximation gives rise to the seemingly contradictory nomenclature

approximately exact inference, a term originally coined by Strawderman and Wells (1998).


The remainder of this paper is organized as follows: Section 2 presents the general method

for point and interval estimation. Section 3 introduces the model. Sections 4 and 5 apply the

estimation methodology to the least squares and maximum likelihood estimation of the model.

Section 6 contains numerical results. Section 7 concludes.

2 A General Approach to Unbiased Estimation

This section develops a general procedure for conducting exact inference in models that allow

the estimator of the parameter of interest to be defined as the root of an estimating equation.

The procedure generalizes the approach of Andrews (1993) and is related to the adjusted profile

likelihood of McCullagh and Tibshirani (1990). In contrast to the latter, our approach uses

quantiles, rather than moments, of the distribution. This has two advantages: (i) under certain

conditions, the resulting estimator is exactly median unbiased, as opposed to approximately mean

unbiased; and (ii) it facilitates the construction of confidence intervals.

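Consider a parametric model $\{X, \boldsymbol{\theta}\}$, where $X$ is the data, $\boldsymbol{\theta}' = (\theta, \delta')$, $\theta \in [\underline{\theta}, \bar{\theta}]$ is the scalar parameter of interest, $\delta$ is a (possibly empty) set of nuisance parameters, and $\underline{\theta}$, $\bar{\theta}$ need not be finite. Consider an estimator of $\theta$ defined as the root of a (continuously differentiable) estimating equation $E(\theta, X)$ that does not involve $\delta$, i.e., $\hat{\theta}$ is given by

\[
\hat{\theta} =
\begin{cases}
\underline{\theta}, & \text{if } E(\underline{\theta}, X) < 0, \\
\bar{\theta}, & \text{if } E(\bar{\theta}, X) > 0, \\
\theta : E(\theta, X) = 0, & \text{otherwise},
\end{cases}
\tag{1}
\]

where for every data set $X$, we assume

\[
\frac{d}{d\theta} E(\theta, X) < 0. \tag{2}
\]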

In the sequel, the dependence of E on the data will not generally be made explicit; rather, if

X appears explicitly, then E(θ,X) will be understood as the (observed) sample value of the

corresponding statistic.

Let Prθ(B) and Medθ(X) denote the probability of B and the median of X if the true

parameter is θ, respectively. In analogy to the notion of (mean) unbiased estimating equations,

it is natural to call an estimating equation E(·) median unbiased if

\[ \operatorname{Med}_{\theta} E(\theta) = 0. \]

More generally, if for a fixed value q ∈ (0,1), E(·) satisfies

\[ \Pr\nolimits_{\theta}\bigl(E(\theta) \le 0\bigr) = q, \tag{3} \]

we shall refer to it as a (100q%) quantile-unbiased estimating equation. It should be emphasized that, while mean unbiased estimating equations do not, in general, lead to mean unbiased estimators, it follows from (2) and (3) that

\[
q = \Pr\nolimits_{\theta}\bigl(E(\theta) \le 0\bigr)
  = \Pr\nolimits_{\theta}\bigl(E^{-1}(E(\theta)) \ge E^{-1}(0)\bigr)
  = \Pr\nolimits_{\theta}\bigl(\hat{\theta} \le \theta\bigr),
\]

e.g., if E(·) satisfies (3) with q = 0.5, then its unique root is a median unbiased estimator of θ,

while if q = (1 ± τ)/2, it constitutes the left (right) endpoint of an equal–tails 100τ% confidence

interval. The following proposition shows how a quantile unbiased estimating equation can be

constructed for any value of q ∈ (0,1).

Proposition 1. Let $E(c) : (\underline{\theta}, \bar{\theta}) \mapsto \mathbb{R}$ be a continuously differentiable, strictly decreasing estimating equation for $\theta$. Assume that, for all $c$, its distribution function is constant in $\delta$ and strictly increasing in $\theta$, and denote it by $F_{E(c)}(\cdot\,; \theta)$. Then

\[
E^{*}(c) := E(c) - F^{-1}_{E(c)}(q; c) \tag{4}
\]

is a strictly decreasing, 100q% quantile unbiased estimating equation for $\theta$.

Proof. It is immediate from the assumptions on $E(c)$ and $F_{E(c)}(\cdot\,; c)$ that $E^{*}(c)$ is strictly decreasing. Furthermore, for all values of $c$,

\[
\Pr\nolimits_{c}\bigl(E^{*}(c) \le 0\bigr)
  = \Pr\nolimits_{c}\bigl(E(c) \le F^{-1}_{E(c)}(q; c)\bigr)
  = F_{E(c)}\bigl(F^{-1}_{E(c)}(q; c); c\bigr)
  = q,
\]

which, in particular, also holds for $c = \theta$, i.e., $E^{*}$ satisfies (3).

The root of equation (4), say $\hat{\theta}_q$, can also be expressed as

\[
\theta : \Pr\nolimits_{\theta}\bigl(E(\theta) \le E(\theta, X)\bigr) = q, \tag{5}
\]

which will be convenient for our purposes as it obviates the need to calculate the inverse distribution function appearing in (4). It is important to note that in (5), $\theta$ occurs both as the true parameter and as the argument of the estimating equation.
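To make (5) concrete, the following minimal sketch solves it by bisection in a toy model that is not taken from the paper: X_1, ..., X_n are assumed iid N(θ, 1) and the estimating equation is E(θ, X) = x̄ − θ, chosen only because the probability in (5) is then available in closed form. The function names, the made-up data, and the search bounds are illustrative assumptions; in the DPD setting the closed-form probability would be replaced by the saddlepoint approximation.

```python
from math import sqrt
from statistics import NormalDist, fmean

def prob_at_observed(theta, xbar, n):
    """Pr_theta(E(theta) <= E(theta, X)) for the toy equation E(theta, X) = xbar - theta.

    Toy assumption: X_1..X_n iid N(theta, 1), so E(theta) ~ N(0, 1/n) under theta.
    """
    return NormalDist().cdf(sqrt(n) * (xbar - theta))

def quantile_unbiased(xbar, n, q, lo=-1e3, hi=1e3, tol=1e-10):
    """Solve prob_at_observed(theta) = q by bisection.

    In this toy model the left-hand side is strictly decreasing in theta.
    The bounds lo and hi are arbitrary illustrative values.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_at_observed(mid, xbar, n) > q:
            lo = mid   # probability still too large: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [0.3, 1.1, -0.4, 0.8, 0.7]          # made-up sample
xbar, n, tau = fmean(data), len(data), 0.95

median_est = quantile_unbiased(xbar, n, 0.5)          # median unbiased point estimate
left = quantile_unbiased(xbar, n, (1 + tau) / 2)      # left endpoint of the 95% interval
right = quantile_unbiased(xbar, n, (1 - tau) / 2)     # right endpoint of the 95% interval
```

In this toy case the solution is x̄ − z_q/√n, with z_q the standard normal q-quantile, so q = 0.5 returns the sample mean and q = (1 ± τ)/2 return the usual equal-tails endpoints.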

We close this section with a few remarks concerning related schemes of bias correction. Firstly, if the estimator $\hat{\theta}$ can be expressed in closed form, then it can be written as the root of

\[
\hat{\theta}(X) - \theta = 0,
\]

and $\hat{\theta}_q$ solves

\[
\theta : \Pr\nolimits_{\theta}\bigl(\hat{\theta} \le \hat{\theta}(X)\bigr) = q.
\]

In this special case, our technique yields the same estimator as that used by, e.g., Andrews

(1993) and Phillips and Sul (2003). Their requirement that the quantile function of $\hat{\theta}$ be strictly

increasing in θ translates into our assumption that $E^{*}$ be strictly decreasing. As noted by Andrews

(1993), it is not apparent how this can be formally proven. However, for our model, numerical

results are strongly confirmatory of this assumption.
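The closed-form special case can be sketched for a single AR(1) series with an intercept, where the least squares slope plays the role of the closed-form estimator. This is only an illustration under stated assumptions: the probability is estimated by Monte Carlo with common random numbers rather than by the exact or saddlepoint calculations discussed in the text, the errors are standard normal with a stationary start, and the search is restricted to the hypothetical bounds ±0.95. All function names and default values are made up for the example.

```python
import random
from math import sqrt

def ls_slope(y):
    """Least squares slope from regressing y_t on a constant and y_{t-1}."""
    x, z = y[:-1], y[1:]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxz / sxx

def prob_below(alpha, alpha_obs, shocks):
    """Monte Carlo estimate of Pr_alpha(alpha_hat <= alpha_obs), stationary start."""
    hits = 0
    for row in shocks:
        y = [row[0] / sqrt(1.0 - alpha ** 2)]   # y_0 ~ N(0, 1 / (1 - alpha^2))
        for u in row[1:]:
            y.append(alpha * y[-1] + u)
        if ls_slope(y) <= alpha_obs:
            hits += 1
    return hits / len(shocks)

def quantile_unbiased_ar1(alpha_obs, T, q=0.5, reps=500, lo=-0.95, hi=0.95, seed=0):
    """Solve Pr_alpha(alpha_hat <= alpha_obs) = q for alpha by bisection."""
    rng = random.Random(seed)
    # Common random numbers: the same shocks are reused for every trial alpha.
    shocks = [[rng.gauss(0.0, 1.0) for _ in range(T + 1)] for _ in range(reps)]
    if prob_below(lo, alpha_obs, shocks) < q:
        return lo            # root lies below the (hypothetical) lower bound
    if prob_below(hi, alpha_obs, shocks) > q:
        return hi            # root lies above the (hypothetical) upper bound
    for _ in range(14):
        mid = 0.5 * (lo + hi)
        if prob_below(mid, alpha_obs, shocks) > q:
            lo = mid         # probability too large: move right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Observed least squares estimate 0.6 from T = 20 transitions (made-up numbers).
alpha_mu = quantile_unbiased_ar1(alpha_obs=0.6, T=20)
```

Because the least squares slope is biased downward for positive autoregressive parameters, the median unbiased value returned here lies above the observed estimate. Reusing the same shocks for every trial value keeps the simulated probability a fairly smooth function of the parameter, which is what makes simple bisection workable.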


Secondly, it appears natural to construct another bias-corrected point estimator by replacing $F^{-1}_{E(c)}(q; c)$ in equation (4) by $\mathbb{E}_c[E(c)]$, i.e., the expected value of $E(c)$ if the true parameter is $c$. This is the idea behind the adjusted profile likelihood of McCullagh and Tibshirani (1990), except that here, we are concerned with a general estimating equation that need not be a profile score function. We shall refer to the resulting estimator as Mean Adjusted and denote it by $\hat{\theta}_{\mathrm{Mean}}$. If the estimator in question is closed-form, $\hat{\theta}_{\mathrm{Mean}}$ is the same as the nonlinear-bias-correcting estimator of MacKinnon and Smith (1998).
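The closed-form case can be worked out in a few lines. The derivation below is a sketch that writes $\mathbb{E}_c$ for the expectation when the true parameter is $c$ and introduces the bias function $b(\theta)$ as notation of our own, not taken from the text.

```latex
\begin{align*}
E(c) &= \hat{\theta} - c, \\
E(c, X) - \mathbb{E}_c[E(c)]
  &= \hat{\theta}(X) - c - \bigl(\mathbb{E}_c[\hat{\theta}] - c\bigr)
   = \hat{\theta}(X) - \mathbb{E}_c[\hat{\theta}], \\
\hat{\theta}_{\mathrm{Mean}} \text{ solves } \mathbb{E}_{\theta}[\hat{\theta}] = \hat{\theta}(X)
  &\iff \hat{\theta}_{\mathrm{Mean}} = \hat{\theta}(X) - b(\hat{\theta}_{\mathrm{Mean}}),
  \qquad b(\theta) := \mathbb{E}_{\theta}[\hat{\theta}] - \theta .
\end{align*}
```

In words, the Mean Adjusted estimator is the parameter value at which the expected value of the original estimator equals its observed value, so the bias is evaluated at the corrected estimate rather than at the uncorrected one.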

3 The Model

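We consider a first–order DPD model, with or without fixed effects. For each of the $N \in \mathbb{N}^{+}$ individuals, the model is characterized by an observed panel and a latent panel, given respectively by

\[
\begin{aligned}
y_{i,t} &= x_{i,t}'\beta + y^{\ell}_{i,t}, & t &\in \{0, \ldots, T\}, \\
y^{\ell}_{i,t} &= \alpha y^{\ell}_{i,t-1} + u_{i,t}, & t &\in \{1, \ldots, T\},
\end{aligned}
\tag{6}
\]

where $\alpha \in (-1, 1]$, $x_{i,t} = (x^{1}_{i,t}, \ldots, x^{k}_{i,t})'$ is a vector of regressors with $k < NT$, $\beta = (\beta_1, \ldots, \beta_k)'$, the error components $u_{i,t} \overset{\text{iid}}{\sim} \mathrm{N}(0, \sigma_i^2)$, and each initialization $y^{\ell}_{i,0} \sim \mathrm{N}\bigl(0, \sigma_i^2/(1-\alpha^2)\bigr)$ if $\alpha \in (-1, 1)$ and an arbitrary constant or random variable if $\alpha = 1$. In matrix form, the model becomes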

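\[
Y_0 = X_0 \beta + Y^{\ell}_{0}, \qquad
Y^{\ell} = \alpha Y^{\ell}_{-1} + U,
\]

where

\[
\begin{aligned}
Y_0 &= [Y_{1,0}', \ldots, Y_{N,0}']', & Y_{i,0} &= [y_{i,0}, \ldots, y_{i,T}]', \\
X_0 &= [X_{1,0}', \ldots, X_{N,0}']', & X_{i,0} &= [x_{i,0}, \ldots, x_{i,T}]', \\
Y^{\ell} &= [{Y^{\ell}_{1}}', \ldots, {Y^{\ell}_{N}}']', & Y^{\ell}_{i} &= [y^{\ell}_{i,1}, \ldots, y^{\ell}_{i,T}]', \\
Y^{\ell}_{-1} &= [{Y^{\ell}_{1,-1}}', \ldots, {Y^{\ell}_{N,-1}}']', & Y^{\ell}_{i,-1} &= [y^{\ell}_{i,0}, \ldots, y^{\ell}_{i,T-1}]', \\
Y^{\ell}_{0} &= [{Y^{\ell}_{1,0}}', \ldots, {Y^{\ell}_{N,0}}']', & Y^{\ell}_{i,0} &= [y^{\ell}_{i,0}, \ldots, y^{\ell}_{i,T}]',
\end{aligned}
\]

and $X_0$ is assumed to have full column rank. By combining the observable and latent equations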

the model can equivalently be written

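\[
y_{i,t} = \alpha y_{i,t-1} + x_{i,t}'\beta - x_{i,t-1}'\beta\alpha + u_{i,t}, \qquad t = 1, \ldots, T, \tag{7}
\]

or, in matrix form,

\[
Y = \alpha Y_{-1} + Z\gamma + U, \tag{8}
\]

where $\gamma = [\beta', -\beta'\alpha]'$, $Y_{-1} = [Y_{1,-1}', \ldots, Y_{N,-1}']'$, $Y_{i,-1} = [y_{i,0}, \ldots, y_{i,T-1}]'$, $Z = [X, X_{-1}]$, $X = [X_{1}', \ldots, X_{N}']'$, $X_{i} = [x_{i,1}, \ldots, x_{i,T}]'$, $X_{-1} = [X_{1,-1}', \ldots, X_{N,-1}']'$, and $X_{i,-1} = [x_{i,0}, \ldots, x_{i,T-1}]'$.

We are particularly concerned with the following two special cases: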
