
Approximately Exact Inference in Dynamic Panel Models

Simon A. Broda∗, Marc S. Paolella, Yianna Tchopourian

Swiss Banking Institute, University of Zurich, Switzerland

December 7, 2005

Abstract

This paper develops a general method for conducting exact small-sample inference in

models which allow the estimator of the (scalar) parameter of interest to be expressed as

the root of an estimating equation. The method requires the evaluation of the distribution

function of the latter. When applied to dynamic panel models, the estimating equation works

out to be a sum of ratios of quadratic forms in normal variates, the distribution of which

cannot be straightforwardly computed. We overcome this obstacle by deriving a saddlepoint

approximation that is both readily evaluated and remarkably accurate. A simulation study

demonstrates the validity of the procedure.

Keywords: Dynamic Panel Data, Bias Correction, Estimating Equation, Saddlepoint Approxi-

mation.

∗Corresponding author. E-mail address: broda@isb.unizh.ch. Part of the research of M. Paolella and Y. Tchopourian has been carried out within the National Centre of Competence in Research “Financial Valuation and Risk Management” (NCCR FINRISK), which is a research program supported by the Swiss National Science Foundation.


1 Introduction

Dynamic panel data (DPD) models receive a considerable amount of attention in both the the-

oretical and applied literature (see, for example, the references in Arellano, 2003). Due to its

tractability and wide applicability, the first–order DPD model is by far the most popular. Since

the seminal work of Anderson and Hsiao (1981), the literature has mainly focused on generalized

method of moments (GMM) procedures for its estimation. The first–difference GMM estimator

introduced by Arellano and Bond (1991) is now one of the standard procedures used in empirical

applications. Ahn and Schmidt (1995) exploit additional moment conditions in the presence of

exogenous variables. More recently, Kruiniger (2002) examines various estimators, under different

specifications of the individual effects, and derives conditions for consistency. A recent addition

to this strand of literature is Moon and Phillips (2004).

One of the reasons why GMM procedures enjoy such popularity is the “incidental parameters”

problem and associated asymptotic bias, which was first discussed by Neyman and Scott (1948)

and is pertinent to the least squares or maximum likelihood estimation of (fixed effects) DPD

models (Nickell, 1981). Addressing this issue, Hsiao et al. (2002) propose a transformed likelihood

approach along with a minimum distance estimator, which they demonstrate to outperform other

commonly used estimators. Choi et al. (2004) propose a bias reduction technique based on recursive mean adjustment, which is applicable to panel AR(p) models.

In some recent publications, attempts have been made at correcting the bias of the least

squares estimator (Kiviet, 1995; Hahn and Kuersteiner, 2002; Bun and Carree, 2005). These

methods rely heavily on asymptotic results, typically as the number of individuals tends to infinity.

As Hahn and Kuersteiner (2002, p. 1647) note,

Unfortunately, our bias-corrected estimator does not completely remove the bias. This

suggests that an even more careful small sample analysis based on higher order ex-

pansions of the distribution might be needed to account for the entire bias.

The present manuscript aims to provide such an analysis. We develop a general method

for median unbiased point estimation and the construction of exact confidence intervals, which

we subsequently apply to the first–order DPD model. A saddlepoint approximation for the tail

probabilities of the requisite estimating equations obviates the need for burdensome simulations

as used by Phillips and Sul (2003). The work of these latter authors is in line with our paper in

that it relies on median–bias correction. However, their methodology is based on the least squares

estimator of the model, whereas our approach relies on exact maximum likelihood estimation, and

enables us to allow both for a more general set of exogenous regressors and for non-homogeneous

individual error variances. Use of our otherwise exact inferential procedure in conjunction with

the proposed saddlepoint approximation gives rise to the seemingly contradictory nomenclature

approximately exact inference, a term originally coined by Strawderman and Wells (1998).


The remainder of this paper is organized as follows: Section 2 presents the general method

for point and interval estimation. Section 3 introduces the model. Sections 4 and 5 apply the

estimation methodology to the least squares and maximum likelihood estimation of the model.

Section 6 contains numerical results. Section 7 concludes.

2 A General Approach to Unbiased Estimation

This section develops a general procedure for conducting exact inference in models that allow

the estimator of the parameter of interest to be defined as the root of an estimating equation.

It generalizes the approach of Andrews (1993) and is related to the adjusted profile

likelihood of McCullagh and Tibshirani (1990). In contrast to the latter, our approach uses

quantiles, rather than moments, of the distribution. This has two advantages: i) under certain

conditions, the resulting estimator is exactly median unbiased, as opposed to approximately mean

unbiased; and ii) it facilitates the construction of confidence intervals.

Consider a parametric model $\{X,\vartheta\}$, where $X$ is the data, $\vartheta' = (\theta,\delta')$, $\theta \in [\underline{\theta},\overline{\theta}]$ is the scalar parameter of interest, $\delta$ is a (possibly empty) set of nuisance parameters, and $\underline{\theta}$, $\overline{\theta}$ need not be finite. Consider an estimator of $\theta$ defined as the root of a (continuously differentiable) estimating equation $E(\theta,X)$ that does not involve $\delta$, i.e., $\hat\theta$ is given by

$$\hat\theta = \begin{cases} \underline{\theta}, & \text{if } E(\underline{\theta},X) < 0, \\ \overline{\theta}, & \text{if } E(\overline{\theta},X) > 0, \\ \theta : E(\theta,X) = 0, & \text{otherwise}, \end{cases} \tag{1}$$

where for every data set $X$, we assume

$$\frac{\mathrm{d}}{\mathrm{d}\theta}E(\theta,X) < 0. \tag{2}$$

In the sequel, the dependence of E on the data will not generally be made explicit; rather, if

X appears explicitly, then E(θ,X) will be understood as the (observed) sample value of the

corresponding statistic.

Let $\Pr_\theta(B)$ and $\mathrm{Med}_\theta(X)$ denote the probability of $B$ and the median of $X$ if the true parameter is $\theta$, respectively. In analogy to the notion of (mean) unbiased estimating equations, it is natural to call an estimating equation $E(\cdot)$ median unbiased if

$$\mathrm{Med}_\theta\,E(\theta) = 0.$$

More generally, if for a fixed value $q \in (0,1)$, $E(\cdot)$ satisfies

$$\Pr\nolimits_\theta\big(E(\theta) \le 0\big) = q, \tag{3}$$

we shall refer to it as a (100q%) quantile-unbiased estimating equation. It should be empha-

sized that, while mean unbiased estimating equations do not, in general, lead to mean unbiased


estimators, it follows from (2) and (3) that

$$q = \Pr\nolimits_\theta\big(E(\theta) \le 0\big) = \Pr\nolimits_\theta\big(E^{-1}(E(\theta)) \ge E^{-1}(0)\big) = \Pr\nolimits_\theta\big(\hat\theta \le \theta\big),$$

e.g., if E(·) satisfies (3) with q = 0.5, then its unique root is a median unbiased estimator of θ,

while if q = (1 ± τ)/2, it constitutes the left (right) endpoint of an equal–tails 100τ% confidence

interval. The following proposition shows how a quantile unbiased estimating equation can be

constructed for any value of q ∈ (0,1).

Proposition 1. Let $E(c) : (\underline{\theta},\overline{\theta}) \mapsto \mathbb{R}$ be a continuously differentiable, strictly decreasing estimating equation for $\theta$. Assume that, for all $c$, its distribution function is constant in $\delta$ and strictly increasing in $\theta$, and denote it by $F_{E(c)}(\cdot\,;\theta)$. Then

$$E^*(c) := E(c) - F_{E(c)}^{-1}(q;c) \tag{4}$$

is a strictly decreasing, $100q\%$ quantile unbiased estimating equation for $\theta$.

Proof. It is immediate from the assumptions on $E(c)$ and $F_{E(c)}(\cdot\,;c)$ that $E^*(c)$ is strictly decreasing. Furthermore, for all values of $c$,

$$\Pr\nolimits_c\big(E^*(c) \le 0\big) = \Pr\nolimits_c\big(E(c) \le F_{E(c)}^{-1}(q;c)\big) = F_{E(c)}\big(F_{E(c)}^{-1}(q;c);c\big) = q,$$

which, in particular, also holds for $c = \theta$, i.e., $E^*$ satisfies (3).

The root of equation (4), say $\hat\theta_q$, can also be expressed as

$$\big\{\theta : \Pr\nolimits_\theta\big(E(\theta) \le E(\theta,X)\big) = q\big\}, \tag{5}$$

which will be convenient for our purposes, as it obviates the need to calculate the inverse distribution function appearing in (4). It is important to note that in (5), $\theta$ occurs both as the true parameter and as the argument of the estimating equation.
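Operationally, (5) is a one-dimensional root search in $\theta$. The following minimal sketch (ours; the function name and interface are hypothetical) solves it by bisection, assuming a caller-supplied routine that evaluates $\Pr_\theta(E(\theta) \le E(\theta,X))$ — in our application this would be the saddlepoint approximation developed below — and assuming that this probability is continuous and monotone in $\theta$:

```python
from typing import Callable

def quantile_unbiased_root(cdf: Callable[[float], float], q: float,
                           lo: float, hi: float, tol: float = 1e-8) -> float:
    """Solve (5): find theta such that cdf(theta) = q by bisection, where
    cdf(theta) = Pr_theta(E(theta) <= E(theta, X)).  Assumes cdf is continuous
    and monotone (increasing or decreasing) in theta on [lo, hi]."""
    f_lo, f_hi = cdf(lo) - q, cdf(hi) - q
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    assert f_lo * f_hi < 0.0, "q not bracketed on [lo, hi]"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (cdf(mid) - q) * f_lo < 0.0:  # sign change between lo and mid
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With $q = 0.5$ this delivers the median unbiased point estimate; with $q = (1 \pm \tau)/2$ it delivers the endpoints of an equal-tails $100\tau\%$ interval.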

We close this section with a few remarks concerning related schemes of bias correction. Firstly,

if the estimator $\hat\theta$ can be expressed in closed form, then it can be written as the root of

$$\hat\theta(X) - \theta = 0,$$

and $\hat\theta_q$ solves

$$\theta : \Pr\nolimits_\theta\big(\hat\theta \le \hat\theta(X)\big) = q.$$

In this special case, our technique yields the same estimator as that used by, e.g., Andrews (1993) and Phillips and Sul (2003). Their requirement that the quantile function of $\hat\theta$ be strictly increasing in $\theta$ translates into our assumption that $E^*$ be strictly decreasing. As noted by Andrews (1993), it is not apparent how this can be formally proven. However, for our model, numerical results strongly support this assumption.


Secondly, it appears natural to construct another bias-corrected point estimator by replacing $F_{E(c)}^{-1}(q;c)$ in equation (4) by $\mathrm{E}_c\big[E(c)\big]$, i.e., the expected value of $E(c)$ if the true parameter is $c$. This is the idea behind the adjusted profile likelihood of McCullagh and Tibshirani (1990), except that here, we are concerned with a general estimating equation that need not necessarily be a profile score function. We shall refer to the resulting estimator as Mean Adjusted and denote it by $\hat\theta_{\mathrm{Mean}}$. If the estimator in question is available in closed form, $\hat\theta_{\mathrm{Mean}}$ is the same as the nonlinear-bias-correcting estimator of MacKinnon and Smith (1998).

3 The Model

We consider a first–order DPD model, with or without fixed effects. For each of the $N \in \mathbb{N}^+$ individuals, the model is characterized by an observed panel and a latent panel, given respectively by

$$y_{i,t} = x_{i,t}'\beta + y^{\ell}_{i,t}, \quad t \in \{0,\dots,T\}, \qquad y^{\ell}_{i,t} = \alpha y^{\ell}_{i,t-1} + u_{i,t}, \quad t \in \{1,\dots,T\}, \tag{6}$$

where $\alpha \in (-1,1]$, $x_{i,t} = (x^{1}_{i,t},\dots,x^{k}_{i,t})'$ is a vector of regressors with $k < NT$, $\beta = (\beta_1,\dots,\beta_k)'$, the error components $u_{i,t} \stackrel{iid}{\sim} N(0,\sigma^2_i)$, and each initialization $y^{\ell}_{i,0} \sim N\big(0,\sigma^2_i/(1-\alpha^2)\big)$ if $\alpha \in (-1,1)$ and an arbitrary constant or random variable if $\alpha = 1$. In matrix form, the model becomes

$$Y_0 = X_0\beta + Y^{\ell}_0, \qquad Y^{\ell} = \alpha Y^{\ell}_{-1} + U,$$

where

$$Y_0 = \big[Y_{1,0}',\dots,Y_{N,0}'\big]', \quad Y_{i,0} = [y_{i,0},\dots,y_{i,T}]', \quad X_0 = \big[X_{1,0}',\dots,X_{N,0}'\big]', \quad X_{i,0} = [x_{i,0},\dots,x_{i,T}]',$$

$$Y^{\ell} = \big[Y^{\ell\,\prime}_{1},\dots,Y^{\ell\,\prime}_{N}\big]', \quad Y^{\ell}_{i} = \big[y^{\ell}_{i,1},\dots,y^{\ell}_{i,T}\big]', \quad Y^{\ell}_{-1} = \big[Y^{\ell\,\prime}_{1,-1},\dots,Y^{\ell\,\prime}_{N,-1}\big]', \quad Y^{\ell}_{i,-1} = \big[y^{\ell}_{i,0},\dots,y^{\ell}_{i,T-1}\big]',$$

$$Y^{\ell}_{0} = \big[Y^{\ell\,\prime}_{1,0},\dots,Y^{\ell\,\prime}_{N,0}\big]', \quad Y^{\ell}_{i,0} = \big[y^{\ell}_{i,0},\dots,y^{\ell}_{i,T}\big]',$$

and $X_0$ is assumed to have full column rank. By combining the observable and latent equations, the model can equivalently be written

$$y_{i,t} = \alpha y_{i,t-1} + x_{i,t}'\beta - x_{i,t-1}'\beta\alpha + u_{i,t}, \qquad t = 1,\dots,T, \tag{7}$$

or, in matrix form,

$$Y = \alpha Y_{-1} + Z\gamma + U, \tag{8}$$

where $\gamma = [\beta',-\beta'\alpha]'$, $Y_{-1} = \big[Y_{1,-1}',\dots,Y_{N,-1}'\big]'$, $Y_{i,-1} = [y_{i,0},\dots,y_{i,T-1}]'$, $Z = [X, X_{-1}]$, $X = \big[X_1',\dots,X_N'\big]'$, $X_i = [x_{i,1},\dots,x_{i,T}]'$, $X_{-1} = \big[X_{1,-1}',\dots,X_{N,-1}'\big]'$, and $X_{i,-1} = [x_{i,0},\dots,x_{i,T-1}]'$.

We are particularly concerned with the following two special cases:


I) $\sigma^2_i = \sigma^2$ $\forall i$ (arbitrary regressors, identical variance);

II) $X_0 = I_N \otimes X_{1,0}$, $\beta = (\beta_1',\dots,\beta_N')'$ (identical regressors, arbitrary variance).

Here and in the sequel, we denote by ⊗ the Kronecker product.
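For concreteness, one draw from model (6) in the stationary case $\alpha \in (-1,1)$ can be generated as follows (a minimal sketch; the function name and array layout are our own choices, not from the paper):

```python
import numpy as np

def simulate_dpd(alpha, beta, X, sigma, T, rng):
    """One draw from model (6): y_{i,t} = x_{i,t}'beta + yl_{i,t}, where
    yl_{i,t} = alpha*yl_{i,t-1} + u_{i,t}, u_{i,t} ~ N(0, sigma_i^2), with the
    stationary start yl_{i,0} ~ N(0, sigma_i^2/(1 - alpha^2)).  Requires |alpha| < 1.
    Shapes: X is (N, T+1, k), beta is (k,), sigma is (N,); returns y, shape (N, T+1)."""
    N = X.shape[0]
    yl = np.empty((N, T + 1))
    yl[:, 0] = rng.standard_normal(N) * sigma / np.sqrt(1.0 - alpha ** 2)
    for t in range(1, T + 1):
        yl[:, t] = alpha * yl[:, t - 1] + rng.standard_normal(N) * sigma
    return X @ beta + yl   # observed panel = regression part + latent AR(1) part
```

Setting `sigma` constant across $i$ gives Case I; a block-diagonal regressor design with individual-specific `sigma` gives Case II.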

4 Estimation by Least Squares

4.1 Case I: Arbitrary Regressors, Identical Variance

This section applies the method of quantile unbiased estimating equations to the least squares

estimation of the model with arbitrary regressors and equal individual variances, thus slightly

generalizing the procedure of Phillips and Sul (2003), and embedding it in our methodology.

From the Frisch–Waugh theorem, the least squares estimator $\hat\alpha_{LS}$ can be expressed as

$$\alpha : E_1(\alpha) \equiv \frac{Y_{-1}'MY}{Y_{-1}'MY_{-1}} - \alpha = 0, \tag{9}$$

where $M = I_{NT} - Z(Z'Z)^{-1}Z'$. Generalizing a result of Phillips and Sul (2003), it is shown in the appendix that $E_1$ has distribution free of nuisance parameters, as required by Proposition 1.

4.1.1 Quantile Unbiased Estimation

Computation of (5) requires a method of evaluating the distribution function of $E_1$. It is shown in the appendix that

$$\Pr\nolimits_\alpha\big(E_1(\alpha) \le E_1(\alpha,X)\big) - q = \Pr\nolimits_\alpha\left(\frac{U_0'AU_0}{U_0'BU_0} \le c\right) - q, \tag{10}$$

where the matrices $A = A(\alpha)$ and $B = B(\alpha)$ are as in (34), and $c = Y_{-1}'MY/(Y_{-1}'MY_{-1})$ is the observed OLS estimate.

While Andrews (1993) uses the Imhof (1961) algorithm to evaluate the requisite distribution function and Phillips and Sul (2003) resort to simulation, we replace these time-consuming processes by a saddlepoint approximation. More generally, and as will be required in the non-homoscedastic case, it is shown in the appendix that an approximation to the distribution function of the mean of $N$ i.i.d. ratios of quadratic forms in standard normal variates is given by

$$\Pr\left(\frac{1}{N}\sum_{i=1}^{N}\frac{U_i'A_1U_i}{U_i'A_2U_i} \le \bar r\right) \approx \Phi\left(\hat w_n + \frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}\right), \qquad \bar r \ne \mu,$$

where

$$\hat w_n = \sqrt{N\log|D|}\,\operatorname{sgn}(\bar r - \mu), \qquad \hat u_n = \hat s\sqrt{2N\operatorname{tr}K_3^2}\left(\frac{\big(2\hat s\operatorname{tr}K_2K_3 + \operatorname{tr}K_2\big)^2 - 4\hat s^2\operatorname{tr}^2\!K_2\operatorname{tr}K_3^2}{\operatorname{tr}^2\!K_2}\right)^{(N-1)/2},$$

$D = I - 2\hat sA_3$, $A_3 = A_1 - \bar rA_2$, $\mu = \operatorname{tr}A_1/\operatorname{tr}A_2$, $K_i = A_iD^{-1}$, $i \in \{2,3\}$, $\Phi$ denotes the standard normal cdf, and, with $\lambda_i$ denoting the eigenvalues of $A_3$, the saddlepoint $\hat s$ solves

$$\operatorname{tr}K_3 \equiv \sum_{i=1}^{T}\frac{\lambda_i}{1-2\hat s\lambda_i} = 0.$$

This generalizes the result for the N = 1 case as given in Lieberman (1994) and is of interest

in itself, as it is potentially applicable in numerous other modelling contexts. Also, unlike in

the N = 1 case, no exact methods exist for evaluating the distribution function, rendering the

saddlepoint approximation the only practical means of its computation.
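To make the ingredients concrete, the following sketch (ours) implements the $N = 1$ special case of Lieberman (1994), for which the exponent $(N-1)/2$ renders the correction factor equal to one, so that $\hat u = \hat s\sqrt{2\operatorname{tr}K_3^2}$. It assumes $A_1$, $A_2$ symmetric with $A_2$ positive definite (so $\operatorname{tr}A_2 > 0$ and the saddlepoint is unique); function name and interface are our own:

```python
import math
import numpy as np

def sp_cdf_ratio(A1, A2, r):
    """Lugannani-Rice saddlepoint approximation to Pr(U'A1U / U'A2U <= r),
    U ~ N(0, I), via the equivalent event Q = U'(A1 - r*A2)U <= 0.
    Assumes A1, A2 symmetric, A2 positive definite, and r != tr(A1)/tr(A2)."""
    lam = np.linalg.eigvalsh(A1 - r * A2)          # eigenvalues of A3 = A1 - r*A2
    mu = np.trace(A1) / np.trace(A2)
    # CGF of Q is K(s) = -0.5*sum(log(1 - 2*s*lam)); the saddlepoint solves
    # K'(s) = sum(lam/(1 - 2*s*lam)) = 0, with K' strictly increasing on the
    # interval where all 1 - 2*s*lam_i > 0.
    lo = 1.0 / (2.0 * lam.min()) + 1e-10
    hi = 1.0 / (2.0 * lam.max()) - 1e-10
    Kp = lambda s: float(np.sum(lam / (1.0 - 2.0 * s * lam)))
    for _ in range(200):                           # bisection for the saddlepoint
        mid = 0.5 * (lo + hi)
        if Kp(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    K = -0.5 * float(np.sum(np.log1p(-2.0 * s * lam)))
    Kpp = float(np.sum(2.0 * lam ** 2 / (1.0 - 2.0 * s * lam) ** 2))
    w = math.copysign(math.sqrt(-2.0 * K), r - mu)  # w carries the sign of r - mu
    u = s * math.sqrt(Kpp)                          # = s*sqrt(2 tr K3^2)
    z = w + math.log(u / w) / w
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

The full panel case only changes $\hat w$ and $\hat u$ as displayed above; the saddlepoint equation is identical.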

Calculation of $\hat\alpha_q$ thus only requires a univariate root search over $\alpha$ in (10). Efficient estimates and approximate confidence intervals for $\beta$ can be computed as usual from a GLS estimation using $\alpha = \hat\alpha_{0.5}$.

4.1.2 Mean Adjusted Estimation

We now turn to the construction of the mean adjusted estimator of $\alpha$ in model (6). It is given as the solution to

$$E_1(c) - \mathrm{E}_c\big[E_1(c)\big] = 0.$$

In order to evaluate

$$\mathrm{E}_c\big[E_1(c)\big] = \mathrm{E}_c\left[\frac{U_0'AU_0}{U_0'BU_0}\right] - c,$$

we make use of the expression for the mean of a ratio of quadratic forms in normal variates given in Sawa (1978). Let $P\Lambda P'$ be the spectral decomposition of $B$ and set $C = P'AP$. Then

$$\mathrm{E}\left[\frac{U_0'AU_0}{U_0'BU_0}\right] = \int_0^{\infty}\sum_{j=1}^{T+1}\frac{c_j}{(1+2\lambda_jt)^{3/2}\prod_{k\ne j}(1+2\lambda_kt)^{1/2}}\,\mathrm{d}t, \tag{11}$$

where $c_j$ and $\lambda_j$ denote the $j$th diagonal elements of $C$ and $\Lambda$, respectively. The integrand in (11) dies off quickly, so that the integral is straightforward to evaluate numerically.
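A direct numerical evaluation of (11) might look as follows (a sketch under the stated assumptions, i.e., $A$ symmetric and $B$ symmetric positive definite; the truncation point and grid size are arbitrary tuning choices of ours):

```python
import numpy as np

def sawa_mean_ratio(A, B, tmax=200.0, n=200001):
    """Evaluate Sawa's (1978) integral (11) for E[U'AU / U'BU], U ~ N(0, I),
    with A symmetric and B symmetric positive definite, by the trapezoidal
    rule on [0, tmax]."""
    lam, P = np.linalg.eigh(B)                   # spectral decomposition B = P diag(lam) P'
    c = np.diag(P.T @ A @ P)                     # c_j = j-th diagonal element of C = P'AP
    t = np.linspace(0.0, tmax, n)
    base = 1.0 + 2.0 * np.outer(t, lam)          # (n, dim) grid of 1 + 2*lam_j*t
    prod_all = np.prod(np.sqrt(base), axis=1)    # prod over all k of (1 + 2*lam_k*t)^(1/2)
    # sum_j c_j / [(1+2*lam_j*t)^(3/2) * prod_{k != j} (1+2*lam_k*t)^(1/2)]
    f = (c / base).sum(axis=1) / prod_all
    dt = t[1] - t[0]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))
```

Since the integrand decays like a power of $t$, the truncation error is easy to bound and a modest grid already gives several correct digits.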

This estimator is equivalent to that of Tanizaki (2000), who is, however, concerned with the pure time-series ($N = 1$) case and uses simulation to evaluate the mean function.

4.2 Case II: Identical Regressors, Arbitrary Variance

In this section we relax the assumption of homoscedasticity under some additional constraints. In particular, we drop the assumption that $\sigma^2_i = \sigma^2$, so that now, for $i = 1,\dots,N$, the errors of the individual series satisfy $u_{i,t} \stackrel{iid}{\sim} N(0,\sigma^2_i)$, $t = 1,\dots,T$. The regressor matrix is assumed to be block diagonal, so that

$$X_0 = I_N \otimes X_{1,0},$$


where $X_{1,0}$ represents the $(T+1)\times k_1$ individual regressor matrix and $k_1 := k/N$ is the number of parameters in $\beta$ corresponding to each individual. This assumption is necessary for the resulting estimators of the autoregressive coefficient to have distribution free of nuisance parameters. This requirement is not, however, overly restrictive: for example, the restricted model encompasses the standard fixed-effects model, which includes a dummy regressor for each individual.

Let $V_u = \operatorname{diag}\big(\sigma^2_1,\dots,\sigma^2_N\big) \otimes I_T$. The model in matrix form is then given by

$$Y_0 = X_0\beta + Y^{\ell}_0, \qquad Y^{\ell} = \alpha Y^{\ell}_{-1} + U, \qquad U \sim N(0,V_u), \tag{12}$$

with $\beta = (\beta_1',\dots,\beta_N')'$, or, equivalently,

$$Y = \alpha Y_{-1} + Z\gamma + U, \qquad U \sim N(0,V_u), \tag{13}$$

where now $Z = I_N \otimes Z_1$, $Z_1 = [X_1, X_{1,-1}]$, $X_1$ and $X_{1,-1}$ are defined analogously to $X_{1,0}$ by omitting the observations at times $t = 0$ and $t = T$, respectively, $\gamma = (\gamma_1',\dots,\gamma_N')'$, and $\gamma_j = (\beta_j',-\alpha\beta_j')'$. If $Z_1$ is singular, it should be replaced by a full-rank matrix spanning the same column space.

Applying the Frisch–Waugh theorem to (13) premultiplied by $V_u^{-1/2} = \operatorname{diag}\big(\sigma_1^{-1},\dots,\sigma_N^{-1}\big) \otimes I_T$, we arrive at the GLS estimator

$$\alpha : \frac{Y_{-1}'V_u^{-1/2}MV_u^{-1/2}Y}{Y_{-1}'V_u^{-1/2}MV_u^{-1/2}Y_{-1}} - \alpha = 0,$$

where

$$M = I_{NT} - V_u^{-1/2}Z\big(Z'V_u^{-1}Z\big)^{-1}Z'V_u^{-1/2} = I_{NT} - I_N \otimes Z_1\big(Z_1'Z_1\big)^{-1}Z_1' =: I_N \otimes M_1.$$

Due to the simple structure of the matrices $M$ and $V_u$, $\hat\alpha_{GLS}$ can be equivalently written

$$\alpha : \frac{\sum_{i=1}^{N}\hat\sigma_i^{-2}\,Y_{i,-1}'M_1Y_i}{\sum_{i=1}^{N}\hat\sigma_i^{-2}\,Y_{i,-1}'M_1Y_{i,-1}} - \alpha = 0.$$

Upon estimating the individual variances, for given $\alpha$, by

$$\hat\sigma^2_i = \frac{\operatorname{tr}R_1'M_1R_1}{T-k_1}\,Y_{i,-1}'M_1Y_{i,-1},$$

where $R_1 = R_1(\alpha)$ is as in (33), we obtain the feasible GLS estimator

$$\alpha : E_2(\alpha) \equiv \frac{1}{N}\sum_{i=1}^{N}\frac{Y_{i,-1}'M_1Y_i}{Y_{i,-1}'M_1Y_{i,-1}} - \alpha = 0. \tag{14}$$

As E2(α) is the average of the individual OLS estimating equations, its independence of β and

the σi follows from the corresponding property of the OLS estimating equation proven in the

appendix.
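As a sketch, $E_2(\alpha)$ in (14) can be evaluated directly from the panel of levels (our own helper, not from the paper; here $Z_1$ stands in for the common individual regressor matrix, e.g., a column of ones in the pure fixed-effects case):

```python
import numpy as np

def E2(alpha_, Y, Z1):
    """Evaluate the feasible GLS estimating equation (14),
    E2(alpha) = (1/N) sum_i Y'_{i,-1} M1 Y_i / (Y'_{i,-1} M1 Y_{i,-1}) - alpha,
    with M1 = I_T - Z1 (Z1'Z1)^{-1} Z1'.  Y holds the levels y_{i,0},...,y_{i,T}
    row-wise (shape (N, T+1)); Z1 is (T, m) with full column rank."""
    Yi = Y[:, 1:]                                   # y_{i,1},...,y_{i,T}
    Yim1 = Y[:, :-1]                                # y_{i,0},...,y_{i,T-1}
    T = Yi.shape[1]
    M1 = np.eye(T) - Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T)   # annihilator of Z1
    num = np.einsum('it,tu,iu->i', Yim1, M1, Yi)
    den = np.einsum('it,tu,iu->i', Yim1, M1, Yim1)
    return float(np.mean(num / den)) - alpha_
```

At the true $\alpha$, $E_2$ is not centered at zero — the familiar downward (Nickell) bias of least squares with fixed effects — which is exactly what the quantile unbiased correction below repairs.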


4.2.1 Quantile Unbiased Estimation

Following through the same steps that led to (34) shows that

$$\Pr\nolimits_\alpha\big(E_2(\alpha) \le E_2(\alpha,X)\big) = \Pr\nolimits_\alpha\left(\frac{1}{N}\sum_{i=1}^{N}\frac{U_{i,0}'A_1U_{i,0}}{U_{i,0}'B_1U_{i,0}} \le c\right),$$

where $A_1 = \tfrac{1}{2}\big[A_1^{*\prime} + A_1^{*}\big]$, $A_1^{*} = R_1D_{T-1}M_1D_TR_1$, $B_1 = R_1D_{T-1}M_1D_{T-1}R_1$, $U_{i,0} \stackrel{iid}{\sim} N\big(0,I_{T+1}\big)$, and $c = \frac{1}{N}\sum_{i=1}^{N}Y_{i,-1}'M_1Y_i\big/\big(Y_{i,-1}'M_1Y_{i,-1}\big)$ is the observed feasible GLS estimate. This can be evaluated by means of the saddlepoint approximation for the mean of $N$ i.i.d. ratios of quadratic forms as discussed in Section 4.1.1.

4.2.2 Mean Adjusted Estimation

The mean adjusted estimating equation is given by

$$E_2(\alpha) - \mathrm{E}_\alpha\big[E_2(\alpha)\big] = 0.$$

It is immediate that

$$\mathrm{E}_c\big[E_2(c)\big] = \mathrm{E}_c\left[\frac{U_0'A_1U_0}{U_0'B_1U_0}\right] - c,$$

which can be evaluated by (11) with the matrices $A_1$ and $B_1$ replacing $A$ and $B$, respectively.

5 Estimation by Maximum Likelihood

This section is concerned with the estimation of the model by maximum likelihood methods, in

both the homoscedastic and heteroscedastic settings. The general idea is to concentrate out all

nuisance parameters from the log likelihood, giving rise to the profile log likelihood and profile

score functions. The latter takes the form of a sum of ratios of quadratic forms, to which our

methodology can be applied.

5.1 Case I: Arbitrary Regressors, Identical Variance

The log likelihood of model (6), after dropping the constant, is given by

$$\ell(\alpha,\beta,\sigma^2) = -\frac{N(T+1)}{2}\log\sigma^2 - \frac{1}{2}\log|\Sigma| - \frac{1}{2\sigma^2}(Y_0 - X_0\beta)'\Sigma^{-1}(Y_0 - X_0\beta),$$

where $\Sigma = RR' = I_N \otimes R_1R_1'$ with $R$ and $R_1 = R_1(\alpha)$ as in (33). The score functions are

$$\dot\ell_\beta(\alpha,\beta,\sigma^2) = \frac{1}{\sigma^2}X_0'\Sigma^{-1}(Y_0 - X_0\beta),$$

$$\dot\ell_{\sigma^2}(\alpha,\beta,\sigma^2) = -\frac{N(T+1)}{2\sigma^2} + \frac{1}{2\sigma^4}(Y_0 - X_0\beta)'\Sigma^{-1}(Y_0 - X_0\beta),$$

and

$$\dot\ell_\alpha(\alpha,\beta,\sigma^2) = -\frac{1}{2}\operatorname{tr}\big(\Sigma^{-1}\dot\Sigma_\alpha\big) + \frac{1}{2\sigma^2}(Y_0 - X_0\beta)'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}(Y_0 - X_0\beta),$$


where $\dot\Sigma_\alpha$ denotes the elementwise derivative of $\Sigma$ with respect to $\alpha$, given by $\dot\Sigma_\alpha = I_N \otimes \big(R_1\dot R_1' + \dot R_1R_1'\big)$, where

$$\dot R_1 = \begin{bmatrix} b' & 0 & 0 & \cdots & 0 & 0 \\ b'\alpha + b & 0 & 0 & \cdots & 0 & 0 \\ b'\alpha^2 + 2b\alpha & 1 & 0 & \cdots & 0 & 0 \\ b'\alpha^3 + 3b\alpha^2 & 2\alpha & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ b'\alpha^T + Tb\alpha^{T-1} & (T-1)\alpha^{T-2} & (T-2)\alpha^{T-3} & \cdots & 1 & 0 \end{bmatrix},$$

$b' = \alpha\big(1-\alpha^2\big)^{-3/2}$ if $\alpha \in (-1,1)$ and zero if $\alpha = 1$. It is computationally advantageous to note that $\Sigma^{-1} = I_N \otimes R_1^{-T}R_1^{-1}$, and $R_1^{-1}$ is a matrix with top left element $b^{-1}$, ones on the rest of the main diagonal, and $-\alpha$ on the first subdiagonal.

Solving for $\beta$ and $\sigma^2$, we obtain

$$\beta = \big(X_0'\Sigma^{-1}X_0\big)^{-1}X_0'\Sigma^{-1}Y_0$$

and

$$\sigma^2 = \frac{1}{N(T+1)}(Y_0 - X_0\beta)'\Sigma^{-1}(Y_0 - X_0\beta),$$

which can be substituted back into $\dot\ell_\alpha$ to yield the profile score function

$$\dot\ell_\alpha(\alpha) = -\frac{1}{2}\operatorname{tr}\big(\Sigma^{-1}\dot\Sigma_\alpha\big) + \frac{N(T+1)}{2}\,\frac{Y_0'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0} = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{2}\,\frac{Y_0'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0}, \tag{15}$$

with idempotent matrix $M_\Sigma = I_{N(T+1)} - X_0\big(X_0'\Sigma^{-1}X_0\big)^{-1}X_0'\Sigma^{-1}$.

5.1.1 Quantile Unbiased Estimation

It is clear from scaling and the fact that $M_\Sigma X_0 = 0$ that $\dot\ell_\alpha(\alpha)$ has distribution free of $(\beta,\sigma^2)$, and if $\alpha$ is the true parameter, we can write

$$\frac{Y_0'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0} = \frac{\big(Y^{\ell}_0\big)'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y^{\ell}_0}{\big(Y^{\ell}_0\big)'M_\Sigma'\Sigma^{-1}M_\Sigma Y^{\ell}_0} = \frac{U_0'R'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma RU_0}{U_0'R'M_\Sigma'R^{-T}R^{-1}M_\Sigma RU_0} = \frac{U_0'M_RR^{-1}\dot\Sigma_\alpha R^{-T}M_RU_0}{U_0'M_RU_0},$$


where the last equality follows with

$$R^{-1}M_\Sigma R = R^{-1}\Big(I_{N(T+1)} - X_0\big(X_0'\Sigma^{-1}X_0\big)^{-1}X_0'\Sigma^{-1}\Big)R = R^{-1}\Big(I_{N(T+1)} - X_0\big(X_0'\Sigma^{-1}X_0\big)^{-1}X_0'R^{-T}R^{-1}\Big)R = I_{N(T+1)} - R^{-1}X_0\big(X_0'\Sigma^{-1}X_0\big)^{-1}X_0'R^{-T} =: M_R,$$

with the symmetric and idempotent matrix $M_R$ so defined. Therefore, the quantile unbiased estimating equation becomes

$$\alpha : \Pr\left(\frac{U_0'A(\alpha)U_0}{U_0'B(\alpha)U_0} \le c(\alpha)\right) - q = 0, \tag{16}$$

where $A(\alpha) = M_RR^{-1}\dot\Sigma_\alpha R^{-T}M_R$, $B(\alpha) = M_R$, and

$$c(\alpha) = \frac{Y_0'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0}.$$

This can be solved by means of our proposed saddlepoint approximation.

5.1.2 Mean Adjusted Estimation

A mean adjusted estimator can be constructed by subtracting from (15) its mean if the true parameter is $\alpha$, i.e.,

$$\mathrm{E}_\alpha\big(\dot\ell\big) = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{2}\,\mathrm{E}_\alpha\left[\frac{Y_0'M_\Sigma'\Sigma^{-1}\dot\Sigma_\alpha\Sigma^{-1}M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0}\right] = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{2}\,\mathrm{E}_\alpha\left[\frac{U_0'M_RR^{-1}\dot\Sigma_\alpha R^{-T}M_RU_0}{U_0'M_RU_0}\right] = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{2}\,\frac{\operatorname{tr}M_RR^{-1}\dot\Sigma_\alpha R^{-T}M_R}{\operatorname{tr}M_R}, \tag{17}$$

where the last equality follows because, due to its special form, the ratio of quadratic forms appearing in the profile score is independent of its own denominator (Pitman, 1937), whence the mean of the ratio equals the ratio of the means. This bears the advantage that, as opposed to its least squares based counterpart (11), there are no unsolved integrals in (17), allowing for massive savings in computational time. Simplifying,

$$\mathrm{E}_\alpha\big(\dot\ell\big) = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{N(T+1)-k}\operatorname{tr}\Big[\big(I_N \otimes R_1^{-1}\dot R_1\big)M_R\Big],$$

so that the mean-adjusted estimating equation can be written

$$\alpha : \frac{1}{N(T+1)-k}\operatorname{tr}\Big[\big(I_N \otimes R_1^{-1}\dot R_1\big)M_R\Big] = \frac{Y_0'M_\Sigma'\big(I_N \otimes R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)M_\Sigma Y_0}{Y_0'M_\Sigma'\Sigma^{-1}M_\Sigma Y_0}. \tag{18}$$
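The Pitman (1937) independence fact used above — a ratio $U'CU/U'MU$ with $M$ symmetric idempotent and $C = MCM$ is independent of its own denominator, so the mean of the ratio equals $\operatorname{tr}C/\operatorname{tr}M$ — can be checked numerically (our own illustration, with arbitrary made-up matrices):

```python
import numpy as np

rng = np.random.default_rng(7)
d, k = 8, 2
X = rng.standard_normal((d, k))
M = np.eye(d) - X @ np.linalg.solve(X.T @ X, X.T)   # symmetric idempotent, rank d - k
A = rng.standard_normal((d, d))
C = M @ (A + A.T) @ M                               # numerator matrix sandwiched by M
U = rng.standard_normal((200000, d))
num = np.einsum('nd,de,ne->n', U, C, U)             # U'CU for each draw
den = np.einsum('nd,de,ne->n', U, M, U)             # U'MU for each draw
ratio = num / den
mean_of_ratio = ratio.mean()
ratio_of_means = np.trace(C) / np.trace(M)          # = E[num] / E[den]
```

The argument is geometric: writing $V = MU$, the ratio depends only on the direction of $V$ within the range of $M$, while the denominator depends only on its length, and the two are independent by spherical symmetry.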


This is the adjusted profile likelihood estimator of McCullagh and Tibshirani (1990). However, those authors derive (18) only as an approximation, overlooking the fact that the mean of the ratio equals the ratio of the means. They therefore also show only approximately that the estimator, in this case, is the same as the marginal likelihood estimator of Wilson (1989). If there are no exogenous regressors, i.e., $M_R = I_{N(T+1)}$ and $k = 0$, then the so-defined estimator coincides with the MLE of $\alpha$.

5.2 Case II: Identical Regressors, Arbitrary Variance

For the model with unequal variances and identical regressors, the log likelihood becomes, after dropping the constant,

$$\ell\big(\alpha,\beta,\{\sigma^2_i\}\big) = -\frac{1}{2}\log\big|V_u \otimes R_1R_1'\big| - \frac{1}{2}(Y_0 - X_0\beta)'\big(V_u \otimes R_1R_1'\big)^{-1}(Y_0 - X_0\beta)$$

$$= -\frac{T+1}{2}\log|V_u| - \frac{N}{2}\log\big|R_1R_1'\big| - \frac{1}{2}(Y_0 - X_0\beta)'\Big(V_u^{-1} \otimes \big(R_1R_1'\big)^{-1}\Big)(Y_0 - X_0\beta)$$

$$= -\frac{T+1}{2}\sum_{i=1}^{N}\log\sigma^2_i - \frac{N}{2}\log\big|R_1R_1'\big| - \sum_{i=1}^{N}\frac{1}{2\sigma^2_i}(Y_{i,0} - X_{i,0}\beta_i)'\big(R_1R_1'\big)^{-1}(Y_{i,0} - X_{i,0}\beta_i),$$

where, in this section, $V_u = \operatorname{diag}\big(\sigma^2_1,\dots,\sigma^2_N\big)$.

The score functions are

$$\dot\ell_{\beta_i}\big(\alpha,\beta_i,\sigma^2_i\big) = \frac{1}{\sigma^2_i}X_{i,0}'\big(R_1R_1'\big)^{-1}(Y_{i,0} - X_{i,0}\beta_i),$$

$$\dot\ell_{\sigma^2_i}\big(\alpha,\beta,\sigma^2_i\big) = -\frac{T+1}{2\sigma^2_i} + \frac{1}{2\sigma^4_i}(Y_{i,0} - X_{i,0}\beta_i)'\big(R_1R_1'\big)^{-1}(Y_{i,0} - X_{i,0}\beta_i),$$

and

$$\dot\ell_\alpha\big(\alpha,\beta,\{\sigma^2_i\}\big) = -N\operatorname{tr}\big(\dot R_1R_1^{-1}\big) + \sum_{i=1}^{N}\frac{1}{\sigma^2_i}(Y_{i,0} - X_{i,0}\beta_i)'\big(R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)(Y_{i,0} - X_{i,0}\beta_i), \tag{19}$$

from which

$$\beta_i = \Big(X_{i,0}'\big(R_1R_1'\big)^{-1}X_{i,0}\Big)^{-1}X_{i,0}'\big(R_1R_1'\big)^{-1}Y_{i,0}$$

and

$$\sigma^2_i = \frac{1}{T+1}(Y_{i,0} - X_{i,0}\beta_i)'\big(R_1R_1'\big)^{-1}(Y_{i,0} - X_{i,0}\beta_i).$$

Substituting back into (19),

$$\dot\ell(\alpha) = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + (T+1)\sum_{i=1}^{N}\frac{Y_{i,0}'M_{\Sigma_1}'\big(R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)M_{\Sigma_1}Y_{i,0}}{Y_{i,0}'M_{\Sigma_1}'\big(R_1R_1'\big)^{-1}M_{\Sigma_1}Y_{i,0}}, \tag{20}$$

with idempotent matrix $M_{\Sigma_1} = I_{T+1} - X_{i,0}\big(X_{i,0}'R_1^{-T}R_1^{-1}X_{i,0}\big)^{-1}X_{i,0}'R_1^{-T}R_1^{-1}$.


5.2.1 Quantile Unbiased Estimation

As before, from scaling and the fact that $M_{\Sigma_1}X_{i,0} = 0$, $\dot\ell_\alpha(\alpha)$ has distribution free of $\big(\beta,\{\sigma^2_i\}\big)$. Following through the same steps that led to (16) now yields

$$\frac{Y_{i,0}'M_{\Sigma_1}'\big(R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)M_{\Sigma_1}Y_{i,0}}{Y_{i,0}'M_{\Sigma_1}'\big(R_1R_1'\big)^{-1}M_{\Sigma_1}Y_{i,0}} = \frac{U_{i,0}'M_{R_1}\big(R_1^{-1}\dot R_1\big)'M_{R_1}U_{i,0}}{U_{i,0}'M_{R_1}U_{i,0}},$$

with symmetric and idempotent matrix

$$M_{R_1} = I_{T+1} - R_1^{-1}X_{i,0}\big(X_{i,0}'R_1^{-T}R_1^{-1}X_{i,0}\big)^{-1}X_{i,0}'R_1^{-T}.$$

Therefore, the quantile unbiased estimating equation becomes

$$\alpha : E_4(\alpha) \equiv \Pr\left(\frac{1}{N}\sum_{i=1}^{N}\frac{U_{i,0}'A(\alpha)U_{i,0}}{U_{i,0}'B(\alpha)U_{i,0}} \le \bar c(\alpha)\right) - q = 0,$$

where $A(\alpha) = M_{R_1}\big(R_1^{-1}\dot R_1\big)'M_{R_1}$, $B(\alpha) = M_{R_1}$, and

$$\bar c(\alpha) = \frac{1}{N}\sum_{i=1}^{N}\frac{Y_{i,0}'M_{\Sigma_1}'\big(R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)M_{\Sigma_1}Y_{i,0}}{Y_{i,0}'M_{\Sigma_1}'\big(R_1R_1'\big)^{-1}M_{\Sigma_1}Y_{i,0}}.$$

5.2.2 Mean Adjusted Estimation

If the true parameter is $\alpha$, then, similar to Section 5.1.2, the mean of $\dot\ell(\alpha)$ in (20) is

$$\mathrm{E}_\alpha\big(\dot\ell\big) = -N\operatorname{tr}\big(R_1^{-1}\dot R_1\big) + \frac{N(T+1)}{T+1-k_1}\operatorname{tr}\Big[\big(R_1^{-1}\dot R_1\big)'M_{R_1}\Big],$$

so that the mean-adjusted estimating equation can be written

$$\alpha : \frac{1}{T+1-k_1}\operatorname{tr}\Big[\big(R_1^{-1}\dot R_1\big)'M_{R_1}\Big] = \frac{1}{N}\sum_{i=1}^{N}\frac{Y_{i,0}'M_{\Sigma_1}'\big(R_1^{-T}R_1^{-1}\dot R_1R_1^{-1}\big)M_{\Sigma_1}Y_{i,0}}{Y_{i,0}'M_{\Sigma_1}'\big(R_1R_1'\big)^{-1}M_{\Sigma_1}Y_{i,0}}.$$

This is a panel version of the adjusted profile likelihood estimator of McCullagh and Tibshirani

(1990), and also the marginal likelihood estimator of Wilson (1989). As before, if there are no

exogenous regressors, then the estimator coincides with the MLE of α.

6 Numerical Results

6.1 Point Estimation

In order to exemplify the virtues of our proposed point estimators, a simulation study was con-

ducted with 1,000 samples from a model with individual dummies and time trends, and standard

normal innovations. The sample sizes used included all possible combinations of N ∈ {10,25,50}

and T +1 ∈ {10,25,50}. For all sample sizes under investigation, the relative performance of the


estimators was extremely similar; as such, we only report the results for N = 10 and T + 1 = 50,

a somewhat typical setup when working with panel data. The results for other configurations are

available from the authors upon request.

In Figure 1, solid lines represent the MLE and its median unbiased counterpart, whereas

dashed lines refer to the OLS–based estimators. In an attempt to unclutter the graphs, the

results for the mean adjusted estimators are not shown: their performance generally was very

close to that of the median unbiased MLE-based estimator.

Regarding the plain ML and OLS estimators, the top row of Figure 1 holds little surprise: they

are asymptotically (in T) equivalent, and asymptotically (in N) biased. This is to be contrasted

with the median-unbiased point estimators, both of which exhibit a small mean bias only near

α = 1, and are virtually median unbiased. This naturally has dramatic consequences for the root

mean squared error (RMSE), shown in the rightmost column. In terms of this latter measure,

the ML-based median unbiased estimator has a slight edge over the OLS–based one, while both

improve on the RMSE of the standard estimators by a factor of up to 6.

The bottom row of the same figure depicts the results for the model with unequal individual

variances. As the theory suggests, the MLE-based median-unbiased estimator maintains its out-

standing performance over the ML and OLS estimators in terms of both bias and RMSE. The

OLS–based median-”unbiased” estimator shows a somewhat different picture: while still being

far ahead of the plain MLE and OLSE, it is no longer (approximately exactly) median unbi-

ased, especially for values of α close to the stationarity border; this must be attributed to the

loss of accuracy of the associated saddlepoint approximation, as explained in the appendix. The

(not shown) OLS–based mean adjusted estimator, on the other hand, does not suffer from this

deficiency, and still performs close to the MLE–based variant.

As a final remark, a comparison of the top and bottom rows of Figure 1 shows that the MLE–

based median unbiased estimator suffers almost no loss in accuracy as the assumption of equal

individual variances is dropped.

6.2 Interval Estimation

This section investigates the quality of the ML and OLS–based quantile unbiased interval esti-

mates. For all combinations of N ∈ {10,25,50} and T + 1 ∈ {10,25,50}, and using the same

samples as in Section 6.1, 1,000 90% equal-tail intervals for α were computed based on (5) with

q = (1±0.9)/2. Figure 2 contains the results for the homoscedastic case, with the average length

of the intervals being shown on the left scale, and the empirical coverage on the right scale. The

dotted lines represent 5% critical values of a two-sided binomial test of the null hypothesis that

the true coverage of each interval equals its nominal value, i.e., 90%. Similar to the results for

the point estimators in the homoscedastic case, the results for the OLS (dashes) and MLE (solid)

based intervals do not differ much, with empirical coverages lying well within the critical values

[Figure 1 near here. Columns: Mean Bias, Median Bias, RMSE; rows: Equal Variances, Unequal Variances.]

Figure 1: Bias and Root Mean Squared Error of MLE (solid) and OLS (dashes) estimators and their median unbiased counterparts, for a model with $N = 50$, $T + 1 = 10$, $x_{it} = [1\ t]$.

for all sample sizes shown.

The differences between estimators are substantially more pronounced for the heteroscedas-

tic case. Figure 3 contains the results. While for the MLE–based confidence intervals, the null

hypothesis that the true coverage equals the nominal value is again well supported, this is not

true for the OLS–based intervals, even though the latter tend to be considerably longer on aver-

age. As before, this must be attributed to the decreased accuracy of the associated saddlepoint

approximation. In particular, the accuracy of the approximation deteriorates as i) N increases,

ii) T decreases, and iii) α approaches the stationarity border. This is exactly reflected in the

performance of the OLS–based interval estimates.

The comments made in the previous section regarding the relative performance of the MLE–

based estimators in the homoscedastic and heteroscedastic cases remain valid in the context of

interval estimation. Considering that in many, if not most, empirical applications, the assumption

of equal individual variances is questionable at best, this behavior is fortunate, as it will allow

researchers to abandon this restriction altogether.

[Figure 2 near here. 3×3 grid of panels (Equal Individual Variances): rows N ∈ {10, 25, 50}, columns T+1 ∈ {10, 25, 50}.]

Figure 2: Mean length (left scale) and empirical coverage (right scale) of OLS–based (dashes) and MLE–based (solid) δ = 90% equal-tails confidence intervals. The dotted lines represent 5% critical values of a two-sided test of H0 : δ = 0.9.

[Figure 3 near here. 3×3 grid of panels (Unequal Individual Variances): rows N ∈ {10, 25, 50}, columns T+1 ∈ {10, 25, 50}.]

Figure 3: Mean length (left scale) and empirical coverage (right scale) of OLS–based (dashes) and MLE–based (solid) τ = 90% equal-tails confidence intervals. The dotted lines represent 5% critical values of a two-sided test of H0 : τ = 0.9.


7 Conclusions

This paper developed a general method for median unbiased estimation and the construction

of exact confidence intervals. The two cornerstones of our approach are the notion of a quantile

unbiased estimating equation, and a saddlepoint approximation to the distribution function of the

original estimating equation. The method was demonstrated to be capable of extremely accurate

inference in the first–order dynamic panel model, without having to rely on any asymptotic

arguments. As regards point estimation, not only does it successfully tackle the problem of

asymptotic bias, but it essentially removes the bias altogether, even for samples as small as N = 10, T + 1 = 10.

Furthermore, the method was shown to produce valid confidence intervals regardless of sample

size.

We have demonstrated how the estimator of Phillips and Sul (2003) can be embedded in our

methodology. In the case with equal individual variances, the results from their least–squares

based approach are very similar to our likelihood-based method. However, in the setting with

unequal individual variances, this is no longer the case, at least if the time–consuming simulations

are to be replaced with our saddlepoint approximation. We consider it an important feature of

our likelihood–based method that the properties of both the point and interval estimates are

virtually unaltered as the assumption of equal individual variances is dropped, as it allows us

to recommend that this restriction be abandoned wherever it is maintained for the sole sake of

feasibility.

The methods developed herein are not restricted to the first–order DPD model. The concept

of quantile unbiased estimating equations is quite general, and especially useful in normal models

with a covariance matrix depending on one parameter. The saddlepoint approximation to the

mean of ratios of quadratic forms has other applications as well, certainly in panel model contexts.

We leave those for future research.


Appendices

A A Saddlepoint Approximation for the Mean of i.i.d. Ratios of Quadratic Forms

This section develops saddlepoint approximations for the density and distribution function of the mean of N i.i.d. ratios of quadratic forms in standard normal variates. Saddlepoint approximations typically require that the cumulant generating function (cgf) KX(s) of the random variable X in question be available in a serviceable form. In such cases, the approximations extend straightforwardly to the mean of N such random variables, the reason being that the cgf of a sum of N i.i.d. copies of X is simply NKX(s). The random variables to be dealt with here do not, however, permit a closed form for their cgf, and thus require custom treatment.

A saddlepoint approximation for the density of

$$R \equiv \frac{X'A_1X}{X'A_2X}, \qquad A_2 \geq 0, \qquad X \sim N(0, \sigma^2 I_T), \tag{21}$$

A1, A2 symmetric, is given by (Lieberman, 1994)

$$\hat f_1(r) = \phi(\hat w)\,\frac{\operatorname{tr} K_2}{\sqrt{2\operatorname{tr} K_3^2}}, \tag{22}$$

where ŵ = √(log|D|) sgn(r − µ), D = I − 2ŝA3, A3 = A1 − rA2, µ = trA1/trA2, Ki = AiD⁻¹, i ∈ {2, 3}, φ denotes the standard normal pdf, and, with λi denoting the eigenvalues of A3, the saddlepoint ŝ solves

$$\operatorname{tr} K_3 = \sum_{i=1}^{T}\frac{\lambda_i}{1 - 2\hat s\lambda_i} = 0. \tag{23}$$

Interest centers on the distribution of the mean R̄ of N identical and independent copies of R. For ease of exposition, we first consider the random variable

$$S = \sum_{i=1}^{N} R_i,$$

where each of the Ri is identically and independently distributed as (21).
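As a concrete illustration, the component density (22) can be evaluated numerically along the following lines. This is only a sketch under stated assumptions: NumPy/SciPy are used, the helper name `lieberman_density` is ours, and bracketing the saddlepoint between the poles of (23) is an implementation choice not discussed in the text.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def lieberman_density(r, A1, A2):
    """Saddlepoint density (22) of R = X'A1X / X'A2X, X ~ N(0, I_T).
    Assumes r lies strictly inside the support of R, so that
    A3 = A1 - r*A2 has eigenvalues of both signs (hypothetical helper)."""
    A3 = A1 - r * A2
    lam = np.linalg.eigvalsh((A3 + A3.T) / 2.0)     # eigenvalues of A3
    assert lam.min() < 0 < lam.max()
    # saddlepoint equation (23): sum_i lam_i / (1 - 2 s lam_i) = 0,
    # bracketed between the poles at 1/(2 lam_min) and 1/(2 lam_max)
    g = lambda s: np.sum(lam / (1.0 - 2.0 * s * lam))
    eps = 1e-10
    s_hat = brentq(g, 1.0 / (2.0 * lam.min()) + eps,
                      1.0 / (2.0 * lam.max()) - eps)
    Dinv = np.linalg.inv(np.eye(len(lam)) - 2.0 * s_hat * A3)
    K2, K3 = A2 @ Dinv, A3 @ Dinv
    mu = np.trace(A1) / np.trace(A2)
    # log|D| via the eigenvalues; clipped at 0 to guard against rounding
    logdetD = max(np.sum(np.log(1.0 - 2.0 * s_hat * lam)), 0.0)
    w_hat = np.sign(r - mu) * np.sqrt(logdetD)
    return norm.pdf(w_hat) * np.trace(K2) / np.sqrt(2.0 * np.trace(K3 @ K3))
```

The left-hand side of (23) is increasing in ŝ, so the root between the two poles is unique whenever A3 has eigenvalues of both signs.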

Tierney et al. (1989a) show that for an N-dimensional random vector Y having density proportional to

$$f(y) = b(y)\exp(-H(y)),$$

a saddlepoint approximation to the marginal density of a (sufficiently smooth) scalar function x = g(y) is given by

$$\hat f(x) = \frac{1}{\sqrt{2\pi}}\left[\frac{|H''(\hat y)|}{|H''(\hat y_x)|\,\nabla g(\hat y_x)'H''(\hat y_x)^{-1}\nabla g(\hat y_x)}\right]^{1/2}\frac{f(\hat y_x)}{f(\hat y)}, \tag{24}$$


where H′′ denotes the Hessian of H, ŷ is a local minimizer of H, and ŷx minimizes H subject to g(y) = x. In the i.i.d. case, the joint density takes the form

$$f(y) = \prod_i f_1(y_i) = \left[\prod_i b_1(y_i)\right]\exp\left(-\sum_i h(y_i)\right),$$

and the Hessian becomes

$$H''(y) = \operatorname{diag}\big(h''(y_1), \ldots, h''(y_N)\big).$$

If g(y) = Σi yi, then ∇g(y) = (1, …, 1)′, and, under certain regularity conditions, ŷx = (x/N, …, x/N). Furthermore, ŷ = (ŷ1, …, ŷ1), and we can write

$$\hat f(x) = \frac{1}{\sqrt{2\pi}}\left[\frac{h''(\hat y_1)^N}{h''(x/N)^{N-1}N}\right]^{1/2}\frac{f_1(x/N)^N}{f_1(\hat y_1)^N}.$$

Taking f1(yi) ≡ f̂1(yi), i.e., the saddlepoint density given in (22), we have h(yi) = ½ log|D|. Note that the matrix D, as well as A3, Ki, and the quantities ŝ and ŵ, depend on the argument of f̂1; however, in a slight abuse of notation, we shall not make this explicit. We then have

$$h'(y_i) = \tfrac12\operatorname{tr} D^{-1}\Big({-2}\frac{\partial\hat s}{\partial y_i}A_3 + 2\hat s A_2\Big) = \hat s\operatorname{tr} K_2 \tag{25}$$

and

$$h''(y_i) = \frac{\partial\hat s}{\partial y_i}\operatorname{tr} K_2 - \hat s\operatorname{tr} A_2D^{-1}\Big({-2}\frac{\partial\hat s}{\partial y_i}A_3 + 2\hat s A_2\Big)D^{-1} = \frac{\partial\hat s}{\partial y_i}\operatorname{tr} K_2 + 2\hat s\frac{\partial\hat s}{\partial y_i}\operatorname{tr} K_2K_3 - 2\hat s^2\operatorname{tr} K_2^2.$$

Differentiating (23),

$$\frac{\partial\hat s}{\partial y_i} = \frac{2\hat s\operatorname{tr} K_2K_3 + \operatorname{tr} K_2}{2\operatorname{tr} K_3^2}, \tag{26}$$

and, plugging in,

$$h''(y_i) = 2\Big(\frac{\partial\hat s}{\partial y_i}\Big)^2\operatorname{tr} K_3^2 - 2\hat s^2\operatorname{tr} K_2^2.$$

For the case at hand, ŷ1 = µ, where ŝ = 0, ŵ = 0, K2 = A2, and K3 = A1 − µA2, so that

$$\hat f_1(\hat y_1) = \frac{1}{\sqrt{2\pi}}\,\frac{\operatorname{tr} A_2}{\sqrt{2\operatorname{tr}\big[(A_1-\mu A_2)^2\big]}}.$$

Also,

$$h''(\hat y_1) = \big(\operatorname{tr} A_2\big)^2\Big/\Big(2\operatorname{tr}\big[(A_1-\mu A_2)^2\big]\Big),$$

and we obtain

$$\hat f(s) = \left[\frac{(2\pi)^{N-1}}{h''(s/N)^{N-1}N}\right]^{1/2}\hat f_1(s/N)^N$$


for the sum of ratios. Similarly, for the mean R̄ = S/N, a univariate transformation shows that

$$\hat f(\bar r) = \left[\frac{N(2\pi)^{N-1}}{h''(\bar r)^{N-1}}\right]^{1/2}\hat f_1(\bar r)^N = \sqrt{\frac{N}{2\pi}}\,\frac{\operatorname{tr} K_2}{\sqrt{2\operatorname{tr} K_3^2}}\left[\frac{\operatorname{tr}^2K_2}{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2 - 4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}\right]^{(N-1)/2}\exp\Big({-\frac{N}{2}\hat w^2}\Big).$$

The approximate cdf can be written

$$\hat F(x) = \int_{-\infty}^{x}\sqrt{\frac{N}{2\pi}}\,\frac{\operatorname{tr} K_2}{\sqrt{2\operatorname{tr} K_3^2}}\left[\frac{\operatorname{tr}^2K_2}{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2-4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}\right]^{(N-1)/2}\exp\Big({-\frac{N}{2}\hat w^2}\Big)\,d\bar r = \sqrt{\frac{N}{2\pi}}\int_{-\infty}^{\hat w(x)}q(\hat w)\exp\Big({-\frac{N}{2}\hat w^2}\Big)\,d\hat w, \tag{27}$$

where

$$q(\hat w) = \frac{\operatorname{tr} K_2}{\sqrt{2\operatorname{tr} K_3^2}}\left[\frac{\operatorname{tr}^2K_2}{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2-4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}\right]^{(N-1)/2}\frac{d\bar r}{d\hat w}.$$

Temme (1982) shows that the value of an integral of the form (27) is approximately

$$\hat F(x) \approx q(0)\,\Phi\big(\sqrt{N}\,\hat w(x)\big) - \frac{q(\hat w(x)) - q(0)}{\sqrt{N}\,\hat w(x)}\,\phi\big(\sqrt{N}\,\hat w(x)\big),$$

where Φ is the standard normal cdf. It is easily seen that dŵ/dr̄ = ŝ trK2/ŵ, so that

$$q(\hat w(x)) = \frac{\hat w(x)}{\hat s\sqrt{2\operatorname{tr} K_3^2}}\left[\frac{\operatorname{tr}^2K_2}{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2-4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}\right]^{(N-1)/2},$$

and by applying l'Hôpital's rule twice to the quantity ŝ trK2/ŵ, we find q(0) = 1. By defining

$$\hat w_n = \sqrt{N\log|D|}\,\operatorname{sgn}(\bar r-\mu)$$

and

$$\hat u_n \equiv \frac{\hat w_n}{q(\hat w)} = \hat s\sqrt{2N\operatorname{tr} K_3^2}\left[\frac{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2-4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}{\operatorname{tr}^2K_2}\right]^{(N-1)/2},$$

and writing r̄ instead of x in the final result, the approximation can be expressed in the more familiar form

$$\hat F(\bar r) = \Phi(\hat w_n) + \phi(\hat w_n)\left(\frac{1}{\hat w_n} - \frac{1}{\hat u_n}\right).$$

However, for the case at hand, an alternative form of the approximate cdf due to Barndorff-Nielsen (1986, 1990) turns out to be more reliable; it is given by

$$\hat F^*(\bar r) = \Phi(w_n^*), \qquad w_n^* \equiv \hat w_n + \frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}, \tag{28}$$


and is the approximation used throughout the paper. Note that, at r̄ = µ, ûn = ŵn = 0, so that (28) is not meaningful, and w∗n should be replaced by its limit, which is given by

$$\lim_{\bar r\to\mu}w_n^* = \sqrt{\frac{2}{N\operatorname{tr} A_{3,\mu}^2}}\left(\frac{(N-1)\operatorname{tr} A_2A_{3,\mu}}{\operatorname{tr} A_2} + \frac{\operatorname{tr} A_{3,\mu}^3}{3\operatorname{tr} A_{3,\mu}^2}\right). \tag{29}$$

This can be shown as follows. Since ŵn/ûn = q(ŵ) and q(ŵ(µ)) = q(0) = 1,

$$\lim_{\bar r\to\mu}\left(\frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}\right) = \lim_{\bar r\to\mu}\left(-\frac{\log q(\hat w_n)}{\hat w_n}\right) = \frac{0}{0}.$$

Using l'Hôpital's rule,

$$\lim_{\bar r\to\mu}\left(\frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}\right) = \lim_{\bar r\to\mu}\frac{\hat u_n'/\hat u_n - \hat w_n'/\hat w_n}{\hat w_n'} = \lim_{\bar r\to\mu}\frac{\hat u_n'\hat w_n - \hat u_n\hat w_n'}{\hat u_n\hat w_n\hat w_n'} = \frac{0}{0}$$

$$\overset{\text{l'H}}{=} \lim_{\bar r\to\mu}\frac{\hat u_n''\hat w_n - \hat u_n\hat w_n''}{\hat u_n'\hat w_n\hat w_n' + \hat u_n(\hat w_n')^2 + \hat u_n\hat w_n\hat w_n''} = \frac{0}{0}$$

$$\overset{\text{l'H}}{=} \lim_{\bar r\to\mu}\frac{\hat u_n'''\hat w_n + \hat u_n''\hat w_n' - \hat u_n'\hat w_n'' - \hat u_n\hat w_n'''}{\hat u_n''\hat w_n\hat w_n' + 2\hat u_n'(\hat w_n')^2 + 2\hat u_n'\hat w_n\hat w_n'' + 3\hat u_n\hat w_n'\hat w_n'' + \hat u_n\hat w_n\hat w_n'''},$$

or, as 1 = lim r̄→µ ûn/ŵn = lim r̄→µ ûn′/ŵn′ (again by l'Hôpital),

$$\lim_{\bar r\to\mu}\left(\frac{1}{\hat w_n}\log\frac{\hat u_n}{\hat w_n}\right) = \lim_{\bar r\to\mu}\frac{\hat u_n'' - \hat w_n''}{2(\hat u_n')^2},$$

evaluating which requires expressions for lim r̄→µ ûn′, lim r̄→µ ûn′′, and lim r̄→µ ŵn′′. Straightforwardly,

$$\hat u_n' = \hat s'(*) + \hat s(*)',$$

where

$$(*) = \sqrt{2N\operatorname{tr} K_3^2}\left[\frac{(2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2)^2 - 4\hat s^2\operatorname{tr} K_2^2\operatorname{tr} K_3^2}{\operatorname{tr}^2K_2}\right]^{(N-1)/2}.$$

It is easy to see that lim r̄→µ K3 = A3,µ, where A3,µ = A1 − µA2. Thus,

$$\lim_{\bar r\to\mu}\hat u_n' = \lim_{\bar r\to\mu}(\hat s')\sqrt{2N\operatorname{tr} A_{3,\mu}^2}.$$

From (26) and using lim r̄→µ K2 = A2,

$$\lim_{\bar r\to\mu}\hat s' = \lim_{\bar r\to\mu}\frac{2\hat s\operatorname{tr} K_2K_3+\operatorname{tr} K_2}{2\operatorname{tr} K_3^2} = \frac{\operatorname{tr} A_2}{2\operatorname{tr} A_{3,\mu}^2},$$

so that

$$\lim_{\bar r\to\mu}\hat u_n' = \operatorname{tr} A_2\sqrt{\frac{N}{2\operatorname{tr} A_{3,\mu}^2}}.$$


It easily follows from (25) that

$$\hat w_n' = \frac{N\hat s\operatorname{tr} K_2}{\hat w_n},$$

and thus

$$\hat w_n'' = \frac{N\hat s'\operatorname{tr} K_2\,\hat w_n + N\hat s\operatorname{tr}\dot K_2\,\hat w_n - N\hat s\operatorname{tr} K_2\,\hat w_n'}{\hat w_n^2},$$

where a matrix with a dot denotes the elementwise derivative. In the limit,

$$\lim_{\bar r\to\mu}\hat w_n'' = \lim_{\bar r\to\mu}\frac{N\hat s'\operatorname{tr} K_2\,\hat w_n + N\hat s\operatorname{tr}\dot K_2\,\hat w_n - N\hat s\operatorname{tr} K_2\,\hat w_n'}{\hat w_n^2} = \lim_{\bar r\to\mu}\frac{N\hat s'\operatorname{tr} K_2 + N\hat s\operatorname{tr}\dot K_2 - (\hat w_n')^2}{\hat w_n} = \frac{0}{0}$$

$$\overset{\text{l'H}}{=} \lim_{\bar r\to\mu}\frac{N\hat s''\operatorname{tr} K_2 + 2N\hat s'\operatorname{tr}\dot K_2 + N\hat s\operatorname{tr}\ddot K_2 - 2\hat w_n'\hat w_n''}{\hat w_n'} = \lim_{\bar r\to\mu}\frac{N\hat s''\operatorname{tr} K_2 + 2N\hat s'\operatorname{tr}\dot K_2}{\hat w_n'} - 2\lim_{\bar r\to\mu}\hat w_n''$$

$$\Rightarrow\quad \lim_{\bar r\to\mu}\hat w_n'' = \lim_{\bar r\to\mu}\frac{N\hat s''\operatorname{tr} K_2 + 2N\hat s'\operatorname{tr}\dot K_2}{3\hat w_n'},$$

where K̇2 = 2ŝ′K2K3 − 2ŝK2², and therefore lim r̄→µ K̇2 = (trA2/trA3,µ²) A2A3,µ. Differentiating (26),

$$\hat s'' = \frac{4\hat s'\operatorname{tr} K_2K_3\operatorname{tr} K_3^2 + 4\hat s\big(\operatorname{tr}\dot K_2K_3 + \operatorname{tr} K_2\dot K_3\big)\operatorname{tr} K_3^2 + 2\operatorname{tr}\dot K_2\operatorname{tr} K_3^2 - 8\hat s\operatorname{tr} K_2K_3\operatorname{tr} K_3\dot K_3 - 4\operatorname{tr} K_2\operatorname{tr} K_3\dot K_3}{4(\operatorname{tr} K_3^2)^2},$$

and, using K3K̇3 = −K3K2 + 2ŝ′K3³ − 2ŝK3²K2,

$$\hat s'' = \frac{4\hat s'\operatorname{tr} K_2K_3\operatorname{tr} K_3^2 + 4\hat s\big(\operatorname{tr}\dot K_2K_3 + \operatorname{tr} K_2\dot K_3\big)\operatorname{tr} K_3^2 + 2\operatorname{tr}\dot K_2\operatorname{tr} K_3^2 - 8\hat s\operatorname{tr} K_2K_3\operatorname{tr} K_3\dot K_3 + 4\operatorname{tr} K_2\operatorname{tr} K_3K_2 - 8\hat s'\operatorname{tr} K_2\operatorname{tr} K_3^3 + 8\hat s\operatorname{tr} K_2\operatorname{tr} K_3^2K_2}{4(\operatorname{tr} K_3^2)^2}.$$

In the limit, using the limiting expressions for K2, K3, and ŝ′,

$$\lim_{\bar r\to\mu}\hat s'' = \frac{2\lim_{\bar r\to\mu}(\hat s')\operatorname{tr} A_2A_{3,\mu}\operatorname{tr} A_{3,\mu}^2 + \operatorname{tr} A_2\operatorname{tr} A_{3,\mu}A_2 - 2\lim_{\bar r\to\mu}(\hat s')\operatorname{tr} A_2\operatorname{tr} A_{3,\mu}^3}{(\operatorname{tr} A_{3,\mu}^2)^2} = \frac{2\operatorname{tr} A_2\operatorname{tr} A_{3,\mu}A_2}{(\operatorname{tr} A_{3,\mu}^2)^2} - \frac{(\operatorname{tr} A_2)^2\operatorname{tr} A_{3,\mu}^3}{(\operatorname{tr} A_{3,\mu}^2)^3}.$$

We now require an expression for lim r̄→µ ûn′′. It is immediate that

$$\hat u_n'' = \hat s''(*) + 2\hat s'(*)' + \hat s(*)''$$

$$\Rightarrow\quad \lim_{\bar r\to\mu}\hat u_n'' = \lim_{\bar r\to\mu}(\hat s'')\sqrt{2N\operatorname{tr} A_{3,\mu}^2} + \frac{\operatorname{tr} A_2}{\operatorname{tr} A_{3,\mu}^2}\lim_{\bar r\to\mu}(*)',$$


and straightforward but tedious calculations reveal that

$$\lim_{\bar r\to\mu}(*)' = \sqrt{2N}\lim_{\bar r\to\mu}\left[\frac{\operatorname{tr} K_3\dot K_3}{\sqrt{\operatorname{tr} K_3^2}} + (N-1)\sqrt{\operatorname{tr} K_3^2}\,\frac{\operatorname{tr}\dot K_2}{\operatorname{tr} K_2}\right] = \sqrt{\frac{2N}{\operatorname{tr} A_{3,\mu}^2}}\left(\frac{\operatorname{tr} A_2\operatorname{tr} A_{3,\mu}^3}{\operatorname{tr} A_{3,\mu}^2} + (N-2)\operatorname{tr} A_2A_{3,\mu}\right).$$

Plugging in and simplifying yields (29).
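Collecting the above, the approximation (28), with the limit (29) at r̄ = µ, can be sketched as follows. This again assumes NumPy/SciPy; the function name `cdf_mean_ratio` and the tolerance for switching to the limit branch are our own choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def cdf_mean_ratio(rbar, A1, A2, N):
    """Approximate cdf (28) of the mean of N i.i.d. ratios X'A1X / X'A2X,
    X ~ N(0, I_T); near rbar = mu the limit (29) is used instead.
    (Hypothetical helper; bracketing and tolerances are our choices.)"""
    mu = np.trace(A1) / np.trace(A2)
    A3 = A1 - rbar * A2
    lam = np.linalg.eigvalsh((A3 + A3.T) / 2.0)
    if abs(rbar - mu) < 1e-9:                       # limit (29)
        trA3sq = np.trace(A3 @ A3)
        wn = np.sqrt(2.0 / (N * trA3sq)) * (
            (N - 1) * np.trace(A2 @ A3) / np.trace(A2)
            + np.trace(A3 @ A3 @ A3) / (3.0 * trA3sq))
        return norm.cdf(wn)
    g = lambda s: np.sum(lam / (1.0 - 2.0 * s * lam))   # eq. (23)
    eps = 1e-10
    s_hat = brentq(g, 1.0 / (2.0 * lam.min()) + eps,
                      1.0 / (2.0 * lam.max()) - eps)
    Dinv = np.linalg.inv(np.eye(len(lam)) - 2.0 * s_hat * A3)
    K2, K3 = A2 @ Dinv, A3 @ Dinv
    trK2, trK2K3 = np.trace(K2), np.trace(K2 @ K3)
    trK2sq, trK3sq = np.trace(K2 @ K2), np.trace(K3 @ K3)
    logdetD = max(np.sum(np.log(1.0 - 2.0 * s_hat * lam)), 0.0)
    wn = np.sign(rbar - mu) * np.sqrt(N * logdetD)      # w_n hat
    ratio = ((2*s_hat*trK2K3 + trK2)**2
             - 4*s_hat**2*trK2sq*trK3sq) / trK2**2
    un = s_hat * np.sqrt(2.0 * N * trK3sq) * ratio**((N - 1) / 2.0)
    return norm.cdf(wn + np.log(un / wn) / wn)          # eq. (28)
```

Since sgn(ûn) = sgn(ŝ) = sgn(r̄ − µ) = sgn(ŵn), the argument of the logarithm is positive away from µ.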

An important special case occurs if

$$A_2A_2 = A_2 \qquad\text{and}\qquad A_1 = A_2GA_2. \tag{30}$$

This constellation appears not only in our MLE-based quantile unbiased estimation procedure, but also in the computation of, e.g., the null distribution of the Durbin-Watson statistic. In that case, we have that trK2 = rankA2, and ûn and lim r̄→µ w∗n simplify to

$$\hat u_n = \hat s\sqrt{2N\operatorname{tr} K_3^2}\left[\frac{2\hat s\operatorname{tr} K_2K_3 + \operatorname{rank} A_2}{\operatorname{rank} A_2}\right]^{(N-1)/2}$$

and

$$\lim_{\bar r\to\mu}w_n^* = \sqrt{\frac{2}{N}}\,\frac{\operatorname{tr} A_{3,\mu}^3}{3\big(\operatorname{tr} A_{3,\mu}^2\big)^{3/2}},$$

respectively. It is further computationally advantageous to note that the nonzero eigenvalues λi of A3 in (23) satisfy

$$\lambda_i = \omega_i - \bar r,$$

where the ωi are the eigenvalues of A1.

The aforementioned case is also the one where the approximation performs most favorably.

This can be attributed to the fact that (24) is based on a Laplace approximation to the marginalizing integral over the joint density, which assumes that the dominant contribution to the integral is from a neighborhood of the maximum of the integrand, and that the location of this maximum is dominated by the exponential term. Now, in our application, the maximum of the exponential occurs at µ = trA1/trA2, where ŵ = 0. If A1 and A2 satisfy the above conditions, then µ coincides with the mean of R̄, which, to order N⁻¹, maximizes the joint density.

The problems arising in other cases (lower accuracy, and a poorly normalized pdf) could potentially be remedied by incorporating the term trK2/√(2trK3²) into the exponent of the component saddlepoint density, i.e.,

$$\hat f_1(r) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac12\hat w^2 - \frac12\log\big(2\operatorname{tr} K_3^2\big) + \log\operatorname{tr} K_2\right), \tag{31}$$

and proceeding from there. This is what Tierney et al. (1989b) refer to as a fully exponential

Laplace approximation. We shall not pursue this here, as, besides leading to less compact formulae, it would entail having to numerically find the mode of (31), where for every evaluation


of (31), the saddlepoint equation (23) would have to be solved, thus eroding the time savings

otherwise associated with the use of the approximation.

In order to exemplify the virtues of our proposed approximation, we consider the mean of N Durbin-Watson statistics. This can be seen as a version of the test of Bhargava et al. (1982) for serial correlation in panels, adapted for the case of unequal individual variances. We need to evaluate

$$\Pr\left(\frac{1}{N}\sum_{i=1}^{N}\frac{U_i'R_1'M_1AM_1R_1U_i}{U_i'R_1'M_1R_1U_i} < \bar r\right),$$

where the (T + 1) × (T + 1) matrix A is given by

$$A = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \ddots & \\ & & \ddots & \ddots & -1 \\ & & & -1 & 1 \end{pmatrix},$$

where M1 = I − X(X′X)⁻¹X′, and X is a (T + 1) × k1 regressor matrix. Under the null of no serial correlation, R1 = I, and the statistic is precisely in the form (30). Under the alternative of a serial correlation of α, R1 is as in (33).

For a model with T +1 = 25 and a constant and a time trend as regressors, the simulated and

saddlepoint null distributions of the mean of N ∈ {1,10,50} independent copies of the Durbin-

Watson statistic depicted in the left panel of Figure 4 are graphically almost indistinguishable.

The right panel of the same figure shows the distribution under the alternative hypothesis that

α = 0.95, demonstrating the loss of accuracy incurred when the statistic at hand is not in the

form (30).
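To see concretely why the null case falls under (30), the sketch below (NumPy assumed; the helper `dw_matrices` is hypothetical) builds A and M1 for T + 1 = 25 with a constant and a time trend, and checks that A2 = M1 is idempotent and that tr A2A3,µ = 0:

```python
import numpy as np

def dw_matrices(Tp1, X):
    """Differencing matrix A and projection M1 for the panel Durbin-Watson
    statistic; under H0 (R1 = I) the ratio is in form (30) with A2 = M1
    (idempotent) and A1 = M1 A M1 = A2 G A2, where G = A."""
    A = 2 * np.eye(Tp1) - np.eye(Tp1, k=1) - np.eye(Tp1, k=-1)
    A[0, 0] = A[-1, -1] = 1.0                 # corner entries of A
    M1 = np.eye(Tp1) - X @ np.linalg.solve(X.T @ X, X.T)
    return A, M1

Tp1 = 25
t = np.arange(Tp1)
X = np.column_stack([np.ones(Tp1), t])        # constant and time trend
A, M1 = dw_matrices(Tp1, X)
A1, A2 = M1 @ A @ M1, M1
mu = np.trace(A1) / np.trace(A2)
print(np.allclose(A2 @ A2, A2))               # idempotency of A2
print(abs(np.trace(A2 @ (A1 - mu * A2))))     # ~0: tr A2 A3,mu vanishes
```

Because tr A2A3,µ = 0 here, the limit (29) collapses to its simplified special-case form.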

B Proofs

ˆαLS in (9) has a distribution free of (β′, σ).

Neglecting the common factor restrictions on the 2k parameters in γ, computation of (9) requires

the NT × 2k matrix Z to have full column rank, which may fail to hold, e.g., if the regressors

include N individual dummies and T time dummies. Thus, if r = rank(Z) < 2k, replace Z by

Z̃ = QW,

where QWV′ is the singular value decomposition of Z, i.e., Q and V are NT × r and 2k × r matrices, respectively, of full column rank r, and W is an r × r diagonal matrix of full rank, such that Z = QWV′ and Q′Q = V′V = Ir.¹

¹ Clearly, only r different parameters in γ are identified.
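A minimal sketch of this rank-reduction step (NumPy assumed; the rank-deficient Z below is a made-up example, not the paper's regressor matrix):

```python
import numpy as np

# Made-up rank-deficient regressor matrix Z (e.g., collinear dummies)
rng = np.random.default_rng(0)
B = rng.standard_normal((30, 3))
Z = np.column_stack([B, B[:, 0] + B[:, 1]])   # fourth column is redundant

Q, w, Vt = np.linalg.svd(Z, full_matrices=False)
r = int(np.sum(w > w.max() * 1e-10))          # numerical rank of Z
Q, w, Vt = Q[:, :r], w[:r], Vt[:r]            # thin SVD: Z = Q diag(w) Vt
Z_tilde = Q * w                               # NT x r, full column rank
M = np.eye(30) - Q @ Q.T                      # annihilator I_NT - QQ'
print(r)                                      # 3
```

Replacing Z by Z̃ leaves the column space, and hence the annihilator M, unchanged, which is all that the estimator requires.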


[Figure 4 about here: two panels of cdfs on [0, 4].]

Figure 4: Empirical (solid) and saddlepoint (dotted) cdf of the mean of N ∈ {1, 10, 50} Durbin-Watson statistics under H0: α = 0 (left panel) and H1: α = 0.95 (right panel). The empirical cdf was obtained by simulation with 10,000 replications.

Letting γ̃ = V′γ, rewrite (8) as

$$Y = \alpha Y_{-1} + \tilde Z\tilde\gamma + U, \tag{32}$$

and the OLS estimator for α can now be obtained as in (9) with M replaced by M = I_NT − Z̃(Z̃′Z̃)⁻¹Z̃′ = I_NT − QQ′.

To show that ˆαLS does not depend on the value of β, it suffices to show that the first–step residuals MY and MY₋₁ do not depend on β. Since MY = MXβ + MYℓ and MY₋₁ = MX₋₁β + MYℓ₋₁, this amounts to showing that MX = 0 and MX₋₁ = 0, because neither MYℓ nor MYℓ₋₁ depends on β. Partitioning V′ into the first k and the last k columns V′₁ and V′₂, respectively, we obtain Z = [X, X₋₁] = [QWV′₁, QWV′₂] and, thus, X = QWV′₁ and X₋₁ = QWV′₂. Now, MX = (I_NT − QQ′)QWV′₁ = (Q − Q)WV′₁ = 0 and MX₋₁ = (I_NT − QQ′)QWV′₂ = (Q − Q)WV′₂ = 0, which proves the invariance of ˆαLS with respect to β. The invariance of ˆαLS with respect to σ² (and to Yi0 if α = 1) can be proved as outlined by Andrews (1993). □

Representation of E1 as a ratio of quadratic forms.

W.l.o.g., we can assume β = 0 and σ² = 1 by virtue of the above invariance property. Defining the T × (T + 1) selection matrices D_T = [0 | I_T] and D_{T−1} = [I_T | 0], we have that MY = M[I_N ⊗ D_T]Yℓ₀ and MY₋₁ = M[I_N ⊗ D_{T−1}]Yℓ₀.


With U₀ ∼ N(0, I_{NT+N}), we have that Yℓ₀ = [I_N ⊗ R₁]U₀ =: RU₀, where

$$R_1 = R_1(\alpha) = \begin{pmatrix} b & 0 & 0 & \cdots & 0 \\ b\alpha & 1 & 0 & & \vdots \\ b\alpha^2 & \alpha & 1 & \ddots & \\ \vdots & \vdots & & \ddots & 0 \\ b\alpha^T & \alpha^{T-1} & \alpha^{T-2} & \cdots & 1 \end{pmatrix}, \tag{33}$$

b = (1 − α²)⁻¹ᐟ² if α ∈ (−1, 1) and zero if α = 1. Substituting this into (9) yields

$$E_1(\alpha) = \frac{(Y_0^\ell)'[I_N\otimes D_{T-1}]'M[I_N\otimes D_T]Y_0^\ell}{(Y_0^\ell)'[I_N\otimes D_{T-1}]'M[I_N\otimes D_{T-1}]Y_0^\ell} - \alpha = \frac{U_0'[I_N\otimes D_{T-1}R_1]'M[I_N\otimes D_TR_1]U_0}{U_0'[I_N\otimes D_{T-1}R_1]'M[I_N\otimes D_{T-1}R_1]U_0} - \alpha = \frac{U_0'A^*U_0}{U_0'BU_0} - \alpha,$$

where A∗ and B are so defined.

Letting A = ½[A∗′ + A∗], we can thus write

$$\Pr_\alpha\big(E_1(\alpha) \leq E_1(\alpha, X)\big) = \Pr_\alpha\left(\frac{U_0'AU_0}{U_0'BU_0} \leq \frac{Y_{-1}'MY}{Y_{-1}'MY_{-1}}\right). \tag{34}$$

□
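For completeness, (33) is straightforward to construct numerically; the sketch below (NumPy assumed, function name ours) can be checked against the stationary AR(1) covariance, since for |α| < 1 the matrix R₁ maps i.i.d. innovations to a path with Var(y_t) = 1/(1 − α²):

```python
import numpy as np

def R1(alpha, T):
    """Build the (T+1) x (T+1) matrix R1(alpha) of eq. (33), mapping
    standard normal innovations to the levels of an AR(1) path."""
    b = 0.0 if alpha == 1 else (1.0 - alpha**2) ** -0.5
    R = np.zeros((T + 1, T + 1))
    R[:, 0] = b * alpha ** np.arange(T + 1)        # first column: b, b*a, ...
    for j in range(1, T + 1):
        R[j:, j] = alpha ** np.arange(T + 1 - j)   # 1, a, a^2, ... below diag
    return R
```

A quick check: R1(a, T) @ R1(a, T).T has every diagonal entry equal to 1/(1 − a²) and first sub-diagonal entries a/(1 − a²), the stationary AR(1) autocovariances.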

References

Ahn, S. C. and P. Schmidt (1995): “Efficient Estimation of Models for Dynamic Panel Data,” Journal of Econometrics, 68, 5–27.

Anderson, T. W. and C. Hsiao (1981): “Estimation of Dynamic Models with Error Components,” Journal of the American Statistical Association, 76, 598–606.

Andrews, D. W. K. (1993): “Exactly Median-Unbiased Estimation of First Order Autoregressive / Unit Root Models,” Econometrica, 61, 139–165.

Arellano, M. (2003): Panel Data Econometrics, Oxford University Press.

Arellano, M. and S. Bond (1991): “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations,” Review of Economic Studies, 58, 277–297.

Barndorff-Nielsen, O. E. (1986): “Inference on Full and Partial Parameters Based on the Standardized Signed Log Likelihood Ratio,” Biometrika, 73, 307–322.

——— (1990): “Approximate Interval Probabilities,” Journal of the Royal Statistical Society B, 52, 485–496.

Bhargava, A., L. Franzini, and W. Narendranathan (1982): “Serial Correlation and the Fixed Effects Model,” The Review of Economic Studies, 49, 533–549.

Bun, M. J. G. and M. A. Carree (2005): “Bias-Corrected Estimation in Dynamic Panel Data Models,” Journal of Business and Economic Statistics, 23, 200–210.

Choi, C.-Y., N. C. Mark, and D. Sul (2004): “Bias Reduction by Recursive Mean Adjustment in Dynamic Panel Data Models,” Economics Working Paper Archive at WUSTL.

Hahn, J. and G. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when both n and T are large,” Econometrica, 70, 1639–1657.

Hsiao, C., M. H. Pesaran, and A. K. Tahmiscioglu (2002): “Maximum Likelihood Estimation of Fixed Effects Dynamic Panel Data Models Covering Short Time Periods,” Journal of Econometrics, 109, 107–150.

Imhof, J. P. (1961): “Computing the Distribution of Quadratic Forms in Normal Variables,” Biometrika, 48, 419–426.

Kiviet, J. F. (1995): “On Bias, Inconsistency, and Efficiency of Various Estimators in Dynamic Panel Data Models,” Journal of Econometrics, 68, 53–78.

Kruiniger, H. (2002): “On the Estimation of Panel Regression Models with Fixed Effects,” Working Paper, Queen Mary, University of London.

Lieberman, O. (1994): “Saddlepoint Approximation for the Distribution of a Ratio of Quadratic Forms in Normal Variables,” Journal of the American Statistical Association, 89, 924–928.

MacKinnon, J. G. and A. A. Smith, Jr. (1998): “Approximate Bias Correction in Econometrics,” Journal of Econometrics, 85, 205–230.

McCullagh, P. and R. J. Tibshirani (1990): “A Simple Method for the Adjustment of Profile Likelihoods,” Journal of the Royal Statistical Society Series B, 52, 325–344.

Moon, H. R. and P. C. B. Phillips (2004): “GMM Estimation of Autoregressive Roots Near Unity with Panel Data,” Econometrica, 72, 467–522.

Neyman, J. and E. Scott (1948): “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16, 1–32.

Nickell, S. (1981): “Biases in Dynamic Models with Fixed Effects,” Econometrica, 49, 1417–1426.

Phillips, P. C. B. and D. Sul (2003): “Dynamic Panel Estimation and Homogeneity Testing Under Cross Section Dependence,” Econometrics Journal, 6, 217–259.

Pitman, E. J. G. (1937): “The ‘Closest’ Estimates of Statistical Parameters,” Proc. Camb. Phil. Soc., 33, 212–222.

Sawa, T. (1978): “The Exact Moments of the Least Squares Estimator for the Autoregressive Model,” Journal of Econometrics, 8, 159–172.

Strawderman, R. L. and M. T. Wells (1998): “Approximately Exact Inference for the Common Odds Ratio in Several 2 × 2 Tables,” Journal of the American Statistical Association, 93, 1294–1307.

Tanizaki, H. (2000): “Bias Correction of OLSE in the Regression Model with Lagged Dependent Variables,” Journal of Computational Statistics and Data Analysis, 34, 495–511.

Temme, N. M. (1982): “The Uniform Asymptotic Expansion of a Class of Integrals Related to Cumulative Distribution Functions,” SIAM Journal on Mathematical Analysis, 13, 239–253.

Tierney, L., R. E. Kass, and J. B. Kadane (1989a): “Approximate Marginal Densities of Nonlinear Functions,” Biometrika, 76, 425–433.

——— (1989b): “Fully Exponential Laplace Approximations to Expectations and Variances of Nonpositive Functions,” Journal of the American Statistical Association, 84, 710–716.

Wilson, G. T. (1989): “On the use of Marginal Likelihood in Time Series Model Estimation,” Journal of the Royal Statistical Society B, 51, 15–27.