QED
Queen’s Economics Department Working Paper No. 642
Testing the Specification of Econometric Models in
Regression and Non-Regression Directions
Russell Davidson
Queen’s University
James G. MacKinnon
Queen’s University
Department of Economics
Queen’s University
94 University Avenue
Kingston, Ontario, Canada
K7L 3N6
2-1986
Testing the Specification of Econometric Models
in Regression and Non-Regression Directions
Russell Davidson
and
James G. MacKinnon
Department of Economics
Queen’s University
Kingston, Ontario, Canada
K7L 3N6
Abstract
The asymptotic power of a statistical test depends on the model being tested, the
(implicit) alternative against which the test is constructed, and the process which
actually generated the data. The exact way in which it does so is examined for several
classes of models and tests. First, we analyze the power of tests of nonlinear regression
models in regression directions, that is, tests which are equivalent to tests for omitted
variables. Next, we consider the power of heteroskedasticity-robust variants of these
tests. Finally, we examine the power of very general tests in the context of a very
general class of models.
This research was supported, in part, by grants from the Social Sciences and Humanities
Research Council of Canada. This paper was published in Telecommunications Demand
Modelling: An Integrated View, eds. A. de Fontenay, M. H. Shugard, and D. Sibley,
Amsterdam: North-Holland, 1990, 221–240. The references have been updated.
February, 1986
1. Introduction
In any area of applied econometric research, and especially in the usual situation where
the experimental design cannot be controlled, it is essential to subject any model to a
great many statistical tests before even tentatively accepting it as valid. When that
is done routinely, especially early in the model-building process, it is inevitable that
most models will fail many tests. The applied econometrician must then attempt to
infer from those failures what is wrong with the model. The interpretation of test
statistics, both those which reject and those which do not reject the null hypothesis,
is thus an important part of the econometrician’s job. There is, however, surprisingly
little literature on the subject, a recent exception being Davidson and MacKinnon
(1985a).
The vast majority of the models that econometricians estimate are regression mod-
els of some sort, linear or nonlinear, univariate or multivariate. The tests to which
such models may be subjected fall into three broad categories, according to a scheme
discussed in Davidson and MacKinnon (1985a). First of all, there are tests in “regres-
sion directions”, in which the (possibly implicit) alternative model is also a regression
model, at least locally. Secondly, there are tests in “higher moment directions”, which
are only concerned with the properties of the error terms; these might include tests
for heteroskedasticity, skewness, and kurtosis. Finally, there are tests in “mixed direc-
tions”, which combine both regression and higher moment components. This paper
will deal primarily with tests in regression directions, although there will be some
discussion of tests in mixed directions.
Tests in regression directions form the lion’s share of the tests that are commonly em-
ployed by econometricians when testing regression models. The asymptotic analysis of
such tests is reasonably easy, and it turns out to be remarkably similar to the asymp-
totic analysis of much more general families of tests; see Davidson and MacKinnon
(1987) and Section 5 below. In contrast to Davidson and MacKinnon (1985a), which
deals only with linear regression models, we focus on the case of univariate, nonlinear
regression models. Because we are dealing with nonlinear models, all of the analysis
is asymptotic; this is probably an advantage rather than a drawback, because, even in
the linear case, many inessential complexities can be eliminated by focusing on what
happens as the sample size tends to infinity.
In Section 2, we discuss nonlinear regression models and tests in regression directions.
In Section 3, we analyze the asymptotic power of these tests when the data generating
process, or DGP, is not the alternative against which the test is constructed. In
Section 4, we discuss heteroskedasticity-robust variants of tests in regression directions.
Finally, in Section 5, we discuss tests in mixed directions.
2. Nonlinear Regression Models and Tests in Regression Directions
The model of interest is the nonlinear regression model

y = f(β, γ) + u,   E(u) = 0,   E(uu⊤) = σ²I,   (1)

where y and u are n-vectors, and f(·) denotes a vector of twice continuously differentiable functions f_t(β, γ) which depend on β, a k-vector, and γ, an r-vector, of (generally unknown) parameters. The matrices of derivatives of the elements of f with respect to the elements of β and γ will be denoted F_β(β, γ) and F_γ(β, γ). These matrices are n × k and n × r, respectively. The quantities f(β, γ), F_β(β, γ), and F_γ(β, γ) are all functions of β and γ, as are several other quantities to be introduced below. When one of these, f(β, γ) for example, is evaluated at the true values, say β₀ and γ₀, it will simply be denoted f.

The functions f_t may depend on past values of y_t, but not on current or future values, since otherwise (1) would not be a regression model and least squares would no longer be an appropriate estimating technique. In order to ensure that all estimators and test statistics behave sensibly as n → ∞, it is assumed that F_β⊤F_β/n, F_γ⊤F_γ/n, and F_β⊤F_γ/n all tend to finite limiting matrices with ranks k, r, and min(k, r), respectively, as n → ∞, while the matrix [F_β F_γ] always has rank k + r for large n.
It is remarkably easy to test hypotheses about β and γ by means of artificial linear regressions without ever estimating the unrestricted model; see, among others, Engle (1982) and Davidson and MacKinnon (1984). Suppose that we wish to test the null hypothesis γ = 0. If we estimate (1) by least squares imposing this restriction, we will obtain restricted estimates β̃ and σ̃². Evaluating f(β, γ), F_β(β, γ), and F_γ(β, γ) at β̃ and 0, we obtain the vector f̃ and the matrices F̃_β and F̃_γ. Using these, we can construct the artificial linear regression

(y − f̃)/σ̃ = F̃_β b + F̃_γ c + errors.   (2)

The explained sum of squares from this regression, which is also n times the uncentered R², is

(y − f̃)⊤F̃_γ(F̃_γ⊤M̃_βF̃_γ)⁻¹F̃_γ⊤(y − f̃)/σ̃²,   (3)

where M̃_β ≡ I − F̃_β(F̃_β⊤F̃_β)⁻¹F̃_β⊤. It is straightforward to show that this test statistic is asymptotically distributed as chi-squared with r degrees of freedom under the null hypothesis. The proof makes use of the facts that, asymptotically, y − f̃ = M_β u and that a central limit theorem can be applied to the r-vector F_γ⊤M_β u. Note that the numerator of (3) is also the numerator of the ordinary F statistic for a test of c = 0 in (2), so that (3) is asymptotically equivalent to an F test.
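The following minimal sketch (ours, not the authors') shows how statistic (3) can be computed as the explained sum of squares from the artificial regression (2). The model, data, and variable names are invented for illustration; the null is linear in β here, so F̃_β is simply the regressor matrix.

```python
import numpy as np

# Illustrative setup (not from the paper): null model y = b0 + b1*x + u,
# tested in the regression directions z = [x^2, log x].
rng = np.random.default_rng(42)
n = 200
x = rng.uniform(1.0, 10.0, n)
z = np.column_stack([x**2, np.log(x)])        # plays the role of F~_gamma
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # data satisfy the null here

# Restricted least squares estimates (the null imposes gamma = 0)
X = np.column_stack([np.ones(n), x])          # plays the role of F~_beta
beta_tilde, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_tilde
sigma_tilde = np.sqrt(resid @ resid / n)

# Artificial regression (2): regress (y - f~)/sigma~ on [F~_beta, F~_gamma].
regressand = resid / sigma_tilde
W = np.column_stack([X, z])
coef, *_ = np.linalg.lstsq(W, regressand, rcond=None)
fitted = W @ coef
statistic = fitted @ fitted                   # explained SS = n * uncentered R^2
print(statistic)   # compare with chi-squared(2): 5% critical value is 5.99
```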
The test statistic (3) may be thought of as a generic test in what Davidson and MacKinnon (1985a) refers to as “regression directions”. It is generic because the fact that it is based solely on restricted estimates does not prevent it from having the same asymptotic properties, under the null and under local alternatives, as other standard tests. If we had assumed that the u_t were normally distributed, (3) could have been derived as a Lagrange Multiplier test, and standard results on the equivalence of LM, Wald, and Likelihood Ratio tests could have been invoked; see Engle (1984). It is obvious that this equivalence does not depend on normality.
By a regression direction is meant any direction from the null hypothesis, in the space of likelihood functions, which corresponds, at least locally, to a regression model. It is clear that (3) is testing in regression directions because (1) is a regression model whether or not γ = 0. But a test in regression directions need not be explicitly derived from an alternative hypothesis which is a regression model, although of course the null must be such a model. If we were simply to replace the matrix F̃_γ in (2) and (3) by an arbitrary matrix Z, asymptotically uncorrelated with u under the null hypothesis, we would obtain the asymptotically valid test statistic

(y − f̃)⊤Z(Z⊤M̃_βZ)⁻¹Z⊤(y − f̃)/σ̃².   (4)
Using this device, specification tests in the sense of Hausman (1978) and Holly (1982) can be computed in the same way as other tests in regression directions whenever the null is a regression model; see Davidson and MacKinnon (1985b). So can non-nested hypothesis tests, encompassing tests, and differencing tests; see Davidson and MacKinnon (1981), Mizon and Richard (1986), and Davidson, Godfrey, and MacKinnon (1985), respectively. In the next section, we shall ignore where the matrix Z and the associated test statistic (4) came from and analyze what determines the power of all tests in regression directions.
3. The Local Power of Tests in Regression Directions
In order to say anything about the power of a test statistic, one must specify how the data are actually generated. Since we are concerned with tests in regression directions, we shall restrict our attention to DGPs which differ from the null hypothesis only in such directions. This restriction is of course by no means innocuous, as will be made clear in Section 5 below. The null hypothesis will be (1) with γ = 0. Since the alternative that γ ≠ 0 is only one of many alternatives against which we may wish to test the null, we shall usually suppress γ and write f(β, 0) as f(β).
Suppose the data are generated by the sequence of local DGPs

y = f(β₀) + αn^{-1/2}a + u,   E(u) = 0,   E(uu⊤) = σ₀²I,   (5)

where β₀ and σ₀ denote particular values of β and σ, a is an n-vector which may depend on exogenous variables, the parameter vector β₀, and past values of the y_t, and α is a parameter which determines how far the DGP is from the simple null hypothesis

y = f(β₀) + u,   E(u) = 0,   E(uu⊤) = σ₀²I.   (6)

We assume that a⊤a/n, a⊤F_β/n, and a⊤Z/n all tend to finite limits as n → ∞.
The notion of a sequence of local DGPs requires some discussion. Following Pitman (1949) and many subsequent authors, we adopt it because it seems the most reasonable way to deal with power in the context of asymptotic theory. The sequence (5) approaches the simple null (6) at a rate of n^{-1/2}. This rate is chosen so that the test statistic (4), and all asymptotically equivalent test statistics, will be of order unity as n → ∞. If, on the contrary, the DGP were held fixed as the sample size was increased, the test statistic would normally tend to blow up, and it would be impossible to talk about its asymptotic distribution.
Sequences like (5) have not been widely used in econometrics. Most authors who
investigate the asymptotic power of test statistics have been content to conduct their
analysis on the assumption that the sequence of local DGPs actually lies within the
compound alternative against which the test is constructed; see, for example, Engle
(1984). But this makes it impossible to study how the power of a test depends on the
relations among the null, the alternative, and the DGP, so that the ensuing analysis
can shed no light on the difficult question of how to interpret significant test statistics.
One paper which uses a sequence similar to (5) is Davidson and MacKinnon (1982),
which studies the power of various non-nested hypothesis tests. They conclude that
several of the tests are asymptotically equivalent under all sequences of local DGPs, a
stronger conclusion than would be possible using the conventional assumption.
The sequence (5) provides a perfectly general local representation of any regression model which is sufficiently close to the simple null (6). For example, suppose that we wish to see how a certain test performs when the data are generated by an alternative like (1), with γ ≠ 0. We could simply specify the sequence of local DGPs as

y = f(β₀, αn^{-1/2}γ₀) + u   (7)

with the usual assumptions about the vector u, where γ₀ is fixed, and α determines how far (7) is from the simple null hypothesis (6). Because (7) approaches (6) as n → ∞, a first-order Taylor approximation to (7) around α = 0 must yield exactly the same results, in an asymptotic analysis, as (7) itself. This approximation is

y = f(β₀) + αn^{-1/2}F_γ(β₀)γ₀ + u,   (8)

where F_γ(β₀) ≡ F_γ(β₀, 0). We can see that (8) is simply a particular case of (5), with the vector F_γ(β₀, 0)γ₀ playing the role of a.
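To fix ideas, here is a small sketch (an illustration under our own assumptions, not code from the paper) of drawing data from a local DGP of the form (8), with a linear regression function standing in for f and an invented drift direction:

```python
import numpy as np

def local_dgp_sample(beta0, gamma0, alpha, X, Z, sigma0, rng):
    """Draw y = f(beta0) + alpha * n^{-1/2} * F_gamma(beta0) @ gamma0 + u,
    as in (8), with f linear in beta0 so that F_gamma is just Z."""
    n = X.shape[0]
    drift = alpha * n ** -0.5 * (Z @ gamma0)   # shrinks at rate n^{-1/2}
    u = rng.normal(0.0, sigma0, n)
    return X @ beta0 + drift + u

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(1, 10, n)])
Z = (X[:, 1] ** 2).reshape(-1, 1)              # one omitted direction
y = local_dgp_sample(np.array([1.0, 2.0]), np.array([0.5]),
                     alpha=5.0, X=X, Z=Z, sigma0=1.0, rng=rng)
```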
We now wish to find the asymptotic distribution of the test statistic (4) under the sequence of local DGPs (5). We first rewrite (4) so that all factors are O(1):

(n^{-1/2}(y − f̃)⊤Z)(n⁻¹Z⊤M̃_βZ)⁻¹(n^{-1/2}Z⊤(y − f̃))/σ̃².   (9)

The factor of n^{-1/2} in (5) ensures that σ̃² → σ₀² as n → ∞, and it is obvious that

(n⁻¹Z⊤M̃_βZ)⁻¹ → (plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹.   (10)
Now recall that

f(β̃) ≅ f(β₀) + F_β(F_β⊤F_β)⁻¹F_β⊤(y − f(β₀)),   (11)

where “≅” means “is asymptotically equal to”. It follows from (11) that

y − f(β̃) ≅ αn^{-1/2}a + u − F_β(F_β⊤F_β)⁻¹F_β⊤(αn^{-1/2}a + u)
         = M_β(αn^{-1/2}a + u).   (12)

The test statistic (4) is thus asymptotically equal to

(αn⁻¹a + n^{-1/2}u)⊤M_βZ(plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹Z⊤M_β(αn⁻¹a + n^{-1/2}u)/σ₀².   (13)
It is easy to find the asymptotic distribution of (13). First, define P₀ as an r × r triangular matrix such that

P₀P₀⊤ = (plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹,   (14)

and then define the r-vector η as

η ≡ P₀⊤Z⊤M_β(αn⁻¹a + n^{-1/2}u)/σ₀.   (15)

The test statistic (13) now takes the very simple form η⊤η; it is just the sum of r squared random variables which are the elements of the vector η. It is clear that, asymptotically, the mean of η is the vector

(α/σ₀) plim_{n→∞}(n⁻¹P₀⊤Z⊤M_βa)   (16)

and its variance-covariance matrix is

plim_{n→∞}((1/(nσ₀²))P₀⊤Z⊤M_βE(uu⊤)M_βZP₀)   (17)

= P₀⊤(plim_{n→∞} n⁻¹Z⊤M_βZ)P₀ = I_r.   (18)
Since η is equal to n^{-1/2} times a weighted sum of random variables with mean zero and finite variance, and since our assumptions keep those weights bounded from above and below, a central limit theorem can be applied to it. The test statistic (9) is thus asymptotically equal to a sum of r independent squared normal random variates, each with variance unity, and with means given by the vector (16). Such a sum has the non-central chi-squared distribution with r degrees of freedom and non-centrality parameter, or NCP, given by the squared norm of the mean vector, which in this case is equal to

(α²/σ₀²)(plim_{n→∞} n⁻¹a⊤M_βZ)(plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹(plim_{n→∞} n⁻¹Z⊤M_βa).   (19)
The NCP (19) can be rewritten in a more illuminating way. Consider the vector αn^{-1/2}M_βa, the squared length of which, asymptotically, is

α² plim_{n→∞}(n⁻¹a⊤M_βa).   (20)

The quantity (20) is a measure of the distance between the DGP (5) and a linear approximation to the null hypothesis around the simple null (6); in a sense, it tells us how “wrong” the model being tested is. Now consider the artificial regression

αn^{-1/2}M_βa/σ₀ = M_βZd + errors.   (21)

The total sum of squares for this regression, asymptotically, is expression (20) divided by σ₀². The explained sum of squares, asymptotically, is the NCP (19). Thus the asymptotic uncentered R² from regression (21) is

[plim(n⁻¹a⊤M_βZ)(plim n⁻¹Z⊤M_βZ)⁻¹plim(n⁻¹Z⊤M_βa)] / plim(n⁻¹a⊤M_βa).   (22)
Expression (22) has an alternative interpretation. Consider the asymptotic projection of αn^{-1/2}M_βa onto the space spanned by F_β and Z jointly. This projection is

αn^{-1/2}M_βZ(plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹(plim_{n→∞} n⁻¹Z⊤M_βa).   (23)

Now let φ be the angle between the vector αn^{-1/2}M_βa and the projection (23). By the definition of a cosine, it is easily seen that cos²φ is equal to the R² (22). Thus we may rewrite the NCP (19) as

(α²/σ₀²) plim_{n→∞}(n⁻¹a⊤M_βa) cos²φ   (24)

or simply as

α²σ₀⁻²cos²φ   (25)

if we normalize the vector a so that plim(n⁻¹a⊤M_βa) = 1 and rescale α appropriately.
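This decomposition can be checked numerically. The sketch below (our illustration; all inputs are invented) computes finite-n analogues of the NCP (19) and of cos²φ from (22), replacing probability limits with sample moments:

```python
import numpy as np

def ncp_and_cos2phi(a, z, f_beta, sigma0, alpha):
    """Sample analogue of the NCP (19) and of cos^2(phi) in (24),
    computed from finite-n versions of the probability limits.
    All inputs are illustrative arrays, not quantities from the paper."""
    n = len(a)
    # M_beta a and M_beta Z: residuals from regressions on F_beta
    q, _ = np.linalg.qr(f_beta)
    m_a = a - q @ (q.T @ a)
    m_z = z - q @ (q.T @ z)
    cross = m_z.T @ m_a / n                   # n^{-1} Z'M_beta a
    middle = np.linalg.inv(m_z.T @ m_z / n)   # (n^{-1} Z'M_beta Z)^{-1}
    ncp = (alpha**2 / sigma0**2) * cross @ middle @ cross
    cos2 = (cross @ middle @ cross) / (m_a @ m_a / n)
    return ncp, cos2

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(1, 10, n)
f_beta = np.column_stack([np.ones(n), x])     # derivatives of f w.r.t. beta
a = x**1.5                                    # direction in which the DGP departs
z = np.column_stack([x**2, np.log(x)])        # directions being tested
print(ncp_and_cos2phi(a, z, f_beta, sigma0=1.0, alpha=2.0))
```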
If a test statistic which has the non-central chi-squared distribution with r degrees of freedom is compared with critical values from the corresponding central chi-squared distribution, the power of the test increases monotonically with the NCP. Expression (25) writes this NCP as the product of three factors. The first factor, α², measures the distance between the DGP and the closest point on a linear approximation to the null hypothesis. Note that this distance in no way depends on Z; the greater α², the more powerful any test will be. Like the other two factors, this first factor is independent of n. For the first factor, that comes about because the DGP approaches the null hypothesis at a rate of n^{-1/2}. If, on the contrary, the DGP were fixed as n increased, this factor would have to be proportional to n. Of course, the asymptotic analysis we have done would not be valid if the DGP were fixed, although it would provide a useful approximation in most cases.
The second factor in expression (25) is σ₀⁻². This tells us that the NCP is inversely proportional to the variance of the DGP, which makes sense because, as the DGP becomes noisier, it should become harder to reject any null hypothesis. What affects the NCP is α²/σ₀², the ratio of the systematic discrepancy between the null and the DGP to the noise in the latter. Note that this ratio does not depend on Z. It will be the same for all tests in regression directions of any given null hypothesis with any given data set.
The most interesting factor in expression (25) is the third one, cos²φ. It is only through this factor that the choice of Z affects the NCP. A test will have maximal power, for a given number of degrees of freedom, when cos²φ = 1, that is, when the artificial regression (21) has an R² of one. This will be the case whenever the vector a is a linear combination of the vectors in F_β and Z, which will occur whenever the DGP lies within the alternative against which the test is constructed. For example, if the null hypothesis were y = f(β, 0), the alternative were y = f(β, γ), and the DGP were (8), then Z would be F_γ, and a would be a linear combination of the columns of F_γ; see (8). Thus the power of a test is maximized when we test against the truth.

On the other hand, a test will have no power at all when cos²φ = 0. This would occur if M_βa were asymptotically orthogonal to M_βZ, something which in general would seem to be highly unlikely. However, special features of a model, or of the sample design, may make such a situation less uncommon than one might think. Nevertheless, it is probably not very misleading to assert that, when a null hypothesis is false in a regression direction, almost any test in regression directions can be expected to have some power, although perhaps not very much.
These results make it clear that there is always a tradeoff when we choose what regression directions to test against. By increasing the number of columns in Z, we can always increase cos²φ, or at worst leave it unchanged, which by itself will increase the power of the test. But doing so also increases r, the number of degrees of freedom, which by itself reduces the power of the test. This tradeoff is at the heart of a number of controversies in the literature on hypothesis testing. These include the debate over non-nested hypothesis tests versus encompassing tests and the literature on specification tests versus classical tests. For the former, see Dastoor (1983) and Mizon and Richard (1986). For the latter, see Hausman (1978) and Holly (1982).
The tradeoff between cos²φ and degrees of freedom is affected by the sample size. As n increases, the NCP can be expected to increase, because in reality the DGP is not approaching the null as n → ∞, so that a given change in cos²φ will have a larger effect on power the larger is n. On the other hand, the effect of r on the critical value is independent of sample size. Thus, when the sample size is small, it is particularly important to use tests with few degrees of freedom. In contrast, when the sample size is large, it becomes feasible to look in many directions at once so as to maximize cos²φ.
If we were confident that the null could only be false in a single direction (that is, if we knew exactly what the vector a might be), the optimal procedure would be to have only one column in Z, that column being proportional to a. In practice, we are rarely in that happy position. There are normally a number of things which we suspect might be wrong with our model, and hence a large number of regression directions in which to test. Faced with this situation, there are at least two ways to proceed.
One approach is to test against each type of potential misspecification separately, with each test having only one or a few degrees of freedom. If the model is in fact wrong in one or a few of the regression directions in which these tests are carried out, such a procedure is as likely as any to inform us of that fact. However, the investigator must be careful to control the overall size of the test, since when one does, say, ten different tests each at the .05 level, the overall size could be as high as .40 (approximately 1 − (0.95)¹⁰ ≈ 0.40 if the tests were independent). Thus one should avoid jumping to the conclusion that the model is wrong in a particular way just because a certain test statistic is significant. Remember that cos²φ will often be well above zero for many tests, even if only one thing is wrong with the model.
Alternatively, it is possible to test for a great many types of misspecification at once by putting all the regression directions we want to test against into one big Z matrix. This maximizes cos²φ, and hence it maximizes the chance that the test is consistent. It also makes it easy to control the size of the test. But such a test will have many degrees of freedom, so that power may be poor when the sample size is small. Moreover, if such a test rejects the null, that gives us very little information as to what may be wrong with the model.
It may be possible to make some tentative inferences about the true model by looking at the values of several test statistics. Suppose that we test a model against several sets of regression directions, represented by regressor matrices Z₁, Z₂, and so on, and thus generate test statistics T₁, T₂, and so on. Each of the test statistics T_i can be used to estimate the corresponding NCP, say NCP_i. Since the mean of a non-central chi-squared random variable with r degrees of freedom is r plus the NCP, the obvious estimate of NCP_i is just T_i − r_i. It is far from certain that the Z_i with the highest estimated NCP_i, say Z*_i, actually represents truly omitted directions. Nevertheless, modifying the model in the directions represented by Z*_i would seem to be a reasonable thing to do in many cases, especially when the number of columns in Z*_i is small. It might be useful to perform a test in all the interesting regression directions one can think of, thus obtaining a test statistic with the largest NCP obtainable. If that test statistic is not much larger than T*_i, then one might feel reasonably confident that the directions represented by Z*_i adequately capture the discrepancy between the null and the DGP.
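As a small numerical illustration of this diagnostic (the statistics and degrees of freedom below are invented), the estimated NCPs are just T_i − r_i:

```python
# Hypothetical test results: statistic and degrees of freedom for each Z_i.
tests = {"Z1": (10.5, 1), "Z2": (10.1, 2), "Z3": (12.8, 4)}

for name, (stat, df) in tests.items():
    # mean of a noncentral chi-squared is df + NCP, so NCP_hat = T - r
    print(f"{name}: T = {stat:.1f}, r = {df}, estimated NCP = {stat - df:.1f}")
# Here Z1 has the largest estimated NCP, so its directions would be the
# leading candidates for modifying the model.
```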
4. Tests that are Robust to Heteroskedasticity of Unknown Form
The distinguishing feature of regression models is that the error term is simply added to a regression function which determines the mean of the dependent variable. This greatly simplifies the analysis of such models. In particular, it means that test statistics such as (4) are asymptotically valid regardless of how the error terms u_t are distributed, provided only that, for all t, E(u_t) = 0, E(u_t u_s) = 0 for all s ≠ t, and E(u_t²) = σ². Without normality, least squares will not be asymptotically efficient, and tests based on least squares will not be most powerful, but least squares estimates will be consistent, and tests based on them will be asymptotically valid.
When the error terms u_t display heteroskedasticity of unknown form, least squares estimates remain consistent, but test statistics such as (4) are no longer asymptotically valid. However, the results of White (1980) make it clear that asymptotically valid test statistics can be constructed in this case. Davidson and MacKinnon (1985b) shows how to compute such tests by means of artificial linear regressions and provides some results on their finite-sample properties. These heteroskedasticity-robust tests are likely to be very useful when analyzing cross-section data. In this section, we consider the power properties of such tests.
Under the null hypothesis, the test statistic (4) tends to the random variable

(n^{-1/2}u⊤M_βZ)(plim_{n→∞} n⁻¹Z⊤M_βZ)⁻¹(n^{-1/2}Z⊤M_βu)   (26)

as n → ∞. Thus it is evident that (4) is really testing the hypothesis that

lim_{n→∞} E(n^{-1/2}u⊤M_βZ) = 0.   (27)

Now suppose that E(uu⊤) = Ω, where Ω is an n × n diagonal matrix with diagonal elements σ_t² bounded from above. The asymptotic variance-covariance matrix of n^{-1/2}u⊤M_βZ is

plim_{n→∞}(n⁻¹Z⊤M_βΩM_βZ).   (28)
Even though Ω is unknown and has as many unknown elements as there are observations, the matrix (28) may be estimated consistently in a number of different ways. One of the simplest is to replace Ω by Ω̃, a diagonal matrix with diagonal elements ũ_t², the ũ_t being the residuals from least squares estimation of the null hypothesis. Of course, M_β must also be replaced by M̃_β. Other estimators, which may have better finite-sample properties, are discussed in MacKinnon and White (1985).
It is now straightforward to derive a heteroskedasticity-robust test statistic. Written in the same form as (9), so that all factors are O(1), it is

(n^{-1/2}(y − f̃)⊤Z)(n⁻¹Z⊤M̃_βΩ̃M̃_βZ)⁻¹(n^{-1/2}Z⊤(y − f̃)).   (29)

This test statistic is simply n minus the sum of squared residuals from the artificial regression

ι = ŨM̃_βZc + errors,   (30)

where ι is an n-vector of ones and Ũ ≡ diag(ũ_t). That (29) can be calculated in this simple way follows from the facts that

ι⊤ŨM̃_βZ = (y − f̃)⊤M̃_βZ = (y − f̃)⊤Z   (31)

and

Z⊤M̃_βŨ⊤ŨM̃_βZ = Z⊤M̃_βΩ̃M̃_βZ.   (32)
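A minimal sketch of this recipe, using a linear null so that F̃_β is just the regressor matrix (the data and variable names are illustrative assumptions, not the paper's):

```python
import numpy as np

def hetero_robust_test(y, X, Z):
    """Heteroskedasticity-robust test in the directions Z, computed as
    n minus the SSR from regressing a vector of ones on U~ M~_beta Z,
    following the recipe around (29)-(32). The linear null y = X b + u
    stands in for the nonlinear model; X plays the role of F~_beta."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta                          # restricted residuals u~
    # M~_beta Z: residuals from regressing each column of Z on X
    gamma, *_ = np.linalg.lstsq(X, Z, rcond=None)
    mz = Z - X @ gamma
    W = u[:, None] * mz                       # U~ M~_beta Z
    c, *_ = np.linalg.lstsq(W, np.ones(n), rcond=None)
    ssr = np.sum((np.ones(n) - W @ c) ** 2)
    return n - ssr                            # asympt. chi-squared(r) under H0

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 10, n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([x**2])
y = 1 + 2 * x + rng.normal(0, 1 + 0.1 * x)    # heteroskedastic errors
print(hetero_robust_test(y, X, Z))
```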
Finding the asymptotic distribution of the heteroskedasticity-robust test statistic (29) is very similar to finding the asymptotic distribution of the ordinary test statistic (9). Under a suitable sequence of local DGPs, it is easy to show that (29) is asymptotically equal to

(αn⁻¹a + n^{-1/2}u)⊤M_βZ(plim_{n→∞} n⁻¹Z⊤M_βΩ₀M_βZ)⁻¹Z⊤M_β(αn⁻¹a + n^{-1/2}u).   (33)

Here Ω₀ is the n × n diagonal covariance matrix of the error terms in a sequence of local DGPs similar to (5) except for the heteroskedasticity. An argument very similar to that used earlier can then be employed to show that, asymptotically, (33) has the non-central chi-squared distribution with r degrees of freedom and NCP

α²(plim_{n→∞} n⁻¹a⊤M_βZ)(plim_{n→∞} n⁻¹Z⊤M_βΩ₀M_βZ)⁻¹(plim_{n→∞} n⁻¹Z⊤M_βa).   (34)
Like the earlier NCP (19), expression (34) can also be interpreted as the explained sum of squares from a certain artificial regression. In this case, the tth element of the regressand is αn^{-1/2}(M_βa)_t/σ_t, and the tth row of the regressor matrix is σ_t(M_βZ)_t. The NCP may then be written as

α² plim_{n→∞}(n⁻¹a⊤M_βΩ₀⁻¹M_βa)ψ,   (35)

where ψ denotes the uncentered, asymptotic R² from the artificial regression.

Expression (35) resembles expression (24), but it differs from it in two important respects. First of all, the factor which measures the distance between the DGP and the null hypothesis relative to the noisiness of the DGP is now

α² plim_{n→∞}(n⁻¹a⊤M_βΩ₀⁻¹M_βa)

rather than

(α²/σ₀²) plim_{n→∞}(n⁻¹a⊤M_βa).
Secondly, although ψ plays the same role as cos²φ and is the only factor which is affected by the choice of Z, ψ does not have quite the same properties as cos²φ. It is possible to make ψ zero by choosing Z appropriately, but it is usually not possible to make ψ unity even by choosing Z so that a lies in the span of the columns of Z. That would of course be possible if Ω₀ were proportional to the identity matrix, in which case (35) and (24) would be identical. Thus, when there is no heteroskedasticity, the heteroskedasticity-robust test statistic (29) is asymptotically equivalent, under all sequences of local DGPs, to the ordinary test statistic (9). There will, however, be some loss of power in finite samples; see Davidson and MacKinnon (1985b).
It is clear from expression (35) that when there is in fact heteroskedasticity, the pattern of the error variances will affect the power of the test. Multiplying all elements of Ω₀ by a factor λ will of course reduce the NCP by a factor 1/λ, as in the homoskedastic case. But changes in the pattern of heteroskedasticity, even if they do not affect the average value of σ_t, may well affect plim(n⁻¹Z⊤M_βΩ₀M_βZ) and hence affect the power of the test.
As a result of this, the interpretation of heteroskedasticity-robust tests is even harder than the interpretation of ordinary tests in regression directions. As discussed in Section 3, we know in the ordinary case that the NCP will be highest when we test against the truth, and so looking at several test statistics can provide some guidance as to where the truth lies. In the heteroskedasticity-robust case, however, things are not so simple. It is entirely possible that a may lie in the span of the columns of F_β and Z₁, so that a test against the directions represented by Z₁ is in effect a test against the truth, and yet the NCP may be substantially higher when testing against some quite different set of directions represented by Z₂. Thus, in the common situation in which several different tests reject the null hypothesis, it may be far from obvious how the model should be modified.
5. Tests in Mixed Non-Regression Directions
The popularity of regression models is easy to understand. They have evolved naturally
from the classical problem of estimating a mean; they are easy to write down and
interpret; and they are usually quite easy to estimate. The regression specification is,
however, very restrictive. By forcing the error term to be additive, it greatly limits
the way in which random events outside the model can affect the dependent variable.
In order to see whether this is in fact a severe restriction, careful applied workers will
usually wish to test their models in non-regression as well as regression directions.
As we saw above, regression directions are those which correspond, at least locally, to
a more general regression model in which the null is nested. Non-regression directions,
then, are those which correspond either to a regression model with a different error
structure (as in tests for heteroskedasticity, skewness, and kurtosis), or to a more
general non-regression model. We shall refer to the latter as “mixed” directions, since
they typically affect both the mean and the higher moments of the dependent variable.
One non-regression model which has been widely used in applied econometrics is the Box-Cox regression model; see, among others, Zarembka (1974) and Savin and White (1978). This model can be written as

y_t(λ) = Σ_{i=1}^{k} β_i X¹_{ti}(λ) + Σ_{j=1}^{m} γ_j X²_{tj} + u_t,   (36)

where x(λ) denotes the Box-Cox transformation:

x(λ) ≡ (x^λ − 1)/λ if λ ≠ 0;   x(λ) ≡ log x if λ = 0.   (37)

Here the X¹_{ti} are regressors which may sensibly be subjected to the Box-Cox transformation, while the X²_{tj} are ones which cannot sensibly be thus transformed, such as the constant term, dummy variables, and variables which can take on non-positive values.
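For concreteness, here is a direct transcription of the transformation (37) as a small function (the function name is ours):

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transformation (37): (x**lam - 1)/lam for lam != 0,
    log(x) for lam == 0. Requires x > 0; as lam -> 0 the first
    expression tends continuously to log(x)."""
    x = np.asarray(x, dtype=float)
    if lam == 0.0:
        return np.log(x)
    return (x**lam - 1.0) / lam

x = np.array([0.5, 1.0, 2.0, 10.0])
print(box_cox(x, 1.0))   # x - 1: the linear case
print(box_cox(x, 0.0))   # log x: the loglinear case
```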
Conditional on λ, the Box-Cox model (36) is a regression model. When λ = 1 and there is a constant term, it is a linear model, and when λ = 0, it is a loglinear one. Either of these null hypotheses may be tested against the Box-Cox alternative (36), and numerous procedures exist for doing so. These are examples of tests in mixed directions, because (36) is not a regression model when λ is a parameter to be estimated. The reason is that the regression function in (36) determines the mean of y_t(λ) rather than the mean of the actual dependent variable. If we were to rewrite (36) so that y_t was on the left-hand side, it would be clear that y_t is actually a nonlinear function of the X¹_{ti}, the X²_{tj}, and u_t.
The obvious way to test both linear and loglinear null hypotheses against the Box-Cox
alternative (36) is to use some form of the Lagrange Multiplier test. Several such
tests have been proposed. In particular, Godfrey and Wickens (1981) suggest using
the “outer product of the gradient”, or OPG, variant of the LM test, while Davidson
and MacKinnon (1985c) suggest using the “double-length regression”, or DLR, variant.
Each of these variants can be computed by means of a single artificial linear regression,
and each can handle a wide variety of tests in mixed non-regression directions.
In the remainder of this section, we shall discuss the OPG variant of the LM test. We
shall not discuss the DLR variant, which was originally proposed by Davidson and
MacKinnon (1984), even though it appears to have substantially better finite-sample
properties than the OPG variant. Discussion of the asymptotic power properties of
the DLR variant may be found in Davidson and MacKinnon (1985a, 1985c), and there
would be no point in repeating that discussion here. Moreover, the DLR variant is
applicable to a narrower class of models than the OPG variant, and it is somewhat
more complicated to analyze. Note that our results do not apply merely to the OPG
form of the LM test; they are equally valid for any form of the LM test, and for
asymptotically equivalent Wald and Likelihood Ratio tests as well.
The OPG variant of the LM test is applicable to any model for which the loglikelihood function may be written as a sum of the contributions from all the observations. Thus, if ℓ_t(y_t, θ₁, θ₂) denotes the contribution from the tth observation (possibly conditional on previous observations), the loglikelihood function is

L(y, θ₁, θ₂) = Σ_{t=1}^{n} ℓ_t(y_t, θ₁, θ₂).   (38)

The derivatives of ℓ_t(y_t, θ₁, θ₂) with respect to the ith element of θ₁ and the jth element of θ₂ will be denoted G¹_{ti}(·) and G²_{tj}(·), respectively. These derivatives may then be formed into the n × k and n × r matrices G₁ and G₂. The null hypothesis is that θ₂ = 0. For the computation of the LM test, all quantities will be evaluated at the restricted ML estimates θ̃₁ and 0.
In the particular case of testing the null hypothesis of loglinearity against the Box-Cox alternative (36), θ₁ would be a vector of the β_i, the γ_j, and σ (or σ²), and θ₂ would be the scalar λ. Assuming normality, it is easy to write down the loglikelihood function, and we see that

ℓ_t = −(1/2) log 2π − log σ − u_t²/(2σ²) + (λ − 1) log y_t,   (39)

where u_t is implicitly defined by (36). It is now easy to calculate the elements of the matrix G and to evaluate them under the null hypothesis that λ = 0, which simply requires that one estimate the loglinear null by OLS and obtain estimates β̃_i, for i = 1, …, k, γ̃_j, for j = 1, …, m, and σ̃; see Godfrey and Wickens (1981).
The OPG form of the LM test is remarkably easy to calculate. It is simply the explained sum of squares (or n minus the sum of squared residuals) from the artificial regression

ι = G̃₁b + G̃₂c + errors,   (40)

where ι is an n-vector of ones, G̃₁ = G₁(θ̃₁, 0), and G̃₂ = G₂(θ̃₁, 0). The explained sum of squares from regression (40) is

ι⊤G̃(G̃⊤G̃)⁻¹G̃⊤ι = ι⊤G̃₂(G̃₂⊤M̃₁G̃₂)⁻¹G̃₂⊤ι,   (41)

where G̃ ≡ [G̃₁ G̃₂] and M̃₁ = I − G̃₁(G̃₁⊤G̃₁)⁻¹G̃₁⊤. The equality in (41) follows from the facts that the sum of squared residuals from regression (40) is identical to that from the regression

M̃₁ι = M̃₁G̃₂c + errors   (42)

and that ι⊤G̃₁ = 0 by the first-order conditions for θ̃₁.
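In code, this recipe is just a regression of a vector of ones on the score matrices. The sketch below is our illustration, assuming the per-observation scores G̃₁ and G̃₂ have already been computed at the restricted estimates; the same function works for the Z-based variants discussed next, simply passing Z in place of G̃₂.

```python
import numpy as np

def opg_lm_test(G1, G2):
    """OPG form of the LM test: explained sum of squares (equivalently,
    n minus the SSR) from regressing a vector of ones on [G1, G2],
    as in regression (40). G1 and G2 hold the per-observation scores
    with respect to theta_1 and theta_2, evaluated at the restricted
    ML estimates."""
    n = G1.shape[0]
    G = np.hstack([G1, G2])
    coef, *_ = np.linalg.lstsq(G, np.ones(n), rcond=None)
    ssr = np.sum((np.ones(n) - G @ coef) ** 2)
    return n - ssr      # asympt. chi-squared(G2.shape[1]) under the null
```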
Just as a test in regression directions may look in any such directions, not merely those which are suggested by an explicit alternative hypothesis, so may a test in non-regression directions. It is clearly valid to replace G̃₂ in regression (40) by any n × r matrix Z, provided that the elements of Z are O(1) and that the asymptotic expectation of the mean of every column of Z is zero under the null hypothesis. Newey (1985) and Tauchen (1985) exploit this fact to propose families of specification tests, while Lancaster (1984) uses it to provide a simple way to compute the information matrix test of White (1982).

What we are interested in, then, is the asymptotic distribution of the test statistic

(n^{-1/2}ι⊤Z)(n⁻¹Z⊤M̃₁Z)⁻¹(n^{-1/2}Z⊤ι),   (43)

which is the explained sum of squares from the artificial regression

ι = G̃₁b + Zc + errors,   (44)
written in such a way that all factors are O(1). Following Davidson and MacKinnon (1987), we shall suppose that the data are generated by a process which can be described by the loglikelihood

L₀ = Σ_{t=1}^{n} (ℓ_t(y_t, θ₁⁰, 0) + αn^{-1/2}a_t(y_t)).   (45)

Here θ₁⁰ is a vector of fixed parameters which determines the simple null hypothesis to which the sequence of DGPs (45) tends as n → ∞, α is a parameter which determines how far (45) is from that simple null, and the a_t(y_t) are random variables which are O(1) and have mean zero under the null. The sequence of local DGPs (45) plays the same role here as the sequence (5) did in our earlier analysis. The major difference between the two is that (45) is written in terms of loglikelihoods, so that the DGP may differ from the null hypothesis in any direction in likelihood space, while (5) was written in terms of regression functions, so that the DGP could only differ from the null in regression directions.
Using results of Davidson and MacKinnon (1987), it is possible to show that, under the sequence of local DGPs (45), the statistic (43) is asymptotically distributed as non-central chi-squared with r degrees of freedom and non-centrality parameter

α²(plim_{n→∞} n⁻¹a⊤M₁Z)(plim_{n→∞} n⁻¹Z⊤M₁Z)⁻¹(plim_{n→∞} n⁻¹Z⊤M₁a),   (46)

where M₁ ≡ I − G₁(G₁⊤G₁)⁻¹G₁⊤ and G₁ ≡ G₁(θ₁⁰, 0). The similarity between expressions (46) and (19) is striking and by no means coincidental. Note that M₁ plays exactly the same role here that M_β did previously, that a and Z play the same roles as before, although of course their interpretation is different, and that σ₀² has no place in (46), because the variance parameters (if any) are subsumed in θ₁ and/or θ₂.
Consider the vector αn^{-1/2}M₁a. Its asymptotic projection onto the space spanned by G₁ and Z jointly is

αn^{-1/2}M₁Z(plim_{n→∞} n⁻¹Z⊤M₁Z)⁻¹(plim_{n→∞} n⁻¹Z⊤M₁a).   (47)

If φ denotes the angle between αn^{-1/2}M₁a and the projection (47), then

cos²φ = [plim(n⁻¹a⊤M₁Z)(plim n⁻¹Z⊤M₁Z)⁻¹plim(n⁻¹Z⊤M₁a)] / plim(n⁻¹a⊤M₁a),   (48)

which is the uncentered asymptotic R² from the artificial regression

αn^{-1/2}M₁a = M₁Zb + errors.   (49)

Thus the NCP (46) may be rewritten as

α² plim_{n→∞}(n⁻¹a⊤M₁a) cos²φ.   (50)
The interpretation of expression (50) is almost exactly the same as the interpretation of expression (24). The first two factors measure the distance between the DGP and the closest point on a linear approximation to the null hypothesis. The larger these factors, the greater will be the power of any test statistic like (43), or of any asymptotically equivalent test statistic. The choice of Z only affects the NCP through cos²φ, and a test will have maximal power, for a given number of degrees of freedom, when cos²φ = 1. This will be the case if a is a linear combination of the vectors in G₁ and Z, which will happen whenever the DGP lies within the alternative against which the test is constructed. Thus, once again, the power of a test is maximized when we test against the truth.

When cos²φ = 0, a test will have no power at all asymptotically. This is a situation which is likely to arise quite often when using tests in non-regression directions. For example, it can be shown that cos²φ = 0 whenever the DGP is in a higher moment direction and we test in regression directions, or vice versa. This is true for essentially the same reason that the information matrix for a regression model is block-diagonal between the parameters which determine the regression function and those which determine the higher moments of the error terms. Notice that when Z has only one column, expression (48) for cos²φ is symmetrical in a and Z; thus, if a test against alternative 1 has power when alternative 2 is true, a test against alternative 2 must have power when alternative 1 is true.
The artificial regression (49) may actually be used to compute NCPs, and values of cos²φ, for models and tests where it is too difficult to work them out analytically. This requires a computer simulation, in which n is allowed to become large enough so that the probability limits in expression (46) are calculated with reasonable accuracy. Such a procedure may tell us quite a lot about the ability of certain test statistics to pick up various types of misspecification.
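A sketch of such a simulation (ours, under the assumption that draws of a_t, of the test directions Z_t, and of the scores G₁ at a large n are available): the uncentered R² from regression (49) estimates cos²φ, and averaging it over replications approximates the probability limit.

```python
import numpy as np

def cos2phi_by_simulation(a, z, g1):
    """Estimate cos^2(phi) as the uncentered R^2 of regression (49):
    regress M1 a on M1 Z, where M1 projects off the score matrix G1.
    a, z, g1 are simulated draws at a large n, so that sample moments
    approximate the probability limits appearing in (46)."""
    q, _ = np.linalg.qr(g1)
    m1a = a - q @ (q.T @ a)
    m1z = z - q @ (q.T @ z)
    coef, *_ = np.linalg.lstsq(m1z, m1a, rcond=None)
    fitted = m1z @ coef
    return (fitted @ fitted) / (m1a @ m1a)   # uncentered R^2 = cos^2(phi)
```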
As an illustration of this technique, we consider certain tests for functional form. The null hypothesis is the linear regression model which emerges from the Box-Cox model (36) when λ = 1:

y_t = 1 + Σ_{i=1}^{k} β_i(X¹_{ti} − 1) + Σ_{j=1}^{m} γ_j X²_{tj} + u_t.   (51)

If one of the X²_{tj} is a constant term, the various 1s that appear on the right-hand side of equation (51) can be ignored. The data are assumed to be generated by the Box-Cox model, or, more precisely, by a sequence of local approximations to that model so that the loglikelihood function has the form of expression (45). The artificial regression (49) will be used to compute cos²φ for three tests of the model (51), none of which is a classical test of (51) against (36).
First of all, we consider the LM test of (51) against the model

y_t(λ) = Σ_{i=1}^{k} β_i X¹_{ti} + Σ_{j=1}^{m} γ_j X²_{tj} + u_t,   (52)

in which the Box-Cox transformation is applied only to the dependent variable. This model is often just as plausible as (36), and would seem to be a reasonable alternative to test against in many cases.
Secondly, we consider a test originally proposed by Andrews (1971) and later extended by Godfrey and Wickens (1981). The basic idea of the Andrews test is to replace the non-regression direction in which a classical test of the linear or loglinear null against (36) would look with a regression direction which approximates it. One first takes a first-order Taylor approximation to the Box-Cox model (36) around λ = 0 or λ = 1. The term which multiplies λ in the Taylor approximation necessarily involves y_t, which is replaced by the fitted value of y_t from estimation under the null. This yields an OLS regression, which looks exactly the same as the original regression, with the addition of one extra regressor. The t statistic on that regressor is the test statistic, and, under the usual conditions for t tests to be exact, it actually has the Student's t distribution with n − k − m − 1 degrees of freedom. As Davidson and MacKinnon (1985c) showed, and as we will see shortly, the test regressor often provides a poor approximation to the true non-regression direction of the Box-Cox model, so that the Andrews test can be seriously lacking in power.
Finally, we will consider a test against a particular form of heteroskedasticity, namely that associated with the model

y_t = X_tβ + (1 + αX_tβ)u_t,   u_t ~ NID(0, σ²).   (53)

When α is greater than zero, this model has heteroskedastic errors with variance proportional to (1 + αX_tβ)². It is well known to users of the Box-Cox transformation that estimates of models like (36) are very sensitive to heteroskedasticity, and so it seems likely that a test of α = 0 in (53) will have power when the DGP is actually the Box-Cox model.
The contribution of the tth observation to the loglikelihood function for the Box-Cox model (36) is given by expression (39). In order to approximate this expression around λ = 1 by a sequence like (45), we must set

a_t(y_t) = log y_t − (u_t/σ²)(h(y_t) − Σ_{i=1}^{k} h(X¹_{ti})),   (54)

where

h(x) = x log x − x + 1.   (55)
The vector a depends only on the DGP, and it will be the same for any test we wish to analyze. If we were interested in the power of an LM test of (51) against (36), Z would be a vector and would be identical to the vector a, so that cos²φ would necessarily be unity. In fact, however, we are interested in testing (51) against the alternative Box-Cox model (52) and the heteroskedastic model (53), and in the Andrews test. For the first, we find that

Z_t = log y_t − (u_t/σ²)h(y_t).   (56)

For the second, we find that

Z_t = X_tβ(u_t²/σ² − 1),   (57)

and for the third, we find that

Z_t = (u_t/σ²)(h(ȳ_t) − Σ_{i=1}^{k} h(X¹_{ti})),   (58)

where ȳ_t is the non-stochastic part of y_t. The matrix G will be the same for all the tests and is easily derived.
In order actually to compute cos²φ by regression (49), we must specify the model and DGP more concretely than has been done so far. For simplicity, we examine a model with only one regressor in addition to the constant term. The regressor is constant-dollar quarterly GNP for Canada for the period 1955:1 to 1979:4 (100 observations). The constant term is chosen to be 1000, and the coefficient of the other regressor unity. Based on results in Davidson and MacKinnon (1985c), we expect cos²φ to be very sensitive to the choice of σ, so the calculations are performed for a range of values. In order to obtain reasonably accurate approximations to probability limits, n is set to 5000; this involves repeating the actual 100 observations on the regressor fifty times. The results presented in Table 1 are averages over 200 replications, and they are quite accurate.
The results in Table 1 are quite striking. When σ is small, cos²φ is very close to unity for the Andrews test and the alternative Box-Cox test, and it is very close to zero for the heteroskedasticity test. Note that σ = 10 is very small indeed, since the mean value of the dependent variable is 21394. For the Andrews test, cos²φ then declines monotonically towards zero as σ increases, a result previously noted by Davidson and MacKinnon (1985c). This could have been predicted by looking at expressions (54) and (58) for a_t and Z_t, respectively. The behavior of cos²φ for the other two tests is more interesting. For the alternative Box-Cox test, it initially declines, essentially to zero, but then begins to increase again as σ is increased beyond 500. By examining expressions (54) and (56), we can see the reason for this: when σ is large, the second terms in these expressions become small, and the first terms, which are log y_t in both cases, become dominant. For the heteroskedasticity test, cos²φ initially rises as σ increases from zero, but it reaches a maximum around σ = 1000 and then falls somewhat thereafter. The reason for this is not entirely clear; possibly the fact that (57) has no log y_t term begins to matter as σ gets large.
This example illustrates that, once we leave the realm of regression models, the power of a test may depend in quite a complicated way on the parameters of the DGP, as well as on the structure of the null, the alternative, and the DGP. Thus techniques for computing cos²φ may be quite useful in practice. The technique we have used here is very widely applicable and quite easy to use, but it may be computationally inefficient in many cases. When LM tests can be computed by means of double-length regressions (Davidson and MacKinnon, 1984), a more efficient but basically similar technique is available; see Davidson and MacKinnon (1985a, 1985c).
This example also shows that approximating a mixed non-regression direction by a regression direction may yield a test with adequate power, as in the case of the Andrews test with σ small, but it may also yield a test with very low power, as in the case of the same test with σ large. Despite the possibly large loss of power, there may sometimes be a reason to do this. Tests in regression directions are asymptotically insensitive to misspecification of the error process, such as non-normality. Moreover, the techniques of Section 4 can be used to make tests in regression directions robust to heteroskedasticity. Thus, by applying the artificial regression (30) to the Andrews test regression, one could obtain a heteroskedasticity-robust test of linear and loglinear models against Box-Cox alternatives. If such a test rejected the null hypothesis, and the sample was reasonably large, one could be quite confident that rejection was justified.
6. Conclusion
Any test of an econometric model can be thought of as a test in certain directions in
likelihood space. If the null is a regression model, these may be regression directions,
higher moment directions, or mixed non-regression directions. The power of a test
will depend on the model being tested, the process that generated the data, and
the directions in which the test is looking. Section 3 provided a detailed analysis of
what determines power when the null hypothesis is a univariate nonlinear regression
model, the DGP is also a regression model, and we are testing in regression directions.
Section 4 extended this analysis to the case of heteroskedasticity-robust tests and
obtained the surprising result that a test may not have highest power when looking
in the direction of the truth. Section 5 then considered a much more general case,
in which the null and the DGP are merely described by loglikelihood functions, and
tests may look in any direction. The results are remarkably similar to those for the
regression case, and they are concrete enough to allow one to compute the power of
test statistics in a variety of cases.
References
Andrews, D. F. (1971). “A note on the selection of data transformations,”
Biometrika, 58, 249–254.
Dastoor, N. K. (1983). “Some aspects of testing non-nested hypotheses,” Journal of
Econometrics, 21, 213–228.
Davidson, R., and J. G. MacKinnon (1981). “Several tests for model specification in
the presence of alternative hypotheses,” Econometrica, 49, 781–793.
Davidson, R., and J. G. MacKinnon (1982). “Some non-nested hypothesis tests and
the relations among them,” Review of Economic Studies, 49, 551–565.
Davidson, R., and J. G. MacKinnon (1984). “Model specification tests based on
artificial linear regressions,” International Economic Review, 25, 485–502.
Davidson, R., and J. G. MacKinnon (1985a). “The interpretation of test statistics,”
Canadian Journal of Economics, 18, 38–57.
Davidson, R., and J. G. MacKinnon (1985b). “Heteroskedasticity-robust tests in
regression directions,” Annales de l’INSEE, 59/60, 183–218.
Davidson, R., and J. G. MacKinnon (1985c). “Testing linear and loglinear
regressions against Box-Cox alternatives,” Canadian Journal of Economics, 18,
499–517.
Davidson, R., and J. G. MacKinnon (1987). “Implicit alternatives and the local
power of test statistics,” Econometrica, 55, 1305–1329.
Davidson, R., L. G. Godfrey, and J. G. MacKinnon (1985). “A simplified version of
the differencing test,” International Economic Review, 26, 639–647.
Engle, R. F. (1982). “A general approach to Lagrange Multiplier model diagnostics,”
Journal of Econometrics, 20, 83–104.
Engle, R. F. (1984). “Wald, likelihood ratio and Lagrange multiplier tests in
econometrics,” in Z. Griliches and M. Intriligator, eds., Handbook of Econometrics.
Amsterdam: North-Holland.
Godfrey, L. G., and M. R. Wickens (1981). “Testing linear and log-linear regressions
for functional form,” Review of Economic Studies, 48, 487–496.
Hausman, J. A. (1978). “Specification tests in econometrics,” Econometrica, 46,
1251–1272.
Holly, A. (1982). “A remark on Hausman’s specification test,” Econometrica, 50,
749–759.
Lancaster, T. (1984). “The covariance matrix of the information matrix test,”
Econometrica, 52, 1051–1053.
MacKinnon, J. G., and H. White (1985). “Some heteroskedasticity consistent
covariance matrix estimators with improved finite sample properties,” Journal of
Econometrics, 29, 305–325.
Mizon, G. E., and J.-F. Richard (1986). “The encompassing principle and its
application to testing non-nested hypotheses,” Econometrica, 54, 657–678.
Newey, W. K. (1985). “Maximum likelihood specification testing and conditional
moment tests,” Econometrica, 53, 1047–1070.
Pitman, E. J. G. (1949). “Notes on non-parametric statistical inference,” Columbia
University, mimeographed.
Savin, N. E., and K. J. White (1978). “Estimation and testing for functional form
and autocorrelation: A simultaneous approach,” Journal of Econometrics, 8, 1–12.
Tauchen, G. E. (1985). “Diagnostic testing and evaluation of maximum likelihood
models,” Journal of Econometrics, 30, 415–443.
White, H. (1980). “A heteroskedasticity-consistent covariance matrix estimator and
a direct test for heteroskedasticity,” Econometrica, 48, 817–838.
White, H. (1982). “Maximum likelihood estimation of misspecified models,”
Econometrica, 50, 1–25.
Zarembka, P. (1974). “Transformation of variables in econometrics,” in P. Zarembka
(ed.), Frontiers in Econometrics, New York, Academic Press.
Table 1. Calculations of cos²φ

   σ     Alternative Box-Cox Test    Andrews Test    Heteroskedasticity Test
   10            0.9784                 0.9913              0.0095
   20            0.9508                 0.9661              0.0349
   50            0.7907                 0.8198              0.1767
  100            0.4767                 0.5318              0.4574
  200            0.1439                 0.2207              0.7621
  500            0.0025                 0.0431              0.9342
 1000            0.1220                 0.0113              0.9530
 1500            0.2958                 0.0057              0.9377
 2000            0.4596                 0.0033              0.9115
 2500            0.5932                 0.0025              0.8765

All figures were calculated numerically using n = 5000 and 200 replications.
Standard errors never exceed 0.0022 and are usually much smaller.