arXiv:1610.02207v1 [math.ST] 7 Oct 2016
NEW TESTING PROCEDURES
FOR STRUCTURAL EQUATION MODELING
STEFFEN GRØNNEBERG AND NJÅL FOLDNES
Abstract. We introduce and evaluate a new class of hypothesis testing procedures for moment structures. The methods are valid under weak assumptions and include the well-known Satorra-Bentler adjustment as a special case. The proposed procedures also apply to difference testing among nested models. We prove the consistency of our approach. We introduce a bootstrap selection mechanism to optimally choose a p-value approximation for a given sample. Also, we propose bootstrap procedures for assessing the asymptotic robustness (AR) of the normal-theory maximum likelihood test, and for the key assumption underlying the Satorra-Bentler adjustment (Satorra-Bentler consistency). Simulation studies indicate that our new p-value approximations perform well even under severe nonnormality and realistic sample sizes, but that our tests for AR and Satorra-Bentler consistency require very large sample sizes to work well. R code for implementing our methods is provided.
1. Introduction
In testing hypotheses in psychometrics, test statistics often converge in law to
a mixture of independent chi squares, under the null hypothesis of correct model
specification. This paper presents novel methods for calculating p-values based on
such test statistics, and a novel selection procedure aimed at identifying the best p-
value among any given set of candidate p-value procedures. Although the proposed
methods can be used in the general setting of moment structure inference, we here
focus on the framework of structural equation modeling (SEM).
As shown in Shapiro (1983) and Satorra (1989), a large class of p-values in the context of SEM originates from convergence in distribution results (derived under the null hypothesis) for a test statistic $T_n$ based on $n$ observations, of the form
$$T_n \xrightarrow[n\to\infty]{D} \sum_{j=1}^{d} \lambda_j Z_j^2, \qquad Z_1,\dots,Z_d \sim N(0,1) \text{ IID}, \tag{1}$$
where $\lambda = (\lambda_1,\dots,\lambda_d)'$ consists of unknown population parameters. If $\lambda$ were known, eq. (1) would motivate the “oracle” p-value
$$p_n = P\Big(\sum_{j=1}^{d} \lambda_j Z_j^2 > T_n\Big). \tag{2}$$
Date: October 10, 2016.
The above probability is with respect to $Z_1,\dots,Z_d$, while $T_n$ is considered fixed. In a practical setting, however, $\lambda$ is unknown. Let $\hat\lambda$ be a consistent estimator of $\lambda$, i.e., $\hat\lambda \xrightarrow[n\to\infty]{P} \lambda$. In the present article we propose to estimate $p_n$ by
$$\hat p_n = P\Big(\sum_{j=1}^{d} \hat\lambda_j Z_j^2 > T_n\Big), \tag{3}$$
where the probability is with respect to $Z_1,\dots,Z_d$.
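Eq. (3) only requires the tail probability of a weighted sum of independent $\chi^2_1$ variables. The simulations in Section 6 evaluate such probabilities with Imhof's numerical method; the sketch below instead swaps in plain Monte Carlo, which is slower but makes the definition transparent. The function name `mc_pvalue` and its defaults are illustrative assumptions, not part of the authors' R code.

```python
import math
import random


def mc_pvalue(weights, t_obs, n_draws=200_000, seed=1):
    """Monte Carlo estimate of P(sum_j w_j * Z_j^2 > t_obs) with Z_j iid N(0, 1).

    Passing the estimated eigenvalues lambda-hat as `weights` gives eq. (3);
    passing the true lambda would give the oracle p-value of eq. (2).
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        q = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if q > t_obs:
            exceed += 1
    return exceed / n_draws


if __name__ == "__main__":
    # With a single weight equal to 1 the limit is chi^2_1, whose tail is erfc(sqrt(t/2)).
    t = 3.0
    print(mc_pvalue([1.0], t), math.erfc(math.sqrt(t / 2.0)))
```

The Monte Carlo error is of order $n_{draws}^{-1/2}$, so for p-values near conventional significance levels an exact inversion method such as Imhof's is preferable in practice.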
We show that for large samples the error originating from replacing $\lambda$ with $\hat\lambda$ vanishes, so that $\hat p_n - p_n$ converges in probability to 0. The estimator $\hat p_n$ defined above is the canonical member of a new class of estimators, obtained by grouping the $\hat\lambda_j$ by magnitude and replacing them by group means in order to reduce the variance of $\hat p_n$. Although the idea behind this new class of p-value approximations is simple, we are not aware of it appearing previously in the literature.
Since we introduce a whole class of p-value approximations, where no member
seems to be uniformly best in all conditions, we also introduce a selector to aid
the user in choosing which p-value approximation to apply. The core idea of this
selector is to choose the p-value approximation whose distribution is closest to the
uniform, as measured by the supremum distance. This is achieved through the non-
parametric bootstrap and is seen to work very well in our simulation experiments.
The paper is structured as follows. In Section 2 we review fit statistics of moment structures with a special emphasis on the well-known Satorra-Bentler (SB) statistic. Section 3 proposes a class of new procedures, which includes the SB statistic as a special case, to evaluate model fit and parameter restrictions in covariance models. We give conditions under which the estimators are consistent, which implies the fundamental p-value property of converging in distribution to a uniform distribution. In Section 4 we introduce a bootstrap procedure that selects, for a given sample, a good candidate among a list of p-value approximations. Next, in Section 5 we introduce a bootstrap test for assessing whether the normal-theory maximum likelihood test statistic may be trusted, i.e., whether asymptotic robustness holds. We also introduce a test for the consistency of the SB statistic, which may help decide whether the SB statistic may be trusted. Monte Carlo results on the performance of the proposed new methods are presented in Section 6. In the final section we discuss our findings and point out further directions for research. Proofs of theoretical results are presented in the appendix.
2. Fit statistics for moment structure models
Consider a $p$-dimensional vector of population moments $\sigma_\circ$. In covariance modeling, $\sigma_\circ$ consists of second-order moments, but in more general structural equation models the means may also be included in $\sigma_\circ$. The corresponding sample moment vector $s$ is assumed to converge in probability to $\sigma_\circ$, i.e., $s \xrightarrow[n\to\infty]{P} \sigma_\circ$, and
NEW TESTING PROCEDURES FOR MOMENT STRUCTURE MODELS 3
be asymptotically normal, i.e., $\sqrt{n}(s-\sigma_\circ) \xrightarrow[n\to\infty]{D} N(0,\Gamma)$. A structural equation model implies a certain parametrization $\theta \mapsto \sigma(\theta)$ with $\theta$ varying in a set $\Theta$. Let the free parameters in the proposed model be contained in the $q$-vector $\theta$. The model has $d = p - q$ degrees of freedom.
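For a pure covariance structure with $k$ observed variables, $p = k(k+1)/2$ (plus $k$ more if means are modeled), so the degrees of freedom follow from simple counting. The sketch below is an illustration with hypothetical helper names; the value $q = 31$ for Bollen's model used in Section 6 is not stated in the text but follows arithmetically from its 11 indicators (Y1 to Y11 in Figure 1) and $d = 35$.

```python
def n_moments(k, include_means=False):
    """Number p of non-redundant sample moments for k observed variables."""
    p = k * (k + 1) // 2  # distinct variances and covariances
    return p + k if include_means else p


def degrees_of_freedom(k, q, include_means=False):
    """d = p - q for a model with q free parameters."""
    return n_moments(k, include_means) - q


# Bollen's political democracy model (Section 6): k = 11 and d = 35, hence q = 31.
p = n_moments(11)
q = p - 35
print(p, q, degrees_of_freedom(11, q))
```

The constrained model $M_0$ of Section 6 adds eleven restrictions, which leaves $q = 20$ free parameters and $d = 46$.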
The model is said to be correctly specified if there is a $\theta_\circ \in \Theta$ such that $\sigma(\theta_\circ) = \sigma_\circ$. A very general class of estimators for $\theta_\circ$, introduced by Browne (1982, 1984), is obtained by minimising discrepancy functions $F = F(s,\sigma)$ that obey the following three conditions: $F(s,\sigma) \ge 0$ for all $s, \sigma$; $F(s,\sigma) = 0$ if and only if $s = \sigma$; and $F$ is jointly twice continuously differentiable. That is, we consider estimators obtained as
$$\hat\theta = \operatorname*{argmin}_{\theta\in\Theta} F(s, \sigma(\theta)).$$
It is well known that the widely used normal-theory maximum likelihood (NTML)
estimator is such a minimal discrepancy estimator.
Similarly, we may define the least false parameter configuration, which we denote by $\theta_\circ$. That is,
$$\theta_\circ = \operatorname*{argmin}_{\theta\in\Theta} F(\sigma_\circ, \sigma(\theta)).$$
Irrespective of the correctness of the model, we have $\hat\theta \xrightarrow[n\to\infty]{P} \theta_\circ$ under mild regularity conditions.
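One of several, mostly asymptotically equivalent, ways (see Satorra, 1989) of assessing the correctness of the model is to study $T_n = nF(s, \sigma(\hat\theta))$. If the model is misspecified, i.e., if $\sigma_\circ \ne \sigma(\theta_\circ)$, then $T_n \to \infty$ in probability, since $s \xrightarrow[n\to\infty]{P} \sigma_\circ \ne \sigma(\theta_\circ)$. Under correct model specification and other assumptions presented in Shapiro (1983) and Satorra (1989), we have $T_n = \sqrt{n}(s-\sigma_\circ)'\, U \sqrt{n}(s-\sigma_\circ) + o_P(1)$. Assume (for simplicity) that $\Delta' V \Delta$ is non-singular (see the comment immediately following eq. (9) in Satorra (1989)), where $\Delta$ is the $p\times q$ derivative matrix $\partial\sigma(\theta)/\partial\theta'$ evaluated at $\theta_\circ$, and $V = \frac{1}{2}\,\partial^2 F(s,\sigma)/\partial s\,\partial\sigma'$ evaluated at $(\sigma_\circ, \sigma_\circ)$. Then
$$U = V - V\Delta\{\Delta' V \Delta\}^{-1}\Delta' V. \tag{4}$$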
Note that $U$ has rank $d$. Since we assume $\sqrt{n}(s-\sigma_\circ) \xrightarrow[n\to\infty]{D} Q \sim N(0,\Gamma)$, the continuous mapping theorem now implies that $T_n \xrightarrow[n\to\infty]{D} Q'UQ$. By Theorem 1 in Box (1954), we have
$$T_n \xrightarrow[n\to\infty]{D} Q'UQ \overset{D}{=} \sum_{j=1}^{d} \lambda_j Z_j^2, \qquad Z_1,\dots,Z_d \sim N(0,1) \text{ IID}, \tag{5}$$
where $\lambda_1,\dots,\lambda_d$ are the $d$ non-zero eigenvalues of $U\Gamma$ under the standard scaling of the eigenvectors. That is, the parameters $\lambda$ in eq. (1) are eigenvalues of a certain matrix that depends both on the underlying distribution and on the proposed model. Note that estimating $U$ and $\Gamma$ is a standard problem in moment models, which we will not discuss in technical detail. The usual estimators are based on
replacing expectations with averages of the observed data, and the true least false parameter $\theta_\circ$ by the estimator $\hat\theta$. These estimators are readily available in software packages such as the R package lavaan (Rosseel, 2012). We here assume that consistent estimators $\hat U$, $\hat\Gamma$ and $\hat\lambda$ are given. We may use the plug-in method to form $\hat\lambda$, so that $\hat\lambda$ consists of the $d$ non-zero eigenvalues of $\hat U\hat\Gamma$ under the standard scaling of the eigenvectors.
Note that the asymptotically distribution-free (ADF) estimator of Browne (1984), where the estimate is obtained by minimising a quadratic form whose weight matrix is the inverse of a distribution-free estimate of $\Gamma$, yields a test statistic $T_{ADF}$ whose population eigenvalues are all equal to one. Hence ADF estimation leads to consistent p-values for model fit. However, ADF estimation is unstable in small samples, and it is well known that $T_{ADF}$ has unacceptably poor performance in small and medium sample sizes (Curran et al., 1996; Hu et al., 1992). Another test statistic with a consistent p-value approximation is the residual-based test statistic (Browne, 1984, eq. 2.20), which is not of the form $T_n = nF(s, \sigma(\hat\theta))$ investigated in the present article. Unfortunately, this statistic suffers from the same lack of acceptable finite-sample performance as $T_{ADF}$. Therefore, a more popular approach has been to use normal-theory based estimators, and to correct the test statistic for non-normality in the data. We now proceed to describe such methods.
Based on the convergence result in eq. (1), Satorra & Bentler (1994) proposed to rescale $T_n$ by dividing it by the mean value of the eigenvalues to form
$$T_{SB} = \frac{T_n}{\hat c}, \qquad \hat c = \frac{1}{d}\sum_{j=1}^{d}\hat\lambda_j.$$
Using $T_{SB}$, referred to a $\chi^2_d$ distribution, as a test statistic is a widely used SEM practice under conditions of non-normal data. Simulation studies report that $T_{SB}$ outperforms the NTML fit statistic $T_{ML}$ in such conditions, but that Type I error rates under $T_{SB}$ are seriously inflated under substantial excess kurtosis in the data (Bentler & Yuan, 1999; Nevitt & Hancock, 2004; Savalei, 2010; Foldnes & Olsson, 2015). Also, Yuan & Bentler (2010) theoretically demonstrated that the distribution of $T_{SB}$ departs from a chi-square with increasing dispersion of the eigenvalues in (1).
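Because a $\chi^2_d$ variable is a sum of $d$ independent $Z_j^2$, referring $T_{SB} = T_n/\hat c$ to $\chi^2_d$ is the same as evaluating $P(\sum_j \hat c Z_j^2 > T_n)$. A minimal standard-library sketch, assuming the eigenvalue estimates $\hat\lambda$ are already available (e.g., from lavaan); the helper names are hypothetical, and `chi2_sf` uses the standard closed-form upper tail of the chi-square distribution for integer degrees of freedom.

```python
import math


def chi2_sf(x, d):
    """Upper tail P(chi^2_d > x) for an integer d >= 1, via closed forms."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if d % 2 == 0:
        # Even d = 2m: exp(-y) * sum_{k=0}^{m-1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, d // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd d = 2m + 1: erfc(sqrt(y)) + exp(-y) * sum_{k=0}^{m-1} y^(k + 1/2) / Gamma(k + 3/2)
    total = math.erfc(math.sqrt(y))
    for k in range(d // 2):
        total += math.exp(-y) * y ** (k + 0.5) / math.gamma(k + 1.5)
    return total


def sb_pvalue(t_n, lam_hat):
    """Satorra-Bentler p-value: T_SB = T_n / c_hat referred to chi^2_d."""
    d = len(lam_hat)
    c_hat = sum(lam_hat) / d
    return chi2_sf(t_n / c_hat, d)
```

Only the mean $\hat c$ of the eigenvalues enters `sb_pvalue`, which is exactly why the SB correction cannot react to the dispersion of the eigenvalues discussed above.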
Recently Asparouhov & Muthén (2010) proposed a test statistic, obtained from $T_{ML}$ by scaling and shifting, that agrees with the reference chi-square distribution in both asymptotic mean and variance. This statistic, found to perform slightly better (Foldnes & Olsson, 2015) than a Satterthwaite-type test statistic proposed by Satorra & Bentler (1994), is given by
$$T_{SS} = \sqrt{\frac{d}{\operatorname{tr}\{(\hat U\hat\Gamma)^2\}}}\; T_n + d - \sqrt{\frac{d\,\{\operatorname{tr}(\hat U\hat\Gamma)\}^2}{\operatorname{tr}\{(\hat U\hat\Gamma)^2\}}}.$$
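Since $\operatorname{tr}(\hat U\hat\Gamma) = \sum_j\hat\lambda_j$ and $\operatorname{tr}\{(\hat U\hat\Gamma)^2\} = \sum_j\hat\lambda_j^2$, the scaled-and-shifted statistic can be written in terms of the eigenvalue estimates alone. The sketch below (a hypothetical helper, standard library only) also checks the defining property: if $T_n$ had the limit distribution in eq. (1) with $\lambda = \hat\lambda$, then $T_{SS}$ would have mean $d$ and variance $2d$, like $\chi^2_d$.

```python
import math


def scaled_shifted(t_n, lam_hat):
    """Return (T_SS, a, b) with T_SS = a * T_n + b, written via eigenvalues of U*Gamma."""
    d = len(lam_hat)
    tr1 = sum(lam_hat)                      # tr(U Gamma)
    tr2 = sum(v * v for v in lam_hat)       # tr((U Gamma)^2)
    a = math.sqrt(d / tr2)                  # scaling factor
    b = d - math.sqrt(d * tr1 ** 2 / tr2)   # shift
    return a * t_n + b, a, b


# Illustrative eigenvalues (a subset of Table 1, not a full eigenvalue set).
lam = [1.87, 1.59, 1.49]
t_ss, a, b = scaled_shifted(10.0, lam)
# Sum_j lam_j Z_j^2 has mean sum(lam) and variance 2 * sum(lam^2).
mean_ss = a * sum(lam) + b                      # should equal d
var_ss = a ** 2 * 2 * sum(v * v for v in lam)   # should equal 2d
print(mean_ss, var_ss)
```

When all eigenvalues equal one, the scale is 1 and the shift is 0, so $T_{SS}$ reduces to $T_n$.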
A quite different testing methodology is offered by the so-called Bollen-Stine bootstrap (Bollen & Stine, 1992), which is based on the non-parametric bootstrap (Efron & Tibshirani, 1994). Instead of starting with the fundamental result in eq. (1), one starts by transforming the sample observations $X_i$ into $\tilde X_i = \Sigma(\hat\theta)^{1/2} S_n^{-1/2} X_i$ for $i = 1, 2, \dots, n$, where $S_n$ and $\Sigma(\hat\theta)$ are the sample and model-implied covariance matrices, respectively. Since the model holds exactly in this transformed sample, we proceed by assuming that the transformed sample may serve as a proxy for the population from which the original sample was drawn. The Bollen-Stine p-value is then obtained by drawing bootstrap samples from the transformed sample and calculating the proportion of bootstrap test statistics that exceed the test statistic obtained from the original sample. The validity of this approach is derived in Beran & Srivastava (1985). Nevitt & Hancock (2001) report that Bollen-Stine bootstrapping outperformed the SB scaling approach under correct model specification at realistic sample sizes, having Type I errors slightly below the nominal level. Despite the promising performance of the Bollen-Stine bootstrap, it seems to be relatively understudied. In fact, we are not aware of any later simulation study that systematically evaluates its performance relative to other robust test statistics.
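The resampling step of the Bollen-Stine bootstrap can be sketched as follows. This is a minimal illustration under stated assumptions: the transformation $\tilde X_i = \Sigma(\hat\theta)^{1/2}S_n^{-1/2}X_i$ has already been applied, and `statistic` is a hypothetical callback that refits the model to a bootstrap sample and returns its test statistic (in practice this step is delegated to SEM software).

```python
import random


def bollen_stine_pvalue(transformed_sample, statistic, t_obs, n_boot=1000, seed=1):
    """Proportion of bootstrap test statistics exceeding the observed statistic.

    transformed_sample : list of transformed observations (the model holds exactly in it)
    statistic          : hypothetical callback, sample -> test statistic T_n
    t_obs              : test statistic computed from the original sample
    """
    rng = random.Random(seed)
    n = len(transformed_sample)
    exceed = 0
    for _ in range(n_boot):
        # Non-parametric bootstrap: n draws with replacement.
        boot = [transformed_sample[rng.randrange(n)] for _ in range(n)]
        if statistic(boot) > t_obs:
            exceed += 1
    return exceed / n_boot
```

Resampling from the transformed rather than the original observations is what makes the bootstrap distribution mimic the null hypothesis.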
Our selection methodology, described in Section 4, fuses the two ideas discussed above, combining the strength of the convergence in eq. (1) with the power of the non-parametric bootstrap. Our tests for AR and Satorra-Bentler consistency, described in Section 5, are also based on the non-parametric bootstrap. Before describing these bootstrap-based methods, we return to the fundamental convergence result in eq. (1) and present new approximations for the oracle p-value.
3. A new class of p-value approximations
In this section we introduce and establish the consistency of a new computational
technique for p-values. The proposed methodology applies as long as the null
distribution of a test statistic is a weighted sum of independent chi squares and the
weights can be estimated consistently. This means that the method may be used
both for conventional goodness-of-fit testing of a single proposed model, and for
nested model comparison tests. Consistency is established in Theorem 1, and the
proof is found in Appendix A.
The convergence result in eq. (1) is only valid if the model is correctly specified. But we note here that $U\Gamma$ is defined also when the model is misspecified, and that the number of non-zero eigenvalues is known to be $d$ from the model configuration. We may therefore speak of and estimate $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_d)'$ without knowing whether the model is correctly specified. We refer to the p-value in (3) as the full p-value approximation. We will also shortly introduce other estimators by combining the
$\hat\lambda_j$ in eq. (3) in ways that may reduce the variability of $\hat p_n$, although at the expense of consistency. This may be reasonable in situations where the full estimates are unstable, e.g., under small sample sizes and highly non-normal data. In fact, the familiar $T_{SB}$ procedure may be conceptualized as an (inconsistent) p-value approximation where the $\lambda_j$ are replaced by the mean value of the canonical estimates, i.e.,
$$\hat\lambda^{SB}_j = \frac{1}{d}\sum_{i=1}^{d}\hat\lambda_i = \hat c, \qquad j = 1,\dots,d,$$
and clearly
$$\hat p_{SB} = P\Big(\sum_{j=1}^{d}\hat\lambda^{SB}_j Z_j^2 > T_n\Big) = P\big(\chi^2_d > T_n/\hat c\big) = P\big(\chi^2_d > T_{SB}\big).$$
We obtain a valid approximation as long as $\hat\lambda \xrightarrow[n\to\infty]{P} \lambda$, as the following theorem shows. Note that we make no assumptions on $T_n$; that is, the approximation holds irrespective of the correctness of the model. Note also that we typically have $\|\hat\lambda - \lambda\| = O_P(n^{-1/2})$, so that, by Theorem 1, $\sqrt{n}[\hat p_n - p_n]$ stays bounded in probability.
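Theorem 1. Let $(T_n)$ be a sequence of random variables, and let $p_n = 1 - H(T_n; \lambda)$ and $\hat p_n = 1 - H(T_n; \hat\lambda)$, where $H(q; \lambda_1,\dots,\lambda_d) = P\big(\sum_{j=1}^{d}\lambda_j Z_j^2 \le q\big)$. If $\hat\lambda \xrightarrow[n\to\infty]{P} \lambda$, where $\lambda$ only has positive elements, then $\hat p_n - p_n = \|\hat\lambda - \lambda\|\, O_P(1)$, and hence $\hat p_n - p_n \xrightarrow[n\to\infty]{P} 0$.
Proof. See Appendix A.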
We see that the $T_{SB}$ procedure is a valid large-sample approximation to $p_n$ if $\lambda = c\cdot(1, 1, \dots, 1)'$. If this is true in the population, Theorem 1 implies the consistency of the $T_{SB}$ procedure. The only crucial assumption of the theorem is that each $\lambda_j > 0$. Recall that in goodness-of-fit testing in SEM, we are guaranteed $d$ non-zero eigenvalues by the Box theorem; see the discussion near eq. (5). Hence this assumption is innocuous.
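A direct consequence of Theorem 1 is that $\hat p_n$ fulfills the following property, considered fundamental to p-values.
Corollary 1. Suppose the conditions of Theorem 1 hold. If $T_n \xrightarrow[n\to\infty]{D} \sum_{j=1}^{d}\lambda_j Z_j^2$ for $Z_1,\dots,Z_d \sim N(0,1)$ IID, then $\hat p_n \xrightarrow[n\to\infty]{D} U[0,1]$.
Proof. Since (1) holds and $H(\cdot;\lambda)$ is the continuous distribution function of the limit, the continuous mapping theorem and the probability integral transform give $p_n = 1 - H(T_n;\lambda) \xrightarrow[n\to\infty]{D} U[0,1]$. The corollary then follows from the standard asymptotic result that if $X_n - Y_n = o_P(1)$ and $X_n \xrightarrow[n\to\infty]{D} Z$, then also $Y_n \xrightarrow[n\to\infty]{D} Z$.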
From our perspective of aiming at consistent p-values, the $T_{SB}$ procedure is well motivated under an equality constraint among all eigenvalues. But if the eigenvalues differ considerably in the population, this restriction may lead to poor estimates
due to a high bias. In contrast, $\hat p_n$ is always a valid approximation for $p_n$ in that it is consistent, and hence asymptotically unbiased. However, in finite samples the variability of $\hat\lambda$ may lead to excessive variability in $\hat p_n$. We therefore wish to find middle grounds between the SB approximation and $\hat p_n$. This amounts to using the consistent estimates $\hat\lambda_j$ to calculate new weights that may reduce the sampling variability of $\hat p_n$, and at the same time reduce the effect of the inconsistency in SB.
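Consider for instance the following split-half approximation where, with the estimates sorted in decreasing order, the upper half of the eigenvalues are replaced by their mean value, and likewise for the lower half of the eigenvalues:
$$\hat p_{n,half} = P\Big(\sum_{j=1}^{d}\tilde\lambda_j Z_j^2 > T_n\Big),$$
where
$$\tilde\lambda_1 = \cdots = \tilde\lambda_{\lceil d/2\rceil} = \frac{1}{\lceil d/2\rceil}\sum_{j=1}^{\lceil d/2\rceil}\hat\lambda_j$$
and
$$\tilde\lambda_{\lceil d/2\rceil+1} = \cdots = \tilde\lambda_d = \frac{1}{d - \lceil d/2\rceil}\sum_{j=\lceil d/2\rceil+1}^{d}\hat\lambda_j.$$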
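This procedure allows the p-value approximation an additional degree of freedom compared to the SB statistic, where all eigenvalues are estimated to be equal to each other. A whole class of middle grounds between the full $\hat p_n$ and $\hat p_{SB}$ can be defined as follows. Choose cut-off integers $1 < \tau_1 < \tau_2 < \cdots < \tau_k \le d$ with $1 \le k < d$, and set $\tau_0 = 1$ and $\tau_{k+1} = d + 1$. For $\tau_{l-1} \le i < \tau_l$, $l = 1,\dots,k+1$, let
$$\tilde\lambda_i = \frac{1}{\tau_l - \tau_{l-1}}\sum_{j=\tau_{l-1}}^{\tau_l - 1}\hat\lambda_j. \tag{6}$$
Let us denote this choice by $\tilde\lambda(\tau) = (\tilde\lambda_1(\tau), \dots, \tilde\lambda_d(\tau))'$. The proposed p-value estimator is then
$$\hat p_n(\tau) = P\Big(\sum_{j=1}^{d}\tilde\lambda_j(\tau) Z_j^2 > T_n\Big).$$
The split-half approximation corresponds to the single cut-off $\tau_1 = \lceil d/2\rceil + 1$, and the full approximation $\hat p_n$ to the $k = d-1$ cut-offs $\tau_l = l + 1$.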
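The grouping in eq. (6) can be sketched as below, assuming the estimates are sorted in decreasing order and using 1-based cut-offs as in the text. The function names are hypothetical. No cut-offs reproduce the SB weights, one cut-off at $\lceil d/2\rceil + 1$ gives the split-half weights, and $d-1$ cut-offs give the full approximation; the resulting weights can then be passed to any routine for the tail probability of a weighted sum of $\chi^2_1$ variables, such as Imhof's method.

```python
def grouped_weights(lam_hat, cutoffs):
    """Replace eigenvalue estimates by group means, following eq. (6).

    lam_hat : estimates sorted in decreasing order, lam_hat[0] = lambda_1
    cutoffs : 1-based integers 1 < tau_1 < ... < tau_k <= d; group l holds
              indices tau_{l-1}, ..., tau_l - 1 with tau_0 = 1 and tau_{k+1} = d + 1
    """
    d = len(lam_hat)
    taus = [1] + list(cutoffs) + [d + 1]
    weights = []
    for lo, hi in zip(taus[:-1], taus[1:]):
        group = lam_hat[lo - 1:hi - 1]  # 0-based slice of indices lo, ..., hi - 1
        mean = sum(group) / len(group)
        weights.extend([mean] * len(group))
    return weights


def half_cutoffs(d):
    """Split-half choice: a single cut-off at ceil(d/2) + 1 (none when d = 1)."""
    return [(d + 1) // 2 + 1] if d > 1 else []


lam = [4.0, 2.0, 1.0, 1.0]
print(grouped_weights(lam, []))               # SB: [2.0, 2.0, 2.0, 2.0]
print(grouped_weights(lam, half_cutoffs(4)))  # split-half: [3.0, 3.0, 1.0, 1.0]
print(grouped_weights(lam, [2, 3, 4]))        # full: [4.0, 2.0, 1.0, 1.0]
```

Every choice of cut-offs preserves the sum of the weights, so all members of the class share the mean $\sum_j\hat\lambda_j$ of the estimated limit distribution and differ only in its shape.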
An extension of the above framework concerns tests of nested hypotheses in SEM. Due to their great practical importance, we here include a short discussion of this special case. We again focus on the statistic $T_n$, since this statistic is typically asymptotically equivalent to other tests of interest, as described in Satorra (1989). Following Satorra (1989), let $H: \sigma = \sigma(\theta), \theta\in\Theta$ and $H_0: \sigma = \sigma(\theta), \theta\in\Theta_0$, where $\Theta_0 = \{\theta\in\Theta : a(\theta) = 0\}$ for some continuously differentiable function $a$. We assume that the matrix $\partial a(\theta)/\partial\theta'$ has full row rank, say $m$. We let
$$\hat\theta = \operatorname*{argmin}_{\theta\in\Theta} F(s, \sigma(\theta)), \qquad \tilde\theta = \operatorname*{argmin}_{\theta\in\Theta_0} F(s, \sigma(\theta))$$
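and $T_n = nF(s, \sigma(\hat\theta))$ and $\tilde T_n = nF(s, \sigma(\tilde\theta))$. Under $H_0$ and the conditions of Lemma 1 (iv) in Satorra (1989) we have
$$T_n = \sqrt{n}(s-\sigma_\circ)'\, U \sqrt{n}(s-\sigma_\circ) + o_P(1),$$
$$\tilde T_n = \sqrt{n}(s-\sigma_\circ)'\, \tilde U \sqrt{n}(s-\sigma_\circ) + o_P(1),$$
for matrices $U$ and $\tilde U$ following the formula of eq. (4) under $H$ and $H_0$, respectively. Using the basic algebraic fact that $x'Ax - x'Bx = x'(A-B)x$, we conclude that the difference statistic is of the form
$$\tilde T_n - T_n = \sqrt{n}(s-\sigma_\circ)'\, U_d \sqrt{n}(s-\sigma_\circ) + o_P(1),$$
where $U_d = \tilde U - U$ has rank $m$.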
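By the continuous mapping theorem, the convergence $\sqrt{n}(s-\sigma_\circ) \xrightarrow[n\to\infty]{D} N(0,\Gamma)$, and Theorem 1 in Box (1954), we therefore have that
$$\tilde T_n - T_n \xrightarrow[n\to\infty]{D} \sum_{j=1}^{m}\alpha_j Z_j^2, \qquad Z_1,\dots,Z_m \sim N(0,1) \text{ IID}, \tag{7}$$
where $\alpha_1,\dots,\alpha_m$ are the $m$ non-zero eigenvalues of $U_d\Gamma$.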
Distribution-free consistent estimators $\hat U_d$ and $\hat\Gamma$ of $U_d$ and $\Gamma$ are found and discussed in Satorra & Bentler (2001), and we do not review them here. Again, the standard estimators can be found in software such as the R package lavaan. One then forms $\hat\alpha = (\hat\alpha_1,\dots,\hat\alpha_m)'$ equal to the $m$ largest eigenvalues of $\hat U_d\hat\Gamma$ and calculates the full p-value approximation
$$\hat p_n = P\Big(\sum_{j=1}^{m}\hat\alpha_j Z_j^2 > \tilde T_n - T_n\Big).$$
We remark that a single equality constraint, say $\beta_{i,j} = 0$, can be treated as a special case of the above framework. In this case the number of restrictions is $m = 1$, and hence the limiting distribution in eq. (7) is a scaled $\chi^2_1$. The SB and the proposed p-value approximations then coincide exactly, since with a single eigenvalue both reduce to $P\big(\chi^2_1 > (\tilde T_n - T_n)/\hat\alpha_1\big)$. Note that Theorem 1 implies that these procedures are consistent.
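With a single restriction, $P(\hat\alpha_1 Z_1^2 > t) = P(|Z_1| > \sqrt{t/\hat\alpha_1}) = \operatorname{erfc}\big(\sqrt{t/(2\hat\alpha_1)}\big)$, which is exactly the SB p-value obtained by referring $t/\hat\alpha_1$ to $\chi^2_1$. A one-line sketch with a hypothetical helper name, standard library only:

```python
import math


def one_restriction_pvalue(t_diff, alpha_hat):
    """P(alpha_hat * Z^2 > t_diff) for Z ~ N(0, 1), i.e. the case m = 1 of eq. (7)."""
    # P(Z^2 > u) = P(|Z| > sqrt(u)) = erfc(sqrt(u / 2)) with u = t_diff / alpha_hat.
    return math.erfc(math.sqrt(max(t_diff, 0.0) / (2.0 * alpha_hat)))


# A difference statistic of 7.68 with alpha_hat = 2 is judged like a chi^2_1 value of 3.84.
print(one_restriction_pvalue(7.68, 2.0))
```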
4. A selection algorithm for p-value approximations
The framework of the last section leads to several competing p-value approxi-
mations, and we next introduce a way of selecting among these. Our selector is
inspired by Beran & Srivastava (1985), the Bollen-Stine bootstrap (Bollen & Stine,
1992), and the non-parametric focused information criterion of Jullum & Hjort
(2016, forthcoming).
We wish to select the p-value approximation $\hat p_n$ whose distribution is closest to the uniform distribution under the null hypothesis. We formalize this by estimating
the supremum distance between the cumulative distribution function of $\hat p_n$ under the null hypothesis and the uniform distribution, i.e., we approximate
$$D_n = \sup_{0\le x\le 1}\big|P_{H_0}(\hat p_n \le x) - x\big|$$
for each p-value approximation, and select the method with the smallest value of $D_n$. Here $P_{H_0}$ is the probability measure induced by a data-generating distribution that fulfills the null hypothesis while staying close to the true data-generating mechanism; we take it to be the distribution of $\Sigma(\theta_\circ)^{1/2}\Sigma^{-1/2}X_i$, where $\Sigma$ is the true covariance matrix. Under $P_{H_0}$, p-values should be uniformly distributed. If we consider asymptotically consistent p-values, minimizing $D_n$ means that we choose the approximation whose convergence has been best achieved at our sample size $n$.
The approximation to $D_n$ is done via the non-parametric bootstrap, based on the transformed sample $\tilde X_i = \Sigma(\hat\theta)^{1/2}S_n^{-1/2}X_i$ for $i = 1, 2, \dots, n$, as described in Algorithm 1. The supremum in Algorithm 1 is the test statistic of the Kolmogorov-Smirnov test, which is implemented in most statistical software packages. Formally, we use the empirical distribution function $\hat P_n$ of $(\tilde X_i)$ as an approximation to $P_{H_0}$, and approximate the probability $P_{H_0}(\hat p_n \le x)$ through resampling. We then plug this approximation into $D_n$ to obtain $\hat D_n$ for each p-value approximation.
We note that this selector may be used to choose among any p-value approximations for hypothesis testing in moment structures, not just the suggestions in Section 3. Also, $D_n$ is only one of many possible success criteria. One could also investigate the mean squared error of the approximation, or the distance from $P_{H_0}(\hat p_n \le x)$ to $x$ at a particular point $x$. In our simulations, the performance of $D_n$ as a selection criterion was overall satisfactory.
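Algorithm 1 Selection algorithm
procedure Select(sample, B), given L candidate p-value approximations
    $\tilde X_i \leftarrow \Sigma(\hat\theta)^{1/2}S_n^{-1/2}X_i$ for $i = 1, 2, \dots, n$
    for k ← 1, ..., B do
        boot.sample ← draw n observations with replacement from the transformed sample $(\tilde X_i)$
        for l ← 1, ..., L do
            $\hat p_{n,l,k}$ ← p-value approximation l computed from boot.sample
        end for
    end for
    for l ← 1, ..., L do
        $\hat D_{B,n,l} \leftarrow \sup_{0\le x\le 1}\big|B^{-1}\sum_{k=1}^{B} I\{\hat p_{n,l,k} \le x\} - x\big|$
    end for
    return $\operatorname{argmin}_{1\le l\le L}\hat D_{B,n,l}$
end procedure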
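The final loop of Algorithm 1 only needs the $B$ bootstrap p-values of each candidate. The sketch below assumes these have already been computed (each one requires refitting the model to a bootstrap draw from the transformed sample, which is not shown) and evaluates the Kolmogorov-Smirnov distance to the uniform distribution in closed form from the sorted values. Function and candidate names are illustrative.

```python
def ks_distance_to_uniform(pvals):
    """sup_x |F_B(x) - x|, where F_B is the empirical CDF of the B p-values."""
    p = sorted(pvals)
    b = len(p)
    # The supremum is attained at a jump of F_B: compare both sides of each jump.
    return max(max((i + 1) / b - v, v - i / b) for i, v in enumerate(p))


def select_approximation(boot_pvals):
    """boot_pvals maps a candidate name to its B bootstrap p-values.

    Returns the candidate with the smallest estimated D, plus all distances.
    """
    distances = {name: ks_distance_to_uniform(ps) for name, ps in boot_pvals.items()}
    best = min(distances, key=distances.get)
    return best, distances


candidates = {
    "SB": [0.01, 0.02, 0.03, 0.04, 0.05],   # far too small: rejects too often
    "EFULL": [0.1, 0.3, 0.5, 0.7, 0.9],     # spread evenly over [0, 1]
}
print(select_approximation(candidates)[0])  # -> EFULL
```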
5. Hypothesis tests for Satorra-Bentler consistency and asymptotic robustness
In this section we propose a bootstrap procedure for testing Satorra-Bentler
consistency, that is, that all non-zero eigenvalues are equal. This also leads naturally
to a test for asymptotic robustness (AR), that is, that all non-zero eigenvalues are
equal to 1. Such tests may help a practitioner to decide whether it is advisable to
apply the NTML test, the SB test, or to instead use the Bollen-Stine bootstrap or
the new procedures proposed in the present article.
There is a substantial body of theoretical literature on AR (Shapiro, 1987; Browne & Shapiro, 1988; Amemiya & Anderson, 1990; Satorra & Bentler, 1990), where exact conditions are given that involve certain relationships between $\Gamma$ and $\Delta$ that must hold for $T_{ML}$ to retain its asymptotic chi-square distribution under non-normality. However, these conditions are hard to check in practice, and currently no practical procedure exists for verifying asymptotic robustness in a real-world setting (Yuan, 2005, p. 118). Similarly, we are unaware of the existence of tests for SB consistency. This lack of tests might be due to the fact that testing statements concerning the eigenvalues of $U\Gamma$ involves testing statements about higher-order moment properties of a distribution. Without detailed parametric assumptions on the data, it seems very difficult to construct tests that perform well in small-sample situations. We therefore expect our proposed procedures to require a large sample size to attain Type I error rates close to the nominal level. This is confirmed in the simulation experiment in Section 6.3.
The proposed bootstrap test is summarized in Algorithm 2 and is inspired by Sec-
tion 4.2 in Beran & Srivastava (1985). A proof of its consistency, which we do not
provide, seems to require a non-trivial extension of the theory contained in Beran
(1984); Beran & Srivastava (1985). We note that an important difference between
our suggested test and the procedures in Section 4.2 in Beran & Srivastava (1985),
who work with eigenvalues of empirical covariance (i.e., symmetric) matrices, is
that UΓ is typically not symmetrical.
Let $E$ be the matrix of normalised (complex) eigenvectors sorted by descending values of the eigenvalues $\lambda$ of $U\Gamma$. We have $U\Gamma = E\Delta E^{-1}$ (Meyer, 2000, p. 514), where
$$\Delta = \begin{pmatrix}\Delta_d & 0\\ 0 & 0\end{pmatrix},$$
and where $\Delta_d$ is the diagonal matrix with elements $\lambda_1,\dots,\lambda_d$. Define
$$A = c^{1/2}\, E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}, \tag{8}$$
where $c$ denotes the mean value of the eigenvalues $\lambda_1,\dots,\lambda_d$. We propose the following bootstrap procedure. Let $\hat A$ be estimated from the original sample, by replacing $E$, $\Delta$, $c$ with $\hat E$, $\hat\Delta$, $\hat c$. For each bootstrap sample drawn from the original
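sample, we calculate $\hat U_{boot}\hat\Gamma_{boot}$ and form the matrix
$$W_n^* = \hat A\,\hat U_{boot}\hat\Gamma_{boot}\,\hat A.$$
The crucial observation is now that $W_n^*$ converges to a matrix for which the null hypothesis is true, that is, whose non-zero eigenvalues are all equal. To see this, note that
$$\begin{aligned}
W_n^* &\xrightarrow[n\to\infty]{P} c\,E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}\cdot U\Gamma\cdot E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}\\
&= c\,E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}E\Delta E^{-1}E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}\\
&= c\,E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\Delta_d & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1} = E\begin{pmatrix}cI_d & 0\\ 0 & 0\end{pmatrix}E^{-1},
\end{aligned}$$
where the last matrix has $d$ non-zero eigenvalues, all equal to $c$. In each bootstrap sample, the $d$ largest eigenvalues of $W_n^*$ are then computed as $\hat\lambda_{boot}$. This process is repeated many times, and we obtain realizations $\hat\lambda_{k,boot}$, giving us information about the sampling variability of the estimated eigenvalues under the null hypothesis of identical eigenvalues.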
The above procedure may also be adapted to test for asymptotic robustness of the NTML statistic $T_{ML}$, that is, whether $\lambda_j = 1$ for all $j = 1,\dots,d$. By setting $c = 1$ in eq. (8), Algorithm 2 then produces a p-value for the test of the consistency of $T_{ML}$-based testing. Since this test does not need to estimate $c$, its Type I error rate should converge to the nominal level slightly faster than in the general case. The matrix that is bootstrapped is then
$$W_n^* = \hat A_1\,\hat U_{boot}\hat\Gamma_{boot}\,\hat A_1,$$
where $\hat A_1$ is the estimator of
$$A_1 = E\begin{pmatrix}\Delta_d^{-1/2} & 0\\ 0 & 0\end{pmatrix}E^{-1}.$$
We assume that an extension of Corollary 4 in Beran & Srivastava (1985) also holds in our setting. That corollary requires the test statistic $h(\lambda)$ to be non-negative and zero under the null hypothesis, to have partial derivatives that are zero under the null hypothesis, and to have a positive definite matrix of second derivatives under the null hypothesis. The additional requirement that the partial derivatives vanish under the null hypothesis means that we must consider two different test statistics, adapted from the examples in Section 4.3 in Beran & Srivastava (1985). For asymptotic robustness, this holds for
$$h_{AR}(\lambda) = d\log\Big[d^{-1}\sum_{j=1}^{d}\lambda_j\Big] - \log\Big[\prod_{j=1}^{d}\lambda_j\Big].$$
For Satorra-Bentler consistency, this holds for
$$h_{SB}(\lambda) = \log[(\lambda_1 + \lambda_d)^2] - \log[4\lambda_1\lambda_d].$$
Algorithm 2 summarizes this discussion.
Algorithm 2 Bootstrap testing for Satorra-Bentler consistency and asymptotic robustness
procedure Bootstrap(sample, B)
    Calculate $\hat U$, $\hat\Gamma$, $\hat A$, $\hat A_1$ from sample
    $\hat\lambda$ ← the $d$ largest eigenvalues of $\hat U\hat\Gamma$
    $T_{n,SB} \leftarrow h_{SB}(\hat\lambda)$
    $T_{n,AR} \leftarrow h_{AR}(\hat\lambda)$
    for k ← 1, ..., B do
        boot.sample ← draw with replacement from sample
        $\hat U_{boot}\hat\Gamma_{boot}$ ← based on boot.sample
        $W^*_{n,SB} \leftarrow \hat A\,\hat U_{boot}\hat\Gamma_{boot}\,\hat A$
        $\hat\lambda^{SB}_{k,boot} = (\hat\lambda^{SB}_{k,1,boot},\dots,\hat\lambda^{SB}_{k,d,boot})'$ ← the $d$ largest eigenvalues of $W^*_{n,SB}$
        $T_{n,k,SB} \leftarrow h_{SB}(\hat\lambda^{SB}_{k,boot})$
        $W^*_{n,AR} \leftarrow \hat A_1\,\hat U_{boot}\hat\Gamma_{boot}\,\hat A_1$
        $\hat\lambda^{AR}_{k,boot} = (\hat\lambda^{AR}_{k,1,boot},\dots,\hat\lambda^{AR}_{k,d,boot})'$ ← the $d$ largest eigenvalues of $W^*_{n,AR}$
        $T_{n,k,AR} \leftarrow h_{AR}(\hat\lambda^{AR}_{k,boot})$
    end for
    return $B^{-1}\sum_{k=1}^{B} I\{T_{n,k,SB} > T_{n,SB}\}$ and $B^{-1}\sum_{k=1}^{B} I\{T_{n,k,AR} > T_{n,AR}\}$
end procedure
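The statistics $h_{AR}$ and $h_{SB}$ and the final step of Algorithm 2 are simple functions of the eigenvalue vectors. A minimal sketch, assuming the bootstrap eigenvalues (the $d$ largest eigenvalues of $W^*_{n,SB}$ or $W^*_{n,AR}$, sorted in decreasing order) have already been computed with a linear-algebra library; the helper names are hypothetical. Both statistics are non-negative by the arithmetic-geometric mean inequality and vanish when all eigenvalues are equal.

```python
import math


def h_ar(lam):
    """h_AR(lambda) = d * log(mean(lambda)) - sum_j log(lambda_j)."""
    d = len(lam)
    return d * math.log(sum(lam) / d) - sum(math.log(v) for v in lam)


def h_sb(lam):
    """h_SB(lambda) = log((lambda_1 + lambda_d)^2) - log(4 * lambda_1 * lambda_d).

    lam must be sorted in decreasing order, so lam[0] is the largest and
    lam[-1] the smallest eigenvalue.
    """
    l1, ld = lam[0], lam[-1]
    return math.log((l1 + ld) ** 2) - math.log(4.0 * l1 * ld)


def bootstrap_pvalue(t_obs, t_boot):
    """Share of bootstrap statistics exceeding the observed one (last step of Algorithm 2)."""
    return sum(t > t_obs for t in t_boot) / len(t_boot)


# Illustrative use with the five largest Distribution 2 eigenvalues of Table 1.
lam_top = [1.87, 1.59, 1.49, 1.44, 1.43]
print(h_ar(lam_top), h_sb(lam_top))
```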
6. Monte Carlo evaluations
In this section we evaluate the proposed procedures by Monte Carlo methods. We first evaluate our new class of p-value approximations in the setting of goodness-of-fit testing for a single model. Specifically, two members of this class are evaluated, $\hat p_n$ and $\hat p_{n,half}$, referred to as the full and half eigenvalue approximations, respectively. We then consider chi-square difference testing for two nested models. In both cases we evaluate the selection procedure in Algorithm 1, where the candidates for selection are SB and the full and half eigenvalue approximations. Finally, we evaluate the empirical performance of the proposed bootstrap tests for SB consistency and for AR. We here limit ourselves to studying the empirical performance of the procedures in controlling Type I error rates, leaving the topic of power for future studies.
Our model is the political democracy model discussed by Bollen in his textbook (Bollen, 1989); see Figure 1, where the residual errors are not depicted for ease of presentation. There are four measures of political democracy measured twice (in 1960 and 1965), and three measures of industrialization measured once (in 1960). The unconstrained model $M_1$ has $d = 35$ degrees of freedom. For nested model testing, we also consider a constrained model $M_0$, nested within $M_1$, with $d = 46$ degrees of freedom, which imposes ten equalities among unique variances and residual covariances, and one equality between two factor loadings.
Model estimation and eigenvalues were computed using the R package lavaan (Rosseel, 2012), while the p-values of type $\hat p_n$ were calculated with the imhof procedure in the package CompQuadForm (Duchesne & de Micheaux, 2010). In each simulation cell we generated 2000 samples. For each sample, 1000 bootstrap samples were drawn.
Figure 1. Bollen’s political democracy model. dem60: Democracy in 1960. dem65: Democracy in 1965. ind60: Industrialisation in 1960.
[Path diagram: latent factors ind60, dem60 and dem65 with observed indicators Y1 to Y11; residual errors not depicted.]
Distribution 2
1.87 1.59 1.49 1.44 1.43 1.42 1.38
1.36 1.35 1.34 1.31 1.29 1.26 1.13
1.12 1.11 1.11 1.10 1.10 1.09 1.09
1.08 1.08 1.07 1.07 1.07 1.06 1.05
1.04 1.03 1.03 1.03 1.02 1.02 1.01
Distribution 3
4.16 3.24 2.88 2.82 2.70 2.67 2.51
2.41 2.35 2.31 2.16 2.12 2.03 1.52
1.50 1.47 1.43 1.40 1.38 1.36 1.35
1.33 1.32 1.29 1.27 1.25 1.21 1.20
1.13 1.13 1.11 1.09 1.08 1.08 1.06
Table 1. Eigenvalues $\lambda_i$, for $i = 1,\dots,35$, for Bollen’s political democracy model, assuming correct model specification. Distributions 2 and 3 have univariate skewness and kurtosis $s = 1, k = 7$ and $s = 2, k = 21$, respectively.
Distribution     n    NTML   SB     SS     BOST   EFULL  EHALF  SEL    ORAC
Normal         100   0.077  0.086  0.050  0.023  0.036  0.050  0.051  0.077
               300   0.055  0.053  0.052  0.037  0.037  0.043  0.045  0.055
               900   0.068  0.067  0.050  0.059  0.063  0.064  0.065  0.068
Distribution 2 100   0.215  0.108  0.019  0.035  0.021  0.048  0.042  0.057
               300   0.197  0.070  0.018  0.053  0.024  0.045  0.045  0.057
               900   0.219  0.063  0.033  0.054  0.037  0.051  0.051  0.059
Distribution 3 100   0.488  0.164  0.017  0.038  0.009  0.072  0.031  0.024
               300   0.591  0.094  0.013  0.068  0.013  0.050  0.038  0.045
               900   0.685  0.076  0.017  0.059  0.015  0.042  0.038  0.046
Table 2. Type I error rates for testing model $M_1$. Normal: multivariate normal distribution. Distribution 2: skewness 1 and kurtosis 7. Distribution 3: skewness 2 and kurtosis 21. NTML = normal-theory likelihood ratio test. SB = Satorra-Bentler. SS = scaled and shifted. BOST = Bollen-Stine bootstrap. EFULL = full eigenvalue approximation, $\hat p_n$. EHALF = half eigenvalue approximation, $\hat p_{n,half}$. SEL = p-value obtained from the selection algorithm. ORAC = oracle p-value $p_n$.
Distribution     n    SB     EHALF  EFULL
Normal         100   0.054  0.931  0.015
               300   0.448  0.516  0.036
               900   0.507  0.263  0.231
Distribution 2 100   0.001  0.865  0.135
               300   0.050  0.894  0.055
               900   0.153  0.783  0.063
Distribution 3 100   0.000  0.449  0.551
               300   0.001  0.733  0.267
               900   0.004  0.846  0.150
Table 3. Choice proportions for the selection algorithm, testing model $M_1$. SB = Satorra-Bentler. EHALF = half eigenvalue approximation, $\hat p_{n,half}$. EFULL = full eigenvalue approximation, $\hat p_n$.
6.1. Goodness-of-fit testing for $M_1$. Three population distributions were considered. Distribution 1 was a multivariate normal distribution. The non-normal distributions were generated using the transform of Vale & Maurelli (1983), with Distribution 2 having univariate skewness 1 and kurtosis 7, and Distribution 3 having skewness 2 and kurtosis 21. These distributional characteristics are the same
as those used in the influential study by Curran et al. (1996), and replicated in the bootstrap study by Nevitt & Hancock (2001). The “oracle” eigenvalues associated with Distributions 2 and 3, numerically calculated from very large samples, are given in Table 1. The values are clearly spread out, and the spread increases when we move from Distribution 2 to Distribution 3. Note that under Distribution 1, we have $\lambda = (1, 1, \dots, 1)'$.
Three sample sizes were used: $n = 100$, 300 and 900. With three distributions and three sample sizes, the full factorial design has nine conditions. In each sample we calculated the p-values associated with the established test statistics: the normal-theory maximum likelihood ratio test (NTML), Satorra-Bentler scaling (SB), the scaled-and-shifted statistic (SS) and the Bollen-Stine bootstrap test (BOST). We also calculated in each sample the full eigenvalue approximation $\hat p_n$ (EFULL) and the split-half eigenvalue approximation $\hat p_{n,\mathrm{half}}$ (EHALF). The selection algorithm (SEL) p-value was calculated using a candidate set with members SB, EHALF and EFULL, and using $\hat D_n$ as the criterion function. Finally, the oracle (ORAC) p-value $p_n$ was calculated using the values in Table 1. This allows us to evaluate how well the asymptotic result in eq. (1) applies in finite-sample conditions.
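Several of the p-values above share one form. Writing $H(q; l)$ for the distribution function of $\sum_{j=1}^d l_j Z_j^2$ with independent standard normal $Z_j$, ORAC is $1 - H(T_n; \lambda)$ and EFULL is $1 - H(T_n; \hat\lambda)$; NTML replaces every weight by 1, and SB (assuming its scaling constant is the average of the $\hat\lambda_j$, as in the usual trace formula) replaces every weight by that average. The sketch below is a hedged illustration of this common structure, not the authors' R code: it estimates $1 - H(t; l)$ by plain Monte Carlo, whereas an exact or numerical method for quadratic forms would normally be preferred. All function names are hypothetical.

```python
import random
from statistics import fmean


def mixture_pvalue(t, weights, draws=50_000, seed=2016):
    """Monte Carlo estimate of 1 - H(t; w) = P(sum_j w_j * Z_j**2 > t),
    with Z_j independent standard normal variables."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    exceed = 0
    for _ in range(draws):
        s = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)
        if s > t:
            exceed += 1
    return exceed / draws


def ntml_pvalue(t, d):
    """NTML: all d weights set to 1, i.e. a chi-square(d) reference."""
    return mixture_pvalue(t, [1.0] * d)


def sb_pvalue(t, lam_hat):
    """SB (assumed form): every weight set to the average estimated eigenvalue."""
    c = fmean(lam_hat)
    return mixture_pvalue(t, [c] * len(lam_hat))


def efull_pvalue(t, lam_hat):
    """EFULL: the estimated eigenvalues are used as weights directly."""
    return mixture_pvalue(t, lam_hat)
```

For example, with the Distribution 3 eigenvalues of Table 4 and $t = 19.675$ (the 95% quantile of $\chi^2_{11}$), `ntml_pvalue(19.675, 11)` is close to 0.05, while `efull_pvalue` with those eigenvalues is far larger.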
In Table 2 we present Type I error rates at the 5% significance level. As expected, NTML becomes inflated when data are non-normal. The mean-scaling of SB reduces the inflation, but with non-normal data and small sample sizes, Type I error rates are still higher than 10%. The scaled-and-shifted statistic, on the other hand, leads to rejection rates much lower than the nominal 5%. These findings are in accord with Foldnes & Olsson (2015). The Bollen-Stine bootstrap test performs better than SB and SS, coming close to the nominal level even for highly non-normal data and medium sample size. Among the new p-value approximations, it is the middle-ground approximation EHALF that performs best. While EFULL yields far too low rejection rates with non-normal data, EHALF performs about as well as BOST with non-normal data. The selection algorithm SEL also performs generally well, on par with EHALF and BOST. It is notable that for normal data, SEL performs at least as well as NTML, and clearly better at $n = 100$. Table 3 presents the selection proportions for SEL in each of the nine conditions. The selection algorithm chooses EHALF most often in seven of the nine conditions, a sensible choice given Table 2. It is, however, unexpected that SEL chooses EFULL in 55% of the samples under Distribution 3 and $n = 100$, given the poor performance of EFULL in that condition, with a 1% rejection rate. The final column in Table 2 gives the oracle solution, and demonstrates that the asymptotic result in (1) is far from realized at $n = 100$ under Distribution 3.
6.2. Testing nested models. The chi-square difference test has 11 degrees of freedom, and the corresponding 11 oracle eigenvalues for Distributions 2 and 3 are given in Table 4. The spread in eigenvalues is substantial, especially for Distribution 3.
Distribution 2 3.92 3.49 3.19 2.99 2.94 2.78 2.72 1.85 1.56 1.54 1.30
Distribution 3 10.64 8.79 8.06 7.58 7.37 6.94 6.76 4.09 3.16 3.10 2.04
Table 4. Eigenvalues of $U_d\Gamma$ for nested model testing. Distribution 2 has skewness 1 and kurtosis 7; Distribution 3 has skewness 2 and kurtosis 21. Rounded to two decimal places.
Distribution     n     NTML   SB     BOST   EFULL  EHALF  SEL    ORAC
Normal           100   0.068  0.080  0.037  0.062  0.069  0.075  0.068
Normal           300   0.054  0.059  0.046  0.053  0.055  0.058  0.054
Normal           900   0.051  0.053  0.051  0.051  0.052  0.053  0.051
Distribution 2   100   0.582  0.137  0.096  0.076  0.099  0.096  0.028
Distribution 2   300   0.659  0.088  0.081  0.052  0.066  0.062  0.035
Distribution 2   900   0.702  0.059  0.053  0.035  0.043  0.045  0.046
Distribution 3   100   0.911  0.221  0.129  0.115  0.159  0.135  0.005
Distribution 3   300   0.961  0.126  0.118  0.062  0.089  0.082  0.018
Distribution 3   900   0.976  0.087  0.089  0.044  0.064  0.061  0.043
Table 5. Type I error rates for nested model testing. Normal: multivariate normal distribution. Distribution 2: skewness 1 and kurtosis 7. Distribution 3: skewness 2 and kurtosis 21. NTML = normal-theory likelihood ratio test. SB = Satorra-Bentler. BOST = Bollen-Stine bootstrap. EFULL = full eigenvalue approximation, $\hat p_n$. EHALF = half eigenvalue approximation, $\hat p_{n,\mathrm{half}}$. SEL = p-value obtained from the selection algorithm. ORAC = oracle p-value $p_n$.
Rejection rates observed at the nominal 5% level of significance are reported in Table 5. Again, the NTML statistic is inflated by non-normality in the data, a tendency only partially corrected for by SB. For instance, under the harshest condition, with Distribution 3 and $n = 100$, the SB rejection rate is 22%, far better than the 91% obtained with NTML. But in this condition, as in almost all conditions, BOST performs better than SB, with a rejection rate of 13%. However, the new procedure EFULL performs still better in this condition, while the selection algorithm is only slightly worse than BOST. Overall, EFULL outperforms the other test statistics, including SB and BOST. EHALF, which was found to have the best performance in the non-nested case, does not perform as well as EFULL in the nested case. The selection algorithm SEL also performs well: it is closer to the nominal level than SB in most conditions, it improves on BOST under Distribution 2 with $n = 300$ and under Distribution 3 with $n = 300$ and 900, and it is only slightly worse than EFULL. The selection proportions are given in Table 6, where EHALF is unexpectedly found to be the most frequently chosen procedure under non-normal data, despite the slightly better performance of EFULL in most conditions.
Distribution     n     SB     EHALF  EFULL
Normal           100   0.601  0.357  0.042
Normal           300   0.672  0.205  0.122
Normal           900   0.593  0.091  0.316
Distribution 2   100   0.116  0.714  0.170
Distribution 2   300   0.209  0.662  0.128
Distribution 2   900   0.263  0.595  0.142
Distribution 3   100   0.012  0.663  0.325
Distribution 3   300   0.059  0.725  0.215
Distribution 3   900   0.104  0.714  0.182
Table 6. Choice proportions for the selection algorithm, nested models. SB = Satorra-Bentler. EHALF = half eigenvalue approximation, $\hat p_{n,\mathrm{half}}$. EFULL = full eigenvalue approximation, $\hat p_n$.
6.3. Tests for AR and for SB consistency. To evaluate Type I error rates of the SB consistency and AR tests proposed in Algorithm 2, we simulated multivariate normal data for the Bollen model. Under normal data, both AR and SB consistency hold. We simulated 2000 samples for each of the sample sizes $n = 200$, 400, 800 and 2000. For each sample, 1000 bootstrap samples were drawn. The rejection rates are given in Table 7, and clearly demonstrate that these procedures need large sample sizes in order to reach acceptable Type I error rates.
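Every rate in Tables 2, 5 and 7 is a proportion of rejections over simulated samples, so it carries Monte Carlo error. A minimal sketch using the standard binomial standard error follows; it is an aid for reading the tables, not a quantity reported in the paper, and the function names are hypothetical.

```python
import math


def rejection_rate(pvalues, alpha=0.05):
    """Fraction of simulated p-values below the nominal level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


def mc_standard_error(rate, replications):
    """Binomial standard error of a rejection rate estimated from
    `replications` independent simulated samples."""
    return math.sqrt(rate * (1.0 - rate) / replications)
```

With the 2000 samples per sample size used for Table 7, `mc_standard_error(0.05, 2000)` is about 0.0049, a rough yardstick for how far an estimated rate can sit from 5% by chance alone.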
Test              n = 200  n = 400  n = 800  n = 2000
AR                0.354    0.203    0.081    0.035
SB consistency    0.369    0.195    0.070    0.033
Table 7. Type I error rates for tests of asymptotic robustness (AR) and Satorra-Bentler (SB) consistency.
7. Discussion
This paper deals with the fundamental problem of hypothesis testing in moment
structure models. We present new insight and practically applicable statistical
methodology for SEM and related models.
Some of our conclusions may seem surprising, as they go against what is often
taught in standard courses on SEM. For example, the simulation summarized in
Table 2 shows that our selector can have better finite sample performance than
the NTML test even when data are exactly normal. Since this paper has focused exclusively on Type I error, “better” here means having a rejection rate closer to the nominal one.
Since this conclusion may seem counterintuitive, it is worth pausing to consider what the NTML does. Firstly, we must keep in mind that the NTML is a test based on asymptotic theory, even when data are exactly normally distributed. That is, the Type I error rate of an NTML test at level α converges to α under normality, and for the model considered in Table 2, convergence is still not quite achieved at $n = 100$. Secondly, we note that under normality, NTML calculates the oracle p-value exactly. That is, it is the ultimate approximation to the oracle test, which in this condition has a rejection rate of 7.7%. Hence, the NTML has only one source of approximation error: the validity of the fundamental convergence of the oracle.
All methods considered in this paper, with the important exception of the Bollen-Stine bootstrap and the selector, try to approximate the oracle, thereby introducing another source of approximation error. We call these methods oracle-based. Apart from the NTML, which calculates the oracle perfectly but only under exact normality, the oracle-based methods have varying degrees of success in their approximation. Strictly speaking, oracle-based methods should be judged on whether they manage to achieve what they set out to do: approximate the oracle. But that is not the success criterion of interest to the user: when a level α test is employed, the Type I error rate should be very close to α. As is clear from our simulation studies, this may not be the case even when using the actual oracle. When oracle-based tests have a Type I error rate considerably closer to α than the oracle, it is tempting to say that they are performing well. This temptation should be avoided, as the deviation in Type I error compared to the oracle is then solely due to chance variations caused by the estimation of λ.
The selector overcomes this hurdle by being anchored not in the fundamental convergence of the oracle, but in a transformation of the data to a setting where the null hypothesis holds. It is then known that a correct p-value is uniformly distributed, i.e., the Type I error rate of a test with level α is exactly α. It is this anchoring that allows us to search for the procedure which best achieves this goal, without having to compare our methods to the finite-sample performance of the oracle. And so when the selector has a Type I error rate close to the nominal one, it is by design, and not solely due to chance variations. This is a property shared with the Bollen-Stine bootstrap procedure, but the Bollen-Stine procedure rests on the quality of the empirical distribution function as an approximation to the data’s actual distribution function. So does the selector, since it uses the non-parametric bootstrap, but it combines the fundamental convergence of the oracle with the non-parametric bootstrap. We have seen that this allows us to combine the strengths of both methods.
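The last step of the selector, choosing the candidate whose p-values behave most like uniform draws when the null hypothesis holds, can be sketched as follows. This is a hedged illustration under two assumptions: each candidate's p-values have already been computed on bootstrap samples drawn from the transformed, null-satisfying data (that step is not shown), and the Kolmogorov-Smirnov distance to the uniform distribution stands in for the paper's criterion $\hat D_n$. The names `ks_distance_to_uniform` and `select_candidate` are hypothetical.

```python
def ks_distance_to_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the empirical CDF of the
    p-values (assumed to lie in [0, 1]) and the U(0, 1) CDF."""
    p = sorted(pvalues)
    m = len(p)
    dist = 0.0
    for i, x in enumerate(p):
        # The empirical CDF jumps from i/m to (i+1)/m at x; the uniform CDF is x.
        dist = max(dist, (i + 1) / m - x, x - i / m)
    return dist


def select_candidate(bootstrap_pvalues, criterion=ks_distance_to_uniform):
    """Return the name of the candidate whose null-bootstrap p-values are
    closest to uniform under `criterion`.

    bootstrap_pvalues maps a candidate name (e.g. "SB", "EHALF", "EFULL")
    to the p-values it produced on the bootstrap samples.
    """
    return min(bootstrap_pvalues, key=lambda name: criterion(bootstrap_pvalues[name]))
```

With the bootstrap p-values stored per candidate, `select_candidate` returns the name of the chosen procedure, whose p-value on the original sample would then be reported.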
Let us return to the NTML, and look at the proposed methods from a slightly
different perspective that elaborates on the above. It is well-known that the NTML
usually has a much too high Type I error rate under non-normality. The major
source of the mismatch between nominal and actual Type I error rate is that the
NTML need not be a consistent approximation to the oracle. The NTML can be
seen as always estimating λ by the constant vector $(1, \dots, 1)'$. When λ is far from $(1, \dots, 1)'$, NTML performs poorly. Moreover, for a user it typically performs poorly in a particularly bad way: even when a hypothesized theory holds, the NTML will most likely reject it.
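The Distribution 3 eigenvalues of Table 4 illustrate the effect. Under the null hypothesis the difference statistic behaves asymptotically like $\sum_j \lambda_j Z_j^2$. The sketch below simulates this limit and counts how often NTML (which refers the statistic to $\chi^2_{11}$) and SB (which first divides by a scaling constant, here assumed to be the average eigenvalue) would reject at the 5% level. It is an asymptotic illustration only, not a reproduction of Table 5; the critical value 19.675 is the 95% quantile of $\chi^2_{11}$, rounded, and the function name is hypothetical.

```python
import random
from statistics import fmean

# Oracle eigenvalues for the nested test under Distribution 3 (Table 4), d = 11.
LAMBDA_DIST3 = [10.64, 8.79, 8.06, 7.58, 7.37, 6.94, 6.76, 4.09, 3.16, 3.10, 2.04]
CHI2_11_CRIT = 19.675  # 95% quantile of chi-square with 11 df, rounded


def asymptotic_rejection_rates(lam, crit, draws=20_000, seed=7):
    """Simulate the null limit T = sum_j lam_j * Z_j**2 and return the
    fractions of draws in which NTML and SB would reject."""
    rng = random.Random(seed)
    c = fmean(lam)  # SB scaling constant, assumed to be the average eigenvalue
    ntml_rejections = 0
    sb_rejections = 0
    for _ in range(draws):
        t = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in lam)
        if t > crit:        # NTML refers T itself to chi-square(d)
            ntml_rejections += 1
        if t / c > crit:    # SB refers T / c to chi-square(d)
            sb_rejections += 1
    return ntml_rejections / draws, sb_rejections / draws


ntml_rate, sb_rate = asymptotic_rejection_rates(LAMBDA_DIST3, CHI2_11_CRIT)
print(f"NTML: {ntml_rate:.3f}  SB: {sb_rate:.3f}")
```

NTML rejects in the vast majority of draws even though the null hypothesis holds. SB rejects far less often, but its rate need not equal 5% because the $\lambda_j$ are unequal.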
The Satorra-Bentler test has previously been reported to have inflated Type I
error rates under non-normality, and this behaviour is also observed in simulations
in the present paper. This is mainly due to two reasons. Firstly, the Satorra-Bentler procedure may be inconsistent, i.e., λ may vary among its elements. While inconsistency is an asymptotic property, which may seem irrelevant in small samples, it does mean that the procedure does not aim to estimate what the user wants, and this may therefore also be reflected in small-sample situations when the procedure is used uncritically. Secondly, the Satorra-Bentler procedure estimates λ (through its average), and the variability of the resulting p-value approximation may give inflated Type I error rates even when the procedure is consistent.
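The first reason can be made concrete with a short moment calculation. Assume, as in the usual Satorra-Bentler construction, that the scaling constant is the average eigenvalue $\bar\lambda = d^{-1}\sum_{j=1}^d \lambda_j$ (estimated by the average of the $\hat\lambda_j$). Since $\operatorname{Var}(Z_j^2) = 2$, the scaled limit satisfies

$$\begin{aligned}
\mathbb{E}\Big[\bar\lambda^{-1}\sum_{j=1}^{d}\lambda_j Z_j^2\Big] &= \bar\lambda^{-1}\sum_{j=1}^{d}\lambda_j = d,\\
\operatorname{Var}\Big[\bar\lambda^{-1}\sum_{j=1}^{d}\lambda_j Z_j^2\Big] &= 2\sum_{j=1}^{d}\Big(\frac{\lambda_j}{\bar\lambda}\Big)^2 \;\ge\; \frac{2}{d}\Big(\sum_{j=1}^{d}\frac{\lambda_j}{\bar\lambda}\Big)^2 = 2d,
\end{aligned}$$

where the inequality is the Cauchy-Schwarz inequality. The scaled limit therefore always has the $\chi^2_d$ mean $d$, but its variance exceeds the $\chi^2_d$ variance $2d$ unless all $\lambda_j$ equal $\bar\lambda$, which is exactly the case of Satorra-Bentler consistency. For the Distribution 3 eigenvalues of Table 4 ($d = 11$), the variance is about 25.8 rather than 22.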
These two problems, possible inconsistency and finite-sample approximation error, are also shared by our suggested p-value approximations. However, the contextual
framework presented in the present paper allows us to argue about balancing these
issues, and selecting among competing approximations. This perspective may lead
to further insight in future research, and has already led to our proposed selector.
We note that while $\hat p_n$ and $\hat p_{n,\mathrm{half}}$ can be computed just as fast as the Satorra-Bentler test statistic, both the selector and the Bollen-Stine bootstrap procedure take considerably more computation time. Our simulation experiments indicate
that the selector and the Bollen-Stine bootstrap are comparable in performance,
but that the selector works slightly better, especially in small sample situations.
Our recommendations to practitioners are therefore clear: use the selector in small
sample situations, and use the selector or the Bollen-Stine bootstrap in medium
sample situations. In large sample situations, consistent p-value approximations give similar answers. Since the assumptions underlying asymptotic robustness and Satorra-Bentler consistency rest on delicate properties of higher-order moments that can only be properly tested in large sample situations, we recommend using neither the NTML nor the Satorra-Bentler statistic without assessing its performance
with the selector. In many cases, the Satorra-Bentler statistic will be the superior
test, but it is difficult to know this without using techniques such as the re-sampling
methods underlying the selector.
With current and future computers containing multiple units that can perform
computation simultaneously (multi-core central processing units and multi-core
graphical processing units supporting general-purpose calculations), running the selector does not take much time. In our prototype implementation in the scripting language R (which means our code is not compiled, and therefore slow), it takes a few additional minutes compared to standard p-value approximations that, as we have seen, often perform considerably worse. Considering the enormous amount of time and effort many researchers spend gathering and analyzing data, the extra time spent on using the selector is negligible in comparison.
Applied researchers are often personally interested in controlling Type I error
as well as possible, as their research hypothesis is often the null hypothesis. If
they use testing procedures where the Type I error is seriously inflated, such as the NTML with non-normal data, this is to their disadvantage. This point is also connected to the use of pragmatic fit indices available in the literature. The p-value approximations discussed in this paper are all based on solid statistical theory. By contrast, some of these fit indices are ad hoc, with somewhat arbitrary cut-off points that are interpreted in various ways, and they are not based on statistical theory.
Finally, we mention that the ideas contained in this paper can be generalized in several directions, including to SEM with ordinal variables and to multi-group settings. Also, additional simulation experiments should be performed on the proposed methods, such as power studies, studies allowing the selector more options, and experiments with different selection criteria.
References
Amemiya, Y. & Anderson, T. (1990). Asymptotic chi-square tests for a large class of factor analysis models. The Annals of Statistics, 1453–1463.
Asparouhov, T. & Muthén, B. (2010). Simple second order chi-square correction. Retrieved from Mplus website: http://www.statmodel.com/download/WLSMV_new_chi21.
Bentler, P. M. & Yuan, K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research 34, 181–197.
Beran, R. (1984). Bootstrap methods in statistics. Jahresbericht der Deutschen Mathematiker-Vereinigung, 14–30.
Beran, R. & Srivastava, M. S. (1985). Bootstrap tests and confidence regions for functions of a covariance matrix. The Annals of Statistics, 95–115.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A. & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural equation models. Sociological Methods & Research 21, 205–229.
Box, G. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. The Annals of Mathematical Statistics 25, 290–302.
Browne, M. & Shapiro, A. (1988). Robustness of normal theory methods in the analysis of linear latent variable models. British Journal of Mathematical and Statistical Psychology 41, 193–208.
Browne, M. W. (1982). Covariance structures. Topics in applied multivariate analysis, 72–141.
Browne, M. W. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology 37, 62–83.
Curran, P. J., West, S. G. & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods 1, 16–29.
Duchesne, P. & de Micheaux, P. L. (2010). Computing the distribution of quadratic forms: Further comparisons between the Liu-Tang-Zhang approximation and exact methods. Computational Statistics and Data Analysis 54, 858–862.
Efron, B. & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC Press.
Foldnes, N. & Olsson, U. H. (2015). Correcting too much or too little? The performance of three chi-square corrections. Multivariate Behavioral Research 50, 533–543.
Hu, L.-T., Bentler, P. M. & Kano, Y. (1992). Can test statistics in covariance structure analysis be trusted? Psychological Bulletin 112, 351–362.
Jullum, M. & Hjort, N. L. (2016, forthcoming). Parametric or nonparametric: The FIC approach. Statistica Sinica.
Laury-Micoulaut, C. (1976). The n-th centered moment of a multiple convolution and its applications to an intercloud gas model. Astronomy and Astrophysics 51, 343–346.
Meyer, C. D., ed. (2000). Matrix Analysis and Applied Linear Algebra. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics.
Nevitt, J. & Hancock, G. (2001). Performance of Bootstrapping Approaches to Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling. Structural Equation Modeling: A Multidisciplinary Journal 8, 353–377.
Nevitt, J. & Hancock, G. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research 39, 439–478.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software 48, 1–36.
Satorra, A. (1989). Alternative test criteria in covariance structure analysis: A unified approach. Psychometrika 54, 131–151.
Satorra, A. & Bentler, P. M. (1990). Model conditions for asymptotic robustness in the analysis of linear relations. Computational Statistics & Data Analysis 10, 235–249.
Satorra, A. & Bentler, P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In Latent variable analysis: applications for developmental research, A. V. Eye & C. Clogg, eds., chap. 16. Newbury Park, CA: Sage, pp. 399–419.
Satorra, A. & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment structure analysis. Psychometrika 66, 507–514.
Savalei, V. (2010). Small sample statistics for incomplete nonnormal data: Extensions of complete data formulae and a Monte Carlo comparison. Structural Equation Modeling: A Multidisciplinary Journal 17, 241–264.
Shapiro, A. (1983). Asymptotic distribution theory in the analysis of covariance structures - a unified approach. South African Statistical Journal 17, 33–81.
Shapiro, A. (1987). Robustness properties of the MDF analysis of moment structures. South African Statistical Journal 21, 39–62.
Vale, C. & Maurelli, V. (1983). Simulating nonnormal distributions. Psychometrika 48, 465–471.
Yuan, K. (2005). Fit indices versus test statistics. Multivariate Behavioral Research 40, 115–148.
Yuan, K.-H. & Bentler, P. M. (2010). Two simple approximations to the distributions of quadratic forms. British Journal of Mathematical and Statistical Psychology 63, 273–291.
Appendix A. Proof of Theorem 1
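Proof of Theorem 1. By the mean value theorem, there is a sequence of random variables $0 \le r_n \le 1$ such that
$$\hat p_n = 1 - H(T_n; \hat\lambda) = 1 - H(T_n; \lambda) + R_n = p_n + R_n,$$
where
$$R_n = -\sum_{j=1}^{d} (\hat\lambda_j - \lambda_j)\, H_j\big(T_n; \lambda + r_n(\hat\lambda - \lambda)\big) \quad\text{and}\quad H_j\big(q; (l_1, \dots, l_d)'\big) = \frac{\partial H\big(q; (l_1, \dots, l_d)'\big)}{\partial l_j}.$$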
The statement of the theorem therefore holds if we show that $H_j(T_n; \lambda + r_n(\hat\lambda - \lambda)) = O_P(1)$. To show this, we calculate $H_j$. The cumulative distribution function of $S = \sum_{j=1}^d \lambda_j Z_j^2$ is $H_S(q) = \int_0^q h_S(s)\, ds$, where $h_S$ is the density of $S$. Denote the density of $\lambda_j Z_j^2$ by $h_j(z)$. Since the variables $(Z_j)_{j=1}^d$ are independent, so are $(\lambda_j Z_j^2)_{j=1}^d$. Hence $h_S$ is given by a $d$-fold convolution: applying the well-known convolution formula iteratively, we see that
$$h_S(s) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} \Big[\prod_{j=2}^{d} h_j(x_{j-1})\Big]\, h_1\Big(s - \sum_{j=1}^{d-1} x_j\Big)\, dx_1 \cdots dx_{d-1}, \tag{9}$$
see also Laury-Micoulaut (1976) for some basic properties of $d$-fold convolutions. We wish to calculate
$$\frac{\partial}{\partial\lambda_j} H_S(q) = \int_0^q \frac{\partial}{\partial\lambda_j} h_S(s)\, ds. \tag{10}$$
It turns out that $\partial h_S/\partial\lambda_j$ is a weighted sum of densities, which implies that $\partial H_S/\partial\lambda_j$ is a weighted sum of cumulative distribution functions that is easy to bound uniformly. We now show this by calculating $\partial h_S/\partial\lambda_j$. Since summation is commutative, the distribution of $\sum_{j=1}^d \lambda_{\pi(j)} Z_{\pi(j)}^2$ is the same for any permutation $\pi(1), \dots, \pi(d)$ of $\{1, \dots, d\}$. We may therefore, without loss of generality, assume that $j = d$. Using eq. (9), we have
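$$\frac{\partial}{\partial\lambda_d} h_S(s) = \int_{\mathbb{R}} \cdots \int_{\mathbb{R}} \Big[\frac{\partial}{\partial\lambda_d} h_d(x_{d-1})\Big] \Big[\prod_{j=2}^{d-1} h_j(x_{j-1})\Big]\, h_1\Big(s - \sum_{j=1}^{d-1} x_j\Big)\, dx_1 \cdots dx_{d-1}. \tag{11}$$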
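We hence need to calculate $\partial h_d(x_{d-1})/\partial\lambda_d$. Since $\lambda_j Z_j^2$ is a scale transformation of $Z_j^2 \sim \chi^2_1$, we have $h_j(z) = h_{\chi^2}(z/\lambda_j)/\lambda_j$, where
$$h_{\chi^2}(z) = \frac{z^{-1/2}}{\sqrt{2\pi}}\, e^{-z/2}\, I\{z > 0\}$$
is the density of $Z_j^2 \sim \chi^2_1$. By the chain rule,
$$\frac{\partial}{\partial\lambda_d} h_d(z) = \frac{\partial}{\partial\lambda_d}\Big[\lambda_d^{-1} h_{\chi^2}(z/\lambda_d)\Big] = -\lambda_d^{-2} h_{\chi^2}(z/\lambda_d) + \lambda_d^{-1} h'_{\chi^2}(z/\lambda_d)\, \frac{\partial}{\partial\lambda_d}\big(z\lambda_d^{-1}\big) = -\lambda_d^{-2} h_{\chi^2}(z/\lambda_d) - \lambda_d^{-3} z\, h'_{\chi^2}(z/\lambda_d).$$
For $z < 0$ we have $h_d(z) = 0$, and so $\partial h_d(z)/\partial\lambda_d = 0$. The event $z = 0$ has probability zero and can be ignored. For $z > 0$ we have
$$\sqrt{2\pi}\, h'_{\chi^2}(z) = \frac{\mathrm{d}}{\mathrm{d}z}\Big(z^{-1/2} e^{-z/2}\Big) = -\frac{1}{2} z^{-3/2} e^{-z/2} - \frac{1}{2} z^{-1/2} e^{-z/2},$$
so that
$$h'_{\chi^2}(z/\lambda_d) = -\frac{1}{2\sqrt{2\pi}}\Big(\lambda_d^{3/2} z^{-3/2} + \lambda_d^{1/2} z^{-1/2}\Big) e^{-z/(2\lambda_d)}.$$
Inserting this into the expression obtained for $\partial h_d(z)/\partial\lambda_d$, and using $\lambda_d^{-1} h_{\chi^2}(z/\lambda_d) = h_d(z) = (2\pi\lambda_d)^{-1/2} z^{-1/2} e^{-z/(2\lambda_d)}$, gives
$$\frac{\partial}{\partial\lambda_d} h_d(z) = -\frac{1}{\lambda_d} h_d(z) + \frac{1}{2\lambda_d} h_d(z) + \frac{\lambda_d^{-5/2}}{2\sqrt{2\pi}}\, z^{1/2} e^{-z/(2\lambda_d)} = -\frac{1}{2\lambda_d} h_d(z) + \frac{\lambda_d^{-5/2}}{2\sqrt{2\pi}}\, z^{1/2} e^{-z/(2\lambda_d)}.$$
We now note that $h_d$ is a density, namely that of $\lambda_d Z_d^2$, and that $z^{1/2} e^{-z/(2\lambda_d)}$ is proportional to a Gamma density. Recall that the Gamma$(\alpha, \beta)$ density for $\alpha > 0$, $\beta > 0$ is $h_{G(\alpha,\beta)}(z) = \beta^{\alpha} z^{\alpha-1} e^{-\beta z} I\{z \ge 0\}/\Gamma(\alpha)$, in which $\Gamma(z) = \int_0^\infty u^{z-1} e^{-u}\, du$. Hence $z^{1/2} e^{-z/(2\lambda_d)} = \Gamma(3/2)\,(2\lambda_d)^{3/2}\, h_{G(3/2, 1/(2\lambda_d))}(z)$, and since $\Gamma(3/2) = \sqrt{\pi}/2$, we conclude that
$$\frac{\partial}{\partial\lambda_d} h_d(z) = \frac{1}{2\lambda_d}\Big[h_{G(3/2,1/(2\lambda_d))}(z) - h_d(z)\Big].$$
Since $x_{d-1} \mapsto h_d(x_{d-1})$ and $x_{d-1} \mapsto h_{G(3/2,1/(2\lambda_d))}(x_{d-1})$ are densities, the linearity of integration shows that eq. (11) is a weighted sum of convolutions of densities, which are again densities. By eq. (9), the term involving $h_d$ gives back $h_S$, while the Gamma term gives the density $h_B$ of $\sum_{j=1}^{d-1} \lambda_j Z_j^2 + G$, where $G \sim$ Gamma$(3/2, 1/(2\lambda_d))$ is independent of $Z_1, \dots, Z_{d-1}$. That is,
$$\frac{\partial}{\partial\lambda_d} h_S(s) = \frac{1}{2\lambda_d}\big[h_B(s) - h_S(s)\big].$$
Returning to eq. (10), we therefore see that
$$\frac{\partial}{\partial\lambda_d} H_S(q) = \frac{1}{2\lambda_d}\int_0^q \big[h_B(s) - h_S(s)\big]\, ds = \frac{1}{2\lambda_d}\big[H_B(q) - H_S(q)\big],$$
where $H_B$ is the cumulative distribution function of $h_B$. Recalling that cumulative distribution functions are probabilities, and hence take values in $[0, 1]$, and applying the same argument to each $j$ by the symmetry noted above, we see that
$$\big|H_j\big(T_n; \lambda + r_n(\hat\lambda - \lambda)\big)\big| \le \frac{1}{2\,\big|\lambda_j + r_n(\hat\lambda_j - \lambda_j)\big|}.$$
Since $0 \le r_n \le 1$ and $\hat\lambda_j \xrightarrow{P} \lambda_j > 0$ as $n \to \infty$, we see that $|H_j(T_n; \lambda + r_n(\hat\lambda - \lambda))| = O_P(1)$.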
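For a single term ($d = 1$) the derivative bounded in the proof has a closed form, which gives a quick numerical sanity check. With $H(q; \lambda) = P(\lambda Z^2 \le q) = \operatorname{erf}\big(\sqrt{q/(2\lambda)}\big)$, the chain rule gives $\partial H/\partial\lambda = -u e^{-u^2}/(\lambda\sqrt{\pi})$ with $u = \sqrt{q/(2\lambda)}$, whose supremum in absolute value over $q$ is $e^{-1/2}/(\lambda\sqrt{2\pi})$. The sketch below compares this closed form with central finite differences; it covers only the one-term case, and the function names are hypothetical.

```python
import math


def cdf_one_term(q, lam):
    """H(q; lam) = P(lam * Z**2 <= q) for Z ~ N(0, 1) and lam > 0."""
    if q <= 0.0:
        return 0.0
    return math.erf(math.sqrt(q / (2.0 * lam)))


def dcdf_dlam(q, lam):
    """Closed-form derivative of cdf_one_term with respect to lam, for q > 0."""
    u = math.sqrt(q / (2.0 * lam))
    return -u * math.exp(-u * u) / (lam * math.sqrt(math.pi))


def max_abs_derivative(lam, q_max=50.0, step=0.1, h=1e-5):
    """Check the closed form against central differences on a grid of q
    and return the largest |dH/dlam| found on that grid."""
    worst = 0.0
    for k in range(1, int(round(q_max / step)) + 1):
        q = k * step
        fd = (cdf_one_term(q, lam + h) - cdf_one_term(q, lam - h)) / (2.0 * h)
        exact = dcdf_dlam(q, lam)
        assert abs(fd - exact) < 1e-6, (q, fd, exact)
        worst = max(worst, abs(exact))
    return worst


lam = 2.0
sup_bound = math.exp(-0.5) / (lam * math.sqrt(2.0 * math.pi))
print(max_abs_derivative(lam), sup_bound)
```

The grid maximum sits at, and never above, the analytic supremum, so the derivative with respect to the weight is uniformly bounded in $q$, as the proof requires.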
Department of Economics, BI Norwegian Business School, Oslo, Norway 0484
E-mail address:steffeng@gmail.com
Department of Economics, BI Norwegian Business School, Stavanger, Norway 4014
E-mail address:njal.foldnes@bi.no