
TEST

https://doi.org/10.1007/s11749-021-00756-0

ORIGINAL PAPER

Specification testing in semi-parametric transformation models

Nick Kloodt¹ · Natalie Neumeyer¹ · Ingrid Van Keilegom²

Received: 10 January 2020 / Accepted: 19 January 2021

© The Author(s) 2021

Abstract

In transformation regression models, the response is transformed before fitting a regression model to covariates and transformed response. We assume such a model where the errors are independent of the covariates and the regression function is modeled nonparametrically. We suggest a test for goodness-of-fit of a parametric transformation class based on a distance between a nonparametric transformation estimator and the parametric class. We present asymptotic theory under the null hypothesis of validity of the semi-parametric model and under local alternatives. A bootstrap algorithm is suggested in order to apply the test. We also consider relevant hypotheses to distinguish between large and small distances of the parametric transformation class to the 'true' transformation.

Keywords Bootstrap · Goodness-of-fit test · Nonparametric regression · Nonparametric transformation estimator · Parametric transformation class · U-statistics

Mathematics Subject Classification 62G10

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11749-021-00756-0.

B Nick Kloodt
Nick.Kloodt@uni-hamburg.de

1 Fachbereich Mathematik, Universität Hamburg, Bundesstraße 55, 20146 Hamburg, Germany
2 Research Centre for Operations Research and Statistics (ORSTAT), KU Leuven, Leuven, Belgium

1 Introduction

It is very common in applications to transform data before investigating the functional dependence of variables by regression models. The aim of the transformation is to obtain a simpler model, e.g. with a specific structure of the regression function, or a homoscedastic instead of a heteroscedastic model. Typically, flexible parametric classes of transformations are considered from which a suitable one is selected data-dependently. A classical example is the class of Box-Cox power transformations (see Box and Cox (1964)). For purely parametric transformation models, see Carroll and Ruppert (1988) and references therein. Powell (1991) and Mu and He (2007) consider transformation quantile regression models. Nonparametric estimation of the transformation in the context of parametric regression models has been considered by Horowitz (1996) and Chen (2002), among others. Horowitz (2009) reviews estimation in transformation models with parametric regression in the cases where either the transformation or the error distribution or both are modeled nonparametrically. Linton et al. (2008) suggest a profile likelihood estimator for a parametric class of transformations, while the error distribution is estimated nonparametrically and the regression function semi-parametrically. Heuchenne et al. (2015) suggest an estimator of the error distribution in the same model. Neumeyer et al. (2016) consider profile likelihood estimation in heteroscedastic semi-parametric transformation regression models, i.e. the mean and variance function are modeled nonparametrically, while the transformation function is chosen from a parametric class. A completely nonparametric (homoscedastic) model is considered by Chiappori et al. (2015). Lewbel et al. (2015) provide a test for the validity of such a model. The approach of Chiappori et al. (2015) is modified and corrected by Colling and Van Keilegom (2019). The version of the nonparametric transformation estimator considered in the latter paper is then applied by Colling and Van Keilegom (2020) to suggest a new estimator of the transformation parameter if it is assumed that the transformation belongs to a parametric class.

In general, asymptotic theory for nonparametric transformation estimators is sophisticated, and parametric transformation estimators show much better performance if the parametric model is true. A parametric transformation will thus lead to better estimates of the regression function. Moreover, parametric transformations are easier to interpret and allow for subsequent inference in the transformation model. For the latter purpose note that for transformation models with parametric transformation, lack-of-fit tests for the regression function as well as tests for significance of covariate components have been suggested by Colling and Van Keilegom (2016), Colling and Van Keilegom (2017), Allison et al. (2018) and Kloodt and Neumeyer (2020). Those tests cannot straightforwardly be generalized to nonparametric transformation models because known estimators in that model do not allow for uniform rates of convergence over the whole real line, see Chiappori et al. (2015) and Colling and Van Keilegom (2019).

However, before applying a transformation model with parametric transformation, it would be appropriate to test the goodness-of-fit of the parametric transformation class. In the context of parametric quantile regression, Mu and He (2007) suggest such a goodness-of-fit test. In the context of nonparametric mean regression, Neumeyer et al. (2016) develop a goodness-of-fit test for the parametric transformation class based on an empirical independence process of pairs of residuals and covariates. The latter approach was modified by Hušková et al. (2018), who applied empirical characteristic functions. In a linear regression model with transformation of the response, Szydłowski (2020) suggests a goodness-of-fit test for the parametric transformation class that is based on a distance between the nonparametric transformation estimator considered by Chen (2002) and the parametric class. We will follow a similar approach but consider a nonparametric regression model. The aim of the transformations we consider is to induce independence between errors and covariates. The null hypothesis is that the unknown transformation belongs to a parametric class. Note that when applied to the special case of a class of transformations that contains as only element the identity, our test provides indication on whether a classical homoscedastic regression model (without transformation) is appropriate or whether first the response should be transformed. Our test statistic is based on a minimum distance between a nonparametric transformation and the parametric transformations. We present the asymptotic distribution of the test statistic under the null hypothesis of a parametric transformation and under local alternatives of $n^{-1/2}$-rate. Under the null hypothesis, the limit distribution is that of a degenerate U-statistic. With a flexible parametric class, applying an appropriate transformation can reduce the dependence enormously, even if the 'true' transformation does not belong to the class. Thus, for the first time in the context of transformation goodness-of-fit tests, we consider testing for so-called precise or relevant hypotheses. Here, the null hypothesis is that the distance between the true transformation and the parametric class is large. If this hypothesis is rejected, then the model with the parametric transformation fits well enough to be considered for further inference. Under the new null hypothesis, the test statistic is asymptotically normally distributed. The term "precise hypotheses" refers to Berger and Delampady (1987). Dette et al. (2020) considered precise hypotheses when comparing mean functions in the context of functional time series. Note that the idea of precise hypotheses is related to that of equivalence tests, which originate from the field of pharmacokinetics (see Lakens (2017)). Throughout, we assume that the nonparametric transformation estimator fulfills an asymptotic linear expansion. It is then shown that the estimator considered by Colling and Van Keilegom (2019) fulfills this expansion and thus can be used for evaluating the test statistic.

The remainder of the paper is organized as follows. In Sect. 2, we present the model and the test statistic. Asymptotic distributions under the null hypothesis of a parametric transformation class and under local alternatives are presented in Sect. 3, which also contains a consistency result and asymptotic results under relevant hypotheses. Section 4 presents a bootstrap algorithm and a simulation study. Section 1 of the supplementary material contains assumptions for bootstrap results, while Section 2 there treats a specific nonparametric transformation estimator and shows that it fulfills the required conditions. The proofs of the main results are given in Section 3 and a rigorous treatment of bootstrap asymptotics is given in Section 4 of the supplement.

2 The model and test statistic

Assume we have observed $(X_i,Y_i)$, $i=1,\dots,n$, which are independent with the same distribution as $(X,Y)$ that fulfill the transformation regression model

$h(Y)=g(X)+\varepsilon,$   (1)

where $E[\varepsilon]=0$ holds and $\varepsilon$ is independent of the covariate $X$, which is $\mathbb{R}^{d_X}$-valued, while $Y$ is univariate. The regression function $g$ will be modelled nonparametrically. The transformation $h:\mathbb{R}\to\mathbb{R}$ is strictly increasing. Throughout we assume that, given the joint distribution of $(X,Y)$ and some identification conditions, there exists a unique transformation $h$ such that this model is fulfilled. It then follows that the other model components are identified via $g(x)=E[h(Y)\,|\,X=x]$ and $\varepsilon=h(Y)-g(X)$. See Chiappori et al. (2015) for conditions under which the identifiability of $h$ holds. In particular, conditions are required to fix location and scale, and we will assume throughout that

$h(0)=0 \quad\text{and}\quad h(1)=1.$   (2)

Now let $\{\Lambda_\theta:\theta\in\Theta\}$ be a class of strictly increasing parametric transformation functions $\Lambda_\theta:\mathbb{R}\to\mathbb{R}$, where $\Theta\subseteq\mathbb{R}^{d_\Theta}$ is a finite-dimensional parameter space. Our purpose is to test whether a semi-parametric transformation model holds, i.e.

$\Lambda_{\theta_0}(Y)=\tilde g(X)+\tilde\varepsilon,$

for some parameter $\theta_0\in\Theta$, where $\tilde\varepsilon$ and $X$ are independent. Due to the assumed uniqueness of the transformation $h$, one obtains $h=h_0$ under validity of the semi-parametric model, where

$h_0(\cdot)=\dfrac{\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0)}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)}.$

Thus, we can write the null hypothesis as

$H_0: h\in\left\{\dfrac{\Lambda_\theta(\cdot)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}:\theta\in\Theta\right\},$   (3)

which thanks to (2) can be formulated equivalently as

$H_0: h\in\left\{\dfrac{\Lambda_\theta(\cdot)-c_2}{c_1}:\theta\in\Theta,\ c_1\in\mathbb{R}^+,\ c_2\in\mathbb{R}\right\}.$   (4)

Our test statistics will be based on the following $L^2$-distance

$d(\Lambda_\theta,h)=\min_{c_1\in\mathbb{R}^+,\,c_2\in\mathbb{R}} E\big[w(Y)\{h(Y)c_1+c_2-\Lambda_\theta(Y)\}^2\big],$   (5)

where $w$ is a positive weight function with compact support $\mathcal{Y}_w$. Its empirical counterpart is

$d_n(\Lambda_\theta,\hat h):=\min_{c_1\in C_1,\,c_2\in C_2}\frac{1}{n}\sum_{j=1}^n w(Y_j)\{\hat h(Y_j)c_1+c_2-\Lambda_\theta(Y_j)\}^2,$


where $\hat h$ denotes a nonparametric estimator of the true transformation $h$ as discussed below, and $C_1\subset\mathbb{R}^+$, $C_2\subset\mathbb{R}$ are compact sets. Assumption (A6) assures that the sets are large enough to contain the true values. Let $\gamma:=(c_1,c_2,\theta)$ and $\Upsilon:=C_1\times C_2\times\Theta$. The test statistic is defined as

$T_n=n\min_{\theta\in\Theta} d_n(\Lambda_\theta,\hat h)=\min_{\gamma=(c_1,c_2,\theta)\in\Upsilon}\sum_{j=1}^n w(Y_j)\{\hat h(Y_j)c_1+c_2-\Lambda_\theta(Y_j)\}^2$   (6)

and the null hypothesis should be rejected for large values of the test statistic. If the null hypothesis holds, the minimizing parameters $c_1,c_2$ in Eq. (5) can be written as $c_1=\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)$ and $c_2=\Lambda_{\theta_0}(0)$ for some $\theta_0\in\Theta$. Hence, an alternative test statistic

$\bar T_n=\min_{\theta\in\Theta}\sum_{j=1}^n w(Y_j)\big[\hat h(Y_j)\{\Lambda_\theta(1)-\Lambda_\theta(0)\}+\Lambda_\theta(0)-\Lambda_\theta(Y_j)\big]^2$   (7)

can be considered as well.

We will derive the asymptotic distributions under the null hypothesis and local and fixed alternatives in Section 3 and suggest a bootstrap version of the tests in Section 4.
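The minimization in (6) over $(c_1,c_2,\theta)$ can be carried out with a general-purpose optimizer once $\hat h$ has been evaluated at the sample points. A minimal Python sketch; the function names, the optimizer choice and the linear parametric class used below are illustrative assumptions, not part of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def T_n(Y, h_hat_vals, Lambda, w, theta0, c0=(1.0, 0.0)):
    """Test statistic (6): minimum over (c1, c2, theta) of the weighted sum of
    squared distances between c1*h_hat(Y_j) + c2 and Lambda_theta(Y_j)."""
    wY = w(Y)

    def objective(p):
        c1, c2, theta = p[0], p[1], p[2:]
        return np.sum(wY * (h_hat_vals * c1 + c2 - Lambda(theta, Y)) ** 2)

    start = np.concatenate([c0, np.atleast_1d(theta0)])
    res = minimize(objective, x0=start, method="Nelder-Mead")
    return res.fun
```

With a perfectly specified class and $\hat h$ equal to the normalized true transformation, the statistic is (numerically) zero, which is the behaviour one would sanity-check before running a simulation.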

Remark 2.1 Colling and Van Keilegom (2019) consider the estimators

$\hat\theta:=\arg\min_{\theta\in\Theta} d_n(\Lambda_\theta,\hat h)$

and

$\tilde\theta:=\arg\min_{\theta\in\Theta} n^{-1}\sum_{j=1}^n w(Y_j)\big[\hat h(Y_j)\{\Lambda_\theta(1)-\Lambda_\theta(0)\}+\Lambda_\theta(0)-\Lambda_\theta(Y_j)\big]^2$

for the parametric transformation (assuming $H_0$) corresponding to $T_n$ and $\bar T_n$. They observe that $\hat\theta$ outperforms $\tilde\theta$ in simulations.

Nonparametric estimation of the transformation $h$ has been considered by Chiappori et al. (2015) and Colling and Van Keilegom (2019). For our main asymptotic results, we need that $\hat h$ has a linear expansion, not only under the null hypothesis, but also under fixed alternatives and the local alternatives as defined in the next section. The linear expansion should have the form

$\hat h(y)-h(y)=\frac{1}{n}\sum_{i=1}^n \psi(Z_i,T(y))+o_P(n^{-1/2})\quad\text{uniformly in } y\in\mathcal{Y}_w.$   (8)

Here, $\psi$ needs to fulfil condition (A8) in Section 3, and we use the definitions ($i=1,\dots,n$)

$Z_i=(U_i,X_i),\quad U_i=T(Y_i),\quad T(y)=\dfrac{F_Y(y)-F_Y(0)}{F_Y(1)-F_Y(0)},$   (9)

where $F_Y$ denotes the distribution function of $Y$ and is assumed to be strictly increasing on the support of $Y$. To ensure that $T$ is well-defined, the values 0 and 1 are w.l.o.g. assumed to belong to the support of $Y$, but can be replaced by arbitrary values $a<b\in\mathbb{R}$ (in the support of $Y$). The expansion (8) could also be formulated with a linear term $n^{-1}\sum_{i=1}^n\tilde\psi(X_i,Y_i,y)$. In Section 2 of the supplement, we reproduce the definition of the estimator $\hat h$ that was suggested by Colling and Van Keilegom (2019) as a modification of the estimator by Chiappori et al. (2015). We give regularity assumptions under which the desired expansion holds, see Lemma 1. Other nonparametric estimators of the transformation that fulfill the expansion could be applied as well.
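The transformation $T$ in (9) depends only on the distribution function of $Y$, so an empirical analogue is immediate by plugging in the empirical CDF. A small sketch (our illustration, not the estimator studied in the paper):

```python
import numpy as np

def T_empirical(y, Y_sample):
    """Empirical analogue of T in (9): plug the empirical CDF of Y into
    T(y) = (F_Y(y) - F_Y(0)) / (F_Y(1) - F_Y(0))."""
    F = lambda t: np.mean(Y_sample <= t)
    F0, F1 = F(0.0), F(1.0)
    return (F(y) - F0) / (F1 - F0)
```

By construction the empirical $T$ maps 0 to 0 and 1 to 1, mirroring the identification constraints (2) on the scale of $S_i$.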

3 Asymptotic results

In this section, we will derive the asymptotic distribution under the null hypothesis and under local and fixed alternatives. For the formulation of the local alternatives, consider the null hypothesis as given in (4), i.e. $h(\cdot)c_1+c_2=\Lambda_{\theta_0}(\cdot)$ for some $\theta_0\in\Theta$, $c_1\in\mathbb{R}^+$, $c_2\in\mathbb{R}$, and instead assume

$H_{1,n}: h(\cdot)c_1+c_2=\Lambda_{\theta_0}(\cdot)+n^{-1/2}r(\cdot)$ for some $\theta_0\in\Theta$, $c_1\in\mathbb{R}^+$, $c_2\in\mathbb{R}$ and some function $r$.

Due to the identifiability conditions (2), one obtains $c_2=\Lambda_{\theta_0}(0)+n^{-1/2}r(0)$ and $c_1=\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(1)-r(0))$. Assumption (A5) yields boundedness of $r$, so that we can rewrite the local alternative as

$h(\cdot)=\dfrac{\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(\cdot)-r(0))}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(1)-r(0))}=h_0(\cdot)+n^{-1/2}r_0(\cdot)+o(n^{-1/2}),$   (10)

where $h_0(\cdot)=(\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0))/(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))$ and

$r_0(\cdot)=\dfrac{r(\cdot)-r(0)-h_0(\cdot)(r(1)-r(0))}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)}.$

Note that the null hypothesis $H_0$ is included in the local alternative $H_{1,n}$ by considering $r\equiv 0$, which gives $h=h_0$. We assume the following data generating model under the local alternative $H_{1,n}$. Let the regression function $g$, the errors $\varepsilon_i$ and the covariates $X_i$ be independent of $n$ and define $Y_i=h^{-1}(g(X_i)+\varepsilon_i)$ ($i=1,\dots,n$), which under local alternatives depends on $n$ through the transformation $h$. Throughout we use the notation ($i=1,\dots,n$)

$S_i=h(Y_i)=g(X_i)+\varepsilon_i.$   (11)

Further, recall the definition of $U_i$ in (9). Note that the distribution of $U_i$ does not depend on $n$, even under local alternatives, because $F_Y(Y_i)$ is uniformly distributed on $[0,1]$, while $F_Y(0)=P(Y_i\le 0)=P(h(Y_i)\le h(0))=P(S_i\le 0)$ due to (2), and similarly $F_Y(1)=P(S_i\le 1)$.

To formulate our main result, we need some more notation. With $\psi$ from (8), $Z_i$ from (9) and $S_i$ from (11) define ($i=1,\dots,n$)

$\dot\Lambda_\theta(y)=\Big(\frac{\partial}{\partial\theta_k}\Lambda_\theta(y)\Big)_{k=1,\dots,d_\Theta}$

$R(s)=\big(s,1,-\dot\Lambda_{\theta_0}(h_0^{-1}(s))\big)^t$   (12)

$R_f(s)=\big(\dot\Lambda_{\theta_0}(1)^t-\dot\Lambda_{\theta_0}(0)^t,\ \dot\Lambda_{\theta_0}(0)^t,\ \dot\Lambda_{\theta_0}(h_0^{-1}(s))^t\big)\,(s,1,-1)^t$   (13)

$\Gamma_0=E\big[w(h_0^{-1}(S_1))R(S_1)R(S_1)^t\big]$   (14)

$\Gamma_{0,f}=E\big[w(h_0^{-1}(S_1))R_f(S_1)R_f(S_1)^t\big]$   (15)

$\varphi(z)=E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)R(S_2)\,\big|\,Z_1=z\big]$   (16)

$\varphi_f(z)=E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)R_f(S_2)\,\big|\,Z_1=z\big]$   (17)

$\zeta(z_1,z_2)=E\big[w(h_0^{-1}(S_3))\{\psi(Z_1,U_3)-\varphi(Z_1)^t\Gamma_0^{-1}R(S_3)\}\{\psi(Z_2,U_3)-\varphi(Z_2)^t\Gamma_0^{-1}R(S_3)\}\,\big|\,Z_1=z_1,Z_2=z_2\big]$   (18)

$\zeta_f(z_1,z_2)=E\big[w(h_0^{-1}(S_3))\{\psi(Z_1,U_3)-\varphi_f(Z_1)^t\Gamma_{0,f}^{-1}R_f(S_3)\}\{\psi(Z_2,U_3)-\varphi_f(Z_2)^t\Gamma_{0,f}^{-1}R_f(S_3)\}\,\big|\,Z_1=z_1,Z_2=z_2\big]$   (19)

$\bar r(s)=r_0(h_0^{-1}(s))-E\big[w(h_0^{-1}(S_1))r_0(h_0^{-1}(S_1))R(S_1)\big]^t\Gamma_0^{-1}R(s)$   (20)

$\bar r_f(s)=r_0(h_0^{-1}(s))-E\big[w(h_0^{-1}(S_1))r_0(h_0^{-1}(S_1))R_f(S_1)\big]^t\Gamma_{0,f}^{-1}R_f(s)$   (21)

$\tilde\zeta(z)=2E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)\bar r(S_2)\,\big|\,Z_1=z\big]$   (22)

$\tilde\zeta_f(z)=2E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)\bar r_f(S_2)\,\big|\,Z_1=z\big]$   (23)

and let $P^Z$ and $F_Z$ denote the law and distribution function, respectively, of $Z_i$. The quantities marked with an "$f$", referring to the "fixed" parameters $c_1=\Lambda_{\hat\theta}(1)-\Lambda_{\hat\theta}(0)$ and $c_2=\Lambda_{\hat\theta}(0)$, will be used to describe the asymptotic behaviour of the test statistic $\bar T_n$.

With these notations, the assumptions for the asymptotic results can be formulated. To this end, let $\mathcal{Y}$ denote the support of $Y$ (which depends on $n$ under local alternatives). Further, $F_S$ denotes the distribution function of $S_1$ as in (11) and $T_S$ denotes the transformation $s\mapsto(F_S(s)-F_S(0))/(F_S(1)-F_S(0))$. The following assumptions are used.

(A1) The sets $C_1$, $C_2$ and $\Theta$ are compact.

(A2) The weight function $w$ is continuous with a compact support $\mathcal{Y}_w\subset\mathcal{Y}$.

(A3) The map $(y,\theta)\mapsto\Lambda_\theta(y)$ is twice continuously differentiable on $\mathcal{Y}_w$ with respect to $\theta$ and the (partial) derivatives are continuous in $(y,\theta)\in\mathcal{Y}_w\times\Theta$.

(A4) There exists a unique strictly increasing and continuous transformation $h$ such that model (1) holds with $X$ independent of $\varepsilon$.

(A5) The function $h_0$ defined in (10) is strictly increasing and continuously differentiable, and $r$ is continuous on $\mathcal{Y}_w$. $F_Y$ is strictly increasing on the support of $Y$.

(A6) Minimizing the functions $M:\Upsilon\to\mathbb{R}$, $\gamma=(c_1,c_2,\theta)\mapsto E\big[w(Y)(h_0(Y)c_1+c_2-\Lambda_\theta(Y))^2\big]$, and $\bar M:\Theta\to\mathbb{R}$, $\theta\mapsto E\big[w(Y)(h_0(Y)(\Lambda_\theta(1)-\Lambda_\theta(0))+\Lambda_\theta(0)-\Lambda_\theta(Y))^2\big]$, leads to unique solutions $\gamma_0=(c_{1,0},c_{2,0},\theta_0)$ and $\theta_0$ in the interior of $\Upsilon$ and $\Theta$, respectively. For all $\theta\ne\tilde\theta$ it is
$\sup_{y\in\operatorname{supp}(w)}\Big|\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}-\dfrac{\Lambda_{\tilde\theta}(y)-\Lambda_{\tilde\theta}(0)}{\Lambda_{\tilde\theta}(1)-\Lambda_{\tilde\theta}(0)}\Big|>0.$

(A7) The Hessian matrices $\Gamma_0:=\operatorname{Hess}M(\gamma_0)$ and $\Gamma_{0,f}:=\operatorname{Hess}\bar M(\theta_0)$ are positive definite.

(A8) The transformation estimator $\hat h$ fulfils (8) for some function $\psi$. For some $U_0$ (independent of $n$ under local alternatives) with $T_S(h(\mathcal{Y}_w))\subset U_0$, the function class $\{z\mapsto\psi(z,t):t\in U_0\}$ is Donsker with respect to $P^Z$ and $E[\psi(Z_1,t)]=0$ for all $t\in U_0$. The fourth moments $E\big[w(h_0^{-1}(S_1))\psi(Z_1,U_1)^4\big]$ and $E\big[w(h_0^{-1}(S_1))\psi(Z_2,U_1)^4\big]$ are finite.

When considering a fixed alternative $H_1$ or the relevant hypothesis $H_0'$ below, (A6) and (A8) are replaced by the following Assumptions (A6') and (A8') (assumption (A8') is only relevant for $H_0'$). Note that $h$ is then a fixed function, not depending on $n$.

(A6') Minimizing the functions $M:\Upsilon\to\mathbb{R}$, $\gamma=(c_1,c_2,\theta)\mapsto E\big[w(Y)(h(Y)c_1+c_2-\Lambda_\theta(Y))^2\big]$, and $\bar M:\Theta\to\mathbb{R}$, $\theta\mapsto E\big[w(Y)(h(Y)(\Lambda_\theta(1)-\Lambda_\theta(0))+\Lambda_\theta(0)-\Lambda_\theta(Y))^2\big]$, leads to unique solutions $\gamma_0=(c_{1,0},c_{2,0},\theta_0)$ and $\theta_0$ in the interior of $\Upsilon$ and $\Theta$, respectively. For all $\theta\ne\tilde\theta$ it is
$\sup_{y\in\operatorname{supp}(w)}\Big|\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}-\dfrac{\Lambda_{\tilde\theta}(y)-\Lambda_{\tilde\theta}(0)}{\Lambda_{\tilde\theta}(1)-\Lambda_{\tilde\theta}(0)}\Big|>0.$

(A8') The transformation estimator $\hat h$ fulfills (8) for some function $\psi$. For some $U_0\supset T_S(h(\mathcal{Y}_w))$, the function class $\{z\mapsto\psi(z,t):t\in U_0\}$ is Donsker with respect to $P^Z$ and $E[\psi(Z_1,t)]=0$ for all $t\in U_0$. Further, one has $E[\psi(Z_1,U_2)^2]<\infty$.

Remark 3.1 Assumptions concerning compactness of the parameter spaces, differentiability of model components and uniqueness of the minimizer $\gamma_0$ are standard in the context of goodness-of-fit tests. Moreover, it can be shown that the definitions of $\Gamma_0$ and $\Gamma_{0,f}$ in (A7) coincide with those in Eqs. (14) and (15), respectively. Assumption (A8) controls the asymptotic behaviour of $\hat h-h$ and thus the rate of local alternatives which can be detected. The Donsker and boundedness conditions are needed to obtain uniform convergence rates of $\hat h-h$ and of some negligible remainders in the proof. Assumption (A8') is the counterpart of Assumption (A8) for precise hypotheses as considered in (24).

Theorem 3.2 Assume (A1)–(A8). Let $(\lambda_k)_{k\in\{1,2,\dots\}}$ and $(\lambda_{k,f})_{k\in\{1,2,\dots\}}$ be the eigenvalues of the operators

$K\rho(z_1):=\int\rho(z_2)\zeta(z_1,z_2)\,dF_Z(z_2)\quad\text{and}\quad K_f\rho(z_1):=\int\rho(z_2)\zeta_f(z_1,z_2)\,dF_Z(z_2),$

respectively, with corresponding eigenfunctions $(\rho_k)_{k\in\{1,2,\dots\}}$ and $(\rho_{k,f})_{k\in\{1,2,\dots\}}$, which are each orthonormal in the $L^2$-space corresponding to the distribution $F_Z$. Let $(W_k)_{k\in\{1,2,\dots\}}$ be independent and standard normally distributed random variables and let $W_0$ be centred normally distributed with variance $E[\tilde\zeta(Z_1)^2]$ such that for all $K\in\mathbb{N}$ the random vector $(W_0,W_1,\dots,W_K)^t$ follows a multivariate normal distribution with $\operatorname{Cov}(W_0,W_k)=E[\tilde\zeta(Z_1)\rho_k(Z_1)]$ for all $k=1,\dots,K$. Let $W_{0,f}$ and $(W_{k,f})_{k\in\{1,2,\dots\}}$ be defined similarly with $E[W_{0,f}^2]=E[\tilde\zeta_f(Z_1)^2]$ and $\operatorname{Cov}(W_{0,f},W_{k,f})=E[\tilde\zeta_f(Z_1)\rho_{k,f}(Z_1)]$ for all $k\in\mathbb{N}$. Then, under the local alternative $H_{1,n}$, $T_n$ converges in distribution to

$(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\Big(\sum_{k=1}^\infty\lambda_k W_k^2+W_0+E\big[w(h_0^{-1}(S_1))\bar r(S_1)^2\big]\Big)$

and $\bar T_n$ converges in distribution to

$(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\Big(\sum_{k=1}^\infty\lambda_{k,f}W_{k,f}^2+W_{0,f}+E\big[w(h_0^{-1}(S_1))\bar r_f(S_1)^2\big]\Big).$

In particular, under $H_0$ (i.e. for $r\equiv0$), $T_n$ and $\bar T_n$ converge in distribution to

$T=(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\sum_{k=1}^\infty\lambda_k W_k^2\quad\text{and}\quad\bar T=(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\sum_{k=1}^\infty\lambda_{k,f}W_{k,f}^2,$

respectively.

The proof is given in Section 3 of the supplementary material. Asymptotic level-$\alpha$ tests should reject $H_0$ if $T_n$ or $\bar T_n$ is larger than the $(1-\alpha)$-quantile of the distribution of $T$ or $\bar T$, respectively. As the distributions of $T$ and $\bar T$ depend in a complicated way on unknown quantities, we will propose a bootstrap procedure in Section 4. Although most results hold similarly for $T_n$ and $\bar T_n$, for ease of presentation we will mainly focus on results for $T_n$ in the remainder.

Remark 3.3

1. Note that $\zeta(z_1,z_2)=E[I(z_1)I(z_2)]$ with

$I(z):=w(h_0^{-1}(S_1))^{1/2}\big\{\psi(z,U_1)-\varphi(z)^t\Gamma_0^{-1}R(S_1)\big\}$

and $\psi$ from (8). Thus, the operator $K$ defined in Theorem 3.2 is positive semi-definite.

2. The appearance of $W_0$ under the local alternative results from asymptotic theory for degenerate U-statistics. Related phenomena occur in the case of quadratic forms. Similar to the proof of Theorem 3.2, consider some $z_n+cn^{-1/2}$, where $n^{1/2}z_n$ converges to a centred normally distributed random vector, say $z$, and we have $c=0$ under $H_0$. Moreover, consider a quadratic form $z_n^tA_nz_n$, where $A_n$ is a positive definite matrix and $n^{-1}A_n$ converges to a matrix $A$. Then, under $H_0$,

$(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})=z_n^tA_nz_n$

converges to $z^tAz$, which has a $\chi^2$-type distribution. However, under $H_{1,n}$, we have

$(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})=z_n^tA_nz_n+2c^tn^{-1/2}A_nz_n+c^tn^{-1}A_nc,$

where the first term on the right-hand side is as before. The second term converges to $2c^tAz$, which is normally distributed and corresponds to $W_0$ in our context. The last term converges to the constant $c^tAc$, corresponding to the constant summand in the limit in Theorem 3.2. Note that the limit of $(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})$ cannot be negative due to the positive definiteness of $A_n$.
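The decomposition of the shifted quadratic form in part 2 of the remark is plain algebra and can be verified numerically; the following sketch (all names are ours, purely for illustration) checks that the form under the shift splits exactly into the $\chi^2$-type term, the normal cross term corresponding to $W_0$, and the constant:

```python
import numpy as np

def decompose(z_n, c, A_n, n):
    """Check (z_n + c/sqrt(n))^t A_n (z_n + c/sqrt(n))
    = z_n^t A_n z_n + 2 c^t A_n z_n / sqrt(n) + c^t A_n c / n
    for symmetric A_n, mirroring Remark 3.3(2)."""
    shifted = z_n + c / np.sqrt(n)
    full = shifted @ A_n @ shifted            # form under the local shift
    quad = z_n @ A_n @ z_n                    # chi^2-type term
    cross = 2.0 * c @ A_n @ z_n / np.sqrt(n)  # normal term (plays the role of W_0)
    const = c @ A_n @ c / n                   # constant summand
    return full, quad + cross + const
```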

Next, we consider fixed alternatives, i.e. transformations $h$ that do not belong to the parametric class:

$H_1: d(\Lambda_\theta,h)>0\quad\text{for all }\theta\in\Theta.$

Theorem 3.4 Assume (A1)–(A4), (A6') and let $\hat h$ estimate $h$ uniformly consistently on compact sets. Then, under $H_1$, $\lim_{n\to\infty}P(T_n>q)=1$ and $\lim_{n\to\infty}P(\bar T_n>q)=1$ for all $q\in\mathbb{R}$, that is, the proposed tests are consistent.

The proof is given in Section 3 of the supplement.

The transformation model with a parametric transformation class might be useful in applications even if the model does not hold exactly. With a good choice of $\theta$, applying the transformation $\Lambda_\theta$ can reduce the dependence between covariates and errors enormously. Estimating an appropriate $\theta$ is much easier than estimating the transformation $h$ nonparametrically. Consequently, one might prefer the semi-parametric transformation model over a completely nonparametric one. It is then of interest how far away we are from the true model. Therefore, in the following, we consider testing the precise hypotheses (relevant hypotheses)

$H_0':\min_{\theta\in\Theta}d(\Lambda_\theta,h)\ge\eta\quad\text{and}\quad H_1':\min_{\theta\in\Theta}d(\Lambda_\theta,h)<\eta.$   (24)

If a suitable test rejects $H_0'$ for some small $\eta$ (fixed beforehand by the experimenter), the model is considered "good enough" to work with, even if it does not hold exactly. To test those hypotheses, we will use the same test statistic as before, but we have to standardize differently. Assume $H_0'$; then $h$ is a transformation which does not belong to the parametric class, i.e. the former fixed alternative $H_1$ holds. Let

$M(\gamma)=M(c_1,c_2,\theta)=E\big[w(Y)(h(Y)c_1+c_2-\Lambda_\theta(Y))^2\big],$

and let

$\gamma_0=(c_{1,0},c_{2,0},\theta_0):=\arg\min_{(c_1,c_2,\theta)\in\Upsilon}M(c_1,c_2,\theta).$


If $\mathbb{R}^+$ and $\mathbb{R}$ are replaced by $C_1$ and $C_2$ in the definition of $d$ in (5), one has $\min_{c_1\in C_1,c_2\in C_2}M(\gamma)=d(\Lambda_\theta,h)$ for all $\theta\in\Theta$. Assume that

$\Gamma=E\left[w(Y_1)\begin{pmatrix} h(Y_1)^2 & h(Y_1) & -h(Y_1)\dot\Lambda_{\theta_0}(Y_1)\\ h(Y_1) & 1 & -\dot\Lambda_{\theta_0}(Y_1)\\ -h(Y_1)\dot\Lambda_{\theta_0}(Y_1)^t & -\dot\Lambda_{\theta_0}(Y_1)^t & \Gamma_{3,3} \end{pmatrix}\right]$   (25)

is positive definite, where $\Gamma_{3,3}=\dot\Lambda_{\theta_0}(Y_1)^t\dot\Lambda_{\theta_0}(Y_1)-\ddot\Lambda_{\theta_0}(Y_1)\tilde R_1$ with

$\ddot\Lambda_\theta(y)=\Big(\frac{\partial^2}{\partial\theta_k\,\partial\theta_\ell}\Lambda_\theta(y)\Big)_{k,\ell=1,\dots,d_\Theta}$

and $\tilde R_i=h(Y_i)c_{1,0}+c_{2,0}-\Lambda_{\theta_0}(Y_i)$ ($i=1,\dots,n$).

Theorem 3.5 Assume (A1)–(A4), (A6'), (A8'), let (A7) hold with $\gamma_0$ from (A6') and let $\Gamma$ be positive definite. Then,

$n^{1/2}\big(T_n/n-M(\gamma_0)\big)\xrightarrow{\;D\;}N(0,\sigma^2)$

with $\sigma^2=\operatorname{Var}\big(w(Y_1)\tilde R_1^2+\delta(Z_1)\big)$, where $\delta(Z_1)=2c_{1,0}E[w(Y_2)\psi(Z_1,U_2)\tilde R_2\,|\,Z_1]$.

The proof is given in Section 3 of the supplementary material. It is conjectured that a similar result can be derived for $\bar T_n$, although the corresponding Hessian matrix might become more complex.

A consistent asymptotic level-$\alpha$ test rejects $H_0'$ if $(T_n-n\eta)/(n\hat\sigma^2)^{1/2}<u_\alpha$, where $u_\alpha$ is the $\alpha$-quantile of the standard normal distribution and $\hat\sigma^2$ is a consistent estimator of $\sigma^2$. Further research is required on suitable estimators of $\sigma^2$. Let $\hat\gamma=(\hat c_1,\hat c_2,\hat\theta)^t$ be the minimizer in Eq. (6). For some intermediate sequences $(m_n)_{n\in\mathbb{N}}$, $(q_n)_{n\in\mathbb{N}}$ with $q_n=n/m_n-1$, we considered

$\hat\sigma^2:=\frac{1}{q_n}\sum_{s=1}^{q_n}\Bigg(\frac{2\hat c_1\sqrt{m_n}}{n}\sum_{k=1}^n w(Y_k)\big(\hat h^{(s)}(Y_k)-\hat h(Y_k)\big)\big(\hat h(Y_k)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_k)\big)+\frac{1}{\sqrt{m_n}}\sum_{j=(s-1)m_n+1}^{sm_n}\bigg[w(Y_j)\big(\hat h(Y_j)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_j)\big)^2-\frac{1}{n}\sum_{i=1}^n w(Y_i)\big(\hat h(Y_i)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_i)\big)^2\bigg]\Bigg)^2$

as an estimator of $\sigma^2$, where $\hat h^{(s)}$ denotes the nonparametric estimator of $h$ based on the subsample $(Y_{(s-1)m_n+1},X_{(s-1)m_n+1}),\dots,(Y_{sm_n},X_{sm_n})$, $s=1,\dots,q_n$, but suitable choices for $m_n$ are still unclear. Alternatively, a self-normalization approach as in Shao (2010), Shao and Zhang (2010) or Dette et al. (2020) can be applied. For this purpose, let $s\in(0,1)$ and let $\hat h_s$ and $\hat\gamma_s=(\hat c_{s,1},\hat c_{s,2},\hat\theta_s)^t$ be defined as $\hat h$ and $\hat\gamma$, but based on the subsample $(Y_1,X_1),\dots,(Y_{\lfloor ns\rfloor},X_{\lfloor ns\rfloor})$. Moreover, let $K\in\mathbb{N}$, $0<t_1<\dots<t_K<1$ and let $\nu$ be a probability measure on $(0,1)$ with $\nu(\{t_1,\dots,t_K\})=1$. Define

$V_n:=\int_0^1\Bigg(\sum_{k=1}^{\lfloor ns\rfloor}w(Y_k)\big(\hat h_s(Y_k)\hat c_{s,1}+\hat c_{s,2}-\Lambda_{\hat\theta_s}(Y_k)\big)^2-s\sum_{k=1}^n w(Y_k)\big(\hat h(Y_k)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_k)\big)^2\Bigg)^2\nu(ds)$

as well as

$\tilde T_n:=\frac{T_n-nM(\gamma_0)}{\sqrt{V_n}}.$

In Section 5 of the supplementary material, it is shown that $\tilde T_n\xrightarrow{\;D\;}\tilde T$ for some random variable $\tilde T$ and that the distribution of $\tilde T$ does not depend on any unknown parameters. Hence, its quantiles can be simulated and $\tilde T_n$ can be used to test the hypotheses $H_0'$ and $H_1'$.

Remark 3.6 Note that not rejecting the null hypothesis $H_0$ does not mean that the null hypothesis is valid. Consequently, alternative approaches, such as increasing the level to accept more transformation functions instead of testing the precise hypotheses in (24), do not in general result in evidence for applying a transformation model.

4 A bootstrap version and simulations

Although Theorem 3.2 shows how the test statistic behaves asymptotically under $H_0$, it is hard to extract any information about how to choose appropriate critical values of a test that rejects $H_0$ for large values of $T_n$. The main reasons are that, first, for any function $\zeta$ the eigenvalues of the operator defined in Theorem 3.2 are unknown; second, this function is unknown and has to be estimated as well; and third, even $\psi$ (which would be needed to estimate $\zeta$) is mostly unknown and rather complex (see e.g. Section 2 of the supplement). Therefore, approximating the $\alpha$-quantile, say $q_\alpha$, of the distribution of $T$ in Theorem 3.2 in a direct way is difficult, and instead we suggest a smooth bootstrap algorithm to approximate $q_\alpha$.

Algorithm 4.1 Let $(Y_1,X_1),\dots,(Y_n,X_n)$ denote the observed data, define

$h_\theta(y)=\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}\quad\text{and}\quad g_\theta(x)=E[h_\theta(Y)\,|\,X=x]$

and let $\hat g$ be a consistent estimator of $g_{\theta_0}$, where $\theta_0$ is defined as in (A6) under the null hypothesis and as in (A6') under the alternative. Let $\kappa$ and $\ell$ be smooth Lebesgue densities on $\mathbb{R}^{d_X}$ and $\mathbb{R}$, respectively, where $\ell$ is strictly positive, $\kappa$ has bounded support and $\kappa(0)>0$. Let $(a_n)_n$ and $(b_n)_n$ be positive sequences with $a_n\to0$, $b_n\to0$, $na_n\to\infty$, $nb_n^{d_X}\to\infty$. Denote by $m\in\mathbb{N}$ the sample size of the bootstrap sample.

(1) Calculate $\hat\gamma=(\hat c_1,\hat c_2,\hat\theta)^t=\arg\min_{\gamma\in\Upsilon}\sum_{i=1}^n w(Y_i)(\hat h(Y_i)c_1+c_2-\Lambda_\theta(Y_i))^2$. Estimate the parametric residuals $\varepsilon_i(\theta_0)=h_{\theta_0}(Y_i)-g_{\theta_0}(X_i)$ by $\hat\varepsilon_i=h_{\hat\theta}(Y_i)-\hat g(X_i)$ and denote centered versions by $\tilde\varepsilon_i=\hat\varepsilon_i-n^{-1}\sum_{j=1}^n\hat\varepsilon_j$, $i=1,\dots,n$.

(2) Generate $X^*_j$, $j=1,\dots,m$, independently (given the original data) from the density

$f_{X^*}(x)=\frac{1}{nb_n^{d_X}}\sum_{i=1}^n\kappa\Big(\frac{x-X_i}{b_n}\Big)$

(which is a kernel density estimator of $f_X$ with kernel $\kappa$ and bandwidth $b_n$). For $j=1,\dots,m$, define bootstrap observations as

$Y^*_j=(h^*)^{-1}\big(\hat g(X^*_j)+\varepsilon^*_j\big)\quad\text{for}\quad h^*(\cdot)=\dfrac{\Lambda_{\hat\theta}(\cdot)-\Lambda_{\hat\theta}(0)}{\Lambda_{\hat\theta}(1)-\Lambda_{\hat\theta}(0)},$   (26)

where $\varepsilon^*_j$ is generated independently (given the original data) from the density

$\frac{1}{n}\sum_{i=1}^n\frac{1}{a_n}\,\ell\Big(\frac{\tilde\varepsilon_i-\cdot}{a_n}\Big)$

(which is a kernel density estimator of the density of $\varepsilon(\theta_0)$ with kernel $\ell$ and bandwidth $a_n$).

(3) Calculate the bootstrap estimate $\hat h^*$ of $h^*$ from $(Y^*_j,X^*_j)$, $j=1,\dots,m$.

(4) Calculate the bootstrap statistic $T^*_{n,m}=\min_{(c_1,c_2,\theta)\in\Upsilon}\sum_{j=1}^m w(Y^*_j)(\hat h^*(Y^*_j)c_1+c_2-\Lambda_\theta(Y^*_j))^2$.

(5) Let $B\in\mathbb{N}$. Repeat steps (2)–(4) $B$ times to obtain the bootstrap statistics $T^*_{n,m,1},\dots,T^*_{n,m,B}$. Let $q^*_\alpha$ denote the quantile of $T^*_{n,m}$ conditional on $(Y_i,X_i)$, $i=1,\dots,n$. Estimate $q^*_\alpha$ by

$\hat q^*_\alpha=\min\Big\{z\in\{T^*_{n,m,1},\dots,T^*_{n,m,B}\}:\frac{1}{B}\sum_{k=1}^B I\{T^*_{n,m,k}\le z\}\ge\alpha\Big\}.$
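Step (2) of Algorithm 4.1 amounts to drawing from two kernel density estimates, which can be done by resampling data points and adding scaled kernel noise. A sketch for a univariate covariate with illustrative kernel choices (uniform for $\kappa$, which has bounded support and is positive at 0, and Gaussian for $\ell$); `g_hat` and `h_star_inv` are assumed to be supplied by the user:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_sample(X, eps_tilde, g_hat, h_star_inv, m, a_n, b_n):
    """One draw of step (2) of Algorithm 4.1: sampling from a kernel density
    estimate = resampling a data point and adding scaled kernel noise.
    Kernel choices (uniform / Gaussian) are illustrative assumptions."""
    n = len(X)
    X_star = X[rng.integers(0, n, size=m)] + b_n * rng.uniform(-1, 1, size=m)
    eps_star = eps_tilde[rng.integers(0, n, size=m)] + a_n * rng.normal(size=m)
    Y_star = h_star_inv(g_hat(X_star) + eps_star)   # Eq. (26)
    return X_star, Y_star
```

Repeating this draw, recomputing $\hat h^*$ and the criterion $B$ times, and taking the empirical quantile of the resulting statistics yields $\hat q^*_\alpha$ as in step (5).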

Remark 4.2

1. The reason for resampling the bootstrap data $X^*_j$, $j=1,\dots,m$, nonparametrically is the need to mimic the original transformation estimator and its asymptotic behaviour with the bootstrap estimator conditional on the data. Therefore, to proceed in the proof as in Colling and Van Keilegom (2019), it is necessary to smooth the distribution of $X^*$. The properties $nb_n^{d_X}\to\infty$ and $\kappa(0)>0$ ensure that, conditional on the original data $(Y_1,X_1),\dots,(Y_n,X_n)$, the support of $X^*$ contains that of $v$ (from assumption (B7) in Section 2 of the supplement) with probability converging to one. Thus, $v$ can be used for calculating $\hat h^*$ as well.

2. To proceed as in Algorithm 4.1, it may be necessary to modify $h^*$ so that $S^*_j=\hat g(X^*_j)+\varepsilon^*_j$ belongs to the domain of $(h^*)^{-1}$ for all $j=1,\dots,m$. As long as these modifications do not have any influence on $h^*(y)$ for $y\in\mathcal{Y}_w$, the influence on $\hat h^*$ and $T^*_{n,m}$ should be asymptotically negligible (which can be proven for the estimator by Colling and Van Keilegom (2019)).

The bootstrap algorithm should fulfil two properties: On the one hand, under the null hypothesis, the algorithm has to provide, conditionally on the original data, consistent estimates of the quantiles of $T_n$, or rather of its asymptotic distribution from Theorem 3.2. On the other hand, to be consistent under $H_1$, the bootstrap quantiles have to stabilize or at least converge to infinity at a rate slower than that of $T_n$. To formalize this, let $(\Omega,\mathcal{A},P)$ denote the underlying probability space. Assume that $(\Omega,\mathcal{A})$ can be written as $\Omega=\Omega_1\times\Omega_2$ and $\mathcal{A}=\mathcal{A}_1\otimes\mathcal{A}_2$ for some measurable spaces $(\Omega_1,\mathcal{A}_1)$ and $(\Omega_2,\mathcal{A}_2)$. Further, assume that $P$ is characterized as the product of a probability measure $P_1$ on $(\Omega_1,\mathcal{A}_1)$ and a Markov kernel

$P_2^1:\Omega_1\times\mathcal{A}_2\to[0,1],$

that is, $P=P_1\otimes P_2^1$. While randomness with respect to the original data is modelled by $P_1$, randomness with respect to the bootstrap data and conditional on the original data is modelled by $P_2^1$. Moreover, assume

$P_2^1(\omega,A)=P\big(\Omega_1\times A\,\big|\,(Y_1(\omega),X_1(\omega)),\dots,(Y_n(\omega),X_n(\omega))\big)$

for all $\omega\in\Omega_1$, $A\in\mathcal{A}_2$. With these notations, the assumptions (A8*) and (A9*) from Section 1 of the supplementary material can be formulated.

Theorem 4.3 Let $q^*_\alpha$ denote the bootstrap quantile from Algorithm 4.1.

1. Assume $H_0$, (A1)–(A8), (A8*), (A9*). Then, $q^*_\alpha$ fulfils

$P_1\Big(\Big\{\omega\in\Omega_1:\limsup_{m\to\infty}|q^*_\alpha-q_\alpha|>\delta\Big\}\Big)=o(1)$

for all $\delta>0$. Hence, $P(T_n>q^*_\alpha)=\alpha+o(1)$ under the null hypothesis.

2. Assume $H_1$, (A1)–(A4), (A6'), (A8*). Then, $q^*_\alpha$ fulfils

$P_1\Big(\Big\{\omega\in\Omega_1:T_n>\limsup_{m\to\infty}q^*_\alpha\Big\}\Big)=1+o(1),$

so that $P(T_n>q^*_\alpha)=1+o(1)$ under the alternative.


The proof is given in the supplement. Since only θ̂ is used to generate the bootstrap observations in Algorithm 4.1, it is conjectured that Theorem 4.3 can be generalized to the usage of T̄_n from (7) in Algorithm 4.1.
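The role the bootstrap quantile plays in the test decision can be sketched as a generic bootstrap-quantile decision rule; the function below is a placeholder illustration, not the paper's Algorithm 4.1:

```python
import numpy as np

def bootstrap_test(t_n, bootstrap_stats, alpha=0.05):
    """Reject H0 at level alpha when the original statistic T_n exceeds the
    empirical (1 - alpha)-quantile q*_alpha of the bootstrap statistics."""
    q_star = np.quantile(bootstrap_stats, 1.0 - alpha)
    return t_n > q_star, q_star
```

Under H_0, Theorem 4.3 states that q*_α approaches q_α, so the rejection probability tends to α; under H_1, T_n grows faster than q*_α, so the test rejects with probability tending to one.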

Simulations

Throughout this section, g(X) = 4X − 1, X ∼ U([0,1]) and ε ∼ N(0,1) are chosen. Moreover, the null hypothesis of h belonging to the Yeo and Johnson (2000) transformations

Λ_θ(Y) =
  ((Y + 1)^θ − 1)/θ,                 if Y ≥ 0, θ ≠ 0,
  log(Y + 1),                        if Y ≥ 0, θ = 0,
  −((1 − Y)^{2−θ} − 1)/(2 − θ),      if Y < 0, θ ≠ 2,
  −log(1 − Y),                       if Y < 0, θ = 2,

with parameter θ ∈ Θ_0 = [0,2] is tested. Under H_0, we generate data using the transformation h = (Λ_{θ0}(·) − Λ_{θ0}(0))/(Λ_{θ0}(1) − Λ_{θ0}(0)) to match the identification constraints h(0) = 0, h(1) = 1. Under the alternative, we choose transformations h with an inverse given by the following convex combination,

h^{-1}(Y) = [(1 − c)(Λ^{-1}_{θ0}(Y) − Λ^{-1}_{θ0}(0)) + c(r(Y) − r(0))] / [(1 − c)(Λ^{-1}_{θ0}(1) − Λ^{-1}_{θ0}(0)) + c(r(1) − r(0))]   (27)

for some θ0 ∈ [0,2], some strictly increasing function r and some c ∈ [0,1]. In general, it is not clear if a growing factor c leads to a growing distance (5). Indeed, the opposite might be the case if r is somehow close to the class of transformation functions considered in the null hypothesis. Simulations were conducted for r1(Y) = 5Φ(Y), r2(Y) = exp(Y) and r3(Y) = Y³, where Φ denotes the cumulative distribution function of a standard normal distribution, and c = 0, 0.2, 0.4, 0.6, 0.8, 1. The prefactor in the definition of r1 is introduced because the values of Φ are rather small compared to the values of Λ_θ; that is, even when using the presented convex combination in (27), Λ_{θ0} (except for c = 1) would dominate the "alternative part" r of the transformation function without this factor. Note that r2 and Λ_0 only differ with respect to a different standardization. Therefore, if h is defined via (27) with r = r2, the resulting function is for c = 1 close to the null hypothesis case.
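The transformations above can be transcribed directly into code. The following sketch (function names are ours; the formulas are taken from the definitions above) implements Λ_θ, its inverse, the standardized null transformation h and the alternative inverse (27):

```python
import numpy as np

def yeo_johnson(y, theta):
    """Yeo and Johnson (2000) transformation Lambda_theta."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.empty_like(y)
    pos = y >= 0
    if theta != 0:
        out[pos] = ((y[pos] + 1.0) ** theta - 1.0) / theta
    else:
        out[pos] = np.log1p(y[pos])
    if theta != 2:
        out[~pos] = -(((1.0 - y[~pos]) ** (2.0 - theta) - 1.0) / (2.0 - theta))
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def yeo_johnson_inv(u, theta):
    """Inverse of Lambda_theta (u >= 0 corresponds to y >= 0)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.empty_like(u)
    pos = u >= 0
    if theta != 0:
        out[pos] = (1.0 + theta * u[pos]) ** (1.0 / theta) - 1.0
    else:
        out[pos] = np.expm1(u[pos])
    if theta != 2:
        out[~pos] = 1.0 - (1.0 - (2.0 - theta) * u[~pos]) ** (1.0 / (2.0 - theta))
    else:
        out[~pos] = 1.0 - np.exp(-u[~pos])
    return out

def h_null(y, theta0):
    """Standardized transformation under H0, matching h(0) = 0 and h(1) = 1."""
    l0 = yeo_johnson(0.0, theta0)[0]
    l1 = yeo_johnson(1.0, theta0)[0]
    return (yeo_johnson(y, theta0) - l0) / (l1 - l0)

def h_inv_alternative(u, theta0, r, c):
    """Inverse transformation under the alternative, Eq. (27)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    li0 = yeo_johnson_inv(0.0, theta0)[0]
    li1 = yeo_johnson_inv(1.0, theta0)[0]
    num = (1 - c) * (yeo_johnson_inv(u, theta0) - li0) + c * (r(u) - r(0.0))
    den = (1 - c) * (li1 - li0) + c * (r(1.0) - r(0.0))
    return num / den
```

For c = 0 the alternative inverse reduces to the standardized inverse of Λ_{θ0}, so the null model is recovered; for c = 1 only the deviation function r remains.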

For calculating the test statistic, the weighting function w was set equal to one. The nonparametric estimator of h was calculated as in Colling and Van Keilegom (2019) (see Section 2 of the supplement for details) with the Epanechnikov kernel K(y) = (3/4)(1 − y²) I_{[−1,1]}(y) and a normal reference rule bandwidth (see for example Silverman (1986))

h_u = (40√π / n)^{1/5} σ̂_u,   h_x = (40√π / n)^{1/5} σ̂_x,


where σ̂²_u and σ̂²_x are estimators of the variance of U = T(Y) and X, respectively.
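The kernel and bandwidth choices can be written down directly; the following is a minimal transcription (estimator names are ours):

```python
import numpy as np

def epanechnikov(y):
    """Epanechnikov kernel K(y) = (3/4)(1 - y^2) on [-1, 1], zero outside."""
    y = np.asarray(y, dtype=float)
    return 0.75 * (1.0 - y ** 2) * (np.abs(y) <= 1.0)

def normal_reference_bandwidth(x):
    """Normal reference rule h = (40 * sqrt(pi) / n)^(1/5) * sigma_hat,
    with the sample standard deviation as the scale estimator."""
    x = np.asarray(x, dtype=float)
    return (40.0 * np.sqrt(np.pi) / len(x)) ** 0.2 * np.std(x, ddof=1)
```

The same rule is applied once with σ̂_u for the transformed response and once with σ̂_x for the covariate.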

The number of evaluation points N_x for the nonparametric estimator of h was set equal to 100 (see Section 2 of the supplement for details). The integral in (S3) was computed by applying the function integrate implemented in R. In each simulation run, n = 100 independent and identically distributed random pairs (Y_1, X_1), ..., (Y_n, X_n) were generated as described before, and 250 bootstrap quantiles, which are based on m = 100 bootstrap observations (Y*_1, X*_1), ..., (Y*_m, X*_m), were calculated as in Algorithm 4.1 using for κ the U([−1,1])-density, the standard normal density and a_n = b_n = 0.1. To obtain more precise estimators of the rejection probabilities under the null hypothesis, 800 simulation runs were performed for each choice of θ0 under the null hypothesis, whereas in the remaining alternative cases 200 runs were conducted. Among other things, the nonparametric estimation of h, the integration in (S3), the optimization with respect to θ and the number of bootstrap repetitions cause the simulations to be quite computationally demanding. Hence, an interface for C++ as well as parallelization were used to conduct the simulations.
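The data-generating step of each simulation run can be sketched as follows; this is a minimal illustration of S = h(Y) = g(X) + ε with g(X) = 4X − 1 (the function name and the identity placeholder for h^{-1} are ours):

```python
import numpy as np

def generate_sample(n, h_inverse, rng):
    """Draw n i.i.d. pairs (Y_i, X_i) with X ~ U([0,1]), eps ~ N(0,1)
    and Y = h^{-1}(g(X) + eps), where g(X) = 4X - 1."""
    x = rng.uniform(0.0, 1.0, size=n)
    eps = rng.normal(size=n)
    s = 4.0 * x - 1.0 + eps  # S = g(X) + eps = h(Y)
    return h_inverse(s), x

# Example with the identity as a stand-in for h^{-1}
y, x = generate_sample(100, lambda s: s, np.random.default_rng(0))
```

In the actual study, h^{-1} would be the standardized Yeo-Johnson inverse under H_0 or the convex combination (27) under the alternative.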

The main results of the simulation study are presented in Table 1. There, the rejection probabilities of the settings with h = (Λ_{θ0}(·) − Λ_{θ0}(0))/(Λ_{θ0}(1) − Λ_{θ0}(0)) under the null hypothesis, and h as in (27) under the alternative with r ∈ {r1, r2, r3}, c ∈ {0, 0.2, 0.4, 0.6, 0.8, 1} and θ0 ∈ {0, 0.5, 1, 2}, are listed. The significance level was set equal to 0.05 and 0.10. Note that the test sticks to the level or is even a bit conservative. Under the alternatives, the rejection probabilities not only differ between different choices of r, but also between different transformation parameters θ0 that are inserted in (27). While the test shows high power for some alternatives, there are also cases where the rejection probabilities are extremely small. There are certain reasons that explain these observations. First, the class of Yeo-Johnson transforms seems to be quite general, and second, the testing approach itself is rather flexible due to the minimization with respect to γ. Having a look at the definition of the test statistic in (6), it attains small values if the true transformation function can be approximated by a linear transformation of Λ_θ̃ for some appropriate θ̃ ∈ [0,2]. In the following, this issue will be explored further by analysing some graphics.

All of the three figures that occur in the following have the same structure and consist of four panels. The upper left panel shows the true transformation function with inverse function (27). Since Y depends on the transformation function, the values of S = h(Y) = g(X) + ε, which are displayed on the vertical axis, are fixed for a comparison of different transformation functions. Due to the choice of g(X) = 4X − 1 and X ∼ U([0,1]), the vertical axis reaches from −1 to 3, which would be the support of h(Y) if the error is neglected. In the upper right panel, the parametric estimator of this function is displayed. Both of these functions are then plotted against each other in the lower left panel by pairing values with the same S component. Finally, the function Y ↦ Λ_{θ0}(Y(Λ^{-1}_{θ0}(1) − Λ^{-1}_{θ0}(0)) + Λ^{-1}_{θ0}(0)), which represents the part of h corresponding to the null hypothesis, is plotted against the true transformation function in the last panel.

In the lower left panel, one can see if the true transformation function can be approximated by a linear transform of Λ_θ̃ for some θ̃ ∈ [0,2], which is an indicator for rejecting or not rejecting the null hypothesis, as was pointed out before. As already mentioned, the rejection probabilities not only differ between different deviation functions r, but also within these settings. For example, when considering r = r1 with


Table 1 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} and r ∈ {r1, r2, r3} for the test statistic Tn

                 θ0 = 0           θ0 = 0.5         θ0 = 1           θ0 = 2
Level α          0.05    0.10     0.05    0.10     0.05    0.10     0.05    0.10
r1  Null hyp.    0.01000 0.04000  0.03125 0.08750  0.03125 0.07750  0.01625 0.05625
    c = 0.2      0.000   0.010    0.075   0.105    0.010   0.015    0.000   0.020
    c = 0.4      0.000   0.000    0.020   0.045    0.000   0.015    0.120   0.200
    c = 0.6      0.100   0.155    0.035   0.050    0.085   0.150    0.415   0.545
    c = 0.8      0.685   0.765    0.110   0.210    0.505   0.645    0.785   0.890
    c = 1        0.965   0.990    0.925   0.975    0.975   0.985    0.985   0.990
r2  c = 0.2      0.010   0.035    0.030   0.045    0.515   0.640    0.885   0.965
    c = 0.4      0.015   0.040    0.000   0.005    0.060   0.135    0.870   0.980
    c = 0.6      0.035   0.085    0.000   0.005    0.005   0.005    0.625   0.815
    c = 0.8      0.020   0.040    0.010   0.040    0.000   0.005    0.185   0.325
    c = 1        0.020   0.065    0.030   0.090    0.025   0.095    0.050   0.105
r3  c = 0.2      0.330   0.505    0.730   0.855    0.810   0.905    0.930   0.995
    c = 0.4      0.730   0.865    0.815   0.945    0.875   0.970    0.915   0.990
    c = 0.6      0.880   0.940    0.895   0.960    0.950   0.995    0.940   0.990
    c = 0.8      0.895   0.965    0.925   0.975    0.935   0.990    0.915   0.980
    c = 1        0.980   0.990    0.960   0.990    0.939   0.990    0.940   0.985


[Figure 1: four panels showing the true transformation function, the parametric estimator, the parametric estimator plotted against the true transformation function, and the parametric function at the original parameter plotted against the true transformation function.]

Fig. 1 Some transformation functions for θ0 = 0.5, c = 0.6 and r = r1

[Figure 2: same four-panel structure as Fig. 1.]

Fig. 2 Some transformation functions for θ0 = 2, c = 0.6 and r = r1

c = 0.6, the rejection probabilities for θ0 = 0.5 amount to 0.035 for α = 0.05 and to 0.050 for α = 0.10, while for θ0 = 2, they are 0.415 and 0.545. Figures 1 and 2 explain why the rejection probabilities differ that much. While for θ0 = 0.5 the transformation function can be approximated quite well by transforming Λ_{1.06} linearly, the best approximation for θ0 = 2 is given by Λ_{1.94} and seems to be relatively bad. The best approximation for c = 1 can be reached for θ around 1.4. In contrast to that, considering θ0 = 2 and r = r3 results in a completely different picture. As can be


[Figure 3: same four-panel structure as Figs. 1 and 2.]

Fig. 3 Some transformation functions for θ0 = 2, c = 0.2 and r = r3

seen in Fig. 3, even for c = 0.2 the resulting h differs so much from the null hypothesis that it cannot be linearly transformed into a Yeo-Johnson transform (see the lower left panel). Consequently, the rejection probabilities are rather high.

A way to overcome this problem can consist in applying the modified test statistic T̄n from (7). Although Colling and Van Keilegom (2020) showed that the estimator θ̂ seems to outperform θ̃ from Remark 2.1 in simulations, fixing c1, c2 beforehand might lead, due to the reduced flexibility of the minimization procedure, to higher rejection probabilities when using T̄n instead of Tn. Table 2 contains rejection probabilities which are based on the bootstrap version of T̄n. The same simulation setting and procedures as before have been used. Indeed, some of the rejection probabilities have increased compared to Table 1. For example, the rejection probabilities for r = r1, θ0 = 0.5 and c = 0.6 amount to 0.115 and 0.170 instead of 0.035 and 0.050 in Table 1. Nevertheless, this cannot be generalized, since the rejection probabilities when using T̄n are sometimes below those for Tn, e.g. for θ0 = 0 and r = r1 or θ0 = 2 and r = r2.

Under some alternatives the rejection probabilities are even smaller than the level. This behaviour indicates that, from the presented test's perspective, these models seem to fulfil the null hypothesis more convincingly than the null hypothesis models themselves. The reason for this is shown in Fig. 4 for the setting θ0 = 1, c = 0.4 and r = r1. There, the relationship between the nonparametric estimator of the transformation function and the true transformation function is shown. While the diagonal line represents the identity, the nonparametric estimator seems to flatten the edges of the transformation function. In contrast to this, using r = r1 in (27) steepens the edges, so that both effects neutralize each other. Similar effects cause low rejection probabilities for r = r2, although the reasoning is slightly more sophisticated and is also associated with the boundedness of the parameter space Θ0 = [0,2].


Table 2 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} and r ∈ {r1, r2, r3} for the test statistic T̄n

                 θ0 = 0           θ0 = 0.5         θ0 = 1           θ0 = 2
Level α          0.05    0.10     0.05    0.10     0.05    0.10     0.05    0.10
r1  Null hyp.    0.03375 0.08250  0.03875 0.08375  0.04250 0.07500  0.06000 0.10375
    c = 0.2      0.025   0.040    0.035   0.115    0.055   0.090    0.230   0.285
    c = 0.4      0.010   0.025    0.070   0.155    0.155   0.245    0.525   0.595
    c = 0.6      0.055   0.130    0.115   0.170    0.505   0.585    0.780   0.840
    c = 0.8      0.235   0.485    0.515   0.610    0.840   0.885    0.925   0.940
    c = 1        0.990   0.995    0.990   0.995    0.995   1.000    0.980   0.985
r2  c = 0.2      0.035   0.040    0.035   0.045    0.185   0.405    0.795   0.950
    c = 0.4      0.035   0.060    0.020   0.060    0.030   0.055    0.715   0.920
    c = 0.6      0.035   0.065    0.020   0.055    0.015   0.050    0.470   0.705
    c = 0.8      0.050   0.085    0.035   0.045    0.030   0.060    0.105   0.245
    c = 1        0.030   0.060    0.045   0.095    0.080   0.130    0.055   0.125
r3  c = 0.2      0.230   0.395    0.560   0.765    0.725   0.880    0.900   1.000
    c = 0.4      0.625   0.780    0.760   0.915    0.810   0.945    0.780   0.970
    c = 0.6      0.750   0.935    0.865   0.975    0.845   0.985    0.780   0.985
    c = 0.8      0.810   0.980    0.825   0.965    0.805   0.960    0.810   0.990
    c = 1        0.725   0.970    0.725   0.970    0.680   0.970    0.750   0.990


[Figure 4: scatter of the true transformation function against its nonparametric estimator, with the identity line.]

Fig. 4 Transformation function for θ0 = 1, c = 0.4 and r = r1 on the horizontal axis and its nonparametric estimator on the vertical axis. The identity is displayed in red

Table 3 Rejection probabilities at θ0 = 1 and θ0 = 2 for r = r1

Param.   Alternative   Original framework     Modified weighting
Level                  α = 0.05   α = 0.10    α = 0.05   α = 0.10
θ0 = 1   Null hyp.     0.03125    0.07750     0.02875    0.07875
         c = 0.2       0.010      0.015       0.040      0.100
         c = 0.4       0.000      0.015       0.205      0.320
         c = 0.6       0.085      0.150       0.590      0.715
         c = 0.8       0.505      0.645       0.950      0.980
         c = 1         0.975      0.985       1.000      1.000
θ0 = 2   Null hyp.     0.01625    0.05625     0.05500    0.10375
         c = 0.2       0.000      0.020       0.225      0.350
         c = 0.4       0.120      0.200       0.575      0.710
         c = 0.6       0.415      0.545       0.910      0.965
         c = 0.8       0.785      0.890       0.990      1.000
         c = 1         0.985      0.990       0.995      1.000

One possible solution could consist in adjusting the weight function w such that the boundary of the support of Y no longer belongs to the support of w. In Table 3, the rejection probabilities for a modified weighting approach are presented. There, the weight function was chosen such that the smallest five percent and the largest five percent of observations were omitted to avoid the flattening effect of the nonparametric estimation. Indeed, the resulting rejection probabilities under the alternatives increase and lie above those under the null hypotheses.
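The modified weighting can be sketched as an indicator weight based on empirical quantiles; the trimming-by-quantile implementation below is our reading of "omitting the smallest and largest five percent of observations", and the function names are illustrative:

```python
import numpy as np

def trimmed_weight(y_obs, lower=0.05, upper=0.95):
    """Build a weight function w that is 1 between the empirical 5%- and
    95%-quantiles of the observed responses and 0 outside, so the boundary
    of the support of Y no longer belongs to the support of w."""
    lo, hi = np.quantile(np.asarray(y_obs, dtype=float), [lower, upper])

    def w(y):
        y = np.asarray(y, dtype=float)
        return ((y >= lo) & (y <= hi)).astype(float)

    return w
```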

Finally, simulations for precise hypotheses as in (24) were conducted. For the sake of brevity, only rejection probabilities resulting from the self-normalized test statistic are


Table 4 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} for precise hypotheses and r = r1, r2, r3, r4

            θ0 = 0                  θ0 = 0.5                θ0 = 1                  θ0 = 2
Level α     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)
r1 c = 0    0.095 0.185 6.564       0.995 1.000 0.886       0.965 0.990 1.446       0.680 0.830 2.629
   c = 0.2  0.700 0.800 2.443       0.965 0.975 1.081       0.905 0.970 1.540       0.205 0.310 5.588
   c = 0.4  0.790 0.895 2.259       0.930 0.975 1.401       0.385 0.550 3.930       0.000 0.015 14.360
   c = 0.6  0.215 0.390 4.638       0.660 0.815 2.581       0.010 0.060 9.477       0.000 0.000 22.146
   c = 0.8  0.085 0.185 6.336       0.030 0.050 9.239       0.000 0.000 19.834      0.000 0.000 28.955
   c = 1    0.000 0.000 30.996      0.000 0.000 32.081      0.000 0.000 30.907      0.000 0.000 31.710
r2 c = 0.2  0.090 0.130 7.121       0.985 0.995 0.995       0.180 0.320 5.050       0.000 0.000 71.936
   c = 0.4  0.035 0.080 7.611       0.905 0.980 1.331       0.785 0.880 2.266       0.000 0.000 38.728
   c = 0.6  0.030 0.075 8.080       0.580 0.755 2.899       0.790 0.900 2.089       0.010 0.035 11.850
   c = 0.8  0.025 0.045 8.778       0.125 0.245 5.810       0.260 0.380 4.939       0.485 0.610 3.576
   c = 1    0.005 0.015 9.254       0.005 0.015 9.550       0.015 0.045 8.756       0.005 0.030 8.785
r3 c = 0.2  0.030 0.060 8.158       0.080 0.155 8.108       0.000 0.000 25.054      0.000 0.000 197.373
   c = 0.4  0.000 0.000 18.715      0.000 0.000 29.072      0.000 0.000 56.932      0.000 0.000 243.476
   c = 0.6  0.000 0.000 51.102      0.000 0.000 75.638      0.000 0.000 106.332     0.000 0.000 246.010
   c = 0.8  0.000 0.000 118.049     0.000 0.000 135.888     0.000 0.000 166.425     0.000 0.000 259.728
   c = 1    0.000 0.000 248.025     0.000 0.000 262.134     0.000 0.000 254.237     0.000 0.000 236.894
r4 c = 0.2  0.475 0.635 3.436       0.995 1.000 0.942       0.960 0.980 1.265       0.685 0.820 2.695
   c = 0.4  0.815 0.915 1.920       0.995 1.000 0.972       0.880 0.940 1.793       0.420 0.570 4.062
   c = 0.6  0.580 0.780 2.943       0.975 0.995 1.124       0.590 0.775 2.773       0.135 0.235 6.390
   c = 0.8  0.155 0.310 5.137       0.800 0.895 2.185       0.120 0.295 5.528       0.060 0.100 8.490
   c = 1    0.010 0.015 10.722      0.015 0.015 10.519      0.000 0.020 10.680      0.015 0.030 10.211


presented, since this approach seems to outperform that based on the estimator σ̂² from Section 3 by far in the simulated settings. Since only a fraction of the data is used to calculate Vn, the sample size was increased to n = 500. The settings and techniques remain the same as before. The probability measure ν was set to

ν = (1/10) δ_{0.6} + (2/10) δ_{0.7} + (3/10) δ_{0.8} + (4/10) δ_{0.9}

to put a higher weight on those parts of Vn where more data points are used. Furthermore, the threshold was chosen to be η = 0.02, which roughly corresponds to plugging the logit function r4(y) := 5 exp(y)/(1 + exp(y)) and c = 1 into Eq. (27) and calculating min_{θ∈Θ} d(Λθ, h). Hence, we expect the test to reject the null hypothesis H0 if Tn < nη = 10 holds.
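The discrete measure ν can be written down directly; the snippet below (names ours) evaluates an integral with respect to ν as a weighted sum over its four atoms:

```python
import numpy as np

# Atoms and weights of the discrete probability measure nu
atoms = np.array([0.6, 0.7, 0.8, 0.9])
weights = np.array([1, 2, 3, 4]) / 10.0

def integrate_nu(f):
    """Integral of f with respect to nu: a weighted sum of point masses,
    putting more weight where more data points are used."""
    return float(np.sum(weights * f(atoms)))
```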

A detailed analysis would go beyond the scope of this manuscript, so that only some rejection probabilities are given in Table 4. Moreover, the mean values of the test statistic Tn are listed to link the rejection probabilities to the distance between the expected value of the test statistic and the threshold of η = 0.02. First, the smaller the value of Tn is, the more likely the test seems to reject the null hypothesis H0. Further, the test holds the level, but is slightly conservative. Alternatives seem to be detected for mean values of Tn around or below eight. Nevertheless, the power of the test is quite high in scenarios with small expected values of the test statistic, which often corresponds to transformation functions which are close to the parametric class. For θ0 = 0.5 and θ0 = 1, the rejection probabilities are in these cases above 0.90 and sometimes even close to one. Although the influence of simulation parameters such as the sample size n or the probability measure ν has not been examined, the results indicate that using the self-normalized test statistic can be a good way to test for the precise hypotheses H0 and H1.

Acknowledgements Natalie Neumeyer acknowledges financial support by the DFG (Research Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency). Ingrid Van Keilegom acknowledges financial support by the European Research Council (2016–2021, Horizon 2020/ERC Grant Agreement No. 694409). Moreover, we would like to thank the associate editor and two anonymous referees for their very helpful suggestions and comments on the paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Allison JS, Hušková M, Meintanis SG (2018) Testing the adequacy of semiparametric transformation models. TEST 27(1):70–94
Berger JO, Delampady M (1987) Testing precise hypotheses. Stat Sci 2(3):317–335
Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc B 26(2):211–252


Carroll RJ, Ruppert D (1988) Transformation and weighting in regression. CRC Press, Boca Raton
Chen S (2002) Rank estimation of transformation models. Econometrica 70:1683–1697
Chiappori P-A, Komunjer I, Kristensen D (2015) Nonparametric identification and estimation of transformation. J Economet 188(1):22–39
Colling B, Van Keilegom I (2016) Goodness-of-fit tests in semiparametric transformation models. TEST 25(2):291–308
Colling B, Van Keilegom I (2017) Goodness-of-fit tests in semiparametric transformation models using the integrated regression function. J Multivar Anal 160:10–30
Colling B, Van Keilegom I (2019) Estimation of fully nonparametric transformation models. Bernoulli 25:3762–3795
Colling B, Van Keilegom I (2020) Estimation of a semiparametric transformation model: a novel approach based on least squares minimization. Electron J Stat 14:769–800
Dette H, Kokot K, Volgushev S (2020) Testing relevant hypotheses in functional time series via self-normalization. J R Stat Soc B 82:629–660
Heuchenne C, Samb R, Van Keilegom I (2015) Estimating the error distribution in semiparametric transformation models. Electron J Stat 9:2391–2419
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19(3):293–325
Horowitz JL (1996) Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica 64(1):103–137
Horowitz JL (2009) Semiparametric and nonparametric methods in econometrics. Springer, Berlin
Hušková M, Meintanis SG, Neumeyer N, Pretorius C (2018) Independence tests in semiparametric transformation models. S Afr J Stat 52:1–13
Kloodt N, Neumeyer N (2020) Specification tests in semiparametric transformation models—a multiplier bootstrap approach. Comput Stat Data Anal 145
Lakens D (2017) Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci 8(4):355–362
Lee AJ (1990) U-statistics: theory and practice. Dekker, New York
Lewbel A, Lu X, Su L (2015) Specification testing for transformation models with an application to generalized accelerated failure-time models. J Economet 184(1):81–96
Linton O, Sperlich S, Van Keilegom I (2008) Estimation of a semiparametric transformation model. Ann Stat 36(2):686–718
Mu Y, He X (2007) Power transformation toward a linear regression quantile. J Am Stat Assoc 102(477):269–279
Neumeyer N, Noh H, Van Keilegom I (2016) Heteroscedastic semiparametric transformation models: estimation and testing for validity. Stat Sin 26:925–954
Powell J (1991) Estimation of monotonic regression models under quantile restrictions. In: Barnett WA, Powell J, Tauchen GE (eds) Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the 5th international symposium on economic theory and econometrics, pp 357–384
Shao X (2010) A self-normalized approach to confidence interval construction in time series. J R Stat Soc B 72(3):343–366
Shao X, Zhang X (2010) Testing for change points in time series. J Am Stat Assoc 105(491):1228–1240
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Szydłowski A (2020) Testing a parametric transformation model versus a nonparametric alternative. Economet Theory. https://doi.org/10.1017/S0266466619000355
Wellner JA (2005) Empirical processes: theory and applications. https://www.stat.washington.edu/jaw/RESEARCH/TALKS/Delft/emp-proc-delft-big.pdf
Witting H, Müller-Funk U (1995) Mathematical statistics II—asymptotic statistics: parametric models and nonparametric functionals. B. G. Teubner
Yeo I-K, Johnson RA (2000) A new family of power transformations to improve normality or symmetry. Biometrika 87(4):954–959

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
