
TEST

https://doi.org/10.1007/s11749-021-00756-0

ORIGINAL PAPER

Specification testing in semi-parametric transformation models

Nick Kloodt¹ · Natalie Neumeyer¹ · Ingrid Van Keilegom²

Received: 10 January 2020 / Accepted: 19 January 2021

© The Author(s) 2021

Abstract

In transformation regression models, the response is transformed before fitting a regression model to covariates and transformed response. We assume such a model where the errors are independent of the covariates and the regression function is modeled nonparametrically. We suggest a test for goodness-of-fit of a parametric transformation class based on a distance between a nonparametric transformation estimator and the parametric class. We present asymptotic theory under the null hypothesis of validity of the semi-parametric model and under local alternatives. A bootstrap algorithm is suggested in order to apply the test. We also consider relevant hypotheses to distinguish between large and small distances of the parametric transformation class to the 'true' transformation.

Keywords Bootstrap · Goodness-of-fit test · Nonparametric regression · Nonparametric transformation estimator · Parametric transformation class · U-statistics

Mathematics Subject Classification 62G10

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11749-021-00756-0.

B Nick Kloodt
Nick.Kloodt@uni-hamburg.de

1 Fachbereich Mathematik, Universität Hamburg, Bundesstraße 55, 20146 Hamburg, Germany
2 Research Centre for Operations Research and Statistics (ORSTAT), KU Leuven, Leuven, Belgium

1 Introduction

It is very common in applications to transform data before investigating the functional dependence of variables by regression models. The aim of the transformation is to obtain a simpler model, e.g. with a specific structure of the regression function, or a homoscedastic instead of a heteroscedastic model. Typically, flexible parametric classes of transformations are considered from which a suitable one is selected data-dependently. A classical example is the class of Box-Cox power transformations (see Box and Cox (1964)). For purely parametric transformation models, see Carroll and Ruppert (1988) and references therein. Powell (1991) and Mu and He (2007) consider transformation quantile regression models. Nonparametric estimation of the transformation in the context of parametric regression models has been considered by Horowitz (1996) and Chen (2002), among others. Horowitz (2009) reviews estimation in transformation models with parametric regression in the cases where either the transformation or the error distribution or both are modeled nonparametrically. Linton et al. (2008) suggest a profile likelihood estimator for a parametric class of transformations, while the error distribution is estimated nonparametrically and the regression function semi-parametrically. Heuchenne et al. (2015) suggest an estimator of the error distribution in the same model. Neumeyer et al. (2016) consider profile likelihood estimation in heteroscedastic semi-parametric transformation regression models, i.e. the mean and variance function are modeled nonparametrically, while the transformation function is chosen from a parametric class. A completely nonparametric (homoscedastic) model is considered by Chiappori et al. (2015). Lewbel et al. (2015) provide a test for the validity of such a model. The approach of Chiappori et al. (2015) is modified and corrected by Colling and Van Keilegom (2019). The version of the nonparametric transformation estimator considered in the latter paper is then applied by Colling and Van Keilegom (2020) to suggest a new estimator of the transformation parameter if it is assumed that the transformation belongs to a parametric class.

In general, asymptotic theory for nonparametric transformation estimators is sophisticated, and parametric transformation estimators show much better performance if the parametric model is true. A parametric transformation will thus lead to better estimates of the regression function. Moreover, parametric transformations are easier to interpret and allow for subsequent inference in the transformation model. For the latter purpose note that for transformation models with parametric transformation, lack-of-fit tests for the regression function as well as tests for significance of covariate components have been suggested by Colling and Van Keilegom (2016), Colling and Van Keilegom (2017), Allison et al. (2018) and Kloodt and Neumeyer (2020). Those tests cannot straightforwardly be generalized to nonparametric transformation models because known estimators in that model do not allow for uniform rates of convergence over the whole real line, see Chiappori et al. (2015) and Colling and Van Keilegom (2019).

However, before applying a transformation model with parametric transformation, it would be appropriate to test the goodness-of-fit of the parametric transformation class. In the context of parametric quantile regression, Mu and He (2007) suggest such a goodness-of-fit test. In the context of nonparametric mean regression, Neumeyer et al. (2016) develop a goodness-of-fit test for the parametric transformation class based on an empirical independence process of pairs of residuals and covariates. The latter approach was modified by Hušková et al. (2018), who applied empirical characteristic functions. In a linear regression model with transformation of the response, Szydłowski (2020) suggests a goodness-of-fit test for the parametric transformation class that is based on a distance between the nonparametric transformation estimator considered by Chen (2002) and the parametric class. We will follow a similar approach but consider a nonparametric regression model. The aim of the transformations we consider is to induce independence between errors and covariates. The null hypothesis is that the unknown transformation belongs to a parametric class. Note that when applied to the special case of a class of transformations that contains as only element the identity, our test provides indication on whether a classical homoscedastic regression model (without transformation) is appropriate or whether first the response should be transformed. Our test statistic is based on a minimum distance between a nonparametric transformation and the parametric transformations. We present the asymptotic distribution of the test statistic under the null hypothesis of a parametric transformation and under local alternatives of $n^{-1/2}$-rate. Under the null hypothesis, the limit distribution is that of a degenerate U-statistic. With a flexible parametric class, applying an appropriate transformation can reduce the dependence enormously, even if the 'true' transformation does not belong to the class. Thus, for the first time in the context of transformation goodness-of-fit tests, we consider testing for so-called precise or relevant hypotheses. Here, the null hypothesis is that the distance between the true transformation and the parametric class is large. If this hypothesis is rejected, then the model with the parametric transformation fits well enough to be considered for further inference. Under the new null hypothesis, the test statistic is asymptotically normally distributed. The term "precise hypotheses" refers to Berger and Delampady (1987). Dette et al. (2020) considered precise hypotheses when comparing mean functions in the context of functional time series. Note that the idea of precise hypotheses is related to that of equivalence tests, which originate from the field of pharmacokinetics (see Lakens (2017)). Throughout, we assume that the nonparametric transformation estimator fulfills an asymptotic linear expansion. It is then shown that the estimator considered by Colling and Van Keilegom (2019) fulfills this expansion and thus can be used for evaluating the test statistic.

The remainder of the paper is organized as follows. In Sect. 2, we present the model and the test statistic. Asymptotic distributions under the null hypothesis of a parametric transformation class and under local alternatives are presented in Sect. 3, which also contains a consistency result and asymptotic results under relevant hypotheses. Section 4 presents a bootstrap algorithm and a simulation study. Section 1 of the supplementary material contains assumptions for bootstrap results, while Section 2 there treats a specific nonparametric transformation estimator and shows that it fulfills the required conditions. The proofs of the main results are given in Section 3 and a rigorous treatment of bootstrap asymptotics is given in Section 4 of the supplement.

2 The model and test statistic

Assume we have observed $(X_i,Y_i)$, $i=1,\dots,n$, which are independent with the same distribution as $(X,Y)$ that fulfill the transformation regression model

$h(Y)=g(X)+\varepsilon,$   (1)

where $E[\varepsilon]=0$ holds and $\varepsilon$ is independent of the covariate $X$, which is $\mathbb{R}^{d_X}$-valued, while $Y$ is univariate. The regression function $g$ will be modelled nonparametrically. The transformation $h:\mathbb{R}\to\mathbb{R}$ is strictly increasing. Throughout we assume that, given the joint distribution of $(X,Y)$ and some identification conditions, there exists a unique transformation $h$ such that this model is fulfilled. It then follows that the other model components are identified via $g(x)=E[h(Y)\,|\,X=x]$ and $\varepsilon=h(Y)-g(X)$. See Chiappori et al. (2015) for conditions under which the identifiability of $h$ holds. In particular, conditions are required to fix location and scale, and we will assume throughout that

$h(0)=0 \quad\text{and}\quad h(1)=1.$   (2)

Now let $\{\Lambda_\theta:\theta\in\Theta\}$ be a class of strictly increasing parametric transformation functions $\Lambda_\theta:\mathbb{R}\to\mathbb{R}$, where $\Theta\subseteq\mathbb{R}^{d_\Theta}$ is a finite-dimensional parameter space. Our purpose is to test whether a semi-parametric transformation model holds, i.e.

$\Lambda_{\theta_0}(Y)=\tilde g(X)+\tilde\varepsilon,$

for some parameter $\theta_0\in\Theta$, where $\tilde\varepsilon$ and $X$ are independent. Due to the assumed uniqueness of the transformation $h$, one obtains $h=h_0$ under validity of the semi-parametric model, where

$h_0(\cdot)=\dfrac{\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0)}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)}.$

Thus, we can write the null hypothesis as

$H_0: h\in\left\{\dfrac{\Lambda_\theta(\cdot)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}:\theta\in\Theta\right\},$   (3)

which thanks to (2) can be formulated equivalently as

$H_0: h\in\left\{\dfrac{\Lambda_\theta(\cdot)-c_2}{c_1}:\theta\in\Theta,\ c_1\in\mathbb{R}^+,\ c_2\in\mathbb{R}\right\}.$   (4)

Our test statistics will be based on the following $L^2$-distance

$d(\Lambda_\theta,h)=\min_{c_1\in\mathbb{R}^+,\,c_2\in\mathbb{R}} E\big[w(Y)\{h(Y)c_1+c_2-\Lambda_\theta(Y)\}^2\big],$   (5)

where $w$ is a positive weight function with compact support $\mathcal{Y}_w$. Its empirical counterpart is

$d_n(\Lambda_\theta,\hat h):=\min_{c_1\in C_1,\,c_2\in C_2}\frac{1}{n}\sum_{j=1}^n w(Y_j)\{\hat h(Y_j)c_1+c_2-\Lambda_\theta(Y_j)\}^2,$


where $\hat h$ denotes a nonparametric estimator of the true transformation $h$ as discussed below, and $C_1\subset\mathbb{R}^+$, $C_2\subset\mathbb{R}$ are compact sets. Assumption (A6) assures that the sets are large enough to contain the true values. Let $\gamma:=(c_1,c_2,\theta)$ and $\Upsilon:=C_1\times C_2\times\Theta$. The test statistic is defined as

$T_n=n\min_{\theta\in\Theta} d_n(\Lambda_\theta,\hat h)=\min_{\gamma=(c_1,c_2,\theta)\in\Upsilon}\sum_{j=1}^n w(Y_j)\{\hat h(Y_j)c_1+c_2-\Lambda_\theta(Y_j)\}^2$   (6)

and the null hypothesis should be rejected for large values of the test statistic. If the null hypothesis holds, the minimizing parameters $c_1,c_2$ in Eq. (5) can be written as $c_1=\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)$ and $c_2=\Lambda_{\theta_0}(0)$ for some $\theta_0\in\Theta$. Hence, an alternative test statistic

$\bar T_n=\min_{\theta\in\Theta}\sum_{j=1}^n w(Y_j)\big[\hat h(Y_j)\{\Lambda_\theta(1)-\Lambda_\theta(0)\}+\Lambda_\theta(0)-\Lambda_\theta(Y_j)\big]^2$   (7)

can be considered as well.

We will derive the asymptotic distributions under the null hypothesis and local and fixed alternatives in Section 3 and suggest a bootstrap version of the tests in Section 4.
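The minimization in (6) over $(c_1,c_2,\theta)$ can be carried out with a general-purpose optimizer once $\hat h$ has been evaluated at the sample points. A minimal Python sketch; the function names, the optimizer choice and the linear parametric class used below are illustrative assumptions, not part of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def T_n(Y, h_hat_vals, Lambda, w, theta0, c0=(1.0, 0.0)):
    """Test statistic (6): minimum over (c1, c2, theta) of the weighted sum of
    squared distances between c1*h_hat(Y_j) + c2 and Lambda_theta(Y_j)."""
    wY = w(Y)

    def objective(p):
        c1, c2, theta = p[0], p[1], p[2:]
        return np.sum(wY * (h_hat_vals * c1 + c2 - Lambda(theta, Y)) ** 2)

    start = np.concatenate([c0, np.atleast_1d(theta0)])
    res = minimize(objective, x0=start, method="Nelder-Mead")
    return res.fun
```

With a perfectly specified class and $\hat h$ equal to the normalized true transformation, the statistic is (numerically) zero, which is the behaviour one would sanity-check before running a simulation.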

Remark 2.1 Colling and Van Keilegom (2019) consider the estimators

$\hat\theta:=\arg\min_{\theta\in\Theta} d_n(\Lambda_\theta,\hat h)$

and

$\tilde\theta:=\arg\min_{\theta\in\Theta} n^{-1}\sum_{j=1}^n w(Y_j)\big[\hat h(Y_j)\{\Lambda_\theta(1)-\Lambda_\theta(0)\}+\Lambda_\theta(0)-\Lambda_\theta(Y_j)\big]^2$

for the parametric transformation (assuming $H_0$) corresponding to $T_n$ and $\bar T_n$. They observe that $\hat\theta$ outperforms $\tilde\theta$ in simulations.

Nonparametric estimation of the transformation $h$ has been considered by Chiappori et al. (2015) and Colling and Van Keilegom (2019). For our main asymptotic results, we need that $\hat h$ has a linear expansion, not only under the null hypothesis, but also under fixed alternatives and the local alternatives as defined in the next section. The linear expansion should have the form

$\hat h(y)-h(y)=\frac{1}{n}\sum_{i=1}^n \psi(Z_i,T(y))+o_P(n^{-1/2})\quad\text{uniformly in } y\in\mathcal{Y}_w.$   (8)

Here, $\psi$ needs to fulfil condition (A8) in Section 3, and we use the definitions ($i=1,\dots,n$)

$Z_i=(U_i,X_i),\quad U_i=T(Y_i),\quad T(y)=\dfrac{F_Y(y)-F_Y(0)}{F_Y(1)-F_Y(0)},$   (9)

where $F_Y$ denotes the distribution function of $Y$ and is assumed to be strictly increasing on the support of $Y$. To ensure that $T$ is well-defined, the values 0 and 1 are w.l.o.g. assumed to belong to the support of $Y$, but can be replaced by arbitrary values $a<b\in\mathbb{R}$ (in the support of $Y$). The expansion (8) could also be formulated with a linear term $n^{-1}\sum_{i=1}^n\tilde\psi(X_i,Y_i,y)$. In Section 2 of the supplement, we reproduce the definition of the estimator $\hat h$ that was suggested by Colling and Van Keilegom (2019) as a modification of the estimator by Chiappori et al. (2015). We give regularity assumptions under which the desired expansion holds, see Lemma 1. Other nonparametric estimators of the transformation that fulfill the expansion could be applied as well.
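The transformation $T$ in (9) depends only on the distribution function of $Y$, so an empirical analogue is immediate by plugging in the empirical CDF. A small sketch (our illustration, not the estimator studied in the paper):

```python
import numpy as np

def T_empirical(y, Y_sample):
    """Empirical analogue of T in (9): plug the empirical CDF of Y into
    T(y) = (F_Y(y) - F_Y(0)) / (F_Y(1) - F_Y(0))."""
    F = lambda t: np.mean(Y_sample <= t)
    F0, F1 = F(0.0), F(1.0)
    return (F(y) - F0) / (F1 - F0)
```

By construction the empirical $T$ maps 0 to 0 and 1 to 1, mirroring the identification constraints (2) on the scale of $S_i$.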

3 Asymptotic results

In this section, we will derive the asymptotic distribution under the null hypothesis and under local and fixed alternatives. For the formulation of the local alternatives, consider the null hypothesis as given in (4), i.e. $h(\cdot)c_1+c_2=\Lambda_{\theta_0}(\cdot)$ for some $\theta_0\in\Theta$, $c_1\in\mathbb{R}^+$, $c_2\in\mathbb{R}$, and instead assume

$H_{1,n}: h(\cdot)c_1+c_2=\Lambda_{\theta_0}(\cdot)+n^{-1/2}r(\cdot)$ for some $\theta_0\in\Theta$, $c_1\in\mathbb{R}^+$, $c_2\in\mathbb{R}$ and some function $r$.

Due to the identifiability conditions (2), one obtains $c_2=\Lambda_{\theta_0}(0)+n^{-1/2}r(0)$ and $c_1=\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(1)-r(0))$. Assumption (A5) yields boundedness of $r$, so that we can rewrite the local alternative as

$h(\cdot)=\dfrac{\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(\cdot)-r(0))}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)+n^{-1/2}(r(1)-r(0))}=h_0(\cdot)+n^{-1/2}r_0(\cdot)+o(n^{-1/2}),$   (10)

where $h_0(\cdot)=(\Lambda_{\theta_0}(\cdot)-\Lambda_{\theta_0}(0))/(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))$ and

$r_0(\cdot)=\dfrac{r(\cdot)-r(0)-h_0(\cdot)(r(1)-r(0))}{\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0)}.$

Note that the null hypothesis $H_0$ is included in the local alternative $H_{1,n}$ by considering $r\equiv 0$, which gives $h=h_0$. We assume the following data generating model under the local alternative $H_{1,n}$. Let the regression function $g$, the errors $\varepsilon_i$ and the covariates $X_i$ be independent of $n$ and define $Y_i=h^{-1}(g(X_i)+\varepsilon_i)$ ($i=1,\dots,n$), which under local alternatives depends on $n$ through the transformation $h$. Throughout we use the notation ($i=1,\dots,n$)

$S_i=h(Y_i)=g(X_i)+\varepsilon_i.$   (11)

Further, recall the definition of $U_i$ in (9). Note that the distribution of $U_i$ does not depend on $n$, even under local alternatives, because $F_Y(Y_i)$ is uniformly distributed on $[0,1]$, while $F_Y(0)=P(Y_i\le 0)=P(h(Y_i)\le h(0))=P(S_i\le 0)$ due to (2), and similarly $F_Y(1)=P(S_i\le 1)$.

To formulate our main result, we need some more notation. With $\psi$ from (8), $Z_i$ from (9) and $S_i$ from (11) define ($i=1,\dots,n$)

$\dot\Lambda_\theta(y)=\Big(\frac{\partial}{\partial\theta_k}\Lambda_\theta(y)\Big)_{k=1,\dots,d_\Theta}$

$R(s)=\big(s,1,-\dot\Lambda_{\theta_0}(h_0^{-1}(s))\big)^t$   (12)

$R_f(s)=\big(\dot\Lambda_{\theta_0}(1)^t-\dot\Lambda_{\theta_0}(0)^t,\ \dot\Lambda_{\theta_0}(0)^t,\ \dot\Lambda_{\theta_0}(h_0^{-1}(s))^t\big)\,(s,1,-1)^t$   (13)

$\Gamma_0=E\big[w(h_0^{-1}(S_1))R(S_1)R(S_1)^t\big]$   (14)

$\Gamma_{0,f}=E\big[w(h_0^{-1}(S_1))R_f(S_1)R_f(S_1)^t\big]$   (15)

$\varphi(z)=E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)R(S_2)\,\big|\,Z_1=z\big]$   (16)

$\varphi_f(z)=E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)R_f(S_2)\,\big|\,Z_1=z\big]$   (17)

$\zeta(z_1,z_2)=E\big[w(h_0^{-1}(S_3))\{\psi(Z_1,U_3)-\varphi(Z_1)^t\Gamma_0^{-1}R(S_3)\}\{\psi(Z_2,U_3)-\varphi(Z_2)^t\Gamma_0^{-1}R(S_3)\}\,\big|\,Z_1=z_1,Z_2=z_2\big]$   (18)

$\zeta_f(z_1,z_2)=E\big[w(h_0^{-1}(S_3))\{\psi(Z_1,U_3)-\varphi_f(Z_1)^t\Gamma_{0,f}^{-1}R_f(S_3)\}\{\psi(Z_2,U_3)-\varphi_f(Z_2)^t\Gamma_{0,f}^{-1}R_f(S_3)\}\,\big|\,Z_1=z_1,Z_2=z_2\big]$   (19)

$\bar r(s)=r_0(h_0^{-1}(s))-E\big[w(h_0^{-1}(S_1))r_0(h_0^{-1}(S_1))R(S_1)\big]^t\Gamma_0^{-1}R(s)$   (20)

$\bar r_f(s)=r_0(h_0^{-1}(s))-E\big[w(h_0^{-1}(S_1))r_0(h_0^{-1}(S_1))R_f(S_1)\big]^t\Gamma_{0,f}^{-1}R_f(s)$   (21)

$\tilde\zeta(z)=2E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)\bar r(S_2)\,\big|\,Z_1=z\big]$   (22)

$\tilde\zeta_f(z)=2E\big[w(h_0^{-1}(S_2))\psi(Z_1,U_2)\bar r_f(S_2)\,\big|\,Z_1=z\big]$   (23)

and let $P^Z$ and $F_Z$ denote the law and distribution function, respectively, of $Z_i$. The quantities marked with an "$f$", referring to the "fixed" parameters $c_1=\Lambda_{\hat\theta}(1)-\Lambda_{\hat\theta}(0)$ and $c_2=\Lambda_{\hat\theta}(0)$, will be used to describe the asymptotic behaviour of the test statistic $\bar T_n$.

With these notations, the assumptions for the asymptotic results can be formulated. To this end, let $\mathcal{Y}$ denote the support of $Y$ (which depends on $n$ under local alternatives). Further, $F_S$ denotes the distribution function of $S_1$ as in (11) and $T_S$ denotes the transformation $s\mapsto(F_S(s)-F_S(0))/(F_S(1)-F_S(0))$. The following assumptions are used.

(A1) The sets $C_1$, $C_2$ and $\Theta$ are compact.

(A2) The weight function $w$ is continuous with a compact support $\mathcal{Y}_w\subset\mathcal{Y}$.

(A3) The map $(y,\theta)\mapsto\Lambda_\theta(y)$ is twice continuously differentiable on $\mathcal{Y}_w$ with respect to $\theta$ and the (partial) derivatives are continuous in $(y,\theta)\in\mathcal{Y}_w\times\Theta$.

(A4) There exists a unique strictly increasing and continuous transformation $h$ such that model (1) holds with $X$ independent of $\varepsilon$.

(A5) The function $h_0$ defined in (10) is strictly increasing and continuously differentiable, and $r$ is continuous on $\mathcal{Y}_w$. $F_Y$ is strictly increasing on the support of $Y$.

(A6) Minimizing the functions $M:\Upsilon\to\mathbb{R}$, $\gamma=(c_1,c_2,\theta)\mapsto E\big[w(Y)(h_0(Y)c_1+c_2-\Lambda_\theta(Y))^2\big]$, and $\bar M:\Theta\to\mathbb{R}$, $\theta\mapsto E\big[w(Y)(h_0(Y)(\Lambda_\theta(1)-\Lambda_\theta(0))+\Lambda_\theta(0)-\Lambda_\theta(Y))^2\big]$, leads to unique solutions $\gamma_0=(c_{1,0},c_{2,0},\theta_0)$ and $\theta_0$ in the interior of $\Upsilon$ and $\Theta$, respectively. For all $\theta\ne\tilde\theta$ it is
$\sup_{y\in\operatorname{supp}(w)}\Big|\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}-\dfrac{\Lambda_{\tilde\theta}(y)-\Lambda_{\tilde\theta}(0)}{\Lambda_{\tilde\theta}(1)-\Lambda_{\tilde\theta}(0)}\Big|>0.$

(A7) The Hessian matrices $\Gamma_0:=\operatorname{Hess}M(\gamma_0)$ and $\Gamma_{0,f}:=\operatorname{Hess}\bar M(\theta_0)$ are positive definite.

(A8) The transformation estimator $\hat h$ fulfils (8) for some function $\psi$. For some $U_0$ (independent of $n$ under local alternatives) with $T_S(h(\mathcal{Y}_w))\subset U_0$, the function class $\{z\mapsto\psi(z,t):t\in U_0\}$ is Donsker with respect to $P^Z$ and $E[\psi(Z_1,t)]=0$ for all $t\in U_0$. The fourth moments $E\big[w(h_0^{-1}(S_1))\psi(Z_1,U_1)^4\big]$ and $E\big[w(h_0^{-1}(S_1))\psi(Z_2,U_1)^4\big]$ are finite.

When considering a fixed alternative $H_1$ or the relevant hypothesis $H_0'$ below, (A6) and (A8) are replaced by the following Assumptions (A6') and (A8') (assumption (A8') is only relevant for $H_0'$). Note that $h$ is then a fixed function, not depending on $n$.

(A6') Minimizing the functions $M:\Upsilon\to\mathbb{R}$, $\gamma=(c_1,c_2,\theta)\mapsto E\big[w(Y)(h(Y)c_1+c_2-\Lambda_\theta(Y))^2\big]$, and $\bar M:\Theta\to\mathbb{R}$, $\theta\mapsto E\big[w(Y)(h(Y)(\Lambda_\theta(1)-\Lambda_\theta(0))+\Lambda_\theta(0)-\Lambda_\theta(Y))^2\big]$, leads to unique solutions $\gamma_0=(c_{1,0},c_{2,0},\theta_0)$ and $\theta_0$ in the interior of $\Upsilon$ and $\Theta$, respectively. For all $\theta\ne\tilde\theta$ it is
$\sup_{y\in\operatorname{supp}(w)}\Big|\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}-\dfrac{\Lambda_{\tilde\theta}(y)-\Lambda_{\tilde\theta}(0)}{\Lambda_{\tilde\theta}(1)-\Lambda_{\tilde\theta}(0)}\Big|>0.$

(A8') The transformation estimator $\hat h$ fulfills (8) for some function $\psi$. For some $U_0\supset T_S(h(\mathcal{Y}_w))$, the function class $\{z\mapsto\psi(z,t):t\in U_0\}$ is Donsker with respect to $P^Z$ and $E[\psi(Z_1,t)]=0$ for all $t\in U_0$. Further, one has $E[\psi(Z_1,U_2)^2]<\infty$.

Remark 3.1 Assumptions concerning compactness of the parameter spaces, differentiability of model components and uniqueness of the minimizer $\gamma_0$ are standard in the context of goodness-of-fit tests. Moreover, it can be shown that the definitions of $\Gamma_0$ and $\Gamma_{0,f}$ in (A7) coincide with those in Eqs. (14) and (15), respectively. Assumption (A8) controls the asymptotic behaviour of $\hat h-h$ and thus the rate of local alternatives which can be detected. The Donsker and boundedness conditions are needed to obtain uniform convergence rates of $\hat h-h$ and of some negligible remainders in the proof. Assumption (A8') is the counterpart of Assumption (A8) for precise hypotheses as considered in (24).

Theorem 3.2 Assume (A1)–(A8). Let $(\lambda_k)_{k\in\{1,2,\dots\}}$ and $(\lambda_{k,f})_{k\in\{1,2,\dots\}}$ be the eigenvalues of the operators

$K\rho(z_1):=\int\rho(z_2)\zeta(z_1,z_2)\,dF_Z(z_2)\quad\text{and}\quad K_f\rho(z_1):=\int\rho(z_2)\zeta_f(z_1,z_2)\,dF_Z(z_2),$

respectively, with corresponding eigenfunctions $(\rho_k)_{k\in\{1,2,\dots\}}$ and $(\rho_{k,f})_{k\in\{1,2,\dots\}}$, which are each orthonormal in the $L^2$-space corresponding to the distribution $F_Z$. Let $(W_k)_{k\in\{1,2,\dots\}}$ be independent and standard normally distributed random variables and let $W_0$ be centred normally distributed with variance $E[\tilde\zeta(Z_1)^2]$ such that for all $K\in\mathbb{N}$ the random vector $(W_0,W_1,\dots,W_K)^t$ follows a multivariate normal distribution with $\operatorname{Cov}(W_0,W_k)=E[\tilde\zeta(Z_1)\rho_k(Z_1)]$ for all $k=1,\dots,K$. Let $W_{0,f}$ and $(W_{k,f})_{k\in\{1,2,\dots\}}$ be defined similarly with $E[W_{0,f}^2]=E[\tilde\zeta_f(Z_1)^2]$ and $\operatorname{Cov}(W_{0,f},W_{k,f})=E[\tilde\zeta_f(Z_1)\rho_{k,f}(Z_1)]$ for all $k\in\mathbb{N}$. Then, under the local alternative $H_{1,n}$, $T_n$ converges in distribution to

$(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\Big(\sum_{k=1}^\infty\lambda_k W_k^2+W_0+E\big[w(h_0^{-1}(S_1))\bar r(S_1)^2\big]\Big)$

and $\bar T_n$ converges in distribution to

$(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\Big(\sum_{k=1}^\infty\lambda_{k,f}W_{k,f}^2+W_{0,f}+E\big[w(h_0^{-1}(S_1))\bar r_f(S_1)^2\big]\Big).$

In particular, under $H_0$ (i.e. for $r\equiv0$), $T_n$ and $\bar T_n$ converge in distribution to

$T=(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\sum_{k=1}^\infty\lambda_k W_k^2\quad\text{and}\quad\bar T=(\Lambda_{\theta_0}(1)-\Lambda_{\theta_0}(0))^2\sum_{k=1}^\infty\lambda_{k,f}W_{k,f}^2,$

respectively.

The proof is given in Section 3 of the supplementary material. Asymptotic level-$\alpha$ tests should reject $H_0$ if $T_n$ or $\bar T_n$ is larger than the $(1-\alpha)$-quantile of the distribution of $T$ or $\bar T$, respectively. As the distributions of $T$ and $\bar T$ depend in a complicated way on unknown quantities, we will propose a bootstrap procedure in Section 4. Although most results hold similarly for $T_n$ and $\bar T_n$, for ease of presentation we will mainly focus on results for $T_n$ in the remainder.

Remark 3.3

1. Note that $\zeta(z_1,z_2)=E[I(z_1)I(z_2)]$ with

$I(z):=w(h_0^{-1}(S_1))^{1/2}\big\{\psi(z,U_1)-\varphi(z)^t\Gamma_0^{-1}R(S_1)\big\}$

and $\psi$ from (8). Thus, the operator $K$ defined in Theorem 3.2 is positive semi-definite.

2. The appearance of $W_0$ under the local alternative results from asymptotic theory for degenerate U-statistics. Related phenomena occur in the case of quadratic forms. Similar to the proof of Theorem 3.2, consider some $z_n+cn^{-1/2}$, where $n^{1/2}z_n$ converges to a centred normally distributed random vector, say $z$, and we have $c=0$ under $H_0$. Moreover, consider a quadratic form $z_n^tA_nz_n$, where $A_n$ is a positive definite matrix and $n^{-1}A_n$ converges to a matrix $A$. Then, under $H_0$,

$(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})=z_n^tA_nz_n$

converges to $z^tAz$, which has a $\chi^2$-type distribution. However, under $H_{1,n}$, we have

$(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})=z_n^tA_nz_n+2c^tn^{-1/2}A_nz_n+c^tn^{-1}A_nc,$

where the first term on the right-hand side is as before. The second term converges to $2c^tAz$, which is normally distributed and corresponds to $W_0$ in our context. The last term converges to the constant $c^tAc$, corresponding to the constant summand in the limit in Theorem 3.2. Note that the limit of $(z_n+cn^{-1/2})^tA_n(z_n+cn^{-1/2})$ cannot be negative due to the positive definiteness of $A_n$.
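The decomposition of the shifted quadratic form in part 2 of the remark is plain algebra and can be verified numerically; the following sketch (all names are ours, purely for illustration) checks that the form under the shift splits exactly into the $\chi^2$-type term, the normal cross term corresponding to $W_0$, and the constant:

```python
import numpy as np

def decompose(z_n, c, A_n, n):
    """Check (z_n + c/sqrt(n))^t A_n (z_n + c/sqrt(n))
    = z_n^t A_n z_n + 2 c^t A_n z_n / sqrt(n) + c^t A_n c / n
    for symmetric A_n, mirroring Remark 3.3(2)."""
    shifted = z_n + c / np.sqrt(n)
    full = shifted @ A_n @ shifted            # form under the local shift
    quad = z_n @ A_n @ z_n                    # chi^2-type term
    cross = 2.0 * c @ A_n @ z_n / np.sqrt(n)  # normal term (plays the role of W_0)
    const = c @ A_n @ c / n                   # constant summand
    return full, quad + cross + const
```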

Next, we consider fixed alternatives, i.e. transformations $h$ that do not belong to the parametric class:

$H_1: d(\Lambda_\theta,h)>0\quad\text{for all }\theta\in\Theta.$

Theorem 3.4 Assume (A1)–(A4), (A6') and let $\hat h$ estimate $h$ uniformly consistently on compact sets. Then, under $H_1$, $\lim_{n\to\infty}P(T_n>q)=1$ and $\lim_{n\to\infty}P(\bar T_n>q)=1$ for all $q\in\mathbb{R}$, that is, the proposed tests are consistent.

The proof is given in Section 3 of the supplement.

The transformation model with a parametric transformation class might be useful in applications even if the model does not hold exactly. With a good choice of $\theta$, applying the transformation $\Lambda_\theta$ can reduce the dependence between covariates and errors enormously. Estimating an appropriate $\theta$ is much easier than estimating the transformation $h$ nonparametrically. Consequently, one might prefer the semi-parametric transformation model over a completely nonparametric one. It is then of interest how far away we are from the true model. Therefore, in the following, we consider testing the precise hypotheses (relevant hypotheses)

$H_0':\min_{\theta\in\Theta}d(\Lambda_\theta,h)\ge\eta\quad\text{and}\quad H_1':\min_{\theta\in\Theta}d(\Lambda_\theta,h)<\eta.$   (24)

If a suitable test rejects $H_0'$ for some small $\eta$ (fixed beforehand by the experimenter), the model is considered "good enough" to work with, even if it does not hold exactly. To test those hypotheses, we will use the same test statistic as before, but we have to standardize differently. Assume $H_0'$; then $h$ is a transformation which does not belong to the parametric class, i.e. the former fixed alternative $H_1$ holds. Let

$M(\gamma)=M(c_1,c_2,\theta)=E\big[w(Y)(h(Y)c_1+c_2-\Lambda_\theta(Y))^2\big],$

and let

$\gamma_0=(c_{1,0},c_{2,0},\theta_0):=\arg\min_{(c_1,c_2,\theta)\in\Upsilon}M(c_1,c_2,\theta).$


If $\mathbb{R}^+$ and $\mathbb{R}$ are replaced by $C_1$ and $C_2$ in the definition of $d$ in (5), one has $\min_{c_1\in C_1,c_2\in C_2}M(\gamma)=d(\Lambda_\theta,h)$ for all $\theta\in\Theta$. Assume that

$\Gamma=E\left[w(Y_1)\begin{pmatrix} h(Y_1)^2 & h(Y_1) & -h(Y_1)\dot\Lambda_{\theta_0}(Y_1)\\ h(Y_1) & 1 & -\dot\Lambda_{\theta_0}(Y_1)\\ -h(Y_1)\dot\Lambda_{\theta_0}(Y_1)^t & -\dot\Lambda_{\theta_0}(Y_1)^t & \Gamma_{3,3} \end{pmatrix}\right]$   (25)

is positive definite, where $\Gamma_{3,3}=\dot\Lambda_{\theta_0}(Y_1)^t\dot\Lambda_{\theta_0}(Y_1)-\ddot\Lambda_{\theta_0}(Y_1)\tilde R_1$ with

$\ddot\Lambda_\theta(y)=\Big(\frac{\partial^2}{\partial\theta_k\,\partial\theta_\ell}\Lambda_\theta(y)\Big)_{k,\ell=1,\dots,d_\Theta}$

and $\tilde R_i=h(Y_i)c_{1,0}+c_{2,0}-\Lambda_{\theta_0}(Y_i)$ ($i=1,\dots,n$).

Theorem 3.5 Assume (A1)–(A4), (A6'), (A8'), let (A7) hold with $\gamma_0$ from (A6') and let $\Gamma$ be positive definite. Then,

$n^{1/2}\big(T_n/n-M(\gamma_0)\big)\xrightarrow{\;D\;}N(0,\sigma^2)$

with $\sigma^2=\operatorname{Var}\big(w(Y_1)\tilde R_1^2+\delta(Z_1)\big)$, where $\delta(Z_1)=2c_{1,0}E[w(Y_2)\psi(Z_1,U_2)\tilde R_2\,|\,Z_1]$.

The proof is given in Section 3 of the supplementary material. It is conjectured that a similar result can be derived for $\bar T_n$, although the corresponding Hessian matrix might become more complex.

A consistent asymptotic level-$\alpha$ test rejects $H_0'$ if $(T_n-n\eta)/(n\hat\sigma^2)^{1/2}<u_\alpha$, where $u_\alpha$ is the $\alpha$-quantile of the standard normal distribution and $\hat\sigma^2$ is a consistent estimator of $\sigma^2$. Further research is required on suitable estimators of $\sigma^2$. Let $\hat\gamma=(\hat c_1,\hat c_2,\hat\theta)^t$ be the minimizer in Eq. (6). For some intermediate sequences $(m_n)_{n\in\mathbb{N}}$, $(q_n)_{n\in\mathbb{N}}$ with $q_n=n/m_n-1$, we considered

$\hat\sigma^2:=\frac{1}{q_n}\sum_{s=1}^{q_n}\Bigg(\frac{2\hat c_1\sqrt{m_n}}{n}\sum_{k=1}^n w(Y_k)\big(\hat h^{(s)}(Y_k)-\hat h(Y_k)\big)\big(\hat h(Y_k)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_k)\big)+\frac{1}{\sqrt{m_n}}\sum_{j=(s-1)m_n+1}^{sm_n}\bigg[w(Y_j)\big(\hat h(Y_j)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_j)\big)^2-\frac{1}{n}\sum_{i=1}^n w(Y_i)\big(\hat h(Y_i)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_i)\big)^2\bigg]\Bigg)^2$

as an estimator of $\sigma^2$, where $\hat h^{(s)}$ denotes the nonparametric estimator of $h$ based on the subsample $(Y_{(s-1)m_n+1},X_{(s-1)m_n+1}),\dots,(Y_{sm_n},X_{sm_n})$, $s=1,\dots,q_n$, but suitable choices for $m_n$ are still unclear. Alternatively, a self-normalization approach as in Shao (2010), Shao and Zhang (2010) or Dette et al. (2020) can be applied. For this purpose, let $s\in(0,1)$ and let $\hat h_s$ and $\hat\gamma_s=(\hat c_{s,1},\hat c_{s,2},\hat\theta_s)^t$ be defined as $\hat h$ and $\hat\gamma$, but based on the subsample $(Y_1,X_1),\dots,(Y_{\lfloor ns\rfloor},X_{\lfloor ns\rfloor})$. Moreover, let $K\in\mathbb{N}$, $0<t_1<\dots<t_K<1$ and let $\nu$ be a probability measure on $(0,1)$ with $\nu(\{t_1,\dots,t_K\})=1$. Define

$V_n:=\int_0^1\Bigg(\sum_{k=1}^{\lfloor ns\rfloor}w(Y_k)\big(\hat h_s(Y_k)\hat c_{s,1}+\hat c_{s,2}-\Lambda_{\hat\theta_s}(Y_k)\big)^2-s\sum_{k=1}^n w(Y_k)\big(\hat h(Y_k)\hat c_1+\hat c_2-\Lambda_{\hat\theta}(Y_k)\big)^2\Bigg)^2\nu(ds)$

as well as

$\tilde T_n:=\frac{T_n-nM(\gamma_0)}{\sqrt{V_n}}.$

In Section 5 of the supplementary material, it is shown that $\tilde T_n\xrightarrow{\;D\;}\tilde T$ for some random variable $\tilde T$ and that the distribution of $\tilde T$ does not depend on any unknown parameters. Hence, its quantiles can be simulated and $\tilde T_n$ can be used to test the hypotheses $H_0'$ and $H_1'$.

Remark 3.6 Note that not rejecting the null hypothesis $H_0$ does not mean that the null hypothesis is valid. Consequently, alternative approaches, such as increasing the level to accept more transformation functions instead of testing the precise hypotheses in (24), do not in general result in evidence for applying a transformation model.

4 A bootstrap version and simulations

Although Theorem 3.2 shows how the test statistic behaves asymptotically under $H_0$, it is hard to extract any information about how to choose appropriate critical values of a test that rejects $H_0$ for large values of $T_n$. The main reasons are that, first, for any function $\zeta$ the eigenvalues of the operator defined in Theorem 3.2 are unknown; second, this function is unknown and has to be estimated as well; and third, even $\psi$ (which would be needed to estimate $\zeta$) is mostly unknown and rather complex (see e.g. Section 2 of the supplement). Therefore, approximating the $\alpha$-quantile, say $q_\alpha$, of the distribution of $T$ in Theorem 3.2 in a direct way is difficult, and instead we suggest a smooth bootstrap algorithm to approximate $q_\alpha$.

Algorithm 4.1 Let $(Y_1,X_1),\dots,(Y_n,X_n)$ denote the observed data, define

$h_\theta(y)=\dfrac{\Lambda_\theta(y)-\Lambda_\theta(0)}{\Lambda_\theta(1)-\Lambda_\theta(0)}\quad\text{and}\quad g_\theta(x)=E[h_\theta(Y)\,|\,X=x]$

and let $\hat g$ be a consistent estimator of $g_{\theta_0}$, where $\theta_0$ is defined as in (A6) under the null hypothesis and as in (A6') under the alternative. Let $\kappa$ and $\ell$ be smooth Lebesgue densities on $\mathbb{R}^{d_X}$ and $\mathbb{R}$, respectively, where $\ell$ is strictly positive, $\kappa$ has bounded support and $\kappa(0)>0$. Let $(a_n)_n$ and $(b_n)_n$ be positive sequences with $a_n\to0$, $b_n\to0$, $na_n\to\infty$, $nb_n^{d_X}\to\infty$. Denote by $m\in\mathbb{N}$ the sample size of the bootstrap sample.

(1) Calculate $\hat\gamma=(\hat c_1,\hat c_2,\hat\theta)^t=\arg\min_{\gamma\in\Upsilon}\sum_{i=1}^n w(Y_i)(\hat h(Y_i)c_1+c_2-\Lambda_\theta(Y_i))^2$. Estimate the parametric residuals $\varepsilon_i(\theta_0)=h_{\theta_0}(Y_i)-g_{\theta_0}(X_i)$ by $\hat\varepsilon_i=h_{\hat\theta}(Y_i)-\hat g(X_i)$ and denote centered versions by $\tilde\varepsilon_i=\hat\varepsilon_i-n^{-1}\sum_{j=1}^n\hat\varepsilon_j$, $i=1,\dots,n$.

(2) Generate $X^*_j$, $j=1,\dots,m$, independently (given the original data) from the density

$f_{X^*}(x)=\frac{1}{nb_n^{d_X}}\sum_{i=1}^n\kappa\Big(\frac{x-X_i}{b_n}\Big)$

(which is a kernel density estimator of $f_X$ with kernel $\kappa$ and bandwidth $b_n$). For $j=1,\dots,m$, define bootstrap observations as

$Y^*_j=(h^*)^{-1}\big(\hat g(X^*_j)+\varepsilon^*_j\big)\quad\text{for}\quad h^*(\cdot)=\dfrac{\Lambda_{\hat\theta}(\cdot)-\Lambda_{\hat\theta}(0)}{\Lambda_{\hat\theta}(1)-\Lambda_{\hat\theta}(0)},$   (26)

where $\varepsilon^*_j$ is generated independently (given the original data) from the density

$\frac{1}{n}\sum_{i=1}^n\frac{1}{a_n}\,\ell\Big(\frac{\tilde\varepsilon_i-\cdot}{a_n}\Big)$

(which is a kernel density estimator of the density of $\varepsilon(\theta_0)$ with kernel $\ell$ and bandwidth $a_n$).

(3) Calculate the bootstrap estimate $\hat h^*$ of $h^*$ from $(Y^*_j,X^*_j)$, $j=1,\dots,m$.

(4) Calculate the bootstrap statistic $T^*_{n,m}=\min_{(c_1,c_2,\theta)\in\Upsilon}\sum_{j=1}^m w(Y^*_j)(\hat h^*(Y^*_j)c_1+c_2-\Lambda_\theta(Y^*_j))^2$.

(5) Let $B\in\mathbb{N}$. Repeat steps (2)–(4) $B$ times to obtain the bootstrap statistics $T^*_{n,m,1},\dots,T^*_{n,m,B}$. Let $q^*_\alpha$ denote the quantile of $T^*_{n,m}$ conditional on $(Y_i,X_i)$, $i=1,\dots,n$. Estimate $q^*_\alpha$ by

$\hat q^*_\alpha=\min\Big\{z\in\{T^*_{n,m,1},\dots,T^*_{n,m,B}\}:\frac{1}{B}\sum_{k=1}^B I\{T^*_{n,m,k}\le z\}\ge\alpha\Big\}.$
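Step (2) of Algorithm 4.1 amounts to drawing from two kernel density estimates, which can be done by resampling data points and adding scaled kernel noise. A sketch for a univariate covariate with illustrative kernel choices (uniform for $\kappa$, which has bounded support and is positive at 0, and Gaussian for $\ell$); `g_hat` and `h_star_inv` are assumed to be supplied by the user:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_sample(X, eps_tilde, g_hat, h_star_inv, m, a_n, b_n):
    """One draw of step (2) of Algorithm 4.1: sampling from a kernel density
    estimate = resampling a data point and adding scaled kernel noise.
    Kernel choices (uniform / Gaussian) are illustrative assumptions."""
    n = len(X)
    X_star = X[rng.integers(0, n, size=m)] + b_n * rng.uniform(-1, 1, size=m)
    eps_star = eps_tilde[rng.integers(0, n, size=m)] + a_n * rng.normal(size=m)
    Y_star = h_star_inv(g_hat(X_star) + eps_star)   # Eq. (26)
    return X_star, Y_star
```

Repeating this draw, recomputing $\hat h^*$ and the criterion $B$ times, and taking the empirical quantile of the resulting statistics yields $\hat q^*_\alpha$ as in step (5).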

Remark 4.2

1. The reason for resampling the bootstrap data $X^*_j$, $j=1,\dots,m$, nonparametrically is the need to mimic the original transformation estimator and its asymptotic behaviour with the bootstrap estimator conditional on the data. Therefore, to proceed in the proof as in Colling and Van Keilegom (2019), it is necessary to smooth the distribution of $X^*$. The properties $nb_n^{d_X}\to\infty$ and $\kappa(0)>0$ ensure that, conditional on the original data $(Y_1,X_1),\dots,(Y_n,X_n)$, the support of $X^*$ contains that of $v$ (from assumption (B7) in Section 2 of the supplement) with probability converging to one. Thus, $v$ can be used for calculating $\hat h^*$ as well.

2. To proceed as in Algorithm 4.1, it may be necessary to modify $h^*$ so that $S^*_j=\hat g(X^*_j)+\varepsilon^*_j$ belongs to the domain of $(h^*)^{-1}$ for all $j=1,\dots,m$. As long as these modifications do not have any influence on $h^*(y)$ for $y\in\mathcal{Y}_w$, the influence on $\hat h^*$ and $T^*_{n,m}$ should be asymptotically negligible (which can be proven for the estimator by Colling and Van Keilegom (2019)).

The bootstrap algorithm should fulfil two properties: On the one hand, under the null hypothesis, the algorithm has to provide, conditionally on the original data, consistent estimates of the quantiles of $T_n$, or rather of its asymptotic distribution from Theorem 3.2. On the other hand, to be consistent under $H_1$, the bootstrap quantiles have to stabilize or at least converge to infinity at a rate slower than that of $T_n$. To formalize this, let $(\Omega,\mathcal{A},P)$ denote the underlying probability space. Assume that $(\Omega,\mathcal{A})$ can be written as $\Omega=\Omega_1\times\Omega_2$ and $\mathcal{A}=\mathcal{A}_1\otimes\mathcal{A}_2$ for some measurable spaces $(\Omega_1,\mathcal{A}_1)$ and $(\Omega_2,\mathcal{A}_2)$. Further, assume that $P$ is characterized as the product of a probability measure $P_1$ on $(\Omega_1,\mathcal{A}_1)$ and a Markov kernel

$P_2^1:\Omega_1\times\mathcal{A}_2\to[0,1],$

that is, $P=P_1\otimes P_2^1$. While randomness with respect to the original data is modelled by $P_1$, randomness with respect to the bootstrap data and conditional on the original data is modelled by $P_2^1$. Moreover, assume

$P_2^1(\omega,A)=P\big(\Omega_1\times A\,\big|\,(Y_1(\omega),X_1(\omega)),\dots,(Y_n(\omega),X_n(\omega))\big)$

for all $\omega\in\Omega_1$, $A\in\mathcal{A}_2$. With these notations, the assumptions (A8*) and (A9*) from Section 1 of the supplementary material can be formulated.

Theorem 4.3 Let $q^*_\alpha$ denote the bootstrap quantile from Algorithm 4.1.

1. Assume $H_0$, (A1)–(A8), (A8*), (A9*). Then, $q^*_\alpha$ fulfils

$P_1\Big(\Big\{\omega\in\Omega_1:\limsup_{m\to\infty}|q^*_\alpha-q_\alpha|>\delta\Big\}\Big)=o(1)$

for all $\delta>0$. Hence, $P(T_n>q^*_\alpha)=\alpha+o(1)$ under the null hypothesis.

2. Assume $H_1$, (A1)–(A4), (A6'), (A8*). Then, $q^*_\alpha$ fulfils

$P_1\Big(\Big\{\omega\in\Omega_1:T_n>\limsup_{m\to\infty}q^*_\alpha\Big\}\Big)=1+o(1),$

so that $P(T_n>q^*_\alpha)=1+o(1)$ under the alternative.


The proof is given in the supplement. Since only θ̂ is used to generate the bootstrap observations in Algorithm 4.1, it is conjectured that Theorem 4.3 can be generalized to the usage of T̄_n from (7) in Algorithm 4.1.
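The role the bootstrap quantile plays in the test decision can be sketched as a generic bootstrap-quantile decision rule; the function below is a placeholder illustration, not the paper's Algorithm 4.1:

```python
import numpy as np

def bootstrap_test(t_n, bootstrap_stats, alpha=0.05):
    """Reject H0 at level alpha when the original statistic T_n exceeds the
    empirical (1 - alpha)-quantile q*_alpha of the bootstrap statistics."""
    q_star = np.quantile(bootstrap_stats, 1.0 - alpha)
    return t_n > q_star, q_star
```

Under H_0, Theorem 4.3 states that q*_α approaches q_α, so the rejection probability tends to α; under H_1, T_n grows faster than q*_α, so the test rejects with probability tending to one.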

Simulations

Throughout this section, g(X) = 4X − 1, X ∼ U([0,1]) and ε ∼ N(0,1) are chosen. Moreover, the null hypothesis of h belonging to the Yeo and Johnson (2000) transformations

Λ_θ(Y) =
  ((Y + 1)^θ − 1)/θ,                 if Y ≥ 0, θ ≠ 0,
  log(Y + 1),                        if Y ≥ 0, θ = 0,
  −((1 − Y)^{2−θ} − 1)/(2 − θ),      if Y < 0, θ ≠ 2,
  −log(1 − Y),                       if Y < 0, θ = 2,

with parameter θ ∈ Θ_0 = [0,2] is tested. Under H_0, we generate data using the transformation h = (Λ_{θ0}(·) − Λ_{θ0}(0))/(Λ_{θ0}(1) − Λ_{θ0}(0)) to match the identification constraints h(0) = 0, h(1) = 1. Under the alternative, we choose transformations h with an inverse given by the following convex combination,

h^{-1}(Y) = [(1 − c)(Λ^{-1}_{θ0}(Y) − Λ^{-1}_{θ0}(0)) + c(r(Y) − r(0))] / [(1 − c)(Λ^{-1}_{θ0}(1) − Λ^{-1}_{θ0}(0)) + c(r(1) − r(0))]   (27)

for some θ0 ∈ [0,2], some strictly increasing function r and some c ∈ [0,1]. In general, it is not clear if a growing factor c leads to a growing distance (5). Indeed, the opposite might be the case if r is somehow close to the class of transformation functions considered in the null hypothesis. Simulations were conducted for r1(Y) = 5Φ(Y), r2(Y) = exp(Y) and r3(Y) = Y³, where Φ denotes the cumulative distribution function of a standard normal distribution, and c = 0, 0.2, 0.4, 0.6, 0.8, 1. The prefactor in the definition of r1 is introduced because the values of Φ are rather small compared to the values of Λ_θ; that is, even when using the presented convex combination in (27), Λ_{θ0} (except for c = 1) would dominate the "alternative part" r of the transformation function without this factor. Note that r2 and Λ_0 only differ with respect to a different standardization. Therefore, if h is defined via (27) with r = r2, the resulting function is for c = 1 close to the null hypothesis case.
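The transformations above can be transcribed directly into code. The following sketch (function names are ours; the formulas are taken from the definitions above) implements Λ_θ, its inverse, the standardized null transformation h and the alternative inverse (27):

```python
import numpy as np

def yeo_johnson(y, theta):
    """Yeo and Johnson (2000) transformation Lambda_theta."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    out = np.empty_like(y)
    pos = y >= 0
    if theta != 0:
        out[pos] = ((y[pos] + 1.0) ** theta - 1.0) / theta
    else:
        out[pos] = np.log1p(y[pos])
    if theta != 2:
        out[~pos] = -(((1.0 - y[~pos]) ** (2.0 - theta) - 1.0) / (2.0 - theta))
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

def yeo_johnson_inv(u, theta):
    """Inverse of Lambda_theta (u >= 0 corresponds to y >= 0)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = np.empty_like(u)
    pos = u >= 0
    if theta != 0:
        out[pos] = (1.0 + theta * u[pos]) ** (1.0 / theta) - 1.0
    else:
        out[pos] = np.expm1(u[pos])
    if theta != 2:
        out[~pos] = 1.0 - (1.0 - (2.0 - theta) * u[~pos]) ** (1.0 / (2.0 - theta))
    else:
        out[~pos] = 1.0 - np.exp(-u[~pos])
    return out

def h_null(y, theta0):
    """Standardized transformation under H0, matching h(0) = 0 and h(1) = 1."""
    l0 = yeo_johnson(0.0, theta0)[0]
    l1 = yeo_johnson(1.0, theta0)[0]
    return (yeo_johnson(y, theta0) - l0) / (l1 - l0)

def h_inv_alternative(u, theta0, r, c):
    """Inverse transformation under the alternative, Eq. (27)."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    li0 = yeo_johnson_inv(0.0, theta0)[0]
    li1 = yeo_johnson_inv(1.0, theta0)[0]
    num = (1 - c) * (yeo_johnson_inv(u, theta0) - li0) + c * (r(u) - r(0.0))
    den = (1 - c) * (li1 - li0) + c * (r(1.0) - r(0.0))
    return num / den
```

For c = 0 the alternative inverse reduces to the standardized inverse of Λ_{θ0}, so the null model is recovered; for c = 1 only the deviation function r remains.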

For calculating the test statistic, the weighting function w was set equal to one. The nonparametric estimator of h was calculated as in Colling and Van Keilegom (2019) (see Section 2 of the supplement for details) with the Epanechnikov kernel K(y) = (3/4)(1 − y²) I_{[−1,1]}(y) and a normal reference rule bandwidth (see for example Silverman (1986))

h_u = (40√π / n)^{1/5} σ̂_u,   h_x = (40√π / n)^{1/5} σ̂_x,


where σ̂²_u and σ̂²_x are estimators of the variance of U = T(Y) and X, respectively.
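The kernel and bandwidth choices can be written down directly; the following is a minimal transcription (estimator names are ours):

```python
import numpy as np

def epanechnikov(y):
    """Epanechnikov kernel K(y) = (3/4)(1 - y^2) on [-1, 1], zero outside."""
    y = np.asarray(y, dtype=float)
    return 0.75 * (1.0 - y ** 2) * (np.abs(y) <= 1.0)

def normal_reference_bandwidth(x):
    """Normal reference rule h = (40 * sqrt(pi) / n)^(1/5) * sigma_hat,
    with the sample standard deviation as the scale estimator."""
    x = np.asarray(x, dtype=float)
    return (40.0 * np.sqrt(np.pi) / len(x)) ** 0.2 * np.std(x, ddof=1)
```

The same rule is applied once with σ̂_u for the transformed response and once with σ̂_x for the covariate.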

The number of evaluation points N_x for the nonparametric estimator of h was set equal to 100 (see Section 2 of the supplement for details). The integral in (S3) was computed by applying the function integrate implemented in R. In each simulation run, n = 100 independent and identically distributed random pairs (Y_1, X_1), ..., (Y_n, X_n) were generated as described before, and 250 bootstrap quantiles, which are based on m = 100 bootstrap observations (Y*_1, X*_1), ..., (Y*_m, X*_m), were calculated as in Algorithm 4.1 using for κ the U([−1,1])-density, the standard normal density and a_n = b_n = 0.1. To obtain more precise estimators of the rejection probabilities under the null hypothesis, 800 simulation runs were performed for each choice of θ0 under the null hypothesis, whereas in the remaining alternative cases 200 runs were conducted. Among other things, the nonparametric estimation of h, the integration in (S3), the optimization with respect to θ and the number of bootstrap repetitions cause the simulations to be quite computationally demanding. Hence, an interface for C++ as well as parallelization were used to conduct the simulations.
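The data-generating step of each simulation run can be sketched as follows; this is a minimal illustration of S = h(Y) = g(X) + ε with g(X) = 4X − 1 (the function name and the identity placeholder for h^{-1} are ours):

```python
import numpy as np

def generate_sample(n, h_inverse, rng):
    """Draw n i.i.d. pairs (Y_i, X_i) with X ~ U([0,1]), eps ~ N(0,1)
    and Y = h^{-1}(g(X) + eps), where g(X) = 4X - 1."""
    x = rng.uniform(0.0, 1.0, size=n)
    eps = rng.normal(size=n)
    s = 4.0 * x - 1.0 + eps  # S = g(X) + eps = h(Y)
    return h_inverse(s), x

# Example with the identity as a stand-in for h^{-1}
y, x = generate_sample(100, lambda s: s, np.random.default_rng(0))
```

In the actual study, h^{-1} would be the standardized Yeo-Johnson inverse under H_0 or the convex combination (27) under the alternative.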

The main results of the simulation study are presented in Table 1. There, the rejection probabilities of the settings with h = (Λ_{θ0}(·) − Λ_{θ0}(0))/(Λ_{θ0}(1) − Λ_{θ0}(0)) under the null hypothesis, and h as in (27) under the alternative with r ∈ {r1, r2, r3}, c ∈ {0, 0.2, 0.4, 0.6, 0.8, 1} and θ0 ∈ {0, 0.5, 1, 2}, are listed. The significance level was set equal to 0.05 and 0.10. Note that the test sticks to the level or is even a bit conservative. Under the alternatives, the rejection probabilities not only differ between different choices of r, but also between different transformation parameters θ0 that are inserted in (27). While the test shows high power for some alternatives, there are also cases where the rejection probabilities are extremely small. There are certain reasons that explain these observations. First, the class of Yeo-Johnson transforms seems to be quite general, and second, the testing approach itself is rather flexible due to the minimization with respect to γ. Having a look at the definition of the test statistic in (6), it attains small values if the true transformation function can be approximated by a linear transformation of Λ_θ̃ for some appropriate θ̃ ∈ [0,2]. In the following, this issue will be explored further by analysing some graphics.

All of the three figures that occur in the following have the same structure and consist of four panels. The upper left panel shows the true transformation function with inverse function (27). Since Y depends on the transformation function, the values of S = h(Y) = g(X) + ε, which are displayed on the vertical axis, are fixed for a comparison of different transformation functions. Due to the choice of g(X) = 4X − 1 and X ∼ U([0,1]), the vertical axis reaches from −1 to 3, which would be the support of h(Y) if the error is neglected. In the upper right panel, the parametric estimator of this function is displayed. Both of these functions are then plotted against each other in the lower left panel by pairing values with the same S component. Finally, the function Y ↦ Λ_{θ0}(Y(Λ^{-1}_{θ0}(1) − Λ^{-1}_{θ0}(0)) + Λ^{-1}_{θ0}(0)), which represents the part of h corresponding to the null hypothesis, is plotted against the true transformation function in the last panel.

In the lower left panel, one can see if the true transformation function can be approximated by a linear transform of Λ_θ̃ for some θ̃ ∈ [0,2], which is an indicator for rejecting or not rejecting the null hypothesis, as was pointed out before. As already mentioned, the rejection probabilities not only differ between different deviation functions r, but also within these settings. For example, when considering r = r1 with


Table 1 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} and r ∈ {r1, r2, r3} for the test statistic Tn

                 θ0 = 0           θ0 = 0.5         θ0 = 1           θ0 = 2
Level α          0.05    0.10     0.05    0.10     0.05    0.10     0.05    0.10
r1  Null hyp.    0.01000 0.04000  0.03125 0.08750  0.03125 0.07750  0.01625 0.05625
    c = 0.2      0.000   0.010    0.075   0.105    0.010   0.015    0.000   0.020
    c = 0.4      0.000   0.000    0.020   0.045    0.000   0.015    0.120   0.200
    c = 0.6      0.100   0.155    0.035   0.050    0.085   0.150    0.415   0.545
    c = 0.8      0.685   0.765    0.110   0.210    0.505   0.645    0.785   0.890
    c = 1        0.965   0.990    0.925   0.975    0.975   0.985    0.985   0.990
r2  c = 0.2      0.010   0.035    0.030   0.045    0.515   0.640    0.885   0.965
    c = 0.4      0.015   0.040    0.000   0.005    0.060   0.135    0.870   0.980
    c = 0.6      0.035   0.085    0.000   0.005    0.005   0.005    0.625   0.815
    c = 0.8      0.020   0.040    0.010   0.040    0.000   0.005    0.185   0.325
    c = 1        0.020   0.065    0.030   0.090    0.025   0.095    0.050   0.105
r3  c = 0.2      0.330   0.505    0.730   0.855    0.810   0.905    0.930   0.995
    c = 0.4      0.730   0.865    0.815   0.945    0.875   0.970    0.915   0.990
    c = 0.6      0.880   0.940    0.895   0.960    0.950   0.995    0.940   0.990
    c = 0.8      0.895   0.965    0.925   0.975    0.935   0.990    0.915   0.980
    c = 1        0.980   0.990    0.960   0.990    0.939   0.990    0.940   0.985


[Figure 1: four panels showing the true transformation function, the parametric estimator, the parametric estimator plotted against the true transformation function, and the parametric function at the original parameter plotted against the true transformation function.]

Fig. 1 Some transformation functions for θ0 = 0.5, c = 0.6 and r = r1

[Figure 2: same four-panel structure as Fig. 1.]

Fig. 2 Some transformation functions for θ0 = 2, c = 0.6 and r = r1

c = 0.6, the rejection probabilities for θ0 = 0.5 amount to 0.035 for α = 0.05 and to 0.050 for α = 0.10, while for θ0 = 2, they are 0.415 and 0.545. Figures 1 and 2 explain why the rejection probabilities differ that much. While for θ0 = 0.5 the transformation function can be approximated quite well by transforming Λ_{1.06} linearly, the best approximation for θ0 = 2 is given by Λ_{1.94} and seems to be relatively bad. The best approximation for c = 1 can be reached for θ around 1.4. In contrast to that, considering θ0 = 2 and r = r3 results in a completely different picture. As can be


[Figure 3: same four-panel structure as Figs. 1 and 2.]

Fig. 3 Some transformation functions for θ0 = 2, c = 0.2 and r = r3

seen in Fig. 3, even for c = 0.2 the resulting h differs so much from the null hypothesis that it cannot be linearly transformed into a Yeo-Johnson transform (see the lower left panel). Consequently, the rejection probabilities are rather high.

A way to overcome this problem can consist in applying the modified test statistic T̄n from (7). Although Colling and Van Keilegom (2020) showed that the estimator θ̂ seems to outperform θ̃ from Remark 2.1 in simulations, fixing c1, c2 beforehand might lead, due to the reduced flexibility of the minimization procedure, to higher rejection probabilities when using T̄n instead of Tn. Table 2 contains rejection probabilities which are based on the bootstrap version of T̄n. The same simulation setting and procedures as before have been used. Indeed, some of the rejection probabilities have increased compared to Table 1. For example, the rejection probabilities for r = r1, θ0 = 0.5 and c = 0.6 amount to 0.115 and 0.170 instead of 0.035 and 0.050 in Table 1. Nevertheless, this cannot be generalized, since the rejection probabilities when using T̄n are sometimes below those for Tn, e.g. for θ0 = 0 and r = r1 or θ0 = 2 and r = r2.

Under some alternatives the rejection probabilities are even smaller than the level. This behaviour indicates that, from the presented test's perspective, these models seem to fulfil the null hypothesis more convincingly than the null hypothesis models themselves. The reason for this is shown in Fig. 4 for the setting θ0 = 1, c = 0.4 and r = r1. There, the relationship between the nonparametric estimator of the transformation function and the true transformation function is shown. While the diagonal line represents the identity, the nonparametric estimator seems to flatten the edges of the transformation function. In contrast to this, using r = r1 in (27) steepens the edges, so that both effects neutralize each other. Similar effects cause low rejection probabilities for r = r2, although the reasoning is slightly more sophisticated and is also associated with the boundedness of the parameter space Θ0 = [0,2].


Table 2 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} and r ∈ {r1, r2, r3} for the test statistic T̄n

                 θ0 = 0           θ0 = 0.5         θ0 = 1           θ0 = 2
Level α          0.05    0.10     0.05    0.10     0.05    0.10     0.05    0.10
r1  Null hyp.    0.03375 0.08250  0.03875 0.08375  0.04250 0.07500  0.06000 0.10375
    c = 0.2      0.025   0.040    0.035   0.115    0.055   0.090    0.230   0.285
    c = 0.4      0.010   0.025    0.070   0.155    0.155   0.245    0.525   0.595
    c = 0.6      0.055   0.130    0.115   0.170    0.505   0.585    0.780   0.840
    c = 0.8      0.235   0.485    0.515   0.610    0.840   0.885    0.925   0.940
    c = 1        0.990   0.995    0.990   0.995    0.995   1.000    0.980   0.985
r2  c = 0.2      0.035   0.040    0.035   0.045    0.185   0.405    0.795   0.950
    c = 0.4      0.035   0.060    0.020   0.060    0.030   0.055    0.715   0.920
    c = 0.6      0.035   0.065    0.020   0.055    0.015   0.050    0.470   0.705
    c = 0.8      0.050   0.085    0.035   0.045    0.030   0.060    0.105   0.245
    c = 1        0.030   0.060    0.045   0.095    0.080   0.130    0.055   0.125
r3  c = 0.2      0.230   0.395    0.560   0.765    0.725   0.880    0.900   1.000
    c = 0.4      0.625   0.780    0.760   0.915    0.810   0.945    0.780   0.970
    c = 0.6      0.750   0.935    0.865   0.975    0.845   0.985    0.780   0.985
    c = 0.8      0.810   0.980    0.825   0.965    0.805   0.960    0.810   0.990
    c = 1        0.725   0.970    0.725   0.970    0.680   0.970    0.750   0.990


[Figure 4: scatter of the true transformation function against its nonparametric estimator, with the identity line.]

Fig. 4 Transformation function for θ0 = 1, c = 0.4 and r = r1 on the horizontal axis and its nonparametric estimator on the vertical axis. The identity is displayed in red

Table 3 Rejection probabilities at θ0 = 1 and θ0 = 2 for r = r1

Param.   Alternative   Original framework     Modified weighting
Level                  α = 0.05   α = 0.10    α = 0.05   α = 0.10
θ0 = 1   Null hyp.     0.03125    0.07750     0.02875    0.07875
         c = 0.2       0.010      0.015       0.040      0.100
         c = 0.4       0.000      0.015       0.205      0.320
         c = 0.6       0.085      0.150       0.590      0.715
         c = 0.8       0.505      0.645       0.950      0.980
         c = 1         0.975      0.985       1.000      1.000
θ0 = 2   Null hyp.     0.01625    0.05625     0.05500    0.10375
         c = 0.2       0.000      0.020       0.225      0.350
         c = 0.4       0.120      0.200       0.575      0.710
         c = 0.6       0.415      0.545       0.910      0.965
         c = 0.8       0.785      0.890       0.990      1.000
         c = 1         0.985      0.990       0.995      1.000

One possible solution could consist in adjusting the weight function w such that the boundary of the support of Y no longer belongs to the support of w. In Table 3, the rejection probabilities for a modified weighting approach are presented. There, the weight function was chosen such that the smallest five percent and the largest five percent of observations were omitted to avoid the flattening effect of the nonparametric estimation. Indeed, the resulting rejection probabilities under the alternatives increase and lie above those under the null hypotheses.
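The modified weighting can be sketched as an indicator weight based on empirical quantiles; the trimming-by-quantile implementation below is our reading of "omitting the smallest and largest five percent of observations", and the function names are illustrative:

```python
import numpy as np

def trimmed_weight(y_obs, lower=0.05, upper=0.95):
    """Build a weight function w that is 1 between the empirical 5%- and
    95%-quantiles of the observed responses and 0 outside, so the boundary
    of the support of Y no longer belongs to the support of w."""
    lo, hi = np.quantile(np.asarray(y_obs, dtype=float), [lower, upper])

    def w(y):
        y = np.asarray(y, dtype=float)
        return ((y >= lo) & (y <= hi)).astype(float)

    return w
```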

Finally, simulations for precise hypotheses as in (24) were conducted. For the sake of brevity, only rejection probabilities resulting from the self-normalized test statistic are


Table 4 Rejection probabilities at θ0 ∈ {0, 0.5, 1, 2} for precise hypotheses and r = r1, r2, r3, r4

            θ0 = 0                  θ0 = 0.5                θ0 = 1                  θ0 = 2
Level α     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)     0.05  0.10  Mean(T)
r1 c = 0    0.095 0.185 6.564       0.995 1.000 0.886       0.965 0.990 1.446       0.680 0.830 2.629
   c = 0.2  0.700 0.800 2.443       0.965 0.975 1.081       0.905 0.970 1.540       0.205 0.310 5.588
   c = 0.4  0.790 0.895 2.259       0.930 0.975 1.401       0.385 0.550 3.930       0.000 0.015 14.360
   c = 0.6  0.215 0.390 4.638       0.660 0.815 2.581       0.010 0.060 9.477       0.000 0.000 22.146
   c = 0.8  0.085 0.185 6.336       0.030 0.050 9.239       0.000 0.000 19.834      0.000 0.000 28.955
   c = 1    0.000 0.000 30.996      0.000 0.000 32.081      0.000 0.000 30.907      0.000 0.000 31.710
r2 c = 0.2  0.090 0.130 7.121       0.985 0.995 0.995       0.180 0.320 5.050       0.000 0.000 71.936
   c = 0.4  0.035 0.080 7.611       0.905 0.980 1.331       0.785 0.880 2.266       0.000 0.000 38.728
   c = 0.6  0.030 0.075 8.080       0.580 0.755 2.899       0.790 0.900 2.089       0.010 0.035 11.850
   c = 0.8  0.025 0.045 8.778       0.125 0.245 5.810       0.260 0.380 4.939       0.485 0.610 3.576
   c = 1    0.005 0.015 9.254       0.005 0.015 9.550       0.015 0.045 8.756       0.005 0.030 8.785
r3 c = 0.2  0.030 0.060 8.158       0.080 0.155 8.108       0.000 0.000 25.054      0.000 0.000 197.373
   c = 0.4  0.000 0.000 18.715      0.000 0.000 29.072      0.000 0.000 56.932      0.000 0.000 243.476
   c = 0.6  0.000 0.000 51.102      0.000 0.000 75.638      0.000 0.000 106.332     0.000 0.000 246.010
   c = 0.8  0.000 0.000 118.049     0.000 0.000 135.888     0.000 0.000 166.425     0.000 0.000 259.728
   c = 1    0.000 0.000 248.025     0.000 0.000 262.134     0.000 0.000 254.237     0.000 0.000 236.894
r4 c = 0.2  0.475 0.635 3.436       0.995 1.000 0.942       0.960 0.980 1.265       0.685 0.820 2.695
   c = 0.4  0.815 0.915 1.920       0.995 1.000 0.972       0.880 0.940 1.793       0.420 0.570 4.062
   c = 0.6  0.580 0.780 2.943       0.975 0.995 1.124       0.590 0.775 2.773       0.135 0.235 6.390
   c = 0.8  0.155 0.310 5.137       0.800 0.895 2.185       0.120 0.295 5.528       0.060 0.100 8.490
   c = 1    0.010 0.015 10.722      0.015 0.015 10.519      0.000 0.020 10.680      0.015 0.030 10.211


presented, since this approach seems to outperform that based on the estimator σ̂² from Section 3 by far in the simulated settings. Since only a fraction of the data is used to calculate Vn, the sample size was increased to n = 500. The settings and techniques remain the same as before. The probability measure ν was set to

ν = (1/10) δ_{0.6} + (2/10) δ_{0.7} + (3/10) δ_{0.8} + (4/10) δ_{0.9}

to put a higher weight on those parts of Vn where more data points are used. Furthermore, the threshold was chosen to be η = 0.02, which roughly corresponds to plugging the logit function r4(y) := 5 exp(y)/(1 + exp(y)) and c = 1 into Eq. (27) and calculating min_{θ∈Θ} d(Λθ, h). Hence, we expect the test to reject the null hypothesis H0 if Tn < nη = 10 holds.
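The discrete measure ν can be written down directly; the snippet below (names ours) evaluates an integral with respect to ν as a weighted sum over its four atoms:

```python
import numpy as np

# Atoms and weights of the discrete probability measure nu
atoms = np.array([0.6, 0.7, 0.8, 0.9])
weights = np.array([1, 2, 3, 4]) / 10.0

def integrate_nu(f):
    """Integral of f with respect to nu: a weighted sum of point masses,
    putting more weight where more data points are used."""
    return float(np.sum(weights * f(atoms)))
```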

A detailed analysis would go beyond the scope of this manuscript, so that only some rejection probabilities are given in Table 4. Moreover, the mean values of the test statistic Tn are listed to link the rejection probabilities to the distance between the expected value of the test statistic and the threshold of η = 0.02. First, the smaller the value of Tn is, the more likely the test seems to reject the null hypothesis H0. Further, the test holds the level, but is slightly conservative. Alternatives seem to be detected for mean values of Tn around or below eight. Nevertheless, the power of the test is quite high in scenarios with small expected values of the test statistic, which often corresponds to transformation functions which are close to the parametric class. For θ0 = 0.5 and θ0 = 1, the rejection probabilities are in these cases above 0.90 and sometimes even close to one. Although the influence of simulation parameters such as the sample size n or the probability measure ν has not been examined, the results indicate that using the self-normalized test statistic can be a good way to test for the precise hypotheses H0 and H1.

Acknowledgements Natalie Neumeyer acknowledges financial support by the DFG (Research Unit FOR 1735 Structural Inference in Statistics: Adaptation and Efficiency). Ingrid Van Keilegom acknowledges financial support by the European Research Council (2016–2021, Horizon 2020/ERC Grant Agreement No. 694409). Moreover, we would like to thank the associate editor and two anonymous referees for their very helpful suggestions and comments on the paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Allison JS, Hušková M, Meintanis SG (2018) Testing the adequacy of semiparametric transformation models. TEST 27(1):70–94
Berger JO, Delampady M (1987) Testing precise hypotheses. Stat Sci 2(3):317–335
Box GEP, Cox DR (1964) An analysis of transformations. J R Stat Soc B 26(2):211–252


Carroll RJ, Ruppert D (1988) Transformation and weighting in regression. CRC Press, Boca Raton
Chen S (2002) Rank estimation of transformation models. Econometrica 70:1683–1697
Chiappori P-A, Komunjer I, Kristensen D (2015) Nonparametric identification and estimation of transformation. J Economet 188(1):22–39
Colling B, Van Keilegom I (2016) Goodness-of-fit tests in semiparametric transformation models. TEST 25(2):291–308
Colling B, Van Keilegom I (2017) Goodness-of-fit tests in semiparametric transformation models using the integrated regression function. J Multivar Anal 160:10–30
Colling B, Van Keilegom I (2019) Estimation of fully nonparametric transformation models. Bernoulli 25:3762–3795
Colling B, Van Keilegom I (2020) Estimation of a semiparametric transformation model: a novel approach based on least squares minimization. Electron J Stat 14:769–800
Dette H, Kokot K, Volgushev S (2020) Testing relevant hypotheses in functional time series via self-normalization. J R Stat Soc B 82:629–660
Heuchenne C, Samb R, Van Keilegom I (2015) Estimating the error distribution in semiparametric transformation models. Electron J Stat 9:2391–2419
Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Stat 19(3):293–325
Horowitz JL (1996) Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica 64(1):103–137
Horowitz JL (2009) Semiparametric and nonparametric methods in econometrics. Springer, Berlin
Hušková M, Meintanis SG, Neumeyer N, Pretorius C (2018) Independence tests in semiparametric transformation models. S Afr J Stat 52:1–13
Kloodt N, Neumeyer N (2020) Specification tests in semiparametric transformation models—a multiplier bootstrap approach. Comput Stat Data Anal 145
Lakens D (2017) Equivalence tests: a practical primer for t tests, correlations, and meta-analyses. Soc Psychol Personal Sci 8(4):355–362
Lee AJ (1990) U-statistics: theory and practice. Dekker, New York
Lewbel A, Lu X, Su L (2015) Specification testing for transformation models with an application to generalized accelerated failure-time models. J Economet 184(1):81–96
Linton O, Sperlich S, Van Keilegom I (2008) Estimation of a semiparametric transformation model. Ann Stat 36(2):686–718
Mu Y, He X (2007) Power transformation toward a linear regression quantile. J Am Stat Assoc 102(477):269–279
Neumeyer N, Noh H, Van Keilegom I (2016) Heteroscedastic semiparametric transformation models: estimation and testing for validity. Stat Sin 26:925–954
Powell J (1991) Estimation of monotonic regression models under quantile restrictions. In: Barnett WA, Powell J, Tauchen GE (eds) Nonparametric and semiparametric methods in econometrics and statistics: proceedings of the 5th international symposium on economic theory and econometrics, pp 357–384
Shao X (2010) A self-normalized approach to confidence interval construction in time series. J R Stat Soc B 72(3):343–366
Shao X, Zhang X (2010) Testing for change points in time series. J Am Stat Assoc 105(491):1228–1240
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London
Szydłowski A (2020) Testing a parametric transformation model versus a nonparametric alternative. Economet Theory. https://doi.org/10.1017/S0266466619000355
Wellner JA (2005) Empirical processes: theory and applications. https://www.stat.washington.edu/jaw/RESEARCH/TALKS/Delft/emp-proc-delft-big.pdf
Witting H, Müller-Funk U (1995) Mathematical statistics II—asymptotic statistics: parametric models and nonparametric functionals. B. G. Teubner
Yeo I-K, Johnson RA (2000) A new family of power transformations to improve normality or symmetry. Biometrika 87(4):954–959

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
