
Content available from Journal of Geodetic Science


©2017 R. Lehmann and A. Voß-Böhme, published by De Gruyter Open.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

J. Geod. Sci. 2017; 7:68–78

Research Article Open Access

R. Lehmann* and A. Voß-Böhme

On the statistical power of Baarda’s outlier test and some alternative

DOI 10.1515/jogs-2017-0008

Received November 19, 2016; accepted March 14, 2017

Abstract: Baarda’s outlier test is one of the best established theories in geodetic practice. The optimal test statistic of the local model test for a single outlier is known as the normalized residual. Other model disturbances can also be detected and identified with this test. It enjoys the property of being a uniformly most powerful invariant (UMPI) test, but is not a uniformly most powerful (UMP) test. In this contribution we prove that in the class of test statistics following a common central or non-central χ² distribution, Baarda’s solution is also uniformly most powerful, i.e. UMPχ² for short. It turns out that UMPχ² is identical to UMPI, such that this proof can be seen as another proof of the UMPI property of the test. We demonstrate by an example that by means of the Monte Carlo method it is even possible to construct test statistics that are regionally more powerful than Baarda’s solution. They follow a so-called generalized χ² distribution. Due to high computational costs we do not yet propose this as a “new outlier detection method”, but only as a proof that it is in principle possible to outperform the statistical power of Baarda’s test.

Keywords: generalized χ² test; Monte Carlo method; outlier test; uniformly most powerful test

1 Introduction

Today, Baarda’s test is one of the best established theories in geodetic practice. It was introduced to detect model disturbances in geodetic networks, as they are typically caused by outliers in the observations.

Baarda’s test goes back to the pioneering work of Baarda (1967, 1968) in the field of geodetic statistics. In geodetic adjustment computations, this test is mandatory to detect possible outliers or to verify that there are none. Thus, any reasonable geodetic adjustment software implements this test. Most geodetic textbooks have a chapter on Baarda’s theory (e.g. Koch 1999, Teunissen 2000, 2006).

*Corresponding Author: R. Lehmann: University of Applied Sciences Dresden, Germany, E-mail: ruediger.lehmann@htw-dresden.de
A. Voß-Böhme: University of Applied Sciences Dresden, Germany

Later this work was continued by Pope (1976) and Heck (1981). These authors introduced and developed the τ-test, which can be used if absolute accuracies of observations are not known. Pelzer (1983) investigated the most feared case that a model disturbance influences many or all observations equally. Teunissen (1990) applied Baarda’s test to filtering problems in integrated navigation systems. Recently, Lehmann and Lösler (2016) proposed an extension of Baarda’s concept to multiple outlier detection.

Baarda’s testing procedure, also known as data snooping, consists of three steps (Teunissen 2000, 2006):

1. Detection: By a global model test (also known as overall model test) it is detected whether a model disturbance is present at all. This test is sensitive to a variety of different model disturbances.
2. Identification: If a disturbance has been detected, it is identified by a local model test (also known as individual model test or w-test). This test is basically a multiple test with various different alternative hypotheses.
3. Adaption: If a disturbance has been identified, a corrective action needs to be undertaken, such as elimination of an outlier from the set of observations.

The testing procedure is often applied consecutively: After identification and adaption for one type of model disturbance, the test is repeated until no more disturbances can be detected.

In addition to this, Baarda’s theory provides measures of reliability (Baarda 1968). The minimum detectable bias (MDB) is the smallest model disturbance that can be detected. Prószyński (2015) recently supplemented the MDB by an outlier identifiability index assigned to each individual observation and a mis-identifiability index being the maximum probability of identifying an erroneous observation.

In this paper we are exclusively concerned with a single local model test (identification), for which a suitable test statistic is needed. The aim is to find a test statistic with low decision error rates. While the rate of type 1 decision error can be selected by the user, the rate of type 2 decision error cannot. A test statistic with a low rate of type 2 decision error is said to be powerful.

Baarda’s optimal test statistic of the local model test for a single outlier is known as the normalized residual, see also (Lehmann 2013b). Multiple outliers can be tested for as well. One typical application is outlier detection in 2D or 3D point coordinates, where usually more than one coordinate is outlying. This can be extended to a multiple hypotheses test according to (Lehmann and Lösler 2016).

In today’s language, Baarda’s solution belongs to the class of generalized likelihood ratio tests and inherits from it the property of being a uniformly most powerful invariant (UMPI) test (Arnold 1981). This has been worked out in great detail by Kargoll (2012). It is also shown there that a uniformly most powerful (UMP) test does not exist for Baarda’s test problem. In the present contribution we prove that in the class of test statistics following a common central or non-central χ² distribution, Baarda’s solution is also uniformly most powerful. This property is named UMPχ² for short. It later turns out that UMPχ² is identical to UMPI, such that this proof can be seen as another proof of the UMPI property of the test.

Unlike Baarda, we have fast computers at our disposal. By means of the Monte Carlo method it is even possible to construct test statistics that are regionally more powerful than Baarda’s solution. This is of course computationally much more costly than Baarda’s solution. In this paper we restrict ourselves to test statistics following the so-called generalized χ² distributions. The procedure is illustrated by means of an example: the problem of identifying two outliers in a straight line fit. It is well known that such outliers can mask each other, making it difficult to identify them. In the example we successfully mitigate this outlier masking effect by constructing a generalized χ² test.

2 Gauss-Markov model and Baarda’s problem

We start from the well-known linear or linearized functional adjustment model (observation equations)

$$y = Ax + e \tag{1}$$

with the known $n$-vector of observations $y$, the unknown $n$-vector of observation errors $e$, the unknown $u$-vector of adjustment parameters $x$ and the known $n \times u$ matrix $A$ (matrix of observation equations), and from a stochastic adjustment model for normally distributed observation errors

$$e \sim N(0, \Sigma_e) \tag{2}$$

with a known positive definite symmetric covariance matrix $\Sigma_e$. Note that $\Sigma_e$ is often assumed to be known only up to a common scalar factor, the a priori variance factor $\sigma_e$. In Baarda’s problem, which we exclusively focus on, this is not so. Therefore, we assume $\Sigma_e$ to be fully known.

Moreover, we here confine ourselves to the case that $A$ and $\Sigma_e$ have full column rank. Although important in geodesy (free network adjustment), the singular case requires special treatment and is outside our scope. By least squares solution we find the well-known estimate for the vector of residuals

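$$\hat e = \bigl(I - A (A^T \Sigma_e^{-1} A)^{-1} A^T \Sigma_e^{-1}\bigr)\, y \tag{3}$$

following the multivariate normal distribution

$$\hat e \sim N(0, \Sigma_{\hat e}) \tag{4}$$

with the covariance matrix

$$\Sigma_{\hat e} = \Sigma_e - A (A^T \Sigma_e^{-1} A)^{-1} A^T \tag{5}$$

The superscript $T$ symbolizes transposition. Using Eq. 5, we can rewrite Eq. 3 as

$$\hat e = \Sigma_{\hat e} \Sigma_e^{-1} y \tag{3a}$$

This model is also known as the Gauss-Markov model of geodetic adjustment. Since we confine ourselves to regular models, where no singular matrices occur, we know that

$$\operatorname{rank}(\Sigma_{\hat e}) = n - u \tag{6}$$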

Alternatively, one may suspect a number of $m$ gross errors or other model disturbances affecting the observations. The common procedure is to contrast Eq. 1 with an extended model including an $m$-vector $\nabla$ of deterministic bias parameters, accounting for those model disturbances as

$$y = Ax + C\nabla + e = \begin{pmatrix} A & C \end{pmatrix} \begin{pmatrix} x \\ \nabla \end{pmatrix} + e \tag{7}$$

where $C$ is the $n \times m$ matrix relating bias parameters to observations. We confine ourselves to matrices $(A\ C)$ having full column rank, such that:

$$\operatorname{rank}(C) = m \le n - u \tag{8}$$

In outlier detection, $C$ typically consists exclusively of elements with values 0 and 1: a 1 in row $i$ and column $k$ means that the $k$th bias parameter $\nabla_k$ affects the $i$th observation $y_i$ by its full magnitude, such that it becomes an outlier, and a 0 implies that this bias parameter does not affect this observation at all (cf. e.g. Teunissen 2000, p. 37). This type of alternative model in Eq. 7 is known as the mean shift model. See (Lehmann 2013a) for a synopsis of possible alternative models.

If the functional model in Eq. 7 applies, then the residuals in Eq. 3 follow a multivariate normal distribution with the same covariance matrix as in Eq. 5, but with the expectation

$$\mu_{\hat e} = \Sigma_{\hat e} \Sigma_e^{-1} C \nabla \tag{9}$$

rather than zero, as in the case that the functional model in Eq. 1 applies. In order to discriminate between both models, we set up a statistical hypothesis test with the null hypothesis $H_0$ that the functional model in Eq. 1 applies:

$$\hat e \mid H_0 \sim N(0, \Sigma_{\hat e}) \tag{10a}$$

versus the alternative hypothesis $H_A$ that the functional model in Eq. 7 applies:

$$\hat e \mid H_A \sim N(\mu_{\hat e}, \Sigma_{\hat e}) = N\bigl(\Sigma_{\hat e} \Sigma_e^{-1} C \nabla,\ \Sigma_{\hat e}\bigr), \quad \nabla \ne 0 \tag{10b}$$

This is known as Baarda’s problem.

3 Hypothesis tests for Baarda’s problem

For the test of $H_0$ vs. $H_A$ in Eqs. 10a, 10b it is now desired to construct a suitable scalar test statistic $T$ and to find a critical value $c$, such that a test decision

$$T(y) > c \ \rightarrow\ \text{reject } H_0 \tag{11}$$

has low decision error rates. We restrict our search to test statistics $T$ of the form

$$T := \hat e^T M \hat e \tag{12}$$

where $M$ is some suitable symmetric $n \times n$ matrix with $q := \operatorname{rank}(M) \le n$. Note that a non-symmetric matrix $M'$ can be excluded from consideration, because the symmetric matrix

$$M = \tfrac{1}{2}\bigl(M'^T + M'\bigr) \tag{13}$$

would give the same test statistic $T$.
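
The reason can be written out in one line. The following is only a short sketch of the standard argument that a scalar equals its own transpose; it uses no symbols beyond those of Eqs. 12 and 13.

```latex
\hat e^T M' \hat e = \bigl(\hat e^T M' \hat e\bigr)^T = \hat e^T M'^T \hat e
\quad\Longrightarrow\quad
\hat e^T M' \hat e = \tfrac{1}{2}\,\hat e^T \bigl(M'^T + M'\bigr)\hat e = \hat e^T M \hat e .
```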

The distribution of $T$ is known as “generalized χ² distribution” in the sense of Rice (1980) and others. Note that, unfortunately, this term is sometimes used differently by some authors. We will return to this distribution in section 8.

If $T$ is required to follow a common χ² distribution, a necessary condition for $M$ is positive semi-definiteness, such that $\hat e^T M \hat e \ge 0$ is guaranteed. In this case, $M$ can be rank-factored by $M = W W^T$, where $W$ denotes an $n \times q$ matrix of full column rank $q$, giving

$$T = \hat e^T W W^T \hat e = \bigl\| W^T \hat e \bigr\|^2 \tag{14}$$

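By propagation of moments we conclude that the normal random vector $W^T \hat e$ has the covariance matrix $W^T \Sigma_{\hat e} W$ and the expectation 0 in the case that $H_0$ is true and $W^T \mu_{\hat e}$ otherwise. Remembering the definition of the common χ² distribution we conclude: If

$$W^T \Sigma_{\hat e} W = I_q \tag{15}$$

where $I_q$ denotes the $q \times q$ unit matrix, then $T$ in Eq. 14 has a common central or non-central χ² distribution with $q$ degrees of freedom and non-centrality parameter $\lambda$:

$$T \mid H_0 \sim \chi^2(q, 0) \tag{16a}$$

$$T \mid H_A \sim \chi^2(q, \lambda), \quad \lambda > 0 \tag{16b}$$

$$\lambda(\nabla) = \bigl\| W^T \mu_{\hat e} \bigr\|^2 = \bigl\| W^T \Sigma_{\hat e} \Sigma_e^{-1} C \nabla \bigr\|^2 \tag{17}$$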

Note that Eq. 15 means that $\Sigma_{\hat e}$ is a generalized inverse of $M$:

$$M \Sigma_{\hat e} M = W W^T \Sigma_{\hat e} W W^T = W W^T = M \tag{18}$$

Furthermore, note that from Eq. 6 follows

$$q = \operatorname{rank}\bigl(W^T \Sigma_{\hat e} W\bigr) \le \min\bigl(\operatorname{rank}(W), \operatorname{rank}(\Sigma_{\hat e})\bigr) = \min(q, n-u) \le n-u \tag{19}$$

Remark: Equation 15 is not a necessary condition for Eq. 16a; $W^T \Sigma_{\hat e} W$ could also be a diagonal matrix with ones and zeros. However, this relaxation would in general violate Eq. 16b.

In contrast to the generalized χ² distribution, the advantage of the common χ² distribution is that the cdf and inverse cdf can be evaluated by standard numerical libraries, like chi2cdf, chi2inv, ncx2cdf, ncx2inv in MATLAB. Given the probability of a type 1 decision error

$$\alpha = \Pr(T > c \mid H_0) \tag{20}$$

also known as the size of the test, we find the critical value

$$c = c_q = F^{-1}_{\chi^2}(1-\alpha \mid q, 0) \tag{21}$$

where $F^{-1}_{\chi^2}$ denotes the inverse cdf of the common χ² distribution. The subscript $q$ of $c$ represents the dependence of $c$ on $q$, which we will focus on. We reject $H_0$ whenever $T > c_q$. The resulting probability of a type 2 decision error

$$\beta = \Pr(T < c \mid H_A) \tag{22}$$

is found by

$$\beta(\nabla) = F_{\chi^2}\bigl(c_q \mid q, \lambda(\nabla)\bigr) \tag{23}$$

Note that $\nabla$ is not known, such that this formula cannot be immediately evaluated.

The so-called power function

$$1 - \beta(\nabla) \tag{24}$$

is a function of the unknown bias parameters in vector $\nabla$. Typically, the worst power is obtained in the case that $\nabla = 0$, where $H_0$ and $H_A$ are fully equivalent, i.e. either both true or both false. Remember that a true $H_0$ is rejected with probability $\alpha$, thus a true $H_A$ is accepted with only the same small probability:

$$1 - \beta(0) = \alpha \tag{25}$$

Otherwise, the power is expected to be larger.

4 Baarda’s solution

A most desirable property of the test of $H_0$ vs. $H_A$ is that, given a probability of a type 1 decision error $\alpha$ in Eq. 20, the resulting probability of a type 2 error $\beta$ in Eq. 22 is minimized. Such a test is highly desired because with high probability it leads not only to the acceptance of $H_0$ when it is true, which is guaranteed by choosing $\alpha$ small, but also to the rejection of $H_0$ when it is false, i.e., a true outlier is really detected.

Statistics distinguishes between MP, UMP and UMPI tests, see Table 1. A “most powerful” test is practically useless because the values of the extra parameters $\nabla$ of $H_A$, i.e. the values of the outliers, are not known, such that it would not be possible to adapt the test to the values of $\nabla$. Therefore, a “uniformly most powerful” (UMP) test is highly desired. Unfortunately, such UMP tests exist only for a very small class of test problems (Kargoll 2012), and Baarda’s test problem does not belong to this class. Therefore, we usually resort to so-called “uniformly most powerful invariant” (UMPI) tests, which are not UMP for the test problem itself, but only for some transformed test problem. Due to this restriction, UMPI tests are less useful than UMP tests, but are available for a larger class of test problems. Details will be given in the next section.

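Baarda’s solution to the test problem in Eqs. 10a, 10b is known to be (Teunissen 2000, Kargoll 2012)

$$T_B = \hat e^T \Sigma_e^{-1} C \bigl(C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C\bigr)^{-1} C^T \Sigma_e^{-1} \hat e \tag{26}$$

which is obviously of type Eq. 12 with $q = m$. Regularity of $(A\ C)$ guarantees regularity of $C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C$, see also Eq. 37b. Moreover, $T_B$ is even of type Eq. 14, where the special case of matrix $W$, denoted by $W_B$, can be found by Cholesky decomposition as follows:

$$T_B = \hat e^T W_B W_B^T \hat e \tag{27}$$

$$C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C = G^T G \tag{28}$$

$$W_B = \Sigma_e^{-1} C G^{-1} \tag{29}$$

Note that also Eq. 15 holds for $W_B$:

$$W_B^T \Sigma_{\hat e} W_B = G^{-T} C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C G^{-1} = G^{-T} G^T G G^{-1} = I_m \tag{30}$$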

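Consequently, $T_B$ follows a common central or non-central χ² distribution with $q = m$ degrees of freedom and non-centrality parameter

$$\begin{aligned}
\lambda_B(\nabla) &= \nabla^T C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C \bigl(C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C\bigr)^{-1} C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C \nabla \\
&= \nabla^T C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C \nabla = \nabla^T C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C \nabla \\
&= \mu_{\hat e}^T \Sigma_e^{-1} \mu_{\hat e}
\end{aligned} \tag{31}$$

where we made use of Eqs. 17, 29 and lemma 1 (see appendix).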
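
As a quick consistency check, Eq. 26 can be specialised to a single outlier in observation $i$. The sketch below assumes uncorrelated observations, $\Sigma_e = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2)$, and $C = c_i$, the $i$th unit vector; the symbols $c_i$, $\sigma_i$ and $w_i$ are introduced here for illustration only.

```latex
\begin{align*}
\Sigma_e^{-1} C &= \sigma_i^{-2}\, c_i, \qquad
C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C = \sigma_i^{-4}\, (\Sigma_{\hat e})_{ii}, \\
T_B &= \frac{\hat e_i}{\sigma_i^{2}} \cdot \frac{\sigma_i^{4}}{(\Sigma_{\hat e})_{ii}} \cdot \frac{\hat e_i}{\sigma_i^{2}}
     = \frac{\hat e_i^{2}}{(\Sigma_{\hat e})_{ii}} = w_i^{2},
\qquad w_i := \frac{\hat e_i}{\sqrt{(\Sigma_{\hat e})_{ii}}} .
\end{align*}
```

Under these assumptions $T_B$ is the square of the normalized residual $w_i$, so for $m = 1$ the decision $T_B > c_1$ coincides with the two-sided w-test.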

Example: Consider the model of a straight line fit of $n = 10$ equidistant data points. The functional model in Eq. 1 reads

$$y_i = x_1 + x_2\, i + e_i, \quad i = 1, \dots, n \tag{32}$$

Let $\Sigma_e = I_{10}$. The test problem is posed as detecting a bias in the first and second observation $y_1, y_2$. This gives $m = 2$. The relevant matrices read

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & 10 \end{pmatrix}, \quad
C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}, \quad
\Sigma_{\hat e} = \begin{pmatrix} 0.655 & -0.291 & \cdots & 0.145 \\ -0.291 & 0.752 & \cdots & 0.091 \\ \vdots & \vdots & \ddots & \vdots \\ 0.145 & 0.091 & \cdots & 0.655 \end{pmatrix} \tag{33}$$
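
The entries of $\Sigma_{\hat e}$ in Eq. 33 can be reproduced with a few lines of code. The following is a minimal sketch in plain Python (no numerical library is assumed); the names `hat` and `sigma_ehat` are chosen here for illustration. It evaluates Eq. 5 for $\Sigma_e = I$, in which case $\Sigma_{\hat e}$ is an orthogonal projector and its trace equals its rank $n - u = 8$, in agreement with Eq. 6.

```python
# Residual covariance matrix of the straight-line fit with Sigma_e = I (a sketch).
n = 10
A = [[1.0, float(i)] for i in range(1, n + 1)]   # rows (1, i), cf. Eq. 32

# Normal matrix N = A^T A (2 x 2) and its closed-form inverse.
n11 = sum(r[0] * r[0] for r in A)
n12 = sum(r[0] * r[1] for r in A)
n22 = sum(r[1] * r[1] for r in A)
det = n11 * n22 - n12 * n12
N_inv = [[n22 / det, -n12 / det], [-n12 / det, n11 / det]]


def hat(i, j):
    """Element (i, j) of A (A^T A)^{-1} A^T, zero-based indices."""
    return sum(A[i][a] * N_inv[a][b] * A[j][b] for a in range(2) for b in range(2))


# Eq. 5 with Sigma_e = I: Sigma_ehat = I - A (A^T A)^{-1} A^T
sigma_ehat = [[(1.0 if i == j else 0.0) - hat(i, j) for j in range(n)]
              for i in range(n)]

# Upper-left block and upper-right corner, to be compared with Eq. 33.
print(round(sigma_ehat[0][0], 3), round(sigma_ehat[0][1], 3),
      round(sigma_ehat[1][1], 3), round(sigma_ehat[0][9], 3))
print(sum(sigma_ehat[i][i] for i in range(n)))   # trace, close to n - u
```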

Table 1: Optimal tests in the sense of maximum power in Eq. 24 for given size $\alpha$ in Eq. 20

| property of a test | abbreviation | meaning |
| most powerful | MP | The test is optimal only for one special $\nabla$. |
| uniformly most powerful | UMP | The test is optimal for any $\nabla$. |
| uniformly most powerful invariant | UMPI | The test is optimal for any $\nabla$, but only within a class of tests having some special invariance property. |
| uniformly most powerful χ² (see section 6) | UMPχ² | The test is optimal for any $\nabla$, but only within a class of tests with a test statistic following a common χ² distribution. |

With chosen $\alpha = 0.05$, Baarda’s test has the critical value in Eq. 21

$$c_2 = F^{-1}_{\chi^2}(0.95 \mid 2, 0) = 5.99 \tag{34}$$

and we get in Eq. 23

$$\beta(\nabla_1, \nabla_2) = F_{\chi^2}\bigl(5.99 \mid 2, \lambda_B(\nabla_1, \nabla_2)\bigr) \tag{35}$$

with

$$\begin{aligned}
\lambda_B(\nabla_1, \nabla_2) &= \nabla^T C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C \nabla = \nabla^T C^T \Sigma_{\hat e} C \nabla \\
&= 0.655 \cdot \nabla_1^2 - 2 \cdot 0.291 \cdot \nabla_1 \cdot \nabla_2 + 0.752 \cdot \nabla_2^2
\end{aligned} \tag{36}$$

The curves of equal power are ellipses with center $\nabla = 0$ and major axis in the direction of the vector $(0.763,\ 0.646)^T$. See Fig. 1. It shows that $\beta$ is large, i.e. the power tends to be poor, when $\nabla_1, \nabla_2$ have equal signs. In outlier detection, this effect is known as the masking effect: the outliers $\nabla_1, \nabla_2$ can mask each other. The power is a little poorer for a bias in the first observation $y_1$. In outlier detection, this effect is known as the leverage effect: The outer observations are not supported by observations on both sides along the x axis. This manifests itself in Fig. 1 by the angles of the axes differing from 45°.
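
The numbers of this example can be checked numerically. The sketch below is plain Python and makes two assumptions that are not part of the text: for $q = 2$ the central χ² cdf is $1 - \exp(-x/2)$, so Eq. 21 is inverted in closed form, and the non-central cdf is evaluated as a Poisson-weighted series of central χ² cdfs with even degrees of freedom. The function names and the rounded coefficients taken from Eq. 36 are illustrative choices.

```python
import math

ALPHA = 0.05
# Rounded entries of C^T Sigma_ehat C for the line-fit example, cf. Eq. 36.
Q11, Q12, Q22 = 0.655, -0.291, 0.752

# Eq. 21 for q = 2: solve 1 - exp(-c/2) = 1 - alpha for c.
c2 = -2.0 * math.log(ALPHA)


def lambda_b(g1, g2):
    """Non-centrality parameter of Eq. 36 for the biases (g1, g2)."""
    return Q11 * g1 * g1 + 2.0 * Q12 * g1 * g2 + Q22 * g2 * g2


def ncx2_cdf_2dof(x, lam, terms=200):
    """cdf of the non-central chi-square distribution with 2 degrees of freedom,
    written as a Poisson(lam/2) mixture of central cdfs with 2 + 2j dof."""
    half_x, half_lam = 0.5 * x, 0.5 * lam
    weight = math.exp(-half_lam)      # Poisson weight for j = 0
    term = math.exp(-half_x)          # exp(-x/2) (x/2)^j / j! for j = 0
    tail = term                       # 1 - F(x; 2 + 2j) for j = 0
    total = weight * (1.0 - tail)
    for j in range(1, terms):
        weight *= half_lam / j
        term *= half_x / j
        tail += term
        total += weight * (1.0 - tail)
    return total


def beta(g1, g2):
    """Probability of a type 2 decision error, Eq. 35."""
    return ncx2_cdf_2dof(c2, lambda_b(g1, g2))


# Major axis of the ellipses lambda_B = const: eigenvector of the smaller eigenvalue.
eig_min = 0.5 * (Q11 + Q22) - math.hypot(0.5 * (Q11 - Q22), Q12)
vx, vy = -Q12, Q11 - eig_min
major_axis = (vx / math.hypot(vx, vy), vy / math.hypot(vx, vy))

print(round(c2, 2), major_axis)
print(beta(3.0, 3.0), beta(3.0, -3.0))   # equal signs give the larger beta (masking)
```

The closed-form inversion is specific to two degrees of freedom; for general $q$ a library routine for the inverse χ² cdf would be needed.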

Figure 1: Probability of type 2 decision error function $\beta(\nabla_1, \nabla_2)$ in Eq. 35 for the straight line fit example in section 4.

5 Baarda’s solution as a uniformly most powerful invariant (UMPI) test statistic

Arnold (1981) derives the result that the generalized likelihood ratio test for the models in Eqs. 1, 7 enjoys the UMPI property with respect to some invariance transformation. We follow the line of reasoning given in (Kargoll 2012), where this is achieved as follows, see also Fig. 2:

(1) Reduction to a minimally sufficient statistic. The least squares solution (Kargoll 2012)

$$\hat\nabla = \bigl(C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C\bigr)^{-1} C^T \Sigma_e^{-1} \hat e \tag{37a}$$

$$\Sigma_{\hat\nabla} = \bigl(C^T \Sigma_e^{-1} \Sigma_{\hat e} \Sigma_e^{-1} C\bigr)^{-1} \tag{37b}$$

of the bias parameters in vector $\nabla$ is shown to be a minimally sufficient statistic. This means that it most efficiently captures all possible information about the true values of the parameter vector $\nabla$ contained in the observations $y$. It reduces the dimension of the test problem from $n$ to $m$, without impairing in any way the chances of a correct test result. In other words: $\hat\nabla$ in Eq. 37a can be considered as pseudo observations of the reduced test problem, which now reads

$$\hat\nabla \mid H_0 \sim N(0, \Sigma_{\hat\nabla}) \tag{38a}$$

versus

$$\hat\nabla \mid H_A \sim N(\nabla, \Sigma_{\hat\nabla}) \tag{38b}$$

(2) Full decorrelation/homogenisation of the pseudo observations $\hat\nabla$. We arrive at homogenised pseudo observations $\hat\nabla^{(h)}$, for which the testing problem reads

$$\hat\nabla^{(h)} \mid H_0 \sim N(0, \sigma^2 I_m) \tag{39a}$$

$$\hat\nabla^{(h)} \mid H_A \sim N(\nabla^{(h)}, \sigma^2 I_m) \tag{39b}$$

This is still a test problem fully equivalent to Eqs. 10a, 10b.

(3) Reduction to maximally invariant statistics. It is proved that the test problem is invariant under the group of orthogonal transformations

$$U \hat\nabla^{(h)} \mid H_0 \sim N(0, \sigma^2 I_m) \tag{40a}$$

$$U \hat\nabla^{(h)} \mid H_A \sim N(U \nabla^{(h)}, \sigma^2 I_m) \tag{40b}$$

where $U$ denotes an arbitrary orthogonal $m \times m$ matrix, such that $U^T U = I_m$. The maximally invariant statistic of this group of transformations is

$$T = \hat\nabla^{(h)T} \hat\nabla^{(h)} = \bigl\| \hat\nabla^{(h)} \bigr\|^2 \tag{41}$$

This means that the distribution of $T$ is not changed by any orthogonal transformation applied to $\hat\nabla^{(h)}$, and this group is the largest collection of transformations leaving the test problem invariant. The new maximally invariant statistic $T$ is distributed as:

$$T \mid H_0 \sim \chi^2(m, 0) \tag{42a}$$

$$T \mid H_A \sim \chi^2(m, \lambda), \quad \lambda > 0 \tag{42b}$$

Teunissen et al. (2005) show that the test statistic in Eqs. 42a, 42b is equivalent to Eq. 26, see also (Kargoll 2012). It is also shown there that the test statistic in Eqs. 42a, 42b is uniformly most powerful when compared to all test statistics which are invariant with respect to orthogonal transformations applied to $\hat\nabla^{(h)}$, a property called UMPI in the literature. Consequently, Eqs. 42a, 42b mean UMPI but not UMP, as discussed above.
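
Steps (2) and (3) can be illustrated for $m = 2$ with a small numerical sketch. Everything specific in it is an assumption made for illustration: the covariance matrix `S`, the pseudo observations `nabla_hat`, $\sigma^2 = 1$, and the use of a Cholesky factor as one possible homogenising transformation. A plane rotation serves as one example of an orthogonal matrix $U$.

```python
import math

# Hypothetical 2 x 2 covariance matrix of the bias estimates (illustrative values).
S = [[1.6, 0.6], [0.6, 1.4]]
nabla_hat = [2.0, -1.0]            # hypothetical pseudo observations

# Cholesky factor S = L L^T; V = L^{-1} gives V S V^T = I (sigma^2 = 1 assumed).
l11 = math.sqrt(S[0][0])
l21 = S[1][0] / l11
l22 = math.sqrt(S[1][1] - l21 * l21)


def homogenise(v):
    """Solve L h = v by forward substitution, i.e. h = V v."""
    h1 = v[0] / l11
    h2 = (v[1] - l21 * h1) / l22
    return [h1, h2]


def rotate(h, angle):
    """Apply an orthogonal matrix U (here a plane rotation) to h."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * h[0] - s * h[1], s * h[0] + c * h[1]]


h = homogenise(nabla_hat)
T = h[0] ** 2 + h[1] ** 2                        # Eq. 41
T_rot = sum(x * x for x in rotate(h, 0.7))       # same statistic after applying U

# T also equals the quadratic form nabla_hat^T S^{-1} nabla_hat.
det = S[0][0] * S[1][1] - S[0][1] ** 2
T_quad = (S[1][1] * nabla_hat[0] ** 2
          - 2.0 * S[0][1] * nabla_hat[0] * nabla_hat[1]
          + S[0][0] * nabla_hat[1] ** 2) / det

print(T, T_rot, T_quad)   # the three values agree up to rounding
```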

6 Optimality of Baarda’s solution in the class of χ²-test statistics

In this section we will prove the following new

Theorem: In the class of all common χ²-test statistics in Eqs. 14, 15, Baarda’s solution in Eq. 26 represents a uniformly most powerful test, to be denoted as UMPχ².

The restriction to common χ²-test statistics is practically desirable because of the availability of the cdf and inverse cdf in tables and statistical function libraries. The prospective result can therefore be viewed as an additional justification for Baarda’s solution.

Let the probability of type 1 error $\alpha$ in Eq. 20 be given and fixed. This is a standard assumption in geodetic statistics. Now, we will solve the following

Problem:

1. Find matrix $W$ such that

$$\beta(W) = F_{\chi^2}\bigl(c_q \mid q, \lambda(W)\bigr), \quad q = \operatorname{rank}(W) \tag{43}$$

$$\lambda(W) = \bigl\| W^T \Sigma_{\hat e} \Sigma_e^{-1} C \nabla \bigr\|^2 \tag{44}$$

cf. Eqs. 23, 17, with critical values $c_q$ in Eq. 21, is minimized subject to Eq. 15.
2. Show that this solution is independent of $\nabla$.

Note that in Eq. 43 we used $W$ as the argument of $\beta$ because we now focus on this dependence. With such a matrix $W$, Eq. 14 would be a UMPχ² test statistic. (Above, 1. ensures “most powerful” and 2. means “uniform”.)

Figure 2: Flow chart of the derivation in section 5.

Solution: Let us assume for the moment that argument $\nabla$ is known (although in practice it is not). The solution strategy is to run through all possible values of $q$ and minimize $\beta(W)$ by a matrix $W$ of rank $q$. For fixed $q$ and $\alpha$, also $c_q$ in Eq. 21 is fixed. Let us denote

$$d := \Sigma_e^{-1} C \nabla \tag{45}$$

which is a known $n$-vector, according to the assumption. From lemma 2 (see appendix) it is concluded that for fixed $q$ and $c_q$, minimizing $\beta(W)$ in Eq. 43 is equivalent to maximizing

$$\lambda(W) = \bigl\| W^T \Sigma_{\hat e} d \bigr\|^2 \tag{46}$$

both subject to Eq. 15. This shall be accomplished by the method of Lagrange. The target function, for which a stationary point is desired, reads

$$\Psi(W, L) := d^T \Sigma_{\hat e} W W^T \Sigma_{\hat e} d + \operatorname{trace}\bigl((W^T \Sigma_{\hat e} W - I_q)\, L\bigr) \tag{47}$$

where $L$ is a $q \times q$ matrix of Lagrange multipliers. Note that the equivalent constraints $w_i^T \Sigma_{\hat e} w_j = 0$ and $w_j^T \Sigma_{\hat e} w_i = 0$, $i \ne j$, where $w_i$ denotes the $i$th column of $W$, are introduced twice. This simplifies the notation greatly, and matrix $L$ is expected to be obtained symmetric.

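A stationary point of $\Psi$ is found by the conditions

$$\frac{\partial \Psi(W, L)}{\partial W} = 2 \Sigma_{\hat e} d d^T \Sigma_{\hat e} W + 2 \Sigma_{\hat e} W L = 0 \tag{48a}$$

$$\frac{\partial \Psi(W, L)}{\partial L} = W^T \Sigma_{\hat e} W - I_q = 0 \tag{48b}$$

From Eq. 48a we get by left multiplication with $W^T/2$

$$W^T \Sigma_{\hat e} d d^T \Sigma_{\hat e} W + W^T \Sigma_{\hat e} W L = 0 \tag{49}$$

Using Eq. 48b this simplifies to

$$L = -W^T \Sigma_{\hat e} d d^T \Sigma_{\hat e} W \tag{50}$$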

$L$ is in fact symmetric. Note that the rewriting of Eq. 48a into Eq. 49 is not reversible. This means that if a solution for $L$ exists at all, then it must be of the form of Eq. 50, but we cannot yet be sure that this is the desired solution. It must be checked by inserting Eq. 50 into Eq. 48a, which yields

$$\Sigma_{\hat e} d d^T \Sigma_{\hat e} W - \Sigma_{\hat e} W W^T \Sigma_{\hat e} d d^T \Sigma_{\hat e} W = \Sigma_{\hat e} \bigl(d - W W^T \Sigma_{\hat e} d\bigr)\, d^T \Sigma_{\hat e} W = 0 \tag{51}$$

The expression is the outer product of an $n$-vector and a $q$-vector. According to lemma 3 (see appendix), it can only be identical to the null matrix if at least one of the vectors is the null vector. In Eq. 46 it is seen that $W^T \Sigma_{\hat e} d = 0$ corresponds to a minimum of $\lambda(W)$ and can be ruled out as a solution. Therefore, we have to consider

$$\Sigma_{\hat e} W W^T \Sigma_{\hat e} d = \Sigma_{\hat e} d \tag{52}$$

from which we obtain the result

$$\lambda(W) = d^T \Sigma_{\hat e} W W^T \Sigma_{\hat e} d = d^T \Sigma_{\hat e} d = \lambda_B \tag{53}$$

In words, any maximum of $\lambda(W)$ equals $\lambda_B$ in Eq. 31, if it exists at all for some fixed $q$. That is, it has the same non-centrality parameter as Baarda’s solution. Remember that since the rewriting of Eq. 48a into Eq. 49 is not reversible, not all solutions of Eq. 52 are necessarily also stationary points of $\Psi$.
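
One way to see directly why no admissible $W$ can exceed $\lambda_B$ is the following sketch. It assumes a symmetric square root $\Sigma_{\hat e}^{1/2}$ of the positive semi-definite matrix $\Sigma_{\hat e}$ and introduces an auxiliary matrix $P$, neither of which appears in the derivation above.

```latex
\begin{align*}
P &:= \Sigma_{\hat e}^{1/2} W W^T \Sigma_{\hat e}^{1/2}, \qquad
P^2 = \Sigma_{\hat e}^{1/2} W \,\bigl(W^T \Sigma_{\hat e} W\bigr)\, W^T \Sigma_{\hat e}^{1/2} = P
\quad \text{by Eq. 15}, \\
\lambda(W) &= d^T \Sigma_{\hat e} W W^T \Sigma_{\hat e} d
            = \bigl\| P\, \Sigma_{\hat e}^{1/2} d \bigr\|^2
            \le \bigl\| \Sigma_{\hat e}^{1/2} d \bigr\|^2
            = d^T \Sigma_{\hat e} d = \lambda_B .
\end{align*}
```

Since $P$ is symmetric and idempotent, it is an orthogonal projector and cannot lengthen a vector. Equality requires $P\,\Sigma_{\hat e}^{1/2} d = \Sigma_{\hat e}^{1/2} d$, which after left multiplication by $\Sigma_{\hat e}^{1/2}$ gives Eq. 52.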

Summarizing, we have shown that for any fixed $q$, any existing solution of the minimization problem in Eqs. 43, 44 has non-centrality parameter $\lambda = \lambda_B$. Now we have to minimize Eq. 43 also with respect to $q$. In view of lemma 2 (see appendix), for fixed $\lambda = \lambda_B$ the optimal $q$ is the least possible value of $q$, namely $q = m$, which also coincides with Baarda’s solution. This shows that in the class of χ²-test statistics there is no more powerful test statistic than Eq. 26. Moreover, this solution is also uniform in the sense that it is independent of $\nabla$. Therefore, it is not necessary to know $\nabla$ (which is of course not known in practice) to derive an optimal common χ² test statistic. This completes the proof of the theorem.

Due to the restriction to χ²-test statistics in Eqs. 14, 15 this of course does not prove the UMP property of the test, which the test does not have. We propose to introduce the new name UMPχ² for such a test.

7 Equivalence of UMPI and UMPχ²

At first sight it seems that UMPI and UMPχ² are different. UMPI applies an invariance transformation in Eqs. 40a, 40b to the test problem, while UMPχ² constructs a test statistic from a special family of distributions. It turns out that both are in fact equivalent. To show this, it is not necessary to consider the original test problem in Eqs. 10a, 10b. Equivalently, we can consider Eqs. 39a, 39b, because we stated before that $\hat\nabla$ in Eq. 37a is a minimally sufficient statistic. Any common χ² test statistic operating on $\hat\nabla$ in Eq. 37a is of the form

$$T_\nabla := \hat\nabla^T W W^T \hat\nabla \tag{54}$$

$$W^T \Sigma_{\hat\nabla} W = I_m \tag{55}$$

with some suitable $m \times q$ matrix $W$ of full column rank $q$.

Let $V$ be the regular matrix performing the full decorrelation/homogenisation $\hat\nabla^{(h)} = V \hat\nabla$ of $\hat\nabla$ in Eqs. 39a, 39b. The covariance propagation introduces the constraint on $V$:

$$\Sigma_{\hat\nabla^{(h)}} = V \Sigma_{\hat\nabla} V^T = \sigma^2 I_m \tag{56}$$

Let $U$ be any orthogonal matrix as in Eqs. 40a, 40b. The transformed pseudo observations read

$$U \hat\nabla^{(h)} = U V \hat\nabla \tag{57}$$

Any common χ² test statistic operating on $U \hat\nabla^{(h)}$ is of the form

$$\tilde T_\nabla := \hat\nabla^{(h)T} U^T \tilde W \tilde W^T U \hat\nabla^{(h)} = \hat\nabla^T V^T U^T \tilde W \tilde W^T U V \hat\nabla \tag{58}$$

$$\tilde W^T \Sigma_{\hat\nabla^{(h)}} \tilde W = \sigma^2 \tilde W^T \tilde W = I_m \tag{59}$$

with some suitable $m \times q$ matrix $\tilde W$ of full column rank $q$.

Obviously, Eqs. 54 and 58 become fully equivalent as soon as we choose $W = V^T U^T \tilde W$. But also Eqs. 55 and 59 become fully equivalent by virtue of $U^T U = I_m$ and Eq. 56:

$$\sigma^2 \tilde W^T \tilde W = \sigma^2 W^T V^{-1} U^T U V^{-T} W = \sigma^2 W^T V^{-1} V^{-T} W = W^T \Sigma_{\hat\nabla} W = I_m \tag{60}$$

Now we see that any common χ² test statistic in Eq. 14 or Eq. 54 is also invariant under the transformation in Eq. 58. Conversely, any invariant test statistic in Eq. 41 has the common χ² distribution given in Eqs. 42a, 42b.

Therefore, the proof presented in section 6 can also be seen as another proof of the UMPI property of Baarda’s solution.

8 Generalized χ²-test statistics

The condition in Eq. 15 was introduced to ensure that $T$ in Eq. 12 is a common χ²-test statistic. If this condition is dropped, $T$ has a generalized χ² distribution. This means that we can no longer use Eqs. 21, 23. But it may be possible to derive a more powerful test statistic than Eq. 26, because we are looking for an optimal solution in a larger class of test statistics.

A method for constructing such a more powerful test statistic is theoretically very simple:

(1) Replace Eqs. 21, 23 by the corresponding formulas for the generalized χ² distribution

$$c_M = F^{-1}_{\mathrm{gen}\chi^2}(1-\alpha \mid M, 0) \tag{61}$$

$$\beta(M, \nabla) = F_{\mathrm{gen}\chi^2}(c_M \mid M, \nabla) \tag{62}$$

where $F_{\mathrm{gen}\chi^2}$ is the cdf of the generalized χ² distribution with the elements of matrix $M$ as parameters.

(2) Minimize $\beta(M, \nabla)$ as a function of the elements of matrix $M$, i.e. find the MP test statistic.

Practically, we see four major obstacles:

1. Unlike $F_{\chi^2}$, the cdf $F_{\mathrm{gen}\chi^2}$ has no analytical expression, not even as an infinite series. It can be approximated or evaluated as an $n(n+1)/2$-dimensional integral by Monte Carlo integration, but this is numerically very costly (see below).
2. $\beta(M, \nabla)$ is an $n(n+1)/2$-dimensional function of the elements of matrix $M$. Minimizing $\beta(M, \nabla)$ requires a time-consuming iterative process.
3. Unlike Baarda’s test in the class of χ²-test statistics, any optimal test statistic in the class of generalized χ²-test statistics is in general not uniform. This means that the optimal solution for $M$ in Eq. 62 generally depends on the unknown parameters in $\nabla$.
4. This construction must be done for every pair of functional models in Eqs. 1, 7 under consideration. E.g. in a multiple test with different alternative models in Eq. 7 to be tested, each individual test requires the construction of a new optimal test statistic.

These obstacles seem to be practically insurmountable. Therefore, we cannot really propose a “new outlier detection method” at the moment, but at least show that in principle such a method exists. This will be done in the next section.

9 Proof of existence of a generalized χ² test more powerful than Baarda’s test

To begin with, it is again not useful to apply the generalized χ² test to the original test problem in Eqs. 10a, 10b. Equivalently, we can apply it to Eqs. 38a, 38b or Eqs. 39a, 39b, because $\hat\nabla$ in Eq. 37a is a minimally sufficient statistic. That is why we propose to optimally construct

$$T_\nabla := \hat\nabla^T M_\nabla \hat\nabla \tag{63}$$

rather than Eq. 12. This reduces the dimension of the integration and optimization for $M$ from $n(n+1)/2$ to $m(m+1)/2$.

Unlike $T$ in Eq. 41, $T_\nabla$ in Eq. 63 is in general not an invariant statistic with respect to an orthogonal transformation. Otherwise, there would be no chance to outperform Baarda’s solution in Eq. 26 by Eq. 63.

Since $M_\nabla$ is a general symmetric $m \times m$ matrix, $T_\nabla$ in Eq. 63 can in principle even assume negative values. However, such a solution would behave very badly because $T_{\tau\nabla}$ would also be negative for any scalar $\tau > 0$, and this would mean that even for some extremely large bias parameters $\tau\nabla$, we cannot reject $H_0$. Therefore, it stands to reason that we expect $M_\nabla$ to be positive definite.

In view of Eq. 37a and, for the sake of simplicity, restricting to $\Sigma_e = \sigma^2 I_n$, Eq. 63 becomes

$$T_\nabla := \hat e^T C \bigl(C^T \Sigma_{\hat e} C\bigr)^{-1} M_\nabla \bigl(C^T \Sigma_{\hat e} C\bigr)^{-1} C^T \hat e \tag{64}$$

It is obvious how this form relates to Eq. 12. Baarda’s solution in Eq. 26 is included as a special case in Eq. 64 by the choice

$$M_\nabla = C^T \Sigma_{\hat e} C \tag{65}$$

Example (cont’d): If $C$ is of the form of Eq. 33, it is clear that Eq. 64 takes the form

$$T_\nabla = m_{11} \hat\nabla_1^2 + 2 m_{12} \hat\nabla_1 \hat\nabla_2 + m_{22} \hat\nabla_2^2 \tag{66}$$

with $m_{11}, m_{12}, m_{22}$ to be determined numerically by power maximization. The curves of equal values of $T_\nabla$ are also ellipses with center in $\nabla = 0$, but the curves of equal power are generally not.

Critical values and error rates of the generalized χ² test statistic in Eq. 64 must be evaluated numerically by Eqs. 61, 62. Fortunately, there are applicable numerical algorithms for evaluation of the cdf of a generalized χ² distribution. An approximation, which uses numerical integration to invert the characteristic function of Eq. 12, is derived by Imhof (1961). The algorithm of Davies (1980) is based on the numerical evaluation of the Gil-Pelaez integral formula. The computer code is also available online. Alternatively, Kotz et al. (1967) express Eq. 12 as an infinite series in central χ² distribution functions, which was programmed by Sheil and O’Muircheartaigh (1977). Huan Liu et al. (2009) propose an approximation of the generalized χ² distribution by a common non-central χ² distribution.

For evaluation of Eqs. 61, 62 in this contribution we preliminarily apply a brute force method, using Monte Carlo integration. The advantage of this method is that it is more generally applicable: Unlike the methods mentioned above, it also works if $M$ is not positive definite or semi-definite, and the computer code is written very quickly with MATLAB. It is the same method as applied by Lehmann (2012) in a different context: The probability distributions are approximated by frequency distributions of computer generated pseudo random variates, and the quantiles in Eqs. 20, 22 are derived therefrom. The details are as follows:

Let a pair of functional models in Eqs. 1, 7 be given, and also α, a matrix M∇ and a vector ∇. The algorithm for computation of Eqs. 61, 62 consists of the following steps:

1. We compute two sequences of pseudo-random vectors $\hat\nabla_k$, k = 1, ..., K, one following the distribution in Eq. 38a and the other following the distribution in Eq. 38b. (Practically, we use MATLAB’s function randn here and apply simple covariance propagation to get a non-unit covariance matrix $\Sigma_{\hat\nabla}$.) K is said to be the number of Monte Carlo experiments.

2. For both sequences we compute T∇,k, k = 1, ..., K, by Eq. 64. The frequency distributions of T∇,k are approximations of the probability distributions of T∇. The quality of the approximations depends on the chosen K, see below.

3. We derive an estimate of the critical value cM in Eq. 61 as the (1 − α)-quantile of the frequency distribution of the first sequence of T∇,k. (Practically, we use MATLAB’s function quantile here.)

4. Using this critical value, we compute from the second sequence of T∇,k, k = 1, ..., K, the estimate β(M∇, ∇) ≈ count(T∇,k < cM)/K of Eq. 62, i.e. the frequency of type 2 decision errors.

5. In order to ensure that the approximate values cM, β are close to the true values, one should observe the convergence of the procedure as K → ∞ and implement a suitable termination criterion. We suggest that the procedure be terminated as soon as cM, β fluctuate by no more than 1%.

Due to high computational costs, this method cannot be applied to massive data, but must be replaced by one of the more elaborate methods cited above.
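Steps 1 to 4 of the Monte Carlo procedure can be sketched as follows. The computations reported here were done in MATLAB; this sketch uses plain Python instead and is restricted to two-dimensional bias vectors. It assumes that the two sampling distributions are normal with common covariance matrix `S` and with mean zero under H0 and mean ∇ under HA, and that the test statistic is the quadratic form of Eq. 66. The function names, the identity matrices and the bias vector in the demonstration are hypothetical. The convergence monitoring of step 5 is omitted.

```python
import math
import random

def sample_T(M, S, mean, K, rng):
    """Steps 1-2: draw K vectors d ~ N(mean, S) and return T = d^T M d for each."""
    # Lower Cholesky factor of the 2x2 covariance matrix S (covariance propagation).
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    values = []
    for _ in range(K):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        d1 = mean[0] + l11 * z1
        d2 = mean[1] + l21 * z1 + l22 * z2
        values.append(M[0][0] * d1 * d1 + 2.0 * M[0][1] * d1 * d2 + M[1][1] * d2 * d2)
    return values

def mc_critical_value_and_beta(M, S, nabla, alpha, K, seed=0):
    """Return Monte Carlo estimates of the critical value and the type 2 error rate."""
    rng = random.Random(seed)
    # Step 3: empirical (1 - alpha)-quantile of T under H0.
    t_h0 = sorted(sample_T(M, S, (0.0, 0.0), K, rng))
    index = min(K - 1, max(0, math.ceil((1.0 - alpha) * K) - 1))
    c = t_h0[index]
    # Step 4: relative frequency of T < c under HA.
    t_ha = sample_T(M, S, nabla, K, rng)
    beta = sum(1 for t in t_ha if t < c) / K
    return c, beta

# Hypothetical demonstration: unit covariance and M = identity, so that T follows
# a chi-square distribution with 2 degrees of freedom under H0.
S = [[1.0, 0.0], [0.0, 1.0]]
M = [[1.0, 0.0], [0.0, 1.0]]
c, beta = mc_critical_value_and_beta(M, S, (3.0, 3.0), alpha=0.05, K=20000)
print(c, beta)
```

Repeating the call with different values of `seed` and growing `K` gives a simple way to observe the fluctuation of both estimates, in the spirit of step 5.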

After having obtained an algorithm for a good approximation of β(M∇, ∇) for some M∇, ∇, the next step is to minimize β(M∇, ∇) with respect to M∇. As an initial guess for M∇ we can use Baarda’s solution in Eq. 65.

However, we are still left with ∇ being an unknown parameter vector. In other words, the MP test statistic in Eq. 63 is not uniform (not UMP).

There are many ways to resolve this dilemma. One would be to maximize some minimum or mean power with respect to ∇. A different suggestion is the following: We call a region where the power is above some level a “region of good power”. Consider the example in section 4. The region of power > 0.95, or equivalently β(∇) < 0.05, is the region outside the outer ellipse in Fig. 1. If ∇ is from this region, we have a good chance (at least 95%) of rejecting H0 if it is false. A reasonable objective would be to extend this region as much as possible towards ∇ = 0, or equivalently, to minimize the “region of poor power” inside the outer ellipse.

10 Example: Mitigating the outlier masking effect

To demonstrate the workability of the generalized χ²-test approach, we resume the example of the straight line.

In Fig. 1 it was seen that the power tends to be poor when ∇1, ∇2 have equal signs (outlier masking effect). This causes the ellipticity of the curves of equal power in Fig. 1. A reasonable goal would be to mitigate this effect.

However, in this simple example, this goal cannot be reached without downgrading the power elsewhere: For some ∇, the new β(M∇, ∇) function lies above the old one with M∇ from Eq. 65. That is why we speak of a “regionally” more powerful test. At the moment it is not known if this is a general restriction or if it can be relaxed once bigger problems are studied.

For Baarda’s solution in Eq. 65 the region of poor power (our choice: β > 0.05) is bounded by an ellipse with semi-axes 3.932 and 6.149. It is the region inside the outer ellipse in Fig. 1.

Starting from the initial guess in Eq. 65, we try to find a matrix M∇ for which the region of poor power is smaller, ideally as small as possible. This is performed by a two-parameter non-linear optimization. (Remember that M∇ is symmetric and that a positive scalar factor applied to M∇ does not change the power. Therefore, one element of M∇ can be held fixed. We use m11 = 1. Thus, only two unknowns m12, m22 remain.) β(M∇, ∇) is evaluated by the Monte Carlo method with K = 10^6 Monte Carlo experiments, as described in the previous section. We verify that K is large enough by repeating the computation with different random numbers and observing that cM, β do not change significantly.
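The reason why one element of M∇ can be held fixed may be spelled out in a short worked step. The scalar s and the subscripted symbols T_{sM}, c_{sM} are introduced here only for illustration. For any s > 0 the test statistic and its critical value scale together, so the acceptance events coincide:

```latex
T_{sM} = \hat\nabla^T (s\,M_\nabla)\,\hat\nabla = s\,T_{M},
\qquad
\Pr(T_{sM} > c_{sM} \mid H_0) = \alpha \;\Longrightarrow\; c_{sM} = s\,c_{M},
\qquad
\{\,T_{sM} < c_{sM}\,\} = \{\,T_{M} < c_{M}\,\}.
```

Hence β(sM∇, ∇) = β(M∇, ∇) for every ∇, and fixing m11 = 1 loses no generality as long as the optimal m11 is positive.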

After some numerical effort we obtain the solution

$$M_\nabla = \begin{pmatrix} 1 & 0.186 \\ 0.186 & 0.935 \end{pmatrix} \tag{67}$$

The new region of poor power is the region inside a nearly elliptical curve with a minimum distance of 5.25 and a maximum distance of 5.67 to the origin (0, 0). Thus, in contrast to Fig. 1, it is nearly circular, such that the outlier masking effect is mitigated: A bias vector with

$$\left\| \begin{pmatrix} \nabla_1 \\ \nabla_2 \end{pmatrix} \right\| > 5.67 \tag{68}$$

is rejected by Eqs. 67, 63 with probability 1 − β > 0.95, while for Baarda’s solution in Eq. 65 this is not the case. However, as pointed out before, when ∇1, ∇2 have different signs, the performance of our solution is worse than Baarda’s.
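As a quick plausibility check of the numerical solution in Eq. 67, one can compute its eigenvalues and confirm that the matrix is positive definite, as expected for M∇. The sketch below uses the closed-form eigenvalues of a symmetric 2×2 matrix; the function name `eig_sym2` is hypothetical, and the axis ratio refers to the level ellipses of T∇, not to the curves of equal power.

```python
import math

def eig_sym2(m11, m12, m22):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[m11, m12], [m12, m22]]."""
    mean = 0.5 * (m11 + m22)
    radius = math.hypot(0.5 * (m11 - m22), m12)
    return mean - radius, mean + radius

lo, hi = eig_sym2(1.0, 0.186, 0.935)    # entries of Eq. 67
is_positive_definite = lo > 0.0          # both eigenvalues positive
axis_ratio = math.sqrt(hi / lo)          # ratio of the semi-axes of a level ellipse of T
print(lo, hi, is_positive_definite, axis_ratio)
```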

11 Conclusions

We analyzed the statistical power of Baarda’s outlier test. It is well known that Baarda’s solution in Eq. 26 enjoys the property of being uniformly most powerful invariant (UMPI). We showed that in the class of all test statistics in Eqs. 14, 15 following a common central or non-central χ² distribution, Eq. 26 is uniformly most powerful, a property we name UMPχ². It then turned out that UMPI and UMPχ² are equivalent.

We demonstrated that dropping the condition of invariance can give a regionally more powerful test. But then we have to deal with the generalized χ² distribution. This requires numerical tools like the Monte Carlo method and non-linear optimization, which are computationally costly. Although computer power is now widely available, we do not yet propose this as a “new outlier detection method”, but only as a proof that it is in principle possible to outperform the statistical power of Baarda’s test.

The proposed procedure can be extended in various directions:

1. Efficient numerical algorithms for evaluating the cdf of a generalized χ² distribution should be implemented.

2. Problems with more observations should be investigated. Here we expect that it is possible to construct generalized χ² tests which are globally (i.e. not only regionally) more powerful than Baarda’s corresponding solution.

3. An optimal test statistic can be searched for in a class of test statistics larger than the one defined by Eq. 12.

The next step must be the evaluation of reliability measures for such test statistics.

Acknowledgement: An anonymous reviewer inspired the derivation in section 7, which we gratefully acknowledge.

Appendix: Lemmata

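Lemma 1: $\Sigma_{\hat e}\Sigma_e^{-1}$ is an idempotent matrix.

Proof. In view of Eq. 5 we find

$$
\begin{aligned}
\Sigma_{\hat e}\Sigma_e^{-1}\Sigma_{\hat e}\Sigma_e^{-1}
&= I - 2A(A^T\Sigma_e^{-1}A)^{-}A^T\Sigma_e^{-1}
 + A(A^T\Sigma_e^{-1}A)^{-}A^T\Sigma_e^{-1}A(A^T\Sigma_e^{-1}A)^{-}A^T\Sigma_e^{-1} \\
&= I - A(A^T\Sigma_e^{-1}A)^{-}A^T\Sigma_e^{-1}
 = \Sigma_{\hat e}\Sigma_e^{-1}
\end{aligned}
$$

which completes the proof.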
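Lemma 1 can also be checked numerically in a simple special case. The sketch below assumes Σe = σ²I and a design matrix A with two columns and full column rank, so that the product in Lemma 1 reduces to I − A(AᵀA)⁻¹Aᵀ; this reduction relies on the usual form of the residual covariance matrix, which is an assumption here because Eq. 5 is not restated in this appendix. The helper functions and the straight-line design with five abscissae are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inv2(S):
    """Inverse of a regular 2x2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def residual_projector(A):
    """R = I - A (A^T A)^{-1} A^T, the product of Lemma 1 for Sigma_e = sigma^2 I."""
    At = transpose(A)
    P = matmul(matmul(A, inv2(matmul(At, A))), At)
    n = len(A)
    return [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

# Hypothetical straight-line design: columns are intercept and abscissa.
A = [[1.0, x] for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
R = residual_projector(A)
RR = matmul(R, R)
# Largest elementwise deviation between R*R and R; should be at rounding level.
max_dev = max(abs(RR[i][j] - R[i][j]) for i in range(5) for j in range(5))
print(max_dev)
```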

Lemma 2: Fχ²(c|q, λ) is a monotone decaying function of λ and a monotone increasing function of q.

Proof. Johnson et al. 1995, p. 451.

The monotonicity of Fχ² is also depicted in Fig. 3.

Figure 3: Probability of type 2 decision error β = Fχ²(c|q, λ) versus λ for q = 1, ..., 10 (bottom to top) and α = 0.001 (left) as well as α = 0.05 (right)

Lemma 3: If the outer product uvᵀ of two vectors u, v is identical to the null-matrix, then u or v is a null-vector.

Proof by contraposition: Let ui ≠ 0 and vj ≠ 0 for some i, j. Then ui vj ≠ 0, and since ui vj is an element of the matrix uvᵀ, this matrix is not identical to the null-matrix, which completes the proof.

References

Arnold S.F., 1981, The theory of linear models and multivariate analysis, Vol. 2, Wiley, New York.

Baarda W., 1967, Statistical concepts in geodesy, Netherlands Geodetic Commission, Publication on Geodesy, 1, 4, Delft, Netherlands.

Baarda W., 1968, A testing procedure for use in geodetic networks, Netherlands Geodetic Commission, Publication on Geodesy, 2, 5, Delft, Netherlands.

Davies R.B., 1980, Algorithm AS 155: The distribution of a linear combination of χ² random variables, Journal of the Royal Statistical Society, Series C (Applied Statistics), 29, 3, 323–333.

Heck B., 1981, Der Einfluss einzelner Beobachtungen auf das Ergebnis einer Ausgleichung und die Suche nach Ausreissern in den Beobachtungen (in German), Allgemeine Vermessungsnachrichten, 88, 17–34.

Huan Liu, Yongqiang Tang and Hao Helen Zhang, 2009, A new chi-square approximation to the distribution of non-negative definite quadratic forms in non-central normal variables, Computational Statistics & Data Analysis, 53, 4, 853–856. DOI 10.1016/j.csda.2008.11.025

Imhof J.P., 1961, Computing the distribution of quadratic forms in normal variables, Biometrika, 48, 419–426.

Johnson N.L., Kotz S. and Balakrishnan N., 1995, Continuous Univariate Distributions, 2nd ed., Wiley, New York.

Kargoll B., 2012, On the Theory and Application of Model Misspecification Tests in Geodesy, German Geodetic Commission, C 674, Munich.

Koch K.R., 1999, Parameter estimation and hypothesis testing in linear models, 2nd edn., Springer, Heidelberg. DOI 10.1007/978-3-662-03976-2

Kotz S., Johnson N.L. and Boyd D.W., 1967, Series representations of distributions of quadratic forms in normal variables II, non-central case, Ann. Math. Statist., 38, 3, 838–848. DOI 10.1214/aoms/1177698878

Lehmann R., 2012, Improved critical values for extreme normalized and studentized residuals in Gauss–Markov models, J Geod., 86, 16, 1137–1146. DOI 10.1007/s00190-012-0569-0

Lehmann R., 2013a, On the formulation of the alternative hypothesis for geodetic outlier detection, J Geod., 87, 4, 373–386. DOI 10.1007/s00190-012-0607-y

Lehmann R., 2013b, The 3σ-rule for outlier detection from the viewpoint of geodetic adjustment, J Surv. Eng., 139, 4, 157–165. DOI 10.1061/(ASCE)SU.1943-5428.0000112

Lehmann R. and Lösler M., 2016, Multiple outlier detection: hypothesis tests versus model selection by information criteria, J Surv. Eng., 142, 4. DOI 10.1061/(ASCE)SU.1943-5428.0000189

Pelzer H., 1983, Detection of errors in the functional adjustment model, In: FE Ackermann (Ed): Mathematical models of geodetic/photogrammetric point determination with regard to outliers and systematic errors, German Geodetic Commission, A 98, Munich.

Pope A.J., 1976, The statistics of residuals and the detection of outliers, In: NOAA Technical Report NOS65 NGS1, US Department of Commerce, National Geodetic Survey, Rockville, Maryland.

Prószyński W., 2015, Revisiting Baarda’s concept of minimal detectable bias with regard to outlier identifiability, J Geod., 89, 10, 993–1003. DOI 10.1007/s00190-015-0828-y

Rice S.O., 1980, Distribution of Quadratic Forms in Normal Random Variables – Evaluation by Numerical Integration, SIAM J. Sci. and Stat. Comput., 1, 4, 438–448. DOI 10.1137/0901032

Sheil J. and O’Muircheartaigh I., 1977, Algorithm AS 106: The Distribution of Non-Negative Quadratic Forms in Normal Variables, Journal of the Royal Statistical Society, Series C (Applied Statistics), 26, 1, 92–98.

Teunissen P.J.G., 1990, Quality control in integrated navigation systems, IEEE Aerosp. Electron. Syst. Mag., 5, 7, 35–41.

Teunissen P.J.G., 2000, Testing theory; an introduction, 2nd edition, Series on Mathematical Geodesy and Positioning, Delft University of Technology, The Netherlands. ISBN-13 978-90-407-1975-2

Teunissen P.J.G., Simons D.G. and Tiberius C.C.J.M., 2005, Probability and observation theory, Lecture Notes AE2-E01, Faculty of Aerospace Engineering, Delft University of Technology, The Netherlands.

Teunissen P.J.G., 2006, Network quality control, 2nd edition, Series on Mathematical Geodesy and Positioning, Delft University of Technology, The Netherlands. ISBN 90-71301-98-2
