Postprint of Stud. Geophys. Geod. 59 (2015) 489-504, DOI: 10.1007/s11200-015-0725-0
Observation error model selection by
information criteria vs. normality
testing
Author
Rüdiger Lehmann
University of Applied Sciences Dresden
Faculty of Spatial Information
Friedrich-List-Platz 1
D-01069 Dresden, Germany
Tel: +49 351 462 3146
Fax: +49 351 462 2191
Email: r.lehmann@htw-dresden.de
Abstract
To extract the best possible information from geodetic and geophysical observations, it is necessary
to select a model of the observation errors, mostly the family of Gaussian normal distributions.
However, there are alternatives, typically chosen in the framework of robust M-estimation. We give
a synopsis of well-known and less well-known models for observation errors and propose to select a
model based on information criteria. In this contribution we compare the Akaike information
criterion (AIC) and the Anderson-Darling (AD) test and apply them to the test problem of fitting a
straight line. The comparison is facilitated by a Monte Carlo approach. It turns out that the model
selection by AIC has some advantages over the AD test.
Keywords
maximum likelihood estimation; robust estimation; Gaussian normal distribution; Laplace
distribution; generalized normal distribution; contaminated normal distribution; Akaike information
criterion; Anderson-Darling test; Monte Carlo method
1 Introduction
In geodesy, geophysics and many other scientific branches we are confronted with observations
affected by observation errors. Since the operation of these errors is generally very complex and not
well understood, their effect is mostly treated as random. Consequently, for more than 200 years geodesists and geophysicists have taken advantage of stochastics and have partly also contributed to this field of mathematics. See Kutterer (2001) for general remarks on the role of statistics in geodetic data analysis, also with a view to related concepts of uncertainty assessment.
To extract the best possible information from these observations by parameter estimation, e.g. by the concept of maximum likelihood (ML) estimation, it is necessary to make an assumption on the stochastic properties of the observation errors. These properties are completely derived from a probability distribution of these errors. However, in practical applications such a probability distribution is never exactly known. Fortunately, there are some methods of parameter estimation that do not need the full distribution, but only some moments like expectation and variance. But when we arrive at the basic problem of testing statistical hypotheses, we can hardly do without the assumption of a full stochastic observation error model.
The normal distribution, mostly credited to C.F. Gauss, is the best known model of geodetic and
geophysical observation errors. (As usual, when we speak about distributions, we often mean a
‘family’ of distributions, which is clear from the context.) Due to its well-known, convenient mathematical properties, first and foremost that of being a stable distribution, it greatly simplifies the
parameter estimation problem. Its choice is further motivated by both the central limit theorem as
well as the maximum entropy principle. The application of the normal error distribution in practical
geodesy and geophysics is also not without success. The common hypothesis tests like t-test, τ-test,
χ²-test and F-test are all based on this distribution, and critical values of these tests are found in
widespread statistical lookup tables or are computed by popular scientific software (e.g. Teunissen
2000).
Already in the 19th century it was realized that typical error distributions of real observations are more peak-shaped and thicker-tailed than the Gaussian bell (see Hampel 2001 for a historical synopsis of such investigations). This gave rise to the development of robust estimation methods like L1 norm minimization or, more generally, the framework of M-estimation (e.g. Huber 2009). However, until recently there was not enough computer power to actually compute robust estimates for real-life data sets. Peak-shapedness of a probability distribution is measured by the standardized fourth
moment of the distribution, known as kurtosis. Distributions with kurtosis >3 are called leptokurtic.
Kurtosis minus 3, which is the kurtosis of the normal distribution, is also called excess kurtosis. Thus,
typical error distributions of real observations seem to exhibit a positive excess kurtosis, i.e., they are
leptokurtic. Wisniewski (2014) considers M-estimations with probabilistic models of geodetic
observations including the asymmetry and the excess kurtosis, which are basic anomalies of empirical distributions of errors of geodetic, geophysical or astrometric observations.
This poses the problem of deciding whether the normal distribution is nonetheless an applicable observation error model, or whether it must be replaced by something better adapted to the observations. This problem may be formalized as a statistical hypothesis. Therefore, besides graphical methods like the famous Q-Q plot, hypothesis testing is the most popular approach. Many hypothesis tests for normality have been proposed:
- D'Agostino's K² test (D'Agostino 1970)
- Jarque-Bera test (Jarque and Bera 1980)
- Anderson-Darling test (Anderson and Darling 1952, 1954)
- Cramér-von Mises criterion (Cramér 1928; von Mises 1931)
- Lilliefors test (Lilliefors 1967)
- Kolmogorov-Smirnov test (Kolmogorov 1933; Smirnov 1948)
- Shapiro-Wilk test (Shapiro and Wilk 1965)
- Pearson's chi-squared test (Pearson 1900)
- Shapiro-Francia test (Shapiro and Francia 1972)
However, all of them only work with samples of one random variable. Some of them require a known
mean and variance. The tests differ with respect to computational simplicity and statistical power.
Some of them are powerful only in case of certain types of deviation from normality (kurtosis,
skewness, etc.), i.e. with respect to a certain alternative hypothesis. Razali and Wah (2011) found in
a Monte Carlo simulation “that Shapiro-Wilk test is the most powerful normality test, followed by
Anderson-Darling test, Lilliefors test and Kolmogorov-Smirnov test.”
There is an ongoing interest in the adaptation of distribution models to observations, e.g. in the field of
GNSS observations. Tiberius and Borre (2000) analyzed the distribution of GPS code and phase
observations evaluating sample moments and applying different statistical hypothesis tests. The
authors conclude that the normal distribution assumption seems to be reasonable for the data from
short baselines. However, deviations from normality arose for long baselines, and were attributed to
multipath effects and unmodeled differential atmospheric delays. Verhagen and Teunissen (2005)
present and evaluate the joint probability density function of the multivariate integer GPS carrier
phase ambiguity residuals. Cai et al. (2007) propose the von Mises normal distribution for GNSS
carrier phase observations. Luo et al. (2011) and Luo (2013) investigate the distribution of the same
type of observations by sample moments, various statistical hypothesis tests, and graphical tools.
The results based on a large and representative data set of GPS phase measurements showed various
deviations from normality.
In the more typical situation arising in geodesy and geophysics, when the observations are part of a Gauss-Markov model (GMM) or similar linear model, no rigorous test for normality is known. In practice, one often applies the test for normality to the residuals of the model, because they inherit their normality from the observation errors (e.g. Luo et al. 2011). But this does not say much about the normality of the observation errors themselves, as will be further explained in section 3.
Deciding which model for observation errors should be assigned to a set of observations can be viewed as a problem of model selection. From information theory we know different approaches to model selection based on information criteria. The oldest and best known is the Akaike Information Criterion (Akaike 1974):

$$\mathrm{AIC} = -2\,\ln L(\hat{\theta};y) + 2k \qquad (1)$$
where $L$ denotes the likelihood function of the model, which is maximized by the maximum likelihood (ML) estimate $\hat{\theta}$ of the $k$-vector of parameters $\theta$ with respect to the observations $y$. Note that $\theta$ should comprise all parameters, i.e. also unknown variance factors or variance components.
The criterion is: among all models under consideration, the one with the least AIC is to be selected. Such a model has a high likelihood and at the same time not too many parameters $k$, which prevents over-parametrization. If different models give AIC values very close to the minimum, it is generally recommended to avoid the selection, if possible (Burnham and Anderson 2002). Some geodetic applications of information criteria have recently been presented for the selection of transformation models by Lehmann (2014) and in the framework of geodetic multiple outlier detection by Lehmann and Lösler (2015). Another scope of application is autoregressive moving-average processes (e.g. Klees et al. 2002), especially in the framework of GNSS time series analysis (cf. Luo et al. 2011). In section 4 we will develop a strategy to apply information criteria to observation error model selection.
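As a minimal illustration of (1), the following Python sketch (assuming NumPy and SciPy; the synthetic sample and the seed are arbitrary and not those used in this study) compares the AIC of a normal and a Laplace error model fitted to a one-dimensional sample by maximum likelihood:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
e = rng.laplace(loc=0.0, scale=1.0, size=200)     # synthetic "observation errors"

def aic(logL, k):
    """AIC according to Eq. (1): -2 ln L + 2k."""
    return -2.0 * logL + 2 * k

# ML fits; each model has k = 2 parameters (location and scale)
mu_n, sig_n = stats.norm.fit(e)
mu_l, b_l = stats.laplace.fit(e)

aic_norm = aic(stats.norm.logpdf(e, mu_n, sig_n).sum(), k=2)
aic_lap  = aic(stats.laplace.logpdf(e, mu_l, b_l).sum(), k=2)

print("AIC normal :", round(aic_norm, 1))
print("AIC Laplace:", round(aic_lap, 1))          # smaller AIC -> selected model
```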
The paper is organized as follows: After introducing well-known and less well-known models of observation errors, we briefly review the Anderson-Darling (AD) test in its special form as a test for normality. As opposed to this, we propose the strategy of observation error model selection by AIC. Finally, the
Monte Carlo method is used to investigate and compare both strategies applied to the model of a
straight line fit.
2 Models for observation errors
We start with the well-known Gaussian normal distribution $N(\mu,\sigma)$ with expectation $\mu$ and standard deviation $\sigma$. Its probability density function (PDF) reads
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (2)$$
A common measure of peak-shapedness and tail thickness is the excess kurtosis
$$\gamma = \frac{\mu_4}{\sigma^4} - 3 \qquad (3)$$
where $\mu_4 = E\{(x-\mu)^4\}$ denotes the fourth central moment of the distribution. The excess kurtosis uses the normal distribution as a benchmark for peak-shapedness, such that it becomes $\gamma = 0$ for this distribution.
More typical error distributions of real observations seem to be leptokurtic, i.e. $\gamma > 0$. The simplest leptokurtic error distribution is the Laplace distribution $L(\mu,\sigma)$ with expectation $\mu$ and standard deviation $\sigma$. Its PDF reads
$$f(x) = \frac{1}{\sigma\sqrt{2}}\,\exp\!\left(-\frac{\sqrt{2}\,|x-\mu|}{\sigma}\right) \qquad (4)$$
It has excess kurtosis $\gamma = 3$, which is often overshooting the mark. It would be better to have a distribution model with a shape parameter that can be tuned to the kurtosis of the real error distribution. Such a model is the generalized normal distribution $GN(\mu,\alpha,\beta)$ with expectation $\mu$, a scale parameter $\alpha$ and a shape parameter $\beta$:
$$f(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\,\exp\!\left(-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right) \qquad (5)$$
$\Gamma$ denotes the Gamma function. This distribution includes the normal and the Laplace distribution as special cases with $\beta = 2$ and $\beta = 1$, respectively. Variance and excess kurtosis read
$$\sigma^2 = \frac{\alpha^2\,\Gamma(3/\beta)}{\Gamma(1/\beta)}\,, \qquad \gamma = \frac{\Gamma(5/\beta)\,\Gamma(1/\beta)}{\Gamma(3/\beta)^2} - 3 \qquad (6)$$
A different medium between the normal and the Laplace distribution can be derived from a common loss function in M-estimation introduced by Huber (1964). It is a composite distribution $H(\mu,\alpha,k)$, consisting of a Gaussian peak and two Laplacian tails. It has three parameters: the expectation $\mu$, a scale parameter $\alpha$ and a shape parameter $k$. The PDF reads
$$f(x) = \begin{cases} C(\alpha,k)\,\exp\!\left(-\dfrac{(x-\mu)^2}{2\alpha^2}\right) & \text{for } |x-\mu| \le k\alpha \\[1.5ex] C(\alpha,k)\,\exp\!\left(\dfrac{k^2}{2}-\dfrac{k\,|x-\mu|}{\alpha}\right) & \text{for } |x-\mu| > k\alpha \end{cases} \qquad (7)$$
where $C(\alpha,k)$ is a normalization function. The composition is such that $f$ is continuous at the connection points $x = \mu \pm k\alpha$, where also the first derivatives are continuous. Nonetheless, this composite character makes numerical computations rather costly. Variance and excess kurtosis cannot be computed without a costly numerical quadrature.
An alternative leptokurtic error model is Student's $t$-distribution. Here we introduce it in its three-parameter version $t(\mu,\alpha,\nu)$ with expectation $\mu$, a scale parameter $\alpha$ and a shape parameter $\nu$ (the degrees of freedom). The PDF reads
$$f(x) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\alpha}\left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\alpha}\right)^2\right)^{-\frac{\nu+1}{2}} \qquad (8)$$
Variance and excess kurtosis may be computed by
$$\sigma^2 = \alpha^2\,\frac{\nu}{\nu-2} \;\;(\nu > 2)\,, \qquad \gamma = \frac{6}{\nu-4} \;\;(\nu > 4) \qquad (9)$$
The Student's $t$-distribution can be used as a model of an extremely leptokurtic distribution. For $\nu \le 4$ the excess kurtosis is even no longer finite.
The scale contaminated normal distribution $CN(\mu,\sigma_1^2,\sigma_2^2,\varepsilon)$ is a further generalization of the normal distribution, first discussed by Tukey (1960). Geodetic applications for robust estimation and outlier detection are discussed by Lehmann (2012, 2013). This distribution describes a normal population contaminated by a small number of members of a different normal population with much larger variance (gross errors). It has expectation $\mu$, the variance $\sigma_1^2$ of the original distribution and $\sigma_2^2$ of the contaminating distribution, and a weight parameter $0 \le \varepsilon \le 1$, specifying the degree of contamination. The PDF reads
$$f(x) = \frac{1-\varepsilon}{\sigma_1\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma_1^2}\right) + \frac{\varepsilon}{\sigma_2\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma_2^2}\right) \qquad (10)$$
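The candidate densities above are readily evaluated with standard software. The following Python sketch (assuming SciPy; the chosen parameter values are purely illustrative) evaluates the PDFs (2), (4), (5) and (10) and the excess kurtosis of the generalized normal model according to (6):

```python
import numpy as np
from scipy import stats

x = np.linspace(-6.0, 6.0, 1201)

# Normal and Laplace, both with expectation 0 and standard deviation 1
pdf_norm = stats.norm.pdf(x, loc=0.0, scale=1.0)
pdf_lapl = stats.laplace.pdf(x, loc=0.0, scale=1.0 / np.sqrt(2.0))  # scale b = sigma/sqrt(2)

# Generalized normal, Eq. (5): SciPy's 'gennorm' uses the same (beta, alpha) parametrization
beta, alpha = 1.5, 1.0
pdf_gn = stats.gennorm.pdf(x, beta, loc=0.0, scale=alpha)
excess_kurt_gn = stats.gennorm.stats(beta, moments='k')             # excess kurtosis as in Eq. (6)

# Scale contaminated normal, Eq. (10): mixture of two zero-mean normals
eps, s1, s2 = 0.1, 1.0, 3.0
pdf_cn = (1 - eps) * stats.norm.pdf(x, scale=s1) + eps * stats.norm.pdf(x, scale=s2)

print("excess kurtosis GN(beta=1.5):", float(excess_kurt_gn))
```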
Table 1 gives a synopsis of the most important models for observation errors.
Table 1. Models for observation errors (*expressions for terms in brackets are intricate here)

                         | Huber (7)                       | Student's t (8)                                  | Scale contaminated normal (10)
relevant special cases   | normal: k → ∞; Laplace: k → 0   | normal: ν → ∞; Cauchy: ν = 1                     | normal: ε = 0 or ε = 1 (or σ₁ = σ₂)
parameters for ...       | ...                             | ...                                              | ...
parameters for ...       | not possible                    | ...                                              | ...
closed expressions*      | ...                             | ...                                              | ...
importance               | target density in M-estimation  | generalization of statistical test distribution  | instructive gross error modelling according to the variance inflation model (cf. Lehmann 2012, 2013)
3 Anderson-Darling normality test (AD test)
Anderson and Darling (1952, 1954) developed a statistical hypothesis test for testing the distribution of a stochastic sample. The test statistic basically measures the difference between the empirical distribution of the sample and the hypothesized distribution, giving more weight to the tails of the distribution than similar tests, e.g. the Cramér-von Mises criterion.
In this investigation we focus on the Anderson-Darling (AD) test because it is recommended by Razali
and Wah (2011) as a very powerful test, but is at the same time relatively easy to implement. The
test procedure is as follows:
Let $y_{(1)} \le y_{(2)} \le \dots \le y_{(n)}$ be the ordered sequence of sample values, then the test statistic is defined as
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F\!\left(y_{(i)}\right) + \ln\!\left(1 - F\!\left(y_{(n+1-i)}\right)\right)\right] \qquad (11)$$
where $F$ is the hypothesized cumulative distribution function (CDF). If the distribution of the sample differs significantly from the hypothesized distribution, then $A^2$ tends to assume large values.
The AD test is oftentimes used as a test for normality, e.g. as a pretest to check the presumption of normality before a test requiring the sample to be normally distributed, like the t-test or the F-test, is applied. In this case $F$ is the CDF of the normal distribution. Critical values for the normality test of samples are given by Stephens (1974).
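A direct implementation of (11) is straightforward. The following Python sketch (assuming NumPy and SciPy; the sample is an arbitrary illustration) computes $A^2$ for the normality case with estimated mean and standard deviation and, for comparison, calls SciPy's built-in AD normality test:

```python
import numpy as np
from scipy import stats

def anderson_darling_A2(y, cdf):
    """Test statistic A^2 of Eq. (11) for a hypothesized CDF."""
    y = np.sort(np.asarray(y))
    n = y.size
    F = cdf(y)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))

rng = np.random.default_rng(0)
y = rng.normal(size=50)

# Normality test with estimated mean and standard deviation (the pretest use case)
A2 = anderson_darling_A2(y, lambda v: stats.norm.cdf(v, loc=y.mean(), scale=y.std(ddof=1)))
print("A2 =", round(A2, 3))
print(stats.anderson(y, dist='norm'))   # SciPy's built-in AD normality test for comparison
```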
When observation errors in a GMM or similar linear model are to be tested for normality, the AD test cannot be applied directly. One could try to use the residuals and test them for normality, but they are often found to be normally distributed although the observation errors themselves are not. This can be understood as follows: Assume that the observations of a linear model are not normally distributed. The residuals are linear functions of the observations, and according to the central limit theorem they tend towards normality as the number of observations increases. A normality test applied to the residuals may therefore fail to reject normality, although the observation errors are far from being normally distributed. This results in a type II decision error.
Fortunately, a hypothesis test like the t- or F-test, where the test statistic is a function of the residuals, is often relatively unsusceptible to non-normal error distributions. This is why such tests “work” even though the observation errors are not normal.
As a correction to this error, one must compute new critical values for each linear model.
Fortunately, today enough computer power is available to accomplish this, and it will be done in
section 5.
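The following Python sketch outlines how such Monte Carlo critical values could be computed for the straight line fit treated later; the sample size, the number of trials, the abscissae and the seed are illustrative assumptions, not the settings of this study:

```python
import numpy as np
from scipy import stats

def A2(y, cdf):
    """Anderson-Darling statistic, Eq. (11)."""
    y = np.sort(y); n = y.size; F = cdf(y); i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))

def mc_critical_value(n, alpha=0.05, trials=20000, seed=42):
    """Monte Carlo critical value of A2 applied to least-squares residuals
    of a straight-line fit with normal observation errors."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)                     # fixed equidistant abscissae
    A = np.column_stack((np.ones(n), x))             # design matrix of the line fit
    stats_mc = np.empty(trials)
    for t in range(trials):
        y = rng.normal(size=n)                       # true line = 0 without loss of generality
        e_hat = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]   # least-squares residuals
        stats_mc[t] = A2(e_hat, lambda v: stats.norm.cdf(v, loc=e_hat.mean(),
                                                         scale=e_hat.std(ddof=2)))
    return np.quantile(stats_mc, 1.0 - alpha)        # (1 - alpha)-quantile = critical value

print(mc_critical_value(n=30))
```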
If the hypothesis of normality is rejected, it is not clear which of the alternative distribution models should be employed. This could perhaps be decided in a multiple test, where e.g. a test for generalized normality is invoked next. But as in any multiple test, there are pitfalls (e.g. Miller 1981). In this contribution we neither recommend nor pursue such an approach.
4 Observation error model selection by Akaike’s information
criterion (AIC)
As pointed out in the introduction, the selection of an observation error model can be viewed as a
general model selection problem, for which information theory provides so-called information
criteria. We already introduced the Akaike information criterion (AIC) in (1). A corrected version of AIC is
$$\mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1} \qquad (12)$$
which is supposed to work better for small sample sizes. If the number of observations $n$ is small or the number of parameters $k$ is large, then AICc is strongly recommended rather than AIC (Burnham and Anderson 2004). It is important that parameters in the sense of (1) and (12) also include unknown variances and variance components; $k$ counts these quantities as well.
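In code, the correction (12) is a one-line addition to (1); the following Python sketch is purely illustrative (the numeric values are not taken from this study):

```python
def aic_c(logL, k, n):
    """Corrected AIC, Eq. (12): AIC = -2 ln L + 2k plus the small-sample correction."""
    aic = -2.0 * logL + 2 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)

print(aic_c(logL=-75.3, k=3, n=20))   # illustrative values only
```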
There are many alternatives to AIC, which seem to work better in special situations. We only mention
the Bayesian Information Criterion (BIC), which uses a further modification of (1).
If we decide to select an observation error model by information criteria, we could proceed as
follows:
1. Compute the model parameters by ML estimation for all candidate observation error models, e.g. normal distribution, generalized and contaminated normal distributions, Laplace distribution, Huber's distribution, Student's t-distribution, etc.
2. For all of the results, compute the information criterion, e.g. AIC by (1) or AICc by (12).
3. Select the model, where the information criterion assumes a minimum (possibly only if it is
significantly below the second smallest value).
4. Proceed with the parameters estimated from the selected model (if any).
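The following Python sketch (assuming NumPy and SciPy) illustrates steps 1-3 for the two simplest candidate models, the normal and the Laplace error model, in a straight line fit; the simulated data and starting values are illustrative assumptions, not the settings used in this study:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(7)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.laplace(scale=0.2, size=n)      # simulated observations

def nll_normal(p):
    a, b, s = p
    return np.inf if s <= 0 else -np.sum(stats.norm.logpdf(y - a - b * x, scale=s))

def nll_laplace(p):
    a, b, s = p
    return np.inf if s <= 0 else -np.sum(stats.laplace.logpdf(y - a - b * x, scale=s))

# Step 1: ML fits of both candidate error models, started from the L2 solution
b0, a0 = np.polyfit(x, y, 1)
p0 = [a0, b0, np.std(y - a0 - b0 * x)]
fits = {name: optimize.minimize(f, p0, method="Nelder-Mead")
        for name, f in {"normal": nll_normal, "Laplace": nll_laplace}.items()}

# Steps 2 and 3: AIC by Eq. (1) with k = 3 parameters (a, b, scale); pick the minimum
aic = {name: 2.0 * res.fun + 2 * 3 for name, res in fits.items()}
print(aic, "-> selected:", min(aic, key=aic.get))
```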
Step 1 is the most time-consuming and difficult step. To begin with, computing ML estimates for the normal, Laplace and Huber's distributions is still relatively easy and well understood. In estimation theory these are known as L2 and L1 norm minimization as well as M-estimation with Huber's influence function. Computing ML estimates of the generalized normal distribution is harder. In our contribution we use a kind of brute-force method, which must be refined before problems of practical dimensions can be tackled:
1. Use the normal distribution as initial guess, i.e. take the solution of the L2 norm minimization problem computed before and let $\beta = 2$.
2. Perform a line search optimization for the shape parameter $\beta$ in (5) using proper bounds. Here we use an upper bound of $\beta = 2$, because a leptokurtic distribution is desired. The remaining parameters are held fixed.
3. Now fix $\beta$ and optimize the remaining parameters, i.e. solve the L$\beta$ norm minimization problem.
4. Return to step 2 until convergence.
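A possible realization of this scheme in Python (assuming NumPy and SciPy) is sketched below; the search interval for β, the fixed number of outer iterations and the simulated data are illustrative assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
n = 50
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.laplace(scale=0.2, size=n)

def nll(a, b, alpha, beta):
    """Negative log-likelihood of the GN(0, alpha, beta) error model, Eq. (5)."""
    return -np.sum(stats.gennorm.logpdf(y - a - b * x, beta, scale=alpha))

# Step 1: L2 solution as initial guess, beta = 2 (normal case: alpha = sigma * sqrt(2))
b_hat, a_hat = np.polyfit(x, y, 1)
alpha_hat, beta_hat = np.std(y - a_hat - b_hat * x) * np.sqrt(2.0), 2.0

for _ in range(20):                                   # alternate steps 2 and 3
    # Step 2: bounded line search over the shape parameter beta (bounds are illustrative)
    beta_hat = optimize.minimize_scalar(lambda b_: nll(a_hat, b_hat, alpha_hat, b_),
                                        bounds=(0.5, 2.0), method="bounded").x
    # Step 3: fix beta and re-optimize intercept, slope and scale (the "L_beta fit")
    res = optimize.minimize(lambda p: nll(p[0], p[1], abs(p[2]), beta_hat),
                            x0=[a_hat, b_hat, alpha_hat], method="Nelder-Mead")
    a_hat, b_hat, alpha_hat = res.x[0], res.x[1], abs(res.x[2])

aic_gn = 2.0 * res.fun + 2 * 4                        # Eq. (1): k = 4 parameters (a, b, alpha, beta)
print(round(beta_hat, 2), round(aic_gn, 1))
```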
Computing ML estimates of the scale contaminated normal distribution is the hardest piece of work:
1. Again, use the normal distribution as initial guess, i.e. start from the solution of the L2 norm minimization problem.
2. Initially guess some variance $\sigma_2^2$ of contamination.
3. Perform a line search optimization for the contamination parameter $\varepsilon$ in (10) using proper bounds. The remaining parameters are held fixed.
4. Now fix $\varepsilon$ and optimize the remaining parameters by solving a general non-linear minimization problem.
5. Return to step 3 until convergence.
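For illustration, the following Python sketch fits the scale contaminated normal error model (10) to a straight line fit by minimizing the full negative log-likelihood jointly with a Nelder-Mead search, i.e. it simplifies the alternating line-search scheme above; data, starting values and parameter bounds are assumptions:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(5)
n = 100
x = np.linspace(0.0, 1.0, n)
e = np.where(rng.random(n) < 0.1, rng.normal(scale=10.0, size=n), rng.normal(size=n))
y = 1.0 + 2.0 * x + e                                    # 10% gross errors, 10x sigma

def nll_cn(p):
    """Negative log-likelihood of the mixture PDF (10) for the line-fit residuals."""
    a, b, s1, s2, eps = p
    if not (0.0 < eps < 1.0 and 0.0 < s1 < s2):
        return np.inf
    r = y - a - b * x
    pdf = (1 - eps) * stats.norm.pdf(r, scale=s1) + eps * stats.norm.pdf(r, scale=s2)
    return -np.sum(np.log(pdf))

# Initial guess: L2 solution, a guessed contamination variance, small epsilon
b0, a0 = np.polyfit(x, y, 1)
s0 = np.std(y - a0 - b0 * x)
res = optimize.minimize(nll_cn, x0=[a0, b0, 0.5 * s0, 3.0 * s0, 0.1], method="Nelder-Mead",
                        options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
a, b, s1, s2, eps = res.x
aic_cn = 2.0 * res.fun + 2 * 5                           # Eq. (1): k = 5 parameters
print(round(eps, 2), round(aic_cn, 1))
```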
Computing ML estimates of Student’s-t distribution is not discussed here.
5 Simulated observations and candidate observation error models
To compute the success rate of observation error model selection, a Monte Carlo method must be applied. For this purpose we generate a large number of observation vectors of a selected error model by a pseudo random number (PRN) generator. It has been verified that the results presented here do not change significantly when the computations are repeated with different PRN sequences, such that the number of generated vectors is sufficiently large to support the conclusions made below. Care has been taken that the PRN generator is reseeded each time.
Four different observation error distributions are generated here:
- the standard normal distribution N(0,1)
- the standard Laplace distribution L(0,1)
- a weakly scale contaminated normal distribution (σ₁ = 1, σ₂ = 3, ε = 0.1)
- a strongly scale contaminated normal distribution (σ₁ = 1, σ₂ = 10, ε = 0.1)
For the standard normal PRN we can directly use MATLAB 8.1's PRN generator normrnd. In the standard Laplace case we use uniformly distributed PRN generated by MATLAB 8.1's PRN generator unidrnd and apply a transformation by the inverse CDF (cf. Tanizaki 2004, p. 122 ff.). In the scale contaminated cases we generate normal PRN with $\sigma_1 = 1$ and contaminate them with probability $\varepsilon = 0.1$ by a second normal PRN with either $\sigma_2 = 3$ or $\sigma_2 = 10$. This simulates a normal error model with a gross error rate of 10% and gross errors with an either 3 times or 10 times larger standard deviation.
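The following Python sketch mimics this generation scheme (it is an analogue, not the MATLAB code used in the study): standard normal PRN, Laplace PRN obtained from uniform PRN by the inverse CDF, and contaminated normal PRN obtained by mixing two normal populations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

e_norm = rng.normal(size=n)                       # standard normal errors

u = rng.uniform(size=n)                           # inverse-CDF transform to a Laplace sample
b = 1.0 / np.sqrt(2.0)                            # scale b chosen so that the std dev is 1
e_lapl = np.where(u < 0.5, b * np.log(2 * u), -b * np.log(2 * (1 - u)))

def contaminated(sigma2, eps=0.1, size=n):
    """Normal errors, contaminated with probability eps by N(0, sigma2^2)."""
    gross = rng.random(size) < eps
    return np.where(gross, rng.normal(scale=sigma2, size=size), rng.normal(size=size))

e_weak, e_strong = contaminated(3.0), contaminated(10.0)
print([round(np.std(e), 2) for e in (e_norm, e_lapl, e_weak, e_strong)])  # empirical std devs
```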
As a functional model we choose the straight line fit with a smaller and a larger number $n$ of data points at fixed equidistant abscissae:
$$y_i = a + b\,x_i + e_i\,, \quad i = 1,\dots,n \qquad (13)$$
This model is of general relevance in various fields of geodesy, geophysics and related sciences as well as engineering disciplines. Examples are
- extracting a linear trend from a geodetic or geophysical time series,
- fitting a linear calibration function for the calibration of measuring devices,
- surveying points on a spatial straight line, which deviate from a straight line due to observation errors.
As candidate observation error models we choose
- the normal distribution $N(0,\sigma)$ with unknown scale parameter $\sigma$,
- the Laplace distribution $L(0,\sigma)$ with unknown scale parameter $\sigma$,
- the generalized normal distribution $GN(0,\alpha,\beta)$ with unknown scale and shape parameters $\alpha, \beta$,
- the scale contaminated normal distribution $CN(0,\sigma_1^2,\sigma_2^2,\varepsilon)$ with two unknown scale parameters $\sigma_1, \sigma_2$ and an unknown contamination parameter $\varepsilon$.
The first two models have in total three parameters $(a, b, \sigma)$, the third has four parameters $(a, b, \alpha, \beta)$ and the last has five parameters $(a, b, \sigma_1, \sigma_2, \varepsilon)$. The computations below employ both an Anderson-Darling normality test for the residuals of the straight line fit as well as the model selection by AIC (1). Note that none of the presented results depend in any way on the actual true parameters $a$ and $b$.
6 Results
First, we need to compute critical values for the Anderson-Darling normality test applied to the residuals of the straight line fit. For this purpose a least squares fit is computed for each normal observation error PRN vector and the corresponding value of the test statistic $A^2$ in (11) is derived. From the resulting frequency distribution of $A^2$ displayed in Fig. 1 we extract the quantiles as a good approximation to the critical values for various type I error rates α. They are given in Table 2.
The critical values are significantly smaller than those given by Stephens (1974) for directly observed samples. For example, the critical value given there for a type I error rate of α = 0.01 is 1.09, while we found 1.03. This confirms the assertion of section 3 that the normality test is more often passed by the residuals than by the corresponding observation errors. This effect is now taken into account by the new critical values.
Fig. 1. Histograms of the Anderson-Darling normality test statistic $A^2$ in (11) applied to the residuals of the straight line fit with normal observation errors; left: the smaller number of observations, right: the larger number of observations.
Table 2. Critical values and statistical powers of the Anderson-Darling normality test for the two numbers of observations n (smaller and larger sample). L: Laplace distribution, CN: scale contaminated normal distribution.

                                          | smaller n            | larger n
Type I error rate α                       | 0.10   0.05   0.01   | 0.10   0.05   0.01
Critical values                           | 0.63   0.74   1.03   | 0.62   0.73   1.03
Statistical power for L(0,1)              | 0.40   0.31   0.14   | 0.87   0.81   0.63
Statistical power for weak CN (σ₂ = 3)    | 0.36   0.29   0.16   | 0.74   0.67   0.51
Statistical power for strong CN (σ₂ = 10) | 0.86   0.85   0.81   | 0.99   0.99   0.99
Moreover, in contrast to the Anderson-Darling normality test for directly observed samples, these critical values slightly depend on $n$. They are smaller for larger $n$ because, as the model size increases, the residuals tend towards normality.
These critical values are now used to test the hypothesis of normality for the non-normal observation error PRN vectors. The results are displayed in Table 2 in terms of statistical power, which is the rate of rejection of the (now known to be) false null hypothesis $H_0$. First of all, we observe that the powers are larger for the larger number of observations than for the smaller, which is a plausible result: statistical inference is easier with more observations. For the AD test it is more difficult to reject the false $H_0$ in the case of the weakly contaminated distribution than in the case of the strongly contaminated one, because in the latter case the gross errors have larger magnitude. In other words, the strongly contaminated distribution is statistically more discriminable from the normal distribution than the weakly contaminated one
. For the Laplace distribution the powers mostly lie in between those of the two contaminated cases.
Table 3. Rates of model selection (the rate of selecting the generating model is the success rate). PRN: pseudo random number generator, N: normal distribution, L: Laplace distribution, GN: generalized normal distribution, CN: scale contaminated normal distribution.

Rates of selected models for the smaller number of observations
PRN                  | N     | L     | GN    | CN
N(0,1)               | 0.83  | 0.17  | 0.00  | 0.00
L(0,1)               | 0.31  | 0.64  | 0.04  | 0.01
weak CN (σ₂ = 3)     | 0.42  | 0.44  | 0.03  | 0.11
strong CN (σ₂ = 10)  | 0.10  | 0.19  | 0.09  | 0.62

Rates of selected models for the larger number of observations
PRN                  | N     | L     | GN    | CN
N(0,1)               | 0.96  | 0.04  | 0.00  | 0.00
L(0,1)               | 0.07  | 0.88  | 0.03  | 0.02
weak CN (σ₂ = 3)     | 0.20  | 0.53  | 0.01  | 0.26
strong CN (σ₂ = 10)  | 0.00  | 0.00  | 0.01  | 0.99
Fig. 2. Histograms of maximum likelihood estimates of the contamination parameter ε, see (10). Left: weak contamination (σ₂ = 3), right: strong contamination (σ₂ = 10).
Second, the observation error model selection by AIC is tried. For each observation vector we compute the ML solutions and therefrom the AIC by (1). The model with the minimum AIC is selected. The rates of selected models are given in Table 3. First of all, we observe that the rates of selecting the correct model are larger for the larger number of observations than for the smaller, which is again a plausible result.
Model selection is widely successful, except for weakly scale contaminated observation errors. Here the Laplacian observation error model is most often selected. A reason for this behavior can be concluded from Fig. 2. It is shown there that in the case of weak scale contamination the ML estimate of the contamination parameter ε is poor: these estimates scatter in the interval 0.0…0.5. Remember that the true value of ε is always 0.1. A similar drawing for the smaller number of observations would show an even larger scattering. Summarizing, the parameters of the scale contaminated normal distribution are poorly recovered from the observations.
Next, it is interesting to compare the results of the AD test with the model selection by AIC. This is easiest for the larger number of observations, where the selection rate of the normal model for normal observation errors is 0.96; the complementary rate of 0.04 nearly matches the type I error rate of 0.05 of the AD test. The statistical power of 0.81 for the Laplace distribution is exceeded by the corresponding success rate of model selection of 0.88. Moreover, not only is $H_0$ rejected, but also the correct alternative model is selected. For the strongly contaminated distribution the power and the success rate are both very high, such that here no advantage can be concluded for either method. For the weakly contaminated distribution the AIC selects a normal distribution only with a rate of 0.20, while for the AD test the rate of not rejecting normality is 0.33 (α = 0.05). This clearly is an advantage of AIC. However, the selection of the proper alternative model is less successful, for the reasons explained above. This corresponds to what in statistics is called a type III error.
Finally, we must investigate what the effect of model selection is on the estimation of the intercept parameter $a$ and the slope parameter $b$. We expect that the estimated parameters are closer to their true values when the model is properly selected. For this investigation we restrict ourselves to one of the two numbers of observations.
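For definiteness, the RMS of, e.g., the intercept estimates over the $M$ Monte Carlo samples is understood here in the usual sense
$$\mathrm{RMS}(\hat a) = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(\hat a_j - a\right)^2}$$
where $\hat a_j$ denotes the estimate from the $j$-th simulated observation vector and $a$ the true value; the same applies to the slope $b$.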
We compute the RMS of the estimation errors of the intercept and slope parameters
1. under the assumption that the correct model has always been chosen (which of course would practically be impossible),
2. after choosing the model by the AD test with α = 0.10, 0.05 and 0.01, in such a way that the Laplace distribution is used whenever normality is rejected, and
3. after selecting the model by AIC.
The RMS values are given in this order in columns 2-6 of Table 4. We see that in the case of normally distributed observation errors both the AD test and the model selection by AIC give satisfactory results. The results improve when α is chosen smaller, because then the normal model is selected more often.
In the case of Laplacian observation errors the AIC gives the best results. They are even slightly better
than using the L1 norm throughout, which might be surprising. The reason is that even though we
generated Laplacian observation errors, on some occasions the L2 norm could produce a better fit
and hence give better estimates of the parameters. The AIC would then select the better fitting
normal model.
In the case of scale contaminated observation errors the AD test gives poor results because we
selected an improper alternative model. This is particularly true when the contamination is strong.
Here AIC performs better. For strong contamination we always select the true model such that the
values in the second and sixth column coincide. It might be surprising that the estimation even gives
better results when the contamination is strong. The reason is that the estimation of the
contamination parameter is easier in this case, see again Fig. 2.
It may also be surprising that fitting Laplacian observation errors is more successful by the L1 norm than by the L2 norm, when measured by RMS. Remember that L2 norm minimization as a “best linear unbiased estimation” (BLUE) is expected to give the least RMS values for the estimated parameters, independent of the error distribution. However, the emphasis is on “linear”. A non-linear estimation like L1 norm minimization can perform better, even when measured by RMS. And here it does.
Table 4. Root mean square (RMS) values of the estimation errors of the intercept parameter $a$ and the slope parameter $b$ in (13). PRN: pseudo random number generator, AD: Anderson-Darling, α: type I error rate, N: normal distribution, L: Laplace distribution, CN: scale contaminated normal distribution.

RMS of the intercept $a$
PRN                  | True model always used | AD test α = 0.10 | AD test α = 0.05 | AD test α = 0.01 | Model selection by AIC
N(0,1)               | 0.202                  | 0.202            | 0.202            | 0.202            | 0.202
L(0,1)               | 0.194                  | 0.194            | 0.194            | 0.195            | 0.190
weak CN (σ₂ = 3)     | 0.230                  | 0.270            | 0.270            | 0.270            | 0.251
strong CN (σ₂ = 10)  | 0.224                  | 0.529            | 0.529            | 0.529            | 0.224

RMS of the slope $b$
PRN                  | True model always used | AD test α = 0.10 | AD test α = 0.05 | AD test α = 0.01 | Model selection by AIC
N(0,1)               | 0.00346                | 0.00353          | 0.00351          | 0.00349          | 0.00349
L(0,1)               | 0.00320                | 0.00342          | 0.00325          | 0.00331          | 0.00316
weak CN (σ₂ = 3)     | 0.00399                | 0.00465          | 0.00464          | 0.00465          | 0.00435
strong CN (σ₂ = 10)  | 0.00384                | 0.00821          | 0.00821          | 0.00821          | 0.00384
7 Conclusions
It has been shown that a proper observation error model can be selected not only by a statistical
hypothesis test, but also by an information criterion like AIC.
The advantages of model selection by information criteria over hypothesis tests are:
1. It is not necessary to choose a significance level, i.e. a type I decision error rate α.
2. It is not necessary to compute any critical values.
3. In the case that the normal error model is not appropriate, the model selection by
information criteria also yields the proper non-normal model like generalized or
contaminated distributions. It is not necessary to invoke a multiple hypothesis test.
But there are also disadvantages of model selection by information criteria:
1. It does not support a statement like: “If the observation errors are truly normally distributed,
this error model is chosen with probability 1-α.”
2. The computational complexity is rather high. Not only the least squares (L2) fitting must be
computed, but also other ML solutions like the L1 fitting (Laplace), the generalized normal
and / or contaminated model fittings. The latter can be computationally demanding.
The first disadvantage should not be taken too seriously. We believe that such a statement is often
dispensable because in any case the opposite statement regarding a type II error rate is not
obtained. Also the second disadvantage is no longer an obstacle because today computing power is
not the bottleneck. In the future, the numerical procedures for generalized normal and contaminated
model fittings should be refined.
Therefore, we encourage geodesists, geophysicists and all other scientists and applied engineers to
select error models by AIC. In the future we should investigate similar information criteria for
observation error model selection.
References
Anderson TW, Darling DA (1952) Asymptotic theory of certain "goodness-of-fit" criteria based on stochastic processes. Ann. Math. Stat. 23: 193-212. doi:10.1214/aoms/1177729437
Anderson TW, Darling DA (1954) A Test of Goodness-of-Fit. Journal of the American Statistical Association 49: 765-769. doi:10.2307/2281537
Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19: 716-723
Burnham KP, Anderson DR (2002) Model Selection and Multimodel Inference: A Practical
Information-theoretic Approach. Springer, Berlin. doi:10.1007/b97636
Cai J, Grafarend E, Hu C (2007) The statistical property of the GNSS carrier phase observations and its effects on the hypothesis testing of the related estimators. In: Proceedings of ION GNSS 2007, Fort Worth, TX, USA, Sept 25-28, 2007, pp 331-338
Cramér H (1928) On the composition of elementary errors. Skandinavisk Aktuarietidskrift, 11, 13-74,
141-180
D’Agostino RB (1970) Transformation to normality of the null distribution of g1. Biometrika, 57, 679-
681, DOI: 10.1093/biomet/57.3.679
Hampel FR (2001) Robust statistics: A brief introduction and overview. In: Carosio A, Kutterer H (Eds)
Proc. First Internat. Symposium on Robust Statistics and Fuzzy Techniques in Geodesy and GIS,
Zurich March 2001
Huber PJ (1964) Robust Estimation of a Location Parameter. Ann. Math. Stat. 35: 73-101
Huber PJ (2009) Robust Statistics (2nd ed.) John Wiley & Sons Inc, New York. ISBN 978-0-470-12990-6
Jarque CM, Bera AK (1980) Efficient tests for normality, homoscedasticity and serial independence of
regression residuals. Econ. Lett., 6, 255-259, doi:10.1016/0165-1765(80)90024-5.
Klees R, Ditmar P, Broersen P (2002) How to handle colored observation noise in large least-squares
problems. J. Geodesy 76:629-640. doi:10.1007/s00190-010-0392-4
Kolmogorov A (1933) Sulla determinazione empirica di una legge di distribuzione. Giornale
dell’Istituto Italiano degli Attuari, 4, 83-91 (in Italian)
Kutterer H (2001) Uncertainty assessment in geodetic data analysis. In: Carosio A, Kutterer H (Eds)
Proc. First Internat. Symposium on Robust Statistics and Fuzzy Techniques in Geodesy and GIS,
Zurich March 2001
Lehmann R (2012) Geodätische Fehlerrechnung mit der skalenkontaminierten Normalverteilung.
Allgemeine Vermessungs-Nachrichten 5/2012. VDE-Verlag Offenbach (in German)
Lehmann R (2013) On the formulation of the alternative hypothesis for geodetic outlier detection. J.
Geodesy 87(4): 373-386
Lehmann R (2014) Transformation model selection by multiple hypothesis testing. J. Geodesy 88(12): 1117-1130. doi:10.1007/s00190-014-0747-3
Lehmann R, Lösler M (2015) Multiple outlier detection - hypothesis tests versus model selection by
information criteria. J. Surv. Eng. (just released) doi:10.1061/(ASCE)SU.1943-5428.0000189
Lilliefors HW (1967) On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J.
Am. Stat. Assoc., 62, 399-402
Luo X (2013) GPS Stochastic Modelling - Signal Quality Measures and ARMA Processes. Springer, Berlin Heidelberg. doi:10.1007/978-3-642-34836-5
Luo X, Mayer M, Heck B (2011) On the probability distribution of GNSS carrier phase observations. GPS Solutions 15(4): 369-379. doi:10.1007/s10291-010-0196-2
Miller RG (1981) Simultaneous statistical inference. Springer New York. ISBN:0-387-90548-0
Pearson K (1900) On the criterion that a given system of deviations from the probable in the case of a
correlated system of variables is such that it can be reasonably supposed to have arisen from
random sampling. Phil. Mag. Ser. 5, 50(302): 157-175, doi:10.1080/14786440009463897
Razali NM, Wah YB (2011) Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests. J. Stat. Model. Anal. 2(1): 21-33
Shapiro SS, Francia RS (1972) An approximate analysis of variance test for normality. J. Am. Stat.
Assoc., 67, 215-216, doi:10.1080/01621459.1972.10481232
Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika,
52, 591-611
Smirnov N (1948) Table for estimating the goodness of fit of empirical distributions. Ann. Math. Stat.,
19, 279-281, doi:10.1214/aoms/1177730256
Stephens MA (1974) EDF Statistics for Goodness of Fit and Some Comparisons. J. Amer. Stat. Assoc. 69: 730-737. doi:10.2307/2286009
Tanizaki H (2004) Computational methods in statistics and econometrics. Marcel Dekker, New York.
ISBN-13: 9780824748043
Teunissen PJG (2000) Testing theory; an introduction. Series on mathematical geodesy and
positioning, 2nd Ed., Delft Univ. of Technology, Delft, Netherlands
Tiberius CCJM, Borre K (2000) Are GPS data normally distributed. In: KP Schwarz (Ed.) Geodesy
Beyond 2000. International Association of Geodesy Symposia Volume 121, 2000, pp 243-248
Tukey JW (1960) A survey of sampling from contaminated distributions. In: Olkin I. (Ed.):
Contributions to Probability and Statistics. University Press Stanford California
Verhagen S, Teunissen PJG (2005) On the probability density function of the GNSS ambiguity
residuals. GPS Solutions 10(1): 21-28. doi:10.1007/s10291-005-0148-4
von Mises R (1931) Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und
Theoretischen Physik. F. Deutike, Leipzig, Germany (in German)
Wisniewski Z (2014) M-estimation with probabilistic models of geodetic observations. J. Geodesy
88: 941-957. doi:10.1007/s00190-014-0735-7
... The normal distribution is one of the most important distribution in theory of probability, statistics, estimation and hence the adjustment. This is due to the fact that it has well-known mathematic properties and many stochastic phenomena or processes can be described by applying the normal distribution (e.g., Wiśniewski, 2014;Lehmann, 2015). In the context of analysis, processing or adjustment of geodetic observations, the importance of such a distribution follows Hagen's hypothesis of elementary errors which leads to the distribution in question. ...
... In the context of analysis, processing or adjustment of geodetic observations, the importance of such a distribution follows Hagen's hypothesis of elementary errors which leads to the distribution in question. It is usually assumed that geodetic observation errors are normally distributed; however, some analyses point at slight leptokurtosis of some types of the observations (see, for example, Wiśniewski, 2014;Lehmann, 2015;Duchnowski and Wyszkowska, 2017). Such an assumption is also important in the adjustment processes when we need to estimate the distribution of the adjusted parameters, or more generally, distributions of the estimators. ...
... Robust estimation methods are supposed to cope with outliers by design, unless the number of outliers does exceed the method breakdown point (e.g., Xu, 2005). An alternative for robust estimation is data snooping in di erent variants which are dedicated to particular problems of geodetic data analysis (e.g., Lehmann, 2012Lehmann, , 2015Prószyński, 2015;Rofatto et al., 2018;Zaminpardaz and Teunissen, 2019). Data snooping involves statistical tests (global and local) to identifying outliers. ...
Article
Full-text available
The normal distribution is one of the most important distribution in statistics. In the context of geodetic observation analyses, such importance follows Hagen’s hypothesis of elementary errors; however, some papers point to some leptokurtic tendencies in geodetic observation sets. In the case of linear estimators, the normality is guaranteed by normality of the independent observations. The situation is more complex if estimates and/or the functional model are not linear. Then the normality of such estimates can be tested theoretically or empirically by applying one of goodness-of-fit tests. This paper focuses on testing normality of selected variants of the Hodges-Lehmann estimators (HLE). Under some general assumptions the simplest HLEs have asymptotical normality. However, this does not apply to the Hodges-Lehmann weighted estimators (HLWE), which are more applicable in deformation analysis. Thus, the paper presents tests for normality of HLEs and HLWEs. The analyses, which are based on Monte Carlo method and the Jarque–Bera test, prove normality of HLEs. HLWEs do not follow the normal distribution when the functional model is not linear, and the accuracy of observation is relatively low. However, this fact seems not important from the practical point of view.
... L 1 -norm is currently underdeveloped due to the relative complexity of its implementation compared to the LS (Amiri-Simkooei, 2018). However, computational advanced techniques can be used efficiently at present (Lehmann, 2015;Rofatto et al., 2020a). In this context, L 1 -norm has recently been applied, for example, to the deformation analysis of geodetic networks (Nowel, 2016;Amiri-Simkooei et al. 2017). ...
... The classes of 'controlled and non-controlled observations against outliers' (Hekimoglu et al., 2011) for IDS and L 1 -norm can also be addressed. In addition, other observational errors model rather than the normal distribution can be investigated in levelling networks by MC experiments, following the approach described by Lehmann (2015) for linear regressions. Furthermore, if a robust estimator like L 1 -norm is applied for outlier identification, then reliability measures should be derived as pointed out, for example, in Guo et al. (2011). ...
Article
The goal of this paper is to evaluate the outlier identification performance of iterative Data Snooping (IDS) and L1-norm in levelling networks by considering the redundancy of the network, number and size of the outliers. For this purpose, several Monte-Carlo experiments were conducted into three different levelling networks configurations. In addition, a new way to compare the results of IDS based on Least Squares (LS) residuals and robust estimators such as the L1-norm has also been developed and presented. From the perspective of analysis only according to the success rate, it is shown that L1-norm performs better than IDS for the case of networks with low redundancy (r < 0.5), especially for cases where more than one outlier is present in the dataset. In the relationship between false positive rate and outlier identification success rate, however, IDS performs better than L1-norm, independently of the levelling network configuration, number and size of outliers.
... The method described by Case 3 can be extended to multiple devices, similar to the extension from two-sample comparisons to multiple comparisons. Then, test errors need to be adjusted to accommodate the inflated test errors induced by multiple comparisons [53]. ...
Article
Full-text available
In the realm of quality assurance, the significance of statistical measurement studies cannot be overstated, particularly when it comes to quantifying the diverse sources of variation in measurement processes. However, the complexity intensifies when addressing 3D topography data. This research introduces an intuitive similarity-based framework tailored for conducting measurement studies on 3D topography data, aiming to precisely quantify distinct sources of variation through the astute application of similarity evaluation techniques. In the proposed framework, we investigate the mean and variance of the similarity between 3D surface topography measurements to reveal the uniformity of the surface topography measurements and statistical reproducibility of the similarity evaluation procedure, respectively. The efficacy of our framework is vividly demonstrated through its application to measurements derived from additive-fabricated specimens. We considered four metal specimens with 20 segmented windows in total. The topography measurements were obtained by three operators using two scanning systems. We find that the repeatability variation of the topography measurements and the reproducibility variation in the measurements induced by operators are relatively smaller compared with the variation in the measurements induced by optical scanners. We also notice that the variation in the surface geometry of different surfaces is much larger in magnitude compared with the repeatability variation in the topography measurements. Our findings are consistent with the physical intuition and previous research. The ensuing experimental studies yield compelling evidence, affirming that our devised methods are adept at providing profound insights into the multifaceted sources of variation inherent in processes utilizing 3D surface topography data. This innovative framework not only showcases its applicability but also underlines its potential to significantly contribute to the field of quality assurance. By offering a systematic approach to measuring and comprehending variation in 3D topography data, it stands poised to become an indispensable tool in diverse quality assurance contexts.
... The AD test results for the UCS in all rock units are in good agreement with the probability plots (Table 7), which indicates that the adjusted boxplot dataset had better fit than the 2SD method, whereas the AIC test results for the same datasets cannot confirm this finding. Lehmann (2015) conducted a comparative statistical analysis on the geodesy and geophysics data through the use of Monte Carlo simulations, and it was proven that the AIC test obtained more accurate results than the AD test. Therefore, the AIC test results was preferred than the AD test in this study. ...
Article
Geomechanical parameters of intact metamorphic rocks determined from laboratory testing remain highly uncertain because of the great intrinsic variability associated with the degrees of metamorphism. The aim of this paper is to develop a proper methodology to analyze the uncertainties of geomechanical characteristics by focusing on three domains, i.e. data treatment process, schistosity angle, and mineralogy. First, the variabilities of the geomechanical laboratory data of Westwood Mine (Quebec, Canada) were examined statistically by applying different data treatment techniques, through which the most suitable outlier methods were selected for each parameter using multiple decision-making criteria and engineering judgment. Results indicated that some methods exhibited better performance in identifying the possible outliers, although several others were unsuccessful because of their limitation in large sample size. The well-known boxplot method might not be the best outlier method for most geomechanical parameters because its calculated confidence range was not acceptable according to engineering judgment. However, several approaches, including adjusted boxplot, 2MADe, and 2SD, worked very well in the detection of true outliers. Also, the statistical tests indicate that the best-fitting probability distribution function for geomechanical intact parameters might not be the normal distribution, unlike what is assumed in most geomechanical studies. Moreover, the negative effects of schistosity angle on the uniaxial compressive strength (UCS) variabilities were reduced by excluding the samples within a specific angle range where the UCS data present the highest variation. Finally, a petrographic analysis was conducted to assess the associated uncertainties such that a logical link was found between the dispersion and the variabilities of hard and soft minerals.
... The w-test statistic with the largest absolute value and whose absolute value is also greater than the critical value indicates the observation that is most likely to contain an outlier (Teunissen 2000;Teunissen et al. 2017). Other approaches for outlier detection and identification include the reapplication of the global model test (Wang and Knight 2012), the generalized likelihood ratio test (Knight et al. 2010; Teunissen 2018; Zaminpardaz et al. 2019), the solution separation test (El-Mowafy et al. 2019;Bang et al. 2020), the differencing outlier statistics approach (Wang and Knight 2012;Prószyński 2015;Yang et al. 2017), the Bayesian approach (Pervan et al. 1998;Gui et al. 2007Gui et al. , 2011Zhang and Gui 2015), the approach of model selection based on information criteria (Lehmann 2015;Lehmann and Lösler 2016), and robust estimation approach (Yang and Xu 2016). ...
Article
Full-text available
Efficiency evaluations of statistical decision probabilities with multiple alternative hypotheses are a prerequisite for data quality control in positioning, navigation, and many other applications. Commonly, one uses a time-consuming simulation technique to obtain the statistical decision probabilities or builds lower and/or upper bounds to control the probability, which may be unconvincing when the bounds are loose. We aim to provide a computationally efficient way to calculate the multivariate statistical decision probabilities when performing data snooping in quality control. However, accurate evaluation of those probabilities is complicated considering the complexity of the critical region where the integration intervals contain a variable corresponding to the one with the largest absolute value. Hence, to improve the calculation of statistical decision probabilities, a simplified algorithm for computing the probabilities under the critical region is proposed based on a series of transformation strategies. We implement the proposed algorithm in a simulated numerical experiment and a GPS single-point positioning experiment. The results show that the probabilities computed with the proposed algorithm approximate the results of the simulation technique, but the proposed algorithm is computationally more efficient.
... This choice is further justified by both the central limit theorem and the maximum entropy principle. Some alternative observation error models can be found in Lehmann (2015) and Lichti et al. (2021). However, the null hypothesis may not be fulfilled if the dataset are contaminated by outliers. ...
Article
Full-text available
Data Snooping is the most best-established method for identifying outliers in geodetic data analysis. It has been demonstrated in the literature that to effectively user-control the type I error rate, critical values must be computed numerically by means of Monte Carlo. Here, on the other hand, we provide a model based on an artificial neural network. The results prove that the proposed model can be used to compute the critical values and, therefore, it is no longer necessary to run the Monte Carlo-based critical value every time the quality control is performed by means of data snooping.
... Tests such as the Cramér-von Mises test or the D'Agostino-Pearson test are used for statistical studies with large samples [13][14][15][16]. With this multitude of statistical tests, it was decided to choose the three most frequently used tests for large samples: Anderson-Darling, chi-square and Kolmogorov-Smirnov [17][18][19]. ...
Article
Full-text available
Positioning systems are used to determine position coordinates in navigation (air, land and marine). The accuracy of an object’s position is described by the position error and a statistical analysis can determine its measures, which usually include: Root Mean Square (RMS), twice the Distance Root Mean Square (2DRMS), Circular Error Probable (CEP) and Spherical Probable Error (SEP). It is commonly assumed in navigation that position errors are random and that their distribution are consistent with the normal distribution. This assumption is based on the popularity of the Gauss distribution in science, the simplicity of calculating RMS values for 68% and 95% probabilities, as well as the intuitive perception of randomness in the statistics which this distribution reflects. It should be noted, however, that the necessary conditions for a random variable to be normally distributed include the independence of measurements and identical conditions of their realisation, which is not the case in the iterative method of determining successive positions, the filtration of coordinates or the dependence of the position error on meteorological conditions. In the preface to this publication, examples are provided which indicate that position errors in some navigation systems may not be consistent with the normal distribution. The subsequent section describes basic statistical tests for assessing the fit between the empirical and theoretical distributions (Anderson-Darling, chi-square and Kolmogorov-Smirnov). Next, statistical tests of the position error distributions of very long Differential Global Positioning System (DGPS) and European Geostationary Navigation Overlay Service (EGNOS) campaigns from different years (2006 and 2014) were performed with the number of measurements per session being 900’000 fixes. In addition, the paper discusses selected statistical distributions that fit the empirical measurement results better than the normal distribution. Research has shown that normal distribution is not the optimal statistical distribution to describe position errors of navigation systems. The distributions that describe navigation positioning system errors more accurately include: beta, gamma, logistic and lognormal distributions.
... The stochastical properties of the measurement errors are directly associated with the assumption of the probability distribution of these errors. In geodesy and many other scientific branches the well-known normal distribution is one of the most used as measurement error model [80]. Because of this, the model ceases to be purely mathematical and becomes a statistical model with functional and stochastic part. ...
Preprint
Full-text available
The reliability analysis allows to estimate the system's probability of detecting and identifying outlier. Failure to identify an outlier can jeopardise the reliability level of a system. Due to its importance, outliers must be appropriately treated to ensure the normal operation of a system. The system models are usually developed from certain constraints. Constraints play a central role in model precision and validity. In this work, we present a detailed optical investigation of the effects of the hard and soft constraints on the reliability of a measurement system model. Hard constraints represent a case in which there exist known functional relations between the unknown model parameters, whereas the soft constraints are employed for the case where such functional relations can slightly be violated depending on their uncertainty. The results highlighted that the success rate of identifying an outlier for the case of hard constraints is larger than soft constraints. This suggested that hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. After identifying and removing possible outliers, one should set up the soft constraints to propagate the uncertainties of the constraints during the data processing. This recommendation is valid for outlier detection and identification purpose.
... The stochastical properties of the measurement errors are directly associated with the assumed probability distribution of these errors. In geodesy and many other scientific branches the well-known normal distribution is one of the most widely used measurement error models [76]. Because of this, the model ceases to be purely mathematical and becomes a statistical model with a functional and a stochastic part. ...
Preprint
Full-text available
In this paper we evaluate the effects of hard and soft constraints on Iterative Data Snooping (IDS), an iterative outlier elimination procedure. The measurements of a geodetic levelling network were classified into clusters according to their local redundancy and the maximum absolute correlation between the outlier test statistics. We highlight that the larger the relaxation of the constraints, the higher the sensitivity indicators MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias), for both the clustering of measurements and the clustering of constraints. There are circumstances in which increasing the family-wise error rate (FWE) of the test statistics increases the performance of the IDS. Under a scenario of soft constraints, one should set up at least three soft constraints in order to identify an outlier in the constraints. In general, hard constraints should be used in the pre-processing stage for the purpose of identifying and removing possible outlying measurements. In that process, one should opt to set up redundant hard constraints. After identifying and removing possible outliers, the soft constraints should be employed to propagate their uncertainties to the model parameters during least-squares estimation.
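A bare-bones version of the IDS loop referred to here might look like the following Python sketch. It assumes uncorrelated, equally weighted observations with known sigma, and the critical value k is an external input (e.g. from a Monte Carlo run), so it is an illustration rather than the authors' implementation.

# Minimal sketch of Iterative Data Snooping (IDS) under simplifying assumptions
import numpy as np

def iterative_data_snooping(A, y, sigma, k):
    """Iteratively remove the observation with the largest normalized residual."""
    idx = np.arange(len(y))
    outliers = []
    while len(y) > A.shape[1]:                          # stop when redundancy is exhausted
        x, *_ = np.linalg.lstsq(A, y, rcond=None)
        Qv = np.eye(len(y)) - A @ np.linalg.inv(A.T @ A) @ A.T   # residual cofactor matrix
        v = y - A @ x
        d = np.sqrt(np.clip(np.diag(Qv), 1e-12, None))  # guard against zero redundancy numbers
        w = np.abs(v) / (sigma * d)                     # normalized residuals
        i = int(np.argmax(w))
        if w[i] <= k:                                   # no identifiable outlier left
            break
        outliers.append(int(idx[i]))                    # adaptation: exclude and re-estimate
        A, y, idx = np.delete(A, i, axis=0), np.delete(y, i), np.delete(idx, i)
    return outliers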
Thesis
Full-text available
For more than half a century, the reliability theory introduced by Baarda (1968) has been used as standard practice for quality control in geodesy and surveying. Although the theory meets mathematical rigour and probability assumptions, it was originally developed for a data snooping that assumes one specific observation as the suspected outlier. In other words, only one single alternative hypothesis is in play. In reality, we do not know which observation is an outlier. Since data snooping consists of screening each individual measurement for an outlier, a more appropriate alternative hypothesis would be: "There is at least one outlier in the observations". We are then interested to answer: "Where?". The answer to this question lies in the problem of locating, among the alternative hypotheses, the one that led to the rejection of the null hypothesis. Therefore, we are interested in identifying the outlier. Although advances have occurred over that period, the theories presented so far consider only one single round of the data snooping procedure, without any subsequent diagnosis, such as removing the outlier. In practice, however, data snooping is applied iteratively: after identification and elimination of the outlier, the model is reprocessed, and outlier identification is restarted. This procedure of iterative outlier elimination is known as Iterative Data Snooping (IDS). Computing the probability levels associated with IDS is virtually impossible with the analytical methods usually employed in conventional tests, such as the overall model test and data snooping with only one single alternative hypothesis. Because of this, a rigorous and complete reliability theory has not yet been available. Although major advances occurred in the mid-1970s, such as microprocessor-based computers, Baarda had a disadvantage: the technology of his time was insufficient to use intelligent computational techniques. Today, the computational scenario is completely different from the time of Baarda's reliability theory. Here, following the current trend of modern science, we can use intelligent computing and extend the reliability theory to the case where IDS is in play. We show that the estimation depends on the test and on the adaptation and, therefore, the IDS is, in fact, an estimator. Until now, no study has been conducted to evaluate empirically the accuracy of the Monte Carlo method for quality control purposes in geodesy. Generally, only the degree of dispersion of the Monte Carlo results is considered. Thus, an issue remains: how can we find the optimal number of Monte Carlo experiments for quality control purposes? Here, we use exact theoretical reference probabilities to answer this question. We find that m = 200,000 experiments provide consistent results with sufficient numerical precision for outlier identification, with a relative error of less than 0.1%. The test statistic associated with IDS is the extreme normalised least-squares residual. It is well known in the literature that critical values (quantile values) of such a test statistic cannot be derived from well-known test distributions but must be computed numerically by means of Monte Carlo. This work provides the first results on the Monte Carlo-based critical value under different scenarios of correlation between the outlier test statistics. We also tested whether increasing the level of the family-wise error rate, or, equivalently, reducing the critical values, improves the identifiability of the outlier.
The results show that the lower the critical value, or the higher the family-wise error rate, the larger the probability of correct detection and the smaller the MDB. However, this relationship does not hold for identification. We also highlight that an outlier becomes identifiable when the contributions of the observations to the wrong exclusion rate (Type III error) decline simultaneously. In this case, we verify that the effect of the correlation between the outlier test statistics on the wrong exclusion rate becomes insignificant for a certain outlier magnitude, which increases the probability of identification.
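The Monte Carlo computation of such a critical value can be outlined as follows. This Python sketch assumes uncorrelated observations of equal variance; the design matrix A, the family-wise error rate alpha and the number of experiments m (set by default to the 200,000 mentioned above) are inputs chosen by the user.

# Minimal sketch: Monte Carlo critical value of the extreme normalized LS residual
import numpy as np

def mc_critical_value(A, sigma=1.0, alpha=0.001, m=200_000, seed=0):
    """Critical value at family-wise error rate alpha under H0 (no outlier)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    P = np.eye(n) - A @ np.linalg.inv(A.T @ A) @ A.T   # projector onto the residual space
    d = np.sqrt(np.diag(P))                            # square roots of the redundancy numbers
    e = rng.normal(0.0, sigma, size=(m, n))            # m simulated error vectors under H0
    w = np.abs(e @ P.T) / (sigma * d)                  # normalized residuals, all experiments
    return np.quantile(w.max(axis=1), 1.0 - alpha)     # quantile of the extreme statistic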
Article
Full-text available
Transformations between different geodetic reference frames are often performed such that the transformation parameters are first determined from control points. If we do not know in the first place which of the numerous transformation models is appropriate, then we can set up a multiple hypotheses test. The paper extends the common method of testing transformation parameters for significance to the case that constraints on such parameters are also tested. This provides more flexibility when setting up such a test. One can formulate a general model with a maximum number of transformation parameters and specialize it by adding constraints on those parameters which need to be tested. The proper test statistic in a multiple test is shown to be either the extreme normalized or the extreme studentized Lagrange multiplier. They are shown to perform superior to the more intuitive test statistics derived from misclosures. It is shown how model selection by multiple hypotheses testing relates to the use of information criteria like AICc and Mallows' Cp, which are based on an information-theoretic approach. Nevertheless, whenever comparable, the results of an exemplary computation almost coincide.
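To illustrate the information-criterion side of this comparison, a corrected Akaike criterion (AICc) for competing least-squares models can be sketched in a few lines of Python. The formula assumes i.i.d. normal residuals with unknown variance, and the candidate design matrices for the general and constrained transformation models are hypothetical inputs, not the paper's implementation.

# Minimal sketch: AICc of an ordinary least-squares model
import numpy as np

def aicc(A, y):
    """AICc of a least-squares fit with normal errors and unknown variance."""
    n, k = A.shape                                    # k functional parameters
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma2 = np.sum((y - A @ x) ** 2) / n             # ML estimate of the variance factor
    p = k + 1                                         # +1 for the variance parameter
    aic = n * np.log(2.0 * np.pi * sigma2) + n + 2.0 * p
    return aic + 2.0 * p * (p + 1) / (n - p - 1)      # small-sample correction

# usage idea: evaluate aicc(A_general, y) and aicc(A_constrained, y)
# and keep the transformation model with the smaller value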
Article
Full-text available
The importance of the normal distribution is undeniable since it is an underlying assumption of many statistical procedures such as t-tests, linear regression analysis, discriminant analysis and Analysis of Variance (ANOVA). When the normality assumption is violated, interpretation and inferences may not be reliable or valid. The three common procedures for assessing whether a random sample of independent observations of size n comes from a population with a normal distribution are: graphical methods (histograms, boxplots, Q-Q plots), numerical methods (skewness and kurtosis indices) and formal normality tests. This paper compares the power of four formal tests of normality: the Shapiro-Wilk (SW) test, the Kolmogorov-Smirnov (KS) test, the Lilliefors (LF) test and the Anderson-Darling (AD) test. Power comparisons of these four tests were obtained via Monte Carlo simulation of sample data generated from alternative symmetric and asymmetric distributions. Ten thousand samples of various sample sizes were generated from each of the given alternative distributions. The power of each test was then obtained by comparing the normality test statistics with the respective critical values. Results show that the Shapiro-Wilk test is the most powerful normality test, followed by the Anderson-Darling test, the Lilliefors test and the Kolmogorov-Smirnov test. However, the power of all four tests is still low for small sample sizes.
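A minimal Python sketch of such a power study is given below, restricted to tests available in SciPy. The KS variant with parameters estimated from the sample serves only as an approximate stand-in for the Lilliefors test, and the Laplace alternative, sample size and number of replications are assumptions chosen for illustration.

# Minimal sketch: Monte Carlo power of three normality tests against a Laplace alternative
import numpy as np
from scipy import stats

def rejects_sw(x, alpha=0.05):
    return stats.shapiro(x).pvalue < alpha

def rejects_ks(x, alpha=0.05):
    z = (x - x.mean()) / x.std(ddof=1)               # parameters estimated from the sample,
    return stats.kstest(z, 'norm').pvalue < alpha    # so this KS variant is only approximate

def rejects_ad(x, alpha=0.05):
    r = stats.anderson(x, 'norm')
    return r.statistic > r.critical_values[2]        # index 2 corresponds to the 5 % level

def power(rejects, rvs, n=50, m=10_000, seed=0):
    rng = np.random.default_rng(seed)
    return np.mean([rejects(rvs(rng, n)) for _ in range(m)])

laplace = lambda rng, n: rng.laplace(0.0, 1.0, n)    # assumed symmetric, heavy-tailed alternative
print("SW:", power(rejects_sw, laplace),
      "KS:", power(rejects_ks, laplace),
      "AD:", power(rejects_ad, laplace))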
Article
The detection of multiple outliers can be interpreted as a model selection problem: a null model, which indicates an outlier-free set of observations, competes with a class of alternative models, which contain a set of additional bias parameters. A common way to select the right model is the use of a statistical hypothesis test. In geodesy, Baarda's data snooping is most popular. Another approach arises from information theory. Here, the Akaike information criterion (AIC) is used to select an appropriate model for a given set of observations. AIC is based on the Kullback-Leibler divergence, which describes the discrepancy between the model candidates. Both approaches are discussed and applied to two test problems: the fitting of a straight line and a geodetic network. Some relationships between data snooping and information criteria are elaborated. In a comparison it turns out that the information criteria approach is simpler and more elegant. But besides AIC there are many alternative information criteria that select different outliers, and it is not clear which one is optimal.
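For the straight-line problem, the AIC-based selection among the null model and the single-outlier alternatives can be sketched as follows in Python. The AIC expression assumes i.i.d. normal errors with unknown variance, and all function and variable names are illustrative rather than taken from the paper.

# Minimal sketch: outlier screening for a straight-line fit by AIC model selection
import numpy as np

def aic_ls(A, y):
    """AIC of a least-squares fit with i.i.d. normal errors and unknown variance."""
    n, k = A.shape
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma2 = np.sum((y - A @ x) ** 2) / n
    return n * np.log(2.0 * np.pi * sigma2) + n + 2.0 * (k + 1)

def select_outlier(t, y):
    """Return the index of the observation flagged as an outlier, or None."""
    A0 = np.column_stack([np.ones_like(t), t])        # null model: straight line only
    scores = {None: aic_ls(A0, y)}
    for i in range(len(y)):                           # alternatives: one bias parameter each
        Ai = np.column_stack([A0, np.eye(len(y))[:, i]])
        scores[i] = aic_ls(Ai, y)
    return min(scores, key=scores.get)                # smallest AIC wins

A return value of None means the outlier-free null model is preferred; otherwise the index of the single observation whose bias parameter most improves the criterion is returned.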
Chapter
Knowledge of the probability density function of the observables is not needed to routinely apply a least-squares algorithm and compute estimates for the parameters of interest. For the interpretation of the outcomes, and in particular for statements on the quality of the estimator, the probability density has to be known. A variety of tools and measures to analyse the distribution of data are reviewed and applied to code and phase observables from a pair of geodetic GPS receivers. As a conclusion, the normal probability density function turns out to be a reasonable model for the distribution of GPS code and phase data, but this may not hold under all circumstances.
Article
Let x_1, x_2, …, x_n be a system of deviations from the means of n variables with standard deviations σ_1, σ_2, …, σ_n and with correlations r_12, r_13, r_23, …, r_{n-1,n}.