
Geoplanning: Journal of Geomatics and Planning
Vol 6, No 1, 2019, 21-30
E-ISSN: 2355-6544
http://ejournal.undip.ac.id/index.php/geoplanning
doi: 10.14710/geoplanning.6.1.21-30

Monte Carlo Simulation for Outlier Identification Studies in Geodetic Network: An Example in a Levelling Network Using Iterative Data Snooping

M.T. Matsuoka a,c , V.F. Rofatto a,c, I. Klein b, A. F. S. Gomes a , M.P. Guzatto b

a Institute of Geography, Federal University of Uberlândia (UFU), Monte Carmelo, Brazil

b Land Surveying Program, Federal Institute of Santa Catarina (IFSC), Florianopolis, Brazil

c Graduate Program of Remote Sensing, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil

Abstract: Today, with fast and powerful computers, large data storage systems and modern software, the probability distributions and the efficiency of statistical testing algorithms can be estimated by computerized simulation. Here, we use Monte Carlo simulation (MCS) to investigate the power of the test and the error probabilities of Baarda's iterative data snooping procedure as a test statistic for outlier identification in the Gauss-Markov model. The MCS discards the use of the observation vector of the Gauss-Markov model. In fact, to perform the analysis, all that is needed are the Jacobian matrix, the uncertainty of the observations, and the magnitude intervals of the outliers. The random errors (or residuals) are generated artificially from the normal distribution, while the size of the outliers is randomly selected using the standard uniform distribution. Results for a simulated closed levelling network reveal that data snooping can locate an outlier of the order of magnitude 5σ with a high success rate. The lower the magnitude of the outliers, the lower the efficiency of data snooping in the simulated network. In general, for the simulated network, the data snooping procedure was most efficient for α=0.01 (1%), with an 82.8% success rate.

Copyright © 2019 GJGP-UNDIP

This open access article is distributed under a

Creative Commons Attribution (CC-BY-NC-SA) 4.0 International license.

How to Cite (APA 6th Style):
Matsuoka, M. T., Rofatto, V. F., Klein, I., Gomes, A. F. S., & Guzatto, M. P. (2019). Monte Carlo Simulation for Outlier Identification Studies in Geodetic Network: An Example in a Levelling Network Using Iterative Data Snooping. Geoplanning: Journal of Geomatics and Planning, 6(1), 21-30. doi:10.14710/geoplanning.6.1.21-30.

1. INTRODUCTION

Data snooping is the best-established method for the identification of gross errors in geodetic data analysis. The method is due to Baarda (1968). Here, it is assumed that outliers are observations contaminated by gross errors (blunders), following the statement of Lehmann (2012) that in geodesy 'outliers are most often caused by gross errors and gross errors most often cause outliers'. In practice, the data snooping procedure is applied iteratively, identifying and removing one outlier at a time. The method is applied until no further observations are identified. Here, this procedure will be called Iterative Data Snooping. Since data snooping is based on statistical hypothesis testing, it may lead to a false decision as follows:

• Type I error or false alert (probability level α) – Probability of detecting an outlier when there is none;

• Type II error or missed detection (probability level β) – Probability of not detecting an outlier when there is at least one; and

• Type III error or wrong exclusion (probability level κ) – Probability of misidentifying a non-outlying

observation as an outlier, instead of the outlying one.

Lehmann & Voß-Böhme (2017) mention that while the rate of type I decision errors can be selected by the user, the rate of type II decision errors cannot. They also point out that a test statistic with a low rate of type II errors is said to be powerful. However, without considering the type III error, there is a high risk of overestimating the successful identification probability. Besides that, we highlight that the Iterative Data Snooping procedure can identify more observations than the real number of outliers (we call this "over-identification"). Thus, we consider a statistical test powerful when the rates of type II and type III errors, as well as over-identification, are simultaneously minimized for a given probability level α.

Article Info:
Received: 13 April 2018
In revised form: 30 March 2019
Accepted: 30 May 2019
Available Online: 30 August 2019

Keywords: Geodetic Network, Outlier, Monte Carlo Simulation

Corresponding Author:
Marcelo Tomio Matsuoka
Universidade Federal de Uberlândia, Monte Carmelo, Brazil
Email: tomio@ufu.br

From this point, we pose the following problem: how can the probability levels above be computed? Unlike Baarda, we have fast computers at our disposal. In this paper we show that these statistical quantities can be determined from frequency distributions of computer random experiments performed using random numbers. This is known as Monte Carlo simulation (MCS). MCS methods are used whenever the functional relationships are not analytically tractable, as is the case for the Iterative Data Snooping procedure (Lehmann, 2012). MCS has already been applied in outlier detection (Lehmann & Scheffler, 2011; Lehmann, 2012; Klein et al., 2012; Klein et al., 2015; Erdogan, 2014; Niemeier & Tengen, 2017).

The studies presented in this paper are a continuation of the first experiments presented by Rofatto et al. (2017). However, unlike Rofatto et al. (2017), here we evaluate the proposed method in a geodetic network with uncorrelated observations, and we also analyze the power of the test of the Iterative Data Snooping procedure when outliers of magnitude equal to the MDB (Minimal Detectable Bias) are inserted into the geodetic network.

The outline of the paper is as follows: first, we present the theoretical background of the Iterative Data Snooping procedure in the Gauss-Markov model. Next, the MCS approach is introduced as a tool to analyze the power of the test and the probabilities of decision errors (type II, type III and over-identification) of the Iterative Data Snooping procedure. Then, the efficiency of Iterative Data Snooping is demonstrated by means of the Monte Carlo method on the example of a simulated closed levelling network. The mathematical model generally adopted in geodetic data analysis is the linear(ized) Gauss-Markov model, given by Koch (1999):

e = y − Ax .......(1)

where e is the n × 1 random error vector, A is the n × u design (or Jacobian) matrix with full column rank, x is the u × 1 vector of unknown parameters and y is the n × 1 vector of observations. The most employed solution for a redundant system of equations (n > u) is the weighted least squares estimator (WLSE) for the vector of unknowns x̂:

x̂ = (AᵀWA)⁻¹AᵀWy ........(2)

in which W is the n × n weight matrix of the observations, taken as W = σ₀²Σy⁻¹, where σ₀² is the variance factor (here assumed known) and Σy is the covariance matrix of the observations; if Σy is diagonal, one speaks of weighted LSE (WLSE); if it is full, of generalized LSE (GLSE). More details about LSE estimation can be found in Ghilani (2017). A geometric interpretation of the LSE can be found in Teunissen (2003) and Klein et al. (2011).
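As a minimal numerical sketch of equations (1) and (2), the following NumPy fragment uses illustrative toy data (not the network of this paper):

```python
import numpy as np

# Toy levelling fragment: two measurements of the height difference between
# a fixed benchmark and one unknown point (n = 2 observations, u = 1 unknown).
A = np.array([[1.0],
              [1.0]])                      # Jacobian of equation (1)
y = np.array([10.02, 9.98])                # observed height differences (m)
sigma_y = np.diag([0.01**2, 0.01**2])      # covariance matrix of the observations
sigma0_sq = 1.0                            # a priori variance factor (assumed known)

W = sigma0_sq * np.linalg.inv(sigma_y)             # weight matrix, W = sigma0^2 * Sigma_y^-1
x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)  # WLSE, equation (2)
e_hat = y - A @ x_hat                              # estimated random errors, from equation (1)
print(x_hat)   # -> [10.]
```

For equally weighted repeated observations the WLSE reduces to the arithmetic mean, which makes the toy result easy to check by hand.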

The least-squares method is the Best Linear Unbiased Estimator (BLUE) of the unknown parameters, and it is also the maximum likelihood solution when the observation errors follow a central Gaussian distribution (Teunissen, 2003). However, least squares is no longer optimal in the presence of grossly erroneous observations (Baarda, 1968). In other words, despite its optimal properties, least squares lacks robustness, i.e. insensitivity to outliers in the observations (Huber, 1992; Rousseeuw & Leroy, 1987; Lehmann, 2013). In recent years, two categories of advanced techniques for the treatment of observations contaminated by outliers have been developed: robust adjustment procedures (Wilcox, 2011; Klein et al., 2015) and outlier detection based on statistical tests (Klein et al., 2016). The first is outside the scope of this paper. Besides the undoubted advantages of robust adjustment, outlier tests are also widely used. The following advantages of outlier analysis are mentioned by Lehmann (2013):

• Detected outliers provide the opportunity to investigate causes of gross measurement errors;

• Detected outliers can be re-measured; and

• If the outliers are discarded from the observations, then standard adjustment software, which operates according to the least squares principle, can be used.


The data snooping procedure is a particular case of the maximum likelihood ratio test when only one outlier (i.e. q = 1) is present in the data set at a time (Baarda, 1968; Berber & Hekimoglu, 2003; Lehmann, 2012). Thus, it is formulated by the following test hypotheses (Baarda, 1968; Teunissen, 2006):

H₀: E{y} = Ax  vs  Hₐ: E{y} = Ax + c_y∇; ∇ ≠ 0 ..........(3)

where c_y is the outlier model for q = 1, i.e. the n × 1 unit vector with 1 in its ith entry and zeros in the remaining entries (e.g. c_y = [0 … 0 1 0 … 0]ᵀ), and ∇ is a scalar value with the gross error (outlier) of the ith observation being tested. Therefore, under the null hypothesis (H₀) it is assumed that there are no outliers in the observations, while under the alternative hypothesis (Hₐ) it is assumed that the ith observation being tested (for i = 1, …, n) is contaminated by a gross error of magnitude ∇.

If we consider one outlying observation at a certain known location (q = 1), then the likelihood ratio test statistic for data snooping (T_q=1) is given by Teunissen (2006):

T_q=1 = ê₀ᵀΣy⁻¹c_y (c_yᵀΣy⁻¹Σ_ê₀Σy⁻¹c_y)⁻¹ c_yᵀΣy⁻¹ê₀ ..........(4)

where ê₀ and Σ_ê₀ are, respectively, the estimated random error vector and the a posteriori covariance matrix of the estimated random errors computed by LSE under H₀. Under H₀, the observation errors are zero-mean (multivariate) normally distributed. The null hypothesis is rejected if the test statistic T_q=1 of the ith observation being tested exceeds a given critical value K_α, i.e.:

Reject H₀ if: T_q=1 > K_α
H₀: T_q=1 ~ χ²(1, 0); Hₐ: T_q=1 ~ χ²(1, λ), with λ = ∇² c_yᵀΣy⁻¹Σ_ê₀Σy⁻¹c_y ........(5)

It is important to mention that the critical value K_α follows from a chi-squared distribution with one degree of freedom at a significance level α in a one-tailed test. Baarda (1968) and Teunissen (2006) demonstrate that if q = 1, the test statistic (equation 4) can also be formulated based on a standard normal distribution in a two-tailed test (the so-called w-test). Both the chi-squared and the normal distribution tests are equivalent. Usually in geodesy, the value of α is set between 0.1% and 1% (Kavouras, 1982; Aydin & Demirel, 2004; Lehmann, 2013). Furthermore, data snooping involves multiple alternative hypotheses, as each observation is tested individually. Therefore, the only observation considered contaminated by an outlier is the one whose test statistic satisfies the inequality T_q=1 > K_α. If two or more observations exceed the critical value K_α, only the observation with the largest T_q=1 is flagged as an outlier. After the observation most suspected of being an outlier has been identified (at a given α), it is usually excluded from the model, and the WLSE and data snooping procedure are applied iteratively until no further outliers are identified in the observations (Berber & Hekimoglu, 2003).
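The rejection rule and the iterative exclusion loop described above can be sketched as follows. This is an illustrative implementation of ours (function names and toy data are not from the paper), assuming σ₀² = 1 so that W = Σy⁻¹, and uncorrelated observations so that sub-blocks of W remain valid after an exclusion:

```python
import numpy as np
from scipy.stats import chi2

def data_snooping_round(A, W, y, sigma_y, alpha=0.01):
    """One round of data snooping (q = 1): return the index of the single most
    suspicious observation, or None if no test statistic exceeds K_alpha."""
    x_hat = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
    e_hat = y - A @ x_hat                                 # residuals under H0
    # A posteriori covariance of residuals: Sigma_e = Sigma_y - A (A^T W A)^-1 A^T
    sigma_e = sigma_y - A @ np.linalg.solve(A.T @ W @ A, A.T)
    We = W @ e_hat
    T = We**2 / np.diag(W @ sigma_e @ W)                  # equation (4), per observation
    K = chi2.ppf(1.0 - alpha, df=1)                       # critical value K_alpha
    i = int(np.argmax(T))                                 # largest test statistic wins
    return i if T[i] > K else None

def iterative_data_snooping(A, W, y, sigma_y, alpha=0.01):
    """Apply data snooping iteratively, excluding one observation per round."""
    keep, flagged = list(range(len(y))), []
    while len(keep) > A.shape[1]:                         # keep the system redundant
        i = data_snooping_round(A[keep], W[np.ix_(keep, keep)],
                                y[keep], sigma_y[np.ix_(keep, keep)], alpha)
        if i is None:
            break
        flagged.append(keep.pop(i))                       # record original index
    return flagged

# Demo: four repeated measurements of one height, the last one grossly wrong.
A = np.ones((4, 1))
sigma_y = 0.01**2 * np.eye(4)
W = np.linalg.inv(sigma_y)
y = np.array([10.00, 10.01, 9.99, 10.50])
print(iterative_data_snooping(A, W, y, sigma_y))   # -> [3]
```

In the demo the contaminated fourth observation is flagged in the first round, and the second round finds no further outlier, so the loop stops.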

The power of the test (γ) is the probability of correctly identifying the outliers. In the case of a single round of data snooping, the power of the test depends on the type II and type III errors for a given level of significance α, i.e. γ = 1 − (β + κ). Considering Iterative Data Snooping, the power of the test also depends on the over-identification error, and is given by γ = 1 − (β + κ + over-identification). Baarda's conventional reliability theory considers only a single alternative hypothesis (Baarda, 1968) and is therefore based only on type I and type II errors. The type III error is addressed by Förstner (1983) considering two alternative hypotheses. Yang et al. (2013) extended the solution given by Förstner (1983) and presented an analytical solution for the type III error considering multiple alternative hypotheses and the presence of one outlier (i.e. for a single round of data snooping). Examples of the efficiency of the analytical solution presented by Yang et al. (2013) can be found in Klein et al. (2015).

The focus of this paper is Iterative Data Snooping. An analytical solution for the probabilities of decision errors and the power of the test of Iterative Data Snooping has not yet been developed and appears rather intractable. A well-established procedure for computing these probability levels is Monte Carlo simulation (MCS). As pointed out by Lehmann (2012), in essence MCS replaces random variables by computer-generated pseudo-random numbers, probabilities by relative frequencies, and expectations by arithmetic means over large sets of such numbers. The computation of one set of pseudo-random numbers is a Monte Carlo experiment. In geodesy, Monte Carlo simulation has been applied in several studies since the pioneering work of Hekimoglu & Koch (1999). For example, Lehmann & Scheffler (2011) applied MCS in data snooping to determine the optimal level of the error probability α (type I error). Here, on the other hand, we propose to use MCS in Iterative Data Snooping to compute the following probability levels: the power of the test, the type II error and the type III error. In addition to these probabilities, we also compute the rate of experiments in which the Iterative Data Snooping procedure identified more outliers than simulated (which we call "over-identification", i.e. q > 1). In the next section we show how to obtain these probability levels experimentally. Thus, we can analyze the efficiency of the Iterative Data Snooping testing procedure based on MCS, as promised by the title of this paper.

2. DATA AND METHODS

In order to analyze the Iterative Data Snooping procedure, MCS was applied to compute the probability levels. To do so, a sequence of m random error vectors e_k, k = 1, …, m, of a desired statistical distribution is generated; m is known as the number of Monte Carlo experiments. Usually, one assumes that the random errors of the good measurements are normally distributed with expectation zero. Thus, we generate the random errors using the multivariate normal distribution, since the assumed stochastic model for the random errors is based on the covariance matrix of the observations, i.e. e ~ N(0, σ₀²Σy).
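Generating the m random error vectors can be sketched as follows (the numeric values are illustrative, not taken from the paper; NumPy's `Generator` API is assumed):

```python
import numpy as np

rng = np.random.default_rng(42)     # reproducible pseudo-random stream
n = 10                              # number of observations
sigma = 1.96                        # illustrative observation standard deviation (mm)
sigma_y = sigma**2 * np.eye(n)      # covariance of uncorrelated observations
sigma0_sq = 1.0                     # variance factor, assumed known
m = 10_000                          # number of Monte Carlo experiments

# m random error vectors e_k ~ N(0, sigma0^2 * Sigma_y), one row per experiment
errors = rng.multivariate_normal(np.zeros(n), sigma0_sq * sigma_y, size=m)
print(errors.shape)    # -> (10000, 10)
```

For a diagonal Σy the same draw could be done with independent univariate normals; the multivariate call is kept because it also covers correlated stochastic models.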

On the other hand, an outlier (q = 1) is selected based on magnitude intervals of the outliers for each of the m Monte Carlo experiments. Positive and negative outliers are clipped between 3σ and 3.5σ; 3.5σ and 4σ; 4σ and 4.5σ; 4.5σ and 5σ; 5σ and 5.5σ; 5.5σ and 6σ; 6σ and 6.5σ; 6.5σ and 7σ; 7σ and 7.5σ; 7.5σ and 8σ; 8σ and 8.5σ; and 8.5σ and 9σ in each experiment (σ is the standard deviation of the observation). Here, we use the standard uniform distribution to select the outlier magnitude. The uniform distribution is a rectangular distribution with constant probability, which implies that each range of values of the same length on the distribution's support has equal probability of occurrence (Lehmann & Scheffler, 2011). For example, for 10,000 Monte Carlo experiments, if one chooses a magnitude interval of |3σ to 9σ|, the probability of a 3σ error occurring is virtually the same as that of −3σ, and so on. At each iteration of the simulation, a specific observation is chosen to receive a gross error based on the discrete uniform distribution (i.e., all observations have the same probability of being selected). Random and gross errors are assumed to be independent (by definition) and both are combined into the total error as follows (Kavouras, 1982):

ε = e + c_y∇, ∇ ≠ 0 ......(6)

where ε is the n × 1 total error vector, e is the n × 1 random error vector, c_y is the outlier model for q = 1 (see expression 3), and ∇ is a scalar value with the outlier of the ith observation being tested. We assume that ∇ is larger than the random errors. Before computing the test statistic T_q=1 (expression 4), it is necessary to relate the random error vector e and the total error vector ε, since this test statistic depends on the estimated random error vector ê₀. In the sense of LSE, this relationship is given by Kavouras (1982):

ê₀ = Rε .......(7)

R = I − A(AᵀWA)⁻¹AᵀW ..........(8)

in which R is the n × n redundancy matrix and I is the n × n identity matrix. In equation (7), the multiplication of the redundancy matrix R by the total error ε provides the estimated random error vector ê₀. Now ê₀ is not composed only of random errors; one of its elements is also contaminated by an outlier. It then becomes possible to compute the test statistic T_q=1 through the relation given by equation (4).
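One Monte Carlo experiment combining equations (6) to (8) can be sketched as follows. This is a sketch under the assumption of uncorrelated observations with a common standard deviation; the function and variable names are ours:

```python
import numpy as np

def monte_carlo_experiment(A, W, sigma, rng, low=3.0, high=3.5):
    """One Monte Carlo experiment (sketch): draw random errors, inject one
    outlier of magnitude in [low*sigma, high*sigma] with a random sign into a
    uniformly chosen observation, and return e_hat0 = R @ eps (equation 7)
    together with the index of the contaminated observation."""
    n = A.shape[0]
    e = rng.normal(0.0, sigma, size=n)           # random errors ~ N(0, sigma^2)
    magnitude = rng.uniform(low, high) * sigma   # |outlier| via a uniform draw
    sign = rng.choice([-1.0, 1.0])               # positive or negative outlier
    i = rng.integers(n)                          # discrete uniform choice of observation
    c_y = np.zeros(n)
    c_y[i] = 1.0                                 # outlier model vector (equation 3)
    eps = e + c_y * sign * magnitude             # total error, equation (6)
    R = np.eye(n) - A @ np.linalg.solve(A.T @ W @ A, A.T @ W)   # equation (8)
    return R @ eps, i                            # equation (7)

# Demo on a toy model: four repeated observations of one unknown.
rng = np.random.default_rng(1)
A = np.ones((4, 1))
W = np.eye(4)                                    # sigma = 1, uncorrelated
e_hat0, true_i = monte_carlo_experiment(A, W, sigma=1.0, rng=rng)
```

A quick sanity check on the design: R projects onto the space orthogonal to the columns of A under the weight metric, so AᵀW ê₀ vanishes, exactly as it does for least-squares residuals.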

The significance level is varied, taken as α = 0.001 (0.1%), α = 0.01 (1%), α = 0.025 (2.5%), α = 0.05 (5%) and α = 0.1 (10%). Each simulation has a unique combination of significance level and interval of outlier magnitude. We ran 10,000 experiments for each simulation and computed the probability levels of the type II error, the type III error, the power of the test and the number of over-identifications (more outliers identified than simulated) in Iterative Data Snooping, totaling 12 × 5 × 10,000 = 600,000 Monte Carlo experiments. It is important to emphasize that the proposed method does not depend on the unknown parameters vector or on the vector of observations.

3. RESULTS AND DISCUSSION

In order to demonstrate the analysis of the efficiency of data snooping, we simulated a closed levelling network. The goal is to illustrate how to use the MCS approach to compute statistical quantities numerically; further considerations about levelling networks are outside the scope of this study.

We consider a closed levelling network with one control station (benchmark) and four points with unknown heights (A, B, C and D), totaling four minimally constrained points, as shown in Figure 1. The benchmark is fixed, and the distances between adjacent and non-adjacent stations are approximately 240 m and 400 m, respectively. The equipment used is a spirit level with a nominal standard deviation for a single staff reading of 0.02 mm/m. Sight distances are kept at 40 m. Thus, each total height difference Δhᵢ between adjacent or non-adjacent stations is made up of, respectively, three or five partial height differences (p). Each partial height difference, in turn, involves one instrument setup and two sightings: foresight and backsight. The standard deviation of each Δhᵢ equals σ_Δhᵢ = √(2pᵢ) × 40 m × 0.02 mm/m = √(2pᵢ) × 0.8 mm, where p is 3 or 5. The readings are assumed uncorrelated and σ₀² = 1.

Figure 1. Simulated leveling network

For each unknown point, there are four height difference measurements. Thus, there are n = 10 observations, u = 4 unknowns, and n − u = 6 degrees of freedom in this simulation. The design matrix A has dimension 10 × 4 and the covariance matrix of the observations has dimension 10 × 10. Each station is involved in four height differences, so there are three redundant observations for the determination of each unknown.
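Assembling a 10 × 4 design matrix for a network of this kind can be sketched as below. The edge list and observation directions here are our assumption for illustration; the actual observation scheme is the one defined in Figure 1:

```python
import numpy as np

# One fixed benchmark "BM" plus unknowns A, B, C, D, with a height difference
# observed between every pair of stations (10 observations in total).
unknowns = ["A", "B", "C", "D"]
col = {name: k for k, name in enumerate(unknowns)}      # column index per unknown
edges = [("BM", "A"), ("BM", "B"), ("BM", "C"), ("BM", "D"),
         ("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "C"), ("B", "D"), ("C", "D")]            # assumed edge ordering

A_mat = np.zeros((len(edges), len(unknowns)))
for row, (frm, to) in enumerate(edges):
    if to in col:
        A_mat[row, col[to]] = 1.0      # dh = h_to - h_frm
    if frm in col:
        A_mat[row, col[frm]] = -1.0    # the fixed benchmark contributes no column

print(A_mat.shape)                      # -> (10, 4)
print(np.linalg.matrix_rank(A_mat))     # -> 4
```

Because the benchmark is held fixed, the incidence-style matrix has full column rank, which is the condition required of the Jacobian in equation (1).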

In the sense of reliability, the minimum and maximum redundancy numbers of the observations in the network are 0.46 and 0.75, respectively. This means that the ability to detect an outlier is not uniform in every part of the network. We also computed the Minimal Detectable Bias (MDB) as an indicator of the internal reliability of the system. The MDB is derived from a local test proposed by Baarda (1968), which makes a decision between the null and a unique alternative hypothesis. By definition, the MDB is based on the type I (false alert) and type II (missed detection) errors. The conventional MDBs range from 4.7σ to 6σ for α=0.001; 3.9σ to 5σ for α=0.01; 3.5σ to 4.5σ for α=0.025; 3.2σ to 4σ for α=0.05; and, finally, 2.8σ to 3.6σ for α=0.1. The computation of these MDBs was based on a probability of type II error of 0.2 (Baarda, 1968). In addition, the maximum positive and negative correlations between the test statistics are 0.61 (between Δh₄ and Δh₅) and −0.58 (between Δh₂ and Δh₃), respectively. The correlation coefficient is presented by Förstner (1983).
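The noncentrality parameter underlying such MDBs can be computed numerically. The sketch below (using SciPy; the helper name is ours) solves for the λ₀ that gives power 0.8 in the χ²(1) test, reproducing Baarda's classic value √λ₀ ≈ 4.13 for α = 0.001, β = 0.2:

```python
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def noncentrality(alpha, power=0.8):
    """Noncentrality lambda0 such that the chi-square(1) test with significance
    alpha reaches the given power (here beta = 1 - power = 0.2), found by
    root-finding on the noncentral chi-square survival function."""
    K = chi2.ppf(1.0 - alpha, df=1)                      # critical value K_alpha
    return brentq(lambda lam: ncx2.sf(K, 1, lam) - power, 1e-6, 100.0)

# The MDB of observation i then follows from equation (5) as
#   mdb_i = sqrt(noncentrality(alpha) / (c_y' Sigma_y^-1 Sigma_e Sigma_y^-1 c_y))
# (hypothetical variable names; the denominator is the term of equation 5).
lam0 = noncentrality(0.001)
print(round(lam0**0.5, 2))   # -> 4.13
```

The same helper evaluated at the other α values yields the scale factors behind the MDB ranges quoted above.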

Applying the method presented in section 2, the success and error probabilities of Iterative Data Snooping were estimated. Figure 2 shows the success rate (the number of experiments in which only the outlying observation was identified), i.e. the power of the Iterative Data Snooping testing procedure for one simulated outlier (γ). The misidentification rates are shown in Figures 3 and 4. Two classes of misidentification are counted in the simulations: the number of experiments in which the procedure identified no observation (type II error, β), and the number of experiments in which the procedure identified a single observation but the wrong one (type III error, κ). In addition to these classes, we count "over-identification", i.e. the number of experiments in which the procedure detects more outliers than simulated.
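The bookkeeping behind these outcome classes can be sketched as a small classification function (the names are ours, mirroring the classes above):

```python
def classify(identified, true_index):
    """Classify the outcome of one Iterative Data Snooping experiment with a
    single simulated outlier at position true_index."""
    if len(identified) == 0:
        return "type II"                 # missed detection
    if len(identified) > 1:
        return "over-identification"     # more outliers flagged than simulated
    return "success" if identified[0] == true_index else "type III"

# Relative frequencies over m experiments then estimate the probability levels,
# e.g. gamma = counts["success"] / m and beta = counts["type II"] / m.
```

Counting these labels over the 10,000 experiments of each simulation yields the rates plotted in Figures 2 to 5.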

Figure 2. Success rate (power of the test) of the iterative data snooping testing procedure for simulated

leveling network vs. magnitude intervals of the outliers for each probability level α.

Figure 3. Type II error for simulated leveling network vs. magnitude intervals of the outliers for each

probability level α.


Figure 4. Type III error for simulated leveling network vs. magnitude intervals of the outliers for each

probability level α.

Figure 5. Over-identification vs. magnitude intervals of the outliers for each probability level α.

Figures 3–5 show that, in general, the lower the magnitude of the outliers, the lower the efficiency of data snooping in the simulated network. This is expected. However, there is no direct relation between the power of the test and the significance level (α) for the network analyzed. Here, we do not recommend using α = 0.1 (10%), because many good observations are eliminated. In this case, as shown in Figure 5, the over-identification rate stands out in relation to the other types of errors. Furthermore, in this case (α = 0.1), the power of the test is virtually independent of the outlier size (see Figure 2).

It appears that higher values of α are not recommended for outliers of greater magnitude and that lower values of α are not recommended for outliers of smaller magnitude. Therefore, these results show the importance of a correct choice of α, as pointed out by Lehmann (2012); they also highlight the challenges in controlling the error rate in multiple hypothesis tests. Regarding the three classes of misidentification rates, in general, an increase in the magnitude interval of the outliers leads to a slight increase in the over-identification rate (more outliers being identified than simulated) and a reduction in the type II error. This fact is due to the error propagation among all residuals. The rate of cases with the correct number of outliers but a wrong identification (type III error) also decreases when the magnitude interval of the outliers increases.

From the outlier magnitude interval 5σ–5.5σ onwards, we can observe in Figure 2 that the power of the test (success rate) is practically stable for all levels of significance (except for α = 0.001): approximately 50% (α = 0.1), 70% (α = 0.05), 83% (α = 0.025) and 92% (α = 0.01). For α = 0.001, the success rate exceeds 90% from the magnitude interval 5.5σ–6σ onwards.

In general, considering the simulated network, the Iterative Data Snooping procedure is efficient for outliers greater than 5σ, with a mean success rate of 76.4% for α = 0.001 (0.1%); 82.8% for α = 0.01 (1%); 78% for α = 0.025 (2.5%); and 67.9% for α = 0.05 (5%). Therefore, with an appropriate choice of α, the results show that data snooping can locate an outlier of the order of magnitude 5σ with a high success rate. However, the number of outliers to be considered, which also affects the efficiency of Iterative Data Snooping, requires further investigation.

In order to compare the power of the test of Iterative Data Snooping with the conventional power of the test (80%), the method described in section 2 was also applied considering outliers with the size of their MDBs for each significance level. As pointed out, the MDBs ranged from 4.7σ to 6σ for α=0.001; 3.9σ to 5σ for α=0.01; 3.5σ to 4.5σ for α=0.025; 3.2σ to 4σ for α=0.05; and, finally, 2.8σ to 3.6σ for α=0.1. These MDBs were based on the conventional power of the test of 0.8 (80%). The probabilities of committing the different types of errors and the power of Iterative Data Snooping considering these MDBs for each significance level are shown in Table 1.

Table 1. Probabilities of Iterative Data Snooping (%) considering the size of conventional MDBs

Significance level α | Power of the test (%) | Type II error (%) | Type III error (%) | Over-identification (%)
0.001 | 77.09 | 19.97 | 2.5   | 0.45
0.01  | 70.72 | 17.66 | 6.76  | 4.86
0.025 | 62.94 | 15.7  | 10.35 | 11.01
0.05  | 53.05 | 12.66 | 12.82 | 21.47
0.1   | 38.41 | 9.15  | 15.52 | 36.92

Table 1 shows that the higher the significance level, the greater the divergence between the power of Iterative Data Snooping and the power used to compute the MDB (i.e. 80%). The explanation for this difference is that the computation of the MDB depends only on the type I and type II errors, while Iterative Data Snooping also involves the probability of the type III error and over-identification. In future research, we intend to investigate a function that relates the power of the test considering the MDB to the power of Iterative Data Snooping.

To conclude, it is important to mention that the outlying observation may be present among the detected observations in the over-identification case. If all detected observations are wrong, the over-identification case could be classified as a type III error. The over-identification case will be investigated in more detail in future studies.

4. CONCLUSIONS

Many methods of quality control for geodetic data analysis have been developed and investigated since the pioneering work of Baarda (1968). However, these methods still deserve further investigation. Thus, the goal of this paper was to analyze the data snooping testing procedure for locating an outlier by means of MCS. The MCS discards the use of the observation vector of the Gauss-Markov model. In fact, to perform the analysis, all that is needed are the geometrical network configuration (given by the Jacobian matrix), the uncertainty of the observations (given by the nominal standard deviation of the equipment), and the magnitude intervals of the outliers. The random errors (or residuals) are generated artificially from the normal distribution, while the size of the outliers is selected using the standard uniform distribution.

Iterative Data Snooping showed high success rates in the experiments on a simulated levelling network for a single outlier randomly generated between four and five standard deviations. However, the efficiency of Iterative Data Snooping decreases significantly for outliers smaller than five standard deviations. The efficiency of data snooping also depends on the significance level (α). Here, the optimal value of the significance level for the simulated network was 0.01 (1%). When we insert the MDB as an outlier into the geodetic network, we verify that the higher the significance level, the greater the difference between the power of the test of Iterative Data Snooping and the power of the test used for the MDB computation. In future research, we intend to investigate a function that relates the power of the test considering the MDB to the power of Iterative Data Snooping.

Finally, we showed that Monte Carlo simulation is a feasible method for computing the probability levels associated with a statistical testing procedure without recourse to statistical tables. Future studies should consider various issues: the performance of data snooping in cases of linearized (originally non-linear) models; geodetic networks in the sense of multiple outliers; the development of reliability measures; and the performance of the method in different networks with varying geometry and redundancy. There are other approaches to identify multiple outliers in the observations, such as the recent proposal of Lehmann & Lösler (2016) using the p-value concept and the Sequential Likelihood Ratio Tests for Multiple Outliers (SLRTMO) presented by Klein et al. (2016). A suggestion for future work is to increase the power of the test (success rate) of the Iterative Data Snooping procedure by means of a unifying testing procedure relating Iterative Data Snooping (a single outlier at a time) to approaches for multiple outlier identification such as those presented in Lehmann & Lösler (2016) and Klein et al. (2016).

5. ACKNOWLEDGMENTS

The authors thank CNPq for the financial support provided to the first author (proc. n. 305599/2015-1) and Fapemig for the scientific initiation fellowship of the fourth author.

6. REFERENCES

Aydin, C., & Demirel, H. (2004). Computation of Baarda’s lower bound of the non-centrality parameter.

Journal of Geodesy, 78(7–8), 437–441. [Crossref]

Baarda, W. (1968). A testing procedure for use in geodetic networks. Netherlands Geodetic Commission, 2(5). [Crossref]

Berber, M., & Hekimoglu, S. (2003). What is the reliability of conventional outlier detection and robust

estimation in trilateration networks? Survey Review, 37(290), 308–318. [Crossref]

Erdogan, B. (2014). An outlier detection method in geodetic networks based on the original observations. Boletim de Ciências Geodésicas, 20(3), 578–589. [Crossref]

Förstner, W. (1983). Reliability and discernability of extended Gauss-Markov models. Deutsche Geodätische Kommission, Seminar on Mathematical Models of Geodetic/Photogrammetric Point Determination with Regard to Outliers and Systematic Errors, 79–104.

Ghilani, C. D. (2017). Adjustment computations: spatial data analysis. John Wiley & Sons.

Hekimoglu, S., & Koch, K. R. (1999). How can reliability of the robust methods be measured? Proceedings of

the Third Turkish-German Joint Geodetic Days, Istanbul, 179–196.

Huber, P. J. (1992). Robust Estimation of a Location Parameter. In Springer Series in Statistics (pp. 492–518).

[Crossref]

Kavouras, M. (1982). On the Detection of Outliers and the Determination of Reliability in Geodetic Networks. M.Sc.E. thesis, Department of Geodesy and Geomatics Engineering, University of New Brunswick.

Klein, I., Matsuoka, M. T., Guzatto, M. P., de Souza, S. F., & Veronez, M. R. (2014). On evaluation of different methods for quality control of correlated observations. Survey Review, 47(340), 28–35. [Crossref]

Klein, I., Matsuoka, M. T., Guzatto, M. P., & Nievinski, F. G. (2016). An approach to identify multiple outliers based on sequential likelihood ratio tests. Survey Review, 49(357), 449–457. [Crossref]


Klein, I., Matsuoka, M. T., & Guzatto, M. P. (2015). How to estimate the minimum power of the test and bound values for the confidence interval of data snooping procedure. Boletim de Ciências Geodésicas, 21(1), 26–42. [Crossref]

Klein, I., Matsuoka, M. T., Souza, S. F. de, & Collischonn, C. (2012). Design of geodetic networks reliable against multiple outliers. Boletim de Ciências Geodésicas, 18(3), 480–507.

Klein, I., Matsuoka, M. T., Souza, S. F. de, & Veronez, M. R. (2011). Adjustment of observations: a geometric interpretation for the least squares method. Boletim de Ciências Geodésicas, 17(2), 272–294.

Koch, K.-R. (1999). Parameter Estimation and Hypothesis Testing in Linear Models. [Crossref]

Lehmann, R., & Scheffler, T. (2011). Monte Carlo-based data snooping with application to a geodetic network. Journal of Applied Geodesy, 5(3–4). [Crossref]

Lehmann, R. (2012). Improved critical values for extreme normalized and studentized residuals in Gauss-Markov models. Journal of Geodesy, 86(12), 1137–1146. [Crossref]

Lehmann, R. (2012). On the formulation of the alternative hypothesis for geodetic outlier detection. Journal of Geodesy, 87(4), 373–386. [Crossref]

Lehmann, R., & Lösler, M. (2016). Multiple Outlier Detection: Hypothesis Tests versus Model Selection by Information Criteria. Journal of Surveying Engineering, 142(4), 4016017. [Crossref]

Lehmann, R., & Voß-Böhme, A. (2017). On the statistical power of Baarda's outlier test and some alternative. Journal of Geodetic Science, 7(1), 68–78. [Crossref]

Niemeier, W., & Tengen, D. (2017). Uncertainty assessment in geodetic network adjustment by combining

GUM and Monte-Carlo-simulations. Journal of Applied Geodesy, 11(2), 67–76. [Crossref]

Rofatto, V. F., Matsuoka, M. T., & Klein, I. (2017). An Attempt to Analyse Baarda’s Iterative Data Snooping

Procedure based on Monte Carlo Simulation. South African Journal of Geomatics, 6(3), 416. [Crossref]

Rousseeuw, P. J., & Leroy, A. M. (1987). Robust Regression and Outlier Detection. [Crossref]

Teunissen, P. J. G. (2003). Adjustment theory. VSSD.

Teunissen, P. J. G. (2006). Testing theory. VSSD.

Wilcox, R. R. (2011). Introduction to robust estimation and hypothesis testing. Academic press.

Yang, L., Wang, J., Knight, N. L., & Shen, Y. (2013). Outlier separability analysis with a multiple alternative

hypotheses test. Journal of Geodesy, 87(6), 591–604. [Crossref]