## No full-text available

To read the full-text of this research, you can request a copy directly from the authors.

Data snooping is one of the best established methods of gross error
detection in geodetic data analysis. Since it is based on hypothesis
testing, it requires the choice of levels of error probability. This
choice is often, to some degree, arbitrary. If the levels chosen are too
high, we run the risk of losing many good measurements that are not
actually contaminated by gross errors. If the levels chosen are too low,
we run the risk of leaving gross errors undetected. We propose to choose
levels of error probability such that the desired parameters are best
estimated in some sense. This can be done using the Monte Carlo method.
We applied this procedure to a geodetic precision network from
construction of a diversion tunnel. Depending on the stochastic model of
the measurement process, we observed a gain of such an optimal choice of
a few percent of the mean point standard deviation. This comes at a
price of considerable computer time consumption. Even on a fast
computer, a typical computation of a medium-sized geodetic network may
take several minutes.
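The idea of the abstract above can be sketched numerically. The following toy Python sketch is illustrative only (it is not the paper's network or software; the single-unknown model, parameter values, and one-pass rejection rule are all simplifying assumptions): each trial error-probability level is scored by the empirical mean squared error of the estimate after a data-snooping pass, and the level with the smallest error is selected.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)

def empirical_mse(alpha, n=10, sigma=1.0, eps=0.05, bias=5.0, trials=2000):
    """Empirical mean squared error of the estimate after one data-snooping
    pass, for a toy model: n repeated observations of one unknown (true
    value 0), each contaminated by a gross error with probability eps."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    sq_errors = np.empty(trials)
    for t in range(trials):
        y = rng.normal(0.0, sigma, n)
        y += (rng.random(n) < eps) * bias        # inject gross errors
        w = np.abs(y - y.mean()) / sigma         # crude standardized residuals
        keep = w <= crit
        if keep.any():
            y = y[keep]                          # reject flagged observations
        sq_errors[t] = y.mean() ** 2             # squared estimation error
    return sq_errors.mean()

# Score a set of trial levels and select the best one, as in the paper's idea.
alphas = [0.001, 0.01, 0.05, 0.1, 0.2]
scores = {a: empirical_mse(a) for a in alphas}
alpha_opt = min(scores, key=scores.get)
```

As the abstract notes, the price of such an optimization is computing time: the error-probability level is tuned by brute-force simulation rather than derived analytically.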


... In addition, global model weakness requires replacing the current model with a new or revised model for the entire sample. In this case, mixture models are usually used to incorporate both inliers and outliers (Hawkins 1980; Hekimoglu and Koch 1999; Lehmann 2013; Lehmann and Scheffler 2011). Two mixture models are commonly used: the location-contaminated model and the scale-contaminated model (Lehmann 2013). ...

... This is the so-called mixture model or contaminated distribution (Hekimoglu and Koch 1999; Lehmann 2013; Lehmann and Scheffler 2011; Yang 1991). Fig. 3 illustrates the distribution of inlying, outlying, and total observations in a mixture model. ...

... However, for the mixture model, the BG model parameters are known in advance or estimated simultaneously. It is exactly for this reason that mixture models contain more information about gross errors and have more potential to resist outliers (Hekimoglu and Koch 1999; Koch 2013; Lehmann 2013; Lehmann and Scheffler 2011; Yu et al. 2023). Therefore, the following discussion mainly focuses on the mixture models. ...

First, this paper introduces a statistical model of gross errors, namely the Bernoulli–Gaussian (BG) model, which characterizes the gross error as a product of a Bernoulli variable and a Gaussian variable. The BG model offers a framework to interpret various causes of outliers through the perspective of gross errors. In addition, it unifies commonly used observation models for outliers by adjusting the range of BG model parameters. Second, this paper proposes an estimation method for BG model parameters based on the expectation maximization (EM) algorithm. This approach attributes different gross error parameters for distinct types of observations, facilitating parameter estimation in both single-source and multisource observation systems. Additionally, by organizing equations in the form of individual observations, its applicability can be broadened to both static and dynamic scenarios. Finally, a normal sample example and a Global Navigation Satellite System (GNSS) positioning example verified the effectiveness of the proposed method for estimating the BG model parameters.
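The BG construction itself is easy to state in code. The sketch below is an illustration of the product form described in the abstract, not the paper's implementation; the parameter values (contamination probability, gross-error standard deviation) are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bg_gross_errors(m, p=0.1, sigma_g=5.0):
    """Bernoulli-Gaussian gross errors: b_i ~ Bernoulli(p), g_i ~ N(0, sigma_g^2);
    the gross error is the product b_i * g_i, i.e. zero for inliers and
    Gaussian with a large variance for outliers."""
    b = rng.random(m) < p                 # Bernoulli indicator of contamination
    g = rng.normal(0.0, sigma_g, m)       # Gaussian gross-error magnitude
    return b * g

# observations = measurement noise + BG gross error (true signal taken as 0)
m = 100_000
y = rng.normal(0.0, 1.0, m) + sample_bg_gross_errors(m, p=0.1, sigma_g=5.0)
```

About a fraction p of the samples carry a gross error; the rest follow the nominal noise model, which is exactly the mixture interpretation discussed in the excerpts above.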

... For modern geodetic applications, there are typically large datasets that are very likely to contain multiple outliers. In this case, one of the most commonly used methods is to implement the data snooping procedure iteratively, processing outliers one by one (Mickey et al. 1967; Barnett and Lewis 1978; Gentle 1978), which is known as iterative data snooping (IDS) (Kok 1984; Lehmann and Scheffler 2011; Rofatto et al. 2017; Klein et al. 2022). However, this approach is not theoretically rigorous, since it assumes that there is only one outlier in each iteration, an assumption that is immediately violated in the next iteration (Lehmann and Lösler 2016). ...

... The procedure for screening each observation for an outlier is known as "data snooping" (Kok 1984; Teunissen 2000). Furthermore, in the case of multiple outliers, the data snooping procedure can be implemented iteratively to process outliers one by one, which is known as iterative data snooping (IDS) (Kok 1984; Lehmann and Scheffler 2011; Rofatto et al. 2017; Klein et al. 2022). ...

... Note that q_min is usually set to 0 to avoid dropping any inliers in most cases (Kok 1984; Lehmann and Scheffler 2011; Rofatto et al. 2017; Klein et al. 2022). As shown in Fig. 7, the iteration procedure in IDS is organized as follows. Assuming that in the t-th iteration I_0^(t) and O_0^(t) are the initial inlying and outlying sets, respectively, one then constructs test statistics w_k^(t) for each observation y_k in the inlying set, together with the testing set {y_k}, in which the numbers of elements are q_min + t − 1, m − q_min − t, and 1, respectively. ...

The issue of outliers has been a research focus in the field of geodesy. Based on a statistical testing method known as the w-test, data snooping, along with its iterative form, iterative data snooping (IDS), is commonly used to diagnose outliers in linear models. However, in the case of multiple outliers, it may suffer from the masking and swamping effects, which limit its detection and identification capabilities. This contribution investigates the cause of the masking and swamping effects and proposes a new method to mitigate these phenomena. First, based on data division, an extended form of the w-test with its reliability measure is presented, and a theoretical reinterpretation of data snooping and IDS is provided. Then, to alleviate the effects of masking and swamping, a new outlier diagnostic method and its iterative form are proposed, namely data refining and iterative data refining (IDR). In general, if the total observations are initially divided into an inlying set and an outlying set, data snooping can be considered a process of moving outliers from the inlying set to the outlying set. Conversely, data refining is the reverse process of transferring inliers from the outlying set to the inlying one. Both theoretical analysis and practical examples show that IDR retains stronger robustness than IDS due to the alleviation of the masking and swamping effects, although it may pose a higher risk of precision loss when dealing with insufficient data.
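The iterative scheme discussed above can be sketched compactly. The toy Python below is a sketch under strong simplifying assumptions (one unknown mean, known sigma, residual redundancy ignored): it removes the observation with the largest w-value while that value exceeds the critical value, which is exactly the one-outlier-per-iteration logic whose masking/swamping weaknesses the abstract analyses.

```python
import numpy as np
from statistics import NormalDist

def iterative_data_snooping(y, sigma=1.0, alpha=0.001):
    """Toy IDS: at each iteration, test the largest normalized residual;
    if it exceeds the critical value, move that observation to the
    outlying set and re-adjust; stop as soon as no test rejects."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    inliers = list(range(len(y)))
    outliers = []
    while len(inliers) > 1:
        yi = y[inliers]
        w = np.abs(yi - yi.mean()) / sigma     # simplified w-test statistics
        k = int(np.argmax(w))
        if w[k] <= crit:
            break                              # no rejection: stop iterating
        outliers.append(inliers.pop(k))        # move obs to the outlying set
    return inliers, outliers

inl, outl = iterative_data_snooping(np.array([0.1, -0.2, 0.05, 8.0]))
```

In this tiny example the gross observation 8.0 is moved to the outlying set in the first iteration, and the procedure stops once the remaining residuals pass the test.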

... In this paper, we consider iterative data snooping (IDS), which is the most common procedure found in the geodetic practice [12,33]. Most conventional geodetic studies have a chapter on IDS (see, e.g., [34,35]). ...

... The Monte Carlo method provides insights into these cases, in which analytical solutions are too complex to fully understand, are doubted for one reason or another or are not available [12]. The Monte Carlo method for quality control purposes has already been applied in geodesy (see, e.g., [2,10,22,23,33,46,48-51]). For in-depth coverage of Monte Carlo methods, consult, for instance, [52-54]. ...

... In essence, Monte Carlo replaces random variables with computer-generated artificial random numbers (ARN), probabilities with relative frequencies and expectations with arithmetic means over large sets of such numbers [12]. A computation with one set of ARN is a Monte Carlo experiment [33]. ...
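That substitution is simple to demonstrate. In the hypothetical snippet below, a tail probability of the standard normal is replaced by a relative frequency, and an expectation by an arithmetic mean, over one large set of artificial random numbers:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000                         # size of one set of artificial random numbers
x = rng.normal(0.0, 1.0, N)           # one Monte Carlo experiment

p_tail = np.mean(np.abs(x) > 1.96)    # probability -> relative frequency (~0.05)
e_x2 = np.mean(x ** 2)                # expectation E[x^2] -> arithmetic mean (~1.0)
```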

An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of modern measurement systems in geodesy and surveying. The test statistic associated with IDS is the extreme normalised least-squares residual. It is well known in the literature that critical values (quantile values) of such a test statistic cannot be derived from well-known test distributions but must be computed numerically by means of Monte Carlo. This paper provides the first results on the Monte Carlo-based critical value inserted into different scenarios of correlation between outlier statistics. From the Monte Carlo evaluation, we compute the probabilities of correct identification, missed detection, wrong exclusion, over-identification and statistical overlap associated with IDS in the presence of a single outlier. On the basis of such probability levels, we obtain the Minimal Detectable Bias (MDB) and Minimal Identifiable Bias (MIB) for cases in which IDS is in play. The MDB and MIB are sensitivity indicators for outlier detection and identification, respectively. The results show that there are circumstances in which the larger the Type I decision error (i.e. the smaller the critical value), the higher the rates of outlier detection but the lower the rates of outlier identification. In such a case, the larger the Type I error, the larger the ratio between the MIB and MDB. We also highlight that an outlier becomes identifiable when the contributions of the measurements to the wrong exclusion rate decline simultaneously. In this case, we verify that the effect of the correlation between outlier statistics on the wrong exclusion rate becomes insignificant for a certain outlier magnitude, which increases the probability of identification.
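For the one-dimensional w-test with known variance, the MDB mentioned above has a closed form due to Baarda. The sketch below states that standard textbook relation (it is not code from this paper, and the example parameter values are illustrative): the non-centrality parameter follows from the Type I error level and the required power, and the MDB scales with the observation's standard deviation and redundancy number r_i.

```python
from math import sqrt
from statistics import NormalDist

def minimal_detectable_bias(alpha, power, sigma, redundancy):
    """Baarda-style MDB for the one-dimensional w-test with known variance:
    sqrt(lambda0) = z_{1-alpha/2} + z_{power}; MDB = sigma*sqrt(lambda0/r_i)."""
    nd = NormalDist()
    sqrt_lambda0 = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return sigma * sqrt_lambda0 / sqrt(redundancy)

# Baarda's classic setting: alpha = 0.1%, power = 80% -> lambda0 ~= 17.07,
# so an observation with sigma = 1 and redundancy 0.5 has MDB ~= 5.8 sigma.
mdb = minimal_detectable_bias(alpha=0.001, power=0.80, sigma=1.0, redundancy=0.5)
```

The point of the paper above is precisely that, once IDS with its max-w statistic is in play, such closed forms no longer suffice and the probability levels must come from Monte Carlo.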

... We will show how to improve the computation of such critical values. Lehmann (2010) and Lehmann and Scheffler (2011) pose the problem of determining the optimal levels of type I error probabilities for global and local tests in data snooping. If these levels are chosen too low, the critical values become too large and many outliers remain undetected. ...

... Those methods have already been applied in outlier detection (e.g. Koch 2007, Lehmann and Scheffler 2011). ...

... We make a statement concerning the use of the approach proposed in (Lehmann 2010) and (Lehmann and Scheffler 2011), called "Monte Carlo based data snooping", and show how it relates to the subject of this paper. It finds the optimum level of error probability as follows: for a number of trial levels α_i, i = 1, …, M, the posterior variance of the estimated parameters is computed, and the optimum α*, i.e. the value α_i for which the posterior variance of the estimated parameters is minimal, is selected and possibly refined by some interpolation. ...

We investigate extreme studentized and normalized residuals as test statistics for outlier detection in the Gauss–Markov model possibly not of full rank. We show how critical values (quantile values) of such test statistics are derived from the probability distribution of a single studentized or normalized residual by dividing the level of error probability by the number of residuals. This derivation neglects dependencies between the residuals. We suggest improving this by a procedure based on the Monte Carlo method for the numerical computation of such critical values up to arbitrary precision. Results for free leveling networks reveal significant differences to the values used so far. We also show how to compute those critical values for non-normal error distributions. The results prove that the critical values are very sensitive to the type of error distribution.
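The two approaches compared in the abstract can be illustrated side by side. The Python sketch below uses a deliberately simple adjustment (a single mean estimated from n observations; an assumption for the demo, not the paper's leveling networks): the division of the error probability by the number of residuals gives a Bonferroni-style critical value, while the Monte Carlo route estimates the quantile of the actual maximum absolute normalized residual, dependencies included.

```python
import numpy as np
from statistics import NormalDist

def critical_values(n=20, alpha=0.05, trials=100_000, seed=2):
    """Compare the critical value obtained by dividing alpha by the number
    of residuals with a Monte Carlo quantile of the actual maximum absolute
    normalized residual, for a toy adjustment estimating one mean."""
    bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * n))
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, 1.0, (trials, n))
    v = y - y.mean(axis=1, keepdims=True)     # least-squares residuals
    w = np.abs(v) / np.sqrt(1 - 1 / n)        # normalized residuals, N(0,1) each
    monte_carlo = np.quantile(w.max(axis=1), 1 - alpha)
    return bonferroni, monte_carlo

c_bonf, c_mc = critical_values()
```

In this weakly correlated toy case the two values nearly coincide; the abstract's point is that for real networks with stronger residual dependencies and non-normal errors the difference becomes significant.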

... In essence, the MMC replaces random variables by computer PRNGs, probabilities by relative frequencies, and expectations by arithmetic means over large sets of such numbers. A computation with one set of PRNG is a Monte Carlo experiment (Lehmann and Scheffler 2011), also referred to as the number of Monte Carlo simulations (Altiok and Melamed 2007; Gamerman and Lopes 2006). ...


A geodetic network is a network of points interconnected by direction and/or distance measurements or by using Global Navigation Satellite System receivers. Such networks are essential for most geodetic engineering projects, such as monitoring the position and deformation of man-made structures (bridges, dams, power plants, tunnels, ports, etc.), monitoring the crustal deformation of the Earth, implementing an urban and rural cadastre, and others. One of the most important criteria that a geodetic network must meet is reliability. In this context, reliability concerns the network's ability to detect and identify outliers. Here, we apply the Monte Carlo method (MMC) to investigate the reliability of a geodetic network. The key to the MMC is the random number generator. Results for a simulated closed levelling network reveal that identifying an outlier is more difficult than detecting it. In general, considering the simulated network, the relationship between outlier detection and identification depends on the level of significance of the outlier statistical test.

... The basic idea is to approximate probability distributions by frequency distributions of computer random experiments performed using pseudo-random number generators (Koch 2007, Lehmann 2012). A computation with one set of pseudo-random number generators is a Monte Carlo experiment (Lehmann and Scheffler 2011). For more details about the MC, see, for example, Altiok and Melamed (2007), Robert and Casella (2013), Gamerman and Lopes (2006). ...

... Therefore, such obstacles no longer exist, because computing power is not a bottleneck at present (Lehmann 2015). In this context, the MC has been extensively applied in geodesy (Lehmann 1994, Hekimoglu and Koch 1999, Koch 2007, Lehmann and Scheffler 2011, Aydin 2012, Lehmann 2012, Yang et al. 2013, Marx 2015, Prószyński 2015, Koch 2016, Lehmann and Voß-Böhme 2017, Rofatto et al. 2017, Imparato et al. 2018). ...

Over the 50 years of its existence, Baarda's concept of reliability has been used as a standard practice for quality control in geodesy and surveying. In this study, we analysed the pioneering work of Baarda (Publ Geod New Ser 2(4) 1967; Publ Geod New Ser 2(5) 1968) and recent studies on the subject. We highlighted that the advent of personal computers with powerful processors has rendered the Monte Carlo method an attractive and cost-effective approach for quality control purposes. We also provided an overview of the latest advances in reliability theory for geodesy, with particular emphasis on the Monte Carlo method.

... (Bulletin of Geodetic Sciences, 24(2): 152-170, Apr-Jun, 2018) For example, Ryan and Lachapelle (2001) used simulations to obtain the minimal detectable bias polygon for the case of two outliers; Lehmann and Scheffler (2011) used the MCS method to solve the problem of how to determine the optimal levels of Type I error probabilities for global and local tests in DS; Lehmann (2012) used the MCS method to improve the critical values of the test statistics; Niemeier and Tengen (2017) extended the classical concept of geodetic network adjustment by combining uncertainty modeling and MCS. ...

... The uniform distribution is a rectangular distribution with constant probability, implying that each range of values of the same length on the distribution's support has equal probability of occurrence (see e.g. Lehmann and Scheffler, 2011). For example, for 10,000 Monte Carlo experiments, if the user chooses a magnitude interval for the outliers of |3σ| to |9σ|, the probability of a +3σ error occurring is virtually the same as that of a −3σ error, and so on. ...

We present a numerical simulation method for designing geodetic networks. The quality criterion considered is based on the power of the test of the data snooping testing procedure. This criterion expresses the probability of data snooping correctly identifying an outlier. In general, the power of the test is defined theoretically. However, with the advent of fast computers and large data storage systems, it can be estimated using numerical simulation. Here, the number of experiments in which the data snooping procedure identifies the outlier correctly is counted using Monte Carlo simulations. If the network configuration does not meet the reliability criterion at some part, it can be improved by adding the required observations to the surveying plan. The method does not use real observations. Thus, it depends on the geometrical configuration of the network, the uncertainty of the observations, and the size of the outlier. The proposed method is demonstrated by a practical application to one simulated leveling network. Results showed the need for five additional observations between adjacent stations. The addition of these new observations improved the internal reliability by approximately 18%. Therefore, the final designed network must be able to identify and resist the undetectable outliers, according to the probability levels.
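The counting step at the heart of this design method can be sketched as follows. The Python below is a toy stand-in (one unknown estimated from n observations replaces the leveling network; all parameter values are illustrative): inject an outlier of a given size into one observation, run a data-snooping pass, and report the relative frequency with which the contaminated observation is correctly identified.

```python
import numpy as np
from statistics import NormalDist

def identification_rate(outlier, n=10, alpha=0.001, sigma=1.0,
                        trials=20_000, seed=3):
    """Fraction of Monte Carlo experiments in which data snooping flags the
    contaminated observation (toy model: n observations of one unknown)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rng = np.random.default_rng(seed)
    y = rng.normal(0.0, sigma, (trials, n))
    y[:, 0] += outlier                               # inject outlier into obs 0
    v = y - y.mean(axis=1, keepdims=True)
    w = np.abs(v) / (sigma * np.sqrt(1 - 1 / n))     # normalized residuals
    k = w.argmax(axis=1)                             # most suspicious observation
    hit = (k == 0) & (w.max(axis=1) > crit)          # correct identification
    return hit.mean()
```

In a design loop, one would re-run such a simulation after each candidate change to the observation plan and keep adding observations until the identification rate meets the required reliability level.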

... The presented procedures are not the only solutions to the outlying observation problem in geodetic data processing. One can find others based on or using, for example, Bayesian estimation [44], empirical influence function analysis [41], Monte Carlo simulations [4,47,48], a changed stochastic observation model [49], etc. No doubt their efficiency or effectiveness, and hence a proper choice, depends on the particular geodetic problem or network, the type of observations involved, and also the type, number, location, and magnitude of the outliers. ...

Outlying observations are undesirable but possible elements of geodetic measurements. In such a context, the primary and trivial solution is to repeat "suspected" observations. The question arises: what if the measurements cannot be performed again, or if one cannot flag outliers easily and efficiently? In such a case, one should process the data by applying methods that consider the possible occurrence of outlying observations. Historically, except for some earlier attempts, the statistical approach to robust estimation originates in the 1960s and refers to the pioneering papers of Huber, Tukey, Hampel, Hodges, and Lehmann. The statistical procedures known as data snooping (data dredging) were developed at a similar time. Before long, robust procedures were implemented for processing geodetic observations or the adjustment of observation systems. The first works of Baarda and Pope encouraged other scientists and surveyors to elaborate robust procedures adapted to geodetic or surveying problems, which resulted in their rapid development in the last two decades of the 20th century. The question for the 21st century is whether robustness is still an important issue in relation to modern measurement technologies and numerical data processing. One should realize that modern geodetic techniques do not decrease the probability of outlier occurrence. Considering measurement systems that yield big data, it is almost certain that outliers occur somewhere. The paper reviews different approaches to the robust processing of geodetic observations, from data snooping methods, random sampling, M-estimation, R-estimation, and Msplit estimation to robust estimation of the variance coefficient. Such a variety reflects the different natures, origins, and properties of outliers and the apparent fact that there is no single best, most efficient, and universal robust approach. The methods presented are indeed the basis for future solutions based on, e.g., machine learning.

... For modern geodetic applications, there are typically large datasets that are very likely to contain multiple outliers. In this case, one of the most commonly used methods is to implement the data snooping procedure iteratively, processing outliers one after the other (Mickey et al. 1967; Barnett and Lewis 1978; Gentle 1978), which is known as iterative data snooping (IDS) (Kok 1984; Lehmann and Scheffler 2011; Rofatto et al. 2017; Klein et al. 2022). However, this approach is not theoretically rigorous, since it assumes that there is only one outlier in each iteration, an assumption that is immediately violated in the next iteration (Lehmann and Lösler 2016). ...

The issue of outliers has been a research focus in the field of geodesy. Based on a statistical testing method known as the w-test, data snooping, along with its iterative form, iterative data snooping (IDS), is commonly used to diagnose outliers in linear models. However, in the case of multiple outliers, it may suffer from the masking and swamping effects, which limit its detection and identification capabilities. This contribution investigates the cause of the masking and swamping effects and proposes a new method to mitigate these phenomena. First, based on data division, an extended form of the w-test with its reliability measure is proposed, and a theoretical reinterpretation of data snooping and IDS is provided. Then, to alleviate the effects of masking and swamping, a new outlier diagnostic method and its iterative form are presented, namely data refining and iterative data refining (IDR). In general, if the total observations are initially divided into an inlying set and an outlying set, data snooping can be considered a process of moving outliers from a preset inlying set to the outlying set. In contrast, data refining is the reverse process of transferring inliers from an outlying set to the inlying one. Both theoretical analysis and application examples show that IDR retains stronger robustness than IDS due to the alleviation of the masking and swamping effects, although it may pose a higher risk of precision loss when dealing with insufficient data.

... For modern geodetic applications, there are typically large datasets that are very likely to contain multiple outliers. In this case, one of the most commonly used methods is to implement the data snooping procedure consecutively, processing outliers one after the other (Mickey et al. 1967; Barnett and Lewis 1978; Gentle 1978), which is known as iterative data snooping (IDS) (Kok 1984; Lehmann and Scheffler 2011; Rofatto et al. 2017; Klein et al. 2022). However, this approach is not theoretically rigorous, since it assumes that there is only one outlier in each iteration, an assumption that is immediately violated in the next iteration (Lehmann and Lösler 2016). ...

The issue of dealing with outliers has been a research focus in the field of geodesy. Based on the statistical testing method known as the w-test, data snooping, along with its iterative form, iterative data snooping (IDS), is commonly used to diagnose outliers in linear models. However, in the case of multiple outliers, these methods may suffer from the masking and swamping effects, which limit their detection and identification capabilities. This contribution investigates the cause of the masking and swamping effects and proposes a new method to mitigate these phenomena. First, based on data division, an extended form of the w-test with its reliability measure is proposed, and a theoretical reinterpretation of data snooping and IDS is provided. Then, to alleviate the effects of masking and swamping, a new outlier diagnostic method and its iterative form are presented, namely data refining and iterative data refining (IDR). In general, if the total observations are initially divided into an inlying set and an outlying set, data snooping can be considered a process of moving outliers from a preset inlying set to the outlying set. In contrast, data refining is the reverse process of transferring inliers from an outlying set to the inlying one. Both theoretical analysis and actual application examples show that IDR retains stronger robustness than IDS due to the alleviation of the masking and swamping effects, although it may pose a higher risk of accuracy loss when dealing with insufficient data.

... However, we may consider two main solutions when processing observation sets including outliers. The first is data cleaning (in other words, hypothesis testing, e.g., [23-25]); the second is the application of robust estimation methods, e.g., robust M-estimation and R-estimation, e.g., [26,27], which are well known and successfully applied in surveying engineering, e.g., [28-33]. Alternatively, one can also use Msplit estimation, a modern development of M-estimation. ...

Terrestrial laser scanning (TLS) is a modern measurement technique that provides a point cloud in a relatively short time. TLS data are usually processed using different methods in order to obtain the final result (infrastructure or terrain models). Msplit estimation is a modern method successfully applied for such purposes. This paper addresses the possible application of the method in processing TLS data from two different epochs to model a vertical displacement of terrain resulting, for example, from landslides or mining damage. Msplit estimation can be performed in two variants (the squared or absolute method) and two scenarios (two point clouds or one combined point cloud). One should understand that point clouds usually contain outliers of different origins. Therefore, this paper considers the contamination of TLS data by positive and/or negative outliers. The results based on simulated data prove that absolute Msplit estimation provides better results and outperforms conventional estimation methods (least-squares or robust M-estimation). In practice, processing the point clouds separately seems to be the better option. This paper proved that Msplit estimation is a compelling alternative to conventional methods, as it can be applied to process TLS data disturbed by outliers of different types.

... (3) In order to have w-test statistics under H0, uniformly distributed random number sequences are produced by the Mersenne Twister algorithm and then transformed into a normal distribution using the Box-Muller transformation (Box and Muller 1958). Box-Muller has already been used in geodesy for Monte Carlo experiments (Lemeshko and Lemeshko 2005; Lehmann and Scheffler 2011; Lehmann 2012). Therefore, a sequence of m random vectors from the pdf assigned to the w-test statistics is generated according to (16). ...
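The Box-Muller transformation mentioned in the excerpt is short enough to show in full. The sketch below uses NumPy's default generator as a stand-in for the Mersenne Twister (an implementation detail assumed for the demo, not taken from the cited paper):

```python
import numpy as np

def box_muller(u1, u2):
    """Box-Muller transform: maps two independent U(0,1) samples to two
    independent standard normal samples."""
    r = np.sqrt(-2.0 * np.log(u1))
    theta = 2.0 * np.pi * u2
    return r * np.cos(theta), r * np.sin(theta)

rng = np.random.default_rng(4)          # stand-in for a Mersenne Twister PRNG
u1 = 1.0 - rng.random(500_000)          # shift [0,1) to (0,1] so log(u1) is finite
u2 = rng.random(500_000)
z1, z2 = box_muller(u1, u2)
```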

Data snooping is the best-established method for identifying outliers in geodetic data analysis. It has been demonstrated in the literature that, to effectively control the type I error rate, critical values must be computed numerically by means of Monte Carlo. Here, on the other hand, we provide a model based on an artificial neural network. The results prove that the proposed model can be used to compute the critical values, and therefore it is no longer necessary to rerun the Monte Carlo computation of critical values every time quality control is performed by means of data snooping.

... evaluation methods such as Monte Carlo simulation have been widely applied as an alternative for decades (Yang et al. 2013, 2017; Koch 2015; Lehmann and Scheffler 2011; Lehmann 2012), but the large computational burden limits most real-time applications, such as GNSS positioning. In this section, the interdependence of these quality control indices is first discussed. ...

Based on the unifying framework of the detection, identification and adaption (DIA) estimators, quality control indices are refined and formulated by taking the uncertainty of the combined estimation-testing procedure into account and performing the propagation of uncertainty. These indices are used to measure the confidence levels of the testing decisions, the reliability of the specified alternative hypothesis models, as well as the biasedness and dispersion of the estimated parameters. A simplified algebraic estimation (SAE) method is developed to calculate these quality control indices for the application of single outlier DIA. Compared to the conventional Monte Carlo simulation method, the proposed SAE method can achieve an adequate estimation accuracy and significantly higher computation efficiency. Using a GNSS single-point positioning example, the performance of the SAE method is evaluated and the quality control of the conventionally used DIA estimator is demonstrated for practical applications.

... We can consider it as a result of variance inflation. The PDF can be expressed as (Lehmann and Scheffler 2011) ...

Ultra-wideband (UWB) is well suited for indoor positioning due to its high resolution and good penetration through objects. As one of the nonlinear filter algorithms, the unscented Kalman filter (UKF) is widely used to estimate the position. However, the UKF cannot resist the effect of outliers, and the performance of the filter algorithm is inevitably degraded. In this study, a robust UKF (RUKF) method combining hypothesis testing and robust estimation is proposed. Furthermore, simulation and measurement experiments are performed to verify the effectiveness and feasibility of the proposed RUKF. Simulation results demonstrate that the RUKF can effectively control the influence of outliers treated as systematic errors or large-variance random errors. When the outliers come from a thick-tailed distribution, the robust estimation does not play a role, and the RUKF does not work well. The measurement results show that outliers are generated in the non-line-of-sight environment, where their impact is abnormally serious. The robust estimation can provide relatively reliable optimised residuals and control the influence of outliers caused by gross errors. These results indicate that the proposed RUKF effectively resists the effects of outliers and improves the positioning accuracy.

... We can consider it as a result of variance inflation. The PDF can be expressed as [37] ...

Ultra-wideband (UWB) is well suited for indoor positioning due to its high resolution and good penetration through objects. The observation model of UWB positioning is nonlinear. As one of the nonlinear filter algorithms, the extended Kalman filter (EKF) is widely used to estimate the position. In practical applications, the dynamic estimation is subject to outliers caused by gross errors. However, the EKF cannot resist the effect of gross errors: the innovation becomes abnormally large, and the performance and reliability of the filter algorithm are inevitably degraded. In this study, a robust EKF (REKF) method combining hypothesis testing and robust estimation is proposed. To judge the validity of the model, a global test based on the Mahalanobis distance is implemented to assess whether the test statistic exceeds the threshold for outlier detection. To reduce and eliminate the effects of individual outliers, robust estimation using scheme III of the Institute of Geodesy and Geophysics of China (IGGIII), based on a local test of the normalized residual, is performed. Meanwhile, three kinds of stochastic models for outliers are expressed by modeling the contaminated distributions. Furthermore, simulation and measurement experiments are performed to verify the effectiveness and feasibility of the proposed REKF for resisting outliers. Simulation results demonstrate that outliers following all three kinds of contaminated distributions can be detected. The proposed REKF can effectively control the influence of outliers treated as systematic errors or large-variance random errors. When the outliers come from a thick-tailed distribution, the robust estimation does not play a role, and the REKF is equivalent to the EKF. The measurement results show that outliers are generated in the non-line-of-sight environment, where their impact is abnormally serious. The robust estimation can provide relatively reliable optimized residuals and control the influence of outliers caused by gross errors. These results indicate that the proposed REKF effectively resists the effects of outliers and improves the positioning accuracy compared with the least-squares (LS) and EKF methods. Moreover, an adaptive filter and a ranging error model should be considered to compensate for the state model errors and ranging systematic errors, respectively. Then, the measurement outliers can be detected more correctly, and the robust estimation can be used more effectively.
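The IGGIII scheme referred to above is a three-segment weight function of the standardized residual. The sketch below states the commonly published segment formula; the tuning constants k0 and k1 are typical textbook choices, not necessarily the values used in this paper.

```python
def iggiii_weight(w, k0=1.5, k1=3.0):
    """IGGIII down-weighting factor as a function of the standardized
    residual w: keep (|w| <= k0), down-weight (k0 < |w| <= k1),
    reject (|w| > k1). k0 and k1 are tuning constants."""
    aw = abs(w)
    if aw <= k0:
        return 1.0                                   # observation kept as-is
    if aw <= k1:
        return (k0 / aw) * ((k1 - aw) / (k1 - k0)) ** 2   # smooth down-weighting
    return 0.0                                       # observation rejected
```

In a robust filter, the observation's weight (inverse variance) is multiplied by this factor at each iteration, so suspected outliers lose influence gradually rather than being removed by a hard threshold.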
1. Introduction
High-accuracy position information is of great importance in location-based services (LBS). Owing to its large bandwidth, ultrawideband (UWB) enables reliable, high-resolution distance estimation [1]. Therefore, UWB is well suited for indoor positioning applications. The observation model of UWB positioning is nonlinear. Approximate solutions can be obtained iteratively based on a Taylor expansion of the nonlinear distance equations [2, 3]. As a standard method for solving general nonlinear equations, the Gauss–Newton iteration is efficient and has a linear convergence rate for points close to the solution [4]. However, in this procedure, only the measurements at a discrete epoch are employed to estimate the positions; this approach wastes the useful state model information that describes the dynamic process. The Kalman filter (KF) has been applied in the area of dynamic positioning because it makes full use of the information in the state model. When the process and measurement noises are Gaussian distributed, it can be proven that the KF is unbiased, consistent, and optimal for a linear system [5]. In practical applications, most dynamic systems are nonlinear, and many nonlinear filter algorithms have been developed, such as the extended Kalman filter (EKF), the unscented Kalman filter (UKF), and the particle filter (PF) [6]. As a standard filter algorithm for nonlinear estimation, the EKF simply linearizes the nonlinear observation equation: the distribution is propagated through the first-order expansion of the nonlinear system. Because the higher-order terms of the expansion are neglected, a linearization error is introduced [7].
In practical applications, an unbiased estimate of the positioning result is expected. On the one hand, the estimation of filter algorithms is affected by model errors: when the dynamic system model is not accurate, the performance and reliability of the filter algorithms are inevitably degraded. To compensate for the model errors, one common way is to introduce corresponding parameters into the state and observation functional models [8]. However, this approach may introduce wrong or insufficient parameters; besides, too many input parameters lead to a high-dimensional state vector, which may result in an increased computational load and a rank-deficient model [9]. An adaptive fitting method for the systematic errors of the observations and the kinematic model errors has been presented to resist the influence of systematic errors on the estimated navigation states; the systematic errors are fitted with a mean or a weighted mean using the residuals of the observations and of the predicted states within a chosen time window [10, 11]. Another way of compensating for the model errors is to introduce a suitable covariance matrix in the stochastic model. However, good prior knowledge of the process and measurement information is difficult to obtain, and an ineffective covariance matrix will cause greater error or even filter divergence. An innovation-based adaptive Kalman filter for integrated navigation has been developed, based on the maximum likelihood criterion for the proper choice of the filter weight [12]. An adaptive fading Kalman filter based on the Mahalanobis distance has also been proposed; this method has a stronger ability to track the true state than the standard Kalman filter in the presence of modeling errors [13].
On the other hand, the positioning estimation is subject to different environmental error factors, including signal blockage, multipath, and thermal noise. According to its cause, the error can be divided into gross error, systematic error, and random error. The gross error differs greatly from the assumed stochastic model and is a small-probability event. The systematic error, which has an exact value or varies according to some law, mainly affects the accuracy of the measurement. The random error occurs randomly, obeys the statistical model, and has a certain influence on the precision. Data contaminated by outliers follow a non-Gaussian, heavy-tailed distribution [14]. Two categories of advanced techniques have been developed for treating observations contaminated by outliers: one is outlier detection based on statistical testing, and the other is robust estimation [15]. Statistical testing for model errors, outliers, and biases usually consists of detection, identification, and adaptation (DIA) steps, an important diagnostic tool for data quality control [16]. The DIA method combines parameter estimation with hypothesis testing: estimation is conducted to find estimates of the parameters of interest, and testing is conducted to remove any biases that may be present [17]. Under the Gaussian assumption, the weighted sum of squared residuals follows the noncentral Chi-squared distribution for the global model test [18, 19]. To screen each individual observation for an outlier, data snooping based on the local model test is implemented [20, 21]. A robust Kalman filter scheme based on the Chi-square test has been proposed to effectively resist the influence of observation errors, including outliers in the actual observations and heavy-tailed observation noise, so that robustness can be achieved [22].
Different robust filter approaches have been adopted to deal with measurement outliers. A robust Kalman filter based on the m-interval polynomial approximation (MIPA) method for unknown non-Gaussian noise has been proposed; the MIPA Kalman filter is computationally feasible, unbiased, efficient, and robust [23]. A robust Kalman filter has been obtained through Bayesian statistics by applying a robust M-estimate for rank-deficient observation models, in which the outliers are down-weighted not only in the observations but also in the updated parameters [24]. A general estimator for an adaptively robust filter has been developed; this method can not only resist the influence of outlying kinematic model errors but also control the effect of measurement outliers [25]. An adaptive method with fading memory and a robust method with enhancing memory have been combined in the Kalman filter; the resulting method strongly tracks the variation of the state and is insensitive to gross errors in the observations [26]. A robust version of the Kalman filter addressing process modeling errors in linear systems with rank-deficient measurement models has been developed using the generalized maximum likelihood estimator (M-estimator) [27]. A robust unscented Kalman filter based on generalized M-estimation has been proposed to improve the robustness of the integrated navigation system; it suppresses the effects of outliers from both the dynamic model and the measurements on the dynamic state estimates [28].
In this study, the discussion is restricted to the problem of resisting observations contaminated by outliers. A robust EKF (REKF) method combining hypothesis testing and robust estimation is proposed. The main feature of the proposed method consists of two parts. One is the global test based on the Mahalanobis distance for outlier detection, in which a hypothesis test is carried out on the model. The other is robust estimation using scheme III of the Institute of Geodesy and Geophysics of China (IGGIII), based on a local test of the normalized residual; the outliers are resisted by equivalent weights according to the discrepancy between the model and the measurements.
The remainder of this paper is organized as follows. In Section 2, the nonlinear least-squares (LS) solution of the overdetermined distance equations and the extended Kalman filter are introduced. In Section 3, the REKF method is proposed and three kinds of contaminated distributions are illustrated. In Section 4, the effectiveness of the proposed REKF is verified and analyzed by numerical examples. In Section 5, the conclusions are summarized.
2. The Extended Kalman Filter
The distance equation of UWB positioning is given as [29]

d_i = ‖x_t − x_a,i‖ + e_i = ρ_i(x_t) + e_i, i = 1, …, m

where d_i denotes the observation distance between the tag and anchor i; d = (d_1, …, d_m)ᵀ denotes the observation vector; ρ_i is the Euclidean distance; e_i represents the corresponding random error; x_a,i is the known coordinate of anchor i; and x_t is the coordinate of the tag.
When the state model is not considered and the nonlinear equations are overdetermined, the nonlinear solution of the distance equations is to find min over x_t of vᵀv, where v = d − ρ(x_t), in which v represents the residual vector [30].
The approximate solutions can be obtained iteratively based on the linearized distance equations. With a rough initial value x₀, we can obtain the LS solution by the Gauss–Newton method with iterations, and the LS solution can be written as

x̂ = x₀ + (JᵀJ)⁻¹Jᵀ(d − ρ(x₀))

where J is the Jacobian matrix of the distance equations, whose i-th row jᵢᵀ = (x₀ − x_a,i)ᵀ/ρ_i(x₀) is the direction cosine vector from the tag to the anchor.
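As an illustration, the Gauss–Newton iteration described above can be sketched as follows (a minimal sketch, not the authors' implementation; the function and variable names are assumptions of this example):

```python
import numpy as np

def gauss_newton_position(anchors, d_obs, x0, n_iter=20, tol=1e-10):
    """Iterate x <- x + dx, where dx solves the linearized system
    J dx = d_obs - rho(x); the rows of the Jacobian J are the
    direction cosine vectors between the tag and the anchors."""
    x = np.asarray(x0, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    for _ in range(n_iter):
        diff = x - anchors                   # tag-minus-anchor vectors
        rho = np.linalg.norm(diff, axis=1)   # predicted distances
        J = diff / rho[:, None]              # Jacobian (direction cosines)
        dx = np.linalg.lstsq(J, d_obs - rho, rcond=None)[0]
        x = x + dx
        if np.linalg.norm(dx) < tol:         # converged
            break
    return x
```

With four anchors and distances simulated from a known tag position, the iteration converges to that position from a rough initial guess in a few steps.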
The Gauss–Newton method includes only the first-order Taylor expansion of the distance equations. The linearization of the positioning observation model results in biased LS estimators; the bias comes from the neglected higher-order terms and can be regarded as a systematic error. With a sufficiently small relative ranging error or a good positioning configuration, however, the parameter estimator tends to be unbiased, which makes the bias practically negligible.
When the state model is considered, the state and observation models of the nonlinear system are expressed as

x_k = f(x_{k−1}) + w_{k−1},  z_k = h(x_k) + v_k

where x_{k−1} and x_k represent the state vectors of the system at epochs k − 1 and k; z_k is the observation vector of the system at epoch k; f(·) and h(·) are the nonlinear state transition function and observation function; and w_{k−1} and v_k are the process noise and observation noise, both of them uncorrelated Gaussian white noise with corresponding covariance matrices Q_k and R_k.
By linearizing the state model and observation model, we combine the linear dynamic system and the first-order observation equations, and the model of the EKF can be expressed as [9]

x_k = Φ_{k,k−1} x_{k−1} + w_{k−1},  z_k = H_k x_k + v_k

where Φ_{k,k−1} and H_k are the state transition and observation matrices, respectively.
The EKF implementation can be written as follows.
The predicted state and covariance matrix can be calculated as

x̂_k⁻ = f(x̂_{k−1}),  P_k⁻ = Φ_{k,k−1} P_{k−1} Φ_{k,k−1}ᵀ + Q_k
The Kalman filter gain can be written as

K_k = P_k⁻ H_kᵀ (H_k P_k⁻ H_kᵀ + R_k)⁻¹
The estimated state vector and posterior covariance matrix can be expressed as

x̂_k = x̂_k⁻ + K_k (z_k − h(x̂_k⁻)),  P_k = (I − K_k H_k) P_k⁻
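Collecting the prediction, gain, and update steps, one EKF cycle can be sketched as follows (a minimal sketch, not the authors' implementation; the callable arguments f, F, h, H are assumptions of this illustration):

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF cycle: predict with the state model, then update with
    the linearized observation model. f and h are the nonlinear state
    transition and observation functions; F and H return their Jacobians."""
    # prediction: x^- = f(x), P^- = F P F^T + Q
    Fk = F(x)
    x_pred = f(x)
    P_pred = Fk @ P @ Fk.T + Q
    # gain: K = P^- H^T (H P^- H^T + R)^(-1)
    Hk = H(x_pred)
    S = Hk @ P_pred @ Hk.T + R            # innovation covariance
    K = P_pred @ Hk.T @ np.linalg.inv(S)
    # update: x = x^- + K (z - h(x^-)), P = (I - K H) P^-
    innovation = z - h(x_pred)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(len(x_new)) - K @ Hk) @ P_pred
    return x_new, P_new
```

For a linear model the step reduces to the standard Kalman filter, which is a convenient sanity check.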
3. The Robust Extended Kalman Filter Accompanied by Hypothesis Test and Robust Estimation
The UWB positioning data influenced by outliers do not fulfill the assumed stochastic model of the extended Kalman filter, which is a potential problem for parameter estimation. A robust extended Kalman filter should therefore be applied to resist the effects of measurement outliers. First, we perform the global test based on the Mahalanobis distance for outlier detection. Then, robust estimation using the IGGIII scheme based on a local test of the normalized residual is implemented. Furthermore, the stochastic model for outliers in the measurement process is expressed.
3.1. The Global Test Based on Mahalanobis Distance for Outlier Detection
The global model test is used to detect discrepancies between the measurements and the functional and stochastic models [31]. For outlier detection, a judging index is defined as the square of the Mahalanobis distance from the observation to its prediction, and the hypothesis test is performed by treating the judging index as the test statistic [22]. The test statistic can be written as

γ_k = M_k² = v̄_kᵀ (H_k P_k⁻ H_kᵀ + R_k)⁻¹ v̄_k

where M_k denotes the Mahalanobis distance, γ_k is the test statistic, and v̄_k = z_k − h(x̂_k⁻) is the innovation vector.
The hypothesis test can be carried out for testing the model. The null hypothesis is that the stochastic model of the parameter estimation is correct and the observations contain no gross errors and obey a Gaussian distribution. The alternative hypothesis is that there is at least one measurement outlier caused by a gross error in the data. Then, the probability of the null hypothesis being rejected satisfies

P(γ_k > χ²_α(m)) = α

where χ²_α(m) is the score predetermined by the chosen level of significance α from the Chi-square distribution table. The smaller the chosen significance level, the more frequently we will be inclined to accept the null hypothesis. A good hypothesis test for outlier detection would minimize the probabilities of decision error, and the test statistic γ_k can serve as a measure of significance.
When the innovation is normal, the test statistic γ_k is smaller than the threshold χ²_α(m) and is Chi-square distributed with the dimension m of the observation vector as the degrees of freedom (DOF); the null hypothesis is confirmed. If γ_k is larger than χ²_α(m), the value of the test statistic falls in the right tail area of the distribution, and the null hypothesis is rejected. We can then believe that outliers in the observations do occur and that the stochastic model of the observation errors is not correct; the test statistic then follows a noncentral Chi-square distribution with a noncentrality parameter.
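A minimal sketch of this global test follows (the function name is an assumption of this example; the threshold is the appropriate Chi-square quantile, e.g. χ²_0.99(4) ≈ 13.28 for m = 4 and α = 0.01):

```python
import numpy as np

def global_test(innovation, S, threshold):
    """Global model test: compare the squared Mahalanobis distance
    gamma = v^T S^(-1) v of the innovation v (with innovation
    covariance S) against a Chi-square quantile with m = dim(v)
    degrees of freedom. Returns (gamma, rejected)."""
    v = np.asarray(innovation, dtype=float)
    gamma = float(v @ np.linalg.solve(S, v))
    return gamma, gamma > threshold   # True means H0 is rejected
```

A small innovation passes the test, while a single grossly biased component drives the statistic into the rejection region.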
3.2. The Robust Estimation Using the IGGIII Scheme Based on Local Test of the Normalized Residual
If the global test rejects the null hypothesis, it shows that the model does not conform to the specifications. We should find and eliminate the individual outlier, so the local test is performed. For outlier identification, the null hypothesis is that no observation is affected by outliers; the alternative hypothesis is that there is an outlier in one known observation [32]. The normalized residual of the i-th observation in the observation vector, constructed as a test statistic, is given as

w_i = v̄_i / (σ₀ √((Q_v̄)_ii))

where v̄_i is the prediction residual, σ₀ √((Q_v̄)_ii) is the standard deviation of the residual, σ₀ is the standard deviation of the observation, and Q_v̄ is the cofactor matrix corresponding to the prediction residual.
If the observations are contaminated by outliers, the covariance should be inflated. To control the influence of the outliers, the equivalent weight elements based on IGGIII, which is established on the basis of M-estimation, are applied [33]. In fact, some existing equivalent weight functions can be used to calculate the equivalent weight element. The robust gain matrix factor of the Kalman filter can be written as [34]

γ_i = 1, if |w_i| ≤ k₀
γ_i = (k₀/|w_i|) · ((k₁ − |w_i|)/(k₁ − k₀))², if k₀ < |w_i| ≤ k₁
γ_i = 0, if |w_i| > k₁

where k₀ and k₁ are two constants, which are usually determined based on the objective requirement, and w_i represents the normalized residual corresponding to the i-th element of the state vector.
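The three-segment IGGIII weight factor above can be sketched as follows (a minimal sketch; the defaults k₀ = 1.5 and k₁ = 3.0 are common choices in the literature, not values prescribed by this paper):

```python
def iggiii_factor(w, k0=1.5, k1=3.0):
    """IGGIII equivalent-weight factor driven by the normalized
    residual w: full weight 1 in the keep zone |w| <= k0, smooth
    down-weighting in k0 < |w| <= k1, and 0 in the reject zone."""
    aw = abs(w)
    if aw <= k0:
        return 1.0
    if aw <= k1:
        return (k0 / aw) * ((k1 - aw) / (k1 - k0)) ** 2
    return 0.0
```

The factor decreases continuously from 1 to 0 as the normalized residual grows, so suspicious observations are attenuated rather than abruptly discarded until they exceed k₁.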
3.3. The Stochastic Model for Outliers
The stochastic model is expressed by modeling the distribution and introducing a prior covariance matrix of the observations. We assume the measurement errors obey the normal distribution, i.e., they are independent Gaussian variables with zero mean and equal variance. The probability density function (PDF) reads

f(x) = (1/(√(2π) σ)) exp(−(x − μ)²/(2σ²))

where x is the random variable, μ is the expectation of the observation, which reflects the average value of the random variable, and σ is the standard deviation, which indicates the dispersion of the random variable.
In general, outliers require special attention in data analysis: outliers are most often caused by gross errors, and gross errors most often give rise to outliers. The outliers are the result of two mechanisms. One mechanism is that the observation errors obey the normal distribution, but the gross errors follow a different distribution; the gross errors which contribute to the outliers can then be treated as systematic errors or as large-variance random errors. The other is that both the observations' random errors and the gross errors come from a thick-tailed distribution, so the outliers come from the tails of the distribution [31]. Note that an extreme observation may not be an outlier; it may instead be an indication of skewness of the PDF. Three kinds of contaminated distributions can be expressed as follows.
If the observations are contaminated by gross errors and the outliers are treated as systematic errors, the PDF can be obtained as the location-contaminated normal distribution [35]:

f(x) = (1 − ε) N(μ, σ²) + ε N(μ + ∇, σ²)

where ∇ is the gross error, generated by adding a bias to the random variable, and ε is the probability of gross errors in the total observations.
If the outliers are treated as random variables with large variance, the PDF is given as the scale-contaminated normal distribution [36]:

f(x) = (1 − ε) N(μ, σ²) + ε N(μ, σ_g²)

where σ_g² is the variance of the gross errors; this kind of gross error is obtained by inflating the variance of the random variable.
The Laplace distribution is introduced to express the non-normally distributed gross error. The Laplace distribution has a heavier tail than the normal distribution and is also called the double exponential distribution. We can consider it as a result of variance inflation. The PDF can be expressed as [37]

f(x) = (1/(2b)) exp(−|x − μ|/b)

where b is the scale parameter of the model; the excess kurtosis of this distribution is 3.
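For simulation purposes, errors following the three contaminated models above can be generated as follows (a minimal sketch; the parameter names and default values are assumptions of this example, and the Laplace scale b = σ/√2 is chosen so the variance matches σ²):

```python
import numpy as np

def sample_contaminated(n, sigma=1.0, eps=0.05, shift=5.0, sigma_g=5.0,
                        kind="location", rng=None):
    """Draw n errors from one of the three contaminated models:
    'location' : (1-eps)*N(0, sigma^2) + eps*N(shift, sigma^2)
    'scale'    : (1-eps)*N(0, sigma^2) + eps*N(0, sigma_g^2)
    'laplace'  : heavy-tailed Laplace(0, b) with b = sigma/sqrt(2)."""
    rng = np.random.default_rng() if rng is None else rng
    if kind == "laplace":
        return rng.laplace(0.0, sigma / np.sqrt(2.0), size=n)
    base = rng.normal(0.0, sigma, size=n)
    mask = rng.random(n) < eps          # which draws carry a gross error
    if kind == "location":
        base[mask] += shift             # bias added to the random variable
    else:  # "scale"
        base[mask] = rng.normal(0.0, sigma_g, size=mask.sum())
    return base
```

Such samples can be fed to a filter to check, as in the paper's simulations, whether outliers of each type are detected.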
4. Numerical Examples
In this section, simulation and measurement experiments are carried out to evaluate the performance of the proposed REKF method for UWB positioning. The LS procedure is performed first to provide the initial state estimates, and then the state estimates are updated by the filter. Meanwhile, the individual positioning results of the three methods (LS, EKF, and REKF) are computed and compared.
4.1. Simulation Verification
In order to realize the positioning application, the anchors are placed at the four upper corners of the space. The performance of the REKF is analyzed based on three sets of tests with the three kinds of stochastic models for outliers. The positioning accuracy is evaluated by calculating the root mean square (RMS) error. The simulation experimental scene and the test trajectory are shown in Figure 1.

... In order to have w-test statistics under H 0 , uniformly distributed random number sequences are produced by the Mersenne Twister algorithm, and then they are transformed into a normal distribution by using the Box-Muller transformation [75]. Box-Muller has already been used in geodesy for Monte Carlo experiments [46,76,77]. Therefore, a sequence of m random vectors from the pdf assigned to the w-test statistics is generated according to Equation (35). ...
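The Box–Muller generation described in this excerpt can be sketched as follows (a minimal sketch; NumPy's default generator stands in for the Mersenne Twister source of uniform deviates):

```python
import numpy as np

def box_muller(n, rng=None):
    """Box-Muller transform: map pairs of uniform deviates (u1, u2)
    to independent standard normal deviates via
    z1 = sqrt(-2 ln u1) cos(2 pi u2), z2 = sqrt(-2 ln u1) sin(2 pi u2)."""
    rng = np.random.default_rng() if rng is None else rng
    m = (n + 1) // 2
    u1 = 1.0 - rng.random(m)          # shift to (0, 1] so log(u1) is finite
    u2 = rng.random(m)
    r = np.sqrt(-2.0 * np.log(u1))
    z = np.concatenate([r * np.cos(2.0 * np.pi * u2),
                        r * np.sin(2.0 * np.pi * u2)])
    return z[:n]
```

Correlated w-test statistics would then be obtained by multiplying such standard normal vectors with a Cholesky factor of the correlation matrix Q_ww.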

The reliability analysis allows estimating a system's probability of detecting and identifying an outlier. Failure to identify an outlier can jeopardise the reliability level of a system; given this importance, outliers must be appropriately treated to ensure the normal operation of a system. System models are usually developed under certain constraints, and constraints play a central role in model precision and validity. In this work, we present a detailed investigation of the effects of hard and soft constraints on the reliability of a measurement system model. Hard constraints represent the case in which there exist known functional relations between the unknown model parameters, whereas soft constraints are employed when such functional relations can be slightly violated depending on their uncertainty. The results highlight that the success rate of identifying an outlier is larger for hard constraints than for soft constraints. This suggests that hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements. After identifying and removing possible outliers, one should set up the soft constraints to propagate the uncertainties of the constraints during the data processing. This recommendation is valid for outlier detection and identification purposes.

... In order to have w-test statistics under H 0 , uniformly distributed random number sequences are produced by the Mersenne Twister algorithm, and then they are transformed into a normal distribution by using the Box-Muller transformation [85]. Box-Muller has already been used in geodesy for Monte Carlo experiments [46,86,87]. Therefore, a sequence of m random vectors from the pdf assigned to the w-test statistics is generated according to Equation (A1). ...

In this paper we evaluate the effects of hard and soft constraints on Iterative Data Snooping (IDS), an iterative outlier elimination procedure. Here, the measurements of a levelling geodetic network were classified according to the local redundancy and the maximum absolute correlation between the outlier test statistics, referred to as clusters. We highlight that the larger the relaxation of the constraints, the higher the sensitivity indicators MDB (Minimal Detectable Bias) and MIB (Minimal Identifiable Bias), for both the clustering of measurements and the clustering of constraints. There are circumstances in which increasing the family-wise error rate (FWE) of the test statistics increases the performance of the IDS. Under a scenario of soft constraints, one should set out at least three soft constraints in order to identify an outlier in the constraints. In general, hard constraints should be used in the stage of pre-processing data for the purpose of identifying and removing possible outlying measurements; in that process, one should opt to set out redundant hard constraints. After identifying and removing possible outliers, the soft constraints should be employed to propagate their uncertainties to the model parameters during least-squares estimation.

... In essence, the MCM replaces random variables by PRN, probabilities by relative frequencies, and expectations by arithmetic means over large sets of such numbers. A computation with one set of PRN is a Monte Carlo trial [31], also referred to as the number of Monte Carlo experiments [32]. ...

One of the main tasks of professionals in the Earth sciences is to convert coordinates from one reference frame into another. Coordinate transformation is widely needed and applied in all branches of modern geospatial activity. To convert from one reference frame to another, it is first necessary to determine the parameters of a coordinate transformation model. A common way to estimate the transformation parameters is least-squares theory within a linearized Gauss-Markov Model (GMM). Another approach arises from a numerical method: here, a Monte Carlo Method (MCM) is used to infer the uncertainty of the estimators. This method is based on: (i) the assignment of probability distributions to the coordinates in the two reference frames, (ii) the determination of a discrete representation of the probability distribution for the transformation parameters, and (iii) the determination of the associated uncertainties from this discrete representation of the estimates of the transformation parameters. In this contribution, we compare weighted least-squares within the GMM (WLSE-GM) and the proposed method on the problem of computing 2D similarity transformation parameters. The results show that the transformation-parameter uncertainties are higher for the LS-MC than for the WLSE-GM. This is because the WLSE-GM solution does not take into account the uncertainties associated with the system matrix. In future studies the Monte Carlo method should be applied to the nonlinear least-squares solution.

... Thus given a m , we need to be able to compute a 1 or k = x a 1 (1, 0), thereby giving A a m = > m i=1 A i a 1 . The procedure of computing a 1 from a m is a problem already addressed in Lehmann (2010Lehmann ( , 2011) and can briefly be described as follows: simulate N samples from the H 0 -distribution of the m-vector of w-test statistics w = [w 1 , w 2 , . . . w m ] T H 0 N (0, Q ww ), with (Q ww ) ii = 1 and (Q ww ) ij = r(w i , w j ) being the correlation between w i and w j . ...

The Minimal Detectable Bias (MDB) is an important diagnostic tool in data quality control. The MDB is traditionally computed for the case of testing the null hypothesis against a single alternative hypothesis. In the actual practice of statistical testing and data quality control, however, multiple alternative hypotheses are considered. We show that this has two important consequences for one's interpretation and use of the popular MDB. First, we demonstrate that care should be exercised in using the single-hypothesis-based MDB for the multiple hypotheses case. Second, we show that for identification purposes, not the MDB, but the Minimal Identifiable Bias (MIB) should be used as the proper diagnostic tool. We analyse the circumstances that drive the differences between the MDBs and MIBs, show how they can be computed using Monte Carlo simulation and illustrate by means of examples the significant differences that one can experience between detectability and identifiability.

... The MCS has already been applied in outlier detection (e.g. Lehmann and Scheffler, 2011;Lehmann, 2012;Klein et al. 2012;Klein et al. 2015aKlein et al. , 2015bErdogan, 2014;Niemeier and Tengen, 2017). Following this line of thought, here our goal was to apply the MCS to analyse the efficiency of the iterative data snooping procedure for the correct identification (or not) of a single simulated outlier at time. ...

William Sealy Gosset, otherwise known as "Student", was one of the pioneers in the development of modern statistical method and its application to the design and analysis of experiments. Although there were no computers in his time, he discovered the form of the t distribution by a combination of mathematical and empirical work with random numbers, now recognized as an early application of Monte Carlo simulation. Today, with fast computers and large data storage systems, probability distributions can be estimated using computerized simulation. Here, we use Monte Carlo simulation to investigate the efficiency of Baarda's iterative data snooping procedure as a test statistic for outlier identification in the Gauss-Markov model. We highlight that the iterative data snooping procedure can identify more observations than the actual number of simulated outliers, a point that receives deserved attention in this work. The available probability of over-identification allows enhancing the probability of type III error as well as, probably, the outlier identifiability. With this approach, for the analysed network, a significance level of 0.001 was in general the best scenario for not mistakenly excluding a correct observation. Thus, the data snooping procedure was more realistic when the over-identification case was considered in the simulation. In the end, we concluded that, for a GNSS network, the iterative data snooping procedure based on Monte Carlo can locate an outlier of the order of magnitude of 4.5σ with a high success rate.

... On the other hand, if there are indeed , gross errors then they must be rather extreme to be rejected. The problem of finding a best tradeoff can be solved by Monte-Carlo based data snooping as suggested by Lehmann and Scheffler (2011), but this issue will not be brought up here. For the sake of simplicity Eq. (20) is used here instead. ...

The detection of multiple outliers can be interpreted as a model selection problem. The null model, which indicates an outlier free set of observations, and a class of alternative models, which contain a set of additional bias parameters. A common way to select the right model is the usage of a statistical hypothesis test. In geodesy Baarda's data snooping is most popular. Another approach arises from information theory. Here, the Akaike information criterion (AIC) is used to select an appropriate model for a given set of observations. AIC is based on the Kullback-Leibler divergence, which describes the discrepancy between the model candidates.
Both approaches are discussed and applied to test problems: the fitting of a straight line and a geodetic network. Some relationships between data snooping and information criteria are elaborated. In a comparison it turns out that the information criteria approach is simpler and more elegant. But besides AIC there are many alternative information criteria selecting different outliers, and it is not clear which one is optimal.

... To find the best compromise, the loss function premium and the profit function protection, originally introduced by Anscombe (1960), are applied to geodesy for the first time by Lehmann (2013). Actually already (21) (22) (Lehmann 2010) and (Lehmann and Scheffler 2011) worked with the protection-term, without naming it so, because at that time the term was not known to the authors. Fixed values for α and β, as can be found in the geodetic literature (e.g. ...

The so-called 3-sigma rule is a simple and widely used heuristic for outlier detection. The term is generic for several statistical hypothesis tests whose test statistics are known as normalized or studentized residuals. The conditions under which this rule is statistically substantiated were analyzed, and the extent to which it applies to geodetic least-squares adjustment was investigated. The efficiency or inefficiency of this method was then analyzed and demonstrated on the example of repeated observations.

... The vector L is weighted by the random vector μ ~ N(0, σ²), and its real value is Z = L + μ. The adjustment of the vector of observations L will result in weighting the components of the estimate X̂ = (x₁, x₂, …, xₙ) of the vector of parameters X = (x₁, x₂, …, xₙ) with a certain rate of indeterminacy (Lehmann and Scheffler, 2011). As additional observations are carried out, there occurs a change in the state of the observation system. ...

Abstract: The paper attempts to determine an optimum structure of a directional measurement and control network intended for investigating horizontal displacements. For this purpose it uses the notion of entropy as a logarithmic measure of the probability of the state of a particular observation system. An optimum number of observations results from the difference of the entropy of the vector of parameters, ΔH(X), corresponding to one extra observation. An increment of entropy, interpreted as an increment of the amount of information about the state of the system, determines the adoption or rejection of another extra observation to be carried out.

The detection, identification, and adaptation method based on data snooping (DIA-datasnooping) is commonly used to deal with outliers in the Gauss–Markov model. However, the application of DIA-datasnooping can be limited in the case of multiple outliers. In this contribution, the maximum a posteriori (MAP) estimate is applied within the DIA framework, and a DIA method based on MAP (DIA-MAP) is proposed. Based on the prior distribution of gross errors, DIA-MAP chooses the hypothesis with the maximum posterior probability to conduct the detection and identification of outliers. To this end, a hyperparameter determination method based on supervised learning is proposed to find suitable priors for gross errors. With these priors, DIA-MAP provides a unified DIA process for both single and multiple outliers. Also, the prior can be flexibly adjusted rather than fixed to be uniform, so that the DIA method can be adapted to different application cases. Finally, a set of new evaluation indices for the DIA method with multiple outliers is defined, including the True Positive Rate (TPR), which describes the detectability of outliers, and the True Negative Rate (TNR), which denotes the acceptance ability for inliers. Experimental results from GNSS positioning examples verify that the performance of the proposed DIA-MAP method is superior to the conventionally used methods when dealing with multiple outliers.

This contribution introduces a statistical model of gross errors, called the Bernoulli-Gaussian (BG) model, in which the gross error is the product of a Bernoulli variable and a Gaussian variable. First, with the BG model, different causes of outliers can be interpreted from the perspective of gross errors. Moreover, the commonly used observation models, such as the mean shift model and the variance inflation model, can be unified by the BG model by choosing different value ranges of the model parameters. Second, based on the expectation-maximization (EM) algorithm, an estimation method for the BG model parameters of linear observation equations is proposed. With this method, the BG model parameters can be estimated in both a static observation system and a dynamic observation system. Finally, normal sample examples and GNSS examples prove that estimating the BG model parameters via the EM algorithm is effective.

The Earth is in a state of continuous change resulting from a variety of dynamic processes and acting forces. Global warming, sea-level rise and tectonic shifts are some of the global phenomena that make this process of change visible. To understand these changes better, to analyse their causes and to derive suitable preventive measures, an unambiguous spatial reference is indispensable. The International Terrestrial Reference Frame (ITRF), a global Earth-fixed Cartesian coordinate system, forms the fundamental basis for an unambiguous spatial reference, for the determination of precise satellite orbits and for the detection of deformations of the Earth's crust. The United Nations (UN) resolution "A global geodetic reference frame for sustainable development" (A/RES/69/266), adopted in 2015, underlines the high importance and the necessity of such a global geodetic reference system.
The Global Geodetic Observing System (GGOS) was founded in 2003 by the International Association of Geodesy (IAG). "Advancing our understanding of the dynamic Earth system by quantifying our planet's changes in space and time" is the objective formulated in 2011 towards which all GGOS activities are directed, in order to realise the metrological platform for all Earth observations. Determining a global geodetic reference frame that enables a positioning accuracy of 1 mm worldwide is one of the great challenges of GGOS. Achieving this goal requires, besides the technical advancement and infrastructural expansion of the space-geodetic techniques, the identification and quantification of systematic deviations in both the local and the global context.
A global geodetic reference frame is determined by a combined adjustment of all space-geodetic techniques. Since these techniques are only weakly connected physically, locally determined connection vectors, known as local ties, are one of the key components of the combination. Inaccurate, erroneous and outdated local ties limit the reliability of the global geodetic reference system.
In this thesis, a model and several solution methods are developed that allow the geometric reference points of radio telescopes and laser telescopes to be linked to other space-geodetic techniques by local terrestrial measurements carried out during routine operation. While radio telescopes are used for very long baseline interferometry (VLBI), laser telescopes enable distance measurements to Earth satellites (SLR) or to the Moon (LLR). Determining the geometric reference point of laser and radio telescopes is metrologically challenging and requires an indirect determination method. Existing geometric methods are either restricted to a particular telescope construction or require a dedicated measurement concept that involves deliberately slewing the telescope. The method derived in this work has no construction-related restrictions and additionally fulfils all criteria of the process-integrated in-situ reference point determination promoted by GGOS. This makes it possible to determine and monitor the reference point continuously and automatically.
To increase the reliability of VLBI data and to reach the goal of 1 mm positioning accuracy in the global context, the existing VLBI network is currently being extended by additional radio telescopes under the name VLBI2010 Global Observing System (VGOS). The resulting VGOS radio telescopes are characterised by, among other things, a very compact design and high rotation speeds. The deformation behaviour of these telescopes is largely unexplored. While signal path variations of up to several centimetres have been documented for conventional radio telescopes, only few comparable studies exist for VGOS radio telescopes. The main reasons are the increased accuracy requirements on the one hand and, on the other, the lack of models describing the reflector geometries, which hampers a direct transfer of existing measurement and analysis procedures.
In this work, models are developed for VGOS-specified radio telescopes that allow a geometric description of the shape of the main reflector and the subreflector. Based on these models, changes in focal length and variations of the ray path length caused by load-dependent deformations can be modelled geometrically. This makes it possible to quantify the main influencing factors that cause a variation of the signal path and that, if uncompensated, lead above all to a systematic bias of the vertical component of the station coordinate.
The choice of a suitable estimation method for deriving unknown model parameters from redundant observations is often regarded as trivial and solved. This work shows that, in addition to systematic deviations caused by the measurement process, systematic deviations can also arise from the chosen estimation method. Applying an estimation method that is valid only in linear models generally yields biased estimates for nonlinear problems. In particular, the shape analysis of the main reflector of a VLBI radio telescope shows that the resulting estimates are biased, and that these biases reach magnitudes that must be regarded as critical.

For more than half a century, the reliability theory introduced by Baarda (1968) has been used as standard practice for quality control in geodesy and surveying. Although the theory meets mathematical rigour and probability assumptions, it was originally developed for Data-Snooping, which treats one specific observation as the suspected outlier. In other words, only one single alternative hypothesis is in play. In reality, we do not know which observation is an outlier. Since Data-Snooping consists of screening each individual measurement for an outlier, a more appropriate alternative hypothesis would be: "There is at least one outlier in the observations". We are then interested in answering: "Where?". The answer lies in locating, among the alternative hypotheses, the one that led to the rejection of the null hypothesis; that is, we are interested in identifying the outlier. Although advances have occurred over that period, the theories presented so far consider only a single round of the Data-Snooping procedure, without any subsequent diagnosis such as removing the outlier. In practice, however, Data-Snooping is applied iteratively: after identification and elimination of the outlier, the model is reprocessed and outlier identification is restarted. This procedure of iterative outlier elimination is known as Iterative Data-Snooping (IDS). Computing the probability levels associated with IDS is virtually impossible with the analytical methods usually employed for conventional tests, such as the overall model test and Data-Snooping with a single alternative hypothesis. Because of this, a rigorous and complete reliability theory was not yet available. Although major advances occurred in the mid-1970s, such as microprocessor-based computers, Baarda had a disadvantage: the technology of his time was insufficient to use intelligent computational techniques.
Today, the computational scenario is completely different from that of Baarda's time. Following the current trend of modern science, we can use intelligent computing and extend the reliability theory to the case where IDS is in play. We show that the estimation depends on the test and the adaptation; therefore, IDS is, in fact, an estimator. Until now, no study has empirically evaluated the accuracy of the Monte Carlo method for quality control purposes in geodesy; generally, only the degree of dispersion of the Monte Carlo results is considered. Thus, an issue remains: how can we find the optimal number of Monte Carlo experiments for quality control purposes? Here, we use exact theoretical reference probabilities to answer this question. We find that m = 200,000 experiments provide consistent results with sufficient numerical precision for outlier identification, with a relative error of less than 0.1%. The test statistic associated with IDS is the extreme normalised least-squares residual. It is well known in the literature that critical values (quantiles) of such a test statistic cannot be derived from standard test distributions but must be computed numerically by Monte Carlo. This paper provides the first results on Monte Carlo-based critical values for different scenarios of correlation between the outlier test statistics. We also tested whether increasing the family-wise error rate, i.e. reducing the critical values, improves the identifiability of the outlier. The results show that the lower the critical value, or the higher the family-wise error rate, the larger the probability of correct detection and the smaller the MDB. However, this relationship does not hold for identification. We also highlight that an outlier becomes identifiable when the contributions of the observations to the wrong exclusion rate (Type III error) decline simultaneously.
In that case, we verify that the effect of the correlation between the outlier test statistics on the wrong exclusion rate becomes insignificant beyond a certain outlier magnitude, which increases the probability of identification.

An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of modern measurement systems in geodesy and surveying. The test statistic associated with IDS is the extreme normalised least-squares residual. It is well known in the literature that critical values (quantiles) of such a test statistic cannot be derived from standard test distributions but must be computed numerically by Monte Carlo. This paper provides the first results on Monte Carlo-based critical values for different scenarios of correlation between the outlier test statistics. From the Monte Carlo evaluation, we compute the probabilities of correct identification, missed detection, wrong exclusion, overidentification and statistical overlap associated with IDS in the presence of a single outlier. Based on these probability levels, we obtain the Minimal Detectable Bias (MDB) and the Minimal Identifiable Bias (MIB) for the case where IDS is in play. MDB and MIB are sensitivity indicators for outlier detection and identification, respectively. The results show that there are circumstances in which the larger the Type I decision error (i.e. the smaller the critical value), the higher the rate of outlier detection, but the lower the rate of outlier identification. In that case, the larger the Type I error, the larger the ratio between MIB and MDB. We also highlight that an outlier becomes identifiable when the contributions of the observations to the wrong exclusion rate decline simultaneously. In that case, we verify that the effect of the correlation between the outlier test statistics on the wrong exclusion rate becomes insignificant beyond a certain outlier magnitude, which increases the probability of identification.
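The IDS loop itself (least-squares fit, w-test on the extreme normalised residual, exclusion, refit) can be sketched in a few lines. This is an illustrative sketch, not the paper's procedure: it uses a fixed critical value instead of the Monte Carlo-based quantiles discussed above, and the simple line-fit example and all numbers are assumptions.

```python
import numpy as np

def iterative_data_snooping(A, y, sigma, k_crit=3.29, max_iter=5):
    """Iterative Data Snooping sketch for the Gauss-Markov model y = A x + e.

    Each round: least-squares fit, normalised residuals w_i, exclusion of the
    observation with the largest |w_i| if it exceeds the critical value
    (3.29 roughly corresponds to alpha_0 = 0.001, two-sided standard normal).
    """
    idx = np.arange(len(y))
    excluded = []
    for _ in range(max_iter):
        x, *_ = np.linalg.lstsq(A, y, rcond=None)
        v = y - A @ x                                    # residuals
        # cofactor matrix of the residuals: Qvv = I - A (A^T A)^{-1} A^T
        Qvv = np.eye(len(y)) - A @ np.linalg.solve(A.T @ A, A.T)
        q = np.clip(np.diag(Qvv), 1e-12, None)
        w = np.abs(v) / (sigma * np.sqrt(q))             # normalised residuals
        i = int(np.argmax(w))
        if w[i] <= k_crit:
            break                                        # nothing identified
        excluded.append(int(idx[i]))                     # adapt: exclude, refit
        A, y, idx = np.delete(A, i, 0), np.delete(y, i), np.delete(idx, i)
    return x, excluded

# straight-line fit with one planted gross error of 10 sigma
rng = np.random.default_rng(1)
t = np.linspace(0.0, 9.0, 10)
A = np.column_stack([np.ones_like(t), t])
y = 2.0 + 0.5 * t + rng.normal(0.0, 0.1, 10)
y[4] += 1.0
x_hat, out = iterative_data_snooping(A, y, sigma=0.1)
```

The sketch makes the point of the abstract concrete: the final estimate depends on the test and the adaptation steps, so IDS acts as an estimator in its own right.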

Today, with fast and powerful computers, large data storage systems and modern software, the probability distributions and the efficiency of statistical testing algorithms can be estimated by computer simulation. Here, we use Monte Carlo simulation (MCS) to investigate the power of the test and the error probabilities of Baarda's iterative data snooping procedure for outlier identification in the Gauss-Markov model. The MCS dispenses with the observation vector of the Gauss-Markov model: to perform the analysis, the only inputs required are the Jacobian matrix, the uncertainty of the observations, and the magnitude intervals of the outliers. The random errors (or residuals) are generated artificially from the normal distribution, while the size of the outliers is drawn from the standard uniform distribution. Results for a simulated closed levelling network reveal that data snooping can locate an outlier of magnitude 5σ with a high success rate. The lower the magnitude of the outliers, the lower the efficiency of data snooping in the simulated network. Overall, for the simulated network, the data snooping procedure was most efficient for α = 0.01 (1%), with an 82.8% success rate.
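The Monte Carlo recipe described above (artificial normal errors, a planted outlier, counting correct identifications) can be sketched for the simplest possible adjustment, repeated observations of a single unknown. This is an assumed toy model, not the paper's levelling network, and the sample sizes and critical value are likewise assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def snooping_success_rate(m=5000, n=10, outlier=5.0, k_crit=2.58):
    """Monte Carlo estimate of data snooping's correct-identification rate.

    Model: n repeated observations of one unknown (design matrix of ones),
    unit sigma.  Per trial, one outlier of size outlier*sigma is planted at
    a random position; success = the w-test flags exactly that observation.
    k_crit = 2.58 roughly corresponds to alpha = 0.01, two-sided normal.
    """
    hits = 0
    for _ in range(m):
        e = rng.normal(0.0, 1.0, n)                # artificial random errors
        j = rng.integers(n)                        # random outlier position
        e[j] += outlier
        v = e - e.mean()                           # residuals of the mean model
        w = np.abs(v) / np.sqrt(1.0 - 1.0 / n)     # normalised residuals
        i = int(np.argmax(w))
        if i == j and w[i] > k_crit:
            hits += 1
    return hits / m

rate = snooping_success_rate()   # a 5-sigma outlier is found in most trials
```

Rerunning with a smaller `outlier` argument reproduces the qualitative finding above: the lower the outlier magnitude, the lower the success rate.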

With the increasing automation of geodetic data acquisition, it is nearly impossible for a set of observations to be free of outliers. One of the strategies often employed for treating observations contaminated by outliers is based on statistical hypothesis testing. Because this strategy is formulated on the basis of statistical hypotheses, it may lead to false decisions. Although in theory it is possible to describe the probability density functions of the wrong decisions, in practice the algebraic computation of such functions is highly complex. However, today we have fast and powerful computers at our disposal that allow us to accomplish this. Here, we briefly introduce computer simulation based on the Monte Carlo method. An objective discussion of the issue as found in the literature, followed by a simple numerical example, is presented in order to clarify the Monte Carlo simulation method.

For deriving a robust estimation by the EM (expectation maximization) algorithm for a model that is more general than the linear model, the nonlinear Gauss-Helmert (GH) model is chosen. It contains the errors-in-variables model as a special case. The nonlinear GH model is difficult to handle because of the linearization and the Gauss-Newton iterations: approximate values for the observations have to be introduced for the linearization. Robust estimates by the EM algorithm based on the variance-inflation model and the mean-shift model have previously been derived for the linear model in the case of homoscedasticity. To derive these two EM algorithms for the GH model, different variances are introduced for the observations, and the expectations of the measurements defined by the linear model are replaced by those of the GH model. The two robust methods are applied to fit, by the GH model, a polynomial surface of second degree to the measured three-dimensional coordinates of a laser scanner. This results in detecting more outliers than with the linear model.

The concept of outlier detection by statistical hypothesis testing in geodesy is briefly reviewed. The performance of such tests can only be measured or optimized with respect to a proper alternative hypothesis. Firstly, we discuss the important question whether gross errors should be treated as non-random quantities or as random variables. In the first case, the alternative hypothesis must be based on the common mean shift model, while in the second case, the variance inflation model is appropriate. Secondly, we review possible formulations of alternative hypotheses (inherent, deterministic, slippage, mixture) and discuss their implications. As measures of optimality of an outlier detection, we propose the premium and protection, which are briefly reviewed. Finally, we work out a practical example: the fit of a straight line. It demonstrates the impact of the choice of an alternative hypothesis for outlier detection.

A mixture of normal distributions is assumed for the observations of a linear model. The first component of the mixture represents the measurements without gross errors, while each of the remaining components gives the distribution for an outlier. Missing data are introduced to deliver the information as to which observation belongs to which component. The unknown location parameters and the unknown scale parameter of the linear model are estimated by the EM algorithm, which is iteratively applied. The E (expectation) step of the algorithm determines the expected value of the likelihood function given the observations and the current estimate of the unknown parameters, while the M (maximization) step computes new estimates by maximizing the expectation of the likelihood function. In comparison to Huber’s M-estimation, the EM algorithm does not only identify outliers by introducing small weights for large residuals but also estimates the outliers. They can be corrected by the parameters of the linear model freed from the distortions by gross errors. Monte Carlo methods with random variates from the normal distribution then give expectations, variances, covariances and confidence regions for functions of the parameters estimated by taking care of the outliers. The method is demonstrated by the analysis of measurements with gross errors of a laser scanner.
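The E and M steps described above can be sketched for the simplest case, a one-dimensional location parameter with a two-component scale-contaminated mixture. This is an illustrative reduction, not the linear-model algorithm of the paper, and all parameter values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def em_mixture_mean(y, k=10.0, eps0=0.1, iters=50):
    """EM sketch for a scale-contaminated mixture with common location mu:
    (1 - eps) * N(mu, s^2)  +  eps * N(mu, (k*s)^2).

    E-step: inlier responsibility r_i for each observation, given the
    current parameter estimates.
    M-step: weighted updates of mu, s and the outlier fraction eps that
    maximise the expected log-likelihood.
    """
    mu, s, eps = np.median(y), np.std(y), eps0
    for _ in range(iters):
        # E-step: weighted component densities and inlier responsibilities
        d_in = (1.0 - eps) * np.exp(-0.5 * ((y - mu) / s) ** 2) / s
        d_out = eps * np.exp(-0.5 * ((y - mu) / (k * s)) ** 2) / (k * s)
        r = d_in / (d_in + d_out)
        # M-step: per-observation weights proportional to 1/sigma_i^2
        w = r + (1.0 - r) / k**2
        mu = np.sum(w * y) / np.sum(w)
        s = np.sqrt(np.sum(w * (y - mu) ** 2) / len(y))
        eps = 1.0 - r.mean()
    return mu, r

# 200 inliers around 5.0 plus 20 gross errors around 25.0
y = np.concatenate([rng.normal(5.0, 1.0, 200), rng.normal(25.0, 1.0, 20)])
mu_hat, resp = em_mixture_mean(y)   # mu_hat near 5; outliers get r close to 0
```

As in the abstract, the responsibilities do more than downweight large residuals: they provide an explicit estimate of which observations are contaminated.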

The L1 norm estimator has been widely used as a robust parameter estimation method for outlier detection. Different algorithms have been applied for L1 norm minimization, among which linear programming based on the simplex method is the best known. In the present contribution, in order to solve an L1 norm minimization problem in a linear model, an interior point algorithm based on Dikin's method is developed. The method can be considered an appropriate alternative to the classical simplex method, which is sometimes time-consuming. The proposed method is easier to implement than the simplex method and faster in performance. Furthermore, a recursive form of Dikin's method is derived, which resembles the recursive least-squares method. Two simulated numerical examples show that the proposed algorithm gives results as accurate as the simplex method but in considerably less time. When dealing with a large number of observations, this algorithm can thus be used instead of the iteratively reweighted least-squares method and the simplex method.
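The linear-programming formulation underlying L1 norm estimation (not Dikin's interior point algorithm itself) can be sketched with an off-the-shelf LP solver; the example data below are assumptions chosen to contrast L1 with least squares:

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(A, y):
    """L1-norm estimation min_x sum|y - A x| posed as a linear program.

    Auxiliary variables u_i >= |y_i - (A x)_i| are enforced by the constraint
    pair  A x - u <= y  and  -A x - u <= -y;  the objective is sum(u).
    """
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])      # minimise sum of u
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * n + [(0, None)] * m      # x free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n]

# straight line with one gross error: L1 resists it, least squares does not
rng = np.random.default_rng(0)
t = np.linspace(0.0, 9.0, 10)
A = np.column_stack([np.ones_like(t), t])
y = 1.0 + 2.0 * t + rng.normal(0.0, 0.05, 10)
y[3] += 5.0                                            # planted gross error
x_l1 = l1_fit(A, y)
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)
```

Here `linprog` with the HiGHS backend stands in for the paper's dedicated solver; the point of the sketch is only the robustness of the L1 objective against the planted gross error.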

This revised book provides a thorough explanation of the foundation of robust methods, incorporating the latest updates on R and S-Plus, robust ANOVA (Analysis of Variance) and regression. It guides advanced students and other professionals through the basic strategies used for developing practical solutions to problems, and provides a brief background on the foundations of modern methods, placing the new methods in historical context. Author Rand Wilcox includes chapter exercises and many real-world examples that illustrate how various methods perform in different situations. Introduction to Robust Estimation and Hypothesis Testing, Second Edition, focuses on the practical applications of modern, robust methods which can greatly enhance our chances of detecting true differences among groups and true associations among variables. * Covers latest developments in robust regression * Covers latest improvements in ANOVA * Includes newest rank-based methods * Describes and illustrates easy-to-use software.

The standard reference in uncertainty modeling is the “Guide to the Expression of Uncertainty in Measurement (GUM)”. GUM groups the occurring uncertain quantities into “Type A” and “Type B”. Uncertainties of “Type A” are determined with classical statistical methods, while “Type B” covers other uncertainties which are obtained from experience and knowledge about an instrument or a measurement process. Both types of uncertainty can have random and systematic error components. Our study focuses on a detailed comparison of probability and fuzzy-random approaches for handling and propagating the different uncertainties, especially those of “Type B”. Whereas a probabilistic approach treats all uncertainties as having a random nature, the fuzzy technique distinguishes between random and deterministic errors. In the fuzzy-random approach the random components are modeled in a stochastic framework, and the deterministic uncertainties are treated by means of a range-of-values search problem. The applied procedure is outlined showing both the theory and a numerical example for the evaluation of uncertainties in an application for terrestrial laser scanning (TLS).

The survey of the gravity field of the Earth is interpreted as a process of communication. The information inferred from the data is represented in the form of geopotential models. The paper presents a quantitative analysis of this information for spherical harmonic expansions of the potential in terms of information measures, particularly the first Kullback-Leibler information number for continuous random vectors. Common degree variance models are used for the construction of prior information. The informational viewpoint is compared to the usual interpretation in terms of errors or error degree variances.

This textbook on theoretical geodesy deals with the estimation of unknown parameters, the testing of hypotheses and interval estimation in linear models. The reader will find presentations of the Gauss-Markoff model, the analysis of variance, the multivariate model, the model with unknown variance and covariance components and the regression model, as well as the mixed model for estimating random parameters. To make the book self-contained, most of the necessary theorems of vector and matrix algebra and the probability distributions for the test statistics are derived. Students of geodesy, as well as of mathematics and engineering, will find the geodetic application of mathematical and statistical models extremely useful.

Baarda, W., A testing procedure for use in geodetic networks, Netherlands Geodetic Commission, Publications on Geodesy, Delft, Netherlands, 2:5, 1968.

Bessel, F. W., Fundamenta Astronomiae, Nicolovius, Königsberg, East Prussia, 1818.

Box, G. E. P. and Muller, M. E., A note on the generation of random normal deviates, Ann. Math. Stat. 29:2 (1958), 610–611.

Hahn, M., Heck, B., Jäger, R. and Scheuring, R., Ein Verfahren zur Abstimmung der Signifikanzniveaus für allgemeine F(m,n)-verteilte Teststatistiken, Teil I: Theorie, Z. Vermessungswesen 114 (1989), 234–248.

Hahn, M., Heck, B., Jäger, R. and Scheuring, R., Ein Verfahren zur Abstimmung der Signifikanzniveaus für allgemeine F(m,n)-verteilte Teststatistiken, Teil II: Anwendungen, Z. Vermessungswesen 116 (1991), 15–26.

Khodabandeh, A. and Amiri-Simkooei, A. R., Recursive algorithm for L1 norm estimation in linear models, J. Surv. Eng. 137 (2011), 1.

Koch, K. R., Parameter Estimation and Hypothesis Testing in Linear Models, Springer Verlag, Berlin, Heidelberg, New York, 1999.

Koch, K. R., Outlier Detection in Observations Including Leverage Points by Monte Carlo Simulations, Allgemeine Vermessungsnachrichten, VDE Verlag, Berlin, Offenbach, 2007.

Krarup, T., Juhl, J. and Kubik, K., Götterdämmerung over least squares adjustment, Proceedings of the 14th Congress of the International Society for Photogrammetry, Hamburg, vol. B3, 369–378, 1980.

Lehmann, R., Ausgleichung in nichtlinearen Modellen mittels adaptiver Monte-Carlo-Integration, Allgemeine Vermessungsnachrichten, VDE Verlag, Berlin, Offenbach, 1994.

Lehmann, R., Normierte Verbesserungen: wie groß ist zu groß?, Allgemeine Vermessungsnachrichten, VDE Verlag, Berlin, Offenbach, 2010.

Lehmann, R. and Scheffler, T., Zur Grobfehlererkennung in geodätischen Deformationsnetzen, in: Sroka, A. and Wittenburg, R., editors, 7. Geokinematischer Tag, Verlag Glückauf, Essen, (2006), 173–183.

Pope, A. J., The statistics of residuals and the detection of outliers, NOAA Technical Report NOS65 NGS1, US Department of Commerce, National Geodetic Survey, Rockville, MD, 1976.

Jäger, R., Müller, T., Saler, H. and Schwäble, R., Klassische und robuste Ausgleichungsverfahren: Ein Leitfaden für Ausbildung und Praxis von Geodäten und Geoinformatikern, Herbert Wichmann Verlag, Heidelberg, 2005.
