GEODESY AND CARTOGRAPHY © Polish Academy of Sciences
Vol. 60, No 2, 2011, pp. 123-134
Sensitivity of robust estimators applied in strategy for testing
stability of reference points. EIF approach
Robert Duchnowski
Institute of Geodesy, University of Warmia and Mazury
1 Oczapowskiego St., 10-957 Olsztyn, Poland
e-mail: robert.duchnowski@uwm.edu.pl
Received: 25 May 2011/Accepted: 13 October 2011
Abstract: In deformation analyses, it is important to find a stable reference frame, and therefore the stability of the possible reference points must be controlled. There are several methods to test such stability. The paper's objective is to examine one such method, namely the method based on the application of R-estimation, for its sensitivity to gross errors. The method in question applies three robust estimators; however, it is not robust itself. The robustness of the method depends on the number of unstable points (the fewer unstable points there are, the more robust the proposed method is). This property makes it important to know how the applied estimates and the strategy itself respond to a gross error. The empirical influence functions (EIF) can provide the necessary information and help to understand the response of the strategy to a gross error. The paper presents examples of EIFs of the estimates and their application in the strategy, and describes how important and useful such knowledge is in practice.
Keywords: displacement, stability, R-estimation, empirical influence function
1. Introduction
Analysis of deformation or displacement is a complex surveying problem. The foundations and methodology of such analyses are well known (see, e.g. Prószyński and Kwaśniak, 2006); however, the theory as well as the practical techniques are still being developed (e.g. Hekimoglu et al., 2010), and new methods are proposed to solve certain problems that also concern deformation (e.g. Denli, 2008; Wiśniewski, 2009; Duchnowski, 2010). This follows from the increasing accuracy of surveying measurements and the new technologies applied, as well as from the requirement of analysing deformation more frequently and with higher precision.
Deformation or displacement of certain points can be determined and analysed on the basis of measurement results obtained at least at two different epochs. Usually, the observation sets are required and assumed to be free of outliers, namely observations that are affected by gross errors. However, gross errors may sometimes occur and spoil deformation analyses. Therefore, it is necessary to apply special procedures to find and reject outliers (e.g. Baarda, 1968; Ding and Coleman, 1996; Gui et al., 2007) or to use robust methods of adjustment. The problem of how to identify and cope with outliers is especially complex in the case of deformation analysis. This is due to the fact that tested points may really be displaced, which results in changes of the observations at the second epoch. The problem is how to distinguish between such expected changes and unexpected gross errors (Shaorong, 1990). Another problem is that the robust procedures or methods are not always efficient, and their robustness, effectiveness or success depends on many factors (see, e.g. Hekimoglu and Erenoglu, 2007; Prószyński, 2010). Thus, it may happen that gross errors still influence the results of analyses, regardless of the methods applied. Therefore, it is important to know and understand how such errors may influence the estimation process (e.g. Gui et al., 2011) or its results (e.g. Duchnowski, 2011).
Duchnowski (2008, 2009) proposed to apply estimates based on rank tests (R-estimates) to analyse deformations of geodetic networks. The main advantage of the estimators in question is their robustness against outlying observations. This property was the basis for the strategy for testing the stability of possible reference marks presented in (Duchnowski, 2010). Since the problem becomes complicated when some possible reference marks are not stable, the application of R-estimates alone is good enough only in the simplest cases. Generally, R-estimates of point displacements should be supported by some robust estimates of the standard deviation to find stable points properly. Duchnowski (2010) proposed to use two such estimates, namely MAD (median absolute deviation) and ADM (average distance to the median). They are both classified as estimates robust against outliers, but they are robust in different ways (Rousseeuw and Verboven, 2002; Duchnowski, 2010, 2011).
The robustness of an estimate can be studied by applying two main statistical tools, i.e. breakdown values or influence functions. The breakdown values answer the question of how many outliers can make the estimate fail. There are several kinds of breakdown values that can be applied: contamination breakdown values, replacement breakdown values or subjective breakdown points (see, e.g. Rousseeuw and Croux, 1993; Rousseeuw and Verboven, 2002; Xu, 2005). Duchnowski (2011) applied the most convenient kind of breakdown values, namely the replacement ones, to investigate the robustness of all estimates applied in the strategy. All the estimates in question are robust against gross errors; however, they do not guarantee the robustness of the strategy itself. Generally, the higher the number of unstable points, the fewer gross errors the strategy can withstand. Thus, the robustness of the strategy depends on the number of unstable reference marks (Duchnowski, 2011). For that reason, it is important to understand how gross errors may influence the final strategy results.
The second way to investigate robust estimates is the application of influence functions (IF). From a theoretical point of view, such functions can be based on some assumptions concerning the distributions of random variables and the properties of the estimates that are investigated (see, e.g. Huber, 1981; Hampel et al., 1986; Rousseeuw and Croux, 1993). In practice, it is also very useful to apply some empirical versions of the IF, namely the empirical influence function (EIF) or its stylised variant (SEIF). In general, all these functions describe what happens to the estimate if an observation set is disturbed by a gross error (see, e.g. Huber, 1981; Hampel et al., 1986; Rousseeuw and Verboven, 2002). Thus, the EIF can be applied to study how the estimates that are used in the strategy respond to a gross error that may affect the measurement results.
2. Strategy for testing stability of reference marks based on R-estimates
The main aim of the present paper is to investigate how the strategy for testing the stability of reference marks, which was proposed in (Duchnowski, 2010), responds to a single gross error. This is very important from the practical point of view, especially when considering the robustness of the estimates that are applied, namely the R-estimate of the expected value, MAD and ADM, which was examined from a theoretical point of view in (Duchnowski, 2011). Before the strategy is tested, a brief review of it is presented.
The basis for the strategy is the following R-estimate of the vertical displacement (Duchnowski, 2009, 2010):

$\hat{\Delta}^k_R = \operatorname{med}\left\{ [\tilde{\mathbf{v}}^k_2]_i - [\tilde{\mathbf{v}}^k_1]_j \right\}$   (1)

where $1 \le i \le n$, $1 \le j \le n$, and $\tilde{\mathbf{v}}^k_1 \in R^{n \times 1}$ and $\tilde{\mathbf{v}}^k_2 \in R^{n \times 1}$ are the vectors of the initial residuals of the observations that concern the $k$th element of the parameter vector in the two measurement epochs, respectively; here $[\tilde{\mathbf{v}}^k_2]_i$ is the $i$th element of the vector $\tilde{\mathbf{v}}^k_2$.
The initial residuals are computed on the basis of the initial parameter vector $\tilde{\mathbf{x}} \in R^{m \times 1}$ and the functional model of a geodetic network in the following form:

$\tilde{\mathbf{v}} = \mathbf{y} - \mathbf{A}\tilde{\mathbf{x}}$   (2)

where $\mathbf{y} \in R^{n \times 1}$ is the observation vector and $\mathbf{A} \in R^{n \times m}$ is a known rectangular matrix.
Of course, to compute the vectors of the initial residuals $\tilde{\mathbf{v}}_1$ and $\tilde{\mathbf{v}}_2$ one should apply the two vectors of observations $\mathbf{y}_1$ and $\mathbf{y}_2$ that are obtained at the two different measurement epochs, respectively. Usually, if a levelling network is considered, the parameter vector $\mathbf{x}$ contains the heights of the network points. The vector $\tilde{\mathbf{x}}$ of their initial values can be taken from former computations or computed from the first-epoch observations (Duchnowski, 2009). Thus, the vectors $\tilde{\mathbf{v}}^k_1$ and $\tilde{\mathbf{v}}^k_2$ contain the initial residuals of those height differences for which the $k$th point is one of the network vertices. Such initial residuals fulfil some theoretical assumptions concerning their distribution and enable the application of R-estimation in deformation analyses (Duchnowski, 2008, 2010).
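To make Eq. (1) concrete, the following sketch (in Python; the function and variable names are illustrative and not taken from the paper) computes the R-estimate as the median of all pairwise differences between the second-epoch and first-epoch initial residuals:

import numpy as np

def r_estimate(v1, v2):
    # R-estimate of the vertical displacement, Eq. (1): the median of all
    # pairwise differences [v2]_i - [v1]_j of the initial residuals.
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    diffs = v2[:, None] - v1[None, :]   # n x n matrix of pairwise differences
    return float(np.median(diffs))

For samples free of outliers the estimate behaves like a conventional shift estimate, while single outlying residuals move the median of the pairwise differences only slightly, which is the source of its robustness.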
In general, the stability of a potential reference point is tested on the basis of the estimate of Eq. (1). If the value of such an estimated vertical displacement is acceptable considering the random errors of the measurements, then the point can be regarded as stable. Such an approach is of course sufficient only in simple cases (Duchnowski, 2010). Generally, the R-estimates must be supported by some robust estimates of the standard deviation to correctly identify the stable reference frame. Here, such estimates are applied to analyse the standard deviation of the sample created from the elements of the vector $\tilde{\mathbf{v}}^k_2$. From a theoretical point of view, the standard deviations of such samples should be equal to the known or assumed accuracy of the measurements unless there are some outliers in the sample (or unless there are too many outliers considering the robustness of the estimates). Thus, they provide information about contamination of the vectors $\tilde{\mathbf{v}}^k_2$ with outlying observations, which helps to find unstable points (see next section). Of course, such knowledge is essential in more complicated cases to verify whether the estimated vertical displacement is a true one or a consequence of the vertical displacements of the other points (Duchnowski, 2010, 2011). The estimates in question can also be computed for the vectors $\tilde{\mathbf{v}}^k_1$ to check whether these vectors are free of outliers.
Duchnowski (2010) proposed to use two estimates of the standard deviation, namely MAD (the median absolute deviation, or more formally the median distance to the median)

$MAD(Z) = 1.4826 \cdot \operatorname*{med}_{i=1,\ldots,n} |z_i - \operatorname{med}(z_i)|$   (3)

and ADM (the average distance to the median)

$ADM(Z) = \operatorname*{ave}_{i=1,\ldots,n} |z_i - \operatorname{med}(z_i)|$   (4)

where $Z$ is a random variable and $z_1, z_2, \ldots, z_n$ is a sample containing its realisations.
Such a choice resulted from the properties of both estimates. In some critical cases these estimates complement each other well, and a comparison of their values can provide valuable information about outliers and their locations (see, e.g. Rousseeuw and Verboven, 2002; Duchnowski, 2010, 2011). A detailed description of the strategy can be found in (Duchnowski, 2010).
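A minimal sketch of both estimates, Eqs (3) and (4), assuming a one-dimensional sample as above (names illustrative):

import numpy as np

def mad(z):
    # Median absolute deviation, Eq. (3); the factor 1.4826 makes MAD
    # consistent with the standard deviation of a normal distribution.
    z = np.asarray(z, float)
    return 1.4826 * float(np.median(np.abs(z - np.median(z))))

def adm(z):
    # Average distance to the median, Eq. (4).
    z = np.asarray(z, float)
    return float(np.mean(np.abs(z - np.median(z))))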
3. Outliers, their role and influence on the strategy results
It was already mentioned that the strategy is based on the robustness against outliers of the three estimates applied. In fact, it is sometimes based on the lack of that robustness, which helps to analyse more complicated cases, for example when two of four possible reference marks are not stable (Duchnowski, 2010). To understand the role of outliers, which may occur in the samples created from the elements of the vectors $\tilde{\mathbf{v}}^k_2$, one should first consider their sources (the vector $\tilde{\mathbf{v}}^k_1$ is assumed to be free of outliers). The obvious source of outlying observations is gross errors; however, it is not the only one. Let us now assume that all height differences between the possible reference marks are measured twice, namely once at each epoch. In such a case, if one of the possible reference marks, for example point number $i$, is unstable, then it becomes a source of outlying observations in all vectors $\tilde{\mathbf{v}}^k_2$ (where $k \neq i$) (Duchnowski, 2011). This is a very important property that helps to identify unstable points correctly. For example, if there is only one unstable point, say point number $m$, then the vector $\tilde{\mathbf{v}}^m_2$ is free of outliers that result from such instability, while the other vectors $\tilde{\mathbf{v}}^k_2$ (where $k \neq m$) contain one outlier of this type. Thus, if one knows the number of outlying observations in all vectors $\tilde{\mathbf{v}}^k_2$, then the unstable point can be found much more easily. However, in some cases this may also result in an increased sensitivity of the strategy to gross errors, namely it might be hard to distinguish the two types of outliers from each other. This can happen when there are more unstable points, and therefore there are many outlying observations of the second kind, namely those resulting from the instability of certain reference marks. Then neither the estimates nor the strategy can absorb any more outliers from the first source (resulting from gross errors). In other words, in such cases even one gross error can spoil the strategy results. The observation that is affected by a gross error might disturb the estimation of the vertical displacement, especially if it coincides with other outlying observations, and it might also spoil the MAD and ADM.
Let us now describe how the estimates respond to a gross error, which can be done by the application of empirical influence functions. Consider a levelling network with five possible reference points and let the theoretical point heights be as follows: $H_1 = 0.000$ m, $H_2 = 1.000$ m, $H_3 = 2.000$ m, $H_4 = 2.000$ m and $H_5 = 1.000$ m (these heights will also be assumed as the initial values of the parameters $\tilde{\mathbf{x}} \in R^{5 \times 1}$). Let us assume that all combinations of the height differences $h_{ij}$ (where $i$ and $j$ are the numbers of the network vertices) are measured twice (once in each measurement epoch) with a standard deviation of $\sigma = 1$ mm. Thus, in general, the observation vector can be written as $\mathbf{y} = [h_{12}, h_{13}, h_{14}, h_{15}, h_{23}, h_{24}, h_{25}, h_{53}, h_{54}, h_{43}]^T$ (see Fig. 1; a similar example, but in a different context, was also considered in Duchnowski (2011)).
Fig. 1. Levelling network of the possible reference points
If all five points are stable then, under such assumptions, two vectors of measurement results can be simulated, namely $\mathbf{y}_1$ and $\mathbf{y}_2$, as follows (the estimated measurement accuracy is 0.6 mm and 0.7 mm, respectively):

$\mathbf{y}_1 = [1.0002, 2.0007, 2.0008, 0.9998, 0.9988, 0.9999, 0.0005, 1.0003, 0.9996, 0.0004]^T$
$\mathbf{y}_2 = [0.9993, 1.9995, 2.0006, 0.9998, 0.9996, 1.0010, 0.0007, 1.0001, 1.0007, 0.0009]^T$
These two vectors, together with the theoretical heights of the network points presented above, are the basis for the computation of the vectors $\tilde{\mathbf{v}}^k_1 \in R^{4 \times 1}$ and $\tilde{\mathbf{v}}^k_2 \in R^{4 \times 1}$ ($1 \le k \le 5$). For example, in Figure 1, the observations that concern point 3 are marked with a circle. These four observations take part in the computation of the vectors $\tilde{\mathbf{v}}^3_1$ and $\tilde{\mathbf{v}}^3_2$ for the respective observation vectors. However, it is always important to check the direction of the observations, namely, whether the height difference was measured from, for example, point 1 towards point 3 or in the opposite direction, from point 3 to point 1 (if the direction is opposite, then the sign of the respective initial residual should be changed). One should notice that each initial residual is used twice, namely in the computations for each of the two network vertices it connects.
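The computation described above can be sketched as follows (Python; the observation order, the sign convention for the "from"/"to" vertices and all names are illustrative assumptions consistent with the vector y given above, not code from the paper):

import numpy as np

# Observed height differences h_ij in the order of the vector y: (from, to)
PAIRS = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (2, 4), (2, 5), (5, 3), (5, 4), (4, 3)]
H0 = np.array([0.000, 1.000, 2.000, 2.000, 1.000])  # initial heights (m)

def initial_residuals(y):
    # Eq. (2): v = y - A x, where the row of A for h_ij has +1 at point j
    # and -1 at point i (since h_ij = H_j - H_i).
    A = np.zeros((len(PAIRS), len(H0)))
    for r, (i, j) in enumerate(PAIRS):
        A[r, i - 1], A[r, j - 1] = -1.0, 1.0
    return np.asarray(y, float) - A @ H0

def point_sample(v, k):
    # Sub-vector v^k: residuals of the four observations involving point k,
    # with the sign flipped when the difference was measured away from k.
    return np.array([v[r] if j == k else -v[r]
                     for r, (i, j) in enumerate(PAIRS) if k in (i, j)])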
Let us now describe how the estimates applied in the strategy respond to a gross error. It was mentioned that this can be done by the application of the EIF, which is generally defined for the estimate $T_n$ and the sample $z_1, z_2, \ldots, z_{n-1}$ as (see Huber, 1981; Rousseeuw and Verboven, 2002)

$EIF(x) = T_n(z_1, z_2, \ldots, z_{n-1}, x)$   (5)

This function can be adapted for the purpose of the present paper in the following form:

$EIF(x) = T^k_n(\mathbf{y}_1, \mathbf{y}_2 + \mathbf{g})$   (6)

where $\mathbf{g} = [x\ 0 \cdots 0]^T$; thus, here we assume that the first observation at the second epoch is affected with a gross error of $x$; $T^k_n$ is the R-estimate, namely $\hat{\Delta}^k_R$, of the displacement of point number $k$ (it may also be another estimate, such as, for example, MAD or ADM). Of course, several appropriate variants must be considered to study the robustness and responses of the estimates properly. Thus, several EIFs will be created for different variants in which certain points are not stable. Note that if some points are not stable, then the values of the respective height differences at the second epoch, i.e. the elements of the vector $\mathbf{y}_2$, change. For comparison, EIFs are also created for the conventional estimates of the least-squares method (LSE), which are the results of a free adjustment of the levelling network.
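Under the same illustrative assumptions as the sketches above, Eq. (6) can be traced numerically; for MAD or ADM one would apply the estimate to the second-epoch sample alone:

def eif_curve(y1, y2, k, xs):
    # Eq. (6): value of the R-estimate for point k when the first
    # second-epoch observation is contaminated by a gross error x.
    out = []
    v1 = point_sample(initial_residuals(y1), k)
    for x in xs:
        g = np.zeros(len(y2))
        g[0] = x                        # g = [x 0 ... 0]^T
        v2 = point_sample(initial_residuals(np.asarray(y2, float) + g), k)
        out.append(r_estimate(v1, v2))
    return np.array(out)

# e.g. xs = np.linspace(-0.05, 0.05, 201)   # gross errors in metres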
Let us now consider some variants in which two of the five possible reference points are not stable. In such variants, there are many outliers that result from the instability of reference points, and hence such cases are the most complicated and the hardest to analyse (Duchnowski, 2010, 2011). Furthermore, they may also be the most sensitive to a gross error. So let us examine the three following variants (a numerical sketch of how they can be simulated is given below): Variant 1, where points 1 and 2 are unstable and their assumed displacements are 20 mm and 10 mm, respectively; Variant 2, where points 1 and 5 are unstable with the theoretical displacements of 20 mm and -10 mm; and Variant 3, where points 4 and 5 are unstable with the theoretical displacements of 20 mm and -10 mm, respectively. Note that the gross error $x$ affects the first observation, namely the height difference $h_{12}$, at the second epoch.
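The unstable-point variants can be simulated by shifting the affected second-epoch height differences; a sketch under the convention h_ij = H_j - H_i used above (the dictionary argument and names are illustrative):

def apply_displacements(y2, d):
    # Displacing point j by d_j changes every h_ij by +d_j and every h_ji
    # by -d_j; e.g. d = {1: 0.020, 2: 0.010} reproduces Variant 1 (metres).
    y = np.asarray(y2, float).copy()
    for r, (i, j) in enumerate(PAIRS):
        y[r] += d.get(j, 0.0) - d.get(i, 0.0)
    return y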
The empirical influence functions of the R-estimates of Eq. (1) and of the LSEs, for all proposed Variants, are presented in Figures 2, 3, and 4, respectively. The EIFs of the LSE of the vertical displacement are straight lines with a constant slope for each of the Variants (in fact, the slope depends on the number of observations, which is also constant for each Variant; see, e.g. Rousseeuw and Verboven, 2002). Such a shape of the EIFs reflects the sensitivity of the least-squares method to gross errors.
Fig. 2. EIFs of R-estimates and LSE of the vertical displacements
when the points 1 and 2 are unstable
Fig. 3. EIFs of R-estimates and LSE of the vertical displacements
when the points 1 and 5 are unstable
The EIFs for the R-estimates are bounded; however, they show that the response of the R-estimate of the vertical displacement to a gross error depends on the configuration of the unstable points. The gross error hardly influences the estimation results if points 1 and 2 are not stable (note that the gross error affects the height difference between those two points). In such a case, the estimate, as well as the strategy itself, is robust against the gross error.
Fig. 4. EIFs of R-estimates and LSE of the vertical displacements
when the points 4 and 5 are unstable
The influence of a gross error becomes more significant if points 1 and 2 are stable and the other points are not. This is especially evident in Figure 4; however, even then the influence of the gross error is bounded, and the maximum influence is about 1 cm. It is also worth noting that for small values of gross errors the EIFs of the R-estimates have a steeper slope than the EIFs of the LSE, which means that in such cases the R-estimates are more sensitive to a gross error than the traditional estimates of displacement.
Now, let us examine the estimates of the standard deviation, which are also applied in the strategy. Consider the same three variants and let us create EIFs of the MAD and the ADM, Eqs (3) and (4), respectively, and, for comparison, EIFs of the traditional estimate, namely the sample standard deviation (SD). Figures 5, 6 and 7 present the EIFs of the estimates in question, obtained for the sample created from the elements of the vector $\tilde{\mathbf{v}}^1_2$ (computation for point 1). One can create similar EIFs for the estimates obtained for the samples created from the elements of the vector $\tilde{\mathbf{v}}^2_2$ (computation for point 2); these functions are omitted here in order not to obfuscate the figures.
The EIFs of the MAD are always bounded, in contrast to the EIFs of the ADM and SD. When points 1 and 2 are not stable, a gross error does not affect the MAD.
Fig. 5. EIFs of MAD, ADM and SD when the points 1 and 2 are unstable
Fig. 6. EIFs of MAD, ADM and SD when the points 1 and 5 are unstable
Fig. 7. EIFs of MAD, ADM and SD when the points 4 and 5 are unstable
The situation changes when other points are unstable. The most significant influence of a gross error on the MAD occurs in Variant 3. Note that the EIF of the MAD also has the most complicated shape in this Variant. The EIFs of the ADM and SD are similar to each other; however, the ADM is always less affected by a gross error than the SD. It is also worth noting that in Variant 3, and to a lesser extent in Variant 2, the EIF of the MAD has bigger values than the EIFs of the ADM and SD for smaller values of a gross error. Thus, the MAD is more sensitive to a gross error than the traditional estimate in such a case (note that a similar conclusion concerns the robust and conventional estimates of the vertical displacement, see Figure 4). Such a disadvantage of the MAD might become an advantage of the strategy itself, namely it may help to analyse the estimation results. Generally, if an observation set is free of outliers resulting from gross errors and there is at least one unstable point in the network, then the values of the MAD are smaller than or similar to the values of the ADM (see Duchnowski, 2010). Thus, if it happens that the value of the MAD is bigger than the value of the ADM, then it is a suggestion, or a warning, that some observation must have been affected by a gross error (of course, the contrary conclusion is not true). This conclusion is especially important for relatively small gross errors, which are hard to detect.
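The warning described above can be written as a simple check, using the mad and adm sketches from Section 2 (the rule only flags a possible gross error; as noted, the contrary conclusion is not true):

def gross_error_warning(v2_sample):
    # If the observation set is free of gross errors and at least one point
    # is unstable, MAD should not exceed ADM; MAD > ADM therefore suggests
    # that some observation was affected by a gross error.
    return mad(v2_sample) > adm(v2_sample)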
4. Conclusions
The strategy for testing the stability of possible reference points proposed in (Duchnowski, 2010) applies robust estimates; however, this does not guarantee the robustness of the strategy itself. Thus, it is very important to know and understand how the estimates and the strategy might respond to a gross error that may affect an observation. There is no doubt that if all reference points are stable, then the strategy cannot be spoiled by a gross error. Such a case is easy to analyse even if a gross error occurs. If some points are unstable, the situation is more complicated because certain vectors $\tilde{\mathbf{v}}^k_2$ contain outlying observations that result from such instabilities. Note that in the case at hand, if there are no gross errors, then the analyses are also easy to carry out, and the strategy can identify the unstable reference points. However, if a gross error occurs, then the estimates might respond in different ways. The responses of the estimates in the case of two unstable points are described by the EIFs. In the case of one unstable point, the EIFs would be similar but would have less complicated shapes. These functions show that the most significant influence of a gross error on the estimates occurs when such an error affects an observation, namely a height difference, whose network vertices are stable. Then the effect of the gross error might spoil the estimation and the strategy results. The EIFs, which describe how the estimates change in the presence of a gross error, can provide important information to identify "suspicious" observations. Such an observation may be, for example, rejected from the observation set. Usually, this leaves enough observations to repeat the process, and thus to find a stable reference frame.
The strategy for testing the stability of possible reference marks that is based on the application of R-estimates gives good results if the observations are not affected by a gross error. Thus, the application of empirical influence functions is advisable when the results of estimation are not satisfactory, i.e. when the analyses of the estimation results cannot provide enough information about the stability of all possible reference points. The EIFs presented in this paper apply to the case of a levelling network that contains five possible reference points, where all height differences are measured twice, at two different epochs. Of course, this is only an example network; however, the idea of the application of the EIF and the general conclusions have wider significance. When another network is analysed, one should create separate EIFs which reflect the geometric structure of that network.
Acknowledgments
The paper was prepared within the statutory research of the Institute of Geodesy,
University of Warmia and Mazury in Olsztyn.
References
Baarda W., (1968): A testing procedure for use in geodetic networks, Publications on Geodesy, New Series, Vol. 2, No 5, Netherlands Geodetic Commission, Delft.
Denli H.H., (2008): Stable Point Research on Deformation Networks, Survey Review, Vol. 40, pp. 74–82.
Ding X., Coleman R., (1996): Multiple outlier detection by evaluating redundancy contributions of ob-
servations, Journal of Geodesy, Vol. 70, pp. 489–498.
Duchnowski R., (2008): R-estimation and its application to the LS adjustment, Bollettino di Geodesia e
Scienze Affini, Vol. 67, No 1, pp. 17–32.
Duchnowski R., (2009): Geodetic Application of R-estimation - Levelling Network Examples, Technical Sciences, Vol. 12, pp. 135–144.
Duchnowski R., (2010): Median-based estimates and their application in controlling reference mark
stability, Journal of Surveying Engineering, Vol. 136, No 2, pp. 47–52.
Duchnowski R., (2011): Robustness of strategy for testing levelling mark stability based on rank tests,
Survey Review, Vol. 43, pp. 687–699.
Gui Q., Gong Y., Li G., Li B., (2007): A Bayesian approach to the detection of gross errors based on
posterior probability, Journal of Geodesy, Vol. 81, pp. 651–659.
Gui Q., Gong Y., Li G., Li B., (2011): A Bayesian unmasking method for locating multiple gross errors
based on posterior probabilities of classification variables, Journal of Geodesy, Vol. 85, pp. 191–203.
Hampel F.R., Ronchetti E.M., Rousseeuw P.J., Stahel W.A., (1986): Robust Statistics. The Approach Based
on Influence Functions, Wiley, New York.
Hekimoglu S., Erenoglu R.C., (2007): Effect of heteroscedasticity and heterogeneousness on outlier
detection for geodetic networks, Journal of Geodesy, Vol. 81, pp. 137–148.
Hekimoglu S., Erdogan B., Butterworth S., (2010): Increasing the efficacy of the conventional deformation
analysis methods: alternative strategy, Journal of Surveying Engineering, Vol. 136, No 2, pp. 53–62.
Huber P.J., (1981): Robust Statistics, Wiley, New York.
Prószyński W., Kwaśniak M., (2006): Basis of geodetic calculations of displacements. Notions and
methodology elements (in Polish), Oficyna Wydawnicza Politechniki Warszawskiej, Warszawa.
Prószyński W., (2010): Another approach to reliability measures for systems with correlated observations, Journal of Geodesy, Vol. 84, pp. 547–556.
Rousseeuw P.J., Croux C., (1993): Alternatives to the Median Absolute Deviation, Journal of the American Statistical Association, Vol. 88, pp. 1273–1283.
Rousseeuw P.J., Verboven S., (2002): Robust estimation in very small samples, Computational Statistics & Data Analysis, Vol. 40, No 4, pp. 741–758.
Shaorong Z., (1990): On separability for deformations and gross errors, Journal of Geodesy, Vol. 64,
pp. 383–396.
Wiśniewski Z., (2009): Estimation of parameters in a split functional model of geodetic observations (Msplit estimation), Journal of Geodesy, Vol. 83, pp. 105–120.
Xu P., (2005): Sign-constrained robust least squares, subjective breakdown point and effect of
weights of observations on robustness, Journal of Geodesy, Vol. 79, pp. 146–159.
Application of empirical influence functions to examine the sensitivity of robust estimators used in the strategy for testing the stability of reference points
Robert Duchnowski
Instytut Geodezji, Uniwersytet Warmińsko-Mazurski w Olsztynie
ul. Oczapowskiego 1, 10-957 Olsztyn
e-mail: robert.duchnowski@uwm.edu.pl
Summary
An important stage of deformation analysis is the determination of a stable reference base, and hence also the examination of the stability of potential reference points. Several methods of testing stability exist; one of them is the method applying R-estimators. The objective of this paper is to examine the sensitivity to gross errors of the estimators applied in that method. Although the estimators applied are robust, the method itself does not have this property, and its robustness depends largely on the number of unstable points (generally speaking, the fewer unstable points there are, the more robust the strategy is against gross errors). For this reason, it is important to understand how gross errors influence the applied estimators and the results of the strategy itself. This problem can be solved with the application of empirical influence functions (EIF). The paper presents example EIF functions and their application in the strategy, and discusses how the information obtained can be important and useful in practice.