
Robust estimation of covariance and correlation functions of a stationary multivariate process

Authors: Higor H. A. Cotta, Valdério A. Reisen, Pascal Bondon and Wolfgang Stummer
Graduate program in Environmental Engineering and Department of Statistics
Federal University of Espírito Santo, Brazil
Laboratoire des Signaux et Systèmes, CNRS - CentraleSupélec - Université Paris-Sud - France
Department of Mathematics, University of Erlangen-Nürnberg - Germany
Abstract—In this paper, the effect of additive outliers is
considered in the estimation of the covariance and correlation
matrix functions of a multivariate stationary process. Robust
estimators of these matrices are presented in order to mitigate
the effect of outliers. Some Monte Carlo simulations are carried
out to empirically clarify the impact of additive outliers in the
standard estimators and to assess the robustness of the proposed
estimators. A real data set is analyzed as an example of application.
I. INTRODUCTION
The estimation of the covariance and correlation matrix
functions is an important step in the identification and es-
timation of a multivariate signal model, e.g., for parameter
estimation using the Yule-Walker equations. It is well known
that outliers in signals affect the correlation structure of the
data which may lead to erroneous estimators [1]. How to
mitigate this phenomenon is still a challenging problem.
Robust estimation theory has been extensively studied in the
statistical community since the 1970s following the seminal
works of Huber and Maronna [2], [3]. Several efforts have
been made by the statistical signal processing community
in order to weaken the impact of atypical observations. A
concise review of the fundamentals for the signal processing
community can be found in [4].
In this context, theory and applications of robust estimation
for univariate time-correlated data are presented in [5], [6], [7]
for the frequency domain and in [8] for the time domain.
Applications of robust techniques in biotechnology, finance and
power management can be found in [9], [10] and [11], respectively.
In the multivariate context, highly robust estimation of
the covariance and correlation matrices for time-independent
data sets is proposed in [12]. The estimators use the so-called
$Q_n(\cdot)$ estimator proposed in [13], which has appealing
features such as being location-free and having a high breakdown
point (50%) and a bounded influence function. The robustness
and efficiency properties of the estimators have also been
investigated through analysis of numerical experiments and
real data analysis for univariate time series. For further details
on these theoretical and numerical studies, see [14].
In this work, we extend to a multivariate stationary time
series the robust estimator of the autocovariance and the
autocorrelation functions of a univariate stationary time series
proposed by [12], [15]. We compare the proposed estimator
to the sample estimator by means of temporal breakdown
point and influence functions, and through Monte Carlo ex-
periments.
This paper is organized as follows. In Section 2, the effect
of additive outliers in the covariance and correlation matrix
functions of a multivariate time series is shown and the robust
estimators of the autocovariance and autocorrelation matrix
functions are proposed. Section 3 presents some Monte Carlo
experiments. A real data example is considered in Section 4
and some concluding remarks are provided in Section 5.
II. ROBUST ESTIMATION OF AUTOCOVARIANCE AND
AUTOCORRELATION MATRIX FUNCTIONS
A. Linear Time Series
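Let $\mathbf{X}_t = [X_{1,t}, X_{2,t}, \ldots, X_{k,t}]'$, $t \in \mathbb{Z}$, be a $k$-dimensional
linear vector process defined by
$$\mathbf{X}_t = \boldsymbol{\mu} + \sum_{j=0}^{\infty} \Psi_j \boldsymbol{\varepsilon}_{t-j}, \tag{1}$$
where $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_k]'$ is the mean vector of $\{\mathbf{X}_t\}$, $\Psi_0$ is
the $k \times k$ identity matrix, and $\Psi_j$, $j = 1, 2, \ldots$, are $k \times k$ matrices
of coefficients satisfying $\sum_{j=0}^{\infty} \|\Psi_j\|^2 < \infty$, where $\|A\|$ is
the matrix norm of the matrix $A$ defined by $\|A\|^2 = \mathrm{Tr}(A'A)$.
The vector process $\boldsymbol{\varepsilon}_t = [\varepsilon_{1,t}, \ldots, \varepsilon_{k,t}]'$ is zero-mean and
uncorrelated, i.e., $E(\boldsymbol{\varepsilon}_t) = 0$ and $\mathrm{Cov}(\boldsymbol{\varepsilon}_t, \boldsymbol{\varepsilon}_{t+h}) = \Sigma_\varepsilon 1_{\{h=0\}}$.
Thus, although the elements of $\boldsymbol{\varepsilon}_t$ at different times are
uncorrelated, they may be contemporaneously correlated. It
results from (1) that
$$\gamma^X(h) = \mathrm{Cov}(\mathbf{X}_t, \mathbf{X}_{t+h}) = \sum_{j=0}^{\infty} \Psi_j \Sigma_\varepsilon \Psi_{j+h}', \quad h \geq 0.$$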
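The lag-$h$ correlation matrix function of $\{\mathbf{X}_t\}$ is defined by
$$\rho^X(h) = D^{-1/2} \gamma^X(h) D^{-1/2}, \tag{2}$$
where $D = \mathrm{diag}[\gamma^X_{11}(0), \ldots, \gamma^X_{kk}(0)]$. The $(i, j)$th element of
$\rho^X(h)$ is
$$\rho^X_{ij}(h) = \frac{\mathrm{Cov}(X_{i,t}, X_{j,t+h})}{\sqrt{\mathrm{Var}(X_{i,t})\,\mathrm{Var}(X_{j,t})}} = \frac{\gamma^X_{ij}(h)}{\sqrt{\gamma^X_{ii}(0)\,\gamma^X_{jj}(0)}}. \tag{3}$$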
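A parametric class of linear time series satisfying (1) is
the vector autoregressive moving average (VARMA) model of
orders $(p, q)$ defined by the difference equation
$$\Phi(B)(\mathbf{X}_t - \boldsymbol{\mu}) = \Theta(B)\boldsymbol{\varepsilon}_t, \tag{4}$$
where $B$ is the backward shift operator ($B\mathbf{X}_t = \mathbf{X}_{t-1}$),
$\Phi(B) = I - \sum_{i=1}^{p} \Phi_i B^i$ and $\Theta(B) = I + \sum_{i=1}^{q} \Theta_i B^i$, where
$\Phi_i$ and $\Theta_i$ are $k \times k$ matrices, and $\{\boldsymbol{\varepsilon}_t\}$ is a vector white
noise process. When the polynomials $\Phi(z)$ and $\Theta(z)$ satisfy
$\det(\Phi(z)) \neq 0$ and $\det(\Theta(z)) \neq 0$ for all $z \in \mathbb{C}$ such that
$|z| \leq 1$, (4) has a unique stationary, causal and invertible
solution and the matrices $\Psi_j$ are determined uniquely by
$\Psi(z) = \sum_{j=0}^{\infty} \Psi_j z^j = \Phi^{-1}(z)\Theta(z)$ for $|z| \leq 1$.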
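As an illustration of (1) to (4), the following minimal Python sketch computes the autocovariance and correlation matrix functions for the special case of a zero-mean VAR(1) model, where $\Psi_j = \Phi_1^j$. The function names and the numerical values in the usage note are illustrative assumptions, not taken from the paper; the convention $\gamma^X(h) = \mathrm{Cov}(\mathbf{X}_t, \mathbf{X}_{t+h})$ follows the text.

```python
import numpy as np

def var1_autocovariances(phi, sigma_eps, max_lag):
    """Autocovariances gamma(0..max_lag) of X_t = phi X_{t-1} + eps_t,
    with gamma(h) = Cov(X_t, X_{t+h}). Assumes a stationary VAR(1)."""
    k = phi.shape[0]
    # gamma(0) solves gamma(0) = phi gamma(0) phi' + sigma_eps, i.e.
    # vec(gamma(0)) = (I - phi kron phi)^{-1} vec(sigma_eps) (column-major vec).
    vec_g0 = np.linalg.solve(np.eye(k * k) - np.kron(phi, phi),
                             sigma_eps.reshape(-1, order="F"))
    g0 = vec_g0.reshape(k, k, order="F")
    # For a VAR(1), Psi_j = phi^j, so gamma(h) = gamma(0) (phi^h)'.
    return [g0 @ np.linalg.matrix_power(phi, h).T for h in range(max_lag + 1)]

def correlation_function(gammas):
    """Apply eq. (2): rho(h) = D^{-1/2} gamma(h) D^{-1/2}."""
    d = 1.0 / np.sqrt(np.diag(gammas[0]))
    return [g * np.outer(d, d) for g in gammas]
```

For example, with the hypothetical values `phi = np.array([[0.5, 0.1], [0.0, 0.3]])` and `sigma_eps = np.array([[1.0, 0.5], [0.5, 1.0]])`, `correlation_function(var1_autocovariances(phi, sigma_eps, 7))` returns the matrices $\rho^X(0), \ldots, \rho^X(7)$. The closed form agrees with the truncated sum $\sum_j \Psi_j \Sigma_\varepsilon \Psi_{j+h}'$ given after (1).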
B. Impact of additive outliers in multivariate time series
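Outliers can affect the dependence structure of a multivariate
time series. In this section, some results related to the
effects of outliers on the covariance and correlation structures
of a correlated process are derived. We suppose that the
observed process $\{\mathbf{Z}_t\}$ results from the contamination of
$\{\mathbf{X}_t\}$ by additive random outliers, i.e.,
$$\mathbf{Z}_t = \mathbf{X}_t + \Omega\boldsymbol{\delta}_t, \tag{5}$$
where $\Omega = \mathrm{diag}[\omega_1, \ldots, \omega_k]$, $\omega_i$, $i = 1, \ldots, k$, is the magnitude
of the outliers affecting $\{X_{i,t}\}$, and $\boldsymbol{\delta}_t = [\delta_{1,t}, \ldots, \delta_{k,t}]'$ is
a random vector indicating the occurrence of an outlier at time
$t$. We assume that $\{\mathbf{X}_t\}$ and $\{\boldsymbol{\delta}_t\}$ are uncorrelated processes
and that $P(\delta_{i,t} = -1) = P(\delta_{i,t} = 1) = p_i/2$ and $P(\delta_{i,t} = 0) = 1 - p_i$
for $i = 1, \ldots, k$, where $0 \leq p_i < 1$. Then $E(\delta_{i,t}) = 0$
and $\mathrm{Var}(\delta_{i,t}) = p_i$. We also assume that $\mathrm{Cov}(\boldsymbol{\delta}_t, \boldsymbol{\delta}_t) = \Sigma_\delta = \mathrm{diag}[p_1, \ldots, p_k]$
and that $\mathrm{Cov}(\boldsymbol{\delta}_t, \boldsymbol{\delta}_{t+h}) = 0$ when $h \neq 0$.
It follows from (5) that $E(\mathbf{Z}_t) = E(\mathbf{X}_t)$, $\gamma^Z(0) = \gamma^X(0) + \Omega\Sigma_\delta\Omega'$
and $\gamma^Z(h) = \gamma^X(h)$ when $h \neq 0$. Therefore
$$\rho^Z_{ij}(h) = \begin{cases} \dfrac{\gamma^X_{ij}(h)}{\sqrt{(\gamma^X_{ii}(0) + p_i\omega_i^2)(\gamma^X_{jj}(0) + p_j\omega_j^2)}}, & h \neq 0, \\[2ex] \dfrac{\gamma^X_{ij}(0) + p_i\omega_i^2 1_{\{i=j\}}}{\sqrt{(\gamma^X_{ii}(0) + p_i\omega_i^2)(\gamma^X_{jj}(0) + p_j\omega_j^2)}}, & h = 0. \end{cases} \tag{6}$$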
We observe that $\rho^Z_{ij}(h) \to 0$ as $|\omega_i| \to \infty$ or $|\omega_j| \to \infty$ when
$h \neq 0$; these conclusions are analyzed in more depth in Proposition 1.
The recent works [14], [16], [17], [18] and [19] discuss this
problem in univariate time series with short and long memory
properties.
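The contamination mechanism in (5) can be sketched in a few lines of Python. This is a minimal illustration under the stated assumptions (independent components of $\boldsymbol{\delta}_t$, taking values $-1$, $0$, $1$ with probabilities $p_i/2$, $1 - p_i$, $p_i/2$); the function name `contaminate` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def contaminate(x, omega, p, rng):
    """Return (z, delta) with z_t = x_t + Omega delta_t as in eq. (5).

    x     : (n, k) array holding the clean series X_t
    omega : (k,) outlier magnitudes (diagonal of Omega)
    p     : (k,) outlier probabilities p_i, with 0 <= p_i < 1
    rng   : numpy.random.Generator
    """
    x = np.asarray(x, dtype=float)
    omega = np.asarray(omega, dtype=float)
    p = np.asarray(p, dtype=float)
    u = rng.random(x.shape)
    # delta = -1 with prob. p/2, +1 with prob. p/2, and 0 with prob. 1 - p.
    delta = np.where(u < p / 2, -1.0, np.where(u < p, 1.0, 0.0))
    return x + delta * omega, delta
```

Because $\mathrm{Var}(\delta_{i,t}) = p_i$, the variance of each contaminated component is inflated by $p_i\omega_i^2$ while the lagged covariances are unchanged, which is what drives the attenuation in (6).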
Proposition 1. Suppose that $\mathbf{Z}_1, \mathbf{Z}_2, \ldots, \mathbf{Z}_n$ is a set of
$k$-dimensional time series observations of model (5) and $m$ is
the expected number of additive outliers as stated in (5). Let
$\hat\rho^Z_{ij}(h) = \hat\gamma^Z_{ij}(h) / \big(\sqrt{\hat\gamma^Z_{ii}(0)}\sqrt{\hat\gamma^Z_{jj}(0)}\big)$, for $i, j = 1, \ldots, k$. Then:
a. For $m = 1$ (one outlier occurring only at $Z_{i,t}$),
$$\lim_{n \to \infty} \operatorname*{plim}_{\omega_i \to \infty} \hat\rho^Z_{ij}(h) = 0.$$
b. For $m = 2$ (two outliers occurring at $Z_{i,t}$ and/or at
$Z_{j,t}$) and assuming that $\hat\gamma^Z_{ij}(h) \neq 0$ for $Z_{i,t}$ and $Z_{j,t}$,
it follows that
$$\lim_{n \to \infty} \operatorname*{plim}_{\omega_i \to \infty \text{ and/or } \omega_j \to \infty} \hat\rho^Z_{ij}(h) = 0.$$
In (a) and (b), $\omega_i$ and $\omega_j$ are the magnitudes of the additive
outliers occurring at positions $i$ and $j$, respectively.
The proof of Proposition 1 follows the same lines as in
[16], [17], [18] and is not presented here to save space, but
is available upon request.
C. Robust estimation of the covariance and correlation matrix
functions
Let $X_1, \ldots, X_n$ be independent and identically distributed
univariate random variables with finite variance and $\mathbf{X} = (X_1, \ldots, X_n)'$.
The $Q_n(\cdot)$ estimator of the standard deviation
of $X_1$ is based on the $k$th order statistic of the pairwise absolute differences and is defined by
$$Q_n(\mathbf{X}) = c\,\{|X_i - X_j|;\ i < j\}_{(k)}, \quad i, j = 1, \ldots, n,$$
where $c$ is a constant that guarantees consistency ($c = 2.2191$ for
the Gaussian distribution), $k = \lfloor (\binom{n}{2} + 2)/4 \rfloor + 1$ and $\lfloor x \rfloor$
is the largest integer not exceeding $x$. An efficient algorithm
to calculate $Q_n(\mathbf{X})$ is proposed in [20]. The asymptotic
breakdown point of $Q_n(\mathbf{X})$ is 50%, see [13].
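To make the definition concrete, here is a minimal Python sketch of $Q_n$ that enumerates all pairwise differences directly. It is a naive $O(n^2)$ illustration, not the efficient algorithm of [20], and it applies no finite-sample correction; the function name `qn_scale` is an assumption made for this example.

```python
import numpy as np

GAUSSIAN_C = 2.2191  # consistency constant quoted in the text for Gaussian data

def qn_scale(x, c=GAUSSIAN_C):
    """Naive Q_n: c times the k-th smallest |x_i - x_j| over pairs i < j,
    with k = floor((C(n, 2) + 2) / 4) + 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("Q_n needs at least two observations")
    i, j = np.triu_indices(n, k=1)           # index pairs with i < j
    diffs = np.abs(x[i] - x[j])
    k = (n * (n - 1) // 2 + 2) // 4 + 1      # 1-based order-statistic index
    return c * np.partition(diffs, k - 1)[k - 1]
```

For the five points `[1, 2, 3, 4, 5]` there are ten pairwise differences and $k = 4$, so the fourth smallest difference is 1 and `qn_scale` returns $c \cdot 1 = 2.2191$. Replacing the last point by 1000 only moves the fourth smallest difference to 2, which illustrates the bounded effect of a single outlier.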
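For any univariate second order random variables $X$ and $Y$,
we have
$$\mathrm{Cov}(X, Y) = \frac{\alpha\beta}{4}\left[\mathrm{Var}\!\left(\frac{X}{\alpha} + \frac{Y}{\beta}\right) - \mathrm{Var}\!\left(\frac{X}{\alpha} - \frac{Y}{\beta}\right)\right], \tag{7}$$
for any nonzero $\alpha, \beta \in \mathbb{R}$, see [21].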
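Identity (7) follows from expanding the two variances; the short derivation below is added as a reading aid and uses only the bilinearity of the covariance, with $\alpha$ and $\beta$ assumed nonzero.

```latex
\begin{align*}
\mathrm{Var}\!\left(\frac{X}{\alpha} \pm \frac{Y}{\beta}\right)
  &= \frac{\mathrm{Var}(X)}{\alpha^2} + \frac{\mathrm{Var}(Y)}{\beta^2}
     \pm \frac{2}{\alpha\beta}\,\mathrm{Cov}(X, Y), \\
\mathrm{Var}\!\left(\frac{X}{\alpha} + \frac{Y}{\beta}\right)
  - \mathrm{Var}\!\left(\frac{X}{\alpha} - \frac{Y}{\beta}\right)
  &= \frac{4}{\alpha\beta}\,\mathrm{Cov}(X, Y).
\end{align*}
```

Multiplying the last line by $\alpha\beta/4$ gives (7). The identity reduces the estimation of a covariance to the estimation of two scales, which is what allows a robust scale estimator to be plugged in.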
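Now, let $\{X_t\}$, $t \in \mathbb{Z}$, be a univariate stationary time series
with finite variance. Taking $\alpha = \beta$ in (7) and replacing $\mathrm{Var}(\cdot)$
by $Q_n^2(\cdot)$, [15] proposed the following highly robust estimate
of the covariance function of $\{X_t\}$,
$$\hat\gamma_{Q_n}(h) = \frac{1}{4}\left[Q_{n-h}^2(\mathbf{U} + \mathbf{V}) - Q_{n-h}^2(\mathbf{U} - \mathbf{V})\right], \tag{8}$$
where $\mathbf{U} = (X_1, \ldots, X_{n-h})'$ and $\mathbf{V} = (X_{h+1}, \ldots, X_n)'$. The
autocorrelation function of $\{X_t\}$ can be estimated by
$$\hat\rho_{Q_n}(h) = \frac{Q_{n-h}^2(\mathbf{U} + \mathbf{V}) - Q_{n-h}^2(\mathbf{U} - \mathbf{V})}{Q_{n-h}^2(\mathbf{U} + \mathbf{V}) + Q_{n-h}^2(\mathbf{U} - \mathbf{V})}. \tag{9}$$
The consistency and asymptotic normality of $\hat\gamma_{Q_n}(h)$ and
$\hat\rho_{Q_n}(h)$ are studied in [14] and [19] when $\{X_t\}$ is a short
and a long memory process.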
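In this work, we extend the estimators (8) and (9) to multivariate
time series. Let $\{\mathbf{X}_t\}$ be a $k$-dimensional stationary
vector process with finite variance. We robustly estimate
$\gamma^X(h)$ by
$$\hat\gamma_{Q_n}(h) = \left[\hat\gamma^{(X_{i,t},X_{j,t})}_{Q_{n-h}}(h)\right]_{i,j=1}^{k}, \tag{10}$$
where
$$\hat\gamma^{(X_{i,t},X_{j,t})}_{Q_{n-h}}(h) = \frac{\alpha\beta}{4}\left[Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} + \frac{\mathbf{V}}{\beta}\right) - Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} - \frac{\mathbf{V}}{\beta}\right)\right], \tag{11}$$
$\mathbf{U} = (X_{i,1}, \ldots, X_{i,n-h})'$, $\mathbf{V} = (X_{j,h+1}, \ldots, X_{j,n})'$, $\alpha = Q_n(X_{i,t})$
and $\beta = Q_n(X_{j,t})$. The autocorrelation matrix
function of $\{\mathbf{X}_t\}$ can be estimated by
$$\hat\rho_{Q_n}(h) = \left[\hat\rho^{(X_{i,t},X_{j,t})}_{Q_{n-h}}(h)\right]_{i,j=1}^{k}, \tag{12}$$
$$\hat\rho^{(X_{i,t},X_{j,t})}_{Q_{n-h}}(h) = \frac{Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} + \frac{\mathbf{V}}{\beta}\right) - Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} - \frac{\mathbf{V}}{\beta}\right)}{Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} + \frac{\mathbf{V}}{\beta}\right) + Q_{n-h}^2\!\left(\frac{\mathbf{U}}{\alpha} - \frac{\mathbf{V}}{\beta}\right)}, \tag{13}$$
where $\mathbf{U}$, $\mathbf{V}$, $\alpha$ and $\beta$ are defined in (11).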
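The estimators (10) to (13) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses a naive $O(n^2)$ version of $Q_n$ rather than the algorithm of [20], it supports lags $h \geq 0$ only, and it assumes that no component is constant (otherwise $\alpha$ or $\beta$ would be zero). All function names are hypothetical.

```python
import numpy as np

def qn_scale(x, c=2.2191):
    """Naive Q_n scale estimate (k-th smallest pairwise absolute difference)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)
    k = (n * (n - 1) // 2 + 2) // 4 + 1
    return c * np.partition(np.abs(x[i] - x[j]), k - 1)[k - 1]

def robust_cross_cov_corr(xi, xj, h):
    """Return (gamma_hat, rho_hat) of eqs. (11) and (13) for a lag h >= 0."""
    xi = np.asarray(xi, dtype=float)
    xj = np.asarray(xj, dtype=float)
    n = xi.size
    alpha, beta = qn_scale(xi), qn_scale(xj)
    u = xi[: n - h] / alpha          # U / alpha, built from X_{i,1..n-h}
    v = xj[h:] / beta                # V / beta,  built from X_{j,h+1..n}
    q_plus = qn_scale(u + v) ** 2
    q_minus = qn_scale(u - v) ** 2
    gamma_hat = alpha * beta / 4.0 * (q_plus - q_minus)
    rho_hat = (q_plus - q_minus) / (q_plus + q_minus)
    return gamma_hat, rho_hat

def robust_corr_matrix(x, h):
    """k x k matrix of eq. (12) for an (n, k) data array x."""
    k = x.shape[1]
    return np.array([[robust_cross_cov_corr(x[:, i], x[:, j], h)[1]
                      for j in range(k)] for i in range(k)])
```

By construction each entry of `robust_corr_matrix` lies in $[-1, 1]$, and the diagonal equals 1 at lag 0 because $\mathbf{U}/\alpha - \mathbf{V}/\beta$ is then identically zero.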
III. SIMULATIONS
The process $\{\mathbf{X}_t\}$ is generated by (4) where $k = 3$, $(p, q) = (1, 0)$, $\boldsymbol{\mu} = 0$,
$$\Phi_1 = \begin{pmatrix} 0.6 & 0.3 & 0.0 \\ 0.1 & 0.2 & 0.0 \\ 0.1 & 0.8 & 0.4 \end{pmatrix},$$
and $\{\boldsymbol{\varepsilon}_t\}$ is a zero-mean Gaussian white noise process with
covariance
$$\Sigma_\varepsilon = \begin{pmatrix} 1.00 & 0.70 & 0.70 \\ 0.70 & 1.00 & 0.95 \\ 0.70 & 0.95 & 1.00 \end{pmatrix}.$$
The contaminated process $\{\mathbf{Z}_t\}$ is simulated according to (5)
where $\omega_i = 4\sqrt{\mathrm{Var}(X_{i,t})}$ and the $p_i$ are given in the plots for
$i = 1, 2, 3$. The sample size $n$ is 500 and each experiment is
repeated 1000 times. We denote by $\hat\rho^X(h)$ the sample estimate
of $\rho^X(h)$, i.e., the estimate that we obtain by replacing the
unknown covariances in (2) by their sample estimates. The
sample estimate $\hat\rho^Z(h)$ is defined analogously to $\hat\rho^X(h)$, and $\hat\rho^Z_{Q_n}(h)$ is
obtained by (12) where $\{\mathbf{X}_t\}$ is replaced by $\{\mathbf{Z}_t\}$. The means
of $\hat\rho^X(h)$, $\hat\rho^X_{Q_n}(h)$, $\hat\rho^Z(h)$ and $\hat\rho^Z_{Q_n}(h)$ are computed over the
1000 replications for each lag $h$, $0 \leq h \leq 7$.
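As a worked illustration of (6) in this simulation setting, substituting $\omega_i^2 = 16\,\gamma^X_{ii}(0)$ gives the population attenuation factor of the contaminated correlations. The computation below concerns the model quantities only, not the finite-sample estimates reported later.

```latex
\begin{align*}
\rho^Z_{ij}(h)
  &= \frac{\gamma^X_{ij}(h)}
          {\sqrt{\gamma^X_{ii}(0)\,(1 + 16 p_i)\;\gamma^X_{jj}(0)\,(1 + 16 p_j)}}
   = \frac{\rho^X_{ij}(h)}{\sqrt{(1 + 16 p_i)(1 + 16 p_j)}},
   \qquad h \neq 0, \\
\rho^Z_{ij}(h)
  &= \frac{\rho^X_{ij}(h)}{1.8} \approx 0.556\,\rho^X_{ij}(h)
   \qquad \text{when } p_i = p_j = 0.05 .
\end{align*}
```

Under this model, a contamination rate of only 5% is therefore expected to shrink the lagged correlations of $\{\mathbf{Z}_t\}$ to a little over half of their uncontaminated values.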
[Figure 1: five panels showing the ACF or robust ACF against lags 0 to 7. Panel titles: Theoretical; Samp. Estim. Uncont; Robust Estim. Uncont; Samp. Estim. Cont; Robust Estim. Cont.]
Fig. 1. Autocorrelation function of $Z_{1,t}$. From left to right and top to bottom,
plots are $\rho^X_{11}(h)$, $\hat\rho^X_{11}(h)$, $\hat\rho^{X_{1,t}}_{Q_{n-h}}(h)$, and $\hat\rho^Z_{11}(h)$, $\hat\rho^{Z_{1,t}}_{Q_{n-h}}(h)$ when $p_i = 0.05$, $i = 1, 2, 3$.
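Figure 1 displays the true value $\rho^X_{11}(h)$ and the estimates
$\hat\rho^X_{11}(h)$, $\hat\rho^{X_{1,t}}_{Q_{n-h}}(h)$, $\hat\rho^Z_{11}(h)$ and $\hat\rho^{Z_{1,t}}_{Q_{n-h}}(h)$. Figure 2 plots the
true value $\rho^X_{12}(h)$ and the estimates $\hat\rho^X_{12}(h)$, $\hat\rho^{(X_{1,t},X_{2,t})}_{Q_{n-h}}(h)$,
$\hat\rho^Z_{12}(h)$ and $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$. For the other components of $\{\mathbf{X}_t\}$
and $\{\mathbf{Z}_t\}$, we obtain similar figures.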
In both figures, the effect of additive outliers appears
when comparing the true correlation to the sample estimates
$\hat\rho^Z(h)$ obtained in the contaminated case. Indeed, the values of
$\hat\rho^Z_{11}(h)$ and $\hat\rho^Z_{12}(h)$ are much smaller than $\rho^X_{11}(h)$ and $\rho^X_{12}(h)$,
respectively. These graphical results are expected, empirically
confirm Proposition 1, and are in accordance with the discussion
in [14], [16], [17], [18]. We also observe in both figures that
the sample and the robust estimators behave similarly in the
absence of contamination. Thus, in a practical situation where
the practitioner is uncertain about the presence of outliers,
(11) and (13) remain reasonable choices. When there are outliers
in the data, i.e., $p_i \neq 0$ for all $i$, the plots indicate that
the robust estimator is not affected by a contamination
percentage of 5%, proving to be a good alternative when
outliers are in fact present in the data.

[Figure 2: five panels showing the ACF or robust ACF against lags 0 to 7. Panel titles: Theoretical; Samp. Estim. Uncont; Robust Estim. Uncont; Samp. Estim. Cont; Robust Estim. Cont.]
Fig. 2. Correlation function between $Z_{1,t}$ and $Z_{2,t}$. From left to right
and top to bottom, plots are $\rho^X_{12}(h)$, $\hat\rho^X_{12}(h)$, $\hat\rho^{(X_{1,t},X_{2,t})}_{Q_{n-h}}(h)$, and $\hat\rho^Z_{12}(h)$,
$\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$ when $p_i = 0.05$, $i = 1, 2, 3$.
In Table I, we present the root mean squared errors (RMSE)
of $\hat\rho^X_{11}(h)$, $\hat\rho^{X_{1,t}}_{Q_{n-h}}(h)$, $\hat\rho^Z_{11}(h)$ and $\hat\rho^{Z_{1,t}}_{Q_{n-h}}(h)$. Table II gives the
RMSE of $\hat\rho^X_{12}(h)$, $\hat\rho^{(X_{1,t},X_{2,t})}_{Q_{n-h}}(h)$, $\hat\rho^Z_{12}(h)$ and $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$.
We observe that the sample and the robust estimators have
RMSEs close to each other in the absence of contamination,
while the RMSE of the sample estimate is much larger than
that of the robust estimator when the percentage of
contamination is 5%. Moreover, the RMSEs of the robust
estimator are almost the same in the uncontaminated and the
contaminated cases.
The RMSEs of $\hat\rho^Z_{12}(h)$ and $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$ as the percentage
of outliers in $\{\mathbf{Z}_t\}$ increases are presented in Tables III and IV,
respectively. Not surprisingly, increasing the percentage of
outliers reduces the performance of both estimators. However,
$\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$ is less affected by the outliers.
IV. REAL DATA EXAMPLE
In this real data example, we consider the estimation of the
sample ACFs of the monthly personal consumption expenditure
(PCE) and disposable personal income (DSPI) of the
United States from January 1959 to March 2012 (n= 639)
obtained from the Federal Reserve Bank of St. Louis (FRED
Economic Data). This data set has already been considered
as an example in [22]. As pointed out by the author, the
original series are not stationary, thus the working series are
the differenced observations, in percentages, after applying the
log transformation. Figure 3 shows the time series plots for
the data in the study. From the plots, it is possible to see
the presence of some outlying observations which justifies the
comparison between the robust and non-robust methods.
In Figures 4 and 5 we present the plots of $\hat\rho^X(h)$ and
$\hat\rho_{Q_n}(h)$, respectively. Comparing the two figures, one may see that
their vertical scales are slightly different. As simple evidence,
the classical sample cross-correlation values
between PCE and DSPI are 0.25, 0.11, 0.11 and 0.14 for lags
$h = 0, 1, 2, 4$, respectively. The cross-correlations given by (13)
are 0.3, 0.16, 0.13 and 0.18.

TABLE I
RMSE of $\hat\rho^X_{11}(h)$, $\hat\rho^{X_{1,t}}_{Q_{n-h}}(h)$, $\hat\rho^Z_{11}(h)$ and $\hat\rho^{Z_{1,t}}_{Q_{n-h}}(h)$ when $p_i = 0.05$, $i = 1, 2, 3$.

| Estimator \ Lag $h$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $\hat\rho^X_{11}(h)$ | 0.00000 | 0.03050 | 0.05138 | 0.06391 | 0.06965 | 0.07290 | 0.07506 | 0.07590 |
| $\hat\rho^{X_{1,t}}_{Q_{n-h}}(h)$ | 0.00000 | 0.03257 | 0.05545 | 0.06855 | 0.07450 | 0.07740 | 0.08050 | 0.08187 |
| $\hat\rho^Z_{11}(h)$ | 0.00000 | 0.33091 | 0.23215 | 0.16371 | 0.11777 | 0.09080 | 0.07516 | 0.06827 |
| $\hat\rho^{Z_{1,t}}_{Q_{n-h}}(h)$ | 0.00000 | 0.06132 | 0.07724 | 0.07927 | 0.07782 | 0.07682 | 0.07804 | 0.07806 |

TABLE II
RMSE of $\hat\rho^X_{12}(h)$, $\hat\rho^{(X_{1,t},X_{2,t})}_{Q_{n-h}}(h)$, $\hat\rho^Z_{12}(h)$ and $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$ when $p_i = 0.05$, $i = 1, 2, 3$.

| Estimator \ Lag $h$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $\hat\rho^X_{12}(h)$ | 0.02595 | 0.03459 | 0.04592 | 0.05291 | 0.05451 | 0.05496 | 0.05685 | 0.05755 |
| $\hat\rho^{(X_{1,t},X_{2,t})}_{Q_{n-h}}(h)$ | 0.02913 | 0.03804 | 0.04962 | 0.05748 | 0.05939 | 0.05979 | 0.06128 | 0.06269 |
| $\hat\rho^Z_{12}(h)$ | 0.28934 | 0.26774 | 0.19370 | 0.13680 | 0.09784 | 0.07479 | 0.06176 | 0.05521 |
| $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$ | 0.05754 | 0.06356 | 0.06710 | 0.06538 | 0.06236 | 0.05965 | 0.06068 | 0.06170 |

TABLE III
RMSE of $\hat\rho^Z_{12}(h)$.

| $p_i$ \ Lag $h$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.28934 | 0.26774 | 0.19370 | 0.13680 | 0.09784 | 0.07479 | 0.06176 | 0.05521 |
| 0.10 | 0.39047 | 0.35826 | 0.25729 | 0.17943 | 0.12558 | 0.08903 | 0.06950 | 0.05666 |
| 0.15 | 0.44192 | 0.40677 | 0.28982 | 0.20084 | 0.14067 | 0.09991 | 0.07345 | 0.06005 |
| 0.20 | 0.46956 | 0.43148 | 0.31149 | 0.21421 | 0.14771 | 0.10663 | 0.07947 | 0.06249 |

TABLE IV
RMSE of $\hat\rho^{(Z_{1,t},Z_{2,t})}_{Q_{n-h}}(h)$.

| $p_i$ \ Lag $h$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.05754 | 0.06356 | 0.06710 | 0.06538 | 0.06236 | 0.05965 | 0.06068 | 0.06170 |
| 0.10 | 0.10492 | 0.10795 | 0.09833 | 0.08348 | 0.07124 | 0.06395 | 0.06177 | 0.05992 |
| 0.15 | 0.16088 | 0.16147 | 0.13579 | 0.10610 | 0.08424 | 0.07054 | 0.06293 | 0.05969 |
| 0.20 | 0.22260 | 0.21759 | 0.17829 | 0.13229 | 0.09814 | 0.07742 | 0.06682 | 0.05891 |

[Figure 3: two stacked time series panels, PCE (top) and DSPI (bottom), plotted against Time (0 to 600).]
Fig. 3. Time plots of the U.S. personal consumption expenditures and
disposable personal income, in percentages.
Now, consider the estimation of the $\Phi$ matrices in (4) using the
Yule-Walker equations. Based on information criteria, [22]
selected a VAR(3) model. Figure 6 presents the standard and
the robust ACFs of the residuals of the selected model. To
save space, only the ACF of PCE and the cross-correlation
between PCE and DSPI are shown; the other plots
present similar behavior and are available upon request.
Contrasting both plots, it is possible to observe that the robust
ACF presents higher values of cross-correlation. For example,
the cross-correlation between PCE and DSPI at lag $h = 12$ is
0.003 and 0.16 for the standard and robust sample estimators,
respectively. This result is in accordance with Proposition 1,
which shows that one outlier is enough to destroy the
properties of the sample ACF, thus impacting any
subsequent estimation step.

[Figure 4: four panels showing the ACF against lag (0 to 25; -25 to 0 in panel (c)). Panels: (a) PCE; (b) PCE & DSPI; (c) DSPI & PCE; (d) DSPI.]
Fig. 4. ACF of PCE and DSPI.
[Figure 5: four panels showing the robust ACF against lag (0 to 25; -25 to 0 in panel (c)). Panels: (a) PCE; (b) PCE & DSPI; (c) DSPI & PCE; (d) DSPI.]
Fig. 5. Robust ACF of PCE and DSPI.

[Figure 6: four panels against lags 0 to 25. ACF: (a) PCE; (b) PCE & DSPI. Robust ACF: (a) PCE; (b) PCE & DSPI.]
Fig. 6. ACF and Robust ACF of the residuals of the fitted VAR(3) via Yule-Walker.

To end this exercise, we replaced the standard covariance
estimator by the proposed robust one in the Yule-Walker
equations. The plots are not shown to save space, but now, for
instance, the value obtained by the classical sample estimator
of the cross-correlation between PCE and DSPI at lag $h = 12$
is 0.1 while the robust estimator gives 0.3.
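The paper does not detail how the Yule-Walker system is solved, so the following Python sketch is only one possible reading. It assumes a zero-mean VAR($p$) model and the convention $\gamma(h) = \mathrm{Cov}(\mathbf{X}_t, \mathbf{X}_{t+h})$ used in the text; the input `gammas` may hold either the sample autocovariances or the robust estimates (10). The function name `yule_walker_var` is hypothetical.

```python
import numpy as np

def yule_walker_var(gammas, p):
    """Solve the Yule-Walker equations of a VAR(p) model.

    gammas : list of k x k arrays, gammas[h] estimating Cov(X_t, X_{t+h})
             for h = 0, ..., p (sample or robust estimates).
    Returns the list [Phi_1, ..., Phi_p].
    """
    k = gammas[0].shape[0]

    def c(h):
        # C(h) = E[(X_t - mu)(X_{t-h} - mu)'] = gamma(h)' and C(-h) = gamma(h).
        return gammas[h].T if h >= 0 else gammas[-h]

    # Yule-Walker: C(h) = sum_i Phi_i C(h - i) for h = 1, ..., p,
    # i.e. [Phi_1 ... Phi_p] G = [C(1) ... C(p)] with block (i, h) of G = C(h - i).
    big_g = np.block([[c(h - i) for h in range(1, p + 1)]
                      for i in range(1, p + 1)])
    rhs = np.hstack([c(h) for h in range(1, p + 1)])
    phi = np.linalg.solve(big_g.T, rhs.T).T
    return [phi[:, i * k:(i + 1) * k] for i in range(p)]
```

Swapping the sample autocovariances for robust ones only changes the input list, not the solver. A pairwise construction such as (10) is not guaranteed to produce a positive definite block matrix, so in practice the conditioning of the system may need to be checked before solving.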
[Figure 7: four panels against lags 0 to 25. ACF: (a) PCE; (b) PCE & DSPI. Robust ACF: (a) PCE; (b) PCE & DSPI.]
Fig. 7. ACF and Robust ACF of the residuals of the fitted VAR(3) via Yule-Walker
using the robust covariance estimator.
V. CONCLUSIONS
The effect of additive outliers on the estimation of the
covariance and correlation matrix functions of a stationary
multivariate discrete time series was analyzed. A robust es-
timation method was proposed as a generalization of existing
results in the univariate case. Monte Carlo simulation results
have illustrated the good behavior in terms of mean square
error of the proposed robust estimator. A real data set
was analyzed where the proposed robust covariance estimator
replaced the standard covariance in the Yule-Walker equations.
REFERENCES
[1] R. S. Tsay, D. Peña, and A. E. Pankratz, “Outliers in multivariate time
series,” Biometrika, vol. 87, no. 4, pp. 789–804, 2000.
[2] P. J. Huber, “Robust estimation of a location parameter,” The Annals
of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, 1964.
[3] R. A. Maronna, “Robust M-estimators of multivariate location and
scatter,” The Annals of Statistics, vol. 4, no. 1, pp. 51–67, 1976.
[4] A. M. Zoubir, V. Koivunen, Y. Chakhchoukh, and M. Muma, “Robust
estimation in signal processing: A tutorial-style treatment of fundamen-
tal concepts,” IEEE Signal Processing Magazine, vol. 29, no. 4, pp.
61–80, 2012.
[5] A. J. Q. Sarnaglia, V. A. Reisen, P. Bondon, and C. Lévy-Leduc, “A
robust estimation approach for fitting a PARMA model to real data,”
in 2016 IEEE Statistical Signal Processing Workshop (SSP), 2016.
[6] O. Kouamo, C. Lévy-Leduc, and E. Moulines, “Robust estimation
of the memory parameter of Gaussian time series using wavelets,” in
2011 IEEE Statistical Signal Processing Workshop (SSP), 2011, pp.
553–556.
[7] T. H. Li, “A nonlinear method for robust spectral analysis,” IEEE
Transactions on Signal Processing, vol. 58, no. 5, pp. 2466–2474, 2010.
[8] S. N. Batalama and D. Kazakos, “On the robust estimation of the
autocorrelation coefficients of stationary sequences,” IEEE Transactions
on Signal Processing, vol. 44, no. 10, pp. 2508–2520, 1996.
[9] H. Semmaoui, J. Drolet, A. Lakhssassi, and M. Sawan, “Setting
adaptive spike detection threshold for smoothed TEO based on robust
statistics theory,” IEEE Transactions on Biomedical Engineering, vol.
59, no. 2, pp. 474–482, 2012.
[10] L. Yang, R. Couillet, and M. R. McKay, “A robust statistics approach
to minimum variance portfolio optimization,” IEEE Transactions on
Signal Processing, vol. 63, no. 24, pp. 6684–6697, 2015.
[11] Y. Chakhchoukh, P. Panciatici, and L. Mili, “Electric load forecasting
based on statistical robust methods,” IEEE Transactions on Power
Systems, vol. 26, no. 3, pp. 982–991, 2011.
[12] Y. Ma and M. G. Genton, “Highly robust estimation of dispersion
matrices,” Journal of Multivariate Analysis, vol. 78, pp. 11–36, 2001.
[13] P. J. Rousseeuw and C. Croux, “Alternatives to the median absolute
deviation,” Journal of the American Statistical Association, vol. 88,
no. 424, pp. 1273–1283, 1993.
[14] C. Lévy-Leduc, H. Boistard, E. Moulines, M. S. Taqqu, and V. A.
Reisen, “Robust estimation of the scale and of the autocovariance func-
tion of Gaussian short-and long-range dependent processes,” Journal
of Time Series Analysis, vol. 32, no. 2, pp. 135–156, 2011.
[15] Y. Ma and M. G. Genton, “Highly robust estimation of the autocovari-
ance function,” Journal of Time Series Analysis, vol. 21, pp. 663–684,
2000.
[16] W. Chan, “A note on time series model specification in the presence
of outliers,” Journal of Applied Statistics, vol. 19, no. 1, pp. 117–124,
1992.
[17] W. Chan, “Outliers and financial time series modelling: a cautionary
note,” Mathematics and Computers in Simulation, vol. 39, no. 3, pp.
425–430, 1995.
[18] F. F. Molinares, V. A. Reisen, and F. Cribari-Neto, “Robust estimation
in long-memory processes under additive outliers,” Journal of Statistical
Planning and Inference, vol. 139, no. 8, pp. 2511–2525, 2009.
[19] C. Lévy-Leduc, H. Boistard, E. Moulines, M. S. Taqqu, and V. A.
Reisen, “Large sample behavior of some well-known robust estimators
under long-range,” Statistics, vol. 45, no. 1, pp. 59–71, 2011.
[20] C. Croux and P. J. Rousseeuw, “Time-efficient algorithms for two
highly robust estimators of scale,” Computational Statistics, vol. 1, pp.
1–18, 1992.
[21] P. J. Huber, Robust Statistics, Wiley Series in Probability and Statistics
- Applied Probability and Statistics Section Series. Wiley, 2004.
[22] R. S. Tsay, Multivariate Time Series Analysis: With R and Financial
Applications, John Wiley & Sons, 2013.