All content in this area was uploaded by Bouchra R. Nasri on Oct 28, 2019

Content may be subject to copyright.

The Canadian Journal of Statistics

Vol. xx, No. yy, 2019, Pages 1–25

La revue canadienne de statistique


Goodness-of-fit for regime-switching copula models with application to option pricing

Bouchra R. Nasri 1*, Bruno N. Rémillard 2 and Mamadou Yamar Thioub 2

1 McGill University, Department of Mathematics and Statistics, 805 Rue Sherbrooke O, Montréal (Québec), QC H3A 0B9

2 HEC Montréal, 3000 chemin de la Côte Sainte-Catherine, Montréal (Québec), Canada H3T 2A7

Key words and phrases: Goodness-of-fit; time series; copulas; regime-switching models; generalized error models.

MSC 2010: Primary 62M10; secondary 62P05

Abstract:

We consider several time series and, for each of them, we fit an appropriate dynamic parametric model. This produces serially independent error terms for each time series. The dependence between these error terms is then modeled by a regime-switching copula. The EM algorithm is used for estimating the parameters, and a sequential goodness-of-fit procedure based on Cramér-von Mises statistics is proposed to select the appropriate number of regimes. Numerical experiments are performed to assess the validity of the proposed methodology. As an example of application, we evaluate a European put-on-max option on the returns of two assets. In order to facilitate the use of our methodology, we have built an R package, HMMcopula, available on CRAN. The Canadian Journal of Statistics xx: 1–25; 2019 © 2019 Statistical Society of Canada

Résumé: We consider several univariate time series, and for each of them we find an appropriate parametric dynamic model. We then obtain independent error terms for each series. The dependence between these error terms is then modeled by a regime-switching copula. The EM algorithm is used to estimate the parameters, and a sequential procedure of goodness-of-fit tests based on the Cramér-von Mises statistic is proposed to select the appropriate number of regimes. We carry out a series of numerical experiments to assess the validity and the performance of the proposed methodology. As an example of application, we evaluate the price of a European put option on the maximum return of two securities using a regime-switching copula model. Finally, to facilitate future use of the proposed methodology, we have built a library of functions based on the R software, called HMMcopula, which is freely available on CRAN. La revue canadienne de statistique xx: 1–25; 2019 © 2019 Société statistique du Canada

1. INTRODUCTION

In finance, many instruments are based on several risky assets and their evaluation rests on the joint distribution of these assets. To determine this joint distribution, we must take into account the serial dependence in each asset, as well as the dependence between the assets. Underestimating the latter can have devastating financial and economic consequences, as exemplified by the 2008 financial crisis. We must also consider that the dependence may vary with time, potentially increasing in crisis periods. Some ways to take into account time-varying dependence have been proposed. Recently, Adams et al. (2017) fitted DCC-GARCH models (Engle, 2002) to multivariate time series, which is a bit restrictive in terms of dependence since it is based on the multivariate Gaussian distribution. To overcome this limitation, and because copulas are specially designed to model dependence, it is no wonder that many time-varying dependence models are based on copulas.

*Author to whom correspondence may be addressed. E-mail: bouchra.nasri@gmail.com

To our knowledge, the first papers involving time-dependent copulas were Patton (2004) and van den Goorbergh et al. (2005). In Patton (2004), the author fitted a Gaussian copula on monthly returns (assumed independent), where the correlation parameter was a function of covariates. In van den Goorbergh et al. (2005), the authors, in order to evaluate call-on-max options, fitted a copula family to the residuals of two GARCH time series, with a parameter expressed as a function of the volatilities. Note that both are special cases of what is now known as a single-index copula (Fermanian and Lopez, 2018). One can also use the methodology proposed in Nasri and Rémillard (2019), where generalized error models are fitted to each time series, and the underlying copula has time-dependent parameters. In order to take into account abrupt changes in the dependence, it can be appropriate to use regime-switching copulas.

This approach has been proposed recently for vines in Stöber and Czado (2014) and Fink et al. (2017), and for hierarchical Archimedean models in Härdle et al. (2015). In all cases, the dynamic models for the marginal distributions were ARMA-GARCH, and there was no formal test of goodness-of-fit. The selection of the number of regimes was based on comparisons of likelihoods, also using rolling windows. There is not yet a theory supporting this method in our setting, but results from Cappé et al. (2005, Chapter 15) show that the BIC selection criterion works for HMMs with a discrete finite state space for the observations; unfortunately, this hypothesis is not met here.

In this article, we propose a formal goodness-of-fit test for regime-switching copulas, which has not been done before. As a by-product, we obtain another way to select the number of regimes, based on P-values. More precisely, in Section 2, we describe the model for the time series and we define regime-switching copulas. In Section 3, we detail the estimation procedure, the goodness-of-fit test, and the selection of the number of regimes. Numerical experiments are performed in Section 4 to assess the validity of the procedures used to choose the number of regimes. In Section 5, we give an example of application for option pricing, along the same lines as van den Goorbergh et al. (2005) but with different data. Note that we have built an R package for regime-switching copula models, HMMcopula, available on CRAN (Thioub et al., 2018).

2. LINKING MULTIVARIATE TIME SERIES WITH REGIME-SWITCHING COPULAS

To introduce copula-based models, we proceed in two steps: first, for each univariate time series, we use a “generalized error model” (Du, 2016) to produce iid univariate series; second, regime-switching copulas are fitted to these series. To fix ideas, let $X_t = (X_{1t}, \ldots, X_{dt})$ be a multivariate time series. For each $j \in \{1, \ldots, d\}$, let $\mathcal{F}_{j,t-1}$ contain information from the past of $X_{j1}, \ldots, X_{j,t-1}$, and possibly information from exogenous variables as well. Further set $\mathcal{F}_t = \vee_{j=1}^d \mathcal{F}_{j,t}$. Assume that for each $j \in \{1, \ldots, d\}$, there exist continuous, increasing, and $\mathcal{F}_{j,t-1}$-measurable functions $G_{\alpha,jt}$ so that $\varepsilon_{jt} = G_{\alpha,jt}(X_{jt})$ are iid with continuous distribution function $F_j$ and density $f_j$, for some unknown parameter $\alpha \in \mathcal{A}$. Note that stochastic volatility models and hidden Markov models (HMM) are particular cases of generalized error models. Next, to introduce the dependence between the time series, we choose a sequence of $\mathcal{F}_{t-1}$-measurable copulas $C_t$, so that the joint conditional distribution function $K_t$ of $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{dt})$ given $\mathcal{F}_{t-1}$ is $K_t(x) = C_t\{F(x)\}$, with $F(x) = (F_1(x_1), \ldots, F_d(x_d))^\top$, for any $x = (x_1, \ldots, x_d)^\top \in \mathbb{R}^d$. In particular, $U_t = F(\varepsilon_t) \sim C_t$, for every $t \in \{1, \ldots, n\}$.

This way of modeling dependence between several time series is usually applied to innovations of stochastic volatility models (van den Goorbergh et al., 2005; Chen and Fan, 2006; Patton, 2006; Rémillard, 2017). For example, suppose that $X_{1t} = \mu_{1t}(\alpha) + \sigma_{1t}(\alpha)\varepsilon_{1t}$, $\varepsilon_{1t} \sim F_1$, where $\mu_{1t}$ and $\sigma_{1t}$ are $\mathcal{F}_{1,t-1}$-measurable, and the innovations $\varepsilon_{1t}$ are independent of $\mathcal{F}_{1,t-1}$. In this case, one could take $G_{\alpha,1t}(x_1) = \frac{x_1 - \mu_{1t}(\alpha)}{\sigma_{1t}(\alpha)}$, and then $\varepsilon_{1t} = G_{\alpha,1t}(X_{1t}) \sim F_1$. We can also consider Gaussian HMM models for some univariate time series. For example, suppose that there exists a Markov chain $s_t$ on $\{1, \ldots, m\}$ with transition matrix $Q$ so that given $s_1 = i_1, \ldots, s_n = i_n$, $X_{11}, \ldots, X_{1n}$ are independent, and $X_{1t} \sim N(\mu_{i_t}, \sigma^2_{i_t})$. If $\eta_{t-1}(k)$ is the probability of being in regime $k \in \{1, \ldots, m\}$ at time $t-1$ given the past observations $X_{11}, \ldots, X_{1,t-1}$, then the conditional distribution $G_{1t}$ of $X_{1t}$ given the past is $G_{1t}(x) = \sum_{k=1}^m W_{t-1}(k) F^{(k)}(x)$, where $W_{t-1}(k) = \sum_{j=1}^m \eta_{t-1}(j) Q_{jk}$ is the probability of being in regime $k$ at time $t$ given the past observations, and $F^{(k)}$ is the cdf of a Gaussian distribution with mean $\mu_k$ and variance $\sigma_k^2$. It then follows that the variables $U_{1t} = G_{1t}(X_{1t})$ are iid uniform random variables.
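As an illustration of the Gaussian HMM case, the following minimal Python sketch computes $U_{1t} = G_{1t}(X_{1t})$ by running the filter for $\eta_t$ and $W_{t-1}$ described above. The function name `gaussian_hmm_pit` and the uniform initial regime probabilities are assumptions made for the example; they are not taken from the HMMcopula package.

```python
import numpy as np
from scipy.stats import norm


def gaussian_hmm_pit(x, mu, sigma, Q, eta0=None):
    """Return U_t = G_t(x_t) for a univariate Gaussian HMM (illustrative sketch).

    x      : observations, length n
    mu     : regime means, length m
    sigma  : regime standard deviations, length m
    Q      : m x m transition matrix
    eta0   : initial regime probabilities (assumption: uniform if not given)
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    Q = np.asarray(Q, dtype=float)
    m = mu.size
    eta = np.full(m, 1.0 / m) if eta0 is None else np.asarray(eta0, dtype=float)
    u = np.empty(x.size)
    for t, xt in enumerate(x):
        w = eta @ Q  # W_{t-1}(k) = sum_j eta_{t-1}(j) Q_{jk}
        # G_t(x_t) = sum_k W_{t-1}(k) F^{(k)}(x_t)
        u[t] = w @ norm.cdf(xt, loc=mu, scale=sigma)
        # Filter update: eta_t(k) proportional to W_{t-1}(k) f^{(k)}(x_t)
        post = w * norm.pdf(xt, loc=mu, scale=sigma)
        eta = post / post.sum()
    return u


# Usage with made-up parameters (two regimes: calm and volatile).
rng = np.random.default_rng(0)
x_demo = rng.normal(0.0, 0.01, size=50)
u_demo = gaussian_hmm_pit(x_demo, mu=[0.001, -0.002], sigma=[0.01, 0.03],
                          Q=[[0.95, 0.05], [0.10, 0.90]])
```

With a single regime the filter is trivial and the transform reduces to the usual Gaussian cdf, which gives a quick sanity check.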

After having chosen the generalized error models for each univariate time series, we need to choose the regime-switching copula model $C_t$ for the multivariate series $U_t$. This means that there exists a finite Markov chain $\tau_t$ on $\{1, \ldots, \ell\}$ with transition matrix $P$, so that given $\tau_1 = i_1, \ldots, \tau_n = i_n$, $U_1, \ldots, U_n$ are independent, and $U_t \sim C_{\beta_{i_t}}$, $t \in \{1, \ldots, n\}$, where $\{C_\beta; \beta \in \mathcal{B}\}$ is a given parametric copula family. We also assume the usual smoothness conditions on the associated densities $c_\beta$ so that the pseudo-maximum likelihood estimator exists. Note that for a given $j \in \{1, \ldots, d\}$, one needs the values $U_{jt}$, $t \in \{1, \ldots, n\}$, to be iid uniform. This is indeed true, as proven in the following theorem.

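Theorem 1. Suppose that the multivariate time series $U_t$ has distribution function $C_t$ given $\mathcal{F}_{t-1}$. Then for any given $j \in \{1, \ldots, d\}$, the values $U_{jt}$, $t \in \{1, \ldots, n\}$, are iid uniform.

Proof of Theorem 1. For simplicity, suppose that $j = 1$. By hypothesis, $P(U_{1t} \le u_1, \ldots, U_{dt} \le u_d \mid \mathcal{F}_{t-1}) = C_t(u_1, \ldots, u_d)$. From the properties of copulas, one gets that $P(U_{1t} \le u_1 \mid \mathcal{F}_{t-1}) = C_t(u_1, 1, \ldots, 1) = u_1$. Since this conditional probability does not depend on $\mathcal{F}_{t-1}$, $U_{1t}$ is uniform and independent of the past. As a result, one may conclude that $U_{11}, \ldots, U_{1n}$ are iid uniform.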

Since the generalized errors $\varepsilon_t$ are not observable, $\alpha$ being unknown, the latter must be estimated by a consistent estimator $\alpha_n$. One can then compute the pseudo-observations $e_{n,t} = (e_{n,1t}, \ldots, e_{n,dt})^\top = G_{\alpha_n,t}(X_t)$, where $e_{n,jt} = G_{\alpha_n,jt}(X_{jt})$, $j \in \{1, \ldots, d\}$ and $t \in \{1, \ldots, n\}$. Using these pseudo-observations might be a problem, but in Nasri and Rémillard (2019), it was shown that using the normalized ranks of these pseudo-observations, one can estimate the parameters $\beta_1, \ldots, \beta_\ell$ and $P$ as if one were observing $U_1, \ldots, U_n$. The same applies to the goodness-of-fit test that will be defined in Section 3.2.

Based on Theorem 1, note that in order to simulate the multivariate time series, it suffices to generate $U_t = (U_{1t}, \ldots, U_{dt})$ according to the regime-switching copula model, set $\varepsilon_{jt} = F_j^{-1}(U_{jt})$, and then compute $X_{jt} = G_{\alpha,jt}^{-1}(\varepsilon_{jt})$, $j \in \{1, \ldots, d\}$, and $t \in \{1, \ldots, n\}$.
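The first step of this simulation recipe can be sketched as follows for a bivariate regime-switching Gaussian copula. This is a minimal illustration under stated assumptions: the function name `simulate_rs_gaussian_copula` is hypothetical, the initial regime is drawn uniformly, and regimes are indexed from 0. The relation $\rho = \sin(\pi\tau/2)$ between Kendall's tau and the Gaussian copula correlation is used to set the parameters.

```python
import numpy as np
from scipy.stats import norm


def simulate_rs_gaussian_copula(n, rho, P, rng=None):
    """Simulate n observations from a bivariate regime-switching Gaussian copula.

    rho : correlation parameter of each regime, length ell
    P   : ell x ell transition matrix of the hidden Markov chain
    Returns (U, regimes), with U of shape (n, 2) and regimes indexed from 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    rho = np.asarray(rho, dtype=float)
    P = np.asarray(P, dtype=float)
    ell = rho.size
    state = rng.integers(ell)  # assumption: uniform initial regime
    regimes = np.empty(n, dtype=int)
    U = np.empty((n, 2))
    for t in range(n):
        state = rng.choice(ell, p=P[state])  # move the hidden chain one step
        regimes[t] = state
        z1, z2 = rng.standard_normal(2)
        r = rho[state]
        z2 = r * z1 + np.sqrt(1.0 - r**2) * z2  # correlate the Gaussian pair
        U[t] = norm.cdf([z1, z2])  # map to uniform margins
    return U, regimes


# Usage: two regimes with Kendall's tau 0.25 and 0.75.
tau = np.array([0.25, 0.75])
rho = np.sin(np.pi * tau / 2.0)
P = np.array([[0.25, 0.75], [0.50, 0.50]])
U, regimes = simulate_rs_gaussian_copula(200, rho, P, rng=np.random.default_rng(1))
```

The remaining steps, $\varepsilon_{jt} = F_j^{-1}(U_{jt})$ and $X_{jt} = G_{\alpha,jt}^{-1}(\varepsilon_{jt})$, depend on the chosen marginal models and are left out of the sketch.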

3. ESTIMATION AND GOODNESS-OF-FIT TEST

We first present general regime-switching models, which can be applied to univariate time series or copulas. Then, we describe an estimation procedure and a goodness-of-fit test for regime-switching copula models. Finally, we propose a sequential procedure for selecting the optimal number of regimes.

3.1. General regime-switching models

Let $\tau_t$ be a homogeneous discrete-time Markov chain on $S = \{1, \ldots, \ell\}$, with transition probability matrix $P$ on $S \times S$. Given $\tau_1 = k_1, \ldots, \tau_n = k_n$, the observations $Y = (Y_1, \ldots, Y_n)$ are independent with densities $g_{\beta_{k_t}}$, $t \in \{1, \ldots, n\}$. Set $\theta = (\beta_1, \ldots, \beta_\ell, P)$. Then the joint density of $\tau = (\tau_1, \ldots, \tau_n)$ and $Y$ is

$$f_\theta(\tau, Y) = \left( \prod_{t=1}^n P_{\tau_{t-1},\tau_t} \right) \times \prod_{t=1}^n g_{\beta_{\tau_t}}(Y_t), \tag{1}$$

so one can write

$$\log f_\theta(\tau, Y) = \sum_{t=1}^n \log P_{\tau_{t-1},\tau_t} + \sum_{t=1}^n \log g_{\beta_{\tau_t}}(Y_t). \tag{2}$$

Because the regimes $\tau_t$ are not observable, an easy way to estimate the parameters is to use the EM algorithm (Dempster et al., 1977), which proceeds in two steps: expectation (E step), where $Q_y(\tilde\theta, \theta) = E_\theta\{\log f_{\tilde\theta}(\tau, Y) \mid Y = y\}$ is computed, and maximization (M step), where one computes

$$\theta^{(k+1)} = \arg\max_{\theta} Q_y\left(\theta, \theta^{(k)}\right),$$

starting from an initial value $\theta^{(0)}$. As $k \to \infty$, $\theta^{(k)}$ converges to the maximum likelihood estimator of the density of $Y$. The formulas for the EM steps are given in Appendix 6. As a particular case of regime-switching models, if $P_{ij} = \nu_j$, then one gets mixture models. In this case $\tau_1, \ldots, \tau_n$ are iid. The simplified formulas for the EM steps are also given in the Appendix. For application to copulas, the density $g_\beta$ is the density of a parametric family of copulas $C_\beta$, with $\beta \in \mathcal{B}$. However, $Y_1, \ldots, Y_n$ are not observable, so they must be replaced by the normalized ranks of the pseudo-observations $e_{n,t}$, i.e., $Y_{jt} = \mathrm{rank}(e_{n,jt})/(n+1)$.
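The normalized ranks used here are straightforward to compute. A minimal Python sketch, where the function name `normalized_ranks` is chosen for illustration and ties (which have probability zero for continuous data) receive average ranks by SciPy's default:

```python
import numpy as np
from scipy.stats import rankdata


def normalized_ranks(e):
    """Column-wise normalized ranks rank(e_{n,jt}) / (n + 1).

    e : array of shape (n, d), one row per time point and one column per series.
    """
    e = np.asarray(e, dtype=float)
    n = e.shape[0]
    return rankdata(e, axis=0) / (n + 1.0)


# Usage on a small array of pseudo-observations (made-up values).
e_demo = np.array([[0.3, 5.0], [0.1, 7.0], [0.2, 6.0]])
u_ranks = normalized_ranks(e_demo)
```

Dividing by $n+1$ rather than $n$ keeps every value strictly inside $(0,1)$, which matters because copula densities may be unbounded on the boundary of the unit cube.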

3.2. Goodness-of-ﬁt

In this section, we propose a methodology to perform a goodness-of-fit test on a multivariate time series, using Rosenblatt's transform. First, following Rémillard (2013), under the general regime-switching model described in Section 3.1, the conditional density $f_t$ of $Y_t$ given $Y_1, \ldots, Y_{t-1}$ can be expressed as a mixture, viz.

$$f_t(y_t \mid y_1, \ldots, y_{t-1}) = \sum_{i=1}^{\ell} f^{(i)}(y_t) \sum_{j=1}^{\ell} \eta_{t-1}(j) P_{ji} = \sum_{i=1}^{\ell} f^{(i)}(y_t) W_{t-1}(i), \tag{3}$$

where $f^{(i)} = g_{\beta_i}$ and

$$W_{t-1}(i) = \sum_{j=1}^{\ell} \eta_{t-1}(j) P_{ji}, \quad i \in \{1, \ldots, \ell\}, \tag{4}$$

$$\eta_t(j) = \frac{f^{(j)}(y_t)}{Z_{t|t-1}} \sum_{i=1}^{\ell} \eta_{t-1}(i) P_{ij}, \quad j \in \{1, \ldots, \ell\}, \tag{5}$$

$$Z_{t|t-1} = \sum_{j=1}^{\ell} f^{(j)}(y_t) \sum_{i=1}^{\ell} \eta_{t-1}(i) P_{ij}. \tag{6}$$

Note that formulas (3)–(6) also hold for univariate Gaussian HMM; in this case, $f^{(j)}$ is the Gaussian density with mean $\mu_j$ and variance $\sigma_j^2$. Next, let $i \in \{1, \ldots, \ell\}$ be fixed and suppose that $Z = (Z_1, \ldots, Z_d)$ has density $f^{(i)}$. For any $q \in \{1, \ldots, d\}$, denote by $f^{(i)}_{1:q}$ the density of $(Z_1, \ldots, Z_q)$. Also, let $f^{(i)}_q$ be the conditional density of $Z_q$ given $Z_1, \ldots, Z_{q-1}$. Further denote by $F^{(i)}_q$ the distribution function corresponding to density $f^{(i)}_q$. The Rosenblatt transform $\Psi_t$ corresponding to the density (3) conditional on $y_1, \ldots, y_{t-1} \in \mathbb{R}^d$ is given by

$$\Psi^{(1)}_t(y_{1t}) = \sum_{i=1}^{\ell} W_{t-1}(i) F^{(i)}_1(y_{1t}), \tag{7}$$

and for $q \in \{2, \ldots, d\}$,

$$\Psi^{(q)}_t(y_{1t}, \ldots, y_{qt}) = \frac{\sum_{i=1}^{\ell} W_{t-1}(i) f^{(i)}_{1:q-1}(y_{1t}, \ldots, y_{q-1,t}) F^{(i)}_q(y_{qt})}{\sum_{i=1}^{\ell} W_{t-1}(i) f^{(i)}_{1:q-1}(y_{1t}, \ldots, y_{q-1,t})}. \tag{8}$$
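To make formulas (4)–(8) concrete, here is a minimal Python sketch for the bivariate regime-switching Gaussian copula. For a copula the margins are uniform, so $F^{(i)}_1(u) = u$ and $f^{(i)}_{1:1} \equiv 1$, and (8) with $q = 2$ reduces to a $W_{t-1}$-weighted average of the conditional distribution functions. The function names and the uniform initial value of $\eta_0$ are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density c_rho(u1, u2)."""
    x, y = norm.ppf(u1), norm.ppf(u2)
    expo = -(rho**2 * (x**2 + y**2) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho**2))
    return np.exp(expo) / np.sqrt(1.0 - rho**2)


def gaussian_copula_cond_cdf(u2, u1, rho):
    """Conditional distribution function of U2 given U1 = u1."""
    x, y = norm.ppf(u1), norm.ppf(u2)
    return norm.cdf((y - rho * x) / np.sqrt(1.0 - rho**2))


def rosenblatt_rs_gaussian(U, rho, P, eta0=None):
    """Rosenblatt transform V_t = Psi_t(U_t) for a 2-d regime-switching Gaussian copula."""
    U = np.asarray(U, dtype=float)
    rho = np.asarray(rho, dtype=float)
    P = np.asarray(P, dtype=float)
    ell = rho.size
    # Assumption: uniform initial regime probabilities eta_0.
    eta = np.full(ell, 1.0 / ell) if eta0 is None else np.asarray(eta0, dtype=float)
    V = np.empty_like(U)
    for t, (u1, u2) in enumerate(U):
        w = eta @ P                                          # equation (4)
        V[t, 0] = u1                                         # equation (7), uniform margin
        V[t, 1] = w @ gaussian_copula_cond_cdf(u2, u1, rho)  # equation (8), q = 2
        post = w * gaussian_copula_density(u1, u2, rho)      # numerator of (5)
        eta = post / post.sum()                              # division by Z_{t|t-1}, (6)
    return V


# Usage on made-up uniform data with two regimes.
U_demo = np.random.default_rng(2).uniform(0.05, 0.95, size=(100, 2))
V_demo = rosenblatt_rs_gaussian(U_demo, rho=[0.3, 0.8], P=[[0.9, 0.1], [0.2, 0.8]])
```

With a single regime and $\rho = 0$ the transform is the identity, which provides a simple check of the implementation.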

Suppose now that $U_1, \ldots, U_n$ is a random sample of size $n$ of $d$-dimensional vectors drawn from a joint continuous distribution $\mathbf{P}$ belonging to a parametric family of regime-switching copula models with $\ell$ regimes. Formally, the hypothesis to be tested is

$$H_0 : \mathbf{P} \in \mathcal{P} = \{\mathbf{P}_\theta; \theta \in \mathcal{O}\} \quad \text{vs} \quad H_1 : \mathbf{P} \notin \mathcal{P}.$$

Under $H_0$, it follows that $V_1 = \Psi_1(U_1, \theta)$, $V_2 = \Psi_2(U_1, U_2, \theta)$, ..., $V_n = \Psi_n(U_1, \ldots, U_n, \theta)$ are iid uniform over $(0,1)^d$, where $\Psi_1(\cdot, \theta), \ldots, \Psi_n(\cdot, \theta)$ are the Rosenblatt transforms for the true parameters $\theta \in \mathcal{O}$. However, $\theta$ must be estimated, say by $\theta_n$. Also, the random vectors $U_1, \ldots, U_n$ are not observable, so they must be replaced by the normalized ranks $u_{n,t}$ of the pseudo-observations $e_{n,t}$, $t \in \{1, \ldots, n\}$. Then, define the pseudo-observations $V_{n,t} = \Psi_t(u_{n,t}, \theta_n)$, $t \in \{1, \ldots, n\}$, and for any $u = (u_1, \ldots, u_d) \in [0,1]^d$, define the empirical process $D_n(u) = \frac{1}{n} \sum_{t=1}^n \prod_{j=1}^d \mathbf{1}(V_{n,jt} \le u_j)$. To test $H_0$ against $H_1$, Genest et al. (2009) suggest using the Cramér-von Mises type statistic $S_n$ defined by

$$S_n = S_n(V_{n,1}, \ldots, V_{n,n}) = n \int_{[0,1]^d} \left\{ D_n(u) - \prod_{j=1}^d u_j \right\}^2 du$$

$$= \frac{1}{n} \sum_{t=1}^n \sum_{i=1}^n \prod_{q=1}^d \left\{ 1 - \max(V_{n,qt}, V_{n,qi}) \right\} - \frac{1}{2^{d-1}} \sum_{t=1}^n \prod_{q=1}^d \left( 1 - V_{n,qt}^2 \right) + \frac{n}{3^d}.$$
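The closed-form expression of $S_n$ translates directly into a few lines of NumPy. The sketch below (the function name `cvm_statistic` is chosen for illustration) stores an $n \times n \times d$ array, so it is only meant for moderate sample sizes.

```python
import numpy as np


def cvm_statistic(V):
    """Cramér-von Mises statistic S_n for pseudo-observations V of shape (n, d)."""
    V = np.asarray(V, dtype=float)
    n, d = V.shape
    # Pairwise component-wise maxima max(V_{qt}, V_{qi}), shape (n, n, d).
    pair_max = np.maximum(V[:, None, :], V[None, :, :])
    term1 = np.prod(1.0 - pair_max, axis=2).sum() / n
    term2 = np.prod(1.0 - V**2, axis=1).sum() / 2.0 ** (d - 1)
    return term1 - term2 + n / 3.0**d


# Usage on made-up uniform pseudo-observations.
V_unif = np.random.default_rng(3).uniform(size=(250, 2))
s_n = cvm_statistic(V_unif)
```

For $n = 1$, $d = 1$ and $V = 0.5$, the integral definition gives $\int_0^{1/2} u^2\,du + \int_{1/2}^1 (1-u)^2\,du = 1/12$, which the closed form reproduces.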

We can interpret $S_n$ as a distance between the empirical distribution of the pseudo-observations $V_{n,t}$ and the independence copula. Since $V_{n,t}$, $t \in \{1, \ldots, n\}$, are almost uniformly distributed over $(0,1)^d$ under $H_0$, large values of $S_n$ lead to the rejection of the null hypothesis. Unfortunately, the limiting distribution of the test statistic will depend on the unknown parameter $\theta$, but it does not depend on the estimated parameters of the univariate time series (Nasri and Rémillard, 2019). Therefore, we will use the parametric bootstrap described in Algorithm 1 to estimate P-values.

Algorithm 1. For a given number of regimes $\ell$, get the estimator $\theta_n$ of $\theta$ using the EM algorithm described in Section 3.1, applied to the pseudo-observations $u_{n,t}$, $t \in \{1, \ldots, n\}$. Then compute the statistic $S_n = S_n(V_{n,1}, \ldots, V_{n,n})$, using the pseudo-observations $V_{n,t} = \Psi_t(u_{n,t}, \theta_n)$, $t \in \{1, \ldots, n\}$. Then for $k = 1, \ldots, B$, $B$ large enough, repeat the following steps:

• Generate a random sample $U_1^*, \ldots, U_n^*$ from distribution $\mathbf{P}_{\theta_n}$, i.e., from a regime-switching copula model with parameter $\theta_n$.

• Get the estimator $\theta_n^*$ from $U_1^*, \ldots, U_n^*$.

• Compute the normalized ranks $u_{n,1}^*, \ldots, u_{n,n}^*$ from $U_1^*, \ldots, U_n^*$.

• Compute the pseudo-observations $V_{n,t}^* = \Psi_t(u_{n,t}^*, \theta_n^*)$, $t \in \{1, \ldots, n\}$, and calculate $S_n^{(k)} = S_n(V_{n,1}^*, \ldots, V_{n,n}^*)$.

Then, an approximate P-value for the test based on the Cramér-von Mises statistic $S_n$ is given by

$$\frac{1}{B} \sum_{k=1}^{B} \mathbf{1}\left( S_n^{(k)} > S_n \right).$$
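The bootstrap loop of Algorithm 1 can be sketched generically in Python. In this minimal illustration, `simulate` and `statistic` are hypothetical callables supplied by the user: `simulate(rng)` should draw one sample from the fitted model $\mathbf{P}_{\theta_n}$, and `statistic(sample)` should carry out the re-estimation, ranking, Rosenblatt transform and computation of $S_n$ for that sample. Neither is an HMMcopula function.

```python
import numpy as np


def bootstrap_pvalue(s_obs, simulate, statistic, B=1000, rng=None):
    """Approximate P-value (1/B) * sum_k 1{S_n^(k) > S_n} by parametric bootstrap.

    s_obs     : observed statistic S_n
    simulate  : callable rng -> bootstrap sample drawn from the fitted model
    statistic : callable sample -> bootstrap statistic S_n^(k)
    """
    rng = np.random.default_rng() if rng is None else rng
    s_boot = np.array([statistic(simulate(rng)) for _ in range(B)])
    return float(np.mean(s_boot > s_obs))


# Toy usage: the "model" is a single uniform draw and the statistic is the draw itself.
p_toy = bootstrap_pvalue(0.5, lambda rng: rng.uniform(), lambda x: x,
                         B=200, rng=np.random.default_rng(4))
```

Passing the random generator explicitly makes the bootstrap reproducible, which is convenient when the P-value is close to the 5% threshold and the computation has to be repeated with a larger $B$.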

3.3. Selecting the number of regimes

There are many ways one could select the copula model and the number of regimes. In the literature on regime-switching copulas, see, e.g., Stöber and Czado (2014) and Fink et al. (2017), it is often suggested to choose the model with the smallest AIC/BIC. However, there is no empirical study backing up this idea. Note also that model selection based on AIC or BIC does not guarantee that the model is correct. This is why one could also rely on the goodness-of-fit test described in the previous section. In the case of a Gaussian HMM, Rémillard (2013) suggested choosing the number of regimes $\ell^*$ as the first $\ell$ for which the P-value is larger than 5%. We will use the same idea here. The consistency of all these procedures is investigated numerically in Section 4.
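The sequential rule can be written as a short loop. In this minimal Python sketch, `pvalue_for` is a hypothetical callable returning the bootstrap P-value (as a fraction) of the goodness-of-fit test for a model with a given number of regimes, and `max_regimes` is an assumed upper bound on the search.

```python
def select_number_of_regimes(pvalue_for, max_regimes=4, level=0.05):
    """Return the first number of regimes whose P-value exceeds `level`.

    pvalue_for : callable ell -> P-value of the goodness-of-fit test with ell regimes
    Returns (ell_star, p_value), or (None, None) if no model is accepted.
    """
    for ell in range(1, max_regimes + 1):
        p = pvalue_for(ell)
        if p > level:
            return ell, p
    return None, None


# Usage with the P-values reported for the Gaussian copula in Table 6 (0.0% and 9.8%).
gaussian_pvalues = {1: 0.000, 2: 0.098}
ell_star, p_star = select_number_of_regimes(gaussian_pvalues.get, max_regimes=2)
```

Stopping at the first accepted model avoids fitting models with more regimes than necessary, which are also the most expensive to estimate.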

4. NUMERICAL EXPERIMENTS

In this section, we consider Monte Carlo experiments for assessing the power of the proposed goodness-of-fit test and the validity of the procedures proposed in Section 3.3. To this end, we generated random samples of size $n \in \{250, 500, 1000\}$ from four regime-switching bivariate copula families: Clayton, Frank, Gaussian, and Gumbel, with one, two, and three regimes. For the 1-regime model, all copulas have Kendall's $\tau = 0.5$, while for the 2-regime copula, we took $\tau = 0.25$ for regime 1 and $\tau = 0.75$ for regime 2, with transition matrix

$$P = \begin{pmatrix} 0.25 & 0.75 \\ 0.50 & 0.50 \end{pmatrix}.$$

Finally, for the 3-regime copula, we took $\tau = 0.25$ for regime 1, $\tau = 0.5$ for regime 2, and $\tau = 0.75$ for regime 3, with transition matrix

$$P = \begin{pmatrix} 0.5 & 0.25 & 0.25 \\ 0.3 & 0.5 & 0.2 \\ 0.1 & 0.3 & 0.6 \end{pmatrix}.$$

For each sample size and for each model with a given number of regimes, we performed

1000 replications and in each replication, when needed, B= 100 bootstrap samples were used

to compute the P-value of the test statistic Sn.

In the first set of experiments, we assess the power of the proposed goodness-of-fit test for different copula families. To this end, we fix the number of regimes $\ell \in \{2, 3\}$ and vary the copula family. The results displayed in Table 1 show that for two regimes, the empirical levels are not significantly different from the target value of 5%. For three regimes, for all but the Frank copula with $n = 250$, the empirical levels are not significantly different from 5%. Next, for two and three regimes, the estimated power is quite good and, as expected, it increases with the sample size. From these results, we may conclude that the goodness-of-fit test can distinguish between copula families when the number of regimes is fixed.

TABLE 1: Percentage of rejection of $H_0$ at the 5% level for copula models with $\ell \in \{2, 3\}$ regimes, with N = 1000 replications and B = 100 bootstrap samples. Columns give the copula family under $H_0$; rows give the family under $H_1$.

| n | $H_1$ | Clayton ($\ell=2$) | Frank ($\ell=2$) | Gaussian ($\ell=2$) | Gumbel ($\ell=2$) | Clayton ($\ell=3$) | Frank ($\ell=3$) | Gaussian ($\ell=3$) | Gumbel ($\ell=3$) |
|---|---|---|---|---|---|---|---|---|---|
| 250 | Clayton | 7.6 | 45.8 | 54.9 | 89.8 | 3.9 | 65.1 | 51.9 | 92.5 |
| 250 | Frank | 24.4 | 4.8 | 14.5 | 36.1 | 21.7 | 10.5 | 10.2 | 36.2 |
| 250 | Gaussian | 20.1 | 8.2 | 4.8 | 18.6 | 23.2 | 19.9 | 3.3 | 16.1 |
| 250 | Gumbel | 41.3 | 16.7 | 8.5 | 4.4 | 36.0 | 29.2 | 14.7 | 5.4 |
| 500 | Clayton | 6.5 | 77.2 | 92.5 | 100 | 5.20 | 94.6 | 93.6 | 100 |
| 500 | Frank | 65.3 | 5.5 | 31.6 | 80.3 | 75.4 | 8.0 | 20.6 | 85.8 |
| 500 | Gaussian | 60.2 | 12.3 | 5.5 | 39.4 | 70.3 | 25.6 | 4.6 | 43.7 |
| 500 | Gumbel | 89.4 | 16.7 | 18.2 | 6.3 | 90.3 | 47.3 | 28.7 | 4.7 |
| 1000 | Clayton | 5.3 | 99.5 | 100 | 100 | 5.4 | 100.0 | 100.0 | 100.0 |
| 1000 | Frank | 99.0 | 5.3 | 67.8 | 100 | 99.9 | 7.0 | 58.7 | 99.8 |
| 1000 | Gaussian | 97.3 | 29.0 | 5.8 | 77.3 | 70.3 | 24.8 | 3.6 | 43.6 |
| 1000 | Gumbel | 100 | 43.3 | 41.5 | 5.4 | 99.9 | 82.0 | 66.0 | 6.1 |

In the second set of experiments, we assess the power of the goodness-of-fit test for different numbers of regimes. To this end, we fix the copula family and we vary the number of regimes $\ell \in \{1, 2, 3\}$. We observe from Table 2 that for all models, the empirical levels are not significantly different from the target value of 5%. Also, when the true number of regimes is one, the percentage of rejection of the null hypothesis of two or three regimes is also about 5%. Next, when the true number of regimes is two or three, the null hypothesis of one regime is easily rejected, while generally, the percentage of rejection of the hypothesis of three (resp. two) regimes is about 5% when there are in fact two (resp. three) regimes. We can conclude that when there is more than one regime, the goodness-of-fit test easily rejects the null hypothesis of one regime. However, when the true number of regimes is $\ell > 2$, to get good power for rejecting $k$ regimes, with $1 < k < \ell$, one needs a large sample size. Furthermore, it seems that the goodness-of-fit test is likely to accept a model with more regimes than necessary. This justifies selecting the smallest number of regimes with a P-value larger than 5%. This procedure is investigated next when there are one or two regimes. The results, displayed in Table 3, show that the proposed methodology works well, especially when the sample size is large enough. We also tried this method of selection when there are three regimes, but the results were not satisfactory because, most of the time, two regimes were selected, in agreement with the results in Table 2.

Finally, we repeated the second set of experiments using the AIC and BIC criteria instead of the goodness-of-fit test, for a sample size of $n = 1000$ only. The results are given in Table 4 and they show that when the true model has one or two regimes, both criteria are quite good, the better one being the BIC. However, when the model has three regimes, the true number of regimes is almost never discovered with the BIC, while the percentage is a bit better with the AIC. These results are similar to those obtained in Table 2 by using our goodness-of-fit test.

TABLE 2: Percentage of rejection of $H_0$ at the 5% level for the regime-switching Clayton, Frank, Gaussian, and Gumbel copula models with one, two, and three regimes, using N = 1000 replications and B = 100 bootstrap samples. Columns give the number of regimes under $H_0$; rows give the number of regimes under $H_1$.

| n | $H_1$ | Clayton: 1 regime | Clayton: 2 regimes | Clayton: 3 regimes | Frank: 1 regime | Frank: 2 regimes | Frank: 3 regimes |
|---|---|---|---|---|---|---|---|
| 250 | 1 regime | 4.5 | 3.8 | 4.0 | 5.8 | 5.3 | 6.4 |
| 250 | 2 regimes | 99.6 | 7.0 | 4.5 | 75.4 | 5.0 | 10.2 |
| 250 | 3 regimes | 79.7 | 5.6 | 4.2 | 36.3 | 7.0 | 11.6 |
| 500 | 1 regime | 4.2 | 3.9 | 3.4 | 5.4 | 6.0 | 6.1 |
| 500 | 2 regimes | 100.0 | 7.0 | 5.4 | 96.4 | 5.2 | 6.9 |
| 500 | 3 regimes | 98.3 | 6.2 | 5.6 | 66.9 | 5.2 | 7.9 |
| 1000 | 1 regime | 4.4 | 4.5 | 4.5 | 5.5 | 5.1 | 5.3 |
| 1000 | 2 regimes | 100.0 | 7.3 | 6.1 | 100.0 | 5.7 | 4.8 |
| 1000 | 3 regimes | 100.0 | 6.4 | 5.3 | 92.3 | 5.5 | 6.1 |

| n | $H_1$ | Gaussian: 1 regime | Gaussian: 2 regimes | Gaussian: 3 regimes | Gumbel: 1 regime | Gumbel: 2 regimes | Gumbel: 3 regimes |
|---|---|---|---|---|---|---|---|
| 250 | 1 regime | 5.0 | 6.2 | 6.7 | 6.7 | 6.0 | 4.9 |
| 250 | 2 regimes | 94.4 | 5.1 | 4.6 | 59.2 | 4.7 | 4.5 |
| 250 | 3 regimes | 57.6 | 4.8 | 4.4 | 26.1 | 4.8 | 4.6 |
| 500 | 1 regime | 5.4 | 5.2 | 5.3 | 5.0 | 4.3 | 4.6 |
| 500 | 2 regimes | 99.9 | 5.4 | 4.9 | 92.3 | 5.2 | 4.2 |
| 500 | 3 regimes | 87.9 | 5.9 | 4.5 | 46.4 | 5.4 | 4.9 |
| 1000 | 1 regime | 4.9 | 6.1 | 5.0 | 5.6 | 5.8 | 6.0 |
| 1000 | 2 regimes | 100.0 | 4.4 | 4.5 | 99.8 | 7.1 | 5.1 |
| 1000 | 3 regimes | 99.9 | 4.3 | 4.4 | 80.3 | 4.6 | 4.2 |

From all these results, trying to distinguish between two and three regimes seems illusory when $n \le 1000$. We checked the results of the estimation procedure when there are $\ell \ge 3$ regimes and found that the estimation errors are too large, because in this case, the number of parameters to estimate is at least $\ell^2$. We believe that by increasing the sample size sufficiently, the estimation errors would be smaller and one could then distinguish between two and three regimes. However, numerical experiments to prove this would require months of computations. In the end, we recommend combining goodness-of-fit tests and the AIC/BIC criteria, to ensure at least that the chosen model is valid.

Note that one should also expect better results for the power of the goodness-of-fit test by taking a larger number of bootstrap samples. Here, in order to build the tables in a reasonable time, we restricted ourselves to B = 100 bootstrap samples, which is quite small. In real life, we do not repeat the experiments N = 1000 times, so we may use at least B = 1000, especially when the P-value is around 5%. Furthermore, we did not consider the regime-switching Student copula since it has more parameters and, according to Table 6, the computation time is approximately 10 times longer for the Student family than for the Gumbel family, which has the longest computation time amongst the four other families.

TABLE 3: Estimation of the number of regimes $\ell^*$ for N = 1000 replications, using B = 100 bootstrap samples. Boldface values indicate the percentage of the correct choice of the number of regimes. Columns give the copula family and the true number of regimes.

| n | $\ell^*$ | Clayton: 1 | Clayton: 2 | Frank: 1 | Frank: 2 | Gaussian: 1 | Gaussian: 2 | Gumbel: 1 | Gumbel: 2 |
|---|---|---|---|---|---|---|---|---|---|
| 250 | 1 | **94.4** | 0.8 | **94.8** | 25.1 | **95.3** | 5.4 | **95.2** | 37.5 |
| 250 | 2 | 2.3 | **91.8** | 1.5 | **57.0** | 1.6 | **88.7** | 1.5 | **58.0** |
| 250 | 3 | 0.5 | 2.9 | 0.7 | 2.3 | 0.4 | 2.3 | 0.4 | 1.2 |
| 250 | ≥ 4 | 2.8 | 4.5 | 3.0 | 15.6 | 2.7 | 3.6 | 3.6 | 3.3 |
| 500 | 1 | **93.8** | 0 | **94.8** | 2.4 | **93.6** | 0.1 | **95.2** | 8.6 |
| 500 | 2 | 2.4 | **92.3** | 1.6 | **76.4** | 1.9 | **95.4** | 1.2 | **86.7** |
| 500 | 3 | 0.4 | 1.8 | 0.7 | 2.9 | 0.5 | 1.6 | 0.6 | 1.0 |
| 500 | ≥ 4 | 3.4 | 5.9 | 2.9 | 18.3 | 4 | 2.9 | 3.0 | 3.7 |
| 1000 | 1 | **95.0** | 0.0 | **94.1** | 0.0 | **95.6** | 0.0 | **94.9** | 0.1 |
| 1000 | 2 | 2.2 | **94.3** | 1.7 | **79.5** | 0.8 | **95.2** | 1.2 | **93.7** |
| 1000 | 3 | 0.6 | 1.7 | 0.3 | 2.1 | 0.7 | 1.0 | 0.4 | 1.0 |
| 1000 | ≥ 4 | 2.2 | 4.0 | 3.9 | 18.4 | 2.9 | 3.8 | 3.5 | 5.2 |

5. APPLICATION TO OPTION PRICING

In this application, we want to evaluate a European put-on-max option on Amazon (amzn) and Apple (aapl) stocks. The payoff of this option is given by $\Phi(s_1, s_2) = \max\{K - \max(s_1, s_2), 0\}$, where $s_1$ and $s_2$ are the values of the stocks at the maturity of the option, normalized to start at \$1, and $K$ is the strike price. An investor would be interested in such an option to protect the returns of his assets, since he will exercise the option if both returns are lower than $\log K$. Also, this option is cheaper than a put-on-min. In order to evaluate this option, we first need to find the joint distribution of both assets. Next, we will choose an appropriate risk-neutral probability measure.
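The payoff itself is simple to code. A minimal Python sketch, where the function name `put_on_max_payoff` is chosen for illustration and the inputs are the normalized terminal values $s_1$, $s_2$ and the strike $K$:

```python
import numpy as np


def put_on_max_payoff(s1, s2, strike):
    """Payoff max{K - max(s1, s2), 0} of a European put-on-max option.

    s1, s2 : terminal stock values normalized to start at 1 (scalars or arrays)
    strike : strike price K
    """
    return np.maximum(strike - np.maximum(s1, s2), 0.0)


# Usage on made-up terminal values: the option pays only when both stocks end below K.
payoffs = put_on_max_payoff(np.array([0.90, 1.10, 0.70]),
                            np.array([0.95, 0.80, 0.60]), strike=1.0)
```

Because the function is vectorized, it can be applied directly to arrays of simulated terminal values when averaging discounted payoffs under a chosen risk-neutral measure.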

5.1. Joint distribution

The first step is to fit dynamic models for the univariate time series. To this end, we used the adjusted prices of Amazon and Apple from January 1, 2015 to June 29, 2018. The sample size is 880 observations for each time series. The 879 daily log-returns of the stocks are shown in Figure 1. Since van den Goorbergh et al. (2005) used GARCH models for the log-returns of the assets they considered, we also tried to fit GARCH(p,q) models with Gaussian innovations, but we rejected the null hypothesis for $p, q \le 3$. We next tried to fit Gaussian HMMs to the log-returns. Using the selection procedure described in Section 3.2, we obtained a Gaussian HMM with three regimes for the daily log-returns of Amazon as well as for the daily log-returns of Apple. Here, the P-values are 38.8% and 15.1% respectively, computed using B = 1000 bootstrap samples. The estimated parameters for both time series are given in Table 5. Note that the regimes are ordered by their stationary distribution ($\nu$), meaning that the least frequent regime is 1, and the most frequent regime is 3. As can be seen from Figure 1, for Amazon, regime 1 corresponds to large positive returns with a frequency of 2%, while for Apple, regime 1 consists of large negative values with a frequency of 10%. So in both cases, regime 1 is not a persistent state, while the two other regimes are much more persistent. For the two stocks, since regime 2 always has a negative mean, it can be interpreted as a bear market, while for regime 3, the mean $\mu_3$ is positive, as well as $\mu_3 - \sigma_3^2/2$, so this regime can be interpreted as a bull market.

TABLE 4: Percentage of selection of the number of regimes $\ell \in \{1, 2, 3\}$ for a sample size n = 1000 and N = 1000 replications, based on the AIC and BIC criteria. Boldface values indicate the percentage of the correct choice of the number of regimes. Columns give the selected number of regimes; rows give the copula family and the true number of regimes.

| Copula family | True model | AIC: 1 regime | AIC: 2 regimes | AIC: 3 regimes | BIC: 1 regime | BIC: 2 regimes | BIC: 3 regimes |
|---|---|---|---|---|---|---|---|
| Clayton | 1 regime | **97.4** | 2.6 | 0 | **100** | 0 | 0 |
| Clayton | 2 regimes | 0 | **96.4** | 3.6 | 0 | **99.9** | 0.1 |
| Clayton | 3 regimes | 0 | 90.2 | **9.8** | 0 | 99.9 | **0.1** |
| Frank | 1 regime | **97.8** | 2.2 | 0 | **100** | 0 | 0 |
| Frank | 2 regimes | 0 | **97.2** | 2.8 | 0 | **100** | 0 |
| Frank | 3 regimes | 0 | 98.2 | **1.8** | 4.4 | 95.6 | **0** |
| Gaussian | 1 regime | **98.4** | 1.6 | 0 | **100** | 0 | 0 |
| Gaussian | 2 regimes | 0 | **98.1** | 1.9 | 0 | **100** | 0 |
| Gaussian | 3 regimes | 0 | 95.3 | **4.7** | 0 | 100 | **0** |
| Gumbel | 1 regime | **97.6** | 2.4 | 0 | **100** | 0 | 0 |
| Gumbel | 2 regimes | 0 | **96.2** | 3.8 | 0 | **100** | 0 |
| Gumbel | 3 regimes | 0 | 96.4 | **3.6** | 0.9 | 99.1 | **0** |

From now on, let X1tdenotes the log-returns of Amazon and let X2tdenotes the log-returns

of Apple, and let F1tand F2tbe the conditional distributions of X1tand X2tgiven the past

observations, corresponding to the densities deﬁned by Equation (3). Further set U1t=F1t(X1t)

and U2t=F2t(X2t). As deﬁned in Section 3, let en,jt =Fn,j t(Xjt ),j= 1,2, be the pseudo-

observations, where Fn,jt is the conditional distribution function computed with the parameters

of Table 5. The graph of the normalized ranks of un,t = (un,1t, un,2t)is displayed in Figure 2.

Next, in order to select the appropriate regime-switching copula model, we performed goodness-

of-ﬁt tests using B= 1000 bootstrap samples to select the copula family and the number of

regimes amongst the Clayton, Frank, Gaussian, Gumbel and Student families. Note that for the 1-

The Canadian Journal of Statistics / La revue canadienne de statistique DOI:

2019 GOODNESS-OF-FIT FOR REGIME-SWITCHING COPULA MODELS 11

2015-01-01 2016-01-01 2017-01-01 2018-01-01

-0.1

-0.05

0

0.05

0.1

0.15 Daily returns of Amazon

2015-01-01 2016-01-01 2017-01-01 2018-01-01

-0.08

-0.06

-0.04

-0.02

0

0.02

0.04

0.06

0.08 Daily returns of Apple

2015-01-01 2016-01-01 2017-01-01 2018-01-01

1

2

3Predicted regimes of Amazon

2015-01-01 2016-01-01 2017-01-01 2018-01-01

1

2

3Predicted regimes of Apple

FIGURE 1: Daily log returns and predicted regimes for Amazon and Apple.

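TABLE 5: Estimated parameters for the log-returns of Amazon and Apple, using Gaussian HMM. Here, $\nu$ is the stationary distribution of the regimes, and $Q$ is the transition matrix.

| Parameter | Amazon, regime 1 | Amazon, regime 2 | Amazon, regime 3 | Apple, regime 1 | Apple, regime 2 | Apple, regime 3 |
|---|---|---|---|---|---|---|
| $\mu \times 10^{-2}$ | 4.4122 | -0.1179 | 0.1892 | -0.2777 | -0.0433 | 0.2215 |
| $\sigma \times 10^{-3}$ | 57.2765 | 22.9170 | 10.1334 | 2.4846 | 20.6456 | 8.8176 |
| $\nu$ | 0.0199 | 0.2574 | 0.7227 | 0.1015 | 0.3894 | 0.5091 |
| $Q$, row 1 | 0.1572 | 0.3978 | 0.4450 | 0.0674 | 0.4154 | 0.5172 |
| $Q$, row 2 | 0.0545 | 0.8849 | 0.0606 | 0.0000 | 0.8788 | 0.1212 |
| $Q$, row 3 | 0.0038 | 0.0300 | 0.9662 | 0.1859 | 0.0098 | 0.8042 |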

regime and 2-regime Student copulas, and for the 2-regime Gaussian copula, we took B= 10000

in order to get more precise results. The corresponding P-values are given in Table 6, together with the computation time in seconds for $B = 1000$ bootstrap samples and the BIC values. From this table, based on the P-values, we can see that the 2-regime Gaussian and the 2-regime Student copula models are valid (P-values of 9.8% and 10.1%), while the 1-regime Student copula model is almost acceptable (P-value of 4.4%). However, the estimated degrees of freedom of the 2-regime Student copula model are very large, indicating that it is in effect a 2-regime Gaussian copula. We then


[Figure 2: scatter plot on the unit square, with axes labelled Apple and Amazon, both ranging from 0 to 1.]

FIGURE 2: Scatter plot of the normalized ranks of the pseudo-observations $u_{n,t}$ for Apple and Amazon.

restricted the degrees of freedom to be less than 25, and the resulting estimates were equal to this upper bound. Looking at the BIC values, we see that the smallest one is for the 2-regime Gaussian copula. Based on these results, we choose the 2-regime Gaussian copula as the best model. Its estimated parameters appear in Table 7.
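For reference, the normalized ranks plotted in Figure 2 can be obtained from the pseudo-observations with a few lines of code. The sketch below assumes the common convention of dividing ranks by $n + 1$ (the paper's exact normalization is set in Section 3 and is not restated here); the function name and the toy numbers are illustrative only.

```python
def normalized_ranks(values):
    """Return rank / (n + 1) for each entry (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank / (n + 1)
    return ranks

# Toy pseudo-observations in (0, 1); illustrative numbers, not the paper's data.
e1 = [0.12, 0.80, 0.45, 0.33]
e2 = [0.20, 0.95, 0.40, 0.10]

# Pairs of normalized ranks, one pair per time point, as in a scatter plot.
u = list(zip(normalized_ranks(e1), normalized_ranks(e2)))
```

Each coordinate of `u` lies strictly between 0 and 1, which is what makes the scatter plot comparable across margins.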

TABLE 6: P-values (in percentage) of the different regime-switching copula families, together with the computation time in seconds and the BIC criterion. In the column headings, the number in parentheses is the number of regimes.

| | Clayton (1) | Clayton (2) | Frank (1) | Frank (2) | Gaussian (1) | Gaussian (2) | Gumbel (1) | Gumbel (2) | Student (1) | Student (2) |
|---|---|---|---|---|---|---|---|---|---|---|
| P-value | 0.0 | 1.0 | 0.6 | 0.8 | 0.0 | 9.8 | 0.0 | 0.0 | 4.4 | 10.1 |
| Sec. | 195 | 1272 | 128 | 490 | 175 | 712 | 198 | 1342 | 1715 | 15386 |
| BIC | -174.0 | -181.3 | -188.8 | -190.8 | -179.6 | -205.3 | -168.7 | -182.3 | -201.8 | -191.8 |
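The selection logic applied to Table 6 can be written out as a short script. This is only a sketch of the reasoning in the text: the 5% threshold is an assumption (the text speaks of models being "valid" without naming the level), and the variable names are illustrative.

```python
# (family, number of regimes): (P-value in %, BIC), copied from Table 6.
table6 = {
    ("Clayton", 1): (0.0, -174.0), ("Clayton", 2): (1.0, -181.3),
    ("Frank", 1): (0.6, -188.8), ("Frank", 2): (0.8, -190.8),
    ("Gaussian", 1): (0.0, -179.6), ("Gaussian", 2): (9.8, -205.3),
    ("Gumbel", 1): (0.0, -168.7), ("Gumbel", 2): (0.0, -182.3),
    ("Student", 1): (4.4, -201.8), ("Student", 2): (10.1, -191.8),
}
LEVEL = 5.0  # assumed significance level, in percent

# Models whose goodness-of-fit test is not rejected at the assumed level.
accepted = {m: v for m, v in table6.items() if v[0] >= LEVEL}

# Among the accepted models, keep the one with the smallest BIC.
best = min(accepted, key=lambda m: accepted[m][1])
```

With the published numbers, only the 2-regime Gaussian and 2-regime Student models pass the threshold, and the BIC comparison then retains the 2-regime Gaussian copula.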

TABLE 7: Estimated parameters for the 2-regime Gaussian copula. Here, $\tau$ is Kendall's tau, $\rho = \sin(\pi\tau/2)$ is the correlation coefficient of the copula, $\nu$ is the stationary distribution of the regimes, and $P$ is the transition matrix.

| Parameter | Regime 1 | Regime 2 |
|---|---|---|
| $\tau$ | 0.0859 | 0.5816 |
| $\rho$ | 0.1346 | 0.7917 |
| $\nu$ | 0.5209 | 0.4791 |
| $P$, row 1 | 0.7414 | 0.2586 |
| $P$, row 2 | 0.2812 | 0.7188 |
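The entries of Table 7 are tied together by two identities, which gives a quick consistency check. The sketch below uses the relation $\rho = \sin(\pi\tau/2)$ stated in the caption and the standard closed form for the stationary distribution of a two-state Markov chain; the variable names are illustrative.

```python
import math

# Kendall's tau and transition matrix P, copied from Table 7.
tau = [0.0859, 0.5816]
P = [[0.7414, 0.2586],
     [0.2812, 0.7188]]

# Correlation of a Gaussian copula recovered from Kendall's tau.
rho = [math.sin(math.pi * t / 2) for t in tau]

# Stationary distribution of a two-state chain: nu_1 = p21 / (p12 + p21).
nu1 = P[1][0] / (P[0][1] + P[1][0])
nu = [nu1, 1.0 - nu1]
```

Both `rho` and `nu` agree with the corresponding rows of Table 7 to the precision allowed by the rounded inputs.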


5.2. Bivariate option pricing

In order to price an option with payoff $\Phi(S_{1n}, S_{2n})$ over $n$ trading days, we perform a Monte Carlo simulation under a risk neutral measure. First, as in van den Goorbergh et al. (2005), we assume that the selected regime-switching copula model with parameters appearing in Table 7 is also valid under the risk neutral measure. Next, for the dynamic models of both time series, we assume that we still have Gaussian HMM, but with new parameters, namely $\tilde\mu_{jk} = r - \sigma_{jk}^2/2$, $\tilde\sigma_{jk} = \sigma_{jk}$, and $\tilde Q^{(j)} = Q^{(j)}$, where $r$ is the risk free daily interest rate. This way, under the risk neutral measure, the discounted prices $e^{-rt} S_{jt} = e^{\sum_{i=1}^{t} (X_{ji} - r)}$ form a martingale, for each $j = 1, 2$.
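The martingale property can be verified in one line from the moment generating function of the normal distribution. The display below is a sketch that conditions on the (hidden) regime $k$ of series $j$ at time $t$, given which $X_{jt}$ is Gaussian with mean $\tilde\mu_{jk}$ and standard deviation $\tilde\sigma_{jk}$ under the risk neutral measure:

```latex
\begin{aligned}
E\left[ e^{X_{jt} - r} \,\middle|\, \text{regime } k \text{ at time } t \right]
  &= \exp\left( \tilde\mu_{jk} + \tfrac{1}{2}\tilde\sigma_{jk}^{2} - r \right) \\
  &= \exp\left( r - \tfrac{1}{2}\sigma_{jk}^{2} + \tfrac{1}{2}\sigma_{jk}^{2} - r \right) = 1 .
\end{aligned}
```

Since the conditional expectation equals 1 in every regime, it also equals 1 after averaging over the regime given the past, so each factor $e^{X_{jt} - r}$ leaves the expected discounted price unchanged.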

The following steps illustrate the procedure to evaluate a European option with payoff $\Phi$ in the case of a general regime-switching copula with $\ell$ regimes, where each univariate time series is modeled by a Gaussian HMM with $\ell_j$ regimes and parameters $\tilde\mu_{j1}, \ldots, \tilde\mu_{j\ell_j}$, $\tilde\sigma_{j1}, \ldots, \tilde\sigma_{j\ell_j}$, $\tilde Q^{(j)}$:

1. Generate $U_t$, $t \in \{1, \ldots, n\}$, from the regime-switching copula model.

2. For $t \in \{1, \ldots, n\}$ and $j = 1, 2$, compute the conditional distribution function $F_{jt}$ under the risk neutral measure, and set $X_{jt} = F_{jt}^{-1}(U_{jt})$.

3. For $j = 1, 2$, compute $S_{jn} = e^{\sum_{t=1}^{n} X_{jt}}$.

4. Repeat steps 1 to 3 $N$ times, in order to get $N$ independent values of $(S_{1n}, S_{2n})$.

The value of the option is then approximated by the average of the discounted values $e^{-rn}\Phi(S_{1n}, S_{2n})$.
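The four steps above can be sketched in plain Python for the 2-regime Gaussian copula. This is an illustration under several assumptions, not the authors' implementation (which is distributed in the HMMcopula R package): the conditional distribution $F_{jt}$ is taken to be the mixture of normals weighted by the predicted regime probabilities, the filters and the copula regime start from uniform distributions, the payoff is taken as $\max\{K - \max(S_{1n}, S_{2n}), 0\}$, and $r = 4\%$ is read as an annual rate spread over 252 trading days. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for Phi

def mixture_quantile(u, weights, mus, sigmas):
    """Invert F(x) = sum_k w_k Phi((x - mu_k) / sigma_k) by bisection."""
    lo = min(mus) - 10 * max(sigmas)
    hi = max(mus) + 10 * max(sigmas)
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        cdf = sum(w * STD.cdf((mid - m) / s) for w, m, s in zip(weights, mus, sigmas))
        lo, hi = (mid, hi) if cdf < u else (lo, mid)
    return 0.5 * (lo + hi)

def simulate_path(n, r, rhos, P, sigmas, Qs, rng):
    """Steps 1 to 3 for one path; returns (S_1n, S_2n) with S_j0 = 1."""
    # Assumed starting points: uniform copula regime and uniform filters.
    regime = rng.randrange(len(rhos))
    eta = [[1.0 / len(s)] * len(s) for s in sigmas]
    log_s = [0.0, 0.0]
    for _ in range(n):
        # Step 1: move the copula regime, then draw U_t from a Gaussian copula.
        regime = rng.choices(range(len(rhos)), weights=P[regime])[0]
        rho = rhos[regime]
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u = (STD.cdf(z1), STD.cdf(z2))
        for j in range(2):
            k = range(len(sigmas[j]))
            # Step 2: predicted regime weights, then X_jt = F_jt^{-1}(U_jt).
            w = [sum(eta[j][i] * Qs[j][i][m] for i in k) for m in k]
            mus = [r - s ** 2 / 2 for s in sigmas[j]]  # risk neutral means
            x = mixture_quantile(u[j], w, mus, sigmas[j])
            # Update the regime filter with the simulated return.
            dens = [w[m] * NormalDist(mus[m], sigmas[j][m]).pdf(x) for m in k]
            total = sum(dens)
            eta[j] = [d / total for d in dens]
            log_s[j] += x  # Step 3: S_jn = exp(sum of the X_jt)
    return math.exp(log_s[0]), math.exp(log_s[1])

def put_on_max_price(strike, n, r, n_sims, model, seed=0):
    """Step 4, then the discounted average of max(K - max(S_1n, S_2n), 0)."""
    rng = random.Random(seed)
    payoffs = []
    for _ in range(n_sims):
        s1, s2 = simulate_path(n, r, rng=rng, **model)
        payoffs.append(max(strike - max(s1, s2), 0.0))
    return math.exp(-r * n) * sum(payoffs) / n_sims

# Parameters copied from Tables 5 and 7 (sigma read in units of 1e-3).
model = {
    "rhos": [0.1346, 0.7917],
    "P": [[0.7414, 0.2586], [0.2812, 0.7188]],
    "sigmas": [[0.0572765, 0.0229170, 0.0101334],
               [0.0024846, 0.0206456, 0.0088176]],
    "Qs": [
        [[0.1572, 0.3978, 0.4450], [0.0545, 0.8849, 0.0606], [0.0038, 0.0300, 0.9662]],
        [[0.0674, 0.4154, 0.5172], [0.0000, 0.8788, 0.1212], [0.1859, 0.0098, 0.8042]],
    ],
}
# Small run for illustration; r = 4% is assumed annual, over 252 trading days.
price = put_on_max_price(strike=1.0, n=20, r=0.04 / 252, n_sims=200, model=model)
```

The bisection in `mixture_quantile` is used because a mixture of normals has no closed-form quantile function; any root finder would serve. The number of simulations is kept small here so that the sketch runs quickly, and the result is not meant to reproduce the prices shown in Figure 3.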

To evaluate the put-on-max option, we used $N = 10000$ simulations, with a maturity of $n = 20$ trading days and a risk free rate $r = 4\%$.

Figure 3 displays the price of the option as a function of the strike $K$ for the best model, i.e., the 2-regime Gaussian copula, versus the other four 2-regime copula families. As expected, the prices given by the 2-regime Gaussian copula and the 2-regime Student copula are almost identical. Note that the prices of the 2-regime Frank and Gumbel copulas are always lower than those of the 2-regime Gaussian copula, while those of the 2-regime Clayton copula are higher than those of the 2-regime Gaussian copula when the strike value is lower than 1.0005, and are lower when the strike is larger than 1.0005.

6. CONCLUSION

In this paper, for a regime-switching copula model, we proposed a methodology based on a goodness-of-fit test to select the copula family and the number of regimes. This methodology can also be used for mixtures of copula models, as well as for univariate HMM. We performed Monte Carlo simulations with sample sizes $n \in \{250, 500, 1000\}$, and we showed that the level of the goodness-of-fit test is correct and that the test is powerful enough to distinguish between regime-switching copula families and also to detect whether there is more than one regime. The proposed procedure for selecting the number of regimes works when the sample size is large enough and there are fewer than three regimes. For three regimes or more, the sample size must be larger than 1000. As an example of application, we showed how to evaluate a European put-on-max option, but the proposed methodology can also be applied to a wide range of options on multivariate assets. The empirical results emphasize the importance of choosing the correct copula family.


[Figure 3: four panels plotting Price (0 to 200) against Strike (0.98 to 1.02); each panel compares the 2-regime Gaussian copula with one of the 2-regime Clayton, Frank, Gumbel, and Student copulas.]

FIGURE 3: Comparison of put-on-max prices for $n = 20$ trading days maturity, as a function of the strike, between a 2-regime Gaussian copula and 2-regime Clayton, Frank, Gumbel and Student copula models.

Acknowledgments

The authors are grateful to the Guest Editor, Cody Hyndman and two anonymous referees for their comments and suggestions. Partial funding in support of this work was provided by the Natural Sciences and Engineering Research Council of Canada (Grant 04430–2014), the Canadian Statistical Sciences Institute (postdoctoral fellowship), the Fonds de recherche du Québec – Nature et technologies (2015–PR–183236 and postdoctoral fellowship 259667), and the Groupe d'études et de recherche en analyse des décisions (postdoctoral fellowship).


BIBLIOGRAPHY

Adams, Z., Füss, R., and Glück, T. (2017). Are correlations constant? Empirical and theoretical results on popular correlation models in finance. Journal of Banking & Finance, 84:9–24.

Cappé, O., Moulines, E., and Rydén, T. (2005). Inference in Hidden Markov Models. Springer Series in Statistics. Springer, New York.

Chen, X. and Fan, Y. (2006). Estimation of copula-based semiparametric time series models. Journal of Econometrics, 130:307–335.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, 39:1–38.

Du, Z. (2016). Nonparametric bootstrap tests for independence of generalized errors. The Econometrics Journal, 19(1):55–83.

Engle, R. (2002). Dynamic conditional correlation. Journal of Business & Economic Statistics, 20(3):339–350.

Fermanian, J.-D. and Lopez, O. (2018). Single-index copulas. Journal of Multivariate Analysis, 165:27–55.

Fink, H., Klimova, Y., Czado, C., and Stöber, J. (2017). Regime switching vine copula models for global equity and volatility indices. Econometrics, 5(1):1–38.

Genest, C., Rémillard, B., and Beaudoin, D. (2009). Omnibus goodness-of-fit tests for copulas: A review and a power study. Insurance Math. Econom., 44:199–213.

Härdle, W. K., Okhrin, O., and Wang, W. (2015). Hidden Markov structures for dynamic copulae. Econometric Theory, 31(5):981–1015.

Nasri, B. and Rémillard, B. (2019). Copula-based dynamic models for multivariate time series. Journal of Multivariate Analysis, 172:107–121.

Nasri, B. R., Rémillard, B. N., and Bouezmarni, T. (2019). Semi-parametric copula-based models under non-stationarity. Journal of Multivariate Analysis, 173:347–365.

Patton, A. J. (2004). On the out-of-sample importance of skewness and asymmetric dependence for asset allocation. Journal of Financial Econometrics, 2(1):130–168.

Patton, A. J. (2006). Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527–556.

Rémillard, B. (2013). Statistical Methods for Financial Engineering. Chapman and Hall/CRC Financial Mathematics Series. Taylor & Francis.

Rémillard, B. (2017). Goodness-of-fit tests for copulas of multivariate time series. Econometrics, 5(1):13.

Rémillard, B., Hocquard, A., Lamarre, H., and Papageorgiou, N. A. (2017). Option pricing and hedging for discrete time regime-switching model. Modern Economy, 8:1005–1032.

Stöber, J. and Czado, C. (2014). Regime switches in the dependence structure of multidimensional financial data. Computational Statistics & Data Analysis, 76:672–686. CFEnetwork: The Annals of Computational and Financial Econometrics.

Thioub, M. Y., Nasri, B. R., Pieugueu, R., and Rémillard, B. N. (2019). HMMcopula. R package version 1.0.3.

van den Goorbergh, R. W. J., Genest, C., and Werker, B. J. M. (2005). Bivariate option pricing using dynamic copula models. Insurance: Mathematics and Economics, 37:101–114.

APPENDIX


Estimation for general regime-switching models

E-Step

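Set $\tilde\theta = (\tilde\beta_1, \ldots, \tilde\beta_\ell, \tilde P)$. Then, according to (Rémillard, 2013, Appendix 10.A),

$$
\begin{aligned}
Q_y(\tilde\theta, \theta) &= E_\theta\{\log f_{\tilde\theta}(\tau, Y) \mid Y = y\} \\
&= \sum_{t=1}^{n} \sum_{j \in S} \sum_{k \in S} P_\theta(\tau_{t-1} = j, \tau_t = k \mid Y = y) \log \tilde P_{jk} + \sum_{t=1}^{n} \sum_{j \in S} P_\theta(\tau_t = j \mid Y = y) \log g_{\tilde\beta_j}(y_t) \\
&= \sum_{t=1}^{n} \sum_{j \in S} \sum_{k \in S} \Lambda_{\theta,t}(j, k) \log \tilde P_{jk} + \sum_{t=1}^{n} \sum_{j \in S} \lambda_{\theta,t}(j) \log g_{\tilde\beta_j}(y_t),
\end{aligned}
$$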

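where $\lambda_{\theta,t}(j) = P_\theta(\tau_t = j \mid Y = y)$ and $\Lambda_{\theta,t}(j, k) = P_\theta(\tau_{t-1} = j, \tau_t = k \mid Y = y)$, for all $t \in \{1, \ldots, n\}$ and $j, k \in S$. Next, define for all $j \in S$, $\bar\eta_{\theta,n}(j) = 1/\ell$, $\eta_{\theta,0}(j) = 1/\ell$,

$$
\begin{aligned}
\bar\eta_{\theta,t}(j) &= \mathrm{Prob}(\tau_t = j \mid y_{t+1}, \ldots, y_n), \quad t = 1, \ldots, n-1, \\
\eta_{\theta,t}(j) &= \mathrm{Prob}(\tau_t = j \mid y_1, \ldots, y_t), \quad t = 1, \ldots, n.
\end{aligned}
$$

It follows easily that for $t = 1, \ldots, n$,

$$
\eta_t(j) = \frac{g_{\beta_j}(y_t)}{Z_{t|t-1}} \sum_{i=1}^{\ell} \eta_{t-1}(i) P_{ij},
\qquad \text{where} \qquad
Z_{t|t-1} = \sum_{j=1}^{\ell} g_{\beta_j}(y_t) \sum_{i=1}^{\ell} \eta_{t-1}(i) P_{ij}.
$$

Next, for all $i \in \{1, \ldots, \ell\}$, and for all $t = 0, \ldots, n-1$,

$$
\bar\eta_{\theta,t}(i) = \frac{\sum_{\beta=1}^{\ell} \bar\eta_{\theta,t+1}(\beta) P_{i\beta}\, g_{\beta_\beta}(y_{t+1})}{\sum_{k=1}^{\ell} \sum_{\beta=1}^{\ell} \bar\eta_{\theta,t+1}(\beta) P_{k\beta}\, g_{\beta_\beta}(y_{t+1})},
\qquad
\lambda_{\theta,t}(i) = \frac{\eta_{\theta,t}(i)\, \bar\eta_{\theta,t}(i)}{\sum_{k=1}^{\ell} \eta_{\theta,t}(k)\, \bar\eta_{\theta,t}(k)}.
$$

Hence, for all $i, j \in \{1, \ldots, \ell\}$, and for all $t = 1, \ldots, n$,

$$
\Lambda_{\theta,t}(i, j) = \frac{P_{ij}\, \eta_{\theta,t-1}(i)\, \bar\eta_{\theta,t}(j)\, g_{\beta_j}(y_t)}{\sum_{k=1}^{\ell} \sum_{\beta=1}^{\ell} P_{k\beta}\, \eta_{\theta,t-1}(k)\, \bar\eta_{\theta,t}(\beta)\, g_{\beta_\beta}(y_t)}.
$$

As a result, for all $i \in \{1, \ldots, \ell\}$, and for every $t = 1, \ldots, n$, $\sum_{j=1}^{\ell} \Lambda_{\theta,t}(i, j) = \lambda_{\theta,t-1}(i)$.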
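The recursions of the E-step translate almost line by line into code. The following sketch assumes that the regime densities have already been evaluated, so that `dens[t - 1][j]` holds $g_{\beta_j}(y_t)$; the function name `e_step` and the toy inputs are illustrative and not taken from the HMMcopula package.

```python
def e_step(dens, P):
    """Forward-backward pass for a hidden Markov model.

    dens[t - 1][j] is the density of y_t in regime j, for t = 1, ..., n.
    Returns (lam, Lam) with lam[t][i] = lambda_t(i) for t = 0, ..., n and
    Lam[t][i][j] = Lambda_t(i, j) for t = 1, ..., n (Lam[0] is None).
    """
    n, l = len(dens), len(P)
    S = range(l)

    def normalize(v):
        total = sum(v)
        return [x / total for x in v]

    # Forward filter eta_t, starting from eta_0 = 1/l.
    eta = [[1.0 / l] * l]
    for t in range(1, n + 1):
        g = dens[t - 1]
        eta.append(normalize(
            [g[j] * sum(eta[t - 1][i] * P[i][j] for i in S) for j in S]))

    # Backward recursion eta_bar_t, starting from eta_bar_n = 1/l.
    eta_bar = [None] * (n + 1)
    eta_bar[n] = [1.0 / l] * l
    for t in range(n - 1, -1, -1):
        g = dens[t]  # densities of y_{t+1}
        eta_bar[t] = normalize(
            [sum(eta_bar[t + 1][b] * P[i][b] * g[b] for b in S) for i in S])

    # Smoothed marginals lambda_t and pairwise probabilities Lambda_t.
    lam = [normalize([eta[t][i] * eta_bar[t][i] for i in S]) for t in range(n + 1)]
    Lam = [None]
    for t in range(1, n + 1):
        g = dens[t - 1]
        raw = [[P[i][j] * eta[t - 1][i] * eta_bar[t][j] * g[j] for j in S] for i in S]
        total = sum(map(sum, raw))
        Lam.append([[x / total for x in row] for row in raw])
    return lam, Lam

# Toy example with two regimes and three observations (illustrative numbers).
dens = [[0.5, 0.1], [0.2, 0.4], [0.3, 0.3]]
P = [[0.9, 0.1], [0.2, 0.8]]
lam, Lam = e_step(dens, P)
```

The identity $\sum_j \Lambda_{\theta,t}(i, j) = \lambda_{\theta,t-1}(i)$ stated above provides a convenient numerical check on the output. For long series one would typically work with scaled or log quantities; the per-step normalization used here plays that role.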

M-Step

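For this step, given $\theta^{(k)}$, $\theta^{(k+1)}$ is defined as $\theta^{(k+1)} = \arg\max_\theta Q_y\left(\theta, \theta^{(k)}\right)$. Setting $\lambda_t^{(k)}(i) = \lambda_{\theta^{(k)},t}(i)$ and $\Lambda_t^{(k)}(i, j) = \Lambda_{\theta^{(k)},t}(i, j)$, it follows from the E-Step that

$$
\theta^{(k+1)} = \arg\max_\theta \left\{ \sum_{t=1}^{n} \sum_{i,j \in S} \Lambda_t^{(k)}(i, j) \log P_{ij} + \sum_{t=1}^{n} \sum_{i \in S} \lambda_t^{(k)}(i) \log g_{\beta_i}(y_t) \right\}.
$$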


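Using Lagrange multipliers, the function to maximize is $h(\theta, \psi)$, where $\psi = (\psi_1, \ldots, \psi_\ell)$, and

$$
h(\theta, \psi) = \sum_{t=1}^{n} \sum_{i,j \in S} \Lambda_t^{(k)}(i, j) \log P_{ij} + \sum_{t=1}^{n} \sum_{i \in S} \lambda_t^{(k)}(i) \log g_{\beta_i}(y_t) + \sum_{i=1}^{\ell} \psi_i \left( 1 - \sum_{j=1}^{\ell} P_{ij} \right).
$$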

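For $i, j \in S$ we have $\frac{\partial h}{\partial P_{ij}} = \sum_{t=1}^{n} \Lambda_t^{(k)}(i, j) \frac{1}{P_{ij}} - \psi_i$. As a result, for any $i, j \in S$, the partial derivative of $h$ with respect to $P_{ij}$ is zero if and only if $\psi_i P_{ij} = \sum_{t=1}^{n} \Lambda_t^{(k)}(i, j)$. Summing over $j$ yields that

$$
\psi_i = \sum_{j=1}^{\ell} \psi_i P_{ij} = \sum_{j=1}^{\ell} \sum_{t=1}^{n} \Lambda_t^{(k)}(i, j) = \sum_{t=1}^{n} \lambda_{t-1}^{(k)}(i) = \sum_{t=1}^{n} \lambda_{\theta^{(k)},t-1}(i).
$$

Hence $P_{ij}^{(k+1)} = \sum_{t=1}^{n} \Lambda_t^{(k)}(i, j) \Big/ \sum_{t=1}^{n} \lambda_{t-1}^{(k)}(i)$. Also, maximizing $h$ with respect to $\beta_1, \ldots, \beta_\ell$ amounts to maximizing $\sum_{t=1}^{n} \sum_{i=1}^{\ell} \lambda_t^{(k)}(i) \log g_{\beta_i}(y_t)$ with respect to $\beta_i$, for all $i \in S$.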
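The closed-form update of the transition matrix is a ratio of sums and can be coded directly. The sketch below takes the smoothed quantities as inputs, indexed by time as in the text; the function name and the toy arrays are illustrative, chosen so that $\sum_j \Lambda_t(i, j) = \lambda_{t-1}(i)$ holds.

```python
def update_transition(Lam, lam):
    """Return P^(k+1) with entries sum_t Lambda_t(i, j) / sum_t lambda_{t-1}(i).

    Lam[t][i][j] is Lambda_t(i, j) for t = 1, ..., n (Lam[0] is unused) and
    lam[t][i] is lambda_t(i) for t = 0, ..., n.
    """
    n = len(Lam) - 1
    l = len(lam[0])
    P_new = []
    for i in range(l):
        denom = sum(lam[t - 1][i] for t in range(1, n + 1))
        P_new.append(
            [sum(Lam[t][i][j] for t in range(1, n + 1)) / denom for j in range(l)])
    return P_new

# Toy smoothed probabilities for n = 2 and two regimes (illustrative numbers).
Lam = [None,
       [[0.3, 0.2], [0.1, 0.4]],
       [[0.3, 0.1], [0.2, 0.4]]]
lam = [[0.5, 0.5], [0.4, 0.6], [0.5, 0.5]]
P_new = update_transition(Lam, lam)
```

Because the numerators of row $i$ sum to the denominator, each row of `P_new` sums to one without any explicit renormalization. The update of the $\beta_i$ depends on the chosen family $g_\beta$ and is not covered by this sketch.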

Estimation for general mixture models

This model is a particular case of regime-switching where $P_{ij} = \nu_j$, $j \in \{1, \ldots, \ell\}$. So, under this model, $\tau_t$ is a sequence of iid observations with distribution $\nu = (\nu_1, \ldots, \nu_\ell)$. The algorithm described previously can then be simplified. To this end, set $\theta = (\beta_1, \ldots, \beta_\ell, \nu)$. The joint density of $\tau = (\tau_1, \ldots, \tau_n)$ and $Y$ is $f_\theta(\tau, Y) = \left( \prod_{t=1}^{n} \nu_{\tau_t} \right) \times \prod_{t=1}^{n} g_{\beta_{\tau_t}}(Y_t)$, yielding

$$
\log f_\theta(\tau, Y) = \sum_{t=1}^{n} \log \nu_{\tau_t} + \sum_{t=1}^{n} \log g_{\beta_{\tau_t}}(Y_t).
$$

E-Step

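Set $\tilde\theta = (\tilde\beta_1, \ldots, \tilde\beta_\ell, \tilde\nu)$. Then, according to the previous computations,

$$
Q_y(\tilde\theta, \theta) = E_\theta\{\log f_{\tilde\theta}(\tau, Y) \mid Y = y\} = \sum_{t=1}^{n} \sum_{j \in S} \lambda_{\theta,t}(j) \left\{ \log \tilde\nu_j + \log g_{\tilde\beta_j}(y_t) \right\}, \qquad (1)
$$

where $\lambda_{\theta,t}(j) = P_\theta(\tau_t = j \mid Y = y) = \dfrac{\nu_j\, g_{\beta_j}(y_t)}{\sum_{k=1}^{\ell} \nu_k\, g_{\beta_k}(y_t)}$ for all $t \in \{1, \ldots, n\}$ and $j \in S$.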

M-Step

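For this step, given $\theta^{(k)}$, $\theta^{(k+1)}$ is defined as $\theta^{(k+1)} = \arg\max_\theta Q_y\left(\theta, \theta^{(k)}\right)$. Setting $\lambda_t^{(k)}(i) = \lambda_{\theta^{(k)},t}(i)$, one obtains

$$
\begin{aligned}
Q_y\left(\tilde\theta, \theta^{(k)}\right) &= \sum_{t=1}^{n} \sum_{j=1}^{\ell} \lambda_t^{(k)}(j) \left\{ \log \tilde\nu_j + \log g_{\tilde\beta_j}(y_t) \right\} \\
&= \sum_{t=1}^{n} \sum_{j=1}^{\ell} \lambda_t^{(k)}(j) \log \tilde\nu_j + \sum_{t=1}^{n} \sum_{j=1}^{\ell} \lambda_t^{(k)}(j) \log g_{\tilde\beta_j}(y_t).
\end{aligned}
$$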


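For $j \in S$ we have $\frac{\partial Q_y}{\partial \tilde\nu_j} = \frac{1}{\tilde\nu_j} \sum_{t=1}^{n} \lambda_t^{(k)}(j)$. Maximizing under the constraint $\sum_{j=1}^{\ell} \tilde\nu_j = 1$, the Lagrange multiplier equals $n$, since $\sum_{j=1}^{\ell} \sum_{t=1}^{n} \lambda_t^{(k)}(j) = n$. Hence $\tilde\nu_j^{(k+1)} = \frac{1}{n} \sum_{t=1}^{n} \lambda_t^{(k)}(j)$, $j \in \{1, \ldots, \ell\}$, and

$$
\tilde\beta_j^{(k+1)} = \arg\max_{\tilde\beta_j} \sum_{t=1}^{n} \lambda_t^{(k)}(j) \log g_{\tilde\beta_j}(y_t).
$$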
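For a concrete instance of these updates, the sketch below runs the EM iteration on a univariate Gaussian mixture. The Gaussian choice for $g_\beta$ is an assumption made for illustration (the appendix leaves the family general); in that case the maximization over $\tilde\beta_j$ reduces to a weighted mean and a weighted standard deviation. Function names and data are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_step(y, nu, mus, sigmas):
    """One EM iteration for a univariate Gaussian mixture."""
    n, l = len(y), len(nu)
    # E-step: lam[t][j] = nu_j g_j(y_t) / sum_k nu_k g_k(y_t).
    lam = []
    for yt in y:
        w = [nu[j] * normal_pdf(yt, mus[j], sigmas[j]) for j in range(l)]
        total = sum(w)
        lam.append([x / total for x in w])
    # M-step: nu_j = (1/n) sum_t lam_t(j); weighted mean and std for beta_j.
    new_nu, new_mus, new_sigmas = [], [], []
    for j in range(l):
        wj = sum(lam[t][j] for t in range(n))
        mu = sum(lam[t][j] * y[t] for t in range(n)) / wj
        var = sum(lam[t][j] * (y[t] - mu) ** 2 for t in range(n)) / wj
        new_nu.append(wj / n)
        new_mus.append(mu)
        new_sigmas.append(math.sqrt(var))
    return new_nu, new_mus, new_sigmas

# Toy data with two well-separated clusters (illustrative numbers).
y = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
nu, mus, sigmas = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    nu, mus, sigmas = em_step(y, nu, mus, sigmas)
```

On this toy sample the weights settle near one half each and the means near the two cluster centers. With real data one would monitor the log-likelihood for convergence and guard against components whose variance collapses toward zero.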

Received 10 March 2019

Accepted 18 October 2019
