Approximate Bayes factors for unit root testing
Martin Magris∗1and Alexandros Iosifidis1
1Department of Electrical and Computer Engineering, Aarhus University, Denmark
Abstract
This paper introduces a feasible and practical Bayesian method for unit root testing in financial
time series. We propose a convenient approximation of the Bayes factor in terms of the Bayesian
Information Criterion as a straightforward and effective strategy for testing the unit root hypothesis.
Our approximate approach relies on few assumptions, is of general applicability, and preserves a
satisfactory error rate. Among its advantages, it does not require the prior distribution on the model's
parameters to be specified. Our simulation study and empirical application on real exchange rates
show great accordance between the suggested simple approach and both Bayesian and non-Bayesian
alternatives.
Keywords: Unit root inference; Bayesian analysis; Bayes factor; BIC
JEL classification: C11; C12; C22
1 Introduction
In time series analysis the interest often lies in the persistence properties of the process over the medium-to-long term. In this regard, unit roots are important since they relate to the persistence of shocks and non-mean-reverting dynamics (e.g. Campbell and Perron, 1991). In other words, unit roots go hand in hand with non-stationarity. This has severe implications for the applicability of standard econometric techniques, among which spurious regression (Granger et al., 1974; Phillips, 1986) is the most notable one. Non-stationarity appears to arise quite commonly in economic time series, and with little surprise unit root testing has been one of the most prolific research areas in econometrics. Most theoretical and empirical work in the domain of non-stationary time series relies on classical frequentist methods, which stand as the reference approach.
Among others, (Zellner, 1971; Geweke, 1988; Poirier, 1988) showed that the Bayesian framework is in general well-suited to inferential problems in econometrics, and it was with (Sims, 1988) that a number of Bayesian methods for unit root testing, and the corresponding Bayesian unit root literature, developed. The earliest works appear remarkably optimistic and confident about the impact that a Bayesian approach could have on unit root testing, even claiming an overall superiority of the Bayesian approach over classical methods (Sims and Uhlig, 1991). On the contrary, the adoption and development of Bayesian methods would shortly prove to be much less straightforward and heavily debated. It was in (Phillips, 1991b) that a reconciliation with frequentist methods was first discussed in the light of impartial and objective Bayesian methods. This work raised several further issues related to the Bayesian approach to unit root testing that stimulated active research and strong debates. (Phillips, 1991a,b) advanced the idea that the achievement of impartial Bayesian analysis through flat priors on the parameters is not well-suited to time series models. Actually, flat priors over the autoregressive parameter achieve the opposite effect and have been shown to be quite informative (e.g. Kim and Maddala, 1991; Schotman and Van Dijk, 1991b; Phillips, 1991b; Leamer, 1991, among others). A long debate on appropriate uninformative priors for Bayesian unit root inference followed (see Section 2). The determination of suitable input information through the prior distribution, which is generally the major reason for the divergence between the classical and Bayesian approaches, is tightly bound to the exact hypothesis being tested. For a simple autoregressive process of order one, hereafter denoted by AR(1),
∗Corresponding author, magris@ece.au.dk. Submitted for consideration at the 2021 annual conference of the International Association for Applied Econometrics (IAAE).
the unit root inference problem corresponds to testing the null hypothesis of the autoregressive parameter being equal to one. In general, we shall emphasize the exact/point-wise/non-interval nature of a hypothesis by referring to it as a point hypothesis. The goal of testing a point null hypothesis cannot be easily achieved with continuous priors, as the consequent continuous posterior would assign zero weight to the unit root hypothesis. Feasible tests can either use discontinuous priors that assign a non-zero mass to the unit root hypothesis and distribute the remaining one over some interval (e.g. Schotman and Van Dijk, 1991a; DeJong and Whiteman, 1991), or test closely-related non-point hypotheses with continuous priors (Koop, 1994, is instructive as it considers three possible nulls). The first approach is susceptible to poor objectivity (Phillips et al., 1993), while the second one does not properly match the exact and exclusive purpose of testing the unit root hypothesis (Schotman and Van Dijk, 1991b).
Consequently, the Bayesian analysis can respectively be based on two radically different approaches: on Bayes factors and odds ratios, or on Bayes confidence sets and probability intervals over the posterior, with the first one being difficult to interpret in terms of traditional p-values (Berger and Delampady, 1987). Furthermore, the inclusion of an intercept, a trend, or any richer structure in a simple AR(1) model is not a smooth extension, as prior beliefs on the autoregressive parameter generally change according to the particular deterministic component added to the model (Schotman and Van Dijk, 1991b). Also, the particular form under which a model is expressed, e.g. "structural" or "reduced" form, plays a role in unveiling feasible directions for the Bayesian analysis. Not less importantly, (Schotman and Van Dijk, 1991b; Uhlig, 1994; Lubrano, 1995) underline the importance of the conditioning set, and in particular the sensitivity of the whole inferential procedure to the first observation. Lastly, Bayesian methods rarely admit simple closed-form algebra: in unit root inference, too, numerical methods are often required for achieving approximate solutions (e.g. Zivot, 1994).
In this paper, we propose an approximate Bayesian unit root testing procedure that mitigates the above-mentioned criticalities, especially the choice of the prior. In particular, we focus on the simple AR(1) dynamics, which is a fundamental process and a major ingredient in the theory of non-stationary time series and unit root testing, and on testing the point null unit root hypothesis. We apply standard approximation results to obtain a generic Bayesian testing procedure based on approximate Bayes factors and thus approximate posterior odds. With an asymptotic error rate sufficient to guarantee the applicability of the proposed method also for samples of moderate size, the approximate form of the Bayes factor is independent of the choice of the priors, is remarkably simple to compute, and scales to more complex models as long as their maximum likelihood estimates are attainable. Indeed, our approximate Bayes factor is formulated as a simple function of the well-known Bayesian Information Criterion (BIC), being thus easy to implement and attractive for empirical research. Although the BIC approximation of Bayes factors has proved to be a valid tool in different fields, its use in unit root testing has not been investigated, though it appears to be very well-suited for this class of problems. In the empirical sections, we propose a Monte Carlo experiment and an empirical application on real exchange rates, and analyze the performance of our BIC approach with respect to other competing Bayesian methods and the most widespread frequentist Dickey-Fuller test (Dickey and Fuller, 1981). Our experiment validates the proposed approach, which stands out as a feasible and viable simple alternative for Bayesian unit root inference.
This paper is organized as follows. Section 2 reviews the literature on approaches, issues and advances in Bayesian unit root testing, underlining the numerous problems that arise in this context. Section 3 introduces Bayes factors and generally discusses the Bayesian testing framework based on evidence. Section 4 introduces our suggested testing approach based on the BIC approximation of Bayes factors. Section 5 reports the results of the Monte Carlo study, while an empirical application on real exchange rates is presented in Section 6. Section 7 concludes and suggests directions for future research. The Appendix collects details on some benchmark Bayesian unit root tests and further results from the simulation study.
2 Bayesian unit root testing
On the premise that the asymptotic distribution theory changes discontinuously between the stationary and unit root cases, with classical hypothesis testing appearing as a less reasonable inferential procedure than one based on a Bayesian flat prior, the Bayesian analysis of unit root models was first suggested in (Sims, 1988). Since then, a wide and rich literature on the field pinpointed the advantages and flaws of the Bayesian approach, making it complex and long-debated. First and most notably, the identification of a suitable prior is in this context remarkably difficult and widely discussed. Besides this, there are several issues associated with model specification and formulation, the role played by the initial observation, issues related to the invariance of the prior under different sampling frequencies, and computational arguments. In the following, we review the most relevant aspects of these issues. For a more comprehensive overview of the topic see e.g. (Maddala and Kim, 1998, Ch. 8), and the articles in the following dedicated special issues: Journal of Applied Econometrics (1991, vol. 6, n. 4), Econometric Theory (1994, vol. 10) and Journal of Econometrics (1995, vol. 69, n. 1).
2.1 The choice of the prior
Here we shortly outline the setting based on flat priors adopted, among others, by (Sims and Uhlig, 1991; Sims, 1988; Geweke, 1988; Thornber, 1967; Zellner, 1971; Schotman and Van Dijk, 1991b).
Consider the simple AR(1) model:
$$x_t = \rho x_{t-1} + u_t ,$$
and assume a flat prior for $(\rho, \sigma)$, $\pi(\rho, \sigma) \propto 1/\sigma$ with $-1 < \rho < 1$, and $\sigma > 0$ being the standard deviation of the normal i.i.d. error $u$. $\rho$ is the autoregressive parameter, and $\rho = 1$ corresponds to the unit root hypothesis of interest, under which the AR(1) model reduces to a random walk. Let $x_0$ be the initial starting value of the $T$ consecutive observations. With Gaussian likelihood
$$L(x \mid \rho, \sigma, x_0) = (2\pi)^{-T/2} \sigma^{-T} \exp\!\left(-\frac{\sum_{t=1}^{T}(x_t - \rho x_{t-1})^2}{2\sigma^2}\right),$$
the joint posterior for $(\rho, \sigma)$ is given by
$$\pi(\rho, \sigma \mid x, x_0) \propto \sigma^{-T-1} \exp\!\left(-\frac{\sum_{t=1}^{T}(x_t - \rho x_{t-1})^2}{2\sigma^2}\right) = \sigma^{-T-1} \exp\!\left(-\frac{R + (\rho - \hat{\rho})^2 Q}{2\sigma^2}\right),$$
with $\hat{\rho} = \sum x_t x_{t-1} / \sum x_{t-1}^2$ being the OLS estimate of $\rho$, $R = \sum \hat{u}_t^2 = \sum (x_t - \hat{\rho} x_{t-1})^2$ the residual sum of squares, and $Q = \sum x_{t-1}^2$. For the above joint posterior, one obtains the following marginals:
$$\pi(\rho \mid x, x_0) \propto \left[R + (\rho - \hat{\rho})^2 Q\right]^{-T/2}, \qquad \pi(\sigma \mid x, x_0) \propto \sigma^{-T} \exp\!\left(-\frac{R}{2\sigma^2}\right).$$
Our notation distinguishes between priors and posteriors depending on the conditioning set: $\pi(\cdot)$ as opposed to $\pi(\cdot \mid \text{data})$, respectively.
The marginal posterior for $\rho$ has the form of a symmetric univariate t-distribution, centered around the OLS estimate $\hat{\rho}$, while the marginal posterior for $\sigma$ is an inverted gamma-2 distribution (Zellner, 1971). Sims and Uhlig (1991) conclude that classical methods relying on asymmetric distributions of the OLS estimator of $\rho$, such as the Dickey-Fuller statistic, attribute too much weight to large values of $\rho$, and that the above Bayesian framework based on the flat prior is a more logical and sounder basis for inference than classical testing. This argument is advanced by comparing the distributions $\rho \mid \hat{\rho} = 1$ and $\hat{\rho} \mid \rho = 1$, the first being the posterior distribution of the true parameter with the estimated parameter taken as given (Bayesian approach), and the second being the sampling distribution of the estimated parameter under the value of the true parameter (classical approach). While classical methods are generally based on an asymmetric and nonstandard distribution for the autoregressive parameter, Bayesian methods lead to a symmetric and standard posterior. The asymmetry in $\hat{\rho} \mid \rho = 1$ drives the argument that classical procedures based on p-values are misleading.
A similar approach is that developed in (Schotman and Van Dijk, 1991b); see Appendix A.1 for a detailed description. This appears to be the most widespread setting for Bayesian unit root testing, commonly referenced also in the recent literature, and it is here used as a benchmark. It takes $\rho \in S \cup \{1\}$, with $S = \{\rho \mid -1 < a \le \rho < 1\}$, and specifies the priors for $\rho$ and $\sigma$ as
$$\Pr(\rho = 1) = \pi_0 , \qquad \pi(\rho \mid \rho \in S) = \frac{1}{1-a} , \qquad \pi(\sigma) \propto \frac{1}{\sigma} .$$
That is, $\rho$ is taken uniform over $S$ and with probability mass $\pi_0$ on $\rho = 1$, and $\sigma$ and $\rho$ are independent. The mass at $\rho = 1$ is intended to allow for a feasible test of the null hypothesis $H_0\colon \rho = 1$ (see Section 2.2). For clarity, we shall refer to such an exact/point-wise/non-interval null hypothesis as a point null.
Restrictions over the domain of $\rho$ are also adopted in (Geweke, 1988) and (DeJong and Whiteman, 1991), with the latter also providing an empirical analysis and a comparative study against the classical approach on the Nelson-Plosser data. The arbitrariness in selecting the restricted domain for the autoregressive parameter and the values of the statistics supporting a unit root decision is pointed out and criticized in (Sowell, 1991) and (Phillips, 1991b).
As opposed to the use of non-flat priors like Normal-Wishart conjugates, which are known to be informative about the properties of the model (Zellner, 1971; Phillips, 1991b) and which, when centered around the unit root, correspond to a prior belief that explosive roots are unlikely (Uhlig, 1994), the choice of flat priors is generally attractive. This is because flat priors often appear as a suitable approach for attributing a degree of "neutrality" or "objectivity" to Bayesian analyses, while being convenient in terms of computations, often leading to algebraic solutions (e.g. Zellner, 1971). The early Bayesian inference as of (Sims, 1988) strongly favors the above approach, but the use of flat priors is not as innocuous as it may appear. Phillips (1991b) raised concerns about uniform priors since the inference on $\rho$ is conditional on the observed sample moments and sufficient statistics, which depend on the value taken by $\rho$ and are radically different for $\rho = 1$ and $|\rho| < 1$. The use of flat priors does not correspond to uninformativeness, and is indeed shown to downweight large values of $\rho$, i.e. unit roots and explosive processes. This is because when $|\rho|$ is large, the data is more informative about $\rho$, and treating all the values of $\rho$ as equally likely implicitly corresponds to downweighting large values of $\rho$. Consequently, a testing strategy based on regressions with or without unit roots taken with the same likelihood irrespective of the value of $\rho$ is inadequate. Furthermore, Phillips (1991b) finds that the discrepancy in the results between standard and Bayesian methods in unit root testing for macroeconomic US time series is largely due to the use of the misleading flat prior.
As an objective setting, (Phillips, 1991b) proposed a Jeffreys' prior (Jeffreys, 1946; Perks, 1947). Jeffreys' priors (often called "ignorance priors") are defined up to a proportionality factor as $\propto \sqrt{|i|}$, where $|i|$ is the determinant of the expected Fisher information matrix $i$. Jeffreys' priors render the posterior invariant under one-to-one reparametrizations and enjoy a number of desirable properties (Ly et al., 2016). Such priors are often interpreted as reflecting the properties of the sampling process and emphasizing data evidence, being interpretable as equivalent to the information that a typical single observation, on average, provides (Xia and Griffiths, 2012). Yet, the general use of the Jeffreys' prior in place of the flat prior, and its ability to convey a state of "ignorance" about the existence of a unit root, has readily been disputed in (Leamer, 1991). Furthermore, in an extensive Monte Carlo study Kim and Maddala (1991) find that the above approach favors large values of $\rho$, distorting the sample evidence. (Uhlig, 1994) finds however that for the univariate AR(1) model the major differences between flat and informative priors are limited to the explosive regions. Further details on the approach of (Phillips, 1991b) are provided in Appendix A.2.
2.2 The null hypothesis
As argued in (Schotman and Van Dijk, 1991b), a genuine and exclusive interest in the unit root hypothesis formally corresponds to testing the null $\rho = 1$. Therefore, this hypothesis should be preferred for a Bayesian treatment of unit root inference. If a continuous prior density for $\rho$ is adopted, the probability associated with $\rho = 1$ is zero, and so is its posterior probability (e.g. Zivot, 1994). In other words, testing the point null $\rho = 1$ – which corresponds to the only exact hypothesis on the existence of a unit root – is not trivial. Indeed, a continuous prior $\pi(\rho)$ requires a full Bayesian analysis for retrieving the corresponding posterior $\pi(\rho \mid x, x_0) \propto L(x \mid \rho, x_0)\,\pi(\rho)$, for some data $x$ and initial value $x_0$.
A posterior confidence interval at the probability level $\alpha$ is a subset $C_\alpha$ such that
$$\Pr(C_\alpha \mid x, x_0) = \int_{C_\alpha} \pi(\rho \mid x, x_0)\, d\rho \ge 1 - \alpha ,$$
where typically the subset $C_\alpha$ is chosen in such a way that its size is minimal (highest posterior density criterion). The unit root hypothesis can be rejected by checking whether $\rho = 1$ does not belong to the posterior confidence interval $C_\alpha$. However, it is not unlikely for the posterior to be bimodal (e.g. Kim and Maddala, 1991), and $C_\alpha$ may result in a disconnected set. Since the interest is around the unit root, an alternative is to redefine $C_\alpha$ so that
$$\Pr(C_\alpha \mid x, x_0) = \int_{\rho_{\inf}}^{\rho_{\sup}} \pi(\rho \mid x, x_0)\, d\rho \ge 1 - \alpha ,$$
with $\rho_{\inf}$ ($\rho_{\sup}$) being any convenient truncation point of the likelihood or $-\infty$ ($+\infty$). Alternatively,
one can consider the probability
$$\Pr(\rho \ge 1 \mid x, x_0) = \int_{1}^{+\infty} \pi(\rho \mid x, x_0)\, d\rho \tag{1}$$
and decide to reject the unit root when its value is below a certain threshold, say 5%. This however is no longer a test of a point null hypothesis, as now $H_0$ jointly covers the unit and explosive root cases:
$$H_0\colon \rho \ge 1 \qquad H_1\colon \rho < 1 .$$
This is the setting adopted e.g. by (Koop, 1994; DeJong and Whiteman, 1991; Phillips, 1991b).
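For concreteness, under the flat-prior posterior of Section 2.1 the probability in Eq.(1) can be approximated by numerical integration of the unnormalized t-form density $[R + (\rho - \hat{\rho})^2 Q]^{-T/2}$ over a grid; the sketch below is our own illustration (the function name, seed and grid bounds – which effectively play the role of the truncation points above – are arbitrary choices):

```python
import numpy as np

def prob_rho_geq_1(x, grid=np.linspace(0.7, 1.3, 6001)):
    """Approximate Pr(rho >= 1 | x, x0) of Eq.(1) under the flat prior
    pi(rho, sigma) ~ 1/sigma of Section 2.1, for which
    pi(rho | x, x0) ~ [R + (rho - rho_hat)^2 Q]^(-T/2)."""
    x = np.asarray(x, dtype=float)
    lag, lead = x[:-1], x[1:]
    T = lead.size
    Q = lag @ lag
    rho_hat = lag @ lead / Q
    R = np.sum((lead - rho_hat * lag) ** 2)
    # work in logs to avoid underflow for moderately large T
    log_dens = -0.5 * T * np.log(R + (grid - rho_hat) ** 2 * Q)
    dens = np.exp(log_dens - log_dens.max())
    dens /= np.trapz(dens, grid)              # normalize on the grid
    return np.trapz(dens[grid >= 1.0], grid[grid >= 1.0])

# example: a simulated driftless random walk (rho = 1) of length T = 200
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(201))
print(prob_rho_geq_1(x))
```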
The continuity problem and the impossibility of testing a point null could be technically resolved by using a discontinuous prior. In fact, a natural solution is to give $\rho = 1$ a positive probability $\pi_0$ and assign to the values of $\rho$ over some interval $S$ the density $(1 - \pi_0)\pi(\rho)$, where $\pi(\rho)$ is a proper prior on $S$ (as e.g. in Schotman and Van Dijk, 1991b; DeJong and Whiteman, 1991). As in (Schotman and Van Dijk, 1991b; Zivot, 1994), a common choice for $S$ is the stationary region $|\rho| < 1$. With this procedure, a discontinuous prior can be easily adopted and the following hypotheses can be tested:
$$H_0\colon \rho = 1 \qquad H_1\colon \rho \in S .$$
In general, $H_1$ is not a generic non-unit-root alternative, since explosive processes are ruled out, and neither is it a properly stationary alternative, since the lower bound of $S$ could potentially be either within or outside the stationary region. Furthermore, it is possible to draw a data-driven criterion for selecting $S$ in the testing procedure (see Appendix A.1). In this case, the specific formulation of the alternative depends on the OLS estimate of $\rho$ and on the sample size. It is therefore the test itself that defines the possible values of $\rho$ under $H_1$, and such implicitly imposed restrictions on $S$ are de facto analogous to adopting a strong prior.
2.3 Other issues
In addition to the above difficulties, the whole inferential procedure is remarkably sensitive to model formulation, and the effect of nuisance parameters is not negligible.
Schotman and Van Dijk (1991b) warn about the effects of extending the AR(1) model with trends and intercepts, since the inclusion of such elements interacts with (or implicitly corresponds to) prior beliefs. Their study shows that the use of Jeffreys' priors as in (Phillips, 1991b) downweights the unit root hypothesis relative to a flat prior in models with trend and intercept. By using the alternative reduced-form parametrization of the AR model with trend, Schotman and Van Dijk (1991b) are able to explain the observed bias towards stationarity under flat priors.
Later, Phillips (1991b) explains that such behavior is dependent on the initial value and that conditioning the likelihood on $x_0$ resolves the issue. The apparently secondary role played by the initial value is thus shown to have an enormous impact on inference. Analyses in this regard can be found in (Zivot, 1994) and (Lubrano, 1995). The latter shows that the treatment of the first observation produces results that are more or less in accordance with the classical results, and that e.g. the fixed or random treatment of $x_0$ does make a difference depending on whether an intercept is included or not (as opposed to the simple AR(1) as outlined in (Zellner, 1971)). Following (Thornber, 1967), Lubrano (1995) extends the discussion suggesting the use of uninformative Beta densities.
Further issues related to the choice of priors are interactions/correlations between the different elements of multivariate parameters (i.e. appropriate prior specifications with non-diagonal covariance matrices), their often improper nature, and computational issues (e.g. Zivot, 1994). Also, the conceptual problem of adopting priors that are insensitive to the sampling frequency of the observations has been pointed out e.g. in (Leamer, 1991) and (Sims, 1991).
2.4 Some recent developments
The literature review would certainly be incomplete without references to more recent works. Though the major advances in the theory of Bayesian unit root testing date back to the 20th century, Bayesian unit root testing remains an area of active research.
Closely related to the above literature is the Full Bayesian Significance Test (FBST) of de Bragança Pereira and Stern (1999). This procedure allows testing the point null unit root hypothesis, has no limiting requirements on the prior, allows for flexible error distributions, applies to small sample sizes, and is invariant with respect to the model's parametrization. Though attractive, the FBST has not gained popularity in Bayesian unit root testing, as its only application is that of (Diniz et al., 2011), where the FBST performance is compared against the Augmented Dickey-Fuller test but not against existing alternative Bayesian methods, in both their simulation study and application.
Recently, unit root testing in Stochastic Volatility (SV) models, where the underlying volatility process is unobservable, has also attracted several research contributions. So and Li (1999) develop a Markov chain Monte Carlo approximation of the odds for certain SV models. Their method is improved in (Li and Yu, 2010) with a more robust algorithm and increased test power. Extensions with leverage effects within an SV model on an AR(1) process are considered in (Li et al., 2012). Kalaylıoğlu and Ghosh (2009) focus on the role played by priors in Bayesian unit root inference in SV models. They introduce a class of non-informative priors to develop a testing procedure that is feasible based on Gibbs-sampled approximations of the posterior but impractical for adopting Bayes factors. Simulation methods for approximating posterior credibility intervals are also adopted in (Kalaylıoğlu et al., 2013), where correlations between the returns' series errors and the latent SV with potential unit roots are allowed to be non-zero. Extensions for heteroskedasticity are considered in (Chen et al., 2013). Severe distortions in the size of the Dickey-Fuller test statistic under unit roots in an AR(1) with SV are reported in an extensive simulation study in (Zhang et al., 2013), where a Bayesian testing approach is introduced as a remedy. Developments over more complex dynamics than the simple AR(1) model include non-normal innovations (Hasegawa et al., 2000), polynomial trends (Chaturvedi and Kumar, 2005), nonlinear smooth transitions (Chen and Lee, 2015) and structural breaks (Park and Shintani, 2016; Vosseler, 2016), also in panel data (Kumar and Agiwal, 2018).
Not strictly relevant for our analysis, yet closely related to the Bayesian unit root literature, are the dozens of research works on Bayesian cointegration testing that have appeared since (Koop, 1991).
3 Evidence
Consider a random variable $X$ with density parametrized over $\theta \in \Theta$. The hypothesis testing problem we are concerned about consists of deciding between the null hypothesis $H_0\colon \theta = \theta_0$ and the alternative $H_1\colon \theta \neq \theta_0$. This is achieved by considering suitable measures of evidence of one hypothesis against the other, such as the widespread p-value, or Bayes factors and Bayesian posterior probabilities.
3.1 P-value
Let us denote by $T(\cdot)$ a test statistic, and by $t = T(x)$ its value when data $X = x$ is observed. The null hypothesis $H_0$ is rejected in favor of the alternative $H_1$ if $T(x)$ is more extreme than one would expect if $H_0$ were true. By choosing a significance level $\alpha$, $H_0$ is rejected when the probability of $|T(X)|$ being greater than $|T(x)|$, given that $H_0$ is true, is small (i.e. lower than or equal to $\alpha$). Formally, the hypothesis $H_1$ is accepted if
$$\Pr(|T(X)| \ge |T(x)| \mid H_0) \le \alpha ,$$
that is, extreme values of the statistic are deemed to provide evidence against $H_0$.
While it is straightforward to identify in higher values of $|t|$ stronger evidence against the null hypothesis, the problem of evaluating the strength of evidence for $H_0$ against $H_1$ is left open. Frequentists use a scale of evidence set by Fisher's work in the 1920s: a common interpretation is that a p-value around 0.01 corresponds to very strong evidence against $H_0$, around 0.05 to strong evidence, and so on, with roughly neutral evidence at around 0.1. In other words, the stronger the evidence, the lower the Type I error of rejecting a true hypothesis. The Bayesian literature provides different answers to the problem, based on quantities known as the Bayes factor and posterior odds.
3.2 Posterior odds for a point null
Considering the unknown model probabilities as random, Bayes' rule yields posterior probabilities given the observed data. For the hypothesis $H_1$ (and analogously for $H_0$):
$$\Pr(H_1 \mid x) = \frac{\Pr(x \mid H_1)\Pr(H_1)}{\Pr(x)} ,$$
where $\Pr(H_1)$ is the prior probability of hypothesis $H_1$ being true, and $\Pr(x) = \Pr(x \mid H_0)\Pr(H_0) + \Pr(x \mid H_1)\Pr(H_1)$ is the marginal density of $X = x$. The quantity $\Pr(x \mid H_1)$ is referred to as the marginal likelihood or marginal probability (of the data) under $H_1$. In this context, the language and notation encountered in the literature can be heterogeneous and slightly abused, as it is common to use indistinctly the terms (joint) "density", "likelihood", (joint) "probability" and their corresponding notations, e.g. $f$, $L$, $\Pr$. Here we adopt the notation and terminology of (Kass and Raftery, 1995).
By the law of total probability, the marginal probability $\Pr(x \mid H_1)$ is obtained by marginalizing out the parameter $\theta_1$ under $H_1$:
$$\Pr(x \mid H_1) = \int \Pr(x \mid \theta_1, H_1)\, \pi(\theta_1 \mid H_1)\, d\theta_1 , \tag{2}$$
where $\pi(\theta_1 \mid H_1)$ is a continuous density. From a frequentist perspective, $\pi(\theta_1 \mid H_1)$ is a mere weight function to allow the computation of the average likelihood. For a Bayesian, $\pi(\theta_1 \mid H_1)$ would be the prior density for $\theta_1$ conditional on $H_1$ being true (Berger and Delampady, 1987). Note the difference between the prior probability of a hypothesis or model being true as opposed to the prior density of its corresponding parameter.
The ratio of the posterior probabilities of the two hypotheses is referred to as the posterior odds ratio, or posterior odds:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = \frac{\Pr(x \mid H_0)}{\Pr(x \mid H_1)} \cdot \frac{\Pr(H_0)}{\Pr(H_1)} .$$
Analogously, one may define the prior odds as $\Pr(H_0)/\Pr(H_1)$. Posterior odds quantify the evidence for $H_0$ over $H_1$ after the data $x$ has been observed. On the other hand, prior odds do not convey any evidence, as they solely quantify the prior plausibility of $H_0$ over $H_1$ before any data is observed. The interpretation of odds ratios is straightforward as they correspond to simple probability ratios. Odds ratios $K_1$ greater than one (or $\log K_1 > 0$) stand for evidence in favour of the null hypothesis, with a corresponding probability $K_1/(1 + K_1)$. This is aligned with the general Bayesian rationale. After observing $x$, the prior probability $\Pr(H_0) = \pi_0$ and the corresponding prior odds $K_0 = \pi_0/(1 - \pi_0)$, reflecting the prior belief in $H_0$ being the true model, are updated into the posterior odds $K_1$ and the corresponding new posterior probability of $H_0$ being true.
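As a worked example of this correspondence: posterior odds $K_1 = 3$ translate into a posterior probability $3/(1+3) = 0.75$ for $H_0$, $K_1 = 1$ leaves the two hypotheses equally likely at $0.5$, and $K_1 = 1/3$ gives $\Pr(H_0 \mid x) = 0.25$.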
For a point hypothesis $H_0\colon \theta = \theta_0$, it will rarely be thought possible for $\theta = \theta_0$ to hold exactly; the assignment of a positive probability is rather to be understood as a realistic approximation of the hypothesis $H_0\colon |\theta - \theta_0| \le b$, for some small $b$, so that $\pi(\theta_0 \mid H_0)$ in fact represents the prior probability assigned to $\{|\theta - \theta_0| \le b\}$ (see Berger and Sellke, 1987). The way to depict such a prior is through a smooth density with a sharp peak around $\theta_0$.
3.3 Bayes factors
The focus of this paper is on the ratio
$$B_{01} = \frac{\Pr(x \mid H_0)}{\Pr(x \mid H_1)} ,$$
referred to as the Bayes factor. With this definition, the posterior odds for a null hypothesis $H_0$ against the alternative $H_1$, as discussed so far, can be written as:
$$\frac{\Pr(H_0 \mid x)}{\Pr(H_1 \mid x)} = B_{01}\, \frac{\Pr(H_0)}{\Pr(H_1)} .$$
Bayes factors can be interpreted either as the ratio quantifying the plausibility of observing the data $x$ under $H_0$ over $H_1$, or as the degree by which the observed data updates the prior odds $\Pr(H_0)/\Pr(H_1)$. With respect to the posterior odds, which involve prior probabilities on the hypotheses, the interest in Bayes factors arises from the fact that they appear as actual odds implied by the observed data only.
Moreover, Bayes factors have an attractive interpretation since they can be viewed as likelihood ratios obtained by averaging the likelihoods $\Pr(x \mid \theta_k, H_k)$ over $\theta_k \mid H_k$, with weights $\pi(\theta_k \mid H_k)$, $k = \{0, 1\}$.
Not less importantly, the calculation of the Bayes factor requires the prescription only of the prior distributions $\pi(\theta_k \mid H_k)$, while the full Bayesian analysis leading to posterior odds requires the additional specification of the prior probabilities $\Pr(H_k)$, for $k = \{0, 1\}$. The interpretation from above still applies: a Bayes factor of e.g. 1/10 means that $H_1$ is supported by evidence 10 times as strong as that for $H_0$. Furthermore, under the appealing "neutral" choice $\Pr(H_0) = \Pr(H_1) = 1/2$, Bayes factors coincide with posterior odds, further reinforcing $B_{01}$ as a suitable alternative to p-values. Similar to the frequentist approach, where decisions are based on the critical level $\alpha$, the higher the Bayes factor the stronger the evidence in favor of $H_0$ over $H_1$. (Jeffreys, 1961) provides a scale for interpreting $B_{01}$ as the degree to which $H_0$ is supported by the data over $H_1$, with ratios greater than 100 being decisive. A comparison between the Bayesian and frequentist decision scales can be found in (Efron et al., 2001).
3.3.1 Bayes factors for unit root testing
We show how the above description of Bayes factors and posterior odds practically applies to testing the set of hypotheses
$$H_0\colon \rho = 1 \qquad H_1\colon \rho \in S .$$
Here we extend the discussion to any generic AR-like model, holding the usual interpretation of $\rho$ as the autoregressive parameter of interest. First, to compare the average likelihood of a model over the complementary region through posterior odds as outlined above, we assign a certain positive weight $\pi_0$ to the point null hypothesis $H_0$ and share the complement $(1 - \pi_0)$ over the interval $S$ relevant under $H_1$. Second, as a general case, the likelihood function is parametrized over a rich $K$-dimensional parameter $\{\rho, \theta\}$ defined over some appropriate set $\{S, \Theta\} \subset \mathbb{R}^K$. Therefore, the computation of the Bayes factor in this setting involves a multidimensional marginalization over the elements of $\theta$, and the posterior odds read:
$$K_1 = \frac{\pi_0}{1 - \pi_0}\, \frac{\int_\Theta L(x \mid \rho = 1, \theta, x_0)\, \pi(\theta \mid \rho = 1)\, d\theta}{\int_S \int_\Theta L(x \mid \rho, \theta, x_0)\, \pi(\theta \mid \rho)\, \pi(\rho)\, d\theta\, d\rho} = \frac{\Pr(\rho = 1 \mid x)}{\Pr(\rho \in S \mid x)} . \tag{3}$$
Eq.(3) embeds a feasible and largely adopted specification of the prior over $(\rho, \theta)$ that imposes conditional independence on the parameters, allowing for a convenient factorization of the joint prior as a product of conditionally independent factors, that is $\pi(\rho, \theta) = \pi(\theta \mid \rho)\, \pi(\rho)$. Rather than the above probability notation ($\Pr$), typical of introductory discussions on Bayes factors, here we use the likelihood notation ($L$), as it is more common in the related econometric literature.
For inference on the simple AR(1) process
$$x_t = \rho x_{t-1} + u_t ,$$
$K = 2$, as $\theta$ generally includes the unknown standard deviation $\sigma$ of the innovations. In this case, the posterior odds have the simple form
$$K_1 = \frac{\pi_0}{1 - \pi_0}\, \frac{\int_0^\infty L(x \mid \rho = 1, \sigma, x_0)\, \pi(\sigma)\, d\sigma}{\int_S \int_0^\infty L(x \mid \rho, \sigma, x_0)\, \pi(\sigma)\, \pi(\rho)\, d\sigma\, d\rho} .$$
The conditional independence assumption among the parameters is generally not very restrictive and is commonly extended to unconditional independence, so that $\sigma$ is provided with its own prior $\pi(\sigma \mid \rho) = \pi(\sigma)$. Further information can be found e.g. in (Schotman and Van Dijk, 1991b; Zivot, 1994; Zellner and Siow, 1980, among others).
In the following section we show how to approximate the general Bayes factor involved in Eq.(3) with a friendly form of moderate error that requires neither integration nor the priors to be specified.
4 Hypothesis testing with BIC
4.1 Laplace approximation
For $k = \{0, 1\}$, consider the densities $\Pr(x \mid H_k) = \int \Pr(x \mid \theta_k, H_k)\, \pi(\theta_k \mid H_k)\, d\theta_k$ involved in the definition of Bayes factors. Let $\theta_k$ be the parameter under $H_k$, $\pi(\theta_k \mid H_k)$ its prior, and $\Pr(x \mid \theta_k, H_k)$ the density of $x$ given the values of $\theta_k$. In general, $\theta_k$ represents a vector parameter of dimension $d_k$. In the following, we shall refer to the marginal probability of the data (or marginal likelihood) $\Pr(x \mid H_k)$ as $I$ and adopt a simplified notation where we drop $k$ and rewrite the marginal likelihood as
$$I = \int \Pr(x \mid \theta, H)\, \pi(\theta \mid H)\, d\theta .$$
Except for some elementary cases where the above integral can be evaluated analytically, the computation of the marginal likelihood is intractable and requires numerical methods. In fact, analytic solutions for $I$ are limited to exponential family distributions and conjugate priors, including normal linear models (e.g. DeGroot, 2005; Zellner, 1971). A general description of the different approaches for evaluating $I$ with numerical methods is provided in (Evans et al., 1995).
To recover a first useful approximation for $I$, assume that the posterior density, proportional to $\Pr(x \mid \theta, H)\, \pi(\theta \mid H)$, is peaked around its maximum $\tilde{\theta}$, which is the posterior mode. This is generally the case for large samples if the likelihood function of the data $\Pr(x \mid \theta, H)$ is peaked around its maximum $\hat{\theta}$ (Kass and Raftery, 1995). Let $g(\theta) = \log(\Pr(x \mid \theta, H)\, \pi(\theta \mid H))$ and consider its Taylor expansion around $\tilde{\theta}$:
$$g(\theta) = g(\tilde{\theta}) + (\theta - \tilde{\theta})^\top g'(\tilde{\theta}) + \tfrac{1}{2}(\theta - \tilde{\theta})^\top g''(\tilde{\theta})(\theta - \tilde{\theta}) + o(\|\theta - \tilde{\theta}\|^2) .$$
Since $g'(\tilde{\theta}) = 0$, as $g$ reaches its maximum at $\tilde{\theta}$, it follows that:
$$I = \int \exp[g(\theta)]\, d\theta \approx \exp[g(\tilde{\theta})] \int \exp\!\left[\tfrac{1}{2}(\theta - \tilde{\theta})^\top g''(\tilde{\theta})(\theta - \tilde{\theta})\right] d\theta , \tag{4}$$
where we recognize in the integrand the kernel of a generic $d$-dimensional multivariate normal distribution with mean $\tilde{\theta}$ and covariance matrix $\tilde{\Sigma}$. $\tilde{\Sigma}$ also corresponds to the negative inverse of the Hessian matrix of the second-order derivatives of $g(\theta)$ evaluated at $\theta = \tilde{\theta}$, i.e. $\tilde{\Sigma}^{-1} = -g''(\tilde{\theta})$. The integral in Eq.(4) therefore equals $(2\pi)^{d/2}|\tilde{\Sigma}|^{1/2}$, from which the following approximation, known as the Laplace approximation (e.g. Konishi and Kitagawa, 2008), results:
$$\tilde{I} = (2\pi)^{d/2}\, |\tilde{\Sigma}|^{1/2}\, \Pr(x \mid \tilde{\theta}, H)\, \pi(\tilde{\theta} \mid H) . \tag{5}$$
In particular, as $n$ diverges, $I = \tilde{I}\,(1 + O(n^{-1}))$; see e.g. (Tierney et al., 1989; Kass et al., 1991). Eq.(5) can be applied to any regular statistical model and stands as a viable general approach for evaluating the marginal likelihoods involved in the definitions of Bayes factors, with an approximation error of order $O(n^{-1})$. Slate (1994) discusses requirements on the sample size for reaching posterior normality, and the accuracy of Laplace's method has been more generally investigated in (e.g. Efron and Hinkley, 1978; Kass and Vaidyanathan, 1992). An empirical rule is provided in (Kass and Raftery, 1995): sample sizes of at least $5d$ provide a satisfactory accuracy in well-behaved problems, with $20d$ applicable in most situations.
The use of Eq.(5) is impractical, since $\tilde{\theta}$ refers to the posterior mode and $\tilde{\Sigma}$ to the negative inverse Hessian of $g(\theta)$, while maximum likelihood estimates and information matrices are of common use and generally readily available as standard outputs in any statistical software. Indeed, a variation on Eq.(5) that has attracted much attention uses the maximum likelihood estimator $\hat{\theta}$, applies to large samples where $\tilde{\theta} \approx \hat{\theta}$, and relies on the covariance matrix $\hat{\Sigma}$ such that $\hat{\Sigma}^{-1}$ corresponds to the observed information matrix, i.e. the negative Hessian of the log-likelihood evaluated at $\hat{\theta}$ (Tierney et al., 1989; Kass and Vaidyanathan, 1992):
$$\hat{I} = (2\pi)^{d/2}\, |\hat{\Sigma}|^{1/2}\, \Pr(x \mid \hat{\theta}, H)\, \pi(\hat{\theta} \mid H) . \tag{6}$$
The relative error in this case is still of the best rate $O(n^{-1})$. If one replaces the observed information matrix with the expected information matrix $i$, the asymptotic error rate moves to the larger order $O(n^{-1/2})$. The expected information matrix $i$ is a $d \times d$ matrix whose $(h, k)$ element is
$$-\,\mathbb{E}\!\left[\frac{\partial^2 \log \Pr(x_1 \mid \theta, H)}{\partial\theta_h\, \partial\theta_k}\right]_{\theta = \hat{\theta}} ,$$
where the expectation is taken over $x_1$ with $\theta$ held constant. Therefore, in large samples the observed information matrix can be approximated based on the expected information matrix, $\hat{\Sigma}^{-1} \approx n\, i$, so that $|\hat{\Sigma}^{-1}| = |\hat{\Sigma}|^{-1} \approx n^d\, |i|$. With this substitution, Eq.(6) rewrites as:
$$\hat{I} = (2\pi)^{d/2}\, n^{-d/2}\, |i|^{-1/2}\, \Pr(x \mid \hat{\theta}, H)\, \pi(\hat{\theta} \mid H) ,$$
from which
$$\log I = \log \Pr(x \mid \hat{\theta}, H) + \log \pi(\hat{\theta} \mid H) + \frac{d}{2}\log(2\pi) - \frac{d}{2}\log n - \frac{1}{2}\log|i| + O(n^{-1/2}) . \tag{7}$$
Note that the prior density $\pi(\theta \mid H)$ needs to be fully specified, as it is involved throughout the approximation procedure.
This leads to the conclusive approximation form that does not involve prior densities:
$$\log I = \log \Pr(x \mid \hat{\theta}, H) - \frac{d}{2}\log n + O(1) . \tag{8}$$
This last approximation holds in virtue of the fact that in Eq.(7), besides $\log \Pr(x \mid \hat{\theta}, H)$ and $\log n$, which are respectively of order $O(n)$ and $O(\log n)$, all the remaining terms are of order $O(1)$ or lower. From Eq.(8) we have that the marginal likelihood is thus equal to the maximized likelihood $\Pr(x \mid \hat{\theta}, H)$ minus a correction term, where the approximation error is $O(1)$. Even though the $O(1)$ term does not vanish, because all the other terms tend to infinity as $n$ increases, the error is dominated and vanishes as a proportion of $I$. Raftery (1995) shows that in reality the error term is not as high as one might think, although an $O(1)$ error suggests that the approximation is in general quite crude. In fact, the error can be of a smaller order of magnitude given a reasonable choice of the prior.
As a remark, the definition of the sample size $n$ should reflect the rate at which the Hessian matrix of the log-likelihood grows, i.e. be satisfactory for the approximation $\hat{\Sigma}^{-1} \approx n\, i$. This $n$ turns out to be the number of contributions to the summation appearing in the definition of the Hessian (Raftery, 1995; Kass and Raftery, 1995) – e.g. in survival analysis, $n$ would match the number of non-censored observations rather than the total number of observations.
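For intuition on the accuracy hierarchy of Eqs.(6) and (8), the following sketch (our own illustration, not part of the paper's derivations) compares the exact log marginal likelihood of a conjugate normal-mean model – one of the few analytically tractable cases – with the Laplace approximation at the MLE and the cruder prior-free form of Eq.(8):

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 50, 2.0
x = rng.standard_normal(n) + 0.3      # data x_i ~ N(theta, 1), true theta = 0.3
xbar = x.mean()

# Exact log marginal likelihood: x_i ~ N(theta, 1) with prior theta ~ N(0, tau^2)
log_I_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum(x**2)
               + (n * xbar) ** 2 / (2 * (n + 1 / tau**2))
               - 0.5 * np.log(1 + n * tau**2))

# Laplace approximation at the MLE, Eq.(6): d = 1 and Sigma_hat = 1/n
theta_hat = xbar
log_lik = -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - theta_hat) ** 2)
log_prior = -0.5 * np.log(2 * np.pi * tau**2) - theta_hat**2 / (2 * tau**2)
log_I_laplace = 0.5 * np.log(2 * np.pi) - 0.5 * np.log(n) + log_lik + log_prior

# Prior-free approximation, Eq.(8): drops the prior and the O(1) terms
log_I_bic = log_lik - 0.5 * np.log(n)

print(log_I_exact, log_I_laplace, log_I_bic)
```

In this example the Laplace value tracks the exact one closely, while Eq.(8) differs by an $O(1)$ amount that becomes negligible relative to $\log I$ as $n$ grows.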
4.2 BIC Approximation of the Bayes factor
The above discussion provides the basis for the following approximation of the Bayes factor. Hereafter we focus on the case where the null hypothesis is nested. That is, we assume some parametrization under $H_1$ of the form $\theta_1 = (\rho, \beta)$ such that $H_0$ is obtained from $H_1$ by imposing the restriction $\rho = \rho_0$ for some $\rho_0$. Both $\rho$ and $\beta$ can be vectors. Let $\theta_1$ denote the parameter under $H_1$ with prior $\pi(\theta_1 \mid H_1) = \pi(\rho, \beta \mid H_1)$, and for $H_0\colon \rho = \rho_0$ let its prior be $\pi(\theta_0 \mid H_0) = \pi(\beta \mid H_0)$.
Based on Eq.(6), by applying the definition of the Bayes factor to the log-ratio of the marginal likelihoods, one obtains:
$$2\log B_{10} \approx \Lambda_{10} + \log|\hat{\Sigma}_1| - \log|\hat{\Sigma}_0| + 2\log\pi(\hat{\theta}_1 \mid H_1) - 2\log\pi(\hat{\theta}_0 \mid H_0) + (d_1 - d_0)\log(2\pi) ,$$
where $\Lambda_{10} = 2(\log \Pr(x \mid \hat{\theta}_1, H_1) - \log \Pr(x \mid \hat{\theta}_0, H_0))$ corresponds to the log-likelihood ratio statistic with $d_1 - d_0$ degrees of freedom. Refer to (Raftery, 1996) for an additional discussion, and for the approximation of the Bayes factor under Eq.(5). On the other hand, based on the approximation in Eq.(8), one obtains:
$$2\log B_{10} \approx \Lambda_{10} - (d_1 - d_0)\log(n) = 2S_{10} , \tag{9}$$
where
$$2S_{10} = 2\log \Pr(x \mid \hat{\theta}_1, H_1) - 2\log \Pr(x \mid \hat{\theta}_0, H_0) - (d_1 - d_0)\log(n)$$
$$= \left[d_0\log(n) - 2\log \Pr(x \mid \hat{\theta}_0, H_0)\right] - \left[d_1\log(n) - 2\log \Pr(x \mid \hat{\theta}_1, H_1)\right] .$$
The following consistency result for $n \to \infty$, known as the Schwarz criterion,
$$(S_{10} - \log B_{10})/\log B_{10} \to 0 , \tag{10}$$
is attractive as it establishes $S_{10}$ as a standardized quantity to be used even when the priors are hard to set, and as a useful reference quantity in scientific reporting (Kass and Raftery, 1995). The $O(1)$ error implies that even in large samples $S_{10}$ does not lead to the correct value in absolute terms, yet the error does go to zero as a proportion of the actual log of the Bayes factor (e.g. Kass and Raftery, 1995; Raftery, 1995, 1996). Importantly, for certain classes of priors the approximation error reduces to $O(n^{-1/2})$. One class is that of Jeffreys' priors with a specific choice of the constant preceding them,
another class is that of unit information priors (Raftery, 1995; Wasserman, 2000; Wagenmakers, 2007). With respect to subjectively determined priors, a surprisingly good agreement between the Schwarz criterion and actual Bayes factors is observed in Kass and Wasserman (1995). In general, when the sample size $n$ is sufficiently large, the approximation is very satisfactory for most purposes and is of widespread use, including applications in psychology (e.g. Wasserman, 2000), ecology (e.g. Aho et al., 2014), and computer vision (e.g. Stanford and Raftery, 2002). Kass and Wasserman (1995) further show that for the intuitive and reasonable choice of the unit information prior, $\exp(S_{10})/B_{10} \to 1$ with an error of order $O(n^{-1/2})$. This provides a direct interpretation of the Schwarz criterion in terms of Bayes factors and evidence.
For a given model $k$, recall the definition of the Bayesian Information Criterion (BIC):
$$\mathrm{BIC}_k = d_k \log n - 2\log \Pr(x \mid \hat{\theta}_k, H_k) . \tag{11}$$
It is easy to recognize that the right side of Eq.(8) is closely related to BIC, as $-2\log I = \mathrm{BIC} + O(1)$, and that $2S_{10} = \mathrm{BIC}_0 - \mathrm{BIC}_1 = \Delta\mathrm{BIC}_{01}$. Eq.(9) then is equivalent to
$$\log B_{10} \approx \tfrac{1}{2}\Delta\mathrm{BIC}_{01} . \tag{12}$$
Eq.(12) establishes $\Delta\mathrm{BIC}_{01}$ as an approximate measure of the log-evidence in support of the hypothesis $H_1$ over $H_0$. We shall refer to either Eq.(12) or its exponentiated version $B_{10} \approx \exp(\tfrac{1}{2}\Delta\mathrm{BIC}_{01})$ as the BIC approximation of the Bayes factor.
For the above BIC approximation the approximation error is generally $O(1)$, but in virtue of the Schwarz criterion $(\tfrac{1}{2}\Delta\mathrm{BIC}_{01} - \log B_{10})/\log B_{10} \to 0$, the error approaches zero as a proportion of the Bayes factor. As discussed above, it can further reduce to $O(n^{-1/2})$ for certain priors. Eq.(12) formally justifies the extensive practice of model selection based on the smallest BIC value. Indeed, the higher the evidence in support of model 1, i.e. the higher the log-Bayes factor, the more positive $\Delta\mathrm{BIC}_{01}$ and the smaller $\mathrm{BIC}_1$ with respect to $\mathrm{BIC}_0$.
In the context of linear models with normal errors, BIC rewrites in the alternative convenient form
$$\mathrm{BIC}_k = n \log\!\left(1 - R_k^2\right) + d_k \log n ,$$
with $R^2$ being the usual R-squared. The proportion $1 - R_k^2$ of the variance that model $k$ fails to explain relates to the sum of squares table through the equivalence $1 - R_k^2 = SSE_k / SS_{\mathrm{total}}$, where $SSE_k$ is the sum of squared errors of model $k$ and $SS_{\mathrm{total}}$ the total sum of squares. This leads to the following expression:
$$\Delta\mathrm{BIC}_{01} = n \log\frac{SSE_0}{SSE_1} + (d_0 - d_1)\log(n) . \tag{13}$$
Applied examples of the use of Eq.(13) are provided e.g. in (Wagenmakers, 2007; Masson, 2011).
Furthermore, for nested models such that $d_1 - d_0 = 1$, $\Lambda_{10} \approx t^2$, with $t$ being the t-statistic for testing the significance of the parameter in model 1 that is set to zero in model 0, and $\Lambda_{10}$ the corresponding likelihood ratio statistic. From Eq.(12):
$$2\log B_{10} \approx \Delta\mathrm{BIC}_{01} = \Lambda_{10} - \log n \approx t^2 - \log n . \tag{14}$$
This underlines a direct relationship between $t$, $\Delta\mathrm{BIC}$ and $B$, which means that the $t$ statistic can be directly translated into BIC and into grades of evidence through Bayes factors (Johnson, 2005). High values of $t$ support the statistical significance of the additional parameter in the full model. In turn, $\mathrm{BIC}_1$ is smaller than $\mathrm{BIC}_0$, so $\Delta\mathrm{BIC}_{01} > 0$ and $\log B_{10} > 0$, which is indicative of evidence against the reduced model. By reparametrization, the above extends to hypotheses where an element $\theta'$ of $\theta$ is set to a fixed value (rather than zero), e.g. $H_0\colon \theta' = 1$ is analogous to $H_0\colon \theta'' = 0$ by taking $\theta'' = \theta' - 1$.
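As a quick worked illustration of Eq.(14) (the numbers are ours): with $n = 100$ and $t = 2$, i.e. borderline significance at the 5% level, $2\log B_{10} \approx 4 - \log 100 \approx -0.61$, so $B_{10} \approx 0.74$: the approximate Bayes factor in fact mildly favors the null, a Lindley-type divergence between p-values and Bayes factors.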
4.3 Unit root testing based on BIC approximation
The major contribution of this paper is to test for unit roots in financial time series via the BIC approximation of the Bayes factor, Eq.(12). We shall list the major points in favor of this approach.
i. As reviewed above, the choice of the prior is the principal problem in Bayesian testing of unit roots. On the contrary, our proposed BIC approximation does not require a full specification of the priors – neither that of the autoregressive parameter, nor of any other parameter. The independence of the BIC approximation from prior specifications is also attractive from the standpoint of objectivity in Bayesian analysis.
ii. Bayes factors and posterior odds allow testing point nulls on the autoregressive parameter, which can be problematic as shown earlier. This setting is however natural for the BIC approximation.
iii. The BIC approximation to the Bayes factor is a general procedure that does not depend on the model form or parametrization. Regardless of whether the AR model under investigation has an intercept, a trend component, exogenous regressors, or any richer structure, the BIC procedure applies. In general, the error is $O(1)$ but reduces to zero as a proportion of the Bayes factor (cf. Eq.(10)).
iv. Testing based on the BIC approximation does not require any integration and does not present major computational issues: it only requires the maximum likelihood estimates, as per the definition in Eq.(11).
v. The applicability of the method depends on the feasibility of the approximation in Eq.(6), i.e. on the feasible hypotheses that the data $x$ consist of i.i.d. observations, that the posterior is peaked around its maximum, and that the sample size is sufficient for $\tilde{\theta} \approx \hat{\theta}$ and $\hat{\Sigma}^{-1} \approx n\,i$ to be satisfactory approximations. A sample size of about 20 times the number of parameters appears to be generally fair for well-behaved problems where the likelihood is not grossly non-normal. For the simple AR(1) model with unknown innovations' variance, this corresponds to about 40 points, e.g. about two months of daily market data.
Lastly, consider that even if the applicability of the above procedure is quite broad, the time series models involved in classical unit root testing applications are broadly of linear form, so the simplified alternative form in Eq.(13) commonly applies. Also, the point form of the null hypothesis naturally suggests a nested hypotheses structure where the only restriction is on the autoregressive parameter. That is, applications will generally encounter linear model specifications where $d_1 - d_0 = 1$, as in the sketch below.
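The following minimal sketch illustrates the resulting test for the simple AR(1) without deterministic terms, under the conditional Gaussian likelihood of Section 2.1; the function name and interface are our own illustration, not the paper's code:

```python
import numpy as np

def bic_unit_root_test(x):
    """BIC approximation of the Bayes factor for H0: rho = 1 vs H1: rho free,
    in x_t = rho x_{t-1} + u_t with Gaussian errors, via Eqs.(12)-(13).

    Returns (log_B10, prob_H0), with prob_H0 the posterior probability of
    the unit root under equal prior odds."""
    x = np.asarray(x, dtype=float)
    lag, lead = x[:-1], x[1:]
    n = lead.size
    # H1: rho estimated by OLS (the conditional Gaussian MLE)
    rho_hat = lag @ lead / (lag @ lag)
    sse1 = np.sum((lead - rho_hat * lag) ** 2)
    # H0: rho = 1, so the residuals are the first differences
    sse0 = np.sum((lead - lag) ** 2)
    # Eq.(13) with d0 - d1 = -1: one fewer free parameter under H0
    d_bic01 = n * np.log(sse0 / sse1) - np.log(n)
    log_b10 = 0.5 * d_bic01                      # Eq.(12)
    prob_h0 = 1.0 / (1.0 + np.exp(log_b10))      # B01/(1 + B01), equal prior odds
    return log_b10, prob_h0
```

Richer specifications (intercept, trend, exogenous regressors) would only change the two fitted regressions and the parameter counts entering Eq.(13).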
5 Simulation Study
To validate our proposed testing methodology and compare it with some existing alternatives, we develop a Monte Carlo simulation study. In particular, we simulate 20,000 simple AR(1) processes $x_t = \rho x_{t-1} + u_t$, with $t = \{1, \ldots, T\}$ and independent standard normal innovations, considering different sample lengths $T = \{50, 100, 200, 500, 1000, 5000\}$ and different values of the autoregressive parameter $\rho = \{0.2, 0.5, 0.8, 0.9, 0.99, 0.999, 1\}$. We test the point null unit root hypothesis $H_0\colon \rho = 1$ and summarize the corresponding results in Table 1. All the tables related to the Monte Carlo study report averages across the simulated samples. A condensed sketch of this design is given below.
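The harness below reuses the hypothetical bic_unit_root_test helper from the Section 4 sketch, with a reduced replication count and a subset of the $(T, \rho)$ grid:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(42)
n_rep = 2000   # 20,000 in the paper; reduced here for speed

for T in (50, 500):
    for rho in (0.8, 0.99, 1.0):
        probs = []
        for _ in range(n_rep):
            u = rng.standard_normal(T + 1)
            # AR(1) recursion x_t = rho * x_{t-1} + u_t with zero initial state
            x = lfilter([1.0], [1.0, -rho], u)
            probs.append(bic_unit_root_test(x)[1])
        print(f"T={T:4d}  rho={rho:5.3f}  mean Pr(H0|x) = {np.mean(probs):.3f}")
```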
Table 1 includes probabilities associated with different alternative testing methods. For the approach of Schotman and Van Dijk (1991b) we adopt both a constant lower integration bound $a$ for the autoregressive parameter, fixed to $-1$ (SVD), and a data-driven one (SVD*); see Appendix A.1. As a reference measure, Table 1 includes the average p-value of the Dickey-Fuller test (DF) and the posterior non-stationarity probabilities $\Pr_{\rho\ge1} = \Pr(\rho \ge 1 \mid x)$ from (Phillips, 1991b), computed through Eq.(1). These posterior probabilities are reported only for $T \le 200$, as larger $T$ quickly drive Eq.(17) below machine precision and integration turns problematic: recovering such probabilities for large $T$ is here beyond our scope. For the SVD, SVD* and BIC entries in Table 1, we report as our main result the probabilities corresponding to the Bayes factors in Table 3, since they are easier to interpret. Furthermore, assuming prior odds equal to one, these probabilities are in a direct correspondence with the posterior odds.
Our simulation results show a smooth and coherent behavior of the proposed BIC approximation, leading to posterior probabilities behaving as one might expect in this controlled setting. (i) The acceptance probabilities for the null are progressively higher as $\rho$ approaches unity. (ii) For a fixed $\rho$, larger samples reduce the posterior unit root probability for small values of $\rho$ while increasing it for $\rho$ around 1. Indeed, larger samples embed higher evidence in support of the true hypothesis, for which we observe increasing posterior probabilities (and log-Bayes factors). With $\rho = 0.8$ and $T = 50$, for the BIC approximation we compute an average posterior probability for $H_0$ of .240; however, as $T$ increases, the evidence towards the true hypothesis $\rho = 0.8$ increases as well, and the posterior probability of the null moves rapidly towards zero (e.g. from $T = 200$ onward). Similarly, for $\rho = 1$ and $T = 50$, the small size of the sample advocates for stationary dynamics with a considerable 0.211 probability, while at larger $T$ the probability associated with far-from-unity values of $\rho$ sharply decreases to zero.
Furthermore, our simulation study leads to posterior probabilities that are also well-aligned across the BIC, SVD and SVD* testing approaches, following the same trends across different values of $\rho$ for fixed $T$. The BIC approximation, however, leads to probabilities that are not uniformly greater or smaller than those from SVD*. In fact, for small to moderate sample sizes BIC returns higher posterior probabilities for the unit root hypothesis, while smaller ones for large $T$ with respect to SVD*. This behavior can be partially explained by the different specifications of the alternative hypothesis. While for SVD the reported probabilities are those of a unit root against a stationary alternative, for BIC the alternative is a generic $H_1\colon \rho \neq 1$. It is thus reasonable that the rejection probabilities are larger for BIC than for SVD, as by construction the feasible parameter space under the BIC alternative $H_1$ is broader. Also with respect to the posterior probabilities of explosive roots/non-stationary dynamics $\Pr_{\rho\ge1}$ under the ignorance prior, the BIC approximation appears coherent and well-aligned. Though the BIC and $\Pr_{\rho\ge1}$ probabilities refer to different hypotheses, we observe that indeed the higher the posterior probability of a unit root, the higher the posterior probability supporting the non-stationary option.
As expected, the p-values associated with the classical frequentist Dickey-Fuller test increasingly fail to exclude the non-stationary hypothesis as $\rho$ moves towards one. Accordingly, the evidence in support of a unit root is reflected in increasing posterior BIC probabilities. All the above discussion applies as well to the Bayes factors reported in Appendix A.3, where negative signs stand for evidence against the unit root null.
This study confirms an overall very satisfactory behavior of the proposed method with respect to some Bayesian and non-Bayesian alternatives for unit root hypothesis testing. Despite the general $O(1)$ error associated with the BIC approximation and its complete independence of the prior specification, our simulated posterior probabilities are well-behaved (i.e. showing desirable smooth monotonicity over $\rho$ for fixed $T$, and vice versa), coherent with their expected behavior, and aligned with the decisions on the null suggested by the other approaches.
T = 50                                        T = 500
ρ      SVD   SVD*  BIC   DF    Pr(ρ≥1)       ρ      SVD   SVD*  BIC   DF
0.200  .000  .000  .000  .001  .015          0.200  .000  .000  .000  .001
0.500  .003  .002  .004  .001  .107          0.500  .000  .000  .000  .001
0.800  .292  .124  .240  .032  .244          0.800  .000  .000  .000  .001
0.900  .686  .341  .545  .111  .306          0.900  .000  .000  .000  .001
0.990  .955  .663  .787  .390  .445          0.990  .957  .336  .801  .115
0.999  .973  .719  .797  .485  .514          0.999  .994  .652  .924  .406
1.000  .975  .729  .798  .501  .529          1.000  .996  .701  .926  .491

T = 100                                       T = 1000
0.200  .000  .000  .000  .001  .002          0.200  .000  .000  .000  .001
0.500  .000  .000  .000  .001  .078          0.500  .000  .000  .000  .001
0.800  .041  .011  .031  .003  .184          0.800  .000  .000  .000  .001
0.900  .458  .125  .321  .032  .243          0.900  .000  .000  .000  .001
0.990  .965  .608  .827  .332  .425          0.990  .900  .131  .623  .034
0.999  .983  .701  .849  .474  .528          0.999  .996  .618  .942  .355
1.000  .985  .714  .850  .495  .546          1.000  .998  .697  .947  .488

T = 200                                       T = 5000
0.200  .000  .000  .000  .001  .000          0.200  .000  .000  .000  .001
0.500  .000  .000  .000  .001  .057          0.500  .000  .000  .000  .001
0.800  .000  .000  .000  .001  .138          0.800  .000  .000  .000  .001
0.900  .084  .012  .049  .004  .182          0.900  .000  .000  .000  .001
0.990  .968  .529  .844  .249  .389          0.990  .001  .000  .000  .001
0.999  .990  .686  .888  .457  .534          0.999  .996  .351  .932  .123
1.000  .992  .706  .889  .492  .562          1.000  1.000 .697  .976  .487

Table 1: Simulation results. SVD, SVD* and BIC: unit root posterior probabilities (prior odds equal to one). DF: p-values of the Dickey-Fuller test. Pr(ρ≥1): posterior probabilities Pr(ρ ≥ 1 | x), reported for T ≤ 200 only.
6 Empirical Application
In our empirical application, we analyze Real Exchange Rate (RER) time series for nine major currencies. RERs are obtained by deflating nominal exchange rates by the relative price of domestic vs. foreign goods and services, thus reflecting the competitiveness of a country with respect to a reference basket. Common choices for deflation are the consumer price index (CPI), producer price indices, or GDP-based deflators. An increase in RER implies that exports become more expensive and imports turn cheaper, indicating a loss in trade competitiveness, for instance in response to an appreciation of the domestic currency, or in response to increased domestic inflation. We extract the official monthly RER (CPI-deflated) time series distributed by the European Commission and available from the Statistical Data Warehouse of the European Central Bank¹, for the period between January 2010 and November 2020. This corresponds to 131 records for each of the nine² major currencies we consider in the analysis.
Real exchange rates relate to the long-run equilibrium condition – known as Purchasing Power Parity (PPP) – which implies a steady long-term level and a constant unconditional mean for RER series. The existence of a unit root would contradict this theory. There have been considerable efforts in empirically verifying the PPP theory, and several papers discussed mid- and long-term departures from the expected RER stationarity, leading to a controversial debate. The essence is that conclusions based on empirical research strongly depend on the exact definition of equilibrium that one adopts, on the methods used to test it, on the underlying hypotheses on the time series, and on its length. This applies to unit root analyses as well, which often lead to opposite results on PPP's validity (see e.g. MacDonald, 1995, and the references therein). In this regard, the unit root analysis of Parikh and Wakerly (2000) suggests that such an equilibrium is expected to be generally observed over a time-span of at least 50 years. Non-short-lived disequilibrium periods in RER dynamics are thus common. At first sight, Figure 1 seems to confirm such a tendency, suggesting a recent disequilibrium period for some of the major currencies. For instance, we observe an upward trend for the US dollar series and an apparent non-mean-reverting behavior for the Chinese yuan and Japanese yen, which gradually adjust towards new RER levels, suggesting the presence of unit roots.
[Figure 1: Real exchange rates (CPI adjusted) for some selected currencies – EUR, USD, GBP, JPY, CNY – monthly, 2010–2021.]
Posterior probabilities of the unit root hypothesis in RERs are reported in Table 2. Our results indicate widespread evidence in favor of the unit root hypothesis across the currencies. Bayesian results are aligned with the p-values of the Dickey-Fuller (DF) test and the posterior probabilities from (Phillips, 1991b). Indeed, higher posterior probabilities are associated both with higher p-values of the Dickey-Fuller statistic and with higher posterior probabilities $\Pr_{\rho\ge1} = \Pr(\rho \ge 1 \mid x)$ of the non-stationary alternative. Illustrative are the results for the euro zone, where the .047 p-value of the DF test indicates a mild rejection of the unit root hypothesis. This is aligned with Figure 1, where the EUR series indeed displays a much more stationary behavior than any other series. Accordingly, the BIC probabilities take their smallest value across all the countries analyzed, with the unit root hypothesis having .545 probability (.455 for the stationary one), much smaller than the average .880 observed for all the other currencies, where the DF p-value is on average .625. This is also aligned with the conclusion one would draw from the SVD* approach, where the highest evidence in favor of stationarity is again associated with the euro series. This is reasonable, as euro-zone countries have a significant presence in the basket determining the RER's CPI correction (19 out of 37 countries in the basket adopt the euro). The observed greater plausibility of the stationary hypothesis is here not surprising but expected and coherent. Lastly, note that the BIC probabilities appear to uniformly dominate the SVD* ones; this could perhaps be explained in terms of the typical bias towards stationarity implied by the use of uniform priors.
¹ Data and its documentation are available at https://sdw.ecb.europa.eu/browse.do?node=9691113.
² Australian dollar (AUD), Canadian dollar (CAD), Swiss franc (CHF), Chinese yuan (CNY), euro (EUR), British pound (GBP), Hong Kong dollar (HKD), Japanese yen (JPY), US dollar (USD).
              log BF01             Prob.
Currency   SVD*     BIC      SVD*   BIC    DF     Pr(ρ≥1)
AUD       -0.333    1.076    .418   .746   .090   .288
CAD        1.455    2.431    .811   .919   .676   .705
CHF        0.996    2.282    .730   .907   .442   .569
CNY        1.848    2.044    .864   .885   .904   .875
EUR       -1.194    0.180    .232   .545   .047   .161
GBP        0.403    1.802    .599   .858   .240   .374
HKD        1.812    2.120    .860   .893   .883   .857
JPY        1.154    2.369    .760   .914   .533   .553
USD        1.641    2.349    .838   .913   .802   .778

Table 2: SVD* and BIC: log-Bayes factors and their corresponding posterior probabilities for the unit root hypothesis (prior odds equal to one). DF: p-values of the Dickey-Fuller test. Pr(ρ≥1): posterior probabilities Pr(ρ ≥ 1 | x).
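With prior odds equal to one, the reported probabilities follow directly from the log Bayes factors via the posterior odds; for the euro BIC entry, for instance,
\[ \Pr(\rho = 1 \mid x) = \frac{K_1}{1 + K_1}, \qquad K_1 = K_0\, \mathrm{BF}_{01} = e^{0.180} \approx 1.197, \qquad \frac{1.197}{1 + 1.197} \approx .545, \]
in agreement with the tabulated value.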
7 Conclusion
Unit root testing has historically been among the most active areas in econometric research. With classical frequentist methods based on Dickey-Fuller (DF) statistics being broadly adopted and established, Bayesian methods did not gain much attention in empirical research and applications. Nevertheless, research and debate on Bayesian unit root testing have been very active. Indeed, econometric research in the field pointed out a series of unique criticalities that make the Bayesian approach to unit root inference particularly challenging.
On the other hand, the testing procedure based on the BIC approximation of the Bayes factor addressed in this paper appears to provide a simple and satisfactory method for Bayesian unit root testing. With such an approach, the integration problem involved in Bayes factors turns into a standard maximum likelihood estimation problem, and the Bayes factors take the very simple form of Eq. (12). Notably, (i) priors are not involved in the approximated form (though they determine the asymptotic error rate), (ii) the procedure smoothly scales to more complex time series models, and (iii) it targets the exact point-null unit root hypothesis. The resulting Bayes factor discriminates between the hypotheses according to the common interpretation scale of Jeffreys (1961).
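As a minimal sketch for the zero-mean AR(1) case of the appendix, the following assumes the standard relation log BF01 ≈ (BIC1 − BIC0)/2 as the approximation in Eq. (12), with positive values favoring the null consistently with Table 2; all function names are illustrative, not taken from the paper.

import numpy as np

def max_gaussian_loglik(resid):
    """Gaussian log-likelihood of the residuals, maximized over sigma^2."""
    T = resid.size
    sigma2 = resid @ resid / T  # MLE of the innovation variance
    return -0.5 * T * (np.log(2 * np.pi * sigma2) + 1.0)

def log_bf01(x):
    """Approximate log Bayes factor for H0: rho = 1 vs H1: |rho| < 1."""
    T = x.size - 1  # effective sample size, conditioning on x0
    # H0: random walk, only sigma^2 is estimated (k = 1)
    bic0 = -2 * max_gaussian_loglik(np.diff(x)) + 1 * np.log(T)
    # H1: AR(1), rho and sigma^2 are estimated (k = 2); conditional MLE = OLS
    rho_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
    bic1 = -2 * max_gaussian_loglik(x[1:] - rho_hat * x[:-1]) + 2 * np.log(T)
    return (bic1 - bic0) / 2  # positive values support the unit root

rng = np.random.default_rng(0)
print(log_bf01(np.cumsum(rng.standard_normal(200))))  # data from a random walk

With prior odds equal to one, the posterior probability of the null then follows as exp(log BF01)/(1 + exp(log BF01)), as in Table 2.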
The simulation study confirms the validity of our proposed approach, showing that, in this controlled setting, the BIC approximation leads to decisions that are entirely coherent with the expectations under a wide range of values for the autoregressive parameter and sample sizes. The posterior probabilities associated with the null are furthermore aligned with those of other Bayesian procedures and in accordance with the p-values from the DF test. The same coherence between BIC, other Bayesian methods, and the frequentist DF test also arises from the analysis of real exchange rate series. In particular, our BIC-based conclusion of non-stationarity matches the decisions one would draw based on the DF and SVD tests. The results are furthermore aligned with the posterior probabilities from Phillips (1991b), in support of an apparent violation of the real exchange rate and purchasing power parity equilibrium in the last decade.
Recognizing that BIC is just one model information criterion (IC) among many others, some of which include BIC as a special case, it would be interesting to explore the use of such alternatives. Among these, generalized variants of BIC (see e.g. Konishi and Kitagawa, 2008, Ch. 9), the FIC (Wei, 1992), and the ICs based on predictive distributions (Phillips and Ploberger, 1994, 1996) serve a broader model selection scope (beyond, e.g., dampening the drawbacks associated with prior selection in unit root testing) and rely on different theoretical and motivational bases. Generalized ICs are potentially superior for the generic purpose of model selection; however, they do not necessarily have a direct connection with well-identified Bayesian inference problems, though the use of some of them in unit root testing could now be explained by our results on BIC. Perhaps a direction for related future work could be that of investigating to what extent ICs and model selection techniques have a clear link with certain problems in Bayesian inference. This would shed light on whether it is possible to rely on ICs and model selection methods as general tools for approximate Bayesian inference in situations where, e.g., priors are difficult to specify.
References
Aho, K., Derryberry, D., and Peterson, T. (2014). Model selection for ecologists: the worldviews of aic
and bic. Ecology, 95(3):631–636.
Berger, J. O. and Delampady, M. (1987). Testing precise hypotheses. Statistical Science, pages 317–335.
Berger, J. O. and Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values
and evidence. Journal of the American statistical Association, 82(397):112–122.
Campbell, J. Y. and Perron, P. (1991). Pitfalls and opportunities: what macroeconomists should know
about unit roots. NBER macroeconomics annual, 6:141–201.
Chaturvedi, A. and Kumar, J. (2005). Bayesian unit root test for model with maintained trend. Statistics
& Probability Letters, 74(2):109–115.
Chen, C. W., Chen, S.-Y., and Lee, S. (2013). Bayesian unit root test in double threshold heteroskedastic
models. Computational Economics, 42(4):471–490.
Chen, C. W. and Lee, S. (2015). A local unit root test in mean for financial time series. Journal of
Statistical Computation and Simulation, 86(4):788–806.
de Bragança Pereira, C. A. and Stern, J. M. (1999). Evidence and credibility: full bayesian significance test for precise hypotheses. Entropy, 1(4):99–110.
DeGroot, M. H. (2005). Optimal statistical decisions, volume 82. John Wiley & Sons.
DeJong, D. N. and Whiteman, C. H. (1991). Reconsidering ‘trends and random walks in macroeconomic
time series’. Journal of Monetary Economics, 28(2):221–254.
Dickey, D. A. and Fuller, W. A. (1981). Likelihood ratio statistics for autoregressive time series with a
unit root. Econometrica: journal of the Econometric Society, pages 1057–1072.
Diniz, M., Pereira, C. A. d. B., and Stern, J. M. (2011). Unit roots: Bayesian significance test. Com-
munications in Statistics-Theory and Methods, 40(23):4200–4213.
Efron, B., Gous, A., Kass, R., Datta, G., and Lahiri, P. (2001). Scales of evidence for model selection:
Fisher versus jeffreys. Lecture Notes-Monograph Series, pages 208–256.
Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator:
Observed versus expected fisher information. Biometrika, 65(3):457–482.
Evans, M., Swartz, T., et al. (1995). Methods for approximating integrals in statistics with special
emphasis on bayesian integration problems. Statistical science, 10(3):254–272.
Geweke, J. (1988). The secular and cyclical behavior of real gdp in 19 oecd countries, 1957–1983. Journal
of Business & Economic Statistics, 6(4):479–486.
Granger, C. W. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2(2):111–120.
Hasegawa, H., Chaturvedi, A., and van Hoa, T. (2000). Bayesian unit root test in nonnormal ar(1)
model. Journal of Time Series Analysis, 21(3):261–280.
Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of
the Royal Society of London. Series A. Mathematical and Physical Sciences, 186(1007):453–461.
Jeffreys, H. (1961). Theory of probability. Clarendon Press, Oxford.
Johnson, V. E. (2005). Bayes factors based on test statistics. Journal of the Royal Statistical Society.
Series B (Statistical Methodology), 67(5):689–701.
Kalaylıoğlu, Z. I., Bozdemir, B., and Ghosh, S. K. (2013). Bayesian unit-root testing in stochastic volatility models with correlated errors. Hacettepe Journal of Mathematics and Statistics, 42(6):659–669.
Kalaylıoğlu, Z. I. and Ghosh, S. K. (2009). Bayesian unit-root tests for stochastic volatility models. Statistical Methodology, 6(2):189–201.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the american statistical association,
90(430):773–795.
Kass, R. E., Tierney, L., and Kadane, J. B. (1991). Laplace's method in bayesian analysis. Contemporary Mathematics, 115:89–99.
Kass, R. E. and Vaidyanathan, S. K. (1992). Approximate bayes factors and orthogonal parameters,
with application to testing equality of two binomial proportions. Journal of the Royal Statistical
Society: Series B (Methodological), 54(1):129–144.
Kass, R. E. and Wasserman, L. (1995). A reference bayesian test for nested hypotheses and its relation-
ship to the schwarz criterion. Journal of the american statistical association, 90(431):928–934.
Kim, I.-M. and Maddala, G. (1991). Flat priors vs. ignorance priors in the analysis of the ar (1) model.
Journal of Applied Econometrics, 6(4):375–380.
Konishi, S. and Kitagawa, G. (2008). Information criteria and statistical modeling. Springer Science &
Business Media.
Koop, G. (1991). Cointegration tests in present value relationships: A bayesian look at the bivariate
properties of stock prices and dividends. Journal of Econometrics, 49(1-2):105–139.
Koop, G. (1994). An objective bayesian analysis of common stochastic trends in international stock
prices and exchange rates. Journal of Empirical Finance, 1(3-4):343–364.
Kumar, J. and Agiwal, V. (2018). Panel data unit root test with structural break: A bayesian approach.
Hacettepe Journal of Mathematics and Statistics, 48(3).
Leamer, E. E. (1991). Comment on 'to criticize the critics'. Journal of Applied Econometrics, pages 371–373.
Li, Y., Chong, T. T.-L., and Zhang, J. (2012). Testing for a unit root in the presence of stochastic
volatility and leverage effect. Economic Modelling, 29(5):2035–2038.
Li, Y. and Yu, J. (2010). A new bayesian unit root test in stochastic volatility models. Singapore School
of Economics, Research collection paper 1240.
Lubrano, M. (1995). Testing for unit roots in a bayesian framework. Journal of Econometrics, 69(1):81–
109.
Ly, A., Verhagen, J., and Wagenmakers, E.-J. (2016). Harold jeffreys’s default bayes factor hypothesis
tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology,
72:19–32.
MacDonald, R. (1995). Long-run exchange rate modeling: a survey of the recent evidence. Staff Papers,
42(3):437–489.
Maddala, G. S. and Kim, I.-M. (1998). Unit roots, cointegration, and structural change. Cambridge
university press.
Masson, M. E. (2011). A tutorial on a practical bayesian alternative to null-hypothesis significance
testing. Behavior research methods, 43(3):679–690.
Parikh, A. and Wakerly, E. (2000). Real exchange rates and unit root tests. Weltwirtschaftliches Archiv,
136(3):478–490.
Park, J. Y. and Shintani, M. (2016). Testing for a unit root against transitional autoregressive models.
International Economic Review, 57(2):635–664.
Perks, W. (1947). Some observations on inverse probability including a new indifference rule. Journal
of the Institute of Actuaries (1886-1994), 73(2):285–334.
Phillips, P. C. (1986). Understanding spurious regressions in econometrics. Journal of econometrics,
33(3):311–340.
Phillips, P. C. (1991a). Bayesian routes and unit roots: De rebus prioribus semper est disputandum.
Journal of Applied Econometrics, 6(4):435–473.
Phillips, P. C. (1991b). To criticize the critics: An objective bayesian analysis of stochastic trends.
Journal of Applied Econometrics, 6(4):333–364.
Phillips, P. C. et al. (1993). The long-run Australian consumption function reexamined: an empirical
exercise in Bayesian inference. Cowles Foundation for Research in Economics at Yale University.
Phillips, P. C. and Ploberger, W. (1994). Posterior odds testing for a unit root with data-based model
selection. Econometric Theory, pages 774–808.
Phillips, P. C. and Ploberger, W. (1996). An asymptotic theory of bayesian inference for time series. Econometrica: Journal of the Econometric Society, pages 381–412.
Poirier, D. J. (1988). Frequentist and subjectivist perspectives on the problems of model building in
economics. Journal of economic perspectives, 2(1):121–144.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25:111–163.
Raftery, A. E. (1996). Approximate bayes factors and accounting for model uncertainty in generalised
linear models. Biometrika, 83(2):251–266.
Schotman, P. and Van Dijk, H. K. (1991a). A bayesian analysis of the unit root in real exchange rates.
Journal of Econometrics, 49(1-2):195–238.
Schotman, P. C. and Van Dijk, H. K. (1991b). On bayesian routes to unit roots. Journal of Applied
Econometrics, 6(4):387–401.
Sims, C. A. (1988). Bayesian skepticism on unit root econometrics. Journal of Economic dynamics and
Control, 12(2-3):463–474.
Sims, C. A. (1991). Comment by christopher a. sims on ‘to criticize the critics’, by peter c. b. phillips.
Journal of Applied Econometrics, 6(4):423–434.
Sims, C. A. and Uhlig, H. (1991). Understanding unit rooters: A helicopter tour. Econometrica: Journal
of the Econometric Society, pages 1591–1599.
Slate, E. H. (1994). Parameterizations for natural exponential families with quadratic variance functions.
Journal of the American Statistical Association, 89(428):1471–1482.
So, M. K. and Li, W. (1999). Bayesian unit-root testing in stochastic volatility models. Journal of Business & Economic Statistics, 17(4):491–496.
Sowell, F. (1991). On dejong and whiteman’s bayesian inference for the unit root model. Journal of
Monetary Economics, 28(2):255–263.
Stanford, D. C. and Raftery, A. E. (2002). Approximate bayes factors for image segmentation: The
pseudolikelihood information criterion (plic). IEEE Transactions on Pattern Analysis and Machine
Intelligence, 24(11):1517–1520.
Thornber, H. (1967). Finite sample monte carlo studies: An autoregressive illustration. Journal of the
American Statistical Association, 62(319):801–818.
Tierney, L., Kass, R. E., and Kadane, J. B. (1989). Fully exponential laplace approximations to ex-
pectations and variances of nonpositive functions. Journal of the American Statistical Association,
84(407):710–716.
Uhlig, H. (1994). What macroeconomists should know about unit roots: a bayesian perspective. Econo-
metric Theory, pages 645–671.
Vosseler, A. (2016). Bayesian model selection for unit root testing with multiple structural breaks.
Computational Statistics & Data Analysis, 100:616–630.
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5):779–804.
Wasserman, L. (2000). Bayesian model selection and model averaging. Journal of mathematical psy-
chology, 44(1):92–107.
Wei, C.-Z. (1992). On predictive least squares principles. The Annals of Statistics, pages 1–42.
Xia, C. and Griffiths, W. (2012). Bayesian unit root testing: The effect of choice of prior on test outcomes. Advances in Econometrics, 30:27–57.
Zellner, A. (1971). An introduction to Bayesian inference in econometrics. John Wiley & Sons.
Zellner, A. and Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. Trabajos de estadística y de investigación operativa, 31(1):585–603.
Zhang, J. Y., Li, Y., and Chen, Z. M. (2013). Unit root hypothesis in the presence of stochastic volatility,
a bayesian analysis. Computational Economics, 41(1):89–100.
Zivot, E. (1994). A bayesian analysis of the unit root hypothesis within an unobserved components
model. Econometric Theory, pages 552–578.
A Appendices
A.1 The model of Schotman and Van Dijk (1991b)
Consider the simplest autoregressive process of order one with zero mean,
\[ x_t = \rho\, x_{t-1} + u_t. \tag{15} \]
Assume that (i) x_0 is a known constant, implying that we work conditionally on the initial observation, (ii) the u_t are independent and identically distributed (i.i.d.) normal random variables with mean zero and unknown variance σ², (iii) ρ ∈ S ∪ {1}, with S = {ρ | −1 < a ≤ ρ < 1}. We assume (iv) to observe a sample of T observations on a time series {x_t}. The Bayesian analysis is carried out via posterior odds. For the simple model under consideration we have
\[ K_1 = K_0\, \frac{\int_0^{\infty} L(x \mid \rho = 1, \sigma, x_0)\, \pi(\sigma)\, d\sigma}{\int_S \int_0^{\infty} L(x \mid \rho, \sigma, x_0)\, \pi(\sigma)\, \pi(\rho)\, d\sigma\, d\rho} = \frac{\Pr(\rho = 1 \mid x, x_0)}{\Pr(\rho \in S \mid x, x_0)}, \tag{16} \]
where K0 represents the prior odds ratio in favour of the hypothesis ρ = 1, and K1 the corresponding posterior odds ratio. The ratio between the integrals corresponds to the Bayes factor; π(σ) and π(ρ) denote the prior densities for σ and for ρ ∈ S, and L(x | ·) the likelihood function for the observed data x. The prior odds K0 express the relative weight of the null hypothesis against its stationary alternative, such that the point ρ = 1 is given the probability mass π0 = K0/(1 + K0); analogously, K1/(1 + K1) provides the posterior probability of the null hypothesis ρ = 1.
Schotman and Van Dijk (1991b) specify the marginal distributions of ρ and σ as
\[ \Pr(\rho = 1) = \pi_0, \qquad \pi(\rho \mid \rho \in S) = \frac{1}{1-a}, \qquad \pi(\sigma) \propto \frac{1}{\sigma}, \]
that is, ρ is taken uniform over S with probability mass π0 on ρ = 1, and with σ and ρ independent.
Besides the fact that the density for ρ depends only on the lower bound a, which greatly simplifies the integration problem in the denominator of Eq. (16), the overall solution even in this simple setting is not obvious:
\[ K_1 = \frac{\pi_0}{1-\pi_0}\, C_T^{-1}\, (T-1)^{1/2} \left( \frac{\sigma_0^2}{\hat{\sigma}^2} \right)^{-\frac{T}{2}} \frac{1-a}{s_{\hat{\rho}}} \left[ F\!\left( \frac{1-\hat{\rho}}{s_{\hat{\rho}}};\, T-1 \right) - F\!\left( \frac{a-\hat{\rho}}{s_{\hat{\rho}}};\, T-1 \right) \right]^{-1}. \]
Here ρ̂ is the OLS estimator of ρ, s²_ρ̂ the squared OLS standard error of ρ̂, σ̂² the estimated variance of the residuals, σ₀² the variance of the first differences of x, F(· ; ν) the cumulative density of the t-distribution with ν degrees of freedom, C_T = Γ((T−1)/2) Γ(1/2) / Γ(T/2) a constant, and π0/(1 − π0) the prior odds ratio K0.
By choosing a prior equal balance between the stationary and the random walk hypothesis, π0 = 1/2, Schotman and Van Dijk (1991b) recover a feasible operational test procedure based on an empirical determination of the lower bound a,
\[ a^{*} = \hat{\rho} + s_{\hat{\rho}}\, F^{-1}\!\left( \alpha\, F(-\hat{\tau}) \right), \]
where τ̂ = (1 − ρ̂)/s_ρ̂ is the Dickey-Fuller statistic and 0 < α < 1 a constant (typically between 0.001 and 0.1) such that the posterior contains 1 − α of its probability mass in [a*, 1). The Bayes factors therefore turn out to be entirely functions of the data, and are computed as:
\[ K_1 = C_T^{-1}\, (T-1)^{1/2} \left( \frac{\sigma_0^2}{\hat{\sigma}^2} \right)^{-\frac{T}{2}} \frac{-\hat{\tau} - F^{-1}\!\left( \alpha\, F(-\hat{\tau}) \right)}{F(-\hat{\tau})}. \]
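The following is a minimal numerical sketch of this operational procedure, assuming the reconstruction of K1 given above; the residual-variance convention and the function names are illustrative rather than taken from the original paper.

import numpy as np
from math import exp, lgamma, log
from scipy.stats import t as student_t

def svd_posterior_odds(x, alpha=0.05):
    """Posterior odds K1 of rho = 1 vs the stationary alternative (pi0 = 1/2)."""
    T = x.size
    y, z = x[1:], x[:-1]
    rho_hat = (z @ y) / (z @ z)            # OLS estimate of rho
    resid = y - rho_hat * z
    sig2_hat = resid @ resid / (T - 2)     # residual variance (one convention)
    s_rho = np.sqrt(sig2_hat / (z @ z))    # OLS standard error of rho_hat
    sig2_0 = np.var(np.diff(x))            # variance of the first differences
    tau_hat = (1 - rho_hat) / s_rho        # DF statistic as defined above
    F = lambda q: student_t.cdf(q, df=T - 1)     # t cumulative density, T-1 dof
    Finv = lambda p: student_t.ppf(p, df=T - 1)  # and its inverse
    log_CT = lgamma((T - 1) / 2) + lgamma(0.5) - lgamma(T / 2)  # log C_T
    log_K1 = (-log_CT + 0.5 * log(T - 1) - (T / 2) * log(sig2_0 / sig2_hat)
              + log(-tau_hat - Finv(alpha * F(-tau_hat))) - log(F(-tau_hat)))
    return exp(log_K1)

The posterior probability of the null then follows as K1/(1 + K1).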
A.2 The model of Phillips (1991b)
Phillips (1991b) adopts the information matrix prior from (Jeffreys, 1946). In particular, for a generic family of densities with parameter θ = (ρ, σ) and information matrix i(θ), the uninformative Jeffreys prior he considers is defined as π(θ) ∝ |i(θ)|^{1/2}.
For the AR(1) model x_t = ρ x_{t−1} + u_t, with u_t i.i.d. zero-mean normal with variance σ² and initial value x_0, the above prior becomes
\[ \pi(\rho, \sigma) \propto \frac{1}{\sigma}\, I_{\rho}^{1/2}, \]
where the continuous function I_ρ, for −∞ < ρ < +∞ and sample size T, is defined as
\[ I_\rho = \begin{cases} \dfrac{T}{1-\rho^2} - \dfrac{1}{1-\rho^2}\, \dfrac{1-\rho^{2T}}{1-\rho^2} + \dfrac{x_0^2}{\sigma^2}\, \dfrac{1-\rho^{2T}}{1-\rho^2} & \text{if } \rho \neq 1, \\[1.5ex] \dfrac{T(T-1)}{2} + \dfrac{T x_0^2}{\sigma^2} & \text{if } \rho = 1. \end{cases} \]
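A minimal sketch evaluating this prior information term, including its continuous limit at the unit root, may help fix ideas; the helper name is illustrative.

import numpy as np

def info_rho(rho, T, x0, sigma2):
    """I_rho entering the Jeffreys prior pi(rho, sigma) ∝ (1/sigma) * I_rho**0.5."""
    if np.isclose(rho, 1.0):
        return T * (T - 1) / 2 + T * x0**2 / sigma2
    g = (1 - rho**(2 * T)) / (1 - rho**2)  # equals the sum of rho^(2(t-1)), t = 1..T
    return T / (1 - rho**2) - g / (1 - rho**2) + (x0**2 / sigma2) * g

# As rho -> 1, the first branch converges to T*(T-1)/2 + T*x0**2/sigma2,
# matching the rho = 1 case above.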
This choice of the prior achieves tighter confidence sets for large values of |ρ|, is invariant to transformations of the parameter, and enjoys other desirable properties. The prior depends on x_0, and its information grows with the sample size T at a geometric rate when ρ > 1. Under the Gaussian likelihood for the observed sample x and the above prior, the posterior distribution of ρ reads
\[ \pi(\rho \mid x) \propto \alpha_0^{1/2} \left[ R + (\rho - \hat{\rho})^2 Q \right]^{-\frac{T}{2}}. \tag{17} \]
The practical contribution to the posterior from the choice of the prior is embedded in the factor α0:
\[ \alpha_0 = \begin{cases} \dfrac{T}{1-\rho^2} - \dfrac{1}{1-\rho^2}\, \dfrac{1-\rho^{2T}}{1-\rho^2} & \text{if } \rho \neq 1, \\[1.5ex] \dfrac{T(T-1)}{2} & \text{if } \rho = 1. \end{cases} \]
On the other hand, the sum of squared residuals R, the quantity Q = Σ x²_{t−1}, and the OLS estimate of the autoregressive coefficient ρ̂ depend entirely on the data and the model choice. The posterior in Eq. (17) is generally less susceptible to the downward bias than the posterior based on a flat prior, is not symmetric around ρ̂, and, depending on the values of the data-dependent quantities described above, may have a second mode for ρ > 1.
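As a complement, the following is a minimal sketch of the (log, unnormalized) posterior of Eq. (17) under the above definitions; note that α0 is I_ρ without the x_0 term, and the variable names (R, Q, rho_hat) follow the text.

import numpy as np

def log_posterior_kernel(rho, x):
    """Log of the unnormalized posterior pi(rho | x) of Eq. (17)."""
    T = x.size
    y, z = x[1:], x[:-1]
    Q = z @ z                      # Q = sum of x_{t-1}^2
    rho_hat = (z @ y) / Q          # OLS estimate of rho
    resid = y - rho_hat * z
    R = resid @ resid              # sum of squared residuals
    if np.isclose(rho, 1.0):
        alpha0 = T * (T - 1) / 2
    else:
        g = (1 - rho**(2 * T)) / (1 - rho**2)
        alpha0 = T / (1 - rho**2) - g / (1 - rho**2)
    return 0.5 * np.log(alpha0) - (T / 2) * np.log(R + (rho - rho_hat)**2 * Q)

# Evaluating the kernel on a grid of rho values traces the posterior shape,
# e.g. to inspect a possible second mode at rho > 1.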
A.3 Simulation study: complement
ρ          T = 50                   T = 100                  T = 200
        SVD     SVD*    BIC      SVD     SVD*    BIC      SVD     SVD*    BIC
0.200  -11.35  -11.60  -11.13   -23.81  -24.25  -23.60   -48.97  -49.54  -48.77
0.500   -5.72   -6.25   -5.62   -12.61  -13.36  -12.52   -26.59  -27.51  -26.51
0.800   -0.88   -1.95   -1.15    -3.16   -4.52   -3.43    -8.08   -9.66   -8.36
0.900    0.78   -0.66    0.18    -0.17   -1.95   -0.75    -2.39   -4.43   -2.97
0.990    3.06    0.68    1.31     3.31    0.44    1.56     3.42    0.12    1.68
0.999    3.60    0.94    1.37     4.08    0.85    1.73     4.58    0.78    2.07
1.000    3.68    0.99    1.38     4.19    0.91    1.73     4.76    0.88    2.08

ρ          T = 500                    T = 2000                   T = 5000
        SVD      SVD*     BIC      SVD      SVD*     BIC      SVD       SVD*      BIC
0.200  -125.05  -125.74  -124.85  -252.40  -253.15  -252.20  -1273.42  -1274.25  -1273.21
0.500   -69.28   -70.35   -69.20  -140.86  -142.01  -140.78   -715.38   -716.66   -715.30
0.800   -23.43   -25.23   -23.71   -49.41   -51.34   -49.69   -259.36   -261.48   -259.65
0.900    -9.61   -11.93   -10.20   -22.09   -24.57   -22.69   -123.80   -126.54   -124.41
0.990     3.10    -0.68     1.39     2.20    -1.89     0.50     -7.01    -11.63     -8.74
0.999     5.17     0.63     2.50     5.56     0.48     2.78      5.45     -0.62      2.62
1.000     5.57     0.85     2.53     6.21     0.84     2.88      7.78      0.83      3.69

Table 3: Simulation results, log-Bayes factors. Refer to Section 5 for a discussion.