
Bollinger Bands Thirty Years Later

arXiv:1212.4890v2 [stat.AP] 1 Jan 2013
Mark Leeds
January 3, 2013
Abstract
The goal of this study is to explain and examine the statistical underpinnings of the Bollinger
Band methodology. We start off by elucidating the rolling regression time series model and
deriving its explicit relationship to Bollinger Bands. Next we illustrate the use of Bollinger
Bands in pairs trading and prove the existence of a specific return duration relationship in
Bollinger Band pairs trading[3]. Then by viewing the Bollinger Band moving average as an
approximation to the random walk plus noise (RWPN) time series model, we develop a pairs
trading variant that we call “Fixed Forecast Maximum Duration Bands” (FFMDPT). Lastly,
we conduct pairs trading simulations using S&P and Nikkei index data in order to compare the
performance of the variant with Bollinger Bands.
Keywords: Bollinger Bands, pairs trading, time series models
1 Introduction
Developed by John Bollinger in the early 1980’s, the Bollinger Band methodology is a frequently
used tool in the analysis of financial markets. Traders frequently use the outputs of Bollinger Bands
in conjunction with other technical indicators in order to choose the position to take in the asset
being monitored. Although Bollinger Bands are a common tool for analyzing asset behavior, the
Bollinger Band components have generally been viewed as outputs of an algorithm rather than as
estimates of the parameters of a statistical model. More details about the history and development
of Bollinger Bands can be found in [1] and [2].
A basic explanation of the Bollinger Band construction follows. Given a time series $y_t$ at $t = t^*$,
define the $n$ day rolling moving average of the series as $\text{mave}_{t^*}$:
$$\text{mave}_{t^*} = \sum_{t=t^*-n+1}^{t^*} y_t / n, \qquad t^* = n, \ldots, T \tag{1}$$
Note that, because the moving average uses $n$ data points, the first time $t^*$ at which $\text{mave}_{t^*}$ can
be calculated is $t^* = n$. Similarly, the $n$ day rolling variance at time $t = t^*$, $\hat{\sigma}^2_{t^*}$, is defined as:¹
$$\hat{\sigma}^2_{t^*} = \sum_{t=t^*-n+1}^{t^*} (y_t - \text{mave}_{t^*})^2 / (n-1), \qquad t^* = n, \ldots, T \tag{2}$$
Then, given the relations above, the Bollinger Band components are constructed using a center line
and an upper and lower band defined respectively as:
$$\text{CL}_{t^*} = \text{mave}_{t^*}$$
$$\text{BBupper}_{t^*} = \text{mave}_{t^*} + k\hat{\sigma}_{t^*}$$
and
$$\text{BBlower}_{t^*} = \text{mave}_{t^*} - k\hat{\sigma}_{t^*}$$
where $k$ is referred to as the width multiplier and represents the distance in standard deviation
units from the center line to each band. An example of the Bollinger Band construction is shown in
Figure 6 in Appendix A on page 31.
¹Note that this is the formula used to obtain an unbiased estimate of the unknown variance, $\sigma^2_{t^*}$. Another common
way of defining the variance, $\hat{\sigma}^2_{t^*}$, is to use $n$ in the denominator rather than $(n-1)$. The results that follow are
dependent on the denominator in equation (2) being defined as $(n-1)$.
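The construction in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch that uses the standard library; the function name `bollinger_bands`, the use of `None` for times before the first full window, and the toy series are assumptions made here and are not part of the paper.

```python
from math import sqrt

def bollinger_bands(y, n=20, k=2.0):
    """Return (center, upper, lower) lists for the series y.

    Entries before the first full window (index < n - 1) are None.
    The variance uses the (n - 1) denominator, as in equation (2).
    """
    center, upper, lower = [], [], []
    for t in range(len(y)):
        if t + 1 < n:                      # not enough data for a window yet
            center.append(None)
            upper.append(None)
            lower.append(None)
            continue
        window = y[t - n + 1 : t + 1]      # the n most recent observations
        mave = sum(window) / n
        var = sum((v - mave) ** 2 for v in window) / (n - 1)
        sd = sqrt(var)
        center.append(mave)
        upper.append(mave + k * sd)
        lower.append(mave - k * sd)
    return center, upper, lower

# Toy series (hypothetical data, for illustration only).
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cl, ub, lb = bollinger_bands(series, n=3, k=2.0)
```

With a window of three points the first band values appear at the third observation, which mirrors the statement that the first usable time is $t^* = n$.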
The use of the moving average for the center line has generally been viewed by market technicians as
a low pass filter for the time series being monitored. By calculating the moving average of the actual
series and plotting the resulting series, the high frequency component is eliminated from the original
series and only the trend remains. The upper and lower band calculations use this trend as an input
and they are useful for developing indicator rules such as “when the price crosses BBupper and
the RSI is above X, this indicates that the price is expected to ...”. As far as the origin of the
dispersion component is concerned, many different types of bands were experimented with before
John Bollinger came up with the idea of using the sample standard deviation, $\hat{\sigma}$, as the measure of
the current dispersion of the time series. The details of his discovery are captured quite vividly in
[1] and are left for the reader to explore.
The goal of this study is to make connections between Bollinger Bands and time series models
and show how these connections can lead to useful statistical insights. The first connection shows
that although Bollinger Bands are generally viewed as a somewhat ad-hoc algorithm that generates
outputs used as trading indicators, they actually have strong statistical foundations. The second
connection provides an alternative way of viewing the Bollinger Band pairs trading algorithm and
leads to an interesting Bollinger Band variant.
2 Bollinger Band Literature
The literature on Bollinger Band simulations is quite vast. Butler and Kazakov [4]
apply swarm optimization techniques to search for optimal Bollinger Band parameters.
The optimizations are done with respect to the profit and loss of Bollinger Band pairs trading
strategies.² Similarly, Ni and Zhang [5] use genetic algorithms to find the optimal Bollinger Band
window length and band width jointly. The research regarding variations on Bollinger Bands is less
plentiful. Oleksiv [6] uses different algorithms for the construction of the bands including kriging, a
method more common in geostatistics. Chande [7] uses an exponentially weighted moving average as
a low pass filter for prices and adjusts the smoothing parameter dynamically based on the volatility
of prices. Finally, Tilley [8] combines the moving average with the concept of support and resistance
in order to switch between emerging markets funds and small cap funds to and from the S&P 500.
The rest of this article is organized as follows. In Section 3 we demonstrate an equivalence between
Bollinger Bands and the rolling regression time series model. In Section 4 we describe how Bollinger
Bands can be used in pairs trading as a mechanism for capturing the mean reversion behavior
expected in the asset pair being traded. In Section 5, we make a connection between Bollinger Bands
and a state space model called the random walk plus noise model. This connection provides another
approximate statistical framework for Bollinger Bands and leads to a variant of Bollinger Bands
called Fixed Forecast Maximum Duration Bands. We then construct a pairs trading simulation
in order to compare the out of sample performance of the Bollinger Bands pairs trading strategy
(BBPT) and the Fixed Forecast Maximum Duration pairs trading strategy (FFMDPT). Finally, in
Section 6, we summarize our findings and provide suggestions for future research areas.
²The application of Bollinger Bands to pairs trading will be discussed in detail in Section 4.
3 Bollinger Bands as a Rolling Regression Time Series Model
In order to develop a connection between Bollinger Bands and the rolling regression time series
model, we first need to describe the latter in precise detail.³
3.1 The Rolling Time Series Regression Model
The rolling regression time series model is commonly used when model coefficients are expected to
change over time. Following the notation of Zivot and Wang [10], the rolling regression time series
model using an n day moving window is shown below:
$$y_t(n) = X_t(n)\beta_t(n) + \epsilon_t(n), \qquad t = n, \ldots, T \tag{3}$$
Here $y_t(n)$ is an $(n \times 1)$ vector of independent observations on the response, $X_t(n)$ is an $(n \times k)$
matrix of explanatory variables and finally $\epsilon_t(n)$ is an $(n \times 1)$ vector of error terms, each being
$N(0, \sigma^2_t)$. Note that $(n)$ indicates that the $n$ observations in $y_t(n)$ and $X_t(n)$ are the $n$
most recent values from time $(t-n+1)$ to $t$. Clearly we need to assume that $n > k$.
It is important to understand what is being assumed by the use of the $(n)$ notation. First of all,
although the new observation at time $t$ is univariate, at time $t$, the vector $y_t(n)$ of observations
from $t-n+1$ to $t$ is used to estimate $\beta_t(n)$. Therefore, we need to differentiate between the
new univariate observation at $t$ and the $n$-dimensional vector of observations at $t$, $y_t(n)$. In what
follows, we always refer to the $n \times 1$ vector at some $t = t^*$ as $\text{vecobs}_{t^*}$ and the new univariate
observation seen at $t = t^*$ as $\text{uniobs}_{t^*}$.
The rolling regression estimation algorithm proceeds in the following manner: Initially, we start
out at $t^* = n$ because that is the first point at which we can construct an estimate of $\beta$. We observe
$\text{vecobs}_{t^*=n}$, which is the $n$-dimensional vector of the first $n$ observations in the series. Note that
$\text{vecobs}_{t^*}$ has a regression model associated with it, namely, $\text{vecobs}_{t^*} = X_{t^*}\beta_{t^*} + \epsilon_{t^*}$ with the error
term $\epsilon_{t^*}$ assumed to be independent (i.e. zeros off the diagonal of its covariance matrix). So, $\text{vecobs}_{t^*}$
is observed and the coefficients, $\beta_{t^*}$, in the model are then estimated. Next, time proceeds from
$t = t^*$ to $t = t^* + 1 = n + 1$ and a new observation $\text{vecobs}_{t^*+1}$ is observed. But this supposedly new
observation is constructed in the following manner: $\text{uniobs}_{t^*-(n-1)}$ is removed from $\text{vecobs}_{t^*}$ and
the new $\text{uniobs}_{t^*+1}$ is observed and added to the front of the $\text{vecobs}_{t^*}$ observation. This modified
$\text{vecobs}_{t^*}$ vector is now $\text{vecobs}_{t^*+1}$ and is the “new” $n$-dimensional observation at $t = t^* + 1$. Again,
$\text{vecobs}_{t^*+1}$ has a regression model associated with it, namely, $y_{t^*+1} = X_{t^*+1}\beta_{t^*+1} + \epsilon_{t^*+1}$. The error
term $\epsilon_{t^*+1}$ is again assumed to be independent. So, once $\text{vecobs}_{t^*+1}$ is observed, the coefficients in
the associated regression model are estimated, thereby obtaining a new set of $\beta_t$ coefficients at time
$t = t^* + 1$. This process repeats itself at $t = n + 2$, $n + 3$, and so on until
$t = T$.
³The originator of the rolling regression model is not known by the author but its popularity is most likely due to
Fama and MacBeth [9].
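The window-by-window estimation described above can be sketched as follows. This is a minimal illustration rather than the author's implementation: it assumes a single explanatory variable plus an intercept (so $k = 2$), and the helper names `ols_line` and `rolling_regression` as well as the toy data are hypothetical.

```python
def ols_line(x, y):
    """Closed-form least squares for y = intercept + slope * x.

    Assumes x is not constant within the window (otherwise sxx is zero).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope

def rolling_regression(x, y, n):
    """One (intercept, slope) estimate per window ending at t = n, ..., T.

    Each window drops the oldest observation and appends the newest one,
    and is then fitted as if it were a completely new data set.
    """
    estimates = []
    for end in range(n, len(y) + 1):
        estimates.append(ols_line(x[end - n : end], y[end - n : end]))
    return estimates

# Toy data lying exactly on y = 1 + 2x (hypothetical, for illustration only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
betas = rolling_regression(x, y, n=4)
```

Adjacent windows here share three of their four observations, which is exactly the overlap that the independence assumption discussed next ignores.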
Note that there is a serious statistical problem with the model represented in equation (3). Clearly
the response $\text{vecobs}_{t^*}$ is highly correlated with the response $\text{vecobs}_{t^*+1}$ because of how these
observations are constructed. In fact, any two $n$-dimensional observations $\text{vecobs}_{t'}$ and $\text{vecobs}_{t''}$ constructed
less than $n$ periods apart will be correlated because they will contain common observations due to
the rolling window construction. Now, even though this correlation exists, the rolling regression
methodology still assumes that each regression model has independent error terms and therefore
independent $\text{vecobs}_t$, $t = n, \ldots, T$. We should define the assumption more rigorously. Formally,
let us assume that the probability at time $t$ of $\text{vecobs}_t$ (i.e. $Y_t$) possesses the following property:
$$\text{Prob}(Y_t = y_t \mid B_t) = \text{Prob}(Y_t = y_t) \tag{4}$$
where
$$B_t = \{Y_{t'} : t' = 1, \ldots, t-1\}$$
This assumption implies that the likelihood of any $\text{vecobs}_t$ is independent of the previous $\text{vecobs}_t$
observations even though this is clearly not true. Why is this assumption required? Often it is
believed that, due to structural changes or simply noise, the $\beta_t$ parameter is expected to change
over time. Yet, at the same time, one also knows with certainty that estimates of $\beta_t$ that are
close to each other in time are highly positively correlated. Therefore, the only way to generate
correlation in the estimates, allow them to change over time and yet keep the model analytically
tractable without resorting to more complex techniques is to make this independence assumption.
Rather than imposing a model for $\beta_t$ and allowing the data to speak for the new estimate of
$\beta_t$ each time there is a new data point, the assumption is that, at each time $t$, a totally new
$n$-dimensional data point is observed. In essence, from a time series modelling standpoint, the rolling
window construction together with the independence assumption is an ad-hoc way of dealing with
the fact that dynamics are not being specified for $\beta_t$. The expected correlation of the $\beta_t$ estimates
is achieved through the use of the constant overlap in the adjacent $n$-dimensional observations.
Intuitively, a larger window will generate more highly correlated estimates than a shorter window.
The statistical flaw of the rolling regression time series model is that the independence assumption
clearly does not hold so the estimates are biased with respect to the true underlying data generating process (DGP).
Conversely, the well known time varying regression-Kalman filter type model, also quite popular
in econometrics, is more complex than the rolling regression model mathematically but has the
advantage that only one model is assumed from the start and the dynamics for the beta coefficients
are specified directly. Consequently there is no need for the ad-hoc construction of a rolling window.
In the Kalman filter framework, when a new $\text{uniobs}_{t^*}$ is observed at a new time $t = t^*$, the current
model estimate, $\beta_{t^*-1}$, is updated and becomes the new estimate at $t^*$. This is probably why the
rolling regression time series model is often referred to as the “poor man’s time varying coefficient
regression model”. More details on the Kalman filtering approach can be found in [11] and [12] and
it will also be discussed in more detail in Section 5.4.
Below, Figure 1 displays the relationship between adjacent windows in the rolling time series regression
model at $t = t^*$ (red line segment) and $t = t^* + 1$ (green line segment). Figure 2 displays what
is assumed to be happening with the same adjacent windows.
Figure 1: The rolling regression windows at $t = t^*$ and $t = t^* + 1$ contain common observations.
[Diagram: the window at $t = t^*$ covers $y_{t^*-n+1}, \ldots, y_{t^*}$; the window at $t = t^* + 1$ covers $y_{t^*-n+2}, \ldots, y_{t^*+1}$.]
Figure 2: Although Figure 1 on page 6 clearly shows the contrary, the assumption in the rolling
regression model is that adjacent windows at $t = t^*$ and $t = t^* + 1$ do not contain common observations.
[Diagram: the window at $t = t^*$ covers $y'_{t^*-n+1}, \ldots, y'_{t^*}$; the window at $t = t^* + 1$ covers $y''_{t^*-n+2}, \ldots, y''_{t^*+1}$.]
3.2 The Equivalence of Bollinger Bands and the Rolling Regression Time
Series Model
Let us consider the intercept only version of the rolling regression time series model represented by
(3) so that $X_t(n)$ is a vector of ones and $\beta_t(n)$ is a scalar. The rolling regression model in (3)
then becomes:
$$y_t(n) = \beta_t(n) X_t(n) + \epsilon_t(n), \qquad t = n, \ldots, T \tag{5}$$
Using classical least squares results, it is straightforward to show that the estimates of this model
at each time $t^*$ are:
$$\hat{\beta}_{t^*} = \sum_{t=t^*-n+1}^{t^*} y_t / n, \qquad t^* = n, \ldots, T \tag{6}$$
and
$$\hat{\sigma}^2_{t^*} = \sum_{t=t^*-n+1}^{t^*} (y_t - \hat{\beta}_{t^*})^2 / (n-1), \qquad t^* = n, \ldots, T \tag{7}$$
Note that if we equate $\hat{\beta}_{t^*}$ and $\text{mave}_{t^*}$, then the expressions for the estimates in equations (6) and
(7) are exactly the same as the Bollinger Band components in equations (1) and (2).⁴ Therefore,
the Bollinger Band algorithm results in estimates that are identical to those of an intercept only
rolling regression model where the intercept is the center line and the residual standard deviation is
the Bollinger Band standard deviation.
It is also straightforward to show that, for the model in (5), a $100(1-\alpha)\%$ confidence interval for the
future one step ahead response, also referred to in the statistics literature as a prediction interval
[13], is the following:
$$\hat{\beta}_{t^*} \pm t^{\alpha/2}_{n-1} \times \hat{\sigma}_{t^*} \sqrt{1 + 1/n}$$
Note that it is possible to approximate this $100(1-\alpha)\%$ prediction interval in the following way.
For window values of $n$ between 10 and 50, $\sqrt{1 + 1/n}$ ranges between 1.05 and 1.01. Therefore,
for values of $n$ between 10 and 50, the approximate $100(1-\alpha)\%$ prediction interval becomes:
$$\hat{\beta}_{t^*} \pm t^{\alpha/2}_{n-1} \times \hat{\sigma}_{t^*} \tag{8}$$
But notice that, if $\hat{\beta}_{t^*}$ is replaced by $\text{mave}_{t^*}$ and $t^{\alpha/2}_{n-1}$ is replaced by $k$, then (8) reduces to
$$\text{BBupper}_{t^*} = \text{mave}_{t^*} + k\hat{\sigma}_{t^*}$$
and
$$\text{BBlower}_{t^*} = \text{mave}_{t^*} - k\hat{\sigma}_{t^*}$$
Therefore, assuming that $k$ is chosen appropriately, the upper and lower bands in Bollinger Bands
are approximately equivalent to the prediction intervals constructed from an intercept only rolling
regression model. We should point out that the approximation does depend on the order of magnitude
of $\sigma$ and will get worse as $\sigma$ increases. At the same time, it is always possible to avoid the
approximation by including the $\sqrt{1 + 1/n}$ factor in the construction of BBupper and BBlower and obtain
the exact prediction interval associated with the intercept only rolling regression time series model.
In summary, the Bollinger Band methodology can be viewed as an intercept only rolling regression
time series model with the center line and the standard deviation being the mean and residual
standard deviation from the rolling regression model respectively. The BBupper and BBlower
components of the Bollinger Bands will be approximately the same as the prediction intervals of the
intercept only rolling regression model as long as $k$ is chosen appropriately.
⁴In fact, the R [14] command: rollapply(inseries, width = ndays, FUN = function(y) summary(lm(y ~ 1))$sigma,
align = "right", fill = TRUE) and the R command: sqrt(rollapply(inseries, ndays, var, align = "right", fill = TRUE))
will give identical results given the same series "inseries".
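The equivalence can be checked numerically on a single window. The sketch below is illustrative only: `intercept_only_fit` is a hypothetical helper, the window values are made up, and the t quantile is left out because the Python standard library does not provide one, so only the $\sqrt{1 + 1/n}$ factor that separates the exact interval from the approximation is computed.

```python
from math import sqrt
from statistics import mean, stdev

def intercept_only_fit(window):
    """Least squares fit of y = beta + error on one window.

    Returns (beta_hat, residual standard deviation) using n - 1
    residual degrees of freedom, as in equations (6) and (7).
    """
    n = len(window)
    beta_hat = sum(window) / n             # minimises sum((y - beta)^2)
    rss = sum((v - beta_hat) ** 2 for v in window)
    return beta_hat, sqrt(rss / (n - 1))

# One hypothetical window of n = 10 observations.
window = [1.2, 0.7, 1.9, 1.4, 0.9, 1.6, 1.1, 1.3, 0.8, 1.5]
beta_hat, sigma_hat = intercept_only_fit(window)

# Ratio of the exact prediction half-width to the approximate one in (8).
exact_factor = {n: sqrt(1 + 1 / n) for n in (10, 20, 50)}
```

Here `beta_hat` coincides with the moving average and `sigma_hat` with the sample standard deviation of the window, while `exact_factor` shows how little is lost by dropping the square root term for typical window lengths.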
4 Bollinger Bands and Pairs Trading
Bollinger Band components are usually used with various other indicators in order to decide whether
an asset is declining or trending. The one quantitative strategy where the Bollinger Bands are often
used solely on their own is that of pairs trading. In pairs trading [15], where asset Z and asset X are
the asset pair being traded, the quantity $y_t = \ln(P_z/P_x)_t$ is tracked over time as a time series. It is
assumed that $y_t$ is weakly stationary and therefore mean reverting.⁵ Therefore, when the quantity
$y_t$ gets too high or too low, the expectation is that it will eventually return to its unconditional mean
$u_t$. Bollinger Bands are commonly used as a tool for exploiting this reversion behavior. Recall that
BBupper and BBlower can be constructed from the Bollinger Band algorithm using the series $y_t$.
Bollinger Bands exploit reversion in the following manner: If $y_t$ touches
or crosses BBupper (BBlower) at say $t^*$, then this is viewed as a signal that, after $t^*$, $y_t$ is expected
to decrease (increase) sometime in the relatively near future so a short (long) position is taken in
the pair at $t^* + 1$.⁶ Since $\text{mave}_t$ is the rolling mean estimate of the $y_t$ series, the crossing of $y_t$ back
through $\text{mave}_{t^{**}}$ at some later time $t^{**}$ is used to indicate that the series $y_t$ has completely reverted
and the position is closed out.
For example, suppose that the series $y_t = \ln(P_z/P_x)_t$ is tracked over time and that it crosses
$\text{BBlower}_t$ at time $t = t^*$. Then, at time $t^* + 1$, a long position is taken in asset Z and a short
position, equal in dollars to the amount taken in asset Z, is taken in asset X. This position is held
until $y_t$ eventually crosses through the rolling mean $\text{mave}_{t^{**}}$ at some time $t^{**}$ in the future. The
overall position entered into at $t^* + 1$ is then closed out by selling the long position in asset Z and
buying back the short position in asset X. Conversely, suppose that over the course of time $y_t$ crosses
through BBupper rather than BBlower. Then, since we expect $y_t$ to decrease in the near future,
we would go short asset Z and long asset X. Then, when the $y_t$ process crosses back through the
moving average, the overall position is closed out. For convenience going forward, we
refer to the Bollinger Band pairs trading strategy as the BBPT strategy.
⁵The weak stationarity assumption is equivalent to assuming that the mean, $u_t$, of the process is constant. For
this to be the case, mean reversion has to exist.
⁶A long position is defined as being long the asset that is in the numerator of the log price ratio and short the
asset that is in the denominator. A short position is defined analogously.
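The entry and exit rules just described can be sketched as a small state machine. This is a hedged illustration, not the simulation code used in the paper: the function names `rolling_bands` and `bbpt_trades`, the convention that a trade cannot exit on its entry step, and the toy series are all assumptions made here.

```python
from math import sqrt

def rolling_bands(y, n, k):
    """(mave, upper, lower) for each index with a full window, else None."""
    out = []
    for t in range(len(y)):
        if t + 1 < n:
            out.append(None)
            continue
        w = y[t - n + 1 : t + 1]
        m = sum(w) / n
        sd = sqrt(sum((v - m) ** 2 for v in w) / (n - 1))
        out.append((m, m + k * sd, m - k * sd))
    return out

def bbpt_trades(y, n=20, k=2.0):
    """Completed trades as (side, entry_index, exit_index, log_return).

    side is +1 for a long pair position and -1 for a short one.
    A band touch at t triggers an entry at t + 1; the position is
    closed when y crosses back through the moving average.
    """
    bands = rolling_bands(y, n, k)
    trades = []
    side, entry, pending = 0, None, 0
    for t in range(len(y)):
        if bands[t] is None:
            continue
        mave, upper, lower = bands[t]
        if side == 0:
            if pending != 0:               # act one step after the signal
                side, entry, pending = pending, t, 0
            elif y[t] >= upper:
                pending = -1               # expect y to fall: go short
            elif y[t] <= lower:
                pending = +1               # expect y to rise: go long
        else:
            reverted = y[t] <= mave if side == -1 else y[t] >= mave
            if reverted:
                trades.append((side, entry, t, side * (y[t] - y[entry])))
                side, entry = 0, None
    return trades

# Hypothetical log price ratio with one spike above the upper band.
y = [0.0, 0.1, 0.0, 0.1, 0.0, 1.0, 0.9, 0.3, 0.2]
trades = bbpt_trades(y, n=4, k=1.0)
```

In the toy series the spike at index 5 touches the upper band, a short pair position is opened at index 6 and it is closed at index 7 once the series falls back through the moving average.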
4.1 The S&P 500 and the Nikkei: A Pairs Trading Example
In order to illustrate the actual BBPT strategy using real data, we take the Standard and Poor’s
500 Index prices as asset Z and the Nikkei Index prices as asset X and construct $y_t$, the difference
between their log prices over the year 2004.⁷ The Bollinger Bands generated by $y_t$ are based on
a rolling window size of $n = 20$ and a bandwidth multiplier of $k = 2$. The pair trades generated
during 2004 using these parameters are shown in Figure 7 in Appendix B. Each line segment in the
figure, whether it is red or green, represents the entry and exit of one trade which is initially generated
by the touching or crossing of BBupper or BBlower by $y_t$. We will discuss the first, fifth and
sixth trades in detail because these particular trades are representative of the typical behavior of
the BBPT strategy.
Consider the first green line segment which represents the first trade. BBupper was crossed so
the action taken was to go short the paired asset by shorting the Standard and Poor’s Index and
going long the Nikkei index. The time of the entry of the short position is indicated by the arrow (an
arrow always denotes the entry point), which is red to denote that the position was a short position.
Clearly $y_t$ reverts to the moving average very quickly and the position is then closed out. Since the
moving average was crossed from above, this indicates that there was a profit from this short trade
so the line segment is green. Finally, the diamond denotes the exit point. The diamond is red only
because it is always the same color as its associated arrow, which was red because a short position
was taken.
The use of line segments, arrows and diamonds along with their respective colors allows for a large
amount of information to be conveyed in Figure 7. Also, because $y_t$ is defined as $y_t = \ln(P_z/P_x)_t$,
the return from any long pair trade⁸ entered into at $t^*$ and exited at $t^{**}$ is equal to
$\ln(P_z/P_x)_{t^{**}} - \ln(P_z/P_x)_{t^*}$, which is conveniently equal to the vertical distance between the arrow
and the diamond of the line segment. This relation is useful because one can then easily identify
where there was a large positive return or large negative return. Green line segments with large
vertical distances between their endpoints are signs of large positive returns. Red line segments with
large vertical distances between their endpoints represent large negative returns.
Next, we consider the fifth line segment which represents the fifth trade. Just as was the case with
the first trade, this is a short where the trade is short the S&P and long the Nikkei. But unlike
the first trade, the moving average is not crossed by $y_t$ until it is above the original entry point of
the trade. Therefore, the trade results in a loss and consequently the color of the line segment is
red rather than green. The sixth trade in the figure is a long trade because BBlower was crossed
but this trade also results in a loss because the $y_t$ series did not cross through the moving average
quickly enough. It crossed the moving average at a point lower vertically than where $y_t$ was at the
entry point. $y_t$ was expected to increase after entry and cross the moving average from below rather
than above but this did not happen. The vertical distance between the entry point and exit point
represents the negative return of the trade. Notice that, for this trade, the endpoints of the line
segment are green indicating that the trade was long the S&P and short the Nikkei.⁹
⁷It is important to realize that in an actual trading scenario one would want to test that $y_t$ was cointegrated over
some historical period immediately preceding the trading period 2004.
⁸The return to a short trade is $-1.0 \times (\ln(P_z/P_x)_{t^{**}} - \ln(P_z/P_x)_{t^*})$.
Since the horizontal axis of the plot in Figure 7 represents time, the duration of any trade is simply
the horizontal distance between when the trade opens (i.e. the arrow) and closes (i.e. the diamond).
One interesting aspect of the plot in Figure 7 that may not be obvious due to the scale of the axes is
that the durations of the winning trades (i.e. green line segments) are consistently shorter than the
durations of the losing trades (i.e. red line segments). In fact the average duration of the winning
trades is 8.6 and the average duration of the losing trades is 20.5. This duration behavior is not
just specific to the use of the parameter values $n = 20$ and $k = 2$. Consider Figure 8 in Appendix
C. Each of the eight plots represents the same pairs trading strategy simulated over different time
periods using various combinations of the values of the window size and the multiplier. The rolling
window size parameter $n$ takes on the values of 20 and 30 while the
width multiplier $k$ is either 1 or 2. The plots in Figure 8 clearly indicate that, in a BBPT strategy,
the average duration of winning trades is consistently shorter than the average duration of losing
trades. Later on a more fundamental result will be proven concerning the return-duration behavior
in the BBPT strategy. This result is a key component of the Fixed Forecast Maximum Duration
Bands pairs trading (FFMDPT) strategy which is discussed in the following section.
⁹Note that the weighted sum of the returns of all the trades in the figure is referred to as the return of the
strategy over 2004 where the weights are proportional to the dollars allocated to each trade. For our purposes, we
assume that all trades are given the same portfolio weight so that each trade weight = 1/number of trades.
5 Fixed Forecast Maximum Duration Bands
The use of Bollinger Bands in pairs trading goes back to the middle of the 1980’s. The algorithm’s
popularity alone suggests that it has been at least reasonably successful in capturing mean reversion
in paired assets with cointegrated price behavior.¹⁰ In this section, we create a series of successive
links between various well known time series models which eventually lead back to Bollinger Bands.
This successive linking will lead to the development of a variant of Bollinger Bands called Fixed
Forecast Maximum Duration Bands. But, in order to develop this variant, it is necessary to introduce
the various time series models and show how they are related. First we introduce the concept of
exponential smoothing and its various properties. Next, we describe the Kalman filtering approach
in some detail. Finally, we introduce a particularly simple Kalman filter called the random walk
plus noise and make a connection between it and Bollinger Bands.
5.1 Introduction to Simple Exponential Smoothing
A well known forecasting method originally developed by Brown in the 1950’s [16, 17] is that of
simple exponential smoothing (SES). The method is appropriate when it is believed that the mean of
the series might be changing over time but there is no trend or seasonality evident in the series.
SES takes the forecast for the previous period and adjusts it using the
empirical forecast error. That is, the forecast for the next period is
$$\hat{y}_{t+1} = \hat{y}_t + \lambda(y_t - \hat{y}_t) \tag{9}$$
The value of the parameter $\lambda$ is restricted to be between 0 and 1 and is either determined empirically or
known a priori based on the forecaster’s previous experience. Of course, monitoring of the parameter
$\lambda$ is critical because the behavior of the series can change over time. We can re-write the forecast in
the following manner in order to gain insight into what exponential smoothing is really doing:
$$\hat{y}_{t+1} = \lambda y_t + (1-\lambda)\hat{y}_t \tag{10}$$
By examining (10), we can see that exponential smoothing is a model in which the forecast $\hat{y}_{t+1}$
is based on weighting the most recent observation $y_t$ with a weight equal to $\lambda$ and the previous
forecast with a weight equal to $(1-\lambda)$. Following Hyndman et al [18], the implications of exponential
smoothing can be seen more easily if $\hat{y}_{t+1}$ is expanded by replacing $\hat{y}_t$ with its components as follows:
$$\begin{aligned}
\hat{y}_{t+1} &= \lambda y_t + (1-\lambda)[\lambda y_{t-1} + (1-\lambda)\hat{y}_{t-1}] \\
&= \lambda y_t + \lambda(1-\lambda) y_{t-1} + (1-\lambda)^2 \hat{y}_{t-1}
\end{aligned}$$
If this substitution is repeated by replacing $\hat{y}_{t-1}$ with its components, $\hat{y}_{t-2}$ with its components,
and so on, the relation becomes:
$$\begin{aligned}
\hat{y}_{t+1} = {} & \lambda y_t + \lambda(1-\lambda) y_{t-1} + \lambda(1-\lambda)^2 y_{t-2} + \lambda(1-\lambda)^3 y_{t-3} \\
& + \lambda(1-\lambda)^4 y_{t-4} + \cdots + \lambda(1-\lambda)^{t-1} y_1 + (1-\lambda)^t \hat{y}_1
\end{aligned} \tag{11}$$
Therefore, $\hat{y}_{t+1}$ represents a weighted moving average of all past observations with the weights
decreasing exponentially, giving rise to the term “exponential” smoothing. Note that there is an
initialization issue in that we need an initial value for $\hat{y}_1$. Usually, this value is taken to be the first
observation or some proportion of it and, if the series is long enough, the choice of this value should
have a negligible effect on the predictions. For more elaborate methods for choosing the initial value,
one should refer to [18].
¹⁰Identifying a mean reverting pair of tradeable assets is a separate issue and will not be discussed here.
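A short numerical check makes the link between the recursion in (10) and the expanded form in (11) concrete. The sketch below is illustrative only; the function names, the choice of the first observation as the initial forecast and the toy series are assumptions made here.

```python
def ses_forecasts(y, lam, y1_hat):
    """Forecasts [yhat_1, ..., yhat_{T+1}] from the recursion in (10)."""
    f = [y1_hat]
    for obs in y:
        f.append(lam * obs + (1 - lam) * f[-1])
    return f

def ses_weighted_sum(y, lam, y1_hat):
    """Forecast of y_{T+1} from the expanded form in (11).

    The observation of age j (that is, y_{T-j}) gets weight lam * (1 - lam)**j
    and the initial forecast gets weight (1 - lam)**T.
    """
    T = len(y)
    total = (1 - lam) ** T * y1_hat
    for j in range(T):
        total += lam * (1 - lam) ** j * y[T - 1 - j]
    return total

# Hypothetical series; the initial forecast is set to the first observation.
y = [3.0, 4.5, 4.0, 5.2, 4.8]
lam = 0.3
recursive = ses_forecasts(y, lam, y1_hat=y[0])[-1]
expanded = ses_weighted_sum(y, lam, y1_hat=y[0])
```

The two numbers agree up to floating point rounding, which is the content of equation (11).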
5.2 Simple Exponential Smoothing and the ARIMA(0,1,1)
The following discussion assumes that the reader has some familiarity with the ARIMA time-series
modelling approach of Box and Jenkins. If this is not the case, then one is referred to [19] for
a detailed description. First of all, it is well known that a forecasting equivalence exists between
particular exponential smoothing models and the mapped ARIMA model [18, 22]. In fact, Muth
[20] was the first of many authors to prove that SES is optimal for the ARIMA(0,1,1) process:
$$(1 - B)y_t = (1 - \theta B)\epsilon_t$$
which can be re-written as
$$y_t = y_{t-1} - \theta\epsilon_{t-1} + \epsilon_t \tag{12}$$
Note that since the sign of $\theta$ is arbitrary, (12) can be re-written as
$$y_t = y_{t-1} + \theta\epsilon_{t-1} + \epsilon_t \tag{13}$$
Also, in order for the ARIMA(0,1,1) model to be invertible, it is necessary to restrict $\theta$ so that
$\theta \in (-1, 1)$. By SES being optimal, what is meant is that, if the parameter $\theta$ in the ARIMA(0,1,1)
process is known, then the SES method with parameter $\lambda = (1 - \theta)$ will give the same forecasts
as the ARIMA(0,1,1) model. Unfortunately Muth’s proof [20] is not particularly transparent so we
provide a simpler proof here. First, we write the ARIMA(0,1,1) model in the form of (12), one step
ahead:
$$y_{t+1} = y_t - \theta\epsilon_t + \epsilon_{t+1} \tag{14}$$
Now, assuming that $\theta$ is known, if one wanted to calculate the one step ahead forecast using the
ARIMA(0,1,1), the expectation of $\epsilon_{t+1}$ is zero so the forecasting equation becomes
$$\hat{y}_{t+1} = y_t - \theta\epsilon_t$$
Therefore, generating the forecast $\hat{y}_{t+1}$ requires estimating $\epsilon_t$ using $\hat{\epsilon}_t$. The estimate is the one
step ahead forecast error, $\hat{\epsilon}_t = (y_t - \hat{y}_t)$, so the forecast becomes
$$\begin{aligned}
\hat{y}_{t+1} &= y_t - \theta(y_t - \hat{y}_t) \\
&= (1 - \theta)y_t + \theta\hat{y}_t
\end{aligned}$$
But this is equation (10) for simple exponential smoothing with parameter $\lambda = (1 - \theta)$. Therefore,
we have shown that the forecast of the ARIMA(0,1,1) model with parameter $\theta$ is identical to the
forecast for SES with parameter $(1 - \theta)$.
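The forecasting equivalence can also be checked by simulation. The sketch below is a hedged illustration: it simulates the ARIMA(0,1,1) process in the form of equation (12), treats $\theta$ as known, and starts both forecast recursions from the same initial value. The function names, the seed and the value of $\theta$ are assumptions made here.

```python
import random

def arima011_forecasts(y, theta, y1_hat):
    """One step ahead forecasts for y_t = y_{t-1} - theta*eps_{t-1} + eps_t.

    The unknown innovation is replaced by the previous forecast error
    and the future innovation by its expectation of zero.
    """
    f = [y1_hat]
    for obs in y:
        eps_hat = obs - f[-1]
        f.append(obs - theta * eps_hat)
    return f

def ses_forecasts(y, lam, y1_hat):
    """Simple exponential smoothing forecasts with parameter lam."""
    f = [y1_hat]
    for obs in y:
        f.append(lam * obs + (1 - lam) * f[-1])
    return f

# Simulate a hypothetical ARIMA(0,1,1) path with known theta.
random.seed(0)
theta = 0.6
eps = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [eps[0]]
for t in range(1, len(eps)):
    y.append(y[-1] - theta * eps[t - 1] + eps[t])

a = arima011_forecasts(y, theta, y1_hat=y[0])
s = ses_forecasts(y, 1 - theta, y1_hat=y[0])
max_gap = max(abs(u - v) for u, v in zip(a, s))
```

The largest difference between the two forecast paths, `max_gap`, is at the level of floating point rounding, as expected when $\lambda = 1 - \theta$.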
5.3 The Weighted Age In Simple Exponential Smoothing
It should be emphasized that SES is not a time series model per se but rather a forecasting method
because there is no data generating process (DGP) underlying SES. Also, because of the invertibility
condition in the ARIMA(0,1,1), $\theta$ is restricted to lie between $-1$ and $+1$ which implies that $\lambda$ in SES
is restricted to be between 0 and 2. In practical applications of SES, the $\lambda$ parameter is generally
chosen to be between 0 and 1 in order to ensure that the weight given to past observations decreases
as the observations go further back in time. In fact, given $\lambda$, we can easily calculate the weighted
average age of the observations used in the current forecast of SES. Notice that, in (11), the weight
given to an observation $k$ periods ago, $y_{t-k}$, is $\lambda(1-\lambda)^k$. Therefore, the weighted average age of the
observations going into the current SES forecast at any time $t$ is:
$$\begin{aligned}
\bar{k} &= 0 \cdot \lambda + 1 \cdot \lambda(1-\lambda) + 2 \cdot \lambda(1-\lambda)^2 + \cdots \\
&= \lambda \sum_{k=0}^{\infty} k(1-\lambda)^k \\
&= \frac{1-\lambda}{\lambda}
\end{aligned}$$
A similar “older data gets less weight” concept exists for the moving average forecast used in Bollinger
Bands except that the decrease is more abrupt. In the case of the moving average, the past observations
that are of age $n-1$ periods or less are weighted equally with weight $1/n$. Any observations
older than $n-1$ periods get a weight of zero. Therefore, for the moving average, we have:
$$\bar{k} = \frac{0 + 1 + 2 + \cdots + (n-1)}{n} = \frac{n-1}{2}$$
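Both average ages are easy to confirm numerically. In the sketch below the infinite sum is truncated at a large number of terms, which is an approximation; the function names and the values of $\lambda$ and $n$ are chosen only for illustration.

```python
def ses_average_age(lam, terms=10_000):
    """Truncated version of lam * sum_k k * (1 - lam)**k."""
    return sum(k * lam * (1 - lam) ** k for k in range(terms))

def moving_average_age(n):
    """Average age when ages 0, ..., n - 1 each get weight 1/n."""
    return sum(range(n)) / n

lam = 0.2
age_ses = ses_average_age(lam)     # closed form: (1 - lam) / lam
age_ma = moving_average_age(20)    # closed form: (20 - 1) / 2
```

For $\lambda = 0.2$ the truncated sum is essentially $(1-\lambda)/\lambda = 4$, and for $n = 20$ the moving average age is $(n-1)/2 = 9.5$.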
An interesting question is whether there exists a parameter $\lambda$ in SES that gives forecasts similar to those of the $n$ period moving average in Bollinger Bands. Brown [17] reasoned that, if the average age of the observations used in the SES forecast and the moving average forecast are the same, then one would expect those models to give somewhat similar forecasts. Therefore, one can set the weighted age of the observations used in the current forecast of SES equal to the weighted age of the observations used in the moving average and solve for $\lambda$:
$$\frac{1-\lambda}{\lambda} = \frac{n-1}{2} \;\Longrightarrow\; \lambda = \frac{2}{n+1} \tag{15}$$
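As a quick check of relation (15), the sketch below converts between the window size $n$ and the smoothing parameter $\lambda$ and confirms that the two average ages coincide. The function names are hypothetical and the window sizes are chosen only for illustration.

```python
def lambda_from_window(n):
    """Relation (15): smoothing parameter matched to an n-period moving average."""
    return 2.0 / (n + 1.0)


def window_from_lambda(lam):
    """Inverse of relation (15); the result need not be an integer."""
    return 2.0 / lam - 1.0


def moving_average_age(n):
    """Average age when ages 0, 1, ..., n-1 each receive weight 1/n."""
    return sum(range(n)) / n


def ses_age(lam):
    """Weighted average age of the observations in an SES forecast."""
    return (1.0 - lam) / lam


for n in (10, 20, 50):                       # illustrative window sizes
    lam = lambda_from_window(n)
    print(n, round(lam, 4), moving_average_age(n), round(ses_age(lam), 4))
```

For each window size the last two printed columns should match, since $\lambda$ was chosen precisely to equate them.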
Figure 9 in Appendix D shows the Bollinger Bands plotted along with the exponentially weighted moving average and its prediction intervals¹¹ when the weighted age relation, $\lambda = 2/(n+1)$, is used. The figure shows that the approximation is quite reasonable, particularly for the center line. This means that by using the relation $\lambda = 2/(n+1)$, the moving average associated with Bollinger Bands will provide a satisfactory approximation to the exponential smoothing model with parameter $\lambda$. The connections made so far are summarized below:
1. Exponential smoothing and the ARIMA(0,1,1) are equivalent for $\lambda = 1-\theta$
2. The moving average is well approximated by exponential smoothing for $\lambda = 2/(n+1)$
3. 1 and 2 imply that the moving average is well approximated by an ARIMA(0,1,1) for $1-\theta = 2/(n+1)$
This means that if we have an ARIMA(0,1,1) model with parameter $\theta$, then we can set $n = \frac{2}{1-\theta} - 1$ in the Bollinger Band moving average and this will provide a reasonable approximation to that ARIMA(0,1,1) model. These model connections are illustrated in Figure 3.
¹¹ The details pertaining to the construction of the prediction intervals for exponential smoothing will not be discussed here. For details on the computation of the prediction intervals for the ARIMA(0,1,1), see [21].
[Figure 3 diagram: ARIMA(0,1,1) and ewma($\lambda$) are linked by $\lambda = (1-\theta)$; ewma($\lambda$) and mave($n$) are linked by $n = 2/\lambda - 1$.]
Figure 3: The lines represent the connections between the models and the transformations required to map one to the other. The thinner line indicates that the relationship is approximate.
In order to make the final connection that leads to the Fixed Forecast Maximum Duration Bands
pairs trading strategy, in what follows we briefly introduce state space models. We should point out
that most of the introduction is taken from [22].
5.4 Introduction To State Space Models
From the 1950’s on, electrical engineers were particularly interested in the following problem in linear systems theory, which is shown in Figure 4. Suppose we have an unobserved input signal at time $t$, $\theta_{t-1}$, which is known as the system state. The state process evolves in accordance with a linear transform of $\theta_{t-1}$ to which is added a noise process $\omega_t$. This process is described by the left hand side box in Figure 4. The arrow from $\theta_{t-1}$ to the box containing $F_t'$ represents the linear transformation of the system state producing the system output $z_t = F_t'\theta_{t-1}$. Finally, added to $z_t$ is a noise process $\epsilon_t$ which results in the measurement process $y_t$. Only the sequence $\{y_t\}$ is observed. The linear system is described by equations (16) and (17).
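$$y_t = F_t'\theta_{t-1} + \epsilon_t, \qquad \epsilon_t \sim N[0, V_t] \tag{16}$$
$$\theta_t = G_t\theta_{t-1} + \omega_t, \qquad \omega_t \sim N[0, W_t] \tag{17}$$
with initial conditions
$$(\theta_0 \mid D_0) \sim N[m_0, C_0]$$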
where $D_0$ denotes the information available at time zero. The normality assumptions on the error terms are not absolutely essential but they greatly simplify the inferential framework, so they are usually imposed. The noise processes $\epsilon_t$ and $\omega_t$ are assumed to be independent, and the goal of the engineers was to produce an estimate of the unobserved system state $\theta_t$ at time $t$, using the measurements $y_1, \ldots, y_t$. Then, when a new observation $y_{t+1}$ is realized, a new estimate of the unobserved system state $\theta_{t+1}$ should be obtained. This came to be known as the filtering problem and was studied by electrical engineers for many years.
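[Figure 4 diagram: a feedback loop in which a Delay block and $G_t$ produce $\theta_t$ from $\theta_{t-1}$ with added noise $\omega_t$; $F_t'$ maps the state to $z_t$, and the noise $\epsilon_t$ is added to $z_t$ to give $y_t$.]
Figure 4: A schematic diagram representing the filtering problem in electrical engineering.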
In an extremely important contribution to the engineering literature, Kalman [23] developed a recursive scheme for updating $\theta_t$ optimally each time a new observation $y_t$ is realized. These recursions became known as the Kalman filter recursions¹² and the system described by (16) and (17) became known as the state space formulation.
Then, in the early 1970’s, Harrison and Stevens [24] bridged the gap between the statistical community and the engineering community by showing that the state space formulation could be used by statisticians to build and estimate models that were already very popular in the statistical literature. For example, they showed that if one took the state space model and let the matrix $G_t$ be the identity matrix, then the model was equivalent to a time varying coefficient regression model. This brought state space models into the statistical community and led to various specific state space models, one of which is described in Section 5.5.
¹² The recursions are somewhat complicated so they are not given here, but they can be found in [11], [12] and [22].
5.5 A Simple State Space Formulation: The Random Walk Plus Noise Model
Suppose that we have the state space formulation represented by (16) and (17) where $F_t = 1$, $G_t = 1$, and $\theta_t$ is a scalar equal to $\mu_t$. Then, the state space formulation reduces to what is termed the random walk plus noise state space model (RWPN) shown below:
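$$y_t = \mu_{t-1} + \epsilon_t \tag{18}$$
$$\mu_t = \mu_{t-1} + \eta_t \tag{19}$$
where
$$\epsilon_t \sim N(0, \sigma_\epsilon^2), \qquad \eta_t \sim N(0, \sigma_\eta^2)$$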
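As noted previously, $\epsilon_t$ and $\eta_t$ are assumed to be independent. Eliminating the system variable $\mu_t$ and creating a stationary model by differencing $y_t$ gives:
$$\Delta y_t = \eta_{t-1} + (\epsilon_t - \epsilon_{t-1}) \tag{20}$$
where $\Delta y_t = y_t - y_{t-1}$. Now, we can easily calculate the model autocorrelations, $\gamma_k$, for each $k$:
$$\gamma_1 = \frac{-\sigma_\epsilon^2}{\sigma_\eta^2 + 2\sigma_\epsilon^2}, \qquad \gamma_2 = \gamma_3 = \cdots = 0$$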
Since $E(\Delta y_t) = 0$ for the differenced series, and the autocorrelations are zero after lag 1, the RWPN model is statistically equivalent to an ARIMA(0,1,1) model. Note that since all the variances are required to be greater than zero, we can see by inspection that for this random walk plus noise model, $-0.5 < \gamma_1 < 0$. This implies that the parameter space for $\theta$ in the equivalent ARIMA(0,1,1) is restricted. In fact, if we equate $\gamma_1$ in the random walk plus noise model with $\gamma_1$ in the ARIMA(0,1,1) model, then the parameter $\theta$ is forced into the range $-1 < \theta < 0$.¹³
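The lag one autocorrelation and its range follow from a short calculation. As a sketch, writing $\Delta y_t = y_t - y_{t-1}$ for the differenced series in (20) (the symbol $\Delta$ is notation assumed here) and using the independence of $\epsilon_t$ and $\eta_t$:

```latex
\begin{align*}
\operatorname{Var}(\Delta y_t)
  &= \operatorname{Var}(\eta_{t-1}) + \operatorname{Var}(\epsilon_t) + \operatorname{Var}(\epsilon_{t-1})
   = \sigma_\eta^2 + 2\sigma_\epsilon^2 \\
\operatorname{Cov}(\Delta y_t, \Delta y_{t-1})
  &= \operatorname{Cov}\bigl(\eta_{t-1} + \epsilon_t - \epsilon_{t-1},\;
                             \eta_{t-2} + \epsilon_{t-1} - \epsilon_{t-2}\bigr)
   = -\sigma_\epsilon^2 \\
\gamma_1
  &= \frac{\operatorname{Cov}(\Delta y_t, \Delta y_{t-1})}{\operatorname{Var}(\Delta y_t)}
   = \frac{-\sigma_\epsilon^2}{\sigma_\eta^2 + 2\sigma_\epsilon^2}
\end{align*}
```

The only term shared by $\Delta y_t$ and $\Delta y_{t-1}$ is $\epsilon_{t-1}$, which enters with opposite signs. As $\sigma_\eta^2 \to 0$ the ratio tends to $-1/2$, and as $\sigma_\epsilon^2 \to 0$ it tends to 0, which gives the stated range.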
¹³ There is another state space model called the single source of error state space model which does not impose this restriction. See [18, 22] for details.

We need to derive the exact relation that maps the random walk plus noise model to the ARIMA(0,1,1). First of all, it is easy to show that the autocorrelation at lag one of the ARIMA(0,1,1) defined in equation (14) is $\theta/(1+\theta^2)$. If we define the signal to noise ratio in the random walk plus noise model as $q = \sigma_\eta^2/\sigma_\epsilon^2$ and equate the lag one autocorrelations of each model, we obtain the following mapping between the two models:
$$\theta = \left(\sqrt{q^2 + 4q} - 2 - q\right)/2 \tag{21}$$
Therefore, given the signal to noise ratio $q$, we can find the equivalent ARIMA(0,1,1) using the mapping in equation (21). This connection between the RWPN and the equivalent ARIMA(0,1,1) allows us to modify Figure 3 by including the random walk plus noise model. The resulting figure is shown below and illustrates that a rolling window moving average using $n$ as the window size can be viewed as an approximation to the random walk plus noise model with a signal to noise ratio equal to $q$. What we have shown is that the rolling moving average approach used in Bollinger Bands, initially thought to be a “poor man’s time varying regression coefficient model”, may not be as poor as originally thought. More importantly, in the next section we will see how this connection between the RWPN and the moving average leads to an interesting modification of Bollinger Bands.
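[Figure 5 diagram: RWPN and ARIMA(0,1,1) are linked by $\theta = (\sqrt{q^2+4q} - 2 - q)/2$; ARIMA(0,1,1) and ewma($\lambda$) are linked by $\lambda = (1-\theta)$; ewma($\lambda$) and mave($n$) are linked by $n = 2/\lambda - 1$.]
Figure 5: Given the various mappings, the moving average construction can be viewed as an approximation to the random walk plus noise model.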
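The mapping in equation (21) can be verified numerically. The sketch below assumes that the signal to noise ratio is the state variance divided by the observation variance, $q = \sigma_\eta^2/\sigma_\epsilon^2$, so that the lag one autocorrelation of the differenced RWPN is $-1/(q+2)$; the function names are hypothetical.

```python
import math


def theta_from_q(q):
    """Equation (21): ARIMA(0,1,1) parameter implied by signal to noise ratio q."""
    return (math.sqrt(q * q + 4.0 * q) - 2.0 - q) / 2.0


def lag1_autocorr_rwpn(q):
    """Lag one autocorrelation of the differenced RWPN, assuming q = var(eta) / var(eps)."""
    return -1.0 / (q + 2.0)


def lag1_autocorr_ma1(theta):
    """Lag one autocorrelation theta / (1 + theta^2) of the differenced ARIMA(0,1,1)."""
    return theta / (1.0 + theta ** 2)


for q in (0.1, 1.0, 10.0):                   # illustrative ratios
    theta = theta_from_q(q)
    print(q, round(theta, 4), round(lag1_autocorr_ma1(theta), 4),
          round(lag1_autocorr_rwpn(q), 4))
```

For every positive $q$ the last two printed columns should agree, and $\theta$ should fall strictly between $-1$ and 0.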
5.6 FFMDPT: A Bollinger Band Variant
In the previous section, we showed that the moving average in Bollinger Bands can be viewed as an approximation to the random walk plus noise state space model with a signal to noise ratio equal to $q$. First, in order to make the state space notation consistent with the Bollinger Band notation previously used, where $\beta_t$ has been viewed as being synonymous with $mave_t$, we modify equations (18) and (19) representing the RWPN by replacing $\mu_t$ with $\beta_t$ everywhere. This results in the RWPN model equations below:
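$$y_t = \beta_{t-1} + \epsilon_t \tag{22}$$
$$\beta_t = \beta_{t-1} + \eta_t \tag{23}$$
where
$$\epsilon_t \sim N(0, \sigma_\epsilon^2), \qquad \eta_t \sim N(0, \sigma_\eta^2)$$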
Equations (22) and (23) shed light on what the BBPT strategy is doing from a state space viewpoint. Again, we consider Figure 7. From the figure, we can see that after a trade entry in the BBPT strategy, the moving average, $\hat{\beta}_t$, is the dynamic forecast for $y_t$ and, in this sense, its forecasted equilibrium price. Therefore, if we view the moving average in Bollinger Bands as an approximation to the random walk plus noise model in (22) and (23), then, after a trade entry, the Bollinger Band algorithm continues to receive the new data, $y_t$, and the estimate of $\beta_t$ in (23) is updated accordingly. Note that a statistically consistent update of $\beta_t$ requires that the estimated signal to noise ratio, $q$, remains constant.
Now, rather than continuing to update the estimate $\hat{\beta}_t$ as if the signal to noise ratio was still $q$ after trade entry, we can make an alternative assumption. Suppose that, immediately after trade entry, we view the future $y_t$ as missing and then continually forecast as if there was no longer any future $y_t$ available. This assumption could be quite reasonable because any trade entry implies that some kind of unusual observation or outlier with respect to the current state of the system has been observed. Once an outlier is observed during the evolution of a state space model, there is little reason to assume that the state space model is in the same state with respect to $q$ as it was before trade entry. Once the assumption is made that future $y_t$ are missing after trade entry, the observation equation (22) no longer exists and the original random walk plus noise model represented by (22) and (23) reduces to a pure random walk model for $\beta_t$:
$$\beta_t = \beta_{t-1} + \eta_t \tag{24}$$
Note that if $\beta_t$ is evolving as a random walk, then this implies that the optimal forecast to make after trade entry is $\hat{\beta}_{tradeentry}$ itself, the estimated value of $\beta_t$ at the time of trade entry. This constant forecast is the first modification we make to the Bollinger Band algorithm. Rather than using the moving average, $mave_t$, as the forecast at each time $t$ after trade entry, we use $mave_{tradeentry}$, namely the known moving average estimate at the time of trade entry, as the forecast at all subsequent times $t$. In this way, the forecast for where the process will revert is a horizontal line segment (i.e. a constant) starting at $mave_{tradeentry}$, and we call this forecast the “fixed forecast”.
An advantage to using the “fixed forecast” variant of BBPT (i.e. FFPT) is that if $y_t$ does cross through the fixed forecast, the return generated by the trade will always be greater than the return generated by the identical trade in the BBPT strategy. This is because, in a long FFPT trade, the fixed forecast at time $t$ will always be greater than $mave_t$ (i.e. the forecast in the BBPT strategy) and, in a short FFPT trade, the fixed forecast will always be less than $mave_t$. Unfortunately, the FFPT algorithm also introduces a serious problem. Clearly, since we are assuming that the $\beta_t$ process is a random walk, the variance of the fixed forecast $k$ periods out is $k\sigma_\eta^2$, so as one goes further and further out, the forecast variance increases linearly with $k$. More problematic is the fact that there is a non-zero probability that the $y_t$ process may never revert and cross through the horizontal forecast. Conversely, in the case of Bollinger Bands, it is extremely unlikely that $y_t$ fails to cross the forecast because the moving average forecast is a function of the $y_t$ process and essentially tracks it. In fact, in the BBPT strategy, the only scenario in which the $y_t$ process will not revert to the moving average is one in which the $y_t$ process trends permanently in either direction. Clearly this permanent trending scenario is extremely unlikely in practice. In the next section, we develop an FFPT trade exit mechanism which remedies its “trade may never exit” problem.
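The linear growth of the forecast variance can be made explicit. As a sketch, let $t^*$ denote the trade entry time (a label assumed here) and iterate the random walk (24) forward $k$ periods, ignoring the estimation error in the state estimate at entry:

```latex
\begin{align*}
\beta_{t^*+k} &= \beta_{t^*} + \sum_{j=1}^{k} \eta_{t^*+j} \\
E\left[\beta_{t^*+k} \mid \beta_{t^*}\right] &= \beta_{t^*} \\
\operatorname{Var}\left[\beta_{t^*+k} \mid \beta_{t^*}\right]
  &= \sum_{j=1}^{k} \operatorname{Var}(\eta_{t^*+j}) = k\sigma_\eta^2
\end{align*}
```

The conditional mean stays at the entry value, which is why the forecast is a horizontal line, while the standard deviation around it grows like $\sqrt{k}\,\sigma_\eta$.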
5.7 Restricting the Trade Duration in The Fixed Forecast Variant
Recall that, in Section 4.1, we saw that the average duration of losing trades in BBPT strategies
was greater than the average duration of winning trades. Although this relationship was only shown
empirically, we formalize the duration-return relationship in the following theorem:
Theorem 1. Assume that the rolling window size in the BBPT strategy is $n$, the band width multiplier is $k$, and that $y_t$ touches $BB_{lower}$ at $t = t^* - 1$ so that a long trade is generated at $t = t^*$. Then the total return of this trade is non-negative if and only if the duration of the trade is less than or equal to $n$; i.e. the trade is exited at a time less than or equal to $t^{**} = t^* + n - 1$. This result is independent of the bandwidth multiplier parameter $k$.
Proof. See Appendix E.
We should point out that Theorem 1 assumes that slippage does not occur during entry or exit. By absence of slippage, we mean that during the time between the trade entry signal and the trade entry, price erosion does not occur. Similarly, during the time between the trade exit signal and trade exit, we also assume that price erosion does not occur. If either of these assumptions is not true, then Theorem 1 will hold only approximately. In fact, if we return to the empirical evidence in Section 4.1, we see that the average losing trade durations in 2007 for $k = 1$ and $k = 2$ and $n = 30$ were actually slightly less than $n = 30$. This is inconsistent with Theorem 1, but the inconsistency only occurs because the BBPT simulations assume that there is a one day lag between the entry signal and the entry and a one day lag between the exit signal and the exit. For the $k = 1$ and $k = 2$ cases in 2007 using $n = 30$, slippage did occur and caused the average duration of the losing trades to be less than the rolling window size $n = 30$.
Theorem 1 suggests that a reasonable exit time for a trade in FFPT is the rolling window size itself.¹⁴ The logic behind this idea is that once the trade duration becomes greater than the window size, by Theorem 1, the trade cannot possibly have an overall positive return. Therefore, intuitively the rolling window size serves as a reasonable time to exit the trade. Also, exiting at a time $t$ equal to the rolling window size remedies the original “trade may never exit” problem associated with the FFPT approach. Therefore, restricting the maximum duration of all trades to the window size $n$ is the second and final modification to the BBPT strategy and results in a variant we call the Fixed Forecast Maximum Duration Pairs Trading strategy (FFMDPT).

¹⁴ Using technical analysis terminology, an exit rule based on a pre-determined length of time is referred to as a time stop.

To summarize, the first component of FFMDPT is that the forecast is a constant equal to the original moving average at trade entry. Secondly, assuming that the window size is $n$, if a trade has not exited by $n$ periods, the trade is exited at the $n$th period. Notice that the parameter $n$ in FFMDPT has a different role from its role in BBPT. In the BBPT strategy, the parameter $n$ is used to calculate the candidate entries $BB_{Upper}$ and $BB_{Lower}$ and the exit point $mave_t$. In FFMDPT, the entry signals are the same as those in BBPT but the exit signal differs because the maximum duration of any trade is equal to the rolling window size. Figure 20 in Appendix F illustrates the simulation results of the BBPT strategy and the FFMDPT strategy for the 2004 SAP-Nikkei data using $n = 20$ and $k = 2$. Note that in this specific example, the return generated by the FFMDPT strategy is about three percent greater than that of the BBPT strategy, but this will not always be the case in general. We illustrate specific differences between FFMDPT and BBPT with two detailed examples in the section that follows.
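The two exit rules can be contrasted in a few lines of code. The following is a minimal sketch and not the simulation code used in this study: entry detection, the one day execution lag and return compounding are omitted, the time stop is taken to be $n$ periods after the entry index (an indexing assumption), and all function names and the toy series are hypothetical.

```python
def rolling_mean(y, n):
    """n-period moving average; None until n observations are available."""
    out = [None] * len(y)
    for t in range(n - 1, len(y)):
        out[t] = sum(y[t - n + 1 : t + 1]) / n
    return out


def has_reverted(price, target, side):
    """A long trade reverts upward to its target, a short trade downward."""
    return price >= target if side == "long" else price <= target


def bbpt_exit(y, mave, entry, side):
    """BBPT: exit when y_t reaches the current moving average mave[t]."""
    for t in range(entry + 1, len(y)):
        if has_reverted(y[t], mave[t], side):
            return t
    return None  # no reversion inside the sample


def ffmdpt_exit(y, mave, entry, side, n):
    """FFMDPT: target frozen at mave[entry]; forced exit n periods after entry."""
    target = mave[entry]                    # the "fixed forecast"
    time_stop = min(entry + n, len(y) - 1)  # capped at the end of the sample
    for t in range(entry + 1, time_stop + 1):
        if has_reverted(y[t], target, side):
            return t
    return time_stop


# Toy spread with n = 3; a long entry is assumed at index 3 (entry logic omitted).
y = [10, 10, 10, 7, 8, 8.5, 9, 9.5, 10]
mave = rolling_mean(y, 3)
t_bb = bbpt_exit(y, mave, entry=3, side="long")
t_ff = ffmdpt_exit(y, mave, entry=3, side="long", n=3)
print("BBPT exit index:", t_bb, "price change:", y[t_bb] - y[3])
print("FFMDPT exit index:", t_ff, "price change:", y[t_ff] - y[3])
```

In this toy series the moving average falls toward the price after entry, so the BBPT trade exits earlier and at a lower price than the FFMDPT trade, which waits for the frozen entry-time average. If the series never recovers, `ffmdpt_exit` returns the time stop instead.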
5.8 BBPT versus FFMDPT: Two Examples
In what follows, we construct two simulations to illustrate how the different exit rules can affect the relative performance of the BBPT and FFMDPT strategies. The first simulation ran through the full year of 2004 using $n = 20$ and $k = 2$. In this simulation, the FFMDPT strategy outperforms the BBPT strategy. The second simulation ran through the full year of 2005, again using the parameters $n = 20$ and $k = 2$. In this simulation, the FFMDPT strategy underperforms the BBPT strategy. The plots displaying the results of the first and second simulations are shown in Appendix H. In what follows, we analyze specific trades associated with the two simulations in detail.
Consider the 2004 simulation. The FFMDPT strategy outperforms the BBPT strategy by approximately three percent. Although it may not be obvious from the plot, the three trades in February, April and June generate similar returns in both strategies, but the trades in the FFMDPT strategy generate slightly larger returns than the corresponding trades in the BBPT strategy. This is due to the fixed forecast creating a larger vertical distance between the entry point and the exit point. Therefore, conditional on $y_t$ touching or crossing the fixed forecast in the FFMDPT strategy, the associated return will be larger than the return of the same trade in BBPT. Next, consider the fifth trade in both strategies at the beginning of August. In the FFMDPT strategy, the trade has a longer duration because it needs to reach the horizontal forecast before it exits. Consequently, the FFMDPT August trade results in a small loss. On the other hand, the same August trade in the BBPT strategy exits quickly because the moving average forecast is reached sooner than the fixed horizontal forecast. Therefore, the BBPT trade exits earlier than the FFMDPT trade, resulting in a larger loss. In this particular simulation, holding FFMDPT trades until the fixed forecast is reached produced larger winning trades and smaller losing trades than the same trades in BBPT, resulting in the relative overperformance of the FFMDPT strategy in 2004.
Next consider the 2005 simulation. The FFMDPT strategy underperforms the BBPT strategy by approximately 1.2 percent. The first trade in the BBPT strategy, triggered at the beginning of February, generates a positive return, but the same trade in FFMDPT generates a negative return because the horizontal forecast is never reached, so the trade is exited when it reaches the maximum duration. Unfortunately, the maximum duration is reached after the $y_t$ process has hit the moving average and then decreased. The FFMDPT trade exits at a significant loss of almost 1% while the same trade in the BBPT strategy generates a positive return. The rest of the trades in 2005 generate approximately the same return in both strategies, so the underperformance of the FFMDPT strategy is essentially due to the behavior of the FFMDPT trade at the beginning of February.
5.9 FFMDPT versus BBPT: An Optimized Simulation
Although the 2004 and 2005 simulations highlight the differences between BBPT and FFMDPT, unfortunately the results are dependent on whether the asset pair being traded, namely the SAP-Nikkei, exhibits mean reversion.¹⁵ There exist various methods for checking or testing its existence historically, but whether the mean reversion behavior persists in the future is also uncertain. The point is that any comparison of the effectiveness of pairs trading strategies is confounded with the possibly unstable mean reverting behavior of the asset pair being traded. The 2004 and 2005 simulations of the SAP-Nikkei Index pair assume that the traded asset pair, SAP-Nikkei, is cointegrated, and this assumption is critical to the performance comparison of the two strategies, BBPT and FFMDPT. In fact, the FFMDPT strategy is even more dependent on mean reversion behavior because it requires that the $y_t$ process revert all the way back to the forecast at trade entry. In the case of the BBPT strategy, a positive return can be generated by a trade even when the $y_t$ process does not return to the trade entry forecast. Nevertheless, as long as we are aware that the mean reversion issue makes the results less definitive, we can compare the performance of the respective pairs trading strategies assuming that the SAP-Nikkei Index pair is mean reverting.
¹⁵ The mean reversion assumption can be tested statistically by using the two-step Engle-Granger test for cointegration [15].
As explained earlier, the functionality of the parameter $n$ in the FFMDPT strategy is different from that of the parameter $n$ in the BBPT strategy. Therefore, using the same parameter combination of $n$ and $k$ when comparing the two strategies is not necessarily correct. Given the difference in functionality, it is more reasonable to view the parameter $n$ as being a different parameter in each strategy, namely $n_{FFMDPT}$ and $n_{BBPT}$ respectively. A more robust methodology for analyzing strategy performance is to first find the optimal parameters $n_{BBPT}$ and $n_{FFMDPT}$ for the respective pairs trading strategies over some historical period called the in sample period. These optimized values can then be used out of sample and the out of sample performance compared. Using R [14], this methodology was implemented for the SAP-Nikkei index pair for each year from 2003-2011. Three simulations were run, for which the value of the parameter $k$ was 1, 1.5 and 2.0. For each year, the optimal values of $n_{BBPT}$ and $n_{FFMDPT}$ were found by searching over values from 10 to 50 in steps of 1. These optimal values were then used in the following year and the performance in that year was calculated. The results are shown in Tables I, II and III in Appendix I, where 2003-4 denotes that the in sample period was 2003 and the out of sample period was 2004.¹⁶ Table I indicates that there is no clear performance difference for $k = 1$. In four of the eight years, BBPT outperformed FFMDPT, with the only large difference occurring in 2008 when FFMDPT outperformed BBPT by almost ten percent. In Table II, when $k = 1.5$ was used, the BBPT strategy significantly outperforms the FFMDPT strategy. In fact, in six out of the eight years, the BBPT returns are larger and, in 2004, 2005 and 2007, more than five percent larger. Finally, for $k = 2$, the BBPT strategy again outperforms the FFMDPT strategy in six out of the eight years. In this case, the differences generally hover around the 2% level.
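The in sample and out of sample procedure described above can be sketched as follows. The study itself used R; this Python sketch only illustrates the control flow. The `backtest` argument is a hypothetical callable returning a strategy return for a data set, a window size and a multiplier, and the toy scoring function at the bottom is invented purely to make the example run.

```python
def best_window(backtest, in_sample, k, candidates):
    """Return the window size n with the highest in sample return."""
    return max(candidates, key=lambda n: backtest(in_sample, n, k))


def walk_forward(backtest, yearly_data, k, candidates=range(10, 51)):
    """Optimize n on each year, then score that n on the following year."""
    results = []
    for (year0, data0), (year1, data1) in zip(yearly_data, yearly_data[1:]):
        n_star = best_window(backtest, data0, k, candidates)
        results.append((f"{year0}-{year1}", n_star, backtest(data1, n_star, k)))
    return results


# Toy stand-in for a real backtest: the score peaks at a per-year "best" window.
def toy_backtest(data, n, k):
    return -abs(n - data["best"])


yearly_data = [(2003, {"best": 12}), (2004, {"best": 30}), (2005, {"best": 45})]
for period, n_star, out_of_sample in walk_forward(toy_backtest, yearly_data, k=2.0):
    print(period, n_star, out_of_sample)
```

Because the toy optimum moves from year to year, the window chosen in sample scores poorly out of sample, which mirrors the overfitting concern raised in the text.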
¹⁶ For the 2010-11 period only the first four months of data in 2011 were available.

In addition to the question of the asset pair possessing mean reverting behavior, another problem with assessing relative performance using an optimization approach is the possibility of overfitting. A common pitfall of the optimization approach is that the model may stick too closely to the data over which the optimization was performed. Consequently, the model ends up learning irrelevant details of the in sample data, which leads to poor generalization when the parameters are used on the out of sample data. An informal way of thinking about overfitting is that because such a fine search was used to find the optimal value of the respective parameters in the in sample period, it might be the case that a diamond was found in sample but corrodes quickly when used out of sample. There are various methodologies in the literature that attempt to deal with the issue of overfitting but these will not be discussed in this study. For an interesting discussion of overfitting and methods for lessening its effect, the reader is referred to [25]. One possible way to reduce the amount of overfitting is to optimize more frequently. For example, rather than optimizing the parameters over one year and then calculating the performance in the following year, we could use a shorter in sample and out of sample period such as three months.¹⁷ Another possible way to decrease the possibility of overfitting is to reduce the size of the parameter grid searched. For example, one could decrease the number of candidate values of $n_{FFMDPT}$ and $n_{BBPT}$ by only allowing values from 1 to 50 in steps of, say, 5.
In summary, although the simulation results provide some evidence that the BBPT strategy outperforms the FFMDPT strategy, there are issues that limit the strength of this evidence. The first is the question of the existence of mean reversion behavior in the specific asset pair analyzed. The second is the possibility of overfitting when we are optimizing over the parameters $n_{FFMDPT}$ and $n_{BBPT}$.
6 Conclusions and Future Research
This article contributed towards reconciling the relationship between Bollinger Bands and time series
models. First we showed that, aside from requiring a slight modification to the prediction bands, the
Bollinger Band components can be mapped exactly to the outputs of a classical regression model.
This mapping provides a statistical foundation for Bollinger Bands and eliminates the algorithmic
and ad hoc reputation it has had until now. Also, through the use of a series of relations linking
various time series models, we were able to show that Bollinger Bands can be viewed as a reasonable
approximation to the random walk plus noise (RWPN) state space model.
Next we proved an interesting result on the return-duration relationship in Bollinger Bands. Although the result of the theorem was proven with respect to Bollinger Bands, its importance lies in the fact that it holds for all cases where one uses distance from a moving average as an entry signal and reversion to the moving average as an exit signal. Then, by modifying the underlying assumption of the approximate RWPN model and using the return-duration result for moving averages, we developed a variant of Bollinger Bands called Fixed Forecast Maximum Duration Bands (FFMDPT). In the case of the SAP-Nikkei data from 2003-2010, FFMDPT generates returns that generally underperform the BBPT strategy, particularly when $k = 1.5$. At the same time, there are limitations to the strength of the result when doing such a strategy comparison, and some possible remedies for the limitations were discussed.
¹⁷ Note that three months is the shortest sample period that could be used because the optimization uses a grid from $n = 10$ to $n = 50$. Since we want the $n = 10$ simulation and the $n = 50$ simulation to start at the same time, a minimum of fifty data points is required to calculate the first moving average.

One future research possibility is to compare the FFMDPT and BBPT strategies but optimize over the parameters $n$ and $k$ jointly. Although it was shown that the return-duration proof was independent of $k$, the entry signal in FFMDPT and BBPT is still dependent on $k$. Therefore, optimizing the two parameters $n$ and $k$ jointly may lead to different results. Also, it would be useful if additional pairs were investigated so that the performance results were not specific to the SAP-Nikkei Index pair. Tests for cointegration of the pairs could be done in order to ensure that only pairs that were cointegrated historically were included in the performance comparison.
Another research area would be to take the standard Bollinger Band pairs trading strategy and use the theorem result to change the standard exit rule by exiting the trade whenever the trading duration is equal to the rolling window size. Although this type of exit rule does mean that one has given up the chance to gain back some of the loss from the losing trade, it could possibly utilize capital more efficiently by restricting live trades to only those which have a chance of being profitable. Clearly, this type of strategy implies that the Bollinger Band parameters, $n$ and $k$, may need to be changed also.
Finally, another research direction would involve implementing a statistically consistent approach for capturing reversion behavior. Bollinger Bands are still only an approximation to a random walk plus noise model because the algorithm is such that observations older than the rolling window size are given a weight of zero during estimation. Rather than using Bollinger Bands to implement the pairs trading strategy, the Kalman filter approach could be used directly by utilizing the recursive updating equations. Implementing the Kalman filter approach would eliminate the parameter $n$ but would necessitate estimating the observation variance and the system variance (i.e. $q$). The possibility of overfitting would no longer be an issue because there would no longer be a need for optimization over the rolling window size parameter $n$. At the same time, one would still need to investigate what the optimal re-estimation frequency for the RWPN variances should be.
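For the scalar random walk plus noise model, the Kalman filter recursions reduce to a few lines. The sketch below is an illustration under stated assumptions rather than the procedure proposed in this study: it uses the lagged-state form $y_t = \beta_{t-1} + \epsilon_t$, $\beta_t = \beta_{t-1} + \eta_t$, treats both variances as known, and the names `obs_var`, `state_var`, `m0` and `c0` are hypothetical.

```python
def rwpn_filter(y, obs_var, state_var, m0, c0):
    """One-step forecasts for y_t = b_{t-1} + e_t, b_t = b_{t-1} + h_t."""
    m, c = m0, c0                       # mean and variance of b_{t-1} given data to t-1
    forecasts = []
    for obs in y:
        forecasts.append(m)             # forecast of y_t before it is observed
        f_var = c + obs_var             # one-step forecast variance
        gain = c / f_var                # weight placed on the forecast error
        m = m + gain * (obs - m)        # updated mean of b_{t-1}, also the mean of b_t
        c = c - gain * c + state_var    # updated variance plus the random walk innovation
    return forecasts, m, c


# Constant toy series: the filtered level should settle at the observed value.
y = [5.0] * 200
forecasts, level, level_var = rwpn_filter(y, obs_var=1.0, state_var=1.0, m0=0.0, c0=10.0)
print(round(level, 6), round(level_var, 6))
```

Once the variance `c` has converged, the gain is constant and the mean update is an exponential smoothing recursion with smoothing constant equal to that gain, so no window size is needed.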
7 Acknowledgements
The author wishes to acknowledge Professor Chanseok Park, Clemson University, Dept of Mathematical Sciences and Louis Kates, GKX Associates for many helpful discussions that improved the quality of this article. Thanks also go to John Bollinger of Bollinger Capital Management for providing the SPX-Nikkei data and also the impetus for this study when we met at the R-Finance conference in May, 2011. Finally, an acknowledgment goes to my advisor, Professor Keith Ord, Georgetown University, McDonough School of Business for spurring my initial interest in time series models.
References
[1] John Bollinger, Bollinger on Bollinger Bands, McGraw-Hill, 2002.
[2] URL http://www.bollingerbands.com.
[3] URL http://www.investopedia.com/articles/trading/07/bollinger.asp.
[4] Butler M. and Kazakov D., Particle Swarm Optimization of Bollinger Bands, Proceedings of the 7th International Conference on Swarm Intelligence, 2010.
[5] Ni M. and Zhang C., An Efficient Implementation of the Backtesting of Trading Strategies, Springer-Verlag, Berlin, pp. 126-131 (2005).
[6] Oleksiv, Statistical Analysis to the Optimization of the Technical Analysis Trading Tools: Trading Band Strategies, Ph.D. Thesis, 2010.
[7] Chande, Tushar S., Adapting Moving Averages to Market Volatility, Technical Analysis of Stocks and Commodities, pp. 26-35 (March 1992).
[8] Tilley, D.L., Moving Averages with Resistance and Support, Technical Analysis of Stocks and Commodities, pp. 62-87 (Sep 1998).
[9] Fama, E.F. and MacBeth, J.D., Risk, Return and Equilibrium: Empirical Tests, Journal of Political Economy, pp. 607-636 (1973).
[10] Eric Zivot and Jiahui Wang, Modeling Financial Time Series with Splus, Springer-Verlag, New York, 2003.
[11] Jazwinski, A., Stochastic Processes and Filtering Theory, Academic Press (1970).
[12] Andrew C. Harvey, Forecasting structural time series models and the Kalman filter, Cambridge University Press, 1992.
[13] Julian Faraway, Linear Models with R, Chapman & Hall/CRC, 2005.
[14] R Development Core Team (2011), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/.
[15] Daniel Herlemont, Pairs Trading, Convergence Trading, Cointegration, URL http://www.yats.com/doc/cointegration-en.pdf.
[16] Robert Goodell Brown, Statistical forecasting for inventory control, McGraw-Hill, 1959.
[17] Robert Goodell Brown, Smoothing, forecasting and prediction of discrete time series, Prentice Hall, Englewood Cliffs, New Jersey, 1959.
[18] Hyndman, Koehler, Ord and Snyder, Forecasting with Exponential Smoothing: The State Space Approach, Springer-Verlag, Berlin Heidelberg, 2008.
[19] George Box, Gwilym Jenkins and Gregory Reinsel, Time Series Analysis, Forecasting and Control, Prentice Hall, Englewood Cliffs, N.J., 1994.
[20] John F. Muth, Optimal Properties of Exponentially Weighted Forecasts, Journal of the American Statistical Association, 55, No. 290, (1960) 299-306.
[21] Chris Chatfield, Time Series Forecasting, Chapman & Hall/CRC, 2000.
[22] Mark Leeds, Error Structures for Dynamic Linear Models: Single Source versus Multiple Source, Ph.D. thesis, Pennsylvania State University, 2000.
[23] Kalman, R., A New Approach to Linear Filtering and Prediction Problems, Transactions ASME Journal of Basic Engineering, 82, 35-45, (1960).
[24] Harrison, P. and Stevens, C., Bayesian Forecasting (with discussion), JRSSB, 38, 205-247, (1976).
[25] Andrew Moore, Cross Validation For Detecting and Preventing Overfitting, URL http://www.autonlab.org/tutorials/overfit10.pdf.
Mark Leeds is a statistical consultant in New York City specializing in financial econometrics
and time series analysis. He received a B.S. in Operations Research from Columbia University in
1988; the M.S. in Operations Research and Statistics from Rensselaer Polytechnic University in
1990; and a Ph.D. in statistics from Pennsylvania State University in 2000. His work focuses on
using econometric methodologies to uncover anomalies in the capital markets. His email address is
markleeds2@gmail.com.
Appendices
A Illustrating the Bollinger Band Construction
[Figure 6 plot: "Bollinger Bands, Moving Average Window = 20 and Width Multiplier = 2"; horizontal axis: time (0 to 100); plotted series: Random Walk, Mave, BLower, BUpper.]
Figure 6: A simple example that illustrates the construction of Bollinger Bands. A random walk
series was generated initially. The green center line is the n = 20 day moving average of the random
walk series and BBupper and BBlower are k = 2 standard deviations above and below the center
line.
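The construction described in the Figure 6 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the figure: the function names `bollinger_bands` and `random_walk`, the choice of the population standard deviation, and the random seed are all assumptions made here.

```python
import random
import statistics

def bollinger_bands(series, n=20, k=2.0):
    """Return three lists (mave, lower, upper).

    Entries are None until a full window of n observations is available.
    Assumption of this sketch: the band width uses the population
    standard deviation of the n observations in the window.
    """
    mave, lower, upper = [], [], []
    for t in range(len(series)):
        if t < n - 1:
            mave.append(None)
            lower.append(None)
            upper.append(None)
            continue
        window = series[t - n + 1 : t + 1]
        center = statistics.fmean(window)
        width = k * statistics.pstdev(window)
        mave.append(center)
        lower.append(center - width)
        upper.append(center + width)
    return mave, lower, upper

def random_walk(length, seed=0):
    """Cumulative sum of standard normal increments (hypothetical helper)."""
    rng = random.Random(seed)
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

walk = random_walk(100)
mave, bb_lower, bb_upper = bollinger_bands(walk, n=20, k=2.0)
```

With n = 20 the first 19 entries of each list are None, which mirrors the fact that the bands in the figure can only start once a full window is available.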
B Illustrating the Use of Bollinger Bands in Pairs Trading
Figure 7: The use of Bollinger Bands in a pairs trading strategy. The center line in the figure is
the n = 20 day moving average and BBupper and BBlower are k = 2 standard deviations above
and below the center line. The crossing of BBupper from below (i.e. red arrow) or BBlower from
above (i.e. green arrow) triggers a short trade (i.e. red arrow) or long trade (i.e. green arrow).
The position is held (i.e. line extends) until the series reverts to the center line, resulting in either
a winning trade (i.e. green line) or losing trade (i.e. red line). Note that Z = SAP Index and X =
Nikkei Index so that the plotted series is $y_t = \ln(P_z/P_x)_t$.
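The entry and exit rules described in the Figure 7 caption can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the simulation code behind the figure: the name `bbpt_trades` is hypothetical, entries and exits are taken at the observation that triggers the signal (no lag and no slippage), and the bands use the population standard deviation.

```python
import statistics

def bbpt_trades(y, n=20, k=2.0):
    """Band-crossing entries and center-line exits on a series y
    (for example a log price ratio).

    Assumption of this sketch: a trade is entered and exited at the
    observation that triggers the signal.
    """
    trades = []
    position = None   # (side, entry_index) while a trade is open
    previous = None   # (y, lower, upper) from the prior period
    for t in range(n - 1, len(y)):
        window = y[t - n + 1 : t + 1]
        center = statistics.fmean(window)
        width = k * statistics.pstdev(window)
        lower, upper = center - width, center + width
        if position is None:
            if previous is not None:
                prev_y, prev_lower, prev_upper = previous
                if prev_y >= prev_lower and y[t] < lower:
                    position = ("long", t)    # crossed BBlower from above
                elif prev_y <= prev_upper and y[t] > upper:
                    position = ("short", t)   # crossed BBupper from below
        else:
            side, t0 = position
            reverted = y[t] >= center if side == "long" else y[t] <= center
            if reverted:
                sign = 1.0 if side == "long" else -1.0
                trades.append({
                    "side": side,
                    "entry": t0,
                    "exit": t,
                    "duration": t - t0,
                    "log_return": sign * (y[t] - y[t0]),
                })
                position = None
        previous = (y[t], lower, upper)
    return trades

# Tiny hand-made series (hypothetical): flat, a sharp drop, then a rebound.
example = [0.0] * 20 + [-1.0, 0.5]
example_trades = bbpt_trades(example, n=20, k=2.0)
```

In the hand-made example the drop to -1.0 falls below the lower band and opens a long trade, and the rebound to 0.5 is above the moving average, so the trade closes one period later as a winner.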
C BBPT Winning Trades Have Shorter Durations Than Losing Trades
[Figure 8 plots: eight panels of the log price ratio (vertical axis: Log Price Ratio; horizontal axis: January to January), one per parameter setting and period. The annotations printed on the panels are tabulated below.]

window  multiplier  period               average winner duration  average loser duration
20      1           20040102 - 20041231  8.33                     24.4
20      1           20070103 - 20071231  9.8                      23
30      1           20040102 - 20041231  12                       39.67
30      1           20070103 - 20071231  10.12                    27.33
20      2           20040102 - 20041231  8.6                      20.5
20      2           20070103 - 20071231  9.33                     25.67
30      2           20040102 - 20041231  9                        64
30      2           20070103 - 20071231  16                       24.5
Figure 8: The results of the Bollinger Band pairs trading strategy of the SAP versus the Nikkei
simulated over different time periods using various values of the parameters n and k. The average
durations indicate that the nature of the Bollinger Band methodology is such that winning trades
have an average duration consistently shorter than that of losing trades.
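The summary statistic reported in each panel of Figure 8 can be computed as sketched below. The helper name `average_durations` and the trade list are hypothetical and invented purely for illustration; they are not the paper's data. A winner is taken here to be a trade with a non-negative log return, which is an assumption chosen to match the wording of the theorem in Appendix E.

```python
import statistics

def average_durations(trades):
    """trades: iterable of (log_return, duration) pairs.

    Returns (average winner duration, average loser duration);
    an entry is None when there are no trades of that kind.
    """
    winners = [d for r, d in trades if r >= 0]
    losers = [d for r, d in trades if r < 0]
    avg_winner = statistics.fmean(winners) if winners else None
    avg_loser = statistics.fmean(losers) if losers else None
    return avg_winner, avg_loser

# Hypothetical trades, NOT taken from the paper's simulations.
made_up_trades = [(0.010, 5), (0.020, 10), (-0.030, 30), (0.005, 15), (-0.010, 40)]
avg_winner, avg_loser = average_durations(made_up_trades)
```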
D An Illustration of the Weighted Age Relation $\lambda = \frac{2}{n+1}$
Figure 9: The figure shows how similar Bollinger Bands are to the EWMA when the weighted age
is matched using the relation $\lambda = \frac{2}{n+1}$. The approximation is quite reasonable with respect to the
center line but not quite as close with respect to BBUpper and BBLower.
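For readers who want the arithmetic behind the weighted age relation, a short derivation is sketched below. It assumes that "weighted age" means the weight-averaged age of the observations entering the average, with the most recent observation having age 0 and the EWMA placing weight $\lambda(1-\lambda)^j$ on the observation of age $j$. The symbols $\bar{a}_{\mathrm{MA}}$ and $\bar{a}_{\mathrm{EWMA}}$ are introduced here for illustration only.

```latex
\begin{align*}
\bar{a}_{\mathrm{MA}}   &= \frac{1}{n}\sum_{j=0}^{n-1} j = \frac{n-1}{2}, \\
\bar{a}_{\mathrm{EWMA}} &= \sum_{j=0}^{\infty} j\,\lambda(1-\lambda)^{j} = \frac{1-\lambda}{\lambda}, \\
\frac{1-\lambda}{\lambda} = \frac{n-1}{2}
  &\;\Longrightarrow\; 2 - 2\lambda = \lambda(n-1)
  \;\Longrightarrow\; \lambda = \frac{2}{n+1}.
\end{align*}
```

For example, a window of n = 20 corresponds to $\lambda = 2/21 \approx 0.095$ under this matching.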
E A Proof That Any Trade In A Bollinger Band Pairs Trading Strategy Has a Non-Negative Return
If and Only If The Total Duration Is Less Than Or Equal To n Where n Is The Rolling Window Size.
Before going into the details of the theorem, we should point out that the theorem is applicable to a
larger class of models than just the BBPT strategy. Since the theorem result is independent of the
band width multiplier, k, it is applicable to any trading algorithm where the log price of the traded
asset being some threshold distance away from its moving average triggers the entry signal and the
subsequent crossing back of the log price of the traded asset through the moving average triggers
the exit signal. A well known example of such a strategy is the BBPT strategy but there may be
other proprietary moving average type strategies that also meet this criterion. One obvious example
would be where Bollinger Bands are used on a single stock itself rather than a pair of stocks. In
that case, $y_t$ would denote the price itself rather than the price ratio but the result of the theorem
would still apply.
Since Theorem 1 requires nine lemmas before it can be proven, details about the notation used and
assumptions made are provided below. The two figures that then follow make the assumptions and
notation described below more tangible. Figure 10 illustrates how the endpoints of the
moving average duration are defined. Figure 11 uses a long trade as an example to
illustrate other assumptions.
1. $y_t$ denotes the price ratio of the paired asset at time $t$ and the term “Bollinger Band exit rule”
refers to the rule where one exits from the trade when the log(price ratio) at time $t$ crosses
through or is equal to the moving average of the log(price ratio) at time $t$.
2. $t^*$ denotes the entry time of a trade and $t^{**}$ denotes the time period associated with a trade
duration equal to the window size. For example, if the window size $n$ was equal to 10 and a
trade was entered into at $t^* = 15$, then $t^{**} = t^* + n - 1 = 24$. Therefore, a trade entered into
at the beginning of $t = t^*$ and exited from at the end of $t = t^{**}$ would be viewed as having
a trade duration of 10 periods.
3. We assume a one period time lag between the entry signal time and trade entry time purely for
clarity purposes. We assume that slippage$^{18}$ does not occur during the one unit time period
between the entry signal and the trade entry. In fact, with respect to entry slippage, we go
further than this by assuming that not only is there never price erosion during entry but also
that there is a non-zero infinitesimal price improvement.$^{19}$ For example, if a long entry signal
is triggered by a price, say $\log(y_{t^*-1})$, then we will assume that the actual entry occurs at a
price $\log(y_{t^*-1}) - \delta$ where $\delta > 0$. This additional price improvement assumption is only needed
so that edge cases do not need to be considered in the steps of the proof. Also, the size of $\delta$
does not affect the derivation of the result as long as it is positive, so it can be assumed that $\delta$ is
infinitesimally small.
4. One can think of the discrete time block denoted by $t$ as having a length of one period and
an $n$ period window as being a set of these $n$ blocks stacked adjacently to each other. The
convention associated with a given window with endpoints $t^*$ and $t^{**}$ is that entries can occur
at $t^*, t^*+1, t^*+2, t^*+3, \ldots, t^{**}$ and exits can occur at $t^*+1, t^*+2, t^*+3, t^*+4, \ldots, t^{**}+1$. If we
say that there was an exit signal at time $t$, implicit in this statement is that the first possible
exit time is the end of time $t$, which is equivalent to the time right before the beginning of
period $t + 1$. Note that this does not imply the possibility of slippage during exit because
it is assumed that the price process of $\log(y_t)$ is discrete so that the price does not change
during the time block associated with the period labeled $t$. Simply speaking, we assume that
the discrete price process is such that the value of $\log(y_t)$ at the beginning of the time block
denoted by $t$ is the same as the value of $\log(y_t)$ at the end of the same time block $t$. We then
assume an instantaneous change in $\log(y_t)$ just as the beginning of the new time block $t + 1$ is
reached. Obviously, this is an over-simplification of how the actual price process evolves but for
purposes of clarity we make this assumption. The main point is that assuming an exit always
occurs at the end of a time block is equivalent to assuming that there is an instantaneous exit
whenever an exit signal is observed and is therefore equivalent to assuming zero slippage on
exit.
5. Although the proof relies on the use of log(price ratio) as the scale on the vertical axis, this is
not a restrictive assumption because the entries and exits calculated can always be transformed
back to price ratio space. The relation that results from using the log(price ratio) on the vertical
axis will be stated as a lemma. Consequently, before proving the main result, we need to prove
this lemma and various other lemmas which will be used in the proof of Theorem 1.$^{20}$

$^{18}$ Slippage during entry is defined as price erosion due to the delay between when a trade entry is signaled and
when the trade is actually entered. Slippage during exit is defined analogously.
$^{19}$ The price improvement on entry assumption would not be necessary if we assumed that there was no time lag
between the signal and the entry. By assuming a one period time lag, we separate the defined window from the signal
price, which provides more clarity when explaining the steps of the proof.
$^{20}$ All of the lemmas contain arguments under the assumption that the BBPT strategy generated a long trade. This
is without loss of generality because symmetric arguments hold if the assumption was that a short trade had been
generated.

[Figure 10 plot: "The Notion Of Time In The Proof Of Theorem One"; window size = n = 10, with the window beginning at $t^*$ and ending at $t^{**}$; plotted series: $\log(y_t)$, signal price, entry price. Plot annotation: each time block represents one time period; the log price is assumed to stay constant during a time block and to change only instantaneously at the beginning of each new time block.]
Figure 10: Illustrating how the time periods and window duration endpoints are constructed in the
proofs of the lemmas and the main theorem. A long trade is triggered at $t = t^* - 1$ and entered into
at the beginning of $t = t^*$. The first possible exit is at the end of $t = t^*$, which is equivalent to the
beginning of $t = t^* + 1$. The trade duration is equivalent to the rolling window size = 10 when the
trade exit occurs at the end of $t = t^{**}$, which is equivalent to the beginning of the period $t = t^{**} + 1$.
[Figure 11 plot: "Notation and Assumptions Used In The Proof Of Theorem One"; window size = n = 10 and trade duration = 12, with the window beginning at $t^*$ and ending at $t^{**}$ and the exit occurring after the window end; plotted series: $\log(y_t)$, BBUpper, mave of $\log(y_t)$, BBLower, signal price, entry price, exit price; marked levels: $\log(y_{t^*-1})$, $\log(y_{t^*})$, $\log(y_{t_{18}})$; the gap between signal price and entry price is labeled "Positive Slippage". Plot annotation: given the entry point, the moving window duration is defined as starting at the beginning of $t = t^*$ and ending at the end of $t = t^{**}$.]
Figure 11: Illustrating the various notation and assumptions used in the proofs of the various
lemmas and main theorem. A long trade is triggered at $t = t^* - 1$ and entered into at $t = t^*$. The
total trade duration is longer than the moving average window duration because the actual trade
duration = 12 and the moving average window duration = 10. The overall log return of the trade
is negative.
E.1 Preliminary Lemmas
Lemma 1. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Suppose that the
Bollinger Band moving average exit rule is ignored in that the position is held for a fixed n = rolling window
size periods. If the overall log return of the paired asset over the period from the beginning of $t = t^*$
to the end of $t = t^{**}$ is $\epsilon$ where $-\infty < \epsilon < \infty$, then the following relation holds:
$$\log(y_{t^{**}}) = \log(y_{t^*}) + \epsilon$$
Proof. Consider the interval $[t^*, t^{**}]$. Since the return over this interval is $\epsilon$, by the additivity of log
returns this implies that $\sum_{t=t^*+1}^{t^{**}} \Delta\log(y_t) = \epsilon$, where $\Delta\log(y_t) = \log(y_t) - \log(y_{t-1})$. But the terms in the sum represent a telescoping
series which reduces to $\log(y_{t^{**}}) - \log(y_{t^*})$ so that $\log(y_{t^{**}}) = \log(y_{t^*}) + \epsilon$. Although the relation is
obvious, it will turn out to be useful when proving various other lemmas that follow.
A sample plot of log prices is shown in Figure 12 below and gives the intuition behind Lemma 1.
[Figure 12 plot: "The Log Return Over Any Time Window Is Only A Function Of The First Observation and Last Observation In The Window"; horizontal axis: $t^*$ to $t^{**}$; marked levels: $\log(y_{t^*})$ and $\log(y_{t^{**}}) = \log(y_{t^*}) + \epsilon$.]
Figure 12: Illustrating that only the first price observation and last price observation are needed
to calculate the log return over any time window.
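The telescoping argument behind Lemma 1 can also be checked numerically. The sketch below is illustrative only: the helper name `window_log_return` and the price ratios are hypothetical values chosen for the example, not data from the paper.

```python
import math

def window_log_return(price_ratios):
    """Sum of the one-period log returns across the window."""
    return sum(
        math.log(price_ratios[i]) - math.log(price_ratios[i - 1])
        for i in range(1, len(price_ratios))
    )

# Hypothetical price ratios y_t for t = t*, ..., t** (illustration only).
price_ratios = [1.30, 1.34, 1.29, 1.33, 1.38]
eps = window_log_return(price_ratios)

# Only the first and last observations matter once the sum telescopes.
endpoint_only = math.log(price_ratios[-1]) - math.log(price_ratios[0])
```

Up to floating point rounding, `eps` and `endpoint_only` agree, whatever path the intermediate observations take.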
Lemma 2. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Assume that the
Bollinger Band exit rule is the exit rule. Then, if $y_t$ remains constant from the beginning of $t = t^*$
up until the end of $t = t^{**}$, then $\mathrm{mave}_{t^{**}} = \log(y_{t^{**}}) = \log(y_{t^*})$ and the long trade will be exited at
the end of $t = t^{**}$.
Proof. Notice that at the end of period $t = t^{**}$, aside from $\log(y_{t^*})$ itself, $\mathrm{mave}_{t^{**}}$ will not contain any
of the points contained in the window when $t = t^*$ is the right endpoint of the window. Therefore
only points that are equal to $\log(y_{t^*})$ will be contained in the calculation of $\mathrm{mave}_{t^{**}}$. Obviously,
the average of the $n$ values of $\log(y_t)$ in the window at $t^{**}$ equals $\log(y_{t^*})$. Therefore $\mathrm{mave}_{t^{**}} = \log(y_{t^*})$.
But $\log(y_t)$ did not change over the time between $t^*$ and $t^{**}$ so $\log(y_{t^{**}}) = \log(y_{t^*})$, which means
$\mathrm{mave}_{t^{**}} = \log(y_{t^{**}})$ so that a trade exit is triggered. Therefore the trade is exited with a log return
equal to zero since $\log(y_t)$ was constant over the trade duration. An illustration of this argument is
provided in Figure 13.
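The argument of Lemma 2 can be illustrated with a short numerical sketch. The helper `moving_average`, the entry time, and all series values below are hypothetical choices made for illustration; the only property that matters is that the log price ratio is constant from the entry time to the window end.

```python
import statistics

def moving_average(log_y, t, n):
    """Average of the n values log_y[t - n + 1], ..., log_y[t]."""
    return statistics.fmean(log_y[t - n + 1 : t + 1])

n = 10
t_star = 15                     # entry time (hypothetical)
t_star_star = t_star + n - 1    # window end, as in assumption 2

# Arbitrary pre-entry values followed by a constant level from t* to t**.
log_y = [0.30 + 0.01 * (i % 3) for i in range(t_star)]
log_y += [0.25] * n

mave_end = moving_average(log_y, t_star_star, n)
exit_triggered = log_y[t_star_star] >= mave_end   # long trade exit rule
log_return = log_y[t_star_star] - log_y[t_star]
```

Because the window ending at the window end contains only the constant level, the pre-entry values have no influence on `mave_end`, the exit is triggered, and the log return is zero.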
Lemma 3. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Also, suppose that
the Bollinger Band exit rule is ignored in that the position is held for a fixed n = rolling window size
periods. If the overall log return of the paired asset over the period from the beginning of $t = t^*$ to
the end of $t = t^{**}$ is $-1.0 \times \epsilon$ where $\epsilon > 0$, then the following relation holds:
$$\log(y_{t^*}) - \epsilon < \mathrm{mave}_{t^{**}} < \log(y_{t^*})$$
where $\mathrm{mave}_{t^{**}} = \frac{1}{n}\sum_{t=t^*}^{t^{**}} \log(y_t)$.
Proof. We can obtain the upper and lower bounds for $\mathrm{mave}_{t^{**}}$ by considering two extreme scenarios
in which the overall log return of the trade is $-1.0 \times \epsilon$.$^{21}$ This two extreme scenario proof methodology
is justified because any other realization where the return over the window is equal to $-1.0 \times \epsilon$ is a
realization that also maintains the same upper and lower bounds.
First consider scenario one where $\log(y_t)$ moves to the level $(\log(y_{t^*}) - \epsilon)$ at the beginning
of $t = t^* + 1$ and then remains constant after that until the end of $t = t^{**}$ is reached.$^{22}$
By definition, $\mathrm{mave}_{t^{**}} = \frac{\log(y_{t^*})}{n} + \frac{\sum_{t=t^*+1}^{t^{**}} (\log(y_{t^*}) - \epsilon)}{n}$. But since $\log(y_t) = (\log(y_{t^*}) - \epsilon)$ at
$t = t^* + 1$ and remains at that level after $t = t^* + 1$, the previous expression for $\mathrm{mave}_{t^{**}}$ reduces
to $\frac{1}{n} \times \log(y_{t^*}) + \frac{n-1}{n} \times (\log(y_{t^*}) - \epsilon)$. But since $\log(y_{t^*}) > (\log(y_{t^*}) - \epsilon)$, this implies that
$\mathrm{mave}_{t^{**}} = \frac{1}{n} \times \log(y_{t^*}) + \frac{n-1}{n} \times (\log(y_{t^*}) - \epsilon) > \log(y_{t^*}) - \epsilon$.$^{23}$ Therefore the open lower bound
for $\mathrm{mave}_{t^{**}}$ is $\log(y_{t^*}) - \epsilon$.

$^{21}$ Notice that this proof is essentially assuming a discrete process for $\log(y_t)$, which is consistent with the original
assumption that the $y_t$ process only changes at the end of each period $t$.
$^{22}$ Note that the log return over the window in scenario one is $-1.0 \times \epsilon$.
$^{23}$ One can think of this open lower bound as the discrete counterpart of the integral $\frac{\int_{t^*}^{t^{**}} (\log(y_{t^*}) - \epsilon)\,dt}{t^{**} - t^*}$ where
$(\log(y_{t^*}) - \epsilon)$ is constant over the period $t = t^*$ to $t = t^{**}$.

[Figure 13 plot: "Long Trade Exit Behavior When $y_t$ Is Constant"; window size n = 10, with the window beginning at $t^*$ and ending at $t^{**}$; plotted series: $\log(y_t)$, BBUpper, mave of $y_t$, BBLower, signal price; at the window end, $\mathrm{mave}_{t^{**}} = \log(y_{t^{**}})$.]
Figure 13: Illustrating that a trade will exit at the end of $t = t^{**}$ when the price remains constant
over n = rolling window size periods.

[Figure 14 plot: two panels over $t^*$ to $t^{**}$ showing $\log(y_t)$ and $\mathrm{mave}_{t^{**}}$. Extreme Scenario I: $\log(y_{t^*}) - \epsilon < \mathrm{mave}_{t^{**}}$. Extreme Scenario II: $\mathrm{mave}_{t^{**}} < \log(y_{t^*})$.]
Figure 14: Illustrating the two extreme scenarios used in the proof of Lemma 3: (a) the LHS
inequality and (b) the RHS inequality.

Next consider scenario two where $\log(y_t)$ is constant over the window and then moves to $(\log(y_{t^*}) - \epsilon)$
just as the beginning of $t = t^{**}$ is reached. We can again use the discrete analog of an integration
argument to show that $\mathrm{mave}_{t^{**}} < \frac{n \times \log(y_{t^*})}{n} = \log(y_{t^*})$.$^{24}$ Therefore the open upper bound for
$\mathrm{mave}_{t^{**}}$ is $\log(y_{t^*})$. We illustrate the previous scenario arguments in Figure 14 above.
$^{24}$ One can think of this open upper bound as the discrete counterpart of the integral $\frac{\int_{t^*}^{t^{**}} \log(y_{t^*})\,dt}{t^{**} - t^*}$ where $\log(y_{t^*})$
is constant over the period $t = t^*$ to $t = t^{**}$.

It will turn out to be convenient to re-write the relation in Lemma 3 so that the right hand side is
an equality. Therefore, we re-state Lemma 3 in the following equivalent way:
$$\log(y_{t^*}) - \epsilon_1 < \mathrm{mave}_{t^{**}} = \log(y_{t^*}) - \epsilon_2 \tag{25}$$
where $\mathrm{mave}_{t^{**}} = \frac{1}{n}\sum_{t=t^*}^{t^{**}} \log(y_t)$, $\epsilon_1 > 0$, $\epsilon_2 > 0$ and $\epsilon_2 < \epsilon_1$.
In the steps of the proof of Lemma 3, since the rolling window size = n periods was used as the
holding period, it was unnecessary to consider the values of $\log(y_t)$ for $t < t^*$ because, by the time
$t = t^{**}$, these values are not contained in the rolling window with right endpoint $t = t^{**}$. Fortunately,
if we make one extra assumption about the values of $\log(y_t)$ for a specific period preceding $t^*$, then the
lower bound in Lemma 3 can be strengthened to hold for any $t = t'$ where $t^* < t' < t^{**}$, rather than
just $t^{**}$, resulting in Lemma 4.
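As a numerical aside on Lemma 3, the two extreme scenarios used in its proof can be evaluated directly. The sketch below is illustrative only: the function names and the values of the entry level, the return size and the window length are hypothetical, and the check covers just these two constructed scenarios rather than arbitrary price paths.

```python
import statistics

def mave_scenario_one(log_y_star, eps, n):
    """Level drops by eps at t* + 1 and then stays constant to t**."""
    window = [log_y_star] + [log_y_star - eps] * (n - 1)
    return statistics.fmean(window)

def mave_scenario_two(log_y_star, eps, n):
    """Level is constant and drops by eps only at the beginning of t**."""
    window = [log_y_star] * (n - 1) + [log_y_star - eps]
    return statistics.fmean(window)

log_y_star, eps, n = 1.30, 0.04, 10     # hypothetical values
m1 = mave_scenario_one(log_y_star, eps, n)
m2 = mave_scenario_two(log_y_star, eps, n)

# In the restated form of the lemma, eps_2 is the gap log(y_t*) - mave_t**.
eps2_one = log_y_star - m1   # equals (n - 1) / n * eps in scenario one
eps2_two = log_y_star - m2   # equals eps / n in scenario two
```

In both scenarios the moving average at the window end lies strictly between the two bounds, and the implied gap is positive and smaller than the return size, which is the form used in the restated relation.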
Lemma 4. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Let $t = t'$ be any time
point between $t = t^*$ and $t = t^{**}$ such that $t^* < t' < t^{**}$. Again suppose that the Bollinger Band
moving average exit rule is disregarded in that the position is held for a fixed $(t' - t^*) = n'$ periods.
If the overall log return of the paired asset over a period from the beginning of $t = t^*$ to the end of
$t = t'$ is $-1.0 \times \epsilon$ where $\epsilon > 0$ and $\log(y_t) > (\log(y_{t^*}) - \epsilon)$ $\forall t$ such that $(t' - n - 1) \le t < t^*$, then
the following relation holds:
$$\log(y_{t^*}) - \epsilon < \mathrm{mave}_{t'} \tag{26}$$
where $\mathrm{mave}_{t'} = \frac{1}{n}\sum_{t=(t'-n-1)}^{t'} \log(y_t)$.
Proof. We can use the same integration argument used for the lower bound result of Lemma 3 except,
in this case, the integral used to derive the lower bound will now contain the upper limit $t'$ rather than
$t^{**}$. Note though that, since we are no longer assuming that the return is calculated over the window
with $t = t^{**}$ as the right end point, $\mathrm{mave}_{t'}$ will still contain values of $\log(y_t)$ $\forall t$ such that $(t' - n - 1) \le t < t^*$.
Therefore, the extra condition on $\log(y_t)$ in $[(t' - n - 1), t^*)$ is required in order to ensure that the
same integration argument will still hold for the lower bound. This is because the integration starts
from $t = t^*$. Therefore, for the relation to be true when part of the interval is to the left of the
window and therefore not included in the integral, the extra condition is required for the $\log(y_t)$
values in that part of the interval.
Lemma 5. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Again suppose that
the Bollinger Band moving average exit rule is disregarded in that the position is held for a fixed n
= rolling window size periods. If the overall log return of the paired asset over a period from the
beginning of $t = t^*$ to the end of $t = t^{**}$ is $+1.0 \times \epsilon$ where $\epsilon > 0$, then the following relation holds:
$$\log(y_{t^*}) < \mathrm{mave}_{t^{**}} < \log(y_{t^*}) + \epsilon$$
where $\mathrm{mave}_{t^{**}} = \frac{1}{n}\sum_{t=t^*}^{t^{**}} \log(y_t)$.
[Figure 15 plot: two panels over $t^*$ to $t^{**}$ showing $\log(y_t)$ and $\mathrm{mave}_{t^{**}}$. Extreme Scenario I: $\log(y_{t^*}) < \mathrm{mave}_{t^{**}}$. Extreme Scenario II: $\mathrm{mave}_{t^{**}} < \log(y_{t^*}) + \epsilon$.]
Figure 15: Illustrating the two extreme scenarios used in the proof of Lemma 5: (a) the LHS
inequality and (b) the RHS inequality.
Proof. The proof uses similar integration arguments to those used in Lemma 3 so the details will
not be included here. We illustrate the previous scenario arguments in Figure 15 above.
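For completeness, the arithmetic that the omitted details would involve can be sketched by mirroring the two scenarios from the proof of Lemma 3. The labels "late jump" and "early jump" are introduced here for illustration and are an assumption about how the mirrored scenarios would be set up, not a quotation of the original argument.

```latex
% Late jump: log(y_t) is constant at log(y_{t^*}) and rises by \epsilon only at t = t^{**}.
\begin{align*}
\mathrm{mave}_{t^{**}}
  &= \frac{n-1}{n}\,\log(y_{t^*}) + \frac{1}{n}\,\bigl(\log(y_{t^*}) + \epsilon\bigr)
   = \log(y_{t^*}) + \frac{\epsilon}{n} \;>\; \log(y_{t^*}).
\end{align*}
% Early jump: log(y_t) rises by \epsilon at t = t^* + 1 and is constant afterwards.
\begin{align*}
\mathrm{mave}_{t^{**}}
  &= \frac{1}{n}\,\log(y_{t^*}) + \frac{n-1}{n}\,\bigl(\log(y_{t^*}) + \epsilon\bigr)
   = \log(y_{t^*}) + \frac{n-1}{n}\,\epsilon \;<\; \log(y_{t^*}) + \epsilon.
\end{align*}
```

For n of at least 2, both values lie strictly inside the interval stated in Lemma 5.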
Just as was the case with Lemma 3, since the rolling window size = n periods was used as the holding
period in the proof of Lemma 5, it was unnecessary to consider the values of $\log(y_t)$ for $t < t^*$ in the
proof because, by the time $t = t^{**}$ is reached, these values are not contained in the rolling window
with right endpoint $t = t^{**}$. Again, if we make one extra assumption about these values, then the
upper bound in Lemma 5 can be strengthened to hold for any $t = t'$ where $t^* < t' < t^{**}$ rather than
just $t^{**}$, resulting in Lemma 6.
Lemma 6. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Let $t = t'$ be any time
point between $t = t^*$ and $t = t^{**}$ such that $t^* < t' < t^{**}$. Again suppose that the Bollinger Band
moving average exit rule is disregarded in that the position is held for a fixed $(t' - t^*) = n'$ periods.
If the overall log return of the paired asset over the period from the beginning of $t = t^*$ to the end of
$t = t'$ is $+1.0 \times \epsilon$ where $\epsilon > 0$ and $\log(y_t) < \mathrm{mave}_{t^*}$ $\forall t$ such that $(t' - n - 1) \le t < t^*$, then the
following relation holds:
$$\mathrm{mave}_{t'} < \log(y_{t^*}) + \epsilon \tag{27}$$
where $\mathrm{mave}_{t'} = \frac{1}{n}\sum_{t=(t'-n-1)}^{t'} \log(y_t)$.
Proof. We can use the same integration argument used for the upper bound result of Lemma 5
except, in this case, the integral used to derive the upper bound will contain the upper limit $t'$
rather than $t^{**}$. Note though, just as was the case in Lemma 4, since we are no longer assuming
that the return is calculated over the window with $t = t^{**}$ as the right end point, $\mathrm{mave}_{t'}$ will still
contain values of $\log(y_t)$ $\forall t$ such that $(t' - n - 1) \le t < t^*$. Therefore, the extra condition on $\log(y_t)$ in
$[(t' - n - 1), t^*)$ is required in order to ensure that the same integration argument will still hold for
the upper bound. Since the integration starts from $t = t^*$, for the relation to be true when part of
the interval is to the left of $t = t^*$ and therefore not included in the integral, the extra condition is
required for the $\log(y_t)$ values in that part of the interval.
The next lemma is stated below.
Lemma 7. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Let $\mathrm{mave}_{t^*}$ denote
the moving average of the paired asset at time $t^*$. Then the maximum possible overall log return that
can be generated by the trade using the Bollinger Band rule is less than $(\mathrm{mave}_{t^*} - \log(y_{t^*})) = \epsilon_1 > 0$,
which is the initial difference between $\mathrm{mave}_{t^*}$ at entry and $\log(y_{t^*})$ at entry.
Proof. We will assume that, with the Bollinger Band exit rule ignored, the aforementioned trade
generates an overall log return of $+1.0\epsilon_1$ from the beginning of $t = t^*$ to the end of $t = t'$ where $\epsilon_1 > 0$.
Also, we will assume that $t' - t^* = n'$ periods. Then we will show that if one had used the Bollinger
Band exit rule to exit from the same trade, the overall log return generated would be less than $\epsilon_1$.
This will complete the proof because the overall log return assumed, $\epsilon_1$, is equal to the difference
between the moving average at entry and the log price at entry.
So assume that the long trade position that was initiated at $t = t^*$ was held for $n'$ periods to the
end of $t = t'$ without regard to the Bollinger Band exit rule and that it generated a return of
$+1.0 \times \epsilon_1 = \mathrm{mave}_{t^*} - \log(y_{t^*})$. Notice that the conditions of Lemma 6 are met because, since a
long trade was generated at $t = t^* - 1$, it should be the case that $\log(y_t) < \mathrm{mave}_{t^*}$ $\forall t$ such that
$(t' - n - 1) \le t < t^*$.$^{25}$ Therefore, by Lemma 6, we know that $\mathrm{mave}_{t'} < \log(y_{t^*}) + \epsilon_1$ for any $t' > t^*$
and $t' \le t^{**}$. But by definition, $\log(y_{t'}) = \log(y_{t^*}) + \epsilon_1$, which means that $\mathrm{mave}_{t'} < \log(y_{t'})$. But
this means that $\mathrm{mave}_{t'}$ must have decreased from its original value of $\mathrm{mave}_{t^*}$ because otherwise it
would be equal to $\log(y_{t'})$ since $\log(y_{t^*}) + \epsilon_1 = \mathrm{mave}_{t^*}$. But if $\mathrm{mave}_{t'}$ decreased from its original
value of $\mathrm{mave}_{t^*}$, then this implies that $\log(y_t)$ had to have crossed it at some earlier period $t'' < t'$
and, since $\log(y_t)$ increased from the beginning of $t = t^*$ to the end of $t = t'$, the amount that
$\log(y_t)$ had to increase in order to cross through $\mathrm{mave}_t$ had to be less than $(\mathrm{mave}_{t^*} - \log(y_{t^*})) = \epsilon_1$.
Since $t'$ was arbitrary, this result is true for any $t'$ where $t^* < t' \le t^{**}$, so we have shown that
when using the Bollinger Band exit rule, the overall log return generated by any trade is less than
$(\mathrm{mave}_{t^*} - \log(y_{t^*})) = \epsilon_1$. An illustration of this argument is provided in Figure 16.

$^{25}$ In order to be absolutely certain that $\log(y_t)$ is less than $\mathrm{mave}_{t^*}$ $\forall t$ such that $(t' - n - 1) \le t < t^*$, we need
to assume that a separate long trade was not completed during these $(n + 1)$ time periods. If a separate long trade
was completed during this time period and this trade was generated by a sudden large and sharp downward spike in
log(price ratio) and exited due to another sudden large and sharp upward spike in log(price ratio), then it is possible
that the condition will not hold. Although the probability of this event is quite small, for this reason we need to make
the assumption that a separate long trade was not completed during the previous $(n + 1)$ periods.

Finally we need to state and prove Lemma 8.
Lemma 8. Let $y_t$ denote the price ratio of a paired asset at time $t$ and consider a window of size
$n$ whose endpoints are denoted as $t^*$ and $t^{**}$. Assume that a long trade has been generated at the
beginning of $t = t^* - 1$ so that the entry takes place at the beginning of $t = t^*$. Assume that the
Bollinger Band exit rule is being used. Then, any long trade with an overall non-negative log return
that has reached the end of $t^{**}$ will be exited at the end of $t = t^{**}$. Conversely, any trade with an
overall negative log return that has reached the end of $t = t^{**}$ will not be exited at the end of $t = t^{**}$.
Proof. Recall that Lemma 2 says that if $\log(y_t)$ is constant over the full window from the beginning
of $t = t^*$ to the end of $t = t^{**}$, then there will be an exit at the end of $t = t^{**}$ because $\log(y_{t^{**}})$
and $\mathrm{mave}_{t^{**}}$ will be equal. We again will use two extreme scenarios along with Lemma 2 in order
to prove Lemma 8. Figure 17 provides graphical representations of the two extreme
scenarios.
First consider scenario one where $\log(y_t)$ is constant from the beginning of $t = t^*$ to the end of
$t = t^{**} - 1$ and then increases an infinitesimally small amount equal to $+1.0 \times \epsilon$ at the beginning
of $t = t^{**}$. This implies that the log return over the full window is $+1.0 \times \epsilon$. Note that for any
given unit period increase in $\log(y_t)$, by definition the moving average $\mathrm{mave}_t$ always increases by a
smaller amount. This fact along with Lemma 2 implies that $\log(y_t)$ will cross $\mathrm{mave}_t$ from below at
$t = t^{**}$ and the trade will exit at the end of $t = t^{**}$.
Next consider scenario two where $\log(y_t)$ is constant from the beginning of $t = t^*$ to the end of
$t = t^{**} - 1$ and then decreases an infinitesimally small amount equal to $-1.0 \times \epsilon$ at the beginning of
$t = t^{**}$. This implies that the log return over the full window is $-1.0 \times \epsilon$. Note that for any given
decrease in $\log(y_t)$, the moving average $\mathrm{mave}_t$ always decreases by a smaller amount in absolute
value. This fact along with Lemma 2 implies that $\log(y_t)$ will not cross $\mathrm{mave}_t$ from below at $t = t^{**}$
and therefore the trade will not exit at the end of $t = t^{**}$.
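The two extreme scenarios in the proof of Lemma 8 can be illustrated numerically. This is a sketch under stated assumptions: the helper name `exits_at_window_end` and the numerical values are hypothetical, the log price ratio is held constant over the window except for a single move at the last period, and a long trade is taken to exit when the log price ratio is at or above the moving average.

```python
import statistics

def exits_at_window_end(log_y_star, last_move, n):
    """Window of n values: constant at log_y_star, with one move of size
    last_move applied at the final period t**.

    Returns (exit flag for a long trade, moving average at t**).
    """
    window = [log_y_star] * (n - 1) + [log_y_star + last_move]
    mave = statistics.fmean(window)
    return window[-1] >= mave, mave

n = 10
level = 1.25        # hypothetical constant log price ratio
tiny = 1e-6         # stands in for the "infinitesimally small" move

exit_up, mave_up = exits_at_window_end(level, +tiny, n)       # scenario one
exit_down, mave_down = exits_at_window_end(level, -tiny, n)   # scenario two
exit_flat, mave_flat = exits_at_window_end(level, 0.0, n)     # Lemma 2 case
```

The moving average absorbs only one n-th of the final move, so the upward move leaves the log price ratio above the average (an exit) while the downward move leaves it below (no exit).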
[Figure 16 plot: "Lemma Seven"; plotted series: $\log(y_t)$, BBUpper, mave of $\log(y_t)$, BBLower, entry price, exit price; marked times $t^*$, $t''$ and $t'$; marked distances $\epsilon_1$ and $\epsilon_2$. Plot annotation: the long trade is entered at the beginning of $t = t^*$, where the initial distance from $y_{t^*}$ to $\mathrm{mave}_{t^*}$ is $\epsilon_1$; by the beginning of $t = t'$ the return to the position is $\epsilon_1$ but $\mathrm{mave}_{t'} < \mathrm{mave}_{t^*}$, so $y_t$ crossed through $\mathrm{mave}_t$ at some $t'' < t'$ and the return $\epsilon_2$ due to the Bollinger exit has to be less than $\epsilon_1$; since the choice of $t'$ was arbitrary, the result holds for every $t'$ where $t^* < t' \le t^{**}$.]
Figure 16: Illustrating That The Maximum Log Return of A Bollinger Band Trade Is Always Less
Than The Initial Difference Between The Moving Average At Entry And The Log(PriceRatio) At
Entry.
[Figure 17 plot: two panels over $t^*$ to $t^{**}$ showing $\log(y_t)$, $\mathrm{mave}_{t^{**}-1}$ and $\mathrm{mave}_{t^{**}}$. Extreme Scenario I: $\mathrm{mave}_{t^{**}} < \log(y_{t^{**}})$. Extreme Scenario II: $\mathrm{mave}_{t^{**}} > \log(y_{t^{**}})$.]
Figure 17: Scenario I: An Infinitesimal Positive Return Right Before $t = t^{**}$ Guarantees An Exit At
The End Of $t = t^{**}$. Scenario II: An Infinitesimal Negative Return Right Before $t = t^{**}$ Guarantees
A Non-Exit At The End Of $t = t^{**}$.
Given the various lemmas, we can prove Theorem 1. We repeat the theorem statement here.
Theorem 1. Assume that the rolling window size in the BBPT strategy = n, the band width
multiplier = k and that a long trade is generated at $t = t^* - 1$ so that the entry takes place at the
beginning of $t = t^*$. Then the overall log return of this trade using the Bollinger Band exit rule
is non-negative if and only if the duration of the trade is less than or equal to $n$; i.e. the trade is
exited at a time $t'$ less than or equal to $t^{**} = t^* + n - 1$. This result is independent of the bandwidth
multiplier parameter $k$.
Proof. First we prove the if part of Theorem 1, which means that we need to show that if the pair
trade has a non-negative overall log return, then the total trade duration has to be less than or equal
to $n$ where $n$ is the rolling window size. First, assume that the generated trade is exited at the end
of some time $t = t'$. Clearly, if the trade has a non-negative overall log return, then this implies that
$\log(y_{t'}) - \log(y_{t^*}) \ge 0$ where $t'$ is the exit time of the trade. So let us prove the if part of Theorem
1 by contradiction: We will assume that $\log(y_{t'}) - \log(y_{t^*}) \ge 0$ (i.e. a non-negative overall total
log return from entry to exit) and that $t' > t^{**}$ so that the duration of the trade is greater than $n$.
Then we will show that these assumptions lead to a contradiction.
In order to visualize the argument that follows, a long trade example is provided in Figure 18.
First of all, by assumption, the trade duration is greater than $n$, which means that, at
the beginning of $t = t^{**}$, $\mathrm{mave}_{t^{**}}$ must have been greater than $\log(y_{t^{**}})$ because, if it was not,
then based on the Bollinger Band exit rule, the trade would have exited at the end of $t = t^{**}$.
Therefore $\mathrm{mave}_{t^{**}} > \log(y_{t^{**}})$ at the beginning of $t = t^{**}$. Also, by Lemma 8, we know that the
total return from the beginning of $t = t^*$ to the end of $t = t^{**}$ has to be negative because otherwise
the trade would have exited at the end of $t = t^{**}$. Therefore, we know that $\log(y_{t^{**}}) < \log(y_{t^*})$. So
let us assume that the overall log return from the beginning of $t = t^*$ up to the end of $t = t^{**}$ is
$-1.0 \times \epsilon_1$ where $\epsilon_1 > 0$. Note that, given the latter assumption, equation (25) in Lemma 3 implies
that $\log(y_{t^*}) - \epsilon_1 < \mathrm{mave}_{t^{**}} = \log(y_{t^*}) - \epsilon_2$ where $\epsilon_2 < \epsilon_1$ and $\epsilon_1 > 0$ and $\epsilon_2 > 0$.
Now, since we know that the generated trade has not exited by the end of $t = t^{**}$, we can suppose
that we are now sitting at the beginning of $t = t^{**}$ and can define a new time called the shifted time,
$t_{shift}$, as $t_{shift} = t - (t^{**} - 1)$, so that the beginning of $t_{shift} = 1$ corresponds to the beginning of
$t = t^{**}$. Since the trade has not exited at the end of $t = t^{**}$ and given equation (25) in Lemma 3,
we can modify our perspective by imagining that we are sitting at the beginning of $t_{shift} = 1$ and
have just entered a new Bollinger Band trade with the entry point equal to the value of $\log(y_{t^{**}})$,
namely $\log(y_{t^*}) - \epsilon_1$, and the exit point equal to $\mathrm{mave}_{t^{**}}$, namely $\log(y_{t^*}) - \epsilon_2$. But, by Lemma 7,
the BBPT strategy is such that no trade in BBPT can ever generate more return than the original
distance between its entry point, $\log(y_{t^*}) - \epsilon_1$, and its initial exit point, $\mathrm{mave}_{t^{**}} = \log(y_{t^*}) - \epsilon_2$.
At the beginning of $t_{shift} = 1$, this difference equals $(\log(y_{t^*}) - \epsilon_2) - (\log(y_{t^*}) - \epsilon_1) = \epsilon_1 - \epsilon_2$.
Therefore, a strict upper bound for the log return of the trade going forward from the beginning of
$t_{shift} = 1$ is $\epsilon_1 - \epsilon_2$. But recall that, by assumption, $\log(y_t)$ has decreased by $\epsilon_1$ from the beginning of
$t = t^*$ up to the end of time $t = t^{**}$, so the log return of $y_t$ during that period is $-1.0 \times \epsilon_1$. Therefore,
by the additivity of log returns, the maximum possible overall log return of the
trade is $< \epsilon_1 - \epsilon_2 - \epsilon_1 = -1.0\,\epsilon_2$. But, from Lemma 3, $\epsilon_2 > 0$ so that $-1.0\,\epsilon_2 < 0$. This
means that the trade has to have a negative overall log return, which is a contradiction because we
assumed at the outset of the proof that the trade had a non-negative overall log return. Therefore
we have proven the if part of Theorem 1 by contradiction.

[Figure 18, titled "The If Part Of Theorem One": long trade entered at $t = t^*$ with $n = 10$ and trade duration $= 12$. Time axis runs from $t^*$ through $t^{**}$ to the exit time $t'$. Plotted series: $\log(y_t)$, BBUpper, mave of $\log(y_t)$, BBLower, entry price, exit price. Annotated levels: $\log(y_{t^*})$, $\log(y_{t^*}) - \epsilon_1$, $\log(y_{t^*}) - \epsilon_2$, with $t_{shift} = 1$ marked at $t = t^{**}$ and the gap $(\epsilon_1 - \epsilon_2)$ bracketed. Annotation: max overall log return $= -\epsilon_1 + (\epsilon_1 - \epsilon_2) = -\epsilon_2 < 0$, a contradiction.]
Figure 18: Illustrating that a trade with a non-negative log return cannot have a duration greater
than the rolling window size $n$.
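The bookkeeping behind this contradiction can be written as a single chain of inequalities. The block below is only a restatement of the argument above; the symbol $r_{\text{total}}$ for the overall log return is notation introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
r_{\text{total}}
  &= \underbrace{\log(y_{t^{**}}) - \log(y_{t^*})}_{=\,-\epsilon_1 \text{ (assumption)}}
   + \underbrace{\log(y_{t'}) - \log(y_{t^{**}})}_{<\,\epsilon_1 - \epsilon_2 \text{ (Lemma 7)}} \\
  &< -\epsilon_1 + (\epsilon_1 - \epsilon_2) \\
  &= -\epsilon_2 \;<\; 0 \qquad (\epsilon_2 > 0 \text{ by Lemma 3}),
\end{align*}
```

which contradicts the assumption $r_{\text{total}} \geq 0$.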
We still need to prove the only if part of Theorem 1, which means showing that if the duration of
the trade is less than or equal to the rolling window size, $n$, then the trade has an overall log return
that is non-negative. Again, we prove the only if part of the theorem by contradiction. We
assume that the pair trade duration is less than or equal to $n$ and that the overall log return
from the beginning of the entry period $t^*$ to the end of the exit period $t = t'$ is $-1.0 \times \epsilon$ where
$\epsilon > 0$, so that the overall log return is negative. Then we show that these assumptions lead to a
contradiction. Just as was done with the if part of the theorem, a long trade example is provided
in Figure 19 so that one can visualize the argument that follows.
First of all, by assumption, the trade duration is less than or equal to $n$, which means that, given
the Bollinger Band exit rule, there exists some $t = t' \leq t^{**}$ such that at the end of $t = t'$,
$\mathrm{mave}_{t'} \leq \log(y_{t'})$. Without loss of generality and so that Figure 19 is consistent with the proof,
we will assume that $t' - t^* = n' = 4$ so that the exit occurs at the end of $t = t' = t^* + 4$.
We need to show that the assumptions above lead to a contradiction. First notice that since the
long trade was entered into at $t = t^*$, the condition $\log(y_t) > (\log(y_{t^*}) - \epsilon)$ $\forall\, t$
such that $(t^* - n - 1) \leq t < t^*$ should hold.$^{26}$ This condition, along with the fact that the
overall log return over the interval is $-1.0 \times \epsilon$, allows us to appeal to Lemma 4, which says that
$\log(y_{t^*}) - \epsilon < \mathrm{mave}_t < \log(y_{t^*})$. But, by definition, since the total log return over the interval
from $t = t^*$ to $t = t'$ is $-1.0 \times \epsilon$, clearly $\log(y_{t'}) = \log(y_{t^*}) - \epsilon$. Therefore it must be the case that
$\log(y_{t'}) < \mathrm{mave}_{t'}$, which means that $\log(y_t)$ could not have crossed through $\mathrm{mave}_t$ from below at
$t = t'$. But if $\log(y_t)$ did not cross through $\mathrm{mave}_t$ from below at $t = t'$, then there
could not have been an exit at $t = t'$. Therefore we have arrived at a contradiction, which completes
the proof.
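As a compact restatement of the only if part, the two facts that collide can be placed side by side. This block introduces no new symbols; it only lines up the exit rule against the consequence of Lemma 4 under the assumed negative return.

```latex
\begin{align*}
\text{Exit rule at } t = t' : \quad & \mathrm{mave}_{t'} \;\leq\; \log(y_{t'}) \\
\text{Assumed return } -\epsilon : \quad & \log(y_{t'}) \;=\; \log(y_{t^*}) - \epsilon \\
\text{Lemma 4} : \quad & \log(y_{t^*}) - \epsilon \;<\; \mathrm{mave}_{t'} \\
\Longrightarrow \quad & \log(y_{t'}) \;<\; \mathrm{mave}_{t'} ,
\end{align*}
```

and the last line is incompatible with the exit rule in the first line.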
Both the if and the only if parts of Theorem 1 have been proven, so Theorem 1 has been proven: any
pair trade in the BBPT strategy has a non-negative total return if and only if the duration of the
pair trade is less than or equal to $n$, where $n$ is the rolling window size.
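A reader who wants to look at the return-duration relationship empirically could tabulate trades along the following lines. This is a minimal sketch under stated assumptions, not the paper's simulation code: the entry rule (go long when the log series closes below the lower band), the exit rule (exit at the first later period where the log series is at or above the moving average), the use of the population standard deviation, and the function names `rolling_bands`, `long_trades` and `tabulate` are all assumptions made for illustration. Duration is counted as in Figure 18 (entry and exit periods both included).

```python
from statistics import mean, pstdev


def rolling_bands(x, n, k):
    """Rolling mean and Bollinger-style bands; entries before index n-1 are None."""
    mave, lower, upper = [], [], []
    for i in range(len(x)):
        if i < n - 1:
            mave.append(None)
            lower.append(None)
            upper.append(None)
            continue
        window = x[i - n + 1 : i + 1]
        m, s = mean(window), pstdev(window)  # population std is an assumption
        mave.append(m)
        lower.append(m - k * s)
        upper.append(m + k * s)
    return mave, lower, upper


def long_trades(x, n, k):
    """Assumed rules: enter long when x[t] < lower[t]; exit when x[t] >= mave[t]."""
    mave, lower, _ = rolling_bands(x, n, k)
    trades, entry = [], None
    for t in range(n - 1, len(x)):
        if entry is None:
            if x[t] < lower[t]:
                entry = t
        elif x[t] >= mave[t]:
            trades.append({
                "entry": entry,
                "exit": t,
                "duration": t - entry + 1,       # counts entry and exit periods
                "log_return": x[t] - x[entry],   # x is already a log series
            })
            entry = None
    return trades


def tabulate(trades, n):
    """Count trades by the pair (duration <= n, log_return >= 0)."""
    counts = {}
    for tr in trades:
        key = (tr["duration"] <= n, tr["log_return"] >= 0)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

If the assumed rules matched the paper's timing conventions exactly, Theorem 1 would predict that only the keys `(True, True)` and `(False, False)` appear in the output of `tabulate`. Because the conventions here are guesses, any off-diagonal counts should be read as a sign of that mismatch rather than as evidence against the theorem.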
26. Just as was the case in Lemma 7, in order to be certain that the condition holds for all $n + 1$ periods, we
need to assume that a separate long trade was not completed in the time period $[(t^* - n - 1), t^*]$.
[Figure 19, titled "The Only If Part Of Theorem One": long trade with $n = 10$ and $n' = 4$. Time axis runs from $t_1$ to $t^{**}$, with $t^* - n - 1$, $t^*$, the supposed exit time $t'$ and $t^* + n$ marked. Plotted series: $y_t$, mave of $y_t$, original window, signal price, entry price, supposed exit price. Annotated levels: $\log(y_{t^*})$ and $\log(y_{t^*}) - \epsilon$. Annotations: the endpoints of the original window are the beginning of $t = t^*$ and the end of $t = t^{**}$; the assumption is a negative overall log return $= -\epsilon$ and an exit at $t = t'$; but Lemma 4 says $\log(y_{t'}) < \mathrm{mave}_t$ for all $t$ where $t^* < t < t^{**}$; therefore an exit at $t = t'$ is not possible, which contradicts the assumed exit at $t = t'$.]
Figure 19: Illustrating that a trade whose duration is less than or equal to the rolling window size
has to have a non-negative overall log return.
F The BBPT Strategy and the Corresponding FFMDPT Strategy
Figure 20: The top plot represents the BBPT strategy during 2004 using n = 20 and k = 2.
The middle and bottom plots illustrate the FFMDPT strategy over the same time period.
The middle plot excludes the trade line segments for clarity. The purple dots represent BBUpper
and BBLower at the time of entry and the horizontal purple line is the forecast at entry, which is
constant for n = 20 periods. The actual trades triggered by the FFMDPT simulation are shown in
the bottom FFMDPT plot, with a blue triangle at the end of a line segment indicating that the purple
center line was crossed and a black triangle indicating that the maximum duration occurred. In
the bottom plot, the purple dots at the time of entry are excluded for clarity.
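The two exit conditions described in the caption (forecast line crossed, or maximum duration reached) can be sketched as a small function. This is an illustrative sketch only: the function name `ffmdpt_exit`, its arguments, the reason labels, and the choice to treat the maximum duration as exactly `n` periods after entry are assumptions, not the paper's implementation.

```python
def ffmdpt_exit(x, entry, forecast, n, side="long"):
    """Find the exit of a trade whose target is a forecast fixed at entry.

    x        : list of log price ratios
    entry    : index at which the trade was entered
    forecast : forecast value frozen at entry (the horizontal center line)
    n        : maximum number of periods the trade may stay open (assumed)
    side     : "long" exits when x rises to the forecast, "short" when it falls to it

    Returns (exit_index, reason), where reason is "forecast_crossed",
    "max_duration", or "end_of_data".
    """
    last = min(entry + n, len(x) - 1)
    for t in range(entry + 1, last + 1):
        crossed = x[t] >= forecast if side == "long" else x[t] <= forecast
        if crossed:
            return t, "forecast_crossed"   # blue triangle in Figure 20
    if last == entry + n:
        return last, "max_duration"        # black triangle in Figure 20
    return last, "end_of_data"             # series ended before either rule fired
```

For example, `ffmdpt_exit([1.0, 0.9, 0.95, 1.05], entry=0, forecast=1.0, n=5)` stops at index 3 because the series has risen back to the fixed forecast, whereas a series that keeps falling is closed out after `n` periods regardless of the loss.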
G BBPT versus FFMDPT: Two Examples
Example One
[Figure 21, top panel: BBPT results (window = 20, multiplier = 2), 20040102 to 20041231; y-axis: log price ratio; total return in % = 6.921. Bottom panel: FFMDPT results (window = 20, multiplier = 2), 20040102 to 20041231; y-axis: log price ratio; total return in % = 8.37.]
Figure 21: A comparison of Bollinger Bands and Fixed Forecast Maximum Duration Bands during
2004 using n = 20 and k = 2. The second and third trades in April and June generate slightly
higher returns in the FFMDPT strategy. Also, the August trade in the BBPT strategy generates a
much larger negative return compared to the corresponding trade in the FFMDPT strategy.
Example Two
[Figure 22, top panel: BBPT results (window = 20, multiplier = 2), 20050103 to 20051230; y-axis: log price ratio; total return in % = 5.285. Bottom panel: FFMDPT results (window = 20, multiplier = 2), 20050103 to 20051230; y-axis: log price ratio; total return in % = 4.062.]
Figure 22: A comparison of Bollinger Bands and Fixed Forecast Maximum Duration Bands during
2005 using n = 20 and k = 2. The first trade in early February generates a positive return in the
BBPT strategy but a negative return in the FFMDPT strategy. This is because the fixed forecast
in the FFMDPT strategy is never crossed and extra losses are generated before the exit.
H BBPT Versus FFMDPT Optimized Return Comparison
Table 1: Return Comparison of Bollinger Bands Pairs Trading Simulation and Fixed Forecast
Maximum Duration Pairs Trading Simulation where k = 1 with n optimized.

           BBPT STRATEGY          FFMDPT STRATEGY
Year       n_BBPT   RTN_BBPT      n_FFMDPT   RTN_FFMDPT    DIFF
2003-4     13       1.491         11         1.930         3.4210
2004-5     12       9.738         45         10.390        0.6529
2005-6     45       4.026         50         5.698         1.6720
2006-7     14       3.056         13         2.024         5.0800
2007-8     10       13.33         20         3.464         9.8660
2008-9     40       3.294         24         1.088         2.2060
2009-10    31       1.406         28         3.625         2.2190
2010-11    18       0.7325        10         1.810         2.5425
Table 2: Return Comparison of Bollinger Bands Pairs Trading Simulation and Fixed Forecast
Maximum Duration Pairs Trading Simulation where k = 1.5 with n optimized.

           BBPT STRATEGY          FFMDPT STRATEGY
Year       n_BBPT   RTN_BBPT      n_FFMDPT   RTN_FFMDPT    DIFF
2003-4     13       3.104         12         3.898         7.002
2004-5     11       10.290        43         3.732         6.558
2005-6     14       10.410        19         9.335         1.075
2006-7     15       5.586         13         0.4854        5.1006
2007-8     12       4.00          10         2.331         1.6690
2008-9     15       3.154         20         2.735         5.8890
2009-10    49       5.018         50         9.310         4.2920
2010-11    16       0.0075        16         0.4846        0.4921
Table 3: Return Comparison of Bollinger Bands Pairs Trading Simulation and Fixed Forecast
Maximum Duration Pairs Trading Simulation where k = 2 with n optimized.

           BBPT STRATEGY          FFMDPT STRATEGY
Year       n_BBPT   RTN_BBPT      n_FFMDPT   RTN_FFMDPT    DIFF
2003-4     32       2.162         11         2.989         0.8270
2004-5     11       4.115         38         2.097         2.0180
2005-6     14       9.534         16         9.301         0.2330
2006-7     14       4.728         15         0.770         3.9576
2007-8     14       4.301         22         1.446         2.8550
2008-9     15       0.548         11         2.851         3.3988
2009-10    10       8.907         14         7.069         1.8380
2010-11    14       0.802         12         1.456         0.6542