Theory of Financial Risk and
Derivative Pricing
From Statistical Physics to Risk Management
second edition
Jean-Philippe Bouchaud and Marc Potters
published by the press syndicate of the university of cambridge
The Pitt Building, Trumpington Street, Cambridge, United Kingdom
cambridge university press
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York, NY 10011–4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
http://www.cambridge.org
© Jean-Philippe Bouchaud and Marc Potters 2000, 2003
This book is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without
the written permission of Cambridge University Press.
First published 2000
This edition published 2003
Printed in the United Kingdom at the University Press, Cambridge
Typefaces Times 10/13 pt. and Helvetica    System LaTeX 2ε [tb]
A catalogue record for this book is available from the British Library
Library of Congress Cataloguing in Publication data
Bouchaud, Jean-Philippe, 1962–
Theory of financial risk and derivative pricing : from statistical physics to risk
management / Jean-Philippe Bouchaud and Marc Potters.–2nd edn
p. cm.
Rev. edn of: Theory of financial risks. 2000.
Includes bibliographical references and index.
ISBN 0 521 81916 4 (hardback)
1. Finance. 2. Financial engineering. 3. Risk assessment. 4. Risk management.
I. Potters, Marc, 1969– II. Bouchaud, Jean-Philippe, 1962– Theory of financial risks.
III. Title.
HG101.B68 2003
658.155 – dc21 2003044037
ISBN 0 521 81916 4 hardback
Contents
Foreword
Preface
1 Probability theory: basic notions
1.1 Introduction
1.2 Probability distributions
1.3 Typical values and deviations
1.4 Moments and characteristic function
1.5 Divergence of moments – asymptotic behaviour
1.6 Gaussian distribution
1.7 Log-normal distribution
1.8 Lévy distributions and Paretian tails
1.9 Other distributions (∗)
1.10 Summary
2 Maximum and addition of random variables
2.1 Maximum of random variables
2.2 Sums of random variables
2.2.1 Convolutions
2.2.2 Additivity of cumulants and of tail amplitudes
2.2.3 Stable distributions and self-similarity
2.3 Central limit theorem
2.3.1 Convergence to a Gaussian
2.3.2 Convergence to a Lévy distribution
2.3.3 Large deviations
2.3.4 Steepest descent method and Cramèr function (∗)
2.3.5 The CLT at work on simple cases
2.3.6 Truncated Lévy distributions
2.3.7 Conclusion: survival and vanishing of tails
2.4 From sum to max: progressive dominance of extremes (∗)
2.5 Linear correlations and fractional Brownian motion
2.6 Summary
3 Continuous time limit, Ito calculus and path integrals
3.1 Divisibility and the continuous time limit
3.1.1 Divisibility
3.1.2 Infinite divisibility
3.1.3 Poisson jump processes
3.2 Functions of the Brownian motion and Ito calculus
3.2.1 Ito's lemma
3.2.2 Novikov's formula
3.2.3 Stratonovich's prescription
3.3 Other techniques
3.3.1 Path integrals
3.3.2 Girsanov's formula and the Martin–Siggia–Rose trick (∗)
3.4 Summary
4 Analysis of empirical data
4.1 Estimating probability distributions
4.1.1 Cumulative distribution and densities – rank histogram
4.1.2 Kolmogorov–Smirnov test
4.1.3 Maximum likelihood
4.1.4 Relative likelihood
4.1.5 A general caveat
4.2 Empirical moments: estimation and error
4.2.1 Empirical mean
4.2.2 Empirical variance and MAD
4.2.3 Empirical kurtosis
4.2.4 Error on the volatility
4.3 Correlograms and variograms
4.3.1 Variogram
4.3.2 Correlogram
4.3.3 Hurst exponent
4.3.4 Correlations across different time zones
4.4 Data with heterogeneous volatilities
4.5 Summary
5 Financial products and financial markets
5.1 Introduction
5.2 Financial products
5.2.1 Cash (Interbank market)
5.2.2 Stocks
5.2.3 Stock indices
5.2.4 Bonds
5.2.5 Commodities
5.2.6 Derivatives
5.3 Financial markets
5.3.1 Market participants
5.3.2 Market mechanisms
5.3.3 Discreteness
5.3.4 The order book
5.3.5 The bid-ask spread
5.3.6 Transaction costs
5.3.7 Time zones, overnight, seasonalities
5.4 Summary
6 Statistics of real prices: basic results
6.1 Aim of the chapter
6.2 Second-order statistics
6.2.1 Price increments vs. returns
6.2.2 Autocorrelation and power spectrum
6.3 Distribution of returns over different time scales
6.3.1 Presentation of the data
6.3.2 The distribution of returns
6.3.3 Convolutions
6.4 Tails, what tails?
6.5 Extreme markets
6.6 Discussion
6.7 Summary
7 Non-linear correlations and volatility fluctuations
7.1 Non-linear correlations and dependence
7.1.1 Non identical variables
7.1.2 A stochastic volatility model
7.1.3 GARCH(1,1)
7.1.4 Anomalous kurtosis
7.1.5 The case of infinite kurtosis
7.2 Non-linear correlations in financial markets: empirical results
7.2.1 Anomalous decay of the cumulants
7.2.2 Volatility correlations and variogram
7.3 Models and mechanisms
7.3.1 Multifractality and multifractal models (∗)
7.3.2 The microstructure of volatility
7.4 Summary
8 Skewness and price-volatility correlations
8.1 Theoretical considerations
8.1.1 Anomalous skewness of sums of random variables
8.1.2 Absolute vs. relative price changes
8.1.3 The additive-multiplicative crossover and the q-transformation
8.2 A retarded model
8.2.1 Definition and basic properties
8.2.2 Skewness in the retarded model
8.3 Price-volatility correlations: empirical evidence
8.3.1 Leverage effect for stocks and the retarded model
8.3.2 Leverage effect for indices
8.3.3 Return-volume correlations
8.4 The Heston model: a model with volatility fluctuations and skew
8.5 Summary
9 Cross-correlations
9.1 Correlation matrices and principal component analysis
9.1.1 Introduction
9.1.2 Gaussian correlated variables
9.1.3 Empirical correlation matrices
9.2 Non-Gaussian correlated variables
9.2.1 Sums of non Gaussian variables
9.2.2 Non-linear transformation of correlated Gaussian variables
9.2.3 Copulas
9.2.4 Comparison of the two models
9.2.5 Multivariate Student distributions
9.2.6 Multivariate Lévy variables (∗)
9.2.7 Weakly non Gaussian correlated variables (∗)
9.3 Factors and clusters
9.3.1 One factor models
9.3.2 Multi-factor models
9.3.3 Partition around medoids
9.3.4 Eigenvector clustering
9.3.5 Maximum spanning tree
9.4 Summary
9.5 Appendix A: central limit theorem for random matrices
9.6 Appendix B: density of eigenvalues for random correlation matrices
10 Risk measures
10.1 Risk measurement and diversification
10.2 Risk and volatility
10.3 Risk of loss, 'value at risk' (VaR) and expected shortfall
10.3.1 Introduction
10.3.2 Value-at-risk
10.3.3 Expected shortfall
10.4 Temporal aspects: drawdown and cumulated loss
10.5 Diversification and utility – satisfaction thresholds
10.6 Summary
11 Extreme correlations and variety
11.1 Extreme event correlations
11.1.1 Correlations conditioned on large market moves
11.1.2 Real data and surrogate data
11.1.3 Conditioning on large individual stock returns: exceedance correlations
11.1.4 Tail dependence
11.1.5 Tail covariance (∗)
11.2 Variety and conditional statistics of the residuals
11.2.1 The variety
11.2.2 The variety in the one-factor model
11.2.3 Conditional variety of the residuals
11.2.4 Conditional skewness of the residuals
11.3 Summary
11.4 Appendix C: some useful results on power-law variables
12 Optimal portfolios
12.1 Portfolios of uncorrelated assets
12.1.1 Uncorrelated Gaussian assets
12.1.2 Uncorrelated 'power-law' assets
12.1.3 'Exponential' assets
12.1.4 General case: optimal portfolio and VaR (∗)
12.2 Portfolios of correlated assets
12.2.1 Correlated Gaussian fluctuations
12.2.2 Optimal portfolios with non-linear constraints (∗)
12.2.3 'Power-law' fluctuations – linear model (∗)
12.2.4 'Power-law' fluctuations – Student model (∗)
12.3 Optimized trading
12.4 Value-at-risk – general non-linear portfolios (∗)
12.4.1 Outline of the method: identifying worst cases
12.4.2 Numerical test of the method
12.5 Summary
13 Futures and options: fundamental concepts
13.1 Introduction
13.1.1 Aim of the chapter
13.1.2 Strategies in uncertain conditions
13.1.3 Trading strategies and efficient markets
13.2 Futures and forwards
13.2.1 Setting the stage
13.2.2 Global financial balance
13.2.3 Riskless hedge
13.2.4 Conclusion: global balance and arbitrage
13.3 Options: definition and valuation
13.3.1 Setting the stage
13.3.2 Orders of magnitude
13.3.3 Quantitative analysis – option price
13.3.4 Real option prices, volatility smile and 'implied' kurtosis
13.3.5 The case of an infinite kurtosis
13.4 Summary
14 Options: hedging and residual risk
14.1 Introduction
14.2 Optimal hedging strategies
14.2.1 A simple case: static hedging
14.2.2 The general case and 'Δ' hedging
14.2.3 Global hedging vs. instantaneous hedging
14.3 Residual risk
14.3.1 The Black–Scholes miracle
14.3.2 The 'stop-loss' strategy does not work
14.3.3 Instantaneous residual risk and kurtosis risk
14.3.4 Stochastic volatility models
14.4 Hedging errors. A variational point of view
14.5 Other measures of risk – hedging and VaR (∗)
14.6 Conclusion of the chapter
14.7 Summary
14.8 Appendix D
15 Options: the role of drift and correlations
15.1 Influence of drift on optimally hedged option
15.1.1 A perturbative expansion
15.1.2 'Risk neutral' probability and martingales
15.2 Drift risk and delta-hedged options
15.2.1 Hedging the drift risk
15.2.2 The price of delta-hedged options
15.2.3 A general option pricing formula
15.3 Pricing and hedging in the presence of temporal correlations (∗)
15.3.1 A general model of correlations
15.3.2 Derivative pricing with small correlations
15.3.3 The case of delta-hedging
15.4 Conclusion
15.4.1 Is the price of an option unique?
15.4.2 Should one always optimally hedge?
15.5 Summary
15.6 Appendix E
16 Options: the Black and Scholes model
16.1 Ito calculus and the Black–Scholes equation
16.1.1 The Gaussian Bachelier model
16.1.2 Solution and Martingale
16.1.3 Time value and the cost of hedging
16.1.4 The Log-normal Black–Scholes model
16.1.5 General pricing and hedging in a Brownian world
16.1.6 The Greeks
16.2 Drift and hedge in the Gaussian model (∗)
16.2.1 Constant drift
16.2.2 Price dependent drift and the Ornstein–Uhlenbeck paradox
16.3 The binomial model
16.4 Summary
17 Options: some more specific problems
17.1 Other elements of the balance sheet
17.1.1 Interest rate and continuous dividends
17.1.2 Interest rate corrections to the hedging strategy
17.1.3 Discrete dividends
17.1.4 Transaction costs
17.2 Other types of options
17.2.1 'Put-call' parity
17.2.2 'Digital' options
17.2.3 'Asian' options
17.2.4 'American' options
17.2.5 'Barrier' options (∗)
17.2.6 Other types of options
17.3 The 'Greeks' and risk control
17.4 Risk diversification (∗)
17.5 Summary
18 Options: minimum variance Monte Carlo
18.1 Plain Monte Carlo
18.1.1 Motivation and basic principle
18.1.2 Pricing the forward exactly
18.1.3 Calculating the Greeks
18.1.4 Drawbacks of the method
18.2 A 'hedged' Monte Carlo method
18.2.1 Basic principle of the method
18.2.2 A linear parameterization of the price and hedge
18.2.3 The Black–Scholes limit
18.3 Non Gaussian models and purely historical option pricing
18.4 Discussion and extensions. Calibration
18.5 Summary
18.6 Appendix F: generating some random variables
19 The yield curve
19.1 Introduction
19.2 The bond market
19.3 Hedging bonds with other bonds
19.3.1 The general problem
19.3.2 The continuous time Gaussian limit
19.4 The equation for bond pricing
19.4.1 A general solution
19.4.2 The Vasicek model
19.4.3 Forward rates
19.4.4 More general models
19.5 Empirical study of the forward rate curve
19.5.1 Data and notations
19.5.2 Quantities of interest and data analysis
19.6 Theoretical considerations (∗)
19.6.1 Comparison with the Vasicek model
19.6.2 Market price of risk
19.6.3 Risk-premium and the √θ law
19.7 Summary
19.8 Appendix G: optimal portfolio of bonds
20 Simple mechanisms for anomalous price statistics
20.1 Introduction
20.2 Simple models for herding and mimicry
20.2.1 Herding and percolation
20.2.2 Avalanches of opinion changes
20.3 Models of feedback effects on price fluctuations
20.3.1 Risk-aversion induced crashes
20.3.2 A simple model with volatility correlations and tails
20.3.3 Mechanisms for long ranged volatility correlations
20.4 The Minority Game
20.5 Summary
Index of most important symbols
Index
1 Probability theory: basic notions
All epistemological value of the theory of probability is based on this: that large scale
random phenomena in their collective action create strict, non random regularity.
(Gnedenko and Kolmogorov, Limit Distributions for Sums of Independent
Random Variables.)
1.1 Introduction
Randomness stems from our incomplete knowledge of reality, from the lack of information
which forbids a perfect prediction of the future. Randomness arises from complexity, from
the fact that causes are diverse, that tiny perturbations may result in large effects. For over a
century now, Science has abandoned Laplace’s deterministic vision, and has fully accepted
the task of deciphering randomness and inventing adequate tools for its description. The
surprise is that, after all, randomness has many facets and that there are many levels to
uncertainty, but, above all, that a new form of predictability appears, which is no longer
deterministic but statistical.
Financial markets offer an ideal testing ground for these statistical ideas. The fact that
a large number of participants, with divergent anticipations and conflicting interests, are
simultaneously present in these markets, leads to unpredictable behaviour. Moreover, financial markets are (sometimes strongly) affected by external news – which is, both in date and in nature, to a large degree unexpected. The statistical approach consists in drawing
from past observations some information on the frequency of possible price changes. If one
then assumes that these frequencies reflect some intimate mechanism of the markets them-
selves, then one may hope that these frequencies will remain stable in the course of time.
For example, the mechanism underlying roulette or the game of dice is obviously always the same, and one expects that the frequency of all possible outcomes will be invariant in time – although, of course, each individual outcome is random.
This 'bet' that probabilities are stable (or better, stationary) is very reasonable in the case of roulette or dice;† it is nevertheless much less justified in the case of financial markets – despite the large number of participants which confer to the system a certain
†The idea that science ultimately amounts to making the best possible guess of reality is due to R. P. Feynman
(Seeking New Laws, in The Character of Physical Laws, MIT Press, Cambridge, MA, 1965).
regularity, at least in the sense of Gnedenko and Kolmogorov. It is clear, for example, that
financial markets do not behave now as they did 30 years ago: many factors contribute to
the evolution of the way markets behave (development of derivative markets, world-wide
and computer-aided trading, etc.). As will be mentioned below, 'young' markets (such as emerging country markets) and more mature markets (exchange rate markets, interest rate markets, etc.) behave quite differently. The statistical approach to financial markets is based on the idea that whatever evolution takes place, it happens sufficiently slowly (on the scale of several years) that the observation of the recent past is useful to describe a not too
distant future. However, even this ‘weak stability’ hypothesis is sometimes badly in error,
in particular in the case of a crisis, which marks a sudden change of market behaviour. The
recent example of some Asian currencies indexed to the dollar (such as the Korean won or
the Thai baht) is interesting, since the observation of past fluctuations is clearly of no help
to predict the amplitude of the sudden turmoil of 1997, see Figure 1.1.
[Figure: three panels showing the time series x(t) of the Korean won against the US dollar (KRW/USD) in 1997 (top), the 3-month Libor futures in December 1992 (middle), and the S&P 500 in 1987 (bottom).]
Fig. 1.1. Three examples of statistically unforeseen crashes: the Korean won against the dollar in
1997 (top), the British 3-month short-term interest rates futures in 1992 (middle), and the S&P 500
in 1987 (bottom). In the example of the Korean won, it is particularly clear that the distribution of
price changes before the crisis was extremely narrow, and could not be extrapolated to anticipate
what happened in the crisis period.
Hence, the statistical description of financial fluctuations is certainly imperfect. It is
nevertheless extremely helpful: in practice, the ‘weak stability’ hypothesis is in most cases
reasonable, at least to describe risks.†
In other words, the amplitude of the possible price changes (but not their sign!) is, to a
certain extent, predictable. It is thus rather important to devise adequate tools, in order to
control (if at all possible) financial risks. The goal of this first chapter is to present a certain
number of basic notions in probability theory which we shall find useful in the following.
Our presentation does not aim at mathematical rigour, but rather tries to present the key
concepts in an intuitive way, in order to ease their empirical use in practical applications.
1.2 Probability distributions
Contrary to the throw of a die, which can only return an integer between 1 and 6, the variation of price of a financial asset‡ can be arbitrary (we disregard the fact that price changes cannot actually be smaller than a certain quantity – a 'tick'). In order to describe a random process X for which the result is a real number, one uses a probability density P(x), such that the probability that X is within a small interval of width dx around X = x is equal to P(x) dx. In the following, we shall denote as P(·) the probability density for the variable appearing as the argument of the function. This is a potentially ambiguous, but very useful notation.
The probability that X is between a and b is given by the integral of P(x) between a and b,
$$P(a < X < b) = \int_a^b P(x)\,dx. \qquad (1.1)$$
In the following, the notation P(·) means the probability of a given event, defined by the
content of the parentheses (·).
The function P(x) is a density; in this sense it depends on the units used to measure X. For example, if X is a length measured in centimetres, P(x) is a probability density per unit length, i.e. per centimetre. The numerical value of P(x) changes if X is measured in inches, but the probability that X lies between two specific values l1 and l2 is of course independent of the chosen unit. P(x) dx is thus invariant upon a change of unit, i.e. under the change of variable x → γx. More generally, P(x) dx is invariant upon any (monotonic) change of variable x → y(x): in this case, one has P(x) dx = P(y) dy.
In order to be a probability density in the usual sense, P(x) must be non-negative (P(x) ≥ 0 for all x) and must be normalized, that is, the integral of P(x) over the whole range of possible values for X must be equal to one:
$$\int_{x_m}^{x_M} P(x)\,dx = 1, \qquad (1.2)$$
†The prediction of future returns on the basis of past returns is however much less justified.
‡Asset is the generic name for a financial instrument which can be bought or sold, like stocks, currencies, gold,
bonds, etc.
where $x_m$ (resp. $x_M$) is the smallest (resp. largest) value which X can take. In the case where the possible values of X are not bounded from below, one takes $x_m = -\infty$, and similarly for $x_M$. One can actually always assume the bounds to be ±∞ by setting P(x) to zero in the intervals $]-\infty, x_m]$ and $[x_M, \infty[$. Later in the text, we shall often use the symbol $\int$ as a shorthand for $\int_{-\infty}^{+\infty}$.
An equivalent way of describing the distribution of X is to consider its cumulative distribution $P_<(x)$, defined as:
$$P_<(x) \equiv P(X < x) = \int_{-\infty}^{x} P(x')\,dx'. \qquad (1.3)$$
$P_<(x)$ takes values between zero and one, and is monotonically increasing with x. Obviously, $P_<(-\infty) = 0$ and $P_<(+\infty) = 1$. Similarly, one defines $P_>(x) = 1 - P_<(x)$.
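As a concrete illustration (ours, not part of the original text), the rank histogram of Section 4.1.1 estimates $P_<(x)$ directly from sorted samples; a minimal Python sketch, with function names of our own choosing, could read:

```python
import numpy as np

def empirical_cdf(samples):
    """Rank-based estimate of P_<(x): sort the samples and assign rank k/(N+1)."""
    x = np.sort(samples)
    p_below = np.arange(1, len(x) + 1) / (len(x) + 1)
    return x, p_below

rng = np.random.default_rng(0)
x, p = empirical_cdf(rng.normal(size=10_000))
# for a centred Gaussian, P_<(0) should be close to 1/2
print(p[np.searchsorted(x, 0.0)])
```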
1.3 Typical values and deviations
It is quite natural to speak about ‘typical’ values of X. There are at least three mathematical
definitions of this intuitive notion: the most probable value, the median and the mean.
The most probable value $x^*$ corresponds to the maximum of the function P(x); $x^*$ need not be unique if P(x) has several equivalent maxima. The median $x_{\text{med}}$ is such that the probabilities that X be greater or less than this particular value are equal. In other words, $P_<(x_{\text{med}}) = P_>(x_{\text{med}}) = \frac{1}{2}$. The mean, or expected value of X, which we shall denote as m or $\langle x \rangle$ in the following, is the average of all possible values of X, weighted by their corresponding probability:
$$m \equiv \langle x \rangle = \int x\,P(x)\,dx. \qquad (1.4)$$
For a unimodal distribution (unique maximum), symmetrical around this maximum, these
three definitions coincide. However, they are in general different, although often rather
close to one another. Figure 1.2 shows an example of a non-symmetric distribution, and the
relative position of the most probable value, the median and the mean.
One can then describe the fluctuations of the random variable X: if the random process is repeated several times, one expects the results to be scattered in a cloud of a certain 'width' in the region of typical values of X. This width can be described by the mean absolute deviation (MAD) $E_{\text{abs}}$, by the root mean square (RMS, or standard deviation) σ, or by the 'full width at half maximum' $w_{1/2}$.
The mean absolute deviation from a given reference value is the average of the distance between the possible values of X and this reference value,†
$$E_{\text{abs}} \equiv \int |x - x_{\text{med}}|\,P(x)\,dx. \qquad (1.5)$$
†One chooses as a reference value the median for the MAD and the mean for the RMS, because for a fixed
distribution P(x), these two quantities minimize, respectively, the MAD and the RMS.
[Figure: a skewed probability density P(x) with the most probable value $x^*$, the median $x_{\text{med}}$ and the mean $\langle x \rangle$ marked.]
Fig. 1.2. The 'typical value' of a random variable X drawn according to a distribution density P(x) can be defined in at least three different ways: through its mean value $\langle x \rangle$, its most probable value $x^*$ or its median $x_{\text{med}}$. In the general case these three values are distinct.
Similarly, the variance (σ²) is the mean squared distance to the reference value m,
$$\sigma^2 \equiv \langle (x-m)^2 \rangle = \int (x-m)^2\,P(x)\,dx. \qquad (1.6)$$
Since the variance has the dimension of x squared, its square root (the RMS, σ) gives the order of magnitude of the fluctuations around m.
Finally, the full width at half maximum $w_{1/2}$ is defined (for a distribution which is symmetrical around its unique maximum $x^*$) such that $P(x^* \pm w_{1/2}/2) = P(x^*)/2$, which corresponds to the points where the probability density has dropped by a factor of two compared to its maximum value. One could actually define this width slightly differently, for example such that the total probability to find an event outside the interval $[x^* - w/2, x^* + w/2]$ is equal to, say, 0.1. The corresponding value of w is called a quantile. This definition is important when the distribution has very fat tails, such that the variance or the mean absolute deviation are infinite.
The pair mean–variance is actually much more popular than the pair median–MAD. This comes from the fact that the absolute value is not an analytic function of its argument, and thus does not possess the nice properties of the variance, such as additivity under convolution, which we shall discuss in the next chapter. However, for the empirical study of fluctuations, it is sometimes preferable to use the MAD; it is more robust than the variance, that is, less sensitive to rare extreme events, which may be the source of large statistical errors.
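To see this robustness in practice, here is a small illustrative experiment (ours, not the authors'): a single extreme event inflates the RMS of a Gaussian sample far more than its MAD.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
x[0] = 50.0  # one rare extreme event

mad = np.mean(np.abs(x - np.median(x)))  # mean absolute deviation from the median
rms = np.std(x)                          # root mean square deviation from the mean

print(f"MAD = {mad:.3f}   (Gaussian value sqrt(2/pi) ~ {np.sqrt(2/np.pi):.3f})")
print(f"RMS = {rms:.3f}   (Gaussian value 1, strongly inflated by the outlier)")
```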
1.4 Moments and characteristic function
More generally, one can define higher-order moments of the distribution P(x) as the average of powers of X:
$$m_n \equiv \langle x^n \rangle = \int x^n\,P(x)\,dx. \qquad (1.7)$$
Accordingly, the mean m is the first moment (n = 1), and the variance is related to the second moment ($\sigma^2 = m_2 - m^2$). The above definition, Eq. (1.7), is only meaningful if the integral converges, which requires that P(x) decreases sufficiently rapidly for large |x| (see below).
From a theoretical point of view, the moments are interesting: if they exist, their knowledge is often equivalent to the knowledge of the distribution P(x) itself.† In practice however, the high order moments are very hard to determine satisfactorily: as n grows, longer and longer time series are needed to keep a certain level of precision on $m_n$; these high moments are thus in general not adapted to describe empirical data.
For many computational purposes, it is convenient to introduce the characteristic function of P(x), defined as its Fourier transform:
$$\hat{P}(z) \equiv \int e^{izx}\,P(x)\,dx. \qquad (1.8)$$
The function P(x) is itself related to its characteristic function through an inverse Fourier transform:
$$P(x) = \frac{1}{2\pi} \int e^{-izx}\,\hat{P}(z)\,dz. \qquad (1.9)$$
Since P(x) is normalized, one always has $\hat{P}(0) = 1$. The moments of P(x) can be obtained through successive derivatives of the characteristic function at z = 0,
$$m_n = (-i)^n \left.\frac{d^n \hat{P}(z)}{dz^n}\right|_{z=0}. \qquad (1.10)$$
One finally defines the cumulants $c_n$ of a distribution as the successive derivatives of the logarithm of its characteristic function:
$$c_n = (-i)^n \left.\frac{d^n \log \hat{P}(z)}{dz^n}\right|_{z=0}. \qquad (1.11)$$
The cumulant $c_n$ is a polynomial combination of the moments $m_p$ with p ≤ n. For example, $c_2 = m_2 - m^2 = \sigma^2$. It is often useful to normalize the cumulants by an appropriate power of the variance, such that the resulting quantities are dimensionless. One thus defines the normalized cumulants $\lambda_n$,
$$\lambda_n \equiv c_n/\sigma^n. \qquad (1.12)$$
†This is not rigorously correct, since one can exhibit examples of different distribution densities which possess
exactly the same moments, see Section 1.7 below.
One often uses the third and fourth normalized cumulants, called the skewness (ς) and kurtosis (κ),†
$$\varsigma \equiv \lambda_3 = \frac{\langle (x-m)^3 \rangle}{\sigma^3}, \qquad \kappa \equiv \lambda_4 = \frac{\langle (x-m)^4 \rangle}{\sigma^4} - 3. \qquad (1.13)$$
The above definition of cumulants may look arbitrary, but these quantities have remark-
able properties. For example, as we shall show in Section 2.2, the cumulants simply add
when one sums independent random variables. Moreover a Gaussian distribution (or the
normal law of Laplace and Gauss) is characterized by the fact that all cumulants of order
larger than two are identically zero. Hence the cumulants, in particular κ, can be interpreted
as a measure of the distance between a given distribution P(x) and a Gaussian.
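Empirically, the normalized cumulants of Eq. (1.13) are straightforward to estimate from data (their statistical errors are the subject of Section 4.2). A minimal sketch, with our own function name, assuming the samples fit in memory:

```python
import numpy as np

def skewness_kurtosis(x):
    """Empirical skewness (lambda_3) and excess kurtosis (lambda_4), as in Eq. (1.13)."""
    m, sigma = x.mean(), x.std()
    skew = np.mean((x - m)**3) / sigma**3
    kurt = np.mean((x - m)**4) / sigma**4 - 3.0
    return skew, kurt

rng = np.random.default_rng(2)
print(skewness_kurtosis(rng.normal(size=100_000)))       # close to (0, 0)
print(skewness_kurtosis(rng.exponential(size=100_000)))  # close to (2, 6)
```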
1.5 Divergence of moments – asymptotic behaviour
The moments (or cumulants) of a given distribution do not always exist. A necessary condition for the nth moment ($m_n$) to exist is that the distribution density P(x) should decay faster than $1/|x|^{n+1}$ for |x| going towards infinity, or else the integral, Eq. (1.7), would diverge for |x| large. If one only considers distribution densities that behave asymptotically as a power-law, with an exponent 1 + µ,
$$P(x) \sim \frac{\mu A_{\pm}^{\mu}}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \qquad (1.14)$$
then all the moments such that n ≥ µ are infinite. For example, such a distribution has no finite variance whenever µ ≤ 2. [Note that, for P(x) to be a normalizable probability distribution, the integral, Eq. (1.2), must converge, which requires µ > 0.]
The characteristic function of a distribution having an asymptotic power-law behaviour given by Eq. (1.14) is non-analytic around z = 0. The small-z expansion contains regular terms of the form $z^n$ for n < µ, followed by a non-analytic term $|z|^\mu$ (possibly with logarithmic corrections such as $|z|^\mu \log z$ for integer µ). The derivatives of order larger than or equal to µ of the characteristic function thus do not exist at the origin (z = 0).
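The practical symptom of a divergent moment is that its empirical estimate never settles as the sample grows. A quick numerical illustration (ours) with a Pareto tail of exponent µ = 3/2, for which the mean exists but the variance does not:

```python
import numpy as np

mu = 1.5
rng = np.random.default_rng(3)
for n in [10**3, 10**5, 10**7]:
    x = rng.pareto(mu, size=n) + 1.0  # P_>(x) = x**(-mu) for x >= 1
    # the mean converges to mu/(mu-1) = 3; the variance keeps drifting upwards
    print(n, round(x.mean(), 2), round(x.var(), 1))
```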
1.6 Gaussian distribution
The most commonly encountered distributions are the ‘normal’ laws of Laplace and Gauss,
which we shall simply call Gaussian in the following. Gaussians are ubiquitous: for
example, the number of heads in a sequence of a thousand coin tosses, the exact number
of oxygen molecules in the room, the height (in inches) of a randomly selected individual,
†Note that it is sometimes κ + 3, rather than κ itself, which is called the kurtosis.
are all approximately described by a Gaussian distribution.† The ubiquity of the Gaussian can be in part traced to the central limit theorem (CLT) discussed at length in Chapter 2,
which states that a phenomenon resulting from a large number of small independent causes
is Gaussian. There exists however a large number of cases where the distribution describing
a complex phenomenon is not Gaussian: for example, the amplitude of earthquakes, the
velocity differences in a turbulent fluid, the stresses in granular materials, etc., and, as we
shall discuss in Chapter 6, the price fluctuations of most financial assets.
A Gaussian of mean m and root mean square σ is defined as:
$$P_G(x) \equiv \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right). \qquad (1.15)$$
The median and most probable value are in this case equal to m, whereas the MAD (or any other definition of the width) is proportional to the RMS (for example, $E_{\text{abs}} = \sigma\sqrt{2/\pi}$). For m = 0, all the odd moments are zero and the even moments are given by $m_{2n} = (2n-1)(2n-3)\cdots\sigma^{2n} = (2n-1)!!\,\sigma^{2n}$.
All the cumulants of order greater than two are zero for a Gaussian. This can be realized by examining its characteristic function:
$$\hat{P}_G(z) = \exp\left(-\frac{\sigma^2 z^2}{2} + imz\right). \qquad (1.16)$$
Its logarithm is a second-order polynomial, for which all derivatives of order larger than two are zero. In particular, the kurtosis of a Gaussian variable is zero. As mentioned above, the kurtosis is often taken as a measure of the distance from a Gaussian distribution. When κ > 0 (leptokurtic distributions), the corresponding distribution density has a marked peak around the mean, and rather 'thick' tails. Conversely, when κ < 0, the distribution density has a flat top and very thin tails. For example, the uniform distribution over a certain interval (for which tails are absent) has a kurtosis κ = −6/5. Note that the kurtosis is bounded from below by the value −2, which corresponds to the case where the random variable can only take two values −a and a with equal probability.
A Gaussian variable is peculiar because 'large deviations' are extremely rare. The quantity $\exp(-x^2/2\sigma^2)$ decays so fast for large x that deviations of a few times σ are nearly impossible. For example, a Gaussian variable departs from its most probable value by more than 2σ only 5% of the time, and by more than 3σ only 0.2% of the time, whereas a fluctuation of 10σ has a probability of less than $2 \times 10^{-23}$; in other words, it never happens.
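These numbers follow from the Gaussian tail probability $P(|X - m| > n\sigma) = \operatorname{erfc}(n/\sqrt{2})$; a one-line check (ours):

```python
import math

for n in [2, 3, 10]:
    # two-sided probability of a deviation larger than n standard deviations
    print(f"{n} sigma: {math.erfc(n / math.sqrt(2)):.2e}")
# prints roughly 4.6e-02, 2.7e-03 and 1.5e-23
```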
1.7 Log-normal distribution
Another very popular distribution in mathematical finance is the so-called log-normal law.
That X is a log-normal random variable simply means that log X is normal, or Gaussian. Its use in finance comes from the assumption that the rates of return, rather than the absolute
†Although, in the above three examples, the random variable cannot be negative. As we shall discuss later, the
Gaussian description is generally only valid in a certain neighbourhood of the maximum of the distribution.
changes of price, are independent random variables. The increments of the logarithm of the price thus asymptotically sum to a Gaussian, according to the CLT detailed in Chapter 2. The log-normal distribution density is thus defined as:†
$$P_{LN}(x) \equiv \frac{1}{x\sqrt{2\pi\sigma^2}}\,\exp\left(-\frac{\log^2(x/x_0)}{2\sigma^2}\right), \qquad (1.17)$$
the moments of which being: $m_n = x_0^n\,e^{n^2\sigma^2/2}$.
From these moments, one deduces the skewness, given by $\varsigma = (e^{3\sigma^2} - 3e^{\sigma^2} + 2)/(e^{\sigma^2} - 1)^{3/2}$ ($\simeq 3\sigma$ for $\sigma \ll 1$), and the kurtosis $\kappa = (e^{6\sigma^2} - 4e^{3\sigma^2} + 6e^{\sigma^2} - 3)/(e^{\sigma^2} - 1)^2 - 3$ ($\simeq 16\sigma^2$ for $\sigma \ll 1$).
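The small-σ behaviours quoted above can be recovered symbolically; the following check (ours, using sympy) expands the two closed-form expressions around σ = 0:

```python
import sympy as sp

s = sp.symbols('sigma', positive=True)
skew = (sp.exp(3*s**2) - 3*sp.exp(s**2) + 2) / (sp.exp(s**2) - 1)**sp.Rational(3, 2)
kurt = (sp.exp(6*s**2) - 4*sp.exp(3*s**2) + 6*sp.exp(s**2) - 3) / (sp.exp(s**2) - 1)**2 - 3

print(sp.limit(skew / s, s, 0))     # 3:  skewness ~ 3*sigma
print(sp.limit(kurt / s**2, s, 0))  # 16: kurtosis ~ 16*sigma**2
```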
In the context of mathematical finance, one often prefers log-normal to Gaussian distributions for several reasons. As mentioned above, the existence of a random rate of return, or random interest rate, naturally leads to log-normal statistics. Furthermore, log-normals account for the following symmetry in the problem of exchange rates:‡ if x is the rate of currency A in terms of currency B, then obviously, 1/x is the rate of currency B in terms of A. Under this transformation, log x becomes −log x and the description in terms of a log-normal distribution (or in terms of any other even function of log x) is independent of
the reference currency. One often hears the following argument in favour of log-normals: since the price of an asset cannot be negative, its statistics cannot be Gaussian since the latter admits in principle negative values, whereas a log-normal excludes them by construction. This is however a red-herring argument, since the description of the fluctuations of the price of a financial asset in terms of Gaussian or log-normal statistics is in any case an approximation which is only valid in a certain range. As we shall discuss at length later, these approximations are totally unadapted to describe extreme risks. Furthermore, even if a price drop of more than 100% is in principle possible for a Gaussian process,§ the error caused by neglecting such an event is much smaller than that induced by the use of either of these two distributions (Gaussian or log-normal). In order to illustrate this point more clearly, consider the probability of observing n times 'heads' in a series of N coin tosses, which is exactly equal to $2^{-N} C_N^n$. It is also well known that in the neighbourhood of N/2, $2^{-N} C_N^n$ is very accurately approximated by a Gaussian of variance N/4; this is however not contradictory with the fact that n ≥ 0 by construction!
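The coin-toss statement is easy to check numerically; the snippet below (ours) compares the exact binomial weight with the Gaussian of variance N/4:

```python
import math

N = 1000
var = N / 4  # variance of the Gaussian approximation around N/2

for n in [500, 520, 550]:
    exact = math.comb(N, n) / 2**N
    gauss = math.exp(-(n - N/2)**2 / (2*var)) / math.sqrt(2*math.pi*var)
    print(n, f"{exact:.3e}", f"{gauss:.3e}")
```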
Finally, let us note that for moderate volatilities (up to say 20%), the two distributions (Gaussian and log-normal) look rather alike, especially in the 'body' of the distribution (Fig. 1.3). As for the tails, we shall see later that Gaussians substantially underestimate their weight, whereas the log-normal predicts that large positive jumps are more frequent than large negative jumps. This is at variance with empirical observation: the distributions of absolute stock price changes are rather symmetrical; if anything, large negative draw-downs are more frequent than large positive draw-ups.
†A log-normal distribution has the remarkable property that the knowledge of all its moments is not sufficient to characterize the corresponding distribution. One can indeed show that the following distribution: $\frac{1}{\sqrt{2\pi}}\,x^{-1}\exp[-\frac{1}{2}(\log x)^2]\,[1 + a\sin(2\pi\log x)]$, for |a| ≤ 1, has moments which are independent of the value of a, and thus coincide with those of a log-normal distribution, which corresponds to a = 0.
‡This symmetry is however not always obvious. The dollar, for example, plays a special role. This symmetry can only be expected between currencies of similar strength.
§In the rather extreme case of a 20% annual volatility and a zero annual return, the probability for the price to become negative after a year in a Gaussian description is less than one out of 3 million.
[Figure: comparison of a Gaussian and a log-normal density on the interval 50 ≤ x ≤ 150.]
Fig. 1.3. Comparison between a Gaussian (thick line) and a log-normal (dashed line), with $m = x_0 = 100$ and σ equal to 15 and 15% respectively. The difference between the two curves shows up in the tails.
1.8 Lévy distributions and Paretian tails
Lévy distributions (noted $L_\mu(x)$ below) appear naturally in the context of the CLT (see Chapter 2), because of their stability property under addition (a property shared by Gaussians). The tails of Lévy distributions are however much 'fatter' than those of Gaussians, and are thus useful to describe multiscale phenomena (i.e. when both very large and very small values of a quantity can commonly be observed – such as personal income, size of pension funds, amplitude of earthquakes or other natural catastrophes, etc.). These distributions were introduced in the 1950s and 1960s by Mandelbrot (following Pareto) to describe personal income and the price changes of some financial assets, in particular the price of cotton. An important constitutive property of these Lévy distributions is their power-law behaviour for large arguments, often called Pareto tails:
$$L_\mu(x) \sim \frac{\mu A_{\pm}^{\mu}}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \qquad (1.18)$$
where 0 < µ < 2 is a certain exponent (often called α), and $A_{\pm}^{\mu}$ two constants which we call tail amplitudes, or scale parameters: $A_\pm$ indeed gives the order of magnitude of the
large (positive or negative) fluctuations of x. For instance, the probability to draw a number larger than x decreases as $P_>(x) = (A_+/x)^\mu$ for large positive x. One can of course in principle observe Pareto tails with µ ≥ 2; but those tails do not correspond to the asymptotic behaviour of a Lévy distribution.
In full generality, Lévy distributions are characterized by an asymmetry parameter defined as $\beta \equiv (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$, which measures the relative weight of the positive and negative tails. We shall mostly focus in the following on the symmetric case β = 0. The fully asymmetric case (β = 1) is also useful to describe strictly positive random variables, such as, for example, the time during which the price of an asset remains below a certain value, etc.
An important consequence of Eq. (1.14) with µ ≤ 2 is that the variance of a Lévy distribution is formally infinite: the probability density does not decay fast enough for the integral, Eq. (1.6), to converge. In the case µ ≤ 1, the distribution density decays so slowly that even the mean, or the MAD, fail to exist.† The scale of the fluctuations, defined by the width of the distribution, is always set by $A = A_+ = A_-$.
There is unfortunately no simple analytical expression for symmetric Lévy distributions $L_\mu(x)$, except for µ = 1, which corresponds to a Cauchy distribution (or Lorentzian):
$$L_1(x) = \frac{A}{x^2 + \pi^2 A^2}. \qquad (1.19)$$
However, the characteristic function of a symmetric Lévy distribution is rather simple, and reads:
$$\hat{L}_\mu(z) = \exp(-a_\mu |z|^\mu), \qquad (1.20)$$
where $a_\mu$ is a constant proportional to the tail amplitude $A^\mu$. Consistently with the asymptotic series Eq. (1.25) below, the two are related by:
$$A^{\mu} = \frac{\Gamma(1+\mu)\sin(\pi\mu/2)}{\pi\mu}\,a_{\mu}, \qquad (1.21)$$
which, using the reflection formula of the Gamma function, can be rewritten for µ < 1 as:
$$A^{\mu} = \frac{a_{\mu}}{2\,\Gamma(1-\mu)\cos(\pi\mu/2)}. \qquad (1.22)$$
It is clear, from Eq. (1.20), that in the limit µ = 2, one recovers the definition of a Gaussian. When µ decreases from 2, the distribution becomes more and more sharply peaked around the origin and fatter in its tails, while 'intermediate' events lose weight (Fig. 1.4). These distributions thus describe 'intermittent' phenomena, very often small, sometimes gigantic.
The moments of the symmetric Lévy distribution can be computed, when they exist. One finds:
$$\langle |x|^\nu \rangle = (a_\mu)^{\nu/\mu}\,\frac{\Gamma(-\nu/\mu)}{\mu\,\Gamma(-\nu)\cos(\pi\nu/2)}, \qquad -1 < \nu < \mu. \qquad (1.23)$$
†The median and the most probable value however still exist. For a symmetric Lévy distribution, the most probable value defines the so-called localization parameter m.
[Figure: symmetric Lévy densities P(x) for µ = 0.8, 1.2, 1.6 and 2, with an inset blowing up the tail region 5 ≤ x ≤ 15.]
Fig. 1.4. Shape of the symmetric Lévy distributions with µ = 0.8, 1.2, 1.6 and 2 (this last value actually corresponds to a Gaussian). The smaller µ, the sharper the 'body' of the distribution, and the fatter the tails, as illustrated in the inset.
Note finally that Eq. (1.20) does not define a probability distribution when µ > 2, because its inverse Fourier transform is not everywhere positive. In the case β ≠ 0, one would have:
$$\hat{L}_\mu^\beta(z) = \exp\left[-a_\mu |z|^\mu \left(1 + i\beta \tan(\mu\pi/2)\,\frac{z}{|z|}\right)\right] \qquad (\mu \neq 1). \qquad (1.24)$$
It is important to notice that while the leading asymptotic term for large x is given by Eq. (1.18), there are subleading terms which can be important for finite x. The full asymptotic series actually reads:
$$L_\mu(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{\pi\, n!}\, \frac{a_\mu^n}{x^{1+n\mu}}\, \Gamma(1+n\mu)\, \sin(\pi\mu n/2). \qquad (1.25)$$
The presence of the subleading terms may lead to a bad empirical estimate of the exponent µ based on a fit of the tail of the distribution. In particular, the 'apparent' exponent which describes the function $L_\mu$ for finite x is larger than µ, and decreases towards µ for x → ∞, but more and more slowly as µ gets nearer to the Gaussian value µ = 2, for which the power-law tails no longer exist. Note however that one also often observes empirically the opposite behaviour, i.e. an apparent Pareto exponent which grows with x. This arises when the Pareto distribution, Eq. (1.18), is only valid in an intermediate regime x ≪ 1/α, beyond which the distribution decays exponentially, say as exp(−αx). The Pareto tail is then 'truncated' for large values of x, and this leads to an effective µ which grows with x.
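The drift of the apparent exponent under an exponential cut-off is easy to visualize: for a tail of the form $P_>(x) = x^{-\mu}e^{-\alpha x}$, the local slope of log $P_>$ versus log x is µ + αx. A small numerical sketch (ours, with illustrative parameter values):

```python
import numpy as np

mu, alpha = 1.5, 0.01
x = np.logspace(0, 3, 200)
p_above = x**(-mu) * np.exp(-alpha * x)  # Pareto tail with exponential cut-off

# apparent exponent: minus the local slope of log P_> against log x
mu_eff = -np.gradient(np.log(p_above), np.log(x))
print(mu_eff[0], mu_eff[-1])  # starts near mu = 1.5, grows like mu + alpha*x
```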
An interesting generalization of the symmetric Lévy distributions which accounts for this exponential cut-off is given by the truncated Lévy distributions (TLD), which will be of much use in the following. A simple way to alter the characteristic function Eq. (1.20) to account for an exponential cut-off for large arguments is to set:
$$\hat{L}_\mu^{(t)}(z) = \exp\left[-a_\mu\, \frac{(\alpha^2+z^2)^{\mu/2}\cos(\mu\arctan(|z|/\alpha)) - \alpha^\mu}{\cos(\pi\mu/2)}\right], \qquad (1.26)$$
for 1 ≤ µ ≤ 2. The above form reduces to Eq. (1.20) for α = 0. Note that the argument in the exponential can also be written as:
$$\frac{a_\mu}{2\cos(\pi\mu/2)}\left[(\alpha + iz)^\mu + (\alpha - iz)^\mu - 2\alpha^\mu\right]. \qquad (1.27)$$
The first cumulants of the distribution defined by Eq. (1.26) read, for 1 < µ < 2:
$$c_2 = \mu(\mu-1)\,\frac{a_\mu}{|\cos(\pi\mu/2)|}\,\alpha^{\mu-2}, \qquad c_3 = 0. \qquad (1.28)$$
The kurtosis $\kappa = \lambda_4 = c_4/c_2^2$ is given by:
$$\lambda_4 = \frac{(3-\mu)(2-\mu)\,|\cos(\pi\mu/2)|}{\mu(\mu-1)\,a_\mu\,\alpha^\mu}. \qquad (1.29)$$
Note that the case µ = 2 corresponds to the Gaussian, for which $\lambda_4 = 0$ as expected. On the other hand, when α → 0, one recovers a pure Lévy distribution, for which $c_2$ and $c_4$ are formally infinite. Finally, if α → ∞ with $a_\mu \alpha^{\mu-2}$ fixed, one also recovers the Gaussian.
As explained below in Section 3.1.3, the truncated Lévy distribution has the interesting property of being infinitely divisible for all values of α and µ (this includes the Gaussian distribution and the pure Lévy distributions).
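Equations (1.28) and (1.29) can be checked directly from the form (1.27) of the log-characteristic function, by finite differences at z = 0; the sketch below (ours) does this for µ = 3/2:

```python
import numpy as np

mu, a_mu, alpha = 1.5, 1.0, 0.1

def log_cf(z):
    # Eq. (1.27): argument of the exponential in the TLD characteristic function
    c = np.cos(np.pi * mu / 2)
    return -a_mu / (2 * c) * ((alpha + 1j*z)**mu + (alpha - 1j*z)**mu - 2 * alpha**mu)

h = 1e-3  # cumulants from central finite differences of log_cf at z = 0
c2 = -(log_cf(h) - 2*log_cf(0) + log_cf(-h)).real / h**2
c4 = (log_cf(2*h) - 4*log_cf(h) + 6*log_cf(0) - 4*log_cf(-h) + log_cf(-2*h)).real / h**4

abscos = abs(np.cos(np.pi * mu / 2))
print(c2, mu*(mu - 1) * a_mu * alpha**(mu - 2) / abscos)                          # Eq. (1.28)
print(c4 / c2**2, (3 - mu)*(2 - mu) * abscos / (mu*(mu - 1) * a_mu * alpha**mu))  # Eq. (1.29)
```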
Exponential tail: a limiting case
Very often in the following, we shall notice that in the formal limit µ → ∞, the power-law tail becomes an exponential tail, if the tail parameter is simultaneously scaled as $A^\mu = (\mu/\alpha)^\mu$. Qualitatively, this can be understood as follows: consider a probability distribution restricted to positive x, which decays as a power-law for large x, defined as:
$$P_>(x) = \frac{A^\mu}{(A + x)^\mu}. \qquad (1.30)$$
This shape is obviously compatible with Eq. (1.18), and is such that $P_>(x = 0) = 1$. If A = µ/α, one then finds:
$$P_>(x) = \frac{1}{[1 + (\alpha x/\mu)]^\mu} \;\xrightarrow{\;\mu \to \infty\;}\; \exp(-\alpha x). \qquad (1.31)$$
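The convergence in Eq. (1.31) is quite fast and easily checked (our sketch):

```python
import numpy as np

alpha = 1.0
x = np.array([0.5, 1.0, 2.0])
for mu in [5, 50, 500]:
    print(mu, 1.0 / (1.0 + alpha * x / mu)**mu)  # Eq. (1.31) at finite mu
print('limit', np.exp(-alpha * x))               # exponential limit
```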
1.9 Other distributions (∗)
There are obviously a very large number of other statistical distributions useful to describe
random phenomena. Let us cite a few, which often appear in a financial context:
• The discrete Poisson distribution: consider a set of points randomly scattered on the real axis, with a certain density ω (e.g. the times when the price of an asset changes). The number of points n in an arbitrary interval of length ℓ is distributed according to the Poisson distribution:
$$P(n) \equiv \frac{(\omega\ell)^n}{n!}\,\exp(-\omega\ell). \qquad (1.32)$$
• The hyperbolic distribution, which interpolates between a Gaussian 'body' and exponential tails:
$$P_H(x) \equiv \frac{1}{2x_0\,K_1(\alpha x_0)}\,\exp\left(-\alpha\sqrt{x_0^2 + x^2}\right), \qquad (1.33)$$
where the normalization $K_1(\alpha x_0)$ is a modified Bessel function of the second kind. For x small compared to $x_0$, $P_H(x)$ behaves as a Gaussian although its asymptotic behaviour for x ≫ $x_0$ is fatter and reads exp(−α|x|).
From the characteristic function
$$\hat{P}_H(z) = \frac{\alpha\,K_1\!\left(x_0\sqrt{\alpha^2 + z^2}\right)}{\sqrt{\alpha^2 + z^2}\;K_1(\alpha x_0)}, \qquad (1.34)$$
we can compute the variance
$$\sigma^2 = \frac{x_0\,K_2(\alpha x_0)}{\alpha\,K_1(\alpha x_0)}, \qquad (1.35)$$
and kurtosis
$$\kappa = 3\left(\frac{K_1(\alpha x_0)}{K_2(\alpha x_0)}\right)^{2} + \frac{12}{\alpha x_0}\,\frac{K_1(\alpha x_0)}{K_2(\alpha x_0)} - 3. \qquad (1.36)$$
Note that the kurtosis of the hyperbolic distribution is always between zero and three. (The skewness is zero since the distribution is even.)
In the case $x_0 = 0$, one finds the symmetric exponential distribution:
$$P_E(x) = \frac{\alpha}{2}\exp(-\alpha|x|), \qquad (1.37)$$
with even moments $m_{2n} = (2n)!\,\alpha^{-2n}$, which gives $\sigma^2 = 2\alpha^{-2}$ and κ = 3. Its characteristic function reads: $\hat{P}_E(z) = \alpha^2/(\alpha^2 + z^2)$.
• The Student distribution, which also has power-law tails:
$$P_S(x) \equiv \frac{1}{\sqrt{\pi}}\,\frac{\Gamma((1+\mu)/2)}{\Gamma(\mu/2)}\,\frac{a^\mu}{(a^2 + x^2)^{(1+\mu)/2}}, \qquad (1.38)$$
which coincides with the Cauchy distribution for µ = 1, and tends towards a Gaussian in the limit µ → ∞, provided that $a^2$ is scaled as µ. This distribution is usually known as Student's t-distribution with µ degrees of freedom, but we shall call it simply the Student distribution.
[Figure: probability densities of the truncated Lévy, Student and hyperbolic distributions, with an inset showing the tails on a logarithmic scale.]
Fig. 1.5. Probability density for the truncated Lévy (µ = 3/2), Student and hyperbolic distributions. All three have two free parameters which were fixed to have unit variance and kurtosis. The inset shows a blow-up of the tails, where one can see that the Student distribution has tails similar to (but slightly thicker than) those of the truncated Lévy.
The even moments of the Student distribution read: $m_{2n} = (2n-1)!!\,\Gamma(\mu/2 - n)/\Gamma(\mu/2)\,(a^2/2)^n$, provided 2n < µ; and are infinite otherwise. One can check that in the limit µ → ∞, the above expression gives back the moments of a Gaussian: $m_{2n} = (2n-1)!!\,\sigma^{2n}$. The kurtosis of the Student distribution is given by κ = 6/(µ − 4). Figure 1.5 shows a plot of the Student distribution with κ = 1, corresponding to µ = 10.
Note that the characteristic function of Student distributions can also be explicitly computed, and reads:
$$\hat{P}_S(z) = \frac{2^{1-\mu/2}}{\Gamma(\mu/2)}\,(a|z|)^{\mu/2}\,K_{\mu/2}(a|z|), \qquad (1.39)$$
where $K_{\mu/2}$ is the modified Bessel function. The cumulative distribution in the useful cases µ = 3 and µ = 4, with a chosen such that the variance is unity, reads:
$$P_{S,>}(x) = \frac{1}{2} - \frac{1}{\pi}\left[\arctan x + \frac{x}{1+x^2}\right] \qquad (\mu = 3,\ a = 1), \qquad (1.40)$$
and
$$P_{S,>}(x) = \frac{1}{2} - \frac{3}{4}u + \frac{1}{4}u^3 \qquad (\mu = 4,\ a = \sqrt{2}), \qquad (1.41)$$
where $u = x/\sqrt{2 + x^2}$; both closed forms are checked numerically in the sketch at the end of this section.
• The inverse gamma distribution, for positive quantities (such as, for example, volatilities, or waiting times), also has power-law tails. It is defined as:
$$P(x) = \frac{x_0^\mu}{\Gamma(\mu)\,x^{1+\mu}}\,\exp\left(-\frac{x_0}{x}\right). \qquad (1.42)$$
Its moments of order n < µ are easily computed to give: $m_n = x_0^n\,\Gamma(\mu - n)/\Gamma(\mu)$. This distribution falls off very fast when x → 0. As we shall see in Chapter 7, an inverse gamma distribution and a log-normal distribution can sometimes be hard to distinguish empirically. Finally, if the volatility σ² of a Gaussian is itself distributed as an inverse gamma distribution, the distribution of x becomes a Student distribution – see Section 9.2.5 for more details.
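The closed forms (1.40)–(1.41) and the inverse-gamma mixture statement lend themselves to a quick numerical check. The sketch below is ours, not the book's; it relies on the (assumed) correspondence between Eq. (1.38) and the standard t-distribution with df = µ and scale = a/√µ, which the reader may verify from the densities.

```python
import numpy as np
from scipy.stats import t

x = np.array([0.0, 0.5, 1.0, 2.0, 5.0])

# Eq. (1.40): mu = 3, a = 1
closed = 0.5 - (np.arctan(x) + x / (1 + x**2)) / np.pi
print(np.max(np.abs(closed - t.sf(x, df=3, scale=1/np.sqrt(3)))))

# Eq. (1.41): mu = 4, a = sqrt(2)
u = x / np.sqrt(2 + x**2)
closed = 0.5 - 0.75*u + 0.25*u**3
print(np.max(np.abs(closed - t.sf(x, df=4, scale=np.sqrt(2)/2))))

# Gaussian with inverse-gamma distributed variance gives back a Student variable:
# sigma^2 ~ inverse-gamma(mu/2, scale a^2/2), i.e. 1/Gamma(shape=mu/2, scale=2/a^2)
rng = np.random.default_rng(0)
mu, a2 = 10.0, 8.0  # a^2 = mu - 2: unit variance; Student kurtosis 6/(mu-4) = 1
sigma2 = 1.0 / rng.gamma(mu / 2, 2.0 / a2, size=1_000_000)
xs = rng.normal(scale=np.sqrt(sigma2))
print(np.mean(xs**4) / np.mean(xs**2)**2 - 3)  # close to 1
```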
1.10 Summary
• The most probable value and the mean value are both estimates of the typical values of a random variable. Fluctuations around this value are measured by the root mean square deviation or the mean absolute deviation.
• For some distributions with very fat tails, the mean square deviation (or even the mean value) is infinite, and the typical values must be described using quantiles.
• The Gaussian, the log-normal and the Student distributions are some of the important probability distributions for financial applications.
• The way to generate numerically random variables with a given distribution (Gaussian, Lévy stable, Student, etc.) is discussed in Chapter 18, Appendix F.
Further reading
W. Feller, An Introduction to Probability Theory and its Applications, Wiley, New York, 1971.
P. Lévy, Théorie de l'addition des variables aléatoires, Gauthier-Villars, Paris, 1937–1954.
B. V. Gnedenko, A. N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Addison-Wesley, Cambridge, MA, 1954.
G. Samorodnitsky, M. S. Taqqu, Stable Non-Gaussian Random Processes, Chapman & Hall, New York, 1994.