La Rivista del Nuovo Cimento
https://doi.org/10.1007/s40766-021-00020-z
REVIEW PAPER
Applications of large deviation theory in geophysical fluid
dynamics and climate science
Vera Melinda Gálfi 1,2 · Valerio Lucarini 3,4 · Francesco Ragone 5,6 · Jeroen Wouters 3,4
Received: 19 February 2021 / Accepted: 19 April 2021
© The Author(s) 2021
Abstract
The climate is a complex, chaotic system with many degrees of freedom. Attaining a
deeper level of understanding of climate dynamics is an urgent scientific challenge,
given the evolving climate crisis. In statistical physics, many-particle systems are stud-
ied using Large Deviation Theory (LDT). A great potential exists for applying LDT
to problems in geophysical fluid dynamics and climate science. In particular, LDT
allows for understanding the properties of persistent deviations of climatic fields from
long-term averages and for associating them to low-frequency, large-scale patterns.
Additionally, LDT can be used in conjunction with rare event algorithms to explore
rarely visited regions of the phase space. These applications are of key importance
to improve our understanding of high-impact weather and climate events. Further-
more, LDT provides tools for evaluating the probability of noise-induced transitions
between metastable climate states. This is, in turn, essential for understanding the
global stability properties of the system. The goal of this review is manifold. First, we
provide an introduction to LDT. We then present the existing literature. Finally, we
propose possible lines of future investigations. We hope that this paper will prepare
the ground for studies applying LDT to solve problems encountered in climate science
and geophysical fluid dynamics.
Correspondence: Vera Melinda Gálfi, vera.melinda.galfi@geo.uu.se

1 Department of Earth Sciences, Uppsala University, Uppsala, Sweden
2 Meteorological Institute, CEN, University of Hamburg, Hamburg, Germany
3 Department of Mathematics and Statistics, University of Reading, Reading, UK
4 Centre for the Mathematics of Planet Earth, University of Reading, Reading, UK
5 Georges Lemaître Centre for Earth and Climate Research, Earth and Life Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
6 Royal Meteorological Institute of Belgium, Brussels, Belgium
Keywords Large deviation theory · Climate system · Geophysical fluid dynamics ·
Low-frequency variability · Extreme events · Rare event algorithms · Metastable
states · Noise-induced transitions · Instantons · Heatwaves · Cold spells · Rogue
waves
1 Introduction and motivation
1.1 The climate crisis: extreme events in a changing climate
The climate is a forced and dissipative nonlinear heterogeneous system composed of
several subdomains, namely the atmosphere, the hydrosphere, the cryosphere, the soil,
and the biosphere. The climate evolves under the action of a primary forcing given by
the incoming solar radiation and modulating factors such as the atmospheric composi-
tion, the optical properties of the surface of the planet, gravity, and the rotation of the
Earth around its axis. Each of these subsystems features complex nonlinear
physical and chemical processes, and the various subsystems interact among them-
selves through exchanges of energy, momentum, and chemical species. As a result of
the interplay between forcing, dissipation, and internal nonlinear dynamics, the climate
system features variability on a vast range of spatial and temporal scales. The climate
can be seen as a prominent example of a non-equilibrium system where an approximate
steady state is reached as the inhomogeneous absorption of solar radiation occurring
throughout its domain is compensated by a variety of physical mechanisms, including
thermal emission in the infrared and complex patterns of transport of sensible and
latent heat. Lorenz [172] provided a first comprehensive theory of the dynamics and
thermodynamics of climate able to bring together the main mechanisms of forcing,
energy conversion, and dissipation. The large-scale flows of the ocean and of the
atmosphere ultimately result from the conversion of available potential into kinetic
energy performed by the climatic engine. The conversion takes place through various
mechanisms of instability fuelled by the presence of spatial temperature gradients.
Such instabilities allow for energy conversion between the background state and the
fluctuations of the climatic field and lead to chaotic conditions that are associated with
heterogeneous turbulence in the geophysical fluids. Additionally, these instabilities
establish negative feedbacks, because they tend to reduce, via transport and mixing,
the temperature gradients that support them. See [174,204] for an extensive discussion
of these mechanisms. We remark that an exact steady state is never achieved because of
the fluctuations in the incoming solar radiation and in the processes, both natural and
anthropogenic, that alter the atmospheric composition and the surface of the planet
[104,250].
Improving our understanding of the dynamical and statistical properties of the cli-
mate system, of the links between its response to anthropogenic and natural forcings
and of its natural variability is key to provide scientific tools for anticipating, pre-
dicting, and possibly addressing the ongoing climate crisis. The current popularity
of the expression climate crisis as opposed to—the more usual one—climate change
is motivated by the desire to focus on the understanding of how changing climate
conditions will unfold as variations in the higher moments of the distribution of the
climatic variables, able to better capture the properties of extreme events and tipping
points [104].
Indeed, the study of extreme events is essential for addressing the natural hazards
associated with climate variability and climate change and affecting in a potentially
catastrophic way human and environmental welfare. As the resilience of any system
and the incurred damages due to an unusual, high-impact event change drastically when
certain thresholds in the intensity and/or in the duration of the hazard are reached, it
is clear that understanding the fate of extreme events in the context of the changing
climate is essential for accurately factoring in future losses and damages and pre-
pare for them [133]. High-impact events affecting human and environmental welfare
are sometimes associated with the presence of long temporal persistence of a large
anomaly in the field of interest, as resilience—the ability of any system to withstand
anomalous environmental conditions—does not last indefinitely [76,146,214].
Meteo-climatic extremes characterised by persistence are usually referred to as slow
onset events, as opposed to the fast onset ones, which, instead, are associated with
fast processes. Prominent examples of slow onset events are droughts, heatwaves, and
cold spells, while flash floods and intense snowfalls lead to hazards in the category
of fast onset events [133]. We remark that it might be worth considering a revision of
such a classical terminology, because of the multiscale nature of some of the so-called
slow onset events: the onset of an atmospheric blocking responsible for a heatwave
usually takes up to a couple of days, while its duration can be much longer, ranging
up to several weeks. Also, the exit from long-lasting events can be quite rapid and, in
any case, considerably shorter than their active phase [254,255].
An important remark is also needed regarding the problem of defining persistent
extreme events; we refer here, for the sake of clarity, to the specific case of heatwaves.
While the general understanding is that a heatwave is a period of extreme and unusual
warmth, there is no rigorous nor commonly accepted definition for it in terms of
intensity and persistence of the anomalous weather conditions, despite several attempts
in this direction; see, e.g. [223]. As noted in [206], "…it seems that almost, if not every,
climatological study that looks at heatwaves uses a different metric…"; see also [183,243]
for a discussion on the lack of consensus for a shared definition of heatwaves. The
confusion around the definition of heatwave has serious implications as it hinders
attempts at mitigating their impacts [38,195,216].
Changing climate conditions can lead to dramatic changes in the statistics of extreme
events. As mentioned above, this is one of the main manifestations of the climate cri-
sis. In the future climate, changes in the statistics of heatwaves are worrying, as more
persistent and larger temperature fluctuations are possible as a result of changes in
the properties of the low frequency variability of the atmosphere and of the properties
of the soil. This effect compounds with the trend in the average temperature, leading
to a greatly increased risk of such catastrophic events [50,210,238]. Climate change
additionally leads to reduced winter weather variability, as a result of the reduction in
the temperature difference between low and high latitudes [250]. Therefore, the prob-
ability of occurrence of cold spells is likely to greatly decrease [242], even if structural
changes in the dynamics of the atmosphere can rarely create special conditions that
facilitate their occurrence [46,150]. Looking instead at flash floods, consensus exists
that they will become more likely and more intense in the future because the higher
retention of water vapour in the atmosphere—made possible by warmer conditions as
a result of the Clausius-Clapeyron relation—compounds with strengthened convective
motions, possibly leading to disproportionately enhanced extreme precipitation events
in specific locales [88,279]. However, we remark that, due to the complexity of the
(microscopic and macroscopic) dynamical and thermodynamical processes involved,
the precipitation is not distributed in space and time in direct proportion to the available
precipitable water [278].
A somewhat separate research agenda tries, instead, to relate in a direct way climate
change and individual extreme events, thus blurring the distinction between climate
and weather, statistics and analysis of specific case studies. Following the landmark
paper [5], considerable efforts have been directed at developing tools for assessing to
what extent climate change has impacted and is impacting either the frequency, or the
likelihood, or the intensity of individual extreme events, with the goal of providing the
basis for science-based liability for the impacts of such extremes. The scientific debate
around extreme events attribution has relevant implications in terms of climate adap-
tation, risk assessment, public policy, infrastructural design, insurance instruments
design, international relations, and even migration policies [135,199,239,251].
1.2 Quest for universality of extreme events
Empirical frequentist approaches aimed at the study of extremes applied to actual
climate records or to the output of climate models are essential for keeping track of
the observed events, but face the unavoidable problem of being unable to say any-
thing about the probability of occurrence of out-of-sample events. In order to achieve
predictive power in a statistical sense—i.e. being able to estimate the probability of
occurrence of events that are more extreme than the observed ones—one needs to
interpret data through mathematical approaches that provide some form of universal-
ity.
Extreme Value Theory (EVT) provides a powerful framework for studying extreme
events in a multitude of applications. It is based on limit theorems mimicking the central
limit theorem that allow one, under rather general hypotheses and taking suitable
limits, to define universal laws describing the probability of occurrence of events
generated according to a given stochastic process above a sufficiently high threshold.
Alternatively, one can develop the theory in order to describe the distribution of the
maximum of a set of independent and identically distributed stochastic variables in
the limit of large sets [47]. The theory can be adapted for dealing with correlations
between the variables [161] and for treating the outputs of chaotic dynamical systems
[178], in such a way that deep connections emerge between the statistical properties
of suitably defined extremes and the geometry of the attractor of the system [18,95,
174,213]. EVT has received a great deal of attention in geosciences [83,91,106,265,
270,273,280] and is extremely influential especially in hydrology [138,139,158]. In
the context of climate dynamics, the analysis of extremes has proved very fruitful for
providing a new viewpoint for understanding atmospheric predictability by looking
at the recurrence of weather patterns [82,184]. Persistence, as mentioned above, is
a key factor in determining the impact of large climatic fluctuations. EVT can deal
with time correlations in time series through the introduction of the extremal index,
which allows one to quantify the average size of clusters of consecutive extreme events
[87,187]. The extremal index encodes important information on the dynamics of the
system [36].
A more direct line to attack the problem of studying persistent extreme events can
be taken through the use of Large Deviation Theory (LDT) [263]. In a nutshell, in
one of its most basic formulations, LDT aims at providing limit laws for the average
of n (typically identically distributed) stochastic variables, where n is large. Similarly
to the case of EVT, a unified approach for LDT can be used on stochastic processes
and chaotic dynamical systems [141,283]. It is hard to overestimate the importance
of LDT in contemporary physics and mathematics [62,63,257,267]. Establishing a
large deviation principle for an observable—see below—leads to gaining predictive
power on the process. While in EVT such power is aimed at being able to predict the
probability of occurrence of events larger than those already observed, in the case of
LDT the predictive power is twofold, as it is directed towards predicting the probability
of occurrence of events that are larger and/or more persistent than the observed ones.
Drawing an example from climate science, EVT is better suited for studying the
probability of occurrence of extremely hot days, whereas LDT is better suited for
studying the probability of occurrence of heatwaves.
A fascinating aspect of looking at the properties of long time-averages of climatic
fields is the following. The theory of low-frequency variability of the atmosphere
indicates that long temporal persistence and large spatial extent of the anomalous
patterns go hand in hand [104,105]; see Fig. 1. In the mid-latitudes, it is custom-
ary and indeed scientifically meaningful to distinguish between synoptic variability,
due to mid-latitude eastward-moving weather systems and associated with temporal
scales of 3–7 days and spatial scales of the order of 1000 km, and low-frequency
variability, whose temporal and spatial scales are typically larger, amounting to
1–3 months and several thousand km, respectively [105,246]. The main man-
ifestations of low-frequency variability in the mid-latitudes are the so-called blocking
events, which are persistent, large-scale departures from the approximately zonally
symmetric flow associated with the presence of large-amplitude, almost-stationary
pressure anomalies [79,105,169,180,246,254,255]. The difference between syn-
optic and low-frequency variability is clarified when performing a spectral analysis
of the atmospheric fields: the former is associated with eastward propagating waves,
while the latter is characterised by stationary or weakly propagating planetary waves
[58]. Persistence is key to creating conditions conducive to long-lasting extreme events,
and, indeed, it is well-known that the anomalies of the flow due to occurrence of block-
ings can lead to long-duration warm [67] as well as cold extreme events [34]. Given
their long time duration and large spatial extent, blockings can lead, in a cascade pro-
cess, to the onset of extreme events also at considerable geographic distance from the
core of the blocking, as in the case of the summer 2010 floods in Pakistan resulting
from the large scale flow associated with the blocking—and ensuing heatwave—in
Russia [159]. Advancing our understanding of the low-frequency variability of the
atmosphere would be very beneficial because, despite continuous improvements, our
ability to perform accurate extended-range (beyond 7–10 days) weather forecast in the
mid-latitudes is still limited [105], and because attaining a convincing representation
of the statistical and dynamical properties of blocking events is still challenging for
both numerical weather forecast models [85] and climate models [54].

Fig. 1 Idealised power spectra for the atmosphere indicating the relationship between the spatial and
temporal scales of atmospheric flows. The source of this material is the COMET© Website at
http://meted.ucar.edu/ of the University Corporation for Atmospheric Research (UCAR), sponsored in
part through cooperative agreement(s) with the National Oceanic and Atmospheric Administration
(NOAA), U.S. Department of Commerce (DOC). ©1997–2017 University Corporation for Atmospheric
Research. All Rights Reserved
The hope is that, by focusing on suitably defined large deviations of the atmospheric
fields, one could distil information on the low-frequency variability of the atmosphere.
Roughly speaking, as discussed below, it can be proven rigorously that any large
deviation is realised in the least unlikely of all the unlikely ways [63]. Let’s clarify
this important concept using again an example drawn from climate science. Let’s
assume that we have established a large deviation law describing the probability of
occurrence of heatwaves in a given location. In principle, the corresponding rare events
can take place as a result of a variety of large scale atmospheric configurations; see a
recent analysis of heatwaves in France [7]. Nonetheless, LDT imposes that, in fact, if
we look at true extremes, with overwhelming probability the heatwaves we observe
will take place, apart from small-scale spatio-temporal fluctuations, as a result of a
well-defined large-scale atmospheric configuration, which is very rare in the standard
statistics, but is typical if we consider the multitude of possible heatwaves with the same
intensity. By typical here we mean that the probability of the occurrence of a large
scale atmospheric pattern that is very close to such a configuration, conditional on
the occurrence of heatwave at the reference location, is very high, and gets closer to
one as we consider more stringent criteria, in terms of intensity and duration, for
the occurrence of (rarer) heatwaves. In dynamical terms, one has that selecting events
associated with large deviations amounts to considering a very small portion of the
phase space. The property above implies that the (rarely occurring) approach to this
very special region overwhelmingly occurs through a well-defined set of paths, that
are singled out by LDT, even if much unlikelier paths are still possible.
Indeed, looking at the specific case of the catastrophic 2010 Russian heatwave, one
does find that the observed extreme event is in some sense typical [67,94]. This does
not exclude the possibility of more exotic atmospheric configurations on the scale of
Eurasia, but their occurrence is much more unlikely than those, already extremely rare,
described by LDT. These exotic events might be interpreted as dragon kings [245].
Of course, practically using LDT in a complex and multiscale system like the
climate is far from an obvious task for all possible climatic observables. The
mathematical foundations for using LDT in the context of climate rest on taking into
account, on the one hand, its chaoticity, and, on the other hand, the fact
that stochastic effects emerge as a result of considering its coarse-grained evolution.
Indeed, most of the results we present below are a natural extension of the scientific
programme aimed at developing and analysing stochastic climate models pioneered
by Hasselmann [125]; see later developments in [131,132,181,205,233]. Additionally,
one needs to take into account that while most LDT results require stationarity of
the time series, the climate system is only approximately stationary, because of the
periodicity in the solar forcing and the natural and anthropogenic forcings to the
atmospheric composition (e.g. change in greenhouse gases and in aerosols) and to the
properties of the land surface (e.g. forest fires; agriculture; deforestation). Therefore,
one might need to pre-process the data (e.g. removing the seasonal cycle; removing
trends) before being able to apply LDT. Clearly, since the climate is a nonlinear system,
the previous pre-processing aimed at removing part of the time-dependence is in
principle partly arbitrary and definitely not uniquely defined. Nonetheless, one needs
to resort to reasonable pragmatism in treating observational or model-generated data
that do not conform exactly to the demands of the mathematical theory, and possibly
derive nonetheless useful information, as often in fact done in physical sciences.
Another aspect to be kept in consideration is the presence of serial correlations in
the time series of the observables. If one considers, for example, the serial correlation
of the anomalies of the surface temperature (obtained after removing seasonal cycle
and long-term trends) somewhere in the middle of a continent, like Central Europe,
and the serial correlation of the same observable over an oceanic region, like the North
Atlantic (not far away from the first location), one would notice that the strength of the
serial correlation is much weaker and the auto-correlation function decays substantially
faster over the continent as compared to the oceanic region. In the latter case, the decay
of correlations will be slower than exponential (at least on a vast range of scales), as
a result of the presence of long-term memory in the system. Large differences in the
heat capacity of land surface vs water, and the dynamical link between surface waters
and deeper levels of the ocean explain such a discrepancy between the two cases.
The fact that the same climatic field (anomalies of the surface temperature) features
such fundamentally different properties, in terms of stochasticity, depending on the
geographical location of interest provides a good example of the complexity of the
climate system. Note that, as we will discuss below, while in the former case one is able
to establish large deviation laws to describe accurately long and persistent temperature
fluctuations behind heatwaves and cold spells, LDT will not apply in the latter case.
1.3 Paths and transitions
LDT can be used for purposes other than looking at persistent deviations of fields.
Indeed, it provides tools for studying how such special configurations of the climate are
dynamically realised. One can use a more general definition of events that encompasses
trajectories in the phase space, and adapt LDT to study rare trajectories leading to target
extreme events. In this setting, the dynamical equations contain a small parameter,
describing either a weak noise strength or an inverse time scale separation. Under such
conditions the path probabilities collapse onto one single path as the small parameter
goes to zero, either the deterministic zero-noise path for weak noise systems or the
averaged equation for systems with a time scale separation. Also here the principle
holds that the unlikely event is reached in the least unlikely way. Such paths, called
instantons, can be seen as minimisers of an action describing the cost of going against
the natural tendency of the system [93]. Take, for example, a particle in a double well
potential with weak noise. The particle can transition from one well to another, but
in the weak noise limit such transitions will be rare. LDT then gives us not only an
approximation of the transition probability, but also of the mean exit time and the
transition trajectory.
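To make the example concrete, the following minimal Python sketch (all parameters are assumed toy values, not taken from the text) integrates an overdamped particle in the double-well potential $U(x) = x^4/4 - x^2/2$ with weak additive noise of intensity $\sqrt{2\varepsilon}$ and estimates the mean exit time from the left well; LDT (together with classical Kramers theory) predicts that the logarithm of the mean exit time grows like $\Delta U/\varepsilon$, with barrier height $\Delta U = 1/4$.

```python
import numpy as np

rng = np.random.default_rng(1)
dU = 0.25                       # barrier height of U(x) = x**4/4 - x**2/2

def mean_exit_time(eps, n_particles=400, dt=1e-3, t_max=2000.0):
    """Mean first time to cross the saddle at x = 0, starting in the left well."""
    x = np.full(n_particles, -1.0)            # start at the left minimum
    exit_time = np.full(n_particles, np.nan)
    alive = np.ones(n_particles, dtype=bool)
    t = 0.0
    while alive.any() and t < t_max:
        # Euler-Maruyama step for dX = (X - X**3) dt + sqrt(2*eps) dW
        x[alive] += (x[alive] - x[alive]**3)*dt \
                    + np.sqrt(2*eps*dt)*rng.standard_normal(alive.sum())
        t += dt
        crossed = alive & (x >= 0.0)
        exit_time[crossed] = t
        alive &= ~crossed
    return np.nanmean(exit_time)              # ignores the few runs that never exited

for eps in (0.15, 0.10, 0.07):
    tau = mean_exit_time(eps)
    # LDT predicts log(tau) ~ dU/eps up to an eps-independent prefactor
    print(f"eps={eps:.2f}  mean exit time={tau:8.1f}  log(tau)={np.log(tau):5.2f}  dU/eps={dU/eps:5.2f}")
```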
Such knowledge can furthermore be used to tackle challenges in numerically
sampling unlikely paths to rare events. In rare event simulation methods, a model is
dynamically driven in such a way that otherwise very rarely visited paths are over-
populated [29,228]. This can be done either by manipulating the dynamical equations
of the system, or by implementing genetic algorithms on top of the system, which
selectively kill and clone parallel realisations of the model. Hence, such trajectories
become statistically tractable without resorting to ultra-long numerical integrations.
Enriching the statistics, while retaining the correct dynamics, makes it possible to
explore the dynamical processes behind the extreme event of interest.
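As a minimal sketch of the selection-cloning idea (in the spirit of the genetic algorithms cited above, but not a reimplementation of any specific published scheme; the Ornstein-Uhlenbeck model and all parameters are assumed for illustration), the following Python snippet periodically reweights and resamples an ensemble so as to overpopulate trajectories with large time-averaged values of the observable, and accumulates an estimate of the corresponding scaled cumulant generating function:

```python
import numpy as np

rng = np.random.default_rng(3)
N, dt, T, k = 2000, 0.01, 50.0, 0.5     # clones, time step, horizon, tilt parameter
theta, sigma = 1.0, 1.0                  # assumed Ornstein-Uhlenbeck toy model
x = rng.normal(0.0, sigma/np.sqrt(2*theta), N)   # start from the stationary law
log_mean_w = 0.0
for _ in range(int(T/dt)):
    # advance every clone with an Euler-Maruyama step of dx = -theta*x dt + sigma dW
    x += -theta*x*dt + sigma*np.sqrt(dt)*rng.standard_normal(N)
    # weight each clone by the exponentially tilted observable increment
    w = np.exp(k*x*dt)
    log_mean_w += np.log(w.mean())
    # selection step: kill/clone trajectories proportionally to their weights
    x = rng.choice(x, size=N, p=w/w.sum())
lam_est = log_mean_w/T
# for this OU model the scaled cumulant generating function of the time
# average of x is known: lambda(k) = k**2 * sigma**2 / (2 * theta**2)
print(f"estimated lambda({k}) = {lam_est:.4f};  analytic = {k*k*sigma**2/(2*theta**2):.4f}")
```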
The previously mentioned fact that LDT allows one to select typical extreme events
is key for interpreting some recent results on so-called rogue waves in the ocean
[2,66,241]. Rogue waves are extremely dangerous hazards impacting the marine and
coastal environment, and manifest themselves as hard-to-predict surface waves that
can have surprisingly high destructive power and that, apparently, materialise out of
nothing [65,193]. A novel viewpoint has been recently proposed for finding a com-
prehensive theoretical framework on rogue waves, able to generalise earlier theories.
The idea has been to use LDT to study the properties of the solutions of the one-
dimensional nonlinear Schrödinger equation starting from suitably defined random
initial conditions constructed in accordance with observations taken from an oceano-
graphic campaign. Both numerical and experimental evidence strongly suggest that
rogue waves can be seen as hydrodynamic instantons, whose precursors can be clearly
identified, and that can be computed by minimising a suitably defined action [60,61].
A related area of investigation is the study of—rarely occurring—noise-induced
transitions between metastable states associated with alternative configurations of
geophysical flows or actual competing climatic states. In this case, along the lines of
the classical Freidlin–Wentzell theory [93], the target region in the phase space for the
endpoints of the desired paths is a special portion in the basin boundary separating the
competing basins of attraction, which corresponds to a saddle in the classical case of
motion in an energy landscape.
The multistability of the climate system manifests itself both locally and globally.
By local we mean that the difference between the competing metastable states is, in
fact, geographically confined and associated with one of the so-called tipping elements
[163], representing features of the climate system that can go through critical transi-
tions if forced beyond the point of no return. These include the dieback of the Amazon
forest [21], the shut-down of the thermohaline circulation of the Atlantic ocean [221],
the methane release resulting from the melting of the permafrost [269], and the collapse
of the atmospheric circulation regime associated to the Indian monsoon [166].
A hierarchically higher level of multistability is present in the Earth as our planet
is well known to have at least two possible steady climatic states in the current and
past astronomical configuration, the warm climate, and a frozen one, termed snowball,
which features global glaciation, extremely low temperatures and limited climatic vari-
ability. This is confirmed by geological and paleomagnetic evidence [128,211] and well
understood in terms of relevant dynamical processes [33,102,179,237]. Despite the
presence of chaotic dynamics in the competing attractors and of a complex geometrical
structure in the basin boundary [175], suitable generalisations of the Freidlin–Wentzell
theory proposed in [115–117] allow one to establish large deviation laws able to
describe in the weak noise limit the transitions between the competing metastable
states. Indeed, one can define a generalised quasipotential, whose local minima cor-
respond to the competing attractors, while the transition paths cross preferentially
the basin boundaries in special locations, which are saddles also termed Melancho-
lia states [19,175–177]. There are good reasons to believe that, in fact, the climate
system allows for the presence of additional competing metastable states on top of
the warm and snowball climate [1,32,167]. This leads to a more complex pattern of
possible transition paths between them and requires a careful statistical examination
when noise is added into the system [182]. Finally, one can interpret the localised
tipping elements described above as being associated with smaller and localised min-
ima and saddles, which define the multiscale nature of the quasi-potential. Therefore,
an adequate use of LDT might be key for making a more careful assessment of the
risk coming from irreversible transitions for present-day tipping elements, and then
for more precisely evaluating the risk of going beyond the so-called global planetary
boundaries [248,249].
1.4 This review
The goals of this paper are to provide an informal mathematical introduction to LDT
and then to lead the reader to explore some relevant applications of the theory for
analysing properties of geophysical flows and of the climate system. The range of
topics covered by this paper is somewhat broader and more targeted to real-life applica-
tions as compared to the excellent and more theoretically inclined earlier contribution
by Bouchet and Venaille on the statistical mechanics of two-dimensional and geo-
physical flows [30].
Depending on the observable and on the scales of interest, and specifically on the
strength of correlations, one can rely on different stochastic models to approximate
the behaviour of climatic observables: independent, identically distributed random
variables, Markov chains, dependent sequences. The theoretical overview of LDT
presented in Sect. 2 is organised according to this line of thought. Subsequently,
Sect. 3 introduces the concept of coarse-graining for the dynamics of geophysical
flows, presents the general framework of stochastic climate models, and discusses
the establishment of large deviation laws in stochastic and deterministic dynamical
systems. The analysis of large deviation laws for stochastic dynamical systems will
provide key tools for understanding the dynamical and statistical properties of tran-
sition paths between competing metastable states and for studying rare paths, rather
than just rare events. Instead, the results presented for deterministic dynamical sys-
tems will be useful for understanding the reason why Markov chain models are of
general interest for modelling the statistical properties of chaotic dynamical systems.
Section 4 will then present a range of applications of LDT in various areas of geophys-
ical fluid dynamics and climate science. We will showcase its use for understanding
persistent climatic fluctuations, for characterising the fluctuations of the predictability
of geophysical flows on different time scales, for providing a unified viewpoint for the
understanding of rogue waves in the ocean, as well as for explaining special dynam-
ical features associated with transitions between competing metastable states, thus
mirroring the theoretical framework presented in the previous sections. This section
contains also novel, previously unpublished results. Finally, Sect. 5 presents our con-
clusions together with a discussion regarding opportunities and challenges for future
applications of LDT in climate science.
2 A summary of large deviation theory
In this section, we recapitulate the main elements of LDT for two stochastic models
applied often successfully to geophysical data: independent, identically distributed
(i.i.d.) random variables and Markov chains, or more generally dependent sequences.
This summary is far from being complete and does not make use of much mathematical
sophistication either. Hence readers experienced in mathematics are referred to [63],
whereas readers versed in physics are referred to [257]. These are at the same time the
main sources we follow.
2.1 Independent, identically distributed random variables
The first basic result of LDT is known as Cramér's theorem [51] and describes the
large deviation behaviour of empirical sample averages $\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}S_n$.
Theorem 1 Let $(X_i)$ be i.i.d. $\mathbb{R}$-valued random variables with a finite moment generating function in a region around the origin, i.e. $0 \in \mathrm{int}(D_\varphi)$ with $D_\varphi = \{t \in \mathbb{R} : \varphi(t) := E[e^{tX_1}] < \infty\}$, where $E[f(X_1)]$ is the expectation value of $f(X_1)$. Let $S_n = \sum_{i=1}^{n} X_i$. Then, for all $a > E[X_1]$,

$$\lim_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}S_n \geq a\right) = -I(a) \qquad (1)$$

where

$$I(z) = \sup_{t\in\mathbb{R}} \left[ zt - \log\varphi(t) \right] \qquad (2)$$
According to (1), which can be written in the form $P(S_n/n \geq a) \asymp \exp(-nI(a))$,¹
the probability of empirical averages deviating from the mean decays exponentially
with the averaging length $n$, as $n$ increases. If this is the case, we say that we have
found a large deviation principle. The speed of decay is described by the rate function
$I$. The rate function in Theorem 1 has some important and useful properties, such as
compact level sets, lower semi-continuity and convexity on $\mathbb{R}$, as well as continuity,
strict convexity and smoothness on the interior of $D_I = \{z \in \mathbb{R} : I(z) < \infty\}$. $I(z) \geq 0$,
with equality if and only if $z = \mu$, with $\mu = E[X_1]$. Thus, the minimum of the rate
function is located at the expectation value of the random variable, suggesting that
the sample averages converge to the expected value, as stated by the law of large
numbers. Furthermore, $I''(\mu) = 1/\sigma^2$: the second derivative of the rate function at its
minimum is the inverse of the variance of the random variable $X_1$, which goes back
to the central limit theorem.
As shown by (2), the rate function is the Legendre transform of the cumulant
generating function $\log\varphi$. We will discuss this relationship in more detail below.
Equations (1) and (2) describe two different methods to estimate the rate function in
applications: a direct method based on the probability density function (pdf) of
averages and an indirect one based on the cumulant generating function, as discussed
in detail in Sect. 3.3.
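As an illustration of these two routes (a minimal Python sketch with assumed standard Gaussian data, for which the exact result is $I(z) = z^2/2$), one can compare the direct estimate obtained from the pdf of averages over blocks of length $n$ with the indirect estimate obtained by Legendre-transforming an empirical approximation of $\log\varphi(t)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_blocks = 100, 50_000
X = rng.standard_normal((n_blocks, n))     # i.i.d. N(0,1): exact I(z) = z**2/2
An = X.mean(axis=1)                        # block averages S_n/n

# direct method: I(a) ~ -(1/n) log P(S_n/n in a bin around a), cf. Eq. (1)
hist, edges = np.histogram(An, bins=40, density=True)
centers = 0.5*(edges[:-1] + edges[1:])
keep = hist > 0
I_dir = -np.log(hist[keep]*np.diff(edges)[keep])/n
I_dir -= I_dir.min()                       # remove the O(log(n)/n) additive constant

# indirect method: Legendre transform of the empirical log phi(t), cf. Eq. (2)
sample = X.ravel()[:50_000]
t = np.linspace(-1.0, 1.0, 201)
log_phi = np.array([np.log(np.mean(np.exp(ti*sample))) for ti in t])
I_leg = np.array([np.max(a*t - log_phi) for a in centers[keep]])

for a, d, l in zip(centers[keep][::6], I_dir[::6], I_leg[::6]):
    print(f"a={a:+.3f}   direct={d:.4f}   Legendre={l:.4f}   exact={a*a/2:.4f}")
```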
Considering that the rate function is lower semi-continuous and convex, and attains
its unique minimum at the expectation value $\mu$, if $a > \mu$, then $I(z) \geq I(a)$ for all
$z \geq a$. Thus, Eq. (1) can be rewritten for $a > \mu$ as

$$\lim_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}S_n \in A\right) = -\inf_{z\in A} I(z) \quad \text{with } A = [a,\infty). \qquad (3)$$

Similarly, if, instead, $a < \mu$, one obtains:

$$\lim_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}S_n \in A\right) = -\inf_{z\in A} I(z) \quad \text{with } A = (-\infty,a]. \qquad (4)$$
This indicates one of the basic principles of LDT that we hinted at in the intro-
duction. The occurrence of a large deviation $\{S_n/n \in A\}$ is closely associated with the
specific event corresponding to the lowest value of the rate function $I$ taken in $A$,
as the probability of this event is exponentially larger than the probability of all the
other events compatible with the condition $\{S_n/n \in A\}$. The rate function can then be
interpreted as a cost function, and we have that any large deviation is done in the least
unlikely of all the unlikely ways [63].

¹ We have that $a_\varepsilon \asymp b_\varepsilon$ if $\lim_{\varepsilon\to 0} \ln(a_\varepsilon)/\ln(b_\varepsilon) = 1$; here $1/\varepsilon = n$.
In the following, we discuss some generalisations of Theorem 1 by going from large
deviations of empirical averages to large deviations of empirical measures. From the
more general setting of Cramér's theorem we go now to a finite state space, where the
i.i.d. random variables $X_1, X_2, \ldots$ take values in a finite set $X_i \in \Gamma = \{1,\ldots,r\} \subset \mathbb{N}$
and obey the marginal law $\rho = (\rho_s)_{s\in\Gamma}$, $\rho_s > 0$. The empirical measure $L_n =
\frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$ is a random probability measure on $\Gamma$. We denote the set of probability
measures on $\Gamma$ by $M(\Gamma) = \{\nu = (\nu_1,\ldots,\nu_r) \in [0,1]^r : \sum_{s=1}^{r} \nu_s = 1\}$, where
the total variation distance between two measures $\mu$ and $\nu$ is defined as $d(\mu,\nu) =
\frac{1}{2}\sum_{s=1}^{r} |\mu_s - \nu_s|$. The following theorem, which goes back to Sanov [234], contains
a large deviation law of $L_n$ with respect to $\rho$.
Theorem 2 Let $(X_i)$ be i.i.d. random variables satisfying the conditions above, and
$L_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$. Then, for all $a > 0$,

$$\lim_{n\to\infty} \frac{1}{n} \log P\left(L_n \in B^c_a(\rho)\right) = -\inf_{\nu \in B^c_a(\rho)} I_\rho(\nu), \qquad (5)$$

where $B_a(\rho) = \{\nu \in M(\Gamma) : d(\nu,\rho) \leq a\}$, $B^c_a(\rho) = M(\Gamma) \setminus B_a(\rho)$, and

$$I_\rho(\nu) = \sum_{s=1}^{r} \nu_s \log\frac{\nu_s}{\rho_s} := H(\nu|\rho) \qquad (6)$$
When comparing (3) with (5), it becomes clear that Theorem 2 is nothing more
than a higher dimensional version of Theorem 1. Instead of looking at deviations
of the empirical averages away from the mean, we consider now deviations of the
empirical measure $L_n$ away from the true measure $\rho$. The rate function depends in
this case on the different measures $\nu$ on $\Gamma$ and on how similar they are to $\rho$. The
quantity $H(\nu|\rho)$ is the relative entropy of the measure $\nu$ with respect to the measure
$\rho$ [152]. By applying Jensen's inequality to $I_\rho(\nu) = -\sum_s \nu_s \log(\rho_s/\nu_s)$, we have
that $I_\rho(\nu) \geq -\log \sum_s \nu_s (\rho_s/\nu_s) = 0$, with the equality being realised if and only if
$\nu = \rho$.

In other terms, Sanov's theorem states that the exponential rate of decay of the
probability of a large deviation of size $a$ between the empirical measure and the
marginal distribution $\rho$ is controlled by the element of all measures on $\Gamma$ whose
distance from $\rho$ is $\geq a$ that is closest to $\rho$ in the sense of relative entropy.
The contents of Theorem 2 allow us to reinterpret and extend the results discussed
in (3)–(4). Let's consider a function $f$ with $\sum_{s=1}^{r} f_s \rho_s = \mu_f \in \mathbb{R}$. We define
$\Phi_{f,a} = \{\phi \in M(\Gamma) \,|\, \sum_s f_s \phi_s \geq a\}$. We also define $\Psi_{f,a} = \{\phi \in M(\Gamma) \,|\, \sum_s f_s \phi_s = a\}$.
Clearly, one has $\Phi_{f,a} = \cup_{b\geq a}\, \Psi_{f,b}$.
We have that $P(L_n \in \Phi_{f,a}) = P\left(\frac{1}{n}\sum_{j=1}^{n} f(X_j) \geq a\right)$, where $a \geq \mu_f$ and we
consider the empirical measure $L_n = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$ introduced before. One then derives
that:

$$\lim_{n\to\infty} \frac{1}{n} \log P(L_n \in \Phi_{f,a}) = \lim_{n\to\infty} \frac{1}{n} \log P\left(\frac{1}{n}\sum_{j=1}^{n} f(X_j) \geq a\right) = -\inf_{\nu \in \Phi_{f,a}} I_\rho(\nu) = -\inf_{z\geq a}\, \inf_{\nu \in \Psi_{f,z}} I_\rho(\nu) \qquad (7)$$
Let's now consider the case $f(x) = x$. The empirical average $S_n/n$ is connected to the
empirical measure $L_n$ through the formula $S_n/n = \sum_{s=1}^{r} s\, L_n(s)$. The rate function
in (3) can be obtained from (7) by

$$I(z) = \inf_{\nu \in \Psi_{f,z}} I_\rho(\nu). \qquad (8)$$

Thus, the rate function of the empirical average $z$ is equal to the infimum of the rate
function for the empirical measure $\nu$ if the infimum is taken over all the measures $\nu$ with
mean $\mu = \sum_{s=1}^{r} s\,\nu_s = z$. In other words, there is an equivalence between the large
deviations of the empirical average $z$ and the large deviations of the least unlikely
empirical measure $\nu$ with mean equal to $z$. This is an example of the contraction
principle that we state now.
Theorem 3 (Contraction principle) Let $A_n$ be a family of random variables such that

$$\lim_{n\to\infty} \frac{1}{n} \log P(A_n \in A) = -\inf_{z\in A} I_A(z) \qquad (9)$$

and let's consider another family of random variables $B_n = T(A_n)$, where $T$ is a
continuous function. It is possible to establish a large deviation principle for $B_n$ as
follows:

$$\lim_{n\to\infty} \frac{1}{n} \log P(B_n \in B) = -\inf_{z\in B} I_B(z), \quad I_B(z) = \inf_{y \in T^{-1}(z)} I_A(y). \qquad (10)$$
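The equivalence between (2) and (8) can be verified numerically. In the following Python sketch (the biased six-state marginal law is an assumed toy example), the rate function of the empirical average is computed both by constrained minimisation of the relative entropy (6) over measures with prescribed mean, as in (8), and by the Legendre transform (2); the two estimates agree up to optimiser tolerance:

```python
import numpy as np
from scipy.optimize import minimize

s = np.arange(1, 7)                                    # states of a die, f(x) = x
rho = np.array([0.25, 0.20, 0.20, 0.15, 0.10, 0.10])   # assumed biased marginal law

def H(nu):
    """Relative entropy H(nu|rho), Eq. (6)."""
    nu = np.clip(nu, 1e-12, 1.0)
    return float(np.sum(nu*np.log(nu/rho)))

def I_contraction(z):
    """inf of H(nu|rho) over probability vectors with mean z, cf. Eq. (8)."""
    cons = ({'type': 'eq', 'fun': lambda nu: nu.sum() - 1.0},
            {'type': 'eq', 'fun': lambda nu: s @ nu - z})
    return minimize(H, rho, constraints=cons, bounds=[(0, 1)]*6).fun

def I_cramer(z):
    """Legendre transform of log E[exp(t X1)], cf. Eq. (2)."""
    t = np.linspace(-4.0, 4.0, 4001)
    log_phi = np.log(np.exp(np.outer(t, s)) @ rho)
    return np.max(z*t - log_phi)

for z in (2.5, 3.5, 4.5):                              # mean of rho is 2.95
    print(f"z={z}:  contraction={I_contraction(z):.4f}  Cramer={I_cramer(z):.4f}")
```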
Theorem 2 can be generalised further to large deviations of pair empirical measures
as well as of measures with higher dimensions. Higher level large deviation laws
imply the ones for lower levels, the downward link being provided by the contraction
principle. The interested reader can find a short summary of the generalisations to
higher dimensions in Appendix A; for a detailed discussion of this topic we refer to
[63].
2.2 Dependent sequences
We continue with a generalisation of Cramér’s Theorem for random sequences that
have a form of moderate dependence, which goes back to [100] and [80]. A rigorous
derivation of the Gärtner–Ellis (GE) theorem would go beyond the scope of this paper,
thus we concentrate on the main results. As above, we follow here the work of [257].
We consider the sequence $(Z_n)$ of random variables on the probability space
$(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), P)$, where $\mathcal{B}(\mathbb{R}^d)$ is the Borel sigma-field on $\mathbb{R}^d$, with moment gener-
ating functions

$$\varphi_n(t) = E[e^{\langle t, Z_n\rangle}], \quad t \in \mathbb{R}^d,\; n \in \mathbb{N} \qquad (11)$$

with $\langle\cdot,\cdot\rangle$ denoting the standard inner product. It can be useful to think of $(Z_n)$ as an
empirical average, but this doesn't have to be the case. We assume that the limit

$$\lim_{n\to\infty} \frac{1}{n} \log \varphi_n(nt) = \Lambda(t) \in [-\infty,\infty] \qquad (12)$$

exists and

$$0 \in \mathrm{int}(D_\Lambda), \quad \text{with } D_\Lambda = \{t \in \mathbb{R}^d : \Lambda(t) < \infty\}. \qquad (13)$$

We also assume that $\Lambda$ is convex and differentiable on $\mathrm{int}(D_\Lambda)$. Furthermore, we
assume that $\Lambda$ is lower semi-continuous on $\mathbb{R}^d$, and either $D_\Lambda = \mathbb{R}^d$ or $\Lambda$ is steep at
$\partial D_\Lambda$.

Let $P_n(\cdot) = P(Z_n \in \cdot)$. Under the above conditions, the GE theorem states that
$(P_n)$ satisfies a large deviation principle on $\mathbb{R}^d$ with rate $n$ and with rate function

$$I(x) = \sup_{t\in\mathbb{R}^d} [\langle x,t\rangle - \Lambda(t)], \quad x \in \mathbb{R}^d. \qquad (14)$$
Thus, the rate function $I$ is the Legendre transform of $\Lambda$, which is also called the scaled
cumulant generating function. The rate function $I$ is convex. Note that (14) is a generalised
form of (2).

If $Z_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ with $(X_i)$ a stationary random sequence, then conditions (12)
and (13) can be interpreted as a kind of moderate dependence assumption on $(X_i)$.
However, in case of strong dependence, the theorem would fail because the strict
convexity of $\Lambda$ would be violated.
We have seen that by using the GE theorem one obtains a large deviation principle
under fairly mild regularity assumptions. As mentioned above, it is not necessary that
$Z_n$ represents sample averages. In fact, the large deviation principles presented in
Sects. 2.1 and 2.2 for sample averages, empirical measures, pair empirical measures
(see Appendix A), and so on, can all be obtained by following the route given by the
GE theorem as well. Below, we derive, based on [63,257], the rate functions of sample
averages for i.i.d. random variables and for Markov chains, by using the GE theorem.
1) Let $(X_i)$ be i.i.d. $\mathbb{R}$-valued random variables satisfying $\varphi(t) = E[e^{tX_1}] < \infty$
for all $t \in \mathbb{R}$. Let us consider the empirical average $Z_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then,

$$\varphi_n(nt) = E[e^{ntZ_n}] = E\left[e^{t\sum_{i=1}^{n} X_i}\right] = [\varphi(t)]^n, \qquad (15)$$

with $\varphi$ the moment generating function of $X_1$. Hence $\Lambda(t) = \log\varphi(t)$ and the GE
theorem reduces to Cramér's theorem (Theorem 1).
2) Let $(X_i)$ be a stationary $\Gamma$-valued Markov chain. Let $Z_n = \frac{1}{n}\sum_{i=1}^{n} f(x_i)$, where
$f : \Gamma \to \mathbb{R}^d$, $d \geq 1$. Then,

$$\varphi_n(nt) = E[e^{ntZ_n}] = E\left[e^{t\sum_{i=1}^{n} f(x_i)}\right] = \sum_{x_1,\ldots,x_n\in\Gamma} \pi(x_1)\, e^{tf(x_1)} P(x_2|x_1)\, e^{tf(x_2)} \cdots P(x_n|x_{n-1})\, e^{tf(x_n)},$$

where $\pi(x_1)$ denotes the probability of the initial state $x_1$, and $P(x_i|x_{i-1})$ denotes the
conditional probability of state $x_i$ given $x_{i-1}$, $i = 1,\ldots,n$. By defining $\pi_t(x_1) =
\pi(x_1)e^{tf(x_1)}$ and $P_t(x_i|x_{i-1}) = P(x_i|x_{i-1})\, e^{tf(x_i)}$, we have that

$$\varphi_n(nt) = \sum_{j\in\Gamma} \left(\Pi_t^{n-1} \pi_t\right)_j,$$

where $\pi_t$ is the vector of probabilities for which $(\pi_t)_i = \pi_t(x_1 = i)$, and $\Pi_t$ denotes
the matrix with elements $(\Pi_t)_{ji} = P_t(j|i)$. Based on Perron–Frobenius theory for
positive matrices we get that $\lim_{n\to\infty} \frac{1}{n}\log\varphi_n(nt) = \log\lambda(t)$, with $\lambda(t)$ denoting the
unique largest eigenvalue of $\Pi_t$. Hence $\Lambda(t) = \log\lambda(t)$, and the rate function is given
by the Legendre transform

$$I(z) = \sup_{t\in\mathbb{R}} [zt - \log\lambda(t)]. \qquad (16)$$

Please note that (16) can be used to obtain the rate function only if $\Pi$ has a unique
stationary distribution $\pi$. If $\Pi$ has several stationary distributions, $\Lambda(t)$ exists, but
depends on the initial distribution $\pi(x_1)$. If $\Pi$ has no stationary distribution, generally
no large deviation principle can be found and the law of large numbers does not even
hold [257].
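The recipe behind (16) is easy to implement numerically. The following Python sketch (with an assumed two-state transition matrix, chosen only for illustration) builds the tilted matrix $\Pi_t$, extracts its Perron–Frobenius eigenvalue, and performs the Legendre transform on a grid:

```python
import numpy as np

# assumed two-state chain on Gamma = {0, 1}; P[j, i] = P(j|i), matching (Pi_t)_{ji}
P = np.array([[0.9, 0.3],
              [0.1, 0.7]])
f = np.array([0.0, 1.0])             # observable f(x) = x

def Lambda(t):
    """Scaled cumulant generating function log lambda(t) of the tilted matrix."""
    Pi_t = P * np.exp(t*f)[:, None]  # (Pi_t)_{ji} = P(j|i) exp(t f(j))
    return np.log(np.max(np.real(np.linalg.eigvals(Pi_t))))

ts = np.linspace(-5.0, 5.0, 2001)
Ls = np.array([Lambda(t) for t in ts])

def I(z):
    """Numerical Legendre transform, Eq. (16)."""
    return np.max(z*ts - Ls)

# the stationary distribution of P is (0.75, 0.25), so the time average of f is 0.25
for z in (0.1, 0.25, 0.5, 0.8):
    print(f"I({z}) = {I(z):.4f}")    # I vanishes at z = 0.25 and grows away from it
```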
3 Large deviations in dynamical systems
At this point, we leave the idealised world of i.i.d. random variables and discrete time
processes, and turn our attention to systems evolving continuously in time, as we want
to look into mathematical models that are more relevant for capturing the dynamical
properties of the climate system. Instead of empirical measures and sample averages,
we consider in the following probabilities of trajectories or paths of deterministic
dynamical systems and finite time averages along these trajectories. However, the
main ingredients leading to large deviation results stay the same. One needs basically
the attracting effect of an asymptotic limit leading to an exponential decay of proba-
bilities of finite time estimates. By taking into consideration the dynamics in time and
including the temporal dimension into the large deviation analysis, the methods pre-
sented below are directly relevant for geophysical applications. We will present some
basic results pertaining to stochastic and to deterministic chaotic dynamical systems,
for the sake of completeness, and because the modelling of geophysical flows follows
both dynamical paradigms.
First, we motivate the use of stochastic dynamics for investigating the properties
of geophysical flows by introducing the concept of filtering and the development
of evolution equations based on dynamical balances and specialised for specific
scales of motion [104,142,247]. The introduction of stochastic parametrizations
[16,92,104,143,277] is motivated through the use of the Mori–Zwanzig formalism
[188,288]. When suitable limits are considered, the stochastic component, which pro-
vides a surrogate representation of the effects of the scales we are unable to describe
explicitly, can be written as multiplicative white noise [202]. This provides the basis
for a large class of stochastic climate models of very widespread use and great physical
relevance [125,131,132,181,205,233]. Such stochastic models are amenable to being
studied using the Freidlin–Wentzell theory [93], which allows to derive powerful large
deviation results. Additionally, one should keep in mind that the climate undergoes
actual stochastic forcing due to random fluctuations in the incoming solar radiation
and other astrophysical factors. More generally, the use of stochastic dynamics for
describing non-equilibrium statistical mechanical systems has reached a high level of
popularity and has shown a great potential for deriving results of great theoretical and
practical relevance [10,154,170,198].
In the case of the Freidlin–Wentzell theory, the zero-noise limit of the stochastic evolution
law is given by its purely deterministic component. Hence, one obtains the probability
of random paths deviating from the deterministic path in terms of large deviation laws.
The probabilities of deviation of finite time averages from their asymptotic values can
be obtained from the large deviation results for random paths using the contraction
principle. A more general and pragmatic approach, however, which can be followed
even in case of unknown model equations, is related to the fact that finite time averages
of weakly correlated observables are (nearly) independent. Thus, one can model finite
time averages of correlated observables as resulting from i.i.d. random variables or
Markov chains. Consequently, the theorems presented in Sect. 2can be applied in a
similar way with the difference that the large deviations parameter nis now related
to time. In Sect. 3.3 we discuss a modified version of the GE theorem (14) acting on
time averaged observables.
Later on, we consider special chaotic dynamical systems, so-called Axiom A sys-
tems [31], and discuss the emerging large deviation laws for finite time averages of
given observables. The framework of Axiom A systems, which are essentially the clos-
est deterministic relatives of truly stochastic systems, blurs the distinction between
statistical mechanics and dynamical systems theory, mainly as a result of the fact that
Axiom A systems possess a rather special ergodic invariant measure that has a clear
physical interpretation [78]. Another remarkable property of Axiom A systems is that
they admit a Markov partition, i.e. a partition of the attractor such that one can put in a
one-to-one correspondence the actual orbit of the system with an infinite sequence of
symbols describing the history of occupancy of the various elements of the partition.
Accordingly, the original map can be associated with a shift map, i.e. a finite-state
Markov chain describing the probability of transition between the various elements of
the partition [97,229]. The possibility of establishing the so-called symbolic dynamics
guarantees that the results presented in Sect. 2.2 for finite-state Markov chains apply
also to Axiom A systems. Nonetheless, there is no free lunch: it is in general far from
trivial to actually construct the Markov partition. As discussed below, while Axiom
A systems are very special dynamical objects, the chaotic hypothesis [97,98] makes
them very relevant for providing a framework for studying large deviations laws in
high-dimensional geophysical systems.
3.1 Stochastic climate models
The state of the climate system can be described using the continuum approximation,
introducing field variables that depend on three spatial dimensions and time. The partial
differential equations that describe the evolution of the field variables are based on the
budget of mass (including different chemical species), momentum and energy. Since
the climate system features variability on a vast range of spatial and temporal scales,
as mentioned above, a key procedure one needs to apply, both on theoretical grounds
and for reasons of defining efficient numerical models, is to specialise the evolution
equations to a desired range of spatial and temporal scales of interest by the use
of suitable approximations based on the validity of approximate dynamical balances
[104,142,247]. Additionally, when constructing an actual numerical model, the three-
dimensional fields are discretised on a lattice, either in the physical space, or in the
reciprocal space via spectral projection, or in a suitable combination of the two. Hence,
the impact of the physical processes occurring in the unresolved spatio-temporal scales
on those taking place in the resolved ones can be represented only through approximate
parametrizations [16,92]. The Mori–Zwanzig coarse-graining based on the projection
operator [188,288] clarifies that such parametrizations have in general a deterministic,
a stochastic, and a non-Markovian component [42,143,276,277].
Let us assume, for simplicity, that the true evolution equation for the climate system
can be written as a system of autonomous ordinary differential equations² of the form

$$\frac{dz}{dt} = G(z) \qquad (17)$$

where $z \in \mathbb{R}^N$. The procedure of coarse-graining, associated with specialising the
equations for a specific range of time and spatial scales, implies that we rewrite the
state vector $z$ as $z = (x,y)$, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^{N-n}$, and we aim at deriving
approximate equations for the variables of interest $x$. It is reasonable to assume that
$n \ll N$. Note that, alternatively, $x$ can correspond to the variables describing the state
of a portion of the climate system (e.g. the atmosphere), and $y$ can instead describe the
rest of the system. One does not need to assume a priori the presence of a very large
time-scale separation between the dynamics of the $x$ and $y$ components. One can then
rewrite (17) as:

² We are here neglecting the—very important—presence of explicit time-dependence and stochastic forcing
in the dynamics; see [43,103,104] for a detailed discussion of this aspect.
$$\frac{dx}{dt} = f(x) + \delta f_x(x,y)$$
$$\frac{dy}{dt} = \frac{1}{\varepsilon}\, g(y) + \frac{\delta}{\varepsilon}\, g_y(x,y) \qquad (18)$$

where $f$ and $g$ define the autonomous dynamics of the $x$ and $y$ components, respec-
tively, $\delta$ is a constant controlling the intensity of the coupling, and $\varepsilon$ defines the time
scale separation between the two sets of variables. The Mori–Zwanzig theory indicates
that one can in general write the dynamics of the $x$ variables in an implicit form as
follows:

$$\frac{dx}{dt} = f_{\varepsilon,\delta}(x) + \sigma_{\varepsilon,\delta}(x) + \int ds\, K_{\varepsilon,\delta}(x, t-s)\, x(s), \qquad (19)$$

where the three terms on the right hand side correspond to the deterministic drift,
to a noise contribution, and to the memory term. In the weak-coupling limit ($\delta \to 0$),
it is possible to derive via a perturbative approach an explicit expression for these
three terms that is valid up to order $\delta^2$ [276,277]; see a practical implementation of
this theory for the development of parametrizations in geophysical fluid dynamical
models in [59,264,275]. Note that one can derive an expression for the Mori–Zwanzig
projected dynamics using data-driven approaches [42,143]. Very recently, it has been
shown [120] that the data-driven and the top-down approach presented in [276,277]
are fundamentally equivalent.
Instead, if the two sets of variables $x$ and $y$ have an infinite time scale separation
($\varepsilon \to 0$), the dynamics of the variable $x$ converges to a deterministic averaged equation
(for more details, see Sect. 3.2 below). Via homogenisation theory [202], deviations
from this averaged equation can be modelled by a stochastic differential equation
without memory, and with multiplicative white noise, so that the evolution of the
$x \in \mathbb{R}^n$ variables is controlled by:

$$dx_t = F(x_t)\,dt + \Sigma(x_t)\,dW_t, \qquad (20)$$

where it is possible to derive explicit formulas for the renormalised drift term $F : \mathbb{R}^n \to
\mathbb{R}^n$ and the diffusion matrix $\Sigma : \mathbb{R}^n \to \mathbb{R}^{n\times m}$, while $W_t$ is an $m$-dimensional Brow-
nian motion. Here and in what follows we assume the Itô convention for stochastic
differential equations. In a nutshell, the impact of the neglected scales of motion cor-
responding to the $y$ variables is twofold: it leads to (a) a change in the deterministic
contribution to the evolution of the $x$ variables; and to (b) the inclusion of a random
forcing. Stochasticity is essentially due to the lack of information on the state of the
$y$ variables in the projected $x$ space; see a detailed discussion of this in [42].
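As a minimal numerical illustration of this homogenisation limit (with an assumed linear toy model, not one of the cited stochastic climate models), consider a slow variable driven by a fast Ornstein-Uhlenbeck process; one can verify that, as $\varepsilon \to 0$, the stationary variance of the slow variable approaches the value $1/2$ of a limiting additive-noise equation of the form (20), here $dx = -x\,dt + dW$:

```python
import numpy as np

rng = np.random.default_rng(2)

def slow_variance(eps, n_traj=4000, T=10.0):
    """dx = (-x + y/sqrt(eps)) dt,  dy = -(y/eps) dt + dW/sqrt(eps) (toy model)."""
    dt = min(0.01, 0.05*eps)           # resolve the fast time scale eps
    x = np.zeros(n_traj)
    y = np.zeros(n_traj)
    for _ in range(int(T/dt)):
        x += (-x + y/np.sqrt(eps))*dt
        y += -(y/eps)*dt + np.sqrt(dt/eps)*rng.standard_normal(n_traj)
    return x.var()                      # ensemble variance after relaxation

for eps in (0.5, 0.1, 0.02):
    # for this linear toy model the full-system variance is 1/(2*(1+eps))
    print(f"eps={eps:<4}  Var(x)={slow_variance(eps):.3f}"
          f"  exact={1/(2*(1+eps)):.3f}")
# as eps -> 0 both approach Var = 1/2, the value for the homogenised SDE
# dx = -x dt + dW obtained by integrating out the fast variable y
```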
Equation (20) is at the basis of stochastic climate models, whose investigation was
initiated by Hasselmann [125]; see a comprehensive analysis of this viewpoint and
further developments in [131,132,181,205,233]. Traditionally, the deterministic com-
ponent of (20) features one or more fixed points, and the noise allows for the system
to explore regions of the phase space far from the deterministic solutions, and to per-
form transitions between competing metastable states. We will provide a broader view
point on this in Sect. 4.5, where we will consider more general competing asymptotic
states. Stochastic climate models have been key for discovering fundamental phys-
ical processes like stochastic resonance [14,15,99,191,192], and have provided key
insights for studying the transitions between different weather regimes in the atmo-
sphere [13,40,134,144,186,231]; see discussion in Sect. 4.6. Equation (20) is probably
the most convenient starting point for discussing the use of LDT in geophysical flows
even if, as shown in Sect. 3.4, LDT can be introduced also in the context of determin-
istic chaos.
3.2 Dynamical systems perturbed by weak noise
We now focus on the stochastic climate models introduced in the previous subsection
and aim at deriving large deviation laws. The Freidlin–Wentzell theory [263,274]
allows one to study the convergence of probability measures on the path-space of a
stochastic differential equation $X^\varepsilon$ in $\mathbb{R}^n$

$$dX^\varepsilon_t = b(X^\varepsilon_t)\,dt + \sqrt{\varepsilon}\,\sigma(X^\varepsilon_t)\,dW_t, \quad X^\varepsilon_0 = x,\; t \geq 0, \qquad (21)$$

where, as in (20), $b : \mathbb{R}^n \to \mathbb{R}^n$ is a deterministic drift, $\sigma : \mathbb{R}^n \to \mathbb{R}^{n\times m}$ is the
diffusion function, $W_t$ is an $m$-dimensional Brownian motion, and we introduce here
the parameter $\varepsilon > 0$ that controls the intensity of the stochastic forcing.
For bounded and Lipschitz $b$ and $\sigma$, it can be shown that as the noise intensity goes
to zero ($\varepsilon \to 0$), the distribution of paths of $X^\varepsilon_t$ converges to the deterministic path
determined by $dx_t = b(x_t)\,dt$ [274]. For all $T > 0$ and $\delta > 0$,

$$\lim_{\varepsilon\to 0} P\left(\max_{0\leq t\leq T} |X^\varepsilon_t - x_t| \geq \delta\right) = 0.$$
We may wonder, of course, about the probability of observing a given path $f(t) \neq x_t$
when $\varepsilon \neq 0$. It can be shown that a large deviation principle holds for $X^\varepsilon_t$, with a rate
function or action functional

$$I_T(f) = \frac{1}{2}\int_0^T \left\langle \dot{f}(t) - b(f(t)),\; a^{-1}(f(t))\left(\dot{f}(t) - b(f(t))\right)\right\rangle dt, \qquad (22)$$

where $a(x) = \sigma(x)\sigma(x)^T$ is the noise covariance. We have that

$$P\left(\sup_{t\in[0,T]} |X^\varepsilon_t - f(t)| < \delta\right) \asymp \exp\left(-\frac{1}{\varepsilon}\, I_T(f)\right). \qquad (23)$$
Similarly as integrals of the form $\int_a^b e^{-\frac{1}{\varepsilon}h(x)}\, k(x)\,dx$ are dominated by the minimum
$x_0$ of $h(x)$ in Laplace's method, as $\varepsilon \to 0$, the probability of a set $F \subset C[0,T]$ of
trajectories concentrates on the trajectory $f^*$ with the smallest rate function $I_T$:

$$I_T(f^*) = \inf_{f\in F} I_T(f). \qquad (24)$$

Such a path is called the minimum action path or instanton.
The exit problem. In the limit $\varepsilon \to 0$, the dynamics of (21) is determined by the drift
field $b$. When $b$ has an attractor, the trajectory will never escape from it in the absence
of noise. The situation is markedly different when noise is added. The system can make
excursions away from the attractor, can exit from its surroundings, and can possibly
perform transitions to another attractor.

LDT provides a way to describe the exit from regions containing an attractor; e.g.
if $\Omega$ is a bounded set containing a stable fixed point $\bar{x}$ of $dx_t = b(x_t)\,dt$, then the exit
from the domain will happen close to the point minimising the action (22). For more
details see [93].
Instanton calculation. The minimisation of the action functional for problems of inter-
est in geophysics can usually not be done analytically. In such cases the instanton
needs to be calculated numerically.

Arguably the most direct way of finding the instanton is by minimising the action
(22). In the minimum action method [75], the instanton $f^*$ on a finite time interval
$[0,T]$ is approximated by $f^*(t_i)$ on a discrete temporal grid, a discrete approximation
to the action is derived, and a quasi-Newton method is then applied to minimise the
discretised action.
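A minimal Python sketch of this discretisation (assuming, for illustration, the one-dimensional double-well drift $b(x) = x - x^3$ with $\sigma = 1$, so $a = 1$, and a path pinned to the two stable states $x = \mp 1$): for this gradient system the infimum of (22) over increasingly long transition times is $2\Delta U = 1/2$ for the uphill segment, which the discretised minimisation approximately reproduces.

```python
import numpy as np
from scipy.optimize import minimize

T, M = 20.0, 200                     # assumed time horizon and grid resolution
dt = T/M
def b(x): return x - x**3            # drift of the assumed double-well model

def action(interior):
    """Discretised version of the action (22) with a = 1, endpoints pinned."""
    f = np.concatenate(([-1.0], interior, [1.0]))
    fdot = np.diff(f)/dt
    fmid = 0.5*(f[1:] + f[:-1])      # evaluate the drift at cell midpoints
    return 0.5*np.sum((fdot - b(fmid))**2)*dt

f0 = np.linspace(-1.0, 1.0, M+1)[1:-1]    # straight-line initial path
res = minimize(action, f0, method="L-BFGS-B")
print("minimised discrete action:", res.fun)   # ~0.5 = 2*(U(0) - U(-1))
instanton = np.concatenate(([-1.0], res.x, [1.0]))
```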
Another fairly simple method of numerically finding the instanton is solving the
Hamilton equation connected to this minimisation problem. A difficulty arises here
in that we are often looking for a minimisation with fixed start and end points for f
at t=0 and t=T. To solve the Hamilton equation we need to specify initial values
for the coordinates and their conjugate momenta, however. A shooting method can be
applied to find the initial values of the momenta, but this is in general difficult to apply
in high dimensions.
Both these methods can only be applied to finite time intervals, while in many
cases we will want to allow for infinite time lengths of transition. In the special case
where the drift term is a gradient, i.e. $b = -\nabla U$, and $\sigma$ is the identity, these problems
can be circumvented by using the string method [74], which uses the fact that the instanton
is always parallel to the drift. The method alternates relaxation along the drift with a
redistribution of the discretisation points along the instanton curve.
This principle has been further generalised to non-gradient systems in the geometric
minimum action method [261]. Here the action is reformulated in a geometric way
that does not invo