Robust Elicitable Functionals
Kathleen E. Miao*
Department of Statistical Sciences, University of Toronto, Canada, k.miao@mail.utoronto.ca
Silvana M. Pesenti
Department of Statistical Sciences, University of Toronto, Canada, silvana.pesenti@utoronto.ca
September 6, 2024
Elicitable functionals and (strictly) consistent scoring functions are of interest due to their utility in determining (uniquely) optimal forecasts, and thus the ability to effectively backtest predictions. However, in practice, assuming that a distribution is correctly specified is too strong a belief to reliably hold. To remediate this, we incorporate a notion of statistical robustness into the framework of elicitable functionals, meaning that our robust functional accounts for "small" misspecifications of a baseline distribution. Specifically, we propose a robustified version of elicitable functionals by using the Kullback-Leibler divergence to quantify potential misspecifications from a baseline distribution. We show that the robust elicitable functionals (REFs) admit unique solutions lying at the boundary of the uncertainty region. Since every elicitable functional possesses infinitely many scoring functions, we propose the class of b-homogeneous strictly consistent scoring functions, for which the robust functionals maintain desirable statistical properties. We show the applicability of the REF in two examples: a reinsurance setting and robust regression problems.
Key words: Elicitability, Kullback-Leibler divergence, Model Uncertainty, Risk Measures
1. Introduction
Risk measures are tools used to quantify the profits and losses of financial assets. A risk measure maps losses, stemming from, e.g., historical data or a distributional output, to a real number characterising their riskiness or, in an insurance setting, the amount of assets required to be retained as reserves. However, in many situations, data or distributional information of modelled losses may be flawed for several reasons – they may be out-of-date, sparse, or unreliable due to errors. As such, it is of interest to relax the assumption that the underlying distribution is correctly specified. In the literature, this is referred to as distributional robustness, and is often approached via a worst-case risk measure; indicatively see Ghaoui et al. (2003), Bernard et al. (2024), Cai et al. (2024). For a
* corresponding author
set of probability measures $\mathcal{Q}$ – the so-called uncertainty set – a worst-case risk measure evaluated at a random variable $Y$ is defined by
\[
\sup_{\mathbb{Q}\in\mathcal{Q}} R\big(F_Y^{\mathbb{Q}}\big), \tag{1}
\]
where $R$ is a law-invariant risk measure and $F_Y^{\mathbb{Q}}$ is the cumulative distribution function (cdf) of $Y$ under $\mathbb{Q}$. Thus, a worst-case risk measure is the largest value the risk measure can attain over a predetermined set of cdfs.
From a statistical perspective, the notion of elicitability of a law-invariant functional is of interest as it yields a natural backtesting procedure (Nolde and Ziegel 2017). Of course, risk measures are functionals, and specific risk measures, such as the Value-at-Risk (VaR), are elicitable. An elicitable risk measure admits a representation as the minimiser of the expectation of a suitable scoring function (Gneiting 2011). Thus, for an elicitable risk measure $R$ with corresponding scoring function $S$, the worst-case risk measure given in (1) can equivalently be written as
\[
\sup_{\mathbb{Q}\in\mathcal{Q}}\ \operatorname*{arg\,min}_{z\in\mathbb{R}} \int S(z,y)\,\mathrm{d}F_Y^{\mathbb{Q}}(y). \tag{2}
\]
The worst-case risk measure framework yields a worst-case cdf, that is, the one attaining the supremum in (1) and (2). This worst-case cdf may be interpreted as the distribution of the losses when the most adverse probability measure in $\mathcal{Q}$ materialises. Depending on the choice of uncertainty set and risk measure, however, these worst-case cdfs may yield ineffectively wide risk bounds; see Rüschendorf et al. (2024) for various instances. As an alternative to changing the distribution of the losses, the risk measure itself may be altered to a "worst-case / robust" risk functional. Motivated by this, we interchange the supremum and argmin in (2) and define the robust elicitable functional (REF) as
\[
R^S(Y) := \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon} \int S(z,y)\,\mathrm{d}F_Y^{\mathbb{Q}}(y), \tag{3}
\]
where in the exposition, we consider an uncertainty set characterised by the Kullback-Leibler (KL) divergence. In contrast to worst-case risk measures, the REF depends not only on the choice of risk measure but more specifically on the scoring function that elicits the risk measure; see also Section 3 for a detailed discussion. As the REF is the minimiser of the worst-case expected score, it is indeed a "robust" risk functional. From a computational perspective, if the inner optimisation problem of (3) over the space of probability measures can be solved semi-analytically, the outer problem over the reals can then be tackled using classical optimisation techniques.
The benefits of elicitable functionals have been well justified in the domains of statistics and risk management. Elicitable functionals and (strictly) consistent scoring functions are of interest due to their utility in determining (uniquely) optimal forecasts, and thus the ability to effectively backtest predictions; see, e.g., Gneiting (2011) and Fissler and Ziegel (2016). The literature on
elicitability in risk management is extensive, with many authors arguing for its importance (Ziegel 2016, He et al. 2022). The characterisation of elicitable convex and coherent risk measures, for example, has been studied in Bellini and Bignozzi (2015), while Fissler et al. (2016) showed that Expected Shortfall (ES) is jointly elicitable with VaR. Embrechts et al. (2021) explored the connection between elicitability and Bayes risks, and showed that entropic risk measures are the only risk measures that have both properties. In the related work Fissler and Pesenti (2023), the authors study sensitivity measures tailored for elicitable risk measures. Robustness to distributional uncertainty is of concern in risk management and has been explored in the literature; see, e.g., Embrechts et al. (2013) for the approximation of worst-case risk measures using the Rearrangement Algorithm, Pesenti et al. (2016) for a discussion of the distributional robustness of distortion risk measures, and Embrechts et al. (2015) for the aggregation robustness of VaR and ES.
In this work, we propose an alternative to the classical worst-case risk measure by taking the minimiser of a worst-case expected score, which results in a distributionally robust elicitable functional, the REF given in (3). We solve the inner optimisation problem of (3) and obtain a semi-closed-form solution for the worst-case probability measure. Furthermore, we prove existence and uniqueness of the REF, that is, of the solution to the double optimisation problem (3). We extend the REF to k-elicitable functionals, and thus obtain jointly robust VaR and ES pairs. In this setting, we obtain a probability measure that attains the worst case simultaneously for VaR and ES. This is in contrast to the classical framework of worst-case risk measures, which yields different worst-case cdfs for VaR and ES. As the proposed REF depends on the choice of scoring function, we discuss families of homogeneous scoring functions that naturally lead to visualisations of the REFs via Murphy diagrams. We showcase the applicability of the REF on a simulated reinsurance example and connect the framework to robust regression.
This paper is organised as follows: in Section 2, we define the robust elicitable functional and state our main results – solving the inner problem of (3) and establishing existence and uniqueness of the REF. In Section 3, we address the question of choosing a scoring function by considering families of parameterised scoring functions, showing that these families retain useful properties of the functional, and illustrating the behaviour of the REF in Murphy diagrams. We extend our results to higher-order elicitable risk measures, such as the (VaR, ES) pair, in Section 4, and consider an application of the robust (VaR, ES) in reinsurance. Finally, we explore the connection of the REF to robust regression in Section 5.
2. Elicitability and robust elicitable functionals
This section provides the necessary notation and recalls the notion of elicitability. We further define
the robust elicitable functional and prove its existence and uniqueness.
2.1. Robust elicitable functionals
Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, where we interpret $\mathbb{P}$ as the baseline probability measure, e.g., estimated from data. Let $\mathcal{L}^\infty := \mathcal{L}^\infty(\Omega,\mathcal{F},\mathbb{P})$ be the space of essentially bounded random variables (rvs) on $(\Omega,\mathcal{F},\mathbb{P})$ and denote by $\mathcal{M}^\infty$ the corresponding class of cumulative distribution functions (cdfs), i.e. $\mathcal{M}^\infty := \{F \mid F(y) = \mathbb{P}(Y\le y),\ Y\in\mathcal{L}^\infty\}$.

We denote by $R\colon\mathcal{M}^\infty\to\mathsf{A}$, $\mathsf{A}\subseteq\mathbb{R}$, a potentially set-valued, law-invariant, and non-constant functional, which is a mapping from the set of admissible cdfs, $\mathcal{M}^\infty$, to the set of allowed predictions $\mathsf{A}$, also called the action domain. We assume that, when evaluating the functional $R(Y)$, $Y$ is non-degenerate. As $R$ is law-invariant, it can equivalently be defined as $R\colon\mathcal{L}^\infty\to\mathsf{A}$ by setting $R(Y) := R(F_Y)$, where $F_Y$ is the cdf of $Y\in\mathcal{L}^\infty$.
Throughout, we work with elicitable functionals; for this, we first recall the definition of scoring functions and elicitability. In statistics, scoring functions are used to assess the accuracy of a (statistical) estimate or prediction relative to the true value. Formally, $S(z,y)$ is a mapping from a prediction $z$ and a realisation $y$ of $Y$ to the non-negative reals. By convention, we view $S$ as a penalty for the forecast $z$ compared to the realised event $y$, where smaller values represent more accurate predictions.
Definition 1 (Scoring functions and elicitability). A scoring function is a measurable function $S\colon\mathsf{A}\times\mathbb{R}\to[0,\infty)$. A scoring function may satisfy the following properties:
i) A scoring function $S$ is consistent for a functional $R$, if
\[
\int S(t,y)\,\mathrm{d}F(y) \le \int S(z,y)\,\mathrm{d}F(y), \tag{4}
\]
for all $F\in\mathcal{M}^\infty$, $t\in R(F)$, and all $z\in\mathsf{A}$.
ii) A scoring function $S$ is strictly consistent for a functional $R$, if it is consistent and Equation (4) holds with equality only if $z\in R(F)$.
iii) A functional $R$ is elicitable, if there exists a strictly consistent scoring function $S$ for $R$.
An elicitable functional $R$ admits the representation
\[
R(Y) = \operatorname*{arg\,min}_{z\in\mathbb{R}} \int S(z,y)\,\mathrm{d}F_Y(y), \tag{5}
\]
for all $F_Y\in\mathcal{M}^\infty$ and where $S$ is any strictly consistent scoring function for $R$. We note that $R(Y)$ can be an interval, such as in the case of the Value-at-Risk, and we denote the image of $R$ by $\mathrm{Im}\,R\subseteq\mathsf{A}$. We call $R(Y)$ unique, if the argmin is a singleton. Many statistical functionals and risk measures are elicitable, including the mean, median, quantiles, and expectiles; see, e.g., Gneiting (2011) and Bellini and Bignozzi (2015).
The definition of elicitability, in particular Equation (5), assumes knowledge of the true distribution of $Y$. However, under distributional ambiguity – that is, uncertainty on the distribution of $Y$ – one may consider a robust functional. A classical choice to model distributional deviations is the Kullback-Leibler (KL) divergence. The KL divergence has been extensively used in model assessment; indicatively see Glasserman and Xu (2014), Pesenti et al. (2019), Blanchet et al. (2019), Lassance and Vrins (2023). For a probability measure $\mathbb{Q}$ on $(\Omega,\mathcal{F})$, the KL divergence from $\mathbb{Q}$ to $\mathbb{P}$ is defined as
\[
D_{\mathrm{KL}}(\mathbb{Q}\,\|\,\mathbb{P}) := \mathbb{E}^{\mathbb{P}}\!\left[\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}\,\log\frac{\mathrm{d}\mathbb{Q}}{\mathrm{d}\mathbb{P}}\right],
\]
if $\mathbb{Q}$ is absolutely continuous with respect to $\mathbb{P}$, and $+\infty$ otherwise. We denote by $\mathbb{E}^{\mathbb{Q}}[\cdot]$ the expected value under $\mathbb{Q}$, and for simplicity write $\mathbb{E}[\cdot] := \mathbb{E}^{\mathbb{P}}[\cdot]$ when considering the baseline probability measure $\mathbb{P}$. We use the KL divergence to describe an uncertainty set and propose the following robust elicitable functional.
Definition 2 (Robust elicitable functional). Let $R$ be an elicitable functional with strictly consistent scoring function $S$ and let $\varepsilon\ge0$. Then we define the robust elicitable functional (REF), for $Y\in\mathcal{L}^\infty$, by
\[
R^S(Y) := \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon} \mathbb{E}^{\mathbb{Q}}[S(z,Y)], \tag{P}
\]
where the uncertainty set $\mathcal{Q}_\varepsilon$ is given as
\[
\mathcal{Q}_\varepsilon := \big\{\mathbb{Q} \mid \mathbb{Q}\ll\mathbb{P} \text{ and } D_{\mathrm{KL}}(\mathbb{Q}\,\|\,\mathbb{P})\le\varepsilon\big\}. \tag{6}
\]
The parameter $\varepsilon\ge0$ is a tolerance distance that quantifies the distance to the baseline probability $\mathbb{P}$. Clearly, if $\varepsilon=0$, only the baseline measure is considered and we recover the traditional definition of elicitable functionals. If the tolerance distance becomes too large, as specified in the next proposition, the inner problem in (P) becomes degenerate. For this, we first consider the limits of the KL divergence of a specific family of probability measures.

Lemma 1. Let $Y$ be a rv satisfying $\mathbb{P}(Y = \operatorname{ess\,sup} Y) =: \pi > 0$ and define the probability measure
\[
\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}} := \frac{e^{sY}}{\mathbb{E}[e^{sY}]}.
\]
Then the following limits hold:
\[
\lim_{s\to0} D_{\mathrm{KL}}(\mathbb{Q}_s\,\|\,\mathbb{P}) = 0 \quad\text{and}\quad \lim_{s\to+\infty} D_{\mathrm{KL}}(\mathbb{Q}_s\,\|\,\mathbb{P}) = \log\frac{1}{\pi}.
\]
Proof. The fact that $\lim_{s\to0} D_{\mathrm{KL}}(\mathbb{Q}_s\,\|\,\mathbb{P}) = 0$ follows as $\lim_{s\to0}\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}} = 1$. For the second limit, we redefine the rv $Y$ via
\[
Y(\omega) = \begin{cases} W(\omega) & \text{on } \omega\in A^\complement,\\ \bar y := \operatorname{ess\,sup} Y & \text{on } \omega\in A, \end{cases}
\]
where $\mathbb{P}(A) = \pi$ and $A^\complement$ denotes the complement of $A$. By setting $W = 0$ on $A$, we have that $\mathbb{E}[e^{sW}] =: M_W(s)$ is the moment generating function (mgf) of $W$. For $\omega\in A^\complement$, define $w := W(\omega)$ and obtain
\[
\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}}(\omega) = \frac{e^{sw}}{M_W(s) + \pi e^{s\bar y}} = \frac{1}{M_{W-w}(s) + \pi e^{s(\bar y - w)}}.
\]
As $M_{W-w}(s)\ge0$ and $\bar y > w$, we have that $\lim_{s\to+\infty}\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}} = 0$ $\mathbb{P}$-a.s. on $A^\complement$. For $\omega\in A$, we have
\[
\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}}(\omega) = \frac{e^{s\bar y}}{M_W(s) + \pi e^{s\bar y}} = \frac{1}{M_{W-\bar y}(s) + \pi}.
\]
Since $\bar y > W$ $\mathbb{P}$-a.s., it holds that
\[
\lim_{s\to+\infty} M_{W-\bar y}(s) = \int \lim_{s\to+\infty} e^{s(y-\bar y)} f_W(y)\,\mathrm{d}y = 0,
\]
and $\lim_{s\to+\infty}\frac{\mathrm{d}\mathbb{Q}_s}{\mathrm{d}\mathbb{P}} = \frac{1}{\pi}$ $\mathbb{P}$-a.s. on $A$. Thus, we conclude that $\lim_{s\to+\infty} D_{\mathrm{KL}}(\mathbb{Q}_s\,\|\,\mathbb{P}) = \log\frac{1}{\pi}$. □
This lemma shows that, depending on the distribution of $Y$, the KL divergence converges to $\log\big(\frac{1}{\pi}\big)$, which provides an upper limit on the choice of the tolerance distance $\varepsilon$, as detailed in the next assumption.

Assumption 1 (Maximal Kullback-Leibler distance). We assume that the tolerance distance $\varepsilon$ satisfies
\[
0\le\varepsilon<\log\frac{1}{\pi(z)} \quad\text{for all } z\in\mathbb{R},
\]
where $\pi(z) := \mathbb{P}\big(S(z,Y) = \operatorname{ess\,sup} S(z,Y)\big)$.
We assume that Assumption 1 holds throughout the manuscript. Lemma 1 and Assumption 1 give rise to an interpretation of the choice of $\varepsilon$, which depends on the baseline distribution of $S(z,Y)$. That is, when $\varepsilon = 0$, one is completely confident in the baseline probability. On the other hand, $\varepsilon = \log\frac{1}{\pi(z)}$ corresponds to maximal uncertainty. Note that if $\pi(z) = 0$, then there is no upper limit on the tolerance distance and $\varepsilon\in[0,\infty)$.

The next result shows that if the tolerance distance of the uncertainty set is too large, i.e. $\varepsilon\ge\log\big(\frac{1}{\pi(z)}\big)$, then the worst-case probability measure puts all probability mass on the essential supremum.
Proposition 1. If $\varepsilon\ge\log\frac{1}{\pi(z)}$, then the optimal probability measure $\mathbb{Q}^\dagger$ attaining the inner supremum in (P) has Radon-Nikodym density
\[
\frac{\mathrm{d}\mathbb{Q}^\dagger}{\mathrm{d}\mathbb{P}} = \frac{\mathbb{1}_{\{S(z,Y) = \operatorname{ess\,sup} S(z,Y)\}}}{\pi(z)}.
\]
Proof. First, $\mathbb{Q}^\dagger$ lies within the KL uncertainty set, since
\[
D_{\mathrm{KL}}(\mathbb{Q}^\dagger\,\|\,\mathbb{P}) = \mathbb{E}\left[\frac{\mathbb{1}_{\{S(z,Y)=\operatorname{ess\,sup} S(z,Y)\}}}{\pi(z)}\,\log\frac{\mathbb{1}_{\{S(z,Y)=\operatorname{ess\,sup} S(z,Y)\}}}{\pi(z)}\right] = \log\frac{1}{\pi(z)}.
\]
Next, $\mathbb{E}^{\mathbb{Q}^\dagger}[S(z,Y)] = \operatorname{ess\,sup} S(z,Y)$ and moreover, for any probability measure $\mathbb{Q}$ that is absolutely continuous with respect to $\mathbb{P}$, it holds that $\mathbb{E}^{\mathbb{Q}}[S(z,Y)]\le\operatorname{ess\,sup} S(z,Y)$. Hence, $\mathbb{Q}^\dagger$ attains the inner supremum in (P). □
In the case when the tolerance distance satisfies Assumption 1, the probability measure attaining the inner supremum in (P) becomes non-degenerate and the REF admits an alternative representation.

Theorem 1 (Kullback-Leibler Uncertainty). Let $S$ be a strictly consistent scoring function for $R$. Then, the REF has representation
\[
R^S(Y) = \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \mathbb{E}\left[S(z,Y)\,\frac{e^{\eta^*(z)S(z,Y)}}{\mathbb{E}[e^{\eta^*(z)S(z,Y)}]}\right],
\]
where for each $z\in\mathbb{R}$, $\eta^*(z)\ge0$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_{\eta(z)}\,\|\,\mathbb{P})$, with
\[
\frac{\mathrm{d}\mathbb{Q}_{\eta(z)}}{\mathrm{d}\mathbb{P}} := \frac{e^{\eta(z)S(z,Y)}}{\mathbb{E}[e^{\eta(z)S(z,Y)}]}.
\]
Proof. Let $Y$ have cdf $F$ under $\mathbb{P}$ and denote its (not-necessarily continuous) probability density function (pdf) by $f$, such that, e.g., $\mathbb{E}[Y] = \int y f(y)\,\nu(\mathrm{d}y)$, where integration is tacitly assumed to be over the support of $Y$. Then, the inner problem of (P) can be written as an optimisation problem over densities as follows:
\[
\sup_{g\colon\mathbb{R}\to\mathbb{R}} \int S(z,y)g(y)\,\nu(\mathrm{d}y), \quad\text{subject to}\quad \int \frac{g(y)}{f(y)}\log\frac{g(y)}{f(y)}\,f(y)\,\nu(\mathrm{d}y)\le\varepsilon,
\]
\[
\int g(y)\,\nu(\mathrm{d}y) = 1, \quad\text{and}\quad g(y)\ge0 \ \text{ for all } y\in\mathbb{R}.
\]
The above optimisation problem admits the Lagrangian, with Lagrange parameters $\eta_1,\eta_2\ge0$ and $\eta_3(y)\ge0$, for all $y\in\mathbb{R}$,
\[
L(\eta_1,\eta_2,\eta_3,g) = \int\Big(-S(z,y)g(y) + \eta_1\log\Big(\tfrac{g(y)}{f(y)}\Big)g(y) + \eta_2 g(y) - \eta_3(y)g(y)\Big)\nu(\mathrm{d}y) - \eta_1\varepsilon - \eta_2.
\]
The Lagrange parameter $\eta_3(y)$ guarantees the positivity of $g(\cdot)$, that is $g(y)\ge0$ whenever $f(y)>0$ and $g(y)=0$ otherwise. The associated Euler-Lagrange equation becomes
\[
-S(z,y) + \eta_1\Big(\log\tfrac{g(y)}{f(y)} + 1\Big) + \eta_2 + \eta_3(y) = 0.
\]
This implies that
\[
\frac{g(y)}{f(y)} = \exp\Big\{\frac{1}{\eta_1}\big(S(z,y) - \eta_2 - \eta_3(y)\big) - 1\Big\}.
\]
Thus, $\eta_3(y)\equiv0$ and imposing $\eta_2$ yields
\[
\frac{g(y)}{f(y)} = \frac{\exp\big\{\frac{1}{\eta_1}S(z,y) - 1\big\}}{\mathbb{E}\big[\exp\big\{\frac{1}{\eta_1}S(z,Y) - 1\big\}\big]}.
\]
Reparametrising $\eta := \frac{1}{\eta_1}$, we define the Radon-Nikodym derivative
\[
\frac{\mathrm{d}\mathbb{Q}_\eta}{\mathrm{d}\mathbb{P}} := \frac{g(Y)}{f(Y)} = \frac{e^{\eta S(z,Y)}}{\mathbb{E}[e^{\eta S(z,Y)}]},
\]
which implies that $Y$ under $\mathbb{Q}_\eta$ has density $g$.

Next, we show that for each $z\in\mathbb{R}$, the optimal Lagrangian parameter $\eta^*$ is the solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$, i.e. that the KL divergence constraint is binding. For this, we first show that for fixed $z\in\mathbb{R}$, $D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$ is strictly increasing in $\eta$. We calculate
\[
\begin{aligned}
D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P}) &= \mathbb{E}\left[\frac{e^{\eta S(z,Y)}}{\mathbb{E}[e^{\eta S(z,Y)}]}\,\log\frac{e^{\eta S(z,Y)}}{\mathbb{E}[e^{\eta S(z,Y)}]}\right]
= \mathbb{E}\left[\frac{e^{\eta S(z,Y)}}{\mathbb{E}[e^{\eta S(z,Y)}]}\Big(\eta S(z,Y) - \log\mathbb{E}[e^{\eta S(z,Y)}]\Big)\right]\\
&= \eta\,\frac{\mathbb{E}[e^{\eta S(z,Y)}S(z,Y)]}{\mathbb{E}[e^{\eta S(z,Y)}]} - \log\mathbb{E}[e^{\eta S(z,Y)}]
= \eta K'_{S(z,Y)}(\eta) - K_{S(z,Y)}(\eta) =: d(\eta),
\end{aligned} \tag{7}
\]
where $K_{S(z,Y)}(\eta) := \log\mathbb{E}[e^{\eta S(z,Y)}]$, and $K'_{S(z,Y)}(\eta) := \frac{\partial}{\partial\eta}K_{S(z,Y)}(\eta)$ denotes the derivative with respect to $\eta$. Observe that $K_{S(z,Y)}(\cdot)$ is the cumulant generating function of $S(z,Y)$, thus it is differentiable and strictly convex. Therefore $d(\eta)$ is increasing since
\[
d'(\eta) = K'_{S(z,Y)}(\eta) + \eta K''_{S(z,Y)}(\eta) - K'_{S(z,Y)}(\eta) = \eta K''_{S(z,Y)}(\eta) > 0.
\]
Furthermore, as the objective function equals the derivative of the cumulant generating function, it is also increasing in $\eta$. Indeed,
\[
\mathbb{E}\left[S(z,Y)\,\frac{e^{\eta S(z,Y)}}{\mathbb{E}[e^{\eta S(z,Y)}]}\right] = K'_{S(z,Y)}(\eta),
\]
which is increasing in $\eta$. Thus, both the objective function and the KL divergence are strictly increasing in $\eta$, the constraint is binding, and $\eta^*$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$. To see that a solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$ exists, note that by Lemma 1, it holds that $\lim_{\eta\to0}D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P}) = 0$ and $\lim_{\eta\to+\infty}D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P}) = \log\big(\frac{1}{\pi(z)}\big)$.

As $\eta$ and $\mathbb{Q}_\eta$ depend on $z$, we make this dependence explicit in the statement, and write $\eta(z)$ and $\mathbb{Q}_{\eta(z)}$. □
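Numerically, Theorem 1 suggests a simple two-step procedure: for each candidate $z$, find $\eta^*(z)$ by root-finding on the KL constraint, evaluate the exponentially tilted expected score, and minimise over $z$. The following is a minimal sketch for an empirical baseline sample and the squared loss (strictly consistent for the mean); the helper names tilted_mean and robust_mean are ours and purely illustrative.

```python
# A sample-based sketch of Theorem 1: inner problem via exponential tilting
# with eta* solving the binding KL constraint, outer problem over z.
import numpy as np
from scipy.optimize import brentq, minimize_scalar

def tilted_mean(s, eps):
    """Worst-case expected score sup_{Q in Q_eps} E^Q[s] for a score sample s."""
    s = np.asarray(s, float)
    if eps == 0.0 or np.isclose(s.max(), s.min()):
        return s.mean()
    smax = s.max()
    def kl(eta):                        # d(eta) = eta K'(eta) - K(eta), cf. (7)
        w = np.exp(eta * (s - smax)); w /= w.sum()
        K = np.log(np.mean(np.exp(eta * (s - smax)))) + eta * smax
        return eta * np.sum(w * s) - K
    eta_star = brentq(lambda e: kl(e) - eps, 1e-12, 1e4)   # binding constraint
    w = np.exp(eta_star * (s - smax)); w /= w.sum()        # dQ*/dP weights
    return np.sum(w * s)                # = K'(eta*), cf. Proposition 2 below

def robust_mean(y, eps):
    """REF for the mean: argmin_z of the worst-case expected squared loss."""
    return minimize_scalar(lambda z: tilted_mean((z - y) ** 2, eps),
                           bounds=(y.min(), y.max()), method="bounded").x

rng = np.random.default_rng(0)
y = rng.beta(2, 2, size=5_000)          # Beta(2,2) baseline, cf. Example 1
print(robust_mean(y, eps=0.1))
```

The shift by the maximal score inside the exponentials avoids numerical overflow for large $\eta$ and does not change the tilted weights.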
2.2. Existence and uniqueness
For existence and uniqueness of the REF, we require additional properties of the scoring functions. Thus, we first recall several definitions related to scoring functions. The first set of properties relies on the first-order condition of elicitable functionals. Indeed, as elicitable functionals are defined via an argmin, they can often be found by solving the corresponding first-order condition. The following definition of identification functions makes this precise. We refer the interested reader to Steinwart et al. (2014) for details and discussions.
Definition 3 (Identification function). Let $R$ be an elicitable functional. Then, a measurable function $V\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is
i) called an identification function for $R$, if
\[
\mathbb{E}[V(z,Y)] = 0 \quad\text{if and only if}\quad z = R(Y),
\]
for all $z\in\mathrm{Im}^\circ R$, where $\mathrm{Im}^\circ R$ is the interior of $\mathrm{Im}\,R$, and for all $Y\in\mathcal{L}^\infty$;
ii) oriented, if $V$ is an identification function and
\[
\mathbb{E}[V(z,Y)] > 0 \quad\text{if and only if}\quad z > R(Y),
\]
for all $z\in\mathrm{Im}^\circ R$ and for all $Y\in\mathcal{L}^\infty$.
iii) The functional $R$ is called identifiable, if there exists an identification function for $R$.
An identification function thus characterises the first-order condition of elicitable functionals and is therefore intimately connected to scoring functions. Moreover, an oriented identification function can further rank predictions. To see this, note that any identification function gives rise to a scoring function, and in particular, any oriented identification function gives rise to an order-sensitive scoring function; see, e.g., Steinwart et al. (2014). A scoring function is called order sensitive (or accuracy rewarding) for $R$, if for all $t_1,t_2\in\mathsf{A}$ with either $R(Y)<t_1<t_2$ or $R(Y)>t_1>t_2$, it holds that
\[
\mathbb{E}\big[S(R(Y),Y)\big] < \mathbb{E}\big[S(t_1,Y)\big] < \mathbb{E}\big[S(t_2,Y)\big].
\]
Thus, the further away the prediction $t_2$ is from $R(Y)$, the larger its expected score.

Moreover, recall that a scoring function $S$ is locally Lipschitz continuous in $z$, if for all intervals $[a,b]\subseteq\mathsf{A}$ there exists a constant $c_{a,b}\ge0$ such that for all $z_1,z_2\in[a,b]$ and all $y\in\mathbb{R}$, we have
\[
\big|S(z_1,y) - S(z_2,y)\big| \le c_{a,b}\,|z_1-z_2|.
\]
We next establish the following result, which characterises identification functions of exponentially transformed scoring functions and will be instrumental in proving existence of the REF.
Lemma 2. Let $S(z,y)\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ be strictly convex in $z$, locally Lipschitz continuous in $z$, and a strictly consistent scoring function for a functional $R$. Moreover, define $H(z,y) := e^{vS(z,y)}$, for $v>0$.

Then, $H(\cdot,\cdot)$ is strictly convex in $z$ and locally Lipschitz continuous in $z$. Moreover, $H(\cdot,\cdot)$ is a scoring function in that $R^H := \operatorname*{arg\,min}_{z\in\mathbb{R}}\mathbb{E}[H(z,Y)]$ exists. Furthermore, there exists an oriented identification function $W$ for $R^H$ that satisfies
\[
k(z)\,W(z,y) = v\,e^{vS(z,y)}\,\frac{\partial}{\partial z}S(z,y), \quad\text{for almost all } (z,y)\in\mathrm{Im}\,R\times\mathbb{R},
\]
and for some $k(z)>0$.
Proof. Clearly, $H$ is strictly convex in $z$ as it is a composition of two strictly convex functions in $z$. Similarly, it is locally Lipschitz continuous in $z$ as $H$ is the composition of two locally Lipschitz functions. As $H$ is strictly convex in $z$, we have that $\mathbb{E}[H(z,Y)]$ is also convex in $z$ for all $Y\in\mathcal{L}^\infty$. Thus, $R^H$ exists and is an elicitable functional.

By Corollary 9 of Steinwart et al. (2014), $R^H$ is identifiable and has an oriented identification function, which we denote by $W$. By iii), Theorem 8 of Steinwart et al. (2014), the identification function satisfies, for almost all $(z,y)$,
\[
k(z)\,W(z,y) = \frac{\partial}{\partial z}H(z,y) = v\,e^{vS(z,y)}\,\frac{\partial}{\partial z}S(z,y),
\]
for some $k(z)>0$ for all $z\in\mathbb{R}$, and where the last equation holds by definition of $H$. □
The next result shows conditions under which the REF exists, and when it is unique.
Theorem 2 (Existence and uniqueness). Let $R$ be an elicitable functional with strictly consistent scoring function $S$. Further assume that $S(z,y)$ is strictly convex in $z$, continuously differentiable in $z$, and locally Lipschitz continuous in $z$. Let $Y\in\mathcal{L}^\infty$ and assume that $\int_{\mathsf{A}}\big|\frac{\partial}{\partial z}S(z,y)\big|\,\mathrm{d}z<\infty$, for all $y$ in the support of $Y$. Then the following hold:
i) there exists a solution to optimisation problem (P), that is, $R^S(Y)$ exists;
ii) if $\operatorname*{arg\,min}_{z\in\mathbb{R}}\mathbb{E}[e^{vS(z,Y)}]$ is a singleton for all $v>0$, then the solution to optimisation problem (P) is unique.
Proof. We define the value function of the inner optimisation problem of (P) by
\[
J(z) := \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z,Y)]. \tag{8}
\]
For existence i), we first apply the envelope theorem for saddle-point problems, Theorem 4 in Milgrom and Segal (2002), to derive an expression for $\frac{\mathrm{d}}{\mathrm{d}z}J(z)$, and second show that $\frac{\mathrm{d}}{\mathrm{d}z}J(z)$ crosses zero. For ii), we show that $\frac{\mathrm{d}}{\mathrm{d}z}J(z)$ crosses zero at most once, thus $J(z)$ admits a unique minimum.
Part 1: Rewriting optimisation problem (8) as a constrained optimisation problem over densities gives
\[
\sup_{g\ \text{density}} \int S(z,y)g(y)\,\nu(\mathrm{d}y), \quad\text{subject to}\quad \int g(y)\log\frac{g(y)}{f(y)}\,\nu(\mathrm{d}y)\le\varepsilon.
\]
As the space of densities is convex and as the objective function $J(g,z) := \int S(z,y)g(y)\,\nu(\mathrm{d}y)$ and the constraint function $c(g,z) := \varepsilon - \int g(y)\log\frac{g(y)}{f(y)}\,\mathrm{d}y$ are both concave in $g$, the constrained optimisation problem can be represented as a saddle-point problem with associated Lagrangian
\[
L(g,\eta,z) := J(g,z) + \eta\,c(g,z).
\]
Moreover, it holds that
\[
J(z) = L\big(g^*(z),\eta^*(z),z\big),
\]
for saddle points $(g^*(z),\eta^*(z))$. Next, we apply Theorem 4 in Milgrom and Segal (2002) to the Lagrangian $L(g,\eta,z)$. For this, note that for fixed $(g,\eta)$, $L(g,\eta,\cdot)$ is absolutely continuous, since the scoring function is continuously differentiable in $z$. Moreover, for each $z$, the set of saddle points is non-empty by Theorem 1. Also, there exists a non-negative and integrable function $b\colon\mathsf{A}\to[0,\infty)$ such that
\[
\Big|\frac{\partial}{\partial z}L(g,\eta,z)\Big| = \Big|\int \frac{\partial}{\partial z}S(z,y)\,g(y)\,\nu(\mathrm{d}y)\Big| \le b(z),
\]
where the inequality follows since the scoring function is locally Lipschitz continuous in $z$ and $g$ is a density. Integrability of $b$ follows by the integrability assumption on the derivative of the scoring function. Also, $\frac{\partial}{\partial z}L(g,\eta,z) = \frac{\partial}{\partial z}J(g,z)$ is continuous in $g$, and $L(g,\eta,z)$ is equi-differentiable in $g$ and $\eta$. Thus the assumptions of Theorem 4 in Milgrom and Segal (2002) are satisfied and it holds that
\[
J(z) = J(0) + \int_0^z \frac{\partial}{\partial s'}L(g,\eta,s')\Big|_{g=g^*(s),\,\eta=\eta^*(s),\,s'=s}\,\mathrm{d}s. \tag{9}
\]
Therefore, taking the derivative with respect to $z$ of (9), we have
\[
\frac{\mathrm{d}}{\mathrm{d}z}J(z) = \frac{\partial}{\partial z}L(g,\eta,z)\Big|_{g=g^*(z),\,\eta=\eta^*(z)}
= \int \frac{\partial}{\partial z}S(z,y)\,g(y)\,\nu(\mathrm{d}y)\Big|_{g=g^*(z),\,\eta=\eta^*(z)}
= \mathbb{E}\left[\frac{\partial}{\partial z}S(z,Y)\,\frac{e^{\eta^*(z)S(z,Y)}}{\mathbb{E}[e^{\eta^*(z)S(z,Y)}]}\right],
\]
where in the last equality we used that $g^*(z)$ and $\eta^*(z)$ are given in Theorem 1.
Part 2: To show that $\frac{\mathrm{d}}{\mathrm{d}z}J(z)$ crosses zero, we proceed as follows. For fixed $\eta>0$, define $H_\eta(z,y) := e^{\eta S(z,y)}$ and $\bar H_\eta(z) := \mathbb{E}[H_\eta(z,Y)]$. By Lemma 2, $H_\eta(\cdot,\cdot)$ is strictly convex in $z$, locally Lipschitz continuous, and $z^*_\eta := \operatorname*{arg\,min}_{z\in\mathbb{R}}\bar H_\eta(z)$ exists. Furthermore, by Lemma 2, there exists an oriented identification function for $z^*_\eta$ that satisfies
\[
k(z)\,W_\eta(z,y) = \eta\,e^{\eta S(z,y)}\,\frac{\partial}{\partial z}S(z,y),
\]
for some $k(z)>0$, and for all $z\in\mathbb{R}$. Since for each $\eta>0$, $W_\eta$ is an oriented identification function and as $k(\cdot)\ge0$, we have that for all $z>z^*_\eta$
\[
k(z)\,\mathbb{E}[W_\eta(z,Y)] = \eta\,\mathbb{E}\Big[e^{\eta S(z,Y)}\,\frac{\partial}{\partial z}S(z,Y)\Big] > 0,
\]
and similarly for all $z<z^*_\eta$
\[
k(z)\,\mathbb{E}[W_\eta(z,Y)] = \eta\,\mathbb{E}\Big[e^{\eta S(z,Y)}\,\frac{\partial}{\partial z}S(z,Y)\Big] < 0.
\]
Since this holds for all $\eta>0$, the above equations also hold for $\eta^*$. Therefore
\[
\frac{\mathrm{d}}{\mathrm{d}z}J(z) > 0 \ \text{ for all } z>z^*_{\eta^*}, \qquad \frac{\mathrm{d}}{\mathrm{d}z}J(z) < 0 \ \text{ for all } z<z^*_{\eta^*}, \qquad\text{and}\qquad \frac{\mathrm{d}}{\mathrm{d}z}J(z)\Big|_{z=z^*_{\eta^*}} = 0.
\]
Therefore, $J(z)$ admits a (potentially interval-valued) minimum.

To show ii), assume that $\operatorname*{arg\,min}_{z\in\mathbb{R}}\mathbb{E}[e^{vS(z,Y)}]$ is a singleton for all $v>0$, that is, $z^*_\eta$ is a singleton for all $\eta>0$. This implies that $J(z)$ admits a unique minimum. □
We can alternatively write the REF in terms of the cumulant generating function (cgf) of the scoring function. For this, define the cgf of a rv $U\in\mathcal{L}^\infty$ by $K_U(\eta) := \log\big(\mathbb{E}[e^{\eta U}]\big)$ and denote its derivative by $K'_U(\eta) := \frac{\partial}{\partial\eta}K_U(\eta)$.
Proposition 2 (Alternative representation). The REF can be represented as
\[
R^S(Y) = \operatorname*{arg\,min}_{z\in\mathbb{R}} K'_{S(z,Y)}\big(\eta^*(z)\big),
\]
where for each $z\in\mathbb{R}$, $\eta^*(z)$ is the unique solution to
\[
\eta(z)\,K'_{S(z,Y)}\big(\eta(z)\big) - K_{S(z,Y)}\big(\eta(z)\big) = \varepsilon. \tag{10}
\]

Proof. Note that by definition, the cgf of $S(z,Y)$ is $K_{S(z,Y)}(\eta) = \log\mathbb{E}[e^{\eta S(z,Y)}]$. Then, we have
\[
K'_{S(z,Y)}\big(\eta^*(z)\big) = \frac{\mathbb{E}[S(z,Y)\,e^{\eta^*(z)S(z,Y)}]}{\mathbb{E}[e^{\eta^*(z)S(z,Y)}]}.
\]
Moreover, Equation (10) for $\eta(z)$ is equivalent to Equation (7). Thus, the two optimisation problems are equivalent. □
3. Choice of scoring function
As elicitable functionals are defined as the argmin of an expected scoring function, the scoring function is not unique; indeed, there are infinitely many strictly consistent scoring functions. In this section, we discuss families of homogeneous scoring functions that naturally lead to illustrations of the REF via Murphy diagrams.
3.1. Families of b-homogeneous scoring functions
To investigate the effect of the scoring function on the REF, we propose the use of b-homogeneous
scoring functions as argued in Efron (1991), Patton (2011), and studied in Nolde and Ziegel (2017).
Definition 4 (b-Homogeneous Scores). A scoring function $S\colon\mathsf{A}\times\mathbb{R}\to[0,\infty)$ is positively homogeneous of degree $b\in\mathbb{R}$, if
\[
S(cz,cy) = c^b\,S(z,y)
\]
for all $(z,y)\in\mathsf{A}\times\mathbb{R}$ and for all $c>0$. We say that a scoring function is positively homogeneous if there exists a $b\in\mathbb{R}$ such that it is positively homogeneous of degree $b$.
These families of parameterised scoring functions retain useful properties of the elicitable functional, discussed next.
Proposition 3. Let $S$ be a strictly consistent scoring function for $R$; then the following hold:
i) if $S$ is a positively homogeneous scoring function, then $R$ and $R^S$ are positively homogeneous of degree 1;
ii) if $S(z-c,y) = S(z,y+c)$ for $c\in\mathbb{R}$, then $R$ and $R^S$ are translation invariant;
iii) if $S(y,y) = 0$, then $R(m) = m$ and $R^S(m) = m$ for all $m\in\mathbb{R}$. In particular, $R(0) = 0$ and $R^S(0) = 0$.
Proof.
i) The case for $R$ follows from Fissler and Pesenti (2023). To show that it also holds for $R^S$, let $S$ be a positively homogeneous scoring function for $R$ of degree $b$. Then, using the change of variable $z := cw$, we obtain
\[
\begin{aligned}
R^S(cY) &= \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z,cY)]
= c\operatorname*{arg\,min}_{w\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(cw,cY)]\\
&= c\operatorname*{arg\,min}_{w\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon} c^b\,\mathbb{E}^{\mathbb{Q}}[S(w,Y)]
= c\operatorname*{arg\,min}_{w\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(w,Y)]
= c\,R^S(Y).
\end{aligned}
\]
ii) Suppose that $S(z-c,y) = S(z,y+c)$ for $c\in\mathbb{R}$. Fix $c\in\mathbb{R}$; then
\[
\begin{aligned}
R^S(Y+c) &= \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z,Y+c)]
= \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z-c,Y)]\\
&= \Big(\operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z,Y)]\Big) + c
= R^S(Y) + c.
\end{aligned}
\]
The statement for $R$ follows by setting $\varepsilon = 0$.
iii) Assume that $S(y,y) = 0$ and let $m\in\mathbb{R}$; then
\[
R^S(m) = \operatorname*{arg\,min}_{z\in\mathbb{R}}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(z,m)] = \operatorname*{arg\,min}_{z\in\mathbb{R}}\ S(z,m) = m.
\]
The result for $R$ follows for the special case of $\varepsilon = 0$. □
In the risk management literature, significant emphasis is placed on what are considered desirable properties of a risk measure. Among these properties, homogeneity and translation invariance enjoy significant interest due to their inclusion in the notion of a coherent risk measure (Artzner et al. 1999). Here, we show that these properties of the elicitable functional (risk measure) translate into properties of the corresponding scoring function, and that there is a relationship between properties of the scoring function and those of its elicitable functional.

In particular, the positive homogeneity property allows for the rescaling of the rvs. This has the financial interpretation of allowing for currency and unit conversions. It also has practical implications, in that a practitioner may perform REF computations at a reduced scale, for example to improve numerical performance, and then rescale back to the original magnitude. Translation invariance, or cash invariance, is typically motivated by the risk-free property of cash assets, meaning that adding a constant value reduces the risk of the random portfolio by the same amount.
3.2. Murphy diagrams for robust elicitable functionals
In this section, we illustrate the REF on different functionals, such as the mean, VaR, and expectiles, using b-homogeneous scoring functions. The b-homogeneous class of scoring functions allows practitioners to rescale losses via the homogeneity property of the REF, to improve numerical stability of the functional.
Proposition 4 (b-homogeneous scoring functions for the mean (Nolde and Ziegel 2017)). The class of strictly consistent and b-homogeneous scoring functions for the mean satisfying $S(y,y) = 0$ is given by any positive multiple of a member of the Patton family
\[
S^{\mathrm{E}}_b(z,y) = \begin{cases}
\dfrac{y^b - z^b}{b(b-1)} - \dfrac{z^{b-1}}{b-1}(y-z), & b\in\mathbb{R}\setminus\{0,1\},\\[6pt]
\dfrac{y}{z} - \log\dfrac{y}{z} - 1, & b = 0,\\[6pt]
y\log\dfrac{y}{z} - (y-z), & b = 1,
\end{cases} \tag{11}
\]
where we require that $z,y>0$.
Note that the squared loss, $S(z,y) = (z-y)^2$, is recovered (up to a positive multiple) when $b = 2$ in the b-homogeneous scoring function for the mean in Equation (11).
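A direct implementation of the Patton family (11) is straightforward; the following is a minimal sketch (with illustrative names, requiring $z,y>0$) that is reused in the Murphy-diagram experiments below.

```python
# The Patton family S^E_b of b-homogeneous scores for the mean, Equation (11).
import numpy as np

def patton_score(z, y, b):
    z, y = np.asarray(z, float), np.asarray(y, float)   # requires z, y > 0
    if b == 0.0:
        return y / z - np.log(y / z) - 1.0
    if b == 1.0:
        return y * np.log(y / z) - (y - z)
    return (y**b - z**b) / (b * (b - 1.0)) - z**(b - 1.0) / (b - 1.0) * (y - z)
```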
Introduced in Ehm et al. (2016), a Murphy diagram is a graph used to display the effect of a scoring function's homogeneity parameter on the value of the functional. Here, we use this idea to plot the function $b\mapsto R^{S_b}$. In the next examples, we plot the REF Murphy diagrams for the mean, VaR, and expectiles.
For the numerical examples, and as we work on $\mathcal{L}^\infty$, we consider right-truncated rvs. Right-truncated rvs arise in financial and insurance contexts via financial options, limits on (re)insurance contracts, and maximal losses, e.g., if an insured asset is written off as a total loss. We also refer to Albrecher et al. (2017) for the use of truncated and censored distributions in reinsurance settings. Let $X$ be a random variable with pdf $g$ and cdf $G$. Then the right-truncated random variable $Y := X\,|\,X\le\bar x$, with truncation point $\bar x\in\mathbb{R}$, has pdf
\[
f_Y(y\,|\,X\le\bar x) = \frac{g(y)}{G(\bar x)}\,\mathbb{1}_{\{y\le\bar x\}}.
\]
Furthermore, it holds that $F^{-1}(\alpha) = G^{-1}\big(\alpha\,G(\bar x)\big)$, and $\mathbb{E}[Y] = \frac{\int_0^{\bar x}x\,g(x)\,\mathrm{d}x}{G(\bar x)}$, whenever $Y\ge0$ $\mathbb{P}$-a.s.

In the examples below, we consider right-truncated exponential losses. In particular, we truncate each exponential at its 95% quantile, i.e. we set the truncation point to $\bar x := G^{-1}(0.95)$, where $G^{-1}$ is the quantile function of the (untruncated) exponential. This corresponds to retaining 95% of the support of the exponential distribution. We denote the above-described distribution by TExp($\lambda$), where $\lambda$ is the parameter of the original exponential distribution.
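Sampling from TExp($\lambda$) follows directly from the inverse-cdf relation $F^{-1}(\alpha) = G^{-1}(\alpha\,G(\bar x))$ above; a small sketch with illustrative names:

```python
# Sampling the right-truncated exponential TExp(lam), truncated at its
# 95% quantile, via F^{-1}(u) = G^{-1}(u * G(xbar)).
import numpy as np

def texp_sample(lam, size, rng=None, trunc_level=0.95):
    rng = np.random.default_rng(rng)
    xbar = -np.log(1.0 - trunc_level) / lam      # truncation point G^{-1}(0.95)
    u = rng.uniform(size=size)
    return -np.log(1.0 - u * (1.0 - np.exp(-lam * xbar))) / lam
```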
Example 1 (Murphy diagrams for the mean). We consider the mean functional and its REF with the b-homogeneous scoring functions given in Proposition 4. Figure 1 displays the REF with varying uncertainty tolerances $\varepsilon$ between 0 and 0.5, for the Beta(shape1 = 2, shape2 = 2)¹ baseline distribution in the left panel of the figure and the TExp(2) baseline distribution in the right panel.

¹ Here we use the convention that Beta($a$, $b$) has density $f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}x^{a-1}(1-x)^{b-1}$, where $\Gamma(\cdot)$ is the Gamma function.

Figure 1: $R^S$ for varying $b$ parameter of the $b$-homogeneous mean scoring function for the Beta(2,2) distribution (left), and TExp(2) (right).

We observe that for the Beta distribution, for each $\varepsilon$, the REF is increasing in the
homogeneity degree $b$ and converges to the value 0.5, that is, the mean of the Beta distribution. Furthermore, for fixed $b$, the larger the $\varepsilon$, the smaller the REF. In particular, the REF is always less than the baseline mean. This is not observed in the right panel, which displays the REF for the TExp(2) distribution, where the REF is always greater than the baseline mean of 0.4. Moreover, for each $\varepsilon$ the REF is increasing in the homogeneity degree $b$, and for fixed $b$ the REF is ordered in $\varepsilon$. Therefore, we observe that depending on the underlying distribution, the REFs can be greater or smaller than the baseline mean. Figure 1 also indicates that there is a trade-off between $\varepsilon$ and $b$, indicating that the choice of tolerance $\varepsilon$ should be influenced by the homogeneity degree $b$ of
the scoring function. Next, we consider how the REF changes according to the parameters of the Beta distribution.

Figure 2: $R^S$ for varying shape parameters of the Beta distribution with the $b = 1.5$ homogeneous mean scoring function; shape 2 is constant at 1.5 (left), and shape 1 is constant at 1.5 (right).

In particular, Figure 2 shows the REF for fixed $b = 1.5$ against different values
of the shape parameters of the Beta distribution. The left panel displays variation with shape 1 and the right panel with shape 2, where the different lines correspond to different $\varepsilon$. Interestingly, the REFs for different $\varepsilon$ cross when plotted against the first parameter of the Beta distribution, whereas the REFs are ordered in $\varepsilon$ for the second shape parameter.
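A Murphy diagram such as the left panel of Figure 1 can be traced by sweeping the homogeneity degree $b$ and recomputing the REF for each tolerance $\varepsilon$; a sketch reusing the illustrative helpers tilted_mean and patton_score from the earlier code sketches:

```python
# Sketch of a Murphy diagram: REF of the mean over a grid of b, per tolerance.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
y = rng.beta(2, 2, size=5_000)                   # Beta(2,2) baseline

curves = {}
for eps in [0.0, 0.1, 0.2, 0.3, 0.4]:
    refs = []
    for b in np.linspace(0.5, 2.0, 16):
        obj = lambda z: tilted_mean(patton_score(z, y, b), eps)
        refs.append(minimize_scalar(obj, bounds=(1e-6, 1.0 - 1e-6),
                                    method="bounded").x)
    curves[eps] = refs                           # plot b-grid vs refs per eps
```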
Another elicitable functional of interest is the Value-at-Risk (VaR), also known as the quantile.

Definition 5 (Value-at-Risk). The Value-at-Risk at tolerance level $\alpha\in(0,1)$ is defined, for $X\in\mathcal{L}^\infty$, as
\[
\mathrm{VaR}_\alpha(X) = \inf\{x\in\mathbb{R} : \mathbb{P}(X\le x)\ge\alpha\}.
\]
The VaR is well known to be an elicitable functional with a family of b-homogeneous scoring functions, recalled next.
Proposition 5 (b-homogeneous scoring functions for VaR (Nolde and Ziegel 2017)). The class of strictly consistent and b-homogeneous scoring functions for $\mathrm{VaR}_\alpha$ satisfying $S(y,y) = 0$ is given by
\[
S^{\mathrm{VaR}}_b(z,y) = \big(\mathbb{1}_{\{y\le z\}}-\alpha\big)\big(g(z)-g(y)\big),
\]
where
\[
g(y) = \begin{cases}
d_1 y^b\,\mathbb{1}_{\{y>0\}} - d_2|y|^b\,\mathbb{1}_{\{y<0\}} & \text{if } b>0 \text{ and } y\in\mathbb{R},\\
d\log(y) & \text{if } b=0 \text{ and } y>0,\\
-d\,y^b & \text{if } b<0 \text{ and } y>0,
\end{cases}
\]
for positive constants $d,d_1,d_2>0$.

The b-homogeneous scoring function for the VaR coincides with the pinball loss scoring function when $b = 1$. For simplicity, we choose $d = d_1 = d_2 = 1$ in the following experiments.
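For the experiments below, the $b$-homogeneous VaR scores of Proposition 5 with $d = d_1 = d_2 = 1$ can be sketched as follows (illustrative names; $y>0$ is required for $b\le0$):

```python
# b-homogeneous scores for VaR_alpha with d = d1 = d2 = 1 (Proposition 5).
import numpy as np

def g_b(y, b):
    y = np.asarray(y, float)
    if b > 0:
        return np.where(y > 0, np.abs(y) ** b, -np.abs(y) ** b)
    if b == 0:
        return np.log(y)                 # requires y > 0
    return -(y ** b)                     # b < 0, requires y > 0

def var_score(z, y, b, alpha):
    """S^VaR_b(z, y) = (1{y <= z} - alpha)(g(z) - g(y))."""
    ind = np.where(np.asarray(y, float) <= z, 1.0, 0.0)
    return (ind - alpha) * (g_b(z, b) - g_b(y, b))
```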
Example 2 (Murphy diagrams for the VaR). Similarly to Example 1, we plot the Murphy diagrams for the VaR functional for the TExp(2) baseline distribution. Figure 3 displays in the left panel the REFs against the homogeneity degree $b$ for TExp($\lambda = 2$), and in the right panel against the parameter of TExp($\lambda$) for fixed $b = 1.5$. We observe in both panels that the REFs are ordered in $\varepsilon$, with larger values for larger uncertainty tolerances $\varepsilon$. Note that in the right panel, for small $\lambda$, the difference between the REF and the VaR of the TExp($\lambda$) is significantly larger than the difference for large $\lambda$.
Another commonly considered functional is the expectile, which has been proposed for use in risk management in Bellini et al. (2014). Its elicitability is established in Gneiting (2011).

Definition 6 (τ-expectile). Let $\tau\in(0,1)$; then the $\tau$-expectile, $e_\tau(\cdot)$, is the unique solution to
\[
\tau\,\mathbb{E}\big[(X-e_\tau(X))_+\big] = (1-\tau)\,\mathbb{E}\big[(X-e_\tau(X))_-\big].
\]
Figure 3: $R^S$ for varying $b$ parameter of the $b$-homogeneous 95%-VaR scoring function (left), and varying $\lambda$ with the $b = 1.5$ homogeneous 95%-VaR scoring function (right), TExp(2) distribution.
For $\tau\ge\frac{1}{2}$, the expectile is coherent in the sense of Artzner et al. (1999). The family of b-homogeneous scoring functions for the expectile is as follows.

Proposition 6 (b-homogeneous scores for the expectile (Nolde and Ziegel 2017)). For $\tau\in(0,1)$, the strictly consistent b-homogeneous scoring function corresponding to the $\tau$-expectile is given as
\[
S^{\mathrm{e}}_b(z,y) = \big|\mathbb{1}_{\{y\le z\}}-\tau\big|\cdot S^{\mathrm{E}}_b(z,y),
\]
where $S^{\mathrm{E}}_b(z,y)$ is the b-homogeneous scoring function for the mean as defined in Proposition 4, and where $z,y>0$.
Example 3 (Murphy diagrams for the expectile). We continue to consider the underlying truncated exponential distribution as described in Example 2. We calculate the 0.7-expectile of the baseline TExp(3) and TExp($\lambda$) numerically using a simulated sample of size 30,000. The left panel of Figure 4 displays the Murphy diagram of the b-homogeneous score for the expectile as defined in Proposition 6, and the right panel displays the robust expectile against the TExp parameter $\lambda$. The left panel in Figure 4 exhibits similar behaviour as in the previous examples for the mean and VaR. In particular, we have that for fixed $b$, the REF is increasing in the tolerance $\varepsilon$. This is again seen in the right panel, where for any fixed $\lambda$, the REF is increasing in $\varepsilon$.
Figure 4: $R^S$ for varying $b$ parameter of the $b$-homogeneous 0.7-expectile scoring function (left), and varying $\lambda$ with the $b = 3$ homogeneous 0.7-expectile scoring function (right), TExp(2) distribution.

4. Multivariate robust elicitable functionals

Many statistical functionals are not elicitable on their own, such as the variance, Range-Value-at-Risk, Expected Shortfall (ES), and distortion risk measures (Gneiting 2011, Kou and Peng 2016, Wang and Ziegel 2015, Fissler and Ziegel 2021). However, they may be elicitable as multi-dimensional functionals, as is the case for the pairs (mean, variance) and (VaR, ES) (Fissler et al. 2016). In this section, we generalise the REF to these instances where the functionals are elicitable in the multi-dimensional case.
4.1. The notion of k-dimensional elicitability

In Fissler and Ziegel (2016), the authors establish the framework for k-consistency and k-elicitability, recalled next.

Proposition 7 (k-elicitability (Fissler and Ziegel 2016)). Let $\mathsf{A}\subseteq\mathbb{R}^k$, $k\in\mathbb{N}$, be an action domain and $R\colon\mathcal{M}^\infty\to\mathsf{A}$ be a k-dimensional law-invariant functional.
i) A scoring function $S\colon\mathsf{A}\times\mathbb{R}\to\mathbb{R}$ is consistent for a functional $R$, if
\[
\int S(\mathbf{t},y)\,\mathrm{d}F(y) \le \int S(\mathbf{z},y)\,\mathrm{d}F(y), \tag{12}
\]
for all $F\in\mathcal{M}^\infty$, $\mathbf{t}\in R(F)$, and all $\mathbf{z}\in\mathsf{A}$.
ii) A scoring function $S$ is strictly consistent for a functional $R$, if it is consistent and Equation (12) holds with equality only if $\mathbf{z}\in R(F)$.
iii) A functional $R$ is k-elicitable, if there exists a strictly consistent scoring function $S$ for $R$.

Similar to the one-dimensional case, elicitable functionals and strictly consistent scoring functions have a correspondence relationship. Indeed, a k-elicitable functional $R(\cdot) := (R_1(\cdot),\dots,R_k(\cdot))$ admits the representation
\[
\big(R_1(F),\dots,R_k(F)\big) = \operatorname*{arg\,min}_{\mathbf{z}\in\mathbb{R}^k}\int S(\mathbf{z},y)\,\mathrm{d}F(y),
\]
for all $F\in\mathcal{M}^\infty$ and where $S$ is any strictly consistent scoring function for $R$.
Similar to univariate functionals, in the multi-dimensional setting it is of interest to consider uncertainty in the baseline distribution, and we define the multi-dimensional REF as follows. Let $R$ be a k-elicitable functional with strictly consistent scoring function $S$ and let $\varepsilon\ge0$. Then we define the k-dimensional robust elicitable functional (REF), evaluated at $Y\in\mathcal{L}^\infty$, by
\[
R^S(Y) := \operatorname*{arg\,min}_{\mathbf{z}\in\mathbb{R}^k}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}[S(\mathbf{z},Y)], \tag{P$_k$}
\]
where the uncertainty set $\mathcal{Q}_\varepsilon$ is given in Equation (6). As $R^S$ is k-dimensional, we write $R^S_i$ for the $i$-th component of $R^S$, thus $R^S(\cdot) := (R^S_1(\cdot),\dots,R^S_k(\cdot))$.

The results of the univariate case, Theorem 1, carry over readily to the multi-dimensional setting. For this, we first generalise the assumption on the tolerance distance $\varepsilon$, which we assume throughout this section.

Assumption 2 (Maximal Kullback-Leibler distance). We assume that the tolerance distance $\varepsilon$ satisfies
\[
0\le\varepsilon<\log\frac{1}{\pi(\mathbf{z})} \quad\text{for all } \mathbf{z} = (z_1,\dots,z_k)\in\mathbb{R}^k,
\]
where $\pi(\mathbf{z}) := \mathbb{P}\big(S(\mathbf{z},Y) = \operatorname{ess\,sup} S(\mathbf{z},Y)\big)$.
Under this condition, Theorem 1 holds in the k-dimensional setting.

Corollary 1 (Kullback-Leibler Uncertainty). Let $R$ be a k-elicitable functional and $S$ be a strictly consistent scoring function for $R$. Then, the REF has representation
\[
R^S(Y) = \operatorname*{arg\,min}_{\mathbf{z}\in\mathbb{R}^k}\ \mathbb{E}\left[S(\mathbf{z},Y)\,\frac{e^{\eta^*(\mathbf{z})S(\mathbf{z},Y)}}{\mathbb{E}[e^{\eta^*(\mathbf{z})S(\mathbf{z},Y)}]}\right], \tag{P$'$}
\]
where for each $\mathbf{z}\in\mathbb{R}^k$, $\eta^*(\mathbf{z})\ge0$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_{\eta(\mathbf{z})}\,\|\,\mathbb{P})$, with
\[
\frac{\mathrm{d}\mathbb{Q}_{\eta(\mathbf{z})}}{\mathrm{d}\mathbb{P}} := \frac{e^{\eta(\mathbf{z})S(\mathbf{z},Y)}}{\mathbb{E}[e^{\eta(\mathbf{z})S(\mathbf{z},Y)}]}.
\]

Proof. The proof is similar to that of Theorem 1. The inner optimisation problem of (P$'$) can be written as an optimisation problem over the density of $Y$ as follows:
\[
\sup_{g\colon\mathbb{R}\to\mathbb{R}}\int S(\mathbf{z},y)g(y)\,\mathrm{d}y, \quad\text{subject to}\quad \int \frac{g(y)}{f(y)}\log\frac{g(y)}{f(y)}\,f(y)\,\mathrm{d}y\le\varepsilon, \quad \int g(y)\,\mathrm{d}y = 1, \quad\text{and}\quad g(y)\ge0 \ \text{ for all } y\in\mathbb{R}.
\]
This optimisation problem admits the Lagrangian, with Lagrange parameters $\eta_1,\eta_2\ge0$ and $\eta_3(y)\ge0$ for all $y\in\mathbb{R}$,
\[
L(\eta_1,\eta_2,\eta_3,g) = \int\Big(-S(\mathbf{z},y)g(y) + \eta_1 g(y)\log\frac{g(y)}{f(y)} + \eta_2 g(y) - \eta_3(y)g(y)\Big)\mathrm{d}y - \eta_1\varepsilon - \eta_2,
\]
where $\eta_1$ is the Lagrange parameter for the KL constraint, $\eta_2$ is such that $g$ integrates to 1, and the $\eta_3(y)$ are such that $g(y)\ge0$ whenever $f(y)\ge0$, and $g(y)=0$ otherwise. Using similar steps as in Theorem 1, we derive the Euler-Lagrange equation, solve it for $\frac{g(y)}{f(y)}$, and then impose $\eta_2,\eta_3$. Finally, setting $\eta := \frac{1}{\eta_1}$ results in the change of measure
\[
\frac{\mathrm{d}\mathbb{Q}_\eta}{\mathrm{d}\mathbb{P}} := \frac{g(Y)}{f(Y)} = \frac{e^{\eta S(\mathbf{z},Y)}}{\mathbb{E}[e^{\eta S(\mathbf{z},Y)}]}.
\]
Moreover, it holds that
\[
D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P}) = \eta K'_{S(\mathbf{z},Y)}(\eta) - K_{S(\mathbf{z},Y)}(\eta).
\]
By similar arguments as in the proof of Theorem 1, the KL constraint is binding and $\eta^*$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$. As $\eta$ and $\mathbb{Q}_\eta$ both depend on $\mathbf{z}\in\mathbb{R}^k$, we make this explicit by writing $\eta(\mathbf{z})$ and $\mathbb{Q}_{\eta(\mathbf{z})}$ in the statement. □
A functional of major interest in the context of risk management is the ES.

Definition 7 (Expected Shortfall). The Expected Shortfall at tolerance level $\alpha\in(0,1)$ for $X\in\mathcal{L}^\infty$ is defined as
\[
\mathrm{ES}_\alpha(X) = \frac{1}{1-\alpha}\int_\alpha^1 \mathrm{VaR}_q(X)\,\mathrm{d}q.
\]
Though it is known not to be elicitable on its own (Gneiting 2011), the pair $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$ is 2-elicitable, as shown in Fissler et al. (2016). While many scoring functions exist for this pair, we refer to Theorem C.3 in the supplement of Nolde and Ziegel (2017) for the existence of a b-homogeneous scoring function for $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$.
Proposition 8 (b-homogeneous scoring functions for (VaR, ES) (Nolde and Ziegel 2017)). For $\alpha\in(0,1)$, the 2-elicitable functional $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$ has corresponding scoring functions of the form
\[
S(z_1,z_2,y) = \mathbb{1}_{\{y>z_1\}}\big(-G_1(z_1) + G_1(y) - G_2(z_2)(z_1-y)\big) + (1-\alpha)\big(G_1(z_1) - G_2(z_2)(z_2-z_1) + \mathcal{G}_2(z_2)\big), \tag{13}
\]
where $G_1$ is increasing, and $\mathcal{G}_2$ is twice differentiable, strictly increasing, strictly concave, and satisfies $\mathcal{G}_2' = G_2$. If
i) $b\in(0,1)$, the only positively homogeneous scoring functions of degree $b$ and of the form in Equation (13) are obtained by $G_1(x) = \big(d_1\mathbb{1}_{\{x\ge0\}} - d_2\mathbb{1}_{\{x<0\}}\big)|x|^b - c_0$ and $\mathcal{G}_2(x) = c_1 x^b + c_0$, $x>0$, where $c_0\in\mathbb{R}$, $d_1,d_2\ge0$, and $c_1>0$;
ii) $b\in(-\infty,0)$, the only positively homogeneous scoring functions of degree $b$ and of the form in Equation (13) are obtained by $G_1(x) = -c_0$ and $\mathcal{G}_2(x) = -c_1 x^b + c_0$, $x>0$, where $c_0\in\mathbb{R}$ and $c_1>0$;
iii) $b\in\{0\}\cup[1,+\infty)$, there are no positively homogeneous scoring functions of the form in Equation (13) of degree $b$.

We refer to the supplement of Nolde and Ziegel (2017) for further details and discussions.
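As a concrete instance of Proposition 8 i), the following sketch takes $d_1 = d_2 = c_1 = 1$ and $c_0 = 0$, so that $G_1(x) = \operatorname{sign}(x)|x|^b$ and $\mathcal{G}_2(x) = x^b$ (hence $G_2(x) = b\,x^{b-1}$); it requires $z_2>0$, and all names are illustrative. The robust (VaR, ES) pair is then obtained by minimising the tilted sample average of this score jointly over $(z_1,z_2)$, as in (P$_k$).

```python
# b-homogeneous (VaR, ES) score of the form (13) with d1 = d2 = c1 = 1, c0 = 0.
import numpy as np

def var_es_score(z1, z2, y, b=0.5, alpha=0.9):
    y = np.asarray(y, float)
    G1 = lambda x: np.sign(x) * np.abs(x) ** b
    G2 = lambda x: b * x ** (b - 1.0)    # G2 = derivative of script-G2(x) = x^b
    sG2 = lambda x: x ** b               # script-G2; requires z2 > 0
    ind = np.where(y > z1, 1.0, 0.0)
    return (ind * (-G1(z1) + G1(y) - G2(z2) * (z1 - y))
            + (1.0 - alpha) * (G1(z1) - G2(z2) * (z2 - z1) + sG2(z2)))
```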
Proposition 3 on the properties of the REF also holds in the k-dimensional setting.

Proposition 9. Let $S\colon\mathsf{A}\times\mathbb{R}\to[0,+\infty)$ be a strictly consistent scoring function for a k-dimensional elicitable functional $R\colon\mathcal{M}^\infty\to\mathsf{A}$.
i) Let $S(c\mathbf{z},cy) = c^b S(\mathbf{z},y)$, where $c\mathbf{z} := (cz_1,\dots,cz_k)$, for all $\mathbf{z}\in\mathsf{A}$, $y\in\mathbb{R}$, $c>0$, and for some $b\ge0$. Then $R$ and $R^S$ are positively homogeneous of degree 1.
ii) If $S(\mathbf{z}-\mathbf{c},y) = S(\mathbf{z},y+c)$ for $c\in\mathbb{R}$, $\mathbf{c} := (c,\dots,c)$ of length $k$, then $R$ and $R^S$ are translation invariant.
iii) If $S(\mathbf{y},y) = 0$, where $\mathbf{y} := (y,\dots,y)\in\mathbb{R}^k$, then $R(m) = m$ and $R^S(m) = m$ for all $m\in\mathbb{R}$. In particular, $R(0) = 0$ and $R^S(0) = 0$.

Proof. The proof follows using similar arguments as in the proof of Proposition 3. □

In the next section, we discuss an application of the jointly robustified (VaR, ES) pair in a reinsurance context.
4.2. Reinsurance application
We consider a reinsurance company that aims to assess the risk associated with its losses. In particular, we are interested in a reinsurance company that has underwritten losses stemming from different insurance companies. Specifically, the insurance-reinsurance market consists of three insurers and one reinsurer. Each insurance company $k\in\{1,2,3\}$ purchases reinsurance on its business line $X_k$ with deductible $d_k$ and limit $l_k$. Thus, the total reinsurance loss is
\[
Y = \sum_{k=1}^3 \min\big\{(X_k-d_k)_+,\; l_k\big\},
\]
where $(x)_+ = \max\{0,x\}$ denotes the positive part. The reinsurer covers losses between the 60% and the 80% quantiles for insurers 1 and 2, and losses between the 85% and the 95% quantiles of insurer 3, i.e.
\[
d_k := F^{-1}_{X_k}(0.6) \ \text{ and } \ l_k := F^{-1}_{X_k}(0.8), \quad k = 1,2, \qquad\text{and}\qquad d_3 := F^{-1}_{X_3}(0.85) \ \text{ and } \ l_3 := F^{-1}_{X_3}(0.95).
\]
The insurance losses $(X_1,X_2,X_3)$ have marginal distributions described in Table 1 and are dependent through a t-copula with 4 degrees of freedom and correlation matrix
\[
R = \begin{pmatrix} 1 & 0.2 & 0\\ 0.2 & 1 & 0.8\\ 0 & 0.8 & 1 \end{pmatrix}.
\]
Table 1: Distributional assumptions of the risk factors in the reinsurance example.

Risk factor   Distribution               Mean   Std
X1            Log-Normal(4.58, 0.19²)    100    20
X2            Log-Normal(4.98, 0.23²)    150    35
X3            Pareto(147.52, 60.65)      150    40
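A sketch of the loss simulation, assuming the t-copula with 4 degrees of freedom and the correlation matrix $R$ above. The lognormal parameters follow Table 1; for $X_3$ we moment-match a Pareto (type I) stand-in to the stated mean 150 and standard deviation 40, since the parametrisation Pareto(147.52, 60.65) in Table 1 is ambiguous here. All names are illustrative.

```python
# Simulating the dependent insurance losses and the total reinsurance loss Y.
import numpy as np
from scipy import stats

n, nu = 100_000, 4
R = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.0, 0.8],
              [0.0, 0.8, 1.0]])

rng = np.random.default_rng(7)
Z = rng.multivariate_normal(np.zeros(3), R, size=n)
W = rng.chisquare(nu, size=(n, 1))                 # one chi-square per row
U = stats.t.cdf(Z * np.sqrt(nu / W), df=nu)        # t-copula samples in (0,1)^3

X = np.column_stack([
    stats.lognorm.ppf(U[:, 0], s=0.19, scale=np.exp(4.58)),   # X1
    stats.lognorm.ppf(U[:, 1], s=0.23, scale=np.exp(4.98)),   # X2
])
a = 1.0 + np.sqrt(1.0 + (150.0 / 40.0) ** 2)       # shape from 1/(a(a-2)) = (40/150)^2
xm = 150.0 * (a - 1.0) / a                         # scale from mean = a*xm/(a-1)
X = np.column_stack([X, stats.pareto.ppf(U[:, 2], b=a, scale=xm)])   # X3 stand-in

Y = np.zeros(n)
for k, (ql, qu) in enumerate([(0.60, 0.80), (0.60, 0.80), (0.85, 0.95)]):
    d, l = np.quantile(X[:, k], [ql, qu])          # deductible d_k and limit l_k
    Y += np.minimum(np.maximum(X[:, k] - d, 0.0), l)
```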
Figure 5 displays the histogram of the total reinsurance losses $Y$, stemming from $n = 100{,}000$ samples, and the estimated baseline $\mathrm{VaR}_\alpha(Y)$ and $\mathrm{ES}_\alpha(Y)$ for $\alpha = 0.9, 0.975$. A smoothed kernel density is displayed as an approximation of the density. In the simulated dataset, the reinsurer's losses are bounded by the sum of the insurers' limits, and the maximal loss for the reinsurer is 448.15.

Figure 5: Smoothed density for $n = 100{,}000$ simulated reinsurance losses as described in Table 1, with $\mathrm{VaR}_\alpha$ and $\mathrm{ES}_\alpha$ for $\alpha = 90\%, 97.5\%$.
Next, we generate a sample of size $n = 10{,}000$ of reinsurance losses, and scale the losses by a factor of 0.01. We are able to perform this scaling by using a b-homogeneous scoring function for the pair $(\mathrm{VaR}_\alpha,\mathrm{ES}_\alpha)$ and the homogeneity property from Proposition 9. For this sample, we calculate the robust VaR, displayed in the left panel of Figure 6, and the robust ES, in the right panel, for each $\alpha = 0.9, 0.975$ and each $\varepsilon = 0.6, 0.7, 0.8, 0.9$. We repeat this $N = 100$ times to illustrate the density of the robust VaR and ES. To handle quantile crossing, we reject loss samples that result in $\mathrm{VaR}_\alpha > \mathrm{ES}_\alpha$ for any $\varepsilon = 0.6, 0.7, 0.8, 0.9$ and $\alpha = 0.9, 0.975$. Figure 6 displays violin diagrams for the robust VaR and ES, for the different $\varepsilon$'s and $\alpha$'s. We find that the robust VaR and ES are ordered in $\varepsilon$, mirroring the behaviour of the one-dimensional REFs. We also observe that the variance of the robust VaR is significantly smaller than that of the robust ES.
Figure 6: Densities of robust VaR (left) and robust ES (right) of simulated reinsurance losses for varying $\varepsilon$ and $\alpha$.
5. Application to robust regression
In this section, we extend the REF framework to a regression setting. Distributionally robust regression has been of increasing interest in the machine learning and statistics communities. In the field of statistics, emphasis is placed on outlier detection and methods that are not too sensitive to outliers. Significant theory has been developed on this topic; see, e.g., Rousseeuw and Leroy (2005) for a comprehensive treatment. These methodologies are now widely applicable to problems in machine learning, though their goals differ. For example, robustness techniques can be used to counter adversarial tests against machine learning algorithms. Specifically for robust regression, Shafieezadeh-Abadeh et al. (2015) consider robust logistic regression with uncertainty on both the covariates and the response quantified via a Wasserstein ball, and, in an attempt to mitigate adversarial attacks, Chen and Paschalidis (2018) minimise the worst-case absolute residuals of a discrete regression problem over a Wasserstein ball.

The motivation for robustness in risk management differs from these communities. Of concern are extreme events, where data may be sparse due to the rarity of the events, which a risk measure should ideally capture.
5.1. Robust regression coefficients
Here we propose an approach in which we consider distributional uncertainty jointly in both the covariates and the response variable, and where the uncertainty is characterised via the KL divergence. For this, let $\mathbf{X} := (X_1,\dots,X_m)$ be the m-dimensional covariates such that each component is in $\mathcal{L}^\infty$, i.e., $X_k\in\mathcal{L}^\infty$ for all $k = 1,\dots,m$, and let $Y\in\mathcal{L}^\infty$ be a univariate response.
For an elicitable functional $R$ with strictly consistent scoring function $S$, we make the classical regression assumption that
\[
R(Y\,|\,\mathbf{X} = \mathbf{x}) = \beta_1 x_1 + \dots + \beta_m x_m,
\]
where $R(Y\,|\,\mathbf{X}=\mathbf{x})$ denotes the functional $R$ evaluated at the conditional cdf of $Y$ given $\mathbf{X}=\mathbf{x}$. The parameters $\boldsymbol\beta := (\beta_1,\dots,\beta_m)$, the regression coefficients, are estimated by solving the sample version of the following minimisation problem:
\[
\hat{\boldsymbol\beta} = \operatorname*{arg\,min}_{\boldsymbol\beta\in\mathbb{R}^m}\ \mathbb{E}\big[S(\boldsymbol\beta^\intercal\mathbf{X},\,Y)\big]. \tag{14}
\]
For simplicity, we assume that the functional $R(Y\,|\,\mathbf{X}=\mathbf{x})$ is linear in the covariates, though the results can be adapted to include link functions. For the choice of the mean functional $R$ and the squared loss $S(z,y) = (y-z)^2$, we recover the usual linear regression. The classical quantile regression follows by setting the scoring function to be the pinball loss, i.e., $S(z,y) = (\mathbb{1}_{\{y\le z\}}-\alpha)(z-y)$, which is strictly consistent for the $\alpha$-quantile, $\alpha\in(0,1)$. Similarly, one can obtain expectile and ES regression.
We propose to robustify the regression coefficients $\hat{\boldsymbol\beta}$ of Equation (14) by accounting for uncertainty in the joint distribution of the response and the covariates, that is, of $(\mathbf{X},Y)$.

Definition 8 (Robust regression coefficients). Let $R$ be an elicitable functional with strictly consistent scoring function $S\colon\mathsf{A}\times\mathbb{R}\to[0,\infty)$, $\mathsf{A}\subseteq\mathbb{R}$, and $\varepsilon\ge0$. Then the robust regression coefficients are given by
\[
\boldsymbol\beta^S := \operatorname*{arg\,min}_{\boldsymbol\beta\in\mathbb{R}^m}\ \sup_{\mathbb{Q}\in\mathcal{Q}_\varepsilon}\mathbb{E}^{\mathbb{Q}}\big[S(\boldsymbol\beta^\intercal\mathbf{X},Y)\big], \tag{P$_\beta$}
\]
where the uncertainty set $\mathcal{Q}_\varepsilon$ is given in Equation (6).
The representation of the inner optimisation problem in (P$_\beta$) also holds in the robust regression setting, under an analogous assumption on the tolerance distance $\varepsilon$.

Assumption 3 (Maximal Kullback-Leibler distance). We assume that the tolerance distance $\varepsilon$ satisfies
\[
0\le\varepsilon<\log\frac{1}{\pi(\boldsymbol\beta^\intercal\mathbf{X})} \quad\text{for all } \boldsymbol\beta\in\mathbb{R}^m \text{ and } \mathbf{X} \text{ with } X_k\in\mathcal{L}^\infty,\ k = 1,\dots,m,
\]
where $\pi(\boldsymbol\beta^\intercal\mathbf{X}) := \mathbb{P}\big(S(\boldsymbol\beta^\intercal\mathbf{X},Y) = \operatorname{ess\,sup} S(\boldsymbol\beta^\intercal\mathbf{X},Y)\big)$.
Corollary 2. Let $S$ be a strictly consistent scoring function for $R$. Then, for m-dimensional covariates $\mathbf{X}$, satisfying $X_k\in\mathcal{L}^\infty$, $k = 1,\dots,m$, and a univariate response $Y\in\mathcal{L}^\infty$, the robust regression coefficients have representation
\[
\boldsymbol\beta^S = \operatorname*{arg\,min}_{\boldsymbol\beta\in\mathbb{R}^m}\ \mathbb{E}\left[S(\boldsymbol\beta^\intercal\mathbf{X},Y)\,\frac{e^{\eta^*(\boldsymbol\beta)S(\boldsymbol\beta^\intercal\mathbf{X},Y)}}{\mathbb{E}[e^{\eta^*(\boldsymbol\beta)S(\boldsymbol\beta^\intercal\mathbf{X},Y)}]}\right],
\]
where for each $\boldsymbol\beta\in\mathbb{R}^m$, $\eta^*(\boldsymbol\beta)\ge0$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_{\eta(\boldsymbol\beta)}\,\|\,\mathbb{P})$, with
\[
\frac{\mathrm{d}\mathbb{Q}_{\eta(\boldsymbol\beta)}}{\mathrm{d}\mathbb{P}} := \frac{e^{\eta(\boldsymbol\beta)S(\boldsymbol\beta^\intercal\mathbf{X},Y)}}{\mathbb{E}[e^{\eta(\boldsymbol\beta)S(\boldsymbol\beta^\intercal\mathbf{X},Y)}]}.
\]

Proof. The proof is similar to that of Theorem 1. For simplicity, we assume that $(\mathbf{X},Y)$ has joint pdf $f$ under the baseline measure $\mathbb{P}$. Then, the inner optimisation problem of (P$_\beta$) can be written as an optimisation problem over the joint density of $(\mathbf{X},Y)$ as follows:
\[
\sup_{g\colon\mathbb{R}^m\times\mathbb{R}\to\mathbb{R}} \iint S(\boldsymbol\beta^\intercal\mathbf{x},y)\,g(\mathbf{x},y)\,\mathrm{d}y\,\mathrm{d}\mathbf{x}, \quad\text{subject to}\quad \iint \frac{g(\mathbf{x},y)}{f(\mathbf{x},y)}\log\frac{g(\mathbf{x},y)}{f(\mathbf{x},y)}\,f(\mathbf{x},y)\,\mathrm{d}y\,\mathrm{d}\mathbf{x}\le\varepsilon,
\]
\[
\iint g(\mathbf{x},y)\,\mathrm{d}y\,\mathrm{d}\mathbf{x} = 1, \quad\text{and}\quad g(\mathbf{x},y)\ge0 \ \text{ for all } \mathbf{x}\in\mathbb{R}^m,\ y\in\mathbb{R}.
\]
This optimisation problem admits the Lagrangian, with Lagrange parameters $\eta_1,\eta_2\ge0$ and $\eta_3(\mathbf{x},y)\ge0$ for all $\mathbf{x}\in\mathbb{R}^m$, $y\in\mathbb{R}$,
\[
L(\eta_1,\eta_2,\eta_3,g) = \iint\Big(-S(\boldsymbol\beta^\intercal\mathbf{x},y)g(\mathbf{x},y) + \eta_1 g(\mathbf{x},y)\log\frac{g(\mathbf{x},y)}{f(\mathbf{x},y)} + \eta_2 g(\mathbf{x},y) - \eta_3(\mathbf{x},y)g(\mathbf{x},y)\Big)\mathrm{d}y\,\mathrm{d}\mathbf{x} - \eta_1\varepsilon - \eta_2,
\]
where $\eta_1$ is the Lagrange parameter for the KL constraint, $\eta_2$ is such that $g$ integrates to 1, and the $\eta_3(\mathbf{x},y)$ are such that $g(\mathbf{x},y)\ge0$ whenever $f(\mathbf{x},y)\ge0$, and $g(\mathbf{x},y)=0$ otherwise. Using similar steps as in Theorem 1, we derive the Euler-Lagrange equation, solve it for $\frac{g(\mathbf{x},y)}{f(\mathbf{x},y)}$, and then impose $\eta_2,\eta_3$. Finally, setting $\eta := \frac{1}{\eta_1}$ results in the change of measure
\[
\frac{\mathrm{d}\mathbb{Q}_\eta}{\mathrm{d}\mathbb{P}} := \frac{g(\mathbf{X},Y)}{f(\mathbf{X},Y)} = \frac{e^{\eta S(\boldsymbol\beta^\intercal\mathbf{X},Y)}}{\mathbb{E}[e^{\eta S(\boldsymbol\beta^\intercal\mathbf{X},Y)}]}.
\]
Moreover, it holds that
\[
D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P}) = \eta K'_{S(\boldsymbol\beta^\intercal\mathbf{X},Y)}(\eta) - K_{S(\boldsymbol\beta^\intercal\mathbf{X},Y)}(\eta).
\]
By similar arguments as in the proof of Theorem 1, the KL constraint is binding and $\eta^*$ is the unique solution to $\varepsilon = D_{\mathrm{KL}}(\mathbb{Q}_\eta\,\|\,\mathbb{P})$. As $\eta$ and $\mathbb{Q}_\eta$ both depend on $\boldsymbol\beta\in\mathbb{R}^m$, we make this explicit by writing $\eta(\boldsymbol\beta)$ and $\mathbb{Q}_{\eta(\boldsymbol\beta)}$ in the statement. □
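Corollary 2 again reduces the inner problem to an exponential tilt, so the robust coefficients can be computed by nesting a root-finding step inside a standard optimiser. A minimal sketch for the squared loss, assuming an i.i.d. sample and reusing a tilted_mean-style helper as in the REF sketch of Section 2 (names are ours, purely illustrative):

```python
# Robust regression coefficients (P_beta) for the squared loss on a sample.
import numpy as np
from scipy.optimize import minimize

def robust_ols(X, y, eps):
    """argmin_beta of the worst-case (KL-tilted) expected squared residual."""
    def objective(beta):
        s = (X @ beta - y) ** 2              # score sample S(beta'x_i, y_i)
        return tilted_mean(s, eps)           # inner sup over Q_eps (Corollary 2)
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS warm start (eps = 0)
    return minimize(objective, beta0, method="Nelder-Mead").x

# usage: include a column of ones in X for the intercept beta_0 of Table 2
```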
5.2. Numerical case study
We consider a numerical case study in which we compare three different models for the joint dependence structure of a bivariate covariate and response pair $(X,Y)$, as detailed below. Here the dimension of the covariates is 1, i.e., $m = 1$. Motivated by a dataset being contaminated with outliers, we specify $(X,Y)$ as having a Gumbel(5) copula with uniform marginals, as the "reference" sample without any outliers (model A). The choice of the Gumbel(5) copula yields a Kendall's tau of 0.8, giving a suitably linear relationship to test linear regression. The "contaminated" models (models B and C) have, in addition to the data points of model A, samples of $(\tilde X,\tilde Y)$ that we identify as outliers, where $(\tilde X,\tilde Y)$ have uniform marginal distributions and an independence copula.
Example 4 (Data contamination). In particular, the dataset we consider is constructed in the following way: we sample from the Gumbel(5) copula to form the uncontaminated dataset of model A, then augment model A with 4 independent outliers to obtain model B, and then augment model B with 4 additional outliers to generate model C. Model B consists of 9% independent outliers, and model C of 18%. We fit a robust linear regression, with varying tolerances, to these 3 models, using the squared loss scoring function $S(z,y) = (z-y)^2$. We further
calculate the traditional linear regression, which coincides with tolerance $\varepsilon = 0$.

Figure 7: Fitted robust linear regression lines for models A, B, C (left to right) and for different $\varepsilon = 0, 1, 5, 10$. The blue pyramids and the red squares correspond to the outliers in models B and C.

Figure 7 displays the datasets for models A, B, and C in the panels from left to right, and the robust regression
lines for varying tolerances $\varepsilon$. In the leftmost panel, we have the uncontaminated dataset of model A, where we find that the robust linear regression has steeper regression lines for smaller tolerances. The centre panel displays the data and linear regression lines from the mildly contaminated model B. The regression lines in this centre panel behave similarly to those of model A, in that steeper regression lines correspond to smaller $\varepsilon$, though the confounding of the sample reduces the overall steepness of the regression lines. In the left and middle panels, the slope of the robust regression decreases as the uncertainty tolerance $\varepsilon$ increases, which differs from the right panel, where we do not observe a clear ordering. The right panel displays the robust linear regression of the most contaminated dataset, model C. In this scenario, the robust linear regression is significantly flatter for all values of $\varepsilon$ compared to the linear regression line with $\varepsilon = 0$. Clearly, including outliers reduces the slope of the regression line. Moreover, we observe that the larger the tolerance distance $\varepsilon$, meaning as we allow for more uncertainty, the flatter the regression lines.

For completeness, we report in Table 2 the regression coefficients and mean squared errors (MSEs) of the robust linear regressions of models A, B, and C. As observed in Figure 7, the larger $\varepsilon$, the smaller the slope of the regression line, i.e. $\beta_1$.
Table 2    Results of robust linear regression for models A, B, and C. The parameter β0 corresponds to the intercept, β1 is the slope of the regression line, and MSE the mean squared error.

            Model A               Model B               Model C
  ε      β0    β1    MSE       β0    β1    MSE       β0    β1    MSE
  0     0.05  0.88  0.016     0.18  0.69  0.036     0.24  0.48  0.054
  1     0.18  0.68  0.020     0.40  0.33  0.044     0.35  0.21  0.062
  5     0.26  0.49  0.028     0.37  0.28  0.051     0.38  0.14  0.065
 10     0.29  0.42  0.030     0.37  0.25  0.051     0.32  0.19  0.079
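The paper's implementation is not reproduced here, but a minimal reconstruction of this experiment can be sketched as follows. Two substitutions are worth flagging: we sample from a Gaussian copula calibrated to a Kendall's tau of 0.8 as a stand-in for the Gumbel(5) copula (NumPy and SciPy do not ship an Archimedean copula sampler), and we evaluate the inner worst case via the classical dual representation sup{ E_Q[S] : DKL(Q || P) ≤ ε } = inf_{t>0} (ε + K(t))/t, where K is the cumulant generating function of S; its optimiser satisfies the binding-constraint equation derived above. All function names are ours.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)

def copula_sample(n, tau=0.8):
    """Uniform-marginal pairs with Kendall's tau of about 0.8; a Gaussian
    copula stand-in for the paper's Gumbel(5) copula."""
    rho = np.sin(np.pi * tau / 2)            # tau = (2 / pi) * arcsin(rho)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    return norm.cdf(z[:, 0]), norm.cdf(z[:, 1])

def worst_case_mean(scores, eps):
    """sup of E_Q[scores] over {Q : KL(Q || P) <= eps} via the dual
    inf_{t > 0} (eps + K(t)) / t, with K the empirical cgf. For eps above
    the attainable empirical KL (log n), the value saturates near
    max(scores)."""
    K = lambda t: logsumexp(t * scores) - np.log(len(scores))
    res = minimize_scalar(lambda t: (eps + K(t)) / t,
                          bounds=(1e-6, 200.0), method="bounded")
    return res.fun

def fit_robust_regression(x, y, eps):
    """Minimise the worst-case expected squared score over (beta0, beta1)."""
    def objective(beta):
        s = (beta[0] + beta[1] * x - y) ** 2   # S(z, y) = (z - y)^2
        return s.mean() if eps == 0 else worst_case_mean(s, eps)
    return minimize(objective, x0=np.array([0.0, 1.0]), method="Nelder-Mead").x

# Models A, B, C: 40 clean points, plus 4 and 8 independent outliers.
x, y = copula_sample(40)
xo, yo = rng.uniform(size=8), rng.uniform(size=8)
models = {"A": (x, y),
          "B": (np.r_[x, xo[:4]], np.r_[y, yo[:4]]),
          "C": (np.r_[x, xo], np.r_[y, yo])}
for name, (xm, ym) in models.items():
    for eps in (0, 1, 5, 10):
        b0, b1 = fit_robust_regression(xm, ym, eps)
        print(f"model {name}, eps = {eps:2d}: beta0 = {b0:.2f}, beta1 = {b1:.2f}")
```

Under these stand-in assumptions, one should observe the qualitative pattern of Figure 7 and Table 2, namely flatter fitted lines under contamination and larger ε, though the exact coefficients will differ from the paper's Gumbel-copula experiment.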
Example 5 (Sensitivity to sample size). Another situation in which one may want to robustify linear regression is when the data sample is sparse, e.g., introducing uncertainty to prevent the model from overfitting. Here, we consider the behaviour of the robust regression for increasing sample sizes. We generate covariate and response pairs having a Gumbel(5) copula and uniform marginals, i.e., model A. The first dataset consists of 40 samples; we then increase the dataset to 80 samples by adding another 40 samples, resulting in the second dataset. Finally, for the third dataset, we add another 40 samples to obtain a sample size of 120. Figure 8 displays model A for sample sizes n = 40, 80, 120, from the left to the right panel. Again, we observe that the smaller ε, the steeper the slope of the regression lines.
Figure 8    Fitted robust linear regression lines for model A with sample sizes n = 40, 80, 120 (left to right; covariate on the horizontal axis, response on the vertical axis) and for different ε = 0, 1, 5, 10.
Table 3    Results of robust linear regression for different dataset sizes n = 40, 80, and 120. The parameter β0 corresponds to the intercept, β1 is the slope of the regression line, and MSE the mean squared error.

            n = 40                n = 80                n = 120
  ε      β0    β1    MSE       β0    β1    MSE       β0    β1    MSE
  0     0.05  0.94  0.011     0.06  0.92  0.012     0.41  0.92  0.010
  1     0.22  0.60  0.023     0.20  0.63  0.020     0.18  0.64  0.019
  5     0.26  0.47  0.033     0.25  0.48  0.031     0.23  0.50  0.027
 10     0.27  0.44  0.033     0.36  0.44  0.031     0.23  0.53  0.027
Table 3 reports the regression coefficients and MSEs of the robust linear regressions for the datasets of sample sizes n = 40, 80, 120. As expected, the linear regression with ε = 0 attains the smallest MSE, as it is, by definition, the solution to minimising the MSE. The MSE of the regular linear regression remains approximately the same across the three sample sizes, while the MSEs of the robust linear regressions vary more. Moreover, for each value of ε > 0, the MSE decreases with increasing sample size.
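Reusing the helpers from the sketch after Table 2 (again our own hypothetical reconstruction, not the paper's code, and hence depending on copula_sample and fit_robust_regression defined there), the sample-size experiment amounts to refitting on nested subsamples:

```python
x, y = copula_sample(120)   # nested datasets: first 40, first 80, all 120 points
for n in (40, 80, 120):
    for eps in (0, 1, 5, 10):
        b0, b1 = fit_robust_regression(x[:n], y[:n], eps)
        mse = np.mean((b0 + b1 * x[:n] - y[:n]) ** 2)
        print(f"n = {n:3d}, eps = {eps:2d}: beta0 = {b0:.2f}, "
              f"beta1 = {b1:.2f}, MSE = {mse:.3f}")
```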
6. Conclusion
This paper constructs a new robustification of elicitable functionals, the REF, by incorporating uncertainty prescribed by the KL divergence. Mathematically, the REF is the argmin of a worst-case expected score and thus takes the form of a double optimisation problem. We solve the inner problem of the REF, leaving a traditional optimisation problem; we show that the constraint on the uncertainty region is binding, and describe conditions for the existence and uniqueness of the REF.
Since the REF depends on the choice of scoring function, we explore this choice using b-homogeneous scoring functions, show that these families of scoring functions preserve properties of risk measures for the REF, and illustrate them using Murphy diagrams.
We extend the REF and its representation results to two settings: to k-dimensional elicitable functionals, and to a robust regression application. In the k-dimensional setting, we consider an application to reinsurance by demonstrating the behaviour of the joint (VaR, ES) REF on a synthetic dataset. In the robust regression setting, we explore the behaviour of the REF under two scenarios of unreliable data: a data contamination problem and a varying sample size scenario.
Acknowledgments
The authors thank Fabio Gomez for stimulating discussions. SP gratefully acknowledges support from the
Natural Sciences and Engineering Research Council of Canada (grants DGECR-2020-00333 and RGPIN-
2020-04289) and from the Canadian Statistical Sciences Institute (CANSSI).
References
Albrecher H, Beirlant J, Teugels J (2017) Reinsurance: Actuarial and Statistical Aspects. Wiley Series in Probability and Statistics (Wiley), ISBN 9780470772683.
Artzner P, Delbaen F, Eber JM, Heath D (1999) Coherent measures of risk. Mathematical Finance 9(3):203–228, URL http://dx.doi.org/10.1111/1467-9965.00068.
Bellini F, Bignozzi V (2015) On elicitable risk measures. Quantitative Finance 15(5):725–733, URL http://dx.doi.org/10.1080/14697688.2014.946955.
Bellini F, Klar B, Müller A, Rosazza Gianin E (2014) Generalized quantiles as risk measures. Insurance: Mathematics and Economics 54:41–48, ISSN 0167-6687, URL http://dx.doi.org/10.1016/j.insmatheco.2013.10.015.
Bernard C, Pesenti SM, Vanduffel S (2024) Robust distortion risk measures. Mathematical Finance 34(3):774–818, URL http://dx.doi.org/10.1111/mafi.12414.
Blanchet J, Lam H, Tang Q, Yuan Z (2019) Robust actuarial risk analysis. North American Actuarial Journal 23(1):33–63, URL http://dx.doi.org/10.1080/10920277.2018.1504686.
Cai J, Liu F, Yin M (2024) Worst-case risk measures of stop-loss and limited loss random variables under distribution uncertainty with applications to robust reinsurance. European Journal of Operational Research 318(1):310–326, ISSN 0377-2217, URL http://dx.doi.org/10.1016/j.ejor.2024.03.016.
Chen R, Paschalidis IC (2018) A robust learning approach for regression models based on distributionally robust optimization. Journal of Machine Learning Research 19(13):1–48, URL http://jmlr.org/papers/v19/17-295.html.
Efron B (1991) Regression percentiles using asymmetric squared error loss. Statistica Sinica 93–125, URL https://www.jstor.org/stable/24303995.
Ehm W, Gneiting T, Jordan A, Krüger F (2016) Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings. Journal of the Royal Statistical Society Series B (Statistical Methodology) 78(3):505–562, URL http://dx.doi.org/10.1111/rssb.12154.
Embrechts P, Mao T, Wang Q, Wang R (2021) Bayes risk, elicitability, and the expected shortfall. Mathematical Finance 31(4):1190–1217, URL http://dx.doi.org/10.1111/mafi.12313.
Embrechts P, Puccetti G, Rüschendorf L (2013) Model uncertainty and VaR aggregation. Journal of Banking & Finance 37(8):2750–2764, ISSN 0378-4266, URL http://dx.doi.org/10.1016/j.jbankfin.2013.03.014.
Embrechts P, Wang B, Wang R (2015) Aggregation-robustness and model uncertainty of regulatory risk measures. Finance and Stochastics 19(4):763–790, ISSN 0949-2984, 1432-1122, URL http://dx.doi.org/10.1007/s00780-015-0273-z.
Fissler T, Pesenti SM (2023) Sensitivity measures based on scoring functions. European Journal of Operational Research 307(3):1408–1423, ISSN 0377-2217, URL http://dx.doi.org/10.1016/j.ejor.2022.10.002.
Fissler T, Ziegel JF (2016) Higher order elicitability and Osband's principle. Annals of Statistics 44(4):1680–1707, URL http://dx.doi.org/10.1214/16-AOS1439.
Fissler T, Ziegel JF (2021) On the elicitability of range value at risk. Statistics & Risk Modeling 38(1-2):25–46, URL http://dx.doi.org/10.1515/strm-2020-0037.
Fissler T, Ziegel JF, Gneiting T (2016) Expected shortfall is jointly elicitable with Value-at-Risk: implications for backtesting. Risk Magazine 58–61, URL http://dx.doi.org/10.48550/arXiv.1507.00244.
Ghaoui LE, Oks M, Oustry F (2003) Worst-case Value-At-Risk and robust portfolio optimization: A conic programming approach. Operations Research 51(4):543–556, URL http://dx.doi.org/10.1287/opre.51.4.543.16101.
Glasserman P, Xu X (2014) Robust risk measurement and model risk. Quantitative Finance 14(1):29–58, URL http://dx.doi.org/10.1080/14697688.2013.822989.
Gneiting T (2011) Making and evaluating point forecasts. Journal of the American Statistical Association 106(494):746–762, URL https://www.jstor.org/stable/41416407.
He XD, Kou S, Peng X (2022) Risk measures: Robustness, elicitability, and backtesting. Annual Review of Statistics and Its Application 9:141–166, ISSN 2326-831X, URL http://dx.doi.org/10.1146/annurev-statistics-030718-105122.
Kou S, Peng X (2016) On the measurement of economic tail risk. Operations Research 64(5):1056–1072, ISSN 0030-364X, 1526-5463, URL http://www.jstor.org/stable/26153487.
Lassance N, Vrins F (2023) Portfolio selection: A target-distribution approach. European Journal of Operational Research 310(1):302–314, ISSN 0377-2217, URL http://dx.doi.org/10.1016/j.ejor.2023.02.014.
Milgrom P, Segal I (2002) Envelope theorems for arbitrary choice sets. Econometrica 70(2):583–601, URL http://dx.doi.org/10.1111/1468-0262.00296.
Nolde N, Ziegel JF (2017) Elicitability and backtesting: Perspectives for banking regulation. Annals of Applied Statistics 11(4):1833–1874, URL http://dx.doi.org/10.1214/17-AOAS1041.
Patton AJ (2011) Data-based ranking of realised volatility estimators. Journal of Econometrics 161(2):284–303, URL http://dx.doi.org/10.1016/j.jeconom.2010.12.010.
Pesenti SM, Millossovich P, Tsanakas A (2016) Robustness regions for measures of risk aggregation. Dependence Modeling 4(1), URL http://dx.doi.org/10.1515/demo-2016-0020.
Pesenti SM, Millossovich P, Tsanakas A (2019) Reverse sensitivity testing: What does it take to break the model? European Journal of Operational Research 274(2):654–670, URL http://dx.doi.org/10.1016/j.ejor.2018.10.003.
Rousseeuw P, Leroy A (2005) Robust Regression and Outlier Detection. Wiley Series in Probability and Statistics (Wiley), ISBN 9780471725374.
Rüschendorf L, Vanduffel S, Bernard C (2024) Model Risk Management: Risk Bounds under Uncertainty (Cambridge University Press), ISBN 9781009367189, URL http://dx.doi.org/10.1017/9781009367189.
Shafieezadeh-Abadeh S, Esfahani PM, Kuhn D (2015) Distributionally robust logistic regression. Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, 1576–1584, NIPS'15 (Cambridge, MA, USA: MIT Press), URL http://dx.doi.org/10.48550/arxiv.1509.09259.
Steinwart I, Pasin C, Williamson RC, Zhang S (2014) Elicitation and identification of properties. Annual Conference on Computational Learning Theory, URL https://api.semanticscholar.org/CorpusID:15711300.
Wang R, Ziegel JF (2015) Elicitable distortion risk measures: A concise proof. Statistics and Probability Letters 100:172–175, URL http://dx.doi.org/10.1016/j.spl.2015.02.004.
Ziegel JF (2016) Coherence and elicitability. Mathematical Finance 26(4):901–918, URL http://dx.doi.org/10.1111/mafi.12080.