
Deciphering the Noise:

The Welfare Costs of Noisy Behavior

Aleksandr Alekseev, Glenn W. Harrison, Morten Lau and Don Ross∗

August 2019

Abstract

Theoretical work on stochastic choice mainly focuses on the sources of choice randomness, and less on its economic consequences. We attempt to close this gap by developing a method of extracting information about the monetary costs of noise from structural estimates of preferences and choice randomness. Our method is based on allowing a degree of noise in choices in order to rationalize them by a given structural model. To illustrate the approach, we consider risky binary choices made by a sample of the general Danish population in an artefactual field experiment. The estimated welfare costs are small in terms of everyday economic activity, but they are considerable in terms of the actual stakes of the choice environment. Higher welfare costs are associated with higher age, lower education, and certain employment status.

Keywords: stochastic choice, choice under risk, welfare costs, behavioral welfare economics

JEL codes: D61, D81, C93

∗Economic Science Institute, Chapman University, USA (Alekseev); Department of Risk Management & Insurance and Center for the Economic Analysis of Risk, Robinson College of Business, Georgia State University, USA (Harrison); Copenhagen Business School, Denmark (Lau); School of Sociology, Philosophy, Criminology, Politics and Government, University College Cork, Ireland; School of Economics, University of Cape Town, South Africa; and Center for Economic Analysis of Risk, Robinson College of Business, Georgia State University, USA (Ross). Harrison is also affiliated with the School of Economics, University of Cape Town. E-mail contacts: alekseev@chapman.edu, gharrison@gsu.edu, mla.eco@cbs.dk and don.ross@uct.ac.za. We are grateful to the Danish Social Science Research Council (Project #12-130950) for financial support.

1 Introduction

Stochastic choice has become an active area of research in recent years, motivated primarily by two considerations. First, a large body of empirical evidence shows that stochastic choice is a robust empirical phenomenon,1 and much work has been devoted to explaining this behavior.2 Second, models of stochastic choice provide researchers with econometric tools to estimate structural models in a broad range of applications. The primary interest in applying a model of stochastic choice is to recover the structural parameters of the deterministic part of a model, such as risk or time preferences. Little attention has been given, however, to the systematic economic interpretation of the parameter estimates of the stochastic part, which determine the magnitude of choice randomness. The interpretation of these parameters is important for understanding the economic value of choice randomness, which has implications for the quality of decision making, and also for a better understanding of the underlying “source” models of stochastic choice. We study the economic consequences of stochastic choice by developing an intuitive method of translating the estimates of the stochastic part into economically tractable terms.

Consider a generic structural model of discrete choice that uses a standard multinomial logit model of stochastic choice,3 which assigns each discrete alternative a choice likelihood P according to

$$P(a \mid \beta, \mu) = \frac{\exp(U(a \mid \beta)/\mu)}{\sum_{a' \in A} \exp(U(a' \mid \beta)/\mu)}. \tag{1}$$

In this expression, a and a′ are alternatives, such as lotteries or dated outcomes, from a set of all alternatives A. The deterministic part of this structural model is parametrized by a vector of behavioral parameters β, which could represent, for instance, an agent’s risk or time preferences. For example, in the case of risk preferences, β could be a risk

1 Nogee and Mosteller (1951) provide the earliest evidence of stochastic choice, followed by Tversky (1969), Starmer and Sugden (1989), Camerer (1989), and Ballinger and Wilcox (1997).

2 Wilcox (2008) provides an excellent overview of many popular stochastic models of choice under risk. Recent examples include Swait and Marley (2013), Wallin, Swait, and Marley (2018), Matějka and McKay (2015) and Agranov and Ortoleva (2017).

3 Also known in the literature as the strong utility model or the Fechnerian model.


aversion parameter and U could be the expected utility of a risky alternative; in the case of time preferences, such as the quasi-hyperbolic discounting model, β would comprise the exponential and hyperbolic discounting parameters and U would be the discounted utility of an income stream. The behavioral parameters determine the aggregate utility function U assigned to each alternative.

The stochastic part of the model is parametrized by µ, often called the noise parameter.4 The noise parameter determines how sensitive choice likelihoods are to the maximization of utility U according to a given structural model. As noise tends to zero, an agent will almost surely choose the alternative with the highest utility. When noise goes to infinity, the agent will assign equal likelihoods to choosing each alternative regardless of their utilities. Higher values of µ thus imply a higher magnitude of choice randomness in this popular specification.
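These two limits can be checked with a small numerical sketch of equation (1); the utility values and noise levels below are arbitrary illustrative numbers, not estimates from the paper.

```python
import math

def logit_probs(utilities, mu):
    """Multinomial logit choice probabilities, as in eq. (1): P(a) ∝ exp(U(a)/mu)."""
    # Subtracting the maximum utility before exponentiating avoids overflow
    # and leaves the probabilities unchanged.
    u_max = max(utilities)
    weights = [math.exp((u - u_max) / mu) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

utilities = [1.0, 1.5]  # hypothetical aggregate utilities U(a1) and U(a2)
for mu in (0.05, 0.5, 5.0, 50.0):
    print(f"mu = {mu:5.2f}: P(a2) = {logit_probs(utilities, mu)[1]:.3f}")
# As mu -> 0, P(a2) -> 1: the higher-utility alternative is chosen almost surely.
# As mu grows large, P(a2) -> 0.5: choice likelihoods become uniform.
```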

Three issues arise with the interpretation of the estimates of the noise parameter. First, while the effect of µ on choice likelihoods is clear, one cannot readily interpret a particular estimate of noise in economic terms.5 A monetary value assigned to a noise estimate, on the other hand, would provide clear information about the economic consequences of choice randomness. Second, since the noise parameter is unbounded from above, it is difficult to judge whether the randomness of an agent’s choices is high or low. A value defined on the unit interval would solve this problem.6 Third, the raw estimates of µ are not well suited for interpersonal comparisons, since the behavioral parameters β also change across people. Having choice randomness expressed in common units, such as money, and taking into account the interpersonal differences in β would help to overcome this issue. Aspects of these three issues

4 In the game theory literature on Quantal Response Equilibrium due to McKelvey and Palfrey (1995), which applies stochastic choice to strategic settings, it is common to use the alternative parametrization λ ≡ 1/µ.

5 In the existing literature (von Gaudecker et al., 2011; Bland, 2018), an estimate of noise is sometimes interpreted as the likelihood of choosing the best alternative (among the two available) for a given difference in utilities (or certainty equivalents) between them. While this number is informative of the economic consequences of choice randomness, it does not provide a monetary measure of the welfare costs associated with stochastic choice.

6 The parameter of the tremble model of stochastic choice (Harless and Camerer, 1994) has this property and thus allows one to evaluate the relative magnitude of choice randomness. However, an estimate of the tremble parameter would still require an economic interpretation. See Carbone and Hey (2000) for a comparison between the tremble model and the Fechnerian model.


arise not only in the standard multinomial logit model but also in its modifications, such as the contextual utility model of Wilcox (2011) or specifications that substitute certainty equivalents of alternatives for their expected utilities, such as von Gaudecker et al. (2011).

We address these issues by converting an estimate of µ into two intuitive measures.7 The first measure, absolute welfare cost (AWC), puts a dollar value on choice randomness. It shows how much money, in certainty equivalent terms, an agent would be allowed to “waste” compatibly with rationalization of her choices by an underlying structural model.8 The second measure, relative welfare cost (RWC), scales the absolute welfare cost by the monetary value at stake in a choice context. The relative welfare cost is thus defined on the unit interval. It shows what proportion of the total monetary value at stake an agent would be allowed to waste compatibly with rationalization of her choices by the model.9

Our approach rests on a careful interpretation of the concepts of “noise” and “waste.” We follow the descriptive, structural literature on risk preferences by assuming a specific model of the manner in which choice randomness is rationalized. In the language of Infante, Lecouteux, and Sugden (2016, p. 21), this is

...not an inference about the hypothetical choices of the client’s inner rational agent, but rather a way of regularising the available data about the client’s preferences so that it is compatible with the particular model of decision-making that the professional wants to use. Regularisation in this sense is almost always needed when a theoretical model comes into contact with real data.

In our case the subject being evaluated is the “client,” and we are the “professional.” Thus we consistently use the expression “noise,” or some synonym, rather than “error.” When it comes to using this regularised model of the agent, we may then adopt the “intentional stance” towards the evaluation of an agent’s behavior, using a philosophical approach developed by Dennett (1987), theoretically interpreted for use in economics by Ross (2014, ch. 4), and explicitly applied to behavioral welfare economics by Harrison and Ross (2018, §5). This perspective, which has become the dominant one in the philosophy of psychology, emphasizes that preferences and beliefs are not fixed internal states of people, but are rationalizations of choice behaviors that people rely on to interpret one another. This applies mutatis mutandis to self-interpretation. Preference and belief attributions pick out “real patterns” in choice behaviors (Dennett, 1991), and these patterns, which typically involve some noise, are the basis for assessing people’s goals, and hence, for economics, their welfare. Only then can we use the expression “waste.” Similarly, when we characterize behavior as being “imperfectly rational” below, that also reflects our intentional stance, rather than a claim that the agent has made an error in cognitive processing or problem representation.

7 While the discussion below focuses on the multinomial logit model and its modifications, a similar logic can be applied to other models of stochastic choice, such as the trembles model (Harless and Camerer, 1994) or the random preferences model (Loomes and Sugden, 1995; Gul and Pesendorfer, 2006).

8 While our discussion focuses on individual decision-making, our method can also be used to study stochastic choice in group decision-making (Bone, Hey, and Suckling, 1999).

9 Other ways to measure the welfare costs of stochastic choice might exist; however, we find using monetary measures based on certainty equivalents to be intuitive and transparent. Depending on a particular research question, one might be more interested in an absolute measure than a relative measure, or vice versa. Our goal here is to provide the general tools, which can then be adapted to a particular research question.

Our measures of the welfare costs of noisy behavior are consistent with the model-based approach advocated by Manzini and Mariotti (2014). This means that in order to calculate our welfare cost measures, we assume specific deterministic and stochastic models of the decision-making process. These assumptions allow us to derive precise (in the sense of being point estimates) and efficient (in the sense of efficiently using the available data, as explained in Section 2.3) values of welfare costs. We recognize the potential sensitivity of our results to these assumptions, and address this sensitivity in Appendix A.

Our absolute and relative welfare cost measures allow one to conveniently evaluate the economic significance of choice randomness, its relative magnitude, and to compare the magnitude of choice randomness across people. The implications of these measures for an agent’s behavior, however, will ultimately depend on the underlying model of the source of choice randomness adopted by a researcher. This is an important point since different “source” models of stochastic choice often lead to the same choice likelihoods, such as the likelihoods generated by the multinomial logit model presented above. For instance, the Random Utility model due to Marschak (1960) assumes that when an agent makes an optimal choice, the


choice randomness is due to the perturbations in her utility function that are unobservable to a researcher. The noise parameter in the Random Utility model is then proportional to the variance of the unobserved component of utility. High estimated welfare costs would imply that the stochastic part of the structural model dominates the deterministic part, i.e., the structural model cannot explain the agent’s choices well. The welfare costs can then be viewed as measures of a model’s fit.10

Recent studies offer an alternative view of choice randomness as an optimal response to costly frictions in the decision-making process. For example, these frictions may be caused by the need to collect the relevant information to make a choice, as in the Rational Inattention models of Caplin and Dean (2015) and Matějka and McKay (2015). The noise parameter in a Rational Inattention model represents marginal information costs. The estimates of welfare costs in this type of model can then be interpreted as aggregate information costs, or losses that an agent incurs relative to an ideal case of no information costs. Another example of frictions is the pursuit of multiple goals that cannot be attained simultaneously (Swait and Marley, 2013; Wallin et al., 2018). An agent is assumed to balance the goal of choosing the best available alternative with the goal of having diversity in choices. The noise parameters in this model represent the relative weight of the second goal. The estimates of welfare costs in this type of model can be interpreted as the economic value that an agent places on the goal of having diversity or, alternatively, as the loss an agent incurs relative to the case of having the single goal of choosing the best alternative.

We apply our method to the data from an artefactual field experiment in Denmark. The subjects came from a sample of the general Danish population and were asked to make a series of choices between two risky alternatives. Each subject answered a detailed demographic survey, which we use to characterize the effects of demographic characteristics on the observed heterogeneity in the AWC and RWC. We find that the average AWC are around 67 Danish

10 Recent work by Halevy et al. (2018) provides a promising example of how welfare costs can be used as a measure of fit.


kroner ($10)11 and thus negligible for the subjects’ natural economic environment. However, the RWC are quite significant, at 0.87 on average. There is also considerable variation among the subjects in terms of their AWC and RWC. Regression analysis shows that certain demographic characteristics are associated with higher costs. In particular, subjects who are older, less educated, or have a particular employment status have larger welfare costs. Females have higher AWC than males, but do not differ in RWC.

Section 2 describes the method of converting an estimate of noise into welfare costs measured in monetary terms and provides an explicit algorithm for computation in the binary choice case. Section 3 applies the method to data from an artefactual field experiment in Denmark involving choice under risk and studies the properties of the welfare costs, as well as their demographic correlates. Section 4 discusses connections with previous literature. Section 5 concludes.

2 Method

We first look at the general case in which the set of alternatives is continuous. This case allows us to clearly demonstrate the logic behind our method of extracting the welfare cost information from a noise estimate. Then we turn to the more common discrete case with two alternatives and explicitly describe the algorithm that implements our method.

11 Throughout the text, we use an exchange rate of 1 Danish krone = $0.15 that was prevalent at the time

of the experiment.


2.1 General Case

Consider an agent choosing from a set of alternatives indexed by real numbers on a compact interval A = [a_l, a_h]. Each alternative generates a lottery12

$$l(a) = \{x_1(a), \ldots, x_k(a);\ q_1(a), \ldots, q_k(a)\}, \quad a \in A,\ x_i \in \mathbb{R},\ q_i \in \mathbb{R}_{+}\ \forall i = 1, \ldots, k, \quad \sum_{i=1}^{k} q_i = 1,$$

where x_i are monetary outcomes and q_i are the respective probabilities of obtaining those outcomes.

This setting could represent allocating resources between two state-contingent accounts, as in Choi et al. (2007). Each allocation in this example is an alternative with k = 2 outcomes, x_1(a) and x_2(a), and equal probabilities of each outcome. The minimum (a_l = 0) and maximum (a_h > 0) amounts an agent can allocate to account 1 define the interval of alternatives A. Then x_1(a) = a and x_2(a) = b(a_h − a), where −b < 0 is the slope of the budget line, and q_1(a) = q_2(a) for all a ∈ A.13

The risk elicitation task of Gneezy and Potters (1997) is another example of such a setting. In this example, the minimum (a_l = 0) and maximum (a_h > 0) amounts a subject can allocate to a risky asset define the set of alternatives A, where a_h is the initial endowment. A subject’s choice of how much of the endowment to allocate to the risky asset, a, generates lotteries with two outcomes given by x_1(a) = a_h − a (the asset yields no return) and x_2(a) = a_h + a(k − 1) (the asset yields a positive return k − 1). The probabilities of the outcomes are given exogenously and do not depend on a.

Each alternative a has an aggregate utility U(a) ≡ U(l(a)) defined by an assumed structural model of choice under risk. Monetary outcomes are transformed using u : R → R, the von Neumann-Morgenstern utility function. Each value of U(a) can be translated into a certainty equivalent m(a), defined by u(m(a)) = U(a). The ordering of alternatives is preserved under the certainty equivalent transformation: U(a) > U(b) ⇔ m(a) > m(b) for all a, b ∈ A.

12 The lottery itself does not need to be discrete. An alternative can generate a continuous probability density.

13 In an actual experiment, the set of alternatives is, of course, discrete. This choice set, however, comes close to being continuous.
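For concreteness, under EUT with a CRRA utility function u(x) = x^(1−r)/(1−r) — one common specification in this literature, used here with illustrative rather than estimated parameter values — the certainty equivalent of a discrete lottery can be computed as:

```python
def crra_u(x, r):
    """CRRA utility; r is the coefficient of relative risk aversion (r != 1 assumed)."""
    return x ** (1 - r) / (1 - r)

def crra_u_inv(u, r):
    """Inverse of CRRA utility, recovering the money amount m with u(m) = u."""
    return (u * (1 - r)) ** (1 / (1 - r))

def certainty_equivalent(outcomes, probs, r):
    """m(a) defined by u(m(a)) = U(a) = sum_i q_i * u(x_i)."""
    U = sum(q * crra_u(x, r) for x, q in zip(outcomes, probs))
    return crra_u_inv(U, r)

# Hypothetical lottery: 100 or 10 with equal chances, moderate risk aversion.
m = certainty_equivalent([100.0, 10.0], [0.5, 0.5], r=0.5)
print(round(m, 2))  # 43.31, below the expected value of 55: risk aversion at work
```

Since u is strictly increasing, applying u⁻¹ preserves the ordering of aggregate utilities, which is the property used in the text.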

Assume that U is concave and reaches its unique maximum (minimum) at a* (a_*), as does the certainty equivalent function. Define the maximum certainty equivalent as m* ≡ m(a*), and the minimum certainty equivalent as m_* ≡ m(a_*). If the agent always chooses the best alternative a*, we call this behavior perfectly rationalizable (by an assumed model of choice under risk). At the other extreme, if the likelihood of choosing a* is the same as for any other alternative, we call such behavior non-rationalizable. We are concerned with the behavior in between, which is neither perfectly rationalizable nor non-rationalizable, behavior that we call imperfectly rationalizable.

The degree of this imperfection14 is characterized by a number ε, 0 ≤ ε ≤ ∆m, with ∆m ≡ m* − m_*. Choices that lead to certainty equivalents within ε distance from the maximum certainty equivalent can be viewed, from the perspective of a model, as imperfectly rationalizable.15 These choices form an optimal region A* defined by

$$A^*(\varepsilon) = \{a \in A \mid m(a) \geq m^* - \varepsilon\}. \tag{2}$$

The degree of imperfection ε shows how much monetary welfare an agent would be allowed to waste to make her choices rationalizable by the model, and effectively includes these choices in the optimal region. In other words, ε represents the welfare costs measured in monetary units. Our goal is to link these costs to noise.

The allowed degree of imperfection co-varies with the width of the optimal region. If ε is set to 0, the optimal region will consist only of the best alternative a*. If ε is high enough, the optimal region will coincide with the whole set of alternatives A. Figure 1 illustrates how the optimal region varies with the degree of imperfection. Geometrically, the optimal

14 This term should be understood as an imperfection in a given model’s ability to regularise the data, rather than a statement about an agent making decision errors.

15 The idea of allowing an agent some degree of imperfection in choices is not new. For example, Harrison (1994) introduces a similar quantity, based on an agent’s subjective cost of choosing one alternative versus the other, to explain many EUT violations.


region is the line segment [a*_l, a*_h].

[Figure 1 shows two panels plotting the certainty equivalent against the alternatives: (a) Low Degree of Imperfection and (b) High Degree of Imperfection. Each panel marks m*, the threshold m* − ε, and the optimal region A* = [a*_l, a*_h] around a*.]

Figure 1: Optimal Region and Degree of Imperfection

The optimal region and the degree of imperfection are the first two components that we need to interpret an estimate of noise. The third component comes from a stochastic model p : A → R_+, which generates choice likelihoods over the set of alternatives. Some alternatives fall into the optimal region, by definition. By integrating the density p(a) over this region we get the proportion of choices that are counted, from the perspective of a model, as imperfectly rationalizable for a given degree of imperfection. We call this measure a degree of rationalizability (DoR):

$$\rho(\mu, \varepsilon) = \int_{A^*(\varepsilon)} p(a)\, da. \tag{3}$$

The DoR has several intuitive properties, two of which turn out to be crucial for our analysis, and can be represented graphically. Figure 2 shows that as the degree of imperfection increases, the optimal region expands and the DoR, represented by the gray shaded area, increases. Figure 3 shows that as the noise goes up, the density flattens out and the probability mass shifts from the optimal region to the outside area, reducing the DoR.
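Both properties are easy to verify numerically. In the sketch below, the single-peaked certainty-equivalent function m(a) = 1 − (a − 0.5)² on A = [0, 1] and the logit-style density p(a) ∝ exp(m(a)/µ) are stand-in assumptions for illustration, not the paper's estimated model.

```python
import math

def dor(mu, eps, n=2001):
    """Approximate the DoR in eq. (3) on A = [0, 1] by summing over a grid.

    Stand-in assumptions: certainty equivalent m(a) = 1 - (a - 0.5)**2 and a
    logit-style choice density p(a) proportional to exp(m(a)/mu).
    """
    grid = [i / (n - 1) for i in range(n)]
    m = [1 - (a - 0.5) ** 2 for a in grid]
    m_star = max(m)
    weights = [math.exp((mi - m_star) / mu) for mi in m]  # unnormalized density
    total = sum(weights)
    # Mass of the density that lies in the optimal region {a : m(a) >= m* - eps}.
    return sum(w for w, mi in zip(weights, m) if mi >= m_star - eps) / total

# The DoR increases in the degree of imperfection eps...
print(dor(mu=0.05, eps=0.1), "<", dor(mu=0.05, eps=0.2))
# ...and decreases in the noise mu.
print(dor(mu=0.50, eps=0.1), "<", dor(mu=0.05, eps=0.1))
```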

[Figure 2 shows two panels plotting the choice density over the alternatives: (a) Low Degree of Imperfection and (b) High Degree of Imperfection. The gray shaded area over the optimal region A* represents ρ(µ, ε).]

Figure 2: Degree of Rationalizability and Imperfection

The DoR for certain values of noise and imperfection has attractive interpretations. The quantity ρ(∞, ε) tells us what proportion of choices are counted as rationalizable for a given imperfection ε when they are, in fact, non-rationalizable. It represents a Type II error in a test to detect rationalizability, and the quantity 1 − ρ(∞, ε) is the power of this test. This power will decrease as the allowed degree of imperfection increases or as the set of alternatives shrinks. The value of the DoR at ρ(µ̂, 0) measures the proportion of rationalizable choices for an estimated level of noise µ̂ and no imperfection. We refer to it as the default degree of rationalizability, or DDoR.

We now have all the tools to decipher the noise. We do this by linking an estimate of µ, the value of which is hard to interpret, to the degree of imperfection, a monetary measure that has an intuitive economic interpretation as the welfare cost, or the monetary welfare required to rationalize the agent’s choices by a model. In order to link them, we need to reverse the steps we have followed so far. So far, we introduced a degree of imperfection ε that defines an optimal region A*. The optimal region, combined with a stochastic model parameterized by


[Figure 3 shows two panels plotting the choice density over the alternatives: (a) Low Noise and (b) High Noise. The gray shaded area over the optimal region A* represents ρ(µ, ε).]

Figure 3: Degree of Rationalizability and Noise

µ, yields a value of the DoR. Now suppose that instead we start with a DoR measure and fix it at some target level α. Let an estimated value of the noise be µ̂. The question is how much imperfection should be allowed for 100 × α% of the choices to be rationalized for a given noise. In other words, we need to find ε that satisfies

$$\rho(\hat{\mu}, \varepsilon) = \alpha. \tag{4}$$

This equation establishes an implicit function, ε(µ̂; α). For the purpose of our analysis, the following property of this function is important.

Proposition 1. For a given α, the degree of imperfection as a function of noise, ε(µ; α), is monotonically increasing:16

$$\frac{d\varepsilon}{d\mu} > 0.$$

16 We note that in the case when alternatives are discrete rather than continuous, as discussed below, the DoR as a function of imperfection will not be continuous, and thus it will not be possible to match the target DoR α perfectly. We address this issue by using a discrete grid for ε and an interpolated version of ρ(µ̂, ε). The interpolated DoR function on a discrete grid is continuous, and thus Proposition 1 applies.


Proof. See Appendix B.

This property implies that noise and imperfection are in a direct and monotonic relation.17 This property is important since more noise should imply higher welfare costs, which in our case are measured by imperfection. If imperfection and noise were not in a direct and monotonic relation, such an interpretation would be impossible. The relation between ε and µ comes from the fact that the DoR is decreasing in noise and increasing in imperfection. From these properties it also follows that higher values of α imply higher values of ε. The more choices we wish to rationalize, for a given value of the noise, the more imperfection we should allow. The choice of the target α is left to the discretion of a researcher. In our empirical analysis we use the values of 0.9, 0.95, and 0.99, which appear to be reasonable targets.

So far we have focused on a single choice context, but in practice we observe agents making choices over a series of rounds of a choice task. Suppose we observe an agent’s choices over n rounds indexed by j = 1, . . . , n, and in each round the mapping of alternatives a into lotteries l_j(a) is different. In the context of an allocation task, the variation is introduced by changing the slope of the budget line b_j and the maximum amount a_h^j that can be allocated to one of the accounts: A_j = [0, a_h^j], x_1^j(a) = a, x_2^j(a) = b_j(a_h^j − a). We can repeat all the previous steps in deriving the DoR, but now it will differ by round: ρ_j(µ, ε). What remains common across rounds, however, is the degree of allowed imperfection ε. We assume that µ and preferences remain fixed for the duration of a choice task. We can then aggregate the DoR from all the choices by averaging across the DoR for single choices:

$$\rho(\mu, \varepsilon) = \frac{1}{n} \sum_{j=1}^{n} \rho_j(\mu, \varepsilon). \tag{5}$$

Naturally, the average follows all the properties of the DoR for a single choice. In particular, it increases in ε and decreases in µ. We can then use the aggregate DoR in (5) to calculate the imperfection needed to reach a target α in equation (4).

17 This property holds for a given agent, or rather for given risk preferences. This property will not hold perfectly across agents whose preferences are different.
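Since the aggregate DoR in (5) is increasing in ε, equation (4) can be solved by a simple grid search. A minimal sketch, anticipating the binary-choice case in which each round's DoR is either the likelihood of the best alternative or 1; the per-round likelihoods and CE differences are hypothetical numbers:

```python
def aggregate_dor(eps, rounds):
    """Aggregate DoR, eq. (5): a round contributes its likelihood of the best
    alternative when eps < dm, and 1 once eps covers the CE difference dm."""
    return sum(1.0 if eps >= dm else p for p, dm in rounds) / len(rounds)

def solve_epsilon(rounds, alpha, d_eps=0.001):
    """Smallest grid value of eps whose aggregate DoR reaches alpha (eq. 4)."""
    max_dm = max(dm for _, dm in rounds)
    i = 0
    while True:
        eps = i * d_eps
        if aggregate_dor(eps, rounds) >= alpha or eps > max_dm:
            return eps
        i += 1

# Hypothetical rounds: (likelihood of the best alternative, CE difference in kroner)
rounds = [(0.9, 5.0), (0.6, 2.0), (0.75, 8.0), (0.55, 1.0)]
print(solve_epsilon(rounds, alpha=0.9))
# The target is reached once eps covers the CE differences of the two rounds
# with near-chance likelihoods, so the solution is an eps of about 2.0.
```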

After calculating the value of ε that satisfies equation (4), ε(µ̂, α), it makes sense to adjust this value to take into account the fact that the degree of imperfection should not exceed the difference between the maximum and minimum certainty equivalents for a given choice. Since a common ε is applied to all the rounds of a choice task, for some rounds it can actually exceed ∆m. Increasing imperfection beyond this difference does not have any effect on the DoR and would imply that we allow an agent to waste more monetary welfare than there actually is. This issue can be addressed by bounding ε by ∆m, and then averaging across all the rounds:

$$\bar{\varepsilon}(\hat{\mu}, \alpha) = \frac{1}{n} \sum_{j=1}^{n} \min\{\varepsilon(\hat{\mu}, \alpha), \Delta m_j\}. \tag{6}$$

We call the resulting measure of imperfection Absolute Welfare Costs (generated by noise µ̂, with 100 × α% of choices rationalized), or AWC. It represents the monetary welfare that the agent would be allowed to give up for exactly 100 × α% of her choices to be rationalized by the model, given noise µ̂. For any estimated value of noise and any desired proportion of choices we would like to rationalize we can, therefore, always find a precise dollar value of the welfare costs.

We can go further and translate the welfare costs into relative terms, to compare these costs with the actual stakes of a choice context. For example, an AWC of $1 may not look like much, but if ∆m_j are close to $1 in all the rounds, almost all the welfare would have to be sacrificed to rationalize an agent’s choices. We divide the degree of imperfection by the difference between the maximum and minimum certainty equivalents for every round, and average across all the choices:18

$$\tilde{\varepsilon}(\hat{\mu}, \alpha) = \frac{1}{n} \sum_{j=1}^{n} \min\left\{\frac{\varepsilon(\hat{\mu}, \alpha)}{\Delta m_j}, 1\right\}. \tag{7}$$

The resulting degree of imperfection represents Relative Welfare Costs (generated by noise µ̂, with 100 × α% of choices rationalized), or RWC. Another benefit of this measure is that it allows one to appreciate the relative magnitude of noise, since the RWC is bounded between 0 and 1, while a raw estimate of noise is unbounded from above. If rationalizing 100 × α% of the choices requires on average almost all the difference between the maximum and the minimum certainty equivalents, in which case the RWC is close to 1, that clearly indicates that the choices are close to being non-rationalizable, from the perspective of the model. On the other hand, if it requires only a small fraction of this difference, in which case the RWC is near 0, then the choices are close to being perfectly rationalizable, from the perspective of the model.

18 Since the resulting quantity has to be a fraction, we bound this ratio by 1.

2.2 Binary Choice

An important special case arises when an agent has only two alternatives to choose from. This is one of the most common experimental designs in risk elicitation tasks.19 In this case the set of alternatives in each round is A = {a_1, a_2}. Without loss of generality, assume that alternative a_2 always gives the highest utility, so that U_j(a_2) > U_j(a_1), j = 1, . . . , n, i.e., a*_j = a_2, using the notational convention U_j(a) ≡ U(l_j(a)). The maximum and the minimum certainty equivalents in each round j are m*_j ≡ m_j(a_2) and m_{j*} ≡ m_j(a_1), respectively. The

optimal region and the DoR can then take only two values:

$$A^*_j(\varepsilon) = \begin{cases} \{a_2\}, & \varepsilon < \Delta m_j, \\ A, & \varepsilon \geq \Delta m_j, \end{cases} \qquad \rho_j(\mu, \varepsilon) = \begin{cases} p_j(a_2), & \varepsilon < \Delta m_j, \\ 1, & \varepsilon \geq \Delta m_j, \end{cases} \tag{8}$$

where p_j(a_2) is the likelihood of choosing alternative a_2 in round j.

Suppose we observe a series of binary choices made by a subject and estimate a structural model of risk preferences in which γ̂ is a vector of estimated risk parameters and µ̂ is an estimate of noise. The γ̂ vector in the Expected Utility Theory (EUT) case is typically just

19 For example, the risk elicitation tasks developed and popularized by Hey and Orme (1994) and Holt

and Laury (2002) apply to the binary choice case.


the relative risk aversion. In the case of Cumulative Prospect Theory (CPT), γ̂ includes the risk aversion parameter(s), the probability weighting parameter(s), and the loss aversion parameter.20 The computation of the AWC and RWC (rationalizing 100 × α% of the choices) from these data can be performed using the following algorithm.

1. For each round, compute the aggregate utilities of both alternatives, Uj(a1; γ̂), Uj(a2; γ̂), j = 1, ..., n.

2. Compute the certainty equivalents of both alternatives, mj(a1), mj(a2), using the inverse transformation, mj(a) = u⁻¹(Uj(a; γ̂); γ̂), a ∈ A, and the difference between them, Δmj.

3. Compute the likelihoods of each alternative using the stochastic model, pj(a; γ̂, μ̂), a ∈ A.

4. Start with ε = 0. Compute the DoR in each round, ρj(μ̂, ε), using (8). Compute the aggregate DoR, ρ(μ̂, ε), using (5).

5. If ρ(μ̂, ε) < α, increase ε by a small number Δε > 0.

6. Repeat Step 5 until the aggregate DoR reaches the target level of α.21

7. Compute the AWC ε̄(μ̂, α) using (6). Compute the RWC ε̃(μ̂, α) using (7).
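To make the steps concrete, the following is a minimal Python sketch of the algorithm for the binary EUT case with CRRA utility. It is not the authors' code: the aggregation in (5) is assumed to be a simple average of the per-round DoRs, and (6) and (7) are assumed to average the bounded ε and the ratio ε/Δmj across rounds, consistent with the verbal descriptions in the text.

```python
import numpy as np

def crra_u(x, gamma):
    """CRRA utility u(x) = x^(1-gamma) / (1-gamma)."""
    return x ** (1.0 - gamma) / (1.0 - gamma)

def crra_ce(U, gamma):
    """Inverse CRRA utility: certainty equivalent m = u^(-1)(U)."""
    return ((1.0 - gamma) * U) ** (1.0 / (1.0 - gamma))

def welfare_costs(U1, U2, gamma, mu, alpha, d_eps=0.01):
    """Sketch of the AWC/RWC algorithm for binary choices.

    U1, U2 : aggregate utilities of the two alternatives per round,
             with the convention U2 > U1 (a2 is the better alternative)
    gamma, mu : estimated risk-aversion and noise parameters
    alpha : target degree of rationalizability (DoR)
    """
    # Steps 1-2: certainty equivalents and their difference
    m1, m2 = crra_ce(U1, gamma), crra_ce(U2, gamma)
    dm = m2 - m1
    # Step 3: logit (strong utility) likelihood of the better alternative
    p_best = 1.0 / (1.0 + np.exp(-(U2 - U1) / mu))
    # Steps 4-6: per-round DoR from (8), grown until the average hits alpha
    eps = 0.0
    while np.mean(np.where(eps < dm, p_best, 1.0)) < alpha:
        eps += d_eps
    # Step 7: AWC bounds epsilon by the CE difference; RWC is the ratio
    awc = np.mean(np.minimum(eps, dm))
    rwc = np.mean(np.minimum(eps / dm, 1.0))
    return eps, awc, rwc
```

With, say, two rounds whose CE differences are 3 and 8 DKK, the loop stops at the smallest ε whose implied average DoR reaches the target α.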

2.3 Alternative Measures

Note that the proposed computation of welfare costs does not involve actual choices. After estimating risk parameters and noise, we ignore whether the actual choices corresponded to the maximum certainty equivalent or not. A question then arises: what choices do we rationalize, if not actual choices? This question also suggests an equivalent computation based on actual choices rather than on likelihoods.

20 The parametrization will also depend on the utility and probability weighting functions used. For example, if an expo-power utility function is used, it will have two parameters rather than one.

21 In practice, due to the discreteness of ρ(μ̂, ε), it will often be impossible to match the target α exactly. We use linear interpolation to handle this issue.

Consider the following alternative algorithm. Start by computing the implied (by the model) decisions based on certainty equivalents. Compare actual and implied decisions by looking at the proportion of times when implied and actual decisions coincide. This proportion gives the actual default DoR. Next, calculate the vector of the differences in the certainty equivalents (CE differences) for the cases when implied and actual decisions disagree. These are the "mistakes," from the perspective of the model, that we need to "correct," or regularize, by adding a structural model of behavioral noise. Start with ε = 0 and increase it by a small positive amount. When ε is lower than the CE difference, the DoR in that round is zero; otherwise it equals one, meaning that implied and actual decisions become equivalent. After that, compute the relative proportion of times when rationalized decisions coincide with the actual ones. Increase ε until this proportion reaches the target level. Compute the average of the bounded (by CE difference) ε for the absolute actual welfare costs, and the average of their ratios to CE differences for the relative actual welfare costs.
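A sketch of this actual-choice variant follows, under the assumption (consistent with the description above, though not spelled out in it) that the bounded ε and the ratios are averaged over the rounds where implied and actual decisions disagree.

```python
import numpy as np

def actual_welfare_costs(ce_diff, implied, actual, alpha, d_eps=0.01):
    """Sketch of the alternative algorithm based on actual choices.

    ce_diff : per-round differences in certainty equivalents (positive)
    implied, actual : boolean arrays of model-implied and observed choices
    alpha : target proportion of rationalized choices
    """
    agree = implied == actual
    mistakes = ce_diff[~agree]       # CE differences where the model errs
    if len(mistakes) == 0:           # nothing to rationalize
        return 0.0, 0.0, 0.0
    eps = 0.0
    # A "mistake" counts as rationalized once eps reaches its CE difference
    while (agree.sum() + (eps >= mistakes).sum()) / len(agree) < alpha:
        eps += d_eps
    awc = np.minimum(eps, mistakes).mean()        # bounded absolute costs
    rwc = np.minimum(eps / mistakes, 1.0).mean()  # relative costs
    return eps, awc, rwc
```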

Although this alternative algorithm is almost identical to the previous one, there is a subtle difference that leads us to favor the method described in §2.2, which rationalizes potential choices rather than actual ones. Consequently, we obtain estimates of the potential welfare costs, while the alternative method would give us the actual welfare costs. The key difference between the two methods lies in the fact that the likelihoods of choices represent what could have been chosen if the same options were presented many times. We view the approach of using potential choices as extracting more information from the same data points. The informational gain is obtained through the introduction of a particular structure that describes the choice likelihoods.

Of course, if the two methods gave completely different estimates, one would need a stronger argument in favor of one method against the other. Comparing the potential and actual welfare costs, however, shows that the measures are tightly associated in practice (not reported here). In principle, one could easily substitute one method for the other.

Another alternative method of computing the absolute welfare costs would arise if we reconsidered equation (6), which involves bounding the value of imperfection by the difference in certainty equivalents. This is not required, and we could just as well have computed the unbounded absolute welfare costs.22 One might expect that we would obtain higher estimates of the AWC in that case. Indeed, our calculations show (not reported here) that the unbounded AWC are on average twice as large as the bounded AWC, and the two measures are tightly associated. We prefer to use the bounded measure, however, since it represents only the welfare costs that can potentially be incurred, while the unbounded measure allows wasting more monetary welfare than there actually is.

3 Empirical Analysis

3.1 Data

We present the results for 218 adult Danes, a subsample of a larger ﬁeld study by Harrison,

Jessen, Lau, and Ross (2018). The subjects for the original study were recruited from two

internet-based panels with 165,000 active members combined. The sample frame consisted of

65,592 adult Danes between ages 18 and 75. The sample was stratiﬁed by sex and age across

three regions of Denmark: greater Copenhagen, Jutland, and Funen and Zealand.23 The

completed sample consisted of 8,405 respondents, or 12.8% of the sample frame. Invitations

were sent out by email, and the subjects could participate in a survey using internet-browsers

on their computers or mobile devices. The experiment was implemented as an artefactual

ﬁeld experiment (Harrison and List,2004).

Table 1 provides a summary of the socio-demographic characteristics of our subsample, who were invited to participate in an experiment after completing the online survey. Slightly less than half of the sample were female, and the average age was just under 50 years. The majority of the sample had a college education, and the distribution of income across different income brackets was roughly equal. Most of the participants were either employed as public servants or retired. More than 75% of our sample comes from the greater Copenhagen area.

22 The computation of the RWC must involve bounding, since they represent a fraction that must lie in the unit interval.

23 The greater Copenhagen area was assigned a weight of 50%, and the other two regions were assigned equal weights of 25%.

The subjects made binary choices across 60 pairs of lotteries and answered a set of

demographic questions. Once all the lottery choices were made, one of the choices was

selected randomly for payoﬀ. Table C.1 in the Appendix contains the battery of lotteries

that were given to the subjects. This battery is based on designs by Loomes and Sugden

(1998), Wakker, Erev, and Weber (1994) and Cox and Sadiraj (2008). Together they provide

a powerful test of EUT and RDU.

3.2 Estimation Procedure

Computation of the welfare costs relies on structural estimates of the risk preferences γ and noise μ. We implement the estimation in the standard fashion by maximizing the Bernoulli log-likelihood function at the level of a subject:24

$$
(\hat{\gamma}, \hat{\mu}) = \arg\max_{\gamma,\, \mu} \sum_{j=1}^{n} \left[ y_j \ln p_j(a_2; \gamma, \mu) + (1 - y_j) \ln p_j(a_1; \gamma, \mu) \right],
$$

where y_j ≡ I(a = a_2)_j is an indicator variable that takes a value of 1 whenever alternative a2 is chosen in round j. The alternative a2 is taken to be the one on the right side of the screen without loss of generality, and we no longer assume that it gives the highest aggregate utility in all the rounds.

We assume that the choice probability pj(a2;γ, µ) is given by the strong utility model in

24 We find that the estimation procedure successfully converges for 183 subjects (84% of the sample). For the rest of our subjects, the estimation procedure terminates after a number of iterations and yields the best parameter values at the time of termination. The results in subsequent sections are reported for the full sample of subjects. Using only the subset of subjects with successful convergence yields quantitatively similar results, e.g., compare Figure 4, which uses the full sample, and Figure D.7 (in the Appendix), which uses the subset of subjects.

Table 1: Socio-Demographic Characteristics of the Sample

Characteristic                          Mean
Female                                  0.46
Age                                    48.06
Education
  Vocational training                   0.19
  Low level of formal education         0.21
  College, less than 3 years            0.09
  College, 3 to 4 years                 0.27
  College, 5 or more years              0.24
Annual household income, before tax
  Less than 300k DKK                    0.23
  300k–500k DKK                         0.23
  500k–800k DKK                         0.23
  More than 800k DKK                    0.17
  Not reported                          0.14
Occupation
  Public servant                        0.42
  Student                               0.12
  Unemployed                            0.04
  Retired                               0.23
  Skilled worker                        0.03
  Unskilled worker                      0.06
  Self-employed                         0.06
  Other                                 0.04
Family
  Has children                          0.25
  Lives with a partner                  0.54
Geographic area
  Copenhagen                            0.78
  Central Denmark                       0.07
  Zealand                               0.09
  Southern Denmark                      0.06

the logit form:25

$$
p_j(a_2; \gamma, \mu) = \frac{\exp\left(U_j(a_2; \gamma)/\mu\right)}{\exp\left(U_j(a_2; \gamma)/\mu\right) + \exp\left(U_j(a_1; \gamma)/\mu\right)} = \Lambda\!\left(\frac{U_j(a_2; \gamma) - U_j(a_1; \gamma)}{\mu}\right),
$$

where Λ(·) denotes the logistic cumulative distribution function, and p_j(a_1; γ, μ) = 1 − p_j(a_2; γ, μ).

We also assume that the lotteries are compared according to their expected utilities (dropping the index for the round),

$$
U(a; \gamma) = \sum_{i=1}^{k} q_i(a)\, u(x_i(a); \gamma),
$$

and the u function takes the constant relative risk aversion form:

$$
u(x; \gamma) = \frac{x^{1-\gamma}}{1-\gamma}.
$$
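Putting the pieces together, the following is a minimal sketch of the subject-level maximum likelihood estimation under EUT with CRRA utility and the logit strong utility model. The data are simulated 50/50 lotteries; all prizes, sample sizes, and starting values are illustrative, not taken from the experiment, and the noise parameter is estimated on the log scale for numerical stability.

```python
import numpy as np
from scipy.optimize import minimize

def crra(x, gamma):
    """CRRA utility u(x; gamma) = x^(1-gamma) / (1-gamma)."""
    return x ** (1.0 - gamma) / (1.0 - gamma)

def eu(q, x, gamma):
    """Expected utility of a lottery with probabilities q and prizes x."""
    return (q * crra(x, gamma)).sum(axis=-1)

def neg_loglik(theta, q1, x1, q2, x2, y):
    """Negative Bernoulli log-likelihood of the logit strong utility model."""
    gamma, log_mu = theta
    dU = (eu(q2, x2, gamma) - eu(q1, x1, gamma)) / np.exp(log_mu)
    # ln p(a2) = -log(1 + exp(-dU)); ln p(a1) = -log(1 + exp(dU))
    return (y * np.logaddexp(0.0, -dU) + (1 - y) * np.logaddexp(0.0, dU)).sum()

# Simulate choices from a known (gamma, mu), then recover them by MLE
rng = np.random.default_rng(7)
n = 2000
x1 = rng.uniform(10, 400, size=(n, 2))   # prizes of the left lottery (DKK)
x2 = rng.uniform(10, 400, size=(n, 2))   # prizes of the right lottery (DKK)
q = np.full((n, 2), 0.5)                 # 50/50 lotteries
gamma0, mu0 = 0.5, 2.0
dU0 = (eu(q, x2, gamma0) - eu(q, x1, gamma0)) / mu0
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-dU0))).astype(float)

res = minimize(neg_loglik, x0=np.array([0.3, 0.0]),
               args=(q, x1, q, x2, y), method="Nelder-Mead")
gamma_hat, mu_hat = res.x[0], np.exp(res.x[1])
```

With 2,000 simulated rounds the estimates land close to the true (γ, μ); with the 60 rounds of the actual experiment, sampling error would of course be larger.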

Nothing in our approach relies on assuming an EUT model or a strong utility model in the logit form. In fact, we could have proceeded in the way suggested by Harrison and Ng (2016) and estimated different models for different subjects, classifying our subjects as EUT or RDU, for example. Alternatively, as suggested by Monroe (2017), we could have assumed an RDU model for all the subjects, since correct classification has significant data requirements. Appendix A demonstrates this important generalization by regenerating all results assuming an RDU model of risk preferences, as well as assuming a different utility function, the expo-power utility function, and a different model of stochastic choice, the contextual utility model of Wilcox (2011).

25 One could alternatively use certainty equivalent functions minstead of the aggregate utility functions

Uin the speciﬁcation of the stochastic model, as in Bruhin et al. (2010) or von Gaudecker et al. (2011).

Using this alternative speciﬁcation only changes the scale of the estimates of the noise parameter, and does

not change the estimates of risk parameters or the estimated likelihoods of choosing each alternative. The

algorithm for computing welfare costs and their magnitude would remain the same.


3.3 Welfare Costs

Figure 4 (Panel A) shows the distribution of the individual-level estimates of AWC (in Danish kroner, DKK) and Table 2 (Panel A) presents their summary statistics.26 The distribution of the AWC is composed of two clusters: a major cluster on the left end and a minor cluster on the right end of the support, so that overall the distribution is right-skewed. The major cluster is bell-shaped and fairly symmetric. The minor cluster also appears to be bell-shaped, but the number of observations in it is small. As the target DoR increases, the distribution of the AWC flattens out and slides to the right end of the support.

The AWC are, on average, quite modest in size. For α = 0.95, the mean AWC are only 66.96 DKK (10.04 USD) and the median AWC are even smaller, 58.56 DKK (8.78 USD). For 50% of the subjects, the AWC lie between 38.37 DKK (5.76 USD) and 80.76 DKK (12.11 USD) at this level of DoR. As the target level of DoR increases, the mean AWC also increase, as expected. For α = 0.99, the mean AWC reach 88.66 DKK (13.3 USD), and the median AWC reach 79.44 DKK (11.92 USD).

There is substantial variation among subjects in their AWC. At α = 0.95 the smallest AWC are just 1.24 DKK (0.19 USD), while the maximum AWC are 224.23 DKK (33.63 USD), roughly 3 times as large as the mean AWC. The standard deviation of AWC at this level of DoR is 42.2 DKK (6.33 USD). The variation in AWC increases as the target DoR goes up, which is reflected in higher standard deviations and higher ranges. At α = 0.99 the minimum AWC are still tiny, just 4.56 DKK (0.68 USD), while the maximum AWC become 271.3 DKK (40.69 USD), again roughly 3 times as large as the mean AWC at this level of DoR. The standard deviation reaches 48.05 DKK (7.21 USD) at this level of DoR.

While the variation in the AWC between subjects is substantial, there is also consider-

able uncertainty at the individual level. Figure 5(Panel A) shows the point estimates of

the AWC (for α= 0.95) for each subject along with the 95% conﬁdence interval around

26 As discussed in §2.3, we rationalize the potential choices, which allows us to use a fine grid for the target DoR. If we were rationalizing the actual choices instead, we would have to deal with target DoRs that are fractions of 60 (the number of choice pairs in the experiment).

Figure 4: Distributions of AWC and RWC in the Sample

[Figure: histograms with kernel density estimates of the individual-level AWC (Panel A, in DKK) and RWC (Panel B), one row for each target level α = 0.9, 0.95, 0.99.]

Note: The graph shows the distributions of the individual-level estimates of AWC (Panel A) and RWC (Panel B) for three target levels of α: 0.9, 0.95, and 0.99. The bars are the histograms and the smooth lines are the kernel density estimates. The dashed lines show the medians of the distributions. The AWC numbers are in DKK. For the RWC, we truncate the support at 0.4 to improve readability of the graph. This results in dropping 11 observations for which the RWC are below 0.4.

Table 2: Summary Statistics for AWC and RWC

α      Mean    SD     Min    Q1     Median  Q3      Max
Panel A. AWC (DKK)
0.9    50.10   36.60  0      27.60  41.00   58.90   200.00
0.95   67.00   42.20  1.24   38.40  58.60   80.80   224.00
0.99   88.70   48.00  4.56   56.70  79.40   110.00  271.00
Panel B. RWC
0.9    0.77    0.15   0      0.73   0.81    0.89    0.93
0.95   0.87    0.13   0.18   0.83   0.91    0.94    0.98
0.99   0.95    0.09   0.41   0.95   0.98    0.99    1.00

Notes: The table reports the summary statistics for the three samples of the individual-level estimates of AWC and RWC computed at different target levels of DoR: 0.9, 0.95, and 0.99. The AWC numbers are in DKK. RWC are measured as proportions.

those estimates. The conﬁdence intervals are computed using bootstrap methods. We rank

subjects based on their point estimates of AWC. The vertical axis represents the percentile

rank of each subject. The uncertainty in the individual-level estimates of AWC stems from

the combined uncertainty in the estimates of risk aversion and noise and tends to increase

with the percentile rank.
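The text does not spell out the bootstrap scheme behind the confidence intervals in Figure 5. A generic percentile bootstrap over a subject's rounds would look like the following sketch; the statistic and the resampling unit are assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def percentile_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Resample rounds with replacement and recompute the statistic
    boot = np.array([stat(data[rng.integers(0, n, size=n)])
                     for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [(1.0 - level) / 2.0, (1.0 + level) / 2.0])
    return lo, hi
```

In the paper's setting, `stat` would be the full pipeline from resampled choices to an AWC or RWC estimate, which is what makes the per-subject intervals in Figure 5 costly to compute.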

Figure 4 (Panel B) shows the distribution of the individual-level estimates of RWC and Table 2 (Panel B) presents their summary statistics. At α = 0.9 the distribution is very flat and has a long left tail, resulting in a negative skew. As the target DoR increases, the distribution of the RWC shifts to the right and becomes more concentrated while preserving a long left tail. The distribution features some observations on the left tail with unusually low RWC. For some of these observations, the RWC are less than a half, even for the highest level of α.

In contrast to the AWC, the RWC are extremely high, which implies that while the

AWC are modest in size, these costs represent a substantial portion of the monetary welfare

available in the choice environment. For α= 0.95, the mean RWC are 0.87 and the median

RWC are 0.91, so that around 90% of the relative welfare has to be sacriﬁced in order to


Figure 5: Uncertainty in the Individual-Level Estimates of AWC and RWC

[Figure: point estimates and confidence intervals for AWC (Panel A, in DKK) and RWC (Panel B) plotted against each subject's percentile rank.]

Note: The graph shows the point estimates and the confidence intervals for AWC (Panel A) and RWC (Panel B) computed at α = 0.95. Each point on the graph represents an individual-level estimate for a given subject. The estimates are ranked from lowest to highest, and the percentile rank of each subject is shown on the vertical axis. The horizontal error bars show the bootstrapped 95% confidence intervals around the point estimates.

rationalize this proportion of choices. For 50% of the subjects, the RWC lie within 0.83 and

0.94 at this level of DoR. As the target level of DoR increases, the mean RWC increase even

further. For α= 0.99, the mean RWC reach 0.95, and the median RWC reach 0.98: almost

all the welfare must be sacriﬁced in this case.27

The RWC numbers also show signiﬁcant variation across subjects. At α= 0.95 the

smallest amount of RWC is 0.18, while the maximum amount is 0.98, which is roughly 1.13

times as large as the mean amount. The standard deviation at this level of DoR is 0.13.

At α= 0.99 the minimum RWC is slightly below a half, 0.41, while the maximum amount

becomes 1, and the standard deviation is 0.09.

There is considerable uncertainty in the RWC at the individual level, just as we found

for the AWC. Figure 5(Panel B) shows the point estimates of RWC (at α= 0.95) for each

subject along with the 95% conﬁdence interval around those estimates. Contrary to the

AWC, however, the uncertainty in the RWC is highest at the lower percentile ranks. This

uncertainty tends to decrease with the percentile rank.

The preceding analysis allows us to formulate the following result.

Result 1. The welfare costs are low in terms of everyday economic activity, but are sub-

stantial for the choice environment in which they occurred.

Comparing our results to Choi et al. (2014), we ﬁnd that the subjects in our sample

require a larger fraction of the total monetary welfare to rationalize their choices. This

diﬀerence can be explained by the diﬀerences in methods. The GARP-based measure used

by Choi et al. is well-known for its relatively mild requirements on choice consistency (Beatty

and Crawford,2011).28 For example, their primary measure does not even require choices

to satisfy ﬁrst-order stochastic dominance.

27 Rationalizing all the choices would deﬁnitionally require RWC of 1 for every subject with a non-zero

noise, however small, which is the reason to use 0.99 as the highest level of α, and not 1.

28 Another potential explanation could be that there are systematic differences in the samples used. The Choi et al. (2014) experiment was conducted in the Netherlands, while our experiment was conducted in Denmark. We believe this explanation to be unlikely a priori. The results in Blow et al. (2008) support our claim, employing a revealed preference approach and showing that Danes are generally consistent in other choice domains.

3.4 Marginal Welfare Costs

So far we have looked at the distributions of AWC and RWC for only three levels of the target DoR. This analysis does not tell us how quickly welfare costs grow as the rationalizability requirements become tighter, or, more generally, what shape the costs take as functions of α. Figure 6 provides an answer to these questions by showing the median welfare costs as functions of α across all the subjects in the sample, with the dashed lines corresponding to the 5% and 95% empirical quantiles. The lowest possible target DoR in our context is 0.5, since the choice is binary. However, the welfare costs stay at 0 for the median subject until α crosses the 0.62 mark.

Figure 6: Welfare Costs as Functions of the Target DoR

[Figure: AWC in DKK (Panel A) and RWC (Panel B) plotted against α over the range 0.5 to 1.0.]

Note: The graph shows the AWC (Panel A) and the RWC (Panel B) as functions of the target DoR (α). The black solid lines are the median welfare costs for each level of α. The dashed lines below (above) the solid line represent the 5% (95%) quantiles of the welfare costs.

Panel A of Figure 6 shows the AWC in relation to α. The median AWC tend to be a convex function of α: at first, increasing the DoR requires relatively little AWC, but as the target becomes higher, each additional percentage point of DoR costs more and more in terms of AWC. The graph for the RWC in Panel B of Figure 6 is in a sense the mirror image of the AWC. The RWC tend to be a concave function of α. For small values of the DoR, extra percentage points of change require high welfare costs, but as the target increases these extra points become less costly in relative terms. These observations allow us to formulate


the next result.

Result 2. The marginal absolute (relative) welfare costs are increasing (decreasing) with the

increase in the target DoR.

This result can be explained by the way our measures of welfare costs are computed.

Starting from a given default DoR, ρ(ˆµ, 0), we gradually increase εuntil the DoR reaches

the target, ρ(ˆµ, ε) = α. The low marginal AWC at low αtargets imply there are many choices

that can be easily rationalized by small ε, since the diﬀerence in the certainty equivalents be-

tween the alternatives must be low. At high target DoR more choices have to be rationalized,

but no “easily rationalizable” choices are left. Increasing DoR requires tapping into choice

pairs with higher diﬀerences in certainty equivalents, and hence higher marginal AWC. The

implications for the RWC graphs are the converse. At low target DoR the marginal RWC are

high, since rationalizing many choices with small diﬀerences in certainty equivalents requires

the whole diﬀerence. At high targets fewer such choice pairs remain and the marginal RWC

decrease.

3.5 Relationship Between the Measures

We now turn to the relationship between the two welfare costs measures and the default

DoR (DDoR). We ask whether people with lower AWC also have lower RWC, and formally

test the previous observation that people with lower DDoR tend to have higher costs. The

motivation behind these questions is that it is intuitive to expect the positive relation between

AWC and RWC. It does not follow, however, directly from the method of their computation.

Only if preferences are held constant must higher AWC imply higher RWC, but there is

no such prediction when preferences are not constant, as is typically the case when making

comparisons across subjects. Likewise, even though it is natural to expect that people with

higher DDoR have lower costs we cannot formally expect this observation to hold a priori.

Figure 7(Panel A) shows a scatterplot of the RWC (y-axis) against the AWC (x-axis)


Figure 7: Relationship Between AWC, RWC, and DDoR

[Figure: scatterplots of RWC against AWC (Panel A), AWC against DDoR (Panel B), and RWC against DDoR (Panel C).]

Note: The graph shows the scatterplots between three pairs of measures: AWC, RWC, and DDoR. The welfare costs are evaluated at α = 0.95. The dots represent individual subjects. The dashed lines are the smooth fitted lines estimated using local polynomial regressions, and the shaded regions are the estimated 95% confidence intervals.

computed at α= 0.95.29 A clear positive association between the two measures can be

observed. The Kendall rank correlation between the two measures is 0.45 (the ranking of

subjects by the two measures is the same 73% of the time) and is highly signiﬁcant, with

p-value <0.001. The relation between RWC and AWC is non-linear, and has a concave

shape.

Figure 7(Panel B) shows the scatterplot of the AWC (y-axis) against the DDoR (x-axis),

and conﬁrms our earlier observation from the analysis of marginal costs. There is a moderate

negative association between the DDoR and the AWC. The Kendall rank correlation between

the two measures is −0.5 (the ranking of subjects by the two measures is the opposite 75%

of the time) and is highly signiﬁcant, with p-value <0.001. The relation between them is

again non-linear, and has a convex shape.

Figure 7(Panel C) shows the scatterplot of the RWC (y-axis) against the DDoR (x-axis).

We can immediately see a very tight negative association between the two measures. The

Kendall rank correlation between them is −0.85 (the ranking of subjects by the two measures

is the opposite 93% of the time) and is highly signiﬁcant, with p-value <0.001. The relation

is again slightly non-linear and has a concave shape.

29 The results in this section remain quantitatively similar if we use α= 0.9 or 0.99.
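Throughout, we translate a Kendall rank correlation τ into the share of subject pairs ranked the same way. With continuous measures (no ties), that share is exactly (τ + 1)/2, so τ = 0.45 corresponds to agreement on 72.5% ≈ 73% of pairs. A quick numerical check of this conversion, on simulated data:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(size=500)   # positively associated, no ties

tau, _ = kendalltau(x, y)
agree = (tau + 1.0) / 2.0      # share of pairs ranked the same way

# Direct count of concordant pairs confirms the conversion
i, j = np.triu_indices(len(x), k=1)
share = np.mean(np.sign(x[i] - x[j]) == np.sign(y[i] - y[j]))
```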


These observations allow us to formulate the following result.

Result 3. People with higher absolute welfare costs tend to have higher relative welfare costs.

People with higher default degree of rationalizability tend to have lower absolute welfare costs

and relative welfare costs.

This result implies that there is a certain degree of consistency between the measures we introduce. Moreover, this consistency works in the way we expect. This is a nice property, but it could not have been deduced from the method by which these measures are constructed. If risk preferences were the same across subjects, higher AWC would have implied higher RWC, but we cannot say much about the case when preferences and noise differ across subjects. It is possible that a subject with high AWC has preferences such that the differences in the certainty equivalents are even higher, and the RWC are actually low. We do, in fact, observe such cases. But the general tendency is for the subjects to have the same ordering, whether it is measured according to the absolute or the relative measure of welfare costs.

There is also a negative relationship between the DDoR and the two welfare cost measures. This implies that people who make more consistent choices, as measured by the DDoR, also require lower welfare costs to rationalize their choices. This is an intuitive property, but it is hard to see a priori why it should hold, even though the data indicate that it does, with the relation between the default DoR and the RWC being particularly strong. The relative strength of this relationship, compared to the relationship with the AWC, can be partially attributed to the fact that both the default DoR and the RWC are relative measures defined on the unit interval. Nonetheless, such a strong relationship is remarkable, given that the two measures address two very different questions.

3.6 Welfare Costs and Noise

Our approach is in part motivated by the desire to attach an economic meaning to the noise

parameter. It is, therefore, of interest to look at the relationship between the two welfare


cost measures we introduced and noise, as well as the relationship between the DDoR and

noise. Higher noise does translate into higher welfare costs if preferences are kept constant,

but no prediction is available for comparisons between subjects, whose preferences are not

kept constant. It is natural to expect, however, that this property should also hold between

subjects. Given the negative association between the DDoR and the costs, it is also natural

to expect that higher noise translates into lower default DoR, but whether it does is an

empirical question.

Figure 8: Relationship Between Welfare Costs and Noise

[Figure: scatterplots of AWC (Panel A), RWC (Panel B), and DDoR (Panel C) against the logarithm of noise.]

Note: The graph shows the scatterplots between three measures (AWC, RWC, DDoR) and (the logarithm of) noise. The welfare costs are evaluated at α = 0.95. The dots represent individual subjects. The dashed lines are the smooth fitted lines estimated using local polynomial regressions, and the shaded regions are the estimated 95% confidence intervals. The graphs drop one subject with log noise higher than 15.

Figure 8 shows the scatterplots of (from left to right) the AWC, the RWC, and the DDoR (on the y-axis) against the logarithm of noise (x-axis).30 The three panels confirm our hypotheses. We do see that higher noise is associated with higher AWC and RWC and lower DDoR, although the strength of this association differs across the measures. It is small, though statistically significant, for the AWC. The Kendall rank correlation between the two measures is 0.13 with p-value = 0.003 (the ranking of subjects by the two measures is the same 57% of the time). The weakness of the association can be seen in the substantial variation in the AWC at high values of noise, which means that there are many subjects with high estimates of noise but low AWC. The association with the RWC is much stronger. The Kendall rank correlation between the two measures is 0.48 with p-value < 0.001 (the ranking of subjects by the two measures is the same 74% of the time). Finally, the association with the DDoR (in absolute terms) is slightly weaker than the association with the RWC, but much stronger than the association with the AWC. The Kendall rank correlation between the two measures is −0.41 with p-value < 0.001 (the ranking of subjects by the two measures is the opposite 70% of the time).

30 We truncate the logarithm of noise at 15 in order to make the graph more readable. This excludes one subject.

A notable feature in these results, most pronounced in the relationship between noise

and the DDoR and the RWC, is that there is an outer boundary that constrains the values.

On Panel B of Figure 8this boundary constrains the values of the RWC from above, and

on Panel C of Figure 8this boundary constrains the values of the DDoR from below. This

pattern suggests that for given noise the RWC (DDoR) cannot be higher (lower) than a

certain value, deﬁned by this boundary.

These ﬁndings lead us to the next result.

Result 4. People with higher noise tend to have higher absolute and relative welfare costs and a lower default degree of rationalizability. For any given value of noise, there appears to exist a maximum (minimum) amount of absolute and relative welfare costs (degree of rationalizability) that one can have.

The ﬁrst part of this result conﬁrms our intuitive guesses. We do see some association

between noise and welfare costs, which implies that noise contains some information about

welfare costs and choice consistency, but this information is imprecise. Despite big diﬀerences

in noise estimates between some subjects, their RWC need not be that diﬀerent. Similarly,

some subjects might appear to have high welfare costs based on the noise measure, while in

fact their AWC are not nearly as large.

The second part of the result is unexpected and remarkable. It says that there is a regularity in the relation between noise, welfare costs, and the default DoR. This regularity takes the form of a boundary that constrains the possible values. The existence of such a boundary is likely to be related to the estimation and computation procedures; however, it is not clear why it exists or what determines its shape. We leave this question for further research.

3.7 Socio-Demographic Covariates of Welfare Costs

We have seen that the estimates of welfare costs vary substantially between subjects in our

sample. Here we attempt to attribute this variability to the observable socio-demographic

characteristics of the subjects. We focus on sex, age, education, work, income, housing,

family, and health characteristics. The demographic covariates are deﬁned as indicator

variables, relative to a base category. The base category is male, age 18–29, vocational

training, employed as a student, household income less than 300,000 DKK, living in an

apartment, owning apartment/house, living alone, no children, has not experienced death,

has not been hospitalized, and not smoking.

Figure 9 provides descriptive regression results by plotting the estimates of regression coefficients along with 95% confidence intervals (using robust standard errors). The model in Panel A uses the logarithm of AWC, computed at α = 0.95, as the dependent variable,31

$$
\ln(AWC)_i = \text{constant} + \beta\, \text{Demographic controls}_i + \epsilon_i,
$$

and is estimated using OLS. The model in Panel B uses the RWC, computed at α = 0.95, as the dependent variable. Since the RWC are defined only on the unit interval, we use a fractional regression model due to Papke and Wooldridge (1996) to estimate the coefficients.
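The fractional logit of Papke and Wooldridge (1996) is a quasi-maximum-likelihood estimator: it maximizes a Bernoulli log-likelihood evaluated at the fractional outcome, so only the conditional mean E[y | X] = Λ(Xβ) needs to be correctly specified. The following is a minimal sketch of that estimator (point estimates only, no robust standard errors), not the implementation used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fractional_logit(y, X):
    """Fractional logit by quasi-ML: y in [0, 1], E[y | X] = Lambda(X @ beta)."""
    def qll(beta):
        xb = X @ beta
        # Bernoulli quasi-log-likelihood, valid for fractional y;
        # logaddexp(0, z) = log(1 + exp(z)) computed stably
        return (y * np.logaddexp(0.0, -xb)
                + (1 - y) * np.logaddexp(0.0, xb)).sum()
    res = minimize(qll, np.zeros(X.shape[1]), method="BFGS")
    return res.x
```

Because the objective is the standard convex logit log-likelihood, the estimator is well behaved even though y is a fraction rather than a 0/1 outcome; inference in practice would use robust (sandwich) standard errors, as in the paper.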

Several patterns emerge from Figure 9. Females tend to have higher AWC than males.

The RWC, however, are not signiﬁcantly diﬀerent between males and females. Welfare costs

tend to be higher for older subjects. The AWC tend to increase monotonically with the

age group, but the eﬀect is not precisely estimated. The RWC are higher for subjects older

31 Using alternative target DoR, 0.9 or 0.99, produces quantitatively similar results.


Figure 9: Regression Results

[Figure: coefficient plots for the socio-demographic covariates (Female; Age: 30–39, 40–49, over 50; Education: Low formal, College <3 yrs, College 3–4 yrs, College ≥5 yrs; Employment: Worker, Public servant, Retired; Income: 300k–500k DKK, 500k–800k DKK, over 800k DKK, Not reported; Housetype: House; Ownership: Cooperative, Rented; Civil status: Lives w/partner; Children: Yes; Experienced death: Yes; Hospitalized: Yes; Smoker: Yes). Panel A: Log AWC. Panel B: RWC.]

Note: The graph shows descriptive regression results for the AWC (Panel A, OLS) and RWC (Panel B, fractional regression). Bars correspond to coefficient estimates and error bars show 95% confidence intervals based on robust standard errors. Number of observations: 217.


than 30 years than for younger subjects, but there is no statistically signiﬁcant diﬀerence

between the three age groups above 30 years. College education has a beneﬁcial impact on

the welfare costs relative to vocational training. The eﬀect is most pronounced for the RWC

and subjects with 5 or more years of college. Interestingly, subjects who are employed as

public servants have signiﬁcantly lower AWC and RWC than subjects employed as students

or workers. Retired subjects tend to have lower AWC and RWC, on average, but the eﬀect

is not statistically diﬀerent from zero. The eﬀect of income is mixed and not precisely

estimated. Subjects with medium and medium-high levels of income tend to have higher

welfare costs, while subjects with very high levels of income tend to have lower welfare costs.

The type of housing a subject occupies and the type of ownership do not appear to have

a meaningful impact on welfare costs. Similarly, the eﬀect of parenting status is small and

not statistically significant, except for the effect of having children on the RWC. Subjects show

some systematic variation by their health status with the eﬀects most pronounced for RWC.

For instance, smokers tend to have higher RWC than non-smokers.

These observations lead us to the following result.

Result 5. Having higher welfare costs is associated with higher age, lower education, and

particular employment status. The RWC are not signiﬁcantly diﬀerent for males and females,

although the AWC for females tend to be higher.

Overall, even the rich set of socio-demographic characteristics that we use does little to

explain the observed variance in welfare costs. The regression for the AWC, for instance,

is able to explain only 12% of the observed variation. After correcting for the number

of covariates included, the R² actually becomes negative. Such low explanatory power of

socio-demographic characteristics for elicited economic variables is typical in the literature

(l’Haridon et al., 2018; Noussair et al., 2014; Choi et al., 2014; von Gaudecker et al., 2011).

One explanation for the low predictive power of socio-demographic characteristics is the sampling

error in the estimates on the left-hand side.32 On the other hand, part of the heterogeneity

32 Using weighted OLS in the AWC regression in which weights are proportional to the inverses of the squared standard errors substantially improves fit.


in the estimates of welfare costs that we observe might be truly idiosyncratic, which in our

view is not necessarily an undesirable property as suggested by some, such as l’Haridon et al.

(2018). If an elicited economic quantity (such as welfare costs, in our case) could be perfectly

decomposed into a linear combination of socio-demographic characteristics, this quantity

would have nothing to contribute to explaining variation in other behavioral outcomes.

4 Related literature

Our approach connects to a large theoretical literature on stochastic choice, which we brieﬂy

summarize. The early work on stochastic choice dates back to Fechner (1860) and Thurstone

(1927). It was subsequently developed into the Random Utility Model (RUM) by Marschak

(1960) and summarized by McFadden (2001). Luce (1959) introduced and axiomatized the

strong utility (or multinomial logit) model, as well as other models of stochastic choice.

McFadden (1976) established necessary and suﬃcient conditions under which a RUM is

equivalent to the multinomial logit model.

Wilcox (2011) extends the standard multinomial logit model by allowing for the noise

heterogeneity that is caused by the range of monetary stakes in a choice context. This exten-

sion allows one to preserve the deterministic notion of being more risk averse in a stochastic

setting. The stronger utility model developed by Blavatskyy (2014) also allows for noise

heterogeneity, but focuses on preserving the ﬁrst-order stochastic dominance relation in a

stochastic choice setting. Gul, Natenzon, and Pesendorfer (2014) modify the multinomial

logit model by considering the attributes of choice alternatives rather than alternatives them-

selves, to address some of the criticism of the original formulation. Apesteguia, Ballester,

and Lu (2017) characterize the RUM that satisﬁes a single-crossing property.

Conceptually, our measures are similar to the Critical Cost Eﬃciency Index (CCEI) of

Afriat (1972), which is used to evaluate the degree of consistency with the Generalized

Axiom of Revealed Preference (GARP). Just like our relative cost measure, CCEI is deﬁned



on the unit interval, and its complement shows what proportion of monetary value an agent

should be allowed to waste in order to rationalize her choices by some utility function. While

GARP provides qualitative statements, we add structure to it in a flexible manner in order to

complement it with quantitative evidence.
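To make the CCEI concrete, here is an illustrative sketch for two-good budget data (not code from Afriat (1972) or any study cited here): choices satisfy GARP at efficiency level e if no bundle is revealed preferred, directly or transitively after scaling each own expenditure by e, to a bundle that is strictly directly revealed preferred to it; the CCEI is the largest such e, found by binary search. The prices and bundles below are hypothetical.

```python
import numpy as np

def garp_satisfied(p, x, e):
    """GARP at efficiency e: no pair (i, j) with i revealed preferred to j
    (directly or transitively) while j is strictly directly revealed
    preferred to i, after scaling each own expenditure by e."""
    n = len(p)
    cost = p @ x.T                      # cost[i, j] = p_i . x_j
    own = np.diag(cost)                 # own[i] = p_i . x_i
    R = e * own[:, None] >= cost        # direct revealed preference at level e
    for k in range(n):                  # transitive closure (Floyd-Warshall)
        R = R | (R[:, [k]] & R[[k], :])
    P0 = e * own[:, None] > cost        # strict direct revealed preference
    return not np.any(R & P0.T)

def ccei(p, x, tol=1e-6):
    """Afriat's Critical Cost Efficiency Index via binary search on e."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if garp_satisfied(p, x, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical two-good data with a WARP violation: each chosen bundle is
# strictly cheaper than the chosen bundle at the other observation's prices.
p = np.array([[1.0, 2.0], [2.0, 1.0]])
x = np.array([[1.0, 4.0], [4.0, 1.0]])
print(round(ccei(p, x), 3))   # ≈ 0.667
```

In this toy example the complement 1 − CCEI ≈ 0.333 is the proportion of monetary value the agent must be allowed to waste, which is the quantity our relative measure parallels.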

Viewing our approach as a structural extension of GARP allows us to position it once more

in a broader methodological setting. Ross (2014, ch. 4) carefully lays out

the full case for interpreting economic experimentation as an application of the intentional

stance of Dennett (1987), noted earlier. This is the methodology that Ross (2014) calls

“neo-Samuelsonian,” a label that tries to nudge economists toward seeing that the inten-

tional stance is what they have always been doing when they applied Revealed Preference

Theory to actual, ﬁnite, choice data. In other words: our approach is not novel, exotic

economic methodology. Instead we view it as just a sophisticated, structural interpretation

of the good old-time religion for economists.

The intuition behind the computation of our measures also links it to a literature on payoﬀ

dominance in experiments (Harrison, 1994, 1992; Harrison and Morgan, 1990; Harrison,

1989). This literature shows that allowing for small deviations from optimal behavior, just

as we do, allows one to rationalize supposedly anomalous eﬀects observed in experimental

studies.

Harrison and Ng (2016, 2018) use an approach similar to ours in order to evaluate the loss

of consumer surplus resulting from suboptimal insurance choices. Harrison and Ross (2018)

apply the same approach to evaluate suboptimal portfolio investments. Their measure of

lost consumer surplus is similar to our AWC measure, with both being based on computing

certainty equivalents. One key diﬀerence, however, is that these studies use two experimental

tasks: one for preference estimation and the other for welfare evaluation, while we rely on a

single task to estimate welfare costs resulting from stochastic choice. The approach that we

take in this study does not rely on an independent risk metric, as is the case in Harrison and

Ng (2016, 2018) and Harrison and Ross (2018), but rather relies on a specific noise structure


to “bootstrap” a measure of welfare costs.

Our approach is related to studies that estimate structural models of choice under risk

and over time. Holt and Laury (2002) study subjects’ choices under risk in a laboratory

experiment. Subjects make choices between a “safe” and a “risky” lottery across diﬀerent

pairs of lotteries, in which the probabilities of lottery outcomes vary from one pair to the

next. Holt and Laury estimate the Expected Utility model with a flexible Expo-Power utility function

using the strict utility model of stochastic choice.33 Andersen, Harrison, Lau, and Rutström

(2008) also use the strict utility model to structurally estimate risk and time preferences

of a representative sample of the Danish population. They note that noise estimates are

higher in the risk task than in the discounting task. von Gaudecker et al. (2011) use a

representative sample of the Dutch population to estimate subjects’ risk preferences using

a model of stochastic choice that is a hybrid between the multinomial logit and tremble

models, and features two measures of choice randomness due to noise: a Fechner parameter

that is common to all subjects, and a “tremble” parameter that is speciﬁc to each subject.34
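A minimal sketch of such a hybrid choice probability, assuming the common form in which a tremble probability ω mixes a Fechner-logit response with uniform randomization (our paraphrase of the general structure, not the authors' exact specification):

```python
import math

def choice_prob_hybrid(dU, mu, omega):
    """P(choose a2): Fechner-logit core with a symmetric tremble.
    dU = U(a2) - U(a1); mu > 0 is the Fechner noise; omega in [0, 1] is
    the tremble probability (choice is a coin flip with probability omega)."""
    logistic = 1.0 / (1.0 + math.exp(-dU / mu))
    return (1.0 - omega) * logistic + 0.5 * omega

print(choice_prob_hybrid(0.5, 0.25, 0.0))  # plain logit, ≈ 0.881
print(choice_prob_hybrid(0.5, 0.25, 1.0))  # pure tremble, exactly 0.5
```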

While these studies typically focus on estimates of risk and time preferences, and do not

interpret the estimates of the stochastic part, we explicitly focus on the estimates of the

stochastic part and provide a systematic approach to economically interpret the estimates

of choice randomness. Finally, Bland (2018) considers mixture speciﬁcation over pooled

choices, contrasting one “rational” model as one of the data generating processes (DGP)

with a “behavioral” model as the other DGP. He then calculates the certainty equivalents (CE) of choices using the

deterministic core of the “rational” model DGP, thereby evaluating potential welfare losses

from using a “behavioral” DGP, as well as the existence of noise for both DGPs. We reject

the simplistic identiﬁcation of one model as “rational” and the explicit assumption that the

“behavioral” model is therefore “irrational.” But the general logic of allowing the estimated

33 The strict utility model of Luce (1959) differs from the multinomial logit model in the way the noise

parameter enters choice likelihoods.

34 Their model also allows for random coeﬃcients on core parameters for risk preferences. Although this

speciﬁcation allows for stochastic variation in those parameters, it is conventionally interpreted as reﬂecting

unobserved heterogeneity in these parameters across subjects, “unobservable” only in the sense that the

researcher cannot account for the variation with observable characteristics of the subject or task.


structural model of noise to provide a basis for welfare evaluation is consistent with our

approach.

We provide economic measures of choice randomness (or consistency), which link to

studies on the quality of decision-making. Choi et al. (2007) study decision-making under

risk in a laboratory experiment in which they present subjects with convex budget sets

for two Arrow securities. This design allows them to gauge the subjects’ decision-making

quality using a measure of GARP-consistency, a standard technique in the revealed preference

approach to consumer demand. They ﬁnd that subjects’ behavior is highly consistent with

GARP. Choi et al. (2014) expand the analysis by using a representative panel of the Dutch

population. They also ﬁnd a high degree of GARP-consistency in risky choices, which

varies, however, with education, sex, and age. Beatty and Crawford (2011) show that while

behavior in a wide range of situations is highly GARP-consistent, this might be a result of

a misspeciﬁed measure of consistency. They propose an alternative to the traditional CCEI

measure, which is based on predictive success, and show that the CCEI measures of GARP-

consistency are overinﬂated, and hence that the actual consistency of choices is much lower.

Hey (2001) studies decision-making quality in a laboratory experiment on choice under risk

and asks whether choice consistency improves with experience. He ﬁnds mixed evidence of

a positive eﬀect of experience on choice consistency. We rely on a parametric measure of

choice consistency and ﬁnd a lower degree of consistency than in the studies that use the

non-parametric revealed preference approach.

Finally, our approach is also related to recent literature on rational inattention. Matějka

and McKay (2015) show that when an agent faces information costs, optimal behavior is

stochastic choice, and that under certain conditions choice likelihoods are represented by

the multinomial logit speciﬁcation. Cheremukhin, Popova, and Tutino (2015) apply a model

of rational inattention to risky choices and estimate the shape of the cost-of-information

function in a laboratory experiment with student subjects. Caplin and Dean (2015) develop

a revealed preference test of rational inattention theories with general cost-of-information


functions. Since the noise parameter in the rational inattention models has the interpretation

of marginal information costs, our method allows one to convert these costs into monetary

or percentage terms.

5 Conclusion

Stochastic choice has become an active area of both theoretical and empirical research. While

the existing literature mainly focuses on the sources of choice randomness, its economic

consequences are less well understood. We develop tools to assess the economic signiﬁcance

of noise and apply them to a sample from the general Danish population in an artefactual

ﬁeld experiment.

We introduce three interconnected concepts: rationalizing imperfection, optimal region,

and degree of rationalizability. Fixing the degree of rationalizability at a certain target

level, we vary the amount of imperfection, which in turn aﬀects the optimal region, to

make the proportion of subjects’ choices falling in the optimal region equal the target level.

This amount of imperfection represents the welfare costs, or the monetary welfare that must be

allowed to be wasted, in order for a model to rationalize a given proportion of choices. The resulting

welfare costs can be expressed both in absolute (dollar) and in relative (to the actual stakes

of the choice environment) terms.

We compute the absolute welfare costs and relative welfare costs at the individual level

in an experiment with binary-choice lotteries. Several patterns emerge from our analysis,

some of which coincide with previous ﬁndings, and some of which are new. We ﬁnd that

the AWC are not economically signiﬁcant in our sample, while the RWC are economically

signiﬁcant. In other words, the welfare costs are tiny if viewed from a broad perspective of

economic activity, but they are substantial if viewed from the perspective of this particular

choice experiment. As compared to Choi et al. (2014), who employ a relative measure based

on the consistency with GARP, our estimates of RWC are much larger. We attribute the


diﬀerence in results to the diﬀerence in the methods, with our method imposing stricter

requirements.

Since our welfare cost measures depend on the target level of rationalizability α, we study

the shape of the relation between α and these welfare costs. We find that the AWC increase in

α at an increasing speed, while the RWC increase in α at a decreasing speed. The difference in

these two relations is explained by the way our method of computation works. Subjects with

higher AWC tend to have higher RWC. Also, a lower DDoR is associated with higher AWC

and RWC: subjects who start out with low default degree of rationalizability require a higher

cost to reach a given degree of rationalizability. Looking at the relationship between our

cost measures and raw estimates of noise reveals that they are positively associated, though

our measures do not have such a wide range, which allows for sensible comparisons across

subjects and allows us to make judgments about the magnitudes of choice inconsistencies.

The analysis of observable heterogeneity and its role in predicting welfare costs suggests

patterns similar to those reported by Choi et al. (2014). We ﬁnd that welfare costs increase

with age, decline with education, and are lower for certain occupations.

Finally, we take seriously the need for consistent methodological and philosophical po-

sitions when it comes to undertaking behavioral welfare economics. The reason is simple:

one cannot question the consistency of observed choices by agents on the one hand and then

turn around and eﬀortlessly infer the preferences of those agents on the other hand. This

isolates the deep normative challenge raised by the core descriptive insight of behavioral

economics, as stressed by Ross (2014, ch. 4), Infante, Lecouteux, and Sugden (2016), and

Harrison and Ross (2018, §5). Dennett’s (1987) intentional stance, as applied to economics

by Ross’s (2014) “neo-Samuelsonian” methodology, provides a general and consistent ap-

proach to address this challenge, and permits concrete applications illustrated by Harrison

and Ng (2016, 2018), Harrison and Ross (2018), and the present study.


References

Afriat SN (1972). “Efficiency Estimation of Production Functions.” International Economic

Review,13(3), 568–98.

Agranov M, Ortoleva P (2017). “Stochastic Choice and Preferences for Randomization.”

Journal of Political Economy,125(1), 40–68.

Andersen S, Harrison GW, Lau MI, Rutström EE (2008). “Eliciting Risk and Time Prefer-

ences.” Econometrica,76(3), 583–618.

Apesteguia J, Ballester MA, Lu J (2017). “Single-Crossing Random Utility Models.” Econo-

metrica,85(2), 661–674.

Ballinger TP, Wilcox NT (1997). “Decisions, Error and Heterogeneity.” Economic Journal,

107(443), 1090–1105.

Beatty TKM, Crawford IA (2011). “How Demanding Is the Revealed Preference Approach

to Demand?” American Economic Review,101(6), 2782–95.

Bland JR (2018). “The Cost of Being Behavioral in Risky Choice Experiments.” Working

paper, The University of Toledo, Department of Economics.

Blavatskyy PR (2014). “Stronger Utility.” Theory and Decision,76(2), 265–286.

Blow L, Browning M, Crawford I (2008). “Revealed Preference Analysis of Characteristics

Models.” Review of Economic Studies,75(2), 371–389.

Bone J, Hey J, Suckling J (1999). “Are Groups More (or Less) Consistent Than Individuals?”

Journal of Risk and Uncertainty,18(1), 63–81.

Bruhin A, Fehr-Duda H, Epper T (2010). “Risk and Rationality: Uncovering Heterogeneity

in Probability Distortion.” Econometrica,78(4), 1375–1412.

Camerer CF (1989). “An Experimental Test of Several Generalized Utility Theories.” Journal

of Risk and Uncertainty,2(1), 61–104.

Caplin A, Dean M (2015). “Revealed Preference, Rational Inattention, and Costly Informa-

tion Acquisition.” American Economic Review,105(7), 2183–2203.

Carbone E, Hey JD (2000). “Which Error Story is Best?” Journal of Risk and Uncertainty,

20(2), 161–176.

Cheremukhin A, Popova A, Tutino A (2015). “A Theory of Discrete Choice with Information

Costs.” Journal of Economic Behavior & Organization,113, 34–50.

Choi S, Fisman R, Gale D, Kariv S (2007). “Consistency and Heterogeneity of Individual

Behavior under Uncertainty.” American Economic Review,97(5), 1921–1938.


Choi S, Kariv S, Müller W, Silverman D (2014). “Who Is (More) Rational?” American

Economic Review,104(6), 1518–1550.

Cox JC, Sadiraj V (2008). “Risky Decisions in the Large and in the Small: Theory and

Experiment.” In JC Cox, GW Harrison (eds.), Risk Aversion in Experiments (Bingley,

UK: Emerald, Research in Experimental Economics, Volume 12, 2008).

Dennett DC (1987). The Intentional Stance. Cambridge, MA: MIT Press.

Dennett DC (1991). “Real Patterns.” The Journal of Philosophy,88(1), 27–51.

Fechner GT (1860). Elements of Psychophysics. Amsterdam: Bonset.

Gneezy U, Potters J (1997). “An Experiment on Risk Taking and Evaluation Periods.” The

Quarterly Journal of Economics,112(2), 631–645.

Gul F, Natenzon P, Pesendorfer W (2014). “Random Choice as Behavioral Optimization.”

Econometrica,82(5), 1873–1912.

Gul F, Pesendorfer W (2006). “Random Expected Utility.” Econometrica,74(1), 121–146.

Halevy Y, Persitz D, Zrill L (2018). “Parametric Recoverability of Preferences.” Journal of

Political Economy,126(4), 1558–1593.

Harless DW, Camerer CF (1994). “The Predictive Utility of Generalized Expected Utility

Theories.” Econometrica,62(6), 1251–1289.

Harrison GW (1989). “Theory and Misbehavior of First-Price Auctions.” American Eco-

nomic Review,79(4), 749–762.

Harrison GW (1992). “Theory and Misbehavior of First-Price Auctions: Reply.” American

Economic Review,82(5), 1426–1443.

Harrison GW (1994). “Expected Utility Theory and the Experimentalists.” Empirical Eco-

nomics,19, 223–253.

Harrison GW, Jessen LJ, Lau M, Ross D (2018). “Disordered Gambling Prevalence: Method-

ological Innovations in a General Danish Population Survey.” Journal of Gambling Studies,

34(1), 225–253.

Harrison GW, List JA (2004). “Field Experiments.” Journal of Economic Literature,42(4),

1009–1055.

Harrison GW, Morgan P (1990). “Search Intensity in Experiments.” Economic Journal,

100(401), 478–486.

Harrison GW, Ng JM (2016). “Evaluating the Expected Welfare Gain from Insurance.”

Journal of Risk and Insurance,83(1), 91–120.

Harrison GW, Ng JM (2018). “Welfare eﬀects of insurance contract non-performance.” The

Geneva Risk and Insurance Review,43(1), 39–76.


Harrison GW, Ross D (2018). “Varieties of Paternalism and the Heterogeneity of Utility

Structures.” Journal of Economic Methodology,25(1), 42–67.

Hey JD (2001). “Does Repetition Improve Consistency?” Experimental Economics,4(1),

5–54.

Hey JD, Orme C (1994). “Investigating Generalizations of Expected Utility Theory Using

Experimental Data.” Econometrica, pp. 1291–1326.

Holt CA, Laury SK (2002). “Risk Aversion and Incentive Eﬀects.” American Economic

Review,92(5), 1644–1655.

Infante G, Lecouteux G, Sugden R (2016). “Preference Puriﬁcation and the Inner Ratio-

nal Agent: A Critique of the Conventional Wisdom of Behavioural Welfare Economics.”

Journal of Economic Methodology,23(1), 1–25.

l’Haridon O, Vieider FM, Aycinena D, Bandur A, Belianin A, Cingl L, Kothiyal A, Martins-

son P (2018). “Oﬀ the Charts: Massive Unexplained Heterogeneity in a Global Study of

Ambiguity Attitudes.” The Review of Economics and Statistics,100(4), 664–677.

Loomes G, Sugden R (1995). “Incorporating a Stochastic Element into Decision Theories.”

European Economic Review,39(3), 641–648.

Loomes G, Sugden R (1998). “Testing Diﬀerent Stochastic Speciﬁcations of Risky Choice.”

Economica,65(260), 581–598.

Luce DR (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley, New York.

Manzini P, Mariotti M (2014). “Welfare Economics and Bounded Rationality: The Case for

Model-Based Approaches.” Journal of Economic Methodology,21(4), 343–360.

Marschak J (1960). “Binary-Choice Constraints and Random Utility Indicators.” In K Arrow

(ed.), Stanford Symposium on Mathematical Methods in the Social Sciences (Stanford, CA:

Stanford University Press).

Matějka F, McKay A (2015). “Rational Inattention to Discrete Choices: A New Foundation

for the Multinomial Logit Model.” American Economic Review,105(1), 272–98.

McFadden DL (1976). “Quantal Choice Analysis: A Survey.” In SV Berg (ed.), Annals of

Economic and Social Measurement (pp. 363–390). NBER.

McFadden DL (2001). “Economic Choices.” American Economic Review,91(3), 351–378.

McKelvey RD, Palfrey TR (1995). “Quantal Response Equilibria for Normal Form Games.”

Games and Economic Behavior,10(1), 6–38.

Monroe BA (2017). Stochastic Models in Experimental Economics. Ph.D. thesis, University

of Cape Town.


Nogee P, Mosteller F (1951). “An Experimental Measure of Utility.” Journal of Political

Economy,59, 371–404.

Noussair CN, Trautmann ST, van de Kuilen G (2014). “Higher Order Risk Attitudes,

Demographics, and Financial Decisions.” The Review of Economic Studies,81(1), 325–

355.

Papke LE, Wooldridge JM (1996). “Econometric Methods for Fractional Response Variables

with an Application to 401 (K) Plan Participation Rates.” Journal of Applied Economet-

rics,11(6), 619–32.

Prelec D (1998). “The Probability Weighting Function.” Econometrica,66(3), 497–528.

Quiggin J (1982). “A Theory of Anticipated Utility.” Journal of Economic Behavior &

Organization,3(4), 323–343.

Ross D (2014). Philosophy of Economics. London: Palgrave Macmillan.

Starmer C, Sugden R (1989). “Probability and Juxtaposition Eﬀects: An Experimental

Investigation of the Common Ratio Eﬀect.” Journal of Risk and Uncertainty,2(2), 159–

78.

Swait J, Marley AAJ (2013). “Probabilistic Choice (Models) as a Result of Balancing Mul-

tiple Goals.” Journal of Mathematical Psychology,57(1–2), 1–14.

Thurstone LL (1927). “A Law Of Comparative Judgment.” Psychological Review,34(4),

266–270.

Tversky A (1969). “Intransitivity of Preferences.” Psychological Review,76(1), 31.

von Gaudecker HM, van Soest A, Wengstrom E (2011). “Heterogeneity in Risky Choice

Behavior in a Broad Population.” American Economic Review,101(2), 664–94.

Wakker P, Erev I, Weber EU (1994). “Comonotonic Independence: The Critical Test Be-

tween Classical and Rank-Dependent Utility Theories.” Journal of Risk and Uncertainty,

9(3), 195–230.

Wallin A, Swait J, Marley AAJ (2018). “Not Just Noise: A Goal Pursuit Interpretation of

Stochastic Choice.” Decision,5(4), 253–271.

Wilcox NT (2008). “Stochastic Models for Binary Discrete Choice Under Risk: A Critical

Primer and Econometric Comparison.” In JC Cox, GW Harrison (eds.), Risk Aversion

in Experiments (Bingley, UK: Emerald, Research in Experimental Economics, Volume 12,

2008).

Wilcox NT (2011). “Stochastically More Risk Averse: A Contextual Theory of Stochastic

Discrete Choice Under Risk.” Journal of Econometrics,162(1), 89–104.

Wilcox NT (2015). “Error and Generalization in Discrete Choice Under Risk.” Working

paper, Chapman University.


Appendices

A Robustness Checks

Here we present additional results derived from alternative assumptions about risk prefer-

ences and stochastic choice.

First, we consider an alternative to the EUT, the Rank-Dependent Utility (RDU) model

due to Quiggin (1982), which allows for probability weighting. The RDU model has been

used extensively in applied and theoretical work. Under this alternative assumption the

aggregate utilities of the lotteries are computed as

$$U(a;\gamma_u,\gamma_q)=\sum_{i=1}^{k}\left[\omega\!\left(q_{(1)}(a)+\dots+q_{(i)}(a);\gamma_q\right)-\omega\!\left(q_{(1)}(a)+\dots+q_{(i-1)}(a);\gamma_q\right)\right]\times u\!\left(x_{(i)}(a);\gamma_u\right),$$

where $\omega: [0,1] \mapsto [0,1]$ is the probability-weighting function, and outcomes are ranked

from highest $x_{(1)}$ to lowest $x_{(k)}$, with corresponding probabilities. We assume that $\omega$ is the

two-parameter (Prelec, 1998) probability weighting function,35

$$\omega(q;\gamma^1_q,\gamma^2_q)=\exp\!\left(-\gamma^2_q(-\ln q)^{\gamma^1_q}\right).$$
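As an illustration of how an RDU lottery value is computed from these two equations, the sketch below ranks outcomes from highest to lowest, forms Prelec decision weights from cumulative probabilities, and uses a hypothetical square-root utility (the utility function and parameter values are assumptions for illustration only). With $\gamma^1_q = \gamma^2_q = 1$ the weighting function is the identity and RDU reduces to expected utility.

```python
import math

def prelec_w(q, g1, g2):
    """Two-parameter Prelec weighting: w(q) = exp(-g2 * (-ln q)**g1)."""
    if q <= 0.0:
        return 0.0
    return math.exp(-g2 * (-math.log(q)) ** g1)

def rdu_value(outcomes, probs, u, g1, g2):
    """RDU of a lottery: the decision weight of the i-th ranked outcome is
    w(q_(1) + ... + q_(i)) - w(q_(1) + ... + q_(i-1))."""
    ranked = sorted(zip(outcomes, probs), key=lambda t: -t[0])
    total, cum, w_prev = 0.0, 0.0, 0.0
    for x, q in ranked:
        cum += q
        w_cum = prelec_w(cum, g1, g2)
        total += (w_cum - w_prev) * u(x)
        w_prev = w_cum
    return total

u = lambda x: x ** 0.5  # hypothetical utility, for illustration only
# With g1 = g2 = 1 the weights are the identity: RDU = EU = 0.5 * 10 = 5.
eu = rdu_value([100.0, 0.0], [0.5, 0.5], u, 1.0, 1.0)
print(eu)
```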

Figure A.1 shows the calculated absolute and relative welfare costs under the assumption

of the RDU model for each individual. Figure A.1 shows that the distributions look very

similar to those under EUT, Figure 4.

Taking a closer look at the diﬀerences between the EUT and RDU-based calculations,

we can see from Figure A.2a that the AWC calculated using the EUT model are lower. For

35 We do not restrict the shape parameter $\gamma^1_q$ to the unit interval, and thus do not impose an

inverse-S shape on the probability weighting function.


[Figure: density plots of (a) the absolute welfare costs and (b) the relative welfare costs for α ∈ {0.9, 0.95, 0.99}, under RDU.]

Figure A.1: Absolute and Relative Welfare Costs for Three Levels of α, RDU.

α = 0.9 the difference in the medians between the AWC calculated using EUT vs. RDU is

−59.04 (Wilcoxon signed rank test, p-value < 0.001). The mean of the differences is −84.89

DKK (approximately −13 USD): RDU-based AWC are almost 3 times higher on average.

The RWC, however, are slightly higher under EUT, as shown in Figure A.2b. The

diﬀerence in the medians between the RWC calculated using EUT vs. RDU is 0.02 (Wilcoxon

signed rank test, p-value = 0.02). The mean of the diﬀerences is 0.03. The diﬀerence in the

RWC for RDU and EUT disappears at higher values of α, while the difference in the AWC

persists. All the other qualitative results on marginal welfare costs, relations between the

measures, and observable heterogeneity hold under the RDU assumption.
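The paired comparisons reported here rely on the Wilcoxon signed-rank test. The following self-contained sketch (with synthetic data and a normal approximation; an illustration of the test, not the authors' code) shows the mechanics:

```python
import math
import numpy as np

def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test, normal approximation.
    Zero differences are dropped; ties in |d| receive average ranks."""
    d = np.asarray(x, float) - np.asarray(y, float)
    d = d[d != 0.0]
    n = len(d)
    absd = np.abs(d)
    ranks = np.empty(n)
    ranks[np.argsort(absd)] = np.arange(1, n + 1)
    for v in np.unique(absd):            # average ranks over ties
        ranks[absd == v] = ranks[absd == v].mean()
    w_pos = ranks[d > 0.0].sum()         # sum of positive-difference ranks
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_pos - mean) / sd
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return w_pos, p

# Synthetic paired welfare costs where the second series is higher by design.
rng = np.random.default_rng(1)
awc_eut = rng.gamma(2.0, 30.0, 100)
awc_rdu = awc_eut + rng.gamma(2.0, 40.0, 100)
w_stat, p_val = wilcoxon_signed_rank(awc_eut, awc_rdu)
print(w_stat, p_val)   # every difference is negative, so p is essentially 0
```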

[Figure: density plots of (a) the absolute and (b) the relative welfare costs under EUT vs. RDU.]

Figure A.2: Absolute and Relative Welfare Costs for EUT vs. RDU, α = 0.9.


Second, we consider a different specification for the utility function under EUT, an expo-

power (EP) utility, which generalizes the CRRA and CARA utility functions,

$$u(x;\gamma_a,\gamma_r)=\frac{1-\exp(-\gamma_a x^{1-\gamma_r})}{\gamma_a},$$

where $\gamma_a$ and $\gamma_r$ are the two parameters to be estimated. This specification does not do so

well in modeling subjects’ risk preferences in our data. For a large (40%) fraction of subjects

the estimation procedure yields unreasonably high parameter values, which impedes the

calculation of certainty equivalents and welfare costs. We use the CRRA specification for these

subjects when presenting the results in Figure A.3. The results look very similar to the baseline

specification with the CRRA utility function.
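The EP function is straightforward to compute; the sketch below (with illustrative, assumed parameter values) also checks the limiting behavior: as $\gamma_a \to 0$, EP converges to $x^{1-\gamma_r}$, which is CRRA utility up to an affine transformation.

```python
import math

def ep_utility(x, ga, gr):
    """Expo-power utility u(x) = (1 - exp(-ga * x**(1 - gr))) / ga."""
    return (1.0 - math.exp(-ga * x ** (1.0 - gr))) / ga

print(ep_utility(100.0, 0.5, 0.5))  # ≈ 1.9865 for these illustrative parameters

# Limiting behavior: for ga near zero, EP is approximately x**(1 - gr),
# i.e. CRRA utility up to an affine transformation.
for x in (10.0, 100.0):
    assert abs(ep_utility(x, 1e-8, 0.5) - x ** 0.5) < 1e-4
```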

[Figure: density plots of (a) the absolute and (b) the relative welfare costs for α ∈ {0.9, 0.95, 0.99}, EP utility.]

Figure A.3: Absolute and Relative Welfare Costs for Three Levels of α, EP.

Looking at the diﬀerences between the AWC calculated under the two utility speciﬁca-

tions, our baseline speciﬁcation again provides lower values (see Figure A.4a). The diﬀerence

in the medians between the AWC calculated using CRRA vs. EP is −16.17 (Wilcoxon signed

rank test, p-value < 0.001), for α = 0.9. The mean of the differences is −24.7 DKK (approximately

−4 USD). The AWC under the EP utility function are roughly 60% higher than in

the baseline, which is even higher than in the case of the RDU model as an alternative.

At the same time there are no signiﬁcant diﬀerences in the RWC between the two utility

specifications (Wilcoxon signed rank test, p-value = 0.52). The same pattern of results holds


for other values of α. Under the EP-utility assumption the marginal welfare costs have a

similar shape, but the association between the measures becomes weaker, as do the eﬀects

of observable heterogeneity.

[Figure: density plots of (a) the absolute and (b) the relative welfare costs under CRRA vs. EP utility.]

Figure A.4: Absolute and Relative Welfare Costs for CRRA vs. EP, α = 0.9.

Finally, we look at an alternative stochastic choice speciﬁcation, the contextual utility

model due to Wilcox (2011), which allows for a heterogeneous noise term and preserves the

“more risk averse” relation in the stochastic domain. This speciﬁcation of noise has been

shown by Wilcox (2015) to have good out-of-sample predictive power. Under the assumption

of contextual utility the choice probabilities become

$$p(a_2;\gamma,\mu)=\Lambda\!\left(\frac{U(a_2;\gamma)-U(a_1;\gamma)}{\mu\left[u(x_{(1)};\gamma)-u(x_{(k)};\gamma)\right]}\right),$$

where we drop the index for the decision round, and $p(a_1;\gamma,\mu) = 1 - p(a_2;\gamma,\mu)$. As before,

$x_{(1)}$ and $x_{(k)}$ denote the highest and lowest outcomes, but this time they are defined only

among the outcomes that occur with positive probabilities, and outcomes are ranked across

both lotteries in the choice.
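A minimal sketch of this choice probability (the numbers are illustrative assumptions): holding the utility difference fixed, a wider utility range for the outcomes in the pair scales up the effective noise and pushes the choice probability toward 1/2.

```python
import math

def contextual_choice_prob(U2, U1, u_best, u_worst, mu):
    """P(choose a2) under contextual utility: the Fechner noise mu is
    scaled by the utility range of the outcomes in the choice pair."""
    scale = mu * (u_best - u_worst)
    return 1.0 / (1.0 + math.exp(-(U2 - U1) / scale))

# Same utility difference, wider utility range => choice closer to 50/50.
p_narrow = contextual_choice_prob(0.6, 0.5, 1.0, 0.0, 0.2)
p_wide = contextual_choice_prob(0.6, 0.5, 2.0, 0.0, 0.2)
print(p_narrow, p_wide)   # ≈ 0.622 and ≈ 0.562
```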

Figure A.5 shows the calculated AWC and RWC under the assumption of contextual

utility. These graphs, again, look very similar to those under EUT and no contextual utility

(Figure 4), except that the right tails in the distributions of the AWC become thicker.


[Figure: density plots of (a) the absolute and (b) the relative welfare costs for α ∈ {0.9, 0.95, 0.99}, contextual utility.]

Figure A.5: Absolute and Relative Welfare Costs for Three Levels of α, Contextual Utility.

Figure A.6 contrasts the AWC and RWC for the baseline and alternative specifications of noise. The densities of the AWC are very much alike, except for a thicker right tail in the case of contextual utility, which leads to higher welfare costs. The difference in the medians between the AWC calculated using the non-contextual and contextual models is −1.87 DKK (Wilcoxon signed rank test, p-value < 0.001). The mean of the differences is −16.33 DKK (approximately −2 USD). This result is comparable to the non-contextual noise specification with RDU as an alternative. Again, there is no significant difference between the RWC for these two models (Wilcoxon signed rank test, p-value ≈ 0.47). All the results reported for the baseline model hold in the case of the contextual utility model as well.
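The paired comparisons reported throughout this appendix rely on the Wilcoxon signed-rank test. As a reference, a self-contained sketch of the test using the large-sample normal approximation (omitting tie and continuity corrections, a simplification) might look like:

```python
import math
from statistics import NormalDist

def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test, large-sample normal approximation.

    Compares paired samples x and y; zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        return 1.0
    # rank the absolute differences, averaging ranks within ties
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

In practice one would use a full implementation (e.g. `scipy.stats.wilcoxon`), which adds exact small-sample p-values and tie corrections; the sketch above only shows the mechanics behind the test statistic.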

[Figure A.6 here. Panel (a): densities of absolute welfare costs (DKK) for the contextual and non-contextual models. Panel (b): densities of relative welfare costs for the same two models.]

Figure A.6: Absolute and Relative Welfare Costs for Non-contextual vs. Contextual Utility, α = 0.9.


B Proofs

Consider an implicit function ρ(µ, ε) = α. From the implicit function theorem, it follows that

$$\frac{d\varepsilon}{d\mu} = -\frac{\partial\rho/\partial\mu}{\partial\rho/\partial\varepsilon}.$$

The denominator of this expression is

$$\frac{\partial\rho}{\partial\varepsilon} = \frac{\partial}{\partial\varepsilon}\int_{a_l^*(\varepsilon)}^{a_h^*(\varepsilon)} p(a)\,da = p\big(a_h^*(\varepsilon)\big)\,a_h^{*\prime}(\varepsilon) - p\big(a_l^*(\varepsilon)\big)\,a_l^{*\prime}(\varepsilon) > 0,$$

since $a_h^{*\prime}(\varepsilon) > 0$ and $a_l^{*\prime}(\varepsilon) \leq 0$.

In order to show the sign of the numerator, we restrict our attention to the binary choice case, since it is the setting of our primary interest. Recall that

$$p(a_2;\gamma,\mu) = \Lambda\left(\frac{U(a_2;\gamma) - U(a_1;\gamma)}{\mu}\right).$$

Then

$$\frac{\partial p(a_2;\gamma,\mu)}{\partial\mu} = \Lambda'\left(\frac{U(a_2;\gamma) - U(a_1;\gamma)}{\mu}\right)\big(U(a_2;\gamma) - U(a_1;\gamma)\big)\big(-\mu^{-2}\big) < 0,$$

since alternative $a_2$ gives the highest certainty equivalent by our assumption. Therefore,

$$\frac{\partial\rho}{\partial\mu} = \begin{cases} \dfrac{\partial p(a_2;\gamma,\mu)}{\partial\mu}, & \varepsilon < \Delta m, \\[4pt] 0, & \varepsilon > \Delta m, \end{cases}$$

so that $\partial\rho/\partial\mu \leq 0$. Together the two results imply that $d\varepsilon/d\mu \geq 0$.
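The sign of the numerator derived above can also be checked numerically: with a logistic link and a hypothetical positive utility difference, the choice probability of the better alternative falls toward 1/2 as the noise parameter µ grows. The utility difference and the grid of µ values below are illustrative assumptions.

```python
import math

def p2(delta_u, mu):
    # logistic choice probability of the better alternative (delta_u > 0)
    return 1.0 / (1.0 + math.exp(-delta_u / mu))

delta_u = 0.3  # hypothetical positive utility difference U(a2) - U(a1)
probs = [p2(delta_u, mu) for mu in (0.1, 0.5, 1.0, 5.0)]

# p(a2) is strictly decreasing in mu and bounded below by 1/2,
# consistent with dp(a2)/dmu < 0 when U(a2) > U(a1)
assert all(a > b for a, b in zip(probs, probs[1:]))
assert all(pr > 0.5 for pr in probs)
```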


C Additional Tables

Table C.1: The Battery of Lotteries

ID La1 Lp1 La2 Lp2 La3 Lp3 La4 Lp4 Ra1 Rp1 Ra2 Rp2 Ra3 Rp3 Ra4 Rp4

1 450 0.50 1,350 0 2,250 0.50 0 0 450 0.10 1,350 0.80 2,250 0.10 0 0

2 450 0.50 1,350 0 2,250 0.50 0 0 450 0 1,350 1 2,250 0 0 0

3 450 0.10 1,350 0.80 2,250 0.10 0 0 450 0 1,350 1 2,250 0 0 0

4 450 0.70 1,350 0 2,250 0.30 0 0 450 0.50 1,350 0.40 2,250 0.10 0 0

5 450 0.70 1,350 0 2,250 0.30 0 0 450 0.40 1,350 0.60 2,250 0 0 0

6 450 0.50 1,350 0.40 2,250 0.10 0 0 450 0.40 1,350 0.60 2,250 0 0 0

7 450 0.40 1,350 0 2,250 0.60 0 0 450 0.10 1,350 0.75 2,250 0.15 0 0

8 450 0.40 1,350 0 2,250 0.60 0 0 450 0 1,350 1 2,250 0 0 0

9 450 0.30 1,350 0 2,250 0.70 0 0 450 0.15 1,350 0.25 2,250 0.60 0 0

10 450 0.10 1,350 0.75 2,250 0.15 0 0 450 0 1,350 1 2,250 0 0 0

11 450 0.70 1,350 0 2,250 0.30 0 0 450 0.60 1,350 0.25 2,250 0.15 0 0

12 450 0.70 1,350 0 2,250 0.30 0 0 450 0.50 1,350 0.50 2,250 0 0 0

13 450 0.60 1,350 0.25 2,250 0.15 0 0 450 0.50 1,350 0.50 2,250 0 0 0

14 450 0.40 1,350 0 2,250 0.60 0 0 450 0.20 1,350 0.60 2,250 0.20 0 0

15 450 0.40 1,350 0 2,250 0.60 0 0 450 0.10 1,350 0.90 2,250 0 0 0

16 450 0.20 1,350 0.60 2,250 0.20 0 0 450 0.10 1,350 0.90 2,250 0 0 0

17 450 0.60 1,350 0 2,250 0.40 0 0 450 0.50 1,350 0.30 2,250 0.20 0 0

18 450 0.30 1,350 0 2,250 0.70 0 0 450 0 1,350 0.50 2,250 0.50 0 0

19 450 0.60 1,350 0 2,250 0.40 0 0 450 0.40 1,350 0.60 2,250 0 0 0

20 450 0.50 1,350 0.30 2,250 0.20 0 0 450 0.40 1,350 0.60 2,250 0 0 0

21 450 0.25 1,350 0 2,250 0.75 0 0 450 0.10 1,350 0.60 2,250 0.30 0 0

22 450 0.25 1,350 0 2,250 0.75 0 0 450 0 1,350 1 2,250 0 0 0

23 450 0.10 1,350 0.60 2,250 0.30 0 0 450 0 1,350 1 2,250 0 0 0

24 450 0.50 1,350 0.20 2,250 0.30 0 0 450 0.40 1,350 0.60 2,250 0 0 0

25 450 0.55 1,350 0 2,250 0.45 0 0 450 0.40 1,350 0.60 2,250 0 0 0

26 450 0.55 1,350 0 2,250 0.45 0 0 450 0.50 1,350 0.20 2,250 0.30 0 0

27 450 0.15 1,350 0.25 2,250 0.60 0 0 450 0 1,350 0.50 2,250 0.50 0 0

28 450 0.15 1,350 0.75 2,250 0.10 0 0 450 0 1,350 1 2,250 0 0 0

29 450 0.60 1,350 0 2,250 0.40 0 0 450 0 1,350 1 2,250 0 0 0

30 450 0.60 1,350 0 2,250 0.40 0 0 450 0.15 1,350 0.75 2,250 0.10 0 0

31 135 0.55 1,620 0.25 1,890 0.20 0 0 135 0.55 1,215 0.25 2,430 0.20 0 0

32 810 0.40 675 0.40 1,620 0.20 0 0 810 0.40 405 0.40 2,025 0.20 0 0

33 1,485 0.40 675 0.40 1,620 0.20 0 0 1,485 0.40 405 0.40 2,025 0.20 0 0

34 2,160 0.40 675 0.40 1,620 0.20 0 0 2,160 0.40 405 0.40 2,025 0.20 0 0

35 675 0.70 1,485 0.10 2,835 0.20 0 0 675 0.70 945 0.10 3,375 0.20 0 0

36 1,620 0.70 1,485 0.10 2,835 0.20 0 0 1,620 0.70 945 0.10 3,375 0.20 0 0

37 2,565 0.70 1,485 0.10 2,835 0.20 0 0 2,565 0.70 945 0.10 3,375 0.20 0 0

38 3,510 0.70 1,485 0.10 2,835 0.20 0 0 3,510 0.70 945 0.10 3,375 0.20 0 0

39 0 0.50 540 0.10 540 0.40 0 0 0 0.50 0 0.10 810 0.40 0 0

40 540 0.50 540 0.10 540 0.40 0 0 540 0.50 0 0.10 810 0.40 0 0

41 1,080 0.50 540 0.10 540 0.40 0 0 1,080 0.50 0 0.10 810 0.40 0 0

42 945 0.55 1,620 0.25 1,890 0.20 0 0 945 0.55 1,215 0.25 2,430 0.20 0 0

43 1,620 0.50 540 0.10 540 0.40 0 0 1,620 0.50 0 0.10 810 0.40 0 0

44 540 0.50 1,080 0.10 1,080 0.40 0 0 540 0.50 540 0.10 1,350 0.40 0 0

45 1,080 0.50 1,080 0.10 1,080 0.40 0 0 1,080 0.50 540 0.10 1,350 0.40 0 0

46 1,620 0.50 1,080 0.10 1,080 0.40 0 0 1,620 0.50 540 0.10 1,350 0.40 0 0

47 2,160 0.50 1,080 0.10 1,080 0.40 0 0 2,160 0.50 540 0.10 1,350 0.40 0 0

48 1,755 0.55 1,620 0.25 1,890 0.20 0 0 1,755 0.55 1,215 0.25 2,430 0.20 0 0

49 2,565 0.55 1,620 0.25 1,890 0.20 0 0 2,565 0.55 1,215 0.25 2,430 0.20 0 0

50 135 0.65 945 0.20 1,485 0.15 0 0 135 0.65 810 0.20 1,620 0.15 0 0

51 675 0.65 945 0.20 1,485 0.15 0 0 675 0.65 810 0.20 1,620 0.15 0 0

52 1,215 0.65 945 0.20 1,485 0.15 0 0 1,215 0.65 810 0.20 1,620 0.15 0 0

53 1,755 0.65 945 0.20 1,485 0.15 0 0 1,755 0.65 810 0.20 1,620 0.15 0 0

54 135 0.40 675 0.40 1,620 0.20 0 0 135 0.40 405 0.40 2,025 0.20 0 0

55 0 0 0 0 0 0 1,200 1 0 0 0 0 975 0.50 1,440 0.50

56 0 0 0 0 0 0 1,275 1 0 0 0 0 1,155 0.50 1,410 0.50

57 0 0 0 0 0 0 450 1 0 0 0 0 225 0.50 690 0.50

58 0 0 0 0 0 0 1,950 1 0 0 0 0 1,725 0.50 2,190 0.50

59 0 0 0 0 0 0 2,025 1 0 0 0 0 1,905 0.50 2,160 0.50

60 0 0 0 0 0 0 225 1 0 0 0 0 105 0.50 360 0.50

Notes. The columns are coded as follows: "L" and "R" denote the left and right lottery, "a" denotes amounts (in DKK), and "p" denotes probabilities. The amounts in the table are baseline amounts; 1.5x and 2x scaled amounts were also used. Subjects were randomized across the baseline, 1.5x, and 2x amounts.
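Each row of Table C.1 can be encoded as outcome-probability pairs. For example, the two lotteries in row 1 have equal expected value, so for that pair the choice between them isolates attitudes toward risk; the minimal sketch below is only an encoding convention, not the paper's estimation code.

```python
def expected_value(lottery):
    # a lottery is a list of (amount in DKK, probability) pairs
    return sum(x * p for x, p in lottery)

# Row 1 of Table C.1: left lottery vs right lottery (baseline amounts)
left = [(450, 0.50), (1350, 0.00), (2250, 0.50)]
right = [(450, 0.10), (1350, 0.80), (2250, 0.10)]

# both lotteries pay 1,350 DKK in expectation; the left lottery puts more
# weight on the extreme outcomes, so it is the riskier of the two
assert abs(expected_value(left) - 1350) < 1e-9
assert abs(expected_value(right) - 1350) < 1e-9
```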


D Additional Graphs

Figure D.7: Distributions of AWC and RWC in the Subset of Subjects

[Figure D.7 here. Panel A: histograms and kernel densities of AWC (DKK) for α = 0.9, 0.95, and 0.99. Panel B: the same for RWC.]

Note: The graph shows the distributions of the individual-level estimates of AWC (Panel A) and

RWC (Panel B) for three target levels of α: 0.9, 0.95, and 0.99. The sample is restricted to

include only the subjects for whom the estimation procedure successfully converged. The bars

are the histograms and the smooth lines are the kernel density estimates. The dashed lines show

the medians of the distributions. The AWC numbers are in DKK. For the RWC, we truncate the

support at 0.4 to improve readability of the graph. This results in dropping 8 observations for

which the RWC are below 0.4.
