
Nonhuman Primates Satisfy Utility Maximization in Compliance with the Continuity Axiom of Expected Utility Theory


Abstract

Expected Utility Theory (EUT), the first axiomatic theory of risky choice, describes choices as a utility maximization process: decision makers assign a subjective value (utility) to each choice option and choose the one with the highest utility. The continuity axiom, central to EUT and its modifications, is a necessary and sufficient condition for the definition of numerical utilities. The axiom requires decision makers to be indifferent between a gamble and a specific probabilistic combination of a more preferred and a less preferred gamble. While previous studies demonstrated that monkeys choose according to combinations of objective reward magnitude and probability, a concept-driven experimental approach for assessing the axiomatically defined conditions for maximizing subjective utility by animals is missing. We experimentally tested the continuity axiom for a broad class of gamble types in four male rhesus macaque monkeys, showing that their choice behavior complied with the existence of a numerical utility measure as defined by the economic theory. We used the numerical quantity specified in the continuity axiom to characterize subjective preferences in a magnitude-probability space. This mapping highlighted a trade-off relation between reward magnitudes and probabilities, compatible with the existence of a utility function underlying subjective value computation. These results support the existence of a numerical utility function able to describe choices, allowing for the investigation of the neuronal substrates responsible for coding such a rigorously defined quantity.
SIGNIFICANCE STATEMENT: A common assumption of several economic choice theories is that decisions result from the comparison of subjectively assigned values (utilities). This study demonstrated the compliance of monkey behavior with the continuity axiom of Expected Utility Theory, implying a subjective magnitude-probability trade-off relation which supports the existence of numerical subjective utility directly linked to the theoretical economic framework. We determined a numerical utility measure able to describe choices, which can serve as a correlate for the neuronal activity in the quest for brain structures and mechanisms guiding decisions.
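The continuity axiom described in the abstract can be written compactly. The following is one common textbook formulation, offered as an illustrative sketch rather than the paper's own notation; the gamble names A, B, C and the mixing probability \alpha are assumed symbols.

```latex
% Continuity: an intermediate gamble B is matched by some mixture of A and C.
A \succsim B \succsim C
\;\Longrightarrow\;
\exists\, \alpha \in [0,1] :\quad B \sim \alpha A + (1-\alpha)\, C

% Under expected utility, the indifference point fixes the utility of B:
u(B) = \alpha\, u(A) + (1-\alpha)\, u(C)
```

With the normalization u(A) = 1 and u(C) = 0, the utility of B equals \alpha, which is how an indifference point can serve as a numerical utility measure.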
... Each person has a particular identity that corresponds to their behavior and serves as a paradigm. Identity utility refers to the change in utility that results from the adaptation of individual behavior to identity norms, and utility maximization is a general and fundamental process that determines the subject's survival (Ferrari-Toniolo et al., 2021). In light of this, we proposed to measure identity salience from the perspective of utility, suggesting that the more utility a role brings to an individual, the higher its salience. ...
Conference Paper
Knowledge sharing is crucial to the operation and sustainability of virtual communities. Against this background, this study aims to investigate whether and how users' virtual role identification influences their knowledge sharing behavior. Theoretical insights from structural symbolic interactionism and identity economics are synthesized and used as a basis for proposing the mechanism by which virtual role identification influences knowledge sharing behavior. We collected data to test the research model from 250 community users via an online survey. The results suggest that virtual role identification can facilitate users' knowledge sharing behavior by increasing role utility and perceived role expectations. The theoretical contributions and practical implications of this study are also discussed.
... Most previous studies found that monkeys have convex (13,14) or concave (12,45,46,50) utility over rewards in the gain domain. The monkeys and humans in our study exhibited the same concave utility shape (Fig. 2B), with the utility function of humans more concave than the utility function of the two monkeys. ...
Article
Research in the multidisciplinary field of neuroeconomics has mainly been driven by two influential theories regarding human economic choice: prospect theory, which describes decision-making under risk, and reinforcement learning theory, which describes learning for decision-making. We hypothesized that these two distinct theories guide decision-making in a comprehensive manner. Here, we propose and test a decision-making theory under uncertainty that combines these highly influential theories. Collecting many gambling decisions from laboratory monkeys allowed for reliable testing of our model and revealed a systematic violation of prospect theory's assumption that probability weighting is static. Using the same experimental paradigm in humans, substantial similarities between these species were uncovered by various econometric analyses of our dynamic prospect theory model, which incorporates decision-by-decision learning dynamics of prediction errors into static prospect theory. Our model provides a unified theoretical framework for exploring a neurobiological model of economic choice in human and nonhuman primates.
... Recent studies on captive macaques have begun to investigate the possibility that monkeys make decisions based on probability values different from those that are objectively correct, with inconsistent results across studies [24-27,47,48]. The probability weighting function was inverse S-shaped [25,26], S-shaped [24,27], or concave [26,49]. Although we consistently found that the probability weighting functions of our two well-trained monkeys were concave, most studies conducted in humans found inverse-S-shaped probability weighting functions at the aggregate level, with a large amount of heterogeneity at the individual level [13,15,50-54], indicating an inconsistency between the two species. ...
Article
Prospect theory, arguably the most prominent theory of choice, is an obvious candidate for neural valuation models. How the activity of individual neurons, a possible computational unit, obeys prospect theory remains unknown. Here, we show, with theoretical accuracy equivalent to that of human neuroimaging studies, that single-neuron activity in four core reward-related cortical and subcortical regions represents the subjective valuation of risky gambles in monkeys. The activity of individual neurons in monkeys passively viewing a lottery reflects the desirability of probabilistic rewards parameterized as a multiplicative combination of utility and probability weighting functions, as in the prospect theory framework. The diverse patterns of valuation signals were not localized but distributed throughout most parts of the reward circuitry. A network model aggregating these signals reconstructed the risk preferences and subjective probability weighting revealed by the animals’ choices. Thus, distributed neural coding explains the computation of subjective valuations under risk.
Article
Individual survival and evolutionary selection require biological organisms to maximize reward. Economic choice theories define the necessary and sufficient conditions, and neuronal signals of decision variables provide mechanistic explanations. Reinforcement learning (RL) formalisms use predictions, actions, and policies to maximize reward. Midbrain dopamine neurons code reward prediction errors (RPE) of subjective reward value suitable for RL. Electrical and optogenetic self-stimulation experiments demonstrate that monkeys and rodents repeat behaviors that result in dopamine excitation. Dopamine excitations reflect positive RPEs that increase reward predictions via RL; against increasing predictions, obtaining similar dopamine RPE signals again requires better rewards than before. The positive RPEs drive predictions higher again and thus advance a recursive reward-RPE-prediction iteration toward better and better rewards. Agents also avoid dopamine inhibitions that lower reward prediction via RL, which allows smaller rewards than before to elicit positive dopamine RPE signals and resume the iteration toward better rewards. In this way, dopamine RPE signals serve a causal mechanism that attracts agents via RL to the best rewards. The mechanism improves daily life and benefits evolutionary selection but may also induce restlessness and greed.
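The recursive reward-RPE-prediction iteration described above can be sketched with a simple delta-rule update. This is a minimal illustration, assuming a Rescorla-Wagner style learner; the function name `update_prediction` and the `learning_rate` parameter are hypothetical and not taken from the cited article.

```python
def update_prediction(prediction, reward, learning_rate=0.1):
    """Return (new_prediction, rpe) after one delta-rule update.

    rpe is the reward prediction error: received reward minus prediction.
    """
    rpe = reward - prediction
    return prediction + learning_rate * rpe, rpe


def simulate(reward, trials, learning_rate=0.1):
    """Repeat the same reward and record the RPE on every trial."""
    prediction, errors = 0.0, []
    for _ in range(trials):
        prediction, rpe = update_prediction(prediction, reward, learning_rate)
        errors.append(rpe)
    return prediction, errors
```

When the same reward is repeated, the prediction rises and the positive RPE shrinks toward zero, so only a larger reward would again produce an RPE of the original size.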
Article
Behavior-related neuronal signals often vary between neurons, which might reflect the unreliability of individual neurons or a truly heterogeneous code. This notion may also apply to economic ("value-based") choices and the underlying reward signals. Reward value is subjective and can be described by a nonlinearly weighted magnitude (utility) and probability. Defining subjective values relies on the continuity axiom, whose testing involves structured variations of a wide range of reward magnitudes and probabilities. Axiom compliance demonstrates understanding of the stimuli and the meaningful character of choices. Using these tests, we investigated the encoding of subjective economic value by neurons in a key economic-decision structure of the monkey brain, the orbitofrontal cortex (OFC). We found that individual neurons carry heterogeneous neuronal value signals that largely fail to match the animal's choices. However, neuronal population signals matched the animal's choices well, suggesting accurate subjective economic value encoding by a heterogeneous population of unreliable neurons.
Preprint
Research in the multidisciplinary field of neuroeconomics has been driven by two influential theories regarding human economic choice: prospect theory, which describes decision-making under risk, and reinforcement learning theory, which describes learning for decision-making. We hypothesized that these two distinct theories guide decision-making in a comprehensive manner. Here, we propose and test a new decision-making theory under uncertainty that combines these highly influential theories. Collecting many gambling decisions from laboratory monkeys allowed for reliable testing of our hybrid model and revealed a systematic violation of prospect theory’s assumption that probability weighting is static. Using the same experimental paradigm in humans, substantial similarities between monkey and human behavior were described by our hybrid model, which incorporates decision-by-decision learning dynamics of prediction errors into static prospect theory. Our new model provides a single unified theoretical framework for exploring the neurobiological model of economic choice in human and nonhuman primates.
Article
Expected Utility Theory (EUT) provides axioms for maximizing utility in risky choice. The Independence Axiom (IA) is its most demanding axiom: preferences between two options should not change when altering both options equally by mixing them with a common gamble. We tested common consequence (CC) and common ratio (CR) violations of the IA over several months in thousands of stochastic choices using a large variety of binary option sets. Three monkeys showed consistently few outright Preference Reversals (8%) but substantial graded Preference Changes (46%) between the initial preferred gamble and the corresponding altered gamble. Linear Discriminant Analysis (LDA) indicated that gamble probabilities predicted most Preference Changes in CC (72%) and CR (88%) tests. The Akaike Information Criterion indicated that probability weighting within Cumulative Prospect Theory (CPT) explained choices better than models using Expected Value (EV) or EUT. Fitting by utility and probability weighting functions of CPT resulted in nonlinear and non-parallel indifference curves (IC) in the Marschak-Machina triangle and suggested IA non-compliance of models using EV or EUT. Indeed, CPT models predicted Preference Changes better than EV and EUT models. Indifference points in out-of-sample tests were closer to CPT-estimated ICs than EV and EUT ICs. Finally, while the few outright Preference Reversals may reflect the long experience of our monkeys, their more graded Preference Changes corresponded to those reported for humans. In benefitting from the wide testing possibilities in monkeys, our stringent axiomatic tests contribute critical information about risky decision-making and serve as a basis for investigating neuronal decision mechanisms. Supplementary information: The online version contains supplementary material available at 10.1007/s11166-022-09388-7.
Article
Logistic regressions were developed in economics to model individual choice behavior. In recent years, they have become an important tool in decision neuroscience. Here, I describe and discuss different logistic models, emphasizing the underlying assumptions and possible interpretations. Logistic models may be used to quantify a variety of behavioral traits, including the relative subjective value of different goods, the choice accuracy, risk attitudes, and choice biases. More complex logistic models can be used for choices between good bundles, in cases of nonlinear value functions, and for choices between multiple options. Finally, logistic models can quantify the explanatory power of neuronal activity on choices, thus providing a valid alternative to receiver operating characteristic (ROC) analyses.
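As a rough illustration of the simplest logistic choice model mentioned above, the sketch below maps a difference in subjective value to a choice probability and reads an indifference point off fitted coefficients. The function names and the `noise` parameter are hypothetical; the cited article discusses a broader family of models.

```python
import math


def choice_probability(value_a, value_b, noise=1.0):
    """Logistic probability of choosing option A over option B.

    noise > 0 scales choice stochasticity: smaller values give steeper,
    more deterministic choice functions.
    """
    if noise <= 0:
        raise ValueError("noise must be positive")
    return 1.0 / (1.0 + math.exp(-(value_a - value_b) / noise))


def indifference_point(intercept, slope):
    """Value of x where logit(P) = intercept + slope * x is zero (P = 0.5)."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return -intercept / slope
```

Equal subjective values give a choice probability of 0.5, and the indifference point is the quantity at which the fitted curve crosses that level.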
Article
A fundamental but rarely contested assumption in economics and neuroeconomics is that decision-makers compute subjective values of risky options by multiplying functions of reward probability and magnitude. By contrast, an additive strategy for valuation allows flexible combination of reward information required in uncertain or changing environments. We hypothesized that the level of uncertainty in the reward environment should determine the strategy used for valuation and choice. To test this hypothesis, we examined choice between risky options in humans and rhesus macaques across three tasks with different levels of uncertainty. We found that whereas humans and monkeys adopted a multiplicative strategy under risk when probabilities are known, both species spontaneously adopted an additive strategy under uncertainty when probabilities must be learned. Additionally, the level of volatility influenced relative weighting of certain and uncertain reward information, and this was reflected in the encoding of reward magnitude by neurons in the dorsolateral prefrontal cortex.
Article
Humans and other primates share many decision biases, among them our subjective distortion of objective probabilities. When making choices between uncertain rewards we typically treat probabilities nonlinearly: overvaluing low probabilities of reward and undervaluing high ones. A growing body of evidence, however, points to a more flexible pattern of distortion than the classical inverse-S one, highlighting the effect of experimental conditions in shifting the weight assigned to probabilities, such as task feedback, learning, and attention. Here we investigated the role of sequence structure (the order in which gambles are presented in a choice task) in shaping the probability distortion patterns of rhesus macaques: we presented 2 male monkeys with binary choice sequences of MIXED or REPEATED gambles against safe rewards. Parametric modeling revealed that choices in each sequence type were guided by significantly different patterns of probability distortion: whereas we elicited the classical inverse-S-shaped probability distortion in pseudorandomly MIXED trial sequences of gamble-safe choices, we found the opposite pattern consisting of S-shaped distortion, with REPEATED sequences. We extended these results to binary choices between two gambles, without a safe option, and confirmed the unique influence of the sequence structure in which the animals make choices. Finally, we showed that the value of gambles experienced in the past had a significant impact on the subjective value of future ones, shaping probability distortion on a trial-by-trial basis. Together, our results suggest that differences in choice sequence are sufficient to reverse the direction of probability distortion. SIGNIFICANCE STATEMENT Our lives are peppered with uncertain, probabilistic choices. Recent studies showed how such probabilities are subjectively distorted. 
In the present study, we show that probability distortions in macaque monkeys differ significantly between sequences in which single gambles are repeated (S-shaped distortion), as opposed to being pseudorandomly intermixed with other gambles (inverse-S-shaped distortion). Our findings challenge the idea of fixed probability distortions resulting from inflexible computations, and point to a more instantaneous evaluation of probabilistic information. Past trial outcomes appeared to drive the “gap” between probability distortions in different conditions. Our data suggest that, as in most adaptive systems, probability values are slowly but constantly updated from prior experience, driving measures of probability distortion to either side of the S/inverse-S debate.
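To make the S versus inverse-S distinction concrete, here is a small sketch using the one-parameter Prelec weighting function, w(p) = exp(-(-ln p)^gamma). The choice of this functional form is an assumption for illustration only; the abstract does not state which parametric model the authors fitted, and `prelec_weight` is a hypothetical name.

```python
import math


def prelec_weight(p, gamma):
    """One-parameter Prelec probability weighting.

    gamma < 1 gives an inverse-S shape (low p overweighted, high p underweighted),
    gamma > 1 gives an S shape, and gamma == 1 leaves probabilities undistorted.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if p == 0.0 or p == 1.0:
        return p  # endpoints are fixed points of the function
    return math.exp(-((-math.log(p)) ** gamma))
```

With gamma = 0.5 a probability of 0.05 is weighted above 0.05 and a probability of 0.9 below 0.9; with gamma = 2 the direction of both distortions reverses.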
Article
Monkeys and other animals appear to share with humans two risk attitudes predicted by prospect theory: an inverse-S-shaped probability-weighting (PW) function and a steeper utility curve for losses than for gains. These findings suggest that such preferences are stable traits with common neural substrates. We hypothesized instead that animals tailor their preferences to subtle changes in task contexts, making risk attitudes flexible. Previous studies used a limited number of outcomes, trial types, and contexts. To gain a broader perspective, we examined two large datasets of male macaques’ risky choices: one from a task with real (juice) gains and another from a token task with gains and losses. In contrast to previous findings, monkeys were risk seeking for both gains and losses (i.e., lacked a reflection effect) and showed steeper gain than loss curves (loss seeking). Utility curves for gains were substantially different in the two tasks. Monkeys showed nearly linear PWs in one task and S-shaped ones in the other; neither task produced a consistent inverse-S-shaped curve. To account for these observations, we developed and tested various computational models of the processes involved in the construction of reward value. We found that adaptive differential weighting of prospective gamble outcomes could partially account for the observed differences in the utility functions across the two experiments and thus provide a plausible mechanism underlying flexible risk attitudes. Together, our results support the idea that risky choices are constructed flexibly at the time of elicitation and place important constraints on neural models of economic choice.
Article
Author Summary: Numerous choice tasks have been used to study decision processes. Some of these choice tasks, specifically n-armed bandit, information sampling and foraging tasks, pose choices that trade off immediate and future reward. Specifically, the best choice may not be the choice that pays off the highest reward immediately, and exploration of unknown options vs. exploiting known options can be a normatively useful strategy. We characterized the optimal choice strategies across these tasks using Markov Decision Processes (MDPs). The MDP framework can characterize optimal choice strategies when choices are affected by the value of future rewards. We found that uncertainty and time horizon have important effects on the choice strategies in these tasks. Specifically, in bandit and information sampling tasks, increasing uncertainty increases the value of exploring choice options that tend to pay off in the future, while decreasing uncertainty increases the value of choice options that pay off immediately. These effects are increased when time horizons are longer. Foraging tasks differ in that uncertainty plays a minimal role. However, time horizon is still important in foraging. Specifically, for long time horizons, travel delays to rewards become less relevant.
Article
Significance: Most real-world rewards have multiple dimensions, such as amount, risk, and type. Here we show that within a bounded set of such multidimensional rewards monkeys’ choice behavior fulfilled several core tenets of rational choice theory; namely, their choices were stochastically complete and transitive. As such, in selecting between rewards, the monkeys behaved as if they maximized value on a common scale. Dopamine neurons encoded prediction errors that reflected that scale. A particular reward dimension influenced dopamine activity only to the extent that it influenced choice. Thus, vastly different reward types such as juice and food activated dopamine neurons in accordance with subjective value derived from the different rewards. This neuronal signal could serve to update value signals for economic choice.
Article
The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
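The sequential procedure referred to above is usually stated as a step-up rule: sort the m p-values, find the largest rank k whose p-value is at most (k/m)·q, and reject the k smallest. The sketch below follows that usual statement; the function name `benjamini_hochberg` and the default level q = 0.05 are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: with p-values sorted ascending, find the largest rank k
    such that p_(k) <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule keeps the largest passing rank, a p-value that misses its own threshold can still be rejected when a larger p-value passes a later one.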
Article
I. Introduction, 503. — II. Basic postulates of choice, 505. — III. Wants vs. utility, 510. — IV. Structure of wants, 513. — V. Wants and choice, 518. — VI. Indifference and substitutability, 521. — VII. Choice and psychological threshold, 523. — VIII. Choice and risk, 524. — IX. Choice and uncertainty, 527. — X. Concluding remarks, 531.
Article
Background: Optimal choices require an accurate neuronal representation of economic value. In economics, utility functions are mathematical representations of subjective value that can be constructed from choices under risk. Utility usually exhibits a nonlinear relationship to physical reward value that corresponds to risk attitudes and reflects the increasing or decreasing marginal utility obtained with each additional unit of reward. Accordingly, neuronal reward responses coding utility should robustly reflect this nonlinearity. Results: In two monkeys, we measured utility as a function of physical reward value from meaningful choices under risk (that adhered to first- and second-order stochastic dominance). The resulting nonlinear utility functions predicted the certainty equivalents for new gambles, indicating that the functions’ shapes were meaningful. The monkeys were risk seeking (convex utility function) for low reward and risk avoiding (concave utility function) with higher amounts. Critically, the dopamine prediction error responses at the time of reward itself reflected the nonlinear utility functions measured at the time of choices. In particular, the reward response magnitude depended on the first derivative of the utility function and thus reflected the marginal utility. Furthermore, dopamine responses recorded outside of the task reflected the marginal utility of unpredicted reward. Accordingly, these responses were sufficient to train reinforcement learning models to predict the behaviorally defined expected utility of gambles. Conclusions: These data suggest a neuronal manifestation of marginal utility in dopamine neurons and indicate a common neuronal basis for fundamental explanatory constructs in animal learning theory (prediction error) and economic decision theory (marginal utility).
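The link between the curvature of a utility function and the certainty equivalent of a gamble can be sketched as follows. The power form u(x) = x^rho is an assumption chosen for simplicity and is not the function estimated in the cited study; `certainty_equivalent` is a hypothetical helper.

```python
def power_utility(x, rho):
    """Power utility for non-negative rewards: convex if rho > 1, concave if rho < 1."""
    if x < 0 or rho <= 0:
        raise ValueError("requires x >= 0 and rho > 0")
    return x ** rho


def certainty_equivalent(outcomes, probabilities, rho):
    """Safe amount whose utility equals the expected utility of the gamble."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    expected_utility = sum(
        p * power_utility(x, rho) for x, p in zip(outcomes, probabilities)
    )
    return expected_utility ** (1.0 / rho)  # inverse of the power utility
```

For an equiprobable gamble between 0 and 1 (expected value 0.5), a convex utility with rho = 2 yields a certainty equivalent above 0.5 (risk seeking), while a concave utility with rho = 0.5 yields one below 0.5 (risk avoiding).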