A Rational Reinterpretation of Dual-Process Theories
Smitha Milli^a,*, Falk Lieder^b, Thomas L. Griffiths^c
smilli@berkeley.edu, falk.lieder@tuebingen.mpg.de, tomg@princeton.edu
^a Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA 94704
^b Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany
^c Departments of Psychology and Computer Science, Princeton University, Princeton, NJ, 08544
Author Note
*Corresponding author. A preliminary version of Simulations 1 and 2 was presented at the
Thirty-First AAAI Conference on Artificial Intelligence and appeared in the proceedings of that
conference (Milli, Lieder, & Griffiths, 2017). The work presented in this article was supported by
grant number ONR MURI N00014-13-1-0341 and a grant from the Future of Life Institute.
Abstract
Highly influential “dual-process” accounts of human cognition postulate the coexistence of a slow
accurate system with a fast error-prone system. But why would there be just two systems rather
than, say, one or 93? Here, we argue that a dual-process architecture might be neither arbitrary
nor irrational, but might instead reflect a rational tradeoff between the cognitive flexibility
afforded by multiple systems and the time and effort required to choose between them. We
investigate what the optimal set and number of cognitive systems would be depending on the
structure of the environment. We find that the optimal number of systems depends on the
variability of the environment and the difficulty of deciding which system should be used when.
Furthermore, when having two systems is optimal, then the first system is fast but error-prone and
the second system is slow but accurate. Our findings thereby provide a rational reinterpretation of
dual-process theories.
Keywords: bounded rationality; dual-process theories; meta-decision making; bounded
optimality; metareasoning; resource-rationality
A Rational Reinterpretation of Dual-Process Theories
Starting in the 1960s, a number of findings began to suggest that people’s judgments and
decisions systematically deviate from the predictions of logic, probability theory, and expected
utility (Wason, 1968; Tversky & Kahneman, 1974; Kahneman & Tversky, 1979; Gilovich,
Griffin, & Kahneman, 2002). These deviations are often referred to as cognitive biases and have
fueled the heated debate about human rationality (Stanovich, 2009; Gigerenzer, 1991; Kahneman
& Tversky, 1996). It is commonly assumed that cognitive biases result from people’s use of rather
arbitrary heuristics (Tversky & Kahneman, 1974; Gilovich et al., 2002), thus leading some to
conclude that people are fundamentally irrational (Sutherland, 2013; Marcus, 2009; Ariely,
2009). However, others have argued that many apparent errors in human judgment can be
understood as rational solutions to a different construal of the problem participants were
presumably trying to solve (Oaksford & Chater, 1994, 2007; Hahn & Oaksford, 2007; Hahn &
Warren, 2009; Tenenbaum & Griffiths, 2001; Griffiths & Tenenbaum, 2001; Austerweil &
Griffiths, 2011; Parpart, Jones, & Love, 2017).
These rational explanations build on the methodology of rational analysis (Anderson, 1990;
Chater & Oaksford, 1999), which aims to explain the function of cognitive processes by assuming
the human mind is well-adapted to the structure of the environment and the problems people are
trying to solve. In other words, rational analysis assumes that the human mind implements a
(near) rational solution with respect to the underlying computational problem the mind is trying to
solve. A more recent line of work on resource-rational analysis extends this idea and assumes
that the human mind is well-adapted to problems after taking into account the constraint of
limited time or cognitive resources (Lieder & Griffiths, in revision; Griffiths, Lieder, & Goodman,
2015). In other words, resource-rational analysis assumes that the human mind rationally
trades-off the benefit of accurate solutions against the limited resources available. Under this
framework, when time or cognitive resources are abundant, then it is rational to perform more
computation, and when time or cognitive resources are limited, then it is rational to do less
computation. In this way, many supposedly ad-hoc heuristics have been reinterpreted as being
rational solutions when resources are limited (Lieder, Griffiths, & Hsu, 2018; Lieder, Griffiths,
Huys, & Goodman, 2018a, 2018b; Howes, Warren, Farmer, El-Deredy, & Lewis, 2016; Khaw, Li,
& Woodford, 2017; Sims, 2003; Tsetsos et al., 2016; Bhui & Gershman, 2017). Furthermore,
people appear to adaptively choose between their fast heuristics and their slower and more
deliberate strategies based on the amount of resources available (Lieder & Griffiths, 2017).
However, an issue still remains unresolved in the push for the resource-rational
reinterpretation of these heuristics. Since the exact amount of computation to do for a problem
depends on the particular time and cognitive resources available, a larger repertoire of reasoning
systems should enable the mind to more flexibly adapt to different situations (Payne, Bettman, &
Johnson, 1993; Gigerenzer & Selten, 2002). In fact, achieving the highest possible degree of
adaptive flexibility would require choosing from an infinite set of diverse cognitive systems.
However, this is not consistent with behavioral and neuroscientific evidence for a small number of
qualitatively different decision systems (van der Meer, Kurth-Nelson, & Redish, 2012; Dolan &
Dayan, 2013) and similar evidence in the domain of reasoning (Evans, 2003, 2008; Evans &
Stanovich, 2013).
One reason for a smaller number of systems could be that as the number of systems
increases, it becomes increasingly time-consuming to select between them (Lieder &
Griffiths, 2017). This suggests that the number and nature of the mind’s cognitive systems might
be shaped by the competing demands for the ability to flexibly adapt one’s reasoning to the
varying demands of a wide range of different situations and the necessity to do so quickly and
efficiently. In our work, we theoretically formalize this explanation, allowing us to derive not only
what the optimal system is given a particular amount of resources, but what the optimal set of
systems is for a human to select between across problems.
Such an explanation may provide a rational reinterpretation of dual-process theories, which hold
that the mind is composed of two distinct types of cognitive systems: one that is deliberate,
slow, and accurate, and a second that is fast, intuitive, and fallible (Evans, 2008; Kahneman &
Frederick, 2002, 2005). Similar dual-process theories have independently emerged in research on
decision-making (Dolan & Dayan, 2013) and cognitive control (Diamond, 2013). While recent
work in these areas has addressed the question of how the mind arbitrates between the two
systems (Daw, Niv, & Dayan, 2005; Keramati, Dezfouli, & Piray, 2011; Lieder & Griffiths, 2017;
Shenhav, Botvinick, & Cohen, 2013; Boureau, Sokol-Hessner, & Daw, 2015), it remains
normatively unclear why the mind would be equipped with these two types of cognitive system,
rather than another set of systems.
The existence of the accurate and deliberate system, commonly referred to as System 2
following Kahneman and Frederick (2002), is easily justified by the benefits of rational
decision-making. By contrast, the fast and fallible system (System 1) has been interpreted as a
kluge (Marcus, 2009) and its mechanisms are widely considered to be irrational (Sutherland,
2013; Ariely, 2009; Tversky & Kahneman, 1974; Gilovich et al., 2002). This raises the question
why this system exists at all. Recent theoretical work provided a normative justification for some
of the heuristics of System 1 by showing that they are qualitatively consistent with the rational use
of limited cognitive resources (Griffiths et al., 2015; Lieder, Griffiths, & Hsu, 2018; Lieder,
Griffiths, Huys, & Goodman, 2018a, 2018b) – especially when the stakes are low and time is
scarce and precious. Thus, System 1 and System 2 appear to be rational for different kinds of
situations. For instance, you might want to rely on System 1 when you are about to get hit by a
car and have to make a split-second decision about how to move. But, you might want to employ
System 2 when deciding whether or not to quit your job.
Here, we formally investigate what set of systems would enable people to make the best
possible use of their finite time and cognitive resources. We derive the optimal tradeoff between
the cognitive flexibility afforded by multiple systems and the cost of choosing between them. To
do so, we draw inspiration from the artificial intelligence literature on designing intelligent agents
that make optimal use of their limited-performance hardware by building upon the mathematical
frameworks of bounded optimality (Russell & Subramanian, 1995) and rational metareasoning
(Russell & Wefald, 1991b; Hay, Russell, Tolpin, & Shimony, 2012). We apply this approach to
four different domains where the dual systems framework has been applied to explain human
decision-making: binary choice, planning, strategic interaction, and multi-alternative,
multi-attribute risky choice. We investigate how the optimal cognitive architecture for each
domain depends on the variability of the environment and the cost of choosing between multiple
cognitive systems, which we call metareasoning cost.
This approach allows us to extend the application of resource-rational analysis from a
particular system of reasoning to sets of cognitive systems, and our findings provide a normative
justification for dual-process theories of cognition. Concretely, we find that across all four
domains the optimal number of systems increases with the variability of the environment but
decreases with the cost of determining which of these systems should be in control. In
addition, when it is optimal to have two systems, then the difference in their speed-accuracy
tradeoffs increases with the variability of the environment. In variable environments, this results
in one system that is accurate but costly to use and another system that is fast but error-prone.
These predictions mirror the assertions of dual-process accounts of cognition (Evans, 2008;
Kahneman, 2011). Our findings cast new light on the debate about human rationality by
suggesting that the apparently conflicting views of dual-process theories and rational accounts of
cognition might be compatible after all.
The remainder of this paper is structured as follows: We start by summarizing previous
work in psychology and artificial intelligence that our article builds on. We then describe our
mathematical methods for deriving optimal sets of cognitive systems. The subsequent four
sections apply this methodology to the domains of binary choice, planning, strategic interaction in
games, and multi-alternative risky choice. We conclude with the implications of our findings for
the debate about human rationality and directions for future work.
Background
Before delving into the details of our analysis, we first discuss how our approach applies to
the various dual-process theories in psychology, and how we build on the ideas of bounded
optimality and rational metareasoning developed in artificial intelligence research.
Dual-process theories
The idea that human minds are composed of multiple interacting cognitive systems first
came to prominence in the literature on reasoning (Evans, 2008; Stanovich, 2011). While people
are capable of reasoning in ways that are consistent with the prescriptions of logic, they often do
not. Dual-process theories suggested that this is because people employ two types of cognitive
strategies: fast but fallible heuristics that are triggered automatically and deliberate strategies that
are slow but accurate.
Different dual-process theories vary in what they mean by two cognitive systems. For
example, Evans and Stanovich (2013) distinguish between dual processes, in which each process
can be made up of multiple cognitive systems, and dual systems, which corresponds to the literal
meaning of two cognitive systems. Because our work abstracts these cognitive systems in terms of
their speed-accuracy tradeoffs, our analysis applies at the level of both systems and processes, as
long as the systems or processes in question realize different speed-accuracy tradeoffs. Thus, our
theory applies to both dual “processes” and dual “systems”.
There is also debate over how the two systems would interact. Some theories postulate the
existence of a higher-level controller that chooses between the two systems (Norman & Shallice,
1986; Shenhav et al., 2013), some that the two systems run in parallel, and others that the slower
system interrupts the faster one (Evans & Stanovich, 2013). The analysis we present simply
assumes that there is greater metareasoning cost incurred for each additional system. This is
clearest to see when a higher-level controller needs to make the decision of which system to
employ. Alternatively, if multiple cognitive systems operated in parallel, the cost of arbitrating
between these systems would also increase with the number of systems – just like the
metareasoning cost. So, we believe our analysis would also apply under this alternative
assumption.
Since their development in the reasoning literature, dual-process theories have been applied
to explain a wide range of mental phenomena, including judgment and decision-making, where
they have been popularized by the distinction between System 1 and System 2 (Kahneman &
Frederick, 2002, 2005; Kahneman, 2011), and moral reasoning where the distinction is made
between a fast deontological system and a slow utilitarian system (Greene, 2015). In parallel with
this literature in cognitive psychology, research on human reinforcement learning has led to
similar conclusions. Behavioral and neural data suggest that the human brain is equipped with
two distinct decision systems: a fast, reflexive, system based on habits and a slow, deliberate
system based on goals (Dolan & Dayan, 2013). The mechanisms employed by these systems have
been mapped onto model-based versus model-free reinforcement learning algorithms. A
model-free versus model-based distinction has also been suggested to account for the nature of
the two systems posited to underlie moral reasoning (Cushman, 2013; Crockett, 2013).
The empirical support for the idea that the human mind is composed of two types of
cognitive systems raises the question of why such a composition would evolve through natural
selection. Given that people outperform AI systems in most complex real-world tasks despite
their very limited cognitive resources (Gershman, Horvitz, & Tenenbaum, 2015), we ask whether
being equipped with a fast but fallible and a slow but accurate cognitive system can be understood
as a rational adaptation to the challenge of solving complex problems with limited cognitive
resources (Griffiths et al., 2015).
Bounded Optimality and Resource-Rational Analysis
Recent work has illustrated that promising process models of human cognition can be
derived from the assumption that the human mind makes optimal use of cognitive resources that
are available to it (Griffiths et al., 2015; Lewis, Howes, & Singh, 2014). This idea can be
formalized by drawing on the theory of bounded optimality which was developed as a foundation
for designing optimal intelligent agents. In contrast to expected utility theory (Von Neumann &
Morgenstern, 1944), bounded optimality takes into account the constraints imposed by
performance-limited hardware and the requirement that the agent has to interact with its environment
in real time (Russell & Subramanian, 1995). The basic idea is to mathematically derive a program
that would enable the agent to interact with its environment as well as or better than any other
program that its computational architecture could execute. Critically, the agent’s limited
computational resources and the requirement to interact with a potentially very complex,
fast-paced, dynamic environment in real-time entail that the agent’s strategies for reasoning and
decision-making have to be extremely efficient. This rules out naive implementations of Bayes'
rule and expected utility maximization, as those would take so long to compute that the agent
would suffer a decision paralysis so bad that it might die before taking even a single action.
The fact that people are subject to the same constraints makes bounded optimality a
promising normative framework for modeling human cognition (Griffiths et al., 2015).
Resource-rational analysis applies the principle of bounded optimality to derive optimal cognitive
strategies from assumptions about the problem to be solved and the cognitive architecture
available to solve it (Griffiths et al., 2015). Recent work illustrates that this approach can be used
to discover and make sense of people’s heuristics for judgment (Lieder, Griffiths,
Huys, & Goodman, 2018a) and decision-making (Lieder, Griffiths, Huys, & Goodman, 2018a;
Lieder, Griffiths, & Hsu, 2018), as well as memory and cognitive control (Howes et al., 2016).
The resulting models have shed new light on the debate about human rationality (Lieder, Griffiths,
Huys, & Goodman, 2018a, 2018b; Lieder, Krueger, & Griffiths, 2017; Lieder, Griffiths, Huys, &
Goodman, 2018b; Lieder, Griffiths, & Hsu, 2018; Griffiths et al., 2015). While this approach has
so far focused on one individual strategy at a time, the research presented here extends it to
deriving optimal cognitive architectures comprising multiple systems or strategies for a wider
range of problems. To do so, we use the theory of rational metareasoning as a foundation for
modeling how each potential cognitive architecture would decide when to rely on which system
or strategy.
Rational metareasoning as a framework for modeling the adaptive control of cognition
Previous research suggests that people flexibly adapt how they decide to the requirements
of the situation (Payne, Bettman, & Johnson, 1988). Recent theoretical work has shown that this
adaptive flexibility can be understood within the rational metareasoning framework developed in
artificial intelligence (Lieder & Griffiths, 2017). Rational metareasoning (Russell & Wefald,
1991b; Hay et al., 2012) formalizes the problem of selecting computations so as to make optimal
use of finite time and limited-performance hardware. The adaptive control of computation
afforded by rational metareasoning is critical for intelligent systems to be able to solve complex
and potentially time-critical problems on performance-limited hardware (Horvitz, Cooper, &
Heckerman, 1989; Russell & Wefald, 1991b). For instance, it is necessary for a
patient-monitoring system used in emergency medicine to metareason in order to decide when to
terminate diagnostic reasoning and recommend treatment (Horvitz & Rutledge, 1991). This
example illustrates that rational metareasoning may be necessary for agents to achieve
bounded-optimality in environments that pose a wide range of problems that require very
different computational strategies. However, to be useful for achieving bounded-optimality,
metareasoning has to be done very efficiently.
In principle, rational metareasoning could be used to derive the optimal amount of time and
mental effort that a person should invest into making a decision (Shenhav et al., 2017).
Unfortunately, selecting computations optimally is a computation-intensive problem itself
because the value of each computation depends on the potentially long sequence of computations
that can be performed afterwards. Consequently, in most cases, solving the metareasoning
problem optimally would defeat the purpose of trying to save time and effort (Lin, Kolobov,
Kamar, & Horvitz, 2015; Hay et al., 2012; Russell & Wefald, 1991a). Instead, to make optimal
use of their finite computational resources bounded-optimal agents (Russell & Subramanian,
1995) must optimally distribute their resources between metareasoning and reasoning about the
world. Thus, studying bounded-optimal metareasoning might be a way to understand how people
manage to allocate their finite computational resources near-optimally with very little effort
(Gershman et al., 2015; Keramati et al., 2011).
Recent work has shown that approximate metareasoning over a discrete set of cognitive
strategies can save more time and effort than it takes and thereby improve overall performance
(Lieder et al., 2014). This approximation can drastically reduce the computational complexity of
metareasoning while achieving human-level performance (Lieder et al., 2014; Lieder & Griffiths,
2017). Thus, rather than metareasoning over all possible sequences of mental operations to
determine the exact amount of time to think, humans may simply metareason over a finite set of
cognitive systems that have different speed and accuracy tradeoffs. This suggests a cognitive
architecture comprising multiple systems for reasoning and decision making and an executive
control system that arbitrates between them, which is entirely consistent with extant theories of
cognitive control and mental effort (Norman & Shallice, 1986; Shenhav et al., 2017, 2013).
Dual-process theories can be seen as a special case of this cognitive architecture where the
number of decision systems is two.
According to this perspective, the executive control system selects between a limited
number of cognitive systems by predicting how well each of them would perform in terms of
decision quality and effort and then selecting the system with the best predicted performance
(Lieder & Griffiths, 2017). Assuming that each of these predictions takes a certain amount of
mental effort, this entails that the cost of deciding which cognitive system to rely on in a given
situation increases with the number of systems. At the same time, increasing the number of
systems also increases the agent’s cognitive flexibility thereby enabling it to achieve a higher level
of performance across a wider range of environments. Conversely, reducing the space of
computational mechanisms the agent can choose from entails that there may be problems for
which the optimal computational mechanisms will be no longer available. This dilemma
necessitates a tradeoff that sacrifices some flexibility to increase the speed at which cognitive
mechanisms can be selected. This raises the question of how many and which computational
mechanisms a bounded-optimal metareasoning agent should be equipped with, which we proceed
to explore in the following sections.
Deriving Bounded-Optimal Cognitive Systems
We now describe our general approach for extending resource-rational analysis to the level
of cognitive architectures. The first step is to model the environment. For the purpose of our
analysis, we characterize each environment by the set of decision problems D that it poses to
people and a probability distribution P over D that represents how frequently the agent will
encounter each of them. The set of decision problems D could be quite varied; for example, it
could include deciding which job to pick and deciding what to eat for lunch. In this case, P would
encode the fact that deciding what to eat for lunch is a more common type of decision problem
than deciding which job to pick. Associated with each decision problem d is a utility function
U_d(a) that represents the utility gained by the agent for taking action a in decision problem d.
Having characterized the environment in terms of decision problems, we now model how
people might solve them. We assume that there is a set of reasoning and decision-making systems
T that the agent could potentially be equipped with. The question we seek to investigate is what
subset M ⊆ T is optimal for the agent to actually be equipped with. The optimal set of systems
M is dependent on three costs: (1) the action cost: the cost of taking the chosen action, (2) the
reasoning cost: the cost of using a system from M to reason about which action to take, and (3) the
metareasoning cost: the cost of deciding which system to use to decide which action to take. For
simplicity, we will describe each of the costs in terms of time delays, although they also entail
additional costs, including metabolic costs.
As an example, consider the scenario of deciding what to order for lunch at a restaurant.
The diner has a fixed amount of time she can spend at lunch until she needs to get back to work,
so time is a finite resource. The action cost is the time required to eat the meal. A person might
have multiple systems for deciding which items to choose. For example, one system may rely on
habit and order the same dish as last time. Another system may perform more logical computation
to analyze the nutritional value of each item or what the most economical choice is. Each system
has an associated reasoning cost, the time it takes for that system to decide which item to order.
It is clear that the diner has to balance the amount of time spent thinking about what meal to
pick (reasoning cost) with the amount of time it will take to actually eat the meal (action cost), so
that she is able to finish her meal in the time she has available. If the diner is extremely
time-constrained, perhaps because of an urgent meeting she needs to get back to, then she may
simply heuristically plop items onto her plate. But, if the diner has more time, then she may think
more about what items to choose.
In addition to the cost of reasoning and the cost of acting, having multiple decision systems
also incurs the cost of metareasoning, that is reasoning about how to reason about what to do. In
other words, the metareasoning cost is how much time it takes the diner to decide how much to
think about whether to rely on her habits, an analysis of nutritional value, or any of the other
decision mechanisms she may have at her disposal. If the diner only has one system of thinking,
then the metareasoning cost is zero. But as the number of systems increases, the metareasoning
cost of deciding which system should be in control increases. This raises the question of what is
the optimal ensemble of cognitive systems, how many systems does it include, and what are they?
We can derive the answer to these questions by minimizing the expected sum of action
cost, reasoning cost, and metareasoning cost over the set of all possible ensembles of cognitive
systems.
In summary, our approach for deriving a bounded-optimal cognitive architecture proceeds
as follows:
1. Model the environment. Define the set of decision problems D, the distribution P over them, and the utility function U_d(a) for each problem.
2. Model the agent. Define the set of possible cognitive systems T the agent could have.
3. Specify the optimal mind design problem. Define the metric that the bounded agent's behavior optimizes, i.e., a trade-off between the utility it gains and the costs that it incurs: the action cost, the reasoning cost, and the metareasoning cost.
4. Solve the optimal mind design problem. Solve (3) to find the optimal set of systems M ⊆ T for the agent to be equipped with (a schematic sketch of this search is given below).
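To make this recipe concrete, the following Python sketch casts steps 1-4 as a brute-force search over candidate sets of systems. It is a schematic illustration rather than the procedure used in the simulations below: the function and argument names are hypothetical, the objective is a simplified additive trade-off between utility and the three costs (the simulations instead optimize utility per unit time or total cost), and the cap on the number of systems is arbitrary.

```python
from itertools import combinations

def optimal_mind_design(problems, problem_probs, candidate_systems,
                        expected_net_utility, metareasoning_cost_per_system,
                        max_systems=4):
    """Schematic search for a bounded-optimal set of systems M within T.

    problems, problem_probs       -- the environment: decision problems D and their distribution P
    candidate_systems             -- the space of possible systems T
    expected_net_utility(t, d)    -- expected utility on problem d minus the reasoning cost of
                                     solving it with system t (a hypothetical, user-supplied model)
    metareasoning_cost_per_system -- cost of predicting the performance of a single system
    """
    best_M, best_value = None, float("-inf")
    for size in range(1, max_systems + 1):
        for M in combinations(candidate_systems, size):
            # The metareasoning cost grows linearly with the number of systems.
            meta_cost = metareasoning_cost_per_system * len(M)
            # For each problem, the agent metareasons: it predicts each system's performance,
            # relies on the best one, and pays the metareasoning cost of that prediction step.
            value = sum(p * (max(expected_net_utility(t, d) for t in M) - meta_cost)
                        for d, p in zip(problems, problem_probs))
            if value > best_value:
                best_M, best_value = M, value
    return best_M, best_value
```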
Once we have done this, we can begin to probe how different parts of the simulation affect
the final result in step (4). For example, we expect that the optimal cognitive architecture for a
variable environment should comprise multiple cognitive systems with different characteristics.
But at the same time, the number of systems should not be too high, or else the time spent on
deciding which system to use, the metareasoning cost, will be too high. In other words, we
hypothesize that the number of systems will depend on a tradeoff between the variability of the
environment and the metareasoning cost. Our simulations show that this is indeed the case.

Figure 1. The reward rate in two-alternative forced choice (Simulation 1) usually peaks for a moderately small number of decision systems. The expected utility per time of the optimal choice of systems, M*, as a function of the number of systems (|M|). As the costliness of metareasoning, 1/r_m, decreases, the optimal number of systems increases. In this example E[r_e] = 100 and σ(r_e) = 100.
Simulation 1: Two-Alternative Forced Choice
Our first simulation focuses on the widely-used two-alternative forced choice (2AFC)
paradigm, in which a participant is forced to select between two options. For example,
categorization experiments often require their participants to decide whether the presented item
belongs to the category or not, and psychophysics experiments often require participants to judge
whether two stimuli are the same or different. Even in simple laboratory settings, judgments
made within a 2AFC task seem to stem from systematically different modes of thinking.
Therefore, 2AFC tasks are a prime setting in which to start evaluating our theory of dual-process
systems. But before describing the details of our 2AFC simulation, we first review evidence for
dual-process accounts of behavior in the 2AFC paradigm.
A very basic binary choice task presents an animal with a lever that it can either press to
obtain food or decide not to press (Dickinson, 1985). It has been shown that early on in this task
rodents’ choices are governed by a flexible brain system that will stop pressing the lever when
they no longer want the food. By contrast, after extensive training their choices are controlled by
a different, inflexible brain system that will continue to press the lever even when the reward is
devalued by poisoning the food. Interestingly, these two systems are preserved in the human
brain and the same phenomenon has been demonstrated in humans (Balleine & O’Doherty, 2010).
Another example of two-alternative forced-choice is the probability learning task where
participants repeatedly choose between two options, the first of which yields a reward with
probability p_1 and the second of which yields a reward with probability p_2 = 1 − p_1. It has been
found that depending on the incentives people tend to make these choices in two radically
different ways (Shanks, Tunney, & McCarthy, 2002): When the incentives are low then people
tend to use a strategy that chooses option one with a frequency close to p_1 and option two with a
frequency close to p_2 – which can be achieved very efficiently (Vul, Goodman, Griffiths, &
Tenenbaum, 2014). By contrast, when the incentives are high then people employ a choice
strategy that maximizes their earnings by almost always choosing the option that is more likely to
be rewarded – which requires more computation (Vul et al., 2014).
The dual systems perspective on 2AFC leaves open the normative question: what set of
systems is optimal for the agent to be equipped with? To answer this question, we apply the
methodology described in the previous section to the problem of bounded-optimal binary-choice.
Methods
As in the 2AFC probability learning task used by Shanks et al. (2002), the agent receives a
reward of +1 for picking the correct action and 0 for picking the incorrect action. An
unboundedly rational agent would always pick the action with a higher probability of being
correct. Yet, although simple in set-up, computing the probability of an action being correct
generally requires complex inferences over many interconnected variables. For example, if the
choice is between turning left onto the highway or turning right to smaller backroads, estimating
the probability of which action will lead to less traffic may require knowledge of when rush hour
is, whether there is a football game happening, and whether there are accidents in either direction.
To approximate these often intractable inferences people appear to perform probabilistic
simulations of the outcomes, and the variability and biases of their predictions (Griffiths &
Tenenbaum, 2006; Lieder, Griffiths, Huys, & Goodman, 2018a) and choices (Vul et al., 2014;
Lieder, Griffiths, & Hsu, 2018) match those of efficient sampling algorithms. Previous work has
therefore modeled people as bounded-optimal sample-based agents, which draw a number of
samples from the distribution over correct actions and then pick the action that was sampled
most frequently (Vul et al., 2014; Griffiths et al., 2015). In line with this prior work, we too model
the agent as a sample-based agent, described formally below.
Let a_0 and a_1 be the actions available to the agent, where a_1 has probability θ of being the
correct action and a_0 has probability 1 − θ of being correct. The probability θ that a_1 is correct
varies across different environments, reflecting the fact that in some settings it is easier to tell
which action is correct than in others. For example, given the choice between a two-month-old
tomato and a fresh orange, it is obvious that the more nutritious choice is the latter. In this case, it
is clear that the fresh orange is correct with probability near one. On the other hand, it may be
quite difficult to decide which of two universities with similar programs to attend for graduate
school. In this case, the difference between the probabilities of each being correct may be quite
marginal, and both might have close to a 0.5 chance of being correct. We model the variability in
the difficulty of this choice by assuming that θ is equally likely to be any value in the range
(0.5, 1), i.e., θ ∼ P_θ = Unif(0.5, 1). We consider the range (0.5, 1) instead of (0, 1) without loss of
generality because we can always rename the actions so that a_1 is more likely to be correct than a_0.
To make a decision, the sample-based agent draws some number of samples k from the
distribution over correct actions, i ∼ Bern(θ), and picks the action a_i that it sampled more often.¹
If the agent always draws k samples before acting, then its expected utility across all environments is

E_θ[U | k] = ∫ [P(a_1 is correct) · P(agent picks a_1 | k) + P(a_0 is correct) · P(agent picks a_0 | k)] P_θ(dθ).    (1)

See Appendix A for a detailed derivation of how to calculate the quantity in Equation 1. If there
were no cost for samples, then the agent could take an infinite number of samples to ensure
choosing the correct action. But this is, of course, impractical in the real world because drawing a
sample takes time and time is limited. Vul et al. (2014) show how the optimal number of samples
changes based on the cost of sampling in various 2AFC problems. They parameterize the cost of
sampling as the ratio, r_e, between the time for acting and the execution time of taking one sample.
Suppose acting takes one unit of time; then the amount of time it takes to draw k samples is k/r_e,
and the total amount of time the agent takes is 1 + k/r_e. Thus, the optimal number of samples the
agent should draw to maximize its expected utility per unit time is

k* = arg max_{k ∈ ℕ₀} E_θ[U | k] / (1 + k/r_e).    (2)
When the time it takes to generate a sample is at least one tenth of the time it takes to
execute the action (r_e ≤ 10), then the optimal number of samples is either zero or one. In general,
the first sample provides the largest gain in decision quality and the returns diminish with every
subsequent sample. The point where the gain in decision quality falls below the cost of sampling
depends on the value of r_e. Since this value can differ drastically across environments, achieving a
near-optimal tradeoff in all environments requires adjusting the number of samples. Even a simple
heuristic-based metareasoner that adapts the number of samples it takes based on a few thresholds
on r_e does better than one which always draws the same number of samples (Icard, 2014).

¹ If there is a tie, then the agent picks either a_0 or a_1 with equal probability. However, for odd k, the agent's expected utility after drawing k samples, E_θ[U | k], is equal to its expected utility after drawing k + 1 samples, E_θ[U | k + 1]. Thus, we can restrict ourselves to odd k, where no ties are possible.

Table 1
The optimal set of cognitive systems (M*) for the 2AFC task of Simulation 1 as a function of the number of systems (|M|) and the variability of the environment (Var(r_e)) for E[r_e] = 100 and r_m = 1000.

|M|    Var(r_e) = 10^3    Var(r_e) = 10^4    Var(r_e) = 10^5
1      3                  3                  1
2      3, 5               1, 5               1, 7
3      3, 5, 7            1, 3, 7            1, 3, 9
4      1, 3, 5, 7*        1, 3, 5, 7         1, 3, 7, 13

(*) Any set of four systems that included 3, 5, 7 was optimal.
Here, we study an agent that chooses how many samples to draw by metareasoning over a
finite subset M of all possible numbers of samples. Furthermore, we assume that the time spent
metareasoning increases linearly with the number of systems. By analogy to Vul et al. (2014), we
formalize the metareasoning cost in terms of the ratio r_m of the time it takes to act over the time it
takes to predict the performance of a single system.
We can again calculate the total amount of time the agent spends in the problem, while now
taking into account the time spent on metareasoning. Just as before, the agent spends one unit of
time executing its action, and k/r_e units of time to draw k samples. But now, we also account for
the time it takes the agent to predict the performance of a system: 1/r_m. The total amount of time it
takes the agent to metareason, i.e., predict the performance of all systems, is |M|/r_m. Therefore,
the total amount of time is 1 + π_M(r_e)/r_e + |M|/r_m, where π_M(r_e) is the number of samples the
agent chooses to draw. We assume the agent picks the optimal number of samples out of the set of
possible systems M:

k* = arg max_{k ∈ M ∪ {0}} E_θ[U | k] / (1 + k/r_e + |M|/r_m).    (3)

Figure 2. Performance of agents with different numbers of decision mechanisms in the 2AFC problem of Simulation 1. The plot shows the optimal number of decision systems as a function of the standard deviation of r_e and 1/r_m. In this example E[r_e] = 10.
Given this formulation of the problem, we can now calculate the optimal set of systems for
the agent. The set of cognitive systems that results in the optimal expected utility per time for the
bounded sampling agent is

M* = arg max_{M ⊂ ℕ} E_{r_e}[ max_{k ∈ M ∪ {0}} E_θ[U | k] / (1 + k/r_e + |M|/r_m) ].    (4)

Equation 4 resembles Equation 3 because both optimize the agent's expected utility per time. The
difference is that Equation 3 calculates the optimal number of samples for a fixed cost of
sampling, while Equation 4 calculates the optimal number of systems for a distribution of costs of
sampling.

Note that the optimal set of systems depends on the distribution of the sampling cost r_e
across different environments. Since sampling an action generally takes less time than executing
the action, we assume that r_e is always greater than one. We can satisfy this constraint on r_e by
modeling r_e as following a shifted Gamma distribution, i.e., r_e − 1 ∼ Γ(α, β).
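The following Python sketch illustrates one way to evaluate Equations 1-4 numerically. It is an illustrative implementation under stated assumptions, not the code used for the reported simulations: the integral over θ is approximated on a grid, the expectation over r_e by a Monte Carlo sample of environments, the candidate numbers of samples are restricted to small odd values, and the Gamma parameters are placeholders.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def expected_utility(k, n_theta=2001):
    """E_theta[U | k] (Equation 1) for the sample-based 2AFC agent,
    approximated by averaging over a grid of theta ~ Unif(0.5, 1)."""
    thetas = np.linspace(0.5, 1.0, n_theta)
    if k == 0:
        p_pick_a1 = np.full_like(thetas, 0.5)          # no samples: guess at random
    else:
        # P(more than half of the k Bernoulli(theta) samples favor a1); odd k, so no ties
        p_pick_a1 = 1.0 - stats.binom.cdf(k // 2, k, thetas)
    return float(np.mean(thetas * p_pick_a1 + (1 - thetas) * (1 - p_pick_a1)))

def optimal_system_set(candidate_ks, re_samples, r_m, max_size=4):
    """Search over sets M of sampling systems and return the M maximizing the
    expected reward rate of Equation 4, with the expectation over r_e replaced
    by an average over sampled environments."""
    eu = {k: expected_utility(k) for k in list(candidate_ks) + [0]}
    best_M, best_rate = None, -np.inf
    for size in range(1, max_size + 1):
        for M in combinations(candidate_ks, size):
            # In each environment the agent metareasons over M plus "no sampling" (Equation 3).
            rates = [max(eu[k] / (1 + k / re + len(M) / r_m) for k in (*M, 0))
                     for re in re_samples]
            rate = np.mean(rates)
            if rate > best_rate:
                best_M, best_rate = M, rate
    return best_M, best_rate

# Illustrative usage; the Gamma parameters below are placeholders, not the paper's exact values.
rng = np.random.default_rng(0)
re_samples = 1 + rng.gamma(shape=2.0, scale=50.0, size=1000)   # r_e - 1 ~ Gamma(alpha, beta)
print(optimal_system_set(candidate_ks=list(range(1, 15, 2)), re_samples=re_samples, r_m=1000))
```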
Results
Figure 1 shows a representative example² of the expected utility per time as a function of
the number of systems for different metareasoning costs. Under a large range of metareasoning
costs the optimal number of systems is just one, but as the costliness of selecting a cognitive
system decreases, the optimal number of systems increases. However, even when the optimal
number of systems is more than one, each additional system tends to only result in a marginal
increase in utility, suggesting that one reason for few cognitive systems may be that the benefit of
additional systems is very low.
Figure 2 shows that the optimal number of systems increases with the variance of r_e and
decreases with the cost of selecting between cognitive systems (i.e., 1/r_m). Interestingly, there is a
large set of plausible combinations of variability and metareasoning cost for which the
bounded-optimal agent has two cognitive systems. In addition, when the optimal number of
systems is two, then the gap between the values of the two systems picked increases with the
variance of r_e (see Table 1), resulting in one system that has high accuracy but high cost and
another system that has low accuracy and low cost, which matches the characteristics of the
systems posited by dual-process accounts. Thus, the conditions under which we would most
expect to see two cognitive systems like the ones suggested by dual-process theories are when the
environment is highly variable and arbitrating between cognitive systems is costly.
² For all experiments reported in this paper, we found that alternative values for E[r_e] or Var(r_e) did not change the qualitative conclusions, unless otherwise indicated.
Simulation 2: Sequential Decision-Making
Our first simulation modeled one-step decision problems in which the agent made a single
choice between two options. In our second simulation, we turn to more complex, sequential
decision problems, in which the agent needs to choose a sequence of actions over time in order to
achieve its goal. In these problems, the best action to take at any given point depends on future
outcomes and actions, thus leading to the need for planning. Furthermore, since actions only
affect the environment probabilistically, this leads to the need for planning under uncertainty.
Although planning often allows us to make better decisions, it places high demands
on people’s working memory and time (Kotovsky, Hayes, & Simon, 1985). This may be why
research on problem solving has found that people use both planning and simple heuristics
(Newell & Simon, 1972; Atwood & Polson, 1976; Kotovsky et al., 1985) and models of problem
solving often assume that the mind is equipped with a planning strategy, such as means-ends
analysis, and one or two simple heuristics such as hill-climbing (Newell & Simon, 1972;
Gunzelmann & Anderson, 2003; Anderson, 1990).
Consistent with these findings, modern research on sequential decision-making points to the
coexistence of two systems: a reflective, goal-directed system that uses a model of the
environment to plan multiple steps into the future and a reflexive system that learns
stimulus-response associations (Dolan & Dayan, 2013). Interestingly, people appear to select
between these two systems in a manner consistent with rational metareasoning: When people are
given a task where they can either plan two steps ahead to find the optimal path or perform almost
equally well without planning, they often eschew planning (Daw et al., 2005; Kool, Cushman, &
Gershman, 2016), but when the incentive structure is altered to make planning worthwhile then
people predominantly rely on the planning system (Kool, Gershman, & Cushman, 2017). These
findings are also consistent with Anderson’s rational analysis of problem solving which assumed
that people select between planning according to means-ends-analysis and a hill climbing
heuristic according to a rational cost-benefit analysis (Anderson, 1990).
Working from the assumption that the mind is equipped with a planning-based system and a
reflexive system, Daw et al. (2005) proposed a normative theory of how to choose which system
to use. Here, we aim to derive a normative theory of what set of systems the mind should be
equipped with in the first place.

Figure 3. Performance of agents with different numbers of cognitive systems in planning under uncertainty (Simulation 2). The number of actions it takes an agent to reach a goal as a function of the number of simulated paths before each action. For 0 simulated paths the expected number of actions was 500 (the maximum allowed).
Methods
Like Daw et al., we model the challenge of finding a sequence of actions that achieves the
goal as a finite-horizon Markov decision problem (MDP; Sutton & Barto, 2018) with an
absorbing goal-state. This type of MDP is formally defined by a set of states S, a set of actions A,
a cost function c : S × A → R≥0 that measures how costly each action a is depending on the
current state s, a transition probability model p : S × A × S → [0, 1] that defines the probability
of the next state given the current state and the action taken, an absorbing goal state g, and a time
horizon h. Experience in these MDPs can be thought of as a set of trials or episodes. A trial ends
once the agent reaches an absorbing goal-state g or it exceeds the maximal number of time steps
allowed by the time horizon h.

Figure 4. The expected cost incurred is a U-shaped function of the number of planning systems in Simulation 2. As the cost of selecting a planning system (1/r_m) decreases, the optimal number of systems increases. The expected cost of 0 systems was 500, thus 1 system provided the greatest reduction in cost. In this example E[r_e] = 100, Var(r_e) = 10^5, and c_a = 1.
In the standard formulation, at each time step, the agent takes an action, which depends
upon its current state. The agent's action choices can be concisely represented by a policy
π : S → A that returns an action for each state. An optimal policy minimizes the expected sum of
costs across the trial:

π* = arg min_π E[ Σ_{i=0}^{N} c(s_i, π(s_i)) | π ],    (5)
where s_i is the state at time step i and N is the time step at which the episode ends (either once the
agent reaches the goal state g or the time horizon h is reached). The expectation is taken over the
states at each time step, which are stochastic according to the transition model p.
However, this formulation of the problem ignores the fact that the agent needs to think to
decide how to act, and that thinking also incurs cost. We extend the standard MDP formulation to
account for the cost of thinking. At each time step, the agent has a thinking stage, followed by an
acting stage. In the thinking stage, the agent executes a system t that (stochastically) decides on
an action a. In the acting stage, the agent takes the action a. In addition to the cost c(s, a) of
acting, there is also a cost f(t) that measures the cost of thinking with system t. Then, an optimal
system minimizes the total expected cost of acting and thinking:

t* = arg min_t E[ Σ_{i=0}^{N} (c(s_i, a_i) + f(t)) | t ],    (6)

where a_0, . . . , a_N are the actions chosen by t at each time step and s_0, . . . , s_N are the states at each
time step. The expectation is taken over states and actions, which are stochastic because the
transition model p and the system t are not necessarily deterministic.
The agent's thinking systems are based on bounded real-time dynamic programming
(BRTDP; McMahan, Likhachev, & Gordon, 2005), a planning algorithm from the artificial
intelligence literature. BRTDP simulates potential action sequences, and then uses these
simulations to estimate an upper bound and a lower bound on how good each action is in each
possible state. It starts with a heuristic bound, and then continuously improves the accuracy of its
estimates. Depending on the number of simulations chosen, it can be executed for an arbitrarily
short or long amount of time. Fewer simulations result in faster but less accurate solutions, while
more simulations result in slower but more accurate solutions, making BRTDP particularly
well-suited for studying metareasoning (Lin et al., 2015).
During the thinking stage, the agent chooses the number of action sequences to simulate
(k), and then, based on these simulations, uses BRTDP to update its estimate of how good each
action is in each possible state. During the acting stage, the agent takes the action with the highest
upper bound on its value. Thus the agent's policy is defined entirely by k, the number of action
sequences it simulates. This type of policy corresponds to the Think*Act policy from Lin et al. (2015).
We consider environments in which there is a constant cost per action (c_a) from all non-goal
states: c(s, a) = c_a. The cost of executing a system is linear in the number of simulated action
sequences (k): f(k) = c_e · k, where c_e is the cost of each mental simulation. We reparameterize
the costs by the ratio of the cost of acting over the cost of thinking, r_e = c_a / c_e. Having defined the
agent's policy and the costs, Equation 6 simplifies to

k* = arg min_{k ∈ ℕ₀} (1 + k/r_e) E[N | k],    (7)

where N is the number of time steps until the trial ends, either by reaching the goal state or the
time horizon. See Appendix B for a derivation.
Equation 7 defines the optimal system for the agent to use for a particular decision problem,
but we seek to investigate what set of systems is optimal for the agent to be equipped with for a
range of decision problems. We assume that there is a distribution of MDPs the agent may
encounter, and while r_e is constant within each problem, it varies across different problems.
Therefore, optimally allocating finite computational resources requires metareasoning. We
assume that metareasoning incurs a cost that is linear in the number of systems: c_m · |M|, where
c_m is the cost required to predict the performance of a single system. Similarly, we can
reparameterize this cost using r_m = c_a / c_m, so that the cost of metareasoning becomes |M|/r_m.
Assuming that the agent chooses optimally from its set of planning systems, the optimal set
of systems that it should be equipped with is

M* = arg min_{M ⊂ ℕ} E_{r_e}[ min_{k ∈ M ∪ {0}} (1 + k/r_e) E[N | k] ] + |M|/r_m.    (8)
Figure 5. The optimal number of systems for planning under uncertainty (Simulation 2) as a function of the standard deviation of r_e and r_m for E[r_e] = 100.

We investigated the size and composition of the optimal set of planning systems for a
simple 20 × 20 grid world where the agent's goal is to get from the lower left corner to the upper
right corner with as little cost as possible. The horizon was set to 500, and the maximum number
and length of simulated action sequences at any thinking stage were set to 10. BRTDP was
initialized with a constant value function of 0 for the lower bound and a constant value function of
10^6 for the upper bound. This means that the agent's initial policy was to act randomly, which is
highly suboptimal. For each environment, the ratio of the cost of action over the cost of planning
(r_e) was again drawn from a Gamma distribution and shifted by one, that is, r_e − 1 ∼ Γ(α, β). The
expected number of steps required to achieve the goal, E[N | k], was estimated via simulation (see
Figure 3).
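A sketch of the corresponding optimization for Equation 8 is given below. The values in expected_steps are invented placeholders standing in for the E[N | k] estimates obtained from the BRTDP simulations (Figure 3), and the Gamma parameters and r_m are likewise illustrative.

```python
import numpy as np
from itertools import combinations

# Hypothetical estimates of E[N | k]: the expected number of steps to reach the goal when
# simulating k action sequences per thinking stage. In the simulation these values were
# estimated by running BRTDP on the grid world (Figure 3); the numbers below are
# illustrative placeholders only.
expected_steps = {0: 500, 1: 120, 2: 90, 4: 70, 7: 55, 9: 50}

def optimal_planning_systems(expected_steps, re_samples, r_m, max_size=4):
    """Search over sets M of planning systems (numbers of simulated paths) and
    return the set minimizing the expected cost of Equation 8."""
    candidates = [k for k in expected_steps if k > 0]
    best_M, best_cost = None, np.inf
    for size in range(1, max_size + 1):
        for M in combinations(candidates, size):
            # Per environment, metareason over M plus "no planning" (k = 0), as in Equation 7.
            per_env = [min((1 + k / re) * expected_steps[k] for k in (*M, 0))
                       for re in re_samples]
            cost = np.mean(per_env) + len(M) / r_m
            if cost < best_cost:
                best_M, best_cost = M, cost
    return best_M, best_cost

rng = np.random.default_rng(1)
re_samples = 1 + rng.gamma(shape=2.0, scale=50.0, size=1000)  # r_e - 1 ~ Gamma(alpha, beta)
print(optimal_planning_systems(expected_steps, re_samples, r_m=1000))
```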
Results
We find that all our results closely match those from the two-alternative forced choice setting.
Because the agent rarely reached the goal with zero planning (E[N | k = 0] = 500), one system
provided the largest reduction in expected cost, with each additional system providing at most
marginal reductions (Figure 4). The optimal number of systems increased with the variance of r_e
and decreased with the metareasoning cost (1/r_m). This resulted in the optimal number of cognitive
systems being two for a wide range of plausible combinations of variability and metareasoning
cost (Figure 5). In addition, when the number of systems was two, the difference between the
amount of planning performed by the two optimal systems increased with the variance of r_e.³
This resulted in one system that does a high amount of planning but is costly and another system
that plans very little but is computationally inexpensive, matching the characteristics of the two
types of systems postulated by dual-process theories.

³ This observation holds until the variance becomes extremely high (10^7 for Table 2), in which case both systems move towards lower values (Table 2). However, this is not a general problem but merely a quirk of the skewed distribution we used for r_e.

Table 2
The optimal set of cognitive systems (M*) for planning under uncertainty (Simulation 2) as a function of the number of systems (|M|) and the variability of the environment (Var(r_e)) with E[r_e] = 100.

|M|    Var(r_e) = 10^3    Var(r_e) = 10^4    Var(r_e) = 10^5
1      9                  7                  7
2      7, 9               4, 7               2, 7
3      1, 7, 9            4, 7, 9            1, 4, 9
4      1, 2, 7, 9         2, 4, 7, 9         1, 4, 7, 9
Simulation 3: Strategic Interaction in a Two-Player Game
Starting in the 1980s, researchers began applying dual-process theories to social cognition
(Chaiken & Trope, 1999; Evans, 2008). One hypothesis for why the heuristic system exists is
that exact logical or probabilistic reasoning is often computationally prohibitive. For instance,
Herbert Simon famously argued that computational limitations place substantial constraints on
human reasoning (Simon, 1972, 1982). Such computational limitations become readily apparent
in problems involving social cognition because the number of future possibilities explodes once
the actions of others must be considered. One of Simon's classic examples was chess, where
reasoning out the best opening move is completely infeasible because it would require
considering about 10^120 possible continuations.
In this section, we show that our findings about the optimal set of cognitive systems in
decision-making and planning tasks also apply to tasks that involve reasoning about decisions
made by others. Specifically, we focus on strategic reasoning in Go, an ancient two-player game.
Two-player games are the simplest and perhaps most widely used paradigm for studying strategic
reasoning about other people's actions (Camerer, 2011). Although seemingly simple, it is
typically impossible to exhaustively reason about all possibilities in a game, making heuristic
reasoning necessary. This is especially true in Go, which has about 10^360 possible continuations
from the first move (compare this to chess, which has “only” 10^120 possible continuations).
Methods
We now describe the details of our simulation deriving bounded-optimal architectures for
strategic reasoning in the game of Go.
The agent's thinking systems are based on a planning algorithm known as Monte Carlo tree
search (MCTS; Browne et al., 2012). Recently, AlphaGo, a computer system based on MCTS,
became the first to defeat the Go world champion and achieve superhuman performance in the
game of Go (Silver et al., 2016, 2017). Like other planning methods against adversarial
opponents, MCTS works by constructing a game tree to plan future actions. Unlike other
methods, MCTS selectively runs stochastic simulations (also known as rollouts) of different
actions, rather than exhaustively searching through the entire game tree. In doing so, MCTS
focuses on moves and positions whose values appear both promising and uncertain. In this regard,
MCTS is similar to human reasoning (Newell & Simon, 1972).
Furthermore, the number of simulations used by MCTS affects how heuristic or accurate the
Figure 6. Performance as a function of the amount of reasoning in the game of Go (Simulation 3).
As the amount of computation (number of simulations) increases, the likelihood of selecting a
good action increases, thus resulting in larger utility (a) and the game tends to be won in
increasingly fewer moves (b).
method is, making it well-suited for studying metareasoning. When the number of simulations is
small, the algorithm is faster but less accurate. When the number of simulations is high, the
algorithm is slower but more accurate. Thus, similar to the sequential decision making setting
(Simulation 2), we assume that the agent metareasons over systems $\mathcal{M}$ that differ in how many simulations ($k$) they perform.
On each turn, there is a thinking stage and an acting stage. In the thinking stage, the agent
executes a system that performs a number of stochastic simulations (k) of future moves and then
updates its estimate of how good each action is, i.e. how likely it is to lead to a winning state. In
the acting stage, the agent takes the action with the highest estimated value.
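The following sketch illustrates, under simplifying assumptions, one such turn: a thinking stage that spends a budget of $k$ stochastic simulations and an acting stage that picks the action with the highest estimated value. The rollout function is a placeholder; a full implementation would grow an MCTS search tree rather than sampling actions uniformly.

```python
import random
from collections import defaultdict

def play_turn(state, legal_actions, simulate_rollout, k, rng=random):
    """One turn of the agent: think with a budget of k rollouts, then act greedily.

    `simulate_rollout(state, action)` is a placeholder assumed to return 1 for a
    simulated win and 0 for a simulated loss; a full implementation would grow
    an MCTS search tree instead of sampling actions uniformly at random.
    """
    wins = defaultdict(int)
    tries = defaultdict(int)
    # Thinking stage: spend the simulation budget across the candidate actions.
    for _ in range(k):
        action = rng.choice(legal_actions)
        wins[action] += simulate_rollout(state, action)
        tries[action] += 1
    # Acting stage: take the action with the highest estimated win rate.
    return max(legal_actions, key=lambda a: wins[a] / tries[a] if tries[a] else 0.0)
```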
The agent attains a utility $U$ based on whether it wins or loses the game. The unbounded agent would simply choose the number of simulations $k$ that maximizes expected utility: $E[U \mid k]$. However, the bounded agent incurs costs for acting and thinking. We assume that the cost for acting is constant: $c_a$. The cost for executing a system is linear in the number of simulations it performs: $k \cdot c_e$, where $c_e$ is the cost of a single simulation. The bounded agent has to optimize a trade-off between its utility $U$ and the costs of acting and thinking:

$$E\left[\,U - (c_a + k \cdot c_e)\,N \;\middle|\; k\,\right], \tag{9}$$

where $N$ is the number of turns until the game ends. For consistency, we can reparameterize this as $r_e = c_a / c_e$, the ratio between the cost of acting and the cost of thinking, and without loss of generality, we can let $c_a = 1$. Equation 9 then simplifies into

$$B(k, r_e) := E\left[\,U - \left(1 + \frac{k}{r_e}\right) N \;\middle|\; k\,\right]. \tag{10}$$
The optimal system for the agent to choose given a fixed value of $r_e$ is $k^*(r_e) = \arg\max_k B(k, r_e)$. The optimal set of cognitive systems $\mathcal{M}^*$ out of all possible systems $\mathcal{T}$ for strategic interaction is

$$\mathcal{M}^* = \arg\max_{\mathcal{M} \subseteq \mathcal{T}} \; E\!\left[\max_{k \in \mathcal{M}} B(k, r_e)\right] - \frac{|\mathcal{M}|}{r_m}. \tag{11}$$

In this case, the expectation is taken over $r_e$, as the goal is to find the set of systems that is optimal across all problems in the environment.
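Equations 9–11 can be evaluated by brute-force enumeration. The sketch below illustrates this under the assumption that $E[U \mid k]$ and $E[N \mid k]$ are available as (simulation-based) estimates and that the distribution of $r_e$ is represented by samples; both are placeholders for the quantities described in the text.

```python
from itertools import combinations

def B(k, r_e, expected_utility, expected_turns):
    """Bounded utility of always running k simulations per turn (Equation 10)."""
    return expected_utility(k) - (1 + k / r_e) * expected_turns(k)

def optimal_system_set(T, r_e_samples, r_m, expected_utility, expected_turns, max_size=4):
    """Approximate Equation 11 by enumerating candidate subsets of the systems in T."""
    best_set, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for M in combinations(T, size):
            # Expected value of metareasoning over M, averaged over sampled r_e values ...
            avg = sum(max(B(k, r_e, expected_utility, expected_turns) for k in M)
                      for r_e in r_e_samples) / len(r_e_samples)
            # ... minus the cost of deciding among |M| systems.
            score = avg - len(M) / r_m
            if score > best_score:
                best_set, best_score = M, score
    return best_set
```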
In our simulations, the game is played on a 9×9 board. $U$ is 500 if the agent wins, 250 if the game ends in a draw, and 0 if the agent loses. The opponent also runs MCTS with 5 simulations to decide its move. $E[U \mid k]$ and $E[N \mid k]$ are estimated using simulation (see Figure 6). For computational tractability, the possible numbers of simulations we consider are $\mathcal{T} = \{5, 10, \ldots, 50\}$.
Results
As in the previous tasks, the optimal number of systems depends on the variability of the
environment and the difficulty of selecting between multiple systems (Figure 7). As the cost of
metareasoning increases, the optimal number of systems decreases and the bounded-optimal
agent comes to reason less and less. By contrast, the optimal number of systems increases with
the variability of the environment. Furthermore, when the optimal number of systems is two, the difference between the amount of reasoning performed by the two systems increases as the environment becomes more variable (Table 3).

Table 3
The optimal set of cognitive systems ($\mathcal{M}^*$) for strategic reasoning in the game of Go (Simulation 3) depending on the number of systems ($|\mathcal{M}|$) and the variability of the environment ($\mathrm{Var}(r_e)$) for $E[r_e] = 10$.

|M|    Var(r_e) = 10     Var(r_e) = 10^2      Var(r_e) = 10^3
 1     10                10                   10
 2     10, 20            10, 20               10, 50
 3     n/a*              10, 20, 50           10, 20, 50
 4     n/a*              10, 20, 30, 50       10, 20, 30, 50

(*) This number of systems does not provide a noticeable increase in utility over fewer systems.

In conclusion, the findings presented in this
section suggest that the kind of cognitive architecture that is bounded-optimal for simple
decisions and planning (i.e., two systems with opposite speed-accuracy tradeoffs) is also optimal
for reasoning about more complex problems, such as strategic interaction in games.
Simulation 4: Multi-alternative risky choice
Decision-making under risk is another domain in which dual-process theories abound (e.g.,
Steinberg, 2010; Mukherjee, 2010; Kahneman & Frederick, 2007; Figner, Mackinlay, Wilkening,
& Weber, 2009), and the dual-process perspective was inspired in part by Kahneman and
Tversky’s ground-breaking research program on heuristics and biases (Kahneman, Slovic, &
Tversky, 1982). Consistent with our resource-rational framework, previous research revealed that
people make risky decisions by arbitrating between fast and slow decision strategies in an
adaptive and flexible manner (Payne et al., 1993). When making decisions between the risky
gambles shown in Figure 8, people adapt not only how much they think but also how they think about what to do.

Figure 7. The optimal number of systems for strategic reasoning in the game of Go (Simulation 3) as a function of the standard deviation of $r_e$ and $1/r_m$. $E[r_e] = 100$ in this case.

Concretely, people have been shown to use different strategies for different
types of decision problems (Payne et al., 1988). For instance, when some outcomes are much more probable than others, people seem to rely on fast-and-frugal heuristics (Gigerenzer & Goldstein, 1996) like Take-The-Best, which decides solely based on the most probable outcome that distinguishes between the alternatives and ignores all other possible outcomes. By contrast,
when all outcomes are equally likely, people seem to integrate the payoffs for multiple outcomes
into an estimate of the expected value of each gamble. Previous research has proposed at least ten
different decision strategies that people might use when choosing between risky prospects (Payne
et al., 1988; Thorngate, 1980; Gigerenzer & Selten, 2002). Yet, it has remained unclear how many
decision strategies a single person would typically consider (Scheibehenne, Rieskamp, &
Wagenmakers, 2013). Here, we investigate how many decision strategies a boundedly optimal
metareasoning agent should use in a multi-alternative risky-choice environment similar to the
experiments by Payne et al. Unlike in the previous simulations, these strategies differ not only in
how much computation they perform but also in which information they use and how they use it.
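As an illustration of the kind of strategy involved, the following sketch implements the lexicographic (Take-The-Best) heuristic described above for gambles: it inspects outcomes in decreasing order of probability and decides as soon as one outcome discriminates between the alternatives. The list-based representation of gambles is an assumption made for illustration.

```python
def take_the_best(probabilities, payoffs):
    """Lexicographic choice between gambles (Take-The-Best).

    `probabilities[j]` is the probability of outcome j (shared by all gambles);
    `payoffs[i][j]` is the payoff of gamble i under outcome j.
    Returns the index of the chosen gamble.
    """
    candidates = list(range(len(payoffs)))
    # Consider outcomes from most to least probable, ignoring everything else.
    for j in sorted(range(len(probabilities)), key=lambda j: -probabilities[j]):
        best_payoff = max(payoffs[i][j] for i in candidates)
        candidates = [i for i in candidates if payoffs[i][j] == best_payoff]
        if len(candidates) == 1:  # this outcome discriminates: decide now
            return candidates[0]
    return candidates[0]  # all considered outcomes tie: pick the first survivor
```

For example, take_the_best([0.8, 0.15, 0.05], [[100, 0, 0], [90, 500, 500]]) returns 0, because the first gamble pays more on the most probable outcome, even though the second gamble has the higher expected value.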
Figure 8. Illustration of the Mouselab paradigm used to study multi-alternative risky choice.
Methods
We investigated the size of the optimal subset of the ten decision strategies proposed by
Payne et al. as a function of the metareasoning cost and the variability of the relative cost of
reasoning. These strategies were the lexicographic heuristic (which corresponds to
Take-The-Best), the semi-lexicographic heuristic, the weighted-additive strategy, choosing at
random, the equal-weight heuristic, elimination by aspects, the maximum confirmatory
dimensions heuristic, satisficing, and two combinations of elimination by aspects with the
weighted additive strategy and the maximum confirmatory dimensions heuristic. Concretely, we
determined the optimal number of decision strategies in 5×30 environments that differed in the mean and the standard deviation of the distribution of $r_e$. The means were 10, 50, 100, 500, and 1000, and the standard deviations were linearly spaced between $10^{-3}$ and 3 times the mean.
For each environment, four thousand decision problems were generated at random. Each
problem presented the agent with the choice between five gambles with five possible outcomes.
The payoffs for each outcome-gamble pair were drawn from a uniform distribution on the interval
[0, 1000]. The outcome probabilities differed randomly from problem to problem except that the second-highest probability was always at most 25% of the highest probability, the third-highest probability was always at most 25% of the second-highest probability, and so on.
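A sketch of this problem generator is shown below. The payoff distribution and the 25% constraint follow the description above; the exact scheme for drawing the raw probabilities is not specified in the text, so the one used here is an assumption.

```python
import random

def generate_problem(n_gambles=5, n_outcomes=5, rng=random):
    """Generate one risky-choice problem with payoffs uniform on [0, 1000]."""
    payoffs = [[rng.uniform(0, 1000) for _ in range(n_outcomes)]
               for _ in range(n_gambles)]
    # Draw raw probabilities so that each successive probability is at most
    # 25% of the previous one, then renormalize (normalization preserves the ratios).
    probs = [rng.random()]
    for _ in range(n_outcomes - 1):
        probs.append(rng.uniform(0, 0.25 * probs[-1]))
    total = sum(probs)
    probs = [p / total for p in probs]
    return probs, payoffs
```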
Based on previous work on how people select cognitive strategies (Lieder & Griffiths,
2017), our simulations assume that people generally select the decision strategy that achieves the
best possible speed-accuracy tradeoff. This strategy can be formally defined as the heuristic $s^*$ with the highest value of computation (VOC; Lieder and Griffiths, 2017). Formally, for each decision problem $d$, an agent equipped with strategies $\mathcal{S}$ should choose the strategy

$$s^*(d, \mathcal{S}, r_e) = \arg\max_{s \in \mathcal{S}} \mathrm{VOC}(s, d). \tag{12}$$
Following Lieder and Griffiths (2017) we define a strategy’s VOC as decision quality minus
decision cost. We measure the decision quality by the ratio of the expected utility of the chosen
option over the expected utility of the best option, and we measure decision cost by the
opportunity cost of the time required to execute the strategy. Formally, the VOC of making the
decision $d$ using the strategy $s$ is

$$\mathrm{VOC}(s, d) = \frac{E[u(s(d)) \mid d]}{\max_a E[u(a) \mid d]} - \frac{1}{r_e} \cdot n_{\mathrm{computations}}(s, d), \tag{13}$$

where $s(d)$ is the alternative that the strategy $s$ chooses in the decision $d$, $1/r_e$ is the cost per decision operation, and $n_{\mathrm{computations}}(s, d)$ is the number of cognitive operations it performs in this
decision process. To determine the number of cognitive operations, we decomposed each strategy
into a sequence of elementary information processing operations (Johnson & Payne, 1985) in the
same way as Lieder and Griffiths (2017) did and counted how many of those operations each
strategy performed on any given decision problem.
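The following sketch illustrates Equations 12 and 13 under the assumption that each strategy object exposes a choose(problem) method returning the index of the selected gamble and an n_computations(problem) method counting its elementary information processing operations; both interfaces are hypothetical conveniences, not part of the original strategy definitions.

```python
def voc(strategy, problem, expected_utilities, r_e):
    """Value of computation of applying `strategy` to `problem` (Equation 13).

    `expected_utilities[i]` is the expected utility of gamble i in this problem;
    decision quality is the ratio of the chosen gamble's expected utility to
    that of the best gamble, and each elementary operation costs 1 / r_e.
    """
    chosen = strategy.choose(problem)
    quality = expected_utilities[chosen] / max(expected_utilities)
    cost = strategy.n_computations(problem) / r_e
    return quality - cost

def select_strategy(strategies, problem, expected_utilities, r_e):
    """Pick the strategy with the highest VOC for this problem (Equation 12)."""
    return max(strategies, key=lambda s: voc(s, problem, expected_utilities, r_e))
```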
We estimated the optimal set of strategies,

$$\mathcal{S}^* = \arg\max_{\mathcal{S}} \; E_{P(d)}\!\left[\mathrm{VOC}(s^*(d; \mathcal{S}, r_e), d)\right] - \frac{1}{r_m} \cdot |\mathcal{S}|, \tag{14}$$

by approximating the expected value in Equation 14 by averaging the VOC over 4000 randomly generated decision problems. The resulting noisy estimates were smoothed with a Gaussian
kernel with standard deviation 20. Then the optimal set of cognitive strategies was determined
based on the smoothed VOC estimates for each combination of parameters. Finally, the number
of strategies in the optimal sets was smoothed with a Gaussian kernel with standard deviation 10,
and the smoothed values were rounded.
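Conceptually, the estimation of Equation 14 mirrors the subset enumeration used for Equation 11. A minimal sketch, omitting the smoothing steps and assuming a voc(strategy, problem) callable that implements Equation 13 (for instance, the function above with the expected utilities and $r_e$ bound for each problem), is:

```python
from itertools import combinations

def estimate_optimal_strategy_set(strategies, problems, voc, r_m, max_size=None):
    """Approximate Equation 14 by enumerating subsets of the candidate strategies.

    `voc(strategy, problem)` is assumed to implement Equation 13 with the
    problem's expected utilities and r_e already bound.
    """
    max_size = max_size or len(strategies)
    best_set, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for S in combinations(strategies, size):
            # Average, over problems, the VOC of the best strategy in S ...
            avg_voc = sum(max(voc(s, d) for s in S) for d in problems) / len(problems)
            # ... and subtract the cost of selecting among |S| strategies.
            score = avg_voc - len(S) / r_m
            if score > best_score:
                best_set, best_score = S, score
    return best_set
```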
Results
As shown in Figure 9, we found that the optimal number of strategies increased with the
variability of the environment and decreased with the metareasoning cost. Like in the previous
simulations, the optimal number of decision systems increased from 1 for high metareasoning
cost and low variability to 2 for moderate metareasoning cost and variability, and increased
further with decreasing metareasoning cost and increasing variability. There was again a sizeable
range of plausible values in which the optimal number of decision systems was 2. For extreme
combinations of very low time cost and very high variability the optimal number of systems
increased to up to 5. Although Figure 9 only shows the results for $E[r_e] = 100$, the results for $E[r_e] = 10, 50, 500$, and 1000 were qualitatively the same.
In this section, we applied our analysis to a more realistic setting than in the previous
sections. It used psychologically plausible decision strategies that were proposed to explain
human decision-making rather than algorithms. These strategies differed not only in how much
reasoning they perform but also in how they reason about the problem. For this setting, where the
environment comprised different kinds of problems favoring different strategies, one might expect
that the optimal number of systems would be much larger than in the previous simulations. While
we did find that having 3–5 systems became optimal for a larger range of metareasoning costs and
variabilities, it is remarkable that having two systems was still bounded-optimal for a sizeable
range of reasonable parameters. This finding suggests that our results might generalize to the
much more complex problems people have to solve and people’s much more sophisticated
cognitive mechanisms.
Figure 9. The optimal number of strategies for multi-alternative risky choice (Simulation 4) as a function of the standard deviation of $r_e$ and $r_m$ for $E[r_e] = 100$.
General Discussion
We found that across four different tasks the optimal number and diversity of cognitive
systems increases with the variability of the environment but decreases with the cost of predicting
each system’s performance. Each additional system tends to provide at most marginal
improvements, so the optimal solutions tend to favor small numbers of cognitive systems, with
two systems being optimal across a wide range of plausible values for metareasoning cost and
variability. Furthermore, when the optimal number of cognitive systems was two, then these two
systems tended to lie on two extremes in terms of time and accuracy. One of them was much faster
but more error-prone whereas the second one was slower but more accurate. This might be why
the human mind too appears to contain two opposite subsystems within itself – one that is fast but
fallible and one that is slow but accurate. In other words, this mental architecture might have
evolved to enable people to quickly adapt how they think and decide to the demands of different
situations. Our analysis thereby provides a normative justification for dual-process theories.
The emerging connection between normative modeling and dual-process theories is
remarkable because these approaches correspond to opposite poles in the debate about human
rationality (Stanovich, 2011). In this debate, some researchers interpreted the existence of a fast,
error-prone cognitive system whose heuristics violate the rules of logic, probability theory, and
expected utility theory as a sign of human irrationality (Ariely, 2009; Marcus, 2009). By contrast,
our analysis suggests that having a fast but fallible cognitive system in addition to a slow but
accurate system may be the best possible solution. This implies that the variability, fallibility, and
inconsistency of human judgment that result from people’s switching between System 1 and
System 2 should not be interpreted as evidence for human irrationality, because they might reflect the rational use of limited cognitive resources.
Limitations
One limitation of our analysis is that the cognitive systems we studied are simple
algorithms that abstract away most of the complexity and sophistication of the human mind. A
second limitation is that all of our tasks were drawn from the domains of decision-making and
reasoning. However, our conclusion only depends on the plausible assumption that the cost of
deciding which cognitive system to use increases with the number of systems. As long as this is
the case, the optimal number of cognitive systems should still depend on the tradeoff between
metareasoning cost and cognitive flexibility studied above, even though its exact value may be
different. Thus, our key finding that the optimal number of systems increases with the variability
of the environment and decreases with the metareasoning cost is likely to generalize to other tasks
and the much more complex architecture of the human mind.
Third, our analysis assumed that the mind is divided into discrete cognitive systems to make
the adaptive control over cognition tractable. While this makes selecting cognitive operations
much more efficient, we cannot prove that it is bounded-optimal to approximate rational
metareasoning in this way. Research in artificial intelligence suggests that there might be other
ways to make metareasoning tractable. One alternative strategy is the meta-greedy approximation
(Russell & Wefald, 1991a; Hay et al., 2012) which selects computations under the assumption
that the agent will act immediately after executing the first computation. According to the
directed cognition model (Gabaix & Laibson, 2005) this mechanism also governs the sequence of
cognitive operations people employ to make economic decisions. This model predicts that people
will always stop thinking when their decision cannot be improved by a single cognitive operation
even when significant improvements could be achieved by a series of two or more cognitive
operations. This makes us doubt that the meta-greedy heuristic would be sufficient to account for
people’s ability to efficiently solve complex problems, such as puzzles, where progress is often
non-linear. This might be why when Gabaix, Laibson, Moloche, and Weinberg (2006) applied
their model to multi-attribute decisions, they let it choose between macro-operators rather than
individual computations. Interestingly, those macro-operators are similar to the cognitive systems
studied here in that they perform different amounts of computation. Thus, the directed cognition
model does not appear to eliminate the need for sub-systems but merely proposes a mechanism
for how the mind might select and switch back-and-forth between them. Consistent with our
analysis, the time and effort required by this mechanism increases linearly with the number of
cognitive systems. While research in artificial intelligence has identified a few additional approximations to rational metareasoning, those are generally tailored to specific computational processes
and problems (Russell & Wefald, 1989; Lin et al., 2015; Vul et al., 2014) and would be applicable
to only a small subset of people’s cognitive abilities.
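For concreteness, the meta-greedy rule discussed above can be sketched as follows; the functions for a computation's myopic value and cost are placeholders. The sketch makes the limitation explicit: the loop stops as soon as no single computation pays for itself, even when a sequence of computations would.

```python
def meta_greedy(candidate_computations, myopic_value, cost):
    """Select computations one at a time under the meta-greedy approximation.

    `myopic_value(c, executed)` is the expected improvement in decision quality
    from executing computation c given what has been executed so far; `cost(c)`
    is its cost. Both are placeholders for problem-specific quantities.
    """
    executed = []
    remaining = list(candidate_computations)
    while remaining:
        best = max(remaining, key=lambda c: myopic_value(c, executed) - cost(c))
        if myopic_value(best, executed) - cost(best) <= 0:
            break  # stop: no single computation pays off, even if a sequence would
        executed.append(best)
        remaining.remove(best)
    return executed
```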
Relation to previous work
The work presented here continues the research programs of bounded rationality (Simon,
1956, 1982), rational analysis (Anderson, 1990) and resource-rational analysis (Griffiths et al.,
2015) in seeking to understand how the mind is adapted to the structure of the environment and its
limited computational resources. While previous work has applied the idea of bounded optimality
to derive optimal cognitive strategies for an assumed cognitive architecture (Lewis et al., 2014;
Griffiths et al., 2015; Lieder, Griffiths, & Hsu, 2018; Lieder, Griffiths, Huys, & Goodman, 2018a)
and the arbitration between assumed cognitive systems (Keramati et al., 2011), the work
presented here derived the cognitive architecture itself. By suggesting that the human mind’s
cognitive architecture might be bounded-optimal, our analysis complements and completes
previous arguments suggesting that people make rational use of the cognitive architecture they are
equipped with (Lewis et al., 2014; Griffiths et al., 2015; Lieder, Griffiths, & Hsu, 2018; Lieder,
Griffiths, Huys, & Goodman, 2018a; Tsetsos et al., 2016; Howes et al., 2016). Taken together
these arguments suggest that people might be resource-rational after all.
Conclusion and Future Directions
A conclusive answer to the question whether it is boundedly optimal for humans to have
two types of cognitive systems will require more rigorous estimates of the variability of decision
problems that people experience in their daily lives and precise measurements of how long it
takes to predict the performance of a cognitive system. Regardless, our analysis suggests that the incoherence in human reasoning and decision-making is qualitatively consistent with the
rational use of a bounded-optimal set of cognitive systems rather than a sign of irrationality.
Perhaps more importantly, the methodology we developed in this paper makes it possible to
extend resource-rational analysis from cognitive strategies to cognitive architectures. This new
line of research offers a way to elucidate how the architecture of the mind is shaped by the
structure of the environment and the fundamental limits of the human brain.
Appendix A
2AFC
In this appendix, we derive the formula for the utility of making a decision based on $k$ mental simulations used in our analysis of two-alternative forced choice (i.e., Equation 1). Since there are two possible choices, there are two ways in which the agent can score a reward of 1, that is

$$E_\theta[U \mid k] = \int_\theta \left[ P(a_1 \text{ is correct}) \cdot P(\text{Agent picks } a_1 \mid k) + P(a_0 \text{ is correct}) \cdot P(\text{Agent picks } a_0 \mid k) \right] P(\theta)\, d\theta. \tag{1}$$

If $a_i$ is the correct answer, then $i \sim \mathrm{Bern}(\theta)$. The probability that the agent chooses $a_i$ is equal to the probability that it sampled $a_i$ more than $k/2$ times. The probability that the agent sampled $a_0$ more than $k/2$ times is $\Theta_{\mathrm{CDF}}(k/2, \theta, k)$, where $\Theta_{\mathrm{CDF}}$ is the binomial cumulative distribution function. Correspondingly, the probability that the agent sampled $a_1$ more than $k/2$ times is $1 - \Theta_{\mathrm{CDF}}(k/2, \theta, k)$. Thus, we can write Equation 1 as

$$E_\theta[U \mid k] = \int_\theta \left[ \theta \left(1 - \Theta_{\mathrm{CDF}}(k/2, \theta, k)\right) + (1 - \theta)\, \Theta_{\mathrm{CDF}}(k/2, \theta, k) \right] P(\theta)\, d\theta.$$
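This expression can be evaluated numerically. The sketch below uses scipy's binomial CDF and approximates the integral on a grid, assuming a uniform prior $P(\theta)$ on $[0, 1]$ purely for illustration (the derivation above leaves $P(\theta)$ general).

```python
import numpy as np
from scipy.stats import binom

def expected_utility_2afc(k, n_grid=1001):
    """Approximate E_theta[U | k] under an assumed uniform prior on theta."""
    thetas = np.linspace(0.0, 1.0, n_grid)
    # Probability that a Binomial(k, theta) sample count exceeds k/2.
    p_pick_a1 = 1.0 - binom.cdf(k / 2, k, thetas)
    p_pick_a0 = binom.cdf(k / 2, k, thetas)
    integrand = thetas * p_pick_a1 + (1.0 - thetas) * p_pick_a0
    # With a uniform prior on [0, 1], the integral is the mean of the integrand.
    return float(integrand.mean())
```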
Appendix B
Sequential Decision-Making
Here, we provide a derivation of how to simplify the expression for the optimal planning system in Equation 6, that is

$$t^* = \arg\min_t E\left[\,\sum_{i=0}^{N} \big(c(s_i, a_i) + f(t)\big) \;\middle|\; t\,\right], \tag{6}$$

to the expression in Equation 7, that is

$$k^* = \arg\min_{k \in \mathbb{N}_0} \left(1 + \frac{k}{r_e}\right) E[N \mid k]. \tag{7}$$
Our reasoning behind this derivation is as follows. Since the cost of each thinking system is linear in the number of simulations, i.e., $c_e \cdot k$, we can replace $f(t)$ with $c_e \cdot k$ in the expectation in Equation 6. Since the cognitive systems are distinguished by the number of simulations they do, we can condition on the number of simulations $k$ instead. Therefore, the expectation in Equation 6 becomes

$$E\left[\,\sum_{i=0}^{N} \big(c(s_i, a_i) + c_e \cdot k\big) \;\middle|\; k\,\right].$$

The cost of acting from non-goal states is constant, i.e., $c(s_i, a_i) = c_a$. Therefore, the expectation simplifies to

$$E\left[\,\sum_{i=0}^{N} (c_a + c_e \cdot k) \;\middle|\; k\,\right] = E[N (c_a + c_e \cdot k) \mid k].$$

We can reparameterize using $r_e = c_a / c_e$ by substituting $c_e$ with $c_a / r_e$:

$$E\left[\,N\left(c_a + \frac{c_a}{r_e} \cdot k\right) \;\middle|\; k\,\right] = c_a\, E\left[\,\left(1 + \frac{k}{r_e}\right) N \;\middle|\; k\,\right].$$
We now arrive at Equation 7 by picking the cognitive system (number of simulations) that minimizes the above quantity; since $c_a$ is a positive constant and $(1 + k/r_e)$ is fixed given $k$, this yields

$$k^* = \arg\min_k c_a\, E\left[\,\left(1 + \frac{k}{r_e}\right) N \;\middle|\; k\,\right] = \arg\min_k \left(1 + \frac{k}{r_e}\right) E[N \mid k].$$
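As a numerical illustration of Equation 7, the optimal number of simulations can be found by direct enumeration once $E[N \mid k]$ is available; the specific relationship between $k$ and the expected number of turns used below is a made-up assumption.

```python
def optimal_k(candidate_ks, r_e, expected_turns):
    """Pick the number of simulations k minimizing (1 + k / r_e) * E[N | k] (Equation 7)."""
    return min(candidate_ks, key=lambda k: (1 + k / r_e) * expected_turns(k))

# Illustrative, made-up assumption: more planning shortens the game, with diminishing returns.
def example_expected_turns(k):
    return 60.0 / (1.0 + 0.1 * k) + 20.0

print(optimal_k(range(5, 55, 5), r_e=10, expected_turns=example_expected_turns))
```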
References
Anderson, J. R. (1990). The adaptive character of thought. Hillsdale, NJ: Psychology Press.
Ariely, D. (2009). Predictably irrational. New York: Harper Collins.
Atwood, M. E., & Polson, P. G. (1976). A process model for water jug problems. Cognitive
Psychology,8(2), 191–216.
Austerweil, J. L., & Griffiths, T. L. (2011). Seeking confirmation is rational for deterministic
hypotheses. Cognitive Science,35(3), 499–526.
Balleine, B. W., & O’Doherty, J. P. (2010). Human and rodent homologies in action control:
corticostriatal determinants of goal-directed and habitual action.
Neuropsychopharmacology,35(1), 48–69.
Bhui, R., & Gershman, S. J. (2017). Decision by sampling implements efficient coding of
psychoeconomic functions. bioRxiv, 220277.
Boureau, Y.-L., Sokol-Hessner, P., & Daw, N. D. (2015). Deciding how to decide: Self-control
and meta-decision making. Trends in cognitive sciences,19(11), 700–710.
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., . ..
Colton, S. (2012). A survey of monte carlo tree search methods. IEEE Transactions on
Computational Intelligence and AI in games,4(1), 1–43.
Camerer, C. F. (2011). Behavioral game theory: Experiments in strategic interaction. Princeton,
NJ: Princeton University Press.
Chaiken, S., & Trope, Y. (1999). Dual-process theories in social psychology. Guilford Press.
Chater, N., & Oaksford, M. (1999). Ten years of the rational analysis of cognition. Trends in
cognitive sciences,3(2), 57–65.
Crockett, M. J. (2013). Models of morality. Trends in cognitive sciences,17(8), 363–366.
Cushman, F. (2013). Action, outcome, and value a dual-system framework for morality.
Personality and social psychology review,17(3), 273–292.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and
dorsolateral striatal systems for behavioral control. Nature neuroscience,8(12),
1704–1711.
Diamond, A. (2013). Executive functions. Annual review of psychology,64, 135–168.
Dickinson, A. (1985). Actions and habits: the development of behavioural autonomy.
Philosophical Transactions of the Royal Society B,308(1135), 67–78.
Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron,80(2), 312 - 325.
Evans, J. S. B. T. (2003). In two minds: dual-process accounts of reasoning. Trends in cognitive
sciences,7(10), 454–459.
Evans, J. S. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition.
Annual Review of Psychology,59, 255–278.
Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition.
Perspectives on Psychological Science,8(3), 223-241.
Figner, B., Mackinlay, R. J., Wilkening, F., & Weber, E. U. (2009). Affective and deliberative
processes in risky choice: age differences in risk taking in the Columbia card task. Journal
of Experimental Psychology: Learning, Memory, and Cognition,35(3), 709.
Gabaix, X., & Laibson, D. (2005). Bounded rationality and directed cognition (Tech. Rep.).
Cambridge, MA: Harvard University.
Gabaix, X., Laibson, D., Moloche, G., & Weinberg, S. (2006). Costly information acquisition:
Experimental analysis of a boundedly rational model. The American Economic Review,
96(4), 1043–1068.
Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A
converging paradigm for intelligence in brains, minds, and machines. Science,349(6245),
273–278.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond “heuristics and
biases”. European review of social psychology,2(1), 83–115.
Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: models of
bounded rationality. Psychological review,103(4), 650.
Gigerenzer, G., & Selten, R. (2002). Bounded rationality: The adaptive toolbox. Cambridge,
MA: MIT press.
Gilovich, T., Griffin, D., & Kahneman, D. (2002). Heuristics and biases: The psychology of
intuitive judgment. Cambridge university press.
Greene, J. D. (2015). Beyond point-and-shoot morality: Why cognitive (neuro) science matters
for ethics. The Law & Ethics of Human Rights,9(2), 141–172.
Griffiths, T. L., Lieder, F., & Goodman, N. D. (2015). Rational use of cognitive resources: Levels
of analysis between the computational and the algorithmic. Topics in cognitive science,
7(2), 217–229.
Griffiths, T. L., & Tenenbaum, J. B. (2001). Randomness and coincidences: Reconciling intuition
and probability theory. In Proceedings of the 23rd annual conference of the cognitive
science society (pp. 370–375).
Griffiths, T. L., & Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition.
Psychological science,17(9), 767–773.
Gunzelmann, G., & Anderson, J. R. (2003). Problem solving: Increased planning with practice.
Cognitive systems research,4(1), 57–76.
Hahn, U., & Oaksford, M. (2007). The rationality of informal argumentation: a Bayesian
approach to reasoning fallacies. Psychological review,114(3), 704.
Hahn, U., & Warren, P. A. (2009). Perceptions of randomness: why three heads are better than
four. Psychological review,116(2), 454.
Hay, N., Russell, S. J., Tolpin, D., & Shimony, S. (2012). Selecting Computations: Theory and
Applications. In N. de Freitas & K. Murphy (Eds.), Proceedings of the 28th conference on
uncertainty in artificial intelligence. Corvallis: AUAI Press.
Horvitz, E. J., Cooper, G. F., & Heckerman, D. E. (1989). Reflection and action under scarce
resources: Theoretical principles and empirical study. In Proceedings of the eleventh
international joint conference on artificial intelligence (pp. 1121–1127). San Mateo, CA:
Morgan Kaufmann.
Horvitz, E. J., & Rutledge, G. (1991). Time-dependent utility and action under uncertainty. In
Proceedings of the seventh conference on uncertainty in artificial intelligence (pp.
151–158).
Howes, A., Warren, P. A., Farmer, G., El-Deredy, W., & Lewis, R. L. (2016). Why contextual
preference reversals maximize expected value. Psychological review,123(4), 368.
Icard, T. (2014). Toward boundedly rational analysis. In Proceedings of the 36th annual
conference of the cognitive science society (pp. 637–642).
Johnson, E. J., & Payne, J. W. (1985). Effort and accuracy in choice. Management science,31(4),
395–414.
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Strauss and Giroux.
Kahneman, D., & Frederick, S. (2002). Representativeness revisited: Attribute substitution in
intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman (Eds.), Heuristics and
biases: The psychology of intuitive judgment. Cambridge, UK: Cambridge University
Press.
Kahneman, D., & Frederick, S. (2005). A model of heuristic judgment. In K. J. Holyoak &
R. G. Morrison (Eds.), The cambridge handbook of thinking and reasoning (pp. 267–293).
Cambridge, UK: Cambridge University Press.
Kahneman, D., & Frederick, S. (2007). Frames and brains: Elicitation and control of response
tendencies. Trends in cognitive sciences,11(2), 45–46.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and
biases. Cambridge, UK: Cambridge University Press.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica,47(2), 263-291.
Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582–591.
Keramati, M., Dezfouli, A., & Piray, P. (2011). Speed/accuracy trade-off between the habitual
and the goal-directed processes. PLoS Computational Biology,7(5), e1002055.
Khaw, M. W., Li, Z., & Woodford, M. (2017). Risk aversion as a perceptual bias (Tech. Rep.).
Cambridge, MA: National Bureau of Economic Research.
Kool, W., Cushman, F. A., & Gershman, S. J. (2016). When does model-based control pay off?
PLoS computational biology,12(8), e1005090.
Kool, W., Gershman, S. J., & Cushman, F. A. (2017). Cost-benefit arbitration between multiple
reinforcement-learning systems. Psychological science,28(9), 1321–1333.
Kotovsky, K., Hayes, J. R., & Simon, H. A. (1985). Why are some problems hard? evidence from
tower of hanoi. Cognitive psychology,17(2), 248–294.
Lewis, R. L., Howes, A., & Singh, S. (2014). Computational rationality: Linking mechanism and
behavior through bounded utility maximization. Topics in cognitive science,6(2), 279–311.
Lieder, F., & Griffiths, T. (2017). Strategy selection as rational metareasoning. Psychological
Review,124, 762 -794.
Lieder, F., & Griffiths, T. (in revision). Resource-rational analysis: Understanding human
cognition as the optimal use of limited computational resources.
Lieder, F., Griffiths, T. L., & Hsu, M. (2018). Overrepresentation of extreme events in decision
making reflects rational use of cognitive resources. Psychological review,125(1), 1.
Lieder, F., Griffiths, T. L., Huys, Q. J. M., & Goodman, N. D. (2018a). The anchoring bias
reflects rational use of cognitive resources. Psychonomic bulletin & review,25(1),
322–349.
Lieder, F., Griffiths, T. L., Huys, Q. J. M., & Goodman, N. D. (2018b). Empirical evidence for
resource-rational anchoring and adjustment. Psychonomic Bulletin & Review,25(2),
775–784.
Lieder, F., Krueger, P. M., & Griffiths, T. L. (2017). An automatic method for discovering
rational heuristics for risky choice. In Proceedings of the 39th annual meeting of the
cognitive science society. Austin, TX: Cognitive Science Society.
Lieder, F., Plunkett, D., Hamrick, J. B., Russell, S. J., Hay, N. J., & Griffiths, T. L. (2014).
Algorithm selection by rational metareasoning as a model of human strategy selection. In
Advances in neural information processing systems (Vol. 27).
Lin, C. H., Kolobov, A., Kamar, E., & Horvitz, E. J. (2015). Metareasoning for planning under
uncertainty. In Proceedings of the 24th international conference on artificial intelligence
(pp. 1601–1609). AAAI Press.
Marcus, G. (2009). Kluge: The haphazard evolution of the human mind. Boston: Houghton
Mifflin Harcourt.
McMahan, H. B., Likhachev, M., & Gordon, G. J. (2005). Bounded real-time dynamic
programming: RTDP with monotone upper bounds and performance guarantees. In
Proceedings of the 22nd international conference on machine learning (pp. 569–576).
Milli, S., Lieder, F., & Griffiths, T. L. (2017). When does bounded-optimal metareasoning favor
few cognitive systems? In Proceedings of the thirty-first AAAI conference on artificial
intelligence (pp. 4422–4428).
Mukherjee, K. (2010). A dual system model of preferences under risk. Psychological review,
117(1), 243.
Newell, A., & Simon, H. A. (1972). Human problem solving (Vol. 104) (No. 9). Englewood
Cliffs, NJ: Prentice-Hall.
Norman, D. A., & Shallice, T. (1986). Attention to action. In R. J. Davidson, G. E. Schwartz, &
D. Shapiro (Eds.), Consciousness and self-regulation (pp. 1–18). New York: Plenum Press.
Oaksford, M., & Chater, N. (1994). A rational analysis of the selection task as optimal data
selection. Psychological Review,101(4), 608.
Oaksford, M., & Chater, N. (2007). Bayesian rationality: The probabilistic approach to human
reasoning. Oxford, UK: Oxford University Press.
Parpart, P., Jones, M., & Love, B. (2017). Heuristics as Bayesian inference under extreme priors.
Cognitive Psychology.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision
making. Journal of Experimental Psychology: Learning, Memory, and Cognition,14(3),
534.
Payne, J. W., Bettman, J. R., & Johnson, E. J. (1993). The adaptive decision maker. Cambridge,
UK: Cambridge University Press.
Russell, S. J., & Subramanian, D. (1995). Provably bounded-optimal agents. Journal of Artificial
Intelligence Research,2, 575–609.
Russell, S. J., & Wefald, E. (1989). On optimal game-tree search using rational meta-reasoning.
In Proceedings of the 11th international joint conference on artificial intelligence-volume 1
(pp. 334–340).
Russell, S. J., & Wefald, E. (1991a). Do the right thing: studies in limited rationality.
Cambridge, MA: MIT press.
Russell, S. J., & Wefald, E. (1991b). Principles of metareasoning. Artificial intelligence,49(1-3),
361–395.
Scheibehenne, B., Rieskamp, J., & Wagenmakers, E.-J. (2013). Testing adaptive toolbox models:
A Bayesian hierarchical approach. Psychological Review,120(1), 39.
Shanks, D. R., Tunney, R. J., & McCarthy, J. D. (2002). A re-examination of probability
matching and rational choice. Journal of Behavioral Decision Making,15(3), 233–250.
Shenhav, A., Botvinick, M. M., & Cohen, J. D. (2013). The expected value of control: an
integrative theory of anterior cingulate cortex function. Neuron,79(2), 217–240.
Shenhav, A., Musslick, S., Lieder, F., Kool, W., Griffiths, T. L., Cohen, J. D., & Botvinick, M. M.
(2017). Toward a rational and mechanistic account of mental effort. Annual Review of
Neuroscience(40), 99-124.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., . . . Hassabis,
D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature,
529, 484-489.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., . .. Hassabis, D.
(2017). Mastering the game of Go without human knowledge. Nature,550(7676),
354–359.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological review,
63(2), 129-138.
Simon, H. A. (1972). Theories of bounded rationality. Decision and organization,1(1), 161–176.
Simon, H. A. (1982). Models of bounded rationality: Empirically grounded economic reason.
Cambridge, MA: MIT press.
Sims, C. A. (2003). Implications of rational inattention. Journal of monetary Economics,50(3),
665–690.
Stanovich, K. E. (2009). Decision making and rationality in the modern world. Oxford, UK:
Oxford University Press.
Stanovich, K. E. (2011). Rationality and the reflective mind. Oxford, UK: Oxford University
Press.
Steinberg, L. (2010). A dual systems model of adolescent risk-taking. Developmental
psychobiology,52(3), 216–224.
Sutherland, S. (2013). Irrationality: The enemy within. London, UK: Pinter & Martin Ltd.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.
Tenenbaum, J. B., & Griffiths, T. L. (2001). The rational basis of representativeness. In
Proceedings of the 23rd annual conference of the cognitive science society (pp. 1036–41).
Thorngate, W. (1980). Efficient decision heuristics. Behavioral Science,25(3), 219–225.
Tsetsos, K., Moran, R., Moreland, J., Chater, N., Usher, M., & Summerfield, C. (2016).
Economic irrationality is optimal during noisy decision making. Proceedings of the
National Academy of Sciences,113(11), 3102–3107.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases.
science,185(4157), 1124–1131.
van der Meer, M., Kurth-Nelson, Z., & Redish, A. D. (2012). Information processing in
decision-making systems. The Neuroscientist,18(4), 342–359.
Von Neumann, J., & Morgenstern, O. (1944). The theory of games and economic behavior.
Princeton, NJ: Princeton university press.
Vul, E., Goodman, N., Griffiths, T. L., & Tenenbaum, J. B. (2014). One and done? Optimal
decisions from very few samples. Cognitive science,38(4), 599–637.
Wason, P. C. (1968). Reasoning about a rule. The Quarterly journal of experimental psychology,
20(3), 273–281.