Using Response Times to Measure Ability on a Cognitive Task∗
Aleksandr Alekseev†
March 27, 2019
Abstract
I show how using response times as a proxy for effort coupled with an explicit process-based
model can address a long-standing issue of how to separate the effect of cognitive ability on
performance from the effect of motivation. My method is based on a dynamic stochastic model
of optimal effort choice in which ability and motivation are the structural parameters. I show
how to estimate these parameters from the data on outcomes and response times in a cognitive
task. In a laboratory experiment, I find that performance on a Digit-Symbol test is a noisy and
biased measure of cognitive ability. Ranking subjects by their performance leads to an incorrect
ranking by their ability in a substantial number of cases. These results suggest that interpreting
performance on a cognitive task as ability may be misleading.
Keywords: cognitive ability, test scores, response times, drift-diffusion model, choice-process
data
JEL codes: C24, C41, C91, D91, J24
∗I thank Jim Cox, Glenn Harrison, Susan Laury, Tom Mroz, Vjollca Sadiraj, and Todd Swarthout for their
valuable comments and suggestions. I thank conference participants at the Economic Science Association meetings,
the Southern Economic Association meetings, and the Western Economic Association meetings, as well as seminar
participants at Georgia State University, University of California San Diego, the University of Chicago, and Chapman
University for their feedback. This work has been supported by the Andrew Young School Dissertation Fellowship.
†Economic Science Institute, Chapman University, One University Drive, Orange, CA, 92866, e-mail:
alekseev@chapman.edu, phone: +1 (714) 744-7083, ORCID: 0000-0001-6542-1920.
1 Introduction
Correct measurement of cognitive ability is essential since ability is used as an explanatory variable
in a vast array of contexts. Economists have been using cognitive ability to explain differences in
earnings (Murnane et al., 1995; Heckman et al., 2006, 2013), risk and time preferences (Dohmen et al., 2010), the quality of decision-making (Agarwal and Mazumder, 2013), strategic reasoning (Gill and Prowse, 2016), as well as differences in various life outcomes, such as teenage pregnancy, marital status, smoking, and engaging in criminal activities (Duckworth et al., 2011). This literature traditionally uses performance on a cognitive test as a measure of cognitive ability. A fundamental flaw in this approach is that performance never reflects cognitive ability by itself. Performance also reflects character skills, such as motivation (Borghans et al., 2008; Duckworth et al., 2011; Segal, 2012).^1 The traditional approach thus confounds actual ability with the combination of ability and
motivation, which may result in wrong conclusions about the effect of ability. Using performance
as a proxy for ability could be justified if subjects’ heterogeneity in motivation is small relative
to their heterogeneity in ability. However, the existing literature provides no way to empirically
evaluate this assumption.
I propose a new approach to measure cognitive ability that overcomes the issues with the
traditional approach. My method is based on a dynamic stochastic model of optimal effort choice
in which ability and motivation are the structural parameters. I show how these parameters can
be separately identified from the data on outcomes and response times in a cognitive task. The
proposed method is based on explicit modeling of the decision-making process and is inspired by the
literature on drift-diffusion models (Ratcliff, 1978; Krajbich et al., 2012; Woodford, 2014; Clithero, 2018; Webb, 2019). These models have been shown to perform well in jointly predicting outcomes
and response times, as well as to match the actual processes in the brain.
I use response times as a proxy for effort, following Wilcox (1993) and Ofek et al. (2007). An
agent’s effective effort is modeled as a Brownian motion with drift in which the drift rate represents
the agent’s ability. Higher ability leads to faster accumulation of effective effort. The accumulated
^1 For example, consider two students, Adam and Bob, who are taking a cognitive test. Adam has high cognitive ability but is not interested in the outcome of the test. Bob, on the other hand, has lower cognitive ability but is highly motivated to get the right answers. As a result, Bob might end up having a higher score on the test, which according to the traditional approach would imply that Bob has higher ability than Adam, while in reality, their ranking by ability is the opposite.
effective effort at a given time determines the probability of answering a question correctly. A correct answer yields utility that represents an agent's motivation. Effort is costly, and the more time an agent spends on a task, the higher the accumulated cost of effort. The agent's problem is to choose the optimal moment to stop the effective effort accumulation process. The solution to the agent's problem takes the form of a threshold rule in terms of the accumulated effective effort. I derive a closed-form solution for the optimal threshold and show how it is related to ability and motivation. The parameters of the model can be estimated by maximum likelihood from the data on outcomes and response times from a series of trials of a cognitive task. The proposed estimation strategy can be viewed as a version of a threshold regression model used in survival analysis (Lee and Whitmore, 2006).
I conduct a laboratory experiment to illustrate the proposed approach and compare it to the
traditional approach. In the experiment, subjects take a Digit-Symbol test (DST) in which they have to match symbols to digits. The DST is designed to capture a subject's processing speed, which underlies more complex cognitive functions. The DST is used in the economics literature (Segal, 2012; Dohmen et al., 2010) and in intelligence scales such as WAIS (Weiss et al., 2010). Subjects are
free to choose how much time to spend on a task and are not extrinsically motivated for good
performance. I estimate ability and motivation for each subject individually and use the structural
model to perform a counterfactual simulation in which the only source of variation in performance
is variation in ability. I find that performance is a noisy and biased measure of ability. Variation in
ability can explain only 0.58 of the variation in observed performance. Subjects with relatively low
ability have lower performance than they would have if performance were an unbiased measure of
ability, while subjects with relatively high ability have even higher performance than they would
have. Ranking subjects by performance leads to an incorrect ranking by ability 24% of the time.
These results suggest that more care should be taken when interpreting performance as cognitive ability, since such an interpretation may be misleading. The present paper, however, should be viewed as a first step towards disentangling the effects of ability and motivation on performance. More work is
needed to understand how well performance approximates ability in other cognitive and real-effort
tasks used in the literature. The main goal of the present paper is to provide the tools for this work
and to illustrate the usefulness of choice-process data and process-based modeling in developing
such tools.
2 Theoretical Model
Consider an agent working on a trial of a cognitive task. An outcome of the trial can be either a
success (the answer given by the agent is correct) or a failure (the answer given by the agent is
incorrect). The agent can exert effort, approximated by response time t, to increase the probability
of success. Following the literature on the drift-diffusion model, I assume that the agent accumulates
effective effort $E_t$ according to a Brownian motion with drift:
$$ dE_t = \alpha\, dt + \sigma\, dW_t, \qquad E_0 = 0, \qquad (1) $$
in which the drift rate α > 0 represents the agent’s ability and the diffusion parameter σ > 0
represents her (inverse of) consistency.^2 Ability in this model is equivalent to the efficiency, or
intensity, of converting effort (time spent on a trial) into performance (probability of success). This
efficiency can vary across agents for a given task but is assumed to be fixed over the trials of a
task for each agent. Given a fixed amount of time, an agent with higher ability will have higher
performance on the task than an agent with lower ability. Having higher ability in the model thus
corresponds well to an intuitive notion of being good, or able, at doing something.
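To make the accumulation process concrete, here is a minimal simulation sketch of equation (1) using a Euler discretization; the parameter values, the time step, and the function name are illustrative assumptions rather than quantities taken from the paper.

```python
import numpy as np

def simulate_effort_path(alpha, sigma, dt=0.01, t_max=1.0, rng=None):
    """Discretized path of dE_t = alpha*dt + sigma*dW_t with E_0 = 0 (equation (1))."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(t_max / dt)
    increments = alpha * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

# A higher-ability agent accumulates effective effort faster on average.
rng = np.random.default_rng(0)
low_ability = simulate_effort_path(alpha=2.0, sigma=1.0, rng=rng)
high_ability = simulate_effort_path(alpha=6.0, sigma=1.0, rng=rng)
print(low_ability[-1], high_ability[-1])  # terminal accumulated effective effort
```

Averaged over many such paths, the terminal effective effort grows linearly in the drift rate, which is the sense in which ability acts as an efficiency parameter.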
The agent stops the accumulation of effective effort and gives an answer to a trial when the
effective effort process (1) hits a threshold. Unlike in a typical drift-diffusion model, I assume that
there is a single threshold that is chosen optimally by the agent.^3 The agent uses a discount rate $\rho > 0$ and is assumed to experience a unit cost of effort, in utility terms, per unit of effort spent on a trial. The utility of success is $\mu > 0$, and the utility of failure is normalized to zero. Utility $\mu$ represents the agent's motivation for succeeding on a task, which is allowed to be agent- and/or task-specific.^4 The probability of success $p(\cdot)$ depends on the accumulated effective effort. I assume that $p(\cdot)$ is strictly increasing and strictly concave. At time $\tau$, the agent's discounted expected utility is
$$ \mathbb{E}\left[ \int_0^{\tau} -e^{-\rho t}\, dt + \mu\, p(E_\tau)\, e^{-\rho\tau} \right], \qquad (2) $$
^2 The starting point of the effective effort process can be initialized at a value other than 0 to allow for the case of multiple-answer questions. The starting value $E_0$ then would be chosen so that the probability of success at $E_0$ equals 1/[number of answer options].
^3 Ratcliff and Van Dongen (2011) also study a single-threshold diffusion model; however, they do not consider utility maximization.
^4 Strictly speaking, motivation in this model is measured in the units of the cost of effort.
which is the sum of the (negative) accumulated discounted cost of effort and the expected discounted
benefit from a success on a trial.
The agent chooses when to stop the accumulation of effective effort in order to maximize the
utility function (2). The solution to the agent’s problem is a stopping rule in terms of the accu-
mulated effective effort. The agent continues the accumulation of effective effort $E_t$ until it reaches a threshold $E^*$. The agent stops as soon as the threshold is hit.^5 Since the effective effort accumulation process is stochastic, the optimal response time required to hit the threshold $E^*$ is a random variable.^6 Under the assumption of a Brownian motion with drift, the optimal response time $\tau^*$ has an inverse Gaussian distribution with the pdf
$$ f(\tau^*) = \frac{E^*}{\sqrt{2\pi\sigma^2 (\tau^*)^3}} \exp\!\left(-\frac{(E^* - \alpha\tau^*)^2}{2\sigma^2 \tau^*}\right). \qquad (3) $$
The threshold $E^*$ must satisfy the following optimality condition:^7
$$ \frac{\rho}{\beta}\, p'(E^*) - \rho\, p(E^*) = \frac{1}{\mu}, \qquad \text{where } \beta \equiv \frac{-\alpha + \sqrt{\alpha^2 + 2\rho\sigma^2}}{\sigma^2}. \qquad (4) $$
For the empirical application of the method, I consider a special case of the model in which the discount rate $\rho$ tends to zero.^8 In this case, the optimality condition for $E^*$ becomes
$$ p'(E^*) = \frac{1}{\alpha\mu}. \qquad (5) $$
To solve the problem analytically, I further assume that $p(E) = 1 - e^{-E}$. The optimal threshold for the agent's problem is then simply
$$ E^*(\alpha, \mu) = \ln\alpha + \ln\mu. \qquad (6) $$
^5 If $E_0 = E^*$, the agent should give an answer immediately.
^6 Importantly, the probability of success on a trial will not depend on the realized value of the response time. Figure C.5 in Online Appendix C empirically validates this prediction.
^7 See Online Appendix B for the derivations.
^8 In the experiment, decisions are made on timescales of under a minute. Discounting is unlikely to have a meaningful effect on such short timescales.
It follows from equation (6) that the agent's optimal threshold, and hence her performance, is increasing in both ability and motivation.^9 The effect of ability on average effort, $\bar\tau^* = E^*/\alpha$, cannot be unambiguously signed. For agents with a high threshold ($E^* > 1$), an increase in ability will lead to lower average effort, and vice versa. This result is a natural consequence of the concavity of $p$:^10 at high levels of $E^*$, the proportional increase in $E^*$ due to higher ability is smaller than the proportional increase in $\alpha$, so the average effort needed to reach the higher effective effort threshold falls. The comparative statics results for the optimal threshold and average effort imply, in particular, that one cannot use a single measure, either performance or effort, to identify ability. However, combining the two pieces of data does allow one to separate the effect of ability from the effect of motivation, as the next section shows.
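As a numerical illustration of these comparative statics, the following sketch evaluates equation (6), the implied success probability $p(E^*) = 1 - e^{-E^*}$, and the average response time $E^*/\alpha$ on a small grid of ability values with motivation held fixed; the specific values are illustrative assumptions.

```python
import numpy as np

def optimal_threshold(alpha, mu):
    """Equation (6): E* = ln(alpha) + ln(mu); meaningful when alpha * mu > 1."""
    return np.log(alpha) + np.log(mu)

mu = 2.0                        # illustrative motivation level
for alpha in (1.5, 3.0, 6.0):   # increasing ability
    e_star = optimal_threshold(alpha, mu)
    p_star = 1.0 - np.exp(-e_star)   # probability of success p(E*)
    mean_rt = e_star / alpha         # average response time E*/alpha
    print(f"alpha={alpha:4.1f}  E*={e_star:5.2f}  p(E*)={p_star:4.2f}  mean RT={mean_rt:5.2f}")
```

In this example $E^* > 1$ throughout, so performance rises with ability while average response time falls, which is exactly why neither measure alone identifies ability.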
3 Estimation Strategy
Suppose that we observe a sequence of $N$ independent and identical trials of a cognitive task performed by an individual. Each observation is a pair $(x_i, t_i)$, $i = 1, \dots, N$, where $x_i \in \{0, 1\}$ is the outcome of trial $i$ and $t_i > 0$ is the response time in that trial. The likelihood of an observation $i$, conditional on the parameters of the model $\theta \equiv (\alpha, \mu, \sigma)$, is
$$ l(x_i, t_i \mid \theta) = p(E^*(\theta))^{x_i}\, \left(1 - p(E^*(\theta))\right)^{1 - x_i}\, f(t_i \mid \theta). \qquad (7) $$
The first part is simply the Bernoulli likelihood. The second part, $f(t_i \mid \theta)$, is the likelihood that the stochastic process (1) hits the threshold $E^*$ at time $t_i$, given by (3). The ability of each individual, as well as the two other parameters of the model, can then be estimated using the maximum likelihood method:
$$ \hat\theta = \arg\max_{\theta}\ \ln L(\theta \mid \mathbf{x}, \mathbf{t}) \equiv \sum_{i=1}^{N} \ln l(x_i, t_i \mid \theta). \qquad (8) $$
There are three moments in the data and two functional relations, $p(E)$ and $E^*(\alpha, \mu)$, that exactly identify the three parameters of the model. Let $X$ be a Bernoulli random variable encoding an outcome of a trial, and $T$ be an inverse Gaussian random variable encoding a response time.
^9 It is straightforward to show using (5) that these comparative statics results hold for any increasing and concave function $p$.
^10 Diminishing marginal product of effective effort appears to be a reasonable assumption for a cognitive production function $p$.
Then $\mathbb{E}[X] = p(E^*)$, which yields an estimate of the optimal threshold: $\widehat{E^*} = p^{-1}(\bar X)$. The second moment is $\mathbb{E}[T] = E^*/\alpha$, which yields an estimate of ability: $\hat\alpha = p^{-1}(\bar X)/\bar T$. Equation (6) then yields an estimate of motivation: $\hat\mu = \frac{\bar T}{p^{-1}(\bar X)} \exp\!\left(p^{-1}(\bar X)\right)$. Finally, the third moment, $\mathbb{E}\!\left[\tfrac{1}{T}\right] = \tfrac{1}{\mathbb{E}[T]} + \tfrac{\sigma^2}{(E^*)^2}$, yields an estimate of $\sigma^2$: $\widehat{\sigma^2} = \left(p^{-1}(\bar X)\right)^2 \left(\overline{1/T} - 1/\bar T\right)$.^11 Here $\bar X$, $\bar T$, and $\overline{1/T}$ denote the corresponding sample means.
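Since these estimates are simple functions of the sample moments, they can be computed in a few lines. The sketch below is a minimal implementation of the moment formulas above for one subject's data; the arrays are hypothetical, and in practice the resulting values can also serve as starting values for the maximum likelihood problem (8).

```python
import numpy as np

def moment_estimates(x, t):
    """Estimates of (alpha, mu, sigma^2) from outcomes x in {0,1} and response times t,
    using p(E) = 1 - exp(-E) so that p^{-1}(q) = -log(1 - q). Assumes 0 < mean(x) < 1."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    e_star_hat = -np.log(1.0 - x.mean())                              # E*-hat = p^{-1}(X-bar)
    alpha_hat = e_star_hat / t.mean()                                 # alpha-hat = p^{-1}(X-bar) / T-bar
    mu_hat = (t.mean() / e_star_hat) * np.exp(e_star_hat)             # mu-hat from equation (6)
    sigma2_hat = e_star_hat**2 * ((1.0 / t).mean() - 1.0 / t.mean())  # sigma^2-hat from E[1/T]
    return alpha_hat, mu_hat, sigma2_hat

# Hypothetical outcomes and response times for one subject (same time unit as the estimation).
x = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
t = [0.35, 0.28, 0.41, 0.30, 0.33, 0.26, 0.45, 0.31, 0.29, 0.37]
print(moment_estimates(x, t))
```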
4 Experiment
To illustrate the method, I conducted an experiment at the Experimental Economics Center lab
at Georgia State University (GSU) in June 2017 and March-April 2018. The experiment consists
of 11 sessions with 192 participants in total. The subjects in the experiment are undergraduate
students at GSU. The average earnings in the experiment are $36.35.
The main part of the experiment is a cognitive task, which is a version of a Digit-Symbol test (DST).^12 In a DST, subjects have to find correct correspondences between digits and symbols. In the present implementation, subjects are given a key with six digit-symbol pairs and a list of 14 symbols to fill six numbered boxes.^13 The DST consists of 100 trials in which the key and the list of available symbols change in every trial.^14 Subjects are free to choose how much time to spend on each trial.^15 Unconstrained (or endogenous, in the language of Spiliopoulos and Ortmann
(2018)) response time is important in this context since response time is assumed to be the only
margin of effort in the experiment. In order to minimize the interdependency between the rounds
(e.g., via learning) as much as possible, I do not provide subjects with any feedback between the
rounds. Subjects learn their score only at the end of the experiment. The incentives in the DST
are flat: each subject receives $20 for completion regardless of performance. This incentive scheme
^11 The functional form assumption $p(E) = 1 - e^{-E}$ involves an implicit normalization. One could introduce an additional parameter $\gamma$ in the probability of success function, $p(E) = 1 - e^{-\gamma E}$, which in the present case is normalized to 1. Then one would have to normalize $\sigma^2$ to 1 and estimate $\gamma$.
^12 The experiment also included a risk elicitation task and a survey, results of which are not reported here.
^13 See Appendix A for the subject instructions and screenshots.
^14 In a traditional implementation of a DST, the key does not change across trials. Performance on a traditional DST then captures subjects' working memory in addition to processing speed. In the present context, however, processing speed is the only quantity of interest. See Benndorf et al. (2018) for a similar argument.
^15 In most implementations, the time that subjects are allowed to spend on a cognitive task is constrained. The performance measure that is used in the present experiment, i.e., performance with no time constraint, is therefore not strictly identical to the performance measures typically used. The underlying message, however, would remain the same even if the time were constrained: in order to separate the effect of ability from the effect of motivation, one needs to supplement a measure of performance with a measure of effort. In the case of a time constraint, however, the relevant measure of effort would be difficult to observe.
allows one to elicit a subject's intrinsic motivation since good performance is not extrinsically incentivized.^16
The benefit of a DST is that it measures fluid intelligence, i.e., the ability to solve novel problems that do not rely on any cultural background or accumulated knowledge for their solution (Cattell, 1971). Performance on a DST is associated with processing speed. Processing speed is positively associated with other IQ measures since processing speed is the basis for more complex cognitive functions (Vernon, 1983). In economics, researchers have used a DST to study the relationship between cognitive ability and risk and time preferences (Dohmen et al., 2010) and the role of motivation in performance (Segal, 2012).
5 Results
Subjects perform surprisingly well on the DST given that they were not extrinsically rewarded for
good performance. The median score is 92 and the interquartile range (IQR) for the score is only
9. Such high scores suggest that the subjects had non-trivial levels of intrinsic motivation in the
task. The median subject took 20.6 seconds on average to complete a single trial. The IQR for the
mean response time (MRT) is 6.9 seconds.^17 To get a rough idea of processing speed, one can look at the
ratio of a score to the total time spent on a task. According to this measure, the median subject
gave 2.6 correct answers per minute.
Figure 1 shows the distribution of the individual-level estimates^18 of the ability parameter α.^19 The distribution is bell-shaped and concentrated around the median but asymmetric. The
sample distribution has a higher mass of subjects with a just-below-median ability and a longer
and fatter right tail relative to a reference normal distribution. The graph shows considerable
variation in ability among subjects. For example, a subject at the 75th percentile is 1.5 times
better at converting their exerted effort into accumulated effective effort than a subject at the 25th
^16 An important modification of this baseline design would be to introduce variable conditional rewards, which would allow one to study how parameter estimates change with the reward level.
^17 The distribution of mean response times is long-tailed and well-approximated by an inverse Gaussian distribution. See Figure C.2 in Online Appendix C for the distribution of scores and response times in the sample.
^18 See Figure C.3 in Online Appendix C for a quantile probability plot of the model's fit.
^19 The subjects with a perfect score of 100 (6 subjects, or 3% of the sample) were assigned a score of 99 by randomly selecting a trial and assigning it as incorrectly solved. The model cannot be estimated in the case of a perfect score. This is a finite sample issue: adding more trials would likely eliminate the instances of perfect scores. Excluding the subjects with perfect scores does not alter the results significantly.
Figure 1: Distributions of Raw Ability
[Panel A. Axes: Ability (horizontal, quintile breaks at 2.32, 5.89, 7.28, 9.10, 14.70), Density (vertical).]
Note: The figure shows the distribution of the individual-level estimates of ability in the sample. The smooth
solid line is the kernel density estimate, the vertical bars are the histogram, the dotted line is the reference
density of a normal distribution with the parameters matching the sample moments, and the vertical dashed
line is the sample median. The breaks on the horizontal axis correspond to the quintiles of the distribution.
percentile. A subject with the highest ability is 2 times better than a median subject and 6.3 times
better than a subject with the lowest ability.
The advantage of having an explicit structural model is that it allows one to conduct a counterfactual exercise. This exercise asks the following question: what would the distribution of performance in the sample look like if it varied only because of ability? Answering this question is important because it empirically evaluates how well performance approximates true ability. If the two distributions are similar, performance is a good proxy for ability. If the two distributions are different, a correction method, such as the one proposed here, is required. I use formula (6) to compute the optimal effort threshold $E^*_i$ and the probability of success $p^*_i$ implied by this threshold for each subject while holding motivation fixed at the median level. This procedure yields the distribution of counterfactual performance that would arise in the sample due to variation in ability alone. Note that this counterfactual performance is stripped of all variation in motivation and thus represents an unconfounded measure of ability.
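A minimal sketch of this counterfactual step, assuming the individual-level estimates are already available as arrays (the variable names are hypothetical): each subject's threshold is recomputed from equation (6) with motivation held at the sample median, and counterfactual performance is the implied success probability.

```python
import numpy as np

def counterfactual_performance(alpha_hat, mu_hat):
    """Success probabilities implied by equation (6) with motivation fixed at its sample median."""
    alpha_hat = np.asarray(alpha_hat, dtype=float)
    mu_med = np.median(mu_hat)                   # hold motivation fixed at the median level
    e_star = np.log(alpha_hat) + np.log(mu_med)  # E*_i = ln(alpha_i) + ln(mu_median)
    return 1.0 - np.exp(-e_star)                 # p*_i = 1 - 1 / (alpha_i * mu_median)

# Hypothetical individual-level estimates (one entry per subject).
alpha_hat = [5.2, 7.3, 9.1, 6.4]
mu_hat = [1.4, 1.7, 2.3, 1.5]
print(counterfactual_performance(alpha_hat, mu_hat))
```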
Before comparing the observed and counterfactual performance, it is worth recalling a simple model of performance as a noisy measure of the true underlying ability:
$$ P_1 = P_0 + \epsilon, \qquad (9) $$
Figure 2: Observed and Counterfactual Performance
[Panel A. Axes: Performance (horizontal, 0.6–1.0), Density (vertical); curves: Observed, Counterfactual. Panel B. Axes: Counterfactual Performance (horizontal), Observed Performance (vertical).]
Note: Panel A shows the kernel density estimates of the distributions of observed and counterfactual per-
formance. Panel B shows the scatterplot of the observed and counterfactual performance. The dotted line
is the 45-degree line. The dashed line is the linear fit.
where $P_1$ is the observed performance, $P_0$ is the counterfactual performance defined as above that reflects the true underlying ability,^20 and $\epsilon$ is a mean-zero noise term. Noise in this model is caused by the variation in motivation. For the observed performance to be a good proxy for ability, the variance of the noise $\sigma^2_\epsilon$ should be small relative to the variance of the observed performance $\sigma^2_{P_1}$, and the noise term should be orthogonal to ability, $\mathrm{cov}(P_0, \epsilon) = 0$. The observed performance in our case fails to satisfy either property.
Figure 2 (Panel A) plots the kernel density estimates of the distributions of the observed and counterfactual performance. It is immediately clear that the observed performance contains a substantial degree of noise. The ratio of the variance of the noise to the variance of the observed performance, $\sigma^2_\epsilon / \sigma^2_{P_1}$, is 0.55. Such a high noise component results in only a moderate association between the observed and counterfactual performance: variation in the counterfactual performance can explain only 0.58 of the variation in the observed performance in a simple linear regression. If one cares only about whether the ranking by performance is similar to the ranking by ability, the picture is similarly unsatisfactory (Kendall's $\tau = 0.54$, p-value $< 0.001$). In particular, ranking subjects by their performance leads to an incorrect ranking by their ability in 24% of cases.^21
^20 For simplicity, I assume that there is no measurement error in the counterfactual performance, which of course will not be true in practice. Allowing for this additional measurement error would only increase the overall noisiness of the observed performance.
^21 An alternative way to interpret this number is that the probability that two subjects taken at random will have an incorrect ability ranking, as implied by performance, is 24%.
However, the observed performance is not just a highly noisy measure of ability. It is also a biased measure. The issue lies in the fact that ability, as measured by the counterfactual performance, is positively associated with the noise term (Kendall's $\tau = 0.28$, p-value $< 0.001$). Panel B of Figure 2 illustrates the bias that results from this association by presenting a scatterplot of the observed performance against the counterfactual performance. If the observed performance were an unbiased measure of true ability, the dots on the graph would lie along the 45-degree line (dotted line on the graph). This is clearly not the case. Subjects with relatively low ability ($< 0.92$) score lower than they should, while subjects with relatively high ability ($> 0.92$) score even higher than they should. This bias is represented by a linear fit (dashed line on the graph) that has a slope greater than one and a negative intercept.
The high degree of noise coupled with the systematic bias in the observed performance is likely
to lead to invalid inferences when ability, proxied by performance, is used as a control or a causal
regressor. For instance, suppose that a researcher is interested in the effect of ability on some
outcome of interest, and the outcome of interest might depend on both ability and motivation. The
researcher, however, only has access to performance as a proxy for ability. Then it is possible that
a researcher finds a positive effect of performance on the outcome of interest and concludes that
ability has a positive effect when, in reality, ability has no effect: this would be the case when there
is a strong positive effect of motivation on the outcome of interest.^22 This issue is an instance of omitted variable bias.
6 Conclusion
The economics literature uses cognitive ability as an explanatory variable in a vast array of economic
contexts. The traditional approach of using performance on a cognitive test as a measure of ability
confounds actual ability with the combination of ability and motivation, which may result in wrong
conclusions about the effect of ability. In this paper, I propose a new approach to measure cognitive
ability that overcomes this issue. The proposed approach uses response time data, in addition to performance data, as a proxy for effort, together with an explicit process-based model
inspired by the drift-diffusion model. I model ability and motivation as parameters of the structural
^22 In general, issues of this kind will arise when the effects of ability and motivation on the outcome of interest differ. Table C.2 in Online Appendix C makes this point clear by presenting the results of a simulation exercise.
model and show how to estimate these parameters from the data on outcomes and response times
in a cognitive task. In a laboratory experiment, I find that performance is a noisy and biased
measure of cognitive ability. Ranking subjects by their performance leads to an incorrect ranking
by their ability in a substantial number of cases.
These results suggest that more care should be taken when interpreting performance as cognitive
ability, as is usually done, since such an interpretation may be misleading. The present paper
proposes a method to deal with this issue that calls for taking advantage of the response time
data and spelling out an explicit model of effort choice that structurally separates ability from
motivation. The proposed method can be broadly applied in various settings, including existing
data, since collecting response time data is costless, and software applications collect these data in
the background. The estimates of ability and motivation can be easily computed since they are
simple functions of the sample moments. Future work should investigate how well performance
approximates ability in other cognitive and real-effort tasks used in the literature.
References
Agarwal S, Mazumder B (2013). "Cognitive Abilities and Household Financial Decision Making." American Economic Journal: Applied Economics, 5(1), 193–207.
Benndorf V, Rau HA, Sölch C (2018). "Minimizing Learning Behavior in Repeated Real-Effort Tasks." Working Paper 343, Center for European, Governance and Economic Development Research, Georg-August-Universität Göttingen.
Borghans L, Duckworth AL, Heckman JJ, Ter Weel B (2008). "The Economics and Psychology of Personality Traits." Journal of Human Resources, 43(4), 972–1059.
Cattell RB (1971). Abilities: Their Structure, Growth, and Action. Boston: Houghton Mifflin.
Clithero JA (2018). "Improving Out-Of-Sample Predictions Using Response Times and a Model of the Decision Process." Journal of Economic Behavior & Organization, 148, 344–375.
Dohmen T, Falk A, Huffman D, Sunde U (2010). "Are Risk Aversion and Impatience Related to Cognitive Ability?" American Economic Review, 100(3), 1238–1260.
Duckworth AL, Quinn PD, Lynam DR, Loeber R, Stouthamer-Loeber M (2011). "Role of Test Motivation in Intelligence Testing." Proceedings of the National Academy of Sciences, 108(19), 7716–7720.
Gill D, Prowse V (2016). "Cognitive Ability, Character Skills, and Learning to Play Equilibrium: A Level-k Analysis." Journal of Political Economy, 124(6), 1619–1676.
Heckman J, Pinto R, Savelyev P (2013). "Understanding the Mechanisms Through Which an Influential Early Childhood Program Boosted Adult Outcomes." American Economic Review, 103(6), 2052–2086.
Heckman JJ, Stixrud J, Urzua S (2006). "The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior." Journal of Labor Economics, 24(3), 411–482.
Krajbich I, Lu D, Camerer C, Rangel A (2012). "The Attentional Drift-Diffusion Model Extends to Simple Purchasing Decisions." Frontiers in Psychology, 3, 193.
Lee MLT, Whitmore G (2006). "Threshold Regression for Survival Analysis: Modeling Event Times by a Stochastic Process Reaching a Boundary." Statistical Science, pp. 501–513.
Murnane R, Willett JB, Levy F (1995). "The Growing Importance of Cognitive Skills in Wage Determination." The Review of Economics and Statistics, 77(2), 251–266.
Ofek E, Yildiz M, Haruvy E (2007). "The Impact of Prior Decisions on Subsequent Valuations in a Costly Contemplation Model." Management Science, 53(8), 1217–1233.
Ratcliff R (1978). "A Theory of Memory Retrieval." Psychological Review, 85(2), 59.
Ratcliff R, McKoon G (2007). "The Diffusion Decision Model: Theory and Data for Two-Choice Decision Tasks." Neural Computation, 20(4), 873–922.
Ratcliff R, Van Dongen HPA (2011). "Diffusion Model for One-Choice Reaction-Time Tasks and the Cognitive Effects of Sleep Deprivation." Proceedings of the National Academy of Sciences, 108(27), 11285–11290.
Segal C (2012). "Working When No One Is Watching: Motivation, Test Scores, and Economic Success." Management Science, 58(8), 1438–1457.
Spiliopoulos L, Ortmann A (2018). "The BCD of Response Time Analysis in Experimental Economics." Experimental Economics, 21(2), 383–433.
Vernon PA (1983). "Speed of Information Processing and General Intelligence." Intelligence, 7(1), 53–70.
Webb R (2019). "The (Neural) Dynamics of Stochastic Choice." Management Science, 65(1), 230–255.
Weiss LG, Saklofske DH, Coalson DL, Raiford SE (2010). WAIS-IV Clinical Use and Interpretation: Scientist-Practitioner Perspectives. Academic Press.
Wilcox NT (1993). "Lottery Choice: Incentives, Complexity and Decision Time." The Economic Journal, 103(421), 1397–1417.
Woodford M (2014). "Stochastic Choice: An Optimizing Neuroeconomic Model." American Economic Review, 104(5), 495–500.
Appendices
A Experimental Instructions (Online)
This task is based on finding correct correspondences between numbers and symbols. In each round,
you will see 6 pairs of number–symbol combinations (the key) arranged in a table at the upper part
of the screen, see Figure A.1a for an example. Below the key, there will be 6 empty numbered boxes.
You will use the key to fill in the boxes with the symbols located in a column to the left of the
boxes. You will do this by dragging the symbols into the boxes. If a symbol from the column is in
the key, drag it to the corresponding numbered box. Some of the symbols will not be listed in the
key. In this case, you should not use them in any of the boxes. Some of the numbers will not have
corresponding symbols. In this case, you should leave those boxes empty. Each box, therefore, can
contain either one or no symbols. Figure A.1b shows an example of a correctly solved round.
After filling all the boxes as you see fit, click “Submit”, and you will proceed to the next round. You
can proceed with each round at your own pace; there is no time limit. We ask that you complete
all 100 rounds of the task. We will show you your score at the end of the task. You will receive
$20 for completing this task.
You will have 3 practice rounds before the actual task begins. This will give you a chance to
familiarize yourself with the interface. During the practice, you will receive feedback if you make a
mistake.
Figure A.1: Digit-Symbol Task. (a) Decision screen; (b) Correct solution.
B Math Appendix (Online)
The discounted value function $h(E)$ of the problem (2) must satisfy the following Hamilton–Jacobi–Bellman (HJB) equation:
$$ 0 = -\rho h - k + \alpha h' + \frac{\sigma^2}{2} h'', \qquad (B.1) $$
where $k$ is the cost of a unit of effort, assumed to be equal to 1. The general solution to the HJB equation (B.1) is
$$ h(E) = A e^{\beta_1 E} + B e^{\beta_2 E} - \frac{k}{\rho}, \qquad (B.2) $$
where $\beta_{1,2}$ are the roots of the characteristic equation
$$ \frac{\sigma^2}{2}\beta^2 + \alpha\beta - \rho = 0. \qquad (B.3) $$
The two roots are
$$ \beta_{1,2} = \frac{-\alpha \pm \sqrt{\alpha^2 + 2\rho\sigma^2}}{\sigma^2}. \qquad (B.4) $$
It is worth noting that the term with the negative root $\beta_1$ is explosive and thus needs to be eliminated, hence $A = 0$. The solution then becomes
$$ h(E) = B e^{\beta_2 E} - \frac{k}{\rho}. \qquad (B.5) $$
To determine the optimal threshold $E^*$, two conditions are used: the value-matching condition $h(E^*) = \mu p(E^*)$ and the smooth-pasting condition $h'(E^*) = \mu p'(E^*)$. From the smooth-pasting condition it follows that
$$ B e^{\beta E^*} = \frac{\mu p'(E^*)}{\beta}, \qquad (B.6) $$
where $\beta \equiv \beta_2$. Plugging this into the value-matching condition yields
$$ \frac{\mu p'(E^*)}{\beta} - \frac{k}{\rho} = \mu p(E^*), \qquad (B.7) $$
which after re-arranging the terms becomes
$$ \frac{\rho}{\beta}\, p'(E^*) - \rho\, p(E^*) = \frac{k}{\mu}. \qquad (B.8) $$
The limiting result in the case of $\rho \to 0$ follows from (B.8) after noting that
$$ \lim_{\rho\to 0} \frac{\rho}{\beta} = \lim_{\rho\to 0} \frac{\rho\sigma^2}{-\alpha + \sqrt{\alpha^2 + 2\rho\sigma^2}} = \lim_{\rho\to 0} \frac{\sigma^2}{\dfrac{2\sigma^2}{2\sqrt{\alpha^2 + 2\rho\sigma^2}}} \ \text{(l'H\^opital's rule)} = \lim_{\rho\to 0} \sqrt{\alpha^2 + 2\rho\sigma^2} = \alpha. $$
Equation (B.8) then becomes
$$ p'(E^*) = \frac{k}{\alpha\mu}. \qquad (B.9) $$
Assuming that $p(E) = 1 - e^{-E}$ and noting that $p'(E) = e^{-E}$, one obtains from (B.9)
$$ e^{-E^*} = \frac{k}{\alpha\mu} \qquad (B.10) $$
or
$$ E^* = \ln\alpha + \ln\frac{\mu}{k}. \qquad (B.11) $$
Consider the average response time $\bar\tau^* = E^*/\alpha = \ln(\alpha\mu)/\alpha$. The marginal effect of ability on $\bar\tau^*$ is given by
$$ \frac{\partial \bar\tau^*}{\partial \alpha} = \frac{\alpha \cdot \frac{1}{\alpha\mu}\,\mu - \ln(\alpha\mu)}{\alpha^2} = \frac{1 - \ln(\alpha\mu)}{\alpha^2}. $$
This expression is positive if the optimal threshold is sufficiently low, that is, when $\alpha\mu < e$. If the optimal threshold is high, that is, $\alpha\mu > e$, the effect of ability is negative.
C Additional Analysis (Online)
C.1 Monte Carlo Simulations
First, I consider the validity of the estimation procedure using a simulation exercise. Simulations are conducted for the five different parameter vectors listed in Table C.1. Parameter values are drawn from a uniform distribution on [1, 10]. For each parameter vector, the data (100 observations) on outcomes and response times are simulated using the theoretical model 1000 times. The resulting distributions of the parameter estimates are presented in Figure C.1. The vertical lines indicate the true values of the parameters. As expected, the distributions of parameter estimates are well-centered around the true values.
Table C.1: True Parameter Values and Mean Estimates

                 α      µ      σ
Panel A. True Values
   1           3.39   4.35   6.16
   2           2.66   7.32   6.16
   3           2.51   8.27   4.46
   4           6.27   1.08   3.64
   5           2.80   7.17   9.25
Panel B. Mean Estimates
   1           3.65   4.87   6.31
   2           2.88   8.53   6.34
   3           2.67   9.68   4.60
   4           6.49   1.12   3.69
   5           3.20   8.40   9.52

Notes: Panel A shows the five different true parameter vectors drawn from a uniform distribution. Panel B shows the corresponding mean estimates of parameters from the simulated data.
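The simulate-and-estimate loop behind Table C.1 and Figure C.1 can be sketched as follows. Under the model, response times are inverse Gaussian with mean $E^*/\alpha$ and shape $(E^*)^2/\sigma^2$, and outcomes are Bernoulli with probability $p(E^*)$; the closed-form moment estimators from Section 3 are used here as a stand-in for the full maximum likelihood step, so this is an illustrative reconstruction rather than the paper's exact code.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_subject(alpha, mu, sigma, n_trials=100):
    """Draw outcomes and response times implied by the model of Section 2."""
    e_star = np.log(alpha) + np.log(mu)          # optimal threshold, equation (6)
    p_star = 1.0 - np.exp(-e_star)               # success probability p(E*)
    t = rng.wald(e_star / alpha, e_star**2 / sigma**2, size=n_trials)  # inverse Gaussian RTs
    x = rng.binomial(1, p_star, size=n_trials)   # trial outcomes
    return x, t

def estimate(x, t):
    """Moment-based estimates (a stand-in for the MLE in equation (8))."""
    x_bar = min(x.mean(), 1.0 - 1.0 / len(x))    # guard against perfect scores, cf. footnote 19
    e_star_hat = -np.log(1.0 - x_bar)
    alpha_hat = e_star_hat / t.mean()
    mu_hat = (t.mean() / e_star_hat) * np.exp(e_star_hat)
    sigma_hat = np.sqrt(e_star_hat**2 * ((1.0 / t).mean() - 1.0 / t.mean()))
    return alpha_hat, mu_hat, sigma_hat

true = (3.39, 4.35, 6.16)  # the first parameter vector in Table C.1
estimates = [estimate(*simulate_subject(*true)) for _ in range(1000)]
print(np.mean(estimates, axis=0))  # should be centered near the true values
```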
Second, I consider the consequences of using performance as a proxy for ability. I assume that the true data generating process for some outcome of interest $y$ is
$$ y_i = \beta_0 + \beta_1 \alpha_i + \beta_2 \mu_i + \epsilon_i, \qquad (C.12) $$
where $i$ indexes a subject, $\alpha_i$ is the ability of subject $i$, $\mu_i$ is the motivation of subject $i$, and $\epsilon_i$ is an error term. I further assume that a researcher estimates a model in which only performance $p_i$ is
Figure C.1: Monte Carlo Simulations
[Histogram grid: columns alpha, mu, sigma; rows 1–5 (parameter vectors); axes: Estimate (horizontal), Relative Frequency (vertical).]
Note: The figure shows the histograms of the distributions of parameter estimates ($\hat\alpha$, $\hat\mu$, $\hat\sigma$) for five different values of the true vector of parameters. The vertical lines correspond to the true values of the parameters.
observed, but not ability and motivation:
$$ y_i = \gamma_0 + \gamma_1 p_i + \eta_i. \qquad (C.13) $$
I then study how the true values of $\beta_1$ and $\beta_2$ affect the estimate of $\gamma_1$. In the simulation, the values of $\alpha_i$ are drawn from a truncated normal distribution with mean 7, standard deviation 1, and a lower bound of 1. The values of the logarithm of $\mu_i$ are drawn from a truncated normal distribution with mean 1 and standard deviation 1, bounded between 0 and 3. These distributional assumptions are made to roughly match the observed distributions of ability and motivation in the experiment. Performance as a function of ability and motivation is then computed using the model as $p_i = 1 - (\alpha_i \mu_i)^{-1}$. The noise term $\epsilon_i$ is drawn from a normal distribution with mean 0 and standard deviation 3. The generated data consist of 1000 observations. Table C.2 shows the true values of $\beta_1$ and $\beta_2$ and the corresponding estimates of $\gamma_1$ and its standard error. The table makes it clear that issues arise whenever $\mathrm{sgn}(\beta_1 \beta_2) \neq 1$: the sign of the estimated coefficient on performance does not coincide with the sign of the ability coefficient in the true model, which would lead to wrong conclusions about the effect of ability on the outcome.
Table C.2: True Parameter Values and Estimates

   β1    β2      γ̂1     se(γ̂1)
    0     1    72.52     3.30
    0    -1   -71.72     3.43
    1     1    78.81     3.29
    1    -1   -65.43     3.69
    1     0     6.69     2.79
   -1     1    66.23     3.55
   -1    -1   -78.01     3.39
   -1     0    -5.89     2.76

Notes: The table reports the coefficients on ability (β1) and motivation (β2) in the true model, and the corresponding estimates of the coefficient on performance (γ̂1) and its standard error from the estimated model.
C.2 Summary Statistics of the DST
Figure C.2 (Panel A) shows the distribution of the raw scores from the DST. The subjects perform
very well on the DST with 75% of the subjects scoring 87 and above. Figure C.2 (Panel B) shows
the distribution of the mean response times, averaged across all rounds for each subject. The
distribution is tightly concentrated around the median of 20.6 seconds and has a relatively fat right
tail. The actual distribution (solid line) matches closely the reference inverse Gaussian distribution
(dotted line) with the parameters matching the sample moments. In fact, one cannot reject the null
hypothesis that the sample of mean response times comes from the inverse Gaussian distribution
(Kolmogorov-Smirnov test, p-value = 0.393).
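A check of this kind can be run with scipy, as in the sketch below; the array mrt stands in for the observed per-subject mean response times (here generated artificially), and the location of the inverse Gaussian is fixed at zero when fitting. Because the parameters are fitted to the same data, the resulting Kolmogorov-Smirnov p-value is only approximate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mrt = rng.wald(20.6, 120.0, size=192)  # stand-in for the 192 observed mean response times (seconds)

# Fit an inverse Gaussian with location fixed at 0, then run a Kolmogorov-Smirnov test of the fit.
shape, loc, scale = stats.invgauss.fit(mrt, floc=0)
ks_stat, p_value = stats.kstest(mrt, "invgauss", args=(shape, loc, scale))
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```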
Figure C.2: Distributions of Scores and Mean Response Times on DST
[Panel A. Axes: Score (horizontal, quintile breaks at 58, 87, 92, 96, 100), Density (vertical). Panel B. Axes: Mean Response Time in seconds (horizontal, quintile breaks at 12.7, 17.6, 20.6, 24.5, 47.1), Density (vertical).]
Note: Panel A shows the distribution of the scores on the DST. Panel B shows the distribution of the mean
response times on the DST. The smooth solid line is the kernel density estimate, the vertical bars are the
histogram, and the vertical dashed line is the sample median. The breaks on the horizontal axis correspond
to the quintiles of the distribution. On Panel B, the dotted line is the reference density of an inverse Gaussian
distribution with the parameters matching the sample moments.
C.3 Additional Estimates
Figure C.3 shows the quantile probability plot adapted from Ratcliff and McKoon (2007).^23 The data are pooled across all subjects. The graph shows the proportion of correct and incorrect responses (horizontal axis) against the quantiles of the distribution of response times (vertical axis). The quantiles of response times are 0.1, 0.3, 0.5, 0.7, and 0.9. The circles represent the predicted
values from the estimated model, and the crosses represent the actual values. As is clear from the
picture, the model does a good job at jointly predicting outcomes and response times in the pooled
data.
Figure C.3: Quantile Probability Graph for Pooled Data
[Panels: Error Responses, Correct Responses. Axes: Response Proportion (horizontal, 0.0–1.0), RT quantile in seconds (vertical, 17.6–25.8); series: Actual, Predicted.]
Note: The figure shows the quantile probability plot from the pooled data (averaged across all subjects).
Points on the right (left) correspond to success (failure) rates. Circles represent the predicted values from
the estimated model, crosses represent the observed data.
A useful alternative way of looking at the ability differences across subjects is to convert the
ability estimates into performance. To translate the estimates of ability into performance, I compute
the probability of success at the average accumulated effective effort in $t_m$ seconds, where $t_m$ ($\approx$ 5.71 seconds) is calibrated such that it is the time for a person with median ability to reach a 0.5
^23 Since only one treatment was used and there was no variation in difficulty, it is not possible to draw the complete lines as in Ratcliff and McKoon (2007).
Figure C.4: Distributions of Raw and Transformed Ability
[Panel B. Axes: Performance in $t_m$ seconds (horizontal, quintile breaks at 0.20, 0.43, 0.50, 0.58, 0.75), Density (vertical).]
Note: The figure shows the distribution of the probability of success in $t_m$ seconds. The smooth solid line is
the kernel density estimate, the vertical bars are the histogram, the dotted line is the reference density of a
normal distribution with the parameters matching the sample moments, and the vertical dashed line is the
sample median. The breaks on the horizontal axis correspond to the quintiles of the distribution.
probability of success.^24 Due to variation in ability, subjects will have different levels of accumulated effective effort in $t_m$ seconds, which will then translate into different probabilities of success. Figure C.4 shows the distribution of the resulting performance. This distribution is more symmetric than the distribution of raw ability estimates. In fact, one cannot reject the null hypothesis of the distribution of performance in $t_m$ seconds being normal (Shapiro-Wilk test p-value = 0.574). A subject at the 75th percentile would have a 1.4 times higher performance in $t_m$ seconds than a
subject at the 25th percentile. A subject with the highest ability would have a 1.5 times higher
performance than a median subject and a 3.8 times higher performance than a subject with the
lowest ability.
In Figure C.5, I address the question of whether a success on a trial of the DST is significantly
correlated with the response time in that trial. I estimate a logistic regression of the outcome of
a trial on the response time, for each subject individually. I then present the estimated regression
coefficients on the response time graphically by ordering them from lowest to highest. The graph
shows the point estimates and the 95% confidence intervals around them. As can be seen from
the graph, in the overwhelming majority of cases, the null hypothesis of no significant
^24 The resulting performance is counterfactual in the sense that it is generated using the model and ability estimates
from a hypothetical scenario. This counterfactual performance is of course different from the observed performance
on the test, which is a conventional measure of ability. The benefit of this transformation is that it converts ability
into familiar performance terms. Both the counterfactual performance and raw ability estimates can be viewed as
the same quantity (ability) expressed in different units.
effect of the response time on success cannot be rejected. The effect is significant for only 13 subjects, which represent 7% of the sample, and even among these subjects, there is no systematic
relationship between response times and outcomes.
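This check can be sketched with statsmodels as follows, for one hypothetical subject whose outcomes and response times are generated independently, as the model predicts (footnote 6); in the paper the same regression is run separately for each of the 192 subjects.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
t = rng.wald(0.34, 1.0, size=100)      # hypothetical response times for one subject
x = rng.binomial(1, 0.9, size=100)     # hypothetical outcomes, independent of t

# Logistic regression of the trial outcome on the response time.
res = sm.Logit(x, sm.add_constant(t)).fit(disp=0)
coef, p_val = res.params[1], res.pvalues[1]
ci_low, ci_high = res.conf_int()[1]
print(f"RT coefficient = {coef:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}], p = {p_val:.3f}")
```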
Figure C.5: Response Times and Success
[Axes: Regression Coefficient (horizontal, -1.0 to 1.5), CDF (vertical); point categories: Not Significant, Significant.]
Note: The graph shows the regression coefficients from individual-level logistic regressions of outcomes on
response times. Each point on the graph represents an individual-level estimate, and the points are ordered
from lowest to highest. The error bars show 95% confidence intervals. Significance is determined based on
a 0.05 cutoff for the p-value.