THINKING & REASONING, 2017
http://dx.doi.org/10.1080/13546783.2017.1292954

The cognitive reflection test revisited: exploring the ways individuals solve the test

B. Szaszi (a,b), A. Szollosi (b,c), B. Palfi (b) and B. Aczel (b)

(a) Doctoral School of Psychology, Eötvös Loránd University, Budapest, Hungary; (b) Institute of Psychology, Eötvös Loránd University, Budapest, Hungary; (c) School of Psychology, The University of New South Wales, Sydney, Australia

CONTACT B. Szaszi szaszi.barnabas@gmail.com
ABSTRACT
Individuals' propensity not to override the first answer that comes to mind is thought to be a crucial cause behind many failures in reasoning. In the present study, we aimed to explore the strategies used and the abilities employed when individuals solve the cognitive reflection test (CRT), the most widely used measure of this tendency. Alongside individual differences measures, protocol analysis was employed to unfold the steps of the reasoning process in solving the CRT. This exploration revealed that there are several ways people solve or fail the test. Importantly, in 77% of the cases in which reasoners gave the correct final answer in our protocol analysis, they started their response with the correct answer or with a line of thought which led to the correct answer. We also found that 39% of the incorrect responders reflected on their first response. The findings indicate that the suppression of the first answer may not be the only crucial feature of reflectivity in the CRT and that the lack of relevant knowledge is a prominent cause of the reasoning errors. Additionally, we confirmed that the CRT is a multi-faceted construct: both numeracy and reflectivity account for performance. The results can help to better apprehend the "whys and whens" of the decision errors in heuristics and biases tasks and to further refine existing explanatory models.
ARTICLE HISTORY Received 6 June 2016; Accepted 27 January 2017
KEYWORDS Cognitive reflection test; process-tracing; reasoning; thinking errors; heuristics and biases
Introduction
In the decades-long aim of psychological research to understand errors in
human thinking, the cognitive reection test (CRT; Frederick, 2005) has
become a pivotal tool to measure a unique dimension of individual differen-
ces. The three-item test was originally created to assess one type of cognitive
ability or disposition: the capacity to suppress the incorrect "intuitive" answer and substitute it with the correct one.[1] The bat and the ball problem is the most well-known example from the test: "A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?" The task can trigger a misleading answer (in this case, 10 cents), which the participants need to overcome before engaging in further reflection to arrive at the correct solution (5 cents). These supposed steps of the reasoning process make the CRT a paradigmatic demonstration of the fallibility of human thinking.

[1] The responses in the CRT are often grouped into three categories: "intuitive incorrect" (10 cents, 100 machines, 24 days); "non-intuitive incorrect" (any other answer); and "non-intuitive correct" (5 cents, 5 machines, 47 days).
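For completeness, the algebra of the bat-and-ball item can be written out explicitly; the short derivation below is an illustrative addition and not part of the original article:

\[
x + (x + 1) = 1.10 \quad\Rightarrow\quad 2x = 0.10 \quad\Rightarrow\quad x = 0.05
\]

Here x is the price of the ball in dollars, so the ball costs 5 cents and the bat $1.05; the intuitive "10 cents" answer would make the total $1.20 rather than $1.10.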
Since its publication, the original paper introducing the CRT (Frederick, 2005) has been cited over 1900 times.[2]
The cause of its popularity is multifac-
eted: it possesses high face validity, it is easy to administer, it predicts decision
performance in many different situations, and it correlates with a great num-
ber of other measures. Just to highlight a few examples, individuals with
higher CRT scores are more disposed to avoid decision biases (Toplak, West,
& Stanovich, 2011,2014) and perform better on general ability measures
(Liberali, Reyna, Furlan, Stein, & Pardo, 2012; Stupple, Ball, & Ellis, 2013). The
CRT also predicts intertemporal behaviour (Frederick, 2005), risky choice
(Cokely & Kelley, 2009; Frederick, 2005), utilitarian moral judgement (Paxton,
Ungar, & Greene, 2012), conservatism (Pennycook, Cheyne, Seli, Koehler, &
Fugelsang, 2012), and belief in the supernatural (Gervais & Norenzayan,
2012). Extended versions of the CRT have been created (e.g., Baron, Scott,
Fincher, & Metz, 2014; Primi, Morsanyi, Chiesi, Donati, & Hamilton, 2015;
Thomson & Oppenheimer, 2016; Toplak et al., 2014), as the original three
items of the CRT became increasingly well known to the public.
Besides its growing popularity in empirical studies, the theoretical founda-
tions of the test have been repeatedly questioned. Two closely related sets of
issues prevail in the current discussions: first, what does the CRT measure?
And second, what are the steps of the reasoning process when people try to
solve the test?
Regarding the first issue, most researchers argue that the CRT assesses reflectivity. Two views dominate the literature about the interpretation of reflectivity. The most popular interpretation was proposed by Frederick (2005), conceptualising cognitive reflection as "the ability or disposition to resist reporting the response that first comes to mind" (p. 35). This approach to reflectivity has been promoted by, among others, Toplak et al. (2011), who considered the CRT as a measure of miserly processing, referring to people's tendency to rely on heuristic processing instead of using more cognitively expensive analytical processes. The explanation of both of these research groups builds on the assumption that the key property of the CRT is that first an incorrect "intuitive" answer comes to mind, and then late suppression mechanisms need to intervene and override the heuristic answer to be able to reach a normative solution by further deliberation.
[2] Based on Google Scholar, January 2017.
Cokely and Kelley (2009) were the first to extend the dominant theoretical framework that only emphasised the role of late suppression mechanisms. They argued that early selection control mechanisms (Jacoby, Kelley, & McElree, 1999) may play an important role in reflective behaviour. They proposed that people scoring higher on the CRT process information more elaborately and tend to use more thorough search processes. Baron et al. (2014) provided evidence for this hypothesis. In their study, they created no-lure versions of the CRT[3] and found that these items loaded on the same factor as the standard CRT items. Additionally, both types of items (lure, no-lure) correlated to a similar extent with other measures, such as actively open-minded thinking (AOT; Baron, 1993) or belief bias syllogisms (BBS; Evans, Barston, & Pollard, 1983). As the authors did not find evidence to support the claim that the suppression of an initial response tendency is relevant in the CRT, but observed that the test assesses the extensiveness of search, they concluded that the CRT is a measure of reflection-impulsivity (RI; Kagan, Rosman, Day, Albert, & Phillips, 1964). This, in turn, is an indicator of cognitive style where there is a relative preference for impulsivity (speed) versus reflection (accuracy).
There is a parallel discussion concerning the CRT as a measurement tool. It
has been argued that the CRT measures solely numeracy,[4] as its items are numerical tasks. Moderate-to-strong correlations have been found between the CRT and other assessments of numeracy (Finucane & Gullion, 2010; Liberali et al., 2012). Welsh, Burns, and Delfabbro (2013) observed that the CRT has predictive power only on those heuristics and biases tasks where numeracy plays a role in arriving at the correct solution. They concluded that the CRT assesses numerical abilities rather than the inhibition of a prepotent response. Other studies, employing factor analysis techniques, found that the CRT items loaded on the same factor as other numerical items (Baron et al., 2014; Låg, Bauger, Lindberg, & Friborg, 2014; Study 1 in Liberali et al., 2012; Weller et al., 2013). Sinayev and Peters (2015) studied whether numeric abilities or cognitive reflection are responsible for the predictive power of the CRT. Based on the observed performance on the CRT, they estimated two variables: the numerical score was calculated as the proportion of correct responses, while the cognitive reflection score was computed as the proportion of "non-intuitive" answers. They observed that only the numerical scores in the CRT accounted for performance on other decision-making and heuristics and biases tasks.

[3] No-lure CRT tasks are CRT-like arithmetic problems that supposedly do not trigger an "intuitive incorrect" response. For example, "If it takes 1 nurse 5 min to measure the blood pressure of 6 patients, how many minutes would it take 100 nurses to measure the blood pressure of 300 patients?" (Baron, Scott, Fincher, & Metz, 2014).
[4] Numeracy is one's ability to store, represent and process mathematical operations (Peters, 2012).

However, other results support the idea that in addition to numeracy, reflective ability is also involved in solving the CRT successfully. In contrast to
Welsh et al.'s (2013) findings, Campitelli and Labollita (2010) observed that the CRT correlates with tasks without a mathematical component. Pennycook and
Ross (2016) reviewed evidence that the CRT was predictive of a diverse range
of variables even after controlling for numeracy. Liberali et al. (2012) found
that the bivariate correlations between the CRT and the numeracy scales
were not high and the CRT items loaded on a numeracy-independent factor
based on the results of the factor analysis. The authors concluded that the
CRT is not just another test of numeracy, but also added that the CRT and
objective numeracy are, in fact, related. Campitelli and Gerrans (2014) applied
a mathematical modelling approach to tackle the conundrum. They esti-
mated an inhibition parameter employing BBS and the AOT. They also
assessed a numerical parameter using a numeracy scale. The results indicated
that the models including both an inhibition parameter and a mathematical
component fitted the data better than a model including only a mathematical
parameter.
Most studies using the CRT employed some explicit or tacit assumptions
about the steps involved in the reasoning process of the CRT. Although a few
studies tried to explore these assumptions, the analyses were based on aggre-
gated data (e.g., Mata, Ferreira, & Sherman, 2013; Travers, Rolison, & Feeney,
2016), giving rise to methodological limitations. More specifically, data aggregation can overshadow the existence of subgroups that may follow different strategies when solving the test (Fifić, 2014).
According to the most common understanding of the CRT, suppression of a first answer is a necessary step for good performance. This view about the task relies on two important assumptions. First, it assumes that even those who give the correct answer start their thinking with an incorrect intuitive response, although they are able to suppress it. Frederick (2005) postulated that even the correct responders consider first the incorrect answer, based on the observation that the "10 cents" answer was often crossed out next to the "5 cents" answer in the bat and the ball problem. Mata et al. (2013) found evidence that a majority of the correct responders were aware of the intuitive response. Nevertheless, the authors did not control in their study for the time-course assumption of the reasoning process, which is theoretically crucial, as it is possible that those who indicated awareness of the "intuitive response" may have had a correct first response and only later, during the deliberation period, did they take into account the incorrect alternative response. Travers et al. (2016) used a computer-mouse tracking paradigm, where participants were asked to choose an answer on each CRT task by clicking on one of four response options on the screen. The authors observed that individuals who solved the tasks correctly tended to move the mouse more slowly away from the "intuitive" response than from other "non-intuitive incorrect" response options before clicking on the correct answer. Nevertheless, based on these findings, it is difficult to conclude whether or not
there were responders whose first answer was correct. The results imply only that, on average, correct responders are more likely to start their thinking with the "intuitive incorrect" response than with other incorrect answers, and not that they never start their thinking with the correct response. Furthermore, the results of some recent studies suggest that there are individuals with correct intuitions. For example, Peters (2012) argues that people with higher numeracy rely on their "superior number intuitions" (p. 32) and, based on the Fuzzy Trace theory (Reyna, Nelson, Han, & Dieckmann, 2009), she also claims that they may derive "a richer gist from numbers" (Peters, 2012, p. 32). Supporting this idea, Thompson and Johnson (2014) reported that some individuals responded normatively on reasoning tasks when they were asked to report the initial answer that comes to mind. These tasks, similarly to the CRT, are thought to trigger an incorrect response that needs to be suppressed in order to arrive at the correct answer. The authors argued that cognitive capacity drove the production of the initial correct response. Svedholm-Häkkinen's (2015) experiments provided more evidence for the same idea: when solving BBS, high-ability people did not show the sign of belief-inhibition; that is, they seemed to start to think using normative logic.
According to the second underlying assumption of the suppression-focused interpretation of the CRT, those who give the incorrect heuristic answer do not reflect on it. Otherwise, as Frederick (2005, p. 27) argues, even a moment of reflection would lead to the recognition of the failure. Previous studies have found that people spend more time (Johnson, Tubau, & De Neys, 2016) and show longer distances travelled by the mouse cursor (Travers et al., 2016) on correct responses than on the "intuitive incorrect" answers. However, these results only support the idea that, on average, people deliberate more before producing the correct responses, and one cannot conclude that the incorrect responders did not reflect. Furthermore, the fact that incorrect responders were not aware of the correct response (Mata et al., 2013; Travers et al., 2016) does not imply that these individuals did not reason analytically (Elqayam & Evans, 2011). In contrast to this assumption, Meyer, Spunt, and Frederick (2015) observed that many of their participants failed to solve the bat and the ball problem despite the fact that they had been warned to think carefully about it. Moreover, previous findings have also brought evidence that deliberation does not necessarily lead to the change of the initial incorrect intuition: for instance, it has been repeatedly shown that people use reflective reasoning to rationalise or justify their first thoughts in the Wason selection task (Evans, 1996; Evans & Ball, 2010; Wason & Evans, 1975).
The current research
Our study includes both exploratory and confirmatory research. First, we aimed to explore the skills required to solve the CRT successfully. To identify the crucial individual differences behind good performance on the CRT, we used one numeracy and four reflectivity tests. The rationale for using several measures of reflectivity is that there are competing theoretical concepts of reflectivity and there is no agreement on a single and valid assessment approach. Consequently, one of our aims was to find which reflectivity measure predicts best the performance on the CRT, since this analysis can help us reveal which conceptualisation of reflectivity is captured by the CRT.
Furthermore, we aimed to explore the strategies employed when individu-
als solve the CRT. Here, we focused on two crucial questions concerning the
above-detailed assumptions of the most widely used interpretation of the
CRT. First, we aimed to explore the proportion of correct responses in the CRT
in which the reasoners start their response with the correct answer or with a
line of thought which led to the correct answer. Second, we studied the pro-
portion of the incorrect responses in which the reasoners reflect on the answer that first comes to their mind. Note that the first and second questions focus on the correct and incorrect cases, respectively. To investigate the strat-
egies employed, we used protocol analysis (Ericsson & Simon, 1980), which
has been found to be a valid method for studying thought processes without
altering performance (Fox, Ericsson, & Best, 2011; for limitations see: De Neys
& Glumicic, 2008; Reisen, Hoffrage, & Mast, 2008). Besides the fact that this
method has been used in several studies in the decision-making literature to
track thinking processes (e.g., Brandstätter & Gussmack, 2013; Cokely & Kelley,
2009; Tor & Bazerman, 2003), we used protocol analysis as it provided some
unique advantages. For instance, with the use of this method, we could differ-
entiate individuals who deliberated after reporting a first answer from those
who did not deliberate, without interrupting the reasoning process, and while
still being able to keep the CRT tasks open ended and not reducing the num-
ber of alternative answer options.
We formulated a number of additional hypotheses to test the validity of
the findings of the protocol analysis. First, we hypothesised that it takes more time to solve the problems correctly in cases where the responders start their response with the incorrect answer or with a line of thought leading to the incorrect answer ("Incorrect start") than when they start their response with the correct answer or with a line of thought leading to the correct answer ("Correct start"). Second, we expected that there would be no significant difference in terms of reaction time (RT) and social desirability between the "Correct start" and "Incorrect start" cases. Finding that individuals in the "Correct start" cases have longer RTs or are more socially desirable would indicate the presence of a confound in our data: that is, "Correct start" people may also suppress their first thought but not verbalise it in our protocol analysis. Third, we expected that incorrect responders spend more time on solving the problems when they reflect on the first answer that comes to their mind ("Reflective") compared to when they do not deliberate on it ("Non-reflective").
Finally, based on the assumption that individual differences can predict
the usage of different reasoning strategies (e.g., Peters, 2012; Thompson &
Johnson, 2014), we aimed to test two confirmatory hypotheses. First, we hypothesised that individuals with higher numeracy scores more often have a "Correct start" than their less numerate counterparts. Second, we hypothesised that individuals who score higher on the reflectivity scale will more often deliberate after the first answer that comes to their mind than people who score lower on the same scale. Prior to data collection, the decision was made that, for the purpose of testing the hypothesis about reflectivity and deliberation, we would use the reflectivity scale that had been found to best predict the CRT performance.
Method
Participants
Two hundred and nineteen students (75% female, M = 22.04 years, SD = 2.28)
participated in our study. The participants were recruited through the univer-
sity subject pool and they received course credit in exchange for their partici-
pation. All participants were native speakers of Hungarian and signed an
informed ethical consent form. As nine participants indicated after the proto-
col analysis that they were familiar with the CRT questions, they were
excluded from the online session and the analysis.
Procedure
The study consisted of an offline and an online session. For the offline session,
participants were invited to the lab to participate in a personal interview. First,
they were informed that the session would be recorded and later analysed.
This was followed by the detailed verbal instruction of the protocol and a
warm-up session. After that, participants were asked to solve the three items
of the CRT[5] in the standard order whilst thinking aloud. To avoid any undesired influence, the experimenter was seated behind the participants and provided no feedback regarding the participants' performance on the CRT. Participants were asked to read aloud the tasks, and then to think aloud while working on the questions but not to explain their thoughts. They were also requested to indicate when they felt that they were finished with the problems.
Finally, participants were asked whether they were familiar with the CRT tasks.
[5] The European version of the bat and ball problem was administered, in which the cost of the bat and the ball is given in €.
During the online sessions, participants completed the following ques-
tionnaires and ability measures in a fixed order using the Qualtrics survey
software tool in installments: AOT (Baron, 1993), rational-experiential inven-
tory (REI; Pacini & Epstein, 1999), BBS (De Neys, Moyens, & Vansteenwegen,
2010), Berlin numeracy test (BNT; Cokely, Galesic, Schulz, Ghazal, & Garcia-
Retamero, 2012), semantic illusions (SIs; Mata, Schubert, & Ferreira, 2014)
and finally the balanced inventory of desirable responding (BIDR; Paulhus,
1991).
Materials
Numeracy measure
We used the computer adaptive version of the BNT (Cokely et al., 2012) to
measure numeracy. The BNT predicts the comprehension of everyday risk,
and the performance on the CRT and many other decision-making tests more
strongly than other numerical instruments. Additionally, it is able to differenti-
ate between highly educated individuals. The test consists of two or three
questions adaptively selected based on the former answers.
Reflectivity measures
Participants were asked to fill out the AOT (see Appendix A.1) which measures people's tendency to consider several possible answers when facing a
question, to search for evidence supporting an answer other than their previ-
ously established answer, and to seek evidence against their favoured
answer (Baron, 1993). We used the eight-item version of the AOT (Haran,
Ritov, & Mellers, 2013) supplemented by three additional items which
increase the overall reliability of the original scale (Baron, personal
communication).
We also administered the 20-item rationality scale from the REI (Pacini &
Epstein, 1999) which measures the degree to which a person engages in and
enjoys effortful cognitive activity. The inventory separates the construct of
Rationality from Faith in Intuition. In this test, participants are asked to indi-
cate on a five-point Likert scale how much statements such as "I enjoy intellectual challenges" are judged to be true for themselves.
Three valid and three invalid BBS were presented in a random order (see
Appendix A.1). Four of our items were adopted from De Neys et al.'s (2010)
study, and two additional items were developed by our research group. BBS
can be used as a reflectivity measure because the supposed underlying mech-
anism behind performance on BBS items is the same as behind the CRT items.
People tend to decide upon the logical validity of the syllogisms based on the
believability of the conclusion, which is thought to be an intuitive response.
Supposedly, people have to suppress the first intuition and engage in effortful
reasoning to arrive at the correct answer (Evans, 2003).
A set of SIs (Mata et al., 2014) was also administered. SI tests are usually
used to measure the degree to which individuals process verbal or written
information carefully and accurately without containing any mathematical
content (Barton & Sanford, 1993; Erickson & Mattson, 1981). Consequently, we
presumed that SI could potentially assess reflective processing without measuring numeracy. The SI block consisted of six questions containing SIs where, to give the right answer, participants needed to realise the semantic inconsistency embedded in the question (e.g., "How many animals of each kind did Moses take on the Ark?") and two simple general knowledge questions (see
Appendix A.1). These latter general knowledge questions were used so partic-
ipants would not become suspicious once they detected the illusions. The SIs
were adapted from Mata et al. (2014). Based on a similar thinking, Thomson
and Oppenheimer (2016) also created an alternate form of the CRT using
tasks with non-numerical content.
Social desirability measure
Participants were also asked to fill out the BIDR (Paulhus, 1991). The BIDR measures the responders' tendency to answer in a way that makes them socially desirable in order to manage self-presentation. The BIDR consists of two subscales (Self-Deceptive Enhancement, Impression Management), from which only the second one was administered for the purpose of this study. The subscale consists of 20 items, such as "I sometimes drive faster than the speed limit", and the responders had to report their answer on a seven-point rating
scale.
Bayes factor
As no scientific inference can be made about the hypotheses from statistically non-significant results alone (Dienes, 2014), we calculated Bayes factors (B) to supplement the frequentist analyses and used them to determine whether the null results in this study imply data-insensitivity or provide evidence for the null hypotheses. B is a statistical measure which can be used to assess the degree to which the data support one hypothesis compared to another. To interpret the B values, we employed Jeffreys's (1961) sensitivity criterion. Accordingly, B values less than 1/3 indicate substantial evidence for the null, while B values more than 3 indicate substantial evidence for the alternative hypothesis. B values between 1/3 and 3 show that the data are insensitive and should not be used as scientific evidence towards any of the hypotheses. For the B calculations, we applied the B calculator of Dienes (2008) implemented in R.[6]

[6] In order to compute B, one has to model the predictions of the tested hypotheses. Since all of the hypotheses in the current study had directional predictions, following Dienes's recommendations (2011, 2014), we modelled the alternative hypotheses with half-normal distributions with 0 probability for negative values. We applied two ways to determine the SD of the half-normal distributions. If we had information on the effect size of the alternative model, then we used it as the SD of the half-normal distribution. Otherwise, we estimated the maximum possible effect size of the alternative hypothesis and we applied half of it as the SD of the half-normal distribution.
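To make the half-normal modelling concrete, the following R sketch reproduces the logic of such a calculation under the assumptions above (a normal likelihood summarised by the observed effect and its standard error). It is an illustrative sketch, not the Dienes (2008) calculator itself, and the numbers in the usage line are hypothetical.

    # Bayes factor for a point null versus a half-normal model of H1 (a sketch).
    # obs_mean and obs_se summarise the observed effect; h1_sd is the SD of the
    # half-normal prior (e.g., half of the maximum plausible effect size).
    bf_halfnormal <- function(obs_mean, obs_se, h1_sd) {
      lik_h0 <- dnorm(obs_mean, mean = 0, sd = obs_se)        # likelihood under H0
      integrand <- function(delta) {
        dnorm(obs_mean, mean = delta, sd = obs_se) *
          2 * dnorm(delta, mean = 0, sd = h1_sd)              # half-normal prior density
      }
      lik_h1 <- integrate(integrand, lower = 0, upper = Inf)$value
      lik_h1 / lik_h0                                         # B > 3: evidence for H1; B < 1/3: for H0
    }

    # Hypothetical usage with a prior SD of 1.63 (the value appearing in the B_H(0, 1.63) notation below)
    bf_halfnormal(obs_mean = 0.10, obs_se = 0.50, h1_sd = 1.63)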
Results
Descriptive results of the CRT
As the first step of our analysis, we compared the descriptive results of the protocol analysis with the most commonly reported descriptive patterns from previous studies of the CRT. The data showed acceptable reliability as measured by Cronbach's alpha (0.64), which is comparable with the results of previously reported studies (Campitelli & Gerrans, 2014; Liberali et al., 2012; Primi et al., 2015; Weller et al., 2013). While, in total, 28% of the responses were correct, the participants reported the "intuitive incorrect" answers and other incorrect answers in 60% and in 8% of the cases, respectively, and gave up on solving the problems in 4% of the cases. The proportion of different types of answers showed considerable variance across the tasks of the CRT. Table 1 provides a summary of these findings. Both the solution rates and the proportion of different types of answers were in line with the previous findings in the literature (e.g., Primi et al., 2015). Our results were also consistent with previous results regarding gender differences in the CRT performance (e.g., Frederick, 2005): the Mann-Whitney test indicated that males scored higher (Mdn = 1) on the CRT than females (Mdn = 0), W = 5206, p = 0.003.

Table 1. The number and the proportion of answers per answer type.

          Correct answers   Intuitive incorrect answers   Other incorrect answers   Gave up
CRT1      44 (21%)          150 (71%)                     5 (2%)                    11 (5%)
CRT2      46 (22%)          130 (62%)                     28 (13%)                  6 (3%)
CRT3      87 (41%)          98 (47%)                      18 (9%)                   7 (3%)
Total     177 (28%)         387 (60%)                     51 (8%)                   24 (4%)
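As an illustration, the two descriptive checks reported above could be computed in R roughly as follows; crt_items is a hypothetical data frame of the three item scores (0/1), and d a hypothetical data frame with crt_total and gender columns. This is a sketch, not the authors' analysis script.

    library(psych)

    # Internal consistency of the three CRT items
    psych::alpha(crt_items)

    # Mann-Whitney (Wilcoxon rank-sum) test of the gender difference in total CRT scores
    wilcox.test(crt_total ~ gender, data = d)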
Individual differences measures and the CRT performance
The first part of the follow-up online survey containing the numeracy and reflectivity measures was returned by 206 out of the 210 participants, while 195 individuals (93%) completed the second survey comprising the social desirability scale. Appendix A.2 provides an overview of the descriptive statistics of the tests used. Each analysis was run with all of the data available for that test. BNT showed a significant correlation with the CRT performance, r = 0.49, p < 0.001, and all the reflectivity measures (REI, AOT, SI, and BBS) also correlated significantly with the CRT (Table 2). However, after controlling for BNT, the partial correlation analysis showed that only REI, r(178) = 0.26, p < 0.001, and AOT, r(178) = 0.20, p = 0.007, retained a significant relation with the CRT (SI, r(178) = 0.03, p = 0.71; BBS, r(178) = 0.11, p = 0.13).
As a next step, we aimed to investigate the individual differences behind good performance on the CRT. To do that, we built standard multiple regression models to assess the variables' predictive ability on the CRT performance. First, all the independent variables were entered into the model, then all the statistically non-significant predictors were removed. Our final model, comprising BNT, b = 0.39, 95% CI [0.29, 0.48], t = 8.22, p < 0.001, and REI, b = 0.02, 95% CI [0.01, 0.03], t = 4.16, p < 0.001, fitted the data best, F(2,203) = 48.09, p < 0.001, adj. R² = 0.32.[7]

[7] The assumptions of the multiple regression were not met. A bootstrapping estimation of 10,000 samples confirmed the results of the regression analysis.
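A minimal R sketch of this model-trimming procedure and of the bootstrap check mentioned in footnote 7 (crt, bnt, rei, aot, si and bbs are hypothetical column names in a data frame d; this is not the authors' script):

    # Full model with all individual differences measures entered as predictors
    full <- lm(crt ~ bnt + rei + aot + si + bbs, data = d)
    summary(full)                       # identify statistically non-significant predictors

    # Final model after removing the non-significant predictors
    final <- lm(crt ~ bnt + rei, data = d)
    summary(final)                      # coefficients, F statistic, adjusted R^2
    confint(final)                      # 95% confidence intervals

    # Nonparametric bootstrap of the final model's coefficients (cf. footnote 7)
    set.seed(1)
    boots <- replicate(10000,
                       coef(lm(crt ~ bnt + rei, data = d[sample(nrow(d), replace = TRUE), ])))
    apply(boots, 1, quantile, probs = c(0.025, 0.975))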
Protocol analysis: exploring the ways individuals solve the CRT
Two raters, blind to our hypotheses, categorised the verbal reports using the following coding system (Table 3). First, the answer of every individual on each CRT task was marked as correct or incorrect. Then, a different categorisation procedure was applied to the correct and to the incorrect answers. The coding system is summarised in Table 3 with some prototypical examples from the bat and the ball problem. The result of the categorisation procedure showed high inter-rater reliability, kappa = 0.83.
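For illustration, Cohen's kappa for two coders' category labels can be computed in R as follows; rater1 and rater2 are hypothetical vectors of the two raters' labels, and this sketch is not the authors' script.

    # Cohen's kappa from two vectors of category labels
    cohens_kappa <- function(rater1, rater2) {
      lev <- union(rater1, rater2)
      tab <- table(factor(rater1, levels = lev), factor(rater2, levels = lev))
      n   <- sum(tab)
      po  <- sum(diag(tab)) / n                       # observed agreement
      pe  <- sum(rowSums(tab) * colSums(tab)) / n^2   # agreement expected by chance
      (po - pe) / (1 - pe)
    }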
The correct answers were classified into the "Correct start" or the "Incorrect start" categories. All the cases where participants started their response with a line of thought which led to the correct answer (i.e., after reading the task, expressed a coherent sequence of mental steps that led her to the correct answer), or after reading a question immediately gave the correct answer, were categorised as "Correct start". Otherwise, the cases where the participants started their response with an incorrect answer or with a line of thought which led to an incorrect answer, but later realised their failure, were labelled as "Incorrect start".
Table 2. Correlations of the main variables.

                                          BNT       REI       AOT       SI        BBS
CRT                                       0.494**   0.291**   0.256**   0.187**   0.292**
Berlin numeracy test (BNT)                          0.143     0.24**    0.286**   0.384**
Rational-experiential inventory (REI)                         0.339**   0.095     0.165*
Actively open-minded thinking (AOT)                                     0.206**   0.242**
Semantic illusions (SI)                                                           0.224**

*p < 0.05; **p < 0.01. BNT = Berlin numeracy test; REI = rational-experiential inventory; AOT = actively open-minded thinking; SI = semantic illusions; BBS = belief bias syllogisms.
Table 3. Categorisation of the verbal reports and the number of cases and individuals in each category.

Correct final answers (basis of the categorisation: what does the person start to say after reading out loud the task?)
  "Correct start" (124 cases, 86 individuals): starting their response with the correct answer (e.g., "It's 5 cents!") or with a line of thinking leading to the correct answer (e.g., "I see. This is an equation. Thus, if the ball equals to x, the bat equals to x plus 1...").
  "Incorrect start" (37 cases, 34 individuals): starting their response with an incorrect answer (e.g., "I would say 10 cents. But this cannot be true as it does not sum up to 1.10...") or with a line of thinking leading to an incorrect answer (e.g., "Let's see! 1.10 minus 1 is 10 cents... Wait, that's wrong! This should be solved as an equation...").

Incorrect final answers (basis of the categorisation: what does the person say after reporting a first answer?)
  "Reflective" (142 cases, 106 individuals): expressing doubt and re-performing the original strategy (e.g., "...but I'm not sure... If together they cost 1.10, and the bat costs 1 more than the ball, the solution should be 10 cents. I'm done.").
  "Non-reflective" (219 cases, 136 individuals): no reflection (e.g., "Ok. I'm done.").
The incorrect responses were grouped as "Reflective" or as "Non-reflective". Regarding the incorrect cases, the categorisation procedure focused on whether the participant reflected or not after reporting a first answer. A case was classified as "Non-reflective" if the participant accepted the first answer that came to her mind without any type of consideration, or simply echoed it. Otherwise (e.g., when the participant tried to reframe the problem, re-performed the original strategy, looked for alternative strategies or answers, expressed doubt), the protocol was categorised as "Reflective".
The data of one participant were partially omitted and the data of two individuals were completely omitted, as the audio recordings of their trials were damaged. The exclusion criterion was set before the experiment was conducted. To minimise the noise in the results of the protocol analysis, all the cases were excluded where the raters did not agree about the grouping of the protocol. As a result, 76 additional cases (12%) were omitted from the subsequent analyses. The cases where the participants gave correct and incorrect answers were analysed separately according to the corresponding hypotheses.
Analysis of the correct cases
The protocol analysis of the correct answers suggests that the participants performed a "Correct start" in 124 cases (77%) and showed an "Incorrect start" pattern in only 37 cases (23%). The "Correct start" pattern emerged as dominant for all of the CRT items (see Appendix B.1.1); however, it was most robustly expressed for the "lily pads" task. Note that the individual protocols formed the basis of the analysis.
To test the validity of this result, further analyses were conducted. First, we tested the hypothesis that the average final response time (FRT) in the "Incorrect start" group is longer than in the "Correct start" group. The rationale behind this thinking is that those in the "Incorrect start" group need to perform extra mental operations compared to those who started their response with the correct answer or with a line of thought leading to the correct answer. In this study, FRT was operationalised as the latency between the point at which the participants finished reading aloud the tasks and the point at which they indicated that their final answer had been given. Log transformation was conducted to correct for the deviations from the normal distribution in the FRT data. These log-transformed data were used in the comparison of several linear mixed random-effects models.[8] The base model contained only the participants' ID as a random intercept regressed on FRT. In the second model, a random intercept was specified for each of the CRT items. As a result, the model fit increased significantly, χ²(1) = 15.41, p < 0.001. In the third model, group membership ("Correct start" vs. "Incorrect start") was added as a fixed effect, which significantly increased the model fit, χ²(1) = 52.37, p < 0.001. The analysis revealed that the FRT was significantly higher in the "Incorrect start" group than in the "Correct start" group, b = 1.02, 95% CI [0.77, 1.29], t(158.81) = 7.91, p < 0.001.

[8] We used the glmer and lmer functions from the lme4 package in R for the mixed-effect analyses (Bates, Maechler, Bolker, & Walker, 2015). The corresponding t statistics reported are based on the results of Wald t tests.
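The model-comparison steps described above can be sketched in R with lme4 (the package named in footnote 8), although the variable names below (log_frt, participant, item and start_group in a data frame d_correct of correct responses) are hypothetical and the code is only an illustration, not the authors' script:

    library(lme4)

    # Models fitted with maximum likelihood so that they can be compared with likelihood-ratio tests
    m0 <- lmer(log_frt ~ 1 + (1 | participant), data = d_correct, REML = FALSE)
    m1 <- lmer(log_frt ~ 1 + (1 | participant) + (1 | item), data = d_correct, REML = FALSE)
    m2 <- lmer(log_frt ~ start_group + (1 | participant) + (1 | item),
               data = d_correct, REML = FALSE)

    anova(m0, m1)   # random intercept for the CRT items
    anova(m1, m2)   # fixed effect of group ("Correct start" vs. "Incorrect start")
    summary(m2)     # estimate of the group difference in log FRT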
For the purposes of the current study, we defined RT as the time interval between the end of the task reading and the onset of the formulation of the individual's answer. Assuming that any deliberative process is expressed in terms of thinking times, if people in the "Correct start" group also started their reasoning process with an incorrect answer or with a line of thought which led to an incorrect answer and suppressed this first thought before starting to articulate their answer, their RT should be longer than the RT of the "Incorrect start" group. This would indicate the presence of a confound in our data. To test this hypothesis, we built a linear mixed random-effects model and conducted model comparisons in the same way for RT as we did for FRT above. We found that neither the CRT items increased the fit of the model significantly, nor did the fixed effect of the group membership. Additionally, we calculated B to determine whether this null result implies data-insensitivity or provides evidence for the null hypothesis. The analysis yielded B_H(0, 1.63) = 0.28, indicating evidence for the null.[9][10] Thus, we found no difference in RT between the "Incorrect start" and the "Correct start" groups.

[9] H indicates that we applied a half-normal distribution to model the predictions of the alternative hypothesis. The first number in the bracket displays the centre of the distribution, and the second indicates the SD of the distribution.
[10] We assumed that the effect size of H1 cannot be bigger than the average RT of the group with the longer RT. Consequently, the average RT in the "Correct start" group was taken as an estimate of the maximum effect size of H1. Half of its value was employed as the SD of the model.
People ranking higher on the social desirability scale may be less likely to verbalise the first answer that comes to mind in case it is incorrect. As this could result in a possible confound in our findings, we tested the hypothesis that individuals in the "Correct start" group score higher on the BIDR than people in the "Incorrect start" group. We compared mixed random-effect logistic regression models where the group membership was the outcome variable. First, we specified random intercepts for each participant and then for each CRT item. This latter effect did not significantly increase the fit of the model. In the last step, BIDR was stepped into the model, but we found no evidence that the groups differ in Social Desirability. The Bayesian analysis further supported that BIDR does not predict the group membership of the participants, B_H(0, 0.45) = 0.015.[11]

[11] As there was no previous study examining the predictive power of the BIDR on the CRT performance, we applied the predictive power of the BNT as a rough estimate for the maximum effect size of H1. Thus, half of this value was employed as the SD of the model.
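A comparable sketch for the mixed-effects logistic regressions used in this and the following analyses (again with hypothetical variable names: start_correct coded 1 for "Correct start" and 0 for "Incorrect start", plus participant, item and bidr in d_correct; not the authors' script):

    library(lme4)

    m0 <- glmer(start_correct ~ 1 + (1 | participant), data = d_correct, family = binomial)
    m1 <- glmer(start_correct ~ 1 + (1 | participant) + (1 | item),
                data = d_correct, family = binomial)
    anova(m0, m1)                   # does a random intercept for the CRT items improve fit?

    m2 <- update(m0, . ~ . + bidr)  # step the BIDR score into the model
    anova(m0, m2)                   # does BIDR predict group membership?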
Analysis of the incorrect cases
The protocol analysis of the incorrect answers aimed to explore whether there
are people who check the first answer that comes to their mind but still fail to solve the task. The data suggest that in 142 of the 361 cases (39%), people engaged in some kind of reflective behaviour after reporting their first answer, while in 219 cases (61%) people accepted the first answer that they
reported without any further deliberation. We observed a similar pattern for
all the CRT items (see Appendix B.1.2).
Based on the definition of the "Reflective" and "Non-reflective" groups, one would expect that FRT in the "Non-reflective" group is shorter than in the "Reflective" group. To test this assumption, we again compared linear mixed random-effects models. The model comparison method followed the procedure introduced above. The base model contained a random intercept for each participant. Then, a random intercept was added for the CRT items, which significantly increased the fit of the model, χ²(1) = 13.31, p < 0.001. Finally, group membership was added as a fixed effect. We found that the group membership variable significantly increased the fit of the model, χ²(1) = 91.63, p < 0.001. The analysis revealed that people in the "Reflective" group spent significantly more time on solving the problems than people in the "Non-reflective" group, b = 0.73, 95% CI [0.59, 0.87], t(349.6) = 10.24, p < 0.001.
Individual differences as predictors of task solution[12]
We hypothesised that more numerate individuals start their thinking with correct strategies or have correct intuitions on the CRT more often than their low numeracy counterparts. We compared mixed random-effect logistic regression models to test whether group membership ("Correct start" vs. "Incorrect start") is predicted by BNT performance. In the first model, we specified a random intercept for each participant. The CRT item variable being stepped into the model as a random factor did not increase the model fit, nor did BNT performance yield a significant effect. We calculated B in order to test whether the data supported the null hypothesis. The analysis resulted in B_H(0, 0.45) = 0.62, suggesting that the data obtained are not sensitive enough to permit a conclusion.[13] It has to be added that our data showed a ceiling effect on BNT among the correct responders, which is not surprising given that the CRT tasks are highly difficult. Taken together, these findings do not allow us to draw any inference regarding our hypothesis.

[12] Although we did not formulate specific hypotheses, Appendix B.2 depicts the means and standard deviations of all the individual differences measures (BNT, AOT, REI, BBS, SI, BIDR) across the different categories created in the protocol analyses.
[13] The predictive power of the BNT for giving the right answer on the CRT was taken as the maximum of the expected effect size for H1, and so half of this value was employed as the SD of the model.
Our last hypothesis predicted that people in the "Reflective" group score higher on the REI scale than the members of the "Non-reflective" group. To test this idea, we built mixed random-effect logistic regression models. First, we added a random intercept for each participant, in a model with group membership as the criterion variable. Adding random intercepts for the individual CRT items did not increase the model fit significantly. Adding REI as a fixed-effect predictor also failed to increase model fit significantly. The result of the corresponding Bayes factor analysis indicated that the obtained data are not sensitive enough to permit a conclusion,[14] B_H(0, 0.03) = 0.80.

[14] We took the maximum expected effect size from a model where REI predicted the accuracy of the answer for H1. Half of its value was employed as the SD of the model.
Discussion
The findings of this study deepen our understanding of how people solve the CRT and of the abilities needed for its correct solution. The results suggest that there are individuals who start their response with the correct answer or with a line of thought which led to the correct answer when solving the CRT tasks. Mata et al. (2013, Study 5) explicitly asked the participants, after solving the modified version of the bat and the ball problem, whether the typically incorrect solution came to their mind while thinking about the task. As we did, they also found that correct responders had not thought of the "intuitive response" in a noteworthy number of cases (28%),[15] which can be interpreted as the proportion of the "Correct start" individuals. Cokely and Kelley (2009), based on the findings of their protocol analysis, also argued that the significance of early selection control mechanisms is underestimated in the decision literature. However, these results provide empirical evidence that the early selection processes may play an important role in solving the CRT.

[15] Compared to our findings, the relatively low proportion of "Correct start" cases could have been caused by several differences between the two experimental designs. First, unlike us, the authors used the modified bat and ball problem. Additionally, the authors did not control for the time-course assumption of the answers, which is crucial regarding our theoretical question, as it is possible that those who indicated awareness of the "intuitive" response may have started to think with a correct strategy, and the incorrect solution came to their mind only later. Finally, their results are based on participants' self-reports after solving the task and not on verbal protocols.
The finding that the majority of the correct responders started their response with the correct answer or with a line of thought which led to the correct answer raises questions regarding the usage of the CRT as a pure
measurement of the ability to override the first "intuitive" response. In addition, our correlational results further support that the late suppression mechanism may not be the only feature of reflectivity in the CRT. We have found that the REI and the AOT were the best predictors of the CRT performance above the BNT, and not the reflectivity measures which theoretically build upon the preconception of the suppression of a first "intuitive" answer (BBS, SI). Cokely and Kelley (2009) found that the quantity of the verbalised reasoning in risky decision-making tasks was related to CRT performance. Campitelli and Labollita (2010) have found that individuals who solved more CRT tasks possessed more general knowledge and used more detailed heuristic cues. Cokely, Parpart, and Schooler (2009) demonstrated that more reflective individuals provided more normatively justifiable judgements in environments where multiple diagnostic cues were available; however, they also relied more on heuristic processes when there was no diagnostic cue available. Additionally, Baron et al. (2014) observed that the predictive power of the CRT does not stem from the disposition to overcome an initial intuition in moral judgements. In line with previous results, our findings support the view that the definition of reflectivity, at least when it is operationalised by the CRT, should not be restricted to the description of the ability or disposition to override gut feelings; instead, a broader RI account of reflectivity should be used, embracing the general preference for speed over accuracy.
Stanovich, Toplak, and West (2008) suggested a general framework to
understand rational thinking errors in heuristics and biases tasks. Their classification embraces two different kinds of causes that may be behind the thinking failures. The first cause is rooted in the individuals' tendency to use
heuristic-processing mechanisms (Simon, 1956; Stanovich et al., 2008; Tversky
& Kahneman, 1974). The heuristics and biases tasks are designed to trigger
automatic but incorrect responses, which can lead individuals to report this
incorrect answer as it is of low computational expense. The second cause is
called the mindware problem (Perkins, 1995); it stems from the fact that indi-
viduals lack the declarative knowledge and strategic rules that are needed to
solve some problems. Consequently, even when individuals put considerable
mental effort into the problem-solving process, the lack of this necessary
knowledge can lead to thinking failures (Stanovich et al., 2008).
The CRT is believed to assess people's tendency to answer questions "with the first idea that comes to their mind without checking it" (Kahneman, 2011, p. 65). Toplak et al. (2011, 2014) argued that incorrect responding on the CRT is not a result of a mindware problem, but rather that of miserly processing. In a recent review, Pennycook, Fugelsang, and Koehler (2015) considered the role of cognitive abilities "rather rudimentary" (2015, p. 426). However, we found that many reasoners are not able to come to the right solution in the CRT even if they reflect on their first answer. Consequently, the mindware problem should be considered as one of the reasons people make errors on
the CRT tasks. Meyer et al.'s (2015) work also supports our findings in this regard. The authors used four different kinds of manipulation to make people reflect on the bat and the ball problem and found that throughout all conditions a significant number of people still reported an incorrect response. Their results also suggest that the tendency to fail the task can be caused either by "hopeless" (low ability) or by "careless" (high ability, low reflectivity) behavioural patterns. A recent study of Szollosi, Bago, Szaszi, and Aczel (in press) brings further evidence for this hypothesis: their results showed that many of the participants who failed to solve the bat and ball problem reported that they had verified their answer, which can be interpreted as an indication of deliberative thinking. Additionally, our finding converges with others in the literature showing that a period of reflection does not necessarily produce beneficial results (Stanovich et al., 2008; Thompson, Turner, & Pennycook, 2011; Thompson et al., 2013). This result raises serious concerns about the usage of the CRT as a measure of cognitive miserliness and warns that whenever the CRT is used in correlational studies, researchers have to take into consideration whether the lack of miserliness or the mindware problem could have caused the effect, as the failure on the CRT tasks can be caused by both.
The responses in the CRT are often grouped into "intuitive incorrect", "non-intuitive correct", and "non-intuitive incorrect" categories (e.g., Pennycook, Cheyne, Koehler, & Fugelsang, 2015). More importantly, many studies draw central conclusions from the hypotheses built on this classification (e.g., Böckenholt, 2012; Brosnan, Hollinworth, Antoniadou, & Lewton, 2014; Piazza & Sousa, 2013; Sinayev & Peters, 2015). Although our study did not focus on the question of whether a response was intuitive or deliberative (Evans, 2003, 2009), the results of the protocol analysis suggest that participants deliberated after articulating a first response in 39% of the trials where they reported an incorrect "intuitive" final response. Note that we do not mean to speculate on whether the first response was generated by intuition or deliberation, but we argue that many of the reasoners engaged in some form of reflection despite eventually reporting the "intuitive incorrect" answer. As a consequence, the classification based only on the final answer to indicate deliberative tendencies yields a contaminated measure that could lead to biased results. Our conclusion here is in line with previous research (e.g., Elqayam & Evans, 2011; Thompson & Johnson, 2014; Thompson et al., 2011): solely based on the normativity of the responses, one cannot infer whether the answer was the output of Type 1 or Type 2 processes (Evans & Stanovich, 2013), or whether the decision-maker engaged in deliberation or not. Our results indicate that, before building on the conclusions of the studies using the original classification schema, more scientific examination would be needed to investigate the validity and the reliability of the intuitive/deliberative categories.
In accord with previous findings (e.g., Campitelli & Gerrans, 2014; Del Missier, Mäntylä, & Bruin, 2012; Pennycook & Ross, 2016), our results support the idea that both reflective ability and numeracy account for the performance in the CRT. Consequently, we suggest that whenever the CRT is used as a standalone individual differences measure, one should draw only careful conclusions about the reasons behind any correlations found (see also Aczel, Bago, Szollosi, Foldes, & Lukacs, 2015), as there is no simple way to tell whether numerical abilities or the reflective disposition are causing the effect.
However, the methodological difficulty in the dissociation of numeracy and reflectivity is rooted deeper than the reliability of the tests. Those who have better numerical abilities might have richer and more accurate intuitions (Pachur & Spaar, 2015; Peters, 2012; Reyna et al., 2009; Thompson & Johnson, 2014), or use early controlled processes (Jacoby, Shimizu, Daniels, & Rhodes, 2005; Peters, 2012), which could lead them to more accurate responding without being reflective in reflectivity tests that are based on numerical tasks. At the same time, low numeracy can lead to low scoring even for the highly reflective individuals (see also the mindware problem). Similarly, in numeracy tests, high reflectivity can lead people to put more effort into the problem-solving procedure, resulting in more correct responses (Ghazal, Cokely, & Garcia-Retamero, 2014), but low reflectivity can have a detrimental effect on performance.[16] As a consequence, whenever researchers aim to assess reflectivity with numerical test-based assessment tools, they have to be careful about the interpretation of the findings, as it is not possible to determine only by examining the accuracy measures whether numeracy or reflectivity led to a correct/incorrect response. However, this conclusion is not specific to the numerical domain (Szaszi, 2016), but holds true for any domain-specific reflectivity test where additional thinking effort increases the probability of successful responding (for a similar argument, see Baron, Badgio, & Gaskins, 1986).

[16] Working memory (WM) differences can bring additional complexity into the equation: people with higher working memory span are thought to be more numerate (Peters, Dieckmann, Dixon, Hibbard, & Mertz, 2007; Reyna, Nelson, Han, & Dieckmann, 2009), but they may find the cost of additional thinking lower than their low-WM counterparts (Stupple, Gale, & Richmond, 2013).
Fox et al. (2011) outlined that verbal protocols do not assure "a complete record of the participants' thoughts" (p. 338). Consequently, one limitation of our thinking-aloud study is that we cannot be certain that some of those who apparently started their response with the correct answer or with a line of thought which led to the correct answer did not perceive any other response option. Although the RT measure supported the idea that the "Correct start" group does not need to inhibit a first answer before starting to verbalise their response, there are alternative explanations that cannot be ruled out in our experimental design. First, RT is a valid measure to diagnose how much thinking is being done, but it is less reliable in determining how many mental operations are occurring. Additionally, one can assume that "Correct start" individuals are more cognitively able than people in the "Incorrect start" group. Taken as a whole, it is possible that "Correct start" people suppress their first answer and generate a new answer or strategy in the same time-frame as "Incorrect start" responders generate their first answer. Finally, it is possible that "Correct start" reasoners considered the "intuitive response" during the reading phase, and if so, our RT measure would not be a sensitive measure of it.
It has been argued that reflectivity is a key individual differences dimension predicting rational errors in heuristics and biases tasks (e.g., Stanovich et al., 2008; Toplak et al., 2011) and in diverse everyday situations (Pennycook et al., 2015). Our study aimed to enhance our knowledge of the CRT, as it is the most widely used behavioural measure of reflectivity. In sum, we observed that there are several ways people can solve or fail the test. Importantly, some individuals started their response with the correct answer or with a line of thought which led to the correct answer, while others failed to solve the CRT tasks even when they reflected on them. Additionally, the current results suggest that the CRT measures a general preference for speed over accuracy rather than just individuals' ability to suppress a first intuitive answer. In our view, the CRT is a useful and important measurement tool of reflectivity. However, this study raises doubts about the validity of studies that build on the CRT as a simple measure of analytical thinking, since the use of the CRT as a standalone predictor can easily lead to the overestimation of the role of reflectivity and the underestimation of the role of numerical ability in decision performance. As the CRT tasks are pivotal examples in several dual-process models of reasoning and decision-making, the implications of our findings go beyond the CRT as a measurement tool. Our conclusions about the processes and abilities involved in the CRT can be used to better apprehend the "whys" and "whens" (De Neys & Bonnefon, 2013) of the decision errors in heuristics and biases tasks and to further refine existing explanatory models.
Acknowledgments
We would like to thank Árpád Völgyesi for running the verbal protocols, Bence Bago and Zoltan Kekecs for their helpful comments on the analysis, Melissa Wood for proofreading the manuscript, and Melinda Szászi-Szrenka for her supporting patience throughout the study.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the doctoral scholarship of Eötvös Loránd University, and by the Pallas Athéné Domus Animae Alapítvány. Aba Szollosi was supported by the "Nemzet Fiatal Tehetségeiért" Scholarship [NTP-NFTÖ-16-1184].
ORCID
B. Szaszi http://orcid.org/0000-0001-7078-2712
A. Szollosi http://orcid.org/0000-0003-3457-542X
B. Palfi http://orcid.org/0000-0002-6739-8792
B. Aczel http://orcid.org/0000-0001-9364-4988
References
Aczel, B., Bago, B., Szollosi, A., Foldes, A., & Lukacs, B. (2015). Measuring individual differences in decision biases: Methodological considerations. Frontiers in Psychology, 6, 1770.
Baron, J. (1993). Why teach thinking? An essay. Applied Psychology, 42(3), 191–214.
Baron, J., Badgio, P., & Gaskins, I. W. (1986). Cognitive style and its improvement: A normative approach. Advances in the Psychology of Human Intelligence, 3, 173–220.
Baron, J., Scott, S., Fincher, K., & Metz, S. E. (2014). Why does the cognitive reflection test (sometimes) predict utilitarian moral judgment (and other things)? Journal of Applied Research in Memory and Cognition, 4, 265–284.
Barton, S. B., & Sanford, A. J. (1993). A case study of anomaly detection: Shallow semantic processing and cohesion establishment. Memory & Cognition, 21(4), 477–487.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-8. Retrieved from https://cran.r-project.org/web/packages/lme4/index.html
Böckenholt, U. (2012). The cognitive-miser response model: Testing for intuitive and deliberate reasoning. Psychometrika, 77(2), 388–399.
Brandstätter, E., & Gussmack, M. (2013). The cognitive processes underlying risky choice. Journal of Behavioral Decision Making, 26(2), 185–197.
Brosnan, M., Hollinworth, M., Antoniadou, K., & Lewton, M. (2014). Is empathizing intuitive and systemizing deliberative? Personality and Individual Differences, 66, 39–43.
Campitelli, G., & Gerrans, P. (2014). Does the cognitive reflection test measure cognitive reflection? A mathematical modeling approach. Memory & Cognition, 42(3), 434–447.
Campitelli, G., & Labollita, M. (2010). Correlations of cognitive reflection with judgments and choices. Judgment and Decision Making, 5(3), 182–191.
Cokely, E. T., Galesic, M., Schulz, E., Ghazal, S., & Garcia-Retamero, R. (2012). Measuring risk literacy: The Berlin numeracy test. Judgment and Decision Making, 7(1), 25–47.
Cokely, E. T., & Kelley, C. M. (2009). Cognitive abilities and superior decision making under risk: A protocol analysis and process model evaluation. Judgment and Decision Making, 4(1), 20–33.
Cokely, E. T., Parpart, P., & Schooler, L. J. (2009). On the link between cognitive control and heuristic processes. In N. A. Taatgen & H. Van Rijn (Eds.), Proceedings of the 31st annual conference of the cognitive science society (pp. 2926–2931). Austin, TX: Cognitive Science Society.
Del Missier, F., Mäntylä, T., & Bruin, W. B. (2012). Decision-making competence, executive functioning, and general cognitive abilities. Journal of Behavioral Decision Making, 25(4), 331–351.
De Neys, W., & Bonnefon, J.-F. (2013). The "whys" and "whens" of individual differences in thinking biases. Trends in Cognitive Sciences, 17(4), 172–178.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., Moyens, E., & Vansteenwegen, D. (2010). Feeling we're biased: Autonomic arousal and reasoning conflict. Cognitive, Affective, & Behavioral Neuroscience, 10(2), 208–216.
Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. New York, NY: Palgrave Macmillan.
Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274–290.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, 781.
Elqayam, S., & Evans, J. S. B. (2011). Subtracting "ought" from "is": Descriptivism versus normativism in the study of human thinking. Behavioral and Brain Sciences, 34(5), 233–248.
Erickson, T. D., & Mattson, M. E. (1981). From words to meaning: A semantic illusion. Journal of Verbal Learning and Verbal Behavior, 20(5), 540–551.
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251.
Evans, J. S. B. (1996). Deciding before you think: Relevance and reasoning in the selection task. British Journal of Psychology, 87(2), 223–240.
Evans, J. S. B. (2003). In two minds: Dual-process accounts of reasoning. Trends in Cognitive Sciences, 7(10), 454–459.
Evans, J. S. B. (2009). How many dual-process theories do we need? One, two, or many? In In two minds: Dual processes and beyond (pp. 33–54). New York, NY: Oxford University Press.
Evans, J. S. B., & Ball, L. J. (2010). Do people reason on the Wason selection task: A new look at the data of Ball et al. (2003). Quarterly Journal of Experimental Psychology, 63(3), 434–441.
Evans, J. S. B., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Fific, M. (2014). Double jeopardy in inferring cognitive processes. Frontiers in Psychology, 5, 1130.
Finucane, M. L., & Gullion, C. M. (2010). Developing a tool for measuring the decision-making competence of older adults. Psychology and Aging, 25(2), 271–288.
Fox, M. C., Ericsson, K. A., & Best, R. (2011). Do procedures for verbal reporting of thinking have to be reactive? A meta-analysis and recommendations for best reporting methods. Psychological Bulletin, 137(2), 316–344.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Gervais, W. M., & Norenzayan, A. (2012). Analytic thinking promotes religious disbelief. Science, 336(6080), 493–496.
Ghazal, S., Cokely, E. T., & Garcia-Retamero, R. (2014). Predicting biases in very highly educated samples: Numeracy and metacognition. Judgment and Decision Making, 9(1), 15–34.
Haran, U., Ritov, I., & Mellers, B. A. (2013). The role of actively open-minded thinking in information acquisition, accuracy, and calibration. Judgment and Decision Making, 8(3), 188–201.
Jacoby, L. L., Kelley, C. M., & McElree, B. D. (1999). The role of cognitive control: Early selection versus late correction. In S. Chaiken & Y. Trope (Eds.), Dual-process theories in social psychology (pp. 383–400). New York, NY: Guilford.
Jacoby, L. L., Shimizu, Y., Daniels, K. A., & Rhodes, M. G. (2005). Modes of cognitive control in recognition and source memory: Depth of retrieval. Psychonomic Bulletin & Review, 12(5), 852–857.
Jeffreys, H. (1961). The theory of probability. Oxford: Oxford University Press.
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The doubting system 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64.
Kagan, J., Rosman, B. L., Day, D., Albert, J., & Phillips, W. (1964). Information processing in the child: Significance of analytic and reflective attitudes. Psychological Monographs: General and Applied, 78(1), 1–37.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus, and Giroux.
Låg, T., Bauger, L., Lindberg, M., & Friborg, O. (2014). The role of numeracy and intelligence in health-risk estimation and medical data interpretation. Journal of Behavioral Decision Making, 27(2), 95–108.
Liberali, J. M., Reyna, V. F., Furlan, S., Stein, L. M., & Pardo, S. T. (2012). Individual differences in numeracy and cognitive reflection, with implications for biases and fallacies in probability judgment. Journal of Behavioral Decision Making, 25(4), 361–381.
Mata, A., Ferreira, M. B., & Sherman, S. J. (2013). The metacognitive advantage of deliberative thinkers: A dual-process perspective on overconfidence. Journal of Personality and Social Psychology, 105(3), 353–355.
Mata, A., Schubert, A.-L., & Ferreira, M. B. (2014). The role of language comprehension in reasoning: How "good-enough" representations induce biases. Cognition, 133(2), 457–463.
Meyer, A., Spunt, R., & Frederick, S. (2015). The bat and ball problem. Unpublished manuscript.
Pachur, T., & Spaar, M. (2015). Domain-specific preferences for intuition and deliberation in decision making. Journal of Applied Research in Memory and Cognition, 4(3), 303–311.
Pacini, R., & Epstein, S. (1999). The relation of rational and experiential information processing styles to personality, basic beliefs, and the ratio-bias phenomenon. Journal of Personality and Social Psychology, 76(6), 972–987.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 17–59). San Diego, CA: Academic Press.
Paxton, J. M., Ungar, L., & Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–177.
Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2015). Is the cognitive reflection test a measure of both reflection and intuition? Behavior Research Methods, 48(1), 341–348.
Pennycook, G., Cheyne, J. A., Seli, P., Koehler, D. J., & Fugelsang, J. A. (2012). Analytic cognitive style predicts religious and paranormal belief. Cognition, 123(3), 335–346.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). Everyday consequences of analytic thinking. Current Directions in Psychological Science, 24(6), 425–432.
Pennycook, G., & Ross, M. R. (2016). Commentary: Cognitive reflection vs. calculation in decision making. Frontiers in Psychology, 7, 9. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4722428/
Perkins, D. (1995). Outsmarting IQ: The emerging science of learnable intelligence. New York, NY: Free Press.
Peters, E. (2012). Beyond comprehension: The role of numeracy in judgments and decisions. Current Directions in Psychological Science, 21(1), 31–35.
Peters, E., Dieckmann, N., Dixon, A., Hibbard, J. H., & Mertz, C. K. (2007). Less is more in presenting quality information to consumers. Medical Care Research and Review, 64(2), 169–190.
Piazza, J., & Sousa, P. (2013). Religiosity, political orientation, and consequentialist moral thinking. Social Psychological and Personality Science, 5(3), 334–342.
Primi, C., Morsanyi, K., Chiesi, F., Donati, M. A., & Hamilton, J. (2015). The development and testing of a new version of the cognitive reflection test applying item response theory (IRT). Journal of Behavioral Decision Making, 29. doi:10.1002/bdm.1883
Reisen, N., Hoffrage, U., & Mast, F. W. (2008). Identifying decision strategies in a consumer choice situation. Judgment and Decision Making, 3(8), 641–658.
Reyna, V. F., Nelson, W. L., Han, P. K., & Dieckmann, N. F. (2009). How numeracy influences risk comprehension and medical decision making. Psychological Bulletin, 135(6), 943–973.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129–138.
Sinayev, A., & Peters, E. (2015). Cognitive reflection vs. calculation in decision making. Frontiers in Psychology, 6, 532. doi:10.3389/fpsyg.2015.00532
Stanovich, K. E., Toplak, M. E., & West, R. F. (2008). The development of rational thought: A taxonomy of heuristics and biases. Advances in Child Development and Behavior, 36, 251–285.
Stupple, E. J., Ball, L. J., & Ellis, D. (2013). Matching bias in syllogistic reasoning: Evidence for a dual-process account from response times and confidence ratings. Thinking & Reasoning, 19(1), 54–77.
Stupple, E. J., Gale, M., & Richmond, C. R. (2013). Working memory, cognitive miserliness and logic as predictors of performance on the cognitive reflection test. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the cognitive science society (pp. 1396–1401). Austin, TX: Cognitive Science Society.
Svedholm-Häkkinen, A. M. (2015). Highly reflective reasoners show no signs of belief inhibition. Acta Psychologica, 154, 69–76.
Szaszi, B. (2016). The role of expertise and preference behind individuals' tendency to use intuitive decision style. Journal of Applied Research in Memory and Cognition, 5(3), 329–330.
Szollosi, A., Bago, B., Szaszi, B., & Aczel, B. (in press). Exploring the determinants of confidence in the bat-and-ball problem.
Thomson, K. S., & Oppenheimer, D. M. (2016). Investigating an alternate form of the cognitive reflection test. Judgment and Decision Making, 11(1), 99–113.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V. A., Turner, J. P., & Pennycook, G. (2011). Intuition, reason and metacognition. Cognitive Psychology, 63(3), 107–140.
Thompson, V. A., Turner, J. P., Pennycook, G., Ball, L. J., Brack, H., Ophir, Y., & Ackerman, R. (2013). The role of answer fluency and perceptual fluency as metacognitive cues for initiating analytic thinking. Cognition, 128(2), 237–251.
Toplak, M. E., West, R. F., & Stanovich, K. E. (2011). The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition, 39(7), 1275–1289.
Toplak, M. E., West, R. F., & Stanovich, K. E. (2014). Assessing miserly information processing: An expansion of the Cognitive Reflection Test. Thinking & Reasoning, 20(2), 147–168.
Tor, A., & Bazerman, M. H. (2003). Focusing failures in competitive environments: Explaining decision errors in the Monty Hall game, the acquiring a company problem, and multiparty ultimatums. Journal of Behavioral Decision Making, 16(5), 353–374.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the cognitive reflection test. Cognition, 150, 109–118.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Wason, P. C., & Evans, J. S. B. (1975). Dual processes in reasoning? Cognition, 3(2), 141–154.
Weller, J. A., Dieckmann, N. F., Tusler, M., Mertz, C. K., Burns, W. J., & Peters, E. (2013). Development and testing of an abbreviated numeracy scale: A Rasch analysis approach. Journal of Behavioral Decision Making, 26(2), 198–212.
Welsh, M., Burns, N., & Delfabbro, P. (2013). The Cognitive Reflection Test: How much more than numerical ability. In M. Knauff, M. Pauen, N. Sebanz, & I. Wachsmuth (Eds.), Proceedings of the 35th annual conference of the cognitive science society (pp. 1396–1401). Austin, TX: Cognitive Science Society.
Appendices
Appendix 1
A.1. Materials used
A.1.1. Actively open-minded thinking scale
1. Allowing oneself to be convinced by an opposing argument is a sign of good character
2. People should take into consideration evidence that goes against their beliefs
3. People should revise their beliefs in response to new information or evidence
4. Changing your mind is a sign of weakness
5. Intuition is the best guide in making decisions
6. It is important to persevere in your beliefs even when evidence is brought to bear against them
7. One should disregard evidence that conflicts with one's established beliefs
8. People should search actively for reasons why their beliefs might be wrong
9. When we are faced with a new question, the first answer that occurs to us is usually best
10. When faced with a new question, we should consider more than one possible answer before
reaching a conclusion
11. When faced with a new question, we should look for reasons why our first answer might be wrong, before deciding on an answer
Note. Items 1–8 were published by Haran et al. (2013). Items 9–11 were provided through personal communication by Jonathan Baron. Reverse scored items: 4, 5, 6, 7, 9.
A.1.2. Semantic illusions
1. There is a running race among A, B, C, D, E, F. If B pass the person in second place, what place is
now B in.
2. Larry's father has five sons, viz. Ten, Twenty, Thirty, Forty… Guess what would be the name of the fifth?
3. How many animals of each kind did Moses take on the ark?
4. In which decade did the Beatles become the most popular American band ever?
5. In which day of September did the Twin Towers in Washington, DC get attacked by Islamist
terrorists?
6. A plane was flying from Germany to Barcelona. On the last leg of the journey, it developed engine
trouble. Over the Pyrenees, the pilot started to lose control. The plane eventually crashed right on
the border. Wreckage was equally strewn in France and Spain. Where should the survivors be
buried?
Note. Items 1 and 2 were collected from the Internet, while items 3–6 were adopted from the Mata et al. (2014) study.
A.1.3. Belief bias syllogisms
Invalid/believable

1. All flowers need light.
   Roses need light.
   Roses are flowers.

3. All dogs have snouts.
   Labradors have snouts.
   Labradors are dogs.

5. All fruits have corns.
   Apples have corns.
   Apples are fruits.

Valid/unbelievable

2. All mammals can walk.
   Whales are mammals.
   Whales can walk.

4. All vehicles have wheels.
   Boats are vehicles.
   Boats have wheels.

6. All birds have wings.
   Cats are birds.
   Cats have wings.
Note. Items 1–4 were adopted from De Neys et al. (2010). Items 5 and 6 were developed by our research group.
A.2. Descriptive statistics of the tests used in the study
CRT AOT REI BBS SI BNT BIDR
Number of people 210 206 206 206 206 206 195
Theoretical range 0–3 11–77 20–100 0–6 0–6 1–4 20–140
Range of data 0–3 39–71 27–98 0–6 0–6 1–4 47–118
Median 1 57 75 5 2 2 86
Mean 0.8 56.7 72.1 4.5 2.6 2.4 84.9
SD 1.0 6.5 13.4 1.8 1.4 1.3 15.0
Note. AOT, actively open-minded thinking; REI, rational-experiential inventory; BBS, belief bias syllogisms; SI, semantic illusions; BNT, Berlin numeracy test; BIDR, balanced inventory of desirable responding.
Appendix 2
B.1. Protocol analysis results per CRT item
B.1.1. Distribution of final correct responses per CRT item: number of trials in the "correct start" and the "incorrect start" groups.
Item Correct start (n) Incorrect start (n) Total (n)
CRT1 24 14 38
CRT2 28 11 39
CRT3 72 12 84
CRT 124 37 161
B.1.2. Distribution of final incorrect responses per CRT item: number of trials in the "reflective" and the "non-reflective" groups.
Item Non-reflective (n) Reflective (n) Total (n)
CRT1 78 56 134
CRT2 83 43 126
CRT3 58 43 101
CRT 219 142 361
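For convenience, the pooled CRT rows of the two tables above can also be read as proportions; the display below is only a restatement of the counts reported here (rounded to two decimals) and adds no further analysis.

\[
\frac{\text{Correct start}}{\text{correct final responses}} = \frac{124}{161} \approx 0.77,
\qquad
\frac{\text{Reflective}}{\text{incorrect final responses}} = \frac{142}{361} \approx 0.39 .
\]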
B.2. Means and standard deviations of the individual differences
measures used for each protocol category (mean (SD)).
Correct start Incorrect start Non-reflective Reflective Gave up
BNT 3.14 (1.10) 3 (1.20) 2.11 (1.18) 2.06 (1.17) 1.95 (1.21)
AOT 58.45 (6.03) 58.22 (5.55) 55.76 (6.90) 56.45 (6.24) 54.36 (6.75)
REI 76.60 (11.42) 77.32 (8.57) 69.30 (13.90) 71.63 (12.70) 65.45 (19.31)
BBS 5.05 (1.59) 5.03 (1.61) 4.31 (1.89) 4.27 (1.82) 4.68 (2.06)
SI 2.86 (1.23) 2.65 (1.27) 2.53 (1.46) 2.42 (1.59) 2.64 (1.33)
BIDR 83.72 (14.97) 87.29 (14.48) 85.38 (14.54) 83.6 (14.78) 89.48 (15.72)
Note. BNT, Berlin numeracy test; AOT, actively open-minded thinking; REI, rational-experiential inven-
tory; BBS, belief bias syllogisms; SI, semantic illusions; BIDR, balanced inventory of desirable
responding.
B.4. The number of "Correct start" and "Incorrect start" cases within the correct and incorrect final responses.
Correct start Incorrect start
Correct final response 124 37
Incorrect final response 1 349
Note. We ran an additional protocol analysis to separate the "Correct start" and "Incorrect start" cases within the incorrect responses. Similar to Appendix Sections B.1.1 and B.1.2, this table only shows those cases where the raters were in agreement on the categorisation of the cases. Twelve cases were excluded from the 362 incorrect responses due to disagreement among the raters.
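As a quick consistency check (a reader's arithmetic, not part of the original analysis), the rows of this table line up with the totals in B.1.1 and with the exclusions described in the note:

\[
124 + 37 = 161 \ \text{(correct final responses, as in B.1.1)},
\qquad
1 + 349 = 350 = 362 - 12 \ \text{(incorrect final responses after exclusions)}.
\]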
Figure B.1. Histograms of final response times and reaction times broken down by CRT tasks.