Designing Clicker Questions to Stimulate Transfer
Ji Y. Son
California State University, Los Angeles
Mariela J. Rivas
University of California, Irvine
One goal of education is transfer: the ability to apply learning in contexts that differ
from the original learning situation. How do we design opportunities to promote
transfer in a large lecture course? Studies have shown that learning is enhanced when
1 or more tests are included during learning (the testing effect, e.g., Roediger &
Karpicke, 2006) and when study sessions are distributed over time (the spacing effect,
e.g., Cepeda, Pashler, Vul, Wixted, & Rohrer, 2006). Furthermore, research on ana-
logical reasoning suggests thinking about abstract principles in multiple superficially
different exemplars is critical for transfer. Our quasi-experimental study evaluated
whether spaced analogical reasoning questions (delivered via clickers) would improve
transfer over a 10-week course. Students in 2 sections of an introductory psychology
course were presented with low stakes testing opportunities during every class session
of a term. Although both sections used clickers, the Testing group received an
additional 1–3 analogical reasoning questions per week designed so that students would
apply the same abstract concept (correlation/causation) in a new context (e.g., prenatal
growth, literacy). The Notes group was presented with equivalent information in a
direct instruction format. To assess long-term transfer, transfer questions were embed-
ded into the midterm and final exam. The Testing group outperformed the Notes group
on transfer in the final exam. This indicates that practicing analogical transfer is
particularly important for delayed test situations. The current research provides unique
insight into how formative assessment can produce long-term transfer of abstract
concepts with current pedagogical technologies.
Keywords: analogy, transfer, testing effect, clicker
One goal of education is transfer: the ability
to apply learning in a new context that differs
from the original learning situation. Many of the
most important concepts in any course are ab-
stract in that there are a variety of superficially
different instantiations that exemplify the same
concept. For example, in an introductory psy-
chology course (among others), students are
often taught that a correlational result does not
necessarily imply a causal relationship between
variables. This abstract idea can be embodied in
a vast number of different examples. Even
though men who became fathers have lower
testosterone than men who did not (Gettler,
McDade, Agustin, Feranil, & Kuzawa, 2015),
this does not mean that fatherhood causes de-
creases in testosterone. Knowing that children
who attend preschool tend to have better long-
term academic and economic outcomes than
children who do not does not mean that pre-
school causes those improved outcomes—we
need experiments to conclude causality (Heck-
man et al., 2010; Schweinhart, 1993). Even
though these examples do not have superficial
features in common (objects such as testoster-
one, fathers, preschool, achievement), the un-
derlying structure, the abstract relationships be-
tween the objects, is the same. Fostering a deep
understanding of these abstract, structural, rela-
tional principles is simultaneously the most
This article was published Online First August 8, 2016.
Ji Y. Son, Department of Psychology, California State
University, Los Angeles; Mariela J. Rivas, School of Edu-
cation, University of California, Irvine.
This research was supported by grants from the Califor-
nia State University, Los Angeles Scholarship of Teaching
and Learning program. We thank Catherine Haras, Cheryl
Ney, and James Rudd for their support of research that leads
to engaging and effective learning opportunities for our
students.
Correspondence concerning this article should be ad-
dressed to Ji Y. Son, Department of Psychology, California
State University, 5151 State University Drive, Los Angeles,
CA 90032. E-mail: json2@calstatela.edu
Scholarship of Teaching and Learning in Psychology © 2016 American Psychological Association
2016, Vol. 2, No. 3, 193–207 2332-2101/16/$12.00 http://dx.doi.org/10.1037/stl0000065
worthwhile goal of teaching and often the most
difficult to achieve.
The goal of this research is to explore meth-
ods that could be implemented in large lecture
settings that promote transfer of deep principles
across superficially different domains (also
called “far transfer,” see Barnett & Ceci, 2002).
In the present study, participants’ success in
learning abstract concepts was measured by the
degree of transfer to superficially dissimilar do-
mains that have the same relational structure.
Success on these questions requires analogical
reasoning, an employment of relational knowl-
edge that transcends superficial contextual de-
tails (Brown & Kane, 1988). With these mea-
sures, we examined whether a popular method
of classroom engagement, questioning via per-
sonal response systems (PRS), could be tailored
to promote far transfer.
Designing Effective PRS Questions
Many instructors across a variety of institu-
tions are using technologies such as PRS, com-
monly called clickers, in an attempt to improve
learning and engagement. Most reviews
(Caldwell, 2007; Fies & Marshall, 2006; Kay &
LeSage, 2009; Keough, 2012; Landrum, 2015)
note that the majority of evidence in support of
clickers focuses on improvements to student
engagement such as attendance, attention, and
participation. These reviews also note converg-
ing evidence of clickers making an impact on
pedagogy (e.g., leading to more contingent
teaching, feedback, and assessment). So it is not
much of a surprise that many of the studies that
were reviewed also demonstrated improve-
ments in learning (usually measured with
grades as well as broad assessments such as
exam scores). This literature also tends to have
a cautionary tone: It comes with warnings that
the technology alone is no panacea and encour-
ages good pedagogical design to capitalize on
the potential of clickers (e.g., Landrum, 2015).
Designing “good” clicker questions, those
that lead to better student outcomes, is still
largely up to individual instructors. Clicker
question guidelines (e.g., Horowitz, 2006; Mar-
tyn, 2007; Robertson, 2000) often emphasize
clarity in presentation (e.g., have no more than
five answer options, rehearse your presentation)
without much direction on how to promote deep
learning. Some clicker resource guides, such as
those put out by the Science Education Initiative
affiliated with Carl Wieman (Wieman et al.,
2008), note that the best clicker questions are
challenging and designed in such a way that
even wrong answers reveal misconceptions
(e.g., these answers might be designed from
previous student responses). Dangel and Wang
(2008) use a version of Bloom’s Taxonomy of
Educational Objectives (Anderson & Krath-
wohl, 2001) to assert that the best uses of
clicker technology emphasize higher order cog-
nitive skills such as applying, analyzing, evalu-
ating, and creating over memorizing. Although
these guidelines sound intuitively correct, there
is little evidence as to whether these types of
questions lead to better thinking. At the leading
edge of science education, there have been notable
broad efforts to develop more sophisticated
clicker use for large-enrollment science
courses (e.g., physics, biology) and to adjust
the placement of clicker activities in the flow of
class time to promote these deeper levels of
learning (e.g., Beatty, Gerace, Leonard, & Dufresne,
2006; Crossgrove & Curran, 2008;
Crouch & Mazur, 2001).
The approach in this study is smaller in scale.
We attempt to examine the impact of one type
of “deep learning” question: the application of
an abstract concept across diverse contexts. Al-
though ideally, courses emphasize many differ-
ent levels of Bloom’s Taxonomy, there is not
yet evidence for the notion that practicing a
particular type of thinking with clicker ques-
tions leads to particular forms of deeper learn-
ing. Overall grades, general test performance,
and student opinions are also not direct methods
of assessing such learning. If instructors are
being advised to develop challenging clicker
questions that promote “deep thinking,” we
need evidence that systematically asking high-
er-order thinking questions produces such
thinking. We turn to the literature in cognitive
science on analogical reasoning for principles
on how to develop questions that would foster
effective application and transfer of abstract
concepts.
How to Foster Analogical Reasoning
The fundamental cognitive skill in applica-
tion of an abstract concept is analogical reason-
ing, the transfer of relational information from a
domain that already exists in memory (often
194 SON AND RIVAS
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
called the source) to critical thinking in a new
domain (often called the target). To transfer
abstract knowledge to new situations, students
must accomplish two forms of mental disci-
pline: (a) focus in on the relevant structural
information that is in common between the
source and target, and (b) avoid distraction by
nonessential concrete details that differ between
the source and target.
Analogical reasoning studies in lab settings
have revealed some key findings about transfer
that provide hints toward better clicker question
design. Even though far transfer is generally
difficult to produce (e.g., Gick & Holyoak,
1980, 1983), exposure to multiple source exem-
plars (e.g., three or more) is more likely to
foster transfer to novel targets than just one
(Gick & Holyoak, 1980, 1983; Catrambone &
Holyoak, 1989). Explicit explanations of the
abstract structure and specialized language also
help but these are only effective when intro-
duced in conjunction with multiple training in-
stances (Gick & Holyoak, 1980, 1983; Son,
Doumas, & Goldstone, 2010). Deep under-
standing of these instances at the causal level is
critical for far transfer; knowledge of superficial
features or surface traits is insufficient for trans-
fer (Brown & Kane, 1988). Comparison of mul-
tiple examples leads to greater understanding at
a causal or relational level; this then leads to
more accurate generalization (Catrambone &
Holyoak, 1989; Gentner, 1983; Gentner, Loew-
enstein, & Thompson, 2003).
This consensus about the importance of dis-
cussing and comparing multiple examples for
far transfer is based on lab studies with very
short time scales (e.g., typically less than 1-hr
sessions that encompass learning and test). But
most educational experiences occur on very dif-
ferent time scales (e.g., over weeks and
months). Only a few studies have attempted to
investigate the effects of presenting multiple
training instances over educationally relevant
learning and assessment intervals (Gluckman,
Vlach, & Sandhofer, 2014; Vlach & Sandhofer,
2012). In these studies, elementary students
demonstrated more sophisticated analogical
transfer of science concepts when multiple ex-
amples were spaced out in time (e.g., one per
week) rather than presented in immediate suc-
cession (e.g., four analogs in one session). Al-
though many studies demonstrate that spacing
learning opportunities over longer time
scales is important for memorization (see Car-
penter et al., 2012, for a recent review), few
studies have examined spacing to promote far
transfer across contexts.
From the analogical transfer literature we
have learned that students need well-structured
exposure to multiple analogs and those multiple
analogs should be spaced out. A series of lec-
tures in a course can easily space out multiple
isomorphic examples with extensive compari-
son and explicit statements of the abstract sim-
ilarities all without the aid of PRS. Would in-
corporating these analogical learning strategies
in the form of clicker questions have advantages
over doing so without clicker questions?
Testing Effect for Far Transfer
We predict an advantage in learning from
multiple analogs with clickers, not because of
the technology itself, but because the technol-
ogy supports the well-established “testing ef-
fect.” The “testing effect” (e.g., Karpicke &
Blunt, 2011; McDaniel, Roediger, & McDer-
mott, 2007) refers to the enhanced performance
on a final test when one has taken an initial test
on studied material relative to studying alone.
Even when the test is not identical to the for-
mative study questions, testing as a method of
study produces better test performance (Rohrer,
Taylor, & Sholar, 2010; McDaniel, Thomas,
Agarwal, McDermott, & Roediger, 2013). Fur-
thermore, the testing effect can be combined
with the spacing effect: when test-as-study ses-
sions are spaced out, there is a greater effect
than test-as-study all at once or spaced out study
sessions that do not include testing (Cull, 2000).
Our interest here is whether formative spaced
testing, implemented through clickers, could be
used to enhance analogical transfer across
knowledge domains. Our hypothesis is this: In
order to foster transfer, we should test for trans-
fer regularly. We explored this hypothesis by
presenting students with an abstract idea com-
monly taught to psychology students (that cor-
relation does not equal causation) embedded in
multiple instantiations (a variety of empirical
studies about developmental psychology) over a
10-week course. These examples were pre-
sented along with explicit explanations of the
abstract principles and opportunities to com-
pare/align these exemplars. However, one
group of students was additionally presented
with clicker questions about these correlational/
experimental situations (we called this the Test-
ing group). The other group, called the Notes
group, viewed a notes slide (instead of a ques-
tion slide) that covered roughly equivalent con-
tent. Far transfer was measured with novel
questions about correlation/causation on the
midterm and final.
Method
Participants
Participants were 209 undergraduate students
(155 female) enrolled in two sections of a gen-
eral education life span development class at an
urban public regional institution. There were
129 students in the Testing section and 80 stu-
dents in the Notes section.
At our university, this course satisfies a gen-
eral education requirement for nonmajors so the
vast majority of students were not psychology
majors (86%). The ethnic breakdown of the
participants was as follows: 58% Latino/
Chicano, 16% Asian/Asian American, 9%
White/European American descent, 8% Black,
and 9% Other. The average age of the partici-
pants was 19.78 years (SD = 1.93, range = 18–33).
Preliminary chi-square and t tests between the two
groups indicated that there were no significant
differences in these self-reported demographic
characteristics such as gender, ethnicity, major
(psychology vs. other), or age, ps > .79.
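As a rough illustration of this preliminary equivalence check on age (not the authors' actual analysis script), the following Python sketch runs an independent-samples t test on two simulated sets of ages; the values and variable names are placeholders, not the study data.

import numpy as np
from scipy.stats import ttest_ind

# Simulated placeholder ages for the two sections (means/SDs loosely based on
# the reported sample: M = 19.78, SD = 1.93); not the actual study data.
rng = np.random.default_rng(42)
testing_age = rng.normal(19.8, 1.9, size=129)
notes_age = rng.normal(19.8, 1.9, size=80)

t_stat, p_val = ttest_ind(testing_age, notes_age)
print(f"t = {t_stat:.2f}, p = {p_val:.2f}")  # a large p suggests comparable ages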
Design
The design of this field study was quasi-
experimental because classrooms were assigned
to conditions rather than individual students.
The two sections of the same 10-week course
were taught by the same instructor (the first
author) with the same syllabus, lecture slides,
readings, assignments, and tests. The Testing
section was taught in the Winter quarter and the
Notes section was taught in the Spring quarter
of the same academic year. Although the names
of these sections emphasize the difference be-
tween them, both Testing and Notes sections
had largely similar clicker and lecture experi-
ences. Both courses utilized the same PRS-
system (iClicker) and included clicker ques-
tions throughout the quarter. The only
difference between the two groups was the de-
sign of 24 critical slides (out of a total of 483
slides presented throughout the quarter). In the
Testing group, the critical slides were 24 additional
clicker questions designed as transfer questions
regarding the concept of correlation and causation.
The Notes group, in lieu of clicker questions,
had 24 lecture slides with corresponding con-
tent.
Materials and Procedure
The teaching and testing materials for this
study focused on inferring correlation or causa-
tion from research design. We will first outline
the common teaching that both sections re-
ceived about correlational studies and experi-
ments during a unit on “research methodology”
presented during the second week of the quar-
ter. Then we will describe the experimental
manipulation that differed between the two
groups. Finally, we will outline the various
measures of learning and transfer implemented
in this study.
Teaching Materials
During the second week of class, both the
Testing and Notes groups were introduced to
the idea that only an experimental study can
lead to conclusions about causation and that
correlational studies cannot. The pedagogical
features of this lesson included comparison/
contrast of correlational and experimental ver-
sions of studies that examined the same research
question, exploration of this concept in multiple
contexts, and the use of clickers to promote
individual thinking as well as communication
with peers.
The lecture outlined the features of correla-
tional and experimental studies and how these
features afforded different conclusions. There
were two sets of examples that contrasted a
correlational and experimental version of a sim-
ilar research question. The first context was
about the documentation of a longitudinal de-
cline in testosterone after becoming a father
(Gettler et al., 2015). The lecture included de-
tails about the correlational study as well as an
exercise in developing an experiment (noting
that such an experiment would be practically
impossible and unethical). The next context pre-
sented correlational and experimental studies
linking preschool to academic and economic
success more than 15 years later (Heckman et
al., 2010;Schweinhart, 1993). Students in both
sections engaged with clicker questions on cau-
sation and correlation in these two contexts.
After students clicked in their intuitions about
whether the data from these studies could show
causation, they had an opportunity to share with
a neighboring student before clicking in again.
These four initial clicker questions (two from
each context), which we will refer to as baseline
clicker questions, offer insight into what stu-
dents’ initial impressions about correlation and
causation were and their intuitions after the
exercises, discussion, and lecture. Even though
the lesson plan and the materials were the same
for both sections, there may have been subtle
differences between the two lectures that may
have affected student learning. These baseline
clicker questions could provide a measure of
any differences in initial learning. Two of these
questions appeared on the midterm as a measure
of long-term retention. We will refer to stu-
dents’ performance on these two questions at
the time of the midterm exam as the baseline
test.
Each participant purchased their own clicker
of the brand iClicker. The presentation slides
used in each class lecture were projected to the
front of the classroom. There were a total of 24
critical slides (out of a total of 483) that differed
between the two groups (13 before the midterm
and 11 after the midterm). These critical slides
were spaced throughout the course at a rate of
about one to three per week. Importantly, al-
though there were 24 additional clicker ques-
tions about correlation and causation presented
to students in the Testing lectures, both classes
made extensive use of clickers throughout the
course. There were 112 clicker questions that
the two courses had in common.
The critical slides were presented usually af-
ter the instructor had described some correla-
tional study or experiment. In the Testing con-
dition, the critical slide displayed a multiple-
choice question (i.e., the right answer was
presented alongside wrong answers). After stu-
dents provided their responses, the class’ distri-
bution of responses was shown and correct/
incorrect answers were briefly discussed. In the
Notes condition, the content (e.g., information
from the question and the right answer) was
phrased as notes. To compensate for the added
discussion that may have followed a clicker
question, the slides in the Notes condition in-
cluded the basic feedback that might follow a
clicker question alongside the correct answer
(see Figure 1). The notes for the lecture slides
were available to all students before class—
however, these critical slides were not provided
in advance to either condition.
Clicker questions were presented during the
normal flow of lecture time. Students were al-
lowed to use any and all notes that they had with
them in order to answer the question. Typically,
students had 1 min after the question was read
aloud to click in their answers. Even though
students had access to all of their notes, they
could not look up the answers to the critical
clicker questions. Because critical questions
were always about a novel study that they had
just been introduced to, they would not have the
answer to these questions in their notes.

Figure 1. The slide on the left was an example of a clicker question from the Testing
lectures. The slide on the right shows the equivalent content (and following feedback) shown
in Notes lectures.
Assessment
To examine any differences in learning from
the initial introductory lecture on correlation/
causation, we will examine performance on the
baseline clicker questions (these were part of
the initial lecture on research design presented
during Week 2) and the baseline test questions
(the baseline clicker questions that appeared on
the midterm test as a measure of long-term
retention).
Learning about correlation/causation was
measured on two occasions: the midterm (Week
6, 4 weeks after initial instruction) and final
exam (Week 11, 9 weeks after initial instruc-
tion). There were two types of questions on
each test that assessed learning the distinction
between correlation and causation: retention
and transfer. Retention questions were test
questions that the testing group previously ex-
perienced as clicker questions. Note that these
are only “retention” questions for the Testing
group and they are expected to demonstrate
enhanced performance on these questions rela-
tive to the Notes group. Transfer questions
never appeared as clicker questions to either
group. These questions queried students about
correlation/causation in contexts that were not
discussed during class.
The midterm test included two retention and
two transfer questions. The midterm contained
58 other multiple choice questions on subjects
covered up to Week 5 in the course. The final exam
included three memory test questions and two
transfer test questions. The final exam was cu-
mulative and contained 95 other multiple choice
questions on topics covered during the entire
10-week term.
Results
Were There Preexisting Differences
Between the Testing and Notes Groups?
As in any quasi-experimental study, a con-
cern is the equivalence of the Testing and Notes
groups. There are many factors that make two
instantiations of a lecture subtly different so it is
appropriate for us to check whether the effect of
the dedicated lecture on correlation/causation
was dissimilar between the two groups. The
introduction to correlation and causation at the
beginning of the course included clicker ques-
tions that were posed to both the Testing and the
Notes group. Equivalence in initial learning
would temper the notion that the variability seen
in later tests would be explained by differences
in initial teaching/learning. Table 1 shows the
frequencies of correct and incorrect responses
on the clicker questions presented to both
groups during the research methods lecture. A
question about a correlational study (testoster-
one and fatherhood) was posed before the lec-
ture on correlation/causation (Testosterone
Question 1) and after the explanation (Testos-
terone Question 2).

Table 1
Results of the Correct Responses From the Baseline Clicker and Baseline Test Questions

                                               Testing group          Notes group
                                          n    frequency  percent    frequency  percent     N      χ²     p
1. Testosterone question                 123       5        .04          8        .11      198    3.31   .07
2. Testosterone question 2               123      69        .56         40        .54      197     .08   .79
3. Correlational preschool question      122      32        .26         17        .23      197     .32   .78
4. Experimental preschool question       123      63        .51         48        .66      196    3.94   .05
A. Testosterone question (midterm)       129      56        .43         41        .51      209     .01   .91
B. Experimental preschool question
   (midterm)                             129     104        .81         65        .81      209    1.22   .27

Note. The total number of responses for these individual clicker questions may not be the same as the total number of
students present during that lecture (123 for the Testing group and 80 for the Notes group) for a variety of reasons (e.g.,
students may have stepped out of the room, been distracted, or may have inadvertently believed they already clicked in).
Questions 1–4 are the baseline clicker questions (presented to both classes during initial lecture on correlation/causation)
and Questions A–B are the baseline test questions.

Note that students in both
groups were overwhelmingly incorrect on the
first testosterone question, but the initial differences
between groups were marginally significant
such that the Testing group had slightly
worse prelecture intuitions than the Notes
group. After some teaching (lecture explana-
tion, compare/contrast, peer discussion), the
percentages of correct responses were higher on
testosterone question 2 (but not at ceiling) and
the two groups were not significantly different
from one another. When another context was
introduced (about the role of preschool in later
achievement), the majority of students were in-
correct on the following correlational preschool
question and the two groups were not signifi-
cantly different from one another. After more
instruction (explanation, peer discussion), the
two groups both improved on the follow-up
question, the experimental preschool question.
On this last question of the unit, students in the
Notes group were more likely to be correct than
those in the Testing group. Any preexisting differences
before the experimental treatment were thus
in favor of the Notes group rather than the
Testing group.
Performance on the baseline test questions
(these were presented to both groups as clicker
questions) did not differ on the midterm
test. At the bottom of Table 1, we see that the
Testing and Notes groups were not significantly
different from one another in proportion correct
on these questions (testosterone question and
experimental preschool question) when they ap-
peared on the midterm.
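For readers who want to see how each between-group comparison in Table 1 could be reproduced, here is a minimal Python sketch of a chi-square test of independence for Testosterone Question 2, using the counts reported in Table 1. The Notes group's n is inferred as the total N minus the Testing n, and whether a continuity correction was applied in the original analysis is an assumption.

from scipy.stats import chi2_contingency

# Counts of correct/incorrect responses per section for Testosterone Question 2
# (Table 1): Testing, 69 of 123 correct; Notes, 40 of 74 correct (74 = 197 - 123).
counts = [[69, 123 - 69],
          [40, 74 - 40]]

chi2, p, dof, expected = chi2_contingency(counts, correction=False)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.2f}")  # approximately matches Table 1 (.08, .79)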
Midterm/Final Exam Results
The midterm and final exam questions reveal
the influence of transfer testing over a more
educationally relevant time course. Table 2
shows the means and standard deviations of the
two groups on the critical midterm and final
examination questions.
We conducted a 2 × 2, Section (Testing,
Notes) × Question Type (retention, transfer),
mixed repeated measures ANOVA on midterm
performance. This analysis revealed a significant
main effect of section, F(1, 207) = 7.49,
MSE = .23, p < .01, partial η² = .04. The
Testing group scored significantly better on
these items than the Notes group. There was
also an effect of question type, F(1, 207) =
27.57, MSE = .13, p < .001, partial η² = .12.
Because different superficial contexts support
thinking about correlation/causation differently,
the transfer items in this case were easier to
answer than the retention items.¹ Furthermore,
there was a significant interaction between section
and question type, F(1, 207) = 6.15, p =
.01, partial η² = .03. The size of the benefit
from the clicker experience was greater for the
retention questions than for the transfer questions.
This is expected given that the Testing
group had previously seen the retention questions
as clicker questions.
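As a sketch of how this kind of 2 × 2 mixed ANOVA could be run in Python, the snippet below uses the pingouin library on a long-format table with one row per student per question type. The file name and column names are hypothetical, and pingouin is just one of several packages that handle a between-subjects factor crossed with a within-subjects factor.

import pandas as pd
import pingouin as pg

# Hypothetical long-format data: columns 'student' (id), 'section' (Testing/Notes),
# 'question_type' (retention/transfer), and 'score' (proportion correct).
df = pd.read_csv("midterm_scores_long.csv")  # placeholder file name

aov = pg.mixed_anova(data=df, dv="score", within="question_type",
                     subject="student", between="section")
print(aov[["Source", "DF1", "DF2", "F", "p-unc", "np2"]])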
We conducted the same 2 × 2 mixed repeated
measures ANOVA on final examination
performance. Once again, there was a significant
main effect of section, F(1, 207) = 10.7,
MSE = .16, p = .001, partial η² = .05. The
Testing group scored significantly better on
these items than the Notes group. There was
also an effect of question type, F(1, 207) =
4.79, MSE = .07, p = .03, partial η² = .02. This
time, participants, on average, scored higher on
retention items compared to the transfer items.
There was no significant interaction between
section and question type, F(1, 207) = .003.
The size of the benefit from the clicker experience
was similar for both retention and transfer
questions.
¹ That is, students may find reasoning about experimental
design (e.g., pick out the appropriate experimental design)
easier than reasoning about correlational design (e.g., what
is the problem with correlational design; see Appendix for
complete questions). On the midterm transfer questions, we
asked students to reason about experimental design but on
the final transfer, we tested reasoning about correlational
design. These differences in the content of the transfer
questions may have contributed to the difference in perfor-
mance on midterm and the final exam.
Table 2
Assessment Results for the Testing and Notes Groups

                          Testing group (N = 129)    Notes group (N = 80)
                             M        SD                M        SD           d
Midterm
  Retention questions       .53      .50               .31      .47          .46
  Transfer questions        .64      .35               .59      .36          .12
Final
  Retention questions       .52      .35               .39      .29          .4
  Transfer questions        .46      .36               .33      .31          .4
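The d values in Table 2 can be recomputed from per-student scores; the article does not state which formula was used, so the sketch below assumes the common pooled-standard-deviation version of Cohen's d and uses simulated placeholder scores rather than the study data.

import numpy as np

def cohens_d(x, y):
    # Cohen's d for two independent groups using the pooled sample SD.
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Placeholder scores: each student answered two transfer items, so proportion
# correct falls on 0, .5, or 1 (simulated, not the actual study data).
rng = np.random.default_rng(0)
testing_scores = rng.binomial(2, 0.46, size=129) / 2
notes_scores = rng.binomial(2, 0.33, size=80) / 2
print(round(cohens_d(testing_scores, notes_scores), 2))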
To confirm that these effects were limited to
relevant questions, we also examined whether
there were any statistical differences on nontarget
questions: midterm and final exam items
that were unrelated to correlation and causation.
A 2 × 2, Section × Test (midterm, final),
repeated measures ANOVA on the nontarget
questions revealed no significant effects or interactions,
ps > .3.
Discussion
During the initial lecture about correlation/
causation, students in the Notes group were as
likely as, or more likely than, the Testing group to
answer the clicker questions correctly. At the
midterm, we see that the Testing group signif-
icantly outperformed the Notes group but
mainly on questions that they had practiced with
clickers. Only at the final exam, after approximately
9 weeks of clicker questions about correlation/causation,
did we detect a significant effect
on transfer: the Testing group outperformed the
Notes group on novel questions. These transfer
results echo previous research in memory-based
tasks (Roediger & Karpicke, 2006) showing that testing
is particularly important over longer time
scales.
The current study attempted to conduct rig-
orous research on issues related to basic cogni-
tion in an ecologically valid setting (a college
course) with authentic learning materials (cor-
relation/causation/research methodology). The
results connect the well-documented testing ef-
fect with analogical transfer while adding to the
demonstrations of the testing effect in real class-
room settings (e.g., Mayer et al., 2009; McDaniel,
Agarwal, Huelser, McDermott, & Roediger,
2011; McDaniel et al., 2013; Roediger, Agarwal,
McDaniel, & McDermott, 2011).
Much of the research on the testing effect
documents improvements in performance ex-
clusively on tests of memory by practicing on
tests of memory. Our study adds to the fledgling
literature demonstrating that memory testing
improves even abstract and skill-based learning
(Butler, 2010; Johnson & Mayer, 2009; McDaniel
et al., 2013). These studies have gone
beyond merely rephrasing retention test
questions and have asked whether
memory testing can confer benefits to conceptual
applications. However, the present results
also go beyond practicing tests of memory; stu-
dents experienced enhancements in novel trans-
fer questions where they did not practice re-
trieval. The clicker questions presented in class
were all applications of similar concepts in
highly dissimilar situations. Presumably, stu-
dents had to recall some of their knowledge
about correlation and experimental design but
these clicker questions did not explicitly ask
them to do so. Instead, each clicker question
asked students to practice transfer, identifying
the correct conclusions that one could draw in a
novel study context. Students could not “re-
trieve” the correct conclusion because it had
never been presented to them before. In this
case, “application practice” may have provided
students with the process required to arrive at a
valid conclusion.
Furthermore, because clicker questions occur
during the flow of lecture, students had full
access to any notes they wrote during or brought
to class. Students may or may not have refer-
enced previous notes about correlation/experi-
mentation. Even so, Agarwal, Karpicke, Kang,
Roediger, and McDermott (2008) have found
that even practice testing in an open-book set-
ting confers similar benefits as does closed-
book testing.
Critical Features of Testing Experience
In addition to any retrieval or application
demands made by the presentation of clicker
questions, there may be other features of the
testing experience that led to the effects on
memory and transfer. Studies that take place
over relevant time scales often have little con-
trol over what students choose to do during the
interval between the experimental interventions
and the ultimate measures of performance. The
students in the testing group may have used the
clicker questions to reflect on what they need to
study (e.g., they might realize they need to
review the material on correlation/causation
further). Prior research has found that students
use self-testing to monitor their own learning
(Kornell & Son, 2009) and report changing their
study habits in response to clickers (Dawson,
Meadows, & Haffie, 2010). So even beyond
direct benefits of testing itself, the clicker ques-
tions may have prompted metacognitive consid-
erations that may have independent benefits.
All of the clicker questions in this study were
multiple-choice so students in the testing group
inevitably practiced thinking about wrong an-
swers while considering correct ones. This may
provide students with a network of possible misconceptions.
Learners who practice navigating through both
diagnostic and nondiagnostic information may
have more robust encoding (Nairne, 2006) than
students who are simply told about this distinc-
tion. They also had time to consider the distri-
bution of answers given by the class. Students
may have taken that opportunity to consider the
allure of popular wrong answers.
Timely feedback may also be an important
part of the testing experience (e.g., Pashler,
Rohrer, Cepeda, & Carpenter, 2007). Even
when a student has failed to learn a concept,
feedback may provide a valuable relearning op-
portunity that may ultimately lead to better per-
formance on a final test. This is especially true when
trying to teach against particularly strong misconceptions
or fallacies: even when students correct
these misconceptions, the errors are likely to
be reproduced in delayed testing (Butler, Fazio,
& Marsh, 2011). Note that in both groups stu-
dents were initially close to unanimous in their
assertion that fatherhood is one of the causes of
change in testosterone from a correlational
study (only 13 out of 185 students initially
thought that a correlational result was not evi-
dence for a causal relationship). Even though
students were instructed on this particular sce-
nario (they also received direct instruction on
correlation/causation in this context and had an
opportunity to peer share), only 55% were con-
vinced at the end of the lecture and only 46%
eventually answered this testosterone question
correctly on the midterm exam. It may be more
difficult to teach the logic of correlation/
causation in some contexts because these con-
clusions are always subject to prior beliefs and
other misconceptions about that specific context
(e.g., class discussions seemed to indicate that
students feel that biological measures are
more subject to causal interpretations than
psychological measures). Students may also have
more easily learned about the affordances of
experimental design than the limitations of cor-
relational designs because they seem inclined to
rampantly assume causation. That inclination is
not challenged when thinking about an experi-
ment but must be reined in when thinking about
a correlational study.
Pedagogical Implications
Examining the testing effect live in a class-
room also contributes to practitioner knowledge
about how to design PRS questions that pro-
mote deeper thinking. What we find is fairly
straightforward: Application questions can im-
prove application skills but such skill building
takes time. Designing questions and introducing
comparison cases helps students practice their
thinking. Memorizing or rehearsing a phrase
such as “correlation does not equal causation” is
not equivalent to applying the idea accurately in
a new situation. Students need practice actually
applying these concepts and that is better done
in the context of trying to answer a question
than listening to a lecture.
This current study also points to a need for
more fine-tuned instruments that specifically as-
sess improvements in particular types of think-
ing. Instead of predicting that clicker questions
simply improve grades overall, we measured
specific types of thinking that would improve
(retention and transfer). We found that this distinction
was important because improvements in
transfer emerged more slowly than improvements
in memory retention.
Limitations and Future Directions
Our study was designed as a test of the ef-
fectiveness of analogical reasoning questions
against direct instruction of that content. By
conducting this study in an authentic classroom
environment, we lost some of the control of
tighter laboratory studies but gained a look at
“deep learning” mixed in with all the other
demands in a typical lecture style class. Also, it
is important to keep in mind that the transfer
effects seen in this study were modest. This is
unsurprising given that the treatment was fairly
modest as well (only 5% of the slides were
different between the two groups). But these
effect sizes may depend largely on the concepts
being taught and other aspects of pedagogy.
This study leaves a lot of ground uncovered
in the quest to design better questions for teach-
ing and learning. We need other in vivo studies
that contrast different types of questions and
interactions with questions. For instance, are
questions that focus on checking learning more
effective than questions that prompt interest?
Do problem-solving questions have a different
effect than memory-based questions? What
kinds of foils are most effective for fostering
critical thinking about content? Also, new technologies
(e.g., Poll Everywhere) allow instructors
to go beyond the multiple-choice format.
What do students learn from looking at word
clouds from free responses compared to the
typical feedback from multiple-choice ques-
tions? Many of these possibilities can be tested
in large lecture contexts that use clickers al-
ready without much disruption to the rest of the
course goals. The pedagogical value of partic-
ular questions can also be investigated in other
modalities (e.g., paper-pencil tests; quizzes on a
course management platform).
But in addition to these studies that investi-
gate the design of questions posed to students,
we also need data to develop larger scale frame-
works for how questions might fit in with other
active learning methods for fostering transfer.
What is the role of questions in a dynamic
classroom? Combining active learning methods
may be far more effective in teaching applica-
tion of abstract concepts than any single method
alone. The field also needs direction in under-
standing how an instructor should react to stu-
dent responses. Is the type of teaching needed to
address a misconception shared by the entire
class different than the pedagogical techniques
that would help a class that has a more mixed
profile (e.g., half the students understand and
half do not)?
There are many types of “deep thinking” and
some of these conceptions might be specific to
a discipline. For instance, in chemistry and
other sciences, educators may value students’
ability to map between different types of repre-
sentations (e.g., molecular interactions, equa-
tions, macroscale phenomena). Other disci-
plines might value specific kinds of critical
thinking. For example, “historical thinking”
means that students should question and evalu-
ate sources of evidence (in addition to analyzing
the content). How should we modify questions
to promote these different types of deep think-
ing? How little we know about developing these
varieties of “deep knowing” pushes educators to
modify their technologically enabled class-
rooms in service of higher order thinking rather
than to implement pedagogical interventions in
ways that require the least effort (e.g., using
questions provided by textbook companies or
designing questions that will be easy for stu-
dents to answer correctly).
Conclusion
Teaching for transfer is noble but sometimes
feels intractable. There are so many major con-
cepts that cut across particular situations in ev-
ery discipline but it often feels unfair to test
students on their ability to transfer those con-
cepts because we often fail miserably at teach-
ing for transfer. Further research on teaching
that actually is focused on transfer is gravely
needed; this holds educators to a higher stan-
dard and will lead to teaching practices that
encourage students to greater levels of under-
standing. Transfer is crucial because the finan-
cial and human investment in education is only
justifiable when formal schooling transfers to
novel problems outside of what was presented
in the original learning context. As a scholarly
community of teachers, we should invest our
intellectual resources on developing those skills
that our students will transfer to diverse real
world situations (e.g., APA, 2013). We cannot
shy away from teaching and testing for transfer
because it is difficult. Our work is intended to
provide some hope that even small efforts to
promote transfer can have long-term effects.
References
Agarwal, P. K., Karpicke, J. D., Kang, S. H. K.,
Roediger, H. L., III, & McDermott, K. B. (2008).
Examining the testing effect with open- and
closed-book tests. Applied Cognitive Psychology,
22, 861– 876. http://dx.doi.org/10.1002/acp.1391
American Psychological Association. (2013). APA
guidelines for the undergraduate psychology ma-
jor: Version 2.0. Retrieved from http://www.apa
.org/ed/precollege/undergrad/index.aspx
Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001).
A taxonomy for learning, teaching and assessing:
A revision of Bloom’s Taxonomy of educational
objectives: Complete ed. New York, NY: Long-
man.
Barnett, S. M., & Ceci, S. J. (2002). When and where
do we apply what we learn? A taxonomy for far
transfer. Psychological Bulletin, 128, 612– 637.
http://dx.doi.org/10.1037/0033-2909.128.4.612
Beatty, I. D., Gerace, W. J., Leonard, W. J., &
Dufresne, R. J. (2006). Designing effective ques-
tions for classroom response system teaching.
American Journal of Physics, 74, 31–39. http://dx
.doi.org/10.1119/1.2121753
Brown, A. L., & Kane, M. J. (1988). Preschool
children can learn to transfer: Learning to learn
and learning from example. Cognitive Psychology,
20, 493–523. http://dx.doi.org/10.1016/0010-
0285(88)90014-X
Butler, A. C. (2010). Repeated testing produces su-
perior transfer of learning relative to repeated
studying. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 36, 1118.
Butler, A. C., Fazio, L. K., & Marsh, E. J. (2011).
The hypercorrection effect persists over a week,
but high-confidence errors return. Psychonomic
Bulletin & Review, 18, 1238 –1244. http://dx.doi
.org/10.3758/s13423-011-0173-y
Caldwell, J. E. (2007). Clickers in the large class-
room: Current research and best-practice tips. CBE
Life Sciences Education, 6, 9 –20. http://dx.doi
.org/10.1187/cbe.06-12-0205
Carpenter, S. K., Cepeda, N. J., Rohrer, D., Kang,
S. H., & Pashler, H. (2012). Using spacing to
enhance diverse forms of learning: Review of re-
cent research and implications for instruction. Ed-
ucational Psychology Review, 24, 369 –378. http://
dx.doi.org/10.1007/s10648-012-9205-z
Catrambone, R., & Holyoak, K. J. (1989). Overcom-
ing contextual limitations on problem-solving
transfer. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 15, 1147–
1156. http://dx.doi.org/10.1037/0278-7393.15.6
.1147
Cepeda, N. J., Pashler, H., Vul, E., Wixted, J. T., &
Rohrer, D. (2006). Distributed practice in verbal
recall tasks: A review and quantitative synthesis.
Psychological Bulletin, 132, 354 –380.
Crossgrove, K., & Curran, K. L. (2008). Using click-
ers in nonmajors- and majors-level biology cours-
es: Student opinion, learning, and long-term reten-
tion of course material. CBE Life Sciences
Education, 7, 146 –154. http://dx.doi.org/10.1187/
cbe.07-08-0060
Crouch, C. H., & Mazur, E. (2001). Peer instruction:
Ten years of experience and results. American
Journal of Physics, 69, 970 –977. http://dx.doi.org/
10.1119/1.1374249
Cull, W. L. (2000). Untangling the benefits of mul-
tiple study opportunities and repeated testing for
cued recall. Applied Cognitive Psychology, 14,
215–235. http://dx.doi.org/10.1002/(sici)1099-
0720(200005/06)14:3%3C215::aid-acp640%3E3
.0.co;2-1
Dangel, H. L., & Wang, C. X. (2008). Student re-
sponse systems in higher education: Moving be-
yond linear teaching and surface learning. Journal
of Educational Technology Development and Ex-
change, 1, 93–104.
Dawson, D. L., Meadows, K. N., & Haffie, T. (2010).
The effect of performance feedback on student
help-seeking and learning strategy use: Do clickers
make a difference? The Canadian Journal for the
Scholarship of Teaching and Learning, 1, 6. http://
dx.doi.org/10.5206/cjsotl-rcacea.2010.1.6
Fies, C., & Marshall, J. (2006). Classroom response
systems: A review of the literature. Journal of
Science Education and Technology, 15, 101–109.
Gentner, D. (1983). Structure-mapping: A theoretical
framework for analogy. Cognitive Science, 7, 155–
170. http://dx.doi.org/10.1207/s15516709cog
0702_3
Gentner, D., Loewenstein, J., & Thompson, L.
(2003). Learning and transfer: A general role for
analogical encoding. Journal of Educational Psy-
chology, 95, 393– 408. http://dx.doi.org/10.1037/
0022-0663.95.2.393
Gettler, L. T., McDade, T. W., Agustin, S. S., Feranil,
A. B., & Kuzawa, C. W. (2015). Longitudinal
perspectives on fathers’ residence status, time al-
location, and testosterone in the Philippines. Adap-
tive Human Behavior and Physiology, 1, 124 –149.
http://dx.doi.org/10.1007/s40750-014-0018-9
Gick, M. L., & Holyoak, K. J. (1980). Analogical
problem solving. Cognitive Psychology, 12, 306 –
355. http://dx.doi.org/10.1016/0010-0285(80)
90013-4
Gick, M. L., & Holyoak, K. J. (1983). Schema in-
duction and analogical transfer. Cognitive Psy-
chology, 15, 1–38. http://dx.doi.org/10.1016/0010-
0285(83)90002-6
Gluckman, M., Vlach, H. A., & Sandhofer, C. M.
(2014). Spacing simultaneously promotes multiple
forms of learning in children’s science curriculum.
Applied Cognitive Psychology, 28, 266 –273.
http://dx.doi.org/10.1002/acp.2997
Halvorsen, J. A., Vleugels, R. A., Bjertness, E., &
Lien, L. (2012). A population-based study of acne
and body mass index in adolescents. Archives of
Dermatology, 148, 131–132.
Heckman, J. J., Moon, S. H., Pinto, R., Savelyev,
P. A., & Yavitz, A. (2010). The rate of return to the
High/Scope Perry preschool program. Journal of
Public Economics, 94(1–2), 114–128. http://dx.doi
.org/10.1016/j.jpubeco.2009.11.001
Horowitz, H. M. (2006). ARS revolution: Reflections
and recommendations. In D. A. Banks (Ed.), Au-
dience response systems in higher education: Ap-
plications and cases (pp. 53–63). Hershey, PA:
Information Science. http://dx.doi.org/10.4018/
978-1-59140-947-2.ch004
Johnson, C. I., & Mayer, R. E. (2009). A testing
effect with multimedia learning. Journal of Edu-
cational Psychology, 101, 621.
Karpicke, J. D., & Blunt, J. R. (2011). Retrieval
practice produces more learning than elaborative
studying with concept mapping. Science, 331,
772–775. http://dx.doi.org/10.1126/science.11
99327
Kay, R. H., & LeSage, A. (2009). Examining the
benefits and challenges of using audience response
systems: A review of the literature. Computers &
Education, 53, 819 – 827. http://dx.doi.org/10
.1016/j.compedu.2009.05.001
Keough, S. M. (2012). Clickers in the classroom: A
review and a replication. Journal of Management
Education, 36, 822– 847. http://dx.doi.org/10
.1177/1052562912454808
Kornell, N., & Son, L. K. (2009). Learners’ choices
and beliefs about self-testing. Memory, 17, 493–
501.
Landrum, R. E. (2015). Teacher-ready research re-
view: Clickers. Scholarship of Teaching and
Learning in Psychology, 1, 250 –254. http://dx.doi
.org/10.1037/stl0000031
Martyn, M. (2007). Clickers in the classroom: An
active learning approach. EDUCAUSE Quarterly,
30, 71.
Mayer, R. E., Stull, A., DeLeeuw, K., Almeroth, K.,
Bimber, B., Chun, D., . . . Zhang, H. (2009).
Clickers in college classrooms: Fostering learning
with questioning methods in large lecture classes.
Contemporary Educational Psychology, 34, 51–
57.
McDaniel, M. A., Agarwal, P. K., Huelser, B. J., Mc-
Dermott, K. B., & Roediger, H. L., III. (2011). Test-
enhanced learning in a middle school science class-
room: The effects of quiz frequency and placement.
Journal of Educational Psychology, 103, 399 – 414.
http://dx.doi.org/10.1037/a0021782
McDaniel, M. A., Roediger, H. L., III, & McDermott,
K. B. (2007). Generalizing test-enhanced learning
from the laboratory to the classroom. Psychonomic
Bulletin & Review, 14, 200 –206. http://dx.doi.org/
10.3758/BF03194052
McDaniel, M. A., Thomas, R. C., Agarwal, P. K.,
McDermott, K. B., & Roediger, H. L. (2013).
Quizzing in middle-school science: Successful
transfer performance on classroom exams. Applied
Cognitive Psychology, 27, 360 –372. http://dx.doi
.org/10.1002/acp.2914
Nairne, J. S. (2006). Modeling distinctiveness: Im-
plications for general memory theory. Distinctive-
ness and Memory, 27– 46.
Pashler, H., Rohrer, D., Cepeda, N. J., & Carpenter,
S. K. (2007). Enhancing learning and retarding
forgetting: Choices and consequences. Psycho-
nomic Bulletin & Review, 14, 187–193. http://dx
.doi.org/10.3758/BF03194050
Robertson, L. J. (2000). Twelve tips for using a
computerized interactive audience response sys-
tem. Medical Teacher, 22, 237–239. http://dx.doi
.org/10.1080/01421590050006179
Roediger, H. L., Agarwal, P. K., McDaniel, M. A., &
McDermott, K. B. (2011). Test-enhanced learning
in the classroom: Long-term improvements from
quizzing. Journal of Experimental Psychology:
Applied, 17, 382–395. http://dx.doi.org/10.1037/
a0026252
Roediger, H. L., III, & Karpicke, J. D. (2006). The
power of testing memory: Basic research and im-
plications for educational practice. Perspectives on
Psychological Science, 1, 181–210. http://dx.doi
.org/10.1111/j.1745-6916.2006.00012.x
Rohrer, D., Taylor, K., & Sholar, B. (2010). Tests
enhance the transfer of learning. Journal of Exper-
imental Psychology: Learning, Memory, and Cog-
nition, 36, 233–239. http://dx.doi.org/10.1037/
a0017678
Schweinhart, L. J. (1993). Significant benefits: The
High/Scope Perry preschool study through Age 27.
Ypsilanti, MI: High/Scope Educational Research
Foundation.
Son, J. Y., Doumas, L. A. A., & Goldstone, R. L.
(2010). When do words promote analogical trans-
fer? Journal of Problem Solving, 3, 52–92. http://
dx.doi.org/10.7771/1932-6246.1079
Vlach, H. A., & Sandhofer, C. M. (2012). Fast map-
ping across time: Memory processes support chil-
dren’s retention of learned words. Frontiers in
Psychology, 3, 46.
Wieman, C., Perkins, K., Gilbert, S., Benay, F., Ken-
nedy, S., Semsar, K., & Simon, B. (2008). Clicker
resource guide: An instructors guide to the effec-
tive use of personal response systems (clickers) in
teaching. Vancouver, Canada: University of Brit-
ish Columbia.
Appendix
Questions That Appeared on the Midterm and Final Exam
(Testosterone question)
During lecture: From this study's results (testosterone levels drop after fatherhood), can we conclude that fatherhood causes testosterone levels to drop more steeply?
A. No, we cannot conclude this.
B. Yes, we can conclude this.
C. We can conclude that fatherhood influences testosterone levels but there are probably other factors involved as well.
D. I'm not sure . . .
On midterm: From the testosterone study's results (testosterone levels drop after fatherhood), can we conclude that fatherhood causes testosterone levels to drop more steeply?
A) We can conclude that fatherhood influences testosterone levels but there are probably other factors involved as well.
B) Yes, we can conclude this because the study's research design was scientific.
C) Yes, we can conclude this because they tested groups of men at two different times; it was a longitudinal study.
D) No, we cannot conclude this because this was not an experiment.

(Correlational preschool question)
During lecture: Does this result [children who go to preschool usually have better educational and economic outcomes] show that preschool causes better educational and economic outcomes?
A. No, this is a correlational study.
B. No, this is a correlational experiment.
C. Yes, this shows the results of an experiment.
D. Yes, this result avoids the third variable and directionality problem.
E. This shows that preschool partially leads to better educational and economic outcomes but there may be other factors involved.

(Experimental preschool question)
During lecture: Does the Perry Preschool result show that preschool causes better educational and economic outcomes?
A. No, this is a correlational study.
B. No, this is a correlational experiment.
C. Yes, this shows the results of an experiment and avoids the third variable and directionality problem.
D. Yes, this shows that preschool made the treatment children smarter.
E. Yes, this shows that preschool gives children soft skills.
On midterm: The Perry Preschool Project showed which of the following:
A) Preschool plays a causal role in long-term social and cognitive benefits.
B) Preschool is a cheap way of educating all Americans.
C) Most parents who send their children to preschool are educated and understand the long-term benefits of attending preschool.
D) Preschool is only beneficial for children who do not come from abused or neglected backgrounds.

During lecture: Women who smoke during pregnancy are also more likely to drink alcohol. Does this mean that smoking during pregnancy causes these women to drink alcohol?
A. Yes, smoking is a social drug similar to alcohol so when they smoke, they also might be offered alcohol.
B. Yes, this result came from a correlational study and all correlational studies can establish causation.
C. No, this was not the result of an experiment. There might be a third variable that causes both smoking and drinking.
D. No, this was not the result of an experiment. We do not know whether drinking causes smoking or smoking causes drinking.
E. C and D are both correct.
On midterm: Women who smoke during pregnancy are also more likely to drink alcohol. Does this mean that smoking during pregnancy causes these women to drink alcohol?
A) Yes, smoking is a social drug similar to alcohol so when they smoke, they also might be offered alcohol.
B) Yes, this result came from a correlational study and all correlational studies can establish causation.
C) No, this was not the result of an experiment. There might be a third variable that causes both smoking and drinking.
D) No, this was not the result of an experiment. We do not know whether drinking causes smoking or smoking causes drinking.
E) C and D are both correct.

During lecture: Which of the following demonstrates that parental sensitivity causes secure attachment in children?
A. Parents who respond to their children's cries quickly are more likely than other parents to have children with secure attachments.
B. Children who are temperamentally easy are more likely to have secure attachments to their caregivers.
C. Parents who are taught to be responsive to their irritable children are more likely than parents who are not taught this to have children who are securely attached.
D. Adults who were securely attached to their parents during their childhood are more likely than other adults to have children with secure attachments.
On midterm: Which of the following demonstrates that parental sensitivity causes secure attachment in children?
A) Adults who were securely attached to their parents during their childhood are more likely than other adults to have children with secure attachments.
B) Parents who are taught to be responsive to their irritable children are more likely than parents who are not taught this to have children who are securely attached.
C) Children who are temperamentally easy are more likely to have secure attachments to their caregivers.
D) Parents who respond to their children's cries quickly are more likely than other parents to have children with secure attachments.

On midterm (transfer question, no lecture counterpart): Which research question must be tested through a correlational study?
A) whether parents who frequently hit their children have more aggressive children
B) whether reinforcement promotes superior learning
C) whether children with ADHD respond best to a certain medication
D) whether varying the temperature of a room affects personality

On midterm (transfer question, no lecture counterpart): If someone wanted to know whether massages would CAUSE low birth weight (LBW) infants to gain weight, which experimental design would be best?
A) LBW infants should be randomly assigned to be in the massage condition or the non-massage condition.
B) LBW infants should all be in the control condition. Then they should be massaged.
C) Female LBW infants should be massaged and male LBW infants should not be massaged. Female LBW infants are naturally smaller so they would need to be massaged.
D) LBW infants should be massaged and their weight should be compared to infants of normal birth weight.

During lecture: Which of these shows that phonemic awareness causes reading achievement?
A. Teaching phonemic-awareness skills to 4- and 5-year-olds produces better readers (and spellers) for at least four years after the training.
B. Children who have phonemic awareness tend to be better readers and spellers.
C. Children who have parents that teach them phonemic awareness earlier are better readers.
D. Children who come into school already knowing the alphabet tend to be better readers (up through 7th grade).
On final exam: Which of these shows that phonemic awareness causes increases in reading achievement?
A) Children who know which sounds go with which letters before entering kindergarten show an advantage in reading up through 7th grade.
B) Children who have parents that teach them phonemic awareness earlier are better readers.
C) Teaching phonemic-awareness skills to a randomly selected group of preschoolers produces better reading (and spelling) skills.
D) Preschool children who have phonemic awareness are better readers and spellers later on when they are in 2nd and 3rd grade.
E) All of the above.

During lecture: Did Mischel's studies show that delay of gratification (DoG) skills cause better academic and social outcomes?
A. Yes, because his longitudinal study that followed kids through their life showed that traits they had in preschool came before their future successes.
B. Yes, because his study was an experiment that measured DoG skills in a laboratory setting.
C. No, his longitudinal study was not an experiment so we cannot rule out third variables that may have caused both DoG skills and better outcomes.
D. No, because his study was cross-sectional rather than longitudinal, thus you cannot show causation.
On final exam: Did Walter Mischel's longitudinal "marshmallow" studies show that delay of gratification (DoG) skills cause better academic and social outcomes?
A) Yes, because his longitudinal study that followed kids through their life showed that traits they had in preschool came before their future successes.
B) Yes, because his study was an experiment that measured DoG skills in a laboratory setting.
C) No, his longitudinal study was not an experiment so we cannot rule out third variables that may have caused both DoG skills and better outcomes.
D) No, because his study was cross-sectional rather than longitudinal, thus you cannot show causation.

During lecture: Do these studies show that teens can prevent acne by losing weight? (e.g., Do these studies show that weight gain causes acne?)
A. Yes, these are all published experiments.
B. Yes, these studies show stress hormones affect both acne and weight gain.
C. No, these are correlational studies.
D. No, these are correlational experiments.
E. This shows that weight gain does partially cause acne but is probably not the only determinant.
On final exam: Recent research (Halvorsen, Vleugels, Bjertness, & Lien, 2012) has found that overweight or obese teenagers, particularly young women, were significantly more likely to develop acne than normal-weight adolescents. Does this study show that obesity causes acne?
A) No, these are correlational studies.
B) No, these are correlational experiments.
C) Yes, these are all published experiments.
D) Yes, these studies show stress hormones affect both acne and weight gain.
E) This shows that weight gain does partially cause acne but is probably not the only determinant.

On final exam (transfer question, no lecture counterpart): Which research question must be tested through a correlational study?
A) whether socioculturally based teaching promotes superior learning
B) whether depressed parents have more depressed children
C) whether children with ADHD respond best to a certain type of study environment
D) whether rewarding children with particular foods changes their food preferences

On final exam (transfer question, no lecture counterpart): Why is it difficult to fully understand the causal relationship between attachment security during infancy and later functioning?
A) The studies are usually correlational.
B) Because attachment is highly abstract and there are no methods to measure or evaluate its quality.
C) Both A and B contribute to this difficulty.
D) Neither A nor B contributes to this difficulty.

Note. The retention questions appear with their counterpart that was presented during lecture. The transfer questions do not have a counterpart presented during lecture.
Received January 7, 2016
Revision received June 6, 2016
Accepted June 20, 2016