TEACHER-READY RESEARCH REVIEW
Multiple-Choice Questions: Tips for Optimizing Assessment
In-Seat and Online
Xiaomeng Xu, Sierra Kauer, and Samantha Tupy
Idaho State University
Multiple-choice questions are frequently used in college classrooms as part of student assessment. While multiple-choice assessments (compared to other formats such as constructed response) seem to be the preferred method of testing by instructors and students, their effectiveness in assessing comprehension in college courses is sometimes called into question. Research has shown that there are ways to optimize the construction and use of multiple-choice testing to benefit college classroom instruction and assessment, enhance student learning and performance, and make more efficient use of instructors' time and energy. This teacher-ready research review provides an overview of the research on utilizing multiple-choice questions as well as some tips on using, writing, and administering multiple-choice questions during assessments. We also summarize the benefits and potential issues with using multiple-choice questions, including concerns about cheating, ways to detect and deter cheating, and testing issues and strategies unique to online formats. We hope that this short review will be helpful to instructors as they prepare courses and assessments and further encourage the use of empirical data in pedagogy-related decision-making.
Keywords: multiple-choice, testing, exams, assessment, pedagogy
Multiple-choice questions are commonly used for assessment in college classrooms. Introductory and lower-level courses are especially likely to feature quizzes and exams that contain at least some multiple-choice items, as class sizes tend to be larger. Efficiency in grading tests is also one reason multiple-choice is utilized in standardized testing for placement into college and graduate school. Research indicates that instructors may have a preference for multiple-choice exams not only because they are simpler to administer and grade, but also because they allow for objective and consistent grading (Simkin & Kuechler, 2005; Zeidner, 1987). Students also tend to prefer multiple-choice questions due to reduced instructor bias and because they consider multiple-choice exams to be easier, since they can choose answers through the process of elimination (Simkin & Kuechler, 2005; Struyven, Dochy, & Janssens, 2005; Tozoglu, Tozoglu, Gurses, & Dogar, 2004; Zeidner, 1987). This preference by instructors and students has led to heavy use of multiple-choice testing in college. Despite this preference, multiple-choice assessments are sometimes called into question regarding their ability to assess comprehension (Ozuru, Briner, Kurby, & McNamara, 2013).
In this teacher-ready research review, we summarize and discuss the literature on multiple-choice testing (focusing on assessment, feedback, and efficiency), highlight valid concerns, and provide tips for attenuating or bypassing potential issues (see Table 1 for some highlights). These tips are by no means exhaustive, and other reviews of multiple-choice testing and strategies exist, perhaps most notably Haladyna, Downing, and Rodriguez's (2002) taxonomy of 31 multiple-choice item-writing guidelines. This review outlines some of their key points but also provides further information based on more recent empirical literature. In addition, this review also addresses multiple-choice testing and optimization specifically in the context of online assessments (e.g., in online or hybrid courses).

Xiaomeng Xu, Sierra Kauer, and Samantha Tupy, Department of Psychology, Idaho State University. Correspondence concerning this article should be addressed to Xiaomeng Xu, Department of Psychology, Idaho State University, 921 S 8th Avenue, Stop 8112, Pocatello, ID 83209. E-mail: xuxiao@isu.edu

Scholarship of Teaching and Learning in Psychology, 2016, Vol. 2, No. 2, 147–158. © 2016 American Psychological Association. http://dx.doi.org/10.1037/stl0000062
Assessment
Effectiveness and Assessment Quality
Constructed-response exams consist of questions where the test-taker needs to produce an answer rather than selecting from provided options (as in multiple-choice exams). These constructed responses may be short, as with fill-in-the-blank items, or longer, as with short answers and essays. Compared to constructed-response exams, multiple-choice exams can be effective, yielding similar test scores (e.g., Hickson, Reed, & Sander, 2012) or even higher test scores (e.g., Park, 2010). Multiple-choice testing has been shown to enhance retention of the material that is tested (a testing effect) and to boost performance on later tests (see Marsh, Roediger, Bjork, & Bjork, 2007, for a review). Students also report that multiple-choice questions bolster confidence and self-esteem and are useful for learning basic concepts (Douglas, Wilson, & Ennis, 2012). It is important to note, however, that one reason students may typically prefer multiple-choice questions is that they perform better on them than on short-answer questions, which may provide misinformation to instructors about true student learning (e.g., Funk & Dickson, 2011).
Table 1
Optimizing Multiple-Choice Question Testing

Assessment quality
  - Utilize questions designed for higher-order cognitive assessment.
  - Discourage students from guessing and/or utilize methods of scoring that penalize guessing.
  - Improve quality of exams by conducting item analyses.
  - Utilize collaborative testing when appropriate.

Fairness
  - Use questions that are clearly written.
  - Write questions that cover a broad range of topics.
  - Use questions that are consistent with the syllabus.
  - Inform students about the knowledge to be assessed.

Feedback
  - Utilize elaborative feedback.
  - Base the timing (i.e., immediate or delayed) of feedback on the difficulty level of the item and the context of the assessment.
  - Give students opportunities to self-correct.
  - Provide elaborate and timely feedback for online assessments using software.
  - Solicit feedback on assessments from students.

Formatting and content
  - Utilize 3-choice items.
  - Avoid questions that use negatives.
  - Avoid multipart and giveaway questions.
  - Avoid "none of the above" questions.
  - Avoid composite answers such as "A and B but not C."
  - Avoid "all of the above" questions.
  - Make choices parallel in structure and equal in length.
  - Keep question stems as short as possible, but include all relevant information.
  - Randomize answer positions.

Cheating countermeasures
  - Use alternate test forms.
  - Utilize alternate or assigned seating for assessments.
  - Provide students with an academic integrity policy.
  - Utilize honor codes and academic honesty agreements.
  - Draw from a large question bank and randomize question answers and order.
  - Defer question feedback until after online assessments close.
  - Change exam questions between semesters.
  - Utilize "lockdown browsers" for online classes.

Multiple-choice questions may also improve the quality of testing because they enable test-givers to ask a greater number of questions on a broader set of topics (which can contribute to the reliability of the assessment), increase perceived objectivity in the grading process (e.g., ensuring that students do not lose points for poor spelling, grammar, or writing skills), reduce subjectivity, inconsistency, and human error in scoring (particularly when machine graded), and reduce student anxiety (Simkin & Kuechler, 2005). Multiple-choice testing also allows instructors to calculate exam statistics such as item-total correlations (sometimes referred to as discrimination coefficients), which are the correlations between scores on a particular item (dichotomized into 0 for incorrect and 1 for correct) and overall test scores (see Davis, 1993, for instructions on how to calculate discrimination levels by hand if correlation coefficient computer programs are not available), and to utilize these statistics in improving the quality of subsequent exams.
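As a concrete illustration, the sketch below (Python; the function name and example data are illustrative) computes item-total discrimination coefficients from a 0/1 score matrix. It uses the common "corrected" variant that correlates each item with the total of the remaining items; the simpler version described above would correlate the item with the full test score.

```python
import numpy as np

def item_total_correlations(scores):
    """Compute a corrected item-total (discrimination) coefficient for each item.

    `scores` is a 2-D array of shape (n_students, n_items) with entries coded
    1 for a correct response and 0 for an incorrect one. Each item is
    correlated with the total of the *remaining* items so that the item does
    not inflate its own coefficient.
    """
    scores = np.asarray(scores, dtype=float)
    corrs = []
    for i in range(scores.shape[1]):
        rest_total = scores.sum(axis=1) - scores[:, i]
        corrs.append(np.corrcoef(scores[:, i], rest_total)[0, 1])
    return corrs

# Hypothetical data: 5 students, 3 items.
exam = [[1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 0],
        [0, 0, 0]]
print(item_total_correlations(exam))
```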
Shallower Assessment
Instructors may be concerned that multiple-choice questions do not assess comprehension, higher-order cognition, or deeper understanding of the topic the way short-answer and essay questions can (e.g., Ozuru et al., 2013). It is also more difficult to pinpoint students' true knowledge (e.g., why they got a question right or wrong, whether they were guessing) and to provide partial credit with multiple-choice testing. Related to this issue, students are also more likely to perceive multiple-choice tests as assessing lower-level cognitive processing (e.g., memorizing facts) and thus use more surface-learning approaches when they are aware that multiple-choice testing will be used (e.g., Scouller, 1998; Scouller & Prosser, 1994; Yonker, 2011). Lower-level courses with large enrollments, such as introductory psychology classes, are more likely to utilize (sometimes exclusively) multiple-choice testing. Introductory psychology students also exhibit low levels of course material retention (e.g., 56% accuracy 2 years after the class), and it is possible that multiple-choice tests, as well as the way students perceive these tests and shallowly study for them (focusing on recall rather than meaning and understanding), may contribute to this issue (Landrum & Gurung, 2013; Scouller, 1998; Scouller & Prosser, 1994).
Assessing Deeper Thinking
While many instructors utilize multiple-choice questions (effectively) to assess more superficial or rote-learning understanding (e.g., definitions, identifying a lobe of the brain), it is also possible for multiple-choice questions to assess higher-order cognitive skills, synthesis, application, and other components of deeper understanding and thinking (e.g., Simkin & Kuechler, 2005; Tractenberg, Gushta, Mulroney, & Weissinger, 2013).
To assess higher-order cognitive skills, test-makers can create multiple-choice questions that ask for the best answer available (e.g., what was the main reason, which is most likely), with multiple choices offering plausible (even good) answers but in which only one is the best answer. Deeper-thinking questions may also ask students to identify the theory that an example illustrates or to predict the outcome of a hypothetical scenario (Davis, 1993). Some examples of multiple-choice questions that assess deeper understanding and thinking have been shared by psychologists such as Steven Pinker (Steven Pinker's mind games, 2014), and via helpful assessment tools by teachers and researchers in other fields (e.g., the Blooming Biology Tool; Crowe, Dirks, & Wenderoth, 2008).
Different formats of multiple-choice questions can also be used to assist in assessing deeper thinking and understanding. One example is the ordered multiple-choice testing format (Briggs, Alonzo, Schwab, & Wilson, 2006), where each answer choice represents a different developmental level of understanding. Interpreting item responses thus provides instructors with a better grasp of how deeply a student understands the content.
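A minimal sketch of how ordered multiple-choice responses might be tallied by level, assuming a hypothetical item whose options have already been mapped to developmental levels (the mapping and names below are illustrative, not taken from Briggs et al.):

```python
# Hypothetical ordered multiple-choice item: each option maps to a
# developmental level of understanding (4 = fullest understanding).
item_levels = {"a": 1, "b": 3, "c": 4, "d": 2}

def diagnose(responses):
    """Summarize how many students answered at each level for one item.

    `responses` is a list of selected option letters, e.g. ["a", "c", "c"].
    """
    counts = {}
    for choice in responses:
        level = item_levels[choice]
        counts[level] = counts.get(level, 0) + 1
    return dict(sorted(counts.items()))

print(diagnose(["a", "c", "c", "b", "d", "c"]))  # {1: 1, 2: 1, 3: 1, 4: 3}
```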
Another method to ensure that students' true understanding of the topic is assessed (rather than random guessing) is to conduct confidence testing (see Davies, 2002), where students are required to report their level of confidence in knowing the answer prior to selecting (or, with online testing, even being able to view) answer choices. Scoring is then based on a combination of whether or not students selected the correct answer and how confident they were in their response. Thus, a confident response paired with a correct answer would score higher than a nonconfident response (e.g., guessing) paired with a correct answer. This method may be problematic, however, due to logistics (e.g., increased grading time if calculation of augmented scores is not automated) and concerns about individual differences (rather than actual knowledge) affecting how confidently students respond. Some research has shown that personality affects confidence reporting and scores (R. Hansen, 1971; Jacobs, 1971), although other research has shown that correlations between confidence testing scores and personality do not persist once students have had practice with confidence testing and their performance is taken into account (Echternacht, Boldt, & Sellman, 1972).
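One possible confidence-weighted scoring rule is sketched below. The specific weights are illustrative assumptions rather than the scheme used by Davies (2002), but they capture the idea that a confident correct answer outscores a lucky guess and a confident wrong answer is penalized.

```python
def confidence_score(correct, confidence):
    """Score one response given correctness and a self-reported confidence.

    `confidence` is "high" or "low". The weights below are illustrative:
    confident correct answers earn the most, confident wrong answers are
    penalized, and unconfident responses sit in between.
    """
    table = {
        (True, "high"): 1.0,
        (True, "low"): 0.5,
        (False, "low"): 0.0,
        (False, "high"): -0.5,
    }
    return table[(correct, confidence)]

# A confident correct answer outscores a correct answer flagged as a guess.
print(confidence_score(True, "high"), confidence_score(True, "low"))
```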
Foster and Miller (2009) provided another format (discrete-option multiple choice) that may better assess true understanding than typical multiple-choice formats. This format (optimized for online testing) randomly presents the answer options for the question one at a time to the student, who selects whether or not each option is correct. Jensen et al. (2006) also described another format ("You are the teacher" questions), which asks students to imagine that they are a teacher correcting a short answer or essay, with the goal of identifying all errors in the passage they are reading. The multiple-choice question therefore is a short paragraph, and the answer choices are different numbers of errors (e.g., 0, 1, 2, 3, 4, or more).
Finally, instructors concerned about guessing may explicitly discourage students from guessing or utilize a method of scoring that does so (e.g., penalizing incorrect answers, as on some standardized tests such as the SAT). One such method is formula scoring, in which a proportion of the number of incorrect responses is subtracted from the number of correct responses (see Frary, 1988).
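The correction-for-guessing rule commonly associated with formula scoring is S = R - W / (k - 1), where R is the number right, W the number wrong, k the number of choices per item, and omitted items count as neither right nor wrong. A minimal sketch (function and variable names are illustrative):

```python
def formula_score(num_right, num_wrong, choices_per_item):
    """Correction-for-guessing score: right answers minus a fraction of
    wrong answers. Omitted items count as neither right nor wrong."""
    return num_right - num_wrong / (choices_per_item - 1)

# 40 right, 12 wrong on a 4-choice exam (any blanks are ignored):
print(formula_score(40, 12, 4))  # 36.0
```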
Low-Quality Questions and Answers
Multiple-choice questions can be effective, but there is room for improvement in their quality. For example, DiBattista and Kurzawa (2011) analyzed 1,198 multiple-choice items from exams across 16 undergraduate classes and found that many were flawed. The most common problems included nonoptimal incorrect answers (i.e., selected by fewer than 5% of examinees) and unsatisfactory item-total correlations (i.e., less than .20). Based on the results of their analyses, DiBattista and Kurzawa (2011) suggested that test-makers improve the quality of multiple-choice exams by conducting an item analysis after administering a test and removing or replacing incorrect items that weaken discriminatory power. DiBattista and Kurzawa also noted that postsecondary institutions should provide training for faculty on testing (including how to interpret an item analysis report) and support for faculty to improve their assessments (including providing faculty with an item analysis report following every multiple-choice exam). Instructors may want to pay special attention to common student errors (on assessments as well as in class), as these can be utilized as high-quality incorrect answers (see freely available workshop material by Zimmaro, 2010, for this and additional tips on writing good multiple-choice exams).
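A small sketch of the distractor-analysis step of such an item analysis, flagging incorrect options chosen by fewer than 5% of examinees. The 5% criterion follows DiBattista and Kurzawa (2011); the function and data are illustrative, and options nobody selected would need to be supplied explicitly.

```python
from collections import Counter

def flag_weak_distractors(responses, correct_option, threshold=0.05):
    """Return distractors selected by fewer than `threshold` of examinees.

    `responses` is the list of option letters students chose for one item;
    `correct_option` is the keyed answer. Only options that appear in
    `responses` are examined.
    """
    counts = Counter(responses)
    n = len(responses)
    return [opt for opt, c in counts.items()
            if opt != correct_option and c / n < threshold]

# Hypothetical responses from 100 examinees to a 4-option item keyed "a".
answers = ["a"] * 60 + ["b"] * 30 + ["c"] * 8 + ["d"] * 2
print(flag_weak_distractors(answers, correct_option="a"))  # ['d']
```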
Fairness
McCoubrie (2004) reviewed the literature on multiple-choice questions and offered suggestions on how to improve the fairness of these questions. Fairness is an important aspect for test-makers to keep in mind, as students who perceive assessments to be fair are more likely to study and learn the material rather than studying just for the test (particularly if the exam is high-stakes). Important factors that contribute to fairness include ensuring that items are clear (e.g., questions and answers are not confusing), that they cover a broad range of topics, and that they are consistent with the syllabus (e.g., not testing only a small proportion of what is discussed in the course). Also, student perceptions of testing influence how they choose to study, and multiple-choice questions tend to be seen as requiring lower-level cognitive processing (Scouller, 1998). Therefore, it may be beneficial to provide students with information about multiple-choice exams to assist them in preparing appropriately (e.g., letting students know whether multiple-choice questions will assess only factual information or will require deeper understanding and critical thinking skills).
False Knowledge and Memory
How students study for multiple-choice question exams can influence memory of concepts and answers to questions. For example, introductory psychology students have been shown to rate answers as true based on the familiarity of the question item (Begg, Armour, & Kerr, 1985). Thus, mere familiarity with an incorrect item can influence the choice of test question answers. Furthermore, students may be exposed to false information in the form of incorrect answers to multiple-choice questions and may remember this misinformation as true at later times. Therefore, multiple-choice questions may lead to false knowledge (Roediger & Marsh, 2005; Marsh et al., 2007). However, there are ways to attenuate this negative testing effect. One way to enhance student learning and help with potential false memory issues is to use collaborative testing, where students work in pairs or groups. Collaborative testing has been shown to lead to better performance on assessments (Rao, Collins, & DiCarlo, 2002) as well as better retention of knowledge 4 weeks later (Cortright, Collins, Rodenbaugh, & DiCarlo, 2003). While collaborative testing can reduce anxiety, the mechanisms through which exam scores improve are having good discussions, remembering information better, and an increased ability to think about the tested information (Kapitanoff, 2009). Another way to attenuate the false memory effect is by providing timely and appropriate feedback (Butler & Roediger, 2008).
Feedback
Content of Feedback
Consistent, meaningful feedback (e.g., detailed explanations of why certain answers were correct or incorrect) is an important component of student learning outcomes, enjoyment, engagement in the course, and ratings of teaching quality (Gaytan & McEwen, 2007). Elaborated feedback has been shown to be more effective for learning when compared to simple answer verification (Pridemore & Klein, 1991). Additionally, formative feedback (aimed at modifying thinking or behavior to improve learning) also needs to be supportive and nonevaluative (e.g., judging the response, not the student; see Shute, 2008, for guidelines on providing formative feedback). More elaborate and detailed feedback also tends to require more effort on the part of the instructor, which may (in part) explain why not all students who take multiple-choice assessments receive this type of feedback. However, there are strategies (including technology) that can reduce instructor burden when providing high-quality feedback (see the Feedback Techniques section, below).
Timing of Feedback
An important component for student learning is to provide timely feedback after testing (e.g., Butler, Karpicke, & Roediger, 2007). Multiple-choice exams facilitate this process as these exams tend to be quicker and less subjective to grade (particularly as grading can be computerized). Many recommend that feedback be provided as soon as possible after testing (e.g., Gaytan & McEwen, 2007), although some research indicates that a short delay (e.g., 10 min) is optimal (produces better outcomes than immediate feedback) as it spaces out the learning (Butler et al., 2007). Hattie and Timperley (2007) reviewed the literature on feedback and found that across eight meta-analyses (398 studies), immediate feedback is generally superior to delayed feedback (see their Table 2). However, they pointed out that there are additional important considerations when determining timing. For example, delayed feedback becomes more effective (and superior to immediate feedback) as the difficulty level of the item increases (e.g., Clariana, Wagner, & Roher Murphy, 2000), potentially because the delay allows for greater processing. Similarly, the effects of timing depend on the context of the assessment, with immediate feedback being more effective in studies that use actual quizzes and delayed feedback being more effective in laboratory studies that use list learning (Kulik & Kulik, 1988).
Feedback Techniques
One method of providing higher-quality feedback while minimizing instructor burden is to offer students opportunities to self-correct. Grühn and Cheng (2014) found that students who were allowed to hand in a self-corrected midterm performed better on the final exam compared to students who took a traditional midterm (and were not allowed to self-correct). Grühn and Cheng utilized procedures for self-correction as laid out by Montepare (2005, 2007). Specifically, students take an exam as per usual and hand in their (original) answers and are then provided with a copy of the questions to take home. Students then hand in a second set of answers (self-corrected) after they have had time to review the material. Scoring takes both the original and self-corrected responses into account (full credit for questions that are correct on both sets of responses, partial credit for questions that were initially incorrect but correct in the revised set of responses, no credit for questions that are incorrect on both sets of responses). Thus, the self-correction method focuses on fostering additional learning and rewards students' additional time and energy spent with the course material. Learning can also be facilitated with instructor- or computer-graded assessments if students are allowed to resubmit based on feedback (e.g., Cole & Todd, 2003).
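A sketch of how Montepare-style self-correction scoring might be automated. The partial-credit value and the handling of answers that were correct originally but changed in the revision are illustrative assumptions, not prescriptions from Grühn and Cheng (2014) or Montepare (2005, 2007).

```python
def self_correction_score(original, revised, key, partial_credit=0.5):
    """Score an exam from original and self-corrected answer sheets.

    Full credit when the original answer is correct, `partial_credit` when
    only the revised answer is correct, nothing when both are wrong.
    """
    total = 0.0
    for first, second, correct in zip(original, revised, key):
        if first == correct:
            total += 1.0
        elif second == correct:
            total += partial_credit
    return total

key      = ["b", "d", "a", "c"]
original = ["b", "a", "a", "a"]   # 2 correct on the in-class attempt
revised  = ["b", "d", "a", "b"]   # one more fixed at home
print(self_correction_score(original, revised, key))  # 2.5
```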
For online multiple-choice testing, detailed feedback (immediate or delayed) can be automatically provided to students based on the correct (or incorrect) items that they choose (instructors can also program the settings so that students can view all feedback for all answer choices, regardless of what they got correct or incorrect). While this feedback needs to be programmed into the online system, once it has been input it can be provided automatically to all students without the need for the instructor to individually mark up each exam, write the same comments over and over, or go over each exam question and answer in class.
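One way to think about this is to store an explanation with every answer option so the feedback can be returned automatically whatever the student picks. The sketch below uses an illustrative data layout, not the format of any particular learning management system.

```python
# Illustrative item record: each option carries its own explanation so the
# feedback can be shown automatically, whichever option the student selects.
item = {
    "stem": "Which lobe of the brain is most associated with vision?",
    "options": {
        "a": ("frontal",   "Incorrect: the frontal lobe is linked to planning and executive function."),
        "b": ("occipital", "Correct: primary visual cortex is located in the occipital lobe."),
        "c": ("temporal",  "Incorrect: the temporal lobe is more associated with audition and memory."),
    },
    "answer": "b",
}

def feedback_for(item, selected):
    """Return (is_correct, explanation) for the option a student picked."""
    _, explanation = item["options"][selected]
    return selected == item["answer"], explanation

print(feedback_for(item, "c"))
```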
Instructors providing feedback need to decide how to give students access to the questions and their answers (i.e., whether or not students' exams are returned to them). In courses where large, high-quality test banks are available, instructors may be able to utilize new questions each semester and thus may return exams without worrying about students copying questions and answers. However, many instructors recycle at least some of their questions and need to balance optimal feedback (e.g., immediate, elaborate) with pragmatic concerns (thus, techniques such as self-correcting may not be as appealing for some instructors). In face-to-face settings, instructors can supervise review of the exam and return exams temporarily (e.g., for a portion of a class period or during office hours). In online settings, instructors can program feedback to be temporary and delayed until after the exam closes for everyone (thus students cannot see feedback while the exam is still "live" for other students, preventing the sharing of answers). In some situations, these efforts may not be enough to deter academic dishonesty, and instructors may need to utilize other techniques (see the Cheating Prevention and Countermeasures section, below).

Finally, in addition to providing students with feedback on their responses and performance, it may be useful to solicit feedback from students, allowing them an opportunity to comment on the exam and allowing instructors to utilize these data in future assessments (Davis, 1993).
Efficiency
Test Administration
Simkin and Kuechler (2005) presented a review of the literature on multiple-choice questions compared to constructed-response questions. Some benefits of multiple-choice include efficiency in testing large numbers of students and grading large numbers of exams (e.g., scoring via computerized mechanisms), which also facilitates providing students with timely feedback. Additionally, multiple versions of the same multiple-choice exam are relatively easy to administer (which may help deter cheating), and test items can be stored, edited, and reused fairly easily, helping to reduce the time needed for test-makers to design exams (this is reduced even further if instructors have access to test banks).
Format and Content of Multiple-Choice Items

There are many different ways to format multiple-choice questions and answers, but the most typical format is to have a question (stem), a correct answer, and incorrect (but potentially plausible) answers. One main issue when making tests is the number of choices that should be provided for each question. This can vary from as few as two (e.g., true/false questions) to as many as the test-maker decides to create (although many tests, including standardized exams, typically utilize 4–5 choices per question). Given that creating plausible choices consumes the time and energy of instructors, it is beneficial to know the lowest number of choices that is sufficient for testing purposes. In one study by Baghaei and Amrahi (2011), 180 English undergraduates took three different versions of the same grammar test (30 multiple-choice items) with either five, four, or three choices per item (the four- and three-choice versions were created by randomly deleting one or two incorrect choices from the five-choice version). They found no evidence of significant differences across the three versions in test reliability, response behaviors, item difficulty, or item fit statistics, and thus concluded that three choices per multiple-choice question is optimal. Rodriguez (2005) conducted a meta-analysis of 27 studies (across multiple domains, including subject matter and age level) on this topic and also concluded that three choices is the optimal format for multiple-choice questions. Rodriguez also noted that three choices may provide an additional benefit, as more three-choice questions can be administered in the same testing time period than questions with more than three choices. Thus, instructors who utilize three-choice items can cover more content in their exams. One study, for example, supported this idea and found that students answered items with three choices (on average) 5 seconds faster than those with four or five choices (Schneid, Armour, Park, Yudkowsky, & Bordage, 2014).
While certain types of questions are more difficult than others (students routinely perform worse on them), they do not increase discriminability between lower- and higher-performing students and thus should be avoided (Caldwell & Pate, 2013). These include questions that use negatives (e.g., "all of the following EXCEPT . . .") and "None of the Above" as an answer choice. Other studies of multiple-choice testing also recommend that "None of the Above" and composite answers such as "A and B, but not C" be omitted from answer choices (e.g., DiBattista, Sinnige-Egger, & Fortuna, 2014; Pachai, DiBattista, & Kim, 2015), or that any negative wording in the question at least be emphasized (e.g., via bolding; J. D. Hansen & Dexter, 1997). J. D. Hansen and Dexter (1997) also offered additional recommendations for wording, such as avoiding multipart questions and giveaways to correct/incorrect answers (e.g., if one choice is much longer than the others, if grammar is consistent/inconsistent between the question and answer choices, if multiple incorrect answer choices have the same meaning). The authors also pointed out that "All of the Above" should be avoided, as students who know that two items are correct can select this response without actually knowing that the other items are also correct. In her book on teaching, Davis (1993) also noted (in addition to the wording tips above) that choices should be parallel in structure (e.g., number of qualifiers, details) and equal in length, and that the stem of the question should be kept as short as possible (to avoid confusion) but should contain all relevant information (i.e., the student should not need to read all the choices before they can understand the question).
Order of questions can be set up in many ways, including using the same order for all copies of a particular test, randomizing the order of questions (particularly simple to do for online tests), utilizing multiple versions (with different ordering) of the same test to help prevent cheating (particularly in large classes), ordering questions based on difficulty level, and ordering questions sequentially to reflect readings and lectures (e.g., early questions ask about early chapters/lectures). Ordering in terms of difficulty level (e.g., from easy to difficult, or vice versa) does not lead to differences in performance, but does lead to differences in students' perceptions of performance, such that students are more optimistic if exams progress from easy to difficult (Weinstein & Roediger, 2010, 2012). In terms of ordering that affects actual (as opposed to perceived) performance, a study by Balch (1989) found that sequential ordering led to higher scores than random or chapter-contiguous (questions on the same chapter appeared together, but not sequentially, throughout the test) orders; the order of questions did not significantly affect how long it took students to complete tests. Balch suggested that this increased performance with sequential ordering is due to facilitation of students' memory based on the idea of encoding specificity (i.e., by mimicking the context and order in which content was learned, recall is enhanced). Thus, sequential ordering may be optimal for facilitating student performance.

Answer positions should be randomized when possible (e.g., computerized software can take care of this task), as test-makers have a tendency to place correct answers in middle positions. This propensity is shared (or perhaps reinforced) by test-takers, who are more likely to select middle-positioned items (e.g., the "c" response in a five-choice question), particularly when guessing (Attali & Bar-Hillel, 2003).
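A sketch of generating an alternate form by shuffling both question order and answer positions while keeping the answer key aligned (the data structures, names, and example bank below are illustrative):

```python
import random

def make_form(questions, seed=None):
    """Build one exam form with shuffled question order and shuffled answer
    positions, returning the form plus its answer key.

    Each question is a dict with a "stem", a list of "choices", and the index
    of the keyed "answer" within that list.
    """
    rng = random.Random(seed)
    form, key = [], []
    for qi in rng.sample(range(len(questions)), len(questions)):
        q = questions[qi]
        positions = rng.sample(range(len(q["choices"])), len(q["choices"]))
        form.append({"stem": q["stem"],
                     "choices": [q["choices"][p] for p in positions]})
        key.append(positions.index(q["answer"]))  # new position of the keyed answer
    return form, key

bank = [
    {"stem": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": 1},
    {"stem": "Capital of France?", "choices": ["Paris", "Rome", "Madrid"], "answer": 0},
]
form_a, key_a = make_form(bank, seed=1)
print(key_a)
```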
Cheating Prevention and Countermeasures
Concerns about academic dishonesty are common with the use of multiple-choice questions, as it is easier for students to copy answers (e.g., from another student, a book, or online resources) via this format than on constructed-response questions. Numerous articles in the higher education literature note this concern, and pedagogy articles offering advice and methods for detecting cheating date back to at least the 1920s (e.g., Bird, 1927). Multiple articles have addressed methods to detect cheating, often through statistical analysis of the likelihood of similar answers within pairs of students (e.g., Bellezza & Bellezza, 1989; Frary, Tideman, & Watts, 1977; Harpp & Hogan, 1993; van der Linden & Sotaridona, 2004). However, these methods for detecting answer-copying are not commonly utilized within college classroom settings, presumably due to their cost in terms of time and energy (Frary, 1993). Thus, prevention of cheating may be the most effective and feasible route for test-makers. Answer copying can be reduced by using alternate test forms, with both questions and answers rearranged, as rearranging only the questions does not significantly reduce copying (Houston, 1983). As students are more likely to copy from those sitting next to them (Houston, 1976), adjusting seating, such as alternating seats, can reduce cheating. Similarly, assigned seating via seating charts may also be utilized to prevent cheating, and can also facilitate cheating detection, as a record of who sat where allows for error-similarity analysis (Harpp & Hogan, 1993).
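The intuition behind error-similarity analysis can be illustrated with a raw count of shared identical wrong answers for each pair of students; published indices (e.g., Bellezza & Bellezza, 1989; Harpp & Hogan, 1993) add statistical models of chance agreement on top of counts like these. A minimal sketch with illustrative names and data:

```python
from itertools import combinations

def shared_wrong_answers(answer_sheets, key):
    """For every pair of students, count items both got wrong with the
    identical wrong answer (the raw ingredient of error-similarity indices).

    `answer_sheets` maps student IDs to lists of selected options.
    """
    pairs = {}
    for (s1, a1), (s2, a2) in combinations(answer_sheets.items(), 2):
        shared = sum(1 for x, y, k in zip(a1, a2, key)
                     if x == y and x != k)
        pairs[(s1, s2)] = shared
    return pairs

key = ["a", "c", "b", "d"]
sheets = {"s01": ["a", "b", "b", "a"],
          "s02": ["a", "b", "b", "a"],
          "s03": ["a", "c", "d", "d"]}
print(shared_wrong_answers(sheets, key))  # ('s01', 's02') share 2 wrong answers
```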
Academic dishonesty can be further prevented by providing an academic integrity policy that defines cheating and encourages academic integrity, so that students are clear on what is expected of them (Olt, 2002; Rowe, 2004). Additionally, honor codes (at the institutional, departmental, or even course level) and academic honesty agreements that students are asked to sign (on paper or electronically) may be utilized. There is evidence that honor codes are effective at reducing the prevalence of cheating (e.g., Hutton, 2006; McCabe & Treviño, 2002; McCabe, Treviño, & Butterfield, 2001; Vandehey, Diekhoff, & LaBeff, 2007), with some evidence that longer, formal honor codes that carry consequences are more effective (Gurung, Wilhelm, & Filz, 2012). Additionally, honor systems at the institutional level can be effective at fostering academic integrity, such that students from schools with traditional honor systems (compared to students from schools with nonhonor or modified systems) rate scenarios of academic dishonesty as being more dishonest and express a higher likelihood of reporting incidents of academic dishonesty (Schwartz, Tatum, & Hageman, 2013).
There are unique issues with cheating in online and distance-learning courses, as instructors are not present to ensure that students do not use resources (e.g., readings, notes, the Internet) to search for answers or share answers with peers (Olt, 2002; Rowe, 2004). However, even with online classes there are countermeasures that can be taken to reduce the occurrence of student cheating on multiple-choice exams. Rowe (2004) suggested drawing questions from a large question bank and randomizing question and answer order. Deferring feedback on questions until after the quiz or exam is closed will reduce the incidence of students copying or taking screenshots of answers and sharing them with other students. Instructors can also set time and access limits for exams and quizzes using many course software programs (Olt, 2002). It is also suggested that instructors change the test questions between semesters, which can prevent the sharing of questions and answers between students taking the course at different times. While it may be optimal to update questions each semester, unless one is using an already established test bank, this strategy may consume significant instructor time and energy (and defeat one of the advantages of using multiple-choice tests). Thus, the most feasible strategy may be to utilize some of the other suggestions outlined in this section and update test questions as frequently as the instructor's resources allow.
Finally, in an attempt to increase academic integrity, online courses may utilize "lockdown browsers," for example, Respondus LockDown Browser® (Respondus, 2015). These are useful for controlling students' virtual environment while they complete assignments or assessments that require independent work. Several lockdown browsers can be downloaded for free by students and are user-friendly. Most are compatible with several learning platforms (e.g., Blackboard, D2L, Moodle) and provide testing environments that prevent task switching, searching, screen shots, copying and pasting, and so on. Some lockdown browsers offer monitoring of the student via webcam to allow for virtual proctoring (Respondus, 2015). Stack (2015) investigated the difference between students taking an online exam through the Respondus LockDown Browser and students taking a traditional in-seat exam, and found no significant differences between the two groups. This is important considering that there has been evidence of increased levels of cheating in online courses compared to traditional courses (Lanier, 2006).
Conclusion
Multiple-choice testing is a preferred assessment method for students and instructors alike, and it is very commonly used in college classrooms (whether in-seat or online). While there are valid concerns about the use of multiple choice, there are also ways to help mitigate the potential negatives of this method and to enhance student learning, performance, and enjoyment, as well as instructors' efficiency, understanding of student outcomes, and evaluations. For example, assessment of deeper-level thinking is a common concern with multiple-choice question use. However, with appropriate formatting of multiple-choice testing it is possible to assess deeper-level thinking (Simkin & Kuechler, 2005; Tractenberg, Gushta, Mulroney, & Weissinger, 2013). Student understanding and learning can also be increased by utilizing appropriate feedback in terms of content and timing. Efficiency and quality of test administration, grading, and student learning can be increased through formatting, test construction, and cheating countermeasures. This short review is by no means an exhaustive summary of the large pedagogy literature on testing. However, we hope that our summary will encourage the utilization of empirical data in pedagogy-related decision-making and be helpful to instructors as they prepare courses and assessments.
References
Attali, Y., & Bar-Hillel, M. (2003). Guess where:
The position of correct answers in multiple- choice
test items as a psychometric variable. Journal of
Educational Measurement, 40, 109 –128. http://dx
.doi.org/10.1111/j.1745-3984.2003.tb01099.x
Baghaei, P., & Amrahi, N. (2011). The effects of the
number of options on the psychometric character-
istics of multiple choice items. Psychological Test
and Assessment Modeling, 53, 192–211.
Balch, W. R. (1989). Item order affects performance
on multiple-choice exams. Teaching of Psychol-
ogy, 16, 75–77. http://dx.doi.org/10.1207/
s15328023top1602_9
Begg, I., Armour, V., & Kerr, T. (1985). On believ-
ing what we remember. Canadian Journal of Be-
havioural Science Review, 17, 199 –214. http://dx
.doi.org/10.1037/h0080140
Bellezza, F. S., & Bellezza, S. F. (1989). Detection of
cheating on multiple-choice tests by using error-
similarity analysis. Teaching of Psychology, 16,
151–155. http://dx.doi.org/10.1207/s15328023
top1603_15
Bird, C. (1927). The detection of cheating in objec-
tive examinations. School & Society, 25, 261–262.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson,
M. (2006). Diagnostic assessment with ordered
multiple-choice items. Educational Assessment,
11, 33– 63. http://dx.doi.org/10.1207/s15326977
ea1101_2
Butler, A. C., Karpicke, J. D., & Roediger, H. L. III.
(2007). The effect of type and timing of feedback
on learning from multiple-choice tests. Journal of
Experimental Psychology: Applied, 13, 273–281.
http://dx.doi.org/10.1037/1076-898X.13.4.273
Butler, A. C., & Roediger, H. L. III. (2008). Feed-
back enhances the positive effects and reduces the
negative effects of multiple-choice testing. Mem-
ory & Cognition, 36, 604 – 616. http://dx.doi.org/
10.3758/MC.36.3.604
Caldwell, D. J., & Pate, A. N. (2013). Effects of
question formats on student and item performance.
Journal of Pharmaceutical Education, 77, 71.
http://dx.doi.org/10.5688/ajpe77471
Clariana, R. B., Wagner, D., & Roher Murphy, L. C.
(2000). Applying a connectionist description of
feedback timing. Educational Technology Re-
search and Development, 48, 5–22. http://dx.doi
.org/10.1007/BF02319855
Cole, R. S., & Todd, J. B. (2003). Effects of Web-
based multimedia homework with immediate rich
feedback on student learning in general chemistry.
Journal of Chemical Education, 80, 1338. http://
dx.doi.org/10.1021/ed080p1338
Cortright, R. N., Collins, H. L., Rodenbaugh, D. W.,
& DiCarlo, S. E. (2003). Student retention of
course content is improved by collaborative-group
testing. Advances in Physiology Education, 27,
102–108. http://dx.doi.org/10.1152/advan.00041
.2002
Crowe, A., Dirks, C., & Wenderoth, M. P. (2008).
Biology in bloom: Implementing Bloom’s Taxon-
omy to enhance student learning in biology. CBE
Life Sciences Education, 7, 368 –381. http://dx.doi
.org/10.1187/cbe.08-05-0024
Davies, P. (2002). There’s no confidence in multiple-
choice testing. Proceedings of 6th CAA Confer-
ence. Loughborough, UK: Loughborough Univer-
sity. Retrieved from https://dspace.lboro.ac.uk/
dspace-jspui/bitstream/2134/1875/1/davies_p1.pdf
Davis, B. G. (1993). Tools for teaching. San Fran-
cisco, CA: Jossey-Bass.
DiBattista, D., & Kurzawa, L. (2011). Examination
of the quality of multiple-choice items on class-
room tests. Canadian Journal for the Scholarship
of Teaching and Learning, 2, 4.
DiBattista, D., Sinnige-Egger, J. A., & Fortuna, G.
(2014). The “None of the above” option in multi-
ple-choice testing: An experimental study. Journal
of Experimental Education, 82, 168 –183. http://dx
.doi.org/10.1080/00220973.2013.795127
Douglas, M., Wilson, J., & Ennis, S. (2012). Multi-
ple-choice question tests: A convenient, flexible,
and effective learning tool? A case study. Innova-
tions in Education and Teaching International, 49,
111–121. http://dx.doi.org/10.1080/14703297
.2012.677596
Echternacht, G. T., Boldt, R. F., & Sellman, W. S.
(1972). Personality influences on confidence test
scores. Journal of Educational Measurement, 9,
235–241. http://dx.doi.org/10.1111/j.1745-3984
.1972.tb00957.x
Foster, D., & Miller, H. L. (2009). A new format for
multiple-choice testing: Discrete-option multiple-
choice. Results from early studies. Psychology Sci-
ence Quarterly, 51, 355–369.
Frary, R. B. (1993). Statistical detection of multiple-
choice answer copying: Review and commentary.
Applied Measurement in Education, 6, 153–165.
http://dx.doi.org/10.1207/s15324818ame0602_4
Frary, R. B. (1988). Formula scoring of multiple-
choice tests (correction for guessing). Educational
Measurement: Issues and Practice, 7, 33–38.
http://dx.doi.org/10.1111/j.1745-3992.1988
.tb00434.x
Frary, R. B., Tideman, T. N., & Watts, T. M. (1977).
Indices of cheating on multiple-choice tests. Jour-
nal of Educational and Behavioral Statistics, 2,
235–256. http://dx.doi.org/10.3102/10769986
002004235
Funk, S. C., & Dickson, K. L. (2011). Multiple-
choice and short-answer exam performance in a
college classroom. Teaching of Psychology, 38,
273–277. http://dx.doi.org/10.1177/009862831
1421329
Gaytan, J., & McEwen, B. C. (2007). Effective online
instructional and assessment strategies. American
Journal of Distance Education, 21, 117–132.
http://dx.doi.org/10.1080/08923640701341653
Grühn, D., & Cheng, Y. (2014). A self-correcting
approach to multiple-choice exams improves stu-
dents’ learning. Teaching of Psychology, 41, 335–
339. http://dx.doi.org/10.1177/0098628314549706
Gurung, R., Wilhelm, T., & Filz, T. (2012). Optimiz-
ing honor codes for online exam administration.
Ethics & Behavior, 22, 158 –162. http://dx.doi.org/
10.1080/10508422.2011.641836
Haladyna, T. M., Downing, S. M., & Rodriguez,
M. C. (2002). A review of multiple-choice item-
writing guidelines for classroom assessment. Ap-
plied Measurement in Education, 15, 309 –333.
http://dx.doi.org/10.1207/S15324818AME1503_5
Hansen, J. D., & Dexter, L. (1997). Quality multiple-
choice test questions: Item-writing guidelines and
an analysis of auditing testbanks. Journal of Edu-
cation for Business, 73, 94 –97. http://dx.doi.org/
10.1080/08832329709601623
Hansen, R. (1971). The influence of variables other
than knowledge on probabilistic tests. Journal of
Educational Measurement, 8, 9 –14. http://dx.doi
.org/10.1111/j.1745-3984.1971.tb00900.x
Harpp, D. N., & Hogan, J. J. (1993). Crime in the
classroom: Detection and prevention of cheating
on multiple-choice exams. Journal of Chemical
Education, 70, 306 –323. http://dx.doi.org/10
.1021/ed070p306
Hattie, J., & Timperley, H. (2007). The power of
feedback. Review of Educational Research, 77,
81–112. http://dx.doi.org/10.3102/003465430
298487
Hickson, S., Reed, W. R., & Sander, N. (2012).
Estimating the effect on grades of using multiple-
choice versus constructive-response questions:
Data from the classroom. Educational Assessment,
17, 200 –213. http://dx.doi.org/10.1080/10627197
.2012.735915
Houston, J. P. (1976). The assessment and prevention
of answer copying on undergraduate multiple-
choice examinations. Research in Higher Educa-
tion, 5, 301–311. http://dx.doi.org/10.1007/
BF00993429
Houston, J. P. (1983). Alternate test forms as a means
of reducing multiple-choice answer copying in the
classroom. Journal of Educational Psychology, 75,
572–575. http://dx.doi.org/10.1037/0022-0663.75
.4.572
Hutton, P. A. (2006). Understanding student cheating
and what educators can do about it. College Teach-
ing, 54, 171–176. http://dx.doi.org/10.3200/CTCH
.54.1.171-176
Jacobs, S. S. (1971). Correlates of unwarranted con-
fidence in responses to objective test items. Jour-
nal of Educational Measurement, 8, 15–20. http://
dx.doi.org/10.1111/j.1745-3984.1971.tb00901.x
Jensen, M., Duranczyk, I., Staat, S., Moore, R.,
Hatch, J., & Somdahl, C. (2006). Using a recipro-
cal teaching strategy to create multiple-choice
exam questions. American Biology Teacher, 68,
67–71. http://dx.doi.org/10.1662/0002-7685
(2006)68[67:TCMEQ]2.0.CO;2
Kapitanoff, S. H. (2009). Collaborative testing: Cog-
nitive and interpersonal processes related to en-
hanced test performance. Active Learning in
Higher Education, 10, 56 –70. http://dx.doi.org/10
.1177/1469787408100195
Kulik, J. A., & Kulik, C. C. (1988). Timing of feed-
back and verbal learning. Review of Educational
Research, 58, 79 –97. http://dx.doi.org/10.3102/
00346543058001079
Landrum, E. R., & Gurung, A. R. (2013). The mem-
orability of introductory psychology revisited.
Teaching of Psychology, 40, 222–227. http://dx
.doi.org/10.1177/0098628313487417
Lanier, M. M. (2006). Academic integrity and dis-
tance learning. Journal of Criminal Justice Edu-
cation, 17, 244 –261. http://dx.doi.org/10.1080/
10511250600866166
Marsh, E. J., Roediger, H. L. III, Bjork, R. A., &
Bjork, E. L. (2007). The memorial consequences
of multiple-choice testing. Psychonomic Bulletin
& Review, 14, 194 –199. http://dx.doi.org/10.3758/
BF03194051
McCabe, D. L., & Treviño, L. K. (2002). Honesty
and honor codes. Academe, 88, 37– 41. http://dx
.doi.org/10.2307/40252118
McCabe, D. L., Treviño, L. K., & Butterfield, K. D.
(2001). Cheating in academic institutions: A de-
cade of research. Ethics & Behavior, 11, 219 –232.
http://dx.doi.org/10.1207/S15327019EB1103_2
McCoubrie, P. (2004). Improving the fairness of mul-
tiple-choice questions: A literature review. Medi-
cal Teacher, 26, 709 –712. http://dx.doi.org/10
.1080/01421590400013495
Montepare, J. M. (2005, October). A self-correcting
approach to multiple choice tests. APS Observer,
18, 35–36.
Montepare, J. M. (2007). A self-correcting approach
to multiple choice tests. In B. Perlman, L. I. Mc-
Cann, & S. H. McFadden (Eds.), Lessons learned
(Vol. 3, pp. 143–154). Washington, DC: Associa-
tion for Psychological Science.
Olt, M. R. (2002). Ethics and distance education:
Strategies for minimizing academic dishonesty in
online assessment. Online Journal of Distance
Learning Administration, 5(3).
Ozuru, Y., Briner, S., Kurby, C. A., & McNamara,
D. S. (2013). Comparing comprehension measured
by multiple-choice and open-ended questions. Ca-
nadian Journal of Experimental Psychology, 67,
215–227. http://dx.doi.org/10.1037/a0032918
Pachai, M. V., DiBattista, D., & Kim, J. A. (2015). A
systematic assessment of ‘none of the above’ on
multiple choice tests in a first year psychology
classroom. Canadian Journal for the Scholarship
of Teaching and Learning. Advance online publi-
cation. http://dx.doi.org/10.5206/cjsotl-rcacea
.2015.3.2
Park, J. (2010). Constructive multiple-choice testing
system. British Journal of Educational Technol-
ogy, 41, 1054 –1064. http://dx.doi.org/10.1111/j
.1467-8535.2010.01058.x
Pridemore, D., & Klein, J. D. (1991). Control feed-
back in computer-assisted instruction. Educational
Technology Research and Development, 39, 27–
32. http://dx.doi.org/10.1007/BF02296569
Rao, S. P., Collins, H. L., & DiCarlo, S. E. (2002).
Collaborative testing enhances student learning.
Advances in Physiology Education, 26, 37– 41.
http://dx.doi.org/10.1152/advan.00032.2001
Respondus. (2015, November 2). LockDown
Browser instructor resources. Retrieved from
https://www.respondus.com/products/lockdown-
browser/resources.shtml
Rodriguez, M. C. (2005). Three options are optimal
for multiple-choice items: A meta-analysis of 80
years of research. Educational Measurement: Is-
sues and Practice, 24, 3–13. http://dx.doi.org/10
.1111/j.1745-3992.2005.00006.x
Roediger, H. L. III, & Marsh, E. J. (2005). The
positive and negative consequences of multiple-
choice testing. Journal of Experimental Psychol-
ogy: Learning, Memory, and Cognition, 31, 1155–
1159. http://dx.doi.org/10.1037/0278-7393.31.5
.1155
Rowe, N. C. (2004). Cheating in online student as-
sessment: Beyond plagiarism. Online Journal of
Distance Learning Administration, 7(2).
Schneid, S. D., Armour, C., Park, Y. S., Yudkowsky,
R., & Bordage, G. (2014). Reducing the number of
options on multiple-choice questions: Response
time, psychometrics and standard setting. Medical
Education, 48, 1020 –1027. http://dx.doi.org/10
.1111/medu.12525
Schwartz, B., Tatum, H., & Hageman, M. (2013).
College students’ perceptions of and responses to
cheating at traditional, modified, and non-honor
system institutions. Ethics & Behavior, 23, 463–
476. http://dx.doi.org/10.1080/10508422.2013
.814538
Scouller, K. (1998). The influence of assessment
method on students’ learning approaches: Multiple
choice question examination versus assignment es-
say. Higher Education, 35, 453– 472. http://dx.doi
.org/10.1023/A:1003196224280
Scouller, K. M., & Prosser, M. (1994). Students’
experiences in studying for multiple choice ques-
tion examinations. Studies in Higher Education,
19, 267–279. http://dx.doi.org/10.1080/0307507
9412331381870
Shute, V. J. (2008). Focus on formative feedback.
Review of Educational Research, 78, 153–189.
http://dx.doi.org/10.3102/0034654307313795
Simkin, M., & Kuechler, W. L. (2005). Multiple-
choice tests and student understanding: What is the
connection? Decision Sciences Journal of Innova-
tive Education, 3, 73–98. http://dx.doi.org/10
.1111/j.1540-4609.2005.00053.x
Stack, S. (2015). The impact of exam environments
on student test scores in online courses. Journal of
Criminal Justice Education, 26, 273–282. http://dx
.doi.org/10.1080/10511253.2015.1012173
Steven Pinker’s mind games. (2014, April 11). New
York Times. Retrieved January 9, 2016, from http://
www.nytimes.com/interactive/2014/04/13/education/
edlife/edlife-quiz-psych.html?_r0
Struyven, K., Dochy, F., & Janssens, S. (2005). Stu-
dents’ perceptions about evaluation and assess-
ment in higher education: A review. Assessment &
Evaluation in Higher Education, 30, 325–347.
http://dx.doi.org/10.1080/02602930500099102
Tozoglu, D., Tozoglu, M. D., Gurses, A., & Dogar,
C. (2004). The students’ perceptions: Essay versus
multiple-choice type exams. Journal of Baltic Sci-
ence Education, 2, 52–59.
Tractenberg, R. E., Gushta, M. M., Mulroney, S. E.,
& Weissinger, P. A. (2013). Multiple choice ques-
tions can be designed or revised to challenge learn-
ers’ critical thinking. Advances in Health Sciences
Education: Theory and Practice, 18, 945–961.
http://dx.doi.org/10.1007/s10459-012-9434-4
Vandehey, M. A., Diekhoff, G. M., & LaBeff, E. E.
(2007). College cheating: A twenty-year follow-up
and the addition of an honor code. Journal of
College Student Development, 48, 468–480.
http://dx.doi.org/10.1353/csd.2007.0043
van der Linden, W. J., & Sotaridona, L. (2004). A
statistical test for detecting answer copying on
multiple-choice tests. Journal of Educational Mea-
surement, 41, 361–377. http://dx.doi.org/10.1111/
j.1745-3984.2004.tb01171.x
Weinstein, Y., & Roediger, H. L. III. (2010). Retro-
spective bias in test performance: Providing easy
items at the beginning of a test makes students
believe they did better on it. Memory & Cognition,
38, 366 –376. http://dx.doi.org/10.3758/MC.38.3
.366
Weinstein, Y., & Roediger, H. L. III. (2012). The
effect of question order on evaluations of test
performance: How does the bias evolve? Memory
& Cognition, 40, 727–735. http://dx.doi.org/10
.3758/s13421-012-0187-3
Yonker, J. E. (2011). The relationship of deep and
surface study approaches on factual and applied
test-bank multiple-choice question performance.
Assessment & Evaluation in Higher Education, 36,
673– 686. http://dx.doi.org/10.1080/02602938
.2010.481041
Zeidner, M. (1987). Essay versus multiple-choice
type classroom exams: The students’ perspec-
tive. Journal of Educational Research, 80, 352–
358. http://dx.doi.org/10.1080/00220671.1987
.10885782
Zimmaro, D. W. (2010). Writing good multiple-
choice exams [Workshop material]. Austin, TX:
Learning Sciences, University of Texas—Austin.
Retrieved from https://learningsciences.utexas.edu/
sites/default/files/writing-good-multiple-choice-
exams-04-28-10_0.pdf
Received November 4, 2015
Revision received May 5, 2016
Accepted May 9, 2016