Article

Multiple Choice Randomization

Authors: A. Ian McLeod, Ying Zhang, and Hao Yu (University of Western Ontario)

Abstract

Multiple-choice randomized (MCR) examinations in which the order of the items or questions as well as the order of the possible responses is randomized independently for every student are discussed. This type of design greatly reduces the possibility of cheating and has no serious drawbacks. We briefly describe how these exams can be conveniently produced and marked. We report on an experiment we conducted to examine the possible effect of such MCR randomization on student performance and conclude that no adverse effect was detected even in a quite large sample.
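For readers who want to see the mechanics, the sketch below is a minimal illustration of per-student MCR generation and marking, not the authors' actual implementation; the item bank, function names, and scoring rule are all hypothetical. The key idea is that seeding the random generator with the student's identifier makes the shuffles reproducible, so the answer key can be regenerated at marking time.

    import random

    # Hypothetical item bank (illustrative only): each item is a stem plus
    # its options, with the correct option stored first.
    ITEM_BANK = [
        ("What is the mean of 2, 4, and 6?", ["4", "3", "6", "12"]),
        ("P(heads) for one toss of a fair coin?", ["1/2", "1/4", "1/3", "1"]),
    ]

    def make_exam(student_id, bank):
        """Shuffle question order and, independently, option order per student."""
        rng = random.Random(student_id)        # reproducible per student
        order = list(range(len(bank)))
        rng.shuffle(order)                     # randomize question order
        exam, key = [], []
        for i in order:
            stem, options = bank[i]
            perm = list(range(len(options)))
            rng.shuffle(perm)                  # randomize option order
            exam.append((stem, [options[j] for j in perm]))
            key.append(perm.index(0))          # where the correct option landed
        return exam, key

    def mark(responses, key):
        """Number-right score against the regenerated per-student key."""
        return sum(r == k for r, k in zip(responses, key))

    exam, key = make_exam(student_id=42, bank=ITEM_BANK)

Because the key is a pure function of the student identifier and the item bank, no per-student key file needs to be stored; marking only requires rerunning make_exam with the same seed.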


... – Other research questions: Kataoka et al. [61]; McLeod et al. [75]
• For Section 2.3.2: Outcomes considered
– Learning only: Aberson et al. [5]; Ayres and Way [11]; Bolzan [17]; Enders and Diener-West [37]; Cicchitelli and Galmacci [25]; Collins and Mittage [27]; Davies et al. [30]; Giambalvo et al. [45]; Gonzalez et al. [46]; Ip [57]; Kataoka et al. [61]; Lee [68]; Lipson [69]; Luchini et al. [70]; Mahmud and Robertson [72]; McLeod et al. [75]; Meyer and Lovett [76]; Meyer and Thille [77]; Periasamy [88]; Stangl et al. [101]; Stephenson [103]; Sundefeld et al. [105]; Watson and Kelly [112]
– Attitudinal only: Alldredge et al. [8]; Bijker et al. [16]
– Both learning and attitudinal: Alldredge and Brown [7]; Alldredge and Som [9]; Dinov and Sanchez [33]; Dutton and Dutton [35]; Hilton and ...
... – Randomized control trial, individual assignment: Enders and Diener-West [37]; Cicchitelli and Galmacci [25]; Davies et al. [30]; Gonzalez et al. [46]; McLeod et al. [75]
– Randomized control trial, group assignment: Alldredge et al. [8]; Hilton and Christensen [54]
– Observational case-control: Aberson et al. [5]; Alldredge and Brown [7]; Alldredge and Som [9]; Ayres and Way [11]; Bijker et al. [16]; Bolzan [17]; Collins and Mittage [27]; Dinov and Sanchez [33]; Dutton and Dutton [35]; Ip [57]; Kataoka et al. [61]; Lipson [69]; Luchini et al. [70]; Stangl et al. [101]; Stephenson [103]; Utts et al. [109]; Ward [111]
– Paired (pre vs. post) or crossover: Lee [68]; Mahmud and Robertson [72]; Meyer and Lovett [76]; Periasamy [88]; Sundefeld et al. [105]; Watson and Kelly [112]
• For Section 2. ...
Article
In this thesis, I explore the state of quantitative research in the field of statistics education. First, a content review from several prominent sources in statistics education is conducted. Based on this review, recommendations are made for advancing methodological research in this field. Next, the design and analysis of a randomized experiment in an introductory statistics course are presented. In this experiment, factorial and crossover designs were used to explore several implementation aspects of "clickers", a technology for collecting and displaying, in real time, student responses to questions posed by the instructor during class. One goal was to determine which aspects were most effective in helping improve engagement and learning; another goal was to explore issues involved with implementing a large-scale experiment in an educational setting. The aspects explored were the number of questions asked, the way those questions were incorporated into the material, and whether clicker use was required or monitored. There was little evidence that clicker use increased engagement but some evidence that it improved learning, particularly when a low number of clicker questions were well incorporated into the material (vs. being asked consecutively). Finally, a strategy for exploiting interactions between design factors and noise variables in the educational context is examined. The objectives of this strategy are: 1) Identify a teaching method that is robust to the effects of uncontrollable sources of variation on the outcome, or 2) Identify when a teaching method should be customized based on a noise variable. Achieving the first objective is desirable when there is heterogeneity in the noise variable within a class, for example, when the noise variable represents characteristics of the students themselves. The second objective involves using information in the interaction to proactively customize a teaching method to particular groups, and is easiest for noise variables measured at the instructor or classroom level.
... To prevent the later distractors from being easily ruled out, Mosier and Price (1945) suggested that the randomization process should include not only the position of the correct answer but also the position of distractors. McLeod, Zhang, and Yu (2003) investigated the effects of randomizing the positions of the response options independently for each student or ordering the distractors logically or numerically within the multiple-choice item. Although the authors hypothesized that logical or numerical ordering of distractors could be an advantage to the students, they did not find any evidence in favor of logical or numerical ordering of distractors. ...
... Although the authors hypothesized that logical or numerical ordering of distractors could be an advantage to the students, they did not find any evidence in favor of logical or numerical ordering of distractors. McLeod et al. (2003) concluded that randomizing the position of distractors for every student could be a better option for reducing the possibility of cheating without adversely affecting the psychometric characteristics of the items. ...
... But this approach is limited to content areas such as mathematics. The other recommendation is to randomize the order of distractors, which can reduce the possibility of cheating and improve test security (McLeod et al., 2003; Mosier & Price, 1945; Schroeder et al., 2012). ...
Article
Full-text available
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education, focused specifically on the development, analysis, and use of the incorrect options, which are also called the distractors. Despite a vast body of literature on multiple-choice testing, the task of creating distractors has received much less attention. In this study, we provide an overview of what is known about developing distractors for multiple-choice items and evaluating their quality. Next, we synthesize the existing guidelines on how to use distractors and summarize earlier research on the optimal number of distractors and the optimal ordering of distractors. Finally, we use this comprehensive review to provide the most up-to-date recommendations regarding distractor development, analysis, and use, and in the process, we highlight important areas where further research is needed.
... A number of studies reported findings of little impact on item and test performance when item positions were changed (Leary & Dorans, 1985; McLeod, Zhang, & Yu, 2003; Ryan & Chiu, 2001; Vander Schee, 2013). In a comprehensive review of research on item order effect, Leary and Dorans (1985) evaluated prior studies on item order effect, including section order, interaction between item and other factors (e.g., anxiety level), item order on item response theory (IRT) item parameter estimates, and IRT true score equating results. ...
... 36). In the study by McLeod et al. (2003), the researchers conducted an experiment by randomizing both the order of questions and the order of response options. This study found that the randomization design would reduce cheating with no serious drawbacks. ...
Article
Rearranging response options in different versions of a test of multiple‐choice items can be an effective strategy against cheating on the test. This study investigated if rearranging response options would affect item performance and test score comparability. A study test was assembled as the base version from which 3 variant versions were created by rearranging the response options of the items. The 4 versions were administered to randomly equivalent samples of approximately 1,200 test takers in an operational administration. The weighted root mean squared difference and the test characteristic curves were computed from the data to assess the differences between the base and its variant versions. The item‐level and test‐level results show very small differences between the base and the 3 variant versions.
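The abstract does not reproduce the formula for the weighted root mean squared difference, so the sketch below is only a plausible reading: a WRMSD taken over per-item statistics (here, proportion-correct values) for the base form and one variant, with made-up numbers and illustrative weights.

    import numpy as np

    # Hypothetical per-item proportion-correct values for the base form
    # and one rearranged variant, plus illustrative item weights.
    p_base    = np.array([0.82, 0.64, 0.71, 0.55])
    p_variant = np.array([0.80, 0.66, 0.70, 0.57])
    weights   = np.array([1.0, 1.0, 2.0, 1.0])

    def wrmsd(a, b, w):
        """Weighted root mean squared difference between two sets of
        item statistics; values near zero suggest the forms behave alike."""
        return np.sqrt(np.sum(w * (a - b) ** 2) / np.sum(w))

    print(wrmsd(p_base, p_variant, weights))   # about 0.017 here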
... Since this study revealed that MCQs are much more popular with students than other types of questions, this may be because of how easy it is to cheat on them. [35] There are several ways to reduce the probability of cheating on MCQs, such as test randomization, [36] computer-aided assessment, [37] and automatic test generation. [38] A study investigating evaluation methods in the theoretical and practical courses of laboratory sciences students at Tehran University of Medical Sciences showed that the midterm exam is used in 35% of all theoretical and practical courses and that, at 70%, MCQs were among the most common assessment methods. ...
Article
Full-text available
INTRODUCTION: Evaluation has become an inseparable part of the education process, giving feedback to students and professors to improve education quality. This study aimed to elicit the preferences of professors and students regarding attributes of evaluation methods in theoretical courses at Kermanshah University of Medical Sciences, Iran, in 2018. METHODS: A discrete choice experiment (DCE) was used to elicit the preferences of the study participants. A narrative literature review and interviews with eight professors and ten students were conducted to determine the attributes and levels of evaluation methods at the university, and an experimental design was used to construct the final choice sets of evaluation methods. We included 213 students and 30 professors in the study. A conditional logistic regression model was used for data analysis. RESULTS: The largest share of professors (36.67%) preferred to allocate up to 30% of evaluation scores to the midterm examination, whereas the largest share of students (30.45%) agreed to include the midterm examination for up to 15% of total scores. The majority of students preferred examination questions covering only the presented materials, while 70% of professors preferred to include additional texts in evaluation examinations. Regarding quizzes, professors preferred a higher proportion of total scores for the quiz than students did. The DCE analysis indicated that both professors and students preferred a mix of question types in examinations. In addition, resources beyond what is taught in class generated utility for professors and disutility for students. Quizzes also increased the utility of an evaluation package for professors. CONCLUSION: The findings showed a gap between the preferences of professors and students regarding some attributes of evaluation methods, such as student discipline, examination materials, and quizzes. Further studies are needed to examine other attributes of evaluation methods in theoretical and practical courses in Iran and other contexts.
... We were aware of a plethora of studies and foundational books discussing the possible effects of rearranging the answer options, of presenting the conceptual questions in a different order than in the textbook, and statistical methods for detecting significant differences in data collected during online testing [14][15][16][17]. We did not find any previous study on question or option randomization for the BEMA; therefore, we believe that our study brings new insight and awareness to the effect of randomization on BEMA test results. ...
Preprint
Full-text available
We describe a retrospective study of responses to the Brief Electricity and Magnetism Assessment (BEMA) collected from a large population of 3480 students at a large public university. Two different online testing setups were employed for administering the BEMA, and our analysis focuses on the interpretation of the data collected from these two setups. Our study brings new insight and awareness to the effect of randomization on BEMA test results. Starting from an a priori common-sense model, we show simple methods to detect and separate guessing from genuinely thought-out responses. In addition, we show that the group of responses with low scores (7 or less out of 30) used different answer-choice strategies than the groups with average or high scores, and that time-in-testing is an essential parameter to be reported. Our results suggest that the data should be cleaned to ensure that only valid times-in-testing are included before reporting and comparing statistics. We analyze in detail the effect of answer-option randomization and the effect of shuffling the question order. Our study did not detect any significant effect of option randomization alone on BEMA scores for valid times-in-testing. We found clear evidence that shuffling the order of independent questions does not affect BEMA scores, while shuffling the order of dependent questions does.
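As a concrete illustration of the data-cleaning step the authors recommend, the hedged sketch below filters a response log to plausible times-in-testing before computing score statistics. The column names and cutoffs are invented for the example; the study determines valid times from its own data.

    import pandas as pd

    # Hypothetical response log: one row per attempt, with the total
    # BEMA score (0-30) and the time-in-testing in minutes.
    df = pd.DataFrame({
        "score":   [24, 7, 18, 3, 29],
        "minutes": [31.0, 2.5, 27.4, 1.1, 40.2],
    })

    # Keep only attempts with plausible times-in-testing before any
    # reporting or comparison; these cutoffs are illustrative only.
    LO, HI = 10.0, 50.0
    valid = df[df["minutes"].between(LO, HI)]

    print(valid["score"].describe())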
... Randomizing the position of response options can prevent less plausible distractors from easily being ruled out by the examinees. McLeod, Zhang, and Yu (2003) suggested that randomizing the position of response options can also reduce the possibility of cheating among the examinees and thereby improve the test security. ...
Article
Full-text available
The arrangement of response options in multiple-choice (MC) items, especially the location of the most attractive distractor, is considered critical in constructing high-quality MC items. In the current study, a sample of 496 undergraduate students taking an educational assessment course was given three test forms consisting of the same items but the positions of the most attractive distractor varied across the forms. Using a multiple-indicators–multiple-causes (MIMIC) approach, the effects of the most attractive distractor's positions on item difficulty were investigated. The results indicated that the relative placement of the most attractive distractor and the distance between the most attractive distractor and the keyed option affected students’ response behaviors. Moreover, low-achieving students were more susceptible to response-position changes than high-achieving students.
... Highly speeded tests will also show an order effect, as will tests arranged in order of item difficulty (Kleinke, 1980). Other studies have shown that item order effects are insignificant when the tests are not speeded (e.g., Bresnock, Graves, & White, 1989; McLeod, Zhang, & Yu, 2003; Plake, 1980). The argument can also be made that differences in test performance due to random ordering are error variance which should not be modeled. ...
Conference Paper
Full-text available
While many studies have explored the effect of item order, testing professionals remain concerned about changing or randomizing item order. For this study, item difficulty was computed and compared based on the relative position at which each item was administered. Strong evidence shows no significant effect of item order on item difficulty.
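The computation described can be pictured with a small, hypothetical example: classical item difficulty (proportion correct) tabulated by the position at which each item was administered, so that values can be compared within an item. The column names and data below are invented.

    import pandas as pd

    # Hypothetical response records: item id, administered position
    # (after randomization), and correctness of the response.
    resp = pd.DataFrame({
        "item":     ["A", "A", "A", "A", "B", "B", "B", "B"],
        "position": [1, 1, 9, 9, 2, 2, 8, 8],
        "correct":  [1, 1, 1, 0, 0, 1, 0, 1],
    })

    # Proportion correct per item at each administered position; similar
    # values across a row would indicate no item-order effect.
    table = resp.groupby(["item", "position"])["correct"].mean().unstack()
    print(table)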
... Randomization for multiple choice designs has been praised by McLeod, Zhang, and Yu (2003) for educational purposes. As they suggest, randomization provides advantages for preventing cheating and has no adverse effect on student performance. ...
Thesis
Full-text available
Social media today play an increasingly important role in computer science, the information technologies industry and society at large, changing people's everyday communication and interaction. The domain of social media encompasses a variety of services, such as social networking services, collaborative projects, microblogging services and even virtual social worlds and virtual game worlds. There are long-established principles, guidelines, and heuristics that apply to social media design and are part of the foundations of human-computer interaction (HCI). For example, in interaction design two sets of goals guide the design of systems: usability goals and user experience goals. However, current design and development frameworks are still ill-equipped for the ever-changing online world. Ironically, they fail to take into account the social dimension of social media software. Cracks in the social fabric of a community operating under social media software may have devastating effects, not only on the evolution of the community but also on the longevity of the social media service. As such, social media cannot be developed in isolation, without taking into consideration the social experiences of users. Psychological and sociological principles should become part of the design process of modern social media. My research contributes to this endeavor by focusing on the design and engineering of social experiences on social media services. In my dissertation, I propose that an additional layer be added to the usability and user experience goals. The new layer includes social experience goals, which are further classified as desirable, undesirable and neutral. I produced a new definition for social interaction design that incorporates social experience goals. Building upon previously developed frameworks and models for interaction design, I demonstrated how social interaction design applies to activities such as needfinding, developing alternative designs using prototyping and modeling, developing interactive versions of designs and evaluating designs. I presented the benefits of using such a framework by focusing on two showcase phenomena deeply rooted in social behaviors: aggression and groupthink. The aim was to demonstrate that social media design and development could be driven by goals that aim to increase collaboration and decrease conflict in a community. I analyzed the effects of different features found in social media today with respect to aggression and groupthink, and found positive evidence to suggest that social interaction, behavior, attitudes and phenomena can be affected by social media design. By examining two vastly diverse social experience goals using quantitative as well as qualitative research methods currently used in HCI, I demonstrated the usefulness of social interaction design in various classifications of social media services such as collaborative projects, social networking sites and even virtual game worlds. In short, I argue that social experience could be engineered through software using the framework I provide for social interaction design.
... It is hard to work out a self-test module that removes the opportunity to exchange correct answers between students, and thus resists cheating while avoiding repetition of questions. Moreover, a typical self-test consists of a database containing many multiple-choice questions from which a random sample is taken and presented to each individual student (McLeod et al., 2003). If the generated database is sufficiently large, this structure will indeed solve the problem of students exchanging correct answers. ...
Article
Full-text available
A learning environment for statistical education aims at providing on-line course material for distance learning. It typically also includes practical exercise material, giving the student the ability to test his or her statistical knowledge in a real-life situation. A desirable aspect of a self-test module is that students cannot cheat and copy the answers from their colleagues. The Elestat project addresses this scenario with the construction of a web-based self-test module that contains many realistic datasets whose observations are randomized before they are presented to a student, obliging the student to analyze a different dataset each time an exercise is started. The student is guided through the exercises in a step-by-step manner. The open-source statistical software package R takes care of calculating the correct solution in real time and offering immediate feedback. The web technology is based on a collaboration between Java and the Rserve interface. The Elestat project is accessible via the website http://www.Elestat.be.

INTRODUCTION

A growing number of electronic learning environments have begun their march through the field of education. Many e-learning environments for statistical education aim at providing on-line course material for distance learning (Graham et al., 2000, and Stephenson, 2001). Some examples of this material are theory content, example exercises, applets and self-tests. Rather than only designing static learning content, the student must be provided with a tool to test his or her knowledge of the subject. It is hard to work out a self-test module that removes the opportunity to exchange correct answers between students, and thus resists cheating while avoiding repetition of questions. Moreover, a typical self-test consists of a database containing many multiple-choice questions from which a random sample is taken and presented to each individual student (McLeod et al., 2003). If the generated database is sufficiently large, this structure will indeed solve the problem of students exchanging correct answers. But what if an enthusiastic student wants to use the self-test very frequently? The probability that this student gets the same question more than once is surely present. In statistics, however, exercises with only multiple-choice questions are not always appropriate. A very important topic in the field of applied statistics is the analysis of datasets. Each analysis includes the calculation of data-dependent quantities (e.g., test statistics, p-values, means, confidence intervals). Apart from multiple-choice questions, where the correct computed value is listed among the selectable choices, open questions, where the student needs to fill in calculated values, are highly desirable. Although this type of question may be very good, it still has the drawback of no longer being of interest to the student when presented a second time. In this paper, we introduce a web-based self-test for statistical data analysis which generates a randomized dataset for every student. The most important advantage of this approach is that even when a student is presented the same problem with the same questions twice, the data analysis will be different each time. As a consequence, the correct conclusions are possibly different too. This self-test tool is part of a larger e-learning environment (www.Elestat.be).

At Ghent University, the exercise tool has been used in a basic statistics course for 3rd Bachelor students in bio-engineering sciences. The first section contains a more detailed discussion of the setup of such an electronic exercise environment. In the next section, some technical specifications of the random generator and navigation tools are given. Finally, the results of a survey are discussed in the final conclusion.
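Elestat computes its solutions with R through the Rserve interface; purely as a language-neutral illustration of the same idea, the sketch below perturbs a stored dataset per student and derives the matching answer key on the fly. The base data, the perturbation scheme, and the quantities checked are all assumptions for the example, not the project's actual code.

    import random
    import statistics

    # Hypothetical base dataset for one exercise.
    BASE_DATA = [4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 5.2]

    def randomized_exercise(student_seed):
        """Give each student a different dataset and compute the
        correct answers for that dataset in real time."""
        rng = random.Random(student_seed)
        # Resample with replacement and add small noise so every
        # student sees a different dataset (assumed scheme).
        data = [round(x + rng.gauss(0, 0.2), 2)
                for x in rng.choices(BASE_DATA, k=len(BASE_DATA))]
        solution = {"mean": statistics.mean(data),
                    "sd": statistics.stdev(data)}
        return data, solution

    data, solution = randomized_exercise(student_seed=7)

Because the solution is computed from the generated data rather than stored, a copied answer is almost surely wrong for the copier's own dataset, which is exactly the anti-cheating property described above.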
... We are assuming that the order of items does not affect students' results. This is consistent with the results obtained in the experiment carried out by McLeod, Zhang, and Yu (2003). ...
Article
Subjects’ decisions in multiple-choice tests are an interesting domain for the analysis of decision making under uncertainty. When the test is graded using a rule that penalizes wrong answers, each item can be viewed as a lottery in which a rational examinee would choose whether to omit (sure reward) or answer (take the lottery) depending on risk aversion and level of knowledge. We formalize students as heterogeneous decision makers with different risk attitudes and levels of knowledge. Building on IRT, we compute the optimal penalty given students’ optimal behavior and the trade-off between bias and measurement error. Although MCQ examinations are frequently used, there is no consensus as to whether a penalty for wrong answers should be used. For example, examinations for medical licensing in some countries include MCQ sections with a penalty, while in others there is no penalty for wrong answers. We contribute to this discussion with a formal analysis of the effects of penalties; our simulations indicate that the optimal penalty is relatively high for perfectly rational students but also when they are not fully rational: even though the penalty discriminates against risk-averse students, this effect is small compared with the measurement error that it prevents.
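For context (this baseline is standard test theory, not taken from the paper itself): the classical correction for guessing chooses the penalty so that a blind guess has zero expected score. With k options, a reward of 1 for a correct answer, and a penalty of p for a wrong one,

    \[
    \mathbb{E}[\text{blind guess}]
      = \frac{1}{k}\cdot 1 \;-\; \frac{k-1}{k}\,p \;=\; 0
    \quad\Longleftrightarrow\quad
    p^{*} = \frac{1}{k-1},
    \]

so p* = 1/3 for four-option items. The abstract's "relatively high" optimal penalty can be read against this zero-expected-value baseline.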
... The order in which questions are presented in a test has been shown to have minimal influence on the overall scores obtained by students (McLeod, Zhang & Yu, 2003). Questions may be arranged according to topic, similarity of concepts, difficulty order or simply at random. ...
Article
Full-text available
The appropriate analysis of students’ responses to an assessment is an essential step in improving the quality of the assessment itself as well as staff teaching and student learning. Many academics are unfamiliar with the formal processes used to analyze assessment results; the standard statistical methods associated with analyzing the validity and reliability of an assessment are perceived as being too difficult for academics with a limited understanding of statistics. This inability of academics to apply conventional statistical tools with authority often makes it difficult for them to make informed judgements about improving the quality of the questions used in assessments. We analyzed students’ answers to a number of selected response assessments and examined different formats for presenting the resulting data to academics from a range of disciplines. We propose the need for a set of simple but effective visual formats that will allow academics to identify questions that should be reviewed before being used again and present the results of a staff survey which evaluated the response of academics to these presentation formats. The survey examined ways in which academics might use the data to assist their teaching and students’ learning. We propose that by engaging academics with a formal reflection of students’ responses, academic developers are in a position to influence academics’ use of specific items for diagnostic and formative assessments.
Article
Assessment Methods in Statistical Education: An International Perspective provides a modern, international perspective on assessing students of statistics in higher education. It is a collection of contributions written by some of the leading figures in statistical education from around the world, drawing on their personal teaching experience and educational research. The book reflects the wide variety of disciplines, such as business, psychology and the health sciences, which include statistics teaching and assessment. The authors acknowledge the increasingly important role of technology in assessment, whether it be using the internet for accessing information and data sources or using software to construct and manage individualised or online assessments.
Article
General chemistry tests from the Examinations Institute of the Division of Chemical Education of the American Chemical Society have been analyzed to identify factors that may influence how individual test items perform. In this paper, issues of item order (position within a set of items that comprise a test) and answer order (position of correct answer relative to incorrect distractors) are discussed. Answer order is identified as potentially important, particularly for conceptually based items. When the correct answer appears earlier among the answer choices, there is some greater propensity for student performance to be better. Item-order effects are also possible, particularly when students encounter several challenging items consecutively. Performance on the next item may be lower than expected, possibly because of cognitive-load effects.
Article
Full-text available
The paper presents the main aspects of the classification system, database design, and software implementation of a multiple-choice examination system for general chemistry, used to generate tests for student evaluation. The testing system was used to generate items for multiple-choice examinations for first-year undergraduate students in Material Engineering and Environmental Engineering at the Technical University of Cluj-Napoca, Romania, who all attend the same General Chemistry course.
Article
Full-text available
Building on Item Response Theory, we introduce students’ optimal behavior in multiple-choice tests. Our simulations indicate that the optimal penalty is relatively high, because although correction for guessing discriminates against risk-averse subjects, this effect is small compared with the measurement error that the penalty prevents. This result obtains when knowledge is binary or partial, under different normalizations of the score, when risk aversion is related to knowledge, and when there is a pass-fail break point. We also find that the mean degree of difficulty should be close to the mean level of knowledge and that the variance of difficulty should be high.
Article
Three item-difficulty sequence forms of an achievement test were administered to sixth-grade students. No relationship between item-difficulty sequence and test performance, reliability, or item difficulty and discrimination was discovered.
Article
90 students in introductory psychology responded to a questionnaire designed to assess test-taking strategies on multiple-choice tests. The data suggested that previous studies may not have actually tested item-difficulty sequence effects, since item sequence is under examinees' control. Test constructors have been advised to arrange the items in achievement tests in order of increasing difficulty (e.g., Anastasi, 1976; Thorndike & Hagen, 1961) according to the contention that easy items in the beginning instill confidence and minimize the debilitating effects of test anxiety. Evidence supporting this practice has been found for tests administered under both speeded (Flaugher, Melton, & Myers, 1968; Towle & Merrill, 1975) and power conditions (Hambleton & Traub, 1974). Numerous other studies have found results which do not justify the practice (e.g., Huck & Bowers, 1972; Marso, 1970). In Hambleton and Traub's study, which found a sequence effect for a power test, control procedures ensured all subjects attempted the items in the order presented. Ostensibly, other studies which did not use such controls made the implicit assumption that all subjects used the same strategy of answering items in sequence. The present study tested the validity of this assumption. Method.-Subjects were 39 female and 51 male introductory psychology students. The Examination Strategy Questionnaire, constructed especially for this study, contained 6 questions to assess the strategies students use for taking multiple-choice achievement tests. Question 1 asked the subject to select from a list of strategies the one that best describes how he usually takes multiple-choice tests. Space was provided for the description of other strategies. Questions 2, 3, and 4 asked the subject to indicate the strategy he would use to take tests having known item-difficulty sequences; namely, hard-to-easy, easy-to-hard, and random sequences, respectively. For Questions 5 and 6, the subject indicated which one of the three types of tests described in Questions 2, 3, and 4 he would most prefer and least prefer to take if given the choice. The questionnaire was administered during the first 15 min. of a class meeting. Results.-Sixty-two subjects (69%) reported that they used strategies by which they seek out and answer the easy items first, leaving the difficult ones for last. Only 16 (18%) indicated that they usually answer items sequentially without skipping any. Twelve subjects (13%) described other strategies. While 84 subjects (93%) would use their usual strategy for a random sequence, 30 (33%) would use a different strategy for a hard-to-easy sequence, and 21 (23%) would use a different strategy for an easy-to-hard sequence. Fifty-eight
Article
Administered the Test Anxiety Scale for Children and the CMA scale to 332 6th graders. Later, Ss were given an intelligence test under a number of experimental conditions designed to induce varying amounts of stress. Results were analyzed by means of 2 (anxiety) × 5 (experimental conditions) × 2 (sex) analyses of covariance, Ss having been classified as high or low anxious on the basis of their anxiety-scale scores. These analyses revealed that none of the effects of the main independent variables or of their interactions were significant. The results support neither of the hypotheses: that high-anxious Ss would be more adversely affected by stress, or that test anxiety is more directly related to test performance than is general anxiety. (French summary)
Article
After examining this study, you may decide never again to be concerned about item order in your examinations.