Copyright © 2023 by Author/s and Licensed by Modestum. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Agricultural and Environmental Education
2023, 2(1), em002
e-ISSN: 2752-647X
https://www.agrenvedu.com/
Research Article OPEN ACCESS
Using artificial intelligence to create biology multiple choice
questions for higher education
Nanda Eska Anugrah Nasution 1*
1 UIN Kiai Haji Achmad Siddiq Jember, Jawa Timur, INDONESIA
*Corresponding Author: nsteska@gmail.com
Citation: Nasution, N. E. A. (2023). Using artificial intelligence to create biology multiple choice questions for higher education. Agricultural and
Environmental Education, 2(1), em002. https://doi.org/10.29333/agrenvedu/13071
ARTICLE INFO
ABSTRACT
Received: 11 Mar. 2023
Accepted: 11 Mar. 2023
This study aims to determine the validity, reliability, level of difficulty, and discrimination power of an artificial intelligence (AI)-generated collection of biology questions for higher education. Students' responses to AI-generated questions are also presented in this study. A sample of 272 students was selected using a random sampling technique to answer a series of multiple-choice questions and complete a questionnaire. Based on the research findings, 20 of the 21 questions generated by ChatGPT AI are valid. Cronbach's alpha coefficient was determined to be 0.65 (fairly reliable) for the twenty valid questions. Based on student responses to questions generated by ChatGPT's AI, it was determined that 79% of students indicated that the AI-generated questions were relevant to the class subject. 72% of students reported that the clarity of AI-generated questions was acceptable. 73% of students reported that the accuracy of AI-generated questions was good.
Keywords: ChatGPT, multiple choice questions, artificial intelligence, validity, reliability
INTRODUCTION
Algorithms driven by machine-learning technologies are now reaching maturity, and ChatGPT is one such innovation. ChatGPT is an interactive chatbot created by OpenAI, a California-based artificial intelligence (AI) startup (Susnjak, 2022). OpenAI's ChatGPT is a large language model. ChatGPT AI was trained on a massive corpus of text data using a deep learning algorithm to generate human-like replies to natural language questions (ChatGPT, 2023). The ChatGPT AI bot is currently accessible at https://chat.openai.com/chat.
AI natural language processing (NLP) technologies, such as ChatGPT AI, provide a means through which computers may engage with human language. A crucial stage in NLP, known as tokenization, is the transformation of unstructured information into organized text appropriate for computing (Hosseini et al., 2023). ChatGPT AI is interactive: it is able to comprehend what is being requested and to deliver it, provided the request complies with application policies and the necessary data are available. For example, if you ask a search engine such as Google to provide a list of questions on a particular topic, Google will return links to websites containing information relevant to the query. When the same request is given to ChatGPT AI, the application generates the questions directly in the chat window.
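As a rough illustration of the tokenization step mentioned above, the short Python sketch below splits a sentence into simple word-level tokens. It is only a conceptual example: large language models such as ChatGPT actually rely on subword (byte-pair encoding) tokenizers rather than simple pattern splitting, and the example sentence is invented for illustration.

import re

# Illustrative word-level tokenization; production LLMs such as ChatGPT
# use subword (byte-pair encoding) tokenizers instead.
def tokenize(text: str) -> list[str]:
    # Lowercase the text, then keep runs of letters and digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

sentence = "Write me a multiple choice question about cell biology."
tokens = tokenize(sentence)
print(tokens)           # ['write', 'me', 'a', 'multiple', 'choice', ...]
print(len(tokens), "tokens")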
The emergence of ChatGPT AI is similar to the emergence of other innovative technologies that, if used appropriately, have the potential to benefit education, even though ChatGPT AI can also be used for activities that are not acceptable in the academic sector. Students, for example, may use ChatGPT AI to generate assignments such as essays. Teachers, however, may be able to use AI to spot AI-created works.
Teachers can use ChatGPT AI in a variety of ways, including asking information-related questions, confirming the accuracy of data, reviewing topics, and so on. Teachers can also request that ChatGPT AI generate multiple-choice questions for tests. Obviously, in its current version, ChatGPT AI cannot create an assessment instrument that accurately measures a learning objective unless it is given explicit instructions by an expert or teacher. However, it is not impossible that in the future ChatGPT AI may be able to generate complex questions if it has access to a huge amount of data and has received extensive training.
A question arises regarding the form of questions that the current version of ChatGPT AI is capable of compiling. How valid and reliable are the question sets generated by ChatGPT AI? What is the difficulty level of the questions created by ChatGPT AI? What do students think about the questions created by ChatGPT AI? Are they easy to read and understand? Are they relevant to the material being studied? Are they comparable to questions posed by humans?
Reliability and validity are, at a minimum, the two most important and essential aspects to consider when evaluating any
measurement instrument or tool used (Mohajan, 2017). A measurement instrument is valid when it measures what it is intended
to measure (Muijs, 2011). In other words, if an instrument measures a required variable accurately, it is termed a valid instrument for that variable (Ghazali, 2016). In comparison, reliability is defined as the degree to which test scores are free of measurement error (Muijs, 2011). It is a measure of the stability or internal consistency of an instrument used to measure a particular variable (Jackson, 2003). Multiple-choice questions are regarded as having a high level of reliability since they are scored objectively (Considine et al., 2005; Haladyna, 1999). Validity and reliability are related. It is possible for an instrument to be reliable but not valid; however, it cannot be valid if it is not reliable (Jackson, 2003). In other words, a valid instrument must also be reliable (Ghazali, 2016).
The quality of a multiple-choice test instrument can be determined by its validity and reliability, as well as by its level of difficulty and discrimination power (Considine et al., 2005; Friatma & Anhar, 2019; Setiawaty et al., 2017; Rao et al., 2016; Salwa, 2012). An item's difficulty corresponds to the proportion of correct responses (McCowan & McCowan, 1999). It is the frequency with which test-takers select the appropriate response (Thorndike et al., 1991). Items with a higher difficulty index are less difficult. A question that was answered correctly by 75% of test-takers has a difficulty level of 0.75; a question that was answered correctly by 35% of test-takers has a difficulty level of 0.35 (McCowan & McCowan, 1999). Item discrimination contrasts the proportion of high scorers and low scorers who correctly answer a given item. It refers to the degree to which an item discriminates between students in the high and low groups. The whole test and each individual item should assess the same concept. High performers should be more likely to answer a good question correctly, while poor performers should be more likely to answer it incorrectly (McCowan & McCowan, 1999). For example, an item answered correctly by 90% of the high-scoring group but by only 40% of the low-scoring group discriminates well between the two groups, whereas an item answered correctly at the same rate in both groups does not discriminate at all.
This study aims to determine the validity, reliability, level of difficulty, and discrimination power of an AI-generated collection of biology questions for higher education. Students' responses to AI-generated questions are also presented in this study.
METHODS
This research is a descriptive quantitative analysis to describe the validity and reliability of ChatGPT AI's questions. Before conducting the research, questions obtained from ChatGPT AI were compiled and administered to students. The steps of the research are described in more detail below.
Accessing ChatGPT Artificial Intelligence
The researcher accessed the ChatGPT AI website in 2023, created an account, and logged into the application. The 30 January 2023 version of ChatGPT AI was used (Figure 1).
Figure 1. View of the publicly available ChatGPT AI bot landing page after login (ChatGPT, 2023)
Nasution / Agricultural and Environmental Education, 2(1), em002 3 / 11
Creating Questions
The researcher asked ChatGPT AI to create questions using the query "write me a multiple choice question with one correct answer option and four wrong answer options about <subject> for bachelor's degree, tag the correct answer", where <subject> is one of seven basic biology topics discussed in high school and university biology subjects. The seven topics and the distribution of questions generated by ChatGPT AI can be seen in Table 1.
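The questions in this study were generated through the public chat interface; for readers who wish to reproduce this step programmatically, the sketch below shows how the same prompt could be sent through the OpenAI Python client. The model name, the reliance on an environment variable for the API key, and the loop over subjects are illustrative assumptions and do not describe the procedure used in this study.

from openai import OpenAI

# Assumes the OPENAI_API_KEY environment variable is set.
client = OpenAI()

PROMPT = (
    "write me a multiple choice question with one correct answer option "
    "and four wrong answer options about {subject} for bachelor's degree, "
    "tag the correct answer"
)

subjects = ["change and growth", "cell", "biodiversity", "genetics",
            "evolution", "ecology", "biotechnology"]

for subject in subjects:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT.format(subject=subject)}],
    )
    print(f"--- {subject} ---")
    print(response.choices[0].message.content)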
In accordance with the request, ChatGPT AI successfully created 21 questions, each with five multiple-choice options, one of which was the correct answer and four of which were incorrect. ChatGPT AI also marked the correct answer for each question. The questions created by ChatGPT AI were written in English; they were then translated into Indonesian and evaluated by English and Indonesian lecturers with expertise in both languages. The researcher then compiled the 21 questions in a Google Form and administered them to students in person. Students were presented with the questions in both English and Indonesian. The test was administered under strict supervision and with closed books to ensure that students' responses were based solely on their own knowledge and not on the assistance of others or of the internet/books. The exam lasted 42 minutes, with two minutes allotted to each question.
Students' Responses to Artificial Intelligence-Generated Questions
We gathered student responses to the AI-generated questions using the criteria developed by Susnjak (2022) in his research to assess AI responses. After completing the AI-generated questions, students completed this questionnaire. Students were told that the questions they had just worked on had been created by AI, and they were given 10 minutes to complete the questionnaire. Only students who were willing to complete the questionnaire did so (it was not required of all students). Table 2 displays the response questionnaire criteria.
Participants and Data Collection
This study was carried out at the department of science education at a state university in East Java, Indonesia. A sample of 272 students was selected using a random sampling technique from two study programs, namely biology education and natural science education. Not all students in the classes answered the questions, only those who wished to do so, and only students who were willing to complete the response questionnaire were asked to do so. Of the students who answered the questions, 68% (185 students) also completed the response questionnaire.
The majority (38.97%, n=106) of the participants were aged 20 years. This was followed by 21 years (30.88%, n=84), 19 years (21.32%, n=58), 22 years (6.98%, n=19), 23 years (1.47%, n=4), and 24 years (0.36%, n=1). Among the students, 231 (84.92%) were female and 41 (15%) were male. 133 participants (48.9%) were students of the biology education study program, and 139 participants (51.1%) were students of the natural science education study program. The participants were drawn from all levels of the undergraduate program. However, the majority (50.37%, n=137) of them were third-year students. Second-year students accounted for 47.42% (n=129), fourth-year students for 1.1% (n=3), fifth-year students for 0.7% (n=2), and first-year students for only 0.36% (n=1).
Statistical Analysis
All statistical analyses were performed using IBM SPSS Statistics 26 software. The validity of the questions was determined using the Pearson product-moment correlation (Ahrens et al., 2020; Cho et al., 2006; Harahap et al., 2019; Mutmainah & Isdiati, 2022; Salwa, 2012). The reliability of the questions was determined using the Cronbach's alpha value (Ahrens et al., 2020; Cho et al., 2006; Harahap et al., 2019; Mutmainah & Isdiati, 2022; Salwa, 2012). The level of difficulty of the questions was determined using the following formula from McCowan and McCowan (1999):

Difficulty index (P) = (number of test-takers who answered an item correctly) / (total number of test-takers).
Table 1. Biology subjects and distribution of questions generated by ChatGPT AI
Biology subject | Number of questions | Question number
Change and growth | 3 | 1, 2, & 3
Cell | 3 | 4, 5, & 6
Biodiversity | 3 | 7, 8, & 9
Genetics | 3 | 10, 11, & 12
Evolution | 3 | 13, 14, & 15
Ecology | 3 | 16, 17, & 18
Biotechnology | 3 | 19, 20, & 21
Table 2. Questionnaire of student responses to AI-generated questions
Criteria: Relevance, Clarity, Accuracy, Precision, & Depth
The difficulty level of the questions is classified as follows: difficult if below 0.3, medium if between 0.3 and 0.7, and easy if above 0.7. Using the following formula from Salwa (2012), the discrimination power of the questions is determined:

Discrimination index (D) = (number of top test-takers who answered an item correctly / total number of top test-takers, i.e., the top 27% of all students) - (number of bottom test-takers who answered an item correctly / total number of bottom test-takers, i.e., the bottom 27% of all students).

The discrimination power of the questions is classified as follows: poor if below 0.2, adequate if between 0.2 and 0.4, good if between 0.4 and 0.7, and excellent if over 0.7. If the result is negative, the discrimination power of the item is inadequate, and the item must be eliminated. Students' responses to AI-generated questions were analyzed descriptively.
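The analyses above were carried out in SPSS; as a minimal illustrative sketch of the same item statistics, assuming the responses are scored as a 0/1 matrix with one row per student and one column per question, the Python code below computes item-total Pearson correlations (validity), Cronbach's alpha (reliability), the difficulty index P, and the discrimination index D based on the top and bottom 27% of scorers. The random matrix is only a placeholder standing in for the actual 272 x 21 response data.

import numpy as np
from scipy import stats

# Placeholder 0/1 response matrix (272 students x 21 items); replace with real data.
rng = np.random.default_rng(1)
scores = (rng.random((272, 21)) > 0.4).astype(int)

total = scores.sum(axis=1)
n_students, n_items = scores.shape

# Validity: Pearson correlation between each item and the total score.
validity = [stats.pearsonr(scores[:, i], total) for i in range(n_items)]

# Reliability: Cronbach's alpha.
alpha = (n_items / (n_items - 1)) * (
    1 - scores.var(axis=0, ddof=1).sum() / total.var(ddof=1)
)

# Difficulty index P: proportion of correct answers per item.
P = scores.mean(axis=0)

# Discrimination index D: correct proportion in the top 27% minus the bottom 27%.
n_group = int(round(0.27 * n_students))
order = np.argsort(total)
bottom, top = order[:n_group], order[-n_group:]
D = scores[top].mean(axis=0) - scores[bottom].mean(axis=0)

def difficulty_label(p):
    return "Difficult" if p < 0.3 else ("Easy" if p > 0.7 else "Medium")

def discrimination_label(d):
    if d < 0.2:
        return "Poor"
    if d < 0.4:
        return "Adequate"
    return "Good" if d <= 0.7 else "Excellent"

print(f"Cronbach's alpha = {alpha:.3f}")
for i in range(n_items):
    r, p_value = validity[i]
    print(f"Item {i + 1:2d}: r={r:.3f} (p={p_value:.3f}), "
          f"P={P[i]:.2f} {difficulty_label(P[i])}, "
          f"D={D[i]:.2f} {discrimination_label(D[i])}")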
RESULTS
ChatGPT Artificial Intelligence-Generated Questions
ChatGPT AI successfully generated 21 questions. Appendix A shows a list of all questions.
Validity
The results of the validity test of all ChatGPT AI-generated questions can be seen in Table 3.
Reliability
The results of the reliability test for all ChatGPT AI-generated questions may be viewed in Table 4 if the invalid question is not
removed (question no. 17), and in Table 5 if the invalid question is removed.
Level of Difficulty and Discrimination Power
The results of the level of difficulty and discrimination power of all ChatGPT AI-generated questions can be seen in Table 6.
Student Responses to Artificial Intelligence-Generated Questions
The percentage of student responses to questions generated by ChatGPT AI can be seen in Figure 2.
Table 3. The results of the validity of all ChatGPT AI-generated questions
Question number | Pearson correlation | Sig. (2-tailed) | Description
1 | .321** | .000 | Valid
2 | .211** | .000 | Valid
3 | .258** | .000 | Valid
4 | .313** | .000 | Valid
5 | .539** | .000 | Valid
6 | .401** | .000 | Valid
7 | .15** | .000 | Valid
8 | .365** | .000 | Valid
9 | .391** | .000 | Valid
10 | .502** | .000 | Valid
11 | .478** | .000 | Valid
12 | .421** | .000 | Valid
13 | .338** | .000 | Valid
14 | .318** | .000 | Valid
15 | .222** | .000 | Valid
16 | .467** | .000 | Valid
17 | .038 | .533 | Invalid
18 | .429** | .000 | Valid
19 | .345** | .000 | Valid
20 | .39** | .000 | Valid
21 | .478** | .000 | Valid
Table 4. The results of the reliability of all ChatGPT AI-generated questions (invalid question not removed)
Cronbach's alpha | Number of items
.623 | 21
Table 5. The results of the reliability of all ChatGPT AI-generated questions (invalid question removed)
Cronbach's alpha | Number of items
.655 | 20
DISCUSSION
According to Kimberlin and Winterstein (2008), validity is generally described as the degree to which an instrument measures what it claims to measure. An instrument must be valid so that it can be used to measure its intended subject. Using the Pearson product-moment correlation method to assess the validity of the questions, it was determined that 20 out of 21 items were valid, while one item was invalid. The invalid question is number 17, which is related to ecology. The results of the validity test indicate that 20 of the 21 questions generated by AI are valid and may be used.
Question number 17, which is invalid, asks students to choose the term used to describe the way by which organisms obtain energy from their environment. Option D (photosynthesis) is the correct answer, selected by 90 students (33%). Option A (metabolism), which 113 students (41.54%) selected, option B (ecosystem), which 40 students (14.7%) selected, option C (biodiversity), which 21 students (7.7%) selected, and option E (biogeography), which eight students (2.9%) selected, are all incorrect answer choices. Based on student choices, more students selected option A (an incorrect answer option) than the right answer. It can be inferred that either answer choice A is an excellent distractor or there is a problem with question number 17. According to follow-up interviews with three randomly selected students who claimed to have chosen answer option A, they were confused by the question sentence. If the question had been "how do plants get energy from their environment?" or "how do organisms obtain energy from nature?", it is likely that the students would have chosen option D (the correct one). Yet, given that the wording of question 17 is "how organisms obtain energy from their environment", students misinterpreted the organisms at issue as animals and plants, and the environment referred to as other living species (i.e., prey).
Table 6. The results of the level of difficulty and discrimination power of all ChatGPT AI-generated questions
Question number | Level of difficulty (scale) | Level of difficulty (description) | Discrimination power (scale) | Discrimination power (description)
1 | 0.87 | Easy | 0.19 | Poor
2 | 0.15 | Difficult | 0.21 | Adequate
3 | 0.36 | Medium | 0.32 | Adequate
4 | 0.74 | Easy | 0.4 | Good
5 | 0.39 | Medium | 0.7 | Good
6 | 0.88 | Easy | 0.29 | Adequate
7 | 0.17 | Difficult | 0.08 | Poor
8 | 0.71 | Easy | 0.44 | Good
9 | 0.82 | Easy | 0.36 | Adequate
10 | 0.85 | Easy | 0.45 | Good
11 | 0.57 | Medium | 0.63 | Good
12 | 0.56 | Medium | 0.55 | Good
13 | 0.70 | Medium | 0.34 | Adequate
14 | 0.82 | Medium | 0.29 | Adequate
15 | 0.39 | Medium | 0.19 | Poor
16 | 0.92 | Easy | 0.21 | Adequate
17 | 0.33 | Medium | 0 | Poor
18 | 0.38 | Medium | 0.53 | Good
19 | 0.88 | Easy | 0.23 | Adequate
20 | 0.41 | Medium | 0.51 | Good
21 | 0.93 | Easy | 0.22 | Adequate
Figure 2. The percentage of student responses to questions generated by ChatGPT AI (Source: Author’s own elaboration)
In addition, language and sentence issues that may be present in multiple-choice questions created by ChatGPT AI, such as question number 17 in this research, could be corrected by experts using content and face validity, as suggested by Considine et al. (2005) and Harahap and Nasution (2022). Nevertheless, we did not do so in our research because we wanted to ensure that the questions generated by ChatGPT AI were free from any human adjustment.
Cronbach's alpha was used to assess the scale's internal consistency. Cronbach's alpha coefficient was determined to be 0.623 if item 17 (the invalid item) was not removed, and 0.655 if item 17 was removed. The acceptable values for Cronbach's alpha vary depending on the source. According to van Griethuijsen et al. (2014), the acceptable values of Cronbach's alpha are 0.7 or 0.6. Arulogun et al. (2020), George and Mallery (2003), Morgan et al. (2004), Rii et al. (2020), Taber (2018), and Wongpakaran and Wongpakaran (2012) emphasized the same point, namely that a Cronbach's alpha above 0.6 can be recognized as indicating a reliable instrument. If this value is adhered to, then the multiple-choice questions generated by ChatGPT AI in this study may be deemed reliable. Several other sources, however, state that the acceptable values for Cronbach's alpha are 0.8 or even 0.9; if these figures are used, the multiple-choice questions created by ChatGPT AI in this study may be regarded as unreliable.
By evaluating the level of difficulty of the questions, it was determined that, of the 21 questions created by ChatGPT AI, nine were classified as easy, 10 were classified as medium, and two were classified as difficult. It is preferable to use a proportionate distribution of easy, medium, and difficult multiple-choice questions. In this context, proportionate means that there should be at least twice as many questions at the medium level as at the easy and difficult levels, with an equal number of questions at the easy and difficult levels; for a 21-item test, this would correspond to roughly five easy, eleven medium, and five difficult questions. ChatGPT AI developed multiple-choice questions with nearly equal numbers of easy and medium items, and only two items (9.5%) are classified as difficult. It would be preferable for the questions at the easy and difficult levels to be revised so that the distribution is more proportional, or so that they become questions of medium difficulty. Rao et al. (2016) stated that, ideally, multiple-choice questions have a medium level of difficulty. Of course, this should be adjusted depending on the aim of the assessment.
By assessing the discrimination power of the questions, it was determined that, of the 21 questions created by ChatGPT AI, four had poor discrimination power, nine had adequate discrimination power, and the remaining eight had good discrimination power. Questions with poor discrimination power should be modified so that their discrimination power is adequate or better. There are no items with negative discrimination power, suggesting that there are no questions that should be deleted based on the discrimination power analysis. One of the items, however, has a discrimination value of zero, indicating that this item has very poor discriminatory power, since the number of students who answered it correctly in the upper group and the lower group is identical. This question turned out to be number 17, which was classified as invalid based on the validity test, so it is not unexpected that this question has very poor discriminating power. Moreover, the difficulty index and discrimination index are reciprocally related (Chauhan et al., 2013; Mehta & Mokhasi, 2014; Rao et al., 2016; Suruchi & Rana, 2014). For instance, if a question is determined to have a low level of difficulty and poor discriminating power, the question should be revised (Rao et al., 2016).
Based on student responses to questions produced by ChatGPT's AI, it was determined that 79% of students indicated that the AI-generated questions were relevant to the departmental subject they study. This finding suggests that ChatGPT AI is capable of generating questions pertaining to the specified subject, in this case biology in natural science, including change and growth, cell, biodiversity, genetics, evolution, ecology, and biotechnology. 72% of students reported that the questions generated by AI were clear. This suggests that the majority of students are capable of comprehending the questions posed by ChatGPT AI. The clarity of the questions is determined by three survey items. The first item on the questionnaire asks whether the questions generated by ChatGPT AI are simple to comprehend; 66% of students indicated that the questions were straightforward. The second item asks whether the questions generated by ChatGPT AI are logically structured and ordered; according to 76% of students, the questions were well-structured and logically ordered. The last item asks whether the questions generated by ChatGPT AI employ proper language; 73% of students feel the question language is suitable. The questions in an assessment must be clear and concise. Questions that are difficult to understand will surely make it harder for students to answer, and there is a chance that students will respond incorrectly not because of their incompetence but because of an error in the question.
73% of students stated that the AI-generated questions were accurate. This means that the majority of students consider the questions created by AI to be accurate; they see no grammatical or conceptual errors in the questions. However, one cannot depend solely on students' opinions to confirm the accuracy of a question; several experts should be consulted to validate the questions. Nevertheless, as stated previously, the questions in this study were not evaluated by experts, in order to preserve them exactly as generated by AI.
74% of students indicated that the questions generated by AI were precise. This suggests that the majority of students consider AI-generated questions to be explicit and detailed. Students comprehend the intent of the questions and the required responses. If questions are not clear and explicit, it is possible that students may have difficulty answering them.
71% of students indicated that the questions posed by AI were of sufficient depth. The majority of students found that the questions generated by ChatGPT AI were challenging, not overly simple, and appropriate for their college or university level. As was done in this study, measuring the difficulty level of the questions is another method of determining whether the questions are too easy or too difficult. Just two of the twenty-one questions generated by AI are difficult, while nine are quite easy.
The majority of students responded positively to the questions generated by ChatGPT's AI, according to the results of the student response questionnaire. Therefore, teachers can use AI to assist them in constructing an assessment tool, but this must be complemented by the teacher's capacity to provide the AI with clear instructions and to verify and optimize the resulting assessment tool as needed. Further study is required to determine whether students can differentiate between questions developed by AI and those created by humans, as well as their perspectives on the conditions for AI-created questions.
Given that constructing multiple-choice questions is a complex and time-consuming process (Rao et al., 2016), it would be
highly beneficial if AI could aid teachers or the education sector in the future in developing standardized and high-quality multiple-
choice questions. Nevertheless, the present version of ChatGPT AI has several limitations, as mentioned by OpenAI on its website
(ChatGPT, 2023), such as the possibility of producing wrong information, harmful instructions, or biased material, and limited
awareness of the world and events after 2021. Quite likely, ChatGPT AI will acquire more data and better training over time,
allowing it to assist its users more effectively.
CONCLUSION
Based on the research findings, twenty of the twenty-one questions generated by ChatGPT AI are valid. An ecology-related question is the only question that is invalid. Cronbach's alpha coefficient was determined to be 0.65 for the twenty valid questions. By assessing the level of difficulty of the questions, it was determined that, of the 21 questions created by ChatGPT AI, nine were rated as easy, 10 were classified as medium, and two were classified as difficult. By assessing the discrimination power of the questions, it was determined that, of the 21 questions created by ChatGPT AI, four had poor discrimination power, nine had adequate discrimination power, and the remaining eight had good discrimination power. Based on student responses to questions generated by ChatGPT's AI, it was determined that 79% of students indicated that the AI-generated questions were relevant to the class subject. 72% of students reported that the clarity of AI-generated questions was acceptable. 73% of students reported that the accuracy of AI-generated questions was good. According to 74% of students, the precision of AI-generated questions was good. 71% of students reported that the depth of the questions generated by AI was acceptable.
Funding: No funding source is reported for this study.
Ethical statement: The author stated that all participants were over the age of 18 and that their participation was entirely voluntary. The author also stated that since no personal data were analyzed and pseudonyms were used in this article, no ethics committee approval was required.
Declaration of interest: No conflict of interest is declared by the author.
Data sharing statement: Data supporting the findings and conclusions are available upon request from the author.
REFERENCES
Arulogun, O. T., Akande, O. N., Akindele, A. T., & Badmus, T. A. (2020). Survey dataset on open and distance learning students' intention to use social media and emerging technologies for online facilitation. Data in Brief, 31, 105929. https://doi.org/10.1016/j.dib.2020.105929
ChatGPT. (2023). ChatGPT. https://chat.openai.com/chat
Chauhan, P. R., Ratrhod, S. P., Chauhan, B. R., Chauhan, G. R., Adhvaryu, A., & Chauhan, A. P. (2013). Study of difficulty level and
discriminating index of stem type multiple choice questions of anatomy in Rajkot. Biomirror, 4(6), 1-4.
Cho, K., Schunn, C. D., & Wilson, R. W. (2006). Validity and reliability of scaffolded peer assessment of writing from instructor and
student perspectives. Journal of Educational Psychology, 98(4), 891-901. https://doi.org/10.1037/0022-0663.98.4.891
Considine, J., Botti, M., & Thomas, S. (2005). Design, format, validity and reliability of multiple choice questions for use in nursing
research and education. Collegian, 12(1), 19-24. https://doi.org/10.1016/S1322-7696(08)60478-3
de Barros Ahrens, R., da Silva Lirani, L., & de Francisco, A. C. (2020). Construct validity and reliability of the work environment
assessment instrument WE-10. International Journal of Environmental Research and Public Health, 17(20), 7364.
https://doi.org/10.3390/ijerph17207364
Friatma, A., & Anhar, A. (2019). Analysis of validity, reliability, discrimination, difficulty and distraction effectiveness in learning
assessment. Journal of Physics: Conference Series, 1387, 012063. https://doi.org/10.1088/1742-6596/1387/1/012063
George, D., & Mallery, P. (2003). SPSS for Windows step by step: A simple guide and reference 11.0 update. Allyn & Bacon.
Ghazali, N. H. M. (2016). A reliability and validity of an instrument to evaluate the school-based assessment system: A pilot study.
International Journal of Evaluation and Research in Education, 5(2), 148-157. http://doi.org/10.11591/ijere.v5i2.4533
Haladyna, T. M. (1999). Developing and validating multiple-choice test items. Lawrence Erlbaum.
Harahap, F., Nasution, N. E. A., & Manurung, B. (2019). The effect of blended learning on students' learning achievement and science process skills in plant tissue culture course. International Journal of Instruction, 12(1), 521-538. https://doi.org/10.29333/iji.2019.12134a
Harahap, M. P., & Nasution, N. E. A. (2022). Validity of computer based learning media to improve junior high school students' learning outcomes on ecosystem topics. META: Journal of Science and Technological Education, 1(1), 31-45.
Hosseini, M., Rasmussen, L. M., & Resnik, D. B. (2023). Using AI to write scholarly publications. Accountability in Research, 1-9.
https://doi.org/10.1080/08989621.2023.2168535
Jackson, S. L. (2003). Research methods and statistics: A critical thinking approach. Thomson Wadsworth.
Kimberlin, C. L., & Winterstein, A. G. (2008). Validity and reliability of measurement instruments used in research. American Journal
of Health-System Pharmacy, 65(23), 2276-2284. https://doi.org/10.2146/ajhp070364
McCowan, R. J., & McCowan, S. C. (1999). Item analysis for criterion-referenced tests. Center for Development of Human Services. https://files.eric.ed.gov/fulltext/ED501716.pdf
Mehta, G., & Mokhasi, V. (2014). Item analysis of multiple choice questions: An assessment of the assessment tool. International Journal of Health Sciences and Research, 4(7), 197-202. https://doi.org/10.1016/j.mjafi.2020.11.007
Mohajan, A. K. (2017). Two criteria for good measurements in research: Validity and reliability. Annals of Spiru Haret University,
17(3), 58-82. https://doi.org/10.26458/1746
Morgan, P. J., Cleave-Hogg, D., DeSousa, S., & Tarshis, J. (2004). High-fidelity patient simulation: Validation of performance checklists. BJA: British Journal of Anaesthesia, 92(3), 388-392. https://doi.org/10.1093/bja/aeh081
Muijs, D. (2011). Doing quantitative research in education with SPSS. SAGE. https://doi.org/10.4135/9781849203241
Mutmainah, I., & Isdiati, A. (2022). Validity and reliability test of a written English test online-based using Google Form.
INTERACTION: Jurnal Pendidikan Bahasa [INTERACTION: Journal of Language Education], 9(1), 89-100.
Rao, C., Kishan, P. H. L., Sajitha, K., Permi, H., & Shetty, J. (2016). Item analysis of multiple choice questions: Assessing an
assessment tool in medical students. International Journal of Education and Psychological Research, 2, 201-214.
https://doi.org/10.4103/2395-2296.189670
Rii, K. B., Choi, L. K., Shino, Y., Kenta, H., & Adianita, I. R. (2020). Application of iLearning education in learning methods for entrepreneurship and elementary school student innovation. Aptisi Transactions on Technopreneurship, 2(2), 131-142. https://doi.org/10.34306/att.v2i2.90
Salwa, A. (2012). The validity, reliability, level of difficulty and appropriateness of curriculum of the English test [PhD thesis,
Diponegoro University].
Setiawaty, R., Sulistyorini, T. B., Margono, & Rahmawat, L. E. (2017). Validity test and reliability of Indonesian language multiple
choice in final term examination. In Proceedings of the 1st International Seminar on Language, Literature and Education (pp. 43-
50). https://doi.org/10.18502/kss.v3i9.2609
Suruchi, S., & Rana, S. S. (2014). Test item analysis and relationship between difficulty level and discrimination index of test items in an achievement test in biology. Paripex-Indian Journal of Research, 3(6), 56-58. https://doi.org/10.15373/22501991/JUNE2014/18
Susnjak, T. (2022). ChatGPT: The end of online exam integrity? arXiv. https://doi.org/10.48550/arXiv.2212.09292
Taber, K. S. (2018). The use of Cronbachs alpha when developing and reporting research instruments in science education.
Research in Science Education, 48, 1273-1296. https://doi.org/10.1007/s11165-016-9602-2
Thorndike, R. M., Cunningham, G. K., Thorndike, R. L., & Hagen, E. P. (1991). Measurement and evaluation in psychology and
education. MacMillan.
van Griethuijsen, R. A. L. F., van Eijck, M. W., Haste, H., den Brok, P. J., Skinner, N. C., Mansour, N., Gencer, A. S., & BouJaoude, S. (2014). Global patterns in students' views of science and interest in science. Research in Science Education, 45(4), 581-603. http://doi.org/10.1007/s11165-014-9438-6
Wongpakaran, T., & Wongpakaran, N. (2012). A comparison of reliability and construct validity between the original and revised
versions of the Rosenberg self-esteem scale. Psychiatry Investigation, 9(1), 54-58. https://doi.org/10.4306/pi.2012.9.1.54
APPENDIX A: ChatGPT AI-GENERATED QUESTIONS
1. Which of the following is NOT a factor that affects aging and biological changes in organisms?
a. Environmental factors
b. Genetic factors
c. Lifestyle choices
d. Planetary alignments
e. Hormonal imbalances
2. Which of the following hormones is primarily responsible for stimulating cell division and growth in animals?
a. Adrenaline
b. Insulin
c. Thyroxine
d. Testosterone
e. Human growth hormone
3. What is the process by which a single cell develops into a multicellular organism called?
a. Differentiation
b. Fertilization
c. Metamorphosis
d. Germination
e. Replication
4. What is the primary function of the endoplasmic reticulum in a eukaryotic cell?
a. Energy production
b. Protein synthesis
c. Waste removal
d. Storage of lipids
e. DNA replication
5. What is the primary function of lysosomes in a eukaryotic cell?
a. Energy production
b. Protein synthesis
c. Waste removal
d. Storage of lipids
e. DNA replication
6. What is the basic unit of life?
a. Atom
b. Molecule
c. Cell
d. Tissue
e. Organ
7. What is the main cause of loss of biodiversity?
a. Climate change
b. Deforestation
c. Overfishing
d. Land use change
e. Increased CO2 emissions
8. What is the definition of biodiversity?
a. The number of different species in a particular habitat
b. The total amount of living matter in a given area
c. The variety of life forms in a particular ecosystem
d. The total mass of all living organisms in a given area
e. The study of different types of plants and animals in a given area
9. What is the term used to describe the variety of life on Earth, including the variety of species, ecosystems, and genes?
a. Biodiversity
b. Geodiversity
c. Climate diversity
d. Technology diversity
e. Cultural diversity
10. What is the study of heredity and variation in organisms called?
a. Botany
b. Zoology
c. Genetics
d. Paleontology
e. Geology
11. What type of molecule carries genetic information in cells?
a. Carbohydrates
b. Lipids
c. Proteins
d. Nucleic acids
e. Vitamins
12. What is the scientific method used to determine the sequence of nucleotides in a DNA molecule?
a. Electrophoresis
b. PCR
c. Sequencing
d. Microscopy
e. Chromatography
13. What is the process by which species of organisms change over generations through natural selection?
a. Regression
b. Mutation
c. Adaptation
d. Evolution
e. Stagnation
14. What theory was developed by Charles Darwin and Alfred Russel Wallace to explain the mechanism of evolution?
a. Plate Tectonics Theory
b. General Relativity
c. Theory of Evolution by Natural Selection
d. Quantum Mechanics
e. The Big Bang Theory
15. What is the main driving force behind evolution?
a. Divine intervention
b. Survival of the fittest
c. A fixed process
d. Mutation of species at random
e. The environment alone
16. What is the study of relationships between organisms and their environment known as?
a. Sociology
b. Geology
c. Ecology
d. Meteorology
e. Archaeology
17. What is the term used to describe the way organisms obtain energy from their environment?
a. Metabolism
b. Ecosystem
c. Biodiversity
d. Photosynthesis
e. Biogeography
18. What is the term used to describe the variety of life found in a specific location?
a. Biodiversity
b. Ecosystem
c. Biogeography
d. Metabolism
e. Photosynthesis
19. What is the application of scientific and engineering principles to the processing of materials by biological agents to provide goods
and services?
a. Nanotechnology
b. Biotechnology
c. Robotics
d. Geotechnology
e. Information technology
20. What is the process of genetically modifying an organism to produce a desired trait or characteristic called?
a. Cloning
b. Fermentation
c. Hybridization
d. Transfection
e. Gene editing
21. What is the production of a large amount of a specific protein using microorganisms such as bacteria or yeast in bioreactors called?
a. Cloning
b. Fermentation
c. Hybridization
d. Transfection
e. Gene editing
... The emergence of artificial intelligence (Nasution, 2023), virtual reality (Freina & Ott, 2015), augmented reality, and other interactive multimedia, as well as Hasanah, Farihah, & Nasution (2022), are examples of the rapid growth of technology and information. Virtual and remote laboratories have arisen as complementary or additional resources for hands-on biology laboratories (Abdulwahed & Nagy, 2009) and provide answers to hands-on lab limitations. ...
... Davis introduced TAM (1989) as a model for understanding the user adoption of emerging technologies. Davis (1985) developed the technology acceptance model (TAM) based on the principle of reasoned action to establish a further universal theoretical framework for conscientious behavior (Nasution, 2023;Liao, Hong, & Wen, 2018). TAM is probably the most commonly used theoretical model of technology studies (Essel & Wilson, 2017). ...
Article
Full-text available
With the advancement of information and technology, virtual and remote laboratories have become supplementary or extra tools for hands-on biology laboratories. In this study, we modified the technology acceptance model to incorporate three additional external variables derived from flow theory in predicting students' acceptance and use of virtual and remote laboratories. This research included 145 college students. These students used virtual and remote laboratories for at least three months. The learning subjects in this research are deoxyribonucleic acid extraction, polymerase chain reaction, gel electrophoresis, deoxyribonucleic acid microarray, and flow cytometry. Using SPSS 25.0, a multiple regression analysis was performed to test the structural model hypothesis. This study validated the association between the basic variables used in the technology acceptance model: perceived ease of use, perceived usefulness, attitudes toward using, behavioral intention, and actual use. There were no surprising discoveries for the technology acceptance model's primary variables. Concentration and perceived enjoyment in the flow theory variables have an extensive relationship with the technology acceptance model variables, perceived usefulness, and perceived ease of use. Meanwhile, one flow theory variable, time distortion, exhibits no significant relationship with perceived usefulness or ease of use. Abstrak: Laboratorium virtual dan jarak jauh menjadi tren yang dimanfaatkan sebagai alat bantu praktikum biologi. Kami memodifikasi model penerimaan teknologi dalam penelitian ini dengan memasukkan tiga variabel eksternal tambahan yang berasal dari teori flow dalam memprediksi bagaimana mahasiswa menerima dan menggunakan laboratorium virtual dan jarak jauh. Penelitian melibatkan 145 mahasiswa. Para mahasiswa ini telah menggunakan laboratorium virtual dan jarak jauh setidaknya tiga bulan. Materi pembelajaran penelitian ini adalah ekstraksi asam deoksiribonukleat (DNA), polymerase chain reaction (PCR), gel electrophoresis, deoxyribonucleic acid microarray, dan flow cytometry. Hubungan antara variabel dasar yang digunakan dalam technology acceptance model yaitu kemudahan penggunaan yang dirasakan (perceived ease of use), kebergunaan yang dirasakan (perceived usefulness), sikap (attitudes toward using), niat perilaku (behavioral intention), dan penggunaan sebenarnya (actual use) divalidasi dalam penelitian ini. Data yang terkumpul dianalisis regresi berganda dengan bantuan SPSS 25. Tidak ada penemuan mengejutkan untuk variabel utama technology acceptance model. Variabel konsentrasi (concentration) dan kesenangan yang dirasakan (perceived enjoyment) pada teori flow memiliki hubungan yang signifikan dengan variabel technology acceptance model, kebergunaan yang dirasakan dan kemudahan penggunaan yang dirasakan. Sedangkan satu variabel teori flow, distorsi waktu (time distortion) tidak menunjukkan hubungan yang signifikan dengan kebergunaan yang dirasakan atau kemudahan penggunaan yang dirasakan.
... Finally, from what is consulted in Lavery et al. (2020) it seems that there is little practice of ABE applied to validation processes, as if this understanding of how to be sufficiently rigorous to achieve the objectivity and truth of an instrument is not yet closed; and suddenly it is already coexisting with the boom of IAGen specifically with ChatGPT (a Large Language Model, LLM), being considered for psychometric and educational tests (Nasution, 2023), or to have a validation process of different tools such as rubrics. However, this type of approach is something that has already been envisioned with the introduction of Learning Management Systems, or Intelligent Tutoring Systems (ITS) as digital learning support tools that have the potential to create individualized and adaptive practice environments (Schmidt & Strasser, 2022). ...
... To generate the questions, the researchers asked ChatGPT AI to create multiple-choice questions with one correct and four incorrect answer choices on basic biology topics discussed in high school and college education. The questions generated by ChatGPT AI were subsequently evaluated by experts and presented to students in both English and Indonesian under strict supervision to ensure that the answers were based solely on their own knowledge (Nasution, 2023). The above, leads to rethinking the whole process, as we could see elaboration work disappearing and only supervisory work being carried out, especially if we are dealing with the allusions that IAGen has (Lingard, 2023). ...
Article
Full-text available
The evolution of the concept of validity is examined in the context of the integration of Generative Artificial Intelligence and ethical stances, and with it, informed decision-making. The methodology used includes the history of concepts as laid out by Koselleck, analyzing how the concept of validity is a fundamental concept. The method used is a literature review, analyzing historical and contemporary perspectives and arguments from influential authors such as Messick and Kane. This conceptual journey leads us to recognize that validity is not a monolithic entity, but a complex fabric of multiple theoretical and practical threads, ranging from the internal logic of evaluations to the repercussions of their application in society. Furthermore, validity is recognized as a complex construct that cannot be simplified to a single aspect or characteristic of a test or evaluation, differentiating between validity and validation. The five historical periods distinguished in the literature that reflect paradigmatic changes in the understanding of validity were: gestational, crystallization, fragmentation, reunification, deconstruction, culminating with the period of diffusion. The most relevant conclusion is that validity is not static but dynamic, evolving with context and application. It also emphasizes the need for continuous validation adapted to emerging challenges, such as Generative Artificial Intelligence (GenAI), with the goal of ensuring that evaluations are accurate and fair amid a growing trend on ideas of quantum computing.
... Instead of viewing AI as a threat to academic integrity, Nigerian tertiary institutions could incorporate AI tools into the assessment process to enhance its effectiveness. For instance, AI can be used to develop adaptive testing systems that tailor questions to individual student abilities, offering a more personalised and accurate measure of performance (Nasution, 2023). With responsible AI adoption, institutions can innovate their assessment strategies and create more meaningful educational experiences for students. ...
Article
Full-text available
Integrating artificial AI technologies in education has revolutionised teaching, learning, and assessment worldwide. In Nigerian tertiary institutions, students increasingly rely on AI tools for assignments, research, and exam preparation, raising concerns about the integrity of traditional assessment methods. This paper explores the impact of AI technologies on academic performance and the challenges they pose to accurately evaluating student capabilities. It argues for the urgent need to redefine assessment strategies in Nigerian higher education to preserve academic standards while harnessing the benefits of AI. The study highlights ethical concerns such as data privacy, access inequality, and over-reliance on AI tools, which can undermine critical thinking skills. It provides countermeasures and policy recommendations, including establishing AI usage guidelines, promoting equitable access to technology, and integrating assessments that prioritise critical thinking and problem-solving skills. By adopting these innovative policies, Nigerian tertiary institutions can enhance the quality of education and ensure that students develop genuine skills and academic excellence. This paper calls for immediate action to align education with the realities of the AI age, ensuring sustainable and authentic student outcomes.
... Recently, LLMs like GPT (OpenAI, 2023) and LLaMa (Touvron et al., 2023) have demonstrated powerful ability as automatic annotators to label training data (Arora et al., 2022;Gilardi et al., 2023). LLMs have been successfully applied in various fields of NLP, including multi-choice questionanswering task (Bitew et al., 2023;Nasution, 2023;Doughty et al., 2024). However, compared to previous fine-tuned methods, mainstream LLMs often fail to achieve a satisfactory performance on DG, as illustrated in Figure 1. ...
... AI is increasingly being utilized in the field of assessments, particularly in creating assessment questions, providing feedback, and grading. AI-driven tools can generate a variety of question types, such as multiple-choice and short answer, with various difficulty, by analyzing existing content to ensure alignment with learning objectives (Lu et al., 2021;Nasution, 2023;Kic-Drgas and Kılıçkaya, 2024). This automation not only saves educators time but also ensures diversity and coverage across topics. ...
Article
Full-text available
AI systems are now capable of providing accurate solutions to questions presented in text format, causing a major problem in assessment integrity. To address this issue, interactive material can be integrated with the questions, preventing current AI systems from processing the requirements. This study proposes a novel approach that combines two important tools: GeoGebra and Moodle. GeoGebra is a widely used tool in schools and universities for creating dynamic and interactive material in the STEM field. On the other hand, Moodle is a popular learning management system with integrated tools capable of generating multiple versions of the same question to enhance academic integrity. We combine these two tools to automatically create unique interactive questions for each student in a computer-based assessment. Detailed implementation steps that do not require prior coding experience or the installation of additional plugins are presented, making the technique accessible to a wider range of instructors. The proposed approach was tested on a group of students and showed enhanced performance in animation-based questions compared to traditional question formats. Moreover, a survey exploring the students’ opinions on the proposed approach reported strong student endorsement of animated questions.
... To fully realize the benefits of this technological integration, it is essential to comprehend the complex interactions that exist between artificial intelligence and the qualitative components of evaluations. Comprehending these intricacies is essential to fully utilize artificial intelligence (AI) to improve assessment procedures 5 and guarantee conformity with academic goals. This review conducts a thorough investigation to uncover the complex dynamics surrounding how the use of AI influences the creation and calibration of multiple-choice questions (MCQs) in modern medical educational assessments, as well as the potential of AI-based feedback in language learning, with a focus on student motivation and 6 introspection. ...
Article
Full-text available
This systematic review focuses on examining how artificial intelligence is included in multiple-choice questionsand how this affects the efficacy and quality of assessments used in education. Several papers investigating theapplication of artificial intelligence in multiple-choice question creation have been found through a thoroughliterature analysis. The present study employed a systematic literature review to comprehensively analyze theexisting literature and underscore the effects of incorporating artificial intelligence into creating multiplechoicequestions on the standard and efficacy of assessments used in education. Between January 2019 andJanuary 2024, we examined papers from credible publications, concentrating on sixteen chosen articles for indepth examination. The results show how artificial intelligence can revolutionize traditional evaluationmethods in education by improving the accuracy, efficiency, and diversity of multiple-choice questions. Whileartificial intelligence models like ChatGPT, Bard, and Bing have shown encouraging results in creating multiplechoice questions, issues with validity, complexity, and reasoning ability still need to be addressed.Notwithstanding its drawbacks, artificial intelligence-driven multiple-choice question holds great potential forenhancing evaluation processes and enhancing educational opportunities in a variety of subject areas. ThisSystematic review highlights the necessity of further research and advancement to fully utilize artificialintelligence in creating multiple-choice questions and its incorporation into frameworks for educationalassessments.
Article
Full-text available
La evaluación del aprendizaje de los estudiantes es un tema de investigación relevante de la didáctica de la matemática. Evaluar en matemática requiere mucho más que la resolución de un ejercicio. Se trata de evaluar todo el proceso. En este sentido, el diseño de evaluaciones no es trivial ni inmediato. Requiere formación, objetivos claros y propuestas relevantes. En este trabajo se analizan las evaluaciones propuestas por tres futuros profesores de matemática y por tres chatbots basados en modelos de Inteligencia Artificial (IA) generativa. Se comparan los tipos de evaluaciones propuestas sobre nociones de estadística (población y muestra) y se determina la funcionalidad de los chatbots como posibles asistentes para la generación de diferentes tipos de evaluaciones. Se concluye que los chatbots pueden resultar en asistentes valiosos a la hora de crear evaluaciones, ya que ofrecen diferentes tipos de evaluaciones, tanto tradicionales, como puede ser una prueba escrita, como no tradicionales, como un proyecto de investigación.
Article
This study aimed to delineate the learning style profile of class XI IPA 2 students during biology lessons at Madrasah Aliyah Raudlatus Syabab Sukowono. The research was conducted at Madrasah Aliyah Raudlatus Syabab, located at Jl. KH. Ahmad Syukri No. 02, Sumber Wringin, Kec. Sukowono, Kab. Jember, East Java. This study employs a quantitative descriptive approach. This research techniques involve collecting, recording, presenting, and verifying data. The participants in this study consisted of students from class XI IPA 2 and teachers specializing in Biology. The purposive sampling approach is used for selecting subjects or data sources. The data collection methods employed in this research comprise interviews, questionnaires, observation, and documentation. The research findings indicate that the predominant learning style among students studying biology at Madrasah Aliyah Raudlatus Syabab Sukowono Jember is visual, with 7 individuals (50%) exhibiting this style. This is followed by kinesthetic, with 4 individuals (29%), and auditory, with 3 individuals (21%). Teachers should possess the ability to establish a varied educational setting and integrate learning techniques that encompass the utilization of visuals and graphics, verbal explanations and discussions, as well as experiments and physical activities in order to cater to the diverse learning styles of students.
Chapter
Full-text available
Evaluating multiple-choice questions (MCQs) involves either labor intensive human assessments or automated methods that prioritize readability, often overlooking deeper question design flaws. To address this issue, we introduce the Scalable Automatic Question Usability Evaluation Toolkit (SAQUET), an open-source tool that leverages the Item-Writing Flaws (IWF) rubric for a comprehensive and automated quality evaluation of MCQs. By harnessing the latest in large language models such as GPT-4, advanced word embeddings, and Transformers designed to analyze textual complexity, SAQUET effectively pinpoints and assesses a wide array of flaws in MCQs. We first demonstrate the discrepancy between commonly used automated evaluation metrics and the human assessment of MCQ quality. Then we evaluate SAQUET on a diverse dataset of MCQs across the five domains of Chemistry, Statistics, Computer Science, Humanities, and Healthcare, showing how it effectively distinguishes between flawed and flawless questions, providing a level of analysis beyond what is achievable with traditional metrics. With an accuracy rate of over 94% in detecting the presence of flaws identified by human evaluators, our findings emphasize the limitations of existing evaluation methods and showcase potential in improving the quality of educational assessments.
Preprint
Full-text available
This study evaluated the ability of ChatGPT, a recently developed artificial intelligence (AI) agent, to perform high-level cognitive tasks and produce text that is indistinguishable from human-generated text. This capacity raises concerns about the potential use of ChatGPT as a tool for academic misconduct in online exams. The study found that ChatGPT is capable of exhibiting critical thinking skills and generating highly realistic text with minimal input, making it a potential threat to the integrity of online exams, particularly in tertiary education settings where such exams are becoming more prevalent. Returning to invigilated and oral exams could form part of the solution; while advanced proctoring techniques and AI-text output detectors may help address this issue, they are unlikely to be foolproof. Further research is needed to fully understand the implications of large language models like ChatGPT and to devise strategies for combating the risk of cheating using these tools. It is crucial for educators and institutions to be aware of the possibility of ChatGPT being used for cheating and to investigate measures to address it in order to maintain the fairness and validity of online exams for all students.
Article
Full-text available
The purpose of this research is to examine the validity and reliability of the multiple-choice items of the Even Semester Final Examination of the Bahasa Indonesia subject of class 7A, Junior High School 2 of Surakarta (SMP N 2 Surakarta), in the academic year 2015/2016. A qualitative research method was used, drawing on primary data from the school's documentation, in this case SMP N 2 Surakarta. The respondents comprised 26 students. The analysis shows that, at the 5% significance level with an r-table value of 0.388, only seven of the 45 multiple-choice items of the examination are valid, while 38 items are invalid. The reliability index of the examination for class 7A of SMP N 2 Surakarta is 0.3657 ≤ 0.6, which means the instrument should not be used for measurement without an evaluation to improve its quality. Since the obtained coefficient r11 = 0.3657 is lower than the r-table value of 0.388, the Learning Result Test (THB) scores do not show a significant correlation and cannot be considered reliable.
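As a minimal sketch of the kind of analysis described above, the snippet below computes item validity (point-biserial correlation of each item against the total score, compared with the r-table value of 0.388) and a KR-20 reliability index for dichotomous multiple-choice data. The toy score matrix is illustrative only and is not the study's data.

```python
# Minimal sketch: item validity (point-biserial) and KR-20 reliability
# for dichotomous (0/1) multiple-choice data. The data are illustrative.
import numpy as np
from scipy.stats import pointbiserialr

# rows = students, columns = items; 1 = correct, 0 = incorrect (toy data)
scores = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
])
totals = scores.sum(axis=1)

R_TABLE = 0.388  # critical r reported for n = 26 at the 5% significance level

# Item validity: correlate each item with the total score
for i in range(scores.shape[1]):
    r, _ = pointbiserialr(scores[:, i], totals)
    print(f"item {i + 1}: r = {r:.3f} -> {'valid' if r > R_TABLE else 'invalid'}")

# KR-20 reliability for dichotomous items
k = scores.shape[1]
p = scores.mean(axis=0)          # proportion answering each item correctly
q = 1 - p
var_total = totals.var(ddof=1)   # variance of total scores
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / var_total)
print(f"KR-20 reliability = {kr20:.3f}")
```

With real data, a reliability value below the 0.6 threshold, as in the study above, would signal that the test should be revised before being reused.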
Article
Full-text available
The purpose of this study was to validate the construct and reliability of an instrument to assess the work environment as a single tool based on quality of life (QL), quality of work life (QWL), and organizational climate (OC). The methodology tested construct validity through exploratory factor analysis (EFA) and reliability through Cronbach's alpha. The EFA returned a Kaiser-Meyer-Olkin (KMO) value of 0.917, which demonstrated that the data were adequate for factor analysis, and a significant Bartlett's test of sphericity (χ² = 7465.349; df = 1225; p ≤ 0.000). After the EFA, the varimax rotation method was employed, and communality analysis reduced the 14 initial factors to 10. Only question 30 presented a communality lower than 0.5; all other questions returned values higher than 0.5. Regarding reliability, all of the questions were reliable, with values varying between 0.953 and 0.956. Thus, the instrument demonstrated construct validity and reliability.
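The validation steps described above (KMO, Bartlett's test of sphericity, EFA with varimax rotation, communality screening) can be reproduced in Python with the factor_analyzer package; the sketch below assumes a hypothetical questionnaire_items.csv file of Likert-scale responses and is not the study's actual analysis pipeline.

```python
# Minimal sketch: sampling adequacy, sphericity, and EFA with varimax rotation
# using the factor_analyzer package. File name and data are hypothetical.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)

# Likert-scale responses: one row per respondent, one column per item
responses = pd.read_csv("questionnaire_items.csv")  # hypothetical file

# Sampling adequacy and sphericity
chi_square, p_value = calculate_bartlett_sphericity(responses)
_, kmo_model = calculate_kmo(responses)
print(f"Bartlett: chi2 = {chi_square:.2f}, p = {p_value:.4f}")
print(f"KMO (overall) = {kmo_model:.3f}")  # values near 0.9 indicate adequate data

# Exploratory factor analysis with varimax rotation
fa = FactorAnalyzer(n_factors=10, rotation="varimax")
fa.fit(responses)
communalities = pd.Series(fa.get_communalities(), index=responses.columns)

# Flag items whose communality falls below the 0.5 threshold used in the study
print(communalities[communalities < 0.5])
```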
Article
Full-text available
Primary school is an early stage for introducing information and communication technology, so abilities and skills in the use of technology need to be prepared from this age. Entrepreneurship learning at the elementary school level is generally still traditional, even though it is expected to produce valuable innovation. In the industry 4.0 era, many elementary students still practice entrepreneurship in traditional ways, which is not in line with the Ministry of Education and Culture's independent campus policy. To involve young people in entrepreneurship, the right solution is to apply entrepreneurship learning from an early age, namely to elementary school students. The School Enrichment Program (SEP) is an iLearning-based entrepreneurial learning application aimed at helping elementary school students develop creativity and a willingness to innovate from an early age. Based on the observational test results, the ubiquitous learning method significantly influences elementary school students' motivation and enthusiasm for entrepreneurial learning from an early age, and a Cronbach's alpha of 0.9 > 0.6 indicates that the SEP is highly reliable in its application, particularly in strengthening the formation of entrepreneurial intentions, as the trend of entrepreneurship now reaches various circles, including students.
Article
Full-text available
Open and Distance Learning (ODL) students rely mainly on Information and Communication Technology (ICT) tools for online facilitation and other activities supporting learning. For the ODL students of Ladoke Akintola University of Technology (LAUTECH), Oyo State, Nigeria, the Moodle Learning Management System (LMS) has been the major medium for online facilitation for the past five years. This data article therefore presents a survey dataset administered to LAUTECH ODL students with a view to assessing their readiness to accept and use alternative social media platforms and emerging technologies for online facilitation. The data article also includes the questionnaire instrument administered via Google Forms, 900 responses received in spreadsheet formats, charts generated from the responses, the Statistical Package for the Social Sciences (SPSS) file, and the descriptive and reliability statistics for all the variables. The authors believe that the dataset will guide policy makers on the choice of social media and emerging technologies to be adopted as facilitation tools for ODL students. It will also reveal the challenges that could militate against the willingness to use these supplementary modes of learning from students' perspectives.
Article
Full-text available
A valid, reliable, and practical instrument is needed to evaluate the implementation of the school-based assessment (SBA) system. The aim of this study is to develop and assess the validity and reliability of an instrument to measure teachers' perceptions of SBA implementation in schools. The instrument is developed based on a conceptual framework developed by Daniel Stufflebeam, namely the CIPP (context, input, process, and product) evaluation model. The instrument, in the form of a questionnaire, is distributed to a sample of 120 primary and secondary school teachers, with a response rate of 80 percent. Content validity is assessed by experts and construct validity is measured by exploratory factor analysis (EFA). The reliability of the instrument is measured using internal consistency reliability, expressed as the alpha reliability coefficient (Cronbach's alpha). The findings of this pilot study show that the instrument is valid and reliable. Finally, out of 71 items, 68 items are retained.
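As a minimal sketch of how the internal consistency (Cronbach's alpha) mentioned above can be computed from questionnaire data, the snippet below implements the standard formula directly; the toy Likert responses and column names are illustrative, not the study's data.

```python
# Minimal sketch: Cronbach's alpha (internal consistency) for questionnaire items.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy Likert responses (rows = respondents, columns = items)
data = pd.DataFrame({
    "q1": [4, 5, 3, 4, 2, 5],
    "q2": [4, 4, 3, 5, 2, 5],
    "q3": [3, 5, 2, 4, 3, 4],
})
alpha = cronbach_alpha(data)
print(f"Cronbach's alpha = {alpha:.3f}")  # values >= 0.7 are commonly read as reliable
```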
Article
Full-text available
Reliability and validity are the two most important and fundamental features in the evaluation of any measurement instrument or tool for good research. The purpose of this paper is to discuss the validity and reliability of measurement instruments used in research. Validity concerns what an instrument measures and how well it does so. Reliability concerns the faith that one can have in the data obtained from the use of an instrument, that is, the degree to which any measuring tool controls for random error. An attempt is made here to review reliability and validity, and the threats to them, in some detail.
Article
The purpose of this research was to determine the effect of a blended learning strategy on the learning achievement and science process skills of students in the plant tissue culture course at Universitas Negeri Medan. The research method was a quasi-experiment. The population of this study was all semester VIII students of the biology education program. The study sample consisted of two classes: class A, the control class, taught with a conventional learning strategy, and class C, the experiment class, taught with the blended learning strategy. They were selected using a cluster random sampling technique. The results showed t-count = 3.769 (p = 0.001) at the 0.05 significance level for learning achievement scores, and t-count = 5.435 > t-table = 1.661 (p = 0.001) at the 0.05 significance level for science process skills scores. Based on these results, it can be concluded that the blended learning strategy was significantly more effective than the conventional learning strategy in enhancing students' learning achievement and science process skills in the plant tissue culture course.
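The comparison reported above is an independent-samples t-test between the experiment and control classes. The sketch below shows how such a test can be run with scipy; the score lists are illustrative toy data, not the study's data.

```python
# Minimal sketch: independent-samples t-test comparing an experiment class
# (blended learning) with a control class. Scores are illustrative only.
from scipy import stats

control_scores = [62, 70, 65, 58, 72, 68, 60, 66]     # conventional strategy
experiment_scores = [75, 80, 71, 78, 69, 82, 77, 73]  # blended learning strategy

t_stat, p_value = stats.ttest_ind(experiment_scores, control_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is significant at the 0.05 level")
```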