Article

Semantic Similarity Measures for the Generation of Science Tests in Basque


Abstract

The work we present in this paper aims to help teachers create multiple-choice science tests. We focus on a scientific vocabulary-learning scenario taking place in a Basque-language educational environment. In this particular scenario, we explore the option of automatically generating Multiple-Choice Questions (MCQs) by means of Natural Language Processing (NLP) techniques and the use of corpora. More specifically, human experts select scientific articles and identify the target terms (i.e., words). These terms are part of the vocabulary studied in the school curriculum for 13-14-year-olds and form the starting point for our system to generate MCQs. We automatically generate distractors that are similar in meaning to the target term. To this end, the system applies semantic similarity measures using a variety of corpus-based and graph-based approaches. The paper presents a qualitative and a quantitative analysis of the generated tests to measure the quality of the proposed methods. The qualitative analysis is based on expert opinion, whereas the quantitative analysis is based on the MCQ test responses of secondary-school students. Nine hundred and fifty-one students from 18 schools took part in the experiments. The results show that our system could help experts in the generation of MCQs.
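As a concrete illustration of the corpus-based side of this idea, the sketch below ranks candidate distractors by cosine similarity to the target term; the function names and toy vectors are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_distractors(target, candidates, vectors, k=3):
    """Return the k candidates whose vectors are closest to the target's."""
    scored = sorted(((cosine(vectors[target], vectors[c]), c)
                     for c in candidates if c != target), reverse=True)
    return [c for _, c in scored[:k]]

# Toy vectors; a real system would obtain them from LSA or another
# corpus-based model trained on domain texts.
words = ["zelula", "nukleoa", "organismoa", "mitokondria"]
vecs = {w: np.random.default_rng(i).random(50) for i, w in enumerate(words)}
print(rank_distractors("zelula", words, vecs))
```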


... In previous work [37], in concept-learning scenarios, ArikIturri constructed MCQs from a given text for target terms selected by human experts. ArikIturri used the terms as the seeds to generate the MCQs. ...
... ArikIturri uses six different heuristics to automatically generate distractors that are similar in meaning to the target term. Based on the evaluation results of [37], the authors chose the heuristics that produced the best results in concept-learning scenarios. The heuristics apply different strategies depending, on the one hand, on the part-of-speech of the target terms and their semantic features and, on the other hand, on whether the term is a monosemous or polysemous noun and whether or not it appears in the Basque WordNet [47]. ...
... Next, the heuristics used for each type of target term are briefly mentioned. A detailed description of the techniques is included in [37]. ...
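As an illustration only, such a dispatch could be keyed on the properties the excerpt mentions; the branch names below are placeholders, not the actual heuristics of [37].

```python
# Hypothetical sketch of heuristic dispatch for distractor generation,
# keyed on POS, polysemy, and presence in the Basque WordNet.
def pick_heuristic(pos: str, is_monosemous: bool, in_wordnet: bool) -> str:
    if pos != "NOUN":
        return "corpus_based_similarity"          # e.g., LSA over a domain corpus
    if in_wordnet:
        return ("wordnet_siblings" if is_monosemous
                else "wordnet_siblings_after_wsd")  # disambiguate first
    return "distributional_thesaurus"             # fall back to corpus statistics

print(pick_heuristic("NOUN", is_monosemous=True, in_wordnet=True))
```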
Article
In a concept learning scenario, any technology-supported learning system must provide students with mechanisms that help them with the acquisition of the concepts to be learned. For the technology-supported learning systems to be successful in this task, the development of didactic material is crucial—a hard task that could be alleviated by means of an automation process. In this proposal, two systems which have been previously developed, ArikIturri and DOM-Sortze, are combined to automatically generate multiple-choice questions, based on pedagogically relevant information gathered in textbooks. Originally, the former was able to generate multiple-choice questions from plain texts; and the latter was able to elicit learning objects based on didactic material explicitly represented in electronic textbooks, i.e., definitions, examples, and exercises. This article presents an approach for the automatic generation of multiple-choice questions from learning objects extracted from textbooks. Specifically, ArikIturri uses as input the texts gathered in the learning objects elicited by DOM-Sortze and, using natural language processing techniques, generates multiple-choice questions. This way, considering domain-relevant information from the textbooks, test-type exercises which were not previously elicited by DOM-Sortze are created. In summary, this new approach is able to enrich domain modules of technology-supported learning systems. The proposal has been tested with a textbook which is written in the Basque language and the results show that the generated exercises are suitable to be used in science learning scenarios at secondary school.
... Many automatic MCQ creation systems have been proposed, with different distractor generation strategies. In this paper, we are interested in the systems based on the use of semantics [5-14]. Most of these semantic-based systems attempt to generate distractors that are somehow semantically similar to the correct answer. ...
... This knowledge is typically a set of concepts extracted from an unstructured text (corpus-based methods) or an ontology (graph-based methods). Corpus-based methods build their corpora by processing the contents of an input text [15,16], Wikipedia [5,6], or online dictionaries [2,7]. Graph-based methods apply domain ontologies [8,9,11-13] or conceptual maps [10] that represent the contents of a course. ...
... On the other hand, the strategies for generating distractors are varied. The corpus-based methods analyze the meaning of words in the context of questions and then select the words that could be used as distractors using the notion of synonymy [2], topic-similarity techniques [5,6,15], or statistical techniques [7,14]. Unlike these solutions, [16] generates natural language questions from the input text and then determines phrase-level distractors to evaluate the reading comprehension ability of language learners. ...
... Concretely, the present work is focused on the generation of Multiple-Choice Questions (MCQs). In [1], ArikIturri generated the MCQs for manually selected target terms, i.e., human experts were responsible for selecting the target terms to be assessed. These terms were the seeds given to ArikIturri to generate the MCQs. ...
... In all the cases, the target terms are nouns because the topics of the LDOs are nouns. Based on the part-of-speech of the target terms and their semantic features, different heuristics are applied (a more detailed explanation of the methods can be found in [1]). In the case of multiword terms, the distractors are obtained for the head following the same strategy as for single-word target terms. ...
Conference Paper
DOM-Sortze is a framework for the semiautomatic generation of Domain Modules from textbooks. It identifies not only topics and relationships between topics but also Learning Objects (e.g., definitions, examples, problem statements) included in an electronic document. ArikIturri is an NLP-based system designed to automatically generate test-based exercises from corpora. To enrich the Learning Object Repository of DOM-Sortze with new test-based exercises, both systems have been integrated. The experiment conducted to verify the validity of the proposal is described throughout the paper.
... Thus, many automated MCQ grading systems, also known as optical mark reading (OMR), have been presented. Accordingly, various methods have been proposed for automated generation of MCQs [3,4]. ...
Preprint
In spite of the high accuracy of existing optical mark reading (OMR) systems and devices, a few restrictions remain. In this work, we aim to reduce the restrictions of multiple-choice question (MCQ) tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extracted answer boxes, we address the problem from another perspective by training a machine learning classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank answer). This gives us the ability to deal with a variety of shading and mark patterns, and to distinguish between chosen (i.e., confirmed) and canceled (i.e., crossed out) answers. All existing machine learning techniques require a large number of examples in order to train a model for classification; therefore, we present a dataset including six real MCQ assessments with different answer sheet templates. We evaluate two strategies of classification: a straightforward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. Finally, we present an easy-to-use graphical user interface for the proposed system. Compared with existing OMR systems, the proposed system imposes the fewest constraints and achieves high accuracy. We believe that the presented work will further direct the development of OMR systems towards reducing the restrictions of MCQ tests.
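As a rough illustration of the classification step (not the authors' actual pipeline), an off-the-shelf classifier can be trained on flattened answer-box images; the dummy data and the choice of an SVM are assumptions.

```python
# Minimal sketch: assign each extracted answer box to one of
# {blank, confirmed, crossed_out}. Image registration and data
# loading are out of scope; the arrays below are random stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((600, 28 * 28))        # flattened grayscale answer boxes
y = rng.integers(0, 3, size=600)      # 0=blank, 1=confirmed, 2=crossed_out

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```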
... The first research question of the current study specifically addressed the primary intention of this systematic research, i.e., analyzing the reported impact of AI-enhanced learning on students' learning outcomes in science education. The empirical papers reviewed showed that artificial intelligence has been used within science education for a variety of purposes, such as engaging students in the learning process with a strong sense of motivation and interest (Balakrishnan, 2018), generating tests of science subjects (Aldabe & Maritxalar, 2014; Nasution & Education, 2023), scoring and providing personalized feedback on students' assignments (Azcona et al., 2019; Maestrales et al., 2021; Mirchi et al., 2020), and predicting student performance (Blikstein et al., 2014; Buenaño-Fernández et al., 2019; Jiao et al., 2022a, b). ...
Article
Full-text available
The use of Artificial Intelligence (AI) in education is transforming various dimensions of the education system, such as instructional practices, assessment strategies, and administrative processes. It also plays an active role in the progression of science education. This systematic review attempts to render an inherent understanding of the evidence-based interaction between AI and science education. Specifically, this study offers a consolidated analysis of AI's impact on students' learning outcomes, contexts of its adoption, students' and teachers' perceptions about its use, and the challenges of its use within science education. The present study followed the PRISMA guidelines to review empirical papers published from 2014 to 2023. In total, 74 records met the eligibility criteria for this systematic study. Previous research provides evidence of AI integration into a variety of fields in physical and natural sciences in many countries across the globe. The results revealed that AI-powered tools are integrated into science education to achieve various pedagogical benefits, including enhancing the learning environment, creating quizzes, assessing students' work, and predicting their academic performance. The findings from this paper have implications for teachers, educational administrators, and policymakers.
... For educators, AI tools serve to collect, analyse, and interpret educational data, furnishing valuable insights into student performance (Wang et al., 2011). By identifying areas where students may encounter difficulties or excel, teachers can proactively intervene and adjust teaching strategies accordingly (Aldabe & Maritxalar, 2014). AI also holds the potential to automate administrative tasks, such as grading and assessment, thereby enabling educators to allocate more time and resources to impactful teaching practices (Holmes et al., 2023). ...
Article
Full-text available
In the ever-evolving AI-driven education, integrating AI technologies into teaching practices has become increasingly imperative for aspiring STEM educators. Yet, there remains a dearth of studies exploring pre-service STEM teachers' readiness to incorporate AI into their teaching practices. This study examined the factors influencing teachers' willingness to integrate AI (WIAI), especially from the perspective of pre-service STEM teachers' attitudes towards the application of AI in teaching. In the study, a comprehensive survey was conducted among 239 pre-service STEM teachers, examining the influences and interconnectedness of Technological Pedagogical Content Knowledge (TPACK), Perceived Usefulness (PU), Perceived Ease of Use (PE), and Self-Efficacy (SE) on WIAI. Structural Equation Modeling (SEM) was employed for data analysis. The findings illuminated direct influences of TPACK, PU, PE, and SE on WIAI. TPACK was found to directly affect PE, PU, and SE, while PE and PU also directly influenced SE. Further analysis revealed significant mediating roles of PE, PU, and SE in the relationship between TPACK and WIAI, highlighting the presence of a chain mediation effect. In light of these insights, the study offers several recommendations on promoting pre-service STEM teachers' willingness to integrate AI into their teaching practices.
Practitioner notes
What is already known about this topic?
- The potential of AI technologies to enrich learning experiences and improve outcomes in STEM education has been recognized.
- Pre-service teachers' willingness to integrate AI into teaching practice is crucial for shaping the future learning environment.
- The TAM and TPACK frameworks are used to analyse teacher factors in technology-supported learning environments.
- Few studies have examined the factors of pre-service teachers' willingness to integrate AI into teaching practices in the context of STEM education.
What this paper adds?
- A survey was designed and developed for exploring pre-service STEM teachers' WIAI and its relationships with factors including TPACK, PE, PU, and SE.
- TPACK, SE, PU, and PE have a direct impact on pre-service STEM teachers' WIAI.
- SE, PU, and PE have been identified as mediating variables in the relationship between TPACK and WIAI.
- Two sequential mediation effects, TPACK → PE → SE → WIAI and TPACK → PU → SE → WIAI, among pre-service STEM teachers were further identified.
Implications of this study for practice and/or policy
- Pre-service STEM teachers are encouraged to explore and utilize AI technology to enhance their confidence and self-efficacy in integrating AI into teaching practices.
- Showcasing successful cases and practical experiences is essential for fostering awareness of AI integration in STEM education.
- It is recommended to introduce AI education courses in teacher training programs.
- Offering internship and practicum opportunities related to AI technologies can enhance teachers' practical skills in integrating AI into education.
... Machine learning and data mining emerged as the second most prominent technology, with a count of 16. This category encompasses a range of techniques and algorithms, such as Bayesian networks (Dettweiler et al., 2017; Hagger & Hamilton, 2018; Jiang et al., 2023), genetic algorithms (Yin et al., 2016), natural language processing (Aldabe & Maritxalar, 2014), computer simulation functions (Magana et al., 2019), visual recognition (Wu & Yang, 2022), convolutional neural networks, decision trees (Biehler & Fleischer, 2021; Göktepe Körpeoğlu & Göktepe Yıldız, 2023), and image classification (Martins et al., 2023). The use of machine learning and data mining in AISE research demonstrates the increasing recognition of the potential for analyzing and deriving insights from large datasets to support science education. ...
Article
Full-text available
The use of artificial intelligence has played an important role in science teaching and learning. The purpose of this study was to fill a gap in the current review of research on AI in science education (AISE) in the early stages of education by systematically reviewing existing research in this area. This systematic review examined the trends and research foci of AI in early-stage science education. The review employed a bibliometric analysis and a content analysis to examine the characteristics of 76 studies on Artificial Intelligence in Science Education (AISE) indexed in Web of Science and Scopus from 2013 to 2023. The analytical tool CiteSpace was utilized for the analysis. The study aimed to provide an overview of the development level of AISE and identify major research trends, keywords, research themes, high-impact journals, institutions, countries/regions, and the impact of AISE studies. The results, based on bibliometric analyses, indicate that AISE has experienced increasing influence over the past decade. Cluster and timeline analyses of the retrieved keywords revealed that AI in primary and secondary science education can be categorized into 11 main themes, and the chronology of their emergence was identified. Among the most prolific journals in this field are the International Journal of Social Robotics, Educational Technology Research and Development, and others. Furthermore, the analysis identified that institutions and countries/regions located primarily in the United States have made the most significant contributions to AISE research. To explore the learning outcomes and overall impact of AI technologies on learners in primary and secondary schools, a content analysis was conducted, identifying five main categories of technology applications. This study provides valuable insights into the advancements and implications of AI in science education at the primary and secondary levels.
... In addition, since exercise difficulty increases with distractor plausibility, target similarity can be adjusted according to the learner's proficiency (Alsubait et al., 2015; Chen et al., 2015; Correia et al., 2012). Similarity can target the surface form (Jiang and Lee, 2017), linguistic complexity (Lee and Seneff, 2007; Susanti et al., 2018), phonetics, morphology (Goto et al., 2010), syntax (Guo et al., 2016), or semantics (Susanti et al., 2015), and be based on NLP tools including part-of-speech taggers (Liu et al., 2005), latent semantic analysis (Aldabe and Maritxalar, 2014) and word embedding models (Yeung et al., 2019), on external resources such as ontologies (Papasalouros et al., 2008), WordNet (Mitkov et al., 2006; Brown et al., 2005) or FrameNet, or else on statistical methods including classification (Welbl et al., 2017; Gao et al., 2020), regression and deep learning (Liang et al., 2018). If the final candidate selection is not based on the ranking, it may be left to the user (Nikolova, 2009), or done randomly (Araki et al., 2016; Gutl et al., 2011). ...
... Most studies in automatic question generation are based on the English language [5]; however, works on French [6], Basque [7], Russian [8], Chinese [9], and Thai [10] can also be found. This work focuses, instead, on European Portuguese. ...
Conference Paper
Artificial Intelligence (AI) has seen numerous applications in the area of Education. Through the use of educational technologies such as Intelligent Tutoring Systems (ITS), learning possibilities have increased significantly. One of the main challenges for the widespread use of ITS is the ability to automatically generate questions. Bearing in mind that the act of questioning has been shown to improve students' learning outcomes, Automatic Question Generation (AQG) has proven to be one of the most important applications for optimizing this process. We present a tool for generating factual questions in Portuguese by proposing three distinct approaches. The first one performs a syntax-based analysis of a given text by using the information obtained from Part-of-Speech tagging (PoS) and Named Entity Recognition (NER). The second approach carries out a semantic analysis of the sentences through Semantic Role Labeling (SRL). The last method extracts the inherent dependencies within sentences using Dependency Parsing. All of these methods are possible thanks to Natural Language Processing (NLP) techniques. For evaluation, we developed a pilot test that was answered by Portuguese teachers. The results verify the potential of these different approaches, opening up the possibility of using them in a teaching environment.
... Automatic item generation (AIG) is a promising approach to reduce the cost of test development. AIG methods have been used in generating different types of questions, such as reading comprehension (Rus et al., 2007; Mostow et al., 2017) and vocabulary assessment (Mitkov et al., 2006, 2009; Aldabe and Maritxalar, 2014). Due to its high efficiency and controllability, automatic item generation has been used to create solutions and rationales for Computerized Formative Testing (Gierl and Lai, 2018). ...
... Also, MCQ systems have been developed in various languages and domains. In the literature, we found that MCQ systems exist in many languages, including English ([21], [22], [23], and many more), Portuguese ([24], [25]), Basque ([26], [27], [28]), French [29], Russian [30], and Chinese ([31], [7], [32]). Also, automatic MCQ generation has been performed in several domains, including language learning ([5], [33], [34]), grammar & vocabulary learning ([35], [36], [37]), science ([38], [39]), history ([40], [41]), general science ([42], [43]), biology & medical ([44], [45], [46], [47], [48], [9], [49]), technology ([50], [51], [52], [53], [54], [55]), generic domain ([56], [32], [57], [58]), e-learning & active learning ([59], [60], [61], [62], [11], [10], [63]), sports & entertainment ([64], [65], [66], [12], [67], [68], [69]), etc. Automatic MCQ generation is still an active research area. ...
Article
Automatic Multiple Choice Question (MCQ) generation from a text is a popular research area. MCQs are widely accepted for large-scale assessment in various domains and applications. However, manual generation of MCQs is expensive and time-consuming. Therefore, researchers have been attracted to automatic MCQ generation since the late '90s. Since then, many systems have been developed for MCQ generation. We perform a systematic review of those systems, and this paper presents our findings. We outline a generic workflow for an automatic MCQ generation system. The workflow consists of six phases. For each of these phases, we identify and discuss the techniques adopted in the literature. We also study the evaluation techniques for assessing the quality of system-generated MCQs. Finally, we identify the areas toward which current research focus should be directed to enrich the literature.
... As a consequence, the number of shared-task competitions and conferences has increased in the past few years. Some examples of e-NLP techniques are the following. [Aldabe, 2014] proposed the automatic generation of Multiple-Choice Questions (MCQ) by means of Natural Language Processing (NLP) techniques. [Koukourikos, 2012] proposed the introduction of sentiment analysis techniques on user comments regarding an educational resource to extract the opinion of a user about its quality, accounting for the user's perception before proposing the resource to another user. ...
Article
Full-text available
Information overload is one of the main challenges in the current educational context, where the Internet has become a major source of information. According to the European Space for Higher Education, students must now be more autonomous and creative, with lecturers being required to provide guidance and supervision. Guiding students to search for and read news related to subjects being studied in class has proven to be an effective technique for improving motivation, because students appreciate the relevance of the topics being studied in real-world examples. However, one of the main drawbacks of this teaching practice is the amount of time that lecturers and students need for searching for relevant and useful information on different subjects. The objective of our research is to demonstrate the usefulness of a complementary teaching tool in the traditional educational classroom. It is a new educational platform that combines Artificial Intelligence techniques with the expertise provided by lecturers. It automatically compiles information from different sources and presents only relevant breaking news classified into different subjects and topics. It has been tested on a Finance course, where being continually informed about the latest economic and financial news is an important part of the teaching process, especially for certain key financial concepts. The utility of the platform has been studied by conducting student surveys. The results confirm that using the platform had a positive impact on improving students' motivation and boosting the learning process. This research provides evidence of the effectiveness of this new educational complement to traditional teaching methods in classrooms. It also demonstrates an improvement in knowledge transfer within an environment of information overload.
... The methods they used can be mainly divided into two families (corpus-based and graph-based). In corpus-based methods, context words are used to compute the similarity between different words based on Latent Semantic Analysis (LSA) [6]. LSA is a theory and method for extracting and representing the meaning of words by statistical computations applied to a corpus, and it has yielded encouraging results in a number of educational applications. ...
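The LSA machinery behind this excerpt is easy to sketch: build a term-document matrix, reduce it with a truncated SVD, and compare words by cosine similarity in the latent space. The toy corpus and component count below are illustrative assumptions.

```python
# Minimal LSA sketch for word similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cell nucleus stores dna",
        "mitochondria supply the cell with energy",
        "dna replication happens in the nucleus"]
vec = CountVectorizer()
X = vec.fit_transform(docs)                    # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0)
term_vectors = svd.fit_transform(X.T)          # terms x latent dimensions

terms = vec.get_feature_names_out().tolist()
i, j = terms.index("nucleus"), terms.index("dna")
print(cosine_similarity(term_vectors[[i]], term_vectors[[j]]))
```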
Chapter
Full-text available
Performance appraisal has always been an important research topic in human resource management. A reasonable performance appraisal plan lays a solid foundation for the development of an enterprise. Traditional performance appraisal programs are labor-intensive and can lack fairness. Furthermore, as globalization and technology advance, in order to meet fast-changing strategic goals and increasing cross-functional tasks, enterprises face new challenges in performance appraisal. This paper proposes a data mining-based performance appraisal framework to conduct an automatic and comprehensive assessment of employees' working ability and job competency. The framework has been successfully applied in a domestic company, providing a reliable basis for its human resources management.
... Thus, many automated MCQ grading systems, also known as optical mark reading (OMR), have been presented. Accordingly, various methods have been proposed for automated generation of MCQs [3,4]. ...
Article
Full-text available
In spite of the high accuracy of existing optical mark reading (OMR) systems and devices, a few restrictions remain. In this work, we aim to reduce the restrictions of multiple-choice question (MCQ) tests. We use an image registration technique to extract the answer boxes from the answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extracted answer boxes, we address the problem from another perspective by training a classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank answer). This gives us the ability to deal with a variety of shading and mark patterns, and to distinguish between chosen and canceled answers. All existing machine learning techniques require a large number of examples in order to train the classifier; therefore, we present a dataset that consists of five real MCQ tests and a quiz with different answer sheet templates. We evaluate two strategies of classification: a straightforward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. Finally, we present an easy-to-use graphical user interface for the proposed system. Compared with existing OMR systems, the proposed system achieves higher accuracy and imposes the fewest constraints. We believe that the presented work will further direct the development of OMR systems towards template-free MCQ tests without restrictions.
... The strategies can be applied to develop a large set of questions. In [18], a vocabulary-learning scenario in a Basque-language environment is used to support teachers through the generation of multiple-choice questions. In this environment, multiple-choice questions are generated semi-automatically by applying natural language processing techniques. ...
Article
Full-text available
This paper presents an environment which generates tests automatically. It is designed for assistance in the software engineering education and is part of the Virtual Education Space. The environment has two functionalities – generation and assessment of different types of test questions. In the paper, the architecture of the environment is described in detail. The test generation is supported by specialized ontologies, which are served by two intelligent agents known as Questioner Operative and Assessment Operative.
... The strategies can be applied to develop a large set of questions. In [16], a vocabulary-learning scenario in a Basque-language environment is used to support teachers through the generation of multiple-choice questions. In this environment, multiple-choice questions are generated semi-automatically by applying natural language processing techniques. ...
Article
Full-text available
This paper provides a description of a model used for the implementation of a question generation system. The three levels of the proposed model are presented in detail. Furthermore, the applicability of the model is demonstrated with an example from the UML domain.
... [8,9,10,11,12,13]. We also have DOM-Sortze: a tool for the semi-automatic extraction of domain modules from electronic textbooks written in Basque [14,15,16]. ...
Article
Full-text available
This work presents the analysis, design, and evaluation of the LiDom Builder tool. LiDom Builder enables the automatic extraction of Multilingual Domain Modules from electronic textbooks. To acquire knowledge, it uses Natural Language Processing and Machine Learning techniques together with several multilingual resources, including Wikipedia and WordNet.
... A correct question leads to a correct answer. Systems in natural language processing are mainly divided into two categories: (i) open-domain systems and (ii) closed-domain systems [1,2,3]. Open-domain systems are domain-independent and draw on large collections of data from various fields. Such systems cover questions from a wide variety of domains and focus on different topics. ...
Conference Paper
Full-text available
The question is a crucial construct of natural language. Systematic, error-free questions are a basic need of different applications of natural language. Much research has focused on 'statement' formation, but the issue of 'systematic question' formation is less studied. This research work resolves the above issue through a systematization process using a template-based approach, accompanied by a dictionary approach and a powerful NLP technique, namely Maximum Entropy-based POS tagging. The systematization process aims to re-form a proper, flawless question from an erroneous input question by removing errors in word order and word spelling and by removing ambiguous synonyms of words. This work deals with domain-specific WH-questions in the English language. Additionally, it also works on imperative questions. The template-based approach is supported by the key concept of 'question templates', which are designed with human intelligence and detailed knowledge of various lingual constructs, their grammar, and domain-specific questionnaires. This work is useful in various fields, for example, in academia to set question papers, to assist English learners, and to produce intermediate output for complex systems such as question-answering systems that retrieve correct answers from a huge dataset.
Article
Full-text available
Research on Artificial Intelligence in Education (AIED) has rapidly progressed in recent years, and understanding the research trends and development is essential for technological innovations and implementations in education. Using a bibliometric analysis of 6843 publications from Web of Science and Scopus, we found that China, the US, India, Spain, and Germany led in research productivity. AIED research is concerned more with higher education than with K-12 education. Fifteen research trends emerged from the analysis, such as Educational Robots and Large Data Mining. Research has primarily leveraged technologies of machine learning, decision trees, deep learning, speech recognition, and computer vision in AIED. The major implementations of AI include educational robots, automated grading, recommender systems, learning analytics, and intelligent tutoring systems. Among the implementations, a majority of AIED research was conducted in seven major subject domains, chief among them being science, technology, engineering and mathematics (STEM) and language disciplines, with a focus on computer science and English education.
Article
Saving time, effort, and money by automatically generating standard multiple-choice questions from text is important and is a current necessity for all educational institutes, such as universities, colleges, schools, and coaching centers, both online and offline. Automatic multiple-choice question generation tools are also useful for both novice and expert users in their subject fields. This paper aims to survey the various approaches to question generation, key selection, and distractor selection based on current trends and needs. The paper describes the methods used at each stage of MCQ generation and also suggests areas for improving the quality of automatically generated multiple-choice questions from text. This should help learners and stakeholders understand the techniques behind each stage of automatic MCQ selection and generation in various domain applications over unstructured text.
Article
E-learning is necessary in this fast internet world, especially during this pandemic situation, to continue education without any interruption, and it significantly reduces educational costs and energy loss. Generally, machine learning and deep learning algorithms are used to identify patterns that facilitate learning and help learners understand concepts easily. Many content recommendation systems are available for assisting learners in e-learning applications by providing the required study materials. However, existing recommendation systems struggle to provide precise content to e-learners due to the massive volume of data available on the internet and in other repositories. For this purpose, we propose a new content recommendation system for recommending suitable content to learners according to their interests and learning capabilities. The proposed content recommendation system employs a newly proposed semantic-aware hybrid feature optimizer that incorporates new optimization algorithms, namely the Enhanced Personalized Best Cuckoo Search Algorithm (EpBestCSA) and the Enhanced Harris Hawks Optimization Algorithm (EHHOA), for selecting suitable features that help improve prediction accuracy, as well as a newly proposed Deep Semantic Structure Model (DSSM) that incorporates an Artificial Neural Network (ANN) and a Convolutional Neural Network (CNN). According to the experimental results, the proposed model outperforms other recommendation systems in terms of precision, recall, f-measure, and prediction accuracy. Ten-fold cross-validation is performed to test the performance of the proposed methodology.
Article
Multiple choice question (MCQ) plays a significant role in educational assessment. Automatic MCQ generation has been an active research area for years, and many systems have been developed for MCQ generation. Still, we could not find any system that generates accurate MCQs from school-level textbook contents that are useful in real examinations. This observation motivated us to develop a system that generates MCQs to assess the student's recall of factual information. Also, the available systems are often domain, subject, or application-specific in nature. Although the MCQ generation task demands a specific setup, we expect a level of generalization can be achieved. In this development, we also focus on this issue. We propose a pipeline for automatic generation of MCQs from textbooks of middle school level subjects, and the pipeline is partially subject-independent. The proposed pipeline comprises four core modules: preprocessing, sentence selection, key selection, and distractor generation. Several techniques have been employed to implement individual modules. These include sentence simplification, syntactic and semantic processing of the sentences, entity recognition, semantic relationship extraction among entities, WordNet, neural word embedding, neural sentence embedding, and computation of inter-sentence similarity. The system is evaluated using NCERT India textbooks for three subjects. The quality of system-generated questions is assessed by human experts using various system-level and individual module-level metrics. The experimental results demonstrate that the proposed system is capable of generating quality questions that could be useful in a real examination.
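A skeleton of such a four-module pipeline (preprocess, select sentence, select key, generate distractors) might look as follows; every stage is a deliberately naive stub standing in for the techniques the abstract lists (sentence simplification, entity recognition, embeddings, and so on).

```python
from dataclasses import dataclass

@dataclass
class MCQ:
    stem: str
    key: str
    distractors: list[str]

def preprocess(text: str) -> list[str]:
    return [s.strip() for s in text.split(".") if s.strip()]  # naive splitter

def select_sentence(sents: list[str]) -> str:
    return max(sents, key=len)       # stub: prefer informative sentences

def select_key(sentence: str) -> str:
    return sentence.split()[-1]      # stub: real systems use NER/term extraction

def generate_distractors(key: str) -> list[str]:
    return [key + s for s in ("_d1", "_d2", "_d3")]  # stub for similarity-based step

def make_mcq(text: str) -> MCQ:
    sent = select_sentence(preprocess(text))
    key = select_key(sent)
    return MCQ(sent.replace(key, "_____"), key, generate_distractors(key))

print(make_mcq("Plants make food by photosynthesis. Water boils at 100 C"))
```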
Article
Full-text available
Background The application of artificial intelligence (AI) in STEM education (AI-STEM), as an emerging field, is confronted with a challenge of integrating diverse AI techniques and complex educational elements to meet instructional and learning needs. To gain a comprehensive understanding of AI applications in STEM education, this study conducted a systematic review to examine 63 empirical AI-STEM research from 2011 to 2021, grounded upon a general system theory (GST) framework. Results The results examined the major elements in the AI-STEM system as well as the effects of AI in STEM education. Six categories of AI applications were summarized and the results further showed the distribution relationships of the AI categories with other elements (i.e., information, subject, medium, environment) in AI-STEM. Moreover, the review revealed the educational and technological effects of AI in STEM education. Conclusions The application of AI technology in STEM education is confronted with the challenge of integrating diverse AI techniques in the complex STEM educational system. Grounded upon a GST framework, this research reviewed the empirical AI-STEM studies from 2011 to 2021 and proposed educational, technological, and theoretical implications to apply AI techniques in STEM education. Overall, the potential of AI technology for enhancing STEM education is fertile ground to be further explored together with studies aimed at investigating the integration of technology and educational system.
Article
The goal of the research presented here is to identify which aspects of Wikipedia can be exploited to support the process of automatically building Multilingual Domain Modules from textbooks. First, we have defined a representation formalism for Multilingual Domain Modules that is essential for Technology Supported Learning Systems which aim to serve a globalized society. To our knowledge, no attempt has been made at achieving domain models that consider multiple languages. Our approach combines Multilingual Educational Ontologies with Learning Objects in different languages. Wikipedia is a valuable resource for accomplishing this purpose. In this scenario, we have developed LiDom Builder, a framework that uses Wikipedia as an additional knowledge base for the automatic generation of Multilingual Domain Modules from textbooks. The framework includes domain-independent term extraction methods to identify which topics of Wikipedia are related to the domain to be learnt and also extracts their equivalents in other languages. In order to complete the Educational Ontology, we have defined a method to extract pedagogical relationships from Wikipedia and other general-purpose knowledge bases. From this task, we highlight the extraction of relationships that allow the sequencing of topics in Technology Supported Learning Systems. In addition, LiDom Builder takes advantage of the structured contents of Wikipedia to identify text fragments that can be used for educational purposes, classifies them, and generates their corresponding Learning Objects. The interlanguage links between topics of Wikipedia are used to create Learning Objects in other languages.
Article
Automatic question generation can help teachers save the time necessary for constructing examination papers. Several approaches have been proposed to automatically generate multiple-choice questions for vocabulary assessment or grammar exercises. However, most of these studies focused on generating questions in English with a single similarity strategy. This paper presents a mixed similarity strategy which generates Chinese multiple-choice distractors with a statistical regression model including orthographic, phonological, and semantic features, i.e., features that were shown in previous psycholinguistic studies to contribute to character recognition. In a first experiment, we evaluated the predictive power of the proposed features in measuring Chinese character similarity. One significant experimental result showed that the combination of the four proposed categories of features (structure, semantic radical, stroke, and meaning) accounts for 62.5% of the variance in human judgments of character similarity. In the second experiment, a user study was conducted to evaluate the quality of system-generated questions using a test item analysis method. 296 Chinese primary school students (10-11 years old) participated in this study. We compared the mixed strategy with three other common distractor generation strategies: an orthographic strategy, a semantic strategy, and a phonological strategy. One important finding was that the mixed strategy significantly outperformed the other three strategies in terms of distractor usefulness and had the highest discrimination power among the four strategies.
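As an illustration of how such a mixed strategy can be fit, the sketch below combines per-feature similarity scores with a linear regression against human judgments; the synthetic data and weights are placeholders, not the paper's features or results.

```python
# Fit a linear model mapping per-feature similarities (structure,
# semantic radical, stroke, meaning) to human similarity judgments;
# R^2 then measures the variance accounted for, as in the abstract.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
features = rng.random((200, 4))                 # 4 similarity features per pair
true_w = np.array([0.4, 0.3, 0.1, 0.2])         # synthetic ground truth
human_judgments = features @ true_w + rng.normal(0, 0.05, 200)

model = LinearRegression().fit(features, human_judgments)
print("learned weights:", model.coef_.round(2))
print("R^2:", round(model.score(features, human_judgments), 3))
```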
Conference Paper
E-learning across all environments, whether commercial, educational, or personal, can be of great use if the learning experience is delivered both without delay and in context. This paper presents a neuro-fuzzy based approach for dynamic content generation. The fast expansion of technology quickly renders newly generated information obsolete. This circumstance requires educators, trainers, and academics to use information suitably and efficiently. To achieve this objective, tools are needed that a broad variety of educators can use quickly and efficiently; these tools must permit generating and distributing information. In this paper, a web-based dynamic content generation system for educators is presented, developed using a neuro-fuzzy approach. By considering learner performance, the system provides content based on each learner's knowledge level, helping individual learners improve. Instructors can rapidly gather, bundle, and reorder web-based educational content, effortlessly import prepackaged content, and conduct their courses online. Learners study in an accessible, adaptive, shared learning environment.
Article
Full-text available
Cloze exercises are widely used in language teaching, both as a learning resource and an assessment tool. Cloze has a particular role to play in proficiency testing, where students are expected to demonstrate wide vocabulary knowledge. Cloze allows students to show that they understand the vocabulary in context, discouraging the memorization of synonyms or translations. However, it is time-consuming and difficult for item writers to make up large numbers of cloze exercises. We present a system which automatically generates cloze exercises from a corpus. It takes the word which will form the correct answer to the exercise (the key) as input. It extracts distractors with similarities to the key from a distributional thesaurus. It then identifies a collocate of the key that does not co-occur with the distractors. Next it finds a short, simple sentence in the corpus which contains the key and the collocate. It then presents the whole item (sentence with blanked-out key, key, three distractors) to a human item-writer for approval, modification or rejection. The system has been implemented as an application using the web API to the Sketch Engine, a leading corpus query system. We use a very large corpus (UKWaC, with 1.5 billion words) as this gives a fair-sized set of sentences to choose from for most key+collocate combinations, and allows us to infer with some confidence that, where a distractor has zero occurrences with a collocate, the combination is infelicitous. We present an initial evaluation.
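The zero-co-occurrence filter described above can be sketched as follows; the toy corpus and counting code stand in for queries to a corpus query system such as the Sketch Engine.

```python
# Keep a distractor only if it never co-occurs with the key's chosen
# collocate, so the distractor is infelicitous in the blanked sentence.
from collections import Counter
from itertools import combinations

corpus_sents = [["strong", "coffee", "keeps", "me", "awake"],
                ["powerful", "engine", "noise"],
                ["strong", "tea", "tastes", "bitter"]]

cooc = Counter()
for sent in corpus_sents:
    for a, b in combinations(sorted(set(sent)), 2):
        cooc[(a, b)] += 1

def cooccurs(w1: str, w2: str) -> bool:
    return cooc[tuple(sorted((w1, w2)))] > 0

key, collocate = "strong", "coffee"
candidates = ["powerful", "sturdy", "muscular"]
print([c for c in candidates if not cooccurs(c, collocate)])
```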
Conference Paper
Full-text available
It is difficult to develop and deploy Language Technology and applications for minority languages for many reasons. These include the lack of Natural Language Processing resources for the language, a scarcity of NLP researchers who speak the language and the communication gap between teachers in the classroom and researchers working in universities and other centres of research. One approach to overcoming these obstacles is for researchers interested in Less-Resourced Languages to work together in reusing and adapting existing resources where possible. This article outlines how a multiple-choice quiz generator for Basque was adapted for Irish. The Quizzes on Tap system uses Latent Semantic Analysis to automatically generate multiple choice test items. Adapting the Basque application to work for Irish involved the sourcing of suitable Irish corpora and a morphological engine for Irish, as well as the compilation of a development set. Various integration issues arising from differences between Basque and Irish needed to be dealt with. The QOT system provides a useful resource that enables Irish teachers to produce both domain-specific and general-knowledge quizzes in a timely manner, for children with varying levels of exposure to the language.
Conference Paper
Full-text available
The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: it is the first specialised corpus in Basque, it has been designed to be a methodological and functional reference for new projects in the future (i.e., a national corpus for Basque), it is the first corpus in Basque annotated using a TEI-P4 compliant XML format, it is the first written corpus in Basque to be distributed by ELDA, and it has a friendly and sophisticated query interface. The corpus has two kinds of annotation, a structural annotation and a stand-off linguistic annotation. It is composed of two parts, a 1.6 million-word balanced part, whose annotation has been revised by hand, and an automatically tagged 6 million-word part. The project is not closed, and we intend to gradually enlarge and improve the corpus. We also present the technology and the tools used to build this corpus. These tools, Corpusgile and Eulia, provide a flexible and extensible infrastructure for creating, visualising and managing corpora, and for consulting, visualising and modifying annotations generated by linguistic tools. Finally, we introduce the web interface to query the ZT Corpus, which offers interesting advanced features that are new to Basque corpora.
Article
Full-text available
This paper proposes the automatic generation of Fill-in-the-Blank Questions (FBQs) together with testing based on Item Response Theory (IRT) to measure English proficiency. First, the proposal generates an FBQ from a given sentence in English. The position of a blank in the sentence is determined, and the word at that position is considered the correct choice. The candidates for incorrect choices for the blank are hypothesized through a thesaurus. Then, each of the candidates is verified by using the Web. Finally, the blanked sentence, the correct choice, and the incorrect choices surviving the verification are laid out together to form the FBQ. Second, the proficiency of non-native speakers who took the test consisting of such FBQs is estimated through IRT. Our experimental results suggest that: (1) the generated questions plus IRT estimate the non-native speakers' English proficiency; (2) on the other hand, the test can be completed almost perfectly by English native speakers; and (3) the number of questions can be reduced by using item information in IRT. The proposed method provides teachers and testers with a tool that reduces the time and expenditure for testing English proficiency.
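For reference, a standard way to exploit "item information" is the two-parameter logistic (2PL) model; that the paper uses 2PL specifically is an assumption here, but the formulas below are the textbook ones:

```latex
% 2PL item characteristic curve and item information function:
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
\qquad
I_i(\theta) = a_i^2 \, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)
```

Summing I_i(theta) over items gives the test information, so items contributing little information near the ability levels of interest can be dropped, which is how the number of questions can be reduced without much loss of measurement precision.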
Article
Full-text available
We report experience in applying natural language processing techniques to algorithmically generating test items for both reading and listening cloze items. We propose a word sense disambiguation-based method for locating sentences in which designated words carry specific senses, and apply a collocation-based method for selecting the distractors that are necessary for multiple-choice cloze items. Experimental results indicate that our system was able to produce a usable item for every 1.6 items it returned. We also attempt to measure the distance between the sounds of words by considering their phonetic features. With the help of voice synthesizers, we were able to assist the task of composing listening cloze items. By providing both reading and listening cloze items, we would like to offer a somewhat adaptive system for assisting Taiwanese children in learning English vocabulary.
Article
Full-text available
Mitkov and Ha (2003) and Mitkov et al. (2006) offered an alternative to the lengthy and demanding activity of developing multiple-choice test items by proposing an NLP-based methodology for constructing test items from instructive texts such as textbook chapters and encyclopaedia entries. One of the interesting research questions which emerged during these projects was how better-quality distractors could be chosen automatically. This paper reports the results of a study seeking to establish which similarity measures generate better-quality distractors for multiple-choice tests. The similarity measures employed in the distractor selection procedure are collocation patterns, four different methods of WordNet-based semantic similarity (the extended gloss overlap measure and Leacock and Chodorow's, Jiang and Conrath's, and Lin's measures), distributional similarity, phonetic similarity, and a mixed strategy combining the aforementioned measures. The evaluation results show that the methods based on Lin's measure and on the mixed strategy outperform the rest, albeit not in a statistically significant fashion.
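For concreteness, the WordNet-based measures named above are available in NLTK; a minimal sketch follows, assuming the 'wordnet' and 'wordnet_ic' NLTK data packages have been downloaded.

```python
# Leacock-Chodorow, Jiang-Conrath, and Lin similarities between two
# noun synsets; JCN and Lin require an information-content corpus.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")
cat, dog = wn.synset("cat.n.01"), wn.synset("dog.n.01")

print("LCH:", cat.lch_similarity(dog))
print("JCN:", cat.jcn_similarity(dog, brown_ic))
print("Lin:", cat.lin_similarity(dog, brown_ic))
```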
Article
Full-text available
This paper outlines how a multiple-choice vocabulary cloze test can be produced from a text. The process described involves assigning word class tags to the text and then retrieving word frequencies for the words in the text from an analyzed corpus. The system allows for the creation of three types of test: one based on the "nth-word deletion" principle, one based on user-specified frequency ranges, and one based on a particular word class. After the user's selection, the word class and word frequency of each test item key are matched with similar word class and word frequency options to construct the test items. Analysis of tests produced by the system and administered to students indicates the potential of the computer-aided test system, although the three test production modes are not equally successful in their production of "acceptable" test items, with the nth-word deletion mode producing considerably fewer acceptable items than the two language-oriented test production modes of specified word frequency ranges and particular word classes. The paper concludes with a discussion of the extent to which good test material can realistically be produced by computer-aided systems and the different computer tools which may be of use in the process.
Article
Full-text available
We present a strategy to improve the quality of automatically generated cloze and open cloze questions which are used by the REAP tutoring system for assessment in the ill-defined domain of English as a Second Language vocabulary learning. Cloze and open cloze questions are fill-in-the-blank questions with and without multiple choice, respectively. The REAP intelligent tutoring system [1] uses cloze questions as part of its assessment and practice module. First we describe a baseline technique to generate cloze questions which uses sample sentences from WordNet [2]. We then show how we can refine this technique with linguistically motivated features to generate better cloze and open cloze questions. A group of English as a Second Language teachers evaluated the quality of the cloze questions generated by both techniques. They also evaluated how well-defined the context of the open cloze questions was. The baseline technique produced high quality cloze questions 40% of the time, while the new strategy produced high quality cloze questions 66% of the time. We also compared our approach to manually authored open cloze questions.
Conference Paper
Full-text available
This paper presents IRAKAS, an m-learning system that provides support for the whole cycle of memorization and training activities in a wide range of domains. The paper is focused on the development of learning materials.
Conference Paper
Full-text available
Retrieving and reusing Learning Objects can lighten the workload of constructing new on-line courses or Technology Supported Learning Systems. The paper presents ErauzOnt, a framework for the automatic generation of new Learning Objects from electronic documents using domain ontologies and Natural Language Processing techniques.
Conference Paper
Full-text available
This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in different analysis layers, where each layer takes as input the information provided by the previous layer, thus creating a gradually deeper syntactic analysis in cascade. This analysis is carried out using the Constraint Grammar (CG) formalism. Moreover, the article describes the standardisation process of the parsing formats using XML.
Conference Paper
Full-text available
Fill-in-the-blank questions, or cloze items, are commonly used in language learning applications. The benefits of personalized items, tailored to the user's interest and proficiency, have motivated research on automatic generation of cloze items. This paper is concerned with generating cloze items for prepositions, whose usage often poses problems for non-native speakers of English. The quality of a cloze item depends on the choice of distractors. We propose two methods, based on collocations and on non-native English corpora, to generate distractors for prepositions. Both methods are found to be more successful in attracting users than a baseline that relies only on word frequency, a common criterion in past research. Index Terms: computer-assisted language learning, natural language generation
Conference Paper
Full-text available
This paper presents a system which uses Natural Language Processing techniques to generate multiple-choice questions. The system implements different methods to find distractors semantically similar to the correct answer. For this task, a corpus-based approach is applied to measure similarities. The target language is Basque and the questions are used for learners’ assessment in the science domain. In this article we present the results of an evaluation carried out with learners to measure the quality of the automatically generated distractors.
Conference Paper
Full-text available
Knowledge construction is expensive for Computer Assisted Assessment. When setting exercise questions, teachers use Test Makers to construct Question Banks. The addition of Automatic Generation to assessment applications decreases the time spent on constructing examination papers. In this article, we present ArikIturri, an Automatic Question Generator for Basque language test questions, which is independent from the test assessment application that uses it. The information source for this question generator consists of linguistically analysed real corpora, represented in XML mark-up language. ArikIturri makes use of NLP tools. The influence of the robustness of those tools and the used corpora is highlighted in the article. We have proved the viability of ArikIturri when constructing fill-in-the-blank, word formation, multiple choice, and error correction question types. In the evaluation of this automatic generator, we have obtained positive results as regards the generation process and its usefulness.
Conference Paper
Full-text available
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
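For reference, the PMI-IR score has a simple closed form. PMI measures how much more often two words co-occur than chance predicts, and because the problem word q is fixed across the answer choices c_i, ranking the choices reduces to a ratio of hit counts (the AND/NEAR variants differ only in how co-occurrence is queried):

\[
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1 \wedge w_2)}{p(w_1)\,p(w_2)}
\qquad\Longrightarrow\qquad
\mathrm{score}(c_i) = \frac{\mathrm{hits}(q \text{ AND } c_i)}{\mathrm{hits}(c_i)}
\]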
Conference Paper
Full-text available
This paper presents the strategy and design of a highly efficient semi-automatic method for labelling the semantic features of common nouns, using semantic relationships between words and based on the information extracted from an electronic monolingual dictionary. The method, which uses genus data, specific relators, and synonymy information, obtains an accuracy of over 99% and a scope of 68.2% with regard to all the common nouns contained in a real corpus of over 1 million words, after the manual labelling of only 100 nouns.
Conference Paper
Full-text available
In this paper we propose a new graph-based method that uses the knowledge in a LKB (based on WordNet) in order to perform unsupervised Word Sense Disambiguation. Our algorithm uses the full graph of the LKB efficiently, performing better than previous approaches in English all-words datasets. We also show that the algorithm can be easily ported to other languages with good results, with the only requirement of having a wordnet. In addition, we make an analysis of the performance of the algorithm, showing that it is efficient and that it could be tuned to be faster.
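A minimal sketch of the core operation behind this kind of graph-based WSD, Personalized PageRank over a sense graph; the real LKB is WordNet-sized, whereas the adjacency matrix and teleport vector below are invented toy data.

# Personalized PageRank by power iteration over a toy sense graph.
import numpy as np

def personalized_pagerank(adj, teleport, damping=0.85, iters=50):
    """Power iteration: rank = d * M @ rank + (1 - d) * teleport."""
    # Column-normalize the adjacency matrix into a transition matrix.
    M = adj / adj.sum(axis=0, keepdims=True)
    rank = np.copy(teleport)
    for _ in range(iters):
        rank = damping * M @ rank + (1 - damping) * teleport
    return rank

# Toy graph over 4 sense nodes; teleport mass sits on the senses
# activated by the context words.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
teleport = np.array([0.5, 0.0, 0.5, 0.0])
print(personalized_pagerank(adj, teleport))  # highest-ranked sense wins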
Conference Paper
Full-text available
This paper introduces a method for the semi-automatic generation of grammar test items by applying Natural Language Processing (NLP) techniques. Based on manually-designed patterns, sentences gathered from the Web are transformed into tests on grammaticality. The method involves representing test writing knowledge as test patterns, acquiring authentic sentences on the Web, and applying generation strategies to transform sentences into items. At runtime, sentences are converted into two types of TOEFL-style question: multiple-choice and error detection. We also describe a prototype system FAST (Free Assessment of Structural Tests). Evaluation on a set of generated questions indicates that the proposed method achieves satisfactory quality. Our methodology provides a promising approach and offers significant potential for computer assisted language learning and assessment.
Conference Paper
Full-text available
This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
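As a sketch of what a supervised combination can look like (the abstract does not specify the learner, so least squares is an assumption here): fit weights over the two similarity scores against gold human ratings. All numbers below are invented.

# Least-squares combination of two similarity signals.
import numpy as np

# Each row: (wordnet_score, distributional_score) for one word pair.
features = np.array([[0.9, 0.8], [0.2, 0.3], [0.7, 0.5], [0.1, 0.2]])
gold = np.array([0.95, 0.20, 0.65, 0.10])  # human similarity ratings

# Fit combination weights w minimizing ||features @ w - gold||.
w, *_ = np.linalg.lstsq(features, gold, rcond=None)
combined = features @ w
print(w, combined)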
Conference Paper
Full-text available
Mobile learning (m-learning) integrates current mobile computing technology with educational aspects to enhance the effectiveness of the traditional learning process. This paper describes IKASYS, an m-learning management tool that provides support for the whole cycle of memorization and training activities in a wide range of domains. The tool has been developed for use in school-wide environments. This paper focuses mainly on IKASYS Trainer, the application for the mobile device.
Article
Full-text available
This paper describes the first version of the Multilingual Central Repository, a lexical knowledge base developed in the framework of the MEANING project.
Article
Full-text available
This paper presents an unsupervised algorithm which automatically discovers word senses from text. The algorithm is based on a graph model representing words and relationships between them.
Article
Because multiple-choice testing is so widespread in higher education, we assessed the quality of items used on classroom tests by carrying out a statistical item analysis. We examined undergraduates' responses to 1198 multiple-choice items on sixteen classroom tests in various disciplines. The mean item discrimination coefficient was +0.25, with more than 30% of items having unsatisfactory coefficients less than +0.20. Of the 3819 distractors, 45% were flawed either because less than 5% of examinees selected them or because their selection was positively rather than negatively correlated with test scores. In three tests, more than 40% of the items had an unsatisfactory discrimination coefficient, and in six tests, more than half of the distractors were flawed. Discriminatory power suffered dramatically when the selection of one or more distractors was positively correlated with test scores, but it was only minimally affected by the presence of distractors that were selected by less than 5% of examinees. Our findings indicate that there is considerable room for improvement in the quality of many multiple-choice tests. We suggest that instructors consider improving the quality of their multiple-choice tests by conducting an item analysis and by modifying distractors that impair the discriminatory power of items.
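A minimal sketch of the kind of item analysis described here: the discrimination coefficient of each item computed as the point-biserial correlation between item correctness and the examinee's total score. The toy response matrix is invented, and a production analysis would typically also exclude the item from its own total.

# Item discrimination via point-biserial (Pearson) correlation.
import numpy as np

responses = np.array([  # rows = examinees, cols = items (1 = correct)
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
])
totals = responses.sum(axis=1)

for item in range(responses.shape[1]):
    # Correlate item correctness with the total score; coefficients
    # below about +0.20 would be flagged as unsatisfactory.
    r = np.corrcoef(responses[:, item], totals)[0, 1]
    print(f"item {item}: discrimination = {r:+.2f}")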
Conference Paper
In this paper, we present an automatic question generation system that can generate gap-fill questions for content in a document. Gap-fill questions are fill-in-the-blank questions with multiple choices (one correct answer and three distractors) provided. The system finds the informative sentences from the document and generates gap-fill questions from them by first blanking keys from the sentences and then determining the distractors for these keys. Syntactic and lexical features are used in this process without relying on any external resource apart from the information in the document. We evaluated our system on two chapters of a standard biology textbook and presented the results.
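A minimal sketch of the gap-fill assembly step, under the assumption that key selection and distractor generation have already happened; both the key and the distractor list below are placeholders, whereas the paper derives them from the document's own syntactic and lexical features.

# Assemble a gap-fill item: blank the key and attach shuffled choices.
import random

def make_gap_fill(sentence, key, distractors, n_choices=4):
    stem = sentence.replace(key, "_____", 1)
    choices = [key] + random.sample(distractors, n_choices - 1)
    random.shuffle(choices)
    return {"stem": stem, "choices": choices, "answer": key}

item = make_gap_fill(
    "Mitochondria are the organelles that produce most of the cell's ATP.",
    "Mitochondria",
    ["Ribosomes", "Chloroplasts", "Lysosomes", "Vacuoles"],
)
print(item["stem"])
print(item["choices"], "->", item["answer"])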
Conference Paper
As lifelong learning becomes increasingly important in our society, mechanisms allowing students to evaluate their progress must be provided. A commonly used and widely accepted feedback mechanism is the multiple-choice test. Manual creation of multiple-choice questions is often a time-consuming process involving many iterations of trial and error. Using text processing and natural language processing techniques, automated multiple-choice question generation has, in recent years, come much closer to reality than ever. However, one of the most difficult tasks in both manual creation and automated generation of this kind of test is the creation of distractors, because unsuitable distractors allow students to easily guess the correct answer, which counteracts the goal of these questions. In this paper, we investigated the desired properties of distractors and identified relevant text processing algorithms, specifically latent semantic analysis and stylometry, for distractor selection. The refined distractors are compared with baseline distractors generated by our existing Automated Question Creator (AQC). Our preliminary evaluation shows that this novel combined approach produces distractors of higher quality than those of the baseline AQC system.
Article
A taxonomy of 31 multiple-choice item-writing guidelines was validated through a logical process that included two sources of evidence: the consensus achieved from reviewing what was found in 27 textbooks on educational testing and the results of 27 research studies and reviews published since 1990. This taxonomy is mainly intended for classroom assessment. Because textbooks have the potential to educate teachers and future teachers, textbook writers are encouraged to consider these findings in future editions of their textbooks. This taxonomy may also be useful for developing test items for large-scale assessments. Finally, research on multiple-choice item writing is discussed from both substantive and methodological viewpoints.
Article
We describe a methodology for improving the generation of multiple-choice test items through the usage of language technologies. We apply common natural language processing techniques, like constituency parsing and automatic term extraction, together with additional morpho-syntactic rules on raw instructional material in order to determine its key terms. These key terms are then used for the creation of fill-in-the-blank test items and the selection of distractors. Our work aims at proving the availability and compatibility of language resources and technologies for Bulgarian, as well as at assessing the readiness for implementation of these techniques in real-world applications.
Article
We present an internet-based system that helps teachers to make a cloze test from an online news article. The current system supports the creation of grammar and vocabulary questions for English. The system works as an assistant to the teacher. It helps the user to choose an article, highlights grammar targets, suggests possible choices for the wrong alternatives, and formats the questions in a printer-friendly way. We use NLP (Natural Language Processing) techniques to provide suggestions for the wrong alternatives. The user interface is built on a web browser so the user can make a test easily (just by clicking on the text). The user evaluation shows that about 80% of the questions made with this application are appropriate and that the usefulness and the usability of the interface are satisfactory.
Article
A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
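A minimal numpy sketch of latent semantic indexing as described: truncated SVD of a toy term-by-document matrix, with a query folded in as a pseudo-document before cosine ranking. The abstract mentions on the order of 100 factors; k = 2 here only because the example is tiny.

# Latent semantic indexing on a toy term-by-document matrix.
import numpy as np

A = np.array([  # rows = terms, cols = documents (toy counts)
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

k = 2  # number of latent factors
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T  # rank-k factors; Vk rows = docs

query = np.array([1, 1, 0, 0], dtype=float)  # query term weights
q_hat = (query @ Uk) / sk                    # fold in: q^T U_k S_k^{-1}
sims = (Vk @ q_hat) / (np.linalg.norm(Vk, axis=1) * np.linalg.norm(q_hat))
print(np.argsort(-sims))  # documents ranked by latent-space cosine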
Conference Paper
Current work in learner evaluation of Intelligent Tutoring Systems (ITSs) is moving towards open-ended educational content diagnosis. One of the main difficulties of this approach is being able to automatically understand natural language. Our work aims to produce automatic evaluation of learner summaries in Basque. Therefore, in addition to language comprehension, difficulties emerge from Basque morphology itself. In this work, Latent Semantic Analysis (LSA) is used to model comprehension in a language in which lemmatization has been shown to be highly significant. This paper tests the influence of corpus lemmatization on automatic comprehension and coherence grading. Summaries graded by human judges on coherence and comprehension have been tested against LSA-based measures derived from lemmatized and non-lemmatized source corpora. After lemmatization, the number of single terms known to LSA was reduced by 56%. As a result, LSA grades almost match human measures, producing no significant differences between the lemmatized and non-lemmatized approaches.
Article
In this article, we present a comprehensive study aimed at computing the semantic relatedness of word pairs. We analyze the performance of a large number of semantic relatedness measures proposed in the literature with respect to different experimental conditions, such as (i) the datasets employed, (ii) the language (English or German), (iii) the underlying knowledge source, and (iv) the evaluation task (computing scores of semantic relatedness, ranking word pairs, solving word choice problems). To our knowledge, this study is the first to systematically analyze semantic relatedness on a large number of datasets with different properties, while emphasizing the role of the knowledge source compiled either by the ‘wisdom of linguists’ (i.e., classical wordnets) or by the ‘wisdom of crowds’ (i.e., collaboratively constructed knowledge sources like Wikipedia). The article discusses benefits and drawbacks of different approaches to evaluating semantic relatedness. We show that results should be interpreted carefully to evaluate particular aspects of semantic relatedness. For the first time, we apply a vector-based measure of semantic relatedness, relying on a concept space built from documents, to the first paragraph of Wikipedia articles, to English WordNet glosses, and to GermaNet-based pseudo-glosses. Contrary to previous research (Strube and Ponzetto 2006; Gabrilovich and Markovitch 2007; Zesch et al. 2007), we find that ‘wisdom of crowds’ based resources are not superior to ‘wisdom of linguists’ based resources. We also find that using the first paragraph of a Wikipedia article as opposed to the whole article leads to better precision, but decreases recall. Finally, we present two systems that were developed to aid the experiments presented herein and are freely available for research purposes: (i) DEXTRACT, a software tool to semi-automatically construct corpus-driven semantic relatedness datasets, and (ii) JWPL, a Java-based high-performance Wikipedia Application Programming Interface (API) for building natural language processing (NLP) applications.
Article
The language-theoretical interpretation of the result of the analysis is that LSA vectors approximate the meaning of a word as its average effect on the meaning of passages in which it occurs, and reciprocally approximates the meaning of passages as the average of the meaning of their words. The derived relation between individual words should not be confused with surface co-occurrence, the frequency or likelihood that words appear in the same passages: it is correctly interpreted as the similarity of the effects that the words have on passages in which they occur. That this kind of mutual constraint can be realized in other ways than SVD, for example with neural network models, recommends it as a potential theory of corresponding biological mechanisms in language and thought. The use of a large and representative language corpus supports representation of the meanings of new passages by statistical induction, thus comparison of the meaning of new passages to each other whether containing literal words in common or not, and thence to a wide range of practical applications. For example, the sentences Cardiac surgeries are quite safe these days and Nowadays, it is not at all risky to operate on the heart have very high similarity (cos = .76, p << .0001). In LSA, there is no notion of multiple discrete senses or disambiguation prior to passage meaning formation. A word-form type has the same effect on every passage in which it occurs, and that in turn is the average of the vectors for all of the passages in which it occurs. Thus, a word vector represents a mixture of all its senses, in proportion to the sum of their contextual usages.
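To illustrate the averaging rule stated above with invented 3-dimensional vectors (real LSA vectors come from an SVD over a large corpus): passage meaning as the average of its word vectors, so two passages can be highly similar without sharing a single word, as in the cardiac-surgery example.

# Passage similarity without word overlap, via averaged word vectors.
import numpy as np

word_vecs = {  # invented toy vectors; real ones come from LSA training
    "cardiac": np.array([0.9, 0.1, 0.0]),
    "surgery": np.array([0.8, 0.2, 0.1]),
    "heart":   np.array([0.9, 0.2, 0.0]),
    "operate": np.array([0.7, 0.3, 0.1]),
    "safe":    np.array([0.1, 0.9, 0.2]),
    "risky":   np.array([0.2, 0.8, 0.3]),
}

def passage_vector(words):
    return np.mean([word_vecs[w] for w in words], axis=0)

p1 = passage_vector(["cardiac", "surgery", "safe"])
p2 = passage_vector(["operate", "heart", "risky"])
cos = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))
print(f"cos = {cos:.2f}")  # high despite no literal word overlap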
Article
Semantic interpretation of language requires extensive and rich lexical knowledge bases (LKB). The Basque WordNet is a LKB based on WordNet and its multilingual counterparts EuroWordNet and the Multilingual Central Repository. This paper reviews the theoretical and practical aspects of the Basque WordNet lexical knowledge base, as well as the steps and methodology followed in its construction. Our methodology is based on the joint development of wordnets and annotated corpora. The Basque WordNet contains 32,456 synsets and 26,565 lemmas, and is complemented by a hand-tagged corpus comprising 59,968 annotations.
Article
The need for greater interest and participation in hearing conservation on the part of otolaryngologists, audiologists, and others is growing. Ever-increasing numbers of people are being handicapped by noise-induced hearing impairment. This costly damage is not curable, but it is preventable. Complaints of noise or annoyance must be investigated, and the primary tool for investigation is the sound level meter. It can be used by anyone with some basic knowledge to identify hazardous noise levels. Understanding the parts of the meter, the standards governing its construction, and the basic steps in reading hazardous noise levels, including sources of error, methods of reporting the readings, and knowing where expert assistance is available, forms the basis for monitoring hazardous noise levels. The following references contain material that can be used for additional study and as sources of professional help. The number of publications dealing with noise has, like the number of instruments and companies dealing with noise, risen sharply in the last three years. The sources listed constitute only a small portion of those available. They were selected for their detailed information, for their concrete and practical approach in providing immediately useful information, and for their usefulness as sources for further study materials or help with a hearing conservation program.
Article
We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-based estimates.
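For orientation, similarity-based estimators of this family generally take the following shape (a representative form, not necessarily the exact formula compared in the paper): the probability of an unseen bigram (w_1, w_2) is a similarity-weighted average of the conditional probabilities observed for the nearest neighbours S(w_1) of w_1:

\[
P_{\mathrm{SIM}}(w_2 \mid w_1) \;=\; \sum_{w_1' \in S(w_1)} \frac{\mathrm{sim}(w_1, w_1')}{\sum_{w_1'' \in S(w_1)} \mathrm{sim}(w_1, w_1'')}\; P(w_2 \mid w_1')
\]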
ICT Competency Standards for Teachers: Implementation Guidelines
  • UNESCO
UNESCO, ICT Competency Standards for Teachers: Implementation Guidelines, version 1.0, http://unesdoc.unesco.org/ulis/cgi-bin/ulis.pl?catno=156209, 19 p., 2008.
Measuring non-native speakers' proficiency of English by using a test with automatically-generated fill-in-the-blank questions
  • E Sumita
  • F Sugaya
  • S Yamamoto
E. Sumita, F. Sugaya, and S. Yamamoto, "Measuring non-native speakers' proficiency of English by using a test with automatically-generated fill-in-the-blank questions," in Proc. 2nd Workshop Building Educ. Appl. Using NLP, 2005, pp. 61-68.
Quizzes on tap: Exporting a test generation system from one less-resourced language to another
  • M Maritxalar
  • E Uí Dhonnchadha
  • J Foster
  • M Ward
M. Maritxalar, E. Uí Dhonnchadha, J. Foster, and M. Ward, "Quizzes on tap: Exporting a test generation system from one less-resourced language to another," in Proc. 2nd LRL Workshop: Addressing Gaps Lang. Resources Technol., 5th Lang. Technol. Conf.: Human Lang. Technol. Challenge Comput. Sci. Linguistics, 2011, pp. 502-514.
Elhuyar Zientzia eta Teknologiaren Hiztegi Entziklopedikoa [Elhuyar Encyclopedic Dictionary of Science and Technology]
  • Elhuyar Hizkuntza Zerbitzuak
Elhuyar Hizkuntza Zerbitzuak, Elhuyar Zientzia eta Teknologiaren Hiztegi Entziklopedikoa. Elhuyar Edizioak, Usurbil, 2009.
Item Response Theory for Psychologists (Multivariate Applications Book Series)
  • S Embretson
  • S Reise
S. Embretson and S. Reise, Item Response Theory for Psychologists (Multivariate Applications Book Series). Mahwah, NJ, USA: Lawrence Erlbaum Associates, 2000.
  • Fellbaum