Book

Automated Grammatical Error Detection for Language Learners

Authors:
  • Claudia Leacock, Martin Chodorow, Michael Gamon, and Joel Tetreault
... Therefore, there are more and more learner corpora being built all over the world with huge amounts of written texts and/or oral dialogues of different second language learners (the second language being mostly English but also other languages such as French, Spanish, German, Italian, etc.). As regards the non-English learner corpora, the corpora referenced by Leacock et al. (2010) vary from 200,000 to 500,000 words. The Centre for English Corpus Linguistics at the Université Catholique de Louvain has compiled a comprehensive list of learner corpora collected around the world, and there is also information available on learner corpora in Leacock et al. (2010). ...
... A 221,000-word Basque learner corpus (Aldabe et al., 2007; Leacock et al., 2010) has so far been collected and systematically organized. The corpus consists of texts written by learners at several language schools and at different competence levels. ...
Article
Full-text available
This article presents an environment developed for Learner Corpus Research and Error Analysis which makes it possible to deal with language errors from different points of view and with several aims. In the field of Intelligent Computer Assisted Language Learning (ICALL), our objective is to gain a better understanding of the language learning process. In the field of Natural Language Processing (NLP), we work on the development of applications that will help both language learners and teachers in their learning/teaching processes. Using this environment, several studies and experiments on error analysis have been carried out, and thanks to an in-depth study on determiner-related errors in Basque, some contributions in the above mentioned fields of research have been made.
... In fact, few assessments use constructed sentence responses to measure vocabulary knowledge, in part because of the considerable time and cost required to score such responses manually. While much progress has been made in automatically scoring writing quality in essays (Attali and Burstein, 2006; Leacock et al., 2014), the essay scoring engines do not measure proficiency in the use of specific words, except perhaps for some frequently confused homophones (e.g., its/it's, there/their/they're, affect/effect). In this paper we present a system for automated scoring of targeted vocabulary knowledge based on short constructed responses in a picture description task. ...
... Most work in automated scoring and learner language analysis has focused on detecting grammar and usage errors (Leacock et al., 2014; Gamon, 2010; Chodorow et al., 2007; Lu, 2010). This is done either by means of handcrafted rules or with statistical classifiers using a variety of information. ...
... Previous work (Bergsma et al., 2009; Bergsma et al., 2010; Xu et al., 2011) has shown that models which rely on large web-scale n-gram counts can be effective for the task of context-sensitive spelling correction. Measures of n-gram association such as PMI, log likelihood, chi-square, and t have a long history of use for detecting collocations and measuring their quality (see (Manning and Schütze, 1999) and (Leacock et al., 2014) for reviews). Our application of a large n-gram database and PMI is to detect inappropriate word usage. ...
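As a rough illustration of the PMI-over-n-gram-counts approach mentioned above, with toy counts standing in for a web-scale n-gram database (all words and numbers here are illustrative assumptions):

```python
from math import log2

# Toy corpus counts standing in for a web-scale n-gram database.
unigram_counts = {"strong": 800, "powerful": 600, "tea": 300, "computer": 400}
bigram_counts = {("strong", "tea"): 120, ("powerful", "computer"): 90,
                 ("powerful", "tea"): 2, ("strong", "computer"): 5}
total_unigrams = sum(unigram_counts.values())
total_bigrams = sum(bigram_counts.values())

def pmi(w1, w2):
    """Pointwise mutual information of a bigram: log2(P(w1,w2) / (P(w1)P(w2)))."""
    p_joint = bigram_counts.get((w1, w2), 0) / total_bigrams
    p1 = unigram_counts[w1] / total_unigrams
    p2 = unigram_counts[w2] / total_unigrams
    if p_joint == 0:
        return float("-inf")
    return log2(p_joint / (p1 * p2))

# A conventional collocation like "strong tea" scores far higher than
# "powerful tea", flagging the latter as a possible inappropriate word choice.
print(pmi("strong", "tea") > pmi("powerful", "tea"))  # True
```

In a real system the counts would come from an n-gram database orders of magnitude larger; the comparison logic stays the same.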
Conference Paper
We describe a system for automatically scoring a vocabulary item type that asks test-takers to use two specific words in writing a sentence based on a picture. The system consists of a rule-based component and a machine learned statistical model which uses a variety of construct-relevant features. Specifically, in constructing the statistical model, we investigate if grammar, usage, and mechanics features developed for scoring essays can be applied to short answers, as in our task. We also explore new features reflecting the quality of the collocations in the response, as well as features measuring the consistency of the response to the picture. System accuracy in scoring is 15 percentage points greater than the majority class baseline and 10 percentage points less than human performance.
... Due to its multifaceted functions, it helps students improve their writing style and their ability to express ideas clearly and concisely while ensuring grammatical accuracy and academic integrity. QuillBot can also offer a variety of vocabulary choices and sentence constructions, as noted by Leacock, Chodorow, Gamon, and Tetreault (2021). This AI tool can be very helpful for EFL and academic writing teachers and learners. ...
Article
Full-text available
With the introduction of AI-driven automatic feedback systems like QuillBot, Grammarly, ProWritingAid, and the Hemingway App, the field of teaching academic writing and English as a Foreign Language has been revolutionized. The current research explores the teachers’ perspectives on the integration and efficacy of these tools in enhancing learners’ writing abilities, motivation, and engagement. Within the quantitative approach, the online survey was administered to investigate the perceived advantages and challenges of employing automated feedback technologies in English as a Foreign Language (EFL) and academic writing classes in Higher Education Institutions (HEI) of Georgia. The questionnaire focused on teachers’ views on what areas of writing may each aforementioned tool improve. It also investigated the participants’ overall perceptions of automated feedback and its benefits. The findings of the study revealed that HEI teachers perceive automated feedback technologies very positively and see the benefits the latter bring to their classrooms. They find it effective due to its nature to be personalized, instant, precise and clear. They also find AI-driven feedback tools beneficial for enhancing their feedback skills and enabling them to address students’ needs timely and effectively. This research adds to the expanding corpus of research on educational technology by emphasizing the crucial role that teacher perception plays in the effective integration of automated feedback technologies into writing instruction.
... Also, Singh and Mahmood [7] present a general approach to various uses of natural language processing (NLP) (translation and recognition) using modern techniques such as deep learning. Finally, there are other books and papers in the literature, like [16], but there are few systems that deal with lexical or syntactic errors in Spanish, like [3,5,14]. ...
... In this section, we describe the design of the proposed Ar2p-Text. Ar2p-Text is an extension of the Ar2p neural model [10,16] for the context of Spanish text analysis. The Ar2p neural model has previously been successfully used in different contexts [9,11]. ...
Article
Full-text available
Currently, approaches to correcting misspelled words have problems when the words are complex or massive. This is even more serious in the case of Spanish, where there are very few studies in this regard. So, proposing new approaches to word recognition and correction remains a research topic of interest. In particular, an interesting approach is to computationally simulate the brain process for recognizing misspelled words and their automatic correction. Thus, this article presents an automatic recognition and correction system of misspelled words in Spanish texts, for the detection of misspelled words, and their automatic amendments, based on the systematic theory of pattern recognition of the mind (PRTM). The main innovation of the research is the use of the PRTM theory in this context. Particularly, a corrective system of misspelled words in Spanish based on this theory, called Ar2p-Text, was designed and built. Ar2p-Text carries out a recursive process of analysis of words by a disaggregation/integration mechanism, using specialized hierarchical recognition modules that define formal strategies to determine if a word is well or poorly written. A comparative evaluation shows that the precision and coverage of our Ar2p-Text model are competitive with other spell-checkers. In the experiments, the system achieves better performance than the three other systems. In general, Ar2p-Text obtains an F-measure of 83%, above the 73% achieved by the other spell-checkers. Our hierarchical approach reuses a lot of information, allowing for the improvement of the text analysis processes in both quality and efficiency. Preliminary results show that the above will allow for future developments of technologies for the correction of words inspired by this hierarchical approach.
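Ar2p-Text's hierarchical PRTM-based recognition is not reproduced here; for contrast, a conventional dictionary spell-checker of the kind it is evaluated against can be sketched with edit distance (the toy Spanish lexicon is an illustrative assumption):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

LEXICON = {"casa", "cosa", "mesa", "perro", "pero"}  # illustrative lexicon

def correct(word):
    """Return the word itself if known, else the nearest lexicon entry."""
    if word in LEXICON:
        return word
    return min(LEXICON, key=lambda w: levenshtein(word, w))

print(correct("cassa"))  # nearest entry in this toy lexicon: "casa"
```

A real checker would add frequency weighting and keyboard-aware costs; this baseline only shows the shape of the recognize-or-amend decision.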
... Tools for GEC (grammatical error correction) tasks have greatly improved over recent decades. In terms of metrics, modern large language models outperform a human annotator on the GEC task [1]; overviews [2][3][4] present the performance growth at different stages. However, GEC models are still observed to fail in correcting several types of errors that a human would easily and necessarily correct [1]. ...
... The final text should not contain errors (though cf. [22] for Swedish, for which GPT-3 outperforms all other models). ...
Article
The study focuses on how modern GEC systems handle character-level errors. We discuss the ways these errors affect the performance of models and test how models of different architectures handle them. We conclude that specialized GEC systems do struggle to correct non-existent words, and that a simple spellchecker considerably improves the overall performance of a model. To evaluate this, we assess the models over several datasets. In addition to the CoNLL-2014 validation dataset, we contribute a synthetic dataset with a higher density of character-level errors and conclude that, given that models generally show very high scores, validation datasets with a higher density of tricky errors are a useful tool for comparing models. Lastly, we notice cases of incorrect treatment of non-existent words in the experts' annotations and contribute a cleaned version of this dataset. In contrast to specialized GEC systems, the LLaMA model used for the GEC task handles character-level errors well. We suggest that this better performance is explained by the fact that Alpaca is not extensively trained on annotated texts with errors, but receives grammatically and orthographically correct texts as input.
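The abstract does not spell out the corruption procedure behind the synthetic dataset; a minimal sketch of how sentences with a higher density of character-level errors might be generated (the edit operations and rate are assumptions):

```python
import random

def corrupt(word, rng):
    """Apply one random character-level edit: swap, drop, double, or replace."""
    if len(word) < 3:
        return word
    i = rng.randrange(1, len(word) - 1)
    op = rng.choice(["swap", "drop", "double", "replace"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "drop":
        return word[:i] + word[i + 1:]
    if op == "double":
        return word[:i] + word[i] + word[i:]
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]

def corrupt_sentence(sentence, rate, seed=0):
    """Corrupt each word with probability `rate`, keeping word count intact."""
    rng = random.Random(seed)
    return " ".join(corrupt(w, rng) if rng.random() < rate else w
                    for w in sentence.split())

print(corrupt_sentence("models struggle against non existent words", rate=0.5))
```

Raising `rate` yields a dataset dominated by non-existent words, which is exactly the regime where the excerpt reports specialized GEC systems struggling.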
... As such, they failed to satisfactorily interpret the causes of identified errors. Recent research in second language acquisition has shown that L2 learners' choice of semantic and grammatical units at word, phrasal, and discourse levels is regulated by human cognition (Leacock et al., 2010;White, 2010). L2 speakers can select from their accumulated repertoire of the linguistic items that they think best fit the context. ...
... Also, the unclear boundary between countable and uncountable nouns can lead to students' overgeneralization. The fact that some nouns are countable in one sense but uncountable in another confuses the L2 English speakers (Han et al., 2006;Leacock et al., 2010). This finding gives implications for teaching and learning the definite article and countability of nouns in ESL and EFL contexts. ...
Article
Full-text available
As part of a larger project, this study explores Vietnamese college students’ use and concepts of cohesive devices in writing. Cohesion is a crucial element that ties components of a text together. Although the use of cohesive devices in L2 writing has been investigated by a large body of research, there is no such study exploring the effects of L2 learners’ misconceptions of this word class. One hundred sixty-eight academic reports of totally 67,400 words written by Vietnamese final-year English-majored undergraduate students were collected for data analysis. An email invitation was sent, and 23 students accepted to participate in semi-structured interviews which were audio-recorded for analysis. The findings showed that the students employed references, conjunctions, and lexical items the most frequently in writing. Interestingly, the students’ cohesion use and errors had similar patterns. The transcript analysis showed that the students’ misconceptions of some language items and writing requirements affected their choice of cohesive devices. The current study gives implications for teaching and learning of this word class in L2 contexts.
... While much recent research has gone into grammatical error detection and correction (Leacock et al., 2014), this work has a few (admitted) limitations: 1) it has largely focused on a few error types (e.g., prepositions, articles, collocations); 2) it has largely been for English, with only a few explorations into other languages (e.g., Basque (de Ilarraza et al., 2008), Korean (Israel et al., 2013)); and 3) it has often focused on errors to the exclusion of broader patterns of learner productions-a crucial link if one wants to develop intelligent computer-assisted language learning (ICALL) (Heift and Schulze, 2007) or proficiency classification (Vajjala and Loo, 2013;Hawkins and Buttery, 2010) applications or connect to second language acquisition (SLA) research (Ragheb, 2014). We focus on Hungarian morphological analysis for learner language, attempting to build a system that: 1) works for a variety of mor-phological errors, providing detailed information for each; 2) is feasible for low-resource languages; and 3) provides analyses for correct and incorrect forms, i.e., is both a morphological analyzer and an error detector. ...
... There is a wealth of research on statistical error detection and correction of grammatical errors for language learners (Leacock et al., 2014), including for Hungarian . As has been argued before (e.g., Chodorow et al., 2007;Tetreault and Chodorow, 2008), statistical methods are ideal for parts of the linguistic system difficult to encode via rules. ...
... Researchers have demonstrated that prepositions and determiners are the two most frequent error types for language learners (Leacock et al., 2010). According to Swan and Smith (2001), preposition errors might result from L1 interference. ...
... In view of the fact that a large number of grammatical errors appear in non-native speakers' writing, more and more research has been directed towards the automated detection and correction of such errors to help improve the quality of that writing (Dale and Kilgarriff, 2010). In recent years, preposition error detection and correction has especially been an area of increasingly active research (Leacock et al., 2010). The HOO 2012 shared task also focuses on error detection and correction in the use of prepositions and determiners (Dale et al., 2012). ...
Article
Full-text available
Grammatical error correction has been an active research area in the field of Natural Language Processing. In this paper, we integrate four distinct learning-based modules to correct determiner and preposition errors in learners' writing. Each module focuses on a particular type of error. Our modules were tested on well-formed data and on learners' writing. The results show that our system achieves high recall while preserving satisfactory precision.
... According to Jack C. Richards (1974), grammar is a description of the structure of a language and the way in which linguistic units such as words and phrases are combined to produce sentences in the language; in generative transformational theory, grammar means a set of rules and a lexicon which describe the knowledge (competence) that a speaker has of his or her language. Furthermore, Leacock (2010) explained that knowledge of grammar underlies our ability to produce and understand sentences in a language. Harmer (2001) explained that the grammar of a language is the description of the ways in which words can change their forms and can be combined into sentences in that language. ...
Article
Full-text available
This study takes a quantitative approach with a descriptive design. It aims to identify the students' errors in speaking. The researcher used a descriptive analysis technique, reporting the frequency and percentage of each type of grammatical error the students committed in speaking. The subject of this study was the student English community, and the object was the students' grammatical errors in speaking. The researcher used random sampling, drawing 20 samples from a population of 43. To collect data, the researcher used speaking tests to find the grammatical errors that students made in their language skills. Based on the data analysis, the researcher found the kinds of errors students made in their speaking: of the 93 errors in total, omission accounted for 15.05%, addition for 22.58%, misformation for 56.98%, and misordering for 5.37%. The most common grammatical error in speaking was therefore misformation, with 53 errors, or 56.98% of the total.
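The reported shares can be checked against the raw counts. Only the misformation count (53) is stated explicitly; the other counts below are inferred from the reported percentages of a 93-error total, so treat them as assumptions (the abstract's figures appear to be truncated rather than rounded, so the last digit may differ):

```python
# Error counts: misformation (53) is given; the rest are inferred
# from the reported percentages of a 93-error total (an assumption).
counts = {"omission": 14, "addition": 21, "misformation": 53, "misordering": 5}
total = sum(counts.values())
assert total == 93

shares = {k: 100 * v / total for k, v in counts.items()}
for kind, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{kind}: {pct:.2f}%")
# misformation dominates at roughly 57%, matching the study's conclusion
```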
... It has been a sustained pedagogical standpoint that English article systems are notoriously difficult [4]. Acquisition of English articles has been a chronic challenge for individuals who are not native speakers of English [5][6][7][8][9][10][11], needing contemporary pedagogical solutions. They are the most difficult aspect to address when teaching English grammar to foreign learners, and their incorrect usage often indicates non-native proficiency in English [12]. ...
Article
Full-text available
The purpose of the study is to address the persistent challenges non-native English learners face with article systems by introducing and evaluating a novel pedagogical method for teaching indefinite articles. The study developed a "quantifying method," grounded in the semantic premise of noun phrase (NP) countability and the interaction of quantifying elements within English articles. This method was applied to teach indefinite articles ('a/an' and 'zero/null') to 15 EFL students over two weeks. Pre-test and post-test performance scores were analyzed using paired sample t-tests to measure the method's impact. Findings revealed that the quantifying method significantly improved students’ overall acquisition of indefinite articles, with higher performance noted for 'a/an' compared to 'zero/null'. The results highlight the effectiveness of the quantifying method as a targeted instructional tool for overcoming the difficulties of teaching English indefinite articles. The implications of the study recommend a validated, innovative approach to EFL grammar instruction that can be integrated into language teaching practices to enhance learner outcomes.
... In investigating the accuracy of AWE feedback and ChatGPT feedback, two statistical measures are often used: precision (the proportion of reported feedback that is correct) and recall (the proportion of all errors that are detected) (Leacock et al., 2010). Studies of AWE feedback found that hardly any existing AWE systems met the stringent criterion of 0.9 (Burstein et al., 2003) except for certain types of errors (e.g. ...
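The two measures above can be computed directly from the sets of reported and true errors; a small sketch with hypothetical token positions:

```python
def precision_recall(reported, gold):
    """reported: set of error positions flagged by the system;
       gold: set of positions where errors actually occur."""
    true_positives = len(reported & gold)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical token positions: 4 flags raised, 5 real errors, 3 overlap.
reported = {3, 7, 12, 20}
gold = {3, 7, 9, 12, 15}
p, r = precision_recall(reported, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

The 0.9 criterion mentioned above would apply to the precision value: fewer than one flag in ten may be wrong.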
... Automatic Error Detection (AED) has been a prominent research topic in education over the past few decades (Leacock et al., 2014). Supported by rapid advancements in natural language processing (NLP) technologies, particularly in language modeling (Min et al., 2023), AED research has achieved notable success in language education (Huang et al., 2023). ...
Preprint
The rise of large language models (LLMs) offers new opportunities for automatic error detection in education, particularly for math word problems (MWPs). While prior studies demonstrate the promise of LLMs as error detectors, they overlook the presence of multiple valid solutions for a single MWP. Our preliminary analysis reveals a significant performance gap between conventional and alternative solutions in MWPs, a phenomenon we term conformity bias in this work. To mitigate this bias, we introduce the Ask-Before-Detect (AskBD) framework, which generates adaptive reference solutions using LLMs to enhance error detection. Experiments on 200 examples of GSM8K show that AskBD effectively mitigates bias and improves performance, especially when combined with reasoning-enhancing techniques like chain-of-thought prompting.
... However, this assumes that systems are able to propose a correction for every detected error, and accurate systems for correction might not be optimal for detection. While closed-class errors such as incorrect prepositions and determiners can be modeled with a supervised classification approach, content-content word errors are the 3rd most frequent error type and pose a serious challenge to error correction frameworks (Leacock et al., 2014;Kochmar and Briscoe, 2014). Evaluation of error correction is also highly subjective and human annotators have rather low agreement on gold-standard corrections (Bryant and Ng, 2015). ...
Preprint
In this paper, we present the first experiments using neural network models for the task of error detection in learner writing. We perform a systematic comparison of alternative compositional architectures and propose a framework for error detection based on bidirectional LSTMs. Experiments on the CoNLL-14 shared task dataset show the model is able to outperform other participants on detecting errors in learner writing. Finally, the model is integrated with a publicly deployed self-assessment system, leading to performance comparable to human annotators.
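Setting the neural model itself aside, error detection of this kind is typically framed as per-token binary labeling. A minimal sketch of deriving such labels by aligning each original sentence with its corrected counterpart (an illustrative data-preparation step, not necessarily this paper's exact recipe):

```python
from difflib import SequenceMatcher

def detection_labels(original, corrected):
    """Label each original token 1 (erroneous) or 0 (kept verbatim)."""
    orig, corr = original.split(), corrected.split()
    labels = [1] * len(orig)  # assume erroneous until matched
    sm = SequenceMatcher(a=orig, b=corr)
    for tag, i1, i2, _j1, _j2 in sm.get_opcodes():
        if tag == "equal":
            for i in range(i1, i2):
                labels[i] = 0
    return labels

print(detection_labels("he go to school yesterday",
                       "he went to school yesterday"))  # [0, 1, 0, 0, 0]
```

A bidirectional LSTM tagger would then be trained to predict these 0/1 labels from the token sequence alone.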
... Using machine-learning algorithms, non-native productions can be identified with well over 80% accuracy (Koppel et al. 2005; Tsur & Rappoport 2007; Massung & Zhai 2016). The task of automatic error correction has a longer history (Leacock et al. 2010), but it is often motivated by market needs; rather than enhance our knowledge of where errors come from, engineers work on contextual grammatical error-correction systems to solve real-world problems without necessarily aiming to understand what lies behind those problems in the first place. Additionally, many such systems are not sensitive to the speaker's mother tongue. ...
Conference Paper
Full-text available
Prepositions, along with articles, are estimated to account for 20-50% of all grammatical errors in English made by non-native speakers. The goal of this paper is twofold: first, to develop a model to predict English-as-a-foreign-language (EFL) learners' use of prepositions, especially erroneous ones; second, to quantify the common-sense idea that learners opt for "literal translation", and in fact to estimate the probability that students opt for this option. We rely on techniques developed originally for statistical machine translation, with predictions being highly dependent on the speakers' native language. Using large amounts of parallel data in Arabic and English, we generate a model that calculates the probability of target-language prepositions based on those of the source language. The model indicates that the distribution of translation equivalents is such that (a) there are many possible translations and (b) a very small number of these equivalents cover most of the probability mass, while the other equivalents are part of ...
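A minimal sketch of the kind of translation model described, assuming aligned source/target preposition pairs have already been extracted from parallel text (the counts and the source preposition here are illustrative, not from the paper's Arabic-English corpus):

```python
from collections import Counter, defaultdict

# Toy aligned (source, target) preposition pairs from parallel text;
# the data is illustrative only.
aligned_pairs = ([("fi", "in")] * 70 + [("fi", "at")] * 20 + [("fi", "on")] * 10)

def translation_model(pairs):
    """P(target preposition | source preposition) by relative frequency."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    return {src: {tgt: n / sum(c.values()) for tgt, n in c.items()}
            for src, c in counts.items()}

model = translation_model(aligned_pairs)
print(model["fi"])  # {'in': 0.7, 'at': 0.2, 'on': 0.1}
```

The skew in this toy table mirrors the paper's observation: a few equivalents carry most of the probability mass, which is what makes "literal translation" a strong predictor of learner choices.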
... Advances in computing have been extensively used to provide writing support, starting with tools like spellcheck [68] and grammar-check [51] to improve writing efficiency [14]. As natural language generation technologies evolved, researchers began applying them to support creative writing pursuits such as story writing [22]. ...
Preprint
Full-text available
Large language models (LLMs) are being increasingly integrated into everyday products and services, such as coding tools and writing assistants. As these embedded AI applications are deployed globally, there is a growing concern that the AI models underlying these applications prioritize Western values. This paper investigates what happens when a Western-centric AI model provides writing suggestions to users from a different cultural background. We conducted a cross-cultural controlled experiment with 118 participants from India and the United States who completed culturally grounded writing tasks with and without AI suggestions. Our analysis reveals that AI provided greater efficiency gains for Americans compared to Indians. Moreover, AI suggestions led Indian participants to adopt Western writing styles, altering not just what is written but also how it is written. These findings show that Western-centric AI models homogenize writing toward Western norms, diminishing nuances that differentiate cultural expression.
... Over a billion people speak English as their second or foreign language (Leacock, 2010). Indonesians must learn English to be able to compete with other countries. ...
Article
Grammar is the study of how sentences are formed and arranged. It is an integral part of language and is very important for the learner. Moreover, the active and passive voice forms should be studied to improve our English skills. However, students are presently sometimes uninterested in the concept of active and passive voice. This study aims to identify the most substantial obstacles students face in transforming active into passive voice at X TM1 SMKN 2 Panyabungan. The research was conducted qualitatively, with the data drawn from class X TM1 at SMKN 2 Panyabungan. Data were collected through observation, interviews, and documentation. The analysis reveals that the dominant obstacles lie in determining the subject, using the third verb form (V3), placing the verb "be", and handling the past continuous form. Based on these findings, the researcher concludes that only a few students have an adequate understanding of the passive voice, while most students require extra effort in learning and practicing grammar to master active and passive voice forms. Keywords: Analysis, Students' Difficulties, Active and Passive Voice
... An important part of this process is to give feedback on the use of grammar by a learner. This feedback has usually been in the form of grammatical error detection (GED) (Leacock et al., 2014), and grammatical error correction (GEC) (Wang et al., 2021;Bryant et al., 2023), and the latter has been the subject of four shared tasks over the past 15 years. While highlighting errors and providing a grammatically corrected text is beneficial for second language (L2) learners (Hyland and Hyland, 2006), it is even more valuable to offer specific feedback that comments on grammatical errors, explains them, and provides suggestions for improvement. ...
Preprint
Full-text available
Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. This holistic feedback will be referred to as grammatical error feedback (GEF). In this paper, we present a novel implicit evaluation approach to GEF that eliminates the need for manual feedback annotations. Our method adopts a grammatical lineup approach where the task is to pair feedback and essay representations from a set of possible alternatives. This matching process can be performed by appropriately prompting a large language model (LLM). An important aspect of this process, explored here, is the form of the lineup, i.e., the selection of foils. This paper exploits this framework to examine the quality and need for GEC to generate feedback, as well as the system used to generate feedback, using essays from the Cambridge Learner Corpus.
... Error correction is a question type that integrates the testing of grammatical knowledge with the application of language skills, taking the code (language form) as the test object; it is widely used in various types of English tests in China. By setting test points in a text, the item writer requires students to identify and correct errors on the basis of reading comprehension, so as to raise students' awareness of self-assessment and correction in writing at the text level [2]. Error correction first appeared in the college entrance examination in 1991. ...
Article
Full-text available
Error correction in English is a constructed-response question type in the compulsory English test of the college entrance examination, involving modification, addition, and deletion, with content covering morphology, grammar, and syntax. This paper reviews the research papers on essay error correction in CNKI academic journals from two angles, research methods and research content, using both quantitative and qualitative analysis. The results show that the research methods are limited and the research tools simple. In terms of content, domestic research on essay error correction has shortcomings in both depth and breadth: introductions and problem-solving techniques take up too large a proportion, while empirical research is under-represented and needs to be expanded. Finally, drawing on the latest developments in error correction research abroad, future directions are outlined with respect to research content, level, and methods, offering suggestions and references for related research.
... The early rule-based systems in GEC laid the foundational framework for the development of automated tools. Originating in an era where computational resources and advanced machine learning techniques were not yet prevalent, these systems relied heavily on manually crafted grammar rules (such as subject-verb agreement and singular-plural consistency) and basic algorithms to identify and correct grammatical errors (Leacock, Chodorow, Gamon, & Tetreault, 2014;Yuan, 2017). For example, in Feng, Saricaoglu, and Chukharev-Hudilainen (2016), the researchers introduced CyWrite, a tool focused on detecting grammatical errors in ESL writings without providing corrections. ...
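In the spirit of the early rule-based systems described above, a single hand-crafted subject-verb agreement rule might be sketched as follows (the pronoun and verb lists are illustrative assumptions, not from any cited system):

```python
import re

# A deliberately tiny hand-crafted rule: flag "he/she/it" immediately
# followed by a base-form verb from a small closed list.
THIRD_SG = {"he", "she", "it"}
BASE_VERBS = {"go", "have", "do", "like", "want", "write"}  # illustrative

def agreement_errors(sentence):
    errors = []
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for prev, word in zip(tokens, tokens[1:]):
        if prev in THIRD_SG and word in BASE_VERBS:
            errors.append(f"'{prev} {word}': expected 3rd-person singular verb")
    return errors

print(agreement_errors("She go to school and he like it."))
```

The brittleness is the point: each new construction (intervening adverbs, full noun-phrase subjects) needs another manually crafted rule, which is why the field moved on to statistical and neural approaches.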
Preprint
Full-text available
This study provides a qualitative evaluation of Seqtagger, a state-of-the-art machine learning-based sequence-tagging model developed for grammatical error detection (GED) and correction (GEC). The model's performance is evaluated on error detection against human benchmarks, with academic texts written by Japanese university students. Through human annotation and subsequent thematic analysis on failures in error detection, this study reveals that Seqtagger performs well in detecting errors related to simpler grammatical rules such as adverb position and prepositions in fixed collocations, with poorer performance in errors possibly influenced by the Japanese language, macro-structure errors and errors where human judgment is required. The underlying reasons for failures in detection are identified to be a narrow context window that fails to capture broader textual information, insufficient training data, particularly data that fully represents the linguistic characteristics of the Japanese students, and overgeneralization of patterns from the training data. These findings highlight the need for sequence-tagging GED and GEC tools to enhance their context window, be more adaptable to the diverse linguistic features of global learners and to enhance the ability to understand the linguistic complexities of the English language.
... Since our system is based on feedback, the collection, analysis and classification of feedback generated by NLP systems in general, and by our system in particular, can be highly useful in teaching languages (Graf and Fife, 2012; Bejar, 2012; Leacock et al., 2010; Yannier et al., 2013). ...
Preprint
Full-text available
This paper endeavors to delineate the development and deployment of an AI-based intelligent feedback generator tailored to assist Persian learners of the English language. With a primary focus on efficacious system design, the approach involved compiling comprehensive data to pinpoint the most challenging aspects for such learners. Leveraging two AI-based engines for Persian and English, a feedback generator was devised. The framework fulfilled expectations by juxtaposing the logical forms of English and Persian sentences, thereby facilitating error detection. Most of the errors diagnosed were related to misused prepositions, determiners, tenses, and subjects. The results indicated its effectiveness in teaching English to such learners. With minor adjustments, the system can also be adapted to aid English learners of Persian, as it is capable of parsing input from both languages.
... Not only are such models a cost-effective alternative to human scoring, but they can also be useful for learners to obtain individualized feedback on their writing. A related area of research that has received increasing attention is automated grammatical error detection and correction for language learners (see, e.g., Leacock et al., 2014). Tools built for this purpose are useful for assessing the level of accuracy of learners' language production, an important dimension of second language proficiency. ...
Chapter
Natural language processing (NLP) is an interdisciplinary field of linguistics, computer science, and artificial intelligence that studies the use of computers to automatically analyze, understand, and generate human language in spoken or written form. The language processing capabilities of NLP technologies have myriad applications in computer‐assisted language learning (CALL), and the area of research that explores and implements such applications is referred to as intelligent CALL, or ICALL. This chapter provides an overview of three broad types of applications of NLP technologies in ICALL, the first based on analyses of authentic text in the target language, the second based on analyses of learner text, and the third using a composite of NLP technologies to build dialogue systems that allow learners to interact with conversational agents in meaningful and coherent ways. Both the theoretical motivations and the pedagogical affordances of these applications are discussed.
... (i) Use of authentic L2 data for training algorithms. Leacock et al. (2014) convincingly showed that tools for error correction and feedback for foreign language learners benefit from being trained on real L2 students' texts, and that these systems are better suited for use in Intelligent Computer-Assisted Language Learning (ICALL) or Automatic Writing Evaluation (AWE) contexts. Hence the importance of authentic language learner data. ...
Conference Paper
Full-text available
This paper reports on the NLP4CALL shared task on Multilingual Grammatical Error Detection (MultiGED-2023), which included five languages: Czech, English, German, Italian and Swedish. It is the first shared task organized by the Computational SLA working group, whose aim is to promote less represented languages in the fields of Grammatical Error Detection and Correction, and other related fields. The MultiGED datasets have been produced based on second language (L2) learner corpora for each particular language. In this paper we introduce the task as a whole, elaborate on the dataset generation process and the design choices made to obtain MultiGED datasets, provide details of the evaluation metrics and CodaLab setup. We further briefly describe the systems used by participants and report the results.
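Token-level grammatical error detection of this kind is typically scored as binary sequence labeling, with precision weighted over recall. As a minimal sketch (not the official task scorer; the gold and predicted label sequences below are invented for illustration), an F0.5 computation looks like this:

```python
# Sketch: token-level GED scored as binary sequence labeling (1 = error).
# Shared tasks in this area typically weight precision over recall via F0.5.

def f_beta(gold, pred, beta=0.5):
    """Return (precision, recall, F-beta) over binary token labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

# Toy sentence: gold marks tokens 2 and 5 as erroneous; system flags 2 and 3.
gold = [0, 0, 1, 0, 0, 1]
pred = [0, 0, 1, 1, 0, 0]
p, r, f05 = f_beta(gold, pred)
print(round(p, 2), round(r, 2), round(f05, 2))  # 0.5 0.5 0.5
```

Because beta < 1, a system that over-flags correct tokens is penalized more heavily than one that misses some errors, which suits feedback tools where false alarms erode learner trust.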
... If available, this can increase the speed with which automated feedback is processed for L1 users. On the other hand, L2 users may also benefit from such a setting as they can instruct the system to employ a hybrid approach in which the generated feedback comes from the learner corpus as well as the stored structured data [43,89]. Hence, adding this feature to AWCF systems is likely to add a much-needed option of differentiation in the specificity and nature of the CF allowing these systems to accommodate more varied CF depending on the writing task and the characteristics of the writer [1]. ...
Article
Full-text available
Recent technological advances in artificial intelligence (AI) have paved the way for improved and in many cases the creation of entirely new and innovative, electronic writing tools. These writing support systems assist during and after the writing process, making them indispensable to many writers in general and to students in particular, who can get human-like sentence completion suggestions and text generation. Although the wide adoption of these tools by students has been accompanied by a steady growth of scientific publications in the field, the results of these studies are often contradictory and their validity may be questioned. To gain a deeper understanding of the validity of AI-powered writing assistance tools, we conducted a systematic review of the recent empirical AI-powered writing assistance studies. The purpose of this review is twofold. First, we wanted to explore the recent scholarly publications that evaluated the use of AI-powered writing assistance tools in the classroom in terms of their types, uses, limits, and potential for improving students’ writing skills. Second, the review also sought to explore the perceptions of educators and researchers about learners’ use of AI-powered writing tools and review their recommendations on how to best integrate these tools into the contemporary and future classroom. Using the Scopus research database, a total of 104 peer-reviewed papers were identified and analyzed. The findings indicate that students are increasingly using a variety of AI-powered writing assistance tools for improving their writing. The tools they are using can be categorized into four main groups: (1) automated writing evaluation tools, (2) tools that provide automated writing corrective feedback, (3) AI-powered machine translators, and (4) GPT-3 automatic text generators.
The analysis also highlighted the scholars’ recommendations regarding learners’ use of AI-powered writing assistance tools and grouped them into two sets, one for researchers and one for educators.
... Since then, much progress has been made, and a comprehensive overview of original and still standing challenges in this field is presented by Beigman Klebanov and Madnani (2020). Related to this is the line of research on grammatical error correction (Leacock et al., 2010; Bryant and Ng, 2015), accompanied by a number of shared tasks (Ng et al., 2013, 2014; Bryant et al., 2019). ...
... Over a billion people speak English as their second or foreign language (Leacock, 2010). Indonesians must learn English to be able to compete with other countries. ...
Article
This study is purposed to describe the errors in grammatical features of report text. In addition, this study is also aimed to describe the possible causes of those errors. This study applied the descriptive qualitative approach in which the data was obtained from the report texts that were written by the eleventh grade students of accounting major in SMK N 1 Jorlang Hataran. The data were collected by using students’ report texts and questionnaires. The data then were analyzed by identifying the errors first, then describing the errors, and the last explaining the errors. The errors that occurred in students’ texts are errors in verbs, sentences, prepositions, articles, agreements, pronouns, modifiers, and tenses. The result of the study indicates that the students made 541 grammatical errors which were classified into: 37,15% errors in verbs, 18,85% errors in noun, 15,15% errors in sentence, 11,27% errors in preposition, 7,76% errors in article, 4,06% errors in pronoun, 3,51% errors in modifier, and 2,21% errors in tenses. The students’ errors were caused by 31,23% ignorance of grammatical rules, 28,09% incomplete application of rules, 15,89% false concept hypothesized, 13,86% interference of interlingual transfer, 9,05% wrong communication strategy, and 1,84% over-generalization.
... Although these AI writing and proofreading programs continue to grow in popularity, reviews regarding the effectiveness of these programs at large are inconsistent. Studies similar to the present one have analyzed the effectiveness of NLP text editors and their potential to approach the level of revision of expert human proofreading [6][7][8]. At least one 2016 article [9] evaluates popular GEC tools and comes to the terse conclusion that "grammar checkers do not work." ...
Article
Full-text available
Purpose: Wordvice AI Proofreader is a recently developed web-based artificial intelligence-driven text processor that provides real-time automated proofreading and editing of user-input text. This study compares its accuracy and effectiveness to expert proofreading by human editors and to two other popular proofreading applications: the automated writing analysis tools of Google Docs and Microsoft Word. Because this tool was primarily designed for use by academic authors to proofread their manuscript drafts, the comparison of this tool’s efficacy to other tools was intended to establish its usefulness for these authors. Methods: We performed a comparative analysis of proofreading completed by the Wordvice AI Proofreader, by experienced human academic editors, and by two other popular proofreading applications. The number of errors accurately reported and the overall usefulness of the vocabulary suggestions was measured using a General Language Evaluation Understanding metric and open dataset comparisons. Results: In the majority of texts analyzed, the Wordvice AI Proofreader achieved performance levels at or near that of the human editors, identifying similar errors and offering comparable suggestions in the majority of sample passages. The Wordvice AI Proofreader also had higher performance and greater consistency than the other two proofreading applications evaluated. Conclusion: We found that the overall functionality of the Wordvice artificial intelligence proofreading tool is comparable to that of a human proofreader and equal or superior to that of two other programs with built-in automated writing evaluation proofreaders used by tens of millions of users: Google Docs and Microsoft Word.
... An extensive overview of the automated grammatical error detection for language learners was conducted by Leacock et al. (2010). In subsequent years two English language learner (ELL) corpora were made available for research purposes (Dahlmeier et al., 2013; Yannakoudakis et al., 2011). ...
... Those evaluations make use of human annotated examples of correct and incorrect grammar (Dahlmeier et al., 2013; Yannakoudakis et al., 2011). Particularly, Leacock et al. (2010) provide a comprehensive overview of various aspects related to grammar error detection research. This paper is organized as follows: Section 2 briefly describes the goals of the task our models attempt to address, Section 3 describes our experiments including the proposed Tree Kernel models, whose results are reported in Section 4. Section 5 further comments on the results, and Section 6 concludes with some summarizing remarks. ...
... Although much current work on analyzing learner language focuses on grammatical error detection and correction (e.g., Leacock et al., 2014), there is a growing body of work covering varying kinds of semantic analysis (e.g., Meurers et al., 2011; Bailey and Meurers, 2008; Dickinson, 2014, 2013; Petersen, 2010), including assessment-driven work (e.g., Somasundaran et al., 2015; Somasundaran and Chodorow, 2014). One goal of such work is to facilitate intelligent language tutors (ILTs) and language assessment tools that maximize communicative interaction, as suggested by research in second language instruction (cf. ...
...language learners' collocation errors is limited or nonexistent for many languages, which makes it difficult for learners to detect and correct these errors. Therefore, an application that can detect a learner's collocation errors and suggest the most appropriate "ready-made units" as corrections is an important goal for natural language processing (Leacock et al., 2014). In this paper, we describe Collocation Assistant, a web-based and corpus-based collocational aid aiming at helping JSL learners expand their collocational knowledge. Focusing on noun-verb constructions, Collocation Assistant flags possible collocation errors and suggests a ranked list of more conventional expressions. Each suggestion ...
... Also, some NLP CALL projects concentrate on the functionality/content and neglect the User Interface (UI), which makes it difficult for the non-expert user to use the resources. However, there is a growing interest in NLP resources for language learners, particularly in the area of error detection (Leacock et al., 2014). There have been some successful NLP CALL programs (e.g. ...
Conference Paper
Full-text available
This paper looks at the use of Natural Language Processing (NLP) resources in primary school education in Ireland. It shows how two Irish NLP resources, the Irish Finite State Transducer Morphological Engine (IFSTME) (Ui Dhonnchadha, 2002) and Gramadoir (Scannell, 2005), were used as the underlying engines for two Computer Assisted Language Learning (CALL) resources for Irish. The IFSTME was used to supply verb conjugation information for a Verb Checker Component of a CALL resource, while Gramadoir was the underlying engine for a Writing Checker Component. The paper outlines the motivation behind the development of these resources, which includes trying to leverage some of the benefits of CALL for students studying Irish in primary school. In order to develop CALL materials that were not just an electronic form of a textbook, it was considered important to incorporate existing NLP resources into the CALL materials. This would have the benefit of not re-inventing the wheel and of using tools that had been designed and tested by a knowledgeable NLP researcher, rather than starting from scratch. The paper reports on the successful development of the CALL resources and some positive feedback from students and teachers. There are several non-technical reasons, mainly logistical, which hinder the deployment of Irish CALL resources in schools, but Irish NLP researchers should strive to disseminate their research and findings to a wider audience than usual if they wish others to benefit from their work.
... Given that our data is automatically recognized speech, parse features are not likely to be reliable. We use measures of n-gram association, such as pointwise mutual information (PMI), that have a long history of use for detecting collocations and measuring their quality (see Manning and Schütze (1999) and Leacock et al. (2014) for reviews). Our application of a large n-gram database and PMI is to encode language proficiency in sentence construction without using a parser. ...
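As a rough illustration of the pointwise mutual information measure mentioned above, the sketch below scores a word pair against a toy corpus. The corpus and counts are invented for illustration and are not the data or n-gram database from the cited work:

```python
import math
from collections import Counter

def pmi(bigram, unigram_counts, bigram_counts, n_tokens, n_bigrams):
    """log2 of how much more often a pair co-occurs than chance predicts."""
    w1, w2 = bigram
    p_w1 = unigram_counts[w1] / n_tokens
    p_w2 = unigram_counts[w2] / n_tokens
    p_pair = bigram_counts[bigram] / n_bigrams
    return math.log2(p_pair / (p_w1 * p_w2))

# Toy corpus in which "strong tea" is a conventional collocation.
tokens = "strong tea and strong coffee but powerful tea is odd".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

score = pmi(("strong", "tea"), unigrams, bigrams, len(tokens), len(tokens) - 1)
print(score > 0)  # positive PMI: the pair co-occurs more than chance
```

A collocation checker built on this idea would compare the PMI of a learner's word pair (e.g. "powerful tea") against conventional alternatives drawn from a large n-gram database and flag pairs whose association score is unusually low.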
... While redundancy detection has not yet been widely studied, it is related to several areas of active research, such as grammatical error correction (GEC), sentence simplification and sentence com- pression. Work in GEC attempts to build automatic systems to detect/correct grammatical errors (Leacock et al., 2010; Liu et al., 2010; Dahlmeier and Ng, 2011; Rozovskaya and Roth, 2010). Both redundancy detection and GEC aim to improve students' writings. ...
Article
This paper investigates redundancy detection in ESL writings. We propose a measure that assigns high scores to words and phrases that are likely to be redundant within a given sentence. The measure is composed of two components: one captures fluency with a language model; the other captures meaning preservation based on analyzing alignments between words and their translations. Experiments show that the proposed measure is five times more accurate than the random baseline.
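The fluency component of such a redundancy measure can be sketched with a toy language model: a word is suspect if deleting it does not lower the sentence's score. The unigram model and corpus below are illustrative assumptions, not the cited system, and the translation-alignment component for meaning preservation is omitted:

```python
import math

# Toy training corpus for an add-one-smoothed unigram "language model".
corpus = "please return the book to the library please return the book".split()
counts = {}
for w in corpus:
    counts[w] = counts.get(w, 0) + 1
total, vocab = len(corpus), len(counts)

def avg_logprob(sentence):
    """Mean add-one-smoothed unigram log-probability per token."""
    return sum(math.log((counts.get(w, 0) + 1) / (total + vocab))
               for w in sentence) / len(sentence)

sentence = "please return the book back".split()
without_back = [w for w in sentence if w != "back"]

# Deleting the redundant "back" improves average fluency under this toy model,
# so "back" would receive a high redundancy score.
print(avg_logprob(without_back) > avg_logprob(sentence))  # True
```

A real system would use a stronger language model and combine this fluency signal with the meaning-preservation score from word-translation alignments, so that content-bearing words are not flagged merely for being rare.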
... Related methods are applied in current CALL applications to support learning grammatical rules in a new language. Research in learner error detection also supports grammatical error detection research (Heift and Schulze, 2007; Meurers, 2009; Leacock et al., 2010), especially as it relates to CALL systems. ...
Article
Full-text available
This article reports on two studies using Language Muse℠ (LM), a web-based teacher professional development (TPD) application designed to enhance teachers' linguistic awareness and to support teachers in the development of language-based instructional scaffolding for English language learners (ELLs). In Study 1, in-service teachers enrolled in certification courses learned how to use the natural language processing (NLP) component of LM to support their knowledge, awareness, practice, and application in designing instruction for ELLs. Measurement outcomes from the TPD study indicated that productive use of linguistic feedback in Language Muse led to some increases in teachers' linguistic knowledge and awareness, and in their ability to develop appropriate language-based instruction for ELLs. In Study 2, a school-based study was conducted to evaluate the feasibility of implementing LM in authentic settings. Outcomes from the study demonstrate the feasibility of implementing the Language Muse program in applied school and classroom contexts.
... One of the tasks in educational NLP systems is providing feedback to students in the context of exam questions, homework or intelligent tutoring. Much previous work has been devoted to the automated scoring of essays (Attali and Burstein, 2006; Shermis and Burstein, 2013), error detection and correction (Leacock et al., 2010), and classification of texts by grade level (Petersen and Ostendorf, 2009; Sheehan et al., 2010; Nelson et al., 2012). In these applications, NLP methods based on shallow features and supervised learning are often highly effective. ...
... The research of NLP applications for improving students' writing skills has grown rapidly in recent years (Dale and Kilgarriff, 2010), as one of the education-oriented applications of NLP. The existing studies on these applications have mainly focused on detecting and correcting grammatical and spelling errors (Brockett et al., 2006; Hermet and Désilets, 2009; Leacock et al., 2010; Park and Levy, 2011). On the other hand, there has been a growing need for applications taking into account the discourse coherence of a text, e.g. ...
Conference Paper
Full-text available
This paper presents the construction of a corpus of manually revised texts which includes both before- and after-revision information. In order to create such a corpus, we propose a procedure for revising a text from a discourse perspective, consisting of dividing a text into discourse units, organising and reordering groups of discourse units and finally modifying referring and connective expressions, each of which imposes limits on freedom of revision. Following the procedure, six revisers who have enough experience in either teaching Japanese or scoring Japanese essays revised 120 Japanese essays written by Japanese native speakers. Comparing the original and revised texts, we found some specific manual revisions frequently occurred between the original and revised texts, e.g. 'thesis' statements were frequently placed at the beginning of a text. We also evaluate text coherence using the original and revised texts on the task of pairwise information ordering, identifying a more coherent text. The experimental results using two text coherence models demonstrated that the two models did not outperform the random baseline.
Chapter
This study focuses on both quantitative and qualitative assessments of automatic grammatical error identification, correction, and explanation for learners of Chinese using four large language models (LLMs) (namely, BART CGEC, GPT 4.0, Bard, and Claude 2) from linguistic and educational viewpoints. It was found that general-purpose chat LLMs like GPT 4.0, Bard, and Claude 2 outperformed those specifically designed for Chinese grammatical error correction such as BART CGEC. In particular, Claude 2 excelled in precision and recall for error correction, achieving nearly 95% accuracy with a modified prompt, while GPT 4.0 and Bard lagged behind with around 87.5% precision and 80% recall, and 68.97% precision and 60.6% recall, respectively. Although Claude 2 achieved approximately 66% accuracy in error identification and error explanation, its high precision and recall in error correction made it a strong candidate for an intelligent Chinese grammar checker. Our study suggests the significance of prompt engineering in using LLMs effectively, leading to an 8% improvement in error correction precision for both GPT 4.0 and Claude 2 and over 15% recall improvement in GPT 4.0. Prompt engineering plays a crucial role in optimizing AI tool performance, paving the way for their integration into language learning processes. It is anticipated that LLMs will dramatically revolutionize the outlook of language learning in the near future.
Conference Paper
Full-text available
This paper reports on a study aimed at comparing AI vs. human performance in detecting and categorising errors in L2 Italian texts. Four LLMs were considered: ChatGPT, Copilot, Gemini and Llama3. Two groups of human annotators were involved: L1 and L2 speakers of Italian. A gold standard set of annotations was developed. A fine-grained annotation scheme was adopted, to reflect the specific traits of Italian morphosyntax, with related potential learner errors. Overall, we found that human annotation outperforms AI, with some degree of variation with respect to specific error types. We interpret this as a possible effect of the over-reliance on English as main language used in NLP tasks. We, thus, support a more widespread consideration of different languages.
Article
Full-text available
The difficulties faced by secondary English learners in Dir Upper, Wari, are investigated in this study. It clarifies that many of the English teachers in this area are not properly trained in language teaching. The report also emphasizes how frequently teachers employ the Grammar Translation Method (GTM), which presents challenges for students because it forces them to draw comparisons between English and their home tongue, a method that is counterproductive to successful language learning. Additionally, the study highlights how reading comprehension is not given enough weight in Dir Upper, Wari English language training. Although it is a vital component of language development, reading comprehension is not given enough emphasis in the classroom. Furthermore, the research indicates that integrating native language training with English language instruction presents a number of challenges. Because Pashto and English have different grammatical structures, this bilingual method may cause confusion. As a result, pupils may try to translate directly using Pashto grammar rules. In summary, this study highlights the shortcomings in the English language education system in Dir Upper, Wari. It highlights the need for better teacher preparation, a move away from the GTM, and more attention to the development of reading skills. It also recommends reducing the amount of time spent in the student's native tongue during English instruction in order to improve the overall learning experience. The proper and organized means of interpersonal communication is through language. It is the instrument that people use to go about their daily lives in society. This is the means of transmitting a variety of subjects. A language speaker has to have some understanding of the structure, history, and relationships of the language they speak. As long as individuals speak and use their native tongue, language exists in a social hierarchy.
The importance of language lies in its speakers' political, social, commercial, economic, and cultural significance. The English language is unquestionably extremely significant. In the United Kingdom, almost 300 million people speak English as their first language. People become more civilized the more emphasis is placed on language. French is the second most widely known language in the world, after English, which has advanced and attained an appreciated position to such an extent that it has become the universal language, or lingua franca. Worldwide, English is one of the most extensively used languages for international communication. It is an official language in the majority of third-world nations as well. English is a highly urbanized language that is used to convey ideas and is the conduit for information and the revelation of contemporary civilization.
Conference Paper
Full-text available
The development of effective NLP tools for the L2 classroom depends largely on the availability of large annotated corpora of language learner text. While annotated learner corpora of English are widely available, large learner corpora of Spanish are less common. Those Spanish corpora that are available do not contain the annotations needed to facilitate the development of tools beneficial to language learners, such as grammatical error correction. As a result, the field has seen little research in NLP tools designed to benefit Spanish language learners and teachers. We introduce COWS-L2H, a freely available corpus of Spanish learner data which includes error annotations and parallel corrected text to help researchers better understand L2 development, to examine teaching practices empirically, and to develop NLP tools to better serve the Spanish teaching community. We demonstrate the utility of this corpus by developing a neural-network based grammatical error correction system for Spanish learner writing.
Article
Full-text available
Purpose A preliminary version of a paraphasia classification algorithm (henceforth called ParAlg) has previously been shown to be a viable method for coding picture naming errors. The purpose of this study is to present an updated version of ParAlg, which uses multinomial classification, and comprehensively evaluate its performance when using two different forms of transcribed input. Method A subset of 11,999 archival responses produced on the Philadelphia Naming Test were classified into six cardinal paraphasia types using ParAlg under two transcription configurations: (a) using phonemic transcriptions for responses exclusively (phonemic-only) and (b) using phonemic transcriptions for nonlexical responses and orthographic transcriptions for lexical responses (orthographic-lexical). Agreement was quantified by comparing ParAlg-generated paraphasia codes between configurations and relative to human-annotated codes using four metrics (positive predictive value, sensitivity, specificity, and F1 score). An item-level qualitative analysis of misclassifications under the best performing configuration was also completed to identify the source and nature of coding discrepancies. Results Agreement between ParAlg-generated and human-annotated codes was high, although the orthographic-lexical configuration outperformed phonemic-only (weighted-average F1 scores of .78 and .87, respectively). A qualitative analysis of the orthographic-lexical configuration revealed a mix of human- and ParAlg-related misclassifications, the former of which were related primarily to phonological similarity judgments whereas the latter were due to semantic similarity assignment. Conclusions ParAlg is an accurate and efficient alternative to manual scoring of paraphasias, particularly when lexical responses are orthographically transcribed. With further development, it has the potential to be a useful software application for anomia assessment. 
Supplemental Material https://doi.org/10.23641/asha.22087763
Preprint
Full-text available
This article investigates the potential of using error-annotated L2 utterances and their associated language profile to more accurately assess the language proficiency of an L2 learner. The main goal of this paper is to demonstrate the use of machine learning methods for uncovering features of the learner's interlanguage and their proficiency level, as well as to answer three questions: a) Is it feasible to use error-annotated data along with linguistic profile metadata to trace patterns related to aspects of learners' interlanguage? b) How important may error annotation be in estimating the learner's proficiency level? c) Which of the (extra-)linguistic profile metadata stand out in contributing to the correct classification of the proficiency level? GLCII, the largest online freely available L2-Greek corpus, is utilized as an example corpus to illustrate the approach proposed in this paper, with which, for any L2, central questions can potentially be addressed by the relevant machine learning method so that, in a second stage, useful conclusions can be drawn. As a final point, we urge the implementation of ML techniques as a way of expediting research in a variety of related areas in applied linguistics.
Article
Full-text available
This study examines the implementation of Criterion, an automated writing evaluation system developed by ETS, as a source of diagnostic feedback on learners’ linguistic performance in a Vietnamese EFL writing classroom. Thirty-eight second-year English majors had access to Criterion for a five-month period. Data include Criterion error tags on students’ essays from multiple practice sessions, recorded think-aloud protocols as students engaged with the feedback for revisions, and first and revised drafts students submitted to Criterion. The main findings indicate Criterion’s satisfactory precision and capacity to trigger various engagement strategies among learners, but reservations remain due to students’ modest response accuracy and lack of substantive revisions to their texts. Important implications for formative feedback practices in EFL writing classrooms and the adaptation of Criterion’s technical capacities are accordingly presented.
Article
Full-text available
This paper presents a series of experiments in automatic correction of spelling and grammar errors with a statistical and corpus-driven methodology. The language of the experiments is Spanish, but the method can be easily extrapolated to other languages since we do not use language-specific resources. Our main motivation is to develop a tool that could assist university students in writing academic texts, because this kind of system is practically nonexistent at present, especially for Spanish. Our work is based on previous descriptions which identify the most problematic phenomena in academic writing at university level. We aim to develop a tool for automatic detection and correction of some of those problematic issues at different linguistic levels such as spelling, grammar and vocabulary.