Article

Text Classification by CEFR Levels Using Machine Learning Methods and the BERT Language Model

Article
Full-text available
In recent years, the exponential growth of digital documents has been met by rapid progress in text classification techniques. Newly proposed machine learning algorithms leverage the latest advancements in deep learning methods, allowing for the automatic extraction of expressive features. The swift development of these methods has led to a plethora of strategies to encode natural language into machine-interpretable data. The latest language modelling algorithms are used in conjunction with ad hoc preprocessing procedures, of which the description is often omitted in favour of a more detailed explanation of the classification step. This paper offers a concise review of recent text classification models, with emphasis on the flow of data, from raw text to output labels. We highlight the differences between earlier methods and more recent, deep learning-based methods in both their functioning and in how they transform input data. To give a better perspective on the text classification landscape, we provide an overview of datasets for the English language, as well as supplying instructions for the synthesis of two new multilabel datasets, which we found to be particularly scarce in this setting. Finally, we provide an outline of new experimental results and discuss the open research challenges posed by deep learning-based language models.
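To make the data flow concrete, here is a minimal sketch of the earlier, feature-engineering style of pipeline the review contrasts with deep learning models: raw text is encoded with TF-IDF n-gram features and mapped to output labels by a linear classifier. The toy corpus and labels are invented for illustration.

```python
# A minimal sketch of the classical "raw text -> features -> labels" pipeline
# (TF-IDF encoding + linear classifier); toy data, not from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the plot was gripping", "dull and slow", "a brilliant read", "a waste of time"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["a gripping, brilliant story"]))  # expected: [1]
```

Deep learning models replace the fixed TF-IDF encoding with learned token embeddings, but the overall raw-text-to-label flow remains the same.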
Article
Full-text available
In automated essay scoring (AES) systems, similarity techniques are used to compute the score for student answers. Several methods to compute similarity have emerged over the years, but only a few of them have been widely used in the AES domain. This work presents the findings of a ten-year review of similarity techniques applied in AES systems and discusses the efficiency and limitations of current methods. The final review included thirty-four (34) articles published between 2010 and 2020. The metrics used to evaluate the performance of AES systems are also elaborated. The review was conducted using the Kitchenham method, whereby three research questions were formulated and a search strategy was developed. Research papers were chosen based on pre-defined inclusion and quality assessment criteria. The review identified two types of similarity techniques used in AES systems, along with several methods used to compute the score for student answers. Because similarity computation in AES systems depends on several factors, many studies have combined multiple methods in a single system, yielding good results. The review also found that the quadratic weighted kappa (QWK) was the metric most frequently used to evaluate AES systems.
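Since QWK is the metric the review found most frequently used, a short example of computing it may help; scikit-learn's cohen_kappa_score supports quadratic weighting directly. The human and model scores below are invented for illustration.

```python
# Quadratic weighted kappa (QWK): chance-corrected agreement between two
# raters, penalising disagreements by the square of their distance.
from sklearn.metrics import cohen_kappa_score

human_scores = [2, 3, 4, 4, 1, 5, 3, 2]  # gold essay grades
model_scores = [2, 3, 3, 4, 2, 5, 4, 2]  # AES system output

print(cohen_kappa_score(human_scores, model_scores, weights="quadratic"))
```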
Article
Full-text available
This paper focuses on automatically assessing language proficiency levels according to linguistic complexity in learner English. We implement a supervised learning approach as part of an automatic essay scoring system. The objective is to uncover Common European Framework of Reference for Languages (CEFR) criterial features in writings by learners of English as a foreign language. Our method relies on the concept of microsystems with features related to learner-specific linguistic systems in which several forms operate paradigmatically. Results on internal data show that different microsystems help classify writings from A1 to C2 levels (82% balanced accuracy). Overall results on external data show that a combination of lexical, syntactic, cohesive and accuracy features yields the most efficient classification across several corpora (59.2% balanced accuracy).
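As a hedged sketch of this kind of set-up (not the authors' actual microsystem features or model), the snippet below feeds hand-crafted complexity features to a supervised classifier and evaluates it with balanced accuracy, the metric reported in the paper.

```python
# Feature-based CEFR level classification, reduced to its skeleton:
# numeric complexity features -> supervised classifier -> balanced accuracy.
# Random features/labels stand in for real lexical and syntactic measures.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 4))      # e.g. mean word length, clause depth, ...
y = rng.integers(0, 6, 300)   # CEFR levels A1..C2 encoded as 0..5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print(balanced_accuracy_score(y_te, clf.predict(X_te)))  # ~1/6 on random data
```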
Article
Full-text available
This article analyses how combinations of stylometric characteristics at different levels affect the quality of authorship verification for Russian, English and French prose texts. The research covers both low-level stylometric characteristics based on words and characters and higher-level structural characteristics. All stylometric characteristics were computed automatically with the ProseRhythmDetector program, which made it possible to analyse large works by many writers at once. A vector of character-, word- and structure-level stylometric characteristics was assigned to each text. In the experiments, the feature sets of these three levels were combined with each other in all possible ways, and the resulting vectors were fed to various classifiers to perform verification and to identify the classifier best suited to the task. The best results were obtained with the AdaBoost classifier, with an average F-score above 92% across all languages. Detailed verification quality scores are given and analysed for each author. Using high-level stylometric characteristics, in particular the frequency of POS-tag n-grams, opens the prospect of a more detailed analysis of an author's style. The experiments show that combining structure-level characteristics with word- and/or character-level characteristics yields the most accurate authorship verification for literary texts in Russian, English and French. The authors were also able to conclude that stylometric characteristics affect verification quality to a different degree in different languages.
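To illustrate the high-level characteristics discussed above, the sketch below represents each text by the frequencies of its POS-tag n-grams and trains AdaBoost, the classifier that performed best in the study. The POS sequences and author labels are toy stand-ins; a real pipeline would produce the tags with a POS tagger rather than hard-coding them.

```python
# Authorship verification from POS-tag n-gram frequencies with AdaBoost.
# Texts are shown already POS-tagged to keep the sketch self-contained.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

pos_sequences = [
    "DET NOUN VERB ADP DET ADJ NOUN",  # text by author A
    "PRON VERB ADV ADJ CCONJ ADJ",     # text by author B
    "DET ADJ NOUN VERB DET NOUN",      # text by author A
    "PRON VERB ADJ PUNCT ADV ADJ",     # text by author B
]
labels = ["A", "B", "A", "B"]

vec = CountVectorizer(ngram_range=(2, 3), token_pattern=r"\S+")
X = vec.fit_transform(pos_sequences)   # POS bigram/trigram counts
clf = AdaBoostClassifier(random_state=0).fit(X, labels)
print(clf.predict(vec.transform(["DET NOUN VERB DET ADJ NOUN"])))
```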
Article
Full-text available
Assessment plays a significant role in the education system in judging student performance. The present evaluation system relies on human assessment. As the student-to-teacher ratio gradually increases, manual evaluation becomes complicated; it is time-consuming and lacks reliability, among other drawbacks. In this connection, online examination systems have evolved as an alternative to pen-and-paper methods. Current computer-based evaluation systems work only for multiple-choice questions, and there is no proper evaluation system for grading essays and short answers. Researchers have worked on automated essay grading and short-answer scoring for the last few decades, but assessing an essay on all parameters, such as the relevance of the content to the prompt, the development of ideas, cohesion, and coherence, remains a major challenge. A few researchers have focused on content-based evaluation, while many have addressed style-based assessment. This paper provides a systematic literature review of automated essay scoring systems. We studied the artificial intelligence and machine learning techniques used for automatic essay scoring and analysed the limitations of current studies and research trends. We observed that essays are not yet evaluated on the relevance and coherence of their content.
Article
Importance. The paper poses the problem of using written digital texts as a means of teaching professional written communication at a non-linguistics university. The purpose of the research is to show how modern digital technologies can be used to organize effective learning activities in the content-and-language training of future international business specialists in written communication. Research methods. The following research methods were used: general theoretical (analysis, synthesis, generalization, specification, the functional method) and practical (description, observation, surveys, interviews, statistical analysis). Results and Discussion. Drawing on internet resources (news sites, professional online magazines), various tasks are proposed that help form effective written communication skills. Step-by-step activities develop discourse skills in the professional sphere, the ability to critically comprehend processed information, and the ability to create a secondary written text. Conclusion. Teaching students to work with digital text is one of the priority tasks of professional foreign-language training today. Successful mastery of written internet communication allows students, while working with digital resources, to develop skills for diversifying their professional profile and moving towards new digital practices.
Chapter
The automatic assessment of language learners' competences represents an increasingly promising task thanks to recent developments in NLP and deep learning technologies. In this paper, we propose the use of neural models for classifying English written exams into one of the CEFR competence levels. We employ pre-trained BERT models, which provide efficient and rapid language processing on account of attention-based mechanisms and the capacity to capture long-range sequence features. In particular, we investigate augmenting the original learner's text with corrections provided by an automatic tool or by human evaluators. We consider different architectures in which the texts and corrections are combined at an early stage, via concatenation before the BERT network, or as a late fusion of the BERT embeddings. The proposed approach is evaluated on two open-source datasets: the EFCAMDAT and the CLC-FCE. The experimental results show that the proposed approach can predict the learner's competence level with remarkably high accuracy, in particular when large labelled corpora are available. In addition, we observed that augmenting the input text with corrections provides further improvement in the automatic language assessment task.
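A rough sketch of the two fusion strategies follows: early fusion packs the learner text and its correction into a single BERT input, while late fusion encodes them separately and concatenates the pooled embeddings before a classification head. The bert-base-cased checkpoint, [CLS] pooling and the six-way A1-C2 head are illustrative assumptions, not necessarily the chapter's exact configuration.

```python
# Early vs. late fusion of a learner text and its correction around BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")
head_early = torch.nn.Linear(768, 6)       # 6 CEFR levels A1..C2 (assumed)
head_late = torch.nn.Linear(2 * 768, 6)

text = "She go to school yesterday."
corr = "She went to school yesterday."

# Early fusion: one "text [SEP] correction" sequence through BERT.
enc = tok(text, corr, return_tensors="pt")
cls = bert(**enc).last_hidden_state[:, 0]  # [CLS] embedding
print(head_early(cls).shape)               # torch.Size([1, 6])

# Late fusion: encode separately, concatenate the two [CLS] embeddings.
cls_t = bert(**tok(text, return_tensors="pt")).last_hidden_state[:, 0]
cls_c = bert(**tok(corr, return_tensors="pt")).last_hidden_state[:, 0]
print(head_late(torch.cat([cls_t, cls_c], dim=-1)).shape)
```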
Chapter
Automated essay scoring (AES) is the task of assigning grades to essays. It can be applied to quality assessment as well as to pricing of user-generated content. Previous works mainly consider using the prompt information for scoring. However, some prompts are highly abstract, making it hard to score an essay based only on the relevance between the essay and the prompt. To address this problem, we design an auxiliary task in which a dynamic semantic matching block is introduced to capture hidden features through example-based learning. In addition, we provide a hierarchical model that extracts semantic features at both the sentence level and the document level. A weighted combination of the scores obtained from these features yields the holistic score. Experimental results show that our model achieves higher quadratic weighted kappa (QWK) scores than previous methods on five of the eight prompts of the ASAP dataset, which demonstrates the effectiveness of our model.
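A hedged skeleton of the hierarchical part of such a model is sketched below: per-sentence scores at the sentence level, an LSTM over sentence embeddings at the document level, and a learned weighted combination of the two, as in the holistic scoring described above. Dimensions, the LSTM aggregator and the mixing scheme are illustrative assumptions, not the authors' exact architecture.

```python
# Hierarchical essay scorer: sentence-level and document-level scores
# combined by a learned weight into one holistic score.
import torch
import torch.nn as nn

class HierarchicalScorer(nn.Module):
    def __init__(self, emb_dim=128, hid=64):
        super().__init__()
        self.sent_score = nn.Linear(emb_dim, 1)        # per-sentence score
        self.doc_lstm = nn.LSTM(emb_dim, hid, batch_first=True)
        self.doc_score = nn.Linear(hid, 1)             # document-level score
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned mixing weight

    def forward(self, sent_embs):                      # (batch, n_sents, emb)
        s = self.sent_score(sent_embs).mean(dim=1)     # average sentence score
        _, (h, _) = self.doc_lstm(sent_embs)           # document representation
        d = self.doc_score(h[-1])
        return self.alpha * s + (1 - self.alpha) * d   # holistic score

essays = torch.randn(2, 10, 128)           # 2 essays, 10 sentence embeddings
print(HierarchicalScorer()(essays).shape)  # torch.Size([2, 1])
```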
Preprint
As transfer learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can then be fine-tuned with good performance on a wide range of tasks, like its larger counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a BERT model by 40% while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train, and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device study.
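The triple loss described above can be sketched directly: masked-language-model cross-entropy, a temperature-scaled KL distillation term against the teacher's logits, and a cosine loss aligning student and teacher hidden states. The random tensors stand in for real model outputs, and the temperature value and equal loss weights are assumptions for illustration.

```python
# DistilBERT-style triple loss: MLM cross-entropy + soft-target distillation
# + cosine alignment of hidden states. Random tensors replace model outputs.
import torch
import torch.nn.functional as F

T = 2.0                                    # distillation temperature (assumed)
vocab, dim = 1000, 768
student_logits = torch.randn(8, vocab)     # per masked token
teacher_logits = torch.randn(8, vocab)
labels = torch.randint(0, vocab, (8,))
student_h = torch.randn(8, dim)
teacher_h = torch.randn(8, dim)

loss_mlm = F.cross_entropy(student_logits, labels)
loss_distill = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T                                  # rescale gradients by T^2
loss_cos = F.cosine_embedding_loss(
    student_h, teacher_h, torch.ones(student_h.size(0))
)
print((loss_mlm + loss_distill + loss_cos).item())  # equal weights assumed
```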
The problem of automatic scoring for complex constructs using open-ended tasks
  • N. V. Galichev
  • P. S. Shirogorodskaya
Automated classification of written proficiency levels on the CEFR-scale through complexity contours and RNNs
  • E. Kerz
  • D. Wiechmann
  • Y. Qiao
  • E. Tseng
  • M. Ströbel
In: J. Burstein, A. Horbach, E. Kochmar, R. Laarmann-Quante, C. Leacock, N. Madnani, I. Pilán, H. Yannakoudakis (eds.)
BERT: Pre-training of deep bidirectional transformers for language understanding
  • J. Devlin
  • M.-W. Chang
  • K. Lee
  • K. Toutanova