Article

Automatic determination of semantic similarity of student answers with the standard one using modern models


Abstract

The paper presents the results of a study of modern text models aimed at identifying the semantic similarity of English-language texts. Determining the semantic similarity of texts is an important component of many areas of natural language processing: machine translation, information retrieval, question-answering systems, and artificial intelligence in education. The authors addressed the problem of classifying how close a student answer is to the teacher's reference answer. The neural network language models BERT and GPT, previously used for determining semantic similarity of texts, the new neural network model Mamba, and stylometric features of the text were chosen for the study. Experiments were carried out on two text corpora: the Text Similarity corpus from open sources and a custom corpus collected with the help of philologists. The quality of the solution was assessed by precision, recall, and F-measure. All neural network language models showed similar F-measure values of about 86% on the larger Text Similarity corpus and 50–56% on the custom corpus. A completely new result was the successful application of the Mamba model. However, the most interesting finding was the use of vectors of stylometric text features, which reached an 80% F-measure on the custom corpus and matched the quality of the neural network models on the other corpus.
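As an illustration of the classification setup described in the abstract, the sketch below embeds a student answer and a reference answer with a generic pretrained sentence encoder and thresholds their cosine similarity. The model name, the threshold, and the use of the sentence-transformers library are assumptions for illustration, not the authors' exact pipeline; precision, recall, and F-measure would then be computed against expert labels.

```python
# Minimal sketch: classify a student answer as close to / far from a reference
# answer by thresholding cosine similarity of sentence embeddings.
# Assumptions: the sentence-transformers package and the model name below are
# illustrative stand-ins for the BERT/GPT/Mamba encoders studied in the paper.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def similarity_label(student_answer: str, reference_answer: str,
                     threshold: float = 0.7) -> tuple[float, int]:
    """Return (cosine similarity, label) where label 1 means 'similar enough'."""
    emb = model.encode([student_answer, reference_answer])
    cos = float(np.dot(emb[0], emb[1]) /
                (np.linalg.norm(emb[0]) * np.linalg.norm(emb[1])))
    return cos, int(cos >= threshold)

if __name__ == "__main__":
    score, label = similarity_label(
        "Photosynthesis converts light energy into chemical energy.",
        "Plants transform sunlight into chemical energy during photosynthesis.")
    print(f"cosine={score:.3f}, label={label}")
```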


... The authors of [18] analyzed sets of lexical features at the character, word, and sentence-structure levels to determine the similarity of English-language texts. A natural-language text was converted into a vector of numbers based on a stylometric model, and five kinds of proximity metrics were computed for pairs of text vectors: cosine similarity, the Pearson correlation coefficient, the Chebyshev metric, the Euclidean distance, and the Minkowski metric. ...
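A minimal sketch of the five proximity metrics listed in the quoted passage, assuming the two texts have already been converted into numeric stylometric feature vectors; the feature values below are invented purely for illustration.

```python
# Compute the five proximity metrics mentioned above for a pair of
# stylometric feature vectors (values are invented for illustration).
import numpy as np
from scipy.spatial import distance
from scipy.stats import pearsonr

a = np.array([4.2, 0.31, 12.0, 0.08, 1.7])   # stylometric vector of text A (assumed)
b = np.array([3.9, 0.27, 14.5, 0.11, 1.5])   # stylometric vector of text B (assumed)

cosine_sim = 1.0 - distance.cosine(a, b)
pearson_corr = pearsonr(a, b)[0]
chebyshev = distance.chebyshev(a, b)
euclidean = distance.euclidean(a, b)
minkowski = distance.minkowski(a, b, p=3)     # p=3 chosen arbitrarily

print(cosine_sim, pearson_corr, chebyshev, euclidean, minkowski)
```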
... The already mentioned work [18] on detecting similar texts compared different variants of the large language models BERT, GPT, and Mamba. On the Text Similarity corpus, all models achieved approximately the same quality: an F-measure of 85–87%. ...
Article
The development of automatic assessment systems is a relevant task designed to simplify the routine work of a teacher and speed up feedback for a student. The survey is devoted to research in the field of automatic assessment of student answers based on a teacher's reference answer. The authors analyzed text models used for the tasks of automatic short answer grading (ASAG) and automated essay scoring (AES). Several approaches to the task of determining text similarity were also taken into account, since it is a closely related task and the methods for solving it can also be useful for analyzing student answers. Text models can be divided into several large categories. The first is linguistic models based on various stylometric features, both simple ones such as bag-of-words and n-grams and complex ones such as syntactic and semantic features. The second category comprises neural network models based on various embeddings; it highlights large language models as universal, popular, and high-quality modeling methods. The third category includes combined models that unite both linguistic features and neural network embeddings. A comparison of modern studies by models, methods, and quality metrics showed that the trends in the subject area coincide with the trends in computational linguistics in general. A large number of authors choose large language models to solve their problems, but standard features remain in demand. It is impossible to single out a universal approach; each subtask requires a separate choice of method and tuning of its parameters. Combined and ensemble approaches achieve higher quality than other methods. The vast majority of studies examine texts in English, although successful results for national languages are also reported. It can be concluded that the development and adaptation of methods for assessing students' answers in national languages is a relevant and promising task.
Article
Full-text available
Text-based open-ended questions in academic formative and summative assessments help students become deep learners and prepare them to understand concepts for a subsequent conceptual assessment. However, grading text-based questions, especially in large (>50 enrolled students) courses, is tedious and time-consuming for instructors. Text processing models continue progressing with the rapid development of Artificial Intelligence (AI) tools and Natural Language Processing (NLP) algorithms. Especially after breakthroughs in Large Language Models (LLM), there is immense potential to automate rapid assessment and feedback of text-based responses in education. This systematic review adopts a scientific and reproducible literature search strategy based on the PRISMA process, using explicit inclusion and exclusion criteria to study text-based automatic assessment systems in post-secondary education, screening 838 papers and synthesizing 93 studies. To understand how text-based automatic assessment systems have been developed and applied in education in recent years, three research questions are considered: 1) What types of automated assessment systems can be identified using an input, output, and processing framework? 2) What are the educational focus and research motivations of studies with automated assessment systems? 3) What are the reported research outcomes in automated assessment systems and the next steps for educational applications? All included studies are summarized and categorized according to a proposed comprehensive framework, including the input and output of the system, research motivation, and research outcomes, aiming to answer the research questions accordingly. Additionally, the typical studies of automated assessment systems, research methods, and application domains in these studies are investigated and summarized. This systematic review provides an overview of recent educational applications of text-based assessment systems for understanding the latest AI/NLP developments assisting in text-based assessments in higher education. Findings will particularly benefit researchers and educators incorporating LLMs such as ChatGPT into their educational activities.
Article
Full-text available
When integrating data from different sources, there are problems of synonymy, different languages, and concepts of different granularity. This paper proposes a simple yet effective approach to evaluating the semantic similarity of short texts, especially keywords. The method is capable of matching keywords from different sources and languages by exploiting transformers and WordNet-based methods. Key features of the approach include its unsupervised pipeline, mitigation of the lack of context in keywords, scalability to large archives, support for multiple languages, and adaptation capabilities for real-world scenarios. The work aims to provide a versatile tool for different cultural heritage archives without requiring complex customization. The paper explores different approaches to identifying similarities in 1- or n-gram tags, evaluates and compares different pre-trained language models, and defines integrated methods to overcome their limitations. Tests to validate the approach have been conducted using the QueryLab portal, a search engine for cultural heritage archives, to evaluate the proposed pipeline.
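A minimal sketch of combining a transformer encoder with a WordNet-based measure for matching short keywords, in the spirit of the approach described above; the specific model name, the path-similarity measure, and the equal 0.5/0.5 weighting are assumptions for illustration, not the paper's pipeline.

```python
# Hybrid keyword similarity: cosine similarity of transformer embeddings combined
# with a WordNet path similarity (model and weights are assumed, not from the paper).
# Requires nltk with the wordnet corpus downloaded.
import numpy as np
from nltk.corpus import wordnet as wn
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def wordnet_sim(k1: str, k2: str) -> float:
    """Best path similarity over all synset pairs of the two keywords."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(k1) for s2 in wn.synsets(k2)]
    return max(scores, default=0.0)

def keyword_similarity(k1: str, k2: str) -> float:
    e1, e2 = model.encode([k1, k2])
    cos = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    return 0.5 * cos + 0.5 * wordnet_sim(k1, k2)

print(keyword_similarity("pottery", "ceramics"))
```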
Article
Full-text available
As one of the prominent research directions in the field of natural language processing (NLP), short-text similarity has been widely used in search recommendation and question-and-answer systems. Most of the existing short textual similarity models focus on considering semantic similarity while overlooking the importance of syntactic similarity. In this paper, we first propose an enhanced knowledge language representation model based on graph convolutional networks (KEBERT-GCN), which effectively uses fine-grained word relations in the knowledge base to assess semantic similarity and model the relationship between knowledge structure and text structure. To fully leverage the syntactic information of sentences, we also propose a computational model of constituency parse trees based on tree kernels (CPT-TK), which combines syntactic information, semantic features, and attentional weighting mechanisms to evaluate syntactic similarity. Finally, we propose a comprehensive model that integrates both semantic and syntactic information to comprehensively evaluate short-text similarity. The experimental results demonstrate that our proposed short-text similarity model outperforms the models proposed in recent years, achieving a Pearson correlation coefficient of 0.8805 on the STS-B dataset.
Article
Full-text available
Conventional semantic text-similarity methods require a large amount of labeled training data as well as human intervention. They generally neglect contextual information and word order, which results in data sparseness and dimensionality explosion. Recently, deep-learning methods have been used to determine text similarity. This study therefore investigates NLP tasks for detecting the similarity of question pairs or documents and explores similarity score prediction. A new hybrid approach using weighted fine-tuned BERT feature extraction with a Siamese Bi-LSTM model is implemented. The technique is employed for determining the semantic text similarity of question pairs from the Quora dataset. Text features are extracted with BERT, followed by word embeddings with weights. The features, together with their weight values, are represented as embedded vectors and passed through the layers of a Siamese network. The embedded vectors of the input text features are trained with a deep Siamese Bi-LSTM model across its layers. Finally, similarity scores are determined for each sentence pair, and the semantic text similarity is learned. The performance of the proposed framework is evaluated in terms of accuracy, precision, recall, and F1 score and compared with other existing text-similarity detection methods. The proposed framework achieved a higher efficiency, with 91% accuracy in determining semantic text similarity, than the other existing algorithms.
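A minimal PyTorch sketch of a Siamese Bi-LSTM that maps two token-ID sequences to a similarity score, in the spirit of the model described above; the vocabulary size, dimensions, mean pooling, and cosine output are assumptions, and the BERT-based feature extraction stage of the paper is omitted.

```python
# Minimal Siamese Bi-LSTM: both inputs share one encoder; similarity is the
# cosine of the two pooled sequence representations. Sizes are illustrative only.
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)

    def encode(self, token_ids):
        emb = self.embedding(token_ids)              # (batch, seq, emb_dim)
        out, _ = self.encoder(emb)                   # (batch, seq, 2*hidden_dim)
        return out.mean(dim=1)                       # mean-pooled representation

    def forward(self, ids_a, ids_b):
        return nn.functional.cosine_similarity(self.encode(ids_a),
                                                self.encode(ids_b))

model = SiameseBiLSTM()
q1 = torch.randint(1, 10000, (2, 12))   # two toy question pairs, 12 tokens each
q2 = torch.randint(1, 10000, (2, 12))
print(model(q1, q2))                     # similarity scores in [-1, 1]
```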
Article
Full-text available
The manual process of scoring short answers to Arabic essay questions is exhausting, susceptible to error, and consumes instructors' time and resources. This paper explores the longest common subsequence (LCS) algorithm as a string-based text similarity measure for effectively scoring short answers to Arabic essay questions. To achieve this effectiveness, the longest common subsequence is modified by developing weight-based measurement techniques and is implemented together with Arabic WordNet for scoring Arabic short answers. The experiments conducted on a dataset of 330 students' answers reported a Root Mean Square Error (RMSE) of 0.81 and a Pearson correlation r of 0.94. Findings based on the experiments show improvements in the accuracy of the proposed approach compared with similar studies. Moreover, the statistical analysis shows that the proposed method scores students' answers similarly to a human estimator.
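A minimal sketch of a longest-common-subsequence similarity between a student answer and a model answer, normalized by the model-answer length; the word-level tokenization and the normalization are assumptions, and the weighting scheme and Arabic WordNet components of the paper are not reproduced.

```python
# LCS-based similarity: length of the longest common subsequence of words,
# normalized by the length of the model answer (normalization is assumed).
def lcs_length(a: list[str], b: list[str]) -> int:
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_similarity(student: str, model_answer: str) -> float:
    s, m = student.lower().split(), model_answer.lower().split()
    return lcs_length(s, m) / len(m) if m else 0.0

print(lcs_similarity("the heart pumps blood through the body",
                     "the heart pumps blood around the body"))
```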
Article
Full-text available
Short text similarity measurement methods play an important role in many applications within natural language processing. This paper reviews the research literature on short text similarity (STS) measurement methods with the aim to (i) classify and give a broad overview of existing techniques; (ii) identify their strengths and weaknesses in terms of domain independence, language independence, the requirement of semantic knowledge, corpus and training data, and the ability to capture semantic meaning, word order similarity, and polysemy; and (iii) identify semantic knowledge and corpus resources that can be utilized by STS measurement methods. Furthermore, our study also considers issues such as the differences between the various text similarity methods and the differences between semantic knowledge sources and corpora for text similarity. Although there are a few review papers in this area, they mostly focus on only one or two existing techniques, and existing reviews do not cover recent research. To the best of our knowledge, this is a comprehensive systematic literature review on this topic. The findings of this review are as follows. It identified four semantic knowledge resources and eight corpus resources as external resources, which can be classified into general-purpose and domain-specific. Furthermore, the existing techniques can be classified into string-based, corpus-based, knowledge-based, and hybrid methods. Expert researchers can use this review as a benchmark as well as a reference for the limitations of current techniques. The paper also identifies open issues that can be considered feasible opportunities for future research directions.
Article
Full-text available
Text similarity measurement is the basis of natural language processing tasks and plays an important role in information retrieval, automatic question answering, machine translation, dialogue systems, and document matching. This paper systematically surveys the state of research on similarity measurement, analyzes the advantages and disadvantages of current methods, develops a more comprehensive classification scheme for text similarity measurement algorithms, and summarizes future development directions. With the aim of providing a reference for related research and applications, text similarity measurement methods are described from two aspects: text distance and text representation. Text distance can be divided into length distance, distribution distance, and semantic distance; text representation is divided into string-based, corpus-based, single-semantic text, multi-semantic text, and graph-structure-based representations. Finally, the development of text similarity is summarized in the discussion section.
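A minimal sketch illustrating the three text-distance families named above on a toy pair of texts: a length-style distance (Euclidean over term counts), a distribution distance (Jensen-Shannon over normalized term distributions), and a semantic distance (cosine, sketched here over the same count vectors as a stand-in for embeddings); the toy texts are assumptions for illustration.

```python
# Three text-distance families from the taxonomy above, on a toy pair of texts.
import numpy as np
from collections import Counter
from scipy.spatial import distance

t1 = "students submit short answers to open questions".split()
t2 = "learners give brief answers to open questions".split()

vocab = sorted(set(t1) | set(t2))
c1 = np.array([Counter(t1)[w] for w in vocab], dtype=float)
c2 = np.array([Counter(t2)[w] for w in vocab], dtype=float)

length_dist = distance.euclidean(c1, c2)
distribution_dist = distance.jensenshannon(c1 / c1.sum(), c2 / c2.sum())
semantic_dist = distance.cosine(c1, c2)   # real systems would use embeddings here

print(length_dist, distribution_dist, semantic_dist)
```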
Conference Paper
Full-text available
In goal-oriented conversational agents like chatbots, finding the similarity between the user input and the representative text is a major challenge. Generally, conversational agent developers tend to provide a minimal number of utterances per intent, which makes the classification task difficult. The problem becomes more complex when the representative text per action is short and the user input is long. We propose a methodology that derives a sentence similarity score based on n-grams and a sliding window and uses FastText word embeddings, outperforming the current state-of-the-art sentence similarity results. We are also publishing a dataset in the shopping domain for building conversational agents. Extensive experiments on the dataset yielded better results in accuracy, precision, and recall by 6%, 2%, and 80%, respectively. They also show that our solution generalizes well on a small corpus and requires no training.
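A minimal sketch of an n-gram sliding-window similarity over word vectors in the spirit of the method described above; the window size, the averaging of word vectors, and the tiny random embedding table (a stand-in for pretrained FastText vectors) are assumptions for illustration.

```python
# Sliding-window n-gram similarity: compare each n-gram window of the long user
# input against the short representative text using averaged word vectors and
# keep the best-matching window. The random embedding dict is a stand-in for
# pretrained FastText vectors.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = "i want to buy red running shoes purchase sneakers".split()
EMB = {w: rng.random(50) for w in VOCAB}   # stand-in word vectors

def avg_vector(tokens):
    vecs = [EMB[t] for t in tokens if t in EMB]
    return np.mean(vecs, axis=0) if vecs else np.zeros(50)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def sliding_window_similarity(user_input: str, intent_text: str, n: int = 3) -> float:
    words = user_input.lower().split()
    target = avg_vector(intent_text.lower().split())
    windows = [words[i:i + n] for i in range(max(1, len(words) - n + 1))]
    return max(cosine(avg_vector(w), target) for w in windows)

print(sliding_window_similarity("i want to buy red running shoes",
                                "purchase sneakers", n=3))
```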
Article
Full-text available
Answer sheet evaluation is a time-consuming task that requires a lot of effort from teachers, and hence there is a strong need to automate it. This paper proposes a machine learning based approach that relies on WordNet graphs for finding the text similarity between the answer provided by the student and the ideal answer provided by the teacher, in order to facilitate the automation of answer sheet evaluation. This work is the first attempt in the field of short-answer evaluation using WordNet graphs. A novel marking algorithm is provided that takes the semantic relations of the answer text into consideration. When tested on 400 answer sheets, the approach yields promising results compared with the state of the art.
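A minimal sketch of a WordNet-based marking rule: each word of the ideal answer is matched to its most similar word in the student answer, and the averaged similarity is scaled to the maximum mark. The greedy matching and the scaling rule are assumptions for illustration, not the paper's marking algorithm.

```python
# Greedy WordNet matching: average the best path similarity of each ideal-answer
# word against the student answer, then scale to the maximum mark (rule assumed).
# Requires nltk with the wordnet corpus downloaded.
from nltk.corpus import wordnet as wn

def word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def mark_answer(student: str, ideal: str, max_mark: float = 5.0) -> float:
    student_words = student.lower().split()
    ideal_words = ideal.lower().split()
    coverage = sum(max(word_sim(iw, sw) for sw in student_words)
                   for iw in ideal_words) / len(ideal_words)
    return round(coverage * max_mark, 2)

print(mark_answer("the mitochondria produces energy for the cell",
                  "mitochondria generate energy in cells"))
```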
Article
Full-text available
Text mining is the process by which a computer automatically extracts new information from different textual data sources. Clustering is a grouping technique widely used in data mining. The aim of this study was to find the optimal similarity value. The similarity methods used were Jaccard similarity, cosine similarity, and a combination of Jaccard and cosine similarity; combining the two similarities is expected to increase the similarity value of two titles. The documents used were only the titles of practical work reports in the Department of Informatics Engineering of Ahmad Dahlan University. All documents went through a preprocessing stage beforehand, and the clustering method applied was document clustering with Shared Nearest Neighbor (SNN). The result of this study is that the cosine similarity method gives the best proximity or similarity values compared with Jaccard similarity and the combination of both.
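A minimal sketch comparing Jaccard similarity, cosine similarity, and a simple combination of the two on a pair of document titles; the equal-weight average used as the combination is an assumption for illustration.

```python
# Jaccard (token-set overlap), cosine (term-frequency vectors), and their
# unweighted average for two short titles. The 50/50 combination is assumed.
import math
from collections import Counter

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a: str, b: str) -> float:
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

t1 = "information system for student final project scheduling"
t2 = "scheduling information system for student thesis projects"
print(jaccard(t1, t2), cosine(t1, t2), (jaccard(t1, t2) + cosine(t1, t2)) / 2)
```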
Conference Paper
Full-text available
Assessing the semantic similarity of texts is an important part of different text-related applications such as educational systems, information retrieval, and text summarization. This task is performed by sophisticated analysis that implements text-mining techniques. Text mining involves several pre-processing steps that provide a structured, representative model of the documents in a corpus by extracting and selecting the features characterizing their content. Generally, the model is vector-based and enables further analysis with knowledge discovery approaches. Algorithms and measures are used for assessing texts at the syntactic and semantic levels. An important text-mining method and similarity measure is latent semantic analysis (LSA). It reduces the dimensionality of the document vector space and better captures the text semantics. The mathematical background of LSA for deriving the meaning of words in a given text by exploring their co-occurrence is examined. The algorithm for obtaining the vector representation of words and their corresponding latent concepts in a reduced multidimensional space, as well as the similarity calculation, is presented.
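A minimal sketch of LSA as described above: a TF-IDF term-document matrix is reduced with truncated SVD and document similarity is computed in the latent concept space. The toy corpus, the number of latent components, and the use of scikit-learn are assumptions for illustration.

```python
# LSA sketch: TF-IDF vectors reduced by truncated SVD, then cosine similarity
# in the latent concept space. The toy corpus and 2 components are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the student answer explains photosynthesis in plants",
    "plants use photosynthesis to convert light into energy",
    "the exam covers probability and statistics",
]

tfidf = TfidfVectorizer().fit_transform(docs)          # term-document matrix
latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)
print(cosine_similarity(latent))                       # pairwise document similarity
```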
Article
This paper presents a study of the problem of automatically classifying short coherent texts (essays) in English according to the levels of the international CEFR scale. Determining the level of a natural-language text is an important component of assessing students' knowledge, including checking open-ended tasks in e-learning systems. To solve this problem, vector text models based on stylometric numerical features at the character, word, and sentence-structure levels were considered. The resulting vectors were classified with standard machine learning classifiers. The article presents the results of the three most successful ones: Support Vector Classifier, Stochastic Gradient Descent Classifier, and Logistic Regression. Precision, recall, and F-score served as quality measures. Two open text corpora, CEFR Levelled English Texts and BEA-2019, were chosen for the experiments. The best classification results for the six CEFR levels and sublevels from A1 to C2 were shown by the Support Vector Classifier, with an F-score of 67% on the CEFR Levelled English Texts. This approach was compared with the application of the BERT language model (six different variants). The best model, bert-base-cased, provided an F-score of 69%. The analysis of classification errors showed that most of them occur between neighboring levels, which is quite understandable from the point of view of the domain. In addition, the quality of classification strongly depended on the text corpus, as demonstrated by a significant difference in F-scores when the same text models were applied to different corpora. In general, the obtained results showed the effectiveness of automatic text level detection and the possibility of its practical application.
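A minimal sketch of classifying texts by level from simple stylometric features with the three classifiers named above; the particular feature set, the toy training data, and the hyperparameters are assumptions for illustration, not the paper's configuration.

```python
# Stylometric classification sketch: a few character/word/sentence-level features
# fed into SVC, SGDClassifier, and LogisticRegression (toy data, assumed features).
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.metrics import f1_score

def stylometric_features(text: str) -> list[float]:
    words = text.split()
    sentences = [s for s in text.split(".") if s.strip()]
    return [
        len(text),                                   # character count
        len(words),                                  # word count
        float(np.mean([len(w) for w in words])),     # mean word length
        len(words) / max(len(sentences), 1),         # words per sentence
    ]

texts = ["I like cats. They are nice.",
         "My hobby is reading books about history.",
         "The committee postponed the decision pending further deliberation.",
         "Notwithstanding the evidence, the verdict remained contentious."]
levels = ["A1", "B1", "C1", "C2"]                     # toy labels

X = np.array([stylometric_features(t) for t in texts])
for clf in (SVC(), SGDClassifier(), LogisticRegression(max_iter=1000)):
    clf.fit(X, levels)                                # toy fit on the same data
    print(type(clf).__name__, f1_score(levels, clf.predict(X), average="macro"))
```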
Article
Comparing text documents is an essential task for a variety of applications within diverse research fields, and several different methods have been developed for this. However, calculating text similarity is an ambiguous and context-dependent task, so many open challenges still exist. In this paper, we present a novel method for text similarity calculations based on the combination of embedding technology and ensemble methods. By using several embeddings, instead of only one, we show that it is possible to achieve higher quality, which in turn is a key factor for developing high-performing applications for text similarity exploitation. We also provide a prototype visual analytics tool which helps the analyst to find optimal performing ensembles and gain insights to the inner workings of the similarity calculations. Furthermore, we discuss the generalizability of our key ideas to fields beyond the scope of text analysis.
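A minimal sketch of an embedding-ensemble similarity in the spirit of the approach above: cosine similarities produced by several sentence encoders are averaged into one score. The model names and the unweighted average are assumptions for illustration.

```python
# Ensemble similarity: average the cosine similarity produced by several
# embedding models (model names and equal weights are assumed).
import numpy as np
from sentence_transformers import SentenceTransformer

MODEL_NAMES = ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]   # assumed ensemble members
models = [SentenceTransformer(name) for name in MODEL_NAMES]

def ensemble_similarity(text_a: str, text_b: str) -> float:
    sims = []
    for model in models:
        ea, eb = model.encode([text_a, text_b])
        sims.append(float(np.dot(ea, eb) /
                          (np.linalg.norm(ea) * np.linalg.norm(eb))))
    return float(np.mean(sims))

print(ensemble_similarity("the cat sat on the mat", "a cat is sitting on a rug"))
```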
Article
Short text similarity plays an important role in natural language processing (NLP) and has been applied in many fields. Due to the lack of sufficient context in short texts, it is difficult to measure their similarity. The use of semantic similarity to calculate textual similarity has attracted the attention of academia and industry and achieved better results. In this survey, we conduct a comprehensive and systematic analysis of semantic similarity. We first propose three categories of semantic similarity: corpus-based, knowledge-based, and deep learning (DL)-based. We analyze the pros and cons of representative and novel algorithms in each category. Our analysis also includes the applications of these similarity measurement methods in other areas of NLP. We then evaluate state-of-the-art DL methods on four common datasets, which shows that DL-based methods can better address the challenges of short text similarity, such as sparsity and complexity. In particular, the bidirectional encoder representations from transformers (BERT) model can fully exploit the scarce information and semantics of short texts and obtain higher accuracy and F1 values. We finally put forward some future directions.
Chapter
Recent advancements in the field of deep learning for natural language processing have made it possible to use novel deep learning architectures, such as the Transformer, for increasingly complex natural language processing tasks. Combined with novel unsupervised pre-training tasks such as masked language modeling, sentence ordering, or next sentence prediction, these natural language processing models became even more accurate. In this work, we experiment with fine-tuning different pre-trained Transformer-based architectures. We train the newest and, according to the GLUE benchmark, most powerful transformers on the SemEval-2013 dataset. We also explore the impact on generalization and performance of transferring a model fine-tuned on the MNLI dataset to the SemEval-2013 dataset. We report up to 13% absolute improvement in macro-average F1 over state-of-the-art results. We show that models trained with knowledge distillation are feasible for use in short answer grading. Furthermore, we compare multilingual models on a machine-translated version of the SemEval-2013 dataset.
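A minimal sketch of fine-tuning a pre-trained Transformer for 3-way short answer grading on (reference answer, student answer) pairs; the model name, the label scheme, and the single manual training step are assumptions for illustration, not the training setup of the work above.

```python
# Fine-tuning sketch: encode (reference answer, student answer) pairs and train a
# sequence-classification head for 3 grading labels (setup is illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")   # assumed model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3)                           # 3 grade classes

references = ["Osmosis is the diffusion of water across a membrane."]
answers = ["Water moves through a membrane from low to high solute concentration."]
labels = torch.tensor([0])          # 0 = correct, 1 = partial, 2 = incorrect (assumed)

inputs = tokenizer(references, answers, padding=True, truncation=True,
                   return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**inputs, labels=labels)   # one illustrative training step
outputs.loss.backward()
optimizer.step()
print(float(outputs.loss))
```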
Article
Semantic textual similarity (STS) is the task of assessing the degree of similarity between two texts in terms of meaning. Several approaches have been proposed in the literature to determine the semantic similarity between texts; the most promising recent work relies on supervised approaches. Unsupervised STS approaches do not require training data, but they still suffer from some limitations. Word alignment has been widely used in state-of-the-art approaches. This paper makes three contributions. First, a new synset-oriented word aligner is presented, which relies on a huge multilingual semantic network named BabelNet. Second, three unsupervised STS approaches are proposed: string kernel-based (SK), alignment-based (AL), and weighted alignment-based (WAL). Third, some limitations of the state-of-the-art approaches are tackled, and different similarity methods are shown to complement each other in a proposed unsupervised ensemble STS (UESTS) approach. UESTS incorporates the merits of four similarity measures: the proposed alignment-based, surface-based, corpus-based, and enhanced edit distance measures. The experimental results show that including the proposed aligner in STS is effective. Over all the evaluation datasets, the proposed UESTS outperforms the state-of-the-art unsupervised approaches, which is a promising result.
Conference Paper
In this paper, we explore unsupervised techniques for the task of automatic short answer grading. We compare a number of knowledge-based and corpus-based measures of text similarity, evaluate the effect of domain and size on the corpus-based measures, and also introduce a novel technique to improve the performance of the system by integrating automatic feedback from the student answers. Overall, our system significantly and consistently outperforms other unsupervised methods for short answer grading that have been proposed in the past.
N. S. Lagutina, M. V. Tihomirov, and N. K. Mastakova, "Algoritm avtomaticheskogo postroeniya yazykovogo profilya uchashchegosya" [An algorithm for the automatic construction of a learner's language profile], Zametki po informatike i matematike, no. 15, pp. 58-65, 2023, in Russian.
O. B. Mishunin, A. P. Savinov, and D. I. Firstov, "Sostoyanie i uroven' razrabotok sistem avtomaticheskoj ocenki svobodnyh otvetov na estestvennom yazyke" [The state and level of development of systems for automatic assessment of free-form natural-language answers], Modern High Technologies, no. 1, pp. 38-44, 2016, in Russian.
T. Brown et al., "Language models are few-shot learners", Advances in Neural Information Processing Systems, vol. 33, pp. 1877-1901, 2020.
S. Roy, S. Dandapat, A. Nagesh, and Y. Narahari, "Wisdom of students: A consistent automatic short answer grading technique", in Proceedings of the 13th International Conference on Natural Language Processing, 2016, pp. 178-187.