Figure 8 - available via license: Creative Commons Attribution 4.0 International
Confusion matrices reporting the number of correctly (main diagonal) and incorrectly (secondary diagonal) predicted essays for all methods (Spanish dataset-setting: Repetitiveness).
Source publication
Detection of AI-generated content is a crucially important task given the increasing attention towards AI tools such as ChatGPT and the concerns raised regarding academic integrity. Existing text classification approaches, including neural-network-based and feature-based methods, are mostly tailored for English data, and they are typic...
Context in source publication
Context 1
... results are close to random performance in this setting, with OneClassSVM being the best-performing method. The Repetitiveness setting (see Figure 8) yields 358 (IsolationForest), 355 (ABOD), 355 (AutoEncoder), 354 (HBOS), 338 (LocalOutlierFactor), and 336 (OneClassSVM) incorrectly classified essays. Across all methods in this setting, performance is consistently poor, as indicated by the minimal spread in the number of misclassified essays. ...
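The one-class setup behind confusion matrices like these can be sketched as follows: a detector is fitted on human-written feature vectors only, then scores a mixed test set, and predictions are tallied against the true labels. This is a minimal illustration with synthetic features, not the paper's pipeline; the cluster locations, feature dimensionality, and sample sizes are arbitrary assumptions.

```python
# Hypothetical sketch: fit a one-class model on "human" feature vectors only,
# then evaluate on a mixed human/machine test set. The feature values below
# are synthetic stand-ins, not the essay features used in the paper.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)

# Toy linguistic-feature vectors: human essays cluster in one region,
# machine-generated essays in another.
X_human_train = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_human_test = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
X_machine_test = rng.normal(loc=4.0, scale=1.0, size=(50, 5))

# Fit on human-written data only (the single "known" class).
clf = IsolationForest(random_state=0).fit(X_human_train)

X_test = np.vstack([X_human_test, X_machine_test])
y_true = np.array([1] * 50 + [-1] * 50)  # 1 = human (inlier), -1 = machine
y_pred = clf.predict(X_test)             # IsolationForest returns +1 / -1

cm = confusion_matrix(y_true, y_pred, labels=[1, -1])
print(cm)  # rows: true human / machine; cols: predicted human / machine
```

The main diagonal of `cm` counts correctly classified essays and the off-diagonal entries count misclassifications, mirroring the reporting in Figure 8.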
Similar publications
The proliferation of misleading content such as fake news and phony reviews on news blogs, online publications, and e-commerce apps has been aided by the availability of the web, cell phones, and social media. Individuals can quickly fabricate comments and news on social media. The most difficult challenge is determining which news is real...
In this paper, we present a methodology for the early detection of fake news on emerging topics through the innovative application of weak supervision. Traditional techniques for fake news detection often rely on fact-checkers or supervised learning with labeled data, which is not readily available for emerging topics. To address this, we introduce...
Many current studies on natural language processing (NLP) depend on supervised learning, which needs a lot of labeled data. This is not practical for large classification tasks. This research presents an unsupervised approach to automatically classify unlabeled theses using a BERT-hierarchical model. This technique combines BERT, an open-source mac...
Citations
... Subjectivity and polarity are two linguistic features used in this category, where subjectivity refers to the degree of opinion present in the document, while polarity indicates the expressed sentiment, ranging from positive to negative. • Diversity features: Research findings show that text produced through maximization or top-k sampling is more predictable, indicating a deficiency in lexical diversity [61,62]. Thus, we use two approaches, the type-token ratio (TTR) and entropy, to measure the richness of the text. ...
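The two diversity measures mentioned above can be computed in a few lines. This is a minimal stdlib-only sketch, not the cited authors' implementation; the whitespace tokenizer is a simplifying assumption (real pipelines typically normalize case and punctuation first).

```python
import math
from collections import Counter

def type_token_ratio(tokens):
    """Distinct word types divided by total tokens (higher = more diverse)."""
    return len(set(tokens)) / len(tokens)

def shannon_entropy(tokens):
    """Entropy (bits) of the unigram distribution (higher = less predictable)."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

text = "the cat sat on the mat the cat"
tokens = text.split()  # naive whitespace tokenization, for illustration only
print(round(type_token_ratio(tokens), 3))  # 5 types / 8 tokens = 0.625
print(round(shannon_entropy(tokens), 3))   # ≈ 2.156 bits
```

Repetitive, machine-generated text tends to score lower on both measures, which is why they are useful detection features.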
Large language models (LLMs) are central to AI systems and excel in natural language processing tasks. They blur the line between human- and machine-generated text and are widely used by professional writers across domains, including news article generation. The challenge of detecting LLM-written articles introduces novel obstacles regarding misuse and the generation of fake content. In this work, we aim to recognize two kinds of LLM-written news: one type is entirely generated by LLMs, and the other is paraphrased from existing news sources. We propose a neural network model that incorporates linguistic features and BERT contextual embedding features for LLM-written news article detection. In conjunction with the proposed model, we also produce a news article corpus based on the BBC dataset to generate and paraphrase news articles through multi-agent cooperation using ChatGPT. Our model obtains 96.57% accuracy and a 96.44% macro-F1 score, outperforming other existing models and indicating its capability to help readers identify LLM-written news articles. To assess the model's robustness, we also construct another corpus based on the BBC dataset using a different language model, Claude, and demonstrate that our detection model achieves strong results. Furthermore, we apply our model to text generation detection in the medical domain, where it also delivers promising performance.
... Experiments Common words extraction: We grouped data based on the meme class label, and counted the frequency of each word. We filtered stopwords as a common practice to remove nonlexical words [17]. We resorted to the pre-curated list available in the Python nltk library. ...
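The common-word extraction step described above can be sketched as follows. For portability, this sketch uses a small inline stopword list as a stand-in for the curated list mentioned in the excerpt (`nltk.corpus.stopwords.words("english")`); the example texts and the `top_n` parameter are illustrative assumptions.

```python
from collections import Counter

# Small stand-in stopword list; in practice one would use the curated list
# from nltk: `from nltk.corpus import stopwords; stopwords.words("english")`.
STOPWORDS = {"the", "a", "an", "is", "it", "of", "and", "to", "in", "on"}

def common_words(texts, top_n=3):
    """Count word frequencies across texts, excluding non-lexical stopwords."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            if token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)

# Toy "meme class" group of texts.
memes = ["the cat is on the mat", "a cat and a dog", "the dog is in the house"]
print(common_words(memes))  # [('cat', 2), ('dog', 2), ('mat', 1)]
```

Grouping documents by class label and running this per group yields the class-wise frequency profiles the excerpt describes.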
... Integrating Artificial Intelligence (AI) into education has changed conventional teaching methodologies and opened up a world of possibilities for improving learning experiences. Chat-GPT and other AI technologies are not just tools but gateways to interactive and personalized learning opportunities previously beyond our reach [1,2]. These developments have created an adaptive and engaging environment, empowering students and teachers to explore the full potential of AI for various academic purposes. ...
Integrating artificial intelligence (AI) in education has transformed traditional teaching by providing interactive and personalized learning opportunities. This study investigates the impact of Chat-GPT, an advanced language model, on improving English writing skills among 10 secondary school students in Ambon City, Maluku, Indonesia. Utilizing a mixed-methods approach, the research incorporated qualitative interviews, surveys, and quantitative analysis of students' writing portfolios to evaluate the effectiveness of AI-generated feedback. The results revealed that Chat-GPT significantly enhanced students' writing skills through personalized, real-time feedback, which led to notable improvements in grammar, vocabulary, and self-editing capabilities. Principal component analysis (PCA) identified three key factors contributing to skill development: (1) overall improvement in writing quality, including better coherence and creativity; (2) technical proficiency in grammar, sentence structure, and editing; and (3) user satisfaction with the platform and the perceived value of its feedback. Furthermore, correlation analysis demonstrated a positive association between frequent Chat-GPT use and enhanced coherence and technical precision in written work. Qualitative feedback highlighted students' appreciation for contextualized, interactive feedback and a preference for features supporting real-time collaboration. The study advocates for the responsible integration of AI in education to foster self-directed learning and improve linguistic competencies.
... Looking ahead, in the technical domain, research will inevitably continue on both sides of the detection/evasion arms race. Solutions are already focusing on so-called zero-shot detection methods that do not require labeled training data (e.g., Mitchell et al., 2023); those that can detect AI generation in ever-smaller units of text, such as sentences (e.g., Wang et al., 2023); and those that generalize successfully across languages (e.g., Corizzo & Leal-Arenas, 2023). More broadly, however, as various AI methods become legitimate tools, seamlessly integrated into almost every aspect of daily life, it is unlikely that purely technical solutions will suffice. ...
The widespread availability of generative AI based on Large Language Models (LLMs) has provided the means for students and others to easily cheat on written assignments: students can use AI to generate text and then submit that work as their own. A variety of technical solutions have been developed to detect such cheating. However, concerns have been raised about the dangers of falsely identifying real students' responses as having been generated by AI. Here we evaluate a generative-AI detector that comes as an option with Turnitin, a widely used plagiarism-detection platform already in use at many universities. We compare 160 responses written by students in a class assignment with 160 responses generated by ChatGPT instructed to complete the same assignment. The ChatGPT responses were generated by 16 different prompts crafted to mimic those that plausibly might be given by individuals seeking to cheat on an assignment. The AI scores for the AI-generated responses were significantly higher than the AI scores for the human-written responses, which were all zero. Clearly, an arms race is set to develop between technology that facilitates cheating and technology that detects it. However, the present findings demonstrate that it is at least possible to deploy technical solutions in this context. Looking ahead, as various AI methods become legitimate tools and are more seamlessly integrated into almost every aspect of daily life, it is unlikely that purely technical solutions will suffice. Instead, guidelines surrounding academic integrity will have to adapt to new conceptualizations of academic mastery and creative output.
... The current prevalence of Gen-AI systems like ChatGPT and Grammarly-GO has changed the concept of plagiarism, as these tools can not only assist with language error correction but also generate texts from a person's instructions. Thus, given concerns about academic integrity in academic writing, the advent of AI-text detection systems was deemed inevitable to empower AI plagiarism detection methods (Corizzo & Leal-Arenas, 2023; Perkins, 2023). Some research contended that texts produced by Gen-AI systems can be identified with high confidence when machine-learning models are applied (Cingillioglu, 2023). ...
... Algorithms like Neural Networks and Decision Trees applied in machine learning, as indicated by some studies (Alamleh et al., 2023; Kirchner et al., 2023), have demonstrated effectiveness in differentiating AI-generated text from human-written text, which is crucial for applications in content moderation and AI-based plagiarism detection. Other studies have applied models based on one-class learning, which utilises linguistic features to detect AI-generated texts, underscoring the feasibility of accurate detection across different languages (Corizzo & Leal-Arenas, 2023). Nonetheless, research works that have compared different AI detection tools, including GPT-Zero, Copy-Leaks, and Turnitin, revealed varying levels of effectiveness of these detection tools (Santra & Majhi, 2023). ...
Academic writing in the current times is significantly different from what it was a decade ago. Most prevalent in today's digital world is the disruption caused by Artificial Intelligence (AI) tools employed to assist in writing academic works. In this chapter, we overview the ongoing trend of the ethical challenge and implications of using AI-text detection systems for fostering academic and research integrity. Using current literature on the topic, the chapter has presented real-world cases where individuals have complained that AI-text detection platforms like that provided by Turnitin have flagged (as AI-generated) the content that was edited using language AI assistive tools like Grammarly and language translators. Furthermore, by reviewing the up-to-date empirical studies, we have presented an overview of the false-positive scenario of these AI-text detection tools and their biases towards non-native English writers. Finally, the chapter provides, besides the future outlook of the topic, the practical implications, including the consideration of fair and transparent AI-based tool usage policies.
... Together, these studies represent important developments in the application of sophisticated language models and advanced neural network techniques to improve classification tasks in specialized medical contexts and general text processing, thereby increasing both accuracy and efficiency in handling complex textual information. In [36], the authors proposed a novel machine learning technique called one-class learning, which is employed to differentiate between AI-generated essays and human-written ones. This research presents a practical solution to the problem of distinguishing between AI-generated and human-written essays by leveraging one-class learning, which requires less labeled data and can be effective in identifying deviations from a known norm. ...
This paper proposes a new robust model for text classification on the Stanford Sentiment Treebank v2 (SST-2) dataset in terms of model accuracy. We developed a Recurrent Neural Network BERT-based (RNN_Bert_based) model designed to improve classification accuracy on the SST-2 dataset. This dataset consists of movie review sentences, each labeled with either positive or negative sentiment, making it a binary classification task. Recurrent Neural Networks (RNNs) are effective for text classification because they capture the sequential nature of language, which is crucial for understanding context and meaning. BERT excels in text classification by providing bidirectional context, generating contextual embeddings, and leveraging pre-training on large corpora. This allows BERT to capture nuanced meanings and relationships within the text effectively. Combining BERT with RNNs can be highly effective for text classification: BERT's bidirectional context and rich embeddings provide a deep understanding of the text, while RNNs capture sequential patterns and long-range dependencies. Together, they leverage the strengths of both architectures, leading to improved performance on complex classification tasks. Next, we also developed an integration of the BERT model and a K-Nearest Neighbor-based (KNN_Bert_based) method as a comparative scheme for our proposed work. Based on the experimental results, our proposed model outperforms traditional text classification models as well as existing models in terms of accuracy.
... Feature-based approaches simplify the understanding of the model's decision-making process. The discussed techniques, in fact, make the process more transparent and comprehensible by concentrating on particular, quantifiable aspects of the text [72]. Moreover, they are very "flexible" because it is possible to select specific features and adapt the model to particular types of text and writing styles. ...
The rise of Large Language Models (LLMs) has dramatically altered the generation and spreading of textual content. This advancement offers benefits in various domains, including medicine, education, law, coding, and journalism, but also has negative implications, mainly related to ethical concerns. Measures to mitigate these negative implications depend on solutions that distinguish machine-generated text from human-written text. This study aims to provide a comprehensive review of the existing literature on detecting LLM-generated texts. Emerging techniques are categorized into five categories: watermarking, feature-based, neural-based, hybrid, and human-aided methods. For each category, strengths and limitations are discussed, providing insights into their effectiveness and potential for future improvements. Moreover, available datasets and tools are introduced. Results demonstrate that, despite promising performance, the multitude of languages to recognize, hybrid texts, the continuous improvement of text-generation algorithms, and the lack of regulation require additional efforts for efficient detection.
... Its adoption has generated a series of reactions in social media, where uses and misuses are presented. In academic contexts, scholars [1] [2] [3] [4] have mentioned the potential for course assignments to be completed entirely by Artificial Intelligence (AI) systems, leaving instructors in a predicament. This concern stems from the growing capabilities of AI in Natural Language Processing (NLP) and text generation. ...
... While one-class learning approaches represent a viable option to overcome such a limitation and have demonstrated success in a number of domains [19] [20] [21] [22] [23], approaches for machine-generated text detection are still scarce and limited to the adoption of simple one-class baseline models and basic linguistic features [4]. ...
... While successful in a wide range of domains and applications, scarce attention has been devoted to machine-generated text content. The work in [4] assesses the efficacy of one-class models on essay data, leveraging baseline approaches with simple sets of linguistic features. ...
On the brink of the one-year anniversary of the public release of ChatGPT, scholarly research has directed its attention toward detection methodologies for machine-generated text. Different models have been proposed, including feature-based classification and detection approaches, as well as deep learning architectures, with a small portion of them integrating contextual information to enhance prediction accuracy. Moreover, detection approaches explored thus far have focused primarily on English datasets, with limited attention given to similar methods in other languages. As a result, the applicability and efficacy of these methods in linguistically diverse contexts remain underexplored. In this paper, we present a one-class deep fusion model that considers both contextual text features derived from word embeddings and linguistic features to detect machine-generated texts in English and Spanish. Experimental results indicated that our model outperformed popular baseline one-class learning models in the detection task, presenting higher accuracy scores on the English dataset. Results are discussed in comparison to competing classifiers as well as the language biases found in detection models.