Source publication
Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for AI-g...
Similar publications
Background
The application of artificial intelligence (AI) in academic writing has raised concerns regarding accuracy, ethics, and scientific rigour. Some AI content detectors may not accurately identify AI-generated texts, especially those that have undergone paraphrasing. Therefore, there is a pressing need for efficacious approaches or guideline...
Citations
... Relying on AI-detection methods to force students not to use GenAI is as successful and effective as catching the wind in a net. Many researchers have found these tools generally unreliable (Elkhatat et al., 2023; Li et al., 2023; Liang et al., 2023; Matthews & Volpe, 2023; Sharples, 2022; Weber-Wulff et al., 2023). We strongly believe that no matter how advanced these detection tools claim to be, there will always be ways to outsmart these tools (using AI too), and GenAI will keep developing and outpacing these detection methods. ...
There is a growing need to upskill higher education (HE) teachers for the effective and responsible integration of generative artificial intelligence (GenAI) in their classrooms. This case study sought to address this growing need by designing and delivering a training course for educators, focusing on the use of ChatGPT as it was the most commonly used tool at the time. The professional development opportunity lasted 5 weeks and covered critical aspects of GenAI use for teaching and learning. Data collected from participants included discussion board entries, written tasks and focus groups. Findings highlight some of the common practices and concerns HE practitioners had regarding the use of GenAI in their practice. The findings also emphasise the importance of providing teachers with customised GenAI training to facilitate its effective integration in HE contexts. Finally, based on the findings of this study, we propose the TPTP Support System for Teachers, built upon four key areas: teacher training, pedagogical support, testing revamp and practice networks. This system aims to guide institutional efforts to facilitate and support educators as they integrate GenAI in HE. Implications for practice or policy:
- Teacher training is necessary for the effective integration of GenAI in HE contexts.
- Institutions should provide support in four key areas to facilitate educators' effective and responsible use of GenAI in HE.
- The TPTP Support System for Teachers can be leveraged for these planning and support initiatives.
... Some higher education institutions (HEIs) have integrated AI detection applications, such as Turnitin, within virtual learning environments, or use standalone tools like GPTZero (https://gptzero.me/) and WinstonAI (https://gowinston.ai/) to detect AI-generated work (McDonald et al., 2024). However, other institutions remain hesitant due to concerns about the accuracy and reliability of these tools, particularly the risk of false positives (Dalalah & Dalalah, 2023; Saqib & Zia, 2024; Weber-Wulff et al., 2023). Moreover, current detection tools may unfairly target students whose first language is not English, mistakenly identifying their work as AI-generated (Fröhling & Zubiaga, 2021). ...
Generative AI has the potential to transform higher education assessment. This study examines the opportunities and challenges of integrating AI into coursework assessments, highlighting the need to rethink traditional paradigms. A case study is presented that explores AI as an auxiliary learning tool in postgraduate coursework. Students found AI valuable for text generation, proofreading, idea generation, and research but noted limitations in accuracy, detail, and specificity. AI integration offers advantages such as enhancing assessment authenticity, promoting self-regulated learning, and developing critical thinking and problem-solving skills. A holistic approach is recommended, incorporating AI into feedback, adapting assessments to leverage AI’s capabilities, and promoting AI literacy among students and educators. Embracing AI while addressing its challenges can enable effective, equitable, and engaging assessment and teaching practices. Universities are encouraged to strategically integrate AI into teaching and learning, ultimately transforming the educational landscape to better prepare students for an AI-driven world.
... In particular, the focus has been on the curation of detection benchmarks (Uchendu et al. 2021; Li et al. 2024; Wang et al. 2024) and the automation of detection procedures (Venkatraman, Uchendu, and Lee 2024; Hu, Chen, and Ho 2023; Wang et al. 2023; Mitchell et al. 2023). Yet, these detectors can be easily fooled by simple paraphrasing (Krishna et al. 2024) and are not robust to unseen models and domains (Weber-Wulff et al. 2023). These limitations necessitate exploring alternative strategies, such as integrating human-in-the-loop mechanisms, where human evaluators validate or supplement existing detectors. ...
The proliferation of generative models has presented significant challenges in distinguishing authentic human-authored content from deepfake content. Collaborative human efforts, augmented by AI tools, present a promising solution. In this study, we explore the potential of DeepFakeDeLiBot, a deliberation-enhancing chatbot, to support groups in detecting deepfake text. Our findings reveal that group-based problem-solving significantly improves the accuracy of identifying machine-generated paragraphs compared to individual efforts. While engagement with DeepFakeDeLiBot does not yield substantial performance gains overall, it enhances group dynamics by fostering greater participant engagement, consensus building, and the frequency and diversity of reasoning-based utterances. Additionally, participants with higher perceived effectiveness of group collaboration exhibited performance benefits from DeepFakeDeLiBot. These findings underscore the potential of deliberative chatbots in fostering interactive and productive group dynamics while ensuring accuracy in collaborative deepfake text detection. Dataset and source code used in this study will be made publicly available upon acceptance of the manuscript.
... A model, as a sociocultural figure, is a representation. Models fall into three types: professional models, non-professional models, and community-organised models (Hassan et al., 2022; Weber-Wulff et al., 2023). ...
This study aimed to examine how the language used in GoSend advertisements taken from YouTube could influence consumer perceptions and behaviour. It offers practical insight into how effective language use, appropriate visual choices, and platform selection can achieve marketing goals. To analyse the advertisement, the researcher used Fairclough's three-dimensional framework: the text dimension (micro), discourse practice (meso), and socio-cultural practice (macro). The study employed a qualitative method to analyse the text of Gojek's YouTube GoSend advertisement #BestSellerGoSend featuring Ariel Noah. The findings indicated that the advertisement used various strategies to attract consumer interest. The choice of appealing, promise-laden language made customers more inclined to use the GoSend service. The advertisement persuaded customers that, by using the service, goods would arrive quickly, at low shipping cost, and with safe, hassle-free delivery. In addition, the choice of model as the visual object was highly influential in attracting the audience's attention and building public trust, because the model was a legendary public figure, and clips of old songs by the band "Noah" served as an additional attraction.
... Nonetheless, the accuracy of these tools is questionable, as evidenced by numerous highly publicized cases of errors (Ankel, 2023), leading some institutions to disable such features altogether (Coley, 2023; Fowler, 2023). Detecting GenAI content is clearly prone to error, as documented by a study involving 14 major GenAI detectors (Weber-Wulff et al., 2023). To make matters worse, AI detectors are shown to have higher error rates when scoring content with certain language patterns, such as those written by non-native English speakers, thereby creating a systematic bias against certain demographics (Liang et al., 2023). ...
... As noted, there are many commercial services aimed at detecting content written by GenAI. There have been studies dedicated to evaluating the performance of these services (e.g., Weber-Wulff et al., 2023). ZeroGPT (zerogpt.com) is among the most popular of such services. ...
... Additionally, instructing AI to write as a student appears to be fairly ineffective in concealing the AI-generated nature of the text, as measured by ZeroGPT. Consistent with previous research (Weber-Wulff et al., 2023), we find that the accuracy of AI-detection software remains far from perfect. ...
Generative AI agents such as ChatGPT have created renewed concerns about the adverse effects of technology on students’ learning through receiving unpermitted aid in their coursework. We conducted an exploratory experiment involving a typical college course assignment to detect and compare genuine student responses with responses generated by ChatGPT. Using a text classification scheme that we devised, we showed that student responses are fairly accurately distinguishable from AI’s, not only when AI uses its general knowledge to answer questions, but also when it is prompted to use the same material used by students. In addition, we identified elements of authorship style, primarily related to formality, that can help humans set students’ and AI’s work apart. We also explored the strategies that students use and the depth at which they alter AI-generated content to make it “their own”. We found the alterations to be mostly moderate, and the modified text remained mostly detectable by our classification scheme. Overall, our results offer a transparent machine learning model for detecting AI-generated text, stylistic cues that can help humans detect such text, as well as insights into students’ strategies when borrowing content from AI agents.
Keywords: Generative AI, Education, Cheating, Authorship Attribution, Authorship Style, Text Analysis
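The transparent classification scheme referred to in this abstract is not reproduced here, but the general idea of separating authorship by formality-related stylistic cues can be sketched briefly. The sketch below is a minimal illustration only: the feature set, toy data, and labels are assumptions for demonstration, not the authors' actual model.

```python
# Illustrative sketch of a transparent stylistic classifier (NOT the
# authors' actual scheme; features and data are assumptions).
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

CONTRACTIONS = re.compile(r"\b\w+'(s|t|re|ve|ll|d|m)\b", re.IGNORECASE)
FIRST_PERSON = re.compile(r"\b(I|me|my|we|our)\b", re.IGNORECASE)

def stylistic_features(text: str) -> list[float]:
    """Formality-related cues: mean sentence length, type-token ratio,
    contraction rate, and first-person pronoun rate."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(tokens), 1)
    return [
        n / max(len(sentences), 1),           # mean sentence length
        len(set(tokens)) / n,                 # type-token ratio
        len(CONTRACTIONS.findall(text)) / n,  # contraction rate
        len(FIRST_PERSON.findall(text)) / n,  # first-person rate
    ]

# Toy training data (hypothetical labels: 1 = student, 0 = AI-generated).
texts = [
    "Honestly, I think we've shown the result, but I'm not sure it holds.",
    "The findings demonstrate a statistically significant association.",
]
labels = [1, 0]

X = np.array([stylistic_features(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(np.array([stylistic_features("I'd say my answer is right.")])))
```

The appeal of such a model, as the abstract notes, is that its decisions are interpretable: each coefficient maps to a nameable stylistic cue a human grader could also check.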
... However, significant flaws have been revealed in both automated and human detection methods. For instance, studies have shown that these methods often result in false positives and negatives, raising ethical and practical concerns about their reliability and the potential consequences for students and educators (Kumar and Mindzak 2024; Ma et al. 2023; Weber-Wulff et al. 2023). ...
This study examines postsecondary education (PSE) students’ perspectives on postplagiarism—a framework that reconceptualizes academic integrity in response to generative artificial intelligence (GenAI). Through a quantitative survey of 581 PSE students across five English-speaking countries, the research investigated student responses to the six tenets of postplagiarism articulated by Eaton (Int J Educ Integr 19:23, 2023a). The findings reveal a complex pattern of acceptance and resistance: while students broadly embrace the integration of GenAI in academic work, with 93.1% acknowledging the normalization of hybrid human–AI writing, significant concerns persist. Notable resistance emerged regarding the distinction between human and AI-generated content (65.92%), the potential impact of AI on human creativity (60.76%), and the retention of human agency in writing (32.7%). The study also validates a novel instrument for measuring postplagiarism perspectives, achieving acceptable internal consistency (Cronbach’s alpha = 0.718) while identifying areas for refinement. These insights suggest that educational institutions must develop nuanced policies that address student concerns while facilitating ethical AI integration, particularly in areas of attribution, creative expression, and academic agency. The findings contribute to our understanding of how academic integrity frameworks can evolve to remain relevant in an AI-integrated educational landscape.
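For readers unfamiliar with the reliability statistic reported above, Cronbach's alpha measures internal consistency across k survey items. A minimal computation is sketched below; the response matrix is made-up Likert data, not the study's dataset.

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) /
# variance(total score)). The responses below are hypothetical.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of numeric responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical Likert responses from 6 respondents on 6 tenets.
responses = np.array([
    [4, 5, 4, 3, 4, 5],
    [3, 4, 3, 3, 3, 4],
    [5, 5, 4, 4, 5, 5],
    [2, 3, 2, 3, 2, 3],
    [4, 4, 4, 3, 4, 4],
    [3, 3, 3, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(responses):.3f}")
```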
... The advancements made in the field of artificial intelligence have led to a notable increase in the development and deployment of chatbots for various tasks [42]. Many research efforts have focused on the creation of chatbots and the study of their applications in different fields, such as business support and education. ...
Understanding legal documentation is a complex task due to its inherent subtleties and constant changes. This article explores the use of artificial intelligence-driven chatbots, enhanced by retrieval-augmented generation (RAG) techniques, to address these challenges. RAG integrates external knowledge into generative models, enabling the delivery of accurate and contextually relevant legal responses. Our study focuses on the development of a semantic legal chatbot designed to interact with contract law data through an intuitive interface. This AI Lawyer functions like a professional lawyer, providing expert answers in property law. Users can pose questions in multiple languages, such as English and French, and the chatbot delivers relevant responses based on integrated official documents. The system distinguishes itself by effectively avoiding LLM hallucinations, relying solely on reliable and up-to-date legal data. Additionally, we emphasize the potential of chatbots based on LLMs and RAG to enhance legal understanding, reduce the risk of misinformation, and assist in drafting legally compliant contracts. The system is also adaptable to various countries through the modification of its legal databases, allowing for international application.
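As a rough illustration of the retrieval-augmented generation pattern this abstract describes, the sketch below retrieves the corpus passages most similar to a question and conditions a generator on them. The tiny corpus, the TF-IDF retriever, and the `generate()` placeholder are assumptions for illustration; the actual system's retriever, embeddings, and LLM are not specified here.

```python
# Minimal retrieval-augmented generation (RAG) sketch. Retrieval uses
# TF-IDF cosine similarity; generate() stands in for any LLM call
# (hypothetical placeholder, not a specific vendor API).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical legal corpus; in practice, statute or contract passages.
corpus = [
    "A contract requires offer, acceptance, and consideration.",
    "Property transfers must be recorded with the land registry.",
    "A lease longer than one year must be in writing.",
]

vectorizer = TfidfVectorizer().fit(corpus)
doc_matrix = vectorizer.transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_matrix)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a hosted chat-completion API)."""
    return f"[LLM response grounded in prompt of {len(prompt)} chars]"

def answer(question: str) -> str:
    """Ground the model in retrieved passages to curb hallucination."""
    context = "\n".join(retrieve(question))
    return generate(f"Answer using ONLY these passages:\n{context}\n\nQ: {question}")

print(answer("Does a two-year lease need to be written?"))
```

Constraining the prompt to retrieved passages is what lets such a system "rely solely on reliable and up-to-date legal data", and swapping the corpus is what makes it adaptable across jurisdictions.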
... The detection of LLM-generated text has become an emerging challenge. Current detection technologies, including commercial tools, often struggle to distinguish between human-written and LLM-generated content (Price and Sakellarios 2023; Walters 2023; Weber-Wulff et al. 2023). These systems frequently misclassify outputs, with a tendency to favour human-written classifications. ...
The remarkable ability of large language models (LLMs) to comprehend, interpret, and generate complex language has rapidly integrated LLM-generated text into various aspects of daily life, where users increasingly accept it. However, the growing reliance on LLMs underscores the urgent need for effective detection mechanisms to identify LLM-generated text. Such mechanisms are critical to mitigating misuse and safeguarding domains like artistic expression and social networks from potential negative consequences. LLM-generated text detection, conceptualized as a binary classification task, seeks to determine whether an LLM produced a given text. Recent advances in this field stem from innovations in watermarking techniques, statistics-based detectors, and neural-based detectors. Human-assisted methods also play a crucial role. In this survey, we consolidate recent research breakthroughs in this field, emphasizing the urgent need to strengthen detector research. Additionally, we review existing datasets, highlighting their limitations and developmental requirements. Furthermore, we examine various LLM-generated text detection paradigms, shedding light on challenges like out-of-distribution problems, potential attacks, real-world data issues, and ineffective evaluation frameworks. Finally, we outline intriguing directions for future research in LLM-generated text detection to advance responsible artificial intelligence. This survey aims to provide a clear and comprehensive introduction for newcomers while offering seasoned researchers valuable updates in the field.
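Framing detection as a binary classification task, as the survey does, makes the central evaluation trade-off concrete: raising the decision threshold catches less AI text but accuses fewer human writers. The sketch below computes that trade-off; the scores and labels are made-up examples, not results from any actual detector.

```python
# Sketch of the binary-classification framing: sweep a threshold over
# detector scores and report the true-positive rate (AI text caught)
# against the false-positive rate (human writers wrongly flagged).
import numpy as np

def evaluate(scores: np.ndarray, is_ai: np.ndarray, threshold: float):
    """Flag texts whose 'AI-likeness' score meets the threshold."""
    flagged = scores >= threshold
    tpr = (flagged & is_ai).sum() / max(is_ai.sum(), 1)
    fpr = (flagged & ~is_ai).sum() / max((~is_ai).sum(), 1)
    return tpr, fpr

scores = np.array([0.92, 0.40, 0.75, 0.10, 0.66, 0.81])  # made-up outputs
is_ai  = np.array([True, False, True, False, False, True])

for t in (0.5, 0.7, 0.9):
    tpr, fpr = evaluate(scores, is_ai, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

The false-positive rate is the quantity that dominates the academic-integrity debate running through the citations above, since it counts human writers wrongly accused.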
... Tools to support the detection of traditional text-based plagiarism are well understood, with many comparative studies available to assess how well they work (Martins et al., 2014; Foltýnek et al., 2020). Although AI detection tools exist, these appear to have severe limitations and should not be relied upon as a mechanism for accusing students of academic misconduct (Weber-Wulff et al., 2023). The watermarking of AI-generated text may provide an alternative method of identification (Lancaster, 2023; Liu et al., 2024), but checking for watermarks is not yet available as a commercial service. ...
This teaching practice paper shows how students may choose to work with ChatGPT, generative AI and Large Language Models (LLMs) to produce essays and written assessment solutions in a manner that may be considered as either acceptable or as a breach of academic integrity depending on individual and institutional views. Following a brief introduction to how chatbots work, case study examples show how modified prompts can be used to generate writing in alternative styles, how a writing tutor review can be simulated, and how LLMs can be run locally and without Internet access. The paper is intended to inform academic writing tutors, instructors, and assessors what is possible using generative AI for writing as of January 2024. It is not positioned to make a judgement regarding what is acceptable, but rather to illustrate how technically proficient users can accomplish more than is often indicated by writing beginner level prompts for a chatbot. Such techniques are accessible to many students and the Academic Writing Development community will need to consider its response.
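The watermarking idea raised in the citing excerpt above can be made concrete. In one well-known family of schemes, generation is biased toward a pseudorandom "green" subset of the vocabulary, and detection tests whether a text contains significantly more green tokens than the roughly 50% expected by chance. The sketch below shows only the detection side; the hash-based partition and the decision threshold are illustrative assumptions, not a description of any specific tool or service.

```python
# Sketch of statistical watermark detection: if a generator was biased
# toward a pseudorandom "green" half of the vocabulary, watermarked text
# should contain far more green tokens than chance. A one-sided z-test
# against the unwatermarked (binomial) null makes that precise.
import hashlib
import math

def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandomly assign ~half of tokens to the green list, keyed on
    the preceding token (illustrative hash-based partition)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """z-score of the observed green-token count under the null
    hypothesis of unwatermarked text (green rate = gamma)."""
    T = len(tokens) - 1
    greens = sum(is_green(tokens[i], tokens[i + 1]) for i in range(T))
    return (greens - gamma * T) / math.sqrt(T * gamma * (1 - gamma))

text = "the quick brown fox jumps over the lazy dog again and again".split()
print(f"z = {watermark_z_score(text):.2f}  (z > 4 would strongly suggest a watermark)")
```

Unlike post-hoc detectors, this test carries an explicit false-positive guarantee from the z-statistic, which is why watermarking is often discussed as the more defensible route; its limitation, as the excerpt notes, is that model providers must embed the watermark at generation time.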
... Some studies trained these detectors in an end-to-end manner on collected data [GPT23; Zer23; BL24; KKO24], whereas others exploited structural properties of LLMs for detection [Ipp+20; GSR19; Mit+23] or relied on inherent stylistic distinctions without training [Yan+24; Tul+24]. However, these ad-hoc methods have shown degraded performance as LLMs become increasingly capable of generating human-like text [Web+23]. It remains true that contemporary LLM-generated text still exhibits distinguishable features compared to human-written text [PCJ25]; furthermore, these methods often exhibit vulnerability to adversarial attacks and can show bias against non-native English writers [Kri+24; Sad+23; Lia+23], but this is an area that is evolving. ...
Large Language Models (LLMs) have emerged as transformative tools in artificial intelligence (AI), exhibiting remarkable capabilities across diverse tasks such as text generation, reasoning, and decision-making. While their success has primarily been driven by advances in computational power and deep learning architectures, emerging problems -- in areas such as uncertainty quantification, decision-making, causal inference, and distribution shift -- require a deeper engagement with the field of statistics. This paper explores potential areas where statisticians can make important contributions to the development of LLMs, particularly those that aim to engender trustworthiness and transparency for human users. Thus, we focus on issues such as uncertainty quantification, interpretability, fairness, privacy, watermarking and model adaptation. We also consider possible roles for LLMs in statistical analysis. By bridging AI and statistics, we aim to foster a deeper collaboration that advances both the theoretical foundations and practical applications of LLMs, ultimately shaping their role in addressing complex societal challenges.