December 2024
·
6 Reads
October 2024
·
1 Read
August 2024
·
13 Reads
July 2024
·
22 Reads
December 2023
·
98 Reads
·
3 Citations
As the first anniversary of ChatGPT's public release approaches, researchers have turned their attention to methods for detecting machine-generated text. Proposed approaches include feature-based classification and detection methods as well as deep learning architectures, only a small portion of which integrate contextual information to improve prediction accuracy. Moreover, the detection approaches explored so far have focused primarily on English datasets, with limited attention to how similar methods perform in other languages. As a result, the applicability and efficacy of these methods in linguistically diverse contexts remain underexplored. In this paper, we present a one-class deep fusion model that combines contextual text features derived from word embeddings with linguistic features to detect machine-generated texts in English and Spanish. Experimental results indicate that our model outperformed popular baseline one-class learning models on the detection task, with higher accuracy scores on the English dataset. Results are discussed in comparison with competing classifiers and in light of the language biases found in detection models.
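The abstract does not describe the fusion architecture itself. As a minimal sketch, assuming the simplest form of fusion (early concatenation, not necessarily what the paper's deep fusion model does), the snippet below standardizes a vector of linguistic features and appends it to a contextual text embedding. The helper names `zscore_columns` and `fuse` and all toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows: list[list[float]]) -> tuple[list[float], list[float]]:
    """Column-wise mean and population std of linguistic feature rows."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    # Guard against zero variance so constant features do not divide by zero.
    stds = [pstdev(c) or 1.0 for c in cols]
    return means, stds


def fuse(embedding: list[float], ling: list[float],
         means: list[float], stds: list[float]) -> list[float]:
    """Early fusion (assumed): append standardized linguistic features to the embedding."""
    scaled = [(x - m) / s for x, m, s in zip(ling, means, stds)]
    return list(embedding) + scaled


# Hypothetical toy data: 3-d "embeddings" and 2 linguistic features per text.
train_ling = [[0.42, 12.0], [0.55, 15.0], [0.48, 9.0]]
means, stds = zscore_columns(train_ling)
vec = fuse([0.1, -0.3, 0.7], [0.48, 12.0], means, stds)
print(len(vec))  # 5
```

Standardizing the linguistic features keeps features on very different scales (ratios versus counts) from dominating the fused vector.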
July 2023
·
269 Reads
·
12 Citations
Applied Sciences
Detecting AI-generated content is a crucial task given the growing attention to AI tools such as ChatGPT and the concerns they raise about academic integrity. Existing text classification approaches, including neural-network-based and feature-based methods, are mostly tailored to English data and are typically limited to a supervised learning setting. Although one-class learning methods are better suited to classification tasks in which labeled examples of one class are unavailable, their effectiveness in essay detection remains unknown. This paper explores this gap by adopting linguistic features and one-class learning models for AI-generated essay detection. The detection performance of different models is assessed in settings where positively labeled data, i.e., AI-generated essays, are unavailable for model training. Results on two datasets containing essays in L2 English and L2 Spanish show that AI-generated essays can be detected accurately. The analysis reveals which models and which sets of linguistic features are more powerful than others in the detection task.
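To illustrate the one-class setting in which only human-written essays are available for training, here is a toy detector. It swaps in a simple technique, distance to the centroid of the human class with a quantile threshold, which is an assumption and not one of the models evaluated in the paper; the class name `CentroidOneClass` and the two feature columns are hypothetical.

```python
import math


def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class CentroidOneClass:
    """Toy one-class detector: fit on human essays only, flag far-away texts.

    Texts whose distance to the human centroid exceeds the chosen quantile of
    training distances are labeled as outliers (here: possibly AI-generated).
    """

    def __init__(self, quantile: float = 0.95):
        self.quantile = quantile

    def fit(self, human_rows):
        self.center = centroid(human_rows)
        dists = sorted(euclidean(r, self.center) for r in human_rows)
        # Nearest-rank quantile of training distances becomes the threshold.
        k = max(0, math.ceil(self.quantile * len(dists)) - 1)
        self.threshold = dists[k]
        return self

    def predict(self, rows):
        # True = outlier relative to the human class.
        return [euclidean(r, self.center) > self.threshold for r in rows]


# Hypothetical features per essay, e.g. [type-token ratio, comma rate].
human = [[0.50, 0.10], [0.52, 0.12], [0.48, 0.09], [0.51, 0.11]]
model = CentroidOneClass(quantile=0.95).fit(human)
print(model.predict([[0.50, 0.10], [0.20, 0.40]]))  # [False, True]
```

The quantile controls the trade-off: a lower value flags more texts, catching more AI-generated essays at the cost of more false alarms on human ones.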
June 2023
·
27 Reads
·
6 Citations
... It is influenced by several factors, including the complexity of the vocabulary, the structure of the document, and the flow of the text. Readability has been considered one of the most important features for LLM-generated text detection [57,58]. We adopt two readability metrics to quantify the document's reading difficulty [59]. ...
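The snippet does not name the two readability metrics. Assuming one of them is the widely used Flesch Reading Ease score, FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), a rough implementation could look like this. The vowel-group syllable counter is a crude heuristic, and the constants are calibrated for English text.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (y counts as a vowel)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))  # 119.19
```

Short sentences of one-syllable words push the score above 100; a real pipeline would likely use a proper syllabifier rather than this heuristic.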
December 2023
... Punctuation features are used in nine of the sixteen works. Most do not specify exactly which symbols they considered; nevertheless, it is notable that three studies agree on the stylistic discriminative value of comma and period usage (Fröhling and Zubiaga, 2021; Corizzo and Leal-Arenas, 2023; Ma et al., 2023). The study by Zaitsu and Jin (2023) uses comma positioning as a feature to discriminate between human and machine-generated style. ...
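Because most of the surveyed works do not specify which symbols they count, the following sketch makes an assumption: it measures the rate of a fixed set of punctuation marks (commas, periods, semicolons, colons) per 100 word tokens. The function name and the symbol set are hypothetical, and the comma positioning used by Zaitsu and Jin (2023) is not modeled here.

```python
import re


def punctuation_rates(text: str, symbols: str = ",.;:") -> dict[str, float]:
    """Occurrences of each punctuation symbol per 100 word tokens."""
    # \w+ is Unicode-aware in Python 3, so accented Spanish words count as one token.
    n_words = len(re.findall(r"\w+", text))
    if n_words == 0:
        return {s: 0.0 for s in symbols}
    return {s: 100.0 * text.count(s) / n_words for s in symbols}


print(punctuation_rates("However, it works. It is fast, and cheap."))
# {',': 25.0, '.': 25.0, ';': 0.0, ':': 0.0}
```

Normalizing by word count lets the rates be compared across essays of different lengths.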
June 2023
... Subjectivity and polarity are the two linguistic features used in this category: subjectivity refers to the degree of opinion expressed in the document, while polarity indicates the expressed sentiment, ranging from positive to negative. • Diversity features: Prior research shows that text produced through maximization-based decoding or top-k sampling tends to be more predictable, indicating a lack of lexical diversity [61,62]. Thus, we use two measures, the type-token ratio (TTR) and entropy, to quantify lexical richness. ...
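A minimal sketch of the two diversity measures, assuming lowercase word tokens from a simple regular expression and entropy computed over the unigram word distribution (the snippet does not say which distribution the entropy is taken over):

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by total tokens (1.0 = no repetition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_entropy(tokens: list[str]) -> float:
    """Shannon entropy (bits) of the empirical word distribution."""
    if not tokens:
        return 0.0
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


toks = tokenize("The cat saw the dog")
print(type_token_ratio(toks))            # 0.8
print(round(unigram_entropy(toks), 4))   # 1.9219
```

Both measures drop when a text repeats the same words; note that TTR also falls as texts grow longer, so comparisons are fairest between texts of similar length.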
July 2023
Applied Sciences