Sebastian Leal-Arenas’s research while affiliated with the University of Pittsburgh and other places


Publications (7)


Multimodal Deep Learning for Online Meme Classification
  • Conference Paper

December 2024 · 6 Reads

Stephanie Han · Sebastian Leal-Arenas · [...]



Do You See What I Mean? Cue Clashing in L2 Spanish Lexical Stress Perception
  • Conference Paper
  • Full-text available

August 2024 · 13 Reads

Figure 1. Stimuli example
Mean Accuracy and Reaction Time Values for Accurate Responses in Independent Variables
Linear Regression of Reaction Times for Accurate Responses in the Perception of Lexical Stress with Mis/matched Cues


One-GPT: A One-Class Deep Fusion Model for Machine-Generated Text Detection

December 2023 · 98 Reads · 3 Citations

Fig. 1. Proposed one-class deep fusion model architecture for machine-generated text detection. The model integrates contextual features from document embeddings with linguistic features (Text, Repetitiveness, Emotional Semantics, Readability, and Part-of-Speech). After integration, a Dropout regularization layer and an autoencoder-like, reconstruction-based branch made of several Fully Connected (FC) layers further process this representation, extracting hidden semantic and contextual information that allows the model to accurately detect machine-generated text.
Fig. 2. Feature extraction workflow of the proposed method. Linguistic feature extraction produces numerical text, repetitiveness, emotional semantics, readability, and Part-of-Speech (POS) features from the data. In parallel, a Doc2Vec model is trained on human-written texts and used to extract textual vector embeddings. Both representations are fed to our deep fusion model.
Fig. 3. Confusion matrices reporting the number of correctly (main diagonal) and incorrectly (secondary diagonal) predicted text documents for all methods (English dataset).
Fig. 4. Confusion matrices reporting the number of correctly (main diagonal) and incorrectly (secondary diagonal) predicted text documents for all methods (Spanish dataset).

On the brink of the one-year anniversary of ChatGPT's public release, scholarly research has turned its attention toward methods for detecting machine-generated text. The models proposed so far include feature-based classification and detection approaches as well as deep learning architectures, only a small portion of which integrate contextual information to improve prediction accuracy. Moreover, detection approaches explored thus far have focused primarily on English datasets, with limited attention given to similar methods in other languages. As a result, the applicability and efficacy of these methods in linguistically diverse contexts remain underexplored. In this paper, we present a one-class deep fusion model that considers both contextual text features derived from word embeddings and linguistic features to detect machine-generated texts in English and Spanish. Experimental results indicate that our model outperformed popular baseline one-class learning models in the detection task, achieving higher accuracy scores on the English dataset. Results are discussed in comparison to competing classifiers, along with the language biases found in detection models.
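The Fig. 1 pipeline (fuse embedding and linguistic features, then score with a reconstruction-based branch trained only on human-written text) can be illustrated with a minimal sketch. Several assumptions apply. Fusion is shown as plain concatenation. The trained FC autoencoder is swapped for its closed-form linear analogue, a PCA reconstruction, which is the subspace an optimal linear autoencoder trained on squared error spans. Dropout is omitted. The decision threshold (largest reconstruction error seen on the human-written training data) is also an assumption. The names `fuse` and `PCAReconstructionDetector` are hypothetical and do not come from the paper.

```python
import math


def fuse(embedding, linguistic):
    """Early fusion (assumed): concatenate a document embedding with linguistic features."""
    return list(embedding) + list(linguistic)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class PCAReconstructionDetector:
    """Hypothetical one-class detector fitted on human-written vectors only.

    Human-written vectors are modelled by their top-k principal components
    (a stand-in for a trained autoencoder); a new vector is scored by how
    badly that subspace reconstructs it.
    """

    def __init__(self, k=1, iters=200):
        self.k = k
        self.iters = iters

    def _top_direction(self, cov):
        # Power iteration, projecting out components already found (deflation).
        v = [1.0 + 0.1 * j for j in range(len(cov))]  # non-symmetric start vector
        for _ in range(self.iters):
            w = [_dot(row, v) for row in cov]
            for u in self.components:
                p = _dot(w, u)
                w = [wi - p * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:
                return None  # no variance left in the remaining directions
            v = [wi / norm for wi in w]
        return v

    def fit(self, data):
        n, d = len(data), len(data[0])
        self.mean = [sum(row[j] for row in data) / n for j in range(d)]
        centered = [[row[j] - self.mean[j] for j in range(d)] for row in data]
        cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
        self.components = []
        for _ in range(self.k):
            v = self._top_direction(cov)
            if v is None:
                break
            self.components.append(v)
        # Assumed threshold rule: the largest reconstruction error on the training data.
        self.threshold = max(self.score(x) for x in data)
        return self

    def score(self, x):
        # Mean squared reconstruction error of the centered vector.
        z = [xi - mi for xi, mi in zip(x, self.mean)]
        recon = [0.0] * len(z)
        for u in self.components:
            p = _dot(z, u)
            recon = [r + p * ui for r, ui in zip(recon, u)]
        return sum((a - b) ** 2 for a, b in zip(z, recon)) / len(z)

    def predict(self, x):
        # 1 = outside the human-written profile (flagged as machine-generated), 0 = human-like.
        return int(self.score(x) > self.threshold)
```

Because only human-written vectors are used in `fit`, nothing about machine-generated text is needed at training time. Anything the learned subspace reconstructs poorly is flagged. Replacing PCA with a nonlinear, dropout-regularized autoencoder changes how the subspace is learned but not this scoring logic.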


One-Class Learning for AI-Generated Essay Detection

July 2023 · 269 Reads · 12 Citations

Applied Sciences

Detecting AI-generated content is a crucial task given the growing attention to AI tools such as ChatGPT and the concerns they raise regarding academic integrity. Existing text classification approaches, including neural-network-based and feature-based methods, are mostly tailored to English data and are typically limited to a supervised learning setting. Although one-class learning methods are better suited to classification tasks in which only one class is available for training, their effectiveness in essay detection is still unknown. This paper addresses this gap by adopting linguistic features and one-class learning models for AI-generated essay detection. The detection performance of different models is assessed in different settings in which positively labeled data, i.e., AI-generated essays, are unavailable for model training. Results on two datasets containing essays in L2 English and L2 Spanish show that AI-generated essays can be detected accurately. The analysis reveals which models and which sets of linguistic features are more powerful than others in the detection task.
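The one-class setting described above fits the model on human-written essays only and treats AI-generated essays (the positive class) as anything that falls outside that profile. A minimal sketch, assuming a deliberately simple distance-based detector that is not one of the models evaluated in the paper, might look like the following. It standardizes each linguistic feature, scores an essay by its root-mean-square z-score, and uses a threshold at a quantile of the human training scores. The function names, the quantile rule, and the two-feature vectors in the usage example are hypothetical. `confusion_counts` shows how correct predictions (main diagonal) and errors (secondary diagonal) are tallied.

```python
import math
import statistics


def fit_one_class(human_vectors, quantile=0.95):
    """Fit a distance-based one-class model on human-written essays only (hypothetical)."""
    cols = list(zip(*human_vectors))
    means = [statistics.fmean(c) for c in cols]
    # Guard against zero-variance features (pstdev == 0.0 is falsy).
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    model = {"means": means, "stds": stds}
    # Threshold: the score at the given quantile of the training (human) scores.
    scores = sorted(score(model, v) for v in human_vectors)
    model["threshold"] = scores[min(len(scores) - 1, int(quantile * len(scores)))]
    return model


def score(model, v):
    # Root-mean-square z-score: how far v lies from the human-written profile.
    zs = [(x - m) / s for x, m, s in zip(v, model["means"], model["stds"])]
    return math.sqrt(sum(z * z for z in zs) / len(zs))


def predict(model, v):
    # 1 = AI-generated (positive class), 0 = human-written.
    return int(score(model, v) > model["threshold"])


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp), treating 1 as AI-generated."""
    pairs = list(zip(y_true, y_pred))
    return pairs.count((0, 0)), pairs.count((0, 1)), pairs.count((1, 0)), pairs.count((1, 1))
```

For instance, with hypothetical features such as mean sentence length and type-token ratio, `fit_one_class` is called on human essay vectors alone, and `predict` then labels unseen essays. The quantile controls how many human essays the model is willing to misflag.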


Citations (3)


... It is influenced by several factors, including the complexity of the vocabulary, the structure of the document, and the flow of the text. Readability has been considered one of the most important features for LLM-generated text detection [57,58]. We adopt two readability metrics to quantify the document's reading difficulty [59]. ...

Reference:

A hybrid model for the detection of multi-agent written news articles based on linguistic features and BERT
One-GPT: A One-Class Deep Fusion Model for Machine-Generated Text Detection
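The excerpt does not name its two readability metrics. As an illustration only, assuming two widely used formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level, the sketch below computes both from sentence, word, and syllable counts. The vowel-group syllable counter is a rough heuristic, and all function names are hypothetical.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word):
    # Heuristic: one syllable per vowel group, minus a silent final "e" (e.g. "make").
    word = word.lower()
    n = len(VOWEL_GROUPS.findall(word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(1, n)


def text_stats(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return len(sentences), len(words), sum(count_syllables(w) for w in words)


def flesch_reading_ease(text):
    # Higher = easier to read.
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text):
    # Approximate U.S. school grade level needed to read the text.
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
```

Both formulas rise or fall with longer sentences and more syllables per word, which is why they work as a document's "reading difficulty level". Both coefficient sets are tuned for English and would need language-specific variants for Spanish text.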

... Punctuation features are used in nine of the sixteen studies. Most do not specify exactly which symbols were considered; nevertheless, it is notable that three studies agree on the stylistic discriminative value of comma and period usage (Fröhling and Zubiaga, 2021; Corizzo and Leal-Arenas, 2023; Ma et al., 2023). In the study by Zaitsu and Jin (2023), comma placement is used as a feature for discriminating between human and machine-generated style. ...

A Deep Fusion Model for Human vs. Machine-Generated Essay Classification
  • Citing Conference Paper
  • June 2023
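Comma and period usage, and comma placement, can be turned into numerical style features in many ways. The cited studies' exact definitions are not given here, so the following is a minimal sketch under assumed definitions. It computes commas and periods per word, and the mean relative position of commas within their sentence, measured in words from 0 (start) to 1 (end). The function name `punctuation_features` is hypothetical. Python's `\w` matches Unicode letters, so Spanish words such as "aquí" are tokenized correctly.

```python
import re

TOKEN = re.compile(r"\w+|[^\w\s]")  # words or single punctuation marks


def punctuation_features(text):
    tokens = TOKEN.findall(text)
    n_words = sum(1 for t in tokens if re.fullmatch(r"\w+", t)) or 1
    # Relative position of each comma within its sentence, measured in words.
    positions = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sent_tokens = TOKEN.findall(sentence)
        sent_words = sum(1 for t in sent_tokens if re.fullmatch(r"\w+", t))
        seen = 0
        for t in sent_tokens:
            if re.fullmatch(r"\w+", t):
                seen += 1
            elif t == "," and sent_words:
                positions.append(seen / sent_words)
    return {
        "commas_per_word": tokens.count(",") / n_words,
        "periods_per_word": tokens.count(".") / n_words,
        "mean_comma_position": sum(positions) / len(positions) if positions else 0.0,
    }
```

Normalizing by word count keeps the rates comparable across essays of different lengths. The position feature captures where writers tend to place commas, not just how often.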

... Subjectivity and polarity are the two linguistic features used in this category: subjectivity refers to the degree of opinion present in the document, while polarity indicates the expressed sentiment, ranging from positive to negative. • Diversity features: Research findings show that text produced with maximization or top-k sampling tends to be more predictable, indicating a lack of lexical diversity [61,62]. Thus, we use two measures, the type-token ratio (TTR) and entropy, to quantify the lexical richness of the text. ...

One-Class Learning for AI-Generated Essay Detection

Applied Sciences
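The two diversity measures named in the last excerpt can be sketched as follows, assuming lowercase word tokens and Shannon entropy (in bits) over the unigram distribution. The citing paper's exact tokenization and entropy definition are not stated, and the function names are hypothetical. TTR is the number of distinct tokens divided by the total number of tokens. Entropy is higher when word usage is spread more evenly across many types.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenization: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def type_token_ratio(tokens):
    # Distinct word types divided by total tokens (0.0 for empty input).
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_entropy(tokens):
    # Shannon entropy H = -sum(p * log2 p) over word frequencies, in bits.
    if not tokens:
        return 0.0
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())
```

Less diverse text repeats the same few words, which lowers both values. Raw TTR also falls as texts get longer, so comparisons are fairest between documents of similar length.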