Marvin M. Agüero-Torales

Fujitsu Ltd. · Data Intelligence of CoE

Doctor of Philosophy
NLP Applied Researcher

About

24 Publications
9,279 Reads
188 Citations
Introduction
Marvin Agüero-Torales is currently an NLP Applied Researcher in the Global CoE of Data Intelligence at Fujitsu. Marvin does research in bioNLP, Text Mining, Natural Language Processing, and Artificial Intelligence.
Additional affiliations
October 2017 - January 2022
University of Granada
Position
  • PhD Student
Description
  • Part-time student. Sentiment Analysis, Text Mining, Natural Language Processing, Machine Learning, Computer Science. Thesis: https://digibug.ugr.es/handle/10481/72863
September 2016 - October 2017
University of Granada
Position
  • Master's Student
Description
  • Student in the Master's program in Computer Engineering

Publications (24)
Conference Paper
Full-text available
One of the main problems low-resource languages face in NLP can be pictured as a vicious circle: data is needed to build and test tools, but the available text is scarce and there are no powerful tools to collect it. In order to break this circle for Guarani, we explore whether text automatically generated from a grammar can work as a Data Augmentation...
Conference Paper
Full-text available
This paper presents the results of the first shared task about the creation of educational materials for three indigenous languages of the Americas. The task proposes to automatically generate variations of sentences according to linguistic features that could be used for grammar exercises. The languages involved in this task are Bribri, Maya, and Gu...
Article
Full-text available
We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from...
Article
Full-text available
This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identif...
Conference Paper
Full-text available
This paper presents a work in progress about creating a Guarani version of the WordNet database. Guarani is an indigenous South American language and is a low-resource language from the NLP perspective. Following the expand approach, we aim to find Guarani lemmas that correspond to the concepts defined in WordNet. We do this through three strategie...
Conference Paper
Full-text available
This work presents a parallel corpus of Guarani-Spanish text aligned at sentence level. The corpus contains about 30,000 sentence pairs, and is structured as a collection of subsets from different sources, further split into training, development and test sets. A sample of sentences from the test set was manually annotated by native speakers in ord...
Poster
Full-text available
This work presents a parallel corpus of Guarani-Spanish text aligned at sentence level. The corpus contains about 30,000 sentence pairs, and is structured as a collection of subsets from different sources, further split into training, development and test sets. A sample of sentences from the test set was manually annotated by native speakers in ord...
Method
Full-text available
Annotation mini-guidelines describing the process followed by the bilingual (Guarani-Spanish) annotators who manually annotated the Guarani-dominant Jopara and Guarani corpus (https://github.com/search?q=user%3Ammaguero+corpus)
Conference Paper
Full-text available
Detection of occupations in texts is relevant for a range of important application scenarios, like competitive intelligence, sociodemographic analysis, legal NLP or health-related occupational data mining. Despite the importance and heterogeneous data types that mention occupations, text mining efforts to recognize them have been limited. This is d...
Preprint
Full-text available
This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, includin...
Article
Twenty-four studies on twenty-three distinct languages and eleven social media illustrate the steady interest in deep learning approaches for multilingual sentiment analysis of social media. We improve over previous reviews with wider coverage from 2017 to 2020 as well as a study focused on the underlying ideas and commonalities behind the differen...
Article
In this work, we apply topic modeling to study what users were discussing on Twitter during the beginning of the COVID-19 pandemic. More specifically, we explore the period of time that includes three differentiated phases of the COVID-19 crisis in Spain: the pre-crisis time, the outbreak, and the beginning of the lockdown. To do so, we first...
Method
Full-text available
Mini-evaluation framework and mini-guides. For more details, see section 4.4, p. 186 of http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6333.
Method
Full-text available
Annotation mini-guidelines describing the process followed by the bilingual (Guarani-Spanish) annotators who manually annotated the Guarani-dominant Jopara and Guarani corpus (https://github.com/mmaguero/josa-corpus)
Technical Report
Full-text available
SMM4H 2021 was accepted at NAACL (scheduled in Mexico City in June): https://2021.naacl.org/. The ProfNER Shared Task encourages its participants to detect occupations and employment situations in Spanish tweets related to the COVID-19 situation. These guidelines describe the process followed by the clinical and linguistic experts who manually annotated...
Poster
Full-text available
Key elements: • SLA • Vehicle damage appraisal • Machine learning
Article
Full-text available
The tourism industry has been promoting its products and services based on the reviews that people often write on travel websites such as TripAdvisor.com, Booking.com, and similar platforms. These reviews have a profound effect on the decision-making process when evaluating which places to visit, such as which restaurants to book, etc. In thi...
Poster
Full-text available
GASTRO-MINER: A Cloud-Based Tool for Sentiment Analysis of Restaurant Reviews on TripAdvisor: A Case Study of Restaurants in the Province of Granada.
Presentation
Full-text available
Defense of the final project for the Master's Degree in Computer Engineering at the University of Granada. Master's thesis: "Gastro-miner: A Cloud-Based Tool for Sentiment Analysis of Restaurant Reviews on TripAdvisor: A Case Study of Restaurants in the Province of Granada" [Python Stack (Django, N...
Technical Report
Full-text available
Abstract: The tourism industry has been promoting its products and services based on the reviews that people often write on travel websites such as TripAdvisor.com. These reviews have a profound effect on the decision-making process when evaluating which places to visit, such as at which restaurants to book...
Preprint
Full-text available
The research in this work aims to present a clear perspective on how a Federated Database can be implemented and on the sets of techniques available for this purpose. A Federated Database System is a valid choice when one needs to formulate simple (single) queries and receive simple (single) answers,...
Conference Paper
Full-text available
The growing need for cooperation among independent entities requires integrated access to multiple autonomous and heterogeneous databases, that is, accessing the data as if it came from a single data source. This collection of cooperating databases, known as component Database Systems (DBS), forms a fe...
