
Marvin M. Agüero-Torales
Fujitsu Ltd. · Data Intelligence of CoE
Doctor of Philosophy
NLP Applied Researcher
About
24 Publications
9,279 Reads
188 Citations
Introduction
Marvin Agüero-Torales is currently an NLP Applied Researcher in the Global CoE of Data Intelligence at Fujitsu. Marvin does research in bioNLP, Text Mining, Natural Language Processing and Artificial Intelligence.
Publications (24)
One of the main problems low-resource languages face in NLP can be pictured as a vicious circle: data is needed to build and test tools, but the available text is scarce and there are no powerful tools to collect it. In order to break this circle for Guarani, we explore whether text automatically generated from a grammar can work as a Data Augmentation...
This paper presents the results of the first shared task about the creation of educational materials for three indigenous languages of the Americas. The task proposes to automatically generate variations of sentences according to linguistic features that could be used for grammar exercises. The languages involved in this task are Bribri, Maya, and Gu...
We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context. We annotated a corpus of 1500 texts extracted from...
This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identif...
This paper presents a work in progress about creating a Guarani version of the WordNet database. Guarani is an indigenous South American language and is a low-resource language from the NLP perspective. Following the expand approach, we aim to find Guarani lemmas that correspond to the concepts defined in WordNet. We do this through three strategie...
This work presents a parallel corpus of Guarani-Spanish text aligned at sentence level. The corpus contains about 30,000 sentence pairs, and is structured as a collection of subsets from different sources, further split into training, development and test sets. A sample of sentences from the test set was manually annotated by native speakers in ord...
Annotation mini-guidelines describing the process followed by the bilingual (Guarani-Spanish) annotators who manually annotated the Guarani-dominant Jopara and Guarani corpus (https://github.com/search?q=user%3Ammaguero+corpus)
Detection of occupations in texts is relevant for a range of important application scenarios, like competitive intelligence, sociodemographic analysis, legal NLP or health-related occupational data mining. Despite the importance and heterogeneous data types that mention occupations, text mining efforts to recognize them have been limited. This is d...
This paper addresses the problem of sentiment analysis for Jopara, a code-switching language between Guarani and Spanish. We first collect a corpus of Guarani-dominant tweets and discuss the difficulties of finding quality data for even relatively easy-to-annotate tasks, such as sentiment analysis. Then, we train a set of neural models, includin...
Twenty-four studies on twenty-three distinct languages and eleven social media platforms illustrate the steady interest in deep learning approaches for multilingual sentiment analysis of social media. We improve over previous reviews with wider coverage from 2017 to 2020 as well as a study focused on the underlying ideas and commonalities behind the differen...
In this work, we apply topic modeling to study what users were discussing on Twitter during the beginning of the COVID-19 pandemic. More specifically, we explore the period of time that includes three differentiated phases of the COVID-19 crisis in Spain: the pre-crisis time, the outbreak, and the beginning of the lockdown. To do so, we first...
Mini-evaluation framework and mini-guides. For more details, see section 4.4, p. 186 (http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6333).
Annotation mini-guidelines describing the process followed by the bilingual (Guarani-Spanish) annotators who manually annotated the Guarani-dominant Jopara and Guarani corpus (https://github.com/mmaguero/josa-corpus)
SMM4H 2021 was accepted at NAACL (scheduled for June in Mexico City): https://2021.naacl.org/.
The ProfNER Shared Task encourages its participants to detect occupations and employment situations in Spanish tweets related to the COVID-19 situation. These guidelines describe the process followed by the clinical and linguistic experts who manually annotated...
Key elements:
• SLA
• Vehicle appraisal
• Machine learning
The tourism industry has been promoting its products and services based on the reviews that people often write on travel websites such as TripAdvisor.com, Booking.com, and similar platforms. These reviews have a profound effect on the decision-making process when evaluating which places to visit, such as which restaurants to book.
In thi...
GASTRO-MINER: A Cloud-Based Tool for Sentiment Analysis of Restaurant Reviews on TripAdvisor: A Case Study of Restaurants in the Province of Granada.
Defense of the master's thesis for the Master's Degree in Computer Engineering at the University of Granada.
Master's thesis: "Gastro-miner: A Cloud-Based Tool for Sentiment Analysis of Restaurant Reviews on TripAdvisor: A Case Study of Restaurants in the Province of Granada"
[Python Stack (Django, N...
Abstract
----
The tourism industry has been promoting its products and services based on the reviews
that people often write on travel websites such as TripAdvisor.com. These
reviews have a profound effect on the decision-making process when evaluating which
places to visit, such as which restaurants to book...
The research in this work aims to present a clear
view of how a Federated Database can be implemented
and of the sets of techniques available for this purpose. A Federated Database
System is a valid choice when one needs to formulate simple (single)
queries and receive simple (single) answers,...
The growing need for cooperation among independent entities requires integrated access to multiple autonomous and heterogeneous databases, that is, accessing the data as if it came from a single data source. This collection of cooperating databases, known as component Database Systems (DBS), form a fe...