
Armando Suárez Cueto
- PhD Computer Science
- Faculty Member at University of Alicante
84 publications
17,189 reads
437 citations
Publications (84)
Researchers have relegated natural language processing tasks to Transformer-type models, particularly generative models, because these models exhibit high versatility when performing generation and classification tasks. As the size of these models increases, they achieve outstanding results. Given their widespread use, many explainability technique...
Generative Artificial Intelligence has grown exponentially as a result of Large Language Models (LLMs). This has been possible because of the impressive performance of deep learning methods created within the field of Natural Language Processing (NLP) and its subfield Natural Language Generation (NLG), which is the focus of this paper. Within the g...
Large language models have shown impressive performance in Natural Language Processing tasks, but their black-box characteristics make it difficult to explain the model's decisions and to integrate semantic knowledge. There has been a growing interest in combining external knowledge sources with language models to address...
We present a data-driven approach to discover and extract patterns in textual genres with the aim of identifying whether there is an interesting variation of linguistic features among different narrative genres depending on their respective communicative purposes. We want to achieve this goal by performing a multilevel discourse analysis according...
The analysis of discourse and the study of what characterizes it in terms of communicative objectives is essential to most tasks of Natural Language Processing. Consequently, research on textual genres as expressions of such objectives presents an opportunity to enhance both automatic techniques and resources. To conduct an investigation of this ki...
ABSTRACT. The research presented here arose in 2018 within the framework of the national project Plataforma para la gestión y difusión de contenidos abiertos mediante el uso de MOOC (Massive Open Online Course; in Spanish, Cursos en Línea Masivos y Abiertos (COMA)). It forms part of the policy of the Universidad Agraria de La Habana "Fructuoso R...
As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet ha...
In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We present...
Keyphrases are mainly words that capture the main topics of a document. We think that semantic classes can be used as keyphrases for a text. We have developed a semantic class-based WSD system that can tag the words of a text with their semantic class. A method is developed to compare the semantic classes of the words of a text with the correct one...
This paper summarizes our participation in task #17 of SemEval-2 (All-words WSD on a specific domain) using a supervised class-based Word Sense Disambiguation system. Basically, we use Support Vector Machines (SVM) as learning algorithm and a set of simple features to build three different models. Each model considers a different training corpus...
As empirically demonstrated by the last SensEval exercises, assigning the appropriate meaning to words in context has resisted all attempts to be successfully addressed. One possible reason could be the use of an inappropriate set of meanings. In fact, WordNet has been used as a de-facto standard repository of meanings. However, to our knowled...
The R2D2 systems for the English All-Words and Lexical Sample tasks at SENSEVAL-3 are based on several supervised and unsupervised methods combined by means of a voting procedure. The main goal was to take advantage of training data when available, and to get maximum coverage with the help of methods that do not need such learning examples. The results...
Back in the 1990s Malcolm Coulthard announced the beginnings of an emerging discipline, forensic linguistics, resulting from the interface of language, crime and the law. Today the courts are more than ever calling on language experts to help in certain types of cases, such as authorship identification, plagiarism, legal interpreting and translatio...
Classification of machine learning methods. Methods based on annotated corpora. Application to Natural Language Processing. Word sense disambiguation with machine learning techniques.
We present a very simple method for selecting Base Level Concepts using some basic structural properties of WordNet. We show empirically that the resulting set of Base Level Concepts groups word senses at a level of abstraction suitable for word sense disambiguation based on cla...
The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. All these tasks greatly benefit from involving a Named Entity Recognizer (NER) in the preprocessing stage. This paper proposes a completely automatic NER system. The NER task involves not only the id...
The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. An important preprocessing tool for these tasks is the recognition of named entities, which corresponds to a Named Entity Recognition (NER) task. In this paper we propose a completely automatic NER w...
We present a corpus-based supervised learning system for coarse-grained sense disambiguation. In addition to usual features for training in word sense disambiguation, our system also uses Base Level Concepts automatically obtained from WordNet. Base Level Concepts are some synsets that generalize a hyponymy sub-hierarchy, and provide an extra...
Lecture notes for the course Bases de Datos 1 (Databases 1).
In this paper, an approach to semantic disambiguation based on machine learning and semantic classes for Spanish is presented. A critical issue in a corpus-based approach for Word Sense Disambiguation (WSD) is the lack of wide-coverage resources to automatically learn the linguistic information. In particular, all-words sense annotated corpora such...
In this paper we discuss to what extent the choice of one particular Part-of-Speech (PoS) tagger determines the results obtained by a word sense disambiguation (WSD) system. We have chosen several PoS taggers and two WSD methods. By combining them, and using different kinds of information, several experiments have been carried out. The WSD systems h...
We present a very simple method for selecting Base Level Concepts using basic structural properties of WordNet. We also empirically demonstrate that this automatically derived set of Base Level Concepts groups senses into an adequate level of abstraction in order to perform class-based Word Sense Disambiguation. In fact a very naive Most Frequen...
Third part of the topic Relational Model.
First part of the topic Relational Algebra.
Normalization and functional dependencies, first part.
Additional exercises for the sessions.
Basic aspects of a DBMS.
Introduction to the E-R model.
Exams and solutions
Question Classification (QC) is usually the first stage in a Question Answering system. This paper presents a multilingual SVM-based question classification system aiming to be language and domain independent. For this purpose, we use only surface text features. The system has been tested on the TREC QA track questions set obtaining encouraging res...
The increasing flow of digital information requires the extraction, filtering and classification of pertinent information from large volumes of texts. An important preprocessing tool for these tasks is the recognition of named entities, which corresponds to a Named Entity Recognition (NER) task. In this paper we propose a completely automatic NER wh...
In this paper we concentrate on the resolution of the lexical ambiguity that arises when a given word has several different meanings. This specific task is commonly referred to as word sense disambiguation (WSD). The task of WSD consists of assigning the correct sense to words using an electronic dictionary as the source of word definitions. We pre...
This paper presents a multilingual approach to Question Classification based on machine learning, using language-independent features. In this way we obtain a flexible system that is easily adaptable to new languages. Using a parallel corpus in English and Spanish, we test the performance of the system with three different techniques: Support Vector Machi...
Many of the tasks associated with natural language processing (NLP) are classification problems. In tasks of this kind, classifying consists of identifying the type of an entity and determining the class to which it belongs. Thus, an entity detection system can be defined as a classifier of those wor...
PhD Thesis in Computer Science written by Armando Suárez Cueto under the supervision of Dr. Manuel Palomar Sanz (Univ. of Alicante) and German Rigau Claramunt (Univ. of the Basque Country). The author was examined on June 28th, 2004 by the committee formed by Dr. Lluis Padró Cirera (Polytechnic University of Catalonia), Andrés Montoyo Guijarro (Univ....
This paper presents a multilingual approach to question classification based on machine learning, using language-independent learning features. This will make the system flexible and easily adaptable to new languages. On a parallel corpus of questions in English and Spanish, we will compare...
In order to achieve high precision Question Answering Systems or Information Retrieval Systems, the incorporation of Natural Language Processing techniques is needed. For this reason, in this paper a method to determine the semantic role for a constituent is presented. The goal of this is to integrate the method in a Question Answering System and...
This paper presents a system for recognizing and disambiguating the named entities that appear in Spanish texts. The system has two stages: first, the entities are identified in the text, and then each entity is disambiguated. The disambiguation of the entity is performed u...
In order to achieve high precision Question Answering Systems or Information Retrieval Systems, the incorporation of Natural Language Processing techniques is needed. For this reason, in this paper a method that can be integrated in these kinds of systems is presented. The aim of this method, based on maximum entropy conditional probability model...
In this paper, a supervised learning method of semantic role labeling is presented. It is based on maximum entropy conditional probability models. This method acquires the linguistic knowledge from an annotated corpus and this knowledge is represented in the form of features. Several types of features have been analyzed for a few words selected fro...
Introduction. The "Meaning" system has been developed within the framework of the Meaning European research project. It is a combined system, which integrates several supervised machine learning word sense disambiguation modules, and several knowledge-based (unsupervised) modules. See section 2 for details. The supervised modules have been traine...
Funding body: MCyT (PROFIT project: FIT-150500-2002-411).
In this paper, a supervised learning system of word sense disambiguation is presented. It is based on conditional maximum entropy models. This system acquires the linguistic knowledge from an annotated corpus and this knowledge is represented in the form of features. The system was evaluated using both WordNet's senses and domains as the sets...
In this paper, a supervised learning method of word sense disambiguation based on maximum entropy conditional probability models is presented. This system acquires the linguistic knowledge from an annotated corpus and this knowledge is represented in the form of features. Several types of features have been analyzed for a few words selected from the...
This paper presents a method to combine two unsupervised methods (Specification Marks, Conceptual Density) and one supervised (Maximum Entropy) for the automatic resolution of lexical ambiguity of nouns in English texts. The main objective is to improve the accuracy of knowledge-based methods with statistical information supplied by the corpus-bas...
Supervised learning on a corpus-based Word Sense Disambiguation (WSD) system uses a previously classified set of linguistic contexts. In order to perform the training of the system, it is usual to define a set of functions that inform of any linguistic feature in each example. It is usual to look for the same kind of information for each word too,...
In this paper, a supervised learning system of word sense disambiguation is presented. It is based on conditional maximum entropy models. This system acquires the linguistic knowledge from an annotated corpus and this knowledge is represented in the form of features. Several types of features have been analyzed using the SENSEVAL-2 data for the Spa...
In this paper, an evaluation of several feature selections for word sense disambiguation is presented. The method used to classify linguistic contexts into their correct senses is based on maximum entropy probability models. In order to study their relevance for each word, several types of features have been analyzed for a few words selected from the DS...
This paper presents a supervised learning system for word sense disambiguation. The system is based on maximum entropy conditional probability models. Linguistic knowledge is acquired from an annotated corpus and represented in the form of features. We have studied several ty...
This paper describes a Natural Language Learning method that extracts knowledge in the form of semantic patterns with ontology elements associated to syntactic components in the text. The method combines the use of EuroWordNet's ontological concepts and the correct sense of each word assigned by a Word Sense Disambiguation (WSD) module to ex...
The WSD system presented at Senseval-2 uses a knowledge-based method for noun disambiguation and a corpus-based method for verbs and adjectives. The methods are, respectively, Specification Marks and Maximum Entropy probability models. So, we can say that this is a hybrid system which joins an unsupervised method with a supervised method. The whole...
This paper describes a Natural Language Learning method that extracts knowledge in the form of semantic patterns with ontology elements associated to syntactic components in the text. The method combines the use of EuroWordNet's ontological concepts and the correct sense of each word assigned by a Word Sense Disambiguation (WSD) module to extract t...
In this paper we present a whole Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the grammatical formalism Lexical-Functional Grammars (LFG). The system uses the Specification Marks Method in order to resolve the lexical ambiguity. Another important component of this system is the anaphora r...
This paper presents two methods for resolving lexical ambiguity, compares them, and shows that appropriate cooperation between them would improve the results obtained. It presents a comparative study of two methods that resolve the lexical ambiguity of nouns in texts written in English. The methods u...
In this paper we present a whole Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the grammatical formalism Lexical-Functional Grammars (LFG). Another important component of this system is the anaphora resolution module. To solve the anaphora, this module contains a method based on linguistic i...
The problem with using extensive lexical, syntactic and semantic resources is the large quantity of information that is provided, as well as all the different senses that each word has. To contribute to the resolution of this problem, this paper presents a tool for knowledge acquisition from WordNet, restricting and limiting the learning of word se...
ABSTRACT. This paper presents the EXIT system, a proposal for an information extraction system capable of obtaining structured information from the content of deeds of sale. The aim of the system is to fill in templates that mirror the logical schema of a relational database, in which...
The inference methods which are proposed in Syntactic Pattern Recognition in practice only make use of positive data and generate a heuristic generalization of strings in the data. However, the use of positive data becomes insufficient when very discriminatory models are needed. This is the case of Difficult Vocabularies in Isolated Word Recognitio...
In this paper we present an automatic mechanism for bilingual (Spanish-English) alignment of anaphoric expressions. For this purpose, two anaphora resolution systems were used. Both are based on linguistic preferences and constraints, for Spanish (SUPPAR) and for English (MARS). These systems have been independently developed and each of them is pr...
The DLSI-UA team is currently working on several word sense disambiguation approaches, both supervised and unsupervised. These approaches are based on different ways to use both annotated and unannotated data, and several resources generated from or exploiting WordNet (Miller et al., 1993), WordNet Domains, EuroWordNet (EWN) and additional corpor...
This paper presents re-training, a bootstrapping algorithm that automatically acquires semantically annotated data, ensuring high levels of precision. This algorithm uses a corpus-based system of word sense disambiguation that relies on maximum entropy probability models. The re-training method consists of the iterative feeding of training-classifi...
The generation of resources within a research group is strongly influenced by the mobility of temporary staff, by the professional development of the permanent staff, and by the constant revision of the techniques and materials required. The result is a loosely integrated set of tools and data that entail an e...
In this paper, the QALL-ME project, related to the Information Systems Technologies, is introduced. The project is 36 months long, it is funded by the European Union and it will be carried out by 7 institutions. The main goal is to establish a shared infrastructure for multilingual and multimodal open domain Question Answering for mobile phones. Taking...
Fourth part of the topic Relational Model.
Last part of the topic Relational Model.
Last part of the topic Relational Algebra.
Normalization and functional dependencies, last part.
This paper presents a proposal for incorporating semantic information into the partial parsing process. The proposal is based on the IRSAS method for restricted texts and extends it to unrestricted texts by using a general-purpose lexical resource (WordNet). After generating a series of patterns s...
First part of the relational model.
Second part of the relational model.
Practical session workbook for the course Bases de Datos 1 (Databases 1).
Relational languages based on First-Order Predicate Calculus.
Additional exercises for the sessions.
Session presenting the course contents.
General concepts of data models.
Course presentation: theory syllabus, practical syllabus, methodology, assessment.