
Borja Navarro-Colorado
- PhD
- Professor (Associate) at University of Alicante
78 Publications
16,032 Reads
960 Citations
Introduction
Current institution
Additional affiliations
February 2003 - present
Publications (78)
Generative Artificial Intelligence has grown exponentially as a result of Large Language Models (LLMs). This has been possible because of the impressive performance of deep learning methods created within the field of Natural Language Processing (NLP) and its subfield Natural Language Generation (NLG), which is the focus of this paper. Within the g...
The last decade has witnessed a remarkable advancement in the field of Artificial Intelligence, particularly in Natural Language Processing (NLP) and Natural Language Generation (NLG), which has been made possible by the development and enhancement of Deep Learning techniques and Large Language Models (LLMs). Nowadays, ChatGPT is one of the most wi...
This paper presents a new approach to retrieving and further integrating tabular datasets (collections of rows and columns) using union and join operations. In this work, both processes were carried out using a similarity measure based on contextual word embeddings, which allows finding semantically similar tables and overcoming the recall problem of lex...
This work argues for the need to combine so-called 'distant' analysis (panoramic analysis of large amounts of literary text) with close analysis (detailed analysis of different linguistic or literary aspects). To this end, it proposes the creation of large reference literary corpora in which, taking advantage of current...
In order to analyze metrical and semantic aspects of poetry in Spanish with computational techniques, we have developed a large corpus annotated with metrical information. In this paper we will present and discuss the development of this corpus: the formal representation of metrical patterns, the semi-automatic annotation process based on a new au...
Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasi...
A new approach to narrative abstractive summarization (NATSUM) is presented in this paper. NATSUM is centered on generating a narrative chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, first, our system creates a cross-document timeline where a time point contains all the...
This paper describes the DISCO corpus and how it complements available digital materials for poetry in Spanish in several respects: First, the author and period range. Second, metadata concerning the authors and their works expressed in TEI-RDFa, given the importance of interoperability between literary datasets and the advantages of Linked Open Da...
This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA-topics have been established by running several tests and automatically evaluating the coherence of the resulting LDA-topics. Results show, on the one hand, that when dealing with a corpus of poetry, lemmatization is not advisab...
This paper discusses specific problems in annotating a corpus of Golden Age sonnets with metrical information. After presenting the framework in which the corpus was developed (the ADSO project), its general data, and the annotation process, we set out both the main metrical problems and the solutions adopted and reflected...
In this paper a search service developed for the exploitation of a TEI-based Spanish poetry corpus is presented. Besides textual retrieval, the search service takes advantage of the metrical annotation to retrieve verses and poems with specific rhythms. The Spanish Golden-Age corpus compiles 5078 sonnets written during the 16th and 17th centuries...
In this article an automatic scansion model for fixed-metre Spanish poetry is presented. It is a hybrid model that combines hand-made rules with probabilistic information. Through the set of rules, the model is able to extract the syllabic structure of each word, to classify them as stressed or unstressed and to resolve metrical phenomena such as s...
This paper focuses on the contribution of temporal relations inference and distributional semantic models to the event ordering task. Our system automatically builds ordered timelines of events from different written texts in English by performing first temporal clustering and then semantic clustering. In order to determine temporal compatibility,...
In this paper we present a system that automatically builds ordered timelines of events from different written texts in English. The system deals with problems such as automatic event extraction, cross-document temporal relation extraction and cross-document event coreference resolution. Its main characteristic is the application of three different...
This work analyzes the main types of hendecasyllable used in Golden Age sonnets. As a novelty, we apply a macro or distant analysis method, through the computational analysis of a corpus of more than seventy thousand (70,000) lines of verse. Starting from a formal model of metrical pattern, we analyze the types of...
This article presents a method for recommending scientific articles taking into consideration their degree of generality or specificity. This approach is based on the idea that people with less expertise in a specific topic prefer to read more general articles as an introduction to it, while people with more expertise prefer to read more specific articles....
Building unified timelines from a collection of written news articles requires cross-document event coreference resolution and temporal relation extraction. In this paper we present an approach to event coreference resolution according to: a) similar temporal information, and b) similar semantic arguments. Temporal information is detected using an aut...
Several computational linguistics techniques are applied to analyze a large corpus of Spanish sonnets from the 16th and 17th centuries. The analysis is focused on metrical and semantic aspects. First, we are developing a hybrid scansion system in order to extract and analyze rhythmical or metrical patterns. The possible metrical patterns of each v...
This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the perform...
Nowadays, the automatic processing of digitalized documents is crucial to cope with the increasing amount of information available. This issue is addressed from the natural language processing (NLP) research field. One of the tasks required for many NLP applications is temporal information processing. It involves the automatic extraction and interp...
This paper addresses the automatic recognition of temporal expressions and events in Chinese. For this language, these tasks are still in an exploratory stage and high-performance approaches are needed. Recently, in the TempEval-2 evaluation exercise, corpora annotated in TimeML were released for different languages including Chinese. However, no system...
We present an analysis of morpho-lexical features to learn SVM models for recognizing TimeML time and event expressions. We evaluate, over the TempEval-2 data, the features word, lemma, and PoS in isolation, in static-context windows of different sizes, and in syntax-motivated dynamic-context windows defined in this paper. The results show that word,...
This demonstration presents a novel interactive graphical interface to document content focusing on the time dimension. The objective of Time-Surfer is to let users search and explore information related to a specific period, event, or event participant within a document. The system is based on the automatic detection not only of time expressions,...
We present a data-driven approach for recognizing and classifying TimeML events in Italian. A high-performance state-of-the-art approach, TIPSem, is adopted and extended with Italian-specific semantic features from a lexical resource. The resulting approach has been evaluated over the official TempEval2 Italian test data. The analysis of the results...
Emerging project focused on the detection and interpretation of metaphors using unsupervised methods. It presents the characterization of the metaphor problem in Natural Language Processing, the theoretical foundations of the project, and the first results.
This paper presents TIPSem, a system to extract temporal information from natural language texts for English and Spanish. TIPSem learns CRF models from training data. Although the features used include different language analysis levels, the approach is focused on semantic information. For Spanish, TIPSem achieved the best F1 score in all the ta...
This paper analyzes the contribution of semantic roles to TimeML event recognition and classification. For that purpose, an approach using conditional random fields with a variety of morphosyntactic features plus semantic role features is developed and evaluated. Our system achieves an F1 of 81.4% in recognition and 64.2% in classification. We d...
The automatic treatment of temporal elements of natural language has become a very important issue in the NLP community. Recently, the TimeML annotation scheme has been adopted as a standard for temporal information representation by a large number of researchers. There are few TimeML resources for languages other than English whereas there exist semantic...
Our objective in this paper is to determine the necessity of Word Sense Disambiguation in Information Retrieval tasks, according to user behaviour. We estimate and analyse the lexical ambiguity of queries in Cross-Language Image Retrieval (using search logs from a multilingual search interface for Flickr) and measure its correlation with search eff...
Following TimeML (TIMEX3) specifications, we present a study analyzing to what extent semantic roles are useful in the temporal expression identification task, as well as a list of the potential applications of this combination. For that purpose, two approaches to a temporal expression identification system based on semantic roles have been developed:...
Nowadays, the temporal aspects of natural language are receiving great research interest. TimeML has been adopted as a standard for temporal information annotation by a large number of researchers. Available TimeML resources are very limited in size and in diversity of languages. This paper analyzes a combination of semantic roles and semantic...
In this paper we calculate and analyse the lexical ambiguity of queries in cross-lingual Image Retrieval (Flickling) and compare it with the results obtained by users. We want to know to what extent the lexical ambiguity of a query influences the correct localization of an image in a multilingual framework. With this, our final objective is to deter...
This paper shows the results of adapting a modular domain English QA system (called IBQAS, whose initials correspond to Interchangeable Blocks Question Answering System) to work with both manual and automatic text transcriptions. This system provides a generic and modular framework using an approach based on the recognition of named entities as a m...
Automatic syntactic parsing. Structural ambiguity. Context-free grammars. Definite clause grammars. Parsing algorithms.
Practical exercises for each topic of the Natural Language Engineering course: analysis of real systems, shallow text processing, tokenization and normalization, analysis of the word sense disambiguation task, definite clause grammars, analysis of chunkers.
In this paper, a method to determine the semantic role for the constituents of a sentence is presented. This method, named SemRol, is a corpus-based approach that uses two different statistical models, conditional Maximum Entropy (ME) Probability Models and the TiMBL program, a memory-based learner. It consists of three phases that make use of fea...
This paper presents a headline emotion classification approach based on frequency and co-occurrence information collected from the World Wide Web. The content words of a headline (nouns, verbs, adverbs and adjectives) are extracted in order to form different bag of word pairs with the joy, disgust, fear, anger, sadness and surprise emotions. For...
Shallow lexical analysis of texts. Tokenization and lemmatization. Stemming processes. Generation of computational lexicons. Morphological analysis.
Grammatical categories. Part-of-speech analysis or "PoS tagging". Part-of-speech ambiguity. Resolution techniques.
Automatic semantic interpretation. Formal representation models. Principle of compositionality. Semantic interpretation techniques.
Definition of Question Answering Systems. System modules. Main competitions.
In this paper, an approach to semantic disambiguation based on machine learning and semantic classes for Spanish is presented. A critical issue in a corpus-based approach for Word Sense Disambiguation (WSD) is the lack of wide-coverage resources to automatically learn the linguistic information. In particular, all-words sense annotated corpora such...
This work presents a new resource, designed for use in word sense disambiguation in Spanish, based on the syntagmatic relations between words. Syntagmatic relations are relations between senses within a phrase or within a sentence. In our case, these relations have been ex...
The main topic of this paper is the context size needed for an efficient Interactive Cross-language Question Answering system. We compare two approaches: the first one (baseline system) shows the user whole passages (maximum context: 10 sentences). The second one (experimental system) shows only a clause (minimum context). As cross-language system, the ma...
Creating computational (verb) lexicons is long and costly. From the corpora created in the 3LB project, a verb lexicon with syntactic and semantic information (EWN synsets) is derived. From this information, the correspondence between syntactic functions and thematic roles is established for each sense of each verb. The...
It is well known that Information Retrieval Systems based entirely on syntactic contents have serious limitations. In order to achieve high precision Information Retrieval Systems, the incorporation of Natural Language Processing techniques that provide semantic information is needed. For this reason, in this paper a method to determine the semantic...
Knowledge management (ontology development, word disambiguation, semantic web, etc.) must extract knowledge from somewhere. The main source of knowledge is natural language texts, in which humans express how they view and conceptualize the world. However, the automatic extraction of knowledge from texts is not a trivial task. In this paper w...
In this position paper we present the research on verb predicates that we have carried out so far for Catalan, Spanish, and Basque, and we outline the framework of our future research, which is based on the idea that it is necessary to include syntagmatic and statistical information in lexical resources, such as WordNet, in order to use it in task...
The iCLEF 2004 experiment at the University of Alicante has focused on how to assist users when searching for the correct answer in passages written in a language different from that of the query. The language of the users is Spanish and the language of the documents/passages is English. In order to help users, a first system shows, together with the p...
In this article we present the results of the 3LB project, which consisted of developing three corpora (for Catalan, Spanish, and Basque) annotated syntactically and semantically. We set out the criteria followed for the different annotations, the tools developed for the different tagging tasks, as well as...
This paper presents the discourse annotation followed in Cast3LB, a Spanish corpus annotated with several information sources (morphological, syntactic, semantic and coreferential) at syntactic, semantic and discourse level. The 3LB annotation scheme has been developed for three languages (Spanish, Catalan and Basque). Human annotators have used a set...
This paper presents a novel approach to the development of anaphoric annotation of large corpora based on the use of semantic information to help the annotation process. The anaphora annotation scheme has been developed from a multilingual point of view in order to annotate three corpora: one for Catalan, one for Basque and one for Spanish. An anap...
In this paper we present an automatic system for the extraction of syntactic semantic patterns applied to the development of multilingual processing tools. In order to achieve optimum methods for the automatic treatment of more than one language, we propose the use of syntactic semantic patterns. These patterns are formed by a verbal head and the m...
Funding body: MCyT (PROFIT project: FIT-150500-2002-411).
In this paper we present the results of the interactive CLEF experiment at the University of Alicante. Our aim was to compare two interactive approaches: one based on passages (presented at the iCLEF 2002 (5)), and a new interactive approach based on syntactic semantic patterns. These patterns are composed of the main verb of a sentence plus it...
In the last few years, there has been a wide development in the research on textual information systems. The goal is to improve these systems in order to allow an easy localization, treatment and access to the information stored in digital format (Digital Databases, Documental Databases, and so on). There are lots of applications focused on informa...
In this paper we present a proposed algorithm for definite description resolution through the structure of dialogue, defining an anaphoric accessibility space in Spanish. This algorithm is based on the theoretical hypothesis that anaphora resolution and the dialogue structure are related. Definite description resolution improves if...
The aim of this work is to present a qualitative and quantitative analysis of inter-annotator disagreements in the syntactic annotation of the Cast3LB corpus. To this end, a test corpus of one thousand sentences was defined and annotated in parallel by five annotators. Successive evaluations of the results were carried out that h...
In this paper we present a study of the relationship between definite description resolution and the structure of dialogues, defining an anaphoric accessibility space. This relationship makes it possible to reduce the list of candidates in the resolution process.
This article presents a study of the anaphoric accessibility space that can be extracted from the structure generated by the electronic publication of documents through the HTML formalism. The proposal is based on the dependence between anaphora resolution and discourse structure. Taking advantage of the tags of...
In this work we present an automatic system for extracting syntactic rules from a corpus tagged with parts of speech. We propose a simple system for defining syntactic patterns that is able to identify the syntactic constructions of noun phrases, prepositional phrases, and verb phrases...
In this paper, the proposal and the method of annotation with semantic roles of the 3LB corpus are presented. The semantic roles have been specified bearing in mind the application of the corpus to the development of Question Answering Systems. A semiautomatic method is followed using the 3LB-SeRAT tool...
In this paper we present two Spanish corpora, MiniCors and Cast3LB, semantically tagged according to different annotation criteria and objectives. In order to guarantee the quality of the results, we have established a methodology for the development of these corpora. The resulting resources consist of a semantically tagged corpus according to the...
Course presentation: theoretical syllabus, practical syllabus, methodology, assessment.
Overview of Natural Language Processing. Phases of analysis. Ambiguity.
The problem of word sense ambiguity. Main techniques for automatic resolution.
Applications of Language Engineering. Today's society and the problem of access to information. Information retrieval systems. The vector space model. Information Extraction systems. Entity recognition. The problem of anaphora.