About
51
Publications
7,392
Reads
1,040
Citations
Publications (51)
Machine translation (MT) is directly linked to its evaluation in order to both compare different MT system outputs and analyse system errors so that they can be addressed and corrected. As a consequence, MT evaluation has become increasingly important and popular in the last decade, leading to the development of MT evaluation metrics aiming at auto...
This paper describes a practical demo of VERTa for Spanish. VERTa is an MT evaluation metric that combines linguistic features at different levels. VERTa has been developed for English and Spanish but can be easily adapted to other languages. VERTa can be used to evaluate adequacy, fluency and ranking of sentences. In this paper, VERTa's modules ar...
Named entity recognition and classification (NERC) is fundamental for natural language processing tasks such as information extraction, question answering, and topic detection. State-of-the-art NERC systems are based on supervised machine learning and hence need to be trained on (manually) annotated corpora. However, annotated corpora hardly exist...
In this paper we present VERTa, a linguistically-motivated metric that combines linguistic features at different levels. We provide the linguistic motivation on which the metric is based, as well as describe the different modules in VERTa and how they are combined. Finally, we describe the two versions of VERTa, VERTa-EQ and VERTa-W, sent to WMT14...
This paper describes the system implemented by Fundació Barcelona Media (FBM) for classifying the polarity of opinion expressions in tweets and SMSs, which is supported by a UIMA pipeline for rich linguistic and sentiment annotations. FBM participated in the SEMEVAL 2013 Task 2 on polarity classification. It ranked 5th in Task A (constrained t...
This paper describes FBM-Yahoo!'s participation in the profiling task of RepLab 2012, which aims at determining whether a given tweet is related to a specific company and, if so, whether it contains a positive or negative statement concerning the company's reputation. We addressed both problems (ambiguity and polarity rep...
This article presents WikiNer, a version of the Catalan Wikipedia processed with different NLP tools (POS taggers, NERC, dependency parsers). The article focuses on the analysis of the different NERC annotations produced with three taggers: JNET, Yamcha and SST. Although the Wikipedia text...
Natural language technologies have long been envisioned to play a crucial role in developing a Semantic Web. Textual content's significance on the Web has increased with the rise of Web 2.0 and mass participation in content generation. Yet, natural language technologies face great challenges in dealing with Web content's heterogeneity: key among th...
This paper presents an extension to perform Word Sense Disambiguation of an integrated architecture designed for Semantic Parsing. In the proposed collaborative framework, both tasks are addressed simultaneously. The feasibility and robustness of the proposed architecture for Semantic Parsing have been tested against a well-defined task on Word Se...
This paper presents the complete and consistent annotation of the nominal part of the EuroWordNet (EWN). The annotation has been carried out using the semantic features defined in the EWN Top Concept Ontology. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontologized in such a way.
This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontolo...
This paper describes SW1, the first version of a semantically annotated snapshot of the English Wikipedia. In recent years Wikipedia has become a valuable resource for both the Natural Language Processing (NLP) community and the Information Retrieval (IR) community. Although NLP technology for processing Wikipedia already exists, not all researcher...
In strategic management there has been a long-running debate. As early as 1962, Alfred Chandler stated: structure follows strategy. In the 1980s, Michael Porter modified Chandler's dictum about structure following strategy by introducing ...
We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine to compute entity relevance. We evaluate these appr...
This paper describes version 1.3 of the FreeLing suite of NLP tools. FreeLing was first released in February 2004 providing morphological analysis and PoS tagging for Catalan, Spanish, and English. From then on, the package has been improved and enlarged to cover more languages (i.e. Italian and Galician) and offer more services: Named entity reco...
In this article we present an extension to WSD of an integrated architecture originally designed for Semantic Parsing. The proposed framework allows both tasks to collaborate and be carried out simultaneously. The validity and robustness of this architecture have been tested against a well-defined WSD task (the SENSEVAL-II English Lexical...
This paper presents the work carried out towards the so-called shallow ontologization of WordNet, which is argued to be a way to overcome most of the many structural problems of the widely used lexical knowledge base. The result shall be a multilingual resource more suitable for large-scale semantic processing.
This paper studies the impact of multiword expressions on Word Sense Disambiguation (WSD). Several identification strategies of the multiwords in WordNet 2.0 are tested in a real Senseval-3 task: the disambiguation of WordNet glosses. Although we have focused on Word Sense Disambiguation, the same techniques could be applied in more complex tasks, s...
In this demo we present the first version of Txala, a dependency parser for Spanish developed under the LGPL license. This parser is framed within the development of a free-software platform for Machine Translation. Given the lack of such syntactic parsers for Spanish, this tool is essential for the development of NLP in Spanish.
This paper describes the first version of the Multilingual Central Repository, a lexical knowledge base developed in the framework of the MEANING project.
This paper describes the new Spanish Wordnet aligned to Princeton WordNet1.6 and the analysis of the transformation from the previous version aligned to Princeton WordNet1.5. Although a mapping technology exists, to our knowledge it is the first time a whole local wordnet has been ported to a newer release of the Princeton WordNet.
This paper presents a semantic-driven methodology for the automatic acquisition of verbal models. Our approach relies strongly on the semantic generalizations allowed by already existing resources (e.g. domain labels, Named Entity categories, concepts in the SUMO ontology, etc.). Several experiments have been carried out using comparable co...
Despite the progress being made in Natural Language Processing (NLP), we are still far from Natural Language Understanding. An important step towards this goal is the development of techniques and resources that deal with concepts instead of words. However, if we want to build the next generation of intelligent systems that...
This paper provides a brief description of the main software components of the Multilingual Central Repository of MEANING and their initial content. This research has been partially funded by th...
This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information...
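The combination idea in this abstract can be illustrated with a toy majority vote over the senses proposed by several stand-alone heuristics. This is only a minimal sketch of the general technique; the heuristic outputs and sense labels below are invented for illustration, not taken from the paper.

```python
from collections import Counter

def combine_heuristics(votes):
    """Return the sense proposed by most heuristics; ties are broken
    by first-seen order (i.e. by heuristic priority)."""
    return Counter(votes).most_common(1)[0][0]

# Three toy heuristics voting on a sense for the noun "bank":
votes = ["bank#river", "bank#finance", "bank#finance"]
print(combine_heuristics(votes))  # → bank#finance
```

A real combination would weight each heuristic by its observed precision rather than counting votes equally.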
This work explores a new robust approach for Semantic Parsing of unrestricted texts. Our approach considers Semantic Parsing as a Consistent Labelling Problem (CLP), allowing the integration of several knowledge types (syntactic and semantic) obtained from different sources (linguistic and statistic). The current implementation obtains 95% accuracy...
This paper presents a semantic parsing approach for unrestricted texts. Semantic parsing is one of the major bottlenecks of Natural Language Understanding (NLU) systems and usually requires building expensive resources not easily portable to other domains. Our approach obtains a case-role analysis, in which the semantic roles of the verb are identi...
The aim of this work is to explore new methodologies on Semantic Parsing for unrestricted texts. Our approach follows the current trends in Information Extraction (IE) and is based on the application of a verbal subcategorization lexicon (LEXPIR) by means of complex pattern recognition techniques. LEXPIR is framed on the theoretical model of the ve...
This paper presents a semantic parsing approach for non-domain-specific texts. Our approach is based on the application of a verbal subcategorization lexicon (LEXPIR) developed in the Pirapides project.
This work combines a set of available techniques – which could be further extended – to perform noun sense disambiguation. We use several unsupervised techniques (Rigau et al., 1997) that draw knowledge from a variety of sources. In addition, we also apply a supervised technique in order to show that supervised and unsupervised methods can be combin...
This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information...
This on-line demonstration is about an environment for massive processing of unrestricted Spanish text. The system consists of three stages: morphological analysis, POS disambiguation and parsing. The output of each can be pipelined into the next. The first two phases are described in (Carmona et al., 1998) and the third is described in (Atserias...
This document describes the structure, functional capabilities as well as the user manual of TACAT (TAgged Corpus Analyser Tool), a parser and syntactic editor for tagged text. TACAT has been developed as part of ITEM (TIC96-1243-C03-02) project for providing multilingual background for information extraction and information retrieval tasks. The...
This paper explores the automatic construction of a multilingual Lexical Knowledge Base from preexisting lexical resources. First, a set of automatic and complementary techniques for linking Spanish words collected from monolingual and bilingual MRDs to English WordNet synsets are described. Second, we show how resulting data provided by each metho...
This paper presents a method to combine a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most of the techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information...
This paper describes the initial research steps towards the Top Ontology for the Multilingual Central Repository (MCR) built in the MEANING project. The current version of the MCR integrates five local wordnets plus four versions of Princeton's English WordNet, three ontologies and hundreds of thousands of new semantic relations and properties aut...
A current research line for word sense disambiguation (WSD) focuses on the use of supervised machine learning techniques. One of the drawbacks of using such techniques is that previously sense annotated data is required. This paper presents ExRetriever, a new software tool for automatically acquiring large sets of sense tagged examples from large c...
This document describes the structure, functional capabilities as well as the user manual of TACAT (TAgged Corpus Analyser Tool), a parser and syntactic editor for tagged text. TACAT has been developed as part of ITEM (TIC96-1243-C03-02) project for providing multilingual background for information extraction and information retrieval tasks. The ma...
In recent decades, a wide range of automatic metrics that use linguistic knowledge have been developed. Some of them are based on lexical information, such as METEOR; others rely on the use of syntax, either using constituent or dependency analysis; and others use semantic information, such as Named Entities and semantic roles. All these metrics w...
ExRetriever characterises each sense of a word as a specific query. This is done automatically by using a particular query construction strategy, which is defined a priori by an expert. Each strategy can take into account the information related to words that is available in a lexical knowledge base in order to automatically...
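As a rough illustration of the query-construction idea above, one hypothetical strategy turns a sense's synonyms and gloss content words into a boolean corpus query. The mini sense inventory and the strategy name below are invented; a real system would read senses from a lexical knowledge base such as WordNet.

```python
# Toy sense inventory: sense id -> synonyms and gloss (invented for illustration).
SENSES = {
    "bank#1": {"synonyms": ["bank", "depository"], "gloss": "a financial institution"},
    "bank#2": {"synonyms": ["bank", "riverbank"], "gloss": "sloping land beside water"},
}

def sense_query(sense_id, strategy="synonyms+gloss"):
    """Build a corpus query characterising one sense under a given strategy."""
    info = SENSES[sense_id]
    terms = list(info["synonyms"])
    if strategy == "synonyms+gloss":
        # Keep only longer gloss words as a crude content-word filter.
        terms += [w for w in info["gloss"].split() if len(w) > 3]
    return " AND ".join(sorted(set(terms)))

print(sense_query("bank#1"))  # → bank AND depository AND financial AND institution
```

The query for each sense is then run against a large corpus, and the retrieved snippets become training examples tagged with that sense.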
Despite the great interest in different forms of textual annotation (named entity extraction, semantic tagging, syntactic and semantic parsing, etc.), there is still no consensus about which search tasks can be improved with such annotations, and what search algorithms are required to implement efficient engines to solve these tasks. We define formally...
This article presents the problem of diacritic restoration (or diacritization) in the context of spell-checking, with the focus on an orthographically rich language such as Spanish. We argue that despite the large volume of work published on the topic of diacritization, currently available spell-checking tools have still not found a proper solutio...
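A minimal sketch of dictionary-based diacritic restoration, one baseline for the task described above: strip diacritics from known word forms to build a lookup table, then map bare forms back to their accented spellings. The tiny lexicon is invented; real coverage needs a full Spanish word list, and ambiguous bare forms (e.g. "esta" vs "está") need context to resolve, which this sketch does not attempt.

```python
import unicodedata

# Invented mini-lexicon of correctly accented Spanish forms.
LEXICON = ["canción", "árbol", "música", "esta", "está"]

def strip_diacritics(word):
    """Remove combining marks after canonical (NFD) decomposition."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

# bare form -> set of candidate accented forms
TABLE = {}
for w in LEXICON:
    TABLE.setdefault(strip_diacritics(w), set()).add(w)

def restore(word):
    """Return all candidate restorations; ambiguity is left to the caller."""
    return sorted(TABLE.get(word, {word}))

print(restore("cancion"))  # → ['canción']
print(restore("esta"))     # → ['esta', 'está']
```

A spell-checker would then rank the candidates with a language model instead of returning them all.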
We investigate the possibility of obtaining large-scale semantic patterns for any language based only on shallow parsing and some basic semantic generalizations. As this is an exploratory experiment, we performed only a qualitative evaluation. We compared several semantic patterns coming from translation-equivalent verbs selected from different languages...
This paper describes the initial design of the Multilingual Central Repository. The first version of the MCR integrates into the same EuroWordNet framework five local wordnets (including three versions of the English WordNet from Princeton), the EuroWordNet Top Ontology, MultiWordNet Domains, and hundreds of thousands of new semantic relations and...