Antoni Oliver

Universitat Oberta de Catalunya | UOC · Arts and Humanities Studies

PhD in Linguistics, BA in Slavonic Philology, BS in Telecommunication Engineering, MS in Free Software

About

108
Publications
32,925
Reads
391
Citations
Introduction
I'm a lecturer at the Universitat Oberta de Catalunya (UOC) in Barcelona, where I direct the master's degree in Translation and Technologies. I teach subjects related to translation technologies and natural language processing. I hold a PhD in Linguistics, a BA in Slavonic Philology, a BS in Telecommunication Engineering and an MS in Free Software. I'm involved in several research projects related to Computational Linguistics applied to Translation and Language Learning.
Additional affiliations
October 2006 - present
Universitat Oberta de Catalunya
Position
  • Professor (Associate)
Education
September 1999 - July 2004

Publications (108)
Article
The recent improvements in neural MT (NMT) have driven a shift from statistical MT (SMT) to NMT. However, to assess the usefulness of MT models for post-editing (PE) and gain detailed insight into the output they produce, we need to analyse the most frequent errors and how they affect the task. We present a pilot study of a fine-grained analysis of...
Presentation
Full-text available
Workshop on post-editing research methods and concepts at IATIS 2021 Congress.
Article
Full-text available
In this paper we describe the building, manual annotation and analysis of a balanced corpus to assess conceptual metaphors on mental illness as used in Spanish blogger writing by patients and mental health professionals. The corpus was structured as eight subgroups: four patient subgroups (composed of persons who declared having been diagnosed with...
Article
Full-text available
Post-editing of machine translation (MT) is now considered standard practice in the translation workflow, mainly because of the good quality achieved with neural machine translation (NMT). This is tied to the efforts that language service providers and clients have made to reduce...
Preprint
In this chapter we build a machine translation (MT) system tailored to the literary domain, specifically to novels, based on the state-of-the-art architecture in neural MT (NMT), the Transformer (Vaswani et al., 2017), for the translation direction English-to-Catalan. Subsequently, we assess to what extent such a system can be useful by evaluating...
Conference Paper
Full-text available
INMIGRA3 is a three-year project that builds on the work of two previous initiatives: INMIGRA2-CM and CRISIS-MT. Together, they address the specific needs of NGOs in multilingual settings with a particular interest in migratory contexts. Work on INMIGRA3 concentrates on the analysis of how to use NMT for the purposes of translating NGOs documen...
Conference Paper
Full-text available
In this paper we present a novel resource-inexpensive architecture for metaphor detection based on a residual bidirectional long short-term memory and conditional random fields. Current approaches on this task rely on deep neural networks to identify metaphorical words, using additional linguistic features or word embeddings. We evaluate our propos...
Conference Paper
Full-text available
The recent improvements in machine translation (MT) have boosted the use of post-editing (PE) in the translation industry. A new MT paradigm, neural MT (NMT), is displacing its corpus-based predecessor, statistical machine translation (SMT), in the translation workflows currently implemented because it usually increases the fluency and accuracy of...
Conference Paper
Full-text available
Post-editing of machine translation (PEMT) is now widely used in the translation industry. This is due to the increase in the demand for translation and to the significant improvements in quality achieved in recent years. PEMT has been included as part of the translation workflow because it increases translators' productivity and...
Article
Full-text available
In this paper we propose a neural network approach to detect the metaphoricity of Adjective-Noun pairs using pre-trained word embeddings and word similarity using dot product. We found that metaphorical word pairs tend to have a lower dot product score while literal pairs a higher score. On this basis, we compared seven optimizers and two activatio...
Poster
Full-text available
Catalan and Spanish are closely-related languages derived from Latin. Rule-based and statistical-based systems yield good results in MT. Post-editing of machine translation (PEMT) has been a regular practice for these languages because it increases productivity and reduces costs. In recent years, neural MT has gained popularity because of the good...
Conference Paper
Full-text available
In recent years, we have witnessed an increase in the use of post-editing of machine translation (PEMT) in the translation industry. It has been included as part of the translation workflow because it increases the productivity of translators. Currently, many Language Service Providers offer PEMT as a service. For many years now, (closely) related l...
Article
Full-text available
The more language service companies (LSCs) include machine translation post-editing (MTPE) in their workflows, the more important it is to know how the PE task is performed, who the post-editors are, and what skills they should have. This research is designed to address such questions. It aims to deepen our knowledge of current practices to later c...
Article
Linguistic resources available in the form of open data are an essential source of information for creating e-dictionaries, but access to these linguistic resources is still limited. This paper presents a method for maximising use of open access linguistic resources and integrating them into specialised e-dictionaries. The method combines automatic...
Preprint
Full-text available
This article is based on the theory about the relationship between the direct object (spa. complemento directo – CD) and the prepositional object (spa. complemento de régimen – CR) in Spanish, mainly the possibility of their co-occurrence in the same predicate (Alarcos, 1966; Bosque, 1983; Rojo, 1983), as well as the predicates in which these two c...
Article
Full-text available
The MOMENT project aims to contribute to a better understanding of severe mental disorders by analyzing the discourse of the two main groups involved, affected people and mental health professionals, in the light of the Conceptual Metaphor Theory and Corpus Linguistics methodology. In this framework, a corpus of first-person accounts from both grou...
Article
The identification of reliable terms from domain-specific corpora using computational methods is a task that has to be validated manually by specialists, which is a highly time-consuming activity. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological...
Article
This article presents a system for the automatic creation of bilingual books with aligned texts. The system produces e-books in which each sentence in the source language is linked to the corresponding sentence in the target language. Users can read in the original language and see the corresponding sentence in the...
Article
In specialised translation, Wikipedia is not considered a reliable specialised information resource because any user, whether or not a specialist in the subject, can write an article. This article aims to determine how reliable Wikipedia is as a specialised terminology resource for...
Article
Full-text available
In this paper we present a system for automatic terminology extraction and automatic detection of the equivalent terms in the target language to be used alongside a computer assisted translation (CAT) tool that provides term candidates and their translations in an automatic way each time the translator goes from one segment to the next one. The sys...
Book
Full-text available
http://www.editorialuoc.com/herramientas-tecnologicas-para-traductores This book gives a clear, in-depth overview of the technologies currently applied in the world of translation: computer-assisted translation tools, machine translation, and terminology extraction and management. The work presents both the principles...
Conference Paper
Full-text available
In this paper we present an extension of the dictionary-based strategy for wordnet construction implemented in the WN-Toolkit. This strategy allows the extraction of information for polysemous English words if definitions and/or semantic relations are present in the dictionary. The WN-Toolkit is a freely available set of programs for the creation...
Article
In this article we present TMX (Translation Memory eXchange), the standard format for exchanging translation memories. We review the concept of a translation memory and the uses that make it one of the translator's main resources. We look at strategies for quickly retrieving the segments most similar to...
Conference Paper
Full-text available
The manual identification of terminology from specialized corpora is a complex task that needs to be addressed by flexible tools, in order to facilitate the construction of multilingual terminologies which are the main resources for computer-assisted translation tools, machine translation or ontologies. The automatic terminology extraction tools de...
Conference Paper
Full-text available
In this paper an automatic morphology learning system for complex and agglutinative languages is presented. We process complex agglutinative morphology of Indian languages using Adaptor Grammars and linguistic rules of morphology. Adaptor Grammars are a compositional Bayesian framework for grammatical inference, where we define a morphological gram...
Conference Paper
Full-text available
Wordnet is a standard semantic resource for several Natural Language Processing tasks and it is available for an increasing number of languages. The Croatian Wordnet (CroWN) was a relatively small resource with 10,026 synsets and 31,367 synset-variant pairs covering only 45.91% of the so-called Core WordNet. Comparing these figures with the size of...
Research
Full-text available
Presentation at the Computational Lexicology & Terminology Lab (Vrije Universiteit, Amsterdam, The Netherlands). http://www.cltl.nl/publications/presentations/antoni-oliver-gonzalez/
Article
Full-text available
In this paper the methodology and a detailed evaluation of the results of the expansion of the Galician WordNet using the WN-Toolkit are presented. This toolkit allows the creation and expansion of wordnets using the expand model. In our experiments we have used methodologies based on dictionaries and parallel corpora. The evaluation of the results...
Article
Full-text available
The InLéctor project aims to promote reading in the original language, offering an interactive scenario which facilitates foreign language teaching and self-learning, as well as the study of literature. In order to achieve this aim, the project develops computational techniques which provide automatic generation of bilingual e-books, incorporating d...
Conference Paper
Full-text available
In this paper we present the evaluation results for the creation of WordNets for five languages (Spanish, French, German, Italian and Portuguese) using an approach based on parallel corpora. We have used three very large parallel corpora for our experiments: DGT-TM, EMEA and ECB. The English part of each corpus is semantically tagged using Freeling...
Conference Paper
Full-text available
This paper presents a set of methodologies and algorithms to create WordNets following the expand model. We explore dictionary and BabelNet based strategies, as well as methodologies based on the use of parallel corpora. Evaluation results for six languages are presented: Catalan, Spanish, French, German, Italian and Portuguese. Along with the meth...
Article
Full-text available
At times it is difficult to automatically identify the most representative terms in a specialized corpus and to validate them as correct due to the similarity of words and terms. In order to identify the most representative terms in a corpus that can be easily adapted to any language or terminology extraction tool, we explore the combination of toke...
Article
Full-text available
Abstract: In this article we present the InLéctor project for creating interactive bilingual e-books. The project aims to develop a set of applications for the automatic creation of bilingual e-books. These books let the reader switch from the original text to the translation with a single click and are published in the...
Article
Full-text available
In this article we present a set of programs that facilitate the creation of WordNets from bilingual dictionaries using the expand strategy. The programs are written in Python and are therefore cross-platform. Although they have no graphical user interface, they are very easy to use. These programs have been used...
Article
Full-text available
This article reviews methods for building WordNets following the expand strategy, that is, by translating the English variants of the Princeton WordNet. Free resources available on the Internet were used in the construction process. The article also presents the results of the evaluation of...
Article
Full-text available
This project aims to develop a system that generates interactive bilingual books with audio. The system will offer several output formats for reading and listening to the books on different devices, such as e-readers, tablets and computers. It will also offer the option of obtaining printed parallel books. Keywords...
Article
This paper presents a review of methods for building WordNets following the expand model, that is, by translating the English variants of the Princeton WordNet. Only free resources available online have been used. The paper also presents the evaluation of the techniques applied in the construction of Spanish and Catalan WordNets 3.0. These techniqu...
Article
The aim of this project is the development of a system for the generation of interactive bilingual electronic books with audio support. The system will offer several output formats for reading and listening to the books on different devices such as e-readers, tablets and computers. It will also allow printing a parallel bilingual book. © 2012 S...
Article
Full-text available
In this article we present the state of the art in the use of Wikipedia for tasks related to natural language processing, along with three applications we have created to enrich a large-scale linguistic resource: WordNet version 3.0 for Catalan and Spanish. Researchers in this area have for years been looking for ways for...
Conference Paper
Full-text available
In this paper we present a methodology for WordNet construction based on the exploitation of parallel corpora with semantic annotation of the English source text. We are using this methodology for the enlargement of the Spanish and Catalan versions of WordNet 3.0, but the methodology can also be used for other languages. As big parallel corpora wit...
Conference Paper
Full-text available
This paper describes a methodology for the construction of WordNets based on machine translation of an English sense-tagged corpus. For the construction of such a corpus we use two freely available resources: the Semcor Corpus and the Princeton WordNet Gloss Corpus. This methodology is being used for the construction of Spanish and Catalan WordNe...
Conference Paper
Full-text available
This document describes the criteria that determine the set of multiword expressions ('multiwords') included in the electronic resource WordNet 3.0 (http://adimen.si.ehu.es/cgi-bin/wei/public/wei.consult.perl). Since it is an applied proposal restricted to one specific resource, it does not set out to be a theoretical proposal...
Conference Paper
Full-text available
Spelling and grammar checking has become a daily activity for almost all text processor users. Usually these tools offer limited information about the misspelling or the grammar error and in certain cases suggest one or more possible alternatives. Sometimes users make the same mistakes one day after the other because they don't know the real reason...
Article
Full-text available
In this demo we present a first prototype of an assistant for improving writing in Catalan. The system goes beyond a simple grammar checker, since it suggests links to grammar references and exercises that let users practise the areas where they are weakest. The system also works as a level evaluator and...
Article
Full-text available
Abstract: This article describes a methodology for building WordNets based on the machine translation of a sense-disambiguated English corpus. The corpus we use consists of the semantically tagged glosses of WN 3.0 and the Semcor corpus. Precision results are comparable to those obtained...
Article
This paper describes a methodology for the construction of WordNets based on machine translation of an English sense tagged corpus. We use the Semcor corpus and the WordNet 3.0 sense tagged glosses as a corpus. Precision results are comparable to those obtained by methods based on bilingual dictionaries for the same languages. This methodology is b...
Article
In this demo we present a first prototype of an assistant for the improvement of writing skills in Catalan. The system is more than a grammar checker, as it proposes links to grammatical explanations and exercises, allowing the user to practise specific aspects. The program also works as a level evaluator and allows tracking of the user's improveme...
Chapter
Full-text available
The automatic detection of lexical units of a specialised nature in a given area of knowledge is one of the key challenges in the organisation and retrieval of information. This communication addresses the use of different statistical strategies to automatically extract terminological units from a specialist area to retrieve...
Article
This refresher course is designed as a general introduction to the concepts and tools needed to manage translation projects: determining human and computing resources, calculating volume and cost, formats, quality control, workflow control, etc. ...
Conference Paper
Full-text available
The Linguamón-UOC Chair in Multilingualism of the Universitat Oberta de Catalunya has developed a project for the automatic creation of linguistic resources, including resources for Catalan. For this, we have created an automatic terminology extractor, which is freely distributed, multi-platform and adaptable to users' needs. One of its mo...
Conference Paper
Full-text available
Multilingualism is a reality in the 21st century, and new technologies are emerging as a powerful new way to cope with its main issues and the challenges its treatment implies. In this sense, a great amount of work has been carried out over the last twenty years in the field of Language Engineering and Applied Linguistics. A big effort has been made to de...
Conference Paper
Full-text available
The Linguamón-UOC Chair in Multilingualism of the Universitat Oberta de Catalunya has developed an automatic terminology extractor, which is freely distributed, multi-platform and adaptable to users' needs. One of its most useful applications is the creation of glossaries, both monolingual and multilingual, based on a set of...
Conference Paper
Full-text available
Multilingualism is a reality of the 21st century, and the development of new technologies in recent decades is proving to be an effective way of dealing with it. Communities today are increasingly multicultural, and society needs tools capable of managing the multilingualism that results from this condition. The U...
Conference Paper
Full-text available
This paper presents the complete and consistent annotation of the nominal part of the EuroWordNet (EWN). The annotation has been carried out using the semantic features defined in the EWN Top Concept Ontology. Up to now only an initial core set of 1024 synsets, the so-called Base Concepts, were ontologized in such a way.
Conference Paper
Full-text available
This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontolo...
Article
Full-text available
The UOC, within the framework of the Linguamón-UOC Chair in Multilingualism, has developed a virtual learning environment with an integrated machine translation system. Thanks to this project, which works with free software applications Moodle and Apertium, a multilingual learning environment can be provided in Catalan, English, French and Spanish....
Conference Paper
Full-text available
In this paper we present a set of terminology extraction tools that are distributed under a Free Software License, so that users can freely download, use, distribute and modify them to meet their needs. The tools are mainly programmed in Perl and work on different platforms, such as Windows or Linux. These terminology extraction...
Article
Full-text available
In this article we present LexTerm, a free and open-source tool for automatic terminology extraction. The tool makes it easier to select the most relevant terms, which need a consistent translation equivalent. Many translators and some translation agencies still carry out this task...
Chapter
This chapter presents an in-depth linguistic evaluation of a corpus of messages posted in several bilingual newsgroups in Catalonia (Spain). The social context is a situation of bilingualism and language contact where Spanish seems to be progressively overtaking Catalan as the language of daily use. The decline of Catalan might be prevented by inte...
Article
Full-text available
In this paper we present a prototype translation system that uses only a source-language (SL) tagger, a bilingual dictionary and a lemmatised target-language (TL) corpus. In our approach, the TL corpus is innovatively exploited both for lexical selection (selecting among the different translations proposed by the dictionary) and for structure b...
Book
Translation into Spanish by Antoni Oliver in Editorial UOC ISBN: 978-84-9788-740-3
Article
Full-text available
Many machine translation products and services can be found on the Internet. In this article we present a functional classification of these products and services, with special emphasis on those that may be most useful to a professional translator. We do not present an exhaustive guide to machine translation products or services, but rather...
Conference Paper
Full-text available
This paper presents the Multilingual Translation Service of eTITLE, a European eContent project, which has produced tools to assist in the multilingual subtitling of audiovisual material through the web. The eTITLE Translation Service combines state-of-the-art Machine translation and Translation memories, which may be tailored to the customer needs...
Article
Full-text available
Abstract: This paper presents the INTERLINGUA project: its design and the work carried out so far. The aim of INTERLINGUA is to implement a fully automatic environment (with no pre-editing or post-editing) for email messages on the Virtual Campus of the Universitat Oberta de Catalunya (UOC). It describes the strate...
Article
Full-text available
As the creation of computational (verb) lexicons is a hugely time-consuming task, tagged corpora appear to be a very useful resource for inducing verb knowledge. In this paper we present a multilingual verb lexicon with syntactic and semantic information for four languages. For three of them (Catalan, Basque and Spanish) this lexicon is induced from...
Conference Paper
Full-text available
In this paper we present an approach to Statistical Machine Translation that uses a bilingual dictionary and a target language model based on n-grams extracted from a monolingual corpus. This approach is still in an experimental stage and is being developed in the context of Metis-II, an EU project that aims at constructing free text translations by...
Article
Full-text available
The Open University of Catalonia (UOC) has set up a programme for the integration of automatised translation techniques and assisted translation in order to process the large amount of Catalan and Spanish teaching documents that its virtual courses produce. After revising the problem and various experiences with the application of these linguistic...
Article
Full-text available
In this article we present an experimental statistical machine translation system based on n-grams. The system uses a parallel corpus and was initially conceived as an extension of a computer-assisted translation (CAT) system. The good results obtained for the Catalan-Spanish language pair have encouraged us to explore...
Article
Full-text available