About
Publications: 41
Reads: 5,476
Citations: 196
Introduction
Additional affiliations
January 2004 - July 2013
Publications (41)
This paper describes the methodology followed to build a neural machine translation system in the biomedical domain for the English-Catalan language pair. This task can be considered low-resource in terms of both the domain and the language pair. To address it, this paper reports experiments on a cascade pivot strategy through...
The experiments presented here exploit the properties of the Apertium RDF Graph, principally cycle density and nodes' degree, to automatically generate new translation relations between words, and therefore to enrich existing bilingual dictionaries with new entries. Currently, the Apertium RDF Graph includes data from 22 Apertium bilingual dictiona...
This article provides an overview of the dissemination work carried out in META-NET from 2010 until early 2014; we describe its impact at the regional, national and international levels, mainly with regard to politics and the funding situation for LT topics. This paper documents the initiative's work throughout Europe in order to boost progress a...
The proliferation of different metadata schemas and models poses serious problems of interoperability. Maintaining isolated repositories with overlapping data is costly in terms of time and effort. In this paper, we describe how we have achieved a Linked Open Data version of metadata descriptions coming from heterogeneous sources, originally encoded...
We present research aiming to build tools for the normalization of User-Generated Content (UGC). We argue that processing this type of text requires the revisiting of the initial steps of Natural Language Processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user-generated texts) presents a number of nonstandard communicative and linguis...
In recent years, machine translation (MT) research has focused on investigating how hybrid MT as well as MT combination systems can be designed so that the resulting translations give an improvement over the individual translations. As a first step towards achieving this objective we have developed a parallel corpus with source data and the output of a...
Computational language processing encompasses any activity related to the creation, management and use of language technology and language resources. On the scientific level, this activity is central to disciplines such as corpus linguistics, language engineering, and written or spoken natural language processing...
The article takes stock of the Jornada del Processament Computacional del català. It sets out the objectives, presents a view of language processing from the standpoints of resources and research together with the profile of the participants, outlines the main lines of argument in the debate that took place, and draws some conclusions.
We present the conclusions of the first "Jornada del processament Computacional del Català", held in Barcelona in March 2009.
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use "basic" linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project has four par...
In this paper we describe the METIS-II system and its evaluation on each of the language pairs: Dutch, German, Greek, and Spanish to English. The METIS-II system envisaged developing a data-driven approach in which no parallel corpus is required and in which no full parser or extensive rule sets are needed. We describe the evaluation on a developme...
In this paper we present a prototype translation system that uses only a source-language (SL) tagger, a bilingual dictionary and a lemmatised target-language (TL) corpus. In our approach, the TL corpus is innovatively exploited both for lexical selection (selecting among the different translations proposed by the dictionary) and for structure b...
We present an experimental Machine Translation prototype system that is able to translate between Spanish and English, using very basic linguistic resources. In our approach, no structural transfer rules are used to deal with structural divergences between the two languages: the target corpus is the basis both for lexical selection and for struct...
This paper presents MSR-MT, a hybrid MT system developed by the Natural Language Processing group at Microsoft Research, which will make it possible to automatically translate into several languages all the not-yet-translated articles in the knowledge base developed by Product Support Services (PSS...
Linguistic research can contribute a great deal to the development of Machine Translation, and to the fundamental problem of translation divergences, with observations of phenomena and with techniques and theories that MT research can adopt and combine with statistical methods of corpus analysis.
This paper presents the Multilingual Translation Service of eTITLE, a European eContent project, which has produced tools to assist in the multilingual subtitling of audiovisual material through the web. The eTITLE Translation Service combines state-of-the-art machine translation and translation memories, which may be tailored to the customer needs...
In this paper we present an approach to Statistical Machine Translation that uses a bilingual dictionary and a target language model based on n-grams extracted from a monolingual corpus. This approach is still in an experimental stage and is being developed in the context of Metis-II, an EU project that aims at constructing free text translations by...
In this article we present an experimental statistical machine translation system based on n-grams. The system uses a parallel corpus and was initially conceived as an extension of a Computer-Assisted Translation (CAT) system. The good results obtained for the Catalan-Spanish language pair have encouraged us to explore...
We present the METIS-II project, aimed at creating a Statistical Machine Translation system which uses only a monolingual corpus of the target language and a bilingual dictionary, thus eliminating the need for parallel corpora to train the system.
This paper presents an overview of a robust, broad-coverage, and application-independent natural language generation system. It demonstrates how the different language generation components function within a multilingual Machine Translation (MT) system, using the languages that we are currently working on (English, Spanish, Japanese, and Chinese).
In this paper we describe two parallel experiments on the integration of machine learning (ML) methods into the Spanish and Japanese rule-based sentence realization modules developed at Microsoft Research.
We propose a framework for representing semantic tense that is language-neutral, in the sense that it represents what is expressed by different tenses in different languages in a shared formal vocabulary. The proposed framework allows the representation to retain surface distinctions for particular languages, while allowing fully semantic represent...
This paper discusses the Spanish research projects undertaken at Microsoft Research since 1995, beginning with the challenge of creating a broad-coverage analysis system for Spanish. This initial project provided the basic Spanish NLP resources needed for any Spanish NLP application. The next challenge was the development of a Spanish grammar check...
This paper presents an overview of the broad-coverage, application-independent natural language generation component of the NLP system being developed at Microsoft Research. It demonstrates how this component functions within a multilingual Machine Translation system (MSR-MT), using the languages that we are currently working on (English, Spanish,...
U: Collective Animal or Human = (Collective + O); V: Plant or Animal = (P + A); W: Inanimate Concrete or Abstract = (T + I); X: Abstract or Human = (T + H); Y: Abstract or Animate = (T + H); Z: Unmarked; 1: Human or Solid = (H + S); 2: Abstract or Solid = (T + S); 4: Abstract Physical; 5: Organic Material; 6: Liquid or Abstract...
Our aim is to present a Generation Grammar for Spanish in development. This grammar is part of a multilingual, general-purpose Natural Language Processing (NLP) System developed at Microsoft Research and is intended to be used in a future English-Spanish Machine Translation (MT) application. It is implemented in G, a programming language close to C...