Victoria Arranz

  • PhD in Language Engineering
  • Head of R&D at Evaluations and Language resources Distribution Agency, Paris

About

Publications: 45
Reads: 3,937
Citations: 270
Current institution
Evaluations and Language resources Distribution Agency, Paris
Current position
  • Head of R&D

Publications (45)
Chapter
Full-text available
This deep dive on data, knowledge graphs (KGs) and language resources (LRs) is the last of the four technology deep dives, as data and related models are the basis for technologies and solutions in the area of Language Technology (LT) for European digital language equality (DLE). This chapter focuses on the data and LRs required to achieve...
Chapter
Full-text available
This chapter provides an overview of what is available in ELG in terms of datasets, corpora and other language resources (LRs) and how this has been achieved. We look at the procedures and steps that have been followed to complete the full resource ingestion cycle, which goes from repository and LR identification to metadata description and ingesti...
Chapter
Full-text available
The European MAPA (Multilingual Anonymisation for Public Administrations) project aims at developing an open-source solution for automatic de-identification of medical and legal documents. We introduce here the context, partners and aims of the project, and report on preliminary results.
Preprint
Full-text available
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT b...
Preprint
Full-text available
The current scientific and technological landscape is characterised by the increasing availability of data resources and processing tools and services. In this setting, metadata have emerged as a key factor facilitating management, sharing and usage of such digital assets. In this paper we present ELG-SHARE, a rich metadata schema catering for the...
Article
With 24 official EU and many additional languages, multilingualism in Europe and an inclusive Digital Single Market can only be enabled through Language Technologies (LTs). European LT business is dominated by hundreds of SMEs and a few large players. Many are world-class, with technologies that outperform the global players. However, European LT b...
Article
Machine translation (MT) has become increasingly important and popular in the past decade, leading to the development of MT evaluation metrics aimed at automatically assessing MT output. Most of these metrics use reference translations to compare system output; therefore, they should not only detect MT errors but also be able to identify correct...
Article
Full-text available
The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the...
Technical Report
Full-text available
This report elaborates on the exploitation of the PANACEA project assets. These assets have been clustered into a few items: (a) the PANACEA Factory/Platform, (b) the web services integrated within the platform, (c) the associated workflows to manage the sequencing of web services, (d) the tools developed during the project and, last but not least, (e)...
Conference Paper
Full-text available
This paper presents a metadata model for the description of language resources proposed in the framework of the META-SHARE infrastructure, aiming to cover both datasets and tools/technologies used for their processing. It places the model in the overall framework of metadata models, describes the basic principles and features of the model, elaborat...
Conference Paper
Full-text available
This paper presents the metadata schema for describing language resources (LRs) currently under development for the needs of META-SHARE, an open distributed facility for the exchange and sharing of LRs. An essential ingredient in its setup is the existence of formal and standardized LR descriptions, cornerstone of the interoperability layer of an...
Article
This paper describes the joint submission of Universitat Politècnica de Catalunya and Universitat de Barcelona to the Metrics MaTr 2010 evaluation challenge, in collaboration with ELDA/ELRA. Our work is aimed at widening the scope of current automatic evaluation measures from sentence to document level. Preliminary experiments, based on an extensi...
Conference Paper
This paper describes the joint submission of Universitat Politècnica de Catalunya and Universitat de Barcelona to the Metrics MaTr 2010 evaluation challenge, in collaboration with ELDA/ELRA. Our work is aimed at widening the scope of current automatic evaluation measures from sentence to document level. Preliminary experiments, based on an extensio...
Article
Abstract: This work presents the quantitative and qualitative evaluation of a group of constituency and dependency parsers, with the aim of using them in the development of a knowledge-based automatic metric for evaluating the output of machine translation systems. First, the methodology followed in both...
Article
Fifteen years have gone by and ELRA continues embracing the needs of the HLT community to design its services and to implement them through its operational body, ELDA. The needs of the community have become much more ambitious... Larger language resources (LR), better quality ones (how do we reach a compromise between price – maybe free – and quality?),...
Conference Paper
Full-text available
This paper presents the end-to-end evaluation of an automatic simultaneous translation system, built with state-of-the-art components. It shows whether, and for which situations, such a system might be advantageous when compared to a human interpreter. Using speeches in English translated into Spanish, we present the evaluation procedure and we dis...
Conference Paper
Full-text available
The project described in this paper is funded by the French Ministry of Research. It aims at providing producers of Language Resources, and HLT players in general, with a guide which offers technical, legal and strategic recommendations/guidelines for the reuse of their Language Resources. The guide is dedicated in particular to academic laborat...
Article
This paper describes the latest developments in ELRA's services within the field of Language Resources (LR). These developments focus on 4 main groups of activities: the identification and distribution of Language Resources; the production of LRs; the evaluation of Human Language Technology (HLT), and the dissemination of information in the field....
Article
Full-text available
This paper describes the final evaluation of the FAME interlingua-based speech-to-speech translation system for Catalan, English and Spanish. It is an extension of the already existing NESPOLE! System that translates between English, French, German and Italian. However, the FAME modules have now been integrated in an Open Agent Architecture platfor...
Chapter
This chapter provides an overview of available language resources, from both U.S. and European perspectives. Multilingual data repositories as well as large ongoing and planned collection efforts are introduced, along with a description of the major challenges of collection efforts, such as transcription issues due to inconsistent writing standards...
Conference Paper
Full-text available
In 2008 the Olympic Games will be held in Beijing. For this purpose the city government of Beijing has launched the Special Programme for Construction of Digital Olympics. One of the objectives of the programme is the use of artificial intelligence technology to overcome language barriers during the games. In order to demonstrate the contributio...
Article
Full-text available
This paper describes the FAME Interlingua-based Speech-to-Speech Translation System for Catalan, English and Spanish. This is an extension of the already existing NESPOLE! system, which translates between English, French, German and Italian, but all modules have now been integrated in an Open Agent Architecture. This article describes the system architecture...
Conference Paper
This paper studies the impact of multiword expressions on Word Sense Disambiguation (WSD). Several identification strategies of the multiwords in WordNet 2.0 are tested in a real Senseval-3 task: the disambiguation of WordNet glosses. Although we have focused on Word Sense Disambiguation, the same techniques could be applied in more complex tasks, s...
Conference Paper
Full-text available
This paper describes the “FAME” multi-modal demonstrator, which integrates multiple communication modes – vision, speech and object manipulation – by combining the physical and virtual worlds to provide support for multi-cultural or multi-lingual communication and problem solving. The major challenges are automatic perception of human actions and u...
Conference Paper
Full-text available
This paper describes the evaluation of the FAME interlingua-based speech-to-speech translation system for Catalan, English and Spanish. This system is an extension of the already existing NESPOLE! system, which translates between English, French, German and Italian. This article begins with a brief introduction followed by a description of the system archit...
Conference Paper
Full-text available
Creation of lexica and corpora for Catalan, Spanish and US-English is described. A lexicon is being created for speech recognition and synthesis including relevant information. The lexicon contains 50K common words selected to achieve a wide coverage on the chosen domains, and 50K additional entries including special application words, and proper...
Conference Paper
Full-text available
This paper focuses on the strategies adopted to tackle problematic input and ease communication between modules in a Spanish railway information dialogue system for spontaneous speech. The paper describes the design and tuning considerations followed by the understanding module, both from a language processing and semantic information extraction po...
Article
This paper focuses on the strategies adopted to tackle problematic input and ease communication between modules in a Spanish railway information dialogue system for spontaneous speech. The paper describes the design and tuning considerations followed by the understanding module, both from a language processing and semantic information extraction po...
Article
This paper describes on-going work on the development of two complementary resources: WordMed® and Scriptum®. The former is a lexico-conceptual knowledge base (KB) comprising information from four medical sub-domains (diagnostics, procedures, tumors and medicines). This resource is only accessible for the language and domain expert in charge of sup...
Conference Paper
Full-text available
This paper focuses on the increasing need for a more natural and sophisticated human-machine interaction (HMI). The research presented here shows work on the development of a restricted-domain spontaneous speech dialogue system in Spanish. This human-machine interface is oriented towards a semantically restricted domain: Spanish railway information...
Article
This paper focuses on the increasing need for a more natural and sophisticated human-machine interaction (HMI). The research presented here shows work on the development of a restricted-domain spontaneous speech dialogue system in Spanish. This human-machine interface is oriented towards a semantically restricted domain: Spanish railway information...
Article
This paper focuses on the general problem of the lexical bottleneck and, in particular, on the issues of semantic clustering and disambiguation by means of word usage cues obtained from sublanguage-specific corpora. Our approach combines the use of numerical techniques with some symbolic modules. Our numerical tool Dynamic Context Matching is sup...
Article
Full-text available
This paper describes the design and development of a trilingual spontaneous speech corpus for statistical speech-to-speech translation. The languages considered are Catalan, Spanish and US-English. This corpus has been built bearing in mind the strong need for multi-lingual collections of on-line data within the area of statistical translation, as...
Article
This paper describes the creation of linguistically enriched aligned corpora for Catalan, Spanish and US-English for Speech-to-Speech Translation. These corpora are obtained from two different sources: US-English transcribed speech data and transcriptions of conversations recorded in Catalan and Spanish. After human translation, a large trilingual...
Article
Full-text available
Machine translation evaluation campaigns require the production of reference corpora to automatically measure system output. This paper describes recent efforts to create such data with the objective of measuring the quality of the systems participating in the Quaero evaluations. In particular, we focus on the protocols behind such production as...
Article
In the last decades, a wide range of automatic metrics that use linguistic knowledge has been developed. Some of them are based on lexical information, such as METEOR; others rely on the use of syntax, either using constituent or dependency analysis; and others use semantic information, such as Named Entities and semantic roles. All these metrics w...
Article
Full-text available
In this document, we propose a new unique and universal identification schema for Language Resources that provides them with unique names using a standardized nomenclature. This will also ensure that Language Resources are identified and, consequently, recognized with proper references in activities within Human Language Technologies a...
Article
This paper emphasises the need to develop efficient lexical knowledge acquisition techniques in order to tackle problems related to the so-called lexical bottleneck. Bearing this in mind, a semi-automatic technique for semantic clustering and word sense disambiguation is proposed. The main principles behind this method are the extraction of knowled...
