Manuel Mager

Amazon · Amazon AWS

MS

About

40 Publications
8,315 Reads
333 Citations
Introduction
Computer science and computational linguistics / natural language processing student. My main aim is the inclusion of minority and Indigenous languages in scientific research.

Publications (40)
Preprint
Full-text available
Neural models have drastically advanced state of the art for machine translation (MT) between high-resource languages. Traditionally, these models rely on large amounts of training data, but many language pairs lack these resources. However, an important part of the languages in the world do not have this amount of data. Most languages from the Ame...
Preprint
Full-text available
In recent years machine translation has become very successful for high-resource language pairs. This has also sparked new interest in research on the automatic translation of low-resource languages, including Indigenous languages. However, the latter are deeply related to the ethnic and cultural groups that speak (or used to speak) them. The data...
Article
Full-text available
Little attention has been paid to the development of human language technology for truly low-resource languages—i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for...
Conference Paper
Full-text available
Indigenous languages, including those from the Americas, have received very little attention from the machine learning (ML) and natural language processing (NLP) communities. To tackle the resulting lack of systems for these languages and the accompanying social inequalities affecting their speakers, we conduct the second AmericasNLP competition (a...
Preprint
Full-text available
Data sparsity is one of the main challenges posed by Code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of Machine Translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings....
Preprint
Full-text available
Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation. We investigate a wide variety of supervised and unsupervised morphological segmentation methods for four polysynthetic languages: Nahuatl, Raramuri, Shipibo-Konibo, and W...
Preprint
Full-text available
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team. We utilize state-of-the-art models combined with several data augmentation, multi-task and transfer learning approaches for the automatic speech recognition (ASR) and machine translation (MT) steps of our cascaded system. Moreover, we also...
Article
Machine translation is a tool that, even with its limitations, enables access to ideas and information across languages. Its adoption for the Indigenous languages of Mexico could raise their standing among speakers who find that digital systems do not use these languages in their interfaces. In this work we present results of...
Conference Paper
Full-text available
This paper presents the results of the 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas. The shared task featured two independent tracks, and participants submitted machine translation systems for up to 10 indigenous languages. Overall, 8 teams participated with a total of 214 submissions. We provided training s...
Preprint
Pretrained multilingual models are able to perform cross-lingual transfer in a zero-shot setting, even for languages unseen during pretraining. However, prior work evaluating performance on unseen languages has largely been limited to low-level, syntactic tasks, and it remains unclear if zero-shot learning of high-level, semantic tasks is possible...
Preprint
Full-text available
Canonical morphological segmentation consists of dividing words into their standardized morphemes. Here, we are interested in approaches for the task when training data is limited. We compare model performance in a simulated low-resource setting for the high-resource languages German, English, and Indonesian to experiments on new datasets for the t...
Preprint
Full-text available
In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020). The task consists of generating the morphological paradigms of a set of lemmas, given only the lemmas themselves and unlabel...
Preprint
Full-text available
Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs. Existing approaches to generating text from AMR have focused on training sequence-to-sequence or graph-to-sequence models on AMR annotated data only. In this paper, we propose an alternative approach that combines a strong pre-trained language model with cycle consist...
Conference Paper
Full-text available
Keywords: Indigenous · Languages · Twitter
Internet communication has become an important social phenomenon. However, minor languages have been largely ignored in the design of social networks. Also, natural language processing (NLP) and computational linguistics (CL) communities have done only a small amount of research in the last years for those l...
Preprint
Full-text available
Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token. However, if at least one language is morphologically rich, a large number of words can be composed of morphemes from more than one langu...
Conference Paper
Full-text available
Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token. However, if at least one language is morphologically rich, a large number of words can be composed of morphemes from more than one langu...
Conference Paper
Full-text available
Indigenous languages of the American continent are highly diverse. However, they have received little attention from the technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when distant la...
Conference Paper
Full-text available
The Wixarika-Spanish Parallel Corpus is composed of 8967 different phrases from the Wixarika to the Spanish language. Wixarika (also known as Huichol) is a polysynthetic indigenous language spoken in Mexico by around fifty thousand native speakers. The corpus has been used for the creation of machine translation systems.
Preprint
Full-text available
Machine translation from polysynthetic to fusional languages is a challenging task, which gets further complicated by the limited amount of parallel text available. Thus, translation performance is far from the state of the art for high-resource and more intensively studied language pairs. To shed light on the phenomena which hamper automatic trans...
Conference Paper
Full-text available
In Mexico there are 68 officially recognized Indigenous languages. This linguistic richness is part of the multicultural mosaic that defines the identity of our country. However, the cultural predominance of Spanish and the widespread lag in access to information technologies among speakers of these languages create barriers...
Preprint
Full-text available
Indigenous languages of the American continent are highly diverse. However, they have received little attention from the technological perspective. In this paper, we review the research, the digital resources and the available NLP systems that focus on these languages. We present the main challenges and research questions that arise when distant la...
Conference Paper
Full-text available
Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we firs...
Article
In this work, we present a morphological segmenter for the Mexican Indigenous language Wixarika. For morphologically rich languages, which include many Native American languages, segmentation is fundamental to improving other tasks such as machine translation, dialogue systems, and summarization. On top of the agglutinative nature of the language, th...
Presentation
Full-text available
We present the first morphological segmenter for the Mexican indigenous language Wixarika.
Conference Paper
Full-text available
This article presents a machine translator between Spanish and Wixarika that uses statistical translation and complementary grammatical resources. Wixarika is an Indigenous language spoken in the Mexican states of Jalisco, Nayarit, Zacatecas, and Durango. This work focuses on two problems: the scarcity of parallel corpora...
Article
Full-text available
The analysis of languages has become a multidisciplinary task that ranges from linguistics itself to the theory of computation. The process of communication is also a computational problem; it is in this context that Richard Feynman enters the discussion on the equivalence of languages within the framework of the uni...
Conference Paper
Full-text available
Indigenous peoples have been subject to a strong policy of integration into the U.S. capitalist system, which seeks not only to integrate them into an oppressed social class but even to incorporate them into the oligarchies. Their new worldview, the result of an amalgamation of the old cultural formations and the American way of liv...
