Emmanuel Morin

University of Nantes | UNIV Nantes · Nantes Atlantic Computer Science Lab

Université de Nantes · LS2N UMR CNRS 6004

About

137
Publications
14,024
Reads
1,558
Citations
Citations since 2016
29 Research Items
520 Citations
[Chart: citations per year, 2016 to 2022; axis 0 to 100]
Additional affiliations
January 1996 - present
University of Nantes
Position
  • Professor (Full)

Publications (137)
Conference Paper
Full-text available
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced. However, the historical context-based projection method dedicated to this task is relatively insensitive to the sizes of each part of the comparable corpus. Within this context, we have carried out a study on the influ...
Conference Paper
Full-text available
In this paper, we present a new method that improves the alignment of equivalent terms monolingually acquired from bilingual comparable corpora: the Compositional Method with Context-Based Projection (CMCBP). Our overall objective is to identify and translate highly specialized terminology made up of multi-word terms acquired from comparable corpo...
Article
Significant advances have been achieved in bilingual word-level alignment, yet the challenge remains for phrase-level alignment. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings...
Article
Full-text available
This paper tackles the problem of e-commerce thesauri alignment. It includes the definition of three alignment techniques which can be combined to increase the effectiveness and reduce the execution time. It also introduces a filtering technique to reduce the number of candidates returned to the final user. This work reports a set of evaluations th...
Preprint
BACKGROUND: Often missing or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW for medical research. The French national mortality database (FNMD) offers open-source nominative records of every death. Matching large-scale BDW records with the FNMD combines multiple challenges: the absence of u...
Article
Background: Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDW records with the FNMD combines multiple challenges: absenc...
Chapter
Methods for bilingual lexicon induction are often based on word embeddings (WE) similarity. These methods must be able to project the WE to the same space. Uncontextualized WE proved to be useful for this task. We compare them to contextualized WE and Bag of Words, using specialized and general datasets. We also evaluate the impact of seed lexicons...
Article
This paper presents our work on the Dialog System Technology Challenges 7 (DSTC7). We took part in Track 1 on sentence selection, which evaluates response retrieval in dialog systems on more realistic test scenarios than state-of-the-art evaluations. Our proposed dialog system matches the context with the best response by computing their...
Article
Full-text available
Retrieval-based dialogue systems converse with humans by ranking candidate responses according to their relevance to the history of the conversation (context). Recent studies either match the context with the response only at the sequence level or use complex architectures to match them at the word and sequence levels. We show that both information lev...
Preprint
Full-text available
This work investigates spoken language understanding (SLU) systems in the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the mode...
Chapter
Full-text available
This work investigates spoken language understanding (SLU) systems in the scenario when the semantic information is extracted directly from the speech signal by means of a single end-to-end neural network model. Two SLU tasks are considered: named entity recognition (NER) and semantic slot filling (SF). For these tasks, in order to improve the mode...
Preprint
Building dialogue systems that naturally converse with humans is an attractive and active research domain. Multiple systems are being designed every day and several datasets are becoming available. For this reason, it is hard to keep an up-to-date state of the art. In this work, we present the latest and most relevant retrieval-based dia...
Preprint
Full-text available
We present an end-to-end approach to extract semantic concepts directly from the speech audio signal. To overcome the lack of data available for this spoken language understanding approach, we investigate the use of a transfer learning strategy based on the principles of curriculum learning. This approach allows us to exploit out-of-domain data tha...
Conference Paper
Full-text available
Named entity recognition (NER) is among the SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech has been performed through a pipeline that first runs automatic speech recognition (ASR) on the audio and then runs NER on the ASR outputs. Such an approach has some disadvantages (err...
Preprint
Named entity recognition (NER) is among the SLU tasks that usually extract semantic information from textual documents. Until now, NER from speech has been performed through a pipeline that first runs automatic speech recognition (ASR) on the audio and then runs NER on the ASR outputs. Such an approach has some disadvantages (err...
Chapter
Dialogue act taxonomies, such as those of DAMSL, DiAML or the HCRC dialogue structure, can be incorporated into a larger meta-model by breaking down their labels into primitive functional features. Doing so enables the re-exploitation of annotated data for automatic dialogue act recognition tasks across taxonomies, i.e. it gives us the means to mak...
Chapter
Opinion target extraction is a crucial task of opinion mining, aiming to extract occurrences of the different entities of a corpus that are subjects of an opinion. In order to produce a readable and comprehensible opinion summary, which is the main application of opinion target extraction, these occurrences are consolidated at the entity level in a...
Conference Paper
Full-text available
As the amount of news information available online grows, media are in need of advanced tools to explore the information surrounding specific events before writing their own piece of news, e.g., adding context and insight. While many tools exist to extract information from large datasets, they do not offer an easy way to gain insight from a news co...
Article
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced in terms of size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. Within this context, we have carried out a study on the influence o...
Article
Multilingual terminology acquisition from comparable corpora has been attracting the interest of researchers for twenty years, but challenges still remain. Bilingual term alignment, a subtask of multilingual terminology acquisition, requires a pre-processing step, because term structure may differ according to the language. Morphologically construc...
Article
Full-text available
We investigate video hyperlinking based on speech transcripts, leveraging a hierarchical topical structure to address two essential aspects of hyperlinking, namely, serendipity control and link justification. We propose and compare different approaches exploiting a hierarchy of topic models as an intermediate representation to compare the transcrip...
Article
Full-text available
This paper describes the LINA system for the BUCC 2015 shared track. Following (Enright and Kondrak, 2007), our system identifies comparable documents by collecting counts of hapax words. We extend this method by filtering out document pairs sharing target documents using pigeonhole reasoning and cross-lingual information.
Article
Full-text available
Alignment from comparable corpora usually involves two languages, one source and one target language. Previous work on bilingual lexicon extraction from parallel corpora demonstrated that more than two languages can be useful to improve the alignments. Our work has investigated to what extent a third language could be useful to bypass the...
Article
Full-text available
We present a typology of links for a multimedia corpus anchored in the journalistic domain. Although several typologies have been created and used by the community, none addresses the challenges of size and variety raised by the use of a large corpus comprising texts, videos, or radio broadcasts...
Article
Full-text available
Automatic terminology processing appeared 10 years ago when electronic corpora became widely available. Such processing may be statistically or linguistically based and produces terminology resources that can be used in a number of applications: indexing, information retrieval, technology watch, etc. We present the tools that have been developed i...
Article
Full-text available
This work focuses on the notion of lexical context, which is at the core of the founding approach to bilingual lexicon extraction from specialized comparable corpora. First, we revisit the two main strategies dedicated to characterizing lexical context, which rely on the use of graphical...
Conference Paper
Full-text available
This paper proposes two strategies for combining a window-based and a syntax-based context representation for the task of bilingual lexicon extraction from comparable corpora. The first strategy involves combining the scores assigned to translations by both models and using them for ranking and selection; the second strategy involves a combination...
Article
Full-text available
This work focuses on the concept of lexical context that is central to the historical approach of bilingual lexicon extraction from specialized comparable corpora. First, we revisit the two main strategies dedicated to lexical context characterization, which rely on the use of window-based and syntax-based representations. We show that the combin...
Article
Full-text available
We are interested in the task of named entity recognition for the speech modality. This task poses a number of difficulties that are inherent to speech processing. In this work, we propose to study the tight coupling between the speech transcription task and the named entity recognition task...
Chapter
In this paper we study the problem of compiling bilingual lexicons from language for special purposes (LSP) comparable corpora. We first define what comparability means for a specialized comparable corpus and stress the distinction between expert and non-expert documents. We then turn to the contextual information method that concentrates on b...
Conference Paper
Full-text available
Named Entity Recognition (NER) from speech usually involves two sequential steps: Transcribing the speech using Automatic Speech Recognition (ASR) and annotating the outputs of the ASR process using NER techniques. Recognizing named entities in automatic transcripts is difficult due to the presence of transcription errors and the absence of some im...
Conference Paper
Full-text available
In this paper, we present a French named entity recognition (NER) system that was first developed as part of our participation in the ETAPE 2012 evaluation campaign and then extended to cover more entity types. The ETAPE 2012 evaluation campaign considers a hierarchical and compositional taxonomy that makes the NER task more complex. We present a...
Article
Full-text available
We are interested in the recognition of named entities for the speech modality. Some difficulties may arise for this task due to speech processing. In this work, we propose to study the tight pairing between the speech recognition task and the named entity recognition task. For that purpose, we take away the basic functionalities of a speech recog...
Article
Full-text available
This paper proposes a method for extracting translations of morphologically constructed terms from comparable corpora. The method is based on compositional translation and exploits translation equivalences at the morpheme-level, which allows for the generation of "fertile" translations (translation pairs in which the target term has more words than...
Article
Full-text available
This paper defines a method for lexicon extraction in the biomedical domain from comparable corpora. The method is based on compositional translation and exploits morpheme-level translation equivalences. It can generate translations for a large variety of morphologically constructed words and can also generate 'fertile' translations. We show that fertile tran...
Conference Paper
Full-text available
ABSTRACT In this article, we seek to align, as translations of one another, terms extracted from each monolingual part of a comparable corpus. Our objective is the identification and translation of specialized terms. To this end, we implement a compositional approach boosted with contextual information drawn...
Conference Paper
Full-text available
The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) has contributed to leveraging computer-assisted translation tools, machine translation systems and multilingual content (corpora and terminology) management tools by generating bilingual terminologies automatically from comparable corpora in seven EU languages, as we...
Conference Paper
Full-text available
The paper deals with the automatic compilation of bilingual dictionaries from specialized comparable corpora. We concentrate on a method to automatically extract and align neoclassical compounds in two languages from comparable corpora. In order to do this, we assume that neoclassical compounds translate compositionally to neoclassical compounds f...
Conference Paper
In this paper, we present a new way of looking at the problem of bilingual lexicon extraction from comparable corpora, mainly inspired by the information retrieval (IR) domain and, more specifically, by question-answering systems (QAS). By analogy to QAS, we consider a word to be translated as a part of a question extracted from a source language, a...
Article
Full-text available
Online handwritten data, produced with Tablet PCs or digital pens, consists of a sequence of points (x, y). As the amount of data available in this form increases, algorithms for retrieval of online data are needed. Word spotting is a common approach used for the retrieval of handwriting. However, from an information retrieval (IR) perspective, wor...
Conference Paper
Full-text available
In this paper, we present HAMEX, a new public dataset that contains mathematical expressions available in their on-line handwritten form and in their audio spoken form. We have designed this dataset so that, given a mathematical expression, its handwritten signal and its audio signal can be used jointly to design multimodal recognition systems. Her...