
Christophe Servan
Qwant · Research

PhD

About

50
Publications
6,166
Reads
514
Citations
Citations since 2016
21 Research Items
360 Citations
Additional affiliations
January 2018 - present
Qwant Research
Position
  • Researcher
September 2016 - January 2019
SYSTRAN
Position
  • Researcher
Description
  • Neural Machine Translation
March 2015 - August 2016
University Joseph Fourier - Grenoble 1
Position
  • Researcher
Description
  • Statistical Machine Translation, Machine Learning, Deep Learning, Confidence Measures for SMT and ASR, Evaluation Methods
Education
September 2005 - December 2008
Université d'Avignon et des Pays du Vaucluse
Field of study
  • PhD in Machine Learning and Spoken Language Understanding within the framework of human-machine dialogue

Publications (50)
Preprint
Full-text available
In the last five years, the rise of self-attentional Transformer-based architectures has led to state-of-the-art performance on many natural language tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a substantial need for benchmarking methodologies ever upwa...
Preprint
Full-text available
Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications when a sufficient number of training examples are available. In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages. Domain and langua...
Preprint
Full-text available
For many tasks, state-of-the-art results have been achieved with Transformer-based architectures, resulting in a paradigmatic shift in practices from the use of task-specific architectures to the fine-tuning of pre-trained language models. The ongoing trend consists in training models with an ever-increasing amount of data and parameters, which req...
Conference Paper
Full-text available
In this paper, we present a study on a French Spoken Language Understanding (SLU) task: the MEDIA task. Many approaches have been proposed for many tasks, but most of them focus on the English language and on English tasks. The exploration of a richer language like French within the framework of an SLU task implies to recent approaches to handle this...
Conference Paper
Full-text available
In Internet search engines, one of the most important tasks is to identify the user's intent. This article presents our study proposing a new intent detection system for the Qwant Internet search engine. From click logs to the intent detection system, the whole process is...
Conference Paper
Full-text available
In Machine Translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a simple yet promising approach to add contextual information in Neural Machine Translation. We present a method to add source context that captures the whole document with accurate boundaries, taking every word...
Preprint
Full-text available
In Machine Translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a simple yet promising approach to add contextual information in Neural Machine Translation. We present a method to add source context that captures the whole document with accurate boundaries, taking every word...
Preprint
Full-text available
This paper reports on Qwant Research's contribution to tasks 2 and 3 of the DEFT 2019 challenge, focusing on the analysis of French clinical cases. Task 2 is a task on semantic similarity between clinical cases and discussions. For this task, we propose an approach based on language models and evaluate the impact on the results of different preprocessings...
Chapter
Full-text available
Bilingual lexicons of multiword expressions play a vital role in several natural language processing applications such as machine translation and cross-language information retrieval because they often characterize domain-specific vocabularies. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel...
Preprint
Full-text available
Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space enables shared semantics between words from different languages. In this paper, we propose to embed images and texts into a unique distributional vector space, making it possible to search images by using text queries expressin...
Conference Paper
Full-text available
This paper describes SYSTRAN's systems submitted to the WMT 2017 shared news translation task for English-German, in both translation directions. Our systems are built using OpenNMT, an open-source neural machine translation system, implementing sequence-to-sequence models with LSTM encoder/decoders and attention. We experimented using mono-ling...
Article
Full-text available
This paper describes SYSTRAN's systems submitted to the WMT 2017 shared news translation task for English-German, in both translation directions. Our systems are built using OpenNMT, an open-source neural machine translation system, implementing sequence-to-sequence models with LSTM encoder/decoders and attention. We experimented using monolingual...
Article
Full-text available
Domain adaptation is a key feature in Machine Translation. It generally encompasses terminology, domain and style adaptation, especially for human post-editing workflows in Computer Assisted Translation (CAT). With Neural Machine Translation (NMT), we introduce a new notion of domain adaptation that we call "specialization" and which is showing pro...
Article
Full-text available
This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding. We propose a model for direct speech-to-text translation, which gives promising results on a small French-English synthetic corpus. Relaxing the need for source language transcri...
Article
Full-text available
Since the first online demonstration of Neural Machine Translation (NMT) by LISA, NMT development has recently moved from laboratory to production systems as demonstrated by several entities announcing roll-out of NMT engines to replace their existing technologies. NMT systems have a large number of training configurations and the training process...
Article
Full-text available
This paper presents an approach combining lexico-semantic resources and distributed representations of words applied to the evaluation in machine translation (MT). This study is made through the enrichment of a well-known MT evaluation metric: METEOR. This metric enables an approximate match (synonymy or morphological similarity) between an automat...
Conference Paper
Full-text available
Recently, a growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT) has been observed. However, most of the CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them are dedicated to this specific task and freely available. This p...
Article
Full-text available
We present preliminary work on an approach for adding bilingual terms to a phrase-based Statistical Machine Translation (SMT) system. The terms are included not only individually, but also together with their surrounding contexts. First, we generate these contexts by generalizing patterns (or...
Article
Full-text available
The effective integration of MT technology into computer-assisted translation tools is a challenging topic both for academic research and the translation industry. In particular, professional translators consider the ability of MT systems to adapt to the feedback provided by them to be crucial. In this paper, we propose an adaptation scheme to tune...
Conference Paper
Full-text available
This paper describes the development of French-English and English-French statistical machine translation systems for the 2012 WMT shared task evaluation. We developed phrase-based systems based on the Moses decoder, trained on the provided data only. Additionally, new features this year included improved language and translation model adaptation u...
Conference Paper
Full-text available
This paper describes a new machine translation approach based on a statistical language model and a cross-language search engine. This approach consists in building a database of sentences in the target language and considering each sentence to translate as a "query" to that database. Linguistic information such as lemmas, part-of-speech and syntac...
Article
Full-text available
Optimisation in statistical machine translation usually targets the BLEU score, but the relevance of this metric to human evaluation is questioned. Many other metrics exist but none of them are in perfect harmony with human evaluation. On the other hand, most evaluation campaigns use multiple metrics (BLEU, TER, METEOR, etc.). Statisti...
Article
Full-text available
This paper describes the development of French-English and English-French statistical machine translation systems for the 2011 WMT shared task evaluation. Our main systems were standard phrase-based statistical systems based on the Moses decoder, trained on the provided data only, but we also performed initial experiments with hierarchical system...
Conference Paper
Full-text available
Most of the freely available parallel data to train the translation model of a statistical machine translation system comes from very specific sources (European parliament, United Nations, etc). Therefore, there is increasing interest in methods to perform an adaptation of the translation model. A popular approach is based on unsupervised training,...
Article
Full-text available
This paper presents a hybrid approach for Machine Translation (MT) based on Cross-language Information Retrieval (CLIR). This approach uses linguistic and statistical processing and does not need parallel corpora as linguistic resources. A first experimental evaluation of this approach has been done on the CESTA corpus and the obtained results seem...
Conference Paper
Full-text available
In this paper, we present a hybrid approach to align single words, compound words and idiomatic expressions from bilingual parallel corpora. The objective is to automatically develop, improve and maintain translation lexicons. This approach combines linguistic and statistical information in order to improve word alignment results. The linguistic im...
Conference Paper
Full-text available
In this paper, we present a hybrid approach to align single words, compound words and idiomatic expressions from bilingual parallel corpora. The objective is to automatically develop, improve and maintain translation lexicons. This approach combines linguistic and statistical information in order to improve word alignment results. The linguistic im...
Conference Paper
Full-text available
Cross-language portability of a spoken language understanding (SLU) system deals with the possibility of reusing, with moderate effort, in a new language the knowledge and data acquired for another language. The approach proposed in this paper is motivated by the availability of the fairly large MEDIA corpus carefully transcribed in French and semantica...
Article
Full-text available
Spoken dialogue systems are interfaces between users and services. Simple examples of services for which these dialogue systems can be used include banking, booking (hotels, trains, flights), etc. Dialogue systems are composed of a number of modules. The main modules include Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU...
Conference Paper
Full-text available
This paper presents a new method for the fast development of call-routing systems based on pre-existing corpora and knowledge databases. This method pushes forward the reduction of specific data collection and annotation for developing a new call-classification system. No specific data collection is needed for training both for the Automatic Spee...
Article
Full-text available
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The proc...
Conference Paper
Full-text available
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The pr...
Conference Paper
Full-text available
With the purpose of improving spoken language understanding (SLU) performance, a combination of different acoustic speech recognition (ASR) systems is proposed. State a posteriori probabilities obtained with systems using different acoustic feature sets are combined with log-linear interpolation. In order to perform a coherent combination of these...
Conference Paper
Full-text available
A knowledge representation formalism for SLU is introduced. It is used for incremental and partially automated annotation of the Media corpus in terms of semantic structures. An automatic interpretation process is described for composing semantic structures from basic semantic constituents using patterns involving constituents and words. The proces...
Conference Paper
Full-text available
Within the framework of the French evaluation program MEDIA on spoken dialogue systems, this paper presents the methods proposed at the LIA for the robust extraction of basic conceptual constituents (or concepts) from an audio message. The conceptual decoding model proposed follows a stochastic paradigm and is directly integrated into the Aut...
Conference Paper
Full-text available
Within the framework of the French evaluation program MEDIA on spoken dialogue systems, this paper presents the methods proposed at the LIA for the robust extraction of basic conceptual constituents (or concepts) from an audio message. The conceptual decoding model proposed follows a stochastic paradigm and is directly integrated into the Automatic...
Conference Paper
Full-text available
The aim of the MEDIA-EVALDA project is to evaluate the understanding capabilities of dialog systems. This paper presents the MEDIA protocol for speech understanding evaluation and describes the results of the June 2005 literal evaluation campaign. Five systems, both symbolic and corpus-based, participated in the evaluation, which is based on a common...
Article
Full-text available
Abstract: This study presents the work carried out at the LIA on the MEDIA human-machine dialogue corpus, aiming to propose robust analysis methods for extracting a sequence of elementary concepts from an audio message. The conceptual decoding model presented is based on a stochastic approach that directly integrates the proces-...

Projects (12)
Archived project
Domain Adaptation for videolectures
Creation of a platform for videolectures with automatic subtitles and translation.