Damien Nouvel
Institut National des Langues et Civilisations Orientales | INALCO · ERTIM

PhD

47 Publications
3,974 Reads
273 Citations
Citations since 2017: 11 Research Items, 186 Citations

Publications (47)
Article
This article addresses conspiracy theories about vaccines and chemtrails circulating on Twitter, through the lens of natural language processing and textual data analysis. While the chemtrails topic mostly concerns English-language content in a technical register, the conspiracy theories...
Article
Full-text available
In this article, we address the classification of counterfactual statements and the extraction of antecedents and consequences from raw data, using on the one hand Support Vector Machines (SVMs) and on the other hand the Natural Language Understanding (NLU) infrastructures available on the market for conversational agents. Our experiments allo...
Conference Paper
Full-text available
This paper describes a study on opinion analysis applied both to human-to-chatbot conversations and to human-to-human conversations, using data from the banking sector. An SVM polarity classifier applied to the conversations provides insights into, and visualisations of, user satisfaction at a given time and its evolution. We conducte...
Chapter
In recent years, the artificial intelligence community has shown increasing interest in influencer detection, owing to its utility in singling out pertinent users within a large network of social media users. This could be useful, for example in commercial campaigns, to promote a product or a brand to a relevant target set of users. This task is...
Article
Full-text available
Arabic is recognised as the 4th most used language of the Internet. Arabic has three main varieties: (1) classical Arabic (CA), (2) Modern Standard Arabic (MSA), (3) Arabic Dialect (AD). MSA and AD could be written either in Arabic or in Roman script (Arabizi), which corresponds to Arabic written with Latin letters, numerals and punctuation. Due to...
Chapter
The LANGAS project provides an online database containing historical (16th–19th century) texts in Quechua, Guarani and Tupi, for sociolinguistic studies. Querying texts for such low-resourced languages raises several questions, issues and challenges. Among them, our work addresses word variation (diacritization, typographic variations) as an optional query...
Conference Paper
Full-text available
In many languages, such as Bambara or Arabic, tone markers (diacritics) may be written but are in practice often omitted. NLP applications are then confronted with ambiguities and subsequent difficulties when processing texts. To circumvent this problem, tonalization may be used, as a word sense disambiguation task, relying on context to add diacritics that...
Conference Paper
Recognition of real-world entities is crucial for most NLP applications. Since its introduction some twenty years ago, named entity processing has undergone a significant evolution with, among others, the definition of new tasks (e.g. entity linking) and the emergence of new types of data (e.g. speech transcriptions, micro-blogging). These pose cer...
Book
One of the challenges brought on by the digital revolution of the recent decades is the mechanism by which information carried by texts can be extracted in order to access its contents. The processing of named entities remains a very active area of research, which plays a central role in natural language processing technologies and their applicatio...
Chapter
This chapter first examines the reality of named entities that can be seen in evaluation campaigns and recognition systems. It highlights the disparity between the units which make up the named entity set, the quasi-absence of the definition of the concept and the difficulty of designing the task for recognizing these units. The chapter examines th...
Chapter
For named entity detection, the evaluation metrics are precision, recall and their harmonic mean, i.e. the F-measure. The F-measure itself was initially proposed to simplify the comparative evaluation of different systems: it is easier to compare multiple systems using a single value than using two values. Other metrics, based on erro...
Chapter
Named entity recognition is used to detect and assign types to token segments in texts. This chapter focuses on how to link named entity mentions to references provided by a knowledge base. The connection between a set of mentions and a set of references may be formally examined from a general perspective, independently of the specific texts and en...
Chapter
This chapter examines what has given rise to the concept of a named entity (NE) and develops an overview of the extensive work on document analysis tasks. It presents a historical overview of the research programs and their evaluation campaigns during which the NE automatic processing tasks were defined, then refined, continuously developing over mo...
Chapter
This chapter concerns the resources, in the broadest sense, associated with named entities, i.e. the means used to apply automatic treatments to these units in the context of natural language processing (NLP). It focuses on three main types of resources, namely typologies, corpora annotated using named entities, and lexicons and knowledge bases. Ea...
Chapter
This chapter presents the way in which the named entity recognition task may be broken down, making a distinction between detection and classification. Several different approaches have been used over the last few decades, following broad trends in computer science and specifically natural language processing (NLP): rule-based approaches have been...
Article
Full-text available
In recent decades, machine learning approaches have been extensively experimented with for natural language processing. Most of the time, systems rely on statistics, analyzing texts at the token level and, for labelling tasks, assigning each token to one of the possible classes. One may notice that previous symbolic approaches (e.g....
Conference Paper
We present the preliminary results of ongoing work aimed at using morpho-syntactic patterns to extract information from process descriptions in a semi-supervised manner. The experiments have been designed for generic information extraction tasks and evaluated on detecting ingredients from cooking recipes in French using a large gold standard cor...
Article
Full-text available
This article presents an analysis of anaphoric relations in a corpus of spontaneous spoken dialogue in French. Over the past two decades, language engineering has made spectacular advances that have enabled the emergence of many operational applications aimed at both the general public and professionals. Among these...
Article
Full-text available
Like many NLP tasks, Named Entity Recognition can be addressed using either a symbolic or a data-centered approach. In this paper, we present a hybrid approach which consists in the adaptation of data mining techniques. Our system, mXS, relies on sequential hierarchical text mining techniques. It implements a data-centered approach...
Article
In recent decades, the development of information and communication technologies has substantially modified the way we access knowledge. Given the volume and diversity of data streams, developing robust and efficient technologies to retrieve information has become a necessity. In this context, Named Entities (persons, locations, organizations...
Article
Full-text available
This article presents an analysis of anaphoric relations in a corpus of spontaneous spoken dialogue in French. Over the past two decades, language engineering has made spectacular advances that have enabled the emergence of many operational applications aimed at both the general public and professionals. Among these...
Conference Paper
Full-text available
Within Information Extraction tasks, Named Entity Recognition has received much attention over recent decades. From symbolic / knowledge-based to data-driven / machine-learning systems, many approaches have been tried. Our work may be viewed as an attempt to bridge the gap from the data-driven perspective back to the knowledge-based one. We...
Article
Full-text available
Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition. Our research team has developed a symbolic system based on finite state transducers, which achieved promising results during the Ester2 French-speaking evaluation campaign. Despite these encouraging results, m...
Conference Paper
Full-text available
Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition. Our research team has developed CasEN, a symbolic system based on finite state transducers, which achieved promising results during the Ester2 French-speaking evaluation campaign. Despite these encouraging re...
Article
Full-text available
Recognizing named entities is a task that is mainly processed by systems that are specified using rules or that are learned. In this paper, we introduce an approach aiming at extracting symbolic and discriminative rules that may be reviewed by humans. We are given a reference corpus, from which we extract informative transducer rules. Then an algor...
Article
Full-text available
This article first presents the CasEN transducer cascade for named entity recognition. CasEN is implemented with the CasSys software of the Unitex platform and is made freely available to users under the LGPL-LR license. After a discussion of the named entity typology it uses and a descriptio...
Article
This paper first presents the CasEN transducer cascade for recognizing French Named Entities. CasEN is implemented with the CasSys software of the Unitex platform and is freely available to users (LGPL-LR license). We discuss the Named Entity typology used and describe the cascade, before reporting its evaluation on the Eslo 1 corpus and evaluatio...
Article
Full-text available
In this paper, we present and analyze the results obtained by our named entity recognition system, CasEN, during the Ester2 evaluation campaign. We identify the difficulties that challenged our system most, which are mainly out-of-vocabulary words, metonymy and the detection of named entity boundaries. Next, we propose a direction whic...
Article
Full-text available
In this paper, we present a detailed and critical analysis of the behaviour of the CasEN named entity recognition system during the French Ester2 evaluation campaign. In this project, CasEN was confronted with the task of detecting and categorizing named entities in manual and automatic transcriptions of radio broadcasts. At first, we give...
Article
Natural Language Processing systems are large-scale software systems whose development involves many man-years of work, in terms of both coding and resource development. Given a dictionary of 110k lemmas, a few hundred syntactic analysis rules, 20k n-gram matrices and other resources, what will be the impact on a syntactic analyzer of adding a new possib...
