About
40 Publications
4,520 Reads
259 Citations
Introduction
I am an associate professor in computer science at INSA Lyon and the LIRIS lab.
I was previously (2016-2018) a postdoctoral researcher at the Naval Academy Research Institute (IRENav) in France, where I worked with Prof. C. Claramunt. I hold a PhD in computer science from the University of Pau (France) and the University of Zaragoza (Spain), supervised by Prof. M. Gaio, Prof. J. Nogueras-Iso and Dr. S. Mustière.
My research focuses on multidisciplinary aspects of Natural Language Processing (NLP), information retrieval, data mining, digital humanities and geographical information science (GIS).
Additional affiliations
September 2018 - present
September 2016 - August 2018
November 2015 - September 2016
Publications (40)
Geoparsing and geocoding are two essential middleware services that support end-user applications such as location-aware search or various types of location-based services. The objective of this work is to propose a method for establishing a processing chain to support the geoparsing and geocoding of text documents describing events strongl...
Considerable amounts of geographical data are still collected not in the form of GIS data but simply as natural language texts. This paper proposes an approach for the automatic geocoding of itineraries described in natural language. As input, this approach requires a text annotated with part-of-speech and geo-semantic tags. The proposed method is divided...
In this paper, we propose and discuss a methodology to map the spatial fingerprints of novels and authors based on all of the named urban roads (i.e., odonyms) extracted from novels. We present several ways to explore Parisian space and fictional landscapes by interactively and simultaneously browsing geographical space and literary text. Our proje...
Geographic text analysis (GTA) research in the digital humanities has focused on projects analyzing modern English-language corpora. These projects depend on temporally specific lexicons and gazetteers that enable place name identification and georesolution. Scholars working on the early modern period (1400-1800) lack temporally appropriate geopars...
The backbone of the proposal in this chapter is an automatic parser and a formal encoder of information describing places, spatial and verbal relations in textual documents in order to reconstruct and map the textually described itinerary. These tools allow us to show how to combine the information expressed in French texts, referring to places, sp...
This article reports on the 5th ACM SIGSPATIAL Workshop on Geospatial Humanities, held in conjunction with the 29th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. The article outlines the objectives of the workshop, and briefly describes the technical program.
Understanding the role of humans in environmental change is one of the most pressing challenges of the 21st century. Environmental narratives – written texts with a focus on the environment – offer rich material capturing relationships between people and surroundings. We take advantage of two key opportunities for their computational analysis: mass...
This article presents a comparative study of supervised classification approaches applied to the automatic classification of encyclopedia articles written in French. Our dataset includes all 70k text articles from Diderot and d’Alembert’s Encyclopédie (1751-72). In a two-task experiment we test combinations of (1) text vectorization methods (bags-o...
Geocoding aims to assign unambiguous locations (i.e., geographic coordinates) to place names (i.e., toponyms) referenced within documents (e.g., within spreadsheet tables or textual paragraphs). This task comes with multiple challenges, such as dealing with referent ambiguity (multiple places with the same name) or reference database completeness. In...
Toponym resolution using deep learning from co-occurrences and spatial relations
One-minute video presentation: https://pod.univ-lr.fr/video/2865-resolution-de-toponymes-par-apprentissage-profond-a-partir-de-cooccurrences-et-de-relations-spatiales/
This article reports on the 4th ACM SIGSPATIAL Workshop on Geospatial Humanities, held in conjunction with the 28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. The article outlines the objectives of the workshop, and briefly describes the technical program.
Suggesting services or products to people is a task that should be handled by recommendation systems, given the considerable growth of available information and the multitude of user criteria. Indeed, when expressing wishes for a product, a user is influenced by his/her tastes or priorities. These influential characteristics tend to be challenging regarding...
Discourse may contain both named and nominal entities. Most common nouns or nominal mentions in natural language do not have a single, simple meaning but rather a number of related meanings. This form of ambiguity led to the development of a task in natural language processing known as Word Sense Disambiguation. Recognition and categorisation of na...
We present the method we followed to improve our automatic annotation of named entities in Diderot and d'Alembert's Encyclopédie. The semantic annotation tool PERDIDO that we use was initially developed for annotating geographical information and reconstructing itineraries. We propose to im...
This article reports on the 3rd ACM SIGSPATIAL Workshop on Geospatial Humanities, held in conjunction with the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. The article outlines the objectives of the workshop, and briefly describes the technical program.
In this paper, we use network analysis to identify qualitative "neighbors" for toponyms in an eighteenth-century French encyclopedia; the method could also apply to any entry-based text with annotated toponyms. This method draws on relations in a corpus of articles, which improves disambiguation at a later stage with an external resource. We suggest the network...
Points of interest (POI) are central in many applications such as tourism, itinerary search and crisis management. Cartographic providers usually represent these POI with a spatial entity. However, the description of these entities may vary significantly from one provider to another (e.g., missing properties, outdated information, conflicting values)....
This article proposes a methodology for mapping the spatial fingerprints of novels and authors based on all the odonyms extracted from the novels. We present an original way of exploring Parisian space and fictional landscapes by interactively and simultaneously browsing geographical space and literary text....
In this article, we focus on two aspects that have received little attention in NLP research: processing historical documents in French, and processing complex textual structures beyond running text or lists of place names. Our methodology relies on evaluating the results of two tools for the recognition of...
Fictive motion (e.g. ‘The highway runs along the coast’) is a pervasive phenomenon in language that can imply both a static and a moving observer. In a corpus of alpine narratives, it is used in three types of spatial descriptions: conveying the actual motion of the observer, describing a vista and communicating encyclopaedic spatial knowledge. Thi...
Spatial descriptions, with or without motion, are the main issues addressed by this paper. We describe construction grammars implemented in the PERDIDO platform using cascaded finite-state transducers, which aim to mark and formalize relations between extended named entities, geographical terms, spatial relations and motion verbs. These grammar...
Our project involves building a platform able to retrieve, map and analyze the occurrences of place names in fictional novels published between 1800 and 1914 and whose action occurs wholly or partly in Paris. We describe a proof of concept using queries made via the TXM textual analysis platform for the extraction of street names. Then, we propose...
The semantic annotation of spatial information aims to identify words or phrases describing geographical references (place names) as well as various associated spatial expressions. One of the major difficulties in designing an automatic annotation system for such information is due to ambiguities related to spatial entities. A modular approach base...
Geographical information in texts is frequently organized around spatial named entities. Such entities have intrinsic ambiguities, and Named Entity Recognition and Classification methods should be improved in order to handle this problem. This article describes a knowledge-based method implementing a full process with the aim of annotating in a m...
Information extraction is one of the main tasks in text mining. It is essential for all types of applications exploiting geographic information, because a large volume of geographic information is not compiled directly in the specific formats used by Geographic Information Systems, but is simply embedded in plain-text sources. Currently, there are...
This PhD thesis is part of the research project PERDIDO, which aims at extracting and retrieving displacements from textual documents. This work was conducted in collaboration with the LIUPPA laboratory of the University of Pau (France), the IAAA team of the University of Zaragoza (Spain) and the COGIT laboratory of IGN (France). The objective of t...
In this paper, we describe a markup language for semantically annotating raw texts. We define a formal representation of text documents written in natural language that can be applied to the tasks of Named Entity Recognition and Spatial Role Labeling.
The proposal relies on a multi-layer annotation process based on a core generic layer, which can...
This paper proposes an approach for the reconstruction of itineraries extracted from narrative texts. This approach is divided into two main tasks. The first extracts geographical information with natural language processing. Its outputs are annotations of so-called expanded entities and expressions of displacement or perception from hiking descrip...
The aim of this work is to find sub-types for Place Named Entities, from the analysis of relations between Place Names and a nominal group within a specific phrasal context. The proposed method combines the use of specific intra-sentential lexico-syntactic relations and external resources like gazetteers, thesauri, or ontologies. It relies on expan...
Projects (3)
GEODISCO combines natural language processing (NLP), textual statistics, discourse analysis and geographical information systems around a single question: "What geographical representations of the world do French encyclopedias convey through their discourse, and what do these representations tell us about each of the periods in which these encyclopedias were written and published?"
Research problem:
GEODISCO is an interdisciplinary collaborative project bringing together researchers in linguistics (ICAR, D. Vigier), computer science (LIRIS, L. Moncla), history (The Alan Turing Institute, K. McDonough) and geography (EVS, T. Joliveau). It brings together the knowledge, tools and methods developed in three laboratories of the Lyon-St-Etienne university cluster around a shared scientific object: the geographical discourse found in French encyclopedias, from the Enlightenment to Wikipédia.
Our corpus brings together three encyclopedias:
- Encyclopédie ou Dictionnaire Raisonné des Sciences, des Arts et des Métiers, edited by Diderot and d'Alembert (1751-1772);
- Encyclopædia Universalis (2018 digital edition);
- Wikipédia (July 2018 version).
Our aim is to combine the methods and resources of NLP and cartography on the one hand, and of digital humanities and textual statistics on the other, in order to provide a first analysis of how these three encyclopedias refer to, and account for, space. Combining linguistic, historical and geographical approaches, the objective is to explore methods for the automatic spatial annotation of encyclopedic texts and for mapping the toponyms they cite, maps that we will enrich with contextual information extracted from the texts. We are convinced that such visualizations, enriched with linguistic information, will constitute digital objects with strong heuristic potential, helping us better understand and compare the specific features of the geographical discourse in each work of our corpus.
This project shares work-in-progress on space and place in Diderot and d’Alembert’s Encyclopédie (http://kmcdono.com/enc/).
Our work focuses on using new methods in Named Entity Recognition (NER) to interpret the spatial horizons of the Encyclopédie ou Dictionnaire raisonné des sciences, des arts et des métiers, par une Société de Gens de lettres (first Paris edition, 1751-1772, in 17 volumes of text and 11 volumes of plates). It contains 44,632 text entries in total, 14,445 of which were classified by the original editors as Géographie.