Raphaël Troncy
EURECOM · Data Science Department

PhD

About

316 Publications
56,993 Reads
4,305 Citations
Citations since 2016
96 Research Items
2283 Citations

Publications (316)
Article
Full-text available
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa)....
Conference Paper
We present new approaches used in the DAGOBAH system to perform automatic semantic table interpretation. DAGOBAH semantically annotates tables with Wikidata entities and relations to perform three tasks: Columns-Property Annotation (CPA), Cell-Entity Annotation (CEA) and Column-Type Annotation (CTA). In our system, the initial scores from entity di...
Article
Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with r...
Chapter
Relational tables are widely used to store information about entities and their attributes and they are the de-facto format for training AI algorithms. Numerous Semantic Table Interpretation approaches have been proposed in particular for the so-called cell-entity annotation task aiming at disambiguating the values of table cells given reference kn...
Chapter
Dynamic environments can be modeled as a series of events and facts that interact with each other, these interactions being characterised by different relations including temporal and causal ones. These have largely been studied in knowledge management, information retrieval or natural language processing, leading to several strategies aiming at ex...
Chapter
The past few years have seen a growing research interest in Semantic Table Interpretation (STI), i.e. the task of annotating tables with elements defined in knowledge graphs (KGs). These semantic annotations make use of entities and standardized types and relations and can, in turn, support several downstream use cases for tabular data such as data...
Article
Full-text available
Content-based recommendation systems offer the possibility of promoting media (e.g., posts, videos, podcasts) to users based solely on a representation of the content (i.e., without using any user-related data such as views or interactions between users and items). In this work, we study the potential of using different textual representations (bas...
Article
Full-text available
When browsing or studying a video corpus, it is particularly useful to know who appears in the scenes. In this paper, we show how a combination of state-of-the-art techniques can be organised in a pipeline for face recognition of celebrities. In particular, we propose a system which combines MTCNN for detecting...
Preprint
Topic models are statistical methods that extract underlying topics from document collections. When performing topic modeling, a user usually desires topics that are coherent, diverse between each other, and that constitute good document representations for downstream tasks (e.g. document classification). In this paper, we conduct a multi-objective...
Chapter
Smells are a key sensory experience. They are part of a multi-billion euro industry and gaining traction in different research fields such as museology, art, history, and digital humanities. Until now, a semantic model for describing smells and their associated experiences was lacking. In this paper, we present the Odeuropa data model for olfactory...
Article
As technology accelerates the generation and communication of textual data, automatically understanding this content becomes a necessity. In order to classify text, be it for tagging, indexing or curating documents, one often relies on large, opaque models that are trained on pre-annotated datasets, making the process unexplainable, dif...
Article
Full-text available
An important problem in large symbolic music collections is the low availability of high-quality metadata, which is essential for various information retrieval tasks. Traditionally, systems have addressed this by relying either on costly human annotations or on rule-based systems at a limited scale. Recently, embedding strategies have been exploite...
Article
A scientific conference is a type of event where attendees are highly active on social media platforms. Participants tweet or post longer status messages, engage in discussions with comments, share slides and other media captured during the conference. This information can be used to generate informative reports of what is happening, where...
Chapter
Full-text available
Injecting real-world information (typically contained in Knowledge Graphs) and human expertise into an end-to-end training pipeline for Natural Language Processing models is an open challenge. In this preliminary work, we propose to approach the task of Named Entity Recognition, which is traditionally viewed as a Sequence Labeling problem, as a Gra...
Article
Full-text available
Recommender systems have already been introduced in several industries such as retailing and entertainment, with great success. However, their application in the airline industry remains in its infancy. We discuss why this has been the case and why this situation is about to change in light of IATA’s New Distribution Capability standard. We argue t...
Chapter
How can we better understand the knowledge provided by Google results in order to build future “smart vehicle-centric” applications? What knowledge and expertise are required to build a smart vehicle application (e.g., driver assistance system)? Automotive companies (e.g., Toyota, BMW, Renault) are employing Internet of Things (IoT) and Semantic Web technologies...
Article
Editors: Taylor Arnold, Jasmijn van Gorp, Stefania Scagliola, and Lauren Tilton
Chapter
In Digital Humanities, one of the main challenges consists in capturing the structure of complex information in data models and ontologies, in particular when connections between terms are not trivial. This is typically the case for librarian music data. In this chapter, we provide some good practices for representing complex knowledge using the DOR...
Article
Full-text available
The documentation, dissemination, and enhancement of Cultural Heritage is of great relevance. To that end, technological tools and interactive solutions (e.g., 3D models) have become increasingly popular. Historical silk fabrics are nearly flat objects, very fragile and with complex internal geometries, related to different weaving techniques and t...
Preprint
Full-text available
This paper presents the D2KLab team's approach to the RecSys Challenge 2019 which focuses on the task of recommending accommodations based on user sessions. What is the feeling of a person who says "Rooms of the hotel are enormous, staff are friendly and efficient"? It is positive. Similarly to the sequence of words in a sentence where one can affi...
Preprint
Full-text available
This paper describes the approach proposed by the D2KLab team for the 2020 RecSys Challenge on the task of predicting user engagement facing tweets. This approach relies on two distinct stages. First, relevant features are learned from the challenge dataset. These features are heterogeneous and are the results of different learning modules such as...
Conference Paper
This article presents the DAGOBAH system, which semantically annotates tables using Wikidata and DBpedia entities. The proposed system annotates the cells and columns of a table and identifies relations between these columns. To this end, a process ranging from table pre-processing to the enrichment of a knowledge graph...
Poster
Full-text available
We present results of collaborative work bringing together semantic technologies, machine learning and cultural heritage to enable advanced search and visualization of textual descriptions of museum artifacts related to silk fabrics. Proposed is a multilingual text analysis approach where the developed domain-specific multilingual thesaurus and doma...
Article
Knowledge graphs have been shown to be highly beneficial to recommender systems, providing an ideal data structure to generate hybrid recommendations using both content-based and collaborative filtering. Most knowledge-aware recommender systems are based on manually engineered features, typically relying on path counting and/or on random walks. Recently...
Book
This book constitutes the proceedings of the satellite events held at the 17th Extended Semantic Web Conference, ESWC 2020, in May/June 2020. The conference was planned to take place in Heraklion, Crete, Greece, but changed to an online format due to the COVID-19 pandemic. ESWC is a major venue for presenting and discussing the latest scientific re...
Conference Paper
In this paper, we present the DAGOBAH system which tackles the Tabular Data to Knowledge Graph Matching (TDKGM) challenge. DAGOBAH aims to semantically annotate tables with Wikidata and DBpedia entities, and more precisely performs cell and column annotation and relationship identification, via a pipeline starting from pre-processing to enriching an...
Chapter
In a document-based world such as that of Web APIs, the triple-based output of SPARQL endpoints can be a barrier for developers who want to integrate Linked Data in their applications. A different JSON output can be obtained with SPARQL Transformer, which relies on a single JSON object for defining which data should be extracted from the endpoint and...
Conference Paper
Technological developments in comprehensive video understanding - detecting and identifying visual elements of a scene, combined with audio understanding (music, speech), as well as aligned with textual information such as captions, subtitles, etc. and background knowledge - have been undergoing a significant revolution during recent years. The wor...
Conference Paper
Full-text available
Modern vehicles produce big data with a wide variety of formats due to missing open standards. Thus, abstractions of such data in the form of descriptive labels are desired to facilitate the development of applications in the automotive domain. We propose an approach to reduce vehicle sensor data into semantic outcomes of dangerous driving events b...
Conference Paper
The Web of Things offers a platform-independent solution for interacting with connected devices. An important vertical of the WoT is the transportation domain with, at its core, autonomous systems and among others, connected vehicles. They can be seen as complex artefacts, as they are composed of many sensors and actuators, legacy specifications a...
Chapter
More than 2 million new books are published every year, and choosing a good book among the huge number of available options can be a challenging endeavor. Recommender systems help in choosing books by providing personalized suggestions based on the user reading history. However, most book recommender systems are based on collaborative filtering,...
Conference Paper
Full-text available
The amount of information available in social media and specialized blogs has become useful for a user to plan a trip. However, the user is quickly overwhelmed by the list of possibilities offered to him, making his search complex and time-consuming. Recommender systems aim to provide personalized suggestions to users by leveraging different type o...
Conference Paper
News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled such as sports competitions, political summits and elections, or breaking events such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and...
Preprint
Full-text available
News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled such as sports competitions, political summits and elections, or breaking events such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and...
Preprint
The knowledge of city exploration trails of people is in short supply because of the complexity in defining meaningful trails representative of individual behaviours and in the access to actionable data. Existing datasets have only recorded isolated check-ins of activities featured by opaque venue types. In this paper, we fill the gaps of defining...
Preprint
Full-text available
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top...
Conference Paper
Current evaluation methods of exploratory search systems are still incomplete as they are not fully based on a suitable model of the exploratory search process: as such they cannot be used to determine if they effectively support exploratory search behaviors and tasks. Aiming to elaborate evaluation methods based on an appropriate model of explorat...
Conference Paper
Full-text available
Application developers in the automotive domain have to deal with thousands of different signals, represented in highly heterogeneous formats, and coming from various car architectures. This situation prevents the development and connectivity of modern applications. We hypothesize that a formal model of car signals, in which the definition of signa...
Conference Paper
This paper describes the approach of the D2KLab team to the RecSys Challenge 2018 that focuses on the task of playlist completion. We propose an ensemble strategy of different recurrent neural networks leveraging pre-trained embeddings representing tracks, artists, albums, and titles as inputs. We also use lyrics from which we extract semantic and...
Preprint
Full-text available
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top s...
Chapter
In the past years, knowledge graphs have proven to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data sparsity. Graph embedding algorithms have been shown to automatically learn high-quality feature vectors from graph structures, enabling vector-based measures of node relatedness. In thi...
Chapter
Translational models have proven to be accurate and efficient at learning entity and relation representations from knowledge graphs for machine learning tasks such as knowledge graph completion. In the past years, knowledge graphs have been shown to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data...
Poster
Full-text available
We propose a car signal ontology named VSSo that provides a formal definition of the numerous sensors embedded in cars regardless of the vehicle model and brand, re-using the work made by the GENIVI alliance with the Vehicle Signal Specification (VSS). We observe that recent progress in machine learning makes it possible to predict a number of useful informat...
Conference Paper
Car signal data is usually hard to access, understand and integrate for non-automotive domain experts. In this paper, we use semantic technologies for enriching signal data in the automotive industry and access it through Web of Things interactions. This combination allows the access and integration of car data from the web. We built VSSo, a Vehicl...
Conference Paper
Full-text available
EVA describes a new class of emotion-aware autonomous systems delivering intelligent personal assistant functionalities. EVA requires a multi-disciplinary approach, combining a number of critical building blocks into a cybernetics systems/software architecture: emotion aware systems and algorithms, multimodal interaction design, cognitive mode...
Conference Paper
Full-text available
In this paper, we use semantic technologies for enriching trajectory data in the automotive industry for offline analysis. We proposed to re-use a combination of existing ontologies and we designed a Vehicle Signal Specification ontology to provide an environment in which we developed an application that analyzes the variations of signal values and...
Conference Paper
Full-text available
SPARQL endpoints are one possible access method to linked data. The results of SPARQL queries serialized in JSON are, however, not suitable for direct use in end-user applications by web developers, who often need to merge the values resulting from variable bindings. In this work, we propose a generic approach implemented in a JavaScript module...
Article
Full-text available
The DOREMUS project aims at a better description of music by examining and merging data from three French institutions. This article gives an overview of the FRBRoo-based data model, which enables the automatic conversion and linking of data. It presents prototypes showing how the data, after...
Article
Event-based services have recently witnessed a rapid growth driving the way people explore and share information of interest. They host a huge amount of users' activities including explicit RSVP, shared photos, comments and social connections. Exploiting these activities to detect communities of similar users is a challenging problem. In reality, a...
Article
One of the major issues encountered in the generation of knowledge bases is the integration of data coming from a collection of heterogeneous data sources. A key task when integrating data instances is entity matching. Entity matching is based on the definition of a similarity measure among entities and on the classification of the en...
Chapter
Full-text available
Ensuring data quality in Linked Open Data is a complex process as it consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have ide...