Raphaël Troncy

EURECOM · Data Science Department

PhD

About

318 Publications · 57,987 Reads
4,382 Citations (since 2017: 88 research items, 2,060 citations)

Publications (318)
Article
Full-text available
In this paper, we propose an unsupervised approach to generate TV series summaries using screenplays that are composed of dialogue and scenic textual descriptions. In recent years, the creation of large language models has enabled zero-shot text classification to perform effectively in some conditions. We explore whether and how such models can be use...
Article
Full-text available
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa)....
Conference Paper
We present new approaches used in the DAGOBAH system to perform automatic semantic table interpretation. DAGOBAH semantically annotates tables with Wikidata entities and relations to perform three tasks: Columns-Property Annotation (CPA), Cell-Entity Annotation (CEA) and Column-Type Annotation (CTA). In our system, the initial scores from entity di...
Article
Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with r...
Chapter
Relational tables are widely used to store information about entities and their attributes, and they are the de facto format for training AI algorithms. Numerous Semantic Table Interpretation approaches have been proposed, in particular for the so-called cell-entity annotation task, which aims at disambiguating the values of table cells given reference kn...
Chapter
Dynamic environments can be modeled as a series of events and facts that interact with each other, these interactions being characterised by different relations including temporal and causal ones. These have largely been studied in knowledge management, information retrieval or natural language processing, leading to several strategies aiming at ex...
Chapter
The past few years have seen a growing research interest in Semantic Table Interpretation (STI), i.e. the task of annotating tables with elements defined in knowledge graphs (KGs). These semantic annotations make use of entities and standardized types and relations and can, in turn, support several downstream use cases for tabular data such as data...
Article
Full-text available
Content-based recommendation systems offer the possibility of promoting media (e.g., posts, videos, podcasts) to users based solely on a representation of the content (i.e., without using any user-related data such as views or interactions between users and items). In this work, we study the potential of using different textual representations (bas...
Article
Full-text available
When browsing or studying a video corpus, a particularly relevant piece of information is who appears in the scenes. In this paper, we show how a combination of state-of-the-art techniques can be organised in a pipeline for face recognition of celebrities. In particular, we propose a system which combines MTCNN for detecting...
Preprint
Full-text available
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa)....
Preprint
Topic models are statistical methods that extract underlying topics from document collections. When performing topic modeling, a user usually desires topics that are coherent, diverse from one another, and good document representations for downstream tasks (e.g. document classification). In this paper, we conduct a multi-objective...
Chapter
Smells are a key sensory experience. They are part of a multi-billion euro industry and are gaining traction in different research fields such as museology, art, history, and digital humanities. Until now, a semantic model for describing smells and their associated experiences was lacking. In this paper, we present the Odeuropa data model for olfactory...
Article
As technology accelerates the generation and communication of textual data, the need to automatically understand this content becomes a necessity. In order to classify text, be it for tagging, indexing or curating documents, one often relies on large, opaque models that are trained on pre-annotated datasets, making the process unexplainable, dif...
Article
Full-text available
An important problem in large symbolic music collections is the low availability of high-quality metadata, which is essential for various information retrieval tasks. Traditionally, systems have addressed this by relying either on costly human annotations or on rule-based systems at a limited scale. Recently, embedding strategies have been exploite...
Article
A scientific conference is a type of event where attendees are highly active on social media platforms. Participants tweet or post longer status messages, engage in discussions with comments, and share slides and other media captured during the conference. This information can be used to generate informative reports of what is happening, where...
Chapter
Full-text available
Injecting real-world information (typically contained in Knowledge Graphs) and human expertise into an end-to-end training pipeline for Natural Language Processing models is an open challenge. In this preliminary work, we propose to approach the task of Named Entity Recognition, which is traditionally viewed as a Sequence Labeling problem, as a Gra...
Article
Full-text available
Recommender systems have already been introduced in several industries such as retailing and entertainment, with great success. However, their application in the airline industry remains in its infancy. We discuss why this has been the case and why this situation is about to change in light of IATA’s New Distribution Capability standard. We argue t...
Chapter
How can we better understand the knowledge provided by Google search results to build future “smart vehicle-centric” applications? What knowledge and expertise are required to build a smart vehicle application (e.g., a driver assistance system)? Automotive companies (e.g., Toyota, BMW, Renault) are employing Internet of Things (IoT) and Semantic Web technologies...
Article
Editors: Taylor Arnold, Jasmijn van Gorp, Stefania Scagliola, and Lauren Tilton
Chapter
In Digital Humanities, one of the main challenges consists in capturing the structure of complex information in data models and ontologies, in particular when connections between terms are not trivial. This is typically the case for music library data. In this chapter, we provide some good practices for representing complex knowledge using the DOR...
Article
Full-text available
The documentation, dissemination, and enhancement of Cultural Heritage are of great relevance. To that end, technological tools and interactive solutions (e.g., 3D models) have become increasingly popular. Historical silk fabrics are nearly flat objects, very fragile and with complex internal geometries, related to different weaving techniques and t...
Preprint
Full-text available
This paper presents the D2KLab team's approach to the RecSys Challenge 2019 which focuses on the task of recommending accommodations based on user sessions. What is the feeling of a person who says "Rooms of the hotel are enormous, staff are friendly and efficient"? It is positive. Similarly to the sequence of words in a sentence where one can affi...
Preprint
Full-text available
This paper describes the approach proposed by the D2KLab team for the 2020 RecSys Challenge on the task of predicting user engagement with tweets. This approach relies on two distinct stages. First, relevant features are learned from the challenge dataset. These features are heterogeneous and are the result of different learning modules such as...
Conference Paper
This article presents the DAGOBAH system, which semantically annotates tables using Wikidata and DBpedia entities. The proposed system annotates the cells and columns of a table and identifies relations between these columns. To this end, a process ranging from table pre-processing to the enrichment of a knowledge graph...
Poster
Full-text available
We present results of collaborative work bringing together semantic technologies, machine learning and cultural heritage to enable advanced search and visualization of textual descriptions of museum artifacts related to silk fabrics. We propose a multilingual text analysis approach where the developed domain-specific multilingual thesaurus and doma...
Article
Knowledge graphs have been shown to be highly beneficial to recommender systems, providing an ideal data structure to generate hybrid recommendations using both content-based and collaborative filtering. Most knowledge-aware recommender systems are based on manually engineered features, typically relying on path counting and/or on random walks. Recently...
Book
This book constitutes the proceedings of the satellite events held at the 17th Extended Semantic Web Conference, ESWC 2020, in May/June 2020. The conference was planned to take place in Heraklion, Crete, Greece, but was moved to an online format due to the COVID-19 pandemic. ESWC is a major venue for presenting and discussing the latest scientific re...
Conference Paper
In this paper, we present the DAGOBAH system, which tackles the Tabular Data to Knowledge Graph Matching (TDKGM) challenge. DAGOBAH aims to semantically annotate tables with Wikidata and DBpedia entities, and more precisely performs cell and column annotation and relationship identification, via a pipeline starting from pre-processing to enriching an...
Chapter
In a document-based world such as that of Web APIs, the triple-based output of SPARQL endpoints can be a barrier for developers who want to integrate Linked Data in their applications. A different JSON output can be obtained with SPARQL Transformer, which relies on a single JSON object for defining which data should be extracted from the endpoint and...
Conference Paper
Technological developments in comprehensive video understanding (detecting and identifying visual elements of a scene, combined with audio understanding (music, speech), as well as aligned with textual information such as captions, subtitles, etc. and background knowledge) have undergone a significant revolution in recent years. The wor...
Conference Paper
Full-text available
Modern vehicles produce big data in a wide variety of formats due to the lack of open standards. Thus, abstractions of such data in the form of descriptive labels are desired to facilitate the development of applications in the automotive domain. We propose an approach to reduce vehicle sensor data into semantic outcomes of dangerous driving events b...
Conference Paper
The Web of Things offers a platform-independent solution for interacting with connected devices. An important vertical of the WoT is the transportation domain with, at its core, autonomous systems and among others, connected vehicles. They can be seen as complex artefacts, as they are composed of many sensors and actuators, legacy specifications a...
Chapter
More than 2 million new books are published every year, and choosing a good book among the huge number of available options can be a challenging endeavor. Recommender systems help in choosing books by providing personalized suggestions based on the user reading history. However, most book recommender systems are based on collaborative filtering,...
Conference Paper
Full-text available
The information available on social media and specialized blogs has become a useful resource for planning a trip. However, users are quickly overwhelmed by the list of possibilities offered to them, making their search complex and time-consuming. Recommender systems aim to provide personalized suggestions to users by leveraging different type o...
Conference Paper
News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled, such as sports competitions, political summits and elections, or breaking events, such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and...
Preprint
Full-text available
News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled, such as sports competitions, political summits and elections, or breaking events, such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and...
Preprint
Knowledge of people's city exploration trails is in short supply because of the difficulty of defining meaningful trails representative of individual behaviours and of accessing actionable data. Existing datasets have only recorded isolated check-ins of activities characterized by opaque venue types. In this paper, we fill the gaps of defining...
Preprint
Full-text available
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top...
Conference Paper
Current evaluation methods for exploratory search systems are still incomplete, as they are not fully based on a suitable model of the exploratory search process; as such, they cannot be used to determine whether systems effectively support exploratory search behaviors and tasks. Aiming to elaborate evaluation methods based on an appropriate model of explorat...
Conference Paper
Full-text available
Application developers in the automotive domain have to deal with thousands of different signals, represented in highly heterogeneous formats, and coming from various car architectures. This situation prevents the development and connectivity of modern applications. We hypothesize that a formal model of car signals, in which the definition of signa...
Conference Paper
This paper describes the approach of the D2KLab team to the RecSys Challenge 2018 that focuses on the task of playlist completion. We propose an ensemble strategy of different recurrent neural networks leveraging pre-trained embeddings representing tracks, artists, albums, and titles as inputs. We also use lyrics from which we extract semantic and...
Chapter
In the past years, knowledge graphs have proven to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data sparsity. Graph embedding algorithms have been shown to automatically learn high-quality feature vectors from graph structures, enabling vector-based measures of node relatedness. In thi...
Chapter
Translational models have proven to be accurate and efficient at learning entity and relation representations from knowledge graphs for machine learning tasks such as knowledge graph completion. In the past years, knowledge graphs have been shown to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data...
Poster
Full-text available
We propose a car signal ontology named VSSo that provides a formal definition of the numerous sensors embedded in cars, regardless of the vehicle model and brand, reusing the work done by the GENIVI alliance with the Vehicle Signal Specification (VSS). We observe that recent progress in machine learning makes it possible to predict a number of useful informat...
Conference Paper
Car signal data is usually hard to access, understand and integrate for people who are not automotive domain experts. In this paper, we use semantic technologies for enriching signal data in the automotive industry and access it through Web of Things interactions. This combination allows car data to be accessed and integrated from the Web. We built VSSo, a Vehicl...
Conference Paper
Full-text available
EVA¹ describes a new class of emotion-aware autonomous systems delivering intelligent personal assistant functionalities. EVA requires a multi-disciplinary approach, combining a number of critical building blocks into a cybernetic systems/software architecture: emotion-aware systems and algorithms, multimodal interaction design, cognitive mode...
Conference Paper
Full-text available
In this paper, we use semantic technologies for enriching trajectory data in the automotive industry for offline analysis. We propose reusing a combination of existing ontologies, and we designed a Vehicle Signal Specification ontology to provide an environment in which we developed an application that analyzes the variations of signal values and...
Conference Paper
Full-text available
SPARQL endpoints are one possible access method to linked data. The results of SPARQL queries serialized in JSON are, however, not suitable for direct use by web developers in end-user applications, who often need to merge the values resulting from variable bindings. In this work, we propose a generic approach implemented in a JavaScript module...
Article
Full-text available
The DOREMUS project aims at a better description of music by analyzing and merging data from three French institutions. This article gives an overview of the FRBRoo-based data model, which enables the automatic conversion and linking of data. It presents prototypes showing how the data...
Article
Event-based services have recently grown rapidly, driving the way people explore and share information of interest. They host a huge amount of users' activities, including explicit RSVPs, shared photos, comments and social connections. Exploiting these activities to detect communities of similar users is a challenging problem. In reality, a...
Article
One of the major issues encountered in the generation of knowledge bases is the integration of data coming from a collection of heterogeneous data sources. A key task when integrating data instances is entity matching. Entity matching is based on the definition of a similarity measure among entities and on the classification of the en...
Chapter
Full-text available
Ensuring data quality in Linked Open Data is a complex process as it consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have ide...
Book
This book constitutes the refereed proceedings of the 15th Extended Semantic Web Conference, ESWC 2018, held in Heraklion, Crete, Greece. The 48 revised full papers presented were carefully reviewed and selected from 179 submissions. The papers cover a large range of topics such as logical modelling and reasoning, natural language processing,...
Article
Full-text available
Data owners are creating an ever richer set of information resources online, and these are being used for more and more applications. Spatial data on the Web is becoming ubiquitous and voluminous with the rapid growth of location-based services, spatial technologies, dynamic location-based data and services published by different organizations. How...
Conference Paper
In this paper, we report on the participation of ADEL in the OKE 2017 challenge. ADEL is an adaptive entity recognition and linking framework that combines various extraction methods to improve the recognition level and implements an efficient knowledge base indexing process to increase the performance of the linking step. We detail how we de...
Conference Paper
Representing and retrieving fine-grained information related to something as complex as music composition, recording and performance is a challenging activity. This complexity requires a data model able to describe different outcomes of the creative process, from the writing of the score to its performance and publishing. In this paper,...
Conference Paper
Full-text available
Second screen applications are becoming key for broadcasters exploiting the convergence of TV and Internet. Authoring such applications, however, remains costly. In this paper, we present a second screen authoring application that leverages multimedia content analytics and social media monitoring. A back-office is dedicated to easy and fast conte...
Conference Paper
Full-text available
In this paper, we present the design and implementation of DrIveSCOVER, a recommender system for places and events for in-car use, where the driving conditions such as weather and local traffic are taken into account. We integrate multiple data sources using semantic technologies and we devise recommending functions that are presented in...
Conference Paper
Full-text available
Knowledge Graphs have proven to be extremely valuable to recommender systems, as they enable hybrid graph-based recommendation models encompassing both collaborative and content information. Leveraging this wealth of heterogeneous information for top-N item recommendation is a challenging task, as it requires the ability of effectively encoding a d...
Article
Planning a visit to Expo Milano 2015 or simply touring in Milan are activities that require a certain amount of a priori knowledge of the city. In this paper, we present the process of building such comprehensive knowledge bases that contain descriptions of events and activities, places and sights, transportation facilities as well as social activi...