
Raphaël Troncy, PhD
EURECOM · Data Science Department
Publications: 321
Reads: 59,073
Citations: 4,463
Publications (321)
In this paper, we propose an unsupervised approach to generate TV series summaries using screenplays that are composed of dialogue and scenic textual descriptions. In recent years, the creation of large language models has enabled zero-shot text classification to perform effectively in some conditions. We explore if and how such models can be use...
We develop a multimodal classifier for the cultural heritage domain using a late fusion approach and introduce a novel dataset. The three modalities are Image, Text, and Tabular data. We based the image classifier on a ResNet convolutional neural network architecture and the text classifier on a multilingual transformer architecture (XLM-RoBERTa)....
We present new approaches used in the DAGOBAH system to perform automatic semantic table interpretation. DAGOBAH semantically annotates tables with Wikidata entities and relations to perform three tasks: Columns-Property Annotation (CPA), Cell-Entity Annotation (CEA) and Column-Type Annotation (CTA). In our system, the initial scores from entity di...
As technology accelerates the generation and communication of textual data, automatically understanding this content becomes a necessity. In order to classify text, be it for tagging, indexing or curating documents, one often relies on large, opaque models that are trained on pre-annotated datasets, making the process unexplainable, dif...
Tabular data often refers to data that is organized in a table with rows and columns. We observe that this data format is widely used on the Web and within enterprise data repositories. Tables potentially contain rich semantic information that still needs to be interpreted. The process of extracting meaningful information out of tabular data with r...
Relational tables are widely used to store information about entities and their attributes and they are the de-facto format for training AI algorithms. Numerous Semantic Table Interpretation approaches have been proposed in particular for the so-called cell-entity annotation task aiming at disambiguating the values of table cells given reference kn...
Dynamic environments can be modeled as a series of events and facts that interact with each other, these interactions being characterised by different relations including temporal and causal ones. These have largely been studied in knowledge management, information retrieval or natural language processing, leading to several strategies aiming at ex...
The past few years have seen a growing research interest in Semantic Table Interpretation (STI), i.e. the task of annotating tables with elements defined in knowledge graphs (KGs). These semantic annotations make use of entities and standardized types and relations and can, in turn, support several downstream use cases for tabular data such as data...
Content-based recommendation systems offer the possibility of promoting media (e.g., posts, videos, podcasts) to users based solely on a representation of the content (i.e., without using any user-related data such as views or interactions between users and items). In this work, we study the potential of using different textual representations (bas...
When browsing or studying a video corpus, it is particularly relevant to know who the people appearing in the scenes are. In this paper, we show how a combination of state-of-the-art techniques can be organised in a pipeline for face recognition of celebrities. In particular, we propose a system which combines MTCNN for detecting...
Topic models are statistical methods that extract underlying topics from document collections. When performing topic modeling, a user usually desires topics that are coherent, mutually diverse, and that constitute good document representations for downstream tasks (e.g. document classification). In this paper, we conduct a multi-objective...
Smells are a key sensory experience. They are part of a multi-billion euro industry and gaining traction in different research fields such as museology, art, history, and digital humanities. Until now, a semantic model for describing smells and their associated experiences was lacking. In this paper, we present the Odeuropa data model for olfactory...
An important problem in large symbolic music collections is the low availability of high-quality metadata, which is essential for various information retrieval tasks. Traditionally, systems have addressed this by relying either on costly human annotations or on rule-based systems at a limited scale. Recently, embedding strategies have been exploite...
A scientific conference is a type of event where attendees are highly active on social media platforms. Participants tweet or post longer status messages, engage in discussions with comments, and share slides and other media captured during the conference. This information can be used to generate informative reports of what is happening, where...
Injecting real-world information (typically contained in Knowledge Graphs) and human expertise into an end-to-end training pipeline for Natural Language Processing models is an open challenge. In this preliminary work, we propose to approach the task of Named Entity Recognition, which is traditionally viewed as a Sequence Labeling problem, as a Gra...
Recommender systems have already been introduced in several industries such as retailing and entertainment, with great success. However, their application in the airline industry remains in its infancy. We discuss why this has been the case and why this situation is about to change in light of IATA’s New Distribution Capability standard. We argue t...
How can we better understand the knowledge provided by Google results in order to build future “smart vehicle-centric” applications? What knowledge expertise is required to build a smart vehicle application (e.g., a driver assistance system)? Automotive companies (e.g., Toyota, BMW, Renault) are employing Internet of Things (IoT) and Semantic Web technologies...
Editors: Taylor Arnold, Jasmijn van Gorp, Stefania Scagliola, and Lauren Tilton
In Digital Humanities, one of the main challenges consists in capturing the structure of complex information in data models and ontologies, in particular when connections between terms are not trivial. This is typically the case for librarian music data. In this chapter, we provide some good practices for representing complex knowledge using the DOR...
The documentation, dissemination, and enhancement of Cultural Heritage is of great relevance. To that end, technological tools and interactive solutions (e.g., 3D models) have become increasingly popular. Historical silk fabrics are nearly flat objects, very fragile and with complex internal geometries, related to different weaving techniques and t...
This paper presents the D2KLab team's approach to the RecSys Challenge 2019 which focuses on the task of recommending accommodations based on user sessions. What is the feeling of a person who says "Rooms of the hotel are enormous, staff are friendly and efficient"? It is positive. Similarly to the sequence of words in a sentence where one can affi...
This paper describes the approach proposed by the D2KLab team for the 2020 RecSys Challenge on the task of predicting user engagement facing tweets. This approach relies on two distinct stages. First, relevant features are learned from the challenge dataset. These features are heterogeneous and are the results of different learning modules such as...
This article presents the DAGOBAH system, which semantically annotates tables using Wikidata and DBpedia entities. The proposed system annotates the cells and columns of a table and identifies relations between these columns. To do so, a process running from the pre-processing of tables to the enrichment of a knowledge graph...
We present results of collaborative work bringing together semantic technologies, machine learning and cultural heritage to enable advanced search and visualization of textual descriptions of museum artifacts related to silk fabrics. Proposed is a multilingual text analysis approach where the developed domain-specific multilingual thesaurus and doma...
Knowledge graphs have been shown to be highly beneficial to recommender systems, providing an ideal data structure to generate hybrid recommendations using both content-based and collaborative filtering. Most knowledge-aware recommender systems are based on manually engineered features, typically relying on path counting and/or on random walks. Recently...
This book constitutes the proceedings of the satellite events held at the 17th Extended Semantic Web Conference, ESWC 2020, in May/June 2020. The conference was planned to take place in Heraklion, Crete, Greece, but changed to an online format due to the COVID-19 pandemic.
ESWC is a major venue for presenting and discussing the latest scientific re...
In this paper, we present the DAGOBAH system which tackles the Tabular Data to Knowledge Graph Matching (TDKGM) challenge. DAGOBAH aims to semantically annotate tables with Wikidata and DBpedia entities, and more precisely performs cell and column annotation and relationship identification, via a pipeline starting from pre-processing to enriching an...
In a document-based world such as that of Web APIs, the triple-based output of SPARQL endpoints can be a barrier for developers who want to integrate Linked Data in their applications. A different JSON output can be obtained with SPARQL Transformer, which relies on a single JSON object for defining which data should be extracted from the endpoint and...
Technological developments in comprehensive video understanding - detecting and identifying visual elements of a scene, combined with audio understanding (music, speech), as well as aligned with textual information such as captions, subtitles, etc. and background knowledge - have been undergoing a significant revolution during recent years. The wor...
Modern vehicles produce big data with a wide variety of formats due to missing open standards. Thus, abstractions of such data in the form of descriptive labels are desired to facilitate the development of applications in the automotive domain. We propose an approach to reduce vehicle sensor data into semantic outcomes of dangerous driving events b...
The Web of Things offers a platform-independent solution for interacting with connected devices. An important vertical of the WoT is the transportation domain with, at its core, autonomous systems and, among others, connected vehicles. They can be seen as complex artefacts, as they are composed of many sensors and actuators, legacy specifications a...
More than 2 million new books are published every year and choosing a good book among the huge amount of available options can be a challenging endeavor. Recommender systems help in choosing books by providing personalized suggestions based on the user reading history. However, most book recommender systems are based on collaborative filtering,...
The amount of information available in social media and specialized blogs has become useful for a user to plan a trip. However, the user is quickly overwhelmed by the list of possibilities offered to him, making his search complex and time-consuming. Recommender systems aim to provide personalized suggestions to users by leveraging different type o...
News agencies produce thousands of multimedia stories describing events happening in the world that are either scheduled such as sports competitions, political summits and elections, or breaking events such as military conflicts, terrorist attacks, natural disasters, etc. When writing up those stories, journalists refer to contextual background and...
The knowledge of city exploration trails of people is in short supply because of the complexity in defining meaningful trails representative of individual behaviours and in the access to actionable data. Existing datasets have only recorded isolated check-ins of activities featured by opaque venue types. In this paper, we fill the gaps of defining...
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top...
Current evaluation methods of exploratory search systems are still incomplete as they are not fully based on a suitable model of the exploratory search process: as such they cannot be used to determine if they effectively support exploratory search behaviors and tasks. Aiming to elaborate evaluation methods based on an appropriate model of explorat...
Application developers in the automotive domain have to deal with thousands of different signals, represented in highly heterogeneous formats, and coming from various car architectures. This situation prevents the development and connectivity of modern applications. We hypothesize that a formal model of car signals, in which the definition of signa...
This paper describes the approach of the D2KLab team to the RecSys Challenge 2018 that focuses on the task of playlist completion. We propose an ensemble strategy of different recurrent neural networks leveraging pre-trained embeddings representing tracks, artists, albums, and titles as inputs. We also use lyrics from which we extract semantic and...
Three major French cultural institutions—the French National Library (BnF), Radio France and the Philharmonie de Paris—have come together in order to develop shared methods to describe semantically their catalogs of music works and events. This process comprises the construction of knowledge graphs representing the data contained in these catalogs...
In recent years, knowledge graphs have proven to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data sparsity. Graph embedding algorithms have been shown to be able to automatically learn high-quality feature vectors from graph structures, enabling vector-based measures of node relatedness. In thi...
Translational models have proven to be accurate and efficient at learning entity and relation representations from knowledge graphs for machine learning tasks such as knowledge graph completion. In recent years, knowledge graphs have been shown to be beneficial for recommender systems, efficiently addressing paramount issues such as new items and data...
We propose a car signal ontology named VSSo that provides a formal definition of the numerous sensors embedded in cars regardless of the vehicle model and brand, re-using the work done by the GENIVI alliance with the Vehicle Signal Specification (VSS). We observe that recent progress in machine learning makes it possible to predict a number of useful informat...
Car signal data is usually hard to access, understand and integrate for non automotive domain experts. In this paper, we use semantic technologies for enriching signal data in the automotive industry and access it through Web of Things interactions. This combination allows the access and integration of car data from the web. We built VSSo, a Vehicl...
EVA describes a new class of emotion-aware autonomous systems delivering intelligent personal assistant functionalities. EVA requires a multi-disciplinary approach, combining a number of critical building blocks into a cybernetics systems/software architecture: emotion aware systems and algorithms, multimodal interaction design, cognitive mode...
In this paper, we use semantic technologies for enriching trajectory data in the automotive industry for offline analysis. We proposed to re-use a combination of existing ontologies and we designed a Vehicle Signal Specification ontology to provide an environment in which we developed an application that analyzes the variations of signal values and...
SPARQL endpoints are one possible access method to linked data. The results of SPARQL queries serialized in JSON are, however, not suitable to be directly used by web developers in end-user applications who often need to merge the values resulting from variable bindings. In this work, we propose a generic approach implemented in a JavaScript module...
The DOREMUS project aims at a better description of music by examining and merging data from three French institutions. This article gives an overview of the FRBRoo-based data model, which enables the automatic conversion and linking of data. It presents prototypes showing how the data...
Event-based services have recently witnessed a rapid growth driving the way people explore and share information of interest. They host a huge amount of users' activities including explicit RSVP, shared photos, comments and social connections. Exploiting these activities to detect communities of similar users is a challenging problem. In reality, a...
One of the major issues encountered in the generation of knowledge bases is the integration of data coming from a collection of heterogeneous data sources. A key task when integrating data instances is entity matching. Entity matching is based on the definition of a similarity measure among entities and on the classification of the en...
Ensuring data quality in Linked Open Data is a complex process as it consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have ide...
This book constitutes the refereed proceedings of the 15th Extended Semantic Web Conference, ESWC 2018, held in Heraklion, Crete, Greece.
The 48 revised full papers presented were carefully reviewed and selected from 179 submissions. The papers cover a large range of topics such as logical modelling and reasoning, natural language processing,...
Data owners are creating an ever richer set of information resources online, and these are being used for more and more applications. Spatial data on the Web is becoming ubiquitous and voluminous with the rapid growth of location-based services, spatial technologies, dynamic location-based data and services published by different organizations. How...
In this paper we report the participation of ADEL in the OKE 2017 challenge. ADEL is an adaptive entity recognition and linking framework that combines various extraction methods for improving the recognition level and implements an efficient knowledge base indexing process to increase the performance of the linking step. We detail how we de...
Representing and retrieving fine-grained information related to something as complex as music composition, recording and performance is a