
Benoit Huet - HDR, PhD
- Professor (Assistant) at EURECOM
About
- 217 Publications
- 23,498 Reads
- 2,215 Citations
Publications (217)
The last decade has witnessed the development and rise of social media web services. The use of these shared online media as a source of huge amounts of data for research purposes is still a challenging problem. In this paper, a novel framework is proposed to collect training samples from online media data to model the visual appearance of socia...
This paper presents a comparison of some methodologies for the automatic construction of video summaries. The work is based on the simulated user principle to evaluate the quality of a video summary in a way that is automatic, yet related to the user's perception. The method is studied for the case of multiepisode video, where we do not descri...
Multimedia indexing is about developing techniques allowing people to effectively find media. Content-based methods become necessary when dealing with large databases. Current technology allows exploring the emotional space which is known to carry very interesting semantic information. In this paper we state the need for an integrated method which...
Pathologies systematically induce morphological changes, thus providing a major but yet insufficiently quantified source of observables for diagnosis. The study develops a predictive model of the pathological states based on morphological features (3D-morphomics) on Computed Tomography (CT) volumes. A complete workflow for mesh extraction and simpl...
Pathologies systematically induce morphological changes, thus providing a major but yet insufficiently quantified source of observables for diagnosis. The study develops a predictive model of the pathological states based on morphological features (3D-morphomics) on Computed Tomography (CT) volumes. A complete workflow for mesh extraction and simpl...
Glaucoma is an eye condition that leads to loss of vision and blindness if not diagnosed in time. Diagnosis requires human experts to estimate in a limited time subtle changes in the shape of the optic disc from retinal fundus images. Deep learning methods have been satisfactory in classifying and segmenting diseases in retinal fundus images, assis...
Glaucoma is an eye condition that, if not diagnosed in time, leads to loss of vision and blindness. While diagnosing campaigns are regularly launched, these require human experts to estimate in a limited time subtle changes in the shape of the optic disc from retinal fundus images. Automatic glaucoma detection methods are desirable to help with the...
In Multi-Task Learning (MTL), it is a common practice to train multi-task networks by optimizing an objective function, which is a weighted average of the task-specific objective functions. Although the computational advantages of this strategy are clear, the complexity of the resulting loss landscape has not been studied in the literature. Arguabl...
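To make the weighted-average objective mentioned above concrete, here is a minimal PyTorch-style sketch (not the paper's code; the two-task toy network, the data, and the fixed weights w1 and w2 are illustrative assumptions):

import torch
import torch.nn as nn

# Sketch of the common MTL practice described above: the total loss is a
# weighted average of task-specific losses. Everything below is a toy setup.
class TwoTaskNet(nn.Module):
    def __init__(self, in_dim=16, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_a = nn.Linear(hidden, 1)   # task A: regression
        self.head_b = nn.Linear(hidden, 3)   # task B: 3-way classification

    def forward(self, x):
        h = self.shared(x)
        return self.head_a(h), self.head_b(h)

net = TwoTaskNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
w1, w2 = 0.7, 0.3  # fixed task weights (hypothetical values)

x = torch.randn(8, 16)
y_a = torch.randn(8, 1)
y_b = torch.randint(0, 3, (8,))

pred_a, pred_b = net(x)
# Weighted average of the two task-specific objectives
loss = w1 * nn.functional.mse_loss(pred_a, y_a) + w2 * nn.functional.cross_entropy(pred_b, y_b)
opt.zero_grad()
loss.backward()
opt.step()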
Background: Immune-checkpoint inhibitors (ICIs), specifically monoclonal antibodies targeting the programmed cell death protein-1 and programmed death-ligand 1 (PD-1 and PD-L1), have shown promise in the treatment of non-small cell lung cancer (NSCLC). Unfortunately, patients' response to ICI is difficult to predict, with cancer stage, treatme...
Multi-task learning has gained popularity due to the advantages it provides with respect to resource usage and performance. Nonetheless, the joint optimization of parameters with respect to multiple tasks remains an active research topic. Sub-partitioning the parameters between different tasks has proven to be an efficient way to relax the optimiza...
The image captioning task and the video captioning task consist in automatically generating short textual descriptions for images and videos respectively. They are challenging multimedia tasks as they require grasping all the information contained in a visual document, such as objects, persons, context, actions, location, and translating this informat...
Video content has been increasing at an unprecedented rate in recent years, bringing the need for improved tools providing efficient access to specific contents of interest. Within the management of video content, hyperlinking aims at determining related video segments from a collection with respect to an input video anchor. This paper describes th...
Multi-task learning has gained popularity due to the advantages it provides with respect to resource usage and performance. Nonetheless, the joint optimization of parameters with respect to multiple tasks remains an active research topic. Sub-partitioning the parameters between different tasks has proven to be an efficient way to relax the optimiza...
In this paper, we present the features implemented in the 4th version of the VIREO Video Search System (VIREO-VSS). In this version, we propose a sketch-based retrieval model, which allows the user to specify a video scene with objects and their basic properties, including color, size, and location. We further utilize the temporal relation between...
Automatic video captioning can be used to enrich TV programs with textual information about scenes. This information can be useful for visually impaired people, but can also be used to enhance indexing and search of TV records. Video captioning can be seen as being more challenging than image captioning. In both cases, we have to tackle a challen...
In recent years, representation learning approaches have disrupted many multimedia computing tasks. Among those approaches, deep convolutional neural networks (CNNs) have notably reached human level expertise on some constrained image classification tasks. Nonetheless, training CNNs from scratch for a new task or simply new data turns out to be compl...
In recent years, representation learning approaches have disrupted many multimedia computing tasks. Among those approaches, deep convolutional neural networks (CNNs) have notably reached human level expertise on some constrained image classification tasks. Nonetheless, training CNNs from scratch for a new task or simply new data turns out to be compl...
Video hyperlinking is a task aiming to enhance the accessibility of large archives, by establishing links between fragments of videos. The links model the aboutness between fragments for efficient traversal of video content. This paper addresses the problem of link construction from the perspective of cross-modal embedding. To this end, a generaliz...
The first months of the new calendar year, multimedia researchers traditionally are hard at work on their ACM Multimedia submissions. (This year the submission deadline is 1 April.) Questions of reproducibility, including those of data set availability and release, are at the forefront of everyone's mind. In this edition of SIGMM Records, the edito...
In the original version of the book, the following belated corrections have been incorporated:
In this paper, the VIREO team video retrieval tool is described in details. As learned from Video Browser Showdown (VBS) 2018, the visualization of video frames is a critical need to improve the browsing effectiveness. Based on this observation, a hierarchical structure that represents the video frame clusters has been built automatically using k-m...
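As a rough illustration of the kind of hierarchical frame clustering mentioned above, here is a sketch only, assuming per-frame feature vectors and a recursive k-means; the feature dimensionality, k, and depth are hypothetical, not the VIREO tool's actual parameters:

import numpy as np
from sklearn.cluster import KMeans

def hierarchical_kmeans(features, k=4, depth=2):
    """Build a nested cluster tree of frame indices by recursive k-means."""
    def recurse(indices, level):
        if level == depth or len(indices) <= k:
            return list(indices)                     # leaf: frame indices
        km = KMeans(n_clusters=k, n_init=10).fit(features[indices])
        return {c: recurse(indices[km.labels_ == c], level + 1)
                for c in range(k)}
    return recurse(np.arange(len(features)), 0)

# Toy usage: 200 "frame descriptors" (e.g., CNN features per key-frame)
frames = np.random.rand(200, 64)
tree = hierarchical_kmeans(frames, k=4, depth=2)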
The two-volume set LNCS 11295 and 11296 constitutes the thoroughly refereed proceedings of the 25th International Conference on MultiMedia Modeling, MMM 2019, held in Thessaloniki, Greece, in January 2019.
Of the 172 submitted full papers, 49 were selected for oral presentation and 47 for poster presentation; in addition, 6 demonstration papers, 5...
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top...
In the development of advanced Human-Computer Interaction, multimedia technology plays a fundamental role in increasing the usability and accessibility of computer interfaces. The first workshop on Multimedia for Accessible Human Computer Interface (MAHCI) provides a forum for both multimedia and HCI researchers to discuss the accessible human computer in...
In this paper, we propose a multimodal framework for video segment interestingness prediction based on the genre and affective impact of movie content. We hypothesize that the emotional characteristics and impact of a video are indicative of its genre, which can in turn be a factor for identifying the perceived interestingness of a particular video segment (sho...
This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. In this paper, we also describe the preliminary experiments with text-only translation systems leading us up to this choice. We have the top s...
EVA¹ describes a new class of emotion-aware autonomous systems delivering intelligent personal assistant functionalities. EVA requires a multi-disciplinary approach, combining a number of critical building blocks into a cybernetics systems/software architecture: emotion aware systems and algorithms, multimodal interaction design, cognitive mode...
Video-to-video linking systems allow users to explore and exploit the content of a large-scale multimedia collection interactively and without the need to formulate specific queries. We present a short introduction to video-to-video linking (also called ‘video hyperlinking’), and describe the latest edition of the Video Hyperlinking (LNK) task at T...
This paper compares several approaches of natural language access to video databases. We present two main strategies. The first one is visual, and consists in comparing keyframes with images retrieved from Google Images. The second one is textual and consists in generating a text-based description of the keyframes, and comparing these descriptions...
In this paper, we propose a multimodal deep learning architecture for emotion recognition in video regarding our participation to the audio-video based sub-challenge of the Emotion Recognition in the Wild 2017 challenge. Our model combines cues from multiple video modalities, including static facial features, motion patterns related to the evolutio...
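For intuition only, a minimal sketch of score-level (late) fusion across video modalities; the emotion classes, per-modality scores, and fusion weights below are placeholder assumptions, not the actual challenge submission:

import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set

def fuse_scores(modality_scores, weights):
    """Weighted average of per-modality emotion probability vectors."""
    fused = sum(w * np.asarray(s) for s, w in zip(modality_scores, weights))
    return EMOTIONS[int(np.argmax(fused))]

face   = [0.1, 0.6, 0.2, 0.1]  # e.g., from a static facial-feature model
motion = [0.2, 0.5, 0.2, 0.1]  # e.g., from a temporal model of face motion
audio  = [0.1, 0.3, 0.4, 0.2]  # e.g., from an audio network

print(fuse_scores([face, motion, audio], weights=[0.4, 0.3, 0.3]))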
Second screen applications are becoming key for broadcasters exploiting the convergence of TV and Internet. Authoring such applications however remains costly. In this paper, we present a second screen authoring application that leverages multimedia content analytics and social media monitoring. A back-office is dedicated to easy and fast conte...
In this paper, we describe the first-ever machine human collaboration at creating a real movie trailer (officially released by 20th Century Fox). We introduce an intelligent system designed to understand and encode patterns and types of emotions in horror movies that are useful in trailers. We perform multi-modal semantics extraction including audi...
In this paper, we present EURECOM’s approach to address the MediaEval 2017 Predicting Media Interestingness Task. We developed models for both the image and video subtasks. In particular, we investigate the usage of media genre information (i.e., drama, horror, etc.) to predict interestingness. Our approach is related to the affective impact of medi...
User generated content, available in massive amounts on the Internet, comes in many "flavors" (i.e. micro messages, text documents, images and videos) and is receiving increasing attention due to its many potential applications. One important application is the automatic generation of multimedia enrichments concerning users' topics of interest and...
This demo showcases the utility of video hyperlinks with celebrities as the link anchors and their social circles as targets, aiming to help users quickly explore the aboutness of a celebrity by link traversal. Through content analysis, our system embeds hyperlinks into videos such that users can click-and-jump between celebrity faces in different...
In recent years, organizing social media by social event has drawn increasing attention with the growing amount of rich-media content taken during an event. In this paper, we address the social event enrichment and summarization problem and propose a demonstration system \(E^2SGM\) to summarize the event with relevant media selected from a lar...
The two-volume set LNCS 9516 and LNCS 9517 constitutes the refereed proceedings of the 22nd International Conference on Multimedia Modeling, MMM 2016, held in Miami, FL, USA, in January 2016.
The 32 revised full papers and 52 poster papers presented were carefully reviewed and selected from 117 submissions. In addition 20 papers were accepted for f...
The twenty papers in this special section aim at providing a forum to present recent advancements in deep learning research that directly concerns the multimedia community. Specifically, deep learning has successfully designed algorithms that can build deep nonlinear representations to mimic how the brain perceives and understands multimodal inform...
This paper overviews ongoing work that aims to support end-users in conveniently exploring and exploiting large audiovisual archives by deploying multiple multimodal linking approaches. We present ongoing work on multimodal video hyperlinking, from a perspective of unconstrained link anchor identification and based on the identification of named en...
Multimedia content, produced by professionals and individual users on a daily basis and in constantly growing quantity, requires the creation of navigation systems that allow access to this data at different levels of granularity, supporting further discovery of a topic of user interest or browsing by each user in an individual way. In th...
Nowadays, with the continual development of digital capture technologies and social media services, a vast number of media documents are captured and shared online to help attendees record their experience during events. In this paper, we present a method combining semantic inference and multimodal analysis for automatically finding media content t...
Massive amounts of digital media are being produced and consumed daily on the Internet. Efficient access to relevant information is of key importance in contemporary society. The Hyper Video Browser provides multiple navigation means within the content of a media repository. Our system utilizes state-of-the-art multimodal content analysis and in...
Recent years have witnessed the rapid growth of social multimedia data available over the Internet. This age of huge media collections gives users the means to share and access data, while also demanding a revolution in data management techniques, since the exponential growth of social multimedia requires more scalable, effective and...
Multimedia hyperlinking is an emerging research topic in the context of digital libraries and (cultural heritage) archives. We have been studying the concept of video-to-video hyperlinking from a video search perspective in the context of the MediaEval evaluation benchmark for several years. Our task considers a use case of exploring large quantiti...
Currently, popular search engines retrieve documents on the basis of text information. However, integrating the visual information with the text-based search for video and image retrieval is still a hot research topic. In this paper, we propose and evaluate a video search framework based on using visual information to enrich the classic text-based...
This paper introduces a framework for establishing links between related media fragments within a collection of videos. A set of analysis techniques is applied for extracting information from different types of data. Visual-based shot and scene segmentation is performed for defining media fragments at different granularity levels, while visual cues...
Recent years have witnessed the blooming of Web 2.0 content such as Flickr and YouTube, etc. How we can benefit from such rich media resources is still an open and challenging question. In this paper, we present a method combining semantic inferencing and visual analysis for automatically finding media (photos and videos) illustrating events. We re...
This paper introduces a framework for establishing links between related media fragments within a collection of videos. A set of analysis techniques is applied for extracting information from different types of data. Visual-based shot and scene segmentation is performed for defining media fragments at different granularity levels, while visual cues...
This is the abstract for the "Video Hyperlinking" tutorial, presented as part of the 2014 ACM Multimedia Conference. Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video...
The paper presents the LinkedTV approaches for the Search and Hyperlinking (S&H) task at MediaEval 2014. Our submissions aim at evaluating 2 key dimensions: temporal granularity and visual properties of the video segments. The temporal granularity of target video segments is defined by grouping text sentences, or consecutive automatically detect...
User generated content, available in massive amounts on the Internet, is receiving increased attention due to its many potential applications. One such application is the representation of events using multimedia data. In this paper, an event-based cross media question answering system, which retrieves and summarizes events on a given topic, is...
Currently, popular search engines retrieve documents on the basis of text information. However, integrating the visual information with the text-based search for video and image retrieval is still a hot research topic. In this paper, we propose and evaluate a video search framework based on using visual information to enrich the classic text-based...
As the amount of social media shared on the Internet keeps growing, it becomes possible to explore a topic with a novel, people-based viewpoint. We aim at performing topic enriching using media items mined from social media sharing platforms. Nevertheless, such data collected from the Web is likely to contain noise, hence the need to further p...
This book constitutes the refereed proceedings of the 15th Pacific Rim Conference on Multimedia, PCM 2014, held in Kuching, Malaysia, in December 2014. The 35 revised full papers and 6 short papers presented were carefully reviewed and selected from 84 submissions. The papers cover a wide range of topics in the area of multimedia content analysis,...
User generated content, available in massive amounts in social media, is receiving increased attention due to its many potential applications. One such application is the representation of events with multimedia data. This paper addresses the problem of retrieving and summarizing events on a given topic, and proposes a novel and original fra...
Enriching linear videos by offering continuative and related information via, e.g., audio streams, web pages, as well as other videos, is typically hampered by its demand for massive editorial work. While a large number of analysis techniques that extract knowledge automatically from video content exists, their produced raw data are typically not o...
As the amount of social media shared on the Internet keeps growing, it becomes possible to explore a topic with a novel, people-based viewpoint. Contrasting with traditional man-made topic summarization, which provides the personal view of its author, we want to focus on public reaction to events. To this end, we propose an approach to automatic...
This paper aims at presenting the results of LinkedTV's first participation in the Search and Hyperlinking task at the MediaEval challenge 2013. We used textual information, transcripts, subtitles and metadata, and we tested their combination with automatically detected visual concepts. Hence, we submitted various runs to compare diverse approaches a...
Exploiting event context to organize social media draws lots of interest from the multimedia community. In this paper, we present our system, called EventEnricher, to infer the semantics behind events and explore social media to illustrate events. We extend the set of illustrating images for a particular event by querying social media with diverse...
With the rapid development of social media sites, a lot of user generated content is being shared on the Web, leading to new challenges for traditional media retrieval techniques. An event describes a happening at a specific time and place in the real world, and it is one of the most important cues for people to recall past memories. The reminder val...
This book constitutes the proceedings of the 14th Pacific-Rim Conference on Multimedia, PCM 2013, held in Nanjing, China, in December 2013. The 30 revised full papers and 27 poster papers presented were carefully reviewed and selected from 153 submissions. The papers cover a wide range of topics in the area of multimedia content analysis, multimedi...
In recent years, the emergence of social media on the Internet has given rise to much interesting research and many applications. In this paper, a novel framework is proposed to model the visual appearance of social events using automatically collected training samples on the basis of photo context analysis. While collecting positive samples can be achieve...
Enriching linear videos by offering continuative and related information via, e.g., audio streams, web pages, as well as other videos, is typically hampered by its demand for massive editorial work. While there exist several automatic and semi-automatic methods that analyse audio/video content, one needs to decide which method offers appropriate inf...
There is a digital revolution happening right before our eyes; the way we communicate is rapidly changing due to technological advances. Pencil and paper communication is declining drastically and being replaced with newer communication media ranging from emails to SMS/MMS and other instant messaging services. Information/news used to be bro...
The widespread adoption of smartphones equipped with high-quality image-capturing capabilities coupled with the prevalent use of social networks have resulted in an explosive growth of social media content. People now routinely capture the scenes around them and instantly share the multimedia content with their friends over a variety of social netw...
With the keen interest of people in social media sharing websites, the multimedia research community faces new challenges and compelling opportunities. In this paper, we address the problem of discovering specific events from social media data automatically. Our proposed approach assumes that events are a conjoint distribution over the latent topics...
We present a method to automatically detect and identify events from social media sharing web sites. Our approach is based on the observation that many photos and videos are taken and shared when events occur. We select 9 venues across the globe that demonstrate a significant activity according to the EventMedia dataset and we thoroughly evaluate...
This paper deals with information retrieval and semantic indexing of multimedia documents. We propose a generic scheme combining an ontology-based evidential framework and high-level multimodal fusion, aimed at recognising semantic concepts in videos. This work is presented in two stages: first, the adaptation of evidence theory to neural network...
The ever-growing availability of multimedia data, such as video, creates a strong requirement for efficient tools to manipulate and present this type of data in an effective manner. Automatic summarization is one of those tools. The idea is to automatically, and with little or no human interaction, create a short version or subset of key-frames whic...
Multimedia information indexing and retrieval is about developing techniques which allow people to effectively find the media they are looking for. Content-based methods become necessary when dealing with big databases due to the limitations inherent in metadata-based systems. Current technology allows researchers to explore the emotional space whi...
In this book, the authors present the latest research results in the multimedia and semantic web communities, bridging the "Semantic Gap". This book explains, collects and reports on the latest research results that aim at narrowing the so-called multimedia "Semantic Gap": the large disparity between descriptions of multimedia content that can be...
New approaches need to be developed to appropriately scale up current systems for use with very large multimedia databases. This book covers a broad range of topics at various levels of granularity, enabling the semantic web enthusiast to get used to basic machine learning and multimedia analysis techniques while educating the multimedia researcher...
Humans have natural abilities to categorize objects, images or sounds and to place them in specific classes in which they will share some common characteristics, patterns or semantics. More generally, the fundamental task of a classification algorithm is therefore to put data into groups (or classes) according to similarity measures with or without...
Feature extraction is an essential step for processing of multimedia content semantics. Indeed, it is necessary to extract embedded characteristics from within the audiovisual material in order to further analyze its content and then provide a relevant semantic description. This chapter provides an overview of some of the most frequently used low-l...
We present a method combining semantic inferencing and visual analysis for automatically finding media (photos and videos) illustrating events. We report on experiments validating our heuristic for mining media sharing platforms and large event directories in order to mutually enrich the descriptions of the content they host. Our overall goal is to...
This special issue samples the state of the art in large-scale multimedia analysis techniques and explores how advanced multimedia analysis techniques can be leveraged to address the challenges in large-scale data collections. In particular, from a total of 20 submissions, the guest editors selected five representative articles that investigate la...
This paper provides an overview of the Social Event Detection (SED) task, which is organized as part of the MediaEval 2011 benchmarking activity. With the convergence between social networking and multimedia creation and distribution being experienced on a regular basis by hundreds of millions of people worldwide, this task examines how new or stat...