Camille Guinaudeau’s research while affiliated with Heidelberg Institute for Theoretical Studies and other places


Publications (14)

Investigating domain-independent NLP techniques for precise target selection in video hyperlinking

September 2014


33 Reads


Camille Guinaudeau



Guillaume Gravier

Automatic generation of hyperlinks in multimedia video data is a subject of growing interest, as demonstrated by recent work undertaken in the framework of the Search and Hyperlinking task within the MediaEval benchmark initiative. In this paper, we compare NLP-based strategies for precise target selection in video hyperlinking exploiting speech material, with the goal of providing hyperlinks from a specified anchor to help information retrieval. We experimentally compare two approaches that select short portions of videos which are relevant and possibly complementary with respect to the anchor. The first approach exploits a bipartite graph relating utterances and words to find the most relevant utterances. The second one uses explicit topic segmentation, whether hierarchical or not, to select the target segments. Experimental results are reported on the MediaEval 2013 Search and Hyperlinking dataset, which consists of BBC videos, demonstrating the interest of hierarchical topic segmentation for precise target selection.


Graph-based Local Coherence Modeling

August 2013


161 Reads


125 Citations

We propose a computationally efficient graph-based approach for local coherence modeling. We evaluate our system on three tasks: sentence ordering, summary coherence rating and readability assessment. The performance is comparable to entity grid based approaches though these rely on a computationally expensive training phase and face data sparsity problems.

Figure 1: Overview of the search and hyperlinking task.
Table 6: MAP results for Hyperlinking sub-task.
Multimedia Information Seeking through Search And Hyperlinking

April 2013


160 Reads


51 Citations






Martha Larson

Searching for relevant webpages and following hyperlinks to related content is a widely accepted and effective approach to information seeking on the textual web. Existing work on multimedia information retrieval has focused on search for individual relevant items or on content linking without specific attention to search results. We describe our research exploring integrated multimodal search and hyperlinking for multimedia data. Our investigation is based on the MediaEval 2012 Search and Hyperlinking task. This includes a known-item search task using the Blip10000 internet video collection, where automatically created hyperlinks link each relevant item to related items within the collection. The search test queries and link assessments for this task were generated using the Amazon Mechanical Turk crowdsourcing platform. Our investigation examines a range of alternative methods which seek to address the challenges of search and hyperlinking using multimodal approaches. The results of our experiments are used to propose a research agenda for developing effective techniques for search and hyperlinking of multimedia content.

HITS and IRISA at MediaEval 2013: Search and hyperlinking task

January 2013


31 Reads


9 Citations

This paper describes our approach and results in the hyperlinking sub-task at MediaEval 2013. A two-step method is implemented where the first step consists in establishing a shortlist of relevant videos. In the second step, a target segment is selected from each video in the shortlist. We focus on target selection, comparing two distinct strategies. The first one exploits a bipartite graph relating utterances and words to find the most relevant utterances from which segments are derived. The second one uses explicit topic segmentation, whether hierarchical or not, to select the target segments.

IRISA at MediaEval 2012: Search and Hyperlinking Task

October 2012


35 Reads


4 Citations

We describe our approach and results for the Hyperlinking sub-task at MediaEval 2012. We approached this as an Information Retrieval task and used re-ranking strategies for finding relevant videos. A three-step approach was then applied to the results to extract the most relevant part of the video with respect to the query content. Our results show that re-ranking strategies and integration of metadata information both improve the system performance.

Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

April 2012


58 Reads


35 Citations

Computer Speech & Language

Transcript-based topic segmentation of TV programs faces several difficulties arising from transcription errors, from the presence of potentially short segments and from the limited number of word repetitions to enforce lexical cohesion, i.e., lexical relations that exist within a text to provide a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information. On the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are presented on two corpora with distinct characteristics, composed respectively of broadcast news and reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV contents, as well as its genericity over different programs.
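As a rough illustration of the kind of lexical cohesion measure the abstract refers to, the sketch below scores a segment by its log-probability under a unigram language model estimated on the segment itself, with Laplace smoothing. This is a simplified assumption for illustration only: the function name `lexical_cohesion`, the toy word lists and the smoothing choice are hypothetical, and the extensions described in the abstract (confidence measures, semantic relations, language model interpolation) are not modeled.

```python
import math
from collections import Counter


def lexical_cohesion(segment, vocab_size):
    """Log-probability of a segment under a Laplace-smoothed unigram
    model estimated from the segment itself (illustrative assumption).

    Segments that repeat their own vocabulary score higher (closer to 0).
    """
    counts = Counter(segment)
    total = len(segment)
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in segment
    )


# Toy segments (hypothetical data): one repeats its words, one does not.
repetitive = ["budget", "vote", "budget", "vote", "budget", "vote"]
varied = ["budget", "vote", "storm", "match", "film", "strike"]
vocab_size = len(set(repetitive) | set(varied))

print(lexical_cohesion(repetitive, vocab_size))
print(lexical_cohesion(varied, vocab_size))
```

With few word repetitions, as in the second toy segment, the score drops, which is one way to see why short segments and limited repetition make cohesion-based segmentation of transcripts harder.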

Structuration automatique de flux télévisuels

December 2011


10 Reads


5 Citations

The increasing quantity of video material available requires the implementation of automatic structuring techniques that can facilitate access to the information contained in documents, while being generic enough to be able to structure different kinds of videos. For this, we develop two kinds of thematic structuring of TV shows, linear or hierarchical, based on the automatic transcripts of the speech uttered in the programs. These transcripts, independent of the type of documents considered, are exploited using natural language processing (NLP) methods. The two structuring techniques, as well as the topic segmentation phase on which they rely, have led to several original contributions. First, the topic segmentation technique employed, originally developed for text, is adapted to the peculiarities of professional video transcripts - transcription errors, limited number of repetitions. The lexical cohesion criterion on which the segmentation step is based is, indeed, sensitive to these characteristics, which severely penalizes the algorithm's performance. This adaptation is implemented, on the one hand, by taking into account, during the lexical cohesion computation, linguistic knowledge and automatic speech recognition and signal information (semantic relations, prosody, confidence measures), and, on the other hand, by using language model interpolation techniques. From this topic segmentation step, we propose a method for linear thematic structuring that is able to connect segments addressing similar topics. The method, based on a technique from the information retrieval domain, is adapted to audiovisual data through prosodic cues, which help to promote prominent words in the speech, and semantic relations. Finally, we propose an exploratory work that studies different ways to adapt a linear topic segmentation algorithm to a hierarchical topic segmentation task. 
For this, the linear topic segmentation algorithm is modified - adjustment of the lexical cohesion computation, use of lexical chains - to reflect the distribution of the vocabulary in the document to be segmented. Experiments conducted on three corpora composed of broadcast news and reports on current affairs, manually and automatically transcribed, show that the proposed adjustments lead to improved performance of the structuring methods developed.

Figure 1 Overview of the architecture of the video indexing  
Figure 2 News trends over different periods  
Figure 3 Different displays of the same query results  
A Scalable Video Search Engine Based on Audio Content Indexing and Topic Segmentation

September 2011


117 Reads


3 Citations

One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until lately, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition systems and natural language processing have led to a number of robust tools that allow us to provide users with quicker, more focussed access to relevant segments of one or more news broadcast videos. Here we present our new interface for browsing or searching news broadcasts (video/audio) that exploits these new language processing tools to (i) provide immediate access to topical passages within news broadcasts, (ii) browse news broadcasts by events as well as by people, places and organisations, (iii) perform cross lingual search of news broadcasts, (iv) search for news through a map interface, (v) browse news by trending topics, and (vi) see automatically-generated textual clues for news segments, before listening. Our publicly searchable demonstrator currently indexes daily broadcast news content from 50 sources in English, French, Chinese, Arabic, Spanish, Dutch and Russian.

Figure 1: Recall/precision curves for topic tracking based on a combination of the tf-idf criterion and acoustic features.
Table 1: Vector characterization using tf-idf, prosodic information and both information types
Table 2: Topic tracking results with segment characterization using AIE information only (F1-measure)
Table 3: Topic tracking results with segment characterization using tf-idf criterion and prosodic scores (F1-measure)
Accounting for Prosodic Information to Improve ASR-Based Topic Tracking for TV Broadcast News

August 2011


128 Reads


13 Citations

The increasing quantity of video material available online requires improved methods to help users navigate such data, among which are topic tracking techniques. The goal of this paper is to show that prosodic information can improve an ASR-based topic tracking system for French TV Broadcast News. To this end, two kinds of prosodic information - extracted with and without a learning phase - are integrated in the system. This integration shows significant improvements in the F1-measure, by 13 and 8 points for the two techniques compared with the baseline system.

Table 1: Comparison of the news and reports corpora in terms of word repetitions and of confidence measures.
Figure 2: Principle of the speech-based validation of labels obtained from EPG alignment.
Figure 3: Results of the validation of the labels provided by the alignment of the stream with the EPG.
Table 4: Example of queries formed based on subsets of the 5 best-scored keywords. Queries in bold include at least one misrecognized word.
Exploiting Speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation

January 2011


85 Reads


8 Citations

EURASIP Journal on Image and Video Processing

The gradual migration of television from broadcast diffusion to Internet diffusion offers countless possibilities for the generation of rich navigable contents. However, it also raises numerous scientific issues regarding delinearization of TV streams and content enrichment. In this paper, we study how speech can be used at different levels of the delinearization process, using automatic speech transcription and natural language processing (NLP) for the segmentation and characterization of TV programs and for the generation of semantic hyperlinks in videos. Transcript-based video delinearization requires natural language processing techniques robust to transcription peculiarities, such as transcription errors, and to domain and genre differences. We therefore propose to modify classical NLP techniques, initially designed for regular texts, to improve their robustness in the context of TV delinearization. We demonstrate that the modified NLP techniques can efficiently handle various types of TV material and be exploited for program description, for topic segmentation, and for the generation of semantic hyperlinks between multimedia contents. We illustrate the concept of cross-media semantic navigation with a description of our news navigation demonstrator presented during the NEM Summit 2009.

Citations (12)

... This does not provide the best results, as each representation still belongs to its own representation space. It is also possible to utilize two separate modalities by performing a linear combination [12] of the similarities obtained by comparing each of the two modalities. We use these two methods as a baseline to compare standard autoencoders and bidirectional deep neural networks against. ...
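The linear combination of per-modality similarities mentioned in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited system: the function names `cosine` and `fused_similarity`, the weight `alpha` and the toy vectors are all hypothetical, and cosine similarity is assumed as the per-modality measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_similarity(text_a, text_b, visual_a, visual_b, alpha=0.5):
    """Weighted sum of the text and visual similarities between two segments.

    alpha is an assumed mixing weight in [0, 1]; alpha = 1 uses text only.
    """
    return alpha * cosine(text_a, text_b) + (1.0 - alpha) * cosine(visual_a, visual_b)


# Toy vectors (hypothetical): identical text representations, orthogonal visual ones.
score = fused_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], alpha=0.5)
print(score)
```

Each modality is compared only within its own representation space, and the two scores are mixed afterwards, which matches the excerpt's remark that the representations remain separate.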


A Crossmodal Approach to Multimodal Fusion in Video Hyperlinking
HITS and IRISA at MediaEval 2013: Search and hyperlinking task

... The measure of lexical cohesion is very sensitive to the errors found in TV show transcripts and the reduced number of words in these transcripts. Therefore, we can find in the literature some efforts like those presented in [59] to adapt the lexical cohesion to the peculiarities of the words pronounced in TV shows (e.g., use confidence measures for the words). ...

Structuration automatique de flux télévisuels

... The vast majority of approaches developed for the selection step rely on direct pairwise content-based similarity, seeking targets whose content is very similar to the anchor. Unsurprisingly, most use textual and/or visual content comparison [14,7,2,17,12,1,6,21]. Maximizing content-based similarity between anchors and targets was shown to offer good relevance, as evidenced in [14] where n-gram bag-of-words are used to emphasize segments sharing common sequences of words. ...

IRISA at MediaEval 2012: Search and Hyperlinking Task

... To link the anchors (source videos) and targets (destination videos), several technical approaches are proposed. [36] and Video-to-Text (VTT) [37] proposed to link two videos by both visual clues and text clues, with ResNet-152 features extracted from frames and text features encoded by LSTM. Ad-Hoc Video Search (AVS) further combines VTT and a text-based module that extracts the on-screen text and speech text to achieve video-text search. ...

Multimedia Information Seeking through Search And Hyperlinking

... Wow!), to which transcription errors are added. The promising results of earlier research on information extraction from news broadcasts [Gotoh and Renals, 2000; Guinaudeau et al., 2009] motivated the direction of our work, but with the longer-term goal of processing less straightforward documents. ...

Can Automatic Speech Transcripts Be Used for Large Scale TV Stream Description and Structuring?

... The metadata provided by the TV channels such as the EPG or EIT (e. g. [2,17,18]). 2. The metadata extracted from the signal itself such as the speech transcripts (e. g. [24]), teletext or the recognition of opening and closing credits of some specified programs. In this article, the techniques of the literature are categorized into two main categories: ...

Improving ASR-based topic segmentation of TV programs with confidence measures and semantic relations

... Thus, methods for automatic prominence detection can have various uses in spoken language applications, such as during the development of text-to-speech (TTS) systems where it is particularly important to achieve a naturalistic production of speech (see, e.g., [6,7]). Similarly, there are various applications based on automatic speech recognition (ASR) systems such as that of spoken content retrieval [8] and topic tracking [9]. ...

Accounting for Prosodic Information to Improve ASR-Based Topic Tracking for TV Broadcast News

... Their method did not require training data and they claimed that it can be applied to any text. (Guinaudeau et al., 2012) proposed modifications of the computation of the lexical cohesion to make the algorithm proposed by (Utiyama and Isahara, 2001) more robust to the peculiarities of automatic transcripts of TV programs (compared to written text). (Scaiano and Inkpen, 2012) performed scene segmentation in a movie using the text of the subtitles. ...

Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Computer Speech & Language

... This work relies on supervised learning (classifiers for relation versus no relation, same-event relation versus continuation…). [9] is also concerned with establishing content-based links, but does so by exploiting the transcription of the speech in one month of France 2 television news broadcasts. Taking into account, within a text topic segmentation system, the confidence measures associated with the transcribed words and semantic relations learned from corpora makes it possible to compensate for the peculiarities of the transcripts and to automatically provide a division into successive topics. Extracting a few keywords from each topic, again by modifying the traditional salience measures with the confidence measures, allows on the one hand the creation of links between similar topics in different news broadcasts, and also links to Web pages providing information complementary to the points covered (cf. ...

Exploiting Speech for Automatic TV Delinearization: From Streams to Cross-Media Semantic Navigation

EURASIP Journal on Image and Video Processing