MKLab Interactive Video Retrieval System
Stefanos Vrochidis, Paul King, Lambros Makris, Anastasia Moumtzidou, Vasileios Mezaris
and Ioannis Kompatsiaris
Informatics and Telematics Institute / Centre for Research and Technology Hellas
6th Km Charilaou-Thermi Road, 57001 Thermi-Thessaloniki, Greece
+302310464160
{stefanos, king, lmak, moumtzid, bmezaris, ikom}@iti.gr
ABSTRACT
In this paper, the MKLab interactive video retrieval system is
described.
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval – retrieval models, search process.
General Terms
Algorithms, Performance, Design, Experimentation
Keywords
Search engine, retrieval, visual, hybrid, video, MPEG-7, query
1. INTRODUCTION
The search engine implemented by MKLab handles video resources by integrating several complementary search modules.
2. VIDEO RETRIEVAL SYSTEM
In general, the developed application is a hybrid interactive retrieval system that combines basic retrieval functionalities with a user-friendly interface supporting the submission of queries and the accumulation of relevant retrieval results.
The following basic retrieval modules are supported:
• Visual similarity search module
• Textual information processing module
The search system is built on web technologies (specifically PHP, JavaScript, and a MySQL database) and provides a GUI for performing retrieval tasks over the Internet. Using this GUI, the user can invoke any of the supported retrieval functionalities. Retrieval results are presented in descending rank order, each linked to its temporally neighboring shots. Relevant shots are stored in a structure that mimics the shopping cart found in electronic commerce sites.
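The paper does not detail how the cart is implemented. The following Python sketch is purely illustrative: the class name, method names, and the keying of shots by (video_id, shot_id) are assumptions, and the real system would persist this state server-side (e.g. in MySQL).

# Hypothetical sketch of the "shopping cart" for relevant shots.
class ShotCart:
    """Accumulates shots the user marks as relevant during a session."""

    def __init__(self):
        self._shots = []   # insertion-ordered, duplicate-free

    def add(self, video_id, shot_id):
        key = (video_id, shot_id)
        if key not in self._shots:   # ignore duplicate additions
            self._shots.append(key)

    def remove(self, video_id, shot_id):
        key = (video_id, shot_id)
        if key in self._shots:
            self._shots.remove(key)

    def contents(self):
        return list(self._shots)

cart = ShotCart()
cart.add("video_042", 17)
cart.add("video_042", 18)
print(cart.contents())   # [('video_042', 17), ('video_042', 18)]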
3. RETRIEVAL MODULES DESCRIPTION
3.1 Visual Similarity Search
Content-based similarity search is realized using MPEG-7 visual descriptors capturing different aspects of human perception, such as color and texture [1]. These descriptors are concatenated into a feature vector that compactly represents each image in the multidimensional feature space. An R-tree [2] index is constructed off-line from the feature vectors of all images. In the query phase, a feature vector is extracted from the example image and submitted to the index structure. The set of resulting images is then ranked using custom distance metrics between their feature vectors and that of the query.
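A minimal Python sketch of this query phase follows. It assumes per-image MPEG-7 descriptors are already extracted as NumPy arrays; a brute-force scan stands in for the off-line R-tree index, and Euclidean distance stands in for the custom distance metrics, neither of which is specified in this form by the paper.

import numpy as np

def make_feature_vector(descriptors):
    """Concatenate the individual MPEG-7 descriptors into one vector."""
    return np.concatenate(descriptors)

def build_index(images):
    """Off-line step: one feature vector per image in the collection."""
    ids = list(images)
    matrix = np.stack([make_feature_vector(images[i]) for i in ids])
    return ids, matrix

def similarity_search(example_descriptors, ids, matrix, k=5):
    """Query step: rank collection images by distance to the example."""
    q = make_feature_vector(example_descriptors)
    dists = np.linalg.norm(matrix - q, axis=1)   # Euclidean stand-in
    order = np.argsort(dists)[:k]
    return [(ids[i], float(dists[i])) for i in order]

# Toy collection: a 3-D "color" and a 2-D "texture" descriptor per image.
rng = np.random.default_rng(0)
collection = {"img%d" % i: [rng.random(3), rng.random(2)] for i in range(100)}
ids, matrix = build_index(collection)
print(similarity_search(collection["img7"], ids, matrix, k=3))  # img7 ranks first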
Figure 1. GUI of MKLab Video Retrieval System
3.2 Textual Information Processing Module
Text queries are matched against audio annotations automatically transcribed from the video. A controlled-vocabulary paradigm is adopted rather than full-text indexing. Indexing relies on thesauri built from concepts identified in LSCOM, CalTech-256, the U.S. Library of Congress Name Authorities, and the GEOnet Names Server. Thesaural relationships are automatically expanded by referencing the lexical resources WordNet and Wikipedia, as well as by applying a clustering algorithm based on the Levenshtein similarity metric [3].
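For illustration, a minimal Python implementation of the Levenshtein similarity such clustering could rely on is sketched below; the normalization by the longer string's length and the 0.8 example threshold are assumptions, not details taken from the paper or from [3].

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a, b):
    """Edit distance normalized to [0, 1]; 1.0 means identical terms."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

# Near-identical vocabulary terms fall in the same cluster at, say, 0.8:
print(levenshtein_similarity("colour", "color"))   # ~0.83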
Index and search terms are stemmed using the Porter algorithm, and ranking is achieved with TF-IDF (term frequency-inverse document frequency) weighting. Facet analysis was employed during domain analysis.
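A compact Python sketch of this ranking step is given below. A crude suffix stripper stands in for the full Porter stemmer, and the toy shot transcripts are invented for illustration.

import math
from collections import Counter

def stem(term):
    """Crude suffix stripper standing in for the Porter algorithm."""
    changed = True
    while changed:
        changed = False
        for suffix in ("ing", "ed", "es", "s"):
            if term.endswith(suffix) and len(term) > len(suffix) + 2:
                term = term[: -len(suffix)]
                changed = True
                break
    return term

def tfidf_rank(query, docs):
    """Rank documents (e.g. shot transcripts) against a text query."""
    tokenized = {d: [stem(t) for t in text.lower().split()]
                 for d, text in docs.items()}
    df = Counter()                      # document frequency per term
    for terms in tokenized.values():
        df.update(set(terms))
    n = len(docs)
    q_terms = [stem(t) for t in query.lower().split()]
    scores = {}
    for d, terms in tokenized.items():
        tf = Counter(terms)             # raw term frequency in this doc
        scores[d] = sum((tf[t] / len(terms)) * math.log(n / df[t])
                        for t in q_terms if t in tf)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

shots = {
    "shot1": "reporter standing near burning building",
    "shot2": "crowd marching through city streets",
    "shot3": "firefighters battling building fires",
}
print(tfidf_rank("burning buildings", shots))   # shot1 ranks first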
4. REFERENCES
[1] V. Mezaris, H. Doulaverakis, S. Herrmann, et al., "Combining Textual and Visual Information Processing for Interactive Video Retrieval," TRECVID 2004 Workshop, Gaithersburg, MD, USA, November 2004.
[2] A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," ACM SIGMOD International Conference on Management of Data (SIGMOD '84), Boston, MA, USA, 1984.
[3] L. Specia and E. Motta, "Integrating Folksonomies with the Semantic Web," The Semantic Web: Research and Applications, pp. 624-639, 2007.