
A System Framework for Concept- and Credibility-Based Multimedia Retrieval


Abstract

We present a multimedia retrieval system framework that incorporates components for processing multimedia content in different modes and languages. The framework provides concept-based information retrieval facilities that apply credibility information for result re-ranking. The architecture combines a direct user interface with a batched evaluation interface for reproducible research in multimedia IR. The demo presents a preliminary version of the system framework and shows a use case based on the ImageCLEF 2011 Wikipedia test collection.
Ralf Bierig
Vienna University of
Favoritenstr. 9-11 / 188
1040 Vienna, Austria
Cristina Serban
Alexandru Ioan Cuza
University of Iasi
Iasi, Romania
Alexandra Siriteanu
Alexandru Ioan Cuza
University of Iasi
Iasi, Romania
Mihai Lupu
Vienna University of
Favoritenstr. 9-11 / 188
1040 Vienna, Austria
Allan Hanbury
Vienna University of
Favoritenstr. 9-11 / 188
1040 Vienna, Austria
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous;
H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing—Indexing methods
General Terms
Concept Learning, Credibility, Experimentation, Framework
Permission to make digital or hard copies of part or all of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights for
third-party components of this work must be honored. For all other uses,
contact the Owner/Author.
Copyright is held by the owner/author(s).
ICMR ’14, Apr 01-04 2014, Glasgow, Scotland
ACM 978-1-4503-2782-4/14/04 ..
Over the past 10 years, the growing social web has established
an impressive selection of very large and complex data
sets. The success of this movement is omnipresent: there
are at least 200 social networks¹ currently active that attract
hundreds of millions of visitors monthly. Wikipedia, as one major
example, is not only the most comprehensive encyclopedia
but has also been shown to match the quality of commercially
edited counterparts (e.g. Britannica) [8]. People spend
considerable amounts of time on social networks where they
contribute and shape the information landscape of the fu-
ture. Flickr draws nearly 70 million visitors monthly to its
collaborative image collection and 30 billion pieces of con-
tent are shared on Facebook every month [6]. The knowledge
contained in these massive data networks is impressive and
could be harvested and put to use for a wide range of subject
areas. Although research has already been directed toward
automatic information mining from these rich sources, the
problem of knowledge extraction from multimedia content
remains difficult. The main challenges are the heterogeneity
of the data, the scalability of the processing methods and the
reliability of the results. The CHIST-ERA MUCKE project
focuses on these areas and aims to create an architecture
and testbed that integrates and evaluates a scalable range
of methods to overcome these challenges. In this demo, we
present a preliminary prototype of the system framework
that highlights the overall architecture and shows a use case
of its application based on the ImageCLEF 2011 Wikipedia
test collection. The demo provides a user interface where the
general searcher can retrieve images from the aforementioned
collection based on their underlying concepts, with results
ranked by their credibility properties. We equally address
the researcher as a user of the system by offering a batch
interface that allows for automatic TREC-style evaluations.
Our aim is that by adding the notion of concept and cred-
ibility to multimedia search, we can help people to extract
content that was formerly hidden and, in addition, rank
the importance of results based on an alternative feature.
The next section describes work that is closely related. We
present the main aspects of our approach in Section 2 and
describe the system architecture for our demo in Section 3
together with an overview of how researchers and users work
with it and how new test collections are integrated. The Im-
ageCLEF 2011 collection is applied as an example collection
for our demo. We elaborate on our plans for the future in
Section 4.
This section describes our focus on multimedia retrieval
— mainly by extending traditional text and image index-
ing with a combined multi-modal concept index and using
credibility as an alternative for re-ranking search results.
2.1 Concept Indexing
Automatically estimating similarity between texts is an
old problem. Purely data-driven methods, such as Latent
Semantic Analysis (LSA) and Explicit Semantic Analysis
(ESA) [2, 7], have been proposed and perform well. Estimating
the similarity of multimedia documents, on the other hand, is
more difficult because of their much larger conceptual spaces.
Methods that exploit low-level features, such as colour and
texture [9], and bag-of-visual-words [1] have problems capturing
and describing the semantic content of images adequately.
Recent studies apply object recognition that relies on invariant
local descriptors [5]. More recently, the use of a large number
of semantic concept detectors has been proposed for multimedia
retrieval [3, 4]. An important innovation of MUCKE will be
a multimedia concept similarity framework for reliable concept
recognition that combines text and image information within
a single framework. Using LSA as a baseline for our proposed
framework, we address the limitations of LSA, which is based
too narrowly on bag-of-words representations and ignores
important linguistic phenomena in the underlying content.
The distinction between text and image concepts takes into
account their different nature. Whereas textual concepts are
often high level (e.g. a car), visual concepts tend to focus
on very specific low-level attributes (e.g. metallic texture of
a car). Combining both flavors of concepts in a common
framework makes them accessible for indexing and searching.
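To make the idea of a common concept space concrete, the following minimal sketch compares documents by the cosine of their weighted concept vectors, with textual and visual concepts kept in one namespaced vector. It is an illustration only and not the MUCKE implementation: the function names, the "T"/"V" prefixes, the concept labels and all weights are assumptions chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine(text_concepts, visual_concepts):
    """Merge textual and visual concept weights into one vector.

    Keys are prefixed with "T" or "V" (a hypothetical convention) so that a
    textual and a visual concept with the same label never collide.
    """
    vec = {("T", c): w for c, w in text_concepts.items()}
    vec.update({("V", c): w for c, w in visual_concepts.items()})
    return vec


# Illustrative documents; the weights are made up for the example.
doc_a = combine({"car": 1.0, "road": 0.5}, {"metallic_texture": 0.8})
doc_b = combine({"car": 0.9}, {"metallic_texture": 0.7, "red_color": 0.4})
doc_c = combine({"tree": 1.0}, {"wood_surface": 0.9})

sim_ab = cosine(doc_a, doc_b)  # shares a textual and a visual concept
sim_ac = cosine(doc_a, doc_c)  # shares no concept at all
```

In this sketch, two documents that share both a high-level textual concept and a low-level visual concept score higher than documents with disjoint concepts, which is the behaviour a combined index is meant to expose.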
2.2 Credibility
An IR system may be viewed in four coarse-grained levels:
a user, the content, a ranking or a retrieval system, and a
user interface. The credibility model used in MUCKE distinguishes
four aspects: expertise, trustworthiness, quality,
and reliability, with the following definitions:
expertise denotes the knowledge ability to provide truthful information;
trustworthiness denotes the estimated intent to provide truthful information;
quality denotes the ability to convey the truthfulness of the information provided;
reliability denotes the consistency of the object’s ability to provide information.
Each of the four above-mentioned aspects may be applied
to each of the four levels where the issue of credibility may
appear. In doing a credibility assessment, while it would be
desirable to consider the system as a whole, in practice we
may choose to focus on a particular aspect and a particular
level of credibility. For instance, in the case of multimedia
streams consisting of images and tags associated with them,
we may focus on the credibility of the user by evaluating the
quality of the tags the user created thus assessing the cred-
ibility of the images based on the user creating their tags.
The aim of the system framework therefore is to support
multimedia retrieval focused on the credibility of its user
base, the content it retrieves, the algorithms that are used
to retrieve and rank its content, and the user interface.
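As a rough illustration of how the four aspects and four levels could be combined with re-ranking, the sketch below stores one score per (level, aspect) pair and re-ranks a result list by a single chosen pair. The linear mix of relevance and credibility, the `alpha` weight, the short level names and every score are assumptions made for this example; the paper does not specify a scoring formula.

```python
from dataclasses import dataclass, field

ASPECTS = ("expertise", "trustworthiness", "quality", "reliability")
# Short, hypothetical names for the four levels of an IR system.
LEVELS = ("user", "content", "system", "interface")


@dataclass
class Result:
    doc_id: str
    relevance: float  # retrieval score, assumed to lie in [0, 1]
    # Credibility scores keyed by (level, aspect), assumed to lie in [0, 1].
    credibility: dict = field(default_factory=dict)


def credibility_score(result, level, aspect):
    """Look up one credibility facet; unknown facets default to 0.0."""
    if level not in LEVELS or aspect not in ASPECTS:
        raise ValueError(f"unknown credibility facet: {level}/{aspect}")
    return result.credibility.get((level, aspect), 0.0)


def rerank(results, level, aspect, alpha=0.5):
    """Re-rank by a linear mix of relevance and one credibility facet.

    alpha = 0 keeps the original relevance order; alpha = 1 ranks by
    credibility alone. The linear mix is an assumption of this sketch.
    """
    def mixed(r):
        return (1 - alpha) * r.relevance + alpha * credibility_score(r, level, aspect)

    return sorted(results, key=mixed, reverse=True)


results = [
    Result("img1", 0.9, {("user", "quality"): 0.1}),
    Result("img2", 0.7, {("user", "quality"): 0.9}),
]
by_credibility = [r.doc_id for r in rerank(results, "user", "quality")]
by_relevance = [r.doc_id for r in rerank(results, "user", "quality", alpha=0.0)]
```

Focusing on one facet at a time mirrors the practical choice described above, where an assessment concentrates on a particular aspect and level (here, the quality of the tagging user) rather than on the system as a whole.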
We describe the system architecture with respect to the
overarching principles that drove our design process and
then highlight the individual components. We then focus
on how the framework satisfies different kinds of users and
briefly describe how content collections are integrated.
3.1 Architectural Principles
In the context of credibility and concept-based multimedia,
as described above, the framework offers a set of architectural
principles that make it appealing for the research community:
Highly modularized and configurable components allow
researchers to apply the various parts of the framework
as independent variables for their experiments.
Evaluation-based research is supported by the ability
to evaluate diverse settings as variables using automatic
batch execution.
Reproducible research is fostered by capturing the configurations
of key components as part of the evaluation and
storing them for documentation and reuse. Researchers
can retrieve not only the results but also the settings
that generated the results.
The framework is extensible in terms of
new test collections, to better evaluate existing
methods in different content settings, and
new logic (e.g. new indexing and search methods),
to widen the scope of retrieval and explore new
algorithms and approaches.
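The principles of batch execution and reproducibility can be sketched as follows: every combination of component settings is evaluated automatically, and each result is stored together with the configuration that produced it. This is a minimal sketch under stated assumptions; `run_batch`, `toy_evaluate`, the variable names and the JSON serialization are hypothetical and stand in for whatever the framework actually uses.

```python
import itertools
import json


def run_batch(variables, evaluate):
    """Evaluate every combination of settings and keep settings with results.

    variables maps a setting name to the list of values to try;
    evaluate is any callable that takes one configuration dict.
    """
    names = sorted(variables)
    records = []
    for values in itertools.product(*(variables[n] for n in names)):
        config = dict(zip(names, values))
        records.append({"config": config, "result": evaluate(config)})
    return records


def toy_evaluate(config):
    # Placeholder for a real evaluation run; it returns a made-up score so
    # that the sketch stays self-contained.
    return {"score": len(config["index"]) + (1 if config["rerank"] else 0)}


# Hypothetical independent variables of an experiment.
variables = {"index": ["text", "concept"], "rerank": [False, True]}
records = run_batch(variables, toy_evaluate)

# Serializing the records keeps each result next to its settings, so a run
# can be documented and repeated later.
serialized = json.dumps(records, sort_keys=True)
```

Keeping the configuration inside each record, rather than in a separate log, means a stored result can always be traced back to the exact settings that generated it.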
3.2 Framework Components
The components of the system framework are depicted
in Figure 1 and represent the conceptual elements of the
framework.
The concept package describes the concept model. Every
indexed document consists of a list of Field objects.
There are three types of Field objects: TextField, TagField
and ImageField, corresponding to the different
types of data a document can contain: text, tags and
images. By processing these fields, a list of concept objects
is obtained. We make a distinction between two
types of concepts: textual and visual. The distinction
is motivated by their different nature. Whereas textual
concepts tend to be high level (e.g. tree, house or car),
visual concepts are often treated as low-level features
(e.g. rectangular shapes, red color or wood surface
pattern). Combining both kinds of concepts in a single
framework makes it possible to combine both
levels of granularity and to take advantage of each
level of detail. To extract concepts from fields, three
document field types (text, tag and image) are applied,
each of which may have alternative implementations. This leads
to a potentially large variety of ways in which concepts
can be extracted from multimedia content.
Figure 1: Components of the Concept and Credibility-Based Multimedia IR Framework
The credibility package provides an interface for the
four different types of credibility described in Section 2.2.
The system framework currently encapsulates two of
the four types of credibility. First, the credibility of the
user who produced the content (e.g. a user who created an
image on Flickr) is maintained by the user profiler. Second,
the objective credibility of the content (e.g. the
source, language or link structure of a document)
is generated by the content profiler. The credibility
of the ranking or retrieval system and the credibility
of the user interface are currently not addressed, but
interfaces are provided to add these elements in the
future.
The index package offers interfaces to a text index, an
image index and a concept index. The text index is
represented by a classic Lucene² inverted text index
where indexed documents are represented by a list of
tokenized text fields. The image index is implemented
using Lucene Lire³, which incorporates image features
(e.g. color distributions). The concept index uses the
concept package to transform text, tags and images
into text concepts (TConcepts) and visual concepts
(VConcepts) before storing them in a concept-based
index storage that is made available for concept search
by the search package.
The search package provides search functionality for
queries using text, images and concepts in any combination.
For every search, the query text, the image ID
or a list of concepts is provided, along with an optional
list of already existing results that may be applied as
a filter to produce the final list of results. The search is
separated from index and query to keep the framework
flexible and because it is generally required to involve
a combination of several indices, either in parallel or
in sequence.
The clustering package provides tools for merging results
into ranked lists, that is, tools for re-ranking
search results of different modalities while considering
the similarity and the credibility of the content.
The query package processes input strings or input image
collections into unified query objects that consist
of fields. There are three kinds of fields representing
the three types of queries the framework handles: text,
tags and image. These fields allow queries to be arbitrary
combinations of different multimodal elements
that can be used interchangeably within the framework.
The batch interface provides batch access to the framework,
where evaluations can be configured and executed
by researchers. Currently, this interface is controlled
by configuration files, one for each type of evaluation
run.
The user interface package provides a Google-like
web interface for general users to query for images.
The plugin package enables researchers to inject additional
functionality, in particular configuration and
logic relating to additional IR test collections. This
is key for an extensible system architecture. A plugin
optionally relates to a collection outside the code
base (via a link) and may also generate its own indices.
At a minimum, it has a configuration that defines
the location of collections and indices and provides
the parameterization of additional operations. Plugins,
once added, can then be used for batched IR evaluation
runs in any combination and with diverse parameterizations
within the framework.
The evaluation package deals with the evaluation envi-
ronment (i.e. the evaluation parameters and the eval-
uation results) and provides an interface to common
IR metrics (e.g. p@10, MAP and R-Precision). Evalu-
ation environments can be expanded with plugins and
are stored with the data package.
The data package provides a data model for the system
framework that is stored in a database. Besides keep-
ing a history of evaluation results it also stores the
parametric settings of the evaluation runs that pro-
duced the results. This is an attempt to not only cap-
ture results but also to keep information about how
they were generated, hence creating a more holistic
and reproducible research landscape.
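The metrics named for the evaluation package have standard definitions, which the following self-contained sketch spells out for binary relevance judgements. The function names and the example ranking are illustrative assumptions; the framework's own metric interface is not described in the text, and the sketch assumes a ranking contains no duplicate document IDs.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k results that are relevant (P@k)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant hit."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranking, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for d in ranking[:r] if d in relevant) / r


def mean_average_precision(runs):
    """MAP over a list of (ranking, relevant) pairs, one pair per topic."""
    if not runs:
        return 0.0
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Illustrative topic: five retrieved documents, three relevant ones,
# one of which ("d6") was never retrieved.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}

p5 = precision_at_k(ranking, relevant, k=5)
ap = average_precision(ranking, relevant)
rp = r_precision(ranking, relevant)
```

For this example, two of the five retrieved documents are relevant, so P@5 is 2/5; average precision is (1/1 + 2/3) / 3; and R-Precision is 2/3 because two of the top three results are relevant.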
3.3 User Access
The system framework serves two kinds of users, researchers
and general users, and provides user interfaces for both
user groups. An instance of the standard user interface
is shown in Figure 2: a minimalistic Google-like
image search interface for the general searcher. The
much more complex batch interface is provided by means
of configuration files that instruct the system framework to
perform automatic evaluations on test collections and store
both settings and results in the system database through the
evaluation package and the data package.
Figure 2: Standard Multimedia User Interface for
image search
3.4 Content Integration
The system framework provides general support for integrating
a wide range of different IR test collections. Alternative
test collections can be included as plugins with additional
logic and adapted configuration parameters. These
plugins define the location and structure of new data sets
and inject new logic for indexing and searching. The aim
of the framework is to provide a comprehensive set of functionality
that minimizes the need for programming and
reduces the integration work to adapting plugin parameters
(e.g. by pointing to the test collection or by configuring
existing indexing and search parameters of the core API).
The demo applies the ImageCLEF 2011 Wikipedia test collection,
which consists of 237,434 Wikipedia images taken from a
September 2009 dump. The images cover a wide topical range and
are associated with unstructured and noisy text in German, French
and English. The collection provides 50 topics with binary
relevance judgements. This collection represents a sound example
for a use case as it provides collaboratively created
multimodal data that crosses three common languages and
provides low-level text and image features that can be used
for concept detection, indexing and search.
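The plugin-based integration described above could look roughly like the following sketch, in which a test collection is added by supplying configuration rather than new code. Everything here is hypothetical: the `CollectionPlugin` and `PluginRegistry` classes, the field names and the file-system paths are assumptions for illustration and do not reflect the framework's actual plugin API. Only the collection facts (three languages, 50 topics) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionPlugin:
    """Minimal plugin configuration: where the data and the indices live."""
    name: str
    collection_path: str
    index_path: str
    languages: tuple = ()
    params: dict = field(default_factory=dict)


class PluginRegistry:
    """Keeps plugins by name so that batch runs can refer to them."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if plugin.name in self._plugins:
            raise ValueError(f"plugin already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def get(self, name):
        return self._plugins[name]

    def names(self):
        return sorted(self._plugins)


registry = PluginRegistry()
registry.register(CollectionPlugin(
    name="imageclef2011-wikipedia",
    collection_path="/data/collections/imageclef2011",  # hypothetical path
    index_path="/data/indices/imageclef2011",           # hypothetical path
    languages=("de", "fr", "en"),
    params={"topics": 50},
))
```

With such a registry, adding another test collection would amount to registering one more configuration object, which is the kind of parameter-only integration the framework aims for.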
The framework will be improved and extended throughout
the project lifecycle as a platform that hosts our research on
multimedia IR as it is needed by all MUCKE project part-
ners. We plan to mature it to a quality where it can be
released as an open source project for the benefit of the IR
community. Future work will focus on the improvement and
extension of the text and visual concept detection compo-
nents that directly support improved indexing and searching
capabilities, and the inclusion of alternative models of cred-
ibility that support all four credibility types. Furthermore,
we plan to turn the framework into a service architecture
that would allow it to work in distributed environments.
MUCKE is located within the CHIST-ERA Program for
Transnational Research Projects and funded by the Austrian
Science Foundation (FWF) project number I 1094-N23.
[1] G. Csurka, C. Dance, L. Fan, J. Willamowski, and
C. Bray. Visual categorization with bags of keypoints.
Workshop on Statistical Learning in Computer Vision,
[2] E. Gabrilovich and S. Markovitch. Computing semantic
relatedness using Wikipedia-based explicit semantic
analysis. In Proceedings of the 20th International Joint
Conference on Artificial Intelligence, IJCAI’07, pages
1606–1611, San Francisco, CA, USA, 2007. Morgan
Kaufmann Publishers Inc.
[3] A. Hauptmann, R. Yan, W.-H. Lin, M. Christel, and
H. Wactlar. Can high-level concepts fill the semantic
gap in video retrieval? A case study with broadcast
news. IEEE Transactions on Multimedia, 9(5):958–966,
[4] L.-J. Li, H. Su, Y. Lim, and L. Fei-Fei. Objects as
attributes for scene classification. In K. Kutulakos,
editor, Trends and Topics in Computer Vision, volume
6553 of Lecture Notes in Computer Science, pages
57–69. Springer Berlin Heidelberg, 2012.
[5] D. G. Lowe. Distinctive image features from
scale-invariant keypoints. International Journal of
Computer Vision, 60(2):91–110, 2004.
[6] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs,
C. Roxburgh, and A. H. Byers. Big data: The next
frontier for innovation, competition, and productivity.
McKinsey Global Institute, 2011.
[7] K. Radinsky, E. Agichtein, E. Gabrilovich, and
S. Markovitch. A word at a time: computing word
relatedness using temporal semantic analysis. In
WWW, pages 337–346, 2011.
[8] N. J. Reavley, A. J. Mackinnon, A. J. Morgan,
M. Alvarez-Jimenez, S. E. Hetrick, E. Killackey,
B. Nelson, R. Purcell, M. B. Yap, A. F. Jorm, et al.
Quality of information sources about mental disorders:
a comparison of Wikipedia with centrally controlled web
and printed sources. Psychological Medicine, 42(8):1753,
[9] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta,
and R. Jain. Content-based image retrieval at the end
of the early years. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 22(12):1349–1380,