Multimedia Systems (2005) 10(6): 544–558
DOI 10.1007/s00530-005-0181-8
REGULAR PAPER
Silvia Pfeiffer · Conrad Parker · André Pang
The Continuous Media Web: a distributed multimedia information
retrieval architecture extending the World Wide Web
Published online: 8 August 2005
© Springer-Verlag 2005
Abstract The World Wide Web, with its paradigms of surf-
ing and searching for information, has become the predom-
inant system for computer-based information retrieval. Me-
dia resources, however information-rich, only play a minor
role in providing information to Web users. While band-
width (or the lack thereof) may be an excuse for this situ-
ation, the lack of surfing and searching capabilities on me-
dia resources are the real issue. We present an architecture
that extends the Web to media, enabling existing Web in-
frastructures to provide seamless search and hyperlink ca-
pabilities for time-continuous Web resources, with only mi-
nor extensions. This makes the Web a true distributed infor-
mation system for multimedia data. The article provides an
overview of the specifications that have been developed and
submitted to the IETF for standardization. It also presents
experimental results with prototype applications.
Keywords Continuous Media Web · Annodex · CMML · Markup of media · Open media metadata standards
1 Introduction
Since the Web’s very foundation by Tim Berners-Lee, aca-
demics, entrepreneurs, and users have dreamt of ways to in-
tegrate the Web’s various multimedia contents into one dis-
tributed system for storage and retrieval. The power of the
World Wide Web stems from being a world-wide distributed
information storage system architected to be scalable, simple
to author for, and simple to use. Information is found mostly
through Web search engines such as Google or through Web
portals, both of which commonly just provide starting points
for hyperlinking (surfing) to other related documents. In-
formation nowadays encompasses heterogeneous Web re-
sources of all sorts, including media. The uniform means
of accessing information on the Web is through hyperlinks
S. Pfeiffer (✉) · C. Parker · A. Pang
CSIRO ICT Centre, PO Box 76, Epping, NSW 1710, Australia
E-mail: {Silvia.Pfeiffer, Conrad.Parker, Andre.Pang}@csiro.au
given in the URI (Uniform Resource Identifier) format (see
http://www.w3.org/Addressing/).
The World Wide Web is currently the predominant dis-
tributed information retrieval architecture on the Internet [2].
However, it is not yet a distributed information retrieval sys-
tem for time-continuously sampled data such as audio and
video. It is only set up well for information retrieval on
HTML resources. Although the amount of audio and video data on the Internet is increasing, these documents cannot be used as intensively as HTML content.
While URIs provide a uniform means to access and ex-
plore information, they can only point to a complete audio
or video file, not into a specific temporal offset or a spe-
cific segment of interest. Also, when viewing an audio or
video resource, the hyperlinking functionality of the Web is
“left behind” and a dead end is reached. Audio or video re-
sources cannot typically hyperlink to further Web resources,
thus interrupting the uniform means of information explo-
ration provided by the Web.
Web information search is also deficient for media re-
sources. While it is nowadays possible to search the content
of binary text documents, such as Postscript files or word
processor documents, and identify whether they contain in-
formation relevant to a particular query, this is not possible
for media resources. Media resources do not currently have
a textual representation that fits into the text-based index-
ing paradigm of existing Web search engines, and therefore
their content, however information-rich, cannot be searched
uniformly on the Web.
Realizing this situation, we have developed an exten-
sion to the World Wide Web that addresses these issues. We
are proposing a new standard for the way time-continuously
sampled data such as audio and video files are placed on
Web sites to make media content just as “surfable” as ordi-
nary text files. In this new standard, users can link not only
from a text passage to, e.g., a video, but into a specific time
interval containing the information being sought [21]. The
time-continuous resource itself is annotated with HTML-
like markup and has an XML representation of its content,
enabling Web search engines to index it. The markup may
contain hyperlinks to other Web resources, which enables
search engines to crawl time-continuous resources in the
same way as they crawl HTML pages. These hyperlinks also
enable the Web user to hyperlink through a time-continuous
resource, thus integrating these resources into the “surfing”
paradigm of the Web.
At CSIRO we call the Web that is extended to time-
continuous resources the Continuous Media Web (CMWeb)
[24]. The technology is based upon the observation that existing time-continuous data streams can be temporally subdivided based on a semantic concept, and that this structure enables access to interesting subparts, known
as clips, of the stream. This structure is captured in an
HTML-like markup language, which we have named Con-
tinuous Media Markup Language (CMML) [23]. CMML
allows annotations and hyperlinks to be attached to me-
dia resources. For synchronized delivery of the markup and
the time-continuous data over the Web, we have developed
a streamable container format called Annodex [22], which
encapsulates CMML together with the time-continuous re-
source. When consuming (i.e., listening, viewing, reading)
such an annotated and indexed resource, the user experience
is such that while, e.g., watching a video, the annotations
and links change over time and enable browsing of collec-
tions simply by following a link to another Web resource.
The remainder of this article is organized as follows. Section 2 provides some definitions for the key concepts discussed in this article, and Sect. 3 discusses some related research work. Section 4 describes the specifications of the CMWeb architecture and technologies, introducing temporal URI addressing, the Continuous Media Markup Language (CMML), and the Annodex format. Switching the focus to the actual handling of Annodex resources, Sects. 5 and 6 detail how an Annodex resource is produced from an existing media file, such as an MPEG-1 encoded video, and how it is distributed over the Internet. Section 7 looks at information retrieval through Web search engines. Section 8 provides some experimental results of first implementations. Section 9 compares the CMWeb to other media standards with respect to their capabilities of extending the World Wide Web to a distributed multimedia information retrieval infrastructure. Finally, Sect. 10 concludes the paper with a summary and outlook for future work.
2 Definitions
Before we can discuss work in this field, we need to estab-
lish that the terms we use have a well-understood meaning.
Therefore, we now introduce some fundamental definitions.
Time-continuously sampled data are any sequence of
numbers that represents an analog-time signal sampled in
discrete time steps. In contrast to actual discrete-time sig-
nals as known from signal processing, time-continuously
sampled data may also come in compressed form, such
that a block of numbers represents an interval of time.
Decompression will lead to the higher temporal resolu-
tion that the resource was sampled at in the first place.
This is the case for all common audio and video compres-
sion formats. Other types of time-continuously sampled
data may, for example, be physical measurements of nat-
ural phenomena, such as the acidity (pH) value of a lake,
or the temporally changing value of a stock market. Time-
continuously sampled data are also sometimes just called
time-continuous data or, in the case of audio or video, just
media.
Time-continuous (Web) resource is a time-continuous data
stream that can be distributed by a Web server. The re-
source may exist as a file or may be created on the fly
(live or composed). To retain the ability to handle and dis-
play the time-continuous data progressively during deliv-
ery over HTTP, a time-continuous Web resource must fur-
ther be streamable, i.e., it must be possible to decode it in
a single linear pass using a fixed amount of memory.
Clip is an arbitrary temporal section of a time-continuous
Web resource as defined by a user. A clip makes sense
with respect to some semantic measure of the user. A
whole time-continuous Web resource may be regarded as
a clip, though the more common case is to describe such
a resource as a combination of several clips.
Annotation is a free-text, unstructured description of a clip.
Metadata are a set of name–value pairs that provide database-like structured descriptions; e.g., in HTML, metadata are represented in meta elements within the head tag [29]. An existing metadata scheme that is independent of the format of the data that it can describe is the Dublin Core [7].
Hyperlink is a Uniform Resource Identifier that can point
to or into any Web resource. In this context, the term me-
dia browsing is then mainly used for following hyperlinks
into and out of media files rather than looking through a
collection of media files, as is the more common use of
the term in media management systems.
Meta-information is a collection of information about a
time-continuous data stream, which may include annota-
tions, metadata, and hyperlinks.
Markup is the collection of tags that are used to provide
metainformation to a Web resource. The most common
language for providing markup is XML [33].
3 Related research
There have been several approaches to a better integration
of media documents into the World Wide Web that have not
been standards based. These will be briefly discussed here.
A standards-based approach is superior as it enables interop-
erability between different proprietary solutions and scales
over the whole Internet rather than providing a solution lim-
ited to a subpart of the Internet only.
The database community has been addressing this issue.
Here, the idea is to use databases to store references to media
objects and metainformation about them (see, e.g., [14,16]).
This approach requires the introduction of a middle layer
(middleware) to provide an interface between a Web server
and the database in the backend. A media presentation, such
as a SMIL presentation, can be dynamically created by the
middleware upon a user’s HTTP request. Such hiding of in-
formation in databases, however, excludes Web search en-
gines from accessing the metainformation about the media
objects. Also, hyperlinking into clips of media is not gener-
ally possible. Therefore, while the database approach solves
content management requirements for time-continuous Web
resources, it does not extend the searching and surfing in-
frastructure of the Web to audio or video. Most of the approaches in the database community rely on SMIL to provide that functionality. We will discuss the feasibility of SMIL later in this article.
The search engine community has approached the
partial problem of searching media on the Web. While
crawling for common audio, video, or image content is
not possible as these resources do not have hyperlinks
associated with them, this community has come up with
interesting special solutions, most of which are based
on simple metainformation attached to the data via file-
names or simple schemes for specific resource types such
as ID3 tags [19] for MP3 files. A good collection of
currently available media search engines can be found
at http://searchenginewatch.com/links/article.php/2156251 [28]. These are special solutions, partly also resulting from signal-processing-based
automated audio and video content analysis techniques
developed over the last decade (see, e.g., [11,17,27,37]).
Nobody has as yet been able to handle time-continuous Web
resources and HTML Web pages in a single query entry
interface.
4 The CMWeb technology
To turn the World Wide Web into an architecture for mul-
timedia information retrieval, it is instructive to examine what turned the Web into an architecture for text information retrieval. This will help identify the gaps that need
to be filled to extend the Web’s distributed information re-
trieval infrastructure to time-continuous data.
The World Wide Web was created by three core tech-
nologies [4]: HTML, HTTP, and URIs. They respectively
enable:
The markup of textual data, giving them an addressable
structure, metadata, and outgoing hyperlinks;
The distribution of the marked-up documents over a sim-
ple Internet-based protocol providing a standard inter-
change format; and
The hyperlinking to and into Web documents.
These together created the scalable distributed information
retrieval architecture we know as the World Wide Web with
its uniform means of accessing, exploring, and searching
for information. These core technologies enabled the imple-
mentation of Web browsers and Web search engines, which
turned the Web into a distributed information retrieval sys-
tem. However, time-continuous data such as audio or video
currently do not form part of the Web’s searching and surfing
infrastructure. What are they missing?
Extrapolating from the experiences of the Web, the fol-
lowing three core capabilities are required to enable a Web
of time-continuous resources:
A markup language to create an addressable structure,
searchable metadata, and outgoing hyperlinks for a time-
continuous resource;
An integrated document format that can be distributed
via HTTP and enables synchronized delivery of markup
and data, providing a standard interchange format;
and
A means to hyperlink into time-continuous resources.
The technical challenge for the development of
the CMWeb [1] was the creation of a solution to
these three issues. We have therefore developed three
specifications:
1. CMML [23], the Continuous Media Markup Language,
which is based on XML and provides tags to mark
up time-continuous data into sets of annotated tempo-
ral clips. CMML draws upon many features of HTML
[29].
2. Annodex [22], the binary stream format to store and trans-
mit interleaved CMML and media data.
3. Temporal URIs [21], which enable hyperlinking to tem-
porally specified sections of an Annodex resource.
The cited documents are works in progress, and
changes may be made as the technical discussions
with standards bodies and other experts continue.
For the latest versions, check out http://www.annodex.net/specifications.html. Reference implementations of the core Annodex technology, including desktop browsers, for the CMWeb have been developed and are available at http://www.annodex.net.
The following three subsections present in more detail
the three core technologies that enable the CMWeb. See
Sect. 8 for example implementations of a CMWeb browser,
an Annodex-enabled search engine, and an Annodex en-
abled Web server.
4.1 Temporal URI references
Web-integrated access to clips or temporal offsets in time-continuous data requires URIs [3] that can point to such subparts of a resource. The need for better addressing schemes for media fragments to improve the integration of media content into the Web was already discussed in [25].
We have developed two ways to point to subparts of
time-continuous data: URI fragment identifiers and URI
query components. According to the URI standard [3], URI fragments are specified in a URI after the hash character (#) and URI queries after the question mark character (?).
By definition, URI fragments can be interpreted on the
client application only [3]. However, media data are usually
high-bandwidth and large-size data. Hence, downloading a
complete media file before performing the offset action may
not be desirable if the user has to wait for an unacceptable
amount of time. Therefore, the same scheme that is used to
access fragments locally is also proposed as a generic URI
query scheme to tell the server to provide only the requested
fragment(s) of the media.
Our proposed URI format for fragment and query iden-
tifiers for time-continuous Web resources is specified in an
Internet Draft [21] and is conformant to the URI standard
given in RFC 2396 [3]. The syntax is closely related to the
specification of relative timestamps of the RTSP protocol pa-
rameters as given in RFC 2326 [26].
Two fundamentally different ways of addressing in-
formation in an Annodex resource are necessary: ad-
dressing of clips and addressing of time offsets or
time segments. These ways of addressing also ex-
tend to CMML resources, as these are essentially tex-
tual representations of the content of an Annodex re-
source. Note that as CMML resources are XML files
(Sect. 4.2), any element in a CMML resource can
also be addressed using XPointer [32] and XPath [35]
constructs. These addressing schemes do not extend
to binary data such as time-continuous data, and the
schemes provided here were developed to bridge that
gap.
4.1.1 Addressing of clips
Clips in Annodex resources (file extension .anx) are identi-
fied by their id attribute. These id values create anchor points
to which URI references can be attached. Thus, accessing a
named clip in an Annodex resource is achieved with the fol-
lowing CGI [18] conformant query parameter specification:
id="clip_id"
or the following fragment specification:
#clip_id
Examples for accessing a clip in an example CMML or
Annodex resource are:
http://foo.bar/example.cmml?id="findingGalaxies"
http://foo.bar/example.anx?id="findingGalaxies"
http://foo.bar/example.cmml#findingGalaxies
http://foo.bar/example.anx#findingGalaxies
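
Both addressing forms can be constructed mechanically. The following minimal Python sketch illustrates this; it is not part of the specification [21], and the base URLs and clip id are the hypothetical examples used above:

from urllib.parse import urlencode

def clip_query_uri(base, clip_id):
    # Query form: asks the server to deliver only the named clip.
    return f"{base}?{urlencode({'id': clip_id})}"

def clip_fragment_uri(base, clip_id):
    # Fragment form: resolved locally by the client application.
    return f"{base}#{clip_id}"

print(clip_query_uri("http://foo.bar/example.anx", "findingGalaxies"))
# http://foo.bar/example.anx?id=findingGalaxies
print(clip_fragment_uri("http://foo.bar/example.cmml", "findingGalaxies"))
# http://foo.bar/example.cmml#findingGalaxies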
When delivering queried CMML or Annodex resources from a CMWeb server, the resource may need to be processed before being served out. To produce a conformant data format that starts at the clip offset, the inherent basetime of the resource has to be adjusted to the start time of the requested clip, which is only possible for a format that, like Annodex, stores a basetime for the first frame of its encapsulated time-continuous data. Then an interval of the resource, starting with the time-continuous data of the queried clip, has to be extracted and appended to the adjusted file headers.
4.1.2 Linking to time segments
It is also desirable to be able to address any arbitrary
time segment of an Annodex or CMML resource. This is
again achieved with a CGI conformant query parameter
specification:
t=[time-scheme:]time_interval
or the following fragment specification:
#[time-scheme:]time_interval
Available time schemes are npt for normal play time, smpte for timecodes of the Society of Motion Picture and Television Engineers (SMPTE), and clock for a Coordinated Universal Time (UTC) specification. For more details, see
the specification document [21].
Examples for requesting one or several time intervals from sample CMML and Annodex resources are:
http://foo.bar/example.cmml?t=85.28
http://foo.bar/example.cmml#85.28
http://foo.bar/example.anx?t=npt:15.6-85.28,100.2
http://foo.bar/example.anx#npt:15.6-85.28,100.2
http://foo.bar/example.anx?t=smpte-25:00:01:25:07
http://foo.bar/example.anx#smpte-25:00:01:25:07
.../example.anx?t=clock:20040114T153045.25Z
.../example.anx#clock:20040114T153045.25Z
Where only a single time point is given, this is inter-
preted to relate to the time interval covered from that time
point onward until the end of the stream.
The same preprocessing as described in the previous
subsection is required for these specifications.
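
To make the time addressing concrete, the following minimal Python sketch parses the npt form of the t= component into a (start, end) interval in seconds. It is a simplification under stated assumptions: only the npt scheme and a single start[-end] interval are handled; comma-separated interval lists and the smpte and clock schemes of the specification [21] are omitted.

def parse_npt_interval(spec):
    # Parse e.g. 't=npt:15.6-85.28' or 't=85.28' into (start, end) seconds.
    # A bare time point yields end = None, i.e., the interval from that
    # point until the end of the stream, as described above.
    value = spec[2:] if spec.startswith("t=") else spec
    if value.startswith("npt:"):  # npt is assumed as the default scheme
        value = value[len("npt:"):]
    start, sep, end = value.partition("-")
    return float(start), float(end) if sep else None

print(parse_npt_interval("t=npt:15.6-85.28"))  # (15.6, 85.28)
print(parse_npt_interval("t=85.28"))           # (85.28, None)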
4.2 CMML
The Continuous Media Markup Language [23] has been de-
signed to cater for two different yet related uses:
Authoring of clip structures, anchor points, annotations,
and hyperlinks for time-continuous data in preparation for
integration with the data in an Annodex resource.
Indexing and crawling of time-continuous Web resources
for search engines to perform retrieval on a textual repre-
sentation of the binary Annodex resources.
CMML is an XML-based language to describe the con-
tent of a time-continuous Web resource. It is an authoring
language for annotating, indexing, and hyperlinking time-
continuous data in Annodex format. A CMML file contains
structured XML markup where we have chosen the XML
tags to be very similar to XHTML to enable a simple trans-
fer of knowledge for HTML authors.
CMML documents consist of three main types of tags:
at most one stream tag, exactly one head tag, and an arbitrary number of clip tags. The stream tag is optional and
describes the input bitstream(s) necessary for the creation of an Annodex resource.

Fig. 1 Extract of a CMML file with stream, head, and clip tags

The head tag contains information re-
lated to the complete time-continuous resource. A clip tag,
in contrast, contains information on a temporal fragment of
the data.
Figure 1 shows an example of a CMML document. It de-
scribes two clips for an MPEG video about the “Research
Hunter.” The media file is referred to in the stream tag
as “index.mpg,” and the title in the head tag provides the
subject, while details about the clips are found in the clip
tags. This is an actual example from our test collection at
http://media.annodex.net/sciweb/.
The XML markup of a head tag in the CMML document
contains information on the complete time-continuous Web
resource. Its essential information contains:
Structured descriptions in meta tags and
Unstructured textual annotations in the title tag.
The XML markup of a clip tag contains information on
a fragment of time-continuous data:
Anchor points (i.e., the id attribute) to provide locations
inside a time-continuous resource that a URI can refer
to. Anchor points identify the start of a clip. This enables
URIs to refer to clips in CMML or Annodex resources via
fragment specifications as described in Sect. 4.1.
Structured textual annotations in the meta tags in the same
way as the head tag.
Unstructured textual annotations in the desc tags. Un-
structured annotations are free text and mainly relevant
for search applications.
An optional keyframe in the img tag providing a represen-
tative image for the clip and enabling display of a table of
contents for Annodex resources.
Outgoing URI links in the a tag of the clip can point to any other place a URI can point to, such as clips in other Annodex resources or anchors in HTML pages (see the parsing sketch below).
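
Because CMML is plain XML, these elements are directly accessible to standard tooling. The following minimal Python sketch extracts the search-relevant parts of a clip with the standard library parser; the sample document is invented for illustration and abbreviates the full CMML syntax [23] (in particular, the stream and meta details are omitted):

import xml.etree.ElementTree as ET

CMML = """<cmml>
  <head><title>Research Hunter</title></head>
  <clip id="findingGalaxies" start="npt:15.6">
    <a href="http://foo.bar/galaxies.anx">Related footage</a>
    <img src="galaxy.jpg"/>
    <desc>Searching the sky for new galaxies.</desc>
  </clip>
</cmml>"""

root = ET.fromstring(CMML)
print(root.findtext("head/title"))            # whole-resource annotation
for clip in root.iter("clip"):
    print(clip.get("id"), clip.get("start"))  # anchor point and start time
    print(clip.findtext("desc"))              # free-text annotation
    anchor = clip.find("a")
    if anchor is not None:
        print(anchor.get("href"))             # outgoing hyperlink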
4.3 Annodex format
Annodex is the format in which media with interspersed
CMML markup is exchanged. Analogous to a normal Web
server offering a collection of HTML pages to clients, an
Annodex server offers a collection of Annodex resources.
After a Web client has issued a URI request for an Annodex
resource, the Web server delivers the Annodex resource, or
an appropriate subpart of it according to the URI query pa-
rameters.
Annodex resources conceptually consist of one or more
media streams and one CMML annotation stream, inter-
leaved in a temporally synchronized way. The annotation
stream may contain several sets of clips that provide alter-
native markup tracks for the Annodex resource. This is im-
plemented in CMML through a track attribute of the clip tag.
The media streams may be complementary, such as an audio
track with a video track, or alternative, such as two speech
tracks in different languages. Figure 2 shows a conceptual
representation of an example Annodex resource with three
media tracks (light colored bars) and two annotation tracks
(darker clips) with a header describing the complete resource
(dark bar at the start).
Fig. 2 Conceptually represented example Annodex file, time increasing from left to right

Fig. 3 Merging of the frames of several media bitstreams with a structured CMML file into an Annodexed bitstream

The Annodex format enables encapsulation of any type of streamable time-continuous data and is thus independent of a media compression format. It is basically a bitstream
consisting of continuous media data interspersed with the
structured XML markup of the CMML file. This is per-
formed by merging the clip tags time-synchronously with
the time-continuous bitstreams on authoring an Annodex
bitstream. The clip tags are regarded as state changes in this
respect and are valid from the time that they appear in the
bitstream until another clip tag replaces them. If there is no
clip that directly replaces a previous one, an empty clip tag is
inserted that simply marks the end of the previous clip tag.
Thus, Annodex is designed to be used as both a persistent
file format and a streaming format.
Figure 3 shows an example of the creation of a bitstream
of an Annodexed media resource. Conceptually, the me-
dia bitstreams and the annotation bitstreams share a com-
mon timeline. When encapsulated into one binary bitstream,
these data have to be flattened (serialized). CMML is de-
signed for serialization through multiplexing. The figure
shows roughly how this is performed.
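
At its core, the serialization is a merge of streams that are already sorted on a shared timeline. The following minimal Python sketch conveys the idea of Fig. 3; the frame and clip payloads are stand-in tuples with invented timestamps, not real Annodex packets:

import heapq

media_frames = [(0.00, "frame 0"), (0.04, "frame 1"), (15.60, "frame 390")]
clip_tags = [(0.00, "<clip id='intro'>"), (15.60, "<clip id='findingGalaxies'>")]

# Tag each item with a tie-breaker so that at equal timestamps the clip
# tag (a state change) is emitted before the media frame it annotates.
interleaved = heapq.merge(
    ((t, 0, c) for t, c in clip_tags),
    ((t, 1, f) for t, f in media_frames),
)
for timestamp, _, payload in interleaved:
    print(f"{timestamp:6.2f}  {payload}")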
There are several advantages to having an integrated bit-
stream that includes the annotations in a time-synchronous
manner with the media data. Firstly, all the information re-
quired is contained within one resource that can be dis-
tributed more easily. Secondly, many synchronization problems that occur with other media markup formats such as SMIL [31] are inherently solved. Thirdly, when extracting temporal
intervals from the resource for reuse, the metainformation is
included in the media data, which enables one, e.g., to retain
the copyright information of a clip over the whole lifetime
of a reused clip. Last but not least, having a flat integrated
format solves the problem of making the Annodex resource
streamable.
To perform the encapsulation, a specific bitstream for-
mat was required. As stated, an Annodex bitstream consists
of XML markup in the annotation bitstream interleaved with
the related media frames of the media bitstreams into a sin-
gle bitstream. It is not possible to use straight XML as encapsulation because XML cannot enclose binary data unless it is encoded as text (e.g., Base64), which would introduce too much overhead. Therefore, an encapsulation format that could handle
binary bitstreams and textual frames was required.
The following list gives a summary of the requirements
for the Annodex format bitstream:
Framing for binary time-continuous data and XML.
Temporal synchronization between XML and time-con-
tinuous media bitstreams.
Temporal resynchronization after parsing error.
Detection of corruption.
Seeking landmarks for direct random access.
Streaming capability (i.e., the information required to
parse and decode a bitstream part is available at the time
at which the bitstream part is reached and does not come,
e.g., at the end of the stream).
Small overhead.
Simple interleaving format with a track paradigm.
We selected Xiph.Org’s [36] Ogg encapsulation format
version 0 [20] as the encapsulation format for Annodex bit-
streams as it meets all the requirements, has proven reliable
and stable, and is an open IETF (Internet Engineering Task
Force, http://www.ietf.org/) standard [20]. Formats like MPEG-4 or QuickTime were deemed less suitable as they are hierarchical file formats and therefore could not easily provide for streamable, time-accurate
interleaving of multiple media and annotation tracks.
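
For illustration, Ogg's framing is simple enough to walk with a few lines of code. The following minimal Python sketch reads the fixed 27-byte page headers defined by the Ogg specification [20]; it skips checksum verification and makes no attempt at resynchronization after errors. In Annodex, the CMML annotation stream travels as one such logical bitstream (identified by its serial number) alongside the media tracks.

import struct

def ogg_pages(stream):
    # Yield (granule_position, serial_number, payload) for each Ogg page.
    while True:
        header = stream.read(27)
        if len(header) < 27 or header[:4] != b"OggS":  # capture pattern
            return                                     # end of stream
        (_version, _type, granule, serial,
         _sequence, _crc, nsegs) = struct.unpack("<BBqIIIB", header[4:])
        lacing = stream.read(nsegs)                    # segment table
        yield granule, serial, stream.read(sum(lacing))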
5 Authoring
To author Annodexed media, we must distinguish between
files and live streams. The advantage of the former is that a
file can be uploaded from the computer’s file system and an-
notated in a conventional authoring application. In contrast, the markup of a live Internet stream by its very nature has to be done on the fly.

Fig. 4 Network view of the Continuous Media Web architecture
Annodex media files may be created in a traditional au-
thoring application (e.g., iMovie or Adobe Premiere may
easily support Annodex in the future) or through the use
of CMML transcoded from metainformation collected in
databases. The authoring application should support the cre-
ation of:
Structured and unstructured annotations,
Keyframe references,
Anchor points, and
URI links for media clips.
Live Annodexed media streams must be created by
merging clip tags with the live digital media stream. A
merger application, similar to that described in Fig. 3, in-
serts clip tags into the live stream at any point in time under
the control of a user, e.g., by selecting a previously prepared
clip tag from a list.
It is expected that extending existing graphical video
editing applications such as Apple’s iMovie or Adobe’s Pre-
miere to author Annodex will be a simple task. Most already
provide for specific markup to be attached to fragments of a video (sometimes also called chapters or regions), so extending the set of metainformation to cover keyframes, annotations, and hyperlinks should be fairly straightforward.
The availability of such tools will support the uptake of
Annodex for the average Web user, though computer spe-
cialists can already author Annodex by using anxenc and
CMML.
6 Distribution
The distribution of Annodex resources over the Internet is
based on URIs, similar to the distribution of HTML pages
for the World Wide Web. Annodex resources are basically
accessible via any of the protocols currently used to trans-
port media formats, e.g., RTP/RTSP [26] or HTTP [8]. The
use of Annodex over RTP requires the definition of a pay-
load format for Annodex, which is future work. So far we
have been using Annodex only over HTTP.
The basic process for the distribution and delivery of
an Annodex resource is the following. A client dispatches
a download or streaming request to the server with the
specification of a certain URI. The server resolves the
URI and starts packetizing an Annodexed media document
from the requested clip or time, issuing a head tag at the
start.
As an alternative to streaming/downloading Annodexed
media from a URI, we also envisage that different applica-
tions may prefer to retrieve only either the continuous media
data or the CMML transcription. Examples are browsers that
cannot handle the XML markup, and information collection
applications such as search engines that do not require the
media data, but just its textual representation in CMML. This is possible via content-type negotiation in an HTTP client request, using the HTTP Accept header (Fig. 4).
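
As a concrete illustration of this negotiation, the following minimal Python sketch requests either representation of an Annodexed resource over HTTP; the URL is hypothetical, and the media types are those used by Annodex servers as described in Sect. 7:

import urllib.request

def fetch(url, want_cmml=False):
    # A higher priority on text/x-cmml asks a conformant Annodex server
    # to return only the CMML markup instead of the full media bitstream.
    accept = ("text/x-cmml;q=1.0, application/x-annodex;q=0.5"
              if want_cmml else "application/x-annodex")
    request = urllib.request.Request(url, headers={"Accept": accept})
    with urllib.request.urlopen(request) as response:
        return response.read()

# cmml_text = fetch("http://foo.bar/example.anx", want_cmml=True)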
7 Information retrieval
For exploiting the rich metainformation provided by Ann-
odex resources about media, a special media player or brow-
ser plugin is necessary. Such an application has to split an
Annodex resource into its constituent header and clip tags,
and the media data (the reverse of the process specified in
Fig. 3). A decoder is required for the given media encoding
format to display the underlying media data. While playing
back the media data, the application should display the hy-
perlinks and the annotations for the active clip. If the dis-
played media data is a file and not a live stream, it is even
possible to display a table of contents extracted from the an-
notations of the file and browse through the file based on
that. The hyperlinks allow the user to freely link back and
forth between Annodexed media clips, HTML pages, and
other Web resources. This is transparent to the user, i.e.,
the user “surfs” the Annodex media resources in the way
familiar from browsing the Web, because Annodex media seamlessly integrate with the existing Web.

Fig. 5 Screenshot of a CMWeb browser
Search engines can include Annodex media resources
into their search repertoire effortlessly because CMML is
very similar to HTML and thus implementation of pars-
ing support for CMML is straightforward. Indexing is
performed on a per-clip basis. A search engine finds an-
notations in the clip tags in the desc and meta tags, inde-
pendent of the encoding format of the media data encap-
sulated in Annodex. For crawling Web resources, the search
engine uses the hyperlinks given in the a element in the clip
tag. Thus both indexing and crawling are supported easily
with Annodex. In addition, the HTTP protocol allows one
to download only the CMML markup of a published Ann-
odexed media resource by setting in the HTTP request the
Accept header with a higher priority on the media type
text/x-cmml than on application/x-annodex. A
conformant Annodex server will then only provide the ex-
tracted CMML content for the given Annodex resource. This
prevents crawlers from creating extensive network loads.
It also reduces the size of search archives, even for large
amounts of published Annodex resources, because a CMML
resource contains all searchable annotations for the media
clips of its Annodex resource.
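
A per-clip index of this kind is simple to build from the extracted CMML. The following minimal Python sketch (with an invented one-clip document and a naive tokenizer, not the indexing algorithm of any particular search engine) maps each term to the URIs of the clips it occurs in, so that query results can point directly into the media:

import re
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_cmml(url, cmml_text, index):
    # Add each clip of a CMML document to the inverted index.
    root = ET.fromstring(cmml_text)
    for clip in root.iter("clip"):
        target = f"{url}#{clip.get('id')}"  # clip-level result target
        text = " ".join(clip.itertext())    # desc and other clip text
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(target)

index = defaultdict(set)
index_cmml("http://foo.bar/example.anx",
           "<cmml><clip id='findingGalaxies'>"
           "<desc>new galaxies</desc></clip></cmml>", index)
print(index["galaxies"])  # {'http://foo.bar/example.anx#findingGalaxies'}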
8 Experimental results
To accumulate experience with the three developed tech-
nologies, we implemented several applications. The core ap-
plications required are a CMWeb server, a CMWeb browser,
and a means to author Annodex resources. Then, a set of
Annodexed audio and video content can be created and
experiments on CMWeb browsing undertaken. A CMWeb
search engine is also required to allow for information re-
trieval to occur. We implemented all of these core applica-
tions. In this section, we report on our experiences with these
applications.
A CMWeb server is best implemented by extending an
existing Web server with some Annodex-specific function-
ality. We implemented one by extending the Apache Web
server [13] with a module that controls the distribution of
Annodex content. It has in particular the following function-
ality:
1. Parses the HTTP Accept header to return the correct
content type (either Annodex or CMML content).
2. Parses the URI of the HTTP request to identify a potential
query component and return the requested subpart of the
Annodex resource.
3. Checks if the requested Annodex resource exists on the
server or has to be created on the fly from a CMML file
by merging with the media stream(s) referred to in the
stream tag.
As we implemented most of the required authoring functionality in libraries, the Apache Annodex module mod_annodex has turned out to be fairly small, at only about 500 lines of code. It is being used on a production Apache server
(http://media.annodex.net/) that serves Annodex
or CMML content reliably.
A CMWeb browser is a CMWeb client that can display
Annodexed audio and video streams coming from a CMWeb
server and provides a rich functionality for interaction with
the content as enabled through the CMML tags. The use of
plugins for existing Web browsers is our ultimate goal such
that we can enable a full integration of Annodex content with
existing Web content and retain a common browsing his-
tory. Our first prototype implementation is, however, a stand-
alone CMWeb browser, which can be used as an external
helper application to existing Web browsers. Figure 5 shows
a screenshot of this prototype.
The various features of this CMWeb browser include:
1. Media player: Display and transport control of the video
or audio.
2. Browser: History of the browser (i.e., back and forward),
reload, and stop buttons as well as URI of the resource on
display.
3. Table of contents: List of the clips making up the current
Annodex resource, including representative keyframes,
timestamps, and short descriptions. id attributes in CMML clips
allow for this list of clips to be created and to be pointed to
directly. The keyframes come from img tags and the text
from title tags of CMML clips.
4. Annotation: Additional free-text information for the cur-
rent clip stored in the desc tag in CMML clips.
5. Metadata: Additional structured text information for the
current clip stored in the metatags of CMML clips.
6. Hyperlink: Attached to the current clip, a clip-dependent
(i.e., time-sensitive) hyperlink points to other Web re-
sources including other Annodex resources. It comes from
the a tag of the CMML clip.
The browsing interface of this CMWeb browser has
proven a very successful design. Once the concepts of
browsing webs of audio and video are understood, people
find it intuitive to use the CMWeb browser to follow hyper-
links, go back and forward in history, get an overview of
the current resource through the table of contents, and go
directly to clips of interest.
To allow for authoring of CMWeb content, we im-
plemented two means of creating Annodex content from
CMML files: a standalone command-line Annodex encod-
ing tool anxenc and a dynamic on-the-fly authoring capabil-
ity of the Annodex Apache module mod_annodex. Both
use the same libraries for parsing of CMML, audio, and
video files and merging them into an Annodex bitstream.
CMML files have so far been created manually using com-
mon text editors. There is potential to automate some of the
markup creation task through, e.g., automated speech recog-
nition and automated audiovisual content analysis tools or
through transcoding of existing metainformation to CMML.
Several CMWeb content sites have been created to
demonstrate the CMWeb concepts, experiment with the us-
ability of the technology, and prove that it provides the nec-
essary data to allow a Web search engine to retrieve relevant
clips for a query. In particular we have created content sites
with:
Interviews with scientists and video reports on results of
scientific research from the CSIRO (Commonwealth Sci-
entific and Industrial Research Organization). This site
contains 29 audio and video files with 70 clips, and the
CMML content has been created manually.
News content from the Australian Broadcasting Corpo-
ration. This site contains 6 files with 41 clips, and the
CMML content has been created mostly automatically
through an xsl transform script [30] from metainforma-
tion available in NewsML [10].
Movie content from the Australian Associated Press. This site contains 6 + 23 video files, and the CMML content has been created manually.
Financial news content from the Australian Associated Press. This site contains 5 Annodex video files, and again the CMML content has been created manually.
In summary, more than 69 Annodex video and audio files
have been created as trial material covering more than 3
hours of material. While the manual creation of CMML files
for these is a labor-intensive process, we have been able to
prove that automation is possible, especially where preexist-
ing structured metainformation is available that can be trans-
formed with simple text-parsing programs.
The information retrieval experience that the CMWeb
creates is only complete with a search application. We
therefore extended an existing search engine with the ability
to index and crawl Annodex and CMML files. Figure 6
shows a screenshot of an example query result on the
science content site.
The tested Annodex search engine is CSIRO's Panoptic search engine (see http://www.panopticsearch.com/), an ordinary Web search
engine extended with the ability to crawl and index the
CMML markup. As the search engine already allowed
for indexing and crawling of XML resources, extension
to CMML was straightforward to allow it to search and
retrieve CMWeb content.
The search engine retrieves clips that are relevant to
the user’s query using its existing algorithms and presents
ranked search results based on the relevance of the markup
of the clips. A keyframe for the retrieved clips and their
description are displayed. Display of such a thumbnail has
been identified in [12] as an important feature required
for good media search and retrieval. As CMML includes
keyframes into clip tags through a URI reference, it is trivial
to include them also into a query results page in HTML
format.
In summary, the implemented extensions to existing Web
applications show that our proposed approach provides a
simple and effective means to extend the Web to time-con-
tinuous Web resources and make them first-class citizens.
Experimental content sites prove that searching and brows-
ing are enabled for both audio and video content. Initial in-
tegration with existing Web infrastructure has been achieved
by extending an existing Web server and an existing Web
search engine. A better integration of Annodex with exist-
ing Web browsers through browser plugins is the next logi-
cal step to improve integration.
9 Related standards
One may expect that the many existing standardization
efforts in multimedia would already allow for the creation
of a Continuous Media Web as we are proposing. However,
this is not the case. The three most commonly pointed out
standards that relate to our work are SMIL, MPEG-7, and MPEG-21.

Fig. 6 Screenshot of a CMWeb search engine interface
9.1 SMIL
According to http://www.w3.org/TR/smil20/, the W3C's Synchronized Multimedia Integration Language, SMIL 2.0 [31], has the following two design goals:
Define an XML-based language that allows authors
to write interactive multimedia presentations. Using
SMIL 2.0, an author can describe the temporal behavior
of a multimedia presentation, associate hyperlinks with
media objects, and describe the layout of the presentation
on a screen.
Allow reuse of SMIL syntax and semantics in other XML-
based languages, in particular those that need to represent
timing and synchronization. For example, SMIL 2.0 com-
ponents are used for integrating timing into XHTML [31]
and into SVG (Scalable Vector Graphics).
SMIL focuses on the authoring and presentation of interac-
tive multimedia presentations composed of multiple media
formats, encompassing animations, audio, video, images,
streaming text, and text. A SMIL document describes the
sequence of media documents to play back, including con-
ditional playback, loops, and automatically activated hyper-
links. SMIL has outgoing hyperlinks and elements that can
be addressed inside it using XPath [35] and XPointer [32].
Features of SMIL cover the following modules:
1. Animation provides for incorporating animations on a
timeline.
2. Content control provides for runtime content choices and
prefetch delivery.
3. Layout allows positioning of media elements on the vi-
sual rendering surface and control of audio volume.
4. Linking allows navigation through the SMIL presenta-
tion that can be triggered by user interaction or other
triggering events. SMIL 2.0 provides only for inline link
elements.
5. Media objects describes media objects that come in the
form of hyperlinks to animations, audio, video, images,
streaming text, or text. Restrictions of continuous media
objects to temporal subparts (clippings) are possible, and
short and long descriptions may be attached to a media
object.
6. Metainformation allows description of SMIL documents
and attachment of RDF metadata to any part of the SMIL
document.
7. Structure structures a SMIL document into a head and a
body part, where the head part contains information that
is not related to the temporal behavior of the presentation
and the body tag acts as a root for the timing tree.
8. Timing and synchronization provides for different chore-
ographing of multimedia content through timing and
synchronization commands.
9. Time manipulation allows manipulation of the time be-
havior of a presentation, such as control of the speed or
rate of time for an element.
10. Transitions provides for transitions such as fades and
wipes.
11. Scalability provides for the definition of profiles of
SMIL modules (1–10) that meet the needs of a specific
class of client devices.
So, with all these capabilities, does SMIL allow for the cre-
ation of webs of time-continuous resources?
SMIL is defined for creating interactive multimedia pre-
sentations – thus a SMIL document does not in the strict
sense defined above represent a time-continuous Web re-
source as it does not actually contain time-continuous data
– it is just an XML document. A combination of the SMIL
XML file plus its externally referenced files of multiple me-
dia types also does not create a time-continuous Web resource, as these files all run on their own timelines. Every use of a SMIL file can result in a differ-
ent experience and therefore a SMIL document is not a sin-
gle, temporally addressable time-continuous data stream that
compares directly to the Annodex format. Consequences of
this issue on the possibilities of retrieval of a consistent pre-
sentation fragment of interest have been discussed in [6].
So, while SMIL does not compare to the Annodex for-
mat, could it be used to do the work of CMML?
SMIL is an XML-based authoring format for multimedia
presentations. In that respect, it is far more advanced than
CMML as it has many features to allow for diverse media
to be composed together. The stream tag in CMML provides
a very basic means of authoring multitrack media streams,
but that’s where the presentation authoring capabilities of
CMML end. However, this is also a strength of CMML: it is
not restricted to authoring presentations with audio or video,
but it extends to any time-continuous data type, such as the
above-mentioned acidity measurements, while SMIL is re-
stricted to animations, audio, video, and streaming text as
time-continuous data types.
In addition, CMML is designed to be serialized and in-
terleaved into a binary bitstream format, where clips of time
retain their metainformation together with their media con-
tent. SMIL, on the other hand, is a general XML file with
a hierarchical structure that cannot generally be serialized.
Even if it were possible to define an Annodex-specific sub-
set of SMIL to support the authoring and markup creation for
an Annodex resource, the serialization would be highly com-
plex and infeasible for on-the-fly compositions. An example
of such a SMIL file that contains RDF and externally refer-
enced metadata, a decomposition into several clips (termed
clippings in SMIL), and some anchors to the clips is given
in Fig. 7.
The final point to look at with respect to SMIL is the hy-
perlinking functionality. As SMIL is an XML-based markup
language, XPointer [32] and XPath [35] provide all the hy-
perlinking functionality to link to specific named tags or
named time points. There is no general mechanism to address arbitrary time offsets through URI hyperlinks, though. Such a mechanism would not make sense anyway, since arbitrary time offsets do not actually exist when a SMIL presentation's timeline is defined by a user's interaction. What is more important than the XML side
of hyperlinking is, however, the hyperlinking into the time-
continuous data themselves. As such a consistent binary file
format does not exist within the SMIL-based technology, hy-
perlinking between time-continuous Web resources is not
supported by SMIL. The simple extraction of subparts of
media based on the specification of a temporal interval is a
multistage complex process with SMIL that discourages its
use for dynamic media fragment reuse.
SMIL 2.0 is a very powerful standard for creating multi-
media presentations. It is not a standard for enabling search-
able and crawlable webs of time-continuous resources. It in-
tegrates with the CMWeb where a recording of a single inter-
active session of a user with a SMIL presentation would be
used as the basis for annotating and indexing. Such a record-
ing would result in a time-continuous data stream that could
be referenced in a CMML stream tag and could thus be ann-
odexed and integrated into the CMWeb. If the recording is
done intelligently, existing SMIL metainformation and hy-
perlinks can find their way into Annodex and thus create a
rich resource.
9.2 MPEG-21
The ISO/MPEG MPEG-21 [5] standard is building an open
framework for multimedia delivery and consumption. It thus
focuses on addressing how to generically describe a set
of content documents (called a “digital item”) that belong
together from a semantic point of view, including all the
information necessary to provide services on these digital items.

Fig. 7 SMIL file that is analogous to a CMML file
As an example, consider a music CD album. When it
is turned into a “digital item,” the album is described in an
XML document that contains references to the cover im-
age, the text on the CD cover, the text on an accompany-
ing brochure, references to a set of audio files that contain
the songs on the CD, ratings of the album, rights associated
with the album, information on the different encoding for-
mats in which the music can be retrieved, different bitrates
that can be supported when downloading, etc. This descrip-
tion supports everything that you would want to do with a
digital CD album: it allows you to manage it as an entity, de-
scribe it with metadata, exchange it with others, and collect
it as an entity.
An MPEG-21 document thus does not typically describe
just one media document, but rather several and additionally
describes other types of content such as images or graphs.
An MPEG-21 document is an XML document that describes
how documents in a set relate to each other, how they should
be distributed and managed.
As with SMIL, an MPEG-21 document is an XML doc-
ument with hyperlinks to the content it describes. There-
fore, an MPEG-21 document is also not a time-continuous
document as defined above, excluding it from enabling
deep hyperlinked, searchable, and browsable webs of time-
continuous resources in a nonstaged fashion.
In contrast to being interested in the deep structure of
single time-continuous Web resources as is the aim of Ann-
odex, MPEG-21 addresses how collections of documents
are handled. It can therefore be used to describe collections
of Web resources that belong together for a semantic rea-
son, such as the different songs, the cover image, and re-
lated metainformation of the CD album described above.
MPEG-21 can therefore be used to describe, for example,
a Web of Annodex, JPEG, and HTML resources that repre-
sent the CD.
Interestingly, MPEG-21 has come up against the prob-
lem of how to address content inside an MPEG-21 digi-
tal item through hyperlinks. This is required as metainfor-
mation given in an MPEG-21 file can, for example, relate
to a segment of an MPEG-21 content file. Here, Annodex
resources and the proposed temporal hyperlinks provide a
solution for temporal addressing of time-continuous data.
As MPEG-21 requires a more general solution to allow, for
example, addressing of spatial regions or spatiotemporally
moving regions, a different addressing scheme is being dis-
cussed within MPEG-21 that will require a two-stage pro-
cess where first the resource is resolved and then the frag-
ment addressing is performed depending on the capabilities
of the addressed resource.
The aims of MPEG-21 are orthogonal to the aims that
we pursue. MPEG-21 does not allow the creation of a web
of time-continuous resources; however, it does provide for
the distribution, management, and adaptation of sets of Web
resources that are related to each other. It covers the big-
ger picture in which time-continuous Web resources play an
equal part to all other resources, but it does not allow them to
become equal. The Annodex format is therefore a prime can-
didate to become part of the formats that MPEG-21 digital
items can hold to allow time-continuous Web resources to
be handled on equal terms.
9.3 MPEG-7
The ISO/MPEG MPEG-7 [15] standard is an open frame-
work for describing multimedia content. It provides a large
set of description schemes to create markup in XML format.
MPEG-7’s markup is not restricted to textual information
only – in fact it is tailored to allow for the description of
audiovisual content with low-level image and audio features
as extracted through signal processing methods. It also has
basically no resemblance to HTML as it was not built with a
particular focus on Web applications.
An MPEG-7 document is an XML file that contains any
sort of metainformation related to a media document. The
MPEG-7 committee has developed a large collection of de-
scription schemes that cover the following aspects:
1. Content management: Creation and production metain-
formation, media coding, storage and file format infor-
mation, and content usage information.
2. Content description: Description of structure and seman-
tics of media content. The structural tools describe the
structure of the content in terms of video segments,
frames, still and moving regions, and audio segments.
The semantic tools describe the objects, events, and no-
tions from the real world that are captured by the AV con-
tent.
3. Content navigation and access: Descriptions that facil-
itate browsing and retrieval of audiovisual content by
defining summaries, partitions and decompositions, and
variations of the audiovisual material.
4. Content organization: Descriptions for organizing and
modeling collections of audiovisual content and of de-
scriptions.
5. User interaction: Descriptions of user preferences and us-
age history pertaining to the consumption of the media
material.
As with SMIL and MPEG-21, an MPEG-7 document
is an XML document with references to the content it de-
scribes. Hyperlinking into time-continuous resources with MPEG-7 is a two-stage process: first, a link is made to the description of the requested fragment within the MPEG-7 description of a media document; from there, with the specification of a temporal offset, the file offset function can be performed on the content. Obviously, an
MPEG-7 document is also not a time-continuous document
as defined above, excluding it from enabling deep hyper-
linked, searchable, and browsable webs of time-continuous
resources in a nonstaged fashion.
A representation of the content of a CMML file in the way MPEG-7 would represent it is shown in Figs. 8 and 9.
Fig. 8 MPEG-7 file that is analogous to a CMML file (Part 1)
A more extensive example description of a similar markup
was published by the Harmony Project (see [9]).
While MPEG-7 can be used to create markup similar
to the one possible in CMML, it is a difficult task to iden-
tify the required tags from the several thousands available
through MPEG-7. It would be necessary to define a pro-
file of MPEG-7. Specifications in this profile would need
to be serialized and interleaved into a binary bitstream for-
mat to achieve searchable and surfable time-continuous Web
resources such as Annodex. As is the case with SMIL,
MPEG-7 is not designed for this serialization but is a gen-
eral XML file with a hierarchical structure. Therefore, the
serialization can be expected to be complex and infeasible
for on-the-fly compositions.
MPEG-7 is a powerful collection of description tools covering all sorts of metainformation necessary for multimedia content, but with no particular view on networking-related issues or scalability over the Web.
However, MPEG-7 does not enable the creation of webs
of media, nor of webs of time-continuous data. MPEG-7
has a rich set of description schemes, and the creation of
metaschemes for CMML will require the specification of
such collections of sensible meta tags. A direct integration
of MPEG-7 markup with the CMWeb may be through ref-
erencing of richly marked-up MPEG-7 files from inside an
Annodex format bitstream. Also, Annodex files can be used
as the basis for metainformation created in the MPEG-7
format.
Fig. 9 MPEG-7 file that is analogous to a CMML file (Part 2)
10 Summary and outlook
This article presented the Annodex technology, which ex-
tends the familiar searching and surfing capabilities of the
World Wide Web to time-continuous data. This implies that
media data are not handled as atomic at the file level, but
conceptually broken up into clips and thus made deeply ad-
dressable. At the core of the technology are the Continu-
ous Media Markup Language CMML, the Annodex inter-
change format, and clip- and time-referencing URI hyper-
links. These enable the extension of the Web into a Continuous Media Web with Annodex browsers, Annodex servers, Annodex content, and Annodex search engines. Together, these applications extend the Web into a powerful distributed information retrieval system in which all media types are handled equally.
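As a sketch of the shape of such hyperlinks (host and resource names are invented; the precise syntax is defined in the temporal URI draft [21]), a URI can address either a named clip or a temporal offset of an Annodexed resource directly:

    http://example.com/talk.anx?id=findings       (the clip named "findings")
    http://example.com/talk.anx?t=npt:00:01:30    (a 90-second time offset)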
Each of the existing multimedia standards discussed above addresses an issue related to, but distinct from, our work.
Combined, all these standards, including the CMWeb, form
a solid basis for a more integrated and interoperable use
of distributed media on the Internet. SMIL allows author-
ing of highly interactive multimedia content, MPEG-7 al-
lows for the creation of rich XML annotations for media
content, Annodex allows for the exchange of enriched clips
of media content in a standardized format and also allows
search engines to crawl and index time-continuous data, and
MPEG-21 allows for the management of collections of doc-
uments of multiple media types.
Another interesting standard to look at in this context is
the Semantic Web (http://www.w3.org/2001/sw/)
with RDF (the Resource Description Framework [34]) at its
core. RDF is a general-purpose language for representing information on the Web. It is based on XML and URIs to attach information to content. In this context, Annodex breaks
up files or streams of time-continuous data into smaller seg-
ments, enabling RDF to attach information to clips of con-
tent. This relationship must be explored further in future
work.
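As a minimal sketch (URIs invented, with Dublin Core [7] element names used for the properties), RDF can then attach statements to a single clip rather than to the whole file:

    <!-- Hypothetical RDF/XML attaching Dublin Core metadata to one clip. -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.com/talk.anx?id=findings">
        <dc:title>Experimental findings</dc:title>
        <dc:creator>A. Speaker</dc:creator>
      </rdf:Description>
    </rdf:RDF>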
Having published stable specifications of CMML, Ann-
odex, and the temporal URI formats, we are now building
plugins for common Web browsers and researching search
algorithms that better exploit the Annodex and CMML for-
mats and provide for high-quality search results in media
information retrieval. The combination with other types of
Web resources and the presentation of search results are par-
ticular facets of this research. We are also investigating how
to exploit the features of Annodex to create more powerful
user interfaces for multimedia applications. These can in
particular support interactive information exploration tasks
and improve the CMWeb browsing experience. Intensive
automated audiovisual content analysis research in recent
years has created a solid foundation to explore new ways
of automating CMML authoring, which is another research
topic that we are interested in and that will support the up-
take of Annodex.
Acknowledgements The authors gratefully acknowledge the comments, contributions, and proofreading of Claudia Schremmer, who is making use of the Continuous Media Web technology in her research on metainformation extraction from meeting recordings. The authors are also grateful to the reviewers for their suggestions for improvements, which have helped improve the article.
References
1. Annodex.net: Open Standards for Annotating and Indexing Net-
worked Media. http://www.annodex.net. [Accessed Nov.
2004]
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval, 1st edn. Addison-Wesley, Boston (1999)
3. Berners-Lee, T., Fielding, R., Masinter, L.: Uniform Resource Identifiers (URI): Generic Syntax. http://www.ietf.org/rfc/rfc2396.txt (1998) [Accessed Nov. 2004]
4. Berners-Lee, T., Fischetti, M., Dertouzos, M.: Weaving the Web:
The Original Design and Ultimate Destiny of the World Wide Web
by its Inventor. Harper, San Francisco (1999)
5. Burnett, I., de Walle, R.V., Hill, K., Bormans, J., Pereira, F.:
MPEG-21: Goals and Achievements. IEEE Multimedia J. 10(4),
60–70 (2003)
6. Celentano, A., Gaggi, O.: Querying and browsing multimedia
presentations. In: Tucci, M. (ed.) MDIC 2001, 2nd International
Workshop on Multimedia Databases and Image Communica-
tion. Lecture notes in computer science, vol. 2184, pp. 105–116.
Springer, Berlin Heidelberg New York (2001)
7. Dublin Core Metadata Initiative: Dublin Core Metadata Element
Set, Version 1.1. http://dublincore.org/documents/
2003/02/04/dces (2003) [Accessed Nov. 2004]
8. Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P., Berners-Lee, T.: Hypertext Transfer Protocol – HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt (June 1999) [Accessed Nov. 2004]
9. Hunter, J.: An application profile which combines DC and MPEG-7 for simple video description. ViDE Video Access Group. http://www.metadata.net/harmony/video_appln_profile.html [Accessed Feb. 2002]
10. International Press Telecommunications Council: Specification NewsML 1.2. http://www.newsml.org/pages/spec_main.php (2003) [Accessed Nov. 2004]
11. Jain, R., Hampapur, A.: Metadata in video databases. SIGMOD
Rec. 23(4), 27–33 (1994)
12. Jansen, B., Goodrum, A., Spink, A.: Searching for multimedia:
video, audio, and image Web queries. World Wide Web J. 3(4),
249–254 (2000)
13. Laurie, B., Laurie, P.: Apache: the Definitive Guide, 1st edn.
O’Reilly, Sebastopol, CA (1999)
14. Little, T., Ghafoor, A.: Interval-based conceptual models for time-
dependent multimedia data. Knowl. Data Eng. 5(4), 551–563
(1993)
15. Martinez, J., Koenen, R., Pereira, F.: MPEG-7: the Generic Mul-
timedia Content Description Standard. IEEE Multimedia J. 9(2),
78–87 (2002)
16. Mulhem, P., Martin, H.: From database to Web multimedia docu-
ments. Multimedia Tools Appl. J. 20(3), 263–282 (2003)
17. Naphade, M.R., Huang, T.S.: A probabilistic framework for se-
mantic indexing and retrieval in video. In: IEEE International
Conference on Multimedia and Expo (I), pp. 475–478 (2000)
18. NCSA HTTPd Development Team: The Common Gateway Interface (CGI). http://hoohoo.ncsa.uiuc.edu/cgi/ (1995) [Accessed Nov. 2004]
19. Nilsson, M.: ID3 tag version 2.4.0 – Main Structure.
http://www.id3.org/id3v2.4.0-structure.txt
(2000) [Accessed Nov. 2004]
20. Pfeiffer, S.: The Ogg Encapsulation Format Version 0.
http://www.ietf.org/rfc/rfc3533.txt (2003)
21. Pfeiffer, S., Parker, C., Pang, A.: Specifying time inter-
vals in URI queries and fragments of time-based Web
resources (work in progress). http://www.ietf.
org/internet-drafts/draft-pfeiffer-temporal-
fragments-03.txt (2004) [Accessed Dec. 2004]
22. Pfeiffer, S., Parker, C., Pang, A.: The Annodex annota-
tion and indexing format for time-continuous data files,
version 3.0 (work in progress). http://www.ietf.
org/internet-drafts/draft-pfeiffer-annodex-
02.txt (2004) [Accessed Dec. 2004]
23. Pfeiffer, S., Parker, C., Pang, A.: The Continuous Media
Markup Language (CMML), version 3.0 (work in progress).
http://www.ietf.org/internet-drafts/draft-
pfeiffer-cmml-02.txt (2004)
24. Pfeiffer, S., Parker, C., Schremmer, C.: Annodex: a simple ar-
chitecture to enable hyperlinking, search and retrieval of time-
continuous data on the Web. In: Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR), pp. 87–93. Berkeley, CA (2003)
25. Rutledge, L., Schmitz, P.: Improving media fragment integration
in emerging Web formats. In: Proceedings of the International
Conference on Multimedia Modeling 2001 (MMM01), pp. 147–
166. CWI, Amsterdam (2001)
26. Schulzrinne, H., Rao, A., Lanphier, R.: Real Time
Streaming Protocol (RTSP). http://www.ietf.org/
rfc/rfc2326.txt (1998) [Accessed Nov. 2004]
27. Smith, J.R., Chang, S.-F.: Image and video search engine for the
World Wide Web. In: Storage and Retrieval for Image and Video
Databases (SPIE), pp. 84–95 (1997)
28. Sullivan, D.: Multimedia Search Engines: Image, Audio
& Video Searching. http://searchenginewatch.
com/links/article.php/2156251 (2003) [Accessed
Nov. 2004]
29. World Wide Web Consortium (W3C): HTML 4.01 Specification.
http://www.w3.org/TR/html4/ (1999) [Accessed Nov.
2004]
30. World Wide Web Consortium (W3C): XSL Trans-
forms (XSLT) Version 1.0. http://www.w3.org/
TR/xslt/ (1999) [Accessed Nov. 2004]
31. World Wide Web Consortium (W3C): Synchro-
nized Multimedia Integration Language (SMIL 2.0).
http://www.w3.org/TR/smil20/ (2001) [Accessed
Nov. 2004]
32. World Wide Web Consortium (W3C): XML Pointer Lan-
guage (XPointer). http://www.w3.org/TR/xptr/
(August 2002) [Accessed Nov. 2004]
33. World Wide Web Consortium (W3C): Extensible Markup
Language (XML) 1.1. http://www.w3.org/TR/xml11/
(2004) [Accessed Nov. 2004]
34. World Wide Web Consortium (W3C): RDF/XML Syntax
Specification. http://www.w3.org/TR/rdf-syntax-
grammar/ (2004) [Accessed Nov. 2004]
35. World Wide Web Consortium (W3C): XML Path Lan-
guage (XPath) 2.0. http://www.w3.org/TR/xpath20/
(2004) [Accessed Nov. 2004]
36. Xiph.org: Building a New Era of Open Multimedia. http://www.xiph.org [Accessed Nov. 2004]
37. Young, S., Foote, J., Jones, G., Sparck Jones, K., Brown, M.: Acoustic indexing for multimedia retrieval and browsing. In: Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), April 1997, vol. 1, pp. 199–202. Munich, Germany (1997)