Video Personalization in Resource-Constrained
Multimedia Environments
Yong Wei
Department of Computer Science
The University of Georgia
Athens, GA 30602-7404, USA
yong@cs.uga.edu
Suchendra M. Bhandarkar
Department of Computer Science
The University of Georgia
Athens, GA 30602-7404, USA
suchi@cs.uga.edu
Kang Li
Department of Computer Science
The University of Georgia
Athens, GA 30602-7404, USA
kangli@cs.uga.edu
ABSTRACT
Multimedia data, especially video data, is being increasingly
transmitted to, transmitted from and viewed on mobile devices
such as PDAs, laptop PCs, pocket PCs and cell phones. One of
the natural limitations of these multimedia-capable, mobile
devices is that they are constrained by their battery power
capacity, viewing time limit, amount of data received, and in
many situations, by available network bandwidth connecting
these devices with video servers. The video server is typically
also constrained by its computing power and connection
bandwidth. In order to provide a resource-constrained mobile
client with its desired video content, it is necessary to adapt or
personalize the video content while simultaneously satisfying the
aforementioned constraints. Also, in order to limit the client-
experienced latency, it is necessary to perform client request
aggregation on the server end. To this end, a video
personalization strategy is proposed to provide mobile, resource-
constrained clients with personalized video content that is most
relevant to the client’s request while simultaneously satisfying
multiple client-side system-level resource constraints. A client
request aggregation strategy is also proposed to cluster client
requests with similar video content preferences and similar client-
side resource constraints such that the number of requests the
server needs to process and the client-experienced latency are
both reduced.
The primary contributions of the paper are (1) the formulation and
implementation of a Multiple-choice Multi-dimensional Knapsack
Problem (MMKP)-based video personalization strategy; and (2)
the design and implementation of a multi-stage clustering-based
client request aggregation strategy. Experimental results
comparing the proposed MMKP-based video personalization
strategy to existing 0/1 Knapsack Problem (0/1KP)-based and the
Fractional Knapsack Problem (FKP)-based video personalization
strategies are presented. It is observed that (1) the proposed
MMKP-based personalization strategy includes more relevant
video content in response to the client’s request compared to the
existing 0/1KP-based and FKP-based personalization strategies;
and (2) in contrast to the 0/1KP-based and FKP-based
personalization strategies which can satisfy only a single client-
side constraint at a time, the proposed MMKP-based
personalization strategy is shown to be capable of satisfying
simultaneously multiple client-side resource constraints.
Experimental results comparing the client-experienced latency
with and without the proposed client request aggregation strategy
are also presented. It is shown that the proposed client request
aggregation strategy significantly reduces the mean client-
experienced latency without significant reduction in the average
relevance value of the video content delivered in response to the
client’s request.
Categories & Subject Descriptors: H.5.1 Multimedia
Information Systems, Video (e.g., tape, disk, DVI)
General Terms: Algorithms
Keywords
Video personalization, Video summarization, Multiple-choice
Multi-dimensional Knapsack Problem, Request aggregation.
1. INTRODUCTION
The current proliferation of mobile computing devices and
networking technologies has created enormous opportunities for
mobile device users to communicate with multimedia servers. As
handheld mobile computing and communication devices such as
personal digital assistants (PDAs), pocket-PCs and cellular
devices have become increasingly capable of storing, rendering
and displaying multimedia data, the user demand for being able to
view streaming video on such devices has increased several-fold.
For example, a mobile handheld client may be interested in
viewing a video showing traffic conditions on the road and
browsing the weather forecast for his/her travel destination. One
of the natural limitations of typical handheld mobile devices is
that they are resource constrained, i.e., constrained by their
battery power capacity, viewing time limit and in many situations,
by the available network bandwidth connecting them with video
servers. Thus, the original video content often needs to be
personalized in order to fulfill the client’s request while satisfying
simultaneously various client-side system-level resource
constraints. Also, in order to limit the client-experienced latency,
it is often necessary to perform client request aggregation on the
server end when dealing with multiple client requests.
In light of the above, a definition of video personalization can be
given as follows:
Definition 1: Given the client’s preferences regarding the video
content, and given the client-side resource constraints, video
personalization is the process of compiling and disseminating the
most relevant video content to the mobile client while satisfying
simultaneously the client-side resource constraints.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
MM’07, September 23–28, 2007, Augsburg, Bavaria, Germany.
Copyright 2007 ACM 978-1-59593-701-8/07/0009...$5.00.
A client request consists of the client’s preference(s) with regard
to video content and a list of client-side resource constraints. A
client query protocol is established to facilitate the
communication of the query between the client and the server. A
client query, under the currently implemented protocol, is a
structure with two fields: PREFERENCES and CONSTRAINTS.
The PREFERENCES field is a list of strings representing
semantic terms that encapsulate the client’s request for
information whereas the CONSTRAINTS field is a list of
numerical parameters representing the client-side resource
constraints such as the viewing time limit, bandwidth limit and
the limit on the amount of data the client can receive.
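As an illustration only, the two-field client query described above might be represented as follows. The class name `ClientQuery`, the use of a dictionary for the constraints, the constraint keys and the units are all assumptions made for this sketch; the paper specifies only that a query has a PREFERENCES field and a CONSTRAINTS field.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClientQuery:
    """Hypothetical container mirroring the two-field client query."""

    # PREFERENCES: semantic terms describing the requested content.
    preferences: List[str] = field(default_factory=list)
    # CONSTRAINTS: numerical client-side resource limits.
    # The keys and units below are assumed for illustration.
    constraints: Dict[str, float] = field(default_factory=dict)


query = ClientQuery(
    preferences=["weather forecast", "traffic"],
    constraints={
        "viewing_time_s": 120.0,   # viewing time limit, in seconds
        "bandwidth_kbps": 256.0,   # bandwidth limit
        "data_limit_kb": 4096.0,   # limit on the amount of received data
    },
)
```

Naming each constraint keeps the later knapsack formulations readable, since every key corresponds to one resource dimension.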
In this paper, we present a client-centered video personalization
system which can optimally fulfill the client’s requests while
simultaneously ensuring optimal utilization of the client-side
system-level resources. While there are many challenges to be
addressed in the design and implementation of a comprehensive
video personalization system, the work presented in this paper
focuses on the design and implementation of video
personalization strategies. In addition, a client request
aggregation strategy is proposed and implemented in order to
cluster multiple client requests with similar video content
preferences and similar client-side resource constraints. The goal
of the proposed client request aggregation strategy is to reduce
simultaneously both the number of client requests the server
needs to process and the client-experienced latency.
The video personalization problem is modeled as a constrained
optimization problem, i.e., maximization of the “total relevance
value” of the video summary delivered to the client under
multiple constraints that represent the client’s content preferences
and the available system-level resources. Various personalization
strategies based on the classical Knapsack Problem (KP) have
been proposed in the literature. However, existing video
personalization strategies do not consider a multiple-client
scenario. In order to serve multiple clients with acceptable client-
experienced latency while ensuring efficient utilization of server
resources, it is shown to be necessary for the server to aggregate
multiple client requests prior to video content delivery.
In the proposed video personalization scheme, raw videos are
automatically segmented and indexed/labeled in a single pass
using a stochastic modeling approach [1] and summarized offline
at multiple levels of abstraction. Content-aware key frame
selection algorithms and dynamic motion panoramas are used to
generate video summaries. Videos are labeled using semantic
terms selected from an ontology such as WordNet [16]. A client
request to the video server consists of the client’s video content
preferences and a set of client-side resource constraints. The
client-side resource constraints can include video viewing time,
battery power capacity, transmission bandwidth, amount of
received data and expected quality of the received video. The
video personalization module matches the client’s video content
preferences with the indices of the video summaries in the
database, and selects from the retrieved video summaries, a subset
of video segments or summaries at the appropriate levels of
abstraction that best matches the client content preferences while
satisfying simultaneously the various client-side resource
constraints.
In a typical video personalization system, requests are received
from multiple clients. Servicing each request on an individual
basis consumes a large amount of server-side and network
resources, such as computing time and network bandwidth, and
also entails a high average client-experienced latency.
The proposed client request aggregation strategy clusters
similar client requests together such that the number of effective
requests to be processed by the server is reduced. This, in turn,
reduces both the server and network load and the average client-
experienced latency. Since the client requests are heterogeneous
along multiple dimensions, i.e. they differ in terms of their video
content preferences and also in terms of the specified client-side
resource constraints, a multi-stage clustering strategy is proposed
to group similar client requests together.
The remainder of the paper is organized as follows. Section 2
provides a brief review of related work. Section 3 discusses the
computation of the relevance values of the video segments and
video summaries in response to a client request. In Section 4,
various video personalization strategies based on variations of the
classical Knapsack Problem (KP) are discussed and the proposed
MMKP-based video personalization strategy is detailed. Section 5
provides details of the proposed multi-stage client request
aggregation strategy. In Section 6, experimental evaluation results
of the proposed MMKP-based video personalization are
compared with those of existing 0/1KP-based and FKP-based
personalization strategies. Experimental results of the proposed
multi-stage client request aggregation strategy are also provided.
Section 7 concludes the paper with an outline for future work.
2. BRIEF REVIEW OF RELATED WORK
Video summarization is a field of active research in computer
vision and multimedia, and constitutes the first step towards video
personalization.
Definition 2: A video summary is defined as a set of still or
moving image frames which represents the semantic content of a
video segment.
Various innovative key frame selection algorithms have been
proposed in the literature in the context of video summarization.
Doulamis et al. [2] use a content-sampling algorithm to extract a
small set of key frames from a video stream. Kim et al. [3] take
advantage of the objects of interest in the video along with their
actions and the resulting events to generate a video abstraction.
For panning videos with moving object(s) against a static
background, dynamic motion panoramas have been used to
represent both dynamic and static scene elements in a
geometrically consistent manner [4][5].
To facilitate content-based retrieval, video summaries are
typically organized in a hierarchical manner. Jaimes et al. [6]
propose a visual information indexing framework for systematic
representation of image and video data based on syntax and
semantics. In the proposed client-centered video personalization
system, content-aware key frame selection algorithms and
dynamic motion panoramas are used to generate video
summaries.
Various personalization strategies have been proposed in the
literature to generate the optimal response to the client’s request
while satisfying various client-side resource constraints. The
optimal response to the client’s request is defined as a set of video
summaries that is most relevant to the client’s content
preference(s). Merialdo et al. [7] demonstrate that the video
personalization problem can be modeled as the classical 0/1
Knapsack Problem (0/1KP). Tseng et al. [8],[9] propose a
personalization strategy based on a combination of 0/1KP-based
optimization and context clustering to collect successive similar
shots. Context clustering is shown to be an enhancement of the
scheme proposed in [8] in that it considers the temporal
smoothness of the generated video summary in order to improve
the client’s viewing experience. One of the drawbacks of 0/1KP-
based video personalization strategies is that some of the video
segments which are excluded in the response to the client’s
request may still contain information that is potentially relevant or
of interest to the client. Another drawback of 0/1KP-based
personalization strategies is that they can satisfy only a single
client-side resource constraint, such as the viewing time limit, at a
time.
MMKP-based optimization has been used in the design of an
adaptive multimedia system (AMS) [10]. The admission control
in an AMS, where the clients are required to pay a fee based on
the desired quality of service, is modeled as an MMKP [10]. A
certain quality of service is deemed to consume a predetermined
set of server resources. In order to maximize the net revenue
generated by providing multimedia services to a client population,
the multimedia server admits an optimal set of service requests by
solving an MMKP. Since the MMKP is known to be NP-hard,
heuristic algorithms are proposed to solve the MMKP for real
time applications [10], [11], [12]. It needs to be noted that the
AMS admission control problem is quite distinct from the video
personalization problem discussed in this paper even though both
problems are modeled as an MMKP. First, the objective in the
case of the AMS admission control problem is to maximize the
net revenue generated by the server; whereas that in the case of
video personalization is to maximize the total relevance value of
the video content delivered in the response to a client’s request.
Second, the constraints in the AMS admission control problem are
on the server-end system-level resources; whereas those in the
case of the video personalization problem are on the client-end
system-level resources.
Service request aggregation techniques have been discussed in the
context of multimedia streaming systems [13], [14]. Existing
video personalization strategies published in the literature [7], [8],
[9] address primarily a single-client scenario. Yu et al. [15]
investigate user behavior and access patterns in a large video-on-
demand (VOD) system. They report that the popularity of videos
and user requested session lengths exhibit certain statistical
distributions. The user request arrival pattern can be modeled
using a modified Poisson distribution. Their findings indicate that
multiple clients may request similar video content with similar
viewing limits in a given time duration. In this paper, we therefore
propose a multi-stage clustering-based client request aggregation
strategy with a goal to reduce the server and network load and
simultaneously improve the client-experienced latency.
3. RELEVANCE VALUE COMPUTATION
In Section 3.1, we discuss the computation of the relevance value
of a video segment in response to the client’s content
preference(s). In Section 3.2 we discuss the computation of the
relevance value of a summarized (or transcoded) version of a
video segment based on its relative duration and the relevance
value of the original version of the video segment.
3.1 Relevance Value of a Video Segment
Video segments are indexed using semantic terms derived from
an ontology such as WordNet [16]. In well organized videos, the
video can be viewed as a sequence of semantic units (genres) that
are concatenated based on a predefined video program syntax. In
the proposed system, we define six semantic terms for TV
broadcast news videos, i.e. News Anchor, News, Sports News,
Commercial, Weather Forecast and Program Header, and three
semantic concepts for Major League Soccer (MLS) video, i.e.
Zoom Out, Close Up and Replay. Each video segment is assigned
a relevance value based on the client’s preference with regard to
video content. Assume video segment $S_i$ is indexed by a semantic
term $T_i$. In its request, the client specifies a preference for
video content using a descriptive term denoted by $P$, which is
also derived from the same ontology. The relevance value $V_i$
assigned to the video segment $S_i$ is then given by:
$V_i = \mathrm{similarity}(T_i, P), \quad 0 \leq V_i \leq 1$ (3.0)
In the current implementation the similarity is evaluated using the
lch semantic similarity measurement algorithm [16]. The lch
algorithm measures the length of the shortest path between two
concepts in the hierarchical WordNet lexical database and scales
the value by the maximum is-a path length in the hierarchy. For
example, similarity(“anchor”, “newsreader”) = 1.56, and
similarity(“anchor”, “news”) = 0.99.
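The path-based measure described above can be sketched on a toy taxonomy. This is a minimal illustration of the Leacock-Chodorow form, -log(p / 2D), where p is the number of nodes on the shortest is-a path and D is the maximum depth of the hierarchy. The taxonomy `PARENT` and the function names are hypothetical and are not taken from WordNet or from the authors' implementation, so the values produced differ from those quoted in the text.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "person": "entity",
    "communicator": "person",
    "anchor": "communicator",
    "newsreader": "communicator",
    "message": "entity",
    "news": "message",
}


def path_to_root(term):
    """Return the list of nodes from `term` up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def max_depth():
    """Maximum depth of the taxonomy, counted in nodes."""
    return max(len(path_to_root(t)) for t in PARENT)


def lch_similarity(a, b):
    """-log(p / (2 * D)), with p the node count of the shortest path."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    index_in_b = {t: i for i, t in enumerate(path_b)}
    for i, t in enumerate(path_a):
        if t in index_in_b:
            # First shared ancestor: edges on both sides plus one node.
            nodes_on_path = i + index_in_b[t] + 1
            return -math.log(nodes_on_path / (2.0 * max_depth()))
    raise ValueError("terms share no common ancestor")
```

In this toy hierarchy, terms that share a close ancestor (such as the two siblings under `communicator`) score higher than terms that meet only at the root.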
3.2 Relevance Value of a Video Summary
We now discuss the computation of the relevance value of a video
summary given its relative duration and the relevance value of the
original video segment computed using equation (3.0). Each
indexed video segment is summarized at multiple levels of
abstraction using content-aware key frame selection and motion
panorama computation algorithms. Each video summary consists
of a set of key frames and motion panoramas. If the image frames
are displayed at a fixed frame rate, the higher the level of
abstraction, the shorter the duration of the video summary. This is
so because at a higher level of abstraction, fewer image frames
are included in the video summary. Since the FKP-based and the
MMKP-based video personalization strategies could potentially
include both the original video segments and their summaries, the
relationship between the relevance value of the original video
segment and the relevance value of each of its summaries needs to
be first established.
For each video segment, the original version is assumed to
contain the greatest amount of detail; whereas its summary at the
highest level of abstraction is assumed to contain the least amount
of detail. It is reasonable to assume that the amount of
information contained within a video summary (relative to the
original version) is related to its relative duration, i.e.
$v_i = v_i^{0} \cdot f(L_i / L^{0})$ (3.1)
where $v_i^{0}$ is the relevance value of the original video segment,
and $L^{0}$ and $L_i$ are the time durations of the original video
segment and the video summary respectively.
Typically, the amount of information contained within a video
summary (relative to the original version) does not necessarily
increase linearly with its relative duration. In this paper, we
propose to use the empirical Zipf’s law [17] to quantify
$f(L_i / L^{0})$. For some categories of videos, such as news
broadcast, most of the information is revealed in the first
20%-30% of the video segment. In a typical news broadcast video, a
news anchor summarizes the news events at the beginning of the
video segment which is then followed by detailed field news. This
observation justifies the use of the Zipf function. The
mathematical definition of the Zipf function is given by:
$I = H_{k,s} / H_{N,s}$ (3.2)
where I (expressed as a percentage) is the amount of information
contained within a video summary relative to the original video
segment, N is the number of possible discrete durations of the
video summary, $k \in \{1, 2, \ldots, N\}$ is a video summary’s
discrete duration, $s > 0$, $s \in \mathbb{R}$, is the characteristic
parameter of the Zipf function and $H_{k,s}$ is the kth generalized
harmonic number [17]. In the limiting case $s = 0$, the information
content within a video summary increases linearly (i.e., at a
constant rate) with its duration.
Equation (3.2) is a definition of the discrete Zipf function. In our
application, the relative (i.e., normalized) duration of a video
summary is a continuous variable in the range [0, 1]. To use the
Zipf function defined in equation (3.2), the following
approximation and linear transform are used. Let $L_{norm}$ denote
the normalized and discrete video duration, where
$L_{norm} \in \{0.01, 0.02, \ldots, 0.99, 1.00\}$, and let $N = 100$.
Then the following linear transform maps the values of $L_{norm}$
to k, i.e.,
$k = \mathrm{round}(L_{norm} \times N)$ (3.3)
Figure 1 depicts the relationship between the relative information
content of a video summary and its normalized discrete
duration under the Zipf’s law-based mapping function. The
relative duration and relative information content of the video
summary are normalized to lie within the range [0, 1] based on
the duration and information content of the original video
segment. In Figure 1, the parameter s takes values 0, 0.5, 1.0 and
1.5.
[Figure 1: plot of Relative Information Content (vertical axis, 0 to 1) against Relative Video Duration (horizontal axis, 0.1 to 1.0), with one curve for each of s = 0, s = 0.5, s = 1.0 and s = 1.5.]
Figure 1. Relative Information Content of a Video Summary versus
Segment Duration: Zipf Function.
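The mapping from a summary's relative duration to its relevance value, as described by equations (3.1) to (3.3), can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are assumptions, N defaults to 100 as in the text, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for the rounding in equation (3.3).

```python
def generalized_harmonic(k, s):
    """k-th generalized harmonic number H_{k,s} = sum_{n=1..k} 1 / n**s."""
    return sum(1.0 / (n ** s) for n in range(1, k + 1))


def relative_information(l_norm, s, n_levels=100):
    """Relative information content I = H_{k,s} / H_{N,s}, equation (3.2)."""
    # Equation (3.3): map the normalized duration to a discrete index k.
    k = int(round(l_norm * n_levels))
    k = max(0, min(k, n_levels))  # guard against out-of-range input
    return generalized_harmonic(k, s) / generalized_harmonic(n_levels, s)


def summary_relevance(v_original, summary_duration, original_duration, s=1.0):
    """Equation (3.1): scale the original relevance by f(L_i / L^0)."""
    l_norm = summary_duration / original_duration
    return v_original * relative_information(l_norm, s)
```

With s = 0 the mapping is linear in the relative duration, whereas larger values of s assign most of the information to the early part of the segment, which matches the news broadcast observation made above.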
4. PERSONALIZATION STRATEGIES
The objective of video personalization is to present a customized
or personalized video summary that retains as much of the
semantic content desired by the client as possible but within the
system-level resource constraints imposed by the client. The
client typically wants to retrieve and view only the video content
that matches his/her content preferences. In order to generate the
personalized video summary, the client preferences, client usage
environment and client-side system-level resource constraints
need to be considered. The personalization engine needs to select
the optimal set of video contents (i.e., the most relevant set of
video summaries) for the client within the resource constraints
imposed by the client.
This paper presents the design and implementation of an MMKP-
based video personalization strategy to generate a customized
response to the client’s request while satisfying multiple client-
side system-level resource constraints. Compared to the 0/1KP-
based and the FKP-based video personalization strategies
presented in [7], [8] and [9], the proposed MMKP-based video
personalization strategy is shown to include more relevant
information in its response to the client’s request. The MMKP-
based personalization strategy is also shown to be capable of
simultaneously satisfying multiple client-side resource
constraints, in contrast to the 0/1KP-based and the FKP-based
personalization strategies which can only satisfy a single client-
side resource constraint at a time.
In the video database, each video segment is assigned a relevance
value based on the client’s content preferences, as computed in
equation (3.0). The relevance value of a video summary is
computed using equation (3.1). We propose to use Zipf’s law, as
defined in equations (3.2) and (3.3), to quantify $f(L_i / L^{0})$.
Merialdo et al. [7] propose that video personalization be modeled
along the lines of the classical 0/1 Knapsack Problem (0/1KP)
defined by:
$\max \left( \sum_{i} V_i \right), \quad i \in \{1, 2, \ldots, n\}$,
subject to
$\sum_{i} L_i \leq T$ (4.1)
where $L_i$ is the time duration of video segment i, T is the client
video viewing time limit and n is the number of candidate video
segments. In the 0/1KP-based video personalization strategy, a
video item is either included in or excluded from the response.
However, some of the video segments which are excluded in the
response may still contain some useful information that is
potentially of interest to the client. The 0/1KP-based video
personalization algorithm does not convey this information to the
client in its generated response.
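For illustration, the 0/1KP selection can be solved exactly by dynamic programming when the segment durations are integers (for example, whole seconds). The sketch below is a standard textbook approach under that assumption and is not necessarily the solver used in [7]; the function name and the sample values are hypothetical.

```python
def knapsack_01(values, durations, time_limit):
    """Return (best total relevance, chosen segment indices).

    Assumes integer durations and an integer time limit.
    """
    # best[t] = (total relevance, chosen indices) achievable within time t
    best = [(0.0, []) for _ in range(time_limit + 1)]
    for i, (value, duration) in enumerate(zip(values, durations)):
        updated = list(best)
        for t in range(duration, time_limit + 1):
            # Read from the previous row so each segment is used at most once.
            candidate = best[t - duration][0] + value
            if candidate > updated[t][0]:
                updated[t] = (candidate, best[t - duration][1] + [i])
        best = updated
    return best[time_limit]


# Hypothetical example: three segments and a 60-second viewing limit.
total, chosen = knapsack_01([0.9, 0.6, 0.5], [60, 30, 30], 60)
```

In this example the two shorter segments together are preferred over the single longest one, and every excluded segment is dropped entirely, which is the drawback noted above.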
In order to include more relevant video content to fill the capacity
of the knapsack, i.e., the client viewing time limit in this case, the
video personalization problem is formulated along the lines of the
following fractional knapsack problem (FKP):
$\max \left( \sum_{i} x_i V_i \right), \quad i \in \{1, 2, \ldots, n\}$,
subject to
$\sum_{i} y_i L_i \leq T$ (4.2)
where T is the client video viewing time limit, $L_i$ is the temporal
length of video segment $S_i$, and $x_i$ and $y_i$, where
$x_i, y_i \in [0, 1]$, are fractional factors pertaining to the video
segment’s relevance value and its duration respectively. The above
FKP can be solved by using a greedy algorithm. Video segments are
sorted in decreasing order of their Value_Intensity which is
computed as $Value\_Intensity_i = V_i / L_i$, where $V_i$ is the
relevance value and $L_i$ is the time duration of video segment
$S_i$. Video segments with
high Value_Intensity values are selected first. Although the FKP-
based optimization scheme can include transcoded video
segments, some potentially relevant videos could be excluded in
the server’s response. This can be attributed to the basic nature of
the constrained optimization problem posed by the FKP and the
greedy algorithm used to solve it. In the case of the FKP-based
video personalization, a fractional portion of a single video
segment could be included in the response generated by the video
personalization module. The last video segment in the generated
response could be summarized or shortened to enable it to fit
within the limit of the client’s video viewing time (i.e., the
knapsack capacity in our formulation).
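The greedy procedure described above can be sketched as follows. The sketch makes a simplifying assumption that is not stated in the paper, namely that the relevance retained by a shortened segment is proportional to the fraction of its duration that is included (that is, x_i = y_i); the function name and sample values are hypothetical.

```python
def fractional_knapsack(values, durations, time_limit):
    """Greedy FKP: pick segments by decreasing Value_Intensity = V_i / L_i.

    Returns (total relevance, [(segment index, included fraction), ...]).
    Assumes relevance scales linearly with the included fraction.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] / durations[i],
        reverse=True,
    )
    remaining = float(time_limit)
    total = 0.0
    selection = []
    for i in order:
        if remaining <= 0:
            break
        # Only the last selected segment can be included fractionally.
        fraction = min(1.0, remaining / durations[i])
        selection.append((i, fraction))
        total += fraction * values[i]
        remaining -= fraction * durations[i]
    return total, selection


# Hypothetical example: three segments and a 75-second viewing limit.
total, selection = fractional_knapsack([0.9, 0.6, 0.5], [60, 30, 30], 75)
```

Here the two short segments are taken whole and only a quarter of the longest segment fits, which illustrates why at most one segment in the response is shortened.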
In the aforementioned 0/1KP-based and FKP-based video
personalization strategies, some video segments and their
corresponding summaries could be excluded from the generated
response to the client’s request. Furthermore, simultaneously
satisfying multiple client-side resource constraints is beyond the
capabilities of the 0/1KP-based and FKP-based video
personalization strategies.
In many applications, it is desirable to provide the client with as
much information as possible. In such cases it may be preferable
to include two shorter video summaries in the generated response
rather than a single video segment of longer duration that contains
more detail. For example, if a client needs to browse the sports
news of the day, it might be helpful to provide him/her with
multiple, though short, sports news summaries rather than a single
long and detailed video segment containing news of a specific
sport. We propose a Multiple-Choice Multi-Dimensional
Knapsack Problem [10], [11], [12] (MMKP)-based video
personalization strategy to address this issue.
Definition 3: A content group consists of a video segment and its
summaries at multiple levels of abstraction.
Each original video segment $S_i$ is summarized into $l_i - 1$
summaries. The video segment $S_i$ and its $l_i - 1$ summaries,
denoted $S_{ij}$, $j \in \{1, 2, \ldots, l_i\}$, constitute a
multi-level content group, as shown in Figure 2.
We denote the original video segment and each of its summaries
as an item. Each item is associated with a relevance value and is
deemed to require m resources. The computation of the relevance
value of a video segment or video summary has already been
discussed in Section 3. The objective of the MMKP-based video
personalization strategy is to select exactly one item from each
content group in order to maximize the total relevance value of the
selected items, subject to the m resource constraints of the client.
Figure 2. Multiple Abstraction Level Content Group
The MMKP-based video personalization strategy is formulated as
follows.
Let $v_{ij}$ be the relevance value of the jth summary of the video
segment $S_i$, $\vec{r}_{ij} = (r_{ij1}, r_{ij2}, \ldots, r_{ijm})$ be
the required resource vector for the jth summary of the video
segment $S_i$ and $\vec{R} = (R_1, R_2, \ldots, R_m)$ be the vector
that denotes the client-side resource bounds. The problem thus is
to determine
$V = \max \left( \sum_{i=1}^{n} \sum_{j=1}^{l_i} x_{ij} v_{ij} \right)$,
subject to the constraints
$\sum_{i=1}^{n} \sum_{j=1}^{l_i} x_{ij} r_{ijk} \leq R_k, \quad k = 1, 2, \ldots, m$
and
$\sum_{j=1}^{l_i} x_{ij} = 1, \quad x_{ij} \in \{0, 1\}$ (4.3)
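To make formulation (4.3) concrete, the sketch below solves a tiny MMKP instance by exhaustive enumeration: exactly one item is chosen from each content group, and a choice is feasible only if every resource bound is respected. Exhaustive search is used here purely for clarity; it is exponential in the number of groups and is not the heuristic approach of [10], [11], [12]. The function name, the data layout and the sample numbers are assumptions.

```python
from itertools import product


def solve_mmkp(groups, bounds):
    """Exhaustively solve a small MMKP instance.

    groups: list of content groups; each item is (relevance, resource_vector).
    bounds: client-side resource bounds (R_1, ..., R_m).
    Returns (best total relevance, tuple of chosen item indices),
    or (None, None) if no feasible selection exists.
    """
    best_value, best_choice = None, None
    for choice in product(*(range(len(group)) for group in groups)):
        usage = [0.0] * len(bounds)
        value = 0.0
        for group, j in zip(groups, choice):
            relevance, resources = group[j]
            value += relevance
            for k, amount in enumerate(resources):
                usage[k] += amount
        feasible = all(u <= b for u, b in zip(usage, bounds))
        if feasible and (best_value is None or value > best_value):
            best_value, best_choice = value, choice
    return best_value, best_choice


# Hypothetical instance: two content groups, each holding the original
# segment followed by two summaries; resources are (duration in s, data in KB).
groups = [
    [(0.9, (60, 900)), (0.7, (30, 450)), (0.4, (10, 150))],
    [(0.6, (40, 600)), (0.5, (20, 300)), (0.3, (8, 120))],
]
best_value, best_choice = solve_mmkp(groups, bounds=(60, 900))
```

In this instance neither original segment can be delivered alongside any item of the other group within the 60-second limit, so the best feasible selection takes one summary from each group.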
The MMKP-based video personalization strategy is illustrated in
Figure 3. Video segments at the bottom of each content group are
the original versions. Each original video segment has two
associated summaries. The variable
$v_{ij}$ denotes the relevance value of the jth item in the ith
content group whereas $t_{ij}$ and $b_{ij}$ denote respectively the
duration and the amount of data associated with the jth item in
the ith content group. We assume that the client has two resource
constraints, i.e., the viewing time limit T and the total received
data limit B. The goal of the MMKP-based video personalization
strategy is to select exactly one item from each content group
such that the total relevance value
$\sum_{i=1}^{3} \sum_{j=1}^{3} x_{ij} v_{ij}$ is maximized subject to
the constraints $\sum_{i=1}^{3} \sum_{j=1}^{3} x_{ij} t_{ij} \leq T$
and $\sum_{i=1}^{3} \sum_{j=1}^{3} x_{ij} b_{ij} \leq B$. The constraint