An Analysis of User Behavior in Online Video Streaming
Fan Qiu
Department of Electrical Engineering and
Computer Science
Vanderbilt University, Nashville, TN 37212, USA
fan.qiu@vanderbilt.edu
Yi Cui
Department of Electrical Engineering and
Computer Science
Vanderbilt University, Nashville, TN 37212, USA
yi.cui@vanderbilt.edu
ABSTRACT
Understanding user behavior in online video streaming is
essential to designing streaming systems which provide user-
oriented service. However, it is challenging to gain insightful
knowledge of the characteristics of user behavior due to its
high volatility. To this end, the paper provides an extensive
analysis of user behavior in online video streaming, based on
a large scale trace database of online streaming video access
sessions. We categorize user behaviors into multiple patterns
and probe the relationship between them. Our work puts
emphasis on the statistical characteristics of user behavior
patterns. In particular, this study finds that the behavior
of an individual user in a video streaming session is not
only related to the popularity level of the video, but is
also strongly correlated with the user’s behaviors in previous
streaming sessions.
Categories and Subject Descriptors
G.3 [Probability and Statistics]: Distribution Function;
H.2.8 [Database Management]: Database Applications—
Data Mining
General Terms
Measurement, Human Factors
Keywords
Online video streaming, User behavior, Probability
1. INTRODUCTION
Streaming video service offers a convenient and flexible
way of watching online videos: users can play a video file
while it is still being delivered from the server. Many
streaming media websites, such as YouTube and MSN video,
provide millions of online streaming videos, which has made
online streaming video increasingly popular among Internet
users. As this technology develops, designers attempt to
improve the quality of the streaming service so as to raise
the level of user satisfaction. To achieve this, it is
essential to analyze the user behavior in the streaming
sessions supported by the streaming servers [10]. These
sessions involve a large number of user interactions with the
streaming system. Various types of user behavior are related
to these interactions, such as play, stop, jump and pause. If
we understand the implications of these behaviors, we can
make the streaming server more efficient. For example,
understanding behavior patterns helps us estimate the data
load on the server. In addition, the designer can adjust the
contents of the videos according to the responses of the users.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
VLS-MCMR’10 October 29, 2010, Firenze, Italy
Copyright 2010 ACM 978-1-4503-0166-4/10/10 ...$10.00.
Much research effort has been spent on investigating user
behavior or client/server interactions in online video
streaming. However, user behavior is not easy to model, for
the following reasons. First, extracting behavior information
usually requires tedious mining of video trace datasets,
which often contain a huge amount of data. Second, the high
volatility of user behavior makes it difficult to identify
the behavior patterns and study the relationship between them.
In order to analyze user behavior in online video streaming,
we use a large scale video trace database from MSN video,
which contains a great number of video streaming records
retrieved in 2007 and 2008. Our study consists of the
following steps: 1) Mining the database and retrieving the
streaming sessions that contain information about active user
behavior; 2) Defining behavior patterns and mapping the user
behaviors of the selected streaming sessions to appropriate
patterns; 3) Modeling the transitions between behavior
patterns by using a finite state machine (FSM); and 4)
Analyzing several explanatory factors that have a significant
impact on user behavior. We mainly investigate session groups
consisting of consecutive streaming sessions. Each group
corresponds to one individual user and one video file. The
time interval between two sessions in a group is relatively
small. Our research characterizes the user behaviors of the
streaming sessions from the MSN video database and uncovers
the relationship between the behavior patterns. We find that
user behavior is closely related to the popularity level of a
video. In addition, the behavior in an earlier session has a
potential impact on the following sessions.
The rest of this paper is organized as follows: Sec. 2
introduces the characteristics of our video trace database. In
Sec. 3 we propose the user behavior patterns and discuss the
transitions between them. We further analyze some factors
that have impact on user behaviors in this section. Sec. 4
overviews some related work and indicates our contribution.
Sec. 5 concludes the paper and introduces the directions of
our future work.
Figure 1: File Accesses (x-axis: file index, x 10^4; y-axis: number of accesses, natural logarithm)
2. VIDEO TRACE DATABASE
Our database consists of trace records from about 5,000,000
video streaming sessions from 2007 and 2008. The information
of each streaming session is recorded as a session entry.
Because the raw database contains much redundant information,
we first parse the database and keep the following items in
each session entry: video ID, source data rate, video length,
player ID, session start time, buffer time, buffer count and
session duration. Here, buffer time is the total time spent
on buffering the video file during a session, and buffer
count is the number of interruptions due to data buffering
during a session. In this section, we analyze the
characteristics of the trace database and discuss the
extraction of the sessions that are used in behavior analysis.
2.1 Video File Analysis
The database contains streaming sessions of approximately
76,000 online streaming video files. Each file is associated
with one or more sessions. We first measure two metrics: the
number of accesses and the median duration ratio. The number
of accesses of a video file (during the entire monitored
period) ranges from 1 to 158,000. The sorted results (natural
logarithm of the number of accesses) are presented in Figure
1. Here we only keep the files that are accessed more than 10
times. The other files are filtered out because their
information may not be representative. The next important
metric is the median duration ratio of the streaming sessions
related to each video file. This value is an indicator of the
popularity level of a file, which will be discussed in Sec. 3.
The intuition behind session duration is that users spend more
time on a video if they are satisfied with the streaming
service, which leads to a longer session duration. However, it
is unfair to compare the durations of all the sessions
directly, because different video files usually have
different playback lengths. We therefore normalize the
session durations by computing the duration ratio (the ratio
between the session duration and the playback length of the
entire video file). For each video file, we compute the median
duration ratio of its sessions. The sorted results are
presented in Figure 2. This plot indicates that the video
files differ considerably: some files have a high popularity
level, while others are less popular.
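The normalization and per-file median described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the tuple layout `(video_id, session_duration, video_length)` and the parameter `min_accesses_exclusive` are assumptions introduced here, with the threshold of 10 taken from the text.

```python
from collections import defaultdict
from statistics import median


def median_duration_ratios(sessions, min_accesses_exclusive=10):
    """Return {video_id: median duration ratio} for sufficiently accessed files.

    `sessions` is an iterable of (video_id, session_duration, video_length)
    tuples; this layout is a hypothetical simplification of a session entry.
    Durations and lengths must use the same time unit.
    """
    ratios = defaultdict(list)
    for video_id, duration, length in sessions:
        if length > 0:  # skip entries without a usable playback length
            ratios[video_id].append(duration / length)
    # Keep only files accessed more than `min_accesses_exclusive` times.
    return {
        video_id: median(values)
        for video_id, values in ratios.items()
        if len(values) > min_accesses_exclusive
    }
```

Sorting the returned values would give a curve of the kind shown in Figure 2. Individual ratios are not capped here, since the text does not state whether a session may exceed the playback length.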
Figure 2: Median Session Duration Ratio (x-axis: file index, x 10^4; y-axis: median duration ratio)
Figure 3: User Access Distribution (x-axis: number of accesses of a user; y-axis: cumulative distribution function (CDF))
2.2 User Access Analysis
From the users’ perspective, we investigate the access
sessions of each individual user. Our database is divided into
many small groups. All the sessions in a group correspond to
one user ID and one video file ID. According to our
measurement, most groups contain only one session, which means
most users access a video file once. Because the groups
containing one session can barely provide any representative
information for behavior analysis, we only explore the
characteristics of the groups with multiple (greater than 3)
streaming sessions. Figure 3 plots the cumulative distribution
function (CDF) of the number of accesses of a user. Figure 4
plots the CDF of the session duration ratio.
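The grouping and the empirical CDF described above could be computed along the following lines. This is a sketch under stated assumptions: the pair layout `(user_id, video_id)`, the function names and the parameter `min_sessions_exclusive` are hypothetical, and the threshold of 3 follows the text.

```python
from collections import Counter


def group_sizes(sessions):
    """Count the sessions in each (user_id, video_id) group."""
    return Counter((user_id, video_id) for user_id, video_id in sessions)


def select_group_sizes(sizes, min_sessions_exclusive=3):
    """Keep the sizes of groups with more than `min_sessions_exclusive` sessions."""
    return [n for n in sizes.values() if n > min_sessions_exclusive]


def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the fraction of values <= x."""
    total = len(values)
    counts = Counter(values)
    cdf, cumulative = [], 0
    for x in sorted(counts):
        cumulative += counts[x]
        cdf.append((x, cumulative / total))
    return cdf
```

Applying `empirical_cdf` to the selected group sizes yields points of the kind plotted in Figure 3; applying it to per-session duration ratios yields a curve of the kind plotted in Figure 4.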
Figure 4: User Access Duration Ratio (x-axis: session duration ratio; y-axis: cumulative distribution function)
3. USER BEHAVIOR ANALYSIS
3.1 User Behavior Patterns
To study the characteristics of user behavior, we again use
the session groups introduced in the last section: each group
contains all the sessions belonging to the same user and the
same file. The session entries in one group therefore
collectively describe a specific user’s behaviors related to
a particular video file. Because we can hardly retrieve useful
behavior information from the groups with only one or two
sessions, we filter out these small groups.
A behavior pattern is defined according to two adjacent
streaming sessions in a group. We propose the following
seven behavior patterns: start, stop, jump forward, jump
backward, replay, pause and return. The definitions of these
patterns are as follows:
1. Start (B0): the beginning of an access to a video file.
The user has never accessed this file before. Here by
“access” we mean a sequence of behavior patterns
corresponding to a group. In other words, a group of
sessions belongs to the same “access”.
2. Stop (B1): the end of an access to a video file. After
this behavior, the user will not re-access the file during
the monitored period.
3. Jump forward (B2): compared to the end point of the last
session in the timeline of the video, the user’s video
player jumps a certain distance towards the end of the
video.
4. Jump backward (B3): compared to the end point of the
last session in the timeline of the video, the user’s video
player jumps a certain distance towards the beginning
of the video file.
5. Replay (B4): the user finished playing the video before
this session, and he/she restarts the video from the
beginning.
6. Pause (B5): the user’s video player starts from the
time point where it stopped in the last session.
7. Return (B6): the user returns and keeps playing the
video.
Figure 5 illustrates several behavior patterns. The next
important issue is how to identify patterns in the session
records. As the figure shows, if there is a gap between two
adjacent streaming sessions, a jump forward pattern is
generated. If there is an overlap between two adjacent
sessions, a jump backward pattern is generated. If two
adjacent sessions are well connected, a pause pattern is
generated. This pattern can be explained as follows: a user
played the video in an earlier session, then the video file
was closed at some point. In the following session, the user
came back and played the video from the point where he/she
stopped in the last session.
According to the definitions of user behavior patterns, the
sessions within each group can be mapped to a sequence of
behavior patterns. Figure 6 is an example of translating a
group of sessions into behavior patterns. Here “start time”
is the starting point of a session in the timeline of the
video. Similarly, “end time” is the end point in the timeline.
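The gap, overlap and connected rules above can be sketched in code. This is only an illustration under several assumptions that the text does not specify: the tolerance `tol` (in seconds) that decides when two sessions count as "well connected", the order in which the rules are checked (replay before the gap test), and the convention that a sequence is bracketed by start and stop. The return pattern is omitted because no operational rule for it is given.

```python
def classify_transition(prev_end, next_start, video_length, tol=1.0):
    """Label the pattern between two adjacent sessions of one group.

    Positions are offsets in the video timeline, in seconds.
    `tol` is a hypothetical tolerance, not a value from the paper.
    """
    # Replay: the previous session reached the end of the video and the
    # next one starts again from the beginning.
    if prev_end >= video_length - tol and next_start <= tol:
        return "replay"
    gap = next_start - prev_end
    if gap > tol:       # gap between the sessions
        return "jump forward"
    if gap < -tol:      # overlap between the sessions
        return "jump backward"
    return "pause"      # sessions are well connected


def sessions_to_patterns(sessions, video_length, tol=1.0):
    """Map a group's sessions, given as (start_time, end_time) pairs in
    playback order, to a sequence of behavior pattern names."""
    if not sessions:
        return []
    patterns = ["start"]
    for (_, prev_end), (next_start, _) in zip(sessions, sessions[1:]):
        patterns.append(
            classify_transition(prev_end, next_start, video_length, tol)
        )
    patterns.append("stop")
    return patterns
```

For a 100-second video, the sessions `[(0, 30), (60, 90), (50, 100), (0, 40), (40, 70)]` would map to start, jump forward, jump backward, replay, pause, stop under these assumptions.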
Figure 5: User Behavior Patterns
Figure 6: Mapping Session Records to Behavior Patterns
Having converted the session groups to sequences of user
behavior patterns, we can compute the distribution of these
patterns, which is shown in Figure 7. In our experiment, jump
forward and replay are the two most frequent behavior
patterns, and pause is relatively rare in these access
sessions.
3.2 Transitions Between Patterns
To understand the relationship between behavior patterns,
we set out to study the transitions between them. From the
observations of the behavior pattern sequences, we find that
the patterns transition to one another. For example, in
Figure 6, the last replay pattern is followed by a jump
backward pattern in the following session. This pattern is
then followed by a stop pattern in the next session.
This structure is similar to a finite state machine (FSM), if
we regard each behavior pattern as a state. In fact, the
seven user behavior patterns form a finite state space, so
we can construct a finite state machine to describe these
transitions. The state machine is shown in Figure 8. The
transitions are indicated by the arrows. This FSM describes
how the transitions occur between the behavior patterns. In
addition, the FSM has the following features: 1) stop is
always the last state of a state sequence, so no state
follows stop; 2) start is the first state of a state
sequence, so no state can move to start; 3) pause is always
followed by return.
Figure 7: Distribution of Behavior Patterns (x-axis: behavior pattern: Start, Stop, J_Forward, J_Backward, Replay, Return, Pause; y-axis: number of occurrences, x 10^4)
Figure 8: Finite State Machine of User Behavior Patterns
We use transition probabilities to describe the transitions
precisely. Suppose the current state is S_i and the next state
is S_j. The transition probability between the two states can
then be expressed as
$$P_{ij} = \Pr\{S_j \mid S_i\} \tag{1}$$
In Table 1 we list all the transition probabilities, which
are derived from the dataset consisting of the selected trace
groups (with group size greater than 3). Thus we can an-
alyze the transitions between the behavior patterns from a
statistical perspective.
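One plausible way to estimate the probabilities of Eq. (1) from pattern sequences is by relative frequency: count how often state S_i is immediately followed by S_j and divide by the number of transitions leaving S_i. The paper does not state the estimator used, so the sketch below is an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate P_ij = Pr{S_j | S_i} by relative frequency.

    `sequences` is an iterable of behavior pattern sequences, one per
    session group. Returns {current_state: {next_state: probability}}.
    States that never have a successor (such as stop) do not appear as keys.
    """
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    probabilities = {}
    for current, following_counts in counts.items():
        total = sum(following_counts.values())
        probabilities[current] = {
            following: count / total
            for following, count in following_counts.items()
        }
    return probabilities
```

Each row of the result sums to 1 by construction, which matches the row sums of Table 1 up to rounding.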
In this table, the behavior patterns are represented by
variables B0 to B6. The variables in the first column
represent the current state. The variables in the first row
represent the following state. Notably, the probability of a
transition from jump forward to jump forward (0.494) and that
from replay to replay (0.589) are relatively high. We should
clarify that this transition table relies heavily on the trace
database used. It might not be accurate enough when applied
to other trace databases. However, it reveals some implicit
relationships between the behavior patterns.
3.3 User Behavior Characteristics
In this section, we introduce some characteristics of user
behavior. We first focus on session duration. Intuitively,
session duration is easily affected by the inclination of the
user to the content of a video. To study this relationship, we
define a new variable: video popularity level. This metric
represents the popularity level of a file among its users. We
assume that it is strongly correlated to video content and
user inclination. Since the duration ratio of a session can
reflect a user’s satisfaction level, we use the median
duration ratio of all the sessions related to a file to
represent its popularity level, which is a unique
characteristic of this file.
Table 1: Transition Probabilities Between User Behavior Patterns
      B0     B1     B2     B3     B4     B5     B6
B0    0      0      0.399  0.127  0.421  0      0.054
B1    0      0      0      0      0      0      0
B2    0      0.308  0.494  0.104  0.065  0      0.031
B3    0      0.443  0.050  0.328  0.144  0      0.036
B4    0      0.302  0.068  0.030  0.589  0      0.011
B5    0      0.384  0.078  0.084  0.109  0      0.344
B6    0      0      0      0      0      1.000  0
Figure 9: Video Popularity Level and Session Duration Ratio (x-axis: group index; y-axis: session duration ratio; legend: video file popularity level, accumulative session duration ratio)
In our experiment, we pick out about 200 session groups
with large group sizes. Here each group corresponds to one
video file and one user. We sort these groups by video
popularity level. The result is represented by the solid line
in Figure 9. Then we measure the accumulative session duration
ratio (represented by stars in the figure) of each group. The
accumulative session duration ratio of a group is the sum of
the duration ratios of all the sessions in the group. If the
sum is greater than 1, it is set to 1. As the figure
illustrates, the accumulative duration ratio of a group with
a higher video popularity level tends to be greater than that
of a group with a lower level. We therefore argue that users
usually spend more time on a video with a high popularity
level.
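The accumulative session duration ratio defined above is a capped sum, which can be written directly. The function name and argument layout below are assumptions made for illustration.

```python
def accumulative_duration_ratio(session_durations, video_length):
    """Sum the duration ratios of a group's sessions, capped at 1.

    `session_durations` lists the durations of all sessions in one group
    (one user, one video file), in the same time unit as `video_length`.
    """
    if video_length <= 0:
        raise ValueError("video_length must be positive")
    total = sum(duration / video_length for duration in session_durations)
    return min(total, 1.0)
```

For example, two 30-second sessions on a 120-second video give 0.5, while sessions of 100 and 90 seconds on the same video sum to more than 1 and are therefore reported as 1.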
Our study also reveals that a user’s behavior at a
particular time point can be related to prior information,
such as the behavior patterns of the earlier access sessions.
For example, if a user has replayed the video several times,
it is highly probable that he/she will play the video in the
following session (in the form of replay or jump forward),
based on the FSM discussed in the last section. This
phenomenon carries some implications about the interest level
of the user. In addition, by analyzing the user behaviors in
a sequence of consecutive streaming sessions, we can predict
the user’s possible behaviors in future sessions.
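As a simple illustration of such a prediction, the transition probabilities of Table 1 can be used to look up the most likely next state given only the current one. This first-order lookup is a sketch, not the quantitative model the paper leaves to future work; the function names are hypothetical, the numbers are copied from Table 1, and the states are kept as the labels B0 to B6 used there.

```python
STATES = ["B0", "B1", "B2", "B3", "B4", "B5", "B6"]

# Rows: current state; columns: following state (values from Table 1).
TABLE_1 = [
    [0.0, 0.0,   0.399, 0.127, 0.421, 0.0,   0.054],  # B0
    [0.0, 0.0,   0.0,   0.0,   0.0,   0.0,   0.0],    # B1 (no successor)
    [0.0, 0.308, 0.494, 0.104, 0.065, 0.0,   0.031],  # B2
    [0.0, 0.443, 0.050, 0.328, 0.144, 0.0,   0.036],  # B3
    [0.0, 0.302, 0.068, 0.030, 0.589, 0.0,   0.011],  # B4
    [0.0, 0.384, 0.078, 0.084, 0.109, 0.0,   0.344],  # B5
    [0.0, 0.0,   0.0,   0.0,   0.0,   1.000, 0.0],    # B6
]


def next_state_distribution(current):
    """Return {next_state: probability} for the given current state."""
    row = TABLE_1[STATES.index(current)]
    return dict(zip(STATES, row))


def most_likely_next(current):
    """Return the most probable next state, or None if there is none."""
    distribution = next_state_distribution(current)
    if sum(distribution.values()) == 0:
        return None  # the state has no successor in Table 1
    return max(distribution, key=distribution.get)
```

With these values, a session in state B4 (replay) is most likely followed by B4 again, and the combined probability of B4 or B2 (jump forward) as the next state is 0.589 + 0.068 = 0.657.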
4. RELATED WORK
There have been many studies on mining user behavior
in web based applications, such as online advertisements [5],
multimedia streaming web search [1, 2, 11], and online
social networks [12, 9, 8]. Attenberg, Cheng, Chatterjee et al.
[3, 6, 5] have studied the user behaviors involved in
sponsored online advertisements. They modeled and predicted
the probabilities of clicks of the consumers. Agichtein, Brill,
and Dumais proposed that incorporating user behavior
information can greatly improve the ordering accuracy of top
results in a web search setting [1]. In [9], Maia et al.
characterized and identified user behaviors in online social
networks. The studies by Benevenuto, Costa et al. [4, 7]
focused on analyzing user behaviors on the basis of video
interactions. Yu et al. introduced the user behavior and
content access patterns of large-scale video-on-demand systems
[13]. They further discussed the implications of user behavior
for the design of media streaming systems. The study in [14]
introduces the design and implementation of an optimization
system that enhances user experiences during web-based
activities.
Unlike the existing works, we categorize user behaviors
during online video streaming into several behavior patterns.
By employing a finite state machine, we focus on mining the
statistical relationship between these patterns.
5. CONCLUSION AND FUTURE WORK
In this paper, we provide an extensive analysis of user
behavior in online video streaming, using streaming session
records from a large scale streaming video trace database.
Our contribution is in the following aspects: First, we
analyze the overall statistical characteristics of the access
sessions in the database. Second, we propose seven user
behavior patterns and estimate the transition probabilities
between these patterns. Third, we introduce some factors that
affect user behavior, such as video popularity level and the
behaviors in previous sessions. Currently, our study only
uncovers the characteristics of behavior patterns of
consecutive sessions.
One central challenge of user behavior analysis is
establishing a quantitative model that accurately describes
the behaviors in video streaming. Based on the implicit
characteristics of user behavior analyzed in this paper, one
direction of our future work is to design such a model to
predict the behaviors of an individual user or even the
cooperative behaviors of several users. Another branch of our
research is to apply the user behavior model to user-oriented
online streaming system design.
6. ACKNOWLEDGMENTS
We would like to thank Dr. Jin Li and Dr. Cheng Huang
of Microsoft Research for providing the video trace database.
7. REFERENCES
[1] E. Agichtein, E. Brill, and S. Dumais. Improving web
search ranking by incorporating user behavior
information. In SIGIR ’06: Proceedings of the 29th
annual international ACM SIGIR conference on
Research and development in information retrieval,
pages 19–26, New York, NY, USA, 2006. ACM.
[2] E. Agichtein and Z. Zheng. Identifying “best bet” web
search results by mining past user behavior. In KDD
’06: Proceedings of the 12th ACM SIGKDD
international conference on Knowledge discovery and
data mining, pages 902–908, New York, NY, USA,
2006. ACM.
[3] J. Attenberg, S. Pandey, and T. Suel. Modeling and
predicting user behavior in sponsored search. In KDD
’09: Proceedings of the 15th ACM SIGKDD
international conference on Knowledge discovery and
data mining, pages 1067–1076, New York, NY, USA,
2009. ACM.
[4] F. Benevenuto, F. Duarte, T. Rodrigues, V. A.
Almeida, J. M. Almeida, and K. W. Ross.
Understanding video interactions in youtube. In MM
’08: Proceeding of the 16th ACM international
conference on Multimedia, pages 761–764, New York,
NY, USA, 2008. ACM.
[5] P. Chatterjee, D. Hoffman, and T. Novak. Modeling
the clickstream: Implications for web-based
advertising efforts. Marketing Science, 22(4):520–541,
2003.
[6] H. Cheng and E. Cantú-Paz. Personalized click
prediction in sponsored search. In WSDM ’10:
Proceedings of the third ACM international conference
on Web search and data mining, pages 351–360, New
York, NY, USA, 2010. ACM.
[7] C. P. Costa, I. S. Cunha, A. Borges, C. V. Ramos,
M. M. Rocha, J. M. Almeida, and B. Ribeiro-Neto.
Analyzing client interactivity in streaming media. In
WWW ’04: Proceedings of the 13th international
conference on World Wide Web, pages 534–543, New
York, NY, USA, 2004. ACM.
[8] W. Lin, H. Zhao, and K. Liu. Incentive cooperation
strategies for peer-to-peer live multimedia streaming
social networks. Multimedia, IEEE Transactions on,
11(3):396–412, April 2009.
[9] M. Maia, J. Almeida, and V. Almeida. Identifying user
behavior in online social networks. In SocialNets ’08:
Proceedings of the 1st Workshop on Social Network
Systems, pages 1–6, New York, NY, USA, 2008. ACM.
[10] Y.-C. Tu, J. Sun, M. Hefeeda, and S. Prabhakar. An
analytical study of peer-to-peer media streaming
systems. ACM Trans. Multimedia Comput. Commun.
Appl., 1(4):354–376, 2005.
[11] R. W. White and D. Morris. Investigating the
querying and browsing behavior of advanced search
engine users. In SIGIR ’07: Proceedings of the 30th
annual international ACM SIGIR conference on
Research and development in information retrieval,
pages 255–262, New York, NY, USA, 2007. ACM.
[12] C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and
B. Y. Zhao. User interactions in social networks and
their implications. In EuroSys ’09: Proceedings of the
4th ACM European conference on Computer systems,
pages 205–218, New York, NY, USA, 2009. ACM.
[13] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng.
Understanding user behavior in large-scale
video-on-demand systems. In EuroSys ’06:
Proceedings of the 1st ACM SIGOPS/EuroSys
European Conference on Computer Systems 2006,
pages 333–344, New York, NY, USA, 2006. ACM.
[14] D. Zhou, A. Chander, and H. Inamura. Optimizing
user interaction for web-based mobile tasks. In WWW
’10: Proceedings of the 19th international conference
on World wide web, pages 1333–1336, New York, NY,
USA, 2010. ACM.