Adapting Caching to Audience Retention Rate:
Which Video Chunk to Store?
Lorenzo Maggi*, Lazaros Gkatzikis*, Georgios Paschos*, and Jérémie Leguay*
Abstract—Rarely do users watch online contents entirely. We
study how to take this into account to improve the performance of
cache systems for video-on-demand and video-sharing platforms
in terms of traffic reduction on the core network. We exploit the
notion of “Audience retention rate”, introduced by mainstream
online content platforms and measuring the popularity of differ-
ent parts of the same video content. We first characterize the
performance limits of a cache able to store parts of videos, when
the popularity and the audience retention rate of each video are
available to the cache manager. We then relax the assumption of
known popularity and propose an LRU (Least Recently Used)
cache replacement policy that operates on the first chunks of
each video. We characterize its performance by extending the
well-known Che’s approximation to this case. We prove that
refining the chunk granularity improves the performance of the
chunk-LRU policy. It is shown numerically that even for a small
number of chunks (N = 20), the gains of chunk-LRU are still
significant compared to the standard LRU policy that caches
entire files, and are almost optimal.
Index Terms—cache, audience retention rate, chunk, LRU
I. INTRODUCTION
Content Distribution Networks (CDN) and Video on De-
mand applications use network caches to store the most
popular contents near the user and reduce backhaul bandwidth
expenditure. The future projections for the cost of memory
and bandwidth promote the use of caching to satisfy the ever-
increasing network traffic [1]. Since the bandwidth saving
potential of caching is restricted by the number of files that fit
in the cache (the cache capacity), it is interesting to maximize
the caching effectiveness under such a constraint. Here we
consider the use of partial caching, a technique according to
which we may cache specific parts of files, instead of whole
ones.
We focus on video files, which represent a significant
fraction of the global Internet traffic (64% according to [2]).
Videos are the most representative example of contents that
are only partially retrieved, since specific parts of a video
are viewed more than others. Typically, the average user
will “crawl” several video files before watching one in its
entirety. The above implies that most of the time there is no
need to cache the entire video. Indeed, Fig. 1 shows the
video watch-time from a trace of 7000 YouTube videos. The
histogram emphasizes the fact that the vast majority of files
is only partially watched, and motivates the design of caching
algorithms that avoid caching rarely accessed video parts, e.g.
the tail.
Optimization of caching is often based on file popularity.
Storing the most popular files results in more cache hits,
*Mathematical and Algorithmic Sciences Lab, France Research Center,
Huawei Technologies Co. Ltd.
Figure 1: Histogram of watch-time in YouTube (based on a data sample of 7000 video
files from [5]). On average 60% of a file is watched.
which reduces the traffic on the core network.
Nevertheless, not all the parts of a file are equally popular
[3]. Hence, a natural generalization of “store the most popular
files” is to split the video files into chunks and “store the most
popular chunks” instead. To differentiate the popularity of each
video chunk we use the metric of the audience retention rate
[4], which measures the popularity of different parts of the
same file. It has many advantages: it is file specific, it is
available in most content distribution platforms, e.g., YouTube
[4], and it evolves very slowly over time, which facilitates its
easy estimation¹. The latter is generally not true for chunk
popularities, which are affected by the time-varying popularity
of the corresponding file.
In this paper we establish a link between the audience re-
tention rate and the efficiency of partial caching. Our approach
is based on decomposing popularity into video popularity
and video retention rate. More specifically, we address the
following questions: i)How much bandwidth could we save
via partial caching of video content and ii)Is this gain
achievable by practical caching algorithms?
A. Related Work
Partial caching techniques were first reported in the context
of proxy caching, where it was proposed to store the file
headers to improve latency performance [6]. To capture both
latency and bandwidth improvements, [7] splits the files into
segments of exponentially increasing size. More generally, it
is possible to cache specific chunks in order to capture the
different popularity of sections within a file (a.k.a. internal
popularity) [3], [8].
1The quasi-static nature of audience retention rate relates to file particular-
ities, e.g. a movie may become uninteresting towards the end.
arXiv:1512.03274v1 [cs.NI] 10 Dec 2015
Intuitively, extreme chunking (e.g. at byte level) offers finer
granularity and potentially leads to the optimal caching perfor-
mance. However, tracking popularity at such fine granularity
is impractical and leads to algorithms of prohibitively high
complexity [9]. A series of works suggest to split each file
into a small number of chunks and treat each chunk indepen-
dently [7], [10]. Alternatively, it is proposed to model internal
popularity as a parametric k-transformed Zipf distribution [9],
[11]. Knowing the distribution type simplifies the estimation
task, but still requires parameter estimation individually for
each file. Deducing the optimal size and number of chunks
is not straightforward. It was recently shown that restricting
to n homogeneous chunks incurs a loss which is bounded
by O(n^-2) [8]. Alternative heuristic approaches suggest that
only a specific segment of each file should be cached and
dynamically adjust its size. For instance, [12] proposes a seg-
mentation scheme where initially the whole object is cached
but the segment size is gradually set equal to its estimated
average watch-time. Similar adaptive strategies have been
also considered for peer-to-peer networks [13], where starting
from a small segment, the portion to be cached is increased
according to the number of requests and watch-time. The
caching of several segments of each file was proposed in [14],
since users may be interested only in specific, non-contiguous
parts of files. In this case the segment size has to be selected
accordingly.
In this paper we prove that the performance of partial
caching indeed improves when the file is split into chunks. We
develop an analytical framework for LRU performance under
partial caching and we use it to show that the performance
gains of partial caching remain significant even for a small
number of chunks. To the best of the authors’ knowledge, there are no
studies analytically assessing the actual performance of such
cache management strategies and their inherent performance
limits under the partial viewing assumption.
B. Main contributions
We first investigate a trace of YouTube data [5] and conclude
that partial caching has a great potential to improve perfor-
mance, mainly because: (i) the average video watch-time is
no more than 70%, and (ii) the longer a video is, the smaller
its average watch-time. Motivated by this, in Section IV we
present an analysis of traffic bandwidth reduction which is
based on the audience retention rate. Combining the theoretical
analysis with the YouTube data, we show that in realistic
settings the traffic reduction of partial caching over traditional
caching may reach up to 50%.
The above analysis compares the performance limits of
the two caching approaches assuming known popularity and
retention rates. Therefore, it is also interesting to investigate
the bandwidth benefits from partial caching in a more realistic
setting. In Section V we design a class of practical chunk-LRU
(Least Recently Used) policies, which split files into different
chunks and always drop (i.e., never cache) the last chunk at
the tail of files. Chunk-LRU policies harness the realistic gain
of partial caching due to video watch-time. Moreover we gain
intuition into designing optimal chunking and we show that
the maximum performance can be approached with a small
number of chunks of equal size.
Our main technical contributions to the literature are:
• We formulate the traffic reduction optimization problem
and provide a waterfilling algorithm to solve it efficiently.
For the special case where users watch each video
continuously until they abandon it, we derive the optimal
waterfilling partial allocation in closed form. It consists
of caching a compact interval [0; ν] of the file, where ν
is given in closed form.
• We propose a novel chunk-LRU algorithm that splits each
file into N+1 chunks, where the last one is never cached.
• We build an analytical framework to analyze the chunk-LRU
performance under partial viewing, subject to Che’s
approximation for LRU performance [15].
• We provide a sufficient condition on retention rates under
which sub-splitting chunks is always beneficial.
• We characterize the optimal performance of chunk-LRU
as a simple optimization problem over the tail drop factor,
with infinitesimal chunking.
II. YOUTUBE VIDEO WATCH-TIME
In this section we examine YouTube access traces² [5] in
order to analyze the average video watch-time, i.e., the
portion (∈ [0; 1]) of each file watched by the users. Watch-times
are crucial for caching: using partial caching we may
avoid caching rarely watched parts of videos and use the freed
cache space to store more files.
Since most strategies try to cache the most popular files,
first we investigate the relationship between average watch-
time and file popularity. We classify videos into 10 groups
according to their average daily views. Fig. 2 depicts the
estimated probability density function of watch-time for three
representative groups, the 10% most popular videos, the 10%
least popular, and the intermediate ones. Interestingly, we
observe that the more popular a video is, the higher the
average watch-time. However, even for the most popular ones,
on average only 72% of each video is watched, which leaves
room for caching optimization.
Figure 2: Watch-time distribution for different classes of video popularity. The average
watch-time of a video increases with its popularity.
2The dataset is publicly available and was crawled using the YouTube Data
API in 2013. It contains information about 7000 files, including daily views,
watch-time, duration, genre and title of each file.
Figure 3: Average watch-time is increasing with the popularity of files, but steeply
decreasing with its duration.
Next, we investigate the relationship between watch-time
and video duration. The latter is a critical parameter for
caching due to the cache capacity constraint which eventually
determines caching performance. If longer videos are only
partially watched, avoiding to cache their unwatched parts
will yield a greater benefit. In Fig. 3 we depict with dots
the YouTube data for the 20% most popular files. In order to
identify how the watch-time is affected by the video duration
and its popularity, we use locally weighted polynomial regres-
sion [16] to fit a smoothed surface to the corresponding data.
Notice that the most beneficial regime for caching purposes
corresponds to the upper left corner of the plot, namely highly
popular videos of large size. We observe that in this region the
average watch-time is around 0.7. In addition, independently
of the video popularity, watch-time decreases rapidly with
video duration.
We then group the available data into 10 classes according to
their popularity and duration (shorter or longer than 200 sec). We depict the details
of the derived classes in Table I, namely for each class we
depict the average watch-time, the fraction of videos belonging
to this class and its average duration in seconds. We observe
that the large and popular videos amount to a non-negligible
percentage of 5%. In addition, the average watch-time of large
files is significantly smaller than that of smaller ones. To
precisely evaluate the impact of watch-time on caching, we
use these data in the subsequent Sections IV and V to quantify
the theoretical maximum and the practically feasible caching
performance.
III. SYSTEM MODEL
We consider a communication system where users download
video contents from the network. Let M = {1, . . . , M}
be the video content (or simply, video) catalog. Each video
i ∈ M is of size S_i bytes. Content requests are generated
using the well-known Independent Reference Model (IRM)
[17], according to which the requests for the videos in M are
independent of each other. We call p_i the probability that
video i is requested, given that a video request has arrived.
Equivalently, the sequence of video requests can be thought
of as M independent homogeneous Poisson processes with
intensity rates proportional to the probability vector {p_i}_i. For
M : video catalog, of cardinality |M| = M
C : cache size
p_i : popularity of video i
R_i(τ) : audience retention rate of video i
π_i(τ) : viewing abandonment p.d.f. of video i
S_i : size of video i
B_s(Y) : traffic bandwidth on the core network when the portion Y_i of video i is statically stored in the cache (see Eq. (2))
N+1 : number of chunks for chunk-LRU
B : minimum core network traffic achieved by optimal partial caching
[x_{k−1}; x_k] : k-th chunk of a video
x : collection of chunks
ν : tail drop factor for chunk-LRU; the last chunk [ν; 1] is never stored in the cache
h_{k,i} : hit rate of the k-th chunk of video i
t_C : characteristic time for chunk-LRU
B_cLRU(x, ν) : traffic on the core network with chunk-LRU, subject to the chunking x and a tail drop factor ν (see Eq. (9))
B_cLRU : optimal traffic performance for chunk-LRU (see Eq. (11))
Table II: Table of notation symbols
convenience of notation, we assume that the probabilities are
in decreasing order, i.e., p_1 ≥ p_2 ≥ · · · ≥ p_M.
One cache of size C bytes is deployed in the network.³
Whenever a requested video is found in the cache, the cache
itself can directly serve the user. Otherwise, the video needs to
be retrieved through the core network, which provides access
to a central video content store containing the entire video
catalog, see Fig. 4. Hence, good caching performance has a
profound impact on the traffic reduction on the core network.
The goal of this paper is to determine the extra bandwidth
benefits that may be gained by exploiting the fact that videos
are rarely watched entirely.
Figure 4: System model
A. Viewing Behavior Model: Audience Retention Rate
To mathematically analyze the impact of watch-time, we
introduce the central notion of audience retention rate R_i(τ).
According to YouTube’s definition, the audience retention rate
R_i(τ) measures the percentage of users that are still watching
video i at the corresponding (normalized) instant τ, out of the
overall number of views [4]. As we will see, in our analysis the
retention rate has a prominent role in determining the caching
performance.
Typically a user may watch video i from instant a_i(1) up
to b_i(1), then possibly skip to a_i(2) and watch until
b_i(2), and so forth⁴. The watched part W_i, which equals the
³Our analysis can be extended to a cache hierarchy by letting p_i express
the probability that a request for video i is missed by the caches at all the
child nodes [1].
⁴We remark that such intervals may also overlap, i.e., a user may rewind the
video and watch a part of it multiple times. We assume that, if this occurs,
the user can directly retrieve the video portion that she has already watched
from her terminal’s cache.
Popularity \ Duration | Small: Av. watch-time | Fraction of population | Av. duration (sec) | Large: Av. watch-time | Fraction of population | Av. duration (sec)
Lowest  | 0.52 | 0.179 | 81  | 0.37 | 0.020 | 220
Low     | 0.60 | 0.162 | 112 | 0.47 | 0.036 | 220
Medium  | 0.64 | 0.153 | 128 | 0.57 | 0.045 | 223
High    | 0.67 | 0.152 | 130 | 0.60 | 0.047 | 222
Highest | 0.72 | 0.145 | 124 | 0.65 | 0.053 | 235
Table I: The characteristics of each class of videos. These data will be used to derive realistic and class-specific retention rates for our numerical evaluation.
minimum portion of video i that the user needs to download,
is the union of all watch intervals j:

W_i = ∪_j [a_i(j); b_i(j)].

We call |W_i| the watch-time of a user watching video i. For ease
of notation we consider a_i, b_i ∈ [0; 1] as portions of the whole
video duration. The “audience retention rate⁵” function R_i(τ)
can then be formally defined as the probability that a user has
watched the (normalized) instant τ of the video, i.e.,

R_i(τ) = Pr(τ ∈ W_i),   τ ∈ [0; 1].

Alternatively, we may think of R_i(τ) as the fraction of users
that watch the (normalized) instant τ of video i.
We remark that, thanks to the definition of R_i, we can easily
evaluate the average watch-time for video i as ∫_0^1 R_i(τ) dτ.
Next we devise a more specific and realistic viewing
behavior model and derive its relationship to the audience
retention rate.
1) Viewing Abandonment Model: This is a special instance
of the viewing model presented above. It assumes that users
always start watching each video i from its beginning, and
abandon it after a random time portion b_i ∈ [0; 1].
Hence, in this case the watched part W_i takes the simple
form W_i = [0; b_i], thus b_i equals the watch-time. We call
π_i(·) the probability density function of the abandonment
time variable b_i. The relationship between the abandonment
distribution π_i and the audience retention rate R_i is described
by the expression:

R_i(τ) = 1 − ∫_0^τ π_i(t) dt.   (1)

Hence, in this case the audience retention rate R_i(τ) measures
the fraction of users with watch-time higher than τ for the
particular video i. We first observe from (1) that R_i is inherently
non-increasing, with R_i(0) = 1. We also remark that, under
the viewing abandonment assumption, the audience retention
rate R_i uniquely describes the random watch behavior [0; b_i]
of a user via π_i. This observation does not hold, though, for
the general case described in Section III-A, where the same
retention rate R_i may result from an arbitrary distribution of
watch behaviors.
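The relation in Eq. (1) is easy to verify numerically. The sketch below uses a truncated-exponential abandonment density (the same family employed later in Section IV); the value λ = 2 is an illustrative choice, not taken from the paper's dataset:

```python
import math

lam = 2.0  # illustrative abandonment rate, not a value from the paper

def pdf(t):
    # truncated-exponential abandonment density on [0, 1]:
    # pi(t) = lam * exp(-lam t) / (1 - exp(-lam))
    return lam * math.exp(-lam * t) / (1.0 - math.exp(-lam))

def retention(tau, steps=10000):
    # Eq. (1): R(tau) = 1 - integral_0^tau pi(t) dt (trapezoidal rule)
    if tau == 0.0:
        return 1.0
    h = tau / steps
    s = 0.5 * (pdf(0.0) + pdf(tau)) + sum(pdf(j * h) for j in range(1, steps))
    return 1.0 - s * h

# R(0) = 1 and R is non-increasing, as noted in the text.
print(round(retention(0.0), 4))   # -> 1.0
print(round(retention(0.5), 4))   # -> 0.2689
```

Under this model the closed form is R(τ) = 1 − (1 − e^{−λτ})/(1 − e^{−λ}), which the numerical integral matches.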
In order to derive a realistic audience retention rate
function from the estimated parameters in Tab. I for our
numerical investigations in Sections IV-C and V-D, we assume
that the viewing abandonment model holds.
⁵Our definition is in accordance with the definition of audience retention (or
“engagement”) rate by Wistia.com [18]. YouTube’s audience retention rate [4]
actually counts video rewinds as multiple views inside the same video.
Figure 5: Instance of audience retention rate from YouTube.
IV. PERFORMANCE LIMITS OF PARTIAL CACHING
This section analyzes the performance limits of partial
caching in the context of the audience retention rate. Our
performance metric is core network traffic, and we tackle the
off-line problem of finding the optimal static (partial) file cache
allocation⁶. In particular, we will compare the maximum network
traffic saved by caching entire videos versus caching arbitrary
portions of each of those. In both cases it is idealistically
assumed that the video popularity distribution {pi}i∈M and
the audience retention rate functions {Ri}i∈M are perfectly
known to the cache manager. This analysis serves as an upper
bound for any cache management strategy with more limited
information, as the one devised in Section V.
Let us first formalize our problem. We define the partial
allocation Y_i ⊆ [0; 1] of video i to be the collection of
(possibly) non-adjacent bytes that are selected to be permanently
stored in the cache. Under a partial allocation Y_i, any
request for the remaining portion [0; 1] \ Y_i needs to be served
by the origin video store. Due to the specific retention rate
of this video, this happens with probability ∫_{[0;1]\Y_i} R_i(τ) dτ.
Therefore, under a partial allocation vector Y, we may express
the expected traffic on the core network per request B(Y) as

B(Y) = Σ_{i∈M} S_i p_i ∫_{[0;1]\Y_i} R_i(τ) dτ.   (2)

Considering the video sizes S_i and the cache size C, a partial
allocation vector Y is feasible whenever Σ_{i∈M} S_i ∫_{Y_i} 1 dx = C.
Our goal is to select a feasible vector Y that minimizes
the incurred traffic B(Y), i.e.,

Y* = argmin_Y B(Y)   (3)
s.t. Σ_{i∈M} S_i ∫_{Y_i} 1 dx = C,
     Y_i ⊆ [0; 1].
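The objective of Eq. (2) can be evaluated directly for prefix allocations Y_i = [0; y_i]. A minimal sketch under the truncated-exponential abandonment model; the popularities, sizes, and rates below are illustrative placeholders, not values from the paper's dataset:

```python
import math

p = [0.5, 0.3, 0.2]     # video popularities (decreasing), illustrative
S = [1.0, 1.0, 1.0]     # video sizes (normalized), illustrative
lam = [1.0, 2.0, 3.0]   # per-video abandonment rates, illustrative

def R(i, tau):
    # truncated-exponential abandonment:
    # R_i(tau) = 1 - (1 - e^{-lam tau}) / (1 - e^{-lam})
    return 1.0 - (1.0 - math.exp(-lam[i] * tau)) / (1.0 - math.exp(-lam[i]))

def traffic(y, steps=2000):
    # Eq. (2) for prefix allocations Y_i = [0, y_i]:
    # B(Y) = sum_i S_i p_i * integral_{y_i}^1 R_i(tau) dtau (midpoint rule)
    total = 0.0
    for i in range(len(p)):
        h = (1.0 - y[i]) / steps
        s = sum(R(i, y[i] + (j + 0.5) * h) for j in range(steps)) * h
        total += S[i] * p[i] * s
    return total

print(round(traffic([0.0, 0.0, 0.0]), 4))  # no caching: equals B_nc of Eq. (7)
print(round(traffic([1.0, 0.0, 0.0]), 4))  # cache video 1 entirely
```

Caching any prefix can only reduce the residual core traffic, which the two printed values illustrate.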
If users always watch the whole video, i.e., R_i(τ) = 1 for
all τ ∈ [0; 1] and i ∈ M, then the optimization (3) takes a
⁶We remark that in our analysis of the optimal traffic bandwidth B(Y*)
we assume that the videos Y* are already present in the cache; we do
not take into account the traffic needed to fill the cache. If we wished
to incorporate this aspect, we could say that B(Y*) is the expected traffic
achieved asymptotically as the number of requests tends to infinity.
simple form which is solved by the well-known store the most
popular videos policy. In this case, we would fully store
(Y_i = [0; 1]) the videos of highest p_i up to the cache
capacity, and no portion of the rest (Y_i = ∅ otherwise). As
indicated by the previous section, however, in reality this is not
the case; hence we expect Y* to bring a certain improvement,
which we evaluate in Section IV-C.
Technically speaking, if we lift any assumption on the
shape of the audience retention rate, the best cache allocation
intuitively prescribes to partition all videos at the finest
granularity (say, at the byte level), order the pieces according to
their popularity, and fill the cache with the most popular bytes.
We now provide an equivalent waterfilling characterization of
the optimal partial video allocation Y∗to solve this problem.
The main advantage of this formulation lies in the fact that it
leads to an efficient algorithm to compute Y∗, that we present
at the end of the section.
Theorem 1. The optimal partial video allocation Y* can be
expressed as

Y*_i(µ) = {τ : p_i R_i(τ) ≥ µ}   ∀ i ∈ M,   (4)

where µ is such that Σ_{i∈M} S_i |Y*_i(µ)| = C, and |·| is the
size⁷ of a subset of [0; 1].
Informally speaking, the water level µ determines a popularity
threshold above which a byte of any video deserves to
be stored in the cache.
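Theorem 1 suggests a simple computational recipe: for a candidate water level µ, keep the instants with p_i R_i(τ) ≥ µ, and bisect on µ until the capacity constraint is met. A sketch under the truncated-exponential abandonment model, with illustrative parameters (not from the paper's dataset):

```python
import math

p = [0.5, 0.3, 0.2]     # illustrative popularities
S = [1.0, 1.0, 1.0]     # illustrative sizes
lam = [1.0, 2.0, 3.0]   # illustrative abandonment rates
C = 1.5                 # cache capacity, same units as S

def R(i, tau):
    return 1.0 - (1.0 - math.exp(-lam[i] * tau)) / (1.0 - math.exp(-lam[i]))

def alloc_size(mu, grid=4000):
    # sum_i S_i |Y_i(mu)| with |Y_i(mu)| = measure of {tau : p_i R_i(tau) >= mu}, Eq. (4)
    total = 0.0
    for i in range(len(p)):
        kept = sum(1 for j in range(grid) if p[i] * R(i, (j + 0.5) / grid) >= mu)
        total += S[i] * kept / grid
    return total

# Bisect on the water level mu so that the allocation exactly fills the cache.
lo, hi = 0.0, max(p)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if alloc_size(mid) > C:
        lo = mid   # water level too low: allocation exceeds capacity
    else:
        hi = mid
mu = 0.5 * (lo + hi)
print(round(mu, 4), round(alloc_size(mu), 4))
```

Since each R_i is strictly decreasing here, the kept set for each video is a prefix, consistent with Corollary 1 below.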
A. Viewing Abandonment Model
In the special case of the viewing abandonment model, we
already observed that the audience retention rate R_i is non-increasing
for all i ∈ M. This allows us to specialize the
result in Theorem 1 as follows.
Corollary 1. Consider the viewing abandonment model with
strictly decreasing R_i for all i ∈ M. The optimal video
allocation writes Y*_i = [0; η*_i] for all i ∈ M, where

η*_i(µ) = 1                 if p_i R_i(1) ≥ µ   (µ ≥ 0)
          0                 if p_i ≤ µ
          R_i^{-1}(µ/p_i)   otherwise,

with Σ_{i∈M} S_i η*_i(µ) = C.   (5)
A remarkable observation here is that the optimum bandwidth
performance is achieved by splitting every video into only two
parts and caching the first one. We may determine the exact
splits if the abandonment distribution is given. For instance,
if π_i is a truncated exponential with parameter λ_i, i.e.,

π_i(τ) = λ_i e^{−λ_i τ} / (1 − e^{−λ_i}),   τ ∈ [0; 1],

then the following holds.
Corollary 2. Under the exponential viewing abandonment
model the optimal video allocation writes Y*_i = [0; η*_i] for
all i ∈ M, where

η*_i(µ) = [ −(1/λ_i) ln( (µ/p_i)(1 − e^{−λ_i}) + e^{−λ_i} ) ]⁺,   (µ ≥ 0)

with Σ_{i=1}^M S_i η*_i(µ) = C.   (6)

⁷Formally defined as the Lebesgue measure.
B. Computation of Optimal Performance
To solve (3), we observe that it can be expressed as a
separable convex optimization problem with linear and box
constraints. If we further assume that the functions R_i do
not have any plateau, then the objective function becomes
strictly convex, and we can adapt the algorithm presented
in (Section 7.2, [19]) to our setting in order to efficiently
compute the optimal partial video cache allocation Y*. We
present below a high-level description of the algorithm. The
interested reader may find the implementation details in the
Appendix.
Waterfilling algorithm
Set k := 0. Set M(0) := M.
while M(k) ≠ ∅
• refine the search of the set of indices M(k) for which the optimal
solution is deemed to be in the interior of the box constraint
• if the approximated solution for video i ∈ M(k) falls beyond
the box [0; 1], round it to the nearest boundary; it is now
optimal and is discarded from M(k)
• set k := k + 1
end
C. Performance Evaluation with Real Data
In order to evaluate the performance of the optimal partial
allocation in a realistic scenario, we utilize the average watch-time
parameters shown in Tab. I. In Fig. 6 we compare the
core network traffic B = B_s(Y*) generated by the optimal
partial caching strategy with that produced by the most
natural strategy, namely storing the most popular videos
in their entirety. We observe that remarkable gains from partial
caching are achieved for cache size ratios higher than 10^-2 of
the total catalog size, which we typically find in current CDN
scenarios.
We then show in Fig. 7 the optimal portion of videos
that should be stored according to the same optimal caching
strategy, for different values of the cache size. Interestingly,
only very popular videos are stored in their entirety, even for
large cache sizes.
We finally remark that in this paper we normalize all the
core network traffic figures with respect to the minimum
bandwidth per video request B_nc required to serve the users
when no cache is deployed in the system, which equals

B_nc = Σ_{i=1}^M S_i p_i ∫_0^1 R_i(τ) dτ.   (7)
Figure 6: Core traffic generated by the optimal partial caching strategy in a realistic
scenario vs. the traffic produced by storing the most popular videos in their entirety.
We show in red the resulting performance gain of the first strategy. We utilized
the parameters obtained from the real data shown in Tab. I. The video popularity
distribution follows a Zipf law with parameter 0.8 [17]. S denotes the average video size.
Figure 7: Optimal portion of videos that should be stored according to the same optimal
caching strategy as in Fig. 6. Given a certain C/(SM), the video with popularity x
should be stored from its beginning up to portion y.
V. A PRACTICAL CHUNK-LRU SCHEME FOR DECREASING
RETENTION RATES
After analyzing the best performance that can only be
achieved with full information on the system parameters, we
turn to the study of a practical cache update scheme that shows
good performance even when the popularity p_i and the audience
retention rate R_i are unknown for each video i.
It is a widespread understanding that the Least Recently
Used (LRU) cache replacement policy represents a good
trade-off between hit-rate performance and implementation
complexity in a real scenario where no statistics on video
popularity are available to the cache manager. Moreover,
thanks to its short memory it reacts quickly to variations in
video popularity. In its simplest form, though, each time a video
is requested (even only partially) by a user and is not found in
the cache, LRU prescribes caching it in its entirety
(and updating the LRU recency table accordingly). Since
users rarely watch videos entirely, as previously observed, the
standard LRU would generate extra traffic in the core network
and waste precious cache space storing unpopular
portions of files.
In order to counter this, we propose a new cache manage-
ment policy that generalizes the classic LRU policy. We first
Figure 8: Video split into N+1 chunks. Only the first N are considered for chunk-LRU;
the last one is never stored in the cache.
suggest to split each video into N+1 consecutive and non-overlapping
chunks. We denote by [x_{i−1}; x_i] the i-th chunk.
Moreover, we argue that the last (i.e., the (N+1)-th) chunk
of each video, which is the least popular part under the
assumption of decreasing audience retention rate, should never
be stored in the cache, even if requested by a user. Intuitively,
this frees up space for more popular chunks of less popular
videos to be stored in the cache. We call ν the tail drop factor
that pinpoints the position of the last chunk. Hence, the first
N chunks of each video are stored only when requested, and then
evicted from the cache in an LRU fashion.
Remark 1. For the sake of analytical simplicity we assume that
the chunk splitting (x, ν) does not depend on the identity of the
file. We leave this as a future extension.
Performing LRU on the first N chunks presents two main
benefits. On the one hand, it reduces the extra traffic on the
core network caused by the retrieval of video portions that
are not requested. For instance, whenever a user watches
a video from its beginning up to portion b, only the first
k̄ = min{k : x_k ≥ b} chunks are downloaded. Hence, only the
portion x_k̄ − b is stored in the cache without being accessed.
On the other hand, we exploit the fact that the tail of a
video is generally less popular than the rest [9]. Hence, by
systematically discarding the tail of each video we avoid
evicting from the cache the first chunks, which are likely to be
more popular.⁸
We now formally describe our algorithm which uses as input
the chunking of files and the tail drop factor. The impact of
those parameters on actual performance is analyzed in the
following subsections.
chunk-LRU Algorithm
Step 1 (Initialization):
1.1) Set the tail drop factor ν ∈ (0; 1]
1.2) Partition each video i into N+1 chunks of the form [x_0 = 0; x_1],
[x_1; x_2], . . . , [x_{N−1}; x_N ≡ ν], [x_N = ν; x_{N+1} = 1], where
x_i ∈ [0; 1] (see Fig. 8)
1.3) An initial chunk request recency vector is available
Step 2: A request for a packet of video i ∈ M belonging to its k-th chunk
[x_{k−1}; x_k] arrives
2.1) If k = N+1, then the request is handled by the core network and the
cache is not updated (i.e., the tail is never cached)
2.2) Else, if 1 ≤ k ≤ N, then
2.2.1) If the requested chunk is stored in the cache, then the cache sends
the packet to the user
2.2.2) If the requested chunk is not stored in the cache, then it is retrieved
from the core network and then stored in the cache, after evicting
the minimum number of least recently used chunks. Finally, the
cache sends the packet to the user
2.3) The recency vector of the chunks stored in the cache is updated in an
LRU fashion
2.4) Return to Step 2
⁸Additionally, although this is not the focus of this paper, performing LRU
on chunks would allow keeping track of the evolution of the popularity of
each chunk. Nevertheless, the resulting benefits would be minor, since the
retention rate varies on a time scale much slower than the video popularity
dynamics.
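The steps above can be sketched as a toy simulation, with an OrderedDict serving as the LRU recency structure. The Zipf popularity mirrors the paper's numerical evaluation, while the uniform abandonment draw and all other parameters are illustrative placeholders:

```python
from collections import OrderedDict
import bisect
import random

# Illustrative parameters: M videos, N cacheable chunks each, tail [nu, 1]
# never cached, cache capacity measured in chunks.
M, N, nu = 50, 4, 0.8
CACHE_CHUNKS = 40
edges = [nu * k / N for k in range(N + 1)]            # x_0 = 0, ..., x_N = nu
zipf = [1.0 / (i + 1) ** 0.8 for i in range(M)]
p = [z / sum(zipf) for z in zipf]                     # Zipf(0.8) popularities
cum = [sum(p[:i + 1]) for i in range(M)]

cache = OrderedDict()                                 # keys (video, chunk); LRU order
hits = misses = 0
random.seed(1)
for _ in range(20000):
    i = bisect.bisect(cum, random.random())           # sample a video by popularity
    b = random.random()                               # abandonment point (placeholder model)
    # user downloads chunks 1..k_bar, with k_bar = min{k : x_k >= b}
    for k in range(1, N + 1):
        if edges[k - 1] >= b:
            break
        key = (i, k)
        if key in cache:
            cache.move_to_end(key)                    # LRU recency update
            hits += 1
        else:
            misses += 1
            cache[key] = True
            if len(cache) > CACHE_CHUNKS:
                cache.popitem(last=False)             # evict least recently used chunk
    # any request beyond nu (the tail chunk) always goes to the core network

print(round(hits / (hits + misses), 3))               # chunk hit rate
```

The tail chunk never enters the cache, so its requests are always forwarded to the core, exactly as step 2.1 prescribes.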
A. Chunk-LRU Performance under Viewing Abandonment
After having described our chunk-LRU algorithm, we now
turn to the analysis of its performance. For this purpose, in
this section we assume that the viewing abandonment
model holds. Moreover, in order to derive our analytical
results we make the common simplifying assumption that all
videos have the same size S = S_i. This is well justified
by the fact that we can break large videos into equal-size
fragments and perform chunk-LRU over the chunks of the
video fragments.
We first observe that, under the viewing abandonment model
(Section III-A1), the probability that the k-th chunk of video i
is requested by a user, given that the user has already
started watching video i, equals R_i(x_{k−1}) = ∫_{x_{k−1}}^1 π_i(τ) dτ.
Since the requests for video i follow by assumption a Poisson
process of intensity (proportional to) p_i, the request
process for the k-th chunk is also Poisson, with reduced
intensity p_i R_i(x_{k−1}). Thus, thanks to an adaptation of the
popular Che’s approximation [15], we can already compute
the hit rate for a specific chunk, i.e., the probability that a
chunk is found in the cache when requested.
Let us elaborate on this. Che’s approximation was originally
proposed in [15] to compute the hit rate for files whose request
sequences follow independent Poisson processes. It approximates
the characteristic time t_C, measuring the time that a file
spends in the cache, as a constant. When shifting the request
granularity from the video to the chunk level, the independence
property of request streams is unavoidably lost. Nevertheless,
we can still rely on the intuition that when the cache size
is significantly larger than the video size, the characteristic
time of each chunk is approximately equal and constant; hence
Che’s approximation still holds, which has been shown valid
in [1]. Therefore, the hit rate h_{k,i} for the k-th chunk of video
i can be approximated as h_{k,i} = 1 − e^{−p_i R_i(x_{k−1}) t_C}, where
the characteristic time t_C obeys the following relation [17]:

C/S = Σ_{k=1}^N Δx_k Σ_{i=1}^M h_{k,i},   (8)
where $\Delta x_k = x_k - x_{k-1}$. Finally, the expected traffic per video request $B_{\text{cLRU}}$ forwarded to the core network when the chunk-LRU cache management policy is employed writes

$$B_{\text{cLRU}}(x, \nu) = S \sum_{i=1}^{M} p_i \left( \sum_{k=1}^{N} R_i(x_{k-1})\,(1 - h_{k,i})\, \Delta x_k + \int_{\nu}^{1} R_i(\tau)\, d\tau \right), \qquad (9)$$

where $x = \{x_1, \ldots, x_{N-1}\}$.
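As a numerical sketch of how (8) and (9) can be evaluated (under our own illustrative assumptions: $S = 1$, equal-size chunks, retention functions and popularities supplied by the caller), one can solve (8) for $t_C$ by bisection, since its right-hand side is increasing in $t_C$, and then plug the result into (9):

```python
import math

def solve_tc(c_over_s, p, R, x):
    """Bisection for the characteristic time t_C in Eq. (8):
    C/S = sum_k dx_k * sum_i (1 - exp(-p_i R_i(x_{k-1}) t_C)).
    p: popularities, R: per-video retention functions, x: chunk boundaries."""
    def filled(tc):
        return sum((x[k] - x[k - 1])
                   * sum(1 - math.exp(-pi * Ri(x[k - 1]) * tc)
                         for pi, Ri in zip(p, R))
                   for k in range(1, len(x)))
    lo, hi = 0.0, 1.0
    while filled(hi) < c_over_s:      # grow the bracket
        hi *= 2.0
    for _ in range(60):               # bisection (filled is increasing in tc)
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if filled(mid) < c_over_s else (lo, mid)
    return (lo + hi) / 2.0

def b_clru(p, R, x, nu, tc, n_tail=500):
    """Expected core-network traffic per request, Eq. (9), with S = 1.
    1 - h_{k,i} = exp(-p_i R_i(x_{k-1}) t_C); the tail [nu, 1] is never
    cached, so it contributes the integral of R_i (midpoint rule)."""
    total = 0.0
    for pi, Ri in zip(p, R):
        missed = sum((x[k] - x[k - 1]) * Ri(x[k - 1])
                     * math.exp(-pi * Ri(x[k - 1]) * tc)
                     for k in range(1, len(x)))
        tail = (1.0 - nu) / n_tail * sum(
            Ri(nu + (j + 0.5) * (1.0 - nu) / n_tail) for j in range(n_tail))
        total += pi * (missed + tail)
    return total
```

For instance, one may take $M = 50$ videos with Zipf(0.8) popularities, a shared linear retention $R_i(\tau) = 1 - \tau$, and $N = 10$ equal chunks on $[0, \nu]$ with $\nu = 0.8$.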
B. Benefits of Chunk Sub-Splitting
We now focus on the impact of the chunk size on chunk-LRU performance, measured as the traffic $B_{\text{cLRU}}$ generated at the core network. Intuitively speaking, shrinking the chunk size should translate into better traffic performance, since it reduces the traffic surplus generated when users do not watch a chunk in its entirety. Nevertheless, this intuition alone constitutes no proof, since modifying the chunk size also has a non-trivial impact on the characteristic time $t_C$ via Eq. (8).
Before stating the main result of this section, we first need to introduce some notation. Let $\underline{t}_C$ and $\overline{t}_C$ be the characteristic times when only one chunk (i.e., $[0;\nu]$) and chunks of infinitesimal size $dx$ (say, at the byte level) are employed, respectively. More formally, $\underline{t}_C$ and $\overline{t}_C$ are the unique roots of the two following equations, respectively:

$$\frac{C}{S} = \nu \sum_{i=1}^{M} \left(1 - e^{-p_i \underline{t}_C}\right)$$

$$\frac{C}{S} = \sum_{i=1}^{M} \int_{0}^{\nu} \left(1 - e^{-p_i R_i(x)\, \overline{t}_C}\right) dx.$$

Moreover, we say that the chunk split $x'$ is a sub-split with respect to $x$ whenever $\cup_k \{x_k\} \subset \cup_k \{x'_k\}$. We finally observe that if $\nu = \frac{C}{MS}$ then the cache can store the first portion $\nu$ of every video; hence, it is reasonable to constrain $\nu$ within the interval $[\frac{C}{MS}; 1]$.
We are now ready to prove that any refinement of the chunk granularity produces a decrease in the expected traffic load on the core network.

Theorem 2. Let $\nu \in [\frac{C}{MS}; 1]$ and let $x$ be a video chunk split. Assume that

$$\frac{d}{d\tau} \sum_{i=1}^{M} p_i R_i(\tau)\, e^{-p_i R_i(\tau)\, t_C} < 0, \quad \forall\, t_C \in [\underline{t}_C; \overline{t}_C],\ \tau \in [0;1]. \qquad (10)$$

Then, any video chunk sub-split $x'$ outperforms $x$ in terms of traffic generated on the core network, i.e., the following holds:

$$B_{\text{cLRU}}(x', \nu) < B_{\text{cLRU}}(x, \nu).$$
Numerical experiments suggest that our sufficient condition (10) is very loose, and it generally holds for realistic popularity distributions and retention rates. It fails only in pathological cases where the popularity distribution is extremely concentrated around a few popular files and the cache size is very small, close to the size of a single file.
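Condition (10) is easy to verify numerically for given popularities and retention rates. The sketch below is our own illustration (not from the paper): it uses a shared retention function $R(\tau)$, a finite list of $t_C$ values standing in for the interval $[\underline{t}_C; \overline{t}_C]$, and forward finite differences on a grid.

```python
import math

def condition_10_holds(p, R, tc_values, n_grid=200, eps=1e-5):
    """Check condition (10): for every t_C in tc_values, the map
    tau -> sum_i p_i R(tau) exp(-p_i R(tau) t_C) must be strictly
    decreasing on [0, 1].  R is a retention function shared by all videos;
    the check uses forward finite differences on a uniform tau grid."""
    def phi(tau, tc):
        return sum(pi * R(tau) * math.exp(-pi * R(tau) * tc) for pi in p)
    for tc in tc_values:
        for j in range(n_grid):
            tau = j / n_grid
            if phi(tau + eps, tc) - phi(tau, tc) >= 0.0:
                return False
    return True
```

For example, with Zipf(0.8) popularities over 50 files and $R(\tau) = 1 - \tau$, the condition holds over a wide range of characteristic times, in line with the observation above that (10) fails only for extremely concentrated popularity distributions.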
C. Optimal Performance of Chunk-LRU
In this section we focus on the computation of the best
performance of chunk-LRU, optimized over the chunk size
and tail drop factor ν. We will utilize it as a benchmark for
the performance evaluation of practical chunk-LRU policies in
realistic scenarios in Section V-D.
In order to come up with the best performance achievable by chunk-LRU, we need to solve the following optimization problem:

$$B^*_{\text{cLRU}} = \min_{N,\, x,\, \nu,\, t_C}\ B_{\text{cLRU}}(x, \nu) \qquad (11)$$
$$\text{s.t.}\quad \frac{C}{S} = \sum_{k=1}^{N} \Delta x_k \sum_{i=1}^{M} \left(1 - e^{-p_i R_i(x_{k-1})\, t_C}\right)$$
$$\frac{C}{MS} \le \nu \le 1$$
$$0 = x_0 \le x_1 \le \cdots \le x_{N-1} \le x_N = \nu.$$
It follows from Theorem 2 that, if condition (10) holds, then the bandwidth utilization of any video chunk split $x$ with $\nu \in [\frac{C}{MS}; 1]$ is lower bounded by the performance $\underline{B}_{\text{cLRU}}(\nu)$ of the infinitesimal split (say, at the byte level). This reduces (11) to a two-variable constrained optimization problem (see Eq. (12)). Below we formalize this result.
Corollary 3. Assume that condition (10) holds. For any video chunk split $x$ and tail drop factor $\nu$, the traffic performance $B_{\text{cLRU}}(x, \nu)$ is lower bounded by the performance $\underline{B}_{\text{cLRU}}$ of the infinitesimal chunking approach:

$$\underline{B}_{\text{cLRU}} \le B_{\text{cLRU}}(x, \nu),$$

where $\underline{B}_{\text{cLRU}}$ is computed as

$$\underline{B}_{\text{cLRU}} = \min_{\nu,\, t_C} \sum_{i=1}^{M} \left( \int_0^{\nu} p_i R_i(x)\, e^{-p_i R_i(x)\, t_C}\, dx + \int_{\nu}^{1} p_i R_i(\tau)\, d\tau \right) \qquad (12)$$
$$\text{s.t.}\quad \frac{C}{S} = \sum_{i=1}^{M} \int_0^{\nu} \left(1 - e^{-p_i R_i(x)\, t_C}\right) dx$$
$$\frac{C}{MS} \le \nu \le 1.$$
We stress the fact that $\underline{B}_{\text{cLRU}}$ is the lowest core network traffic achievable by a chunk-LRU cache management policy. Thanks to the formulation in (12), we can prove the following two results via standard Lagrangian optimization techniques.
Corollary 4. If $R_i$ is continuous and $R_i(1) = 0$ for all $i \in \mathcal{M}$, then the optimal $\nu^* < 1$.

Corollary 5. If $R_i(\tau) = 1$ for all $\tau \in [0;1]$, $i \in \mathcal{M}$, then standard LRU (a single chunk per video and $\nu = 1$) achieves optimal performance.

The former result states that if users never watch videos in their entirety, then it is always optimal to leave a non-negligible final portion of each file uncached, i.e., $\nu^* < 1$. The latter claims that, as intuition suggests, if all users watch the whole video then the best chunk-LRU policy is actually the standard LRU.
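The two-variable problem (12) lends itself to a simple numerical scheme. The sketch below is our own illustration (not the authors' solver): it assumes $S = 1$, popularities summing to one, and a single retention function shared by all videos, grids over $\nu$, and for each $\nu$ solves the capacity constraint for $t_C$ by bisection before evaluating the objective by the midpoint rule.

```python
import math

def best_nu(p, R, c_over_s, nu_grid=20, nx=300):
    """Grid search over nu for problem (12) (S = 1, a common retention
    function R, popularities p summing to 1).  For each nu, the capacity
    constraint is solved for t_C by bisection (its RHS is increasing in
    t_C); the objective is then evaluated by the midpoint rule."""
    M = len(p)
    nu_min = c_over_s / M
    best_b, best_nu_val = float("inf"), None
    for j in range(1, nu_grid + 1):
        nu = nu_min + (1.0 - nu_min) * j / nu_grid
        xs = [(k + 0.5) * nu / nx for k in range(nx)]
        def filled(tc):
            return (nu / nx) * sum(1 - math.exp(-pi * R(x) * tc)
                                   for x in xs for pi in p)
        lo, hi = 0.0, 1.0
        while filled(hi) < c_over_s:
            hi *= 2.0
        for _ in range(50):
            mid = (lo + hi) / 2.0
            lo, hi = (mid, hi) if filled(mid) < c_over_s else (lo, mid)
        tc = (lo + hi) / 2.0
        cached_miss = (nu / nx) * sum(pi * R(x) * math.exp(-pi * R(x) * tc)
                                      for x in xs for pi in p)
        tail = (1.0 - nu) / nx * sum(R(nu + (k + 0.5) * (1.0 - nu) / nx)
                                     for k in range(nx))  # integral of R on [nu, 1]
        b = cached_miss + tail * sum(p)
        if b < best_b:
            best_b, best_nu_val = b, nu
    return best_b, best_nu_val
```

The returned pair gives an approximation of $\underline{B}_{\text{cLRU}}$ and of the optimizing tail drop factor on the chosen grid.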
D. Performance evaluations with real data
In this section we numerically evaluate the traffic performance on the core network of the proposed class of chunk-LRU cache management policies. We compare them with the optimal performance $B^*$ under full information that we derived in Section IV. We also take the performance of standard LRU as a second term of comparison. As in Section IV, we consider the audience retention rate scenario shown in Tab. I, estimated from a real YouTube dataset, with the only difference that the file size is now assumed to be uniform. We show our results⁹
in Fig. 9. We first notice that, as hinted by Theorem 2, the traffic generated by chunk-LRU decreases as the number $N$ of chunks increases ($N = 4, 20$). The infinitesimal chunk size approach ($N = \infty$) is shown to achieve the optimal performance $\underline{B}_{\text{cLRU}}$, as claimed in Corollary 3. Notably, chunk-LRU performs close to its optimal performance even with a limited number of chunks ($N = 20$, or even $N = 4$). Moreover, a suboptimal value of the tail drop factor $\nu = 1$ still performs close to optimal for $N$ sufficiently high (see Sect. V-E for further details). On the other hand, as expected, standard LRU performs poorly. In fact, the traffic generated by retrieving parts of files that are not requested by the users outweighs the benefits obtained through cache hits, even for medium-size caches. This explains why the traffic generated by LRU can be even higher than the traffic without any cache deployed.
The best tail drop factor $\nu^* = \nu^*(N)$ used to produce Fig. 9 is optimized for each value of $N$ and cache size $C$, as shown in Fig. 10. We notice that $\nu^*$ is closely related to the average watch-time, since it captures the portion of files with the lowest popularity, which needs to be systematically kept out of the cache. For small cache sizes, simulations show that $\nu^*$ is lower than the watch-time: in fact, to compensate for the reduced cache size, low values of $\nu$ make it possible to squeeze into the cache a significant number of different, and popular, file headers.
E. Tuning the chunk-LRU parameters
Although the optimization of chunk-LRU parameters is
beyond the scope of this paper, next we provide guidelines
on how reasonable values could be selected.
a) Choosing the number of chunks: Increasing the number of chunks increases the frequency at which the cache content and the associated recency list are updated, as well as the size of the recency table. Therefore, the design of the optimal number $N$ of chunks in real systems should capture the trade-off between the actual performance of the policy (for which high values of $N$ are preferable, see Cor. 3) and the required processing/memory resources, which increase with $N$. Our numerical results in Fig. 9 suggest that even a small number of chunks (around 4), which would result in a low-complexity policy, can achieve reasonably good traffic performance.
b) Choosing the tail drop factor $\nu$: The exact optimal value $\nu^*(N)$ can be computed by solving problem (12) only if all the system parameters, i.e., the file popularities $p_i$ and the retention rates $R_i$, are known to the cache controller. For comparison purposes¹⁰, we then show in Fig. 9 the performance achieved in the extreme case where the cache manager is agnostic to $p_i$ and $R_i$ and the tail drop parameter $\nu$ is blindly set to 1, i.e., no chunks are ever discarded. Remarkably, if the number of chunks is sufficiently high ($N = 20$ in this case),
the loss in performance incurred by such a sub-optimal choice is limited: the fine granularity of chunk splitting compensates for the loss incurred by setting $\nu = 1$.

Figure 9: Normalized core network traffic generated by chunk-LRU vs. the theoretical optimum $B^*$ and vs. the standard LRU. The optimal $\nu^* = \nu^*(N)$ is computed for each value of $N$ and cache size $C$, as depicted in Fig. 10. We also evaluate the performance achieved when the sub-optimal value $\nu = 1$ is utilized. The video popularity distribution follows a Zipf law with parameter 0.8 [17].

Figure 10: Optimal tail drop factor $\nu^*$ for different numbers of chunks $N = 4, 20, \infty$. We notice that the optimal $\nu^*(N)$ is within a neighborhood of the average watch-time of 0.61.

⁹ The traffic performance is normalized w.r.t. the traffic $B_{nc}$ generated when no cache is present, as in Section IV. The chunk-LRU policies have chunks of equal size.
¹⁰ If the full information assumption holds, then using chunk-LRU would be highly suboptimal, since the theoretically optimal solution computed in Section IV can actually be implemented.
Remark 2. We claim that a reasonable choice of $\nu$ ($< 1$) can still be made in realistic scenarios, based on an estimate of the parameters $p_i, R_i$. First of all, the optimal $\nu^*$ is not strictly a function of the popularity of each individual video, but only of the rank-dependent popularity $p_i$ of the $i$-th most popular video, for each $i$. It has been shown [17] that this rank-popularity relation depends on the class of traffic and is slowly varying over time, hence it is easily predictable. Secondly, we argue that the video retention rate functions $R_i$ vary on a much slower time scale than that of video popularity, which greatly facilitates their estimation.
VI. CONCLUSIONS
In this paper we investigated the potential of partial caching
towards minimizing core network traffic. Our numerical results
based on real YouTube access data reveal that large caches benefit the most from such strategies, with gains of up to 50% over the classic approach of storing the most popular files.
Interestingly, partial caching is beneficial even when the actual
popularity of videos is not known. In this case, practical
chunk-based LRU strategies which never cache the tail of
videos were shown to perform well as long as a sufficient
number of chunks is used.
The introduction of audience retention rate in caching decisions opens up interesting research directions. Retention rate is generally available in online video distribution systems and varies only slowly over time. Thus, it can be used to decompose the problems of file popularity estimation and optimal chunking without loss of optimality. In this context, the generalization of existing caching mechanisms so as to optimally exploit the benefits of partial caching is an interesting topic for future study.
VII. APPENDIX
A. Waterfilling Algorithm
Algorithm to compute $\eta^*$

Step 1 (Initialization): Let $k = 0$, $C^{(0)} := C$, $\mathcal{M}^{(0)} := \mathcal{M}$, $\mathcal{M}^{\mu}_a := \emptyset$, $\mathcal{M}^{\mu}_b := \emptyset$. Define $\tilde{R}''_i : \mathbb{R} \to \mathbb{R}$ as a strictly decreasing extension of $p_i R'_i$ over the whole real axis, i.e., $\tilde{R}''_i(\tau) = p_i R'_i(\tau)$ for all $\tau \in [0;1]$ and $\tilde{R}''_i$ is strictly decreasing over $\mathbb{R}$.

Step 2: Compute $\mu^{(k)}$ via the equation $\sum_{i \in \mathcal{M}^{(k)}} S_i\, [\tilde{R}''_i]^{-1}(\mu^{(k)}) = C^{(k)}$. Compute the sets $\mathcal{M}^{\mu(k)}_a = \{i : [\tilde{R}''_i]^{-1}(\mu^{(k)}) < 0\}$, $\mathcal{M}^{\mu(k)}_b = \{i : [\tilde{R}''_i]^{-1}(\mu^{(k)}) > 1\}$, $\mathcal{M}^{\mu(k)} = \{i : 0 \le [\tilde{R}''_i]^{-1}(\mu^{(k)}) \le 1\}$. Compute $\delta(\mu^{(k)}) = \sum_{i \in \mathcal{M}^{\mu(k)}_b} S_i + \sum_{i \in \mathcal{M}^{\mu(k)}} S_i\, [\tilde{R}''_i]^{-1}(\mu^{(k)}) - C^{(k)}$.

Step 3: If $\delta(\mu^{(k)}) = 0$ or $\mathcal{M}^{\mu(k)} = \emptyset$, then set $\mu = \mu^{(k)}$, $\mathcal{M}^{\mu}_a := \mathcal{M}^{\mu}_a \cup \mathcal{M}^{\mu(k)}_a$, $\mathcal{M}^{\mu}_b := \mathcal{M}^{\mu}_b \cup \mathcal{M}^{\mu(k)}_b$, $\mathcal{M}^{\mu} := \mathcal{M}^{\mu(k)}$, and go to Step 6. Else, if $\delta(\mu^{(k)}) > 0$ then go to Step 4. Else, if $\delta(\mu^{(k)}) < 0$ then go to Step 5.

Step 4: Set $\eta^*_i = 0$ for all $i \in \mathcal{M}^{\mu(k)}_a$. Set $C^{(k+1)} := C^{(k)}$. Compute $\mathcal{M}^{(k+1)} := \mathcal{M}^{(k)} \setminus \mathcal{M}^{\mu(k)}_a$, $\mathcal{M}^{\mu}_a := \mathcal{M}^{\mu}_a \cup \mathcal{M}^{\mu(k)}_a$, $k := k + 1$. Go to Step 2.

Step 5: Set $\eta^*_i = 1$ for all $i \in \mathcal{M}^{\mu(k)}_b$. Compute $C^{(k+1)} = C^{(k)} - \sum_{i \in \mathcal{M}^{\mu(k)}_b} S_i$, $\mathcal{M}^{(k+1)} := \mathcal{M}^{(k)} \setminus \mathcal{M}^{\mu(k)}_b$, $\mathcal{M}^{\mu}_b := \mathcal{M}^{\mu}_b \cup \mathcal{M}^{\mu(k)}_b$, $k := k + 1$. Go to Step 2.

Step 6: Set $\eta^*_i = 0$ for all $i \in \mathcal{M}^{\mu}_a$; $\eta^*_i = 1$ for all $i \in \mathcal{M}^{\mu}_b$; $\eta^*_i = [\tilde{R}''_i]^{-1}(\mu)$ for all $i \in \mathcal{M}^{\mu}$. Stop.
B. Proof of Theorem 1
Proof. As a first step, let us define $f_i : [0;1] \to [0;1]$ as a one-to-one function such that the permuted audience retention rate function $R'_i(\tau) := R_i(f_i^{-1}(\tau))$ is non-increasing. The function $f_i$ is a permutation that orders the video parts in order of decreasing popularity, such that $f_i(\tau) < f_i(\tau')$ if and only if $R_i(\tau) > R_i(\tau')$ (such an $f_i$ always exists, even though it is not unique, since it can arbitrarily break ties among equally popular parts of a single video, and it is in general discontinuous). Then, $R'_i$ is the outcome of such permutation. As a second step, we reformulate the optimization problem in (3) as
$$Y^* = \operatorname{argmax}_{Y} \sum_{i \in \mathcal{M}} S_i \int_{Y_i} p_i R_i(\tau)\, d\tau \qquad (13)$$
$$\text{s.t.}\quad \sum_{i \in \mathcal{M}} S_i \int_{Y_i} 1\, d\tau = C$$
$$Y_i \subseteq [0;1].$$
We can recast the bandwidth saving optimization problem (13) in terms of the permuted retention rates $R'_i$, considering only right-intervals of 0 of the kind $Y_i = [0; \eta_i]$, as follows:

$$\max_{\eta \in \mathbb{R}^M} \sum_{i \in \mathcal{M}} p_i S_i \int_0^{\eta_i} R'_i(\tau)\, d\tau \qquad (14)$$
$$\text{s.t.}\quad \sum_{i \in \mathcal{M}} \eta_i S_i = C,\quad \eta_i \in [0;1].$$
In fact, it is not profitable to consider a larger search domain, e.g., more complicated subsets $Y_i$ of $[0;1]$: for any collection of subsets $Y$ it is possible to replace each $Y_i$ with the interval $[0; \int_{Y_i} d\tau]$, which does not decrease the objective function while preserving feasibility. We can further simplify (14) by defining the function $R''_i(\tau) = p_i R'_i(\tau)$, as follows:

$$\min_{\eta \in \mathbb{R}^M} \sum_{i \in \mathcal{M}} S_i \int_0^{\eta_i} -R''_i(\tau)\, d\tau \qquad (15)$$
$$\text{s.t.}\quad \sum_{i \in \mathcal{M}} \eta_i S_i = C,\quad \eta_i \in [0;1].$$
We notice that $\frac{d}{d\eta_i} \int_0^{\eta_i} -R''_i(\tau)\, d\tau = -p_i R'_i(\eta_i)$, which is non-decreasing in $\eta_i$. Thus we recognize in (15) a convex optimization problem with linear and box constraints, whose objective function is separable in the optimization variables $\eta$. It is known that such problems can be solved via a classic water-filling technique (see [19], Chapter 6): more specifically, there exists a positive "water level" $\mu$ such that the optimal portions $\eta^*(\mu)$ can be computed as

$$\eta_i^*(\mu) = \begin{cases} 1 & \text{if } \min_{\tau \in [0;1]} R''_i(\tau) \ge \mu \\ 0 & \text{if } \max_{\tau \in [0;1]} R''_i(\tau) \le \mu \\ [R''_i]^{-1}(\mu) & \text{else} \end{cases} \qquad (16)$$
$$\sum_{i \in \mathcal{M}} S_i\, \eta_i^*(\mu) = C.$$
By rewriting (16) in terms of $R'_i$ we obtain the expressions:

$$\eta_i^* = \begin{cases} 1 & \text{if } p_i \min_{\tau \in [0;1]} R'_i(\tau) \ge \mu \\ 0 & \text{if } p_i \max_{\tau \in [0;1]} R'_i(\tau) \le \mu \\ [R'_i]^{-1}(\mu/p_i) & \text{else} \end{cases}$$
$$\sum_{i \in \mathcal{M}} S_i\, |Y_i^*| = C,$$

and we can finally claim that

$$Y_i^* = f_i^{-1}([0; \eta_i^*]) = \{\tau : p_i R_i(\tau) \ge \mu\} \quad \forall\, i \in \mathcal{M}.$$

The thesis follows.
C. Proof of Proposition 1
Proof. Since $R_i$ is already strictly decreasing, we can take $f_i(\tau) = \tau$ and $R'_i = R_i$. Moreover, in this case $\min_\tau R_i(\tau) = 0$ and $\max_\tau R_i(\tau) = 1$. The thesis easily follows.
D. Proof of Corollary 2
Proof. Define

$$\tilde{R}^{-1}_i(\tau) = -\frac{1}{\lambda_i} \ln\left( \tau\, (1 - e^{-\lambda_i}) + e^{-\lambda_i} \right).$$

We notice that $\tilde{R}^{-1}_i(\mu/p_i) = R^{-1}_i(\mu/p_i)$ when $0 < \mu \le p_i$, and $\tilde{R}^{-1}_i(\mu/p_i) < 0$ whenever $\mu > p_i$. Then, we can rewrite (5) as

$$\eta_i^* = \left[ \tilde{R}^{-1}_i(\mu/p_i) \right]^+, \qquad \sum_{i \in \mathcal{M}} S_i\, \eta_i^* = C.$$

The thesis easily follows.
E. Proof of Theorem 2
Proof. Let us first introduce the function

$$\xi^{(t_C)}(\tau) = \sum_{i=1}^{M} p_i R_i(\tau)\, e^{-p_i R_i(\tau)\, t_C}.$$

We then define $I(f)|_x$, where $f$ is a continuous function defined over $\mathbb{R}$, as the integral approximation of $f$ via left Riemann sums of the type

$$I(f)|_x = \sum_{k=1}^{N} f(x_{k-1})\, \Delta x_k.$$

We notice that if $f$ is increasing (decreasing) then $I(f)|_x < (>)\ I(f)|_{x'}$ for any sub-split $x'$. We can now rewrite $B_{\text{cLRU}}(x, \nu)$ as (compare with (9), up to the additive tail term and the factor $S$, which do not depend on the split)

$$B_{\text{cLRU}}(x, \nu) = I(\xi^{(t_C)})|_x \qquad \text{s.t.}\quad M\nu - \frac{C}{S} = I(h^{(t_C)})|_x,$$

where $h^{(t_C)}(\tau) = \sum_{i=1}^{M} e^{-p_i R_i(\tau)\, t_C}$. Since $h^{(t_C)}(\tau)$ is increasing in $\tau$, it easily follows from an induction argument that the characteristic time of any chunk split is found within $[\underline{t}_C; \overline{t}_C]$.

Consider now a sub-split $x'$ with associated characteristic time $t'_C$. Since $h^{(t_C)}(\tau)$ is increasing, then $I(h^{(t_C)})|_{x'} > I(h^{(t_C)})|_x$. Also, since $I(h^{(t'_C)})|_{x'} = I(h^{(t_C)})|_x$ and $h^{(t)}(\tau)$ is decreasing in $t$, we conclude that $t'_C > t_C$. We then have

$$B_{\text{cLRU}}(x, \nu) = I(\xi^{(t_C)})|_x > I(\xi^{(t'_C)})|_x > I(\xi^{(t'_C)})|_{x'} = B_{\text{cLRU}}(x', \nu),$$

where the first inequality holds because $\xi^{(t)}(\tau)$ is decreasing in $t$ and $t'_C > t_C$, and the second inequality follows from the fact that, by condition (10), $\xi^{(t)}(\tau)$ is decreasing in $\tau$ for any value $t$ of the characteristic time. The thesis is proven.
F. Proof of Corollary 4
Proof. The derivative with respect to $\nu$ of the objective function in (12), taken along the direction in which the constraint remains satisfied, writes

$$q(\nu) = -\sum_{i=1}^{M} \left(1 - e^{-p_i R_i(\nu)\, t_C}\right) p_i R_i(\nu) + \left( \int_0^{\nu} \sum_{i=1}^{M} p_i^2 R_i^2(\tau)\, e^{-p_i R_i(\tau)\, t_C}\, d\tau \right) \frac{\sum_{i=1}^{M} \left(1 - e^{-p_i R_i(\nu)\, t_C}\right)}{\int_0^{\nu} \sum_{i=1}^{M} p_i R_i(\tau)\, e^{-p_i R_i(\tau)\, t_C}\, d\tau}. \qquad (17)$$

Let us calculate $q(1 - d\nu)$. Since $R_i(1) = 0$, a first-order expansion yields

$$q(1 - d\nu) = t_C\, d\nu \left( \frac{A}{B} \sum_{i=1}^{M} p_i |R'_i(1)| - d\nu \sum_{i=1}^{M} p_i^2 |R'_i(1)|^2 \right) + o(d\nu^2).$$

Since $A = \int_0^{1} \sum_{i=1}^{M} p_i^2 R_i^2(\tau)\, e^{-p_i R_i(\tau)\, t_C}\, d\tau > 0$ and $B = \int_0^{1} \sum_{i=1}^{M} p_i R_i(\tau)\, e^{-p_i R_i(\tau)\, t_C}\, d\tau > 0$, we have $q(1 - d\nu) > 0$ for $d\nu$ small enough, and the thesis is proven.
G. Proof of Corollary 5
Proof. We first observe that, if $R_i(\tau) = 1$, then for all $\nu$ we have $B_{\text{cLRU}}([0;\nu], \nu) = B_{\text{cLRU}}(x, \nu)$ for any chunk split $x$. It then suffices to prove that $q(\nu) < 0$ holds for all $\nu \in (0;1)$, i.e., that the following inequality holds:

$$\left( \sum_{i=1}^{M} \left(1 - e^{-p_i t_C}\right) \right) \left( \sum_{i=1}^{M} p_i^2\, e^{-p_i t_C} \right) - \left( \sum_{i=1}^{M} \left(1 - e^{-p_i t_C}\right) p_i \right) \left( \sum_{i=1}^{M} p_i\, e^{-p_i t_C} \right) < 0.$$

This is a Chebyshev-type inequality: it states that the $p_i e^{-p_i t_C}$-weighted average of the $p_i$'s is smaller than their $(1 - e^{-p_i t_C})$-weighted average, which holds whenever the $p_i$'s are not all equal, since the weight ratio $(1 - e^{-p_i t_C})/(p_i e^{-p_i t_C}) = (e^{p_i t_C} - 1)/p_i$ is strictly increasing in $p_i$, so the second weighting puts relatively more mass on the larger popularities.
REFERENCES
[1] J. Roberts and N. Sbihi, “Exploring the memory-bandwidth tradeoff in
an information-centric network,” in Proc. of ITC, 2013, pp. 1–9.
[2] “Cisco visual networking index: Forecast and methodology, 2014–2019,” http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html.
[3] K. W. Hwang, D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee,
V. Misra, K. K. Ramakrishnan, and D. F. Swayne, “Leveraging video
viewing patterns for optimal content placement,” in Proceedings of IFIP
Conference on Networking, ser. IFIP’12, 2012, pp. 44–58.
[4] http://support.google.com/youtube/answer/1715160?hl=en-GB.
[5] M. Zeni, D. Miorandi, and F. De Pellegrini, “YOUStatAnalyzer: a tool for analysing the dynamics of YouTube content popularity,” in Proc. of VALUETOOLS ’13. ICST, 2013, pp. 286–289.
[6] S. Sen, J. Rexford, and D. Towsley, “Proxy prefix caching for multimedia streams,” in Proc. of IEEE INFOCOM ’99, vol. 3, Mar 1999, pp. 1310–1319.
[7] K.-L. Wu, P. Yu, and J. Wolf, “Segmentation of multimedia streams for
proxy caching,” IEEE Transactions on Multimedia, vol. 6, no. 5, pp.
770–780, Oct 2004.
[8] L. Wang, S. Bayhan, and J. Kangasharju, “Optimal chunking and partial
caching in information-centric networks,” Computer Communications,
vol. 61, pp. 48–57, 2015.
[9] J. Yu, C. T. Chou, Z. Yang, X. Du, and T. Wang, “A dynamic caching
algorithm based on internal popularity distribution of streaming media,”
Multimedia Systems, vol. 12, no. 2, pp. 135–149, 2006.
[10] K. Agrawal, T. Venkatesh, and D. Medhi, “A dynamic popularity-based partial caching scheme for video on demand service in IPTV networks,” in Proc. of COMSNETS ’14, Jan 2014, pp. 1–8.
[11] S.-H. Lim, Y.-B. Ko, G.-H. Jung, J. Kim, and M.-W. Jang, “Inter-chunk
popularity-based edge-first caching in content-centric networking,” IEEE
Communications Letters, vol. 18, no. 8, pp. 1331–1334, Aug 2014.
[12] S. Chen, H. Wang, X. Zhang, B. Shen, and S. Wee, “Segment-based
proxy caching for Internet streaming media delivery,” IEEE Multimedia,
vol. 12, no. 3, pp. 59–67, 2005.
[13] M. Hefeeda and O. Saleh, “Traffic modeling and proportional partial caching for peer-to-peer systems,” IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, Dec 2008.
[14] U. Devi, R. Polavarapu, M. Chetlur, and S. Kalyanaraman, “On the partial caching of streaming video,” in Proc. of IEEE IWQoS, June 2012, pp. 1–9.
[15] H. Che, Y. Tung, and Z. Wang, “Hierarchical web caching systems: Modeling, design and experimental results,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.
[16] W. S. Cleveland, “Robust locally weighted regression and smoothing
scatterplots,” Journal of the American statistical association, vol. 74,
no. 368, pp. 829–836, 1979.
[17] C. Fricker, P. Robert, and J. Roberts, “A versatile and accurate approximation for LRU cache performance,” in Proc. of the 24th International Teletraffic Congress (ITC 24), Sept 2012, pp. 1–8.
[18] http://wistia.com/doc/audience-engagement-graph.
[19] S. M. Stefanov, Separable programming: theory and methods. Springer
Science & Business Media, 2013, vol. 53.