All content in this area was uploaded by Aresh Dadlani on Sep 11, 2019
Multi-Armed Bandit Learning for Cache Content
Placement in Vehicular Social Networks
Saeid Akhavan Bitaghsir, Aresh Dadlani, Muhammad Borhani, and Ahmad Khonsari
Abstract—In this letter, the efficient dissemination of content
in a socially-aware cache-enabled hybrid network using multi-
armed bandit learning theory is analyzed. Specifically, an overlay
cellular network over a vehicular social network is considered,
where commuters request multimedia content from either the
stationary road-side units (RSUs), the base station, or the single
mobile cache unit (MCU), if accessible. Firstly, we propose an
algorithm to optimally distribute popular contents among the
locally deployed RSU caches. To further maximize the cache hits
experienced by vehicles, we then present an algorithm to find the
best traversal path for the MCU based on commuters’ social degree
distribution. For performance evaluation, the asymptotic regret
upper bounds of the two algorithms are also derived. Simulation
results reveal that the proposed algorithms outperform existing
content placement methods in terms of overall network throughput.
Index Terms—Vehicular social networks, multi-armed bandit,
cache content placement, mobile cache unit, cache hit rate.
I. INTRODUCTION
SOCIALLY-AWARE networking arising from the amalga-
mation of mobile social networks and vehicular ad hoc
networks is seen as a coherent paradigm to offload the ever-
growing influx of cellular traffic contributed by multimedia con-
tent sharing [1] and real-time information diffusion [2]. Widely
referred to as vehicular social networks (VSNs), such composite
networks not only represent the enabling technology for road
safety and infotainment applications in the next-generation
transportation systems, but also incorporate the impact of human
factors such as movement patterns, preferences, and multi-
faceted social ties on the intermittent vehicular connectivity [3].
Despite the overwhelming advantages offered by VSNs, one
of the many research challenges concerns the optimal distribu-
tion of content among the road-side units (RSUs) equipped with
finite-sized caches so as to maximize the average cache hit rate,
i.e. the fraction of total requests served via the local caches per
unit time, experienced by commuters in a dynamic environment
[2]. Addressing the pertinent cache content placement (CCP)
problem is of great significance as: (i) the number of RSUs
is limited by the high deployment cost incurred and (ii) the
frequency and type of content requested by users are random.
However, there only exist a handful of reported works on the
interplay between efficient content caching and VSN perfor-
mance. In [4], a content delivery framework based on D2D com-
munication between parked and moving vehicles is investigated
to primarily reduce the RSU load. While the work leverages
the storage capacity of parked vehicles to access contents, it
does not consider the centrality metrics of the socially-tied
commuters. A cross-entropy-based caching scheme is analyzed
in [5] to cache the contents at the edge of VSNs based on
vehicular requests, traffic density, and cooperation among fixed
RSUs. In [6], a distributed cooperative CCP scheme is presented
where RSUs deployed in a given locality update their caches via
periodic information exchange. Using named data networking,
a distributed probabilistic caching strategy is proposed in [7],
where vehicles consider user demands based on interest entries,
the vehicle's degree and betweenness centrality scores, and the
relative movement of the content provider/receiver. Nonetheless,
a mobile version of a typical RSU fed with information on
social network centrality and capable of satisfying local content
requests has not been employed in any of the previous efforts.
Fig. 1: System model with social and vehicular interactions. (Figure labels: stationary RSU, base station, mobile cache unit, cellular network, vehicular social network, social user ties, V2V communication.)
The main contribution of this letter is to model the CCP
problem by taking into account the average degree centrality
of commuters in VSNs involving a mobile cache unit (MCU).
Given a roadway scenario, we first adopt two multi-armed bandit
(MAB) approaches, namely Thompson sampling [8] and combi-
natorial bandit (ComBand) [9], to deal with the uncertainty and
partial feedback on rewards inherent in optimizing the network
throughput under different mobility models. Next, we propose
a centrality-driven algorithm to find the best traversal path for
the MCU that maximizes the cache hit ratio of the commuters.
We also derive regret upper bounds for the presented
algorithms, and simulation results show that our scheme
outperforms existing methods.
II. SYSTEM MODEL DESCRIPTION
Consider a cellular network (CN) over a VSN in an urban
area as shown in Fig. 1. The VSN itself is created by vehicles
communicating with each other and with the underlying CN
that includes one base station (BS) and a set of static RSUs
denoted by $\mathcal{R} = \{R_i\}$, where $i \in \{1, 2, \ldots, m\}$. The commuters
are the interacting users that constitute the social network.
We denote the library of content files available at the BS by
$\mathcal{F} = \{F_j\}$, where $j \in \{1, 2, \ldots, n\}$. Without loss of generality,
we assume all files to be divided into small chunks of equal
size. Also, let $C_i$ be the cache capacity of RSU $R_i \in \mathcal{R}$ and
$X_i(t) = \{F_j : \sum_{F_j \in \mathcal{F}} |F_j| \le C_i\}$ be the subset of files stored in $R_i$
at time $t$. We use $X(t) = \{X_i(t)\}$, $i \in \{1, 2, \ldots, m\}$, to denote
the system allocation state where all $X_i(t)$'s are valid cache
allocations at time $t$. Vehicles request library files from
the cache servers located at the RSUs only when not initially
satisfied by neighbouring vehicles via vehicle-to-vehicle (V2V)
communication. Each vehicle attempts to request the desired
file from the local RSUs within its communication range. Upon
failure, the file is then retrieved from the BS. The probability of
successful download from a local RSU depends on the strategy
undertaken to allocate the files in the RSUs. In this regard, we
define the CCP problem as determining the best allocation $X^*(t)$
for the local RSU caches that maximizes the hit ratio for nearby
vehicles. Formally, we aim to attain:
$$X^*(t) \in \arg\max_{X(t)} Q\big(X(t)\big), \tag{1}$$
where $Q\big(X(t)\big)$ denotes the hit ratio of allocation $X(t)$. To
express $Q\big(X(t)\big)$ mathematically, we first need to define the
content demand model in a geographical region. The content
demand is distributed non-uniformly over the region; files
bearing information on advertisements or recent discounts are
more popular in the neighbourhood of shopping malls rather
than any isolated location. To capture this intrinsic nature of
demands, we partition the area into $r = |\mathcal{S}|$ sectors $\mathcal{S} = \{S_k\}$,
$k \in \{1, 2, \ldots, r\}$, using hotspot clustering [6], and each sector $S_k$
encompasses some set of RSUs, denoted by $U_k$. To facilitate
content distribution in the network, we further employ a socially-
aware MCU that contains all the files and is capable of satisfying
all requests within a sector by travelling between sectors.
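The request flow described in this section (V2V first, then local RSUs, then the BS) can be illustrated with a minimal Python sketch. The function names `serve_request` and `hit_ratio`, the toy caches, and the choice to compute the hit ratio only over requests that reach the infrastructure are illustrative assumptions; the MCU is left out for brevity.

```python
from typing import Iterable, List, Set


def serve_request(file_id: int,
                  neighbour_caches: Iterable[Set[int]],
                  rsu_caches_in_range: Iterable[Set[int]]) -> str:
    """Return the tier that serves a request: 'V2V', 'RSU' or 'BS'."""
    # Step 1: neighbouring vehicles are tried first over V2V links.
    if any(file_id in cache for cache in neighbour_caches):
        return "V2V"
    # Step 2: local RSUs within communication range.
    if any(file_id in cache for cache in rsu_caches_in_range):
        return "RSU"
    # Step 3: fall back to the base station, which holds the whole library.
    return "BS"


def hit_ratio(outcomes: List[str]) -> float:
    """Fraction of infrastructure requests (RSU or BS) served by an RSU."""
    infra = [o for o in outcomes if o in ("RSU", "BS")]
    if not infra:
        return 0.0
    return sum(o == "RSU" for o in infra) / len(infra)


if __name__ == "__main__":
    rsu_caches = [{1, 2}, {2, 5}]   # hypothetical allocation X_i(t) for two RSUs
    neighbours = [{7}]              # hypothetical cache of a neighbouring vehicle
    requests = [1, 7, 9, 5]
    outcomes = [serve_request(f, neighbours, rsu_caches) for f in requests]
    print(outcomes, hit_ratio(outcomes))
```

Under these assumptions, a better allocation $X(t)$ is simply one that moves more requests from the `"BS"` outcome to the `"RSU"` outcome.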
III. PROPOSED MAB-BASED FORMULATION
In this section, we first formulate the CCP problem using
MAB-based Thompson sampling and ComBand approaches.
Then, we analyze a heuristic to determine the optimal traversal
path for the MCU that maximizes the local cache hits.
A. Thompson Sampling CCP Model
In this model, we consider each memory cell of an RSU to
be a player and each file in $\mathcal{F}$ to be an arm to pull. Thus, each
memory cell of an RSU pulls an arm (chooses a file to cache)
and consequently receives a reward. Let $q_{k,j}(t)$ denote the hit
probability of file $F_j$ in sector $S_k$ at time $t$, i.e. the probability
that file $F_j$ is available in the cache of one of the RSUs in sector
$S_k$. Therefore, the likelihood function is given by the following
Bernoulli distribution:
$$\Pr\big((w,l) \mid q_{k,j}(t)\big) = \binom{w+l}{w}\, q_{k,j}(t)^{w} \big(1 - q_{k,j}(t)\big)^{l}, \tag{2}$$
where $(w,l) \in \{(0,1),(1,0)\}$ refers to the win (hit) or lose (miss)
event. Moreover, let $H_{k,j}(t)$ be the number of hits for file $F_j$ in
sector $S_k$ up to time $t$. Also, let $M_{k,j}(t)$ denote the number of
misses for $F_j$ in $S_k$ until time $t$. The prior function indicating
the hit distribution is given by the beta distribution below:
$$\Pr\big(q_{k,j}(t)\big) = \frac{q_{k,j}(t)^{H_{k,j}(t)} \big(1 - q_{k,j}(t)\big)^{M_{k,j}(t)}}{\beta\big(1 + H_{k,j}(t),\, 1 + M_{k,j}(t)\big)}, \tag{3}$$
where $\beta(\cdot)$ is the normalization beta function. To solve (1), we
now present the posterior function of our model as follows:
$$\Pr\big(q_{k,j}(t) \mid (w,l)\big) = \frac{q_{k,j}(t)^{w + H_{k,j}(t)} \big(1 - q_{k,j}(t)\big)^{l + M_{k,j}(t)}}{\beta\big(1 + w + H_{k,j}(t),\, 1 + l + M_{k,j}(t)\big)}. \tag{4}$$
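The prior-to-posterior update in (3) and (4) amounts to incrementing a hit or miss counter, which the following Python sketch makes concrete for a single memory cell in a single sector. The class `BetaArm`, the helper `simulate`, and the fixed true hit probabilities are hypothetical and serve only to illustrate the sampling step; they are not taken from the letter's simulator.

```python
import random
from typing import List


class BetaArm:
    """Beta(1 + H, 1 + M) belief over the hit probability of one file."""

    def __init__(self) -> None:
        self.hits = 0    # H_{k,j}(t)
        self.misses = 0  # M_{k,j}(t)

    def update(self, hit: bool) -> None:
        # Posterior (4): a hit is (w, l) = (1, 0), a miss is (w, l) = (0, 1).
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    def posterior_mean(self) -> float:
        return (1 + self.hits) / (2 + self.hits + self.misses)

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(1 + self.hits, 1 + self.misses)


def choose_file(arms: List[BetaArm], rng: random.Random) -> int:
    """Thompson sampling step: draw once per file, cache the largest draw."""
    draws = [arm.sample(rng) for arm in arms]
    return max(range(len(arms)), key=lambda j: draws[j])


def simulate(true_hit_prob: List[float], steps: int, seed: int = 0) -> List[int]:
    """Return how often each file was chosen (assumed stationary popularity)."""
    rng = random.Random(seed)
    arms = [BetaArm() for _ in true_hit_prob]
    pulls = [0] * len(arms)
    for _ in range(steps):
        j = choose_file(arms, rng)
        pulls[j] += 1
        arms[j].update(rng.random() < true_hit_prob[j])
    return pulls


if __name__ == "__main__":
    print(simulate([0.2, 0.8, 0.5], steps=2000))
```

Because the counters accumulate over the whole history, old observations are never discounted in this sketch, which is one way to see why the model reacts slowly when popularity shifts.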
Considering the upper bound obtained for Thompson sampling
in [10] along with (3) and (4), the expected utility (regret)
over the entire time period $T$ can be expressed as:
$$\mathbb{E}[R(T)] \le (1+\epsilon) \sum_{j=2}^{|\mathcal{F}|} \frac{\ln(T)}{d\big(q_{k,j}(T), q_{k,1}(T)\big)}\, \Delta_j + O\!\left(\frac{|\mathcal{F}|}{\epsilon^{2}}\right), \tag{5}$$
where $d(q_{k,j}, q_{k,1}) \triangleq q_{k,j} \log\frac{q_{k,j}}{q_{k,1}} + (1 - q_{k,j}) \log\frac{1 - q_{k,j}}{1 - q_{k,1}}$,
$\epsilon \in (0,1)$, $\Delta_j \triangleq q_k^* - q_{k,j}$, and $q_k^* = \max_j q_{k,j}$. Though this CCP
model exhibits reasonable complexity, it adapts very slowly
to changes in content file popularity and in the rewards associated
with caching each content.
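To get a feel for the size of the leading term in (5), the divergence $d(\cdot,\cdot)$ and the logarithmic sum can be evaluated numerically. The sketch below assumes that the files are indexed so that the first entry is the most popular one ($q_{k,1} = q_k^*$) and that all probabilities lie strictly inside $(0,1)$; the function names are illustrative, and the $O(\cdot)$ remainder is not evaluated.

```python
import math
from typing import Sequence


def bernoulli_kl(p: float, q: float) -> float:
    """d(p, q) = p log(p/q) + (1 - p) log((1 - p)/(1 - q)) for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def leading_regret_term(q: Sequence[float], horizon: int, eps: float) -> float:
    """(1 + eps) * sum_{j >= 2} ln(T) / d(q_j, q_1) * Delta_j.

    Assumes q[0] is the largest hit probability, so Delta_j = q[0] - q[j].
    """
    best = q[0]
    total = 0.0
    for q_j in q[1:]:
        gap = best - q_j
        total += math.log(horizon) / bernoulli_kl(q_j, best) * gap
    return (1.0 + eps) * total


if __name__ == "__main__":
    for horizon in (100, 1000, 10000):
        print(horizon, leading_regret_term([0.8, 0.5, 0.2], horizon, eps=0.1))
```

The term grows only logarithmically in $T$, and files whose hit probability is close to the best one contribute the most because their divergence is small.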
B. ComBand CCP Model
To adapt faster to the dynamicity of VSNs, we instead
assign the RSUs as players (rather than the memory cells) and
the files in $\mathcal{F}$ as the arms to be pulled. Each RSU $R_i$ is rewarded
based on the subset of files $X_i$ it selects at each time step. The
best subset of contents for each $R_i$, given by $X_i^*$, is selected from
the library based on Algorithm 1, where the cumulative reward
calculated for each RSU is $G_{CB}(R_i) = \sum_{t=1}^{T} \sum_{k=1}^{|\mathcal{S}|} \sum_{j \in X_i} q_{k,j}(t)$.
We assess the performance of this algorithm by computing the
regret for the best fixed set of actions (weak regret [11]) defined
by $G_{\max-k}$ as follows:
$$G_{\max-k}(R_i) = \max_{X_i \in \binom{|\mathcal{F}|}{C_i}} \sum_{t=1}^{T} \sum_{k=1}^{|\mathcal{S}|} \sum_{j \in X_i} q_{k,j}(t). \tag{6}$$
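For a small library, the benchmark in (6) can be computed by brute force over all subsets of $C_i$ files. The following Python sketch assumes equal-sized files, so that the capacity counts files, and a hypothetical nested list `q[t][k][j]` holding the hit probabilities; neither the data layout nor the function names come from the letter.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

HitProbs = List[List[List[float]]]  # q[t][k][j]: time step, sector, file


def cumulative_reward(q: HitProbs, subset: Sequence[int]) -> float:
    """Sum of q_{k,j}(t) over all time steps, sectors and cached files."""
    return sum(q_t[k][j] for q_t in q for k in range(len(q_t)) for j in subset)


def best_fixed_subset(q: HitProbs, capacity: int) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search all subsets of `capacity` files, as in (6)."""
    n_files = len(q[0][0])
    best = max(combinations(range(n_files), capacity),
               key=lambda s: cumulative_reward(q, s))
    return best, cumulative_reward(q, best)


if __name__ == "__main__":
    # Toy data: T = 2 time steps, 2 sectors, 3 files.
    q = [[[0.1, 0.5, 0.4], [0.2, 0.6, 0.1]],
         [[0.3, 0.2, 0.4], [0.1, 0.7, 0.2]]]
    print(best_fixed_subset(q, capacity=2))
```

The number of candidate subsets grows combinatorially with the library size, which is why this exhaustive search is only usable as a reference on toy instances.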
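Here, we consider every possible combination of contents
for RSU $R_i$ to be weighted as $\{w_{X_i}(t) : X_i \in \binom{|\mathcal{F}|}{C_i}\}$. At each
iteration, the algorithm updates $w_{X_i}(t)$ and computes the matrix
$P_{|\mathcal{F}| \times |\mathcal{F}|}(t) \triangleq \sum_{X_i \in \binom{|\mathcal{F}|}{C_i}} p_{X_i}(t)\, \mathbf{1}_{X_i} \mathbf{1}_{X_i}^{\top}$, where $\mathbf{1}_{X_i}$ is an $|\mathcal{F}|$-dimensional
vector whose $j$-th element is 1 if $F_j \in X_i$ and 0
otherwise. At $t = 1$, we set $w_{X_i}(1) = 1$ for all $X_i \in \binom{|\mathcal{F}|}{C_i}$ and
compute $p_{X_i}(t)$ at each time step as given in (7).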
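Algorithm 1 Content Placement in Local RSUs
Input: $i \in \{1, 2, \ldots, m\}$, $k \in \{1, 2, \ldots, r\}$, and $\gamma \in (0,1]$.
Output: $X_i^*(t)$ for $t \in \{1, 2, \ldots, T\}$.
Initialization: $\forall X_i \in \binom{|\mathcal{F}|}{C_i}$, $w_{X_i}(1) = 1$
1: for $t = 1$ to $T$ do
2:   for each $X_i \in \binom{|\mathcal{F}|}{C_i}$ do
3:     Calculate $p_{X_i}(t)$ using (7).
4:   end for
5:   Select $X_i^*$ based on the distribution of $p_{X_i}(t)$.
6:   Calculate reward for $R_i$ using (6).
7:   $P(t) \leftarrow \sum_{X_i \in \binom{|\mathcal{F}|}{C_i}} p_{X_i}(t)\, \mathbf{1}_{X_i} \mathbf{1}_{X_i}^{\top}$.
8:   $L(t) \leftarrow \big[C_i - \sum_{j \in X_i} q_{k,j}(t)\big] P^{+}(t)\, \mathbf{1}_{X_i}$.
9:   for each $X_i \in \binom{|\mathcal{F}|}{C_i}$ do
10:     Update weight $w_{X_i}(t)$ using (8).
11:   end for
12: end for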
Algorithm 2 Optimal Path for MCU Traversal
Input: $k \in \{1, 2, \ldots, r\}$, $w_k(t)$, and $\varepsilon$.
Output: Sequence of sectors yielding maximum hit rate.
Initialization: $\forall S_k \in \mathcal{S}$, $w_k(0) = 1$
1: for $t = 1$ to $T$ do
2:   Travel to sector $S_k$ with probability calculated in (11).
3:   Observe all hits and update weight using (12).
4: end for
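The probability of selecting subset $X_i$ at time $t$ is:
$$p_{X_i}(t) = (1-\gamma)\, \frac{w_{X_i}(t)}{\sum_{Y_i \in \binom{|\mathcal{F}|}{C_i}} w_{Y_i}(t)} + \gamma\, \frac{1}{\binom{|\mathcal{F}|}{C_i}}, \tag{7}$$
where $\gamma \in (0,1]$. We then select $X_i(t)$ based on the distribution
$p_{X_i}(t)$. By setting $L(t) = \big[C_i - \sum_{j \in X_i} q_{k,j}(t)\big] P^{+}(t)\, \mathbf{1}_{X_i}$, where
$L(t)$ is the pseudo-loss, $P^{+}(t)$ is the pseudo-inverse [9] of $P(t)$,
and $L_j(t)$ is the $j$-th element of $L(t)$, we finally update the
weights for all $X_i \in \binom{|\mathcal{F}|}{C_i}$ in the next step as shown below:
$$w_{X_i}(t+1) = w_{X_i}(t) \exp\!\left(-\gamma\, \frac{(|\mathcal{F}| - C_i) \sum_{j \in X_i} L_j(t)}{C_i\, |\mathcal{F}|\, (|\mathcal{F}| - 1)}\right). \tag{8}$$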
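The sampling distribution (7) and the exponential weight update (8) can be sketched in a few lines of Python. This is only a partial illustration: the pseudo-loss vector $L(t)$ is passed in as a given list, so the pseudo-inverse step of Algorithm 1 is not implemented, and all function names are hypothetical. Subsets are enumerated explicitly, which is feasible only for small libraries.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Subset = Tuple[int, ...]


def init_weights(n_files: int, capacity: int) -> Dict[Subset, float]:
    """w_{X_i}(1) = 1 for every subset of `capacity` files."""
    return {s: 1.0 for s in combinations(range(n_files), capacity)}


def sampling_distribution(weights: Dict[Subset, float],
                          gamma: float) -> Dict[Subset, float]:
    """Eq. (7): mix normalised weights with a uniform exploration term."""
    total = sum(weights.values())
    n_subsets = len(weights)  # equals C(|F|, C_i)
    return {s: (1.0 - gamma) * w / total + gamma / n_subsets
            for s, w in weights.items()}


def update_weights(weights: Dict[Subset, float],
                   pseudo_loss: Sequence[float],
                   gamma: float, n_files: int, capacity: int) -> Dict[Subset, float]:
    """Eq. (8): exponential update driven by the per-file pseudo-loss L_j(t)."""
    eta = gamma * (n_files - capacity) / (capacity * n_files * (n_files - 1))
    return {s: w * math.exp(-eta * sum(pseudo_loss[j] for j in s))
            for s, w in weights.items()}


if __name__ == "__main__":
    w = init_weights(n_files=4, capacity=2)
    w = update_weights(w, pseudo_loss=[0.0, 0.0, 1.0, 1.0],
                       gamma=0.5, n_files=4, capacity=2)
    print(sampling_distribution(w, gamma=0.5))
```

After the update, subsets containing files with a high pseudo-loss receive a smaller weight and are therefore sampled less often, while the $\gamma$ term keeps every subset reachable.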
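Deducting the expected cumulative reward for $R_i$ from $G_{\max-k}$
in (6) yields the following asymptotic regret upper bound:
$$G_{\max-k}(R_i) - \mathbb{E}[G_{CB}(R_i)] \le \frac{2(|\mathcal{F}| - 1)}{|\mathcal{F}| - C_i}\, C_i \sqrt{T\, |\mathcal{F}| \ln \binom{|\mathcal{F}|}{C_i}} \le 4 \sqrt{C_i^{3}\, T\, |\mathcal{F}| \ln |\mathcal{F}|}. \tag{9}$$
Time and space complexities of $O(C_i |\mathcal{F}|^{3})$ and $O(|\mathcal{F}|^{3})$ can
be achieved, respectively, by updating the weight function as
$w_{X_i}(t+1) = w_{X_i}(t) \exp\!\left(-\gamma\, \frac{(|\mathcal{F}| - C_i)\, L_i(t)}{C_i\, |\mathcal{F}|\, (|\mathcal{F}| - 1)}\right)$.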
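The final expression in (9), $4\sqrt{C_i^3\, T\, |\mathcal{F}| \ln |\mathcal{F}|}$, grows with the square root of the horizon, so the regret per time step vanishes as $T$ increases. The short Python sketch below evaluates this expression; the function name and the cache capacity of 10 files are assumptions chosen for illustration, while $|\mathcal{F}| = 100$ matches the library size used later in the simulations.

```python
import math


def comband_regret_bound(capacity: int, n_files: int, horizon: int) -> float:
    """Evaluate 4 * sqrt(C_i^3 * T * |F| * ln|F|), the last term of (9)."""
    return 4.0 * math.sqrt(capacity ** 3 * horizon * n_files * math.log(n_files))


if __name__ == "__main__":
    for horizon in (100, 400, 1600):
        bound = comband_regret_bound(capacity=10, n_files=100, horizon=horizon)
        print(horizon, bound, bound / horizon)  # per-step bound shrinks
```

Quadrupling the horizon doubles the bound, which halves the bound on the average regret per step.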
C. Optimal Traversal Path for MCU
We now present a MAB-based approach to find the optimal
traversal path for the MCU that maximizes the cache hit ratio
experienced by social commuters. Our approach is based on
the Hedge algorithm [12] in an adversarial environment where
$w_k(t)$ is used to quantify the benefit of the MCU visiting sector
$S_k$ at time $t$. Initially at $t = 0$, the weights for all sectors are
equal to 1. We compute the rewards using the average cache
miss parameter for $S_k$ at time $t$ given by:
$$ACM_k(t) = \frac{1}{|U_k|} \sum_{i \in U_k} CM_i(t), \tag{10}$$
where $CM_i(t)$ is the cache miss of $R_i$ at time $t$. As shown in
Algorithm 2, at each time interval, the MCU chooses to visit
sector $S_k$ with probability:
$$\Pr(k,t) = \frac{w_k(t-1)}{\sum_{k'=1}^{|\mathcal{S}|} w_{k'}(t-1)}. \tag{11}$$
After observing all hits at the end of each time step, $w_k(t)$ is
updated for all $S_k \in \mathcal{S}$ using the following function:
$$w_k(t) = w_k(t-1)\, z_k^{-1}(t-1) \exp\big(\varepsilon\, ACM_k(t)\big), \tag{12}$$
where $z_k(t)$ is the average degree centrality of social commuters
in $S_k$ at time $t$ and $\varepsilon$ is the learning rate. Note that Algorithm 2
plays the arm with the best past rewards if $\varepsilon \to \infty$, while it
selects each arm with equal probability if $\varepsilon \to 0$. The upper
bound on the mean regret for Algorithm 2 is obtained to be:
$$\mathbb{E}[R(T)] \le 2\varepsilon H^* + \frac{\ln |\mathcal{S}|}{\varepsilon}, \tag{13}$$
where $H^*$ is the maximum number of cache hits. Considering
$\varepsilon = \sqrt{\ln |\mathcal{S}| / T} \le 1$, (13) can further be simplified to:
$$\mathbb{E}[R(T)] \le 3\sqrt{T \ln |\mathcal{S}|}. \tag{14}$$
Fig. 2: New York, Manhattan roadway map for simulation.
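The sector selection rule (11), the weight update (12) and the bounds (13) and (14) can be sketched as follows. All function names and the toy inputs are hypothetical, and the check of (14) takes $H^* = T$ as a worst-case assumption, which is what makes the two terms of (13) add up to $3\sqrt{T \ln |\mathcal{S}|}$.

```python
import math
import random
from typing import List, Sequence


def average_cache_miss(cache_miss_per_rsu: Sequence[float]) -> float:
    """Eq. (10): mean cache miss over the RSUs of one sector."""
    return sum(cache_miss_per_rsu) / len(cache_miss_per_rsu)


def visit_probabilities(weights: Sequence[float]) -> List[float]:
    """Eq. (11): normalise the sector weights into a distribution."""
    total = sum(weights)
    return [w / total for w in weights]


def hedge_update(weights: Sequence[float], centrality: Sequence[float],
                 acm: Sequence[float], eps: float) -> List[float]:
    """Eq. (12): w_k(t) = w_k(t-1) * z_k(t-1)^{-1} * exp(eps * ACM_k(t))."""
    return [w / z * math.exp(eps * a)
            for w, z, a in zip(weights, centrality, acm)]


def choose_sector(weights: Sequence[float], rng: random.Random) -> int:
    """Sample the next sector for the MCU according to (11)."""
    probs = visit_probabilities(weights)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def learning_rate(n_sectors: int, horizon: int) -> float:
    """eps = sqrt(ln|S| / T), the choice that leads from (13) to (14)."""
    return math.sqrt(math.log(n_sectors) / horizon)


def hedge_regret_bound(eps: float, max_hits: float, n_sectors: int) -> float:
    """Right-hand side of (13)."""
    return 2.0 * eps * max_hits + math.log(n_sectors) / eps


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0, 1.0, 1.0]                       # w_k(0) = 1
    acm = [average_cache_miss(m) for m in ([0.1, 0.3], [0.8, 0.6], [0.4])]
    weights = hedge_update(weights, centrality=[1.0, 1.0, 1.0], acm=acm, eps=1.0)
    print(visit_probabilities(weights), choose_sector(weights, rng))
```

Over long horizons the raw weights can grow very large, so a practical implementation would likely renormalise them after each update; (11) is unaffected by such rescaling.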
IV. SIMULATION RESULTS AND DISCUSSIONS
We evaluate our strategy in the $3 \times 4$ kilometer roadway
map of New York City, Manhattan in Fig. 2, where RSUs are
deployed randomly along the streets. The vehicle fleet size is
randomly chosen from $[40, 160]$ with random destinations in the
SUMO simulator [13], $|\mathcal{F}| = 100$ data files, and $|\mathcal{R}|$ is taken
between $[3, 20]$ units. Vehicle-to-infrastructure (V2I) communication
is enabled using the standard IEEE 802.11p protocol,
while V2V links use dedicated short-range communications.
Moreover, the RSU and V2V communication ranges lie between
$[200, 500]$ meters and $[100, 200]$ meters, respectively.
Fig. 3 compares the performance of our approach in terms of
cache hit ratio. Defined as the fraction of requests successfully
satisfied by local RSUs over the total number of requests made,
this metric is inversely impacted by the average cache miss
in each sector as shown in Fig. 3a. Factors such as added
noise in the environment and reduction in RSU cache capacity
contribute substantially to the mean cache miss ratio. As a result,
more vehicles must download their requested
contents from the sole BS in the system. The figure also
shows the higher hit rate achieved by the proposed ComBand-based
CCP approach with respect to other existing methods [6],
which in turn improves the overall network throughput. Fig. 3b
depicts the efficiency of the MCU traversal path determined
by Algorithm 2 against the random and greedy path selection
baseline models. In the random model, the next sector to be
visited is determined randomly, whereas the greedy model selects
the next sector $S_k$ based on the highest cache miss ratio of
the RSUs in $U_k$. Although increasing the number of RSUs
in the sector of interest would improve the hit ratio in all
three models, the higher efficiency of the joint ComBand-Hedge
approach is mainly due to the learning involved in finding the
best traversal path based on the average user connectivity in
each sector. Adaptation of the Thompson sampling CCP model
to changes in content popularity and rewards associated with
caching contents is compared with the ComBand counterpart
in Fig. 3c. At simulation time cycles 200 and 700, we intentionally
change the content popularity throughout the network.
Evidently, such abrupt changes result in a steep fall of the
hit ratio in both MAB-based CCP approaches. The cache hit
ratio, however, increases as both methods learn the new content
popularity from RSU feedback. This increase is observed to be
relatively larger in the ComBand approach, which supports its
adaptability to dynamic VSN parameters. The performance of
the ComBand CCP method is compared with existing methods under four
vehicular mobility models, namely the Rice University model
(RUM), stop sign model (SSM), probabilistic traffic sign model
(PTSM), and traffic light model (TLM) [14], in
Fig. 3d. The figure clearly shows that the ComBand approach
not only guarantees higher achievable cache hit ratios that reach
a maximum of nearly 68% for PTSM and a minimum of 36.4%
under TLM in comparison to other CCP methods, but also
adapts well to any vehicular movement pattern.
Fig. 3: Performance comparison of the MAB-based approaches benchmarked in terms of the cache hit ratio with 95% confidence interval. (a) In terms of mean cache miss. (b) In terms of number of RSUs. (c) In terms of adaptation rate. (d) In terms of mobility model.
Fig. 4: Download time comparison of different CCP models.
Fig. 4 illustrates the download time of different CCP models
for different numbers of RSUs. Each file chunk, out of 40 files,
requested by vehicles is assumed to be 5 MB in size. The V2I file
transfer speed is 27 Mbps and each vehicle communicates with
the BS via a 6 Mbps link using the IEEE 802.11p protocol. Here, a
file chunk is first requested from an RSU. If the attempt to retrieve
the file from the RSU fails, the vehicle then requests the file
directly from the BS at the lower transfer speed. As ComBand offers
a higher cache hit ratio, its file download time is shorter than
that of the other methods. The file download time difference between
ComBand and the next best method, i.e. MobiCacher, further
increases with the number of RSUs deployed.
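The link between cache hit ratio and download time can be illustrated with a back-of-the-envelope calculation using the figures quoted above (5 MB chunks, 27 Mbps to an RSU, 6 Mbps to the BS). The Python sketch below assumes 1 MB = 8 Mb, ignores protocol overhead, and assumes that a failed RSU attempt costs no time; these simplifications and the function names are not taken from the letter.

```python
def transfer_time_s(size_mb: float, rate_mbps: float) -> float:
    """Time in seconds to move `size_mb` megabytes over a `rate_mbps` link."""
    return size_mb * 8.0 / rate_mbps


def expected_download_time_s(hit_ratio: float,
                             size_mb: float = 5.0,
                             rsu_rate_mbps: float = 27.0,
                             bs_rate_mbps: float = 6.0) -> float:
    """Average chunk download time: RSU on a hit, BS on a miss."""
    rsu_time = transfer_time_s(size_mb, rsu_rate_mbps)
    bs_time = transfer_time_s(size_mb, bs_rate_mbps)
    return hit_ratio * rsu_time + (1.0 - hit_ratio) * bs_time


if __name__ == "__main__":
    # 0.364 and 0.68 are the extreme ComBand hit ratios reported for Fig. 3d.
    for h in (0.0, 0.364, 0.68, 1.0):
        print(h, round(expected_download_time_s(h), 3))
```

Under these assumptions a chunk takes about 1.48 s from an RSU and about 6.67 s from the BS, so every percentage point of additional hit ratio saves roughly 52 ms per chunk on average.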
V. CONCLUSION
In this letter, we have studied the optimal distribution of
contents in VSNs equipped with multiple cache-enabled fixed
RSUs and a mobile cache unit (MCU). First, two strategies
based on multi-armed bandit learning were introduced to opti-
mize cache content placement in locally deployed RSUs. Next,
by partitioning the geographical area of interest into sectors, an
efficient centrality-aware algorithm was proposed to determine
the best traversal path for the MCU that maximizes the overall
cache hit ratio experienced by the socially-connected com-
muters. Upper bounds for the regret functions corresponding
to the proposed algorithms were also derived for performance
analysis. Simulation results have shown that the learning nature
of the proposed algorithms enables them to adapt more easily to
dynamic VSN settings and to different vehicular mobility
patterns than existing methods.
Convergence analysis of the regret function in the presence
of multiple MCUs and under the joint impact of other social
attributes characterized by centrality metrics is an interesting
follow-up of this work.
REFERENCES
[1] N. Golrezaei, A. F. Molisch, A. G. Dimakis, and G. Caire, “Fem-
tocaching and device-to-device collaboration: A new architecture for
wireless video distribution,” IEEE Commun. Mag., vol. 51, no. 4, pp.
142–149, Apr. 2013.
[2] A. M. Vegni and V. Loscrí, “A survey on vehicular social networks,”
IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2397–2419, Oct. 2015.
[3] Z. Ning, F. Xia, N. Ullah, X. Kong, and X. Hu, “Vehicular social
networks: Enabling smart mobility,” IEEE Commun. Mag., vol. 55, no. 5,
pp. 16–55, May 2017.
[4] Z. Su, Y. Hui, and S. Guo, “D2D-based content delivery with parked
vehicles in vehicular social networks,” IEEE Wireless Commun., vol. 23,
no. 4, pp. 90–95, Aug. 2016.
[5] Z. Su, Y. Hui, Q. Xu, T. Yang, J. Liu, and Y. Jia, “An edge caching
scheme to distribute content in vehicular networks,” IEEE Trans. Veh.
Technol., vol. 67, no. 6, pp. 5346–5356, Jun. 2018.
[6] S. A. Bitaghsir and A. Khonsari, “Cooperative caching for content
dissemination in vehicular networks,” Int. J. Commun. Syst., vol. 31,
no. 8, p. e3534, Feb. 2018.
[7] G. Deng, L. Wang, F. Li, and R. Li, “Distributed probabilistic caching
strategy in VANETs through named data networking,” in 2016 IEEE
Conference on Computer Communications Workshops (INFOCOM WK-
SHPS), Apr. 2016, pp. 314–319.
[8] M. J. Kim, “Thompson sampling for stochastic control: The finite
parameter case,” IEEE Trans. Autom. Control, vol. 62, no. 12, pp. 6415–
6422, Dec. 2017.
[9] N. Cesa-Bianchi and G. Lugosi, “Combinatorial bandits,” J. Comput.
Syst. Sci., vol. 78, no. 5, pp. 1404–1422, Sep. 2012.
[10] S. Agrawal and N. Goyal, “Near-optimal regret bounds for Thompson
sampling,” Journal of the ACM, vol. 64, no. 5, pp. 1–24, Sep. 2017.
[11] L. Xu, C. Jiang, Y. Qian, Y. Zhao, J. Li, and Y. Ren, “Dynamic privacy
pricing: A multi-armed bandit approach with time-variant rewards,”
IEEE Trans. Inf. Forensics Secur., vol. 12, no. 2, pp. 271–285, Feb.
2017.
[12] W. Krichene, M. Suarez Castillo, and A. Bayen, “On social optimal
routing under selfish learning,” IEEE Trans. Control Netw. Syst., vol. 5,
no. 1, pp. 479–488, Mar. 2018.
[13] “SUMO (Simulation of Urban MObility),” (Accessed September 11,
2019). [Online]. Available: http://sumo.sourceforge.net/
[14] X. Kong, F. Xia, Z. Ning, A. Rahim, Y. Cai, Z. Gao, and J. Ma, “Mobility
dataset generation for vehicular social networks based on floating car
data,” IEEE Trans. Veh. Technol., vol. 67, no. 5, pp. 3874–3886, May
2018.