Context-Aware Privacy Preservation in Network
Caching: An Information Theoretic Approach
Seyedeh B. Hassanpour, Abolfazl Diyanat, Ahmad Khonsari, Seyed P. Shariatpanahi, and Aresh Dadlani
Abstract—Caching has been recognized as a viable solution to surmount the limited capability of backhaul links in handling abundant network traffic. Although optimal approaches for minimizing the average delivery load do exist, current caching strategies fail to prevent intelligent adversaries from obtaining invaluable contextual information by inspecting the wireless communication links and thus violating users' privacy. Grounded in information theory, in this paper we propose a mathematical model for preserving privacy in a network caching system involving a server and a cache-aided end user. We then present an efficient content caching method that maximizes the degree of privacy preservation while maintaining the average delivery load at a given level. Given the Pareto optimal nature of the proposed $\epsilon$-constraint optimization approach, we also obtain the maximum privacy degree achievable under any given average delivery load. Numerical results and comparisons validate the correctness of our context-oriented privacy model.
Index Terms—Content caching, context-oriented privacy, error
probability bound, Pareto optimal, information theory.
I. INTRODUCTION
BACKHAUL congestion due to the ever-rising data traffic
growth is arguably the most prominent limiting factor
in 5G network performance. Network caching, where popular
contents are cached closer to the edge devices, has recently
been deemed as a feasible remedy to improve network capacity and alleviate the traffic load during peak hours [1]–[3].
Despite the benefits of placing requested contents in close proximity to interested users, proactive caching makes it easier for adversary nodes to access the data and gather information on users' preferences and location. Preserving privacy in caching systems is therefore a great challenge, as revealing user data usually violates users' confidentiality agreements [4].
In general, privacy in networks can be preserved from two distinct aspects. In data-oriented privacy, data integrity is maintained while being transmitted over the network such that malicious nodes are prohibited from accessing on-the-fly content [5]. Existing works on data-oriented privacy mostly emphasize either proposing new encryption methods or adding extra layers of encryption in the network [6]. In contrast, context-oriented privacy refers to preserving the privacy of users and their requested files, which can be divulged by eavesdropping on certain features of the transmitted packets, such as the time and location of creation, usage pattern, and other file-related statistics that remain unguarded in conventional protection mechanisms [7], [8].
With regard to cache content placement (CCP), an optimal probabilistic caching strategy is investigated in [9] to maximize the cache hit probability and cache-aided throughput in wireless D2D caching networks. In [10], a hybrid caching scheme jointly optimized with the transmission schemes is proposed to handle the trade-off between the signal cooperation gain and the caching diversity gain.

Fig. 1. A server with $M$ files, each of size $N$ bits, communicating with an end user with a cache size of $N$ bits in the presence of an adversary.
[9] nor [10] consider the presence of external adversaries,
there exist a handful of studies devoted to CCP strategies
that focus on context-oriented privacy. The authors in [11]
presented a protocol enforcing fair subdivision of limited
cache storage and context privacy provision. More recently,
a wireless caching network wherein multiple cooperative adversarial nodes tap contextual information of users is studied
in [12]. The authors aimed to maximize the probability of
delivering all requested content within a given radius. To
add randomness to the eavesdroppers’ estimates, which are
based on the eavesdropped transmitted packets, they applied
probabilistic caching to obtain the optimal probabilities. Owing to the non-convex nature of their CCP optimization problem, the authors adopted a genetic algorithm to find the caching strategy that best misleads the eavesdroppers. The works in [11]
and [12], however, do not address the trade-off between traffic
load minimization and degree of context privacy.
The main contribution of this letter is to introduce an information-theoretic formulation for a new CCP protocol
that characterizes the trade-off between traffic load and context
privacy of the system. In particular, we consider a strong
adversarial node with coverage over the entire area so as
to reduce the extra traffic overhead imposed due to cooperation between eavesdroppers. Assuming that the adversary
is equipped with the best estimator while overhearing the
channel, we also derive analytical bounds for estimation error
in terms of the Fano-Kovalevskij inequality [13]. Finally,
using the proposed 𝜖-constraint CCP optimization model, we
minimize the average delivery rate over the channel for any
desired level of privacy through simulation results.
II. SYSTEM MODEL DESCRIPTION
Consider the content delivery system shown in Fig. 1, which comprises a server with $M$ files, denoted by the set $\mathcal{F} = \{f_k \mid 1 \le k \le M\}$, each of length $N$ bits, and a cache-enabled user with a storage capacity of $N$ bits. In the content placement phase (CPP), the server preloads a file or a mixture of files into the user's cache using a given strategy until it is completely
filled. As a result, we have the constraint $\sum_{k=1}^{M} |f_k^c| = N$ on the preloaded files, where $|f_k^c|$ denotes the size of the cached portion of file $f_k$. In the content delivery phase (CDP) that follows, the user requests a file from $\mathcal{F}$. The server then transfers the requested file only if it has not been cached earlier at the user's device. We assume that the user requests file $f_k$ with probability $p_k$. Since the user requests at least one file, we thus have $\sum_{k=1}^{M} p_k = 1$.

Fig. 2. The adversary's best estimation error probability for $\mathcal{F} = \{A, B\}$. The dashed line ($P_e = 1 - P_A$) corresponds to an adversary without knowledge of the delivery load, the solid line ($P_e = 0$) to an adversary with knowledge of the delivery load under the conventional load-optimal strategy, and the region between them is the feasible region.
Moreover, we consider a passive adversary that eavesdrops on the communication between the server and the user. Without any prior knowledge of the requested file, the adversary attempts to detect the file from the set $\mathcal{F}$ during the CDP. We assume that the adversary is armed with the best estimator. Let $\hat{k}$ denote the adversary's estimate of the index of the requested file and $P_e$ be the adversary's estimation error in the sense of maximum a posteriori (MAP) probability, defined as:
$$P_e = \Pr[\hat{k} \neq k], \qquad 0 \le P_e \le 1. \tag{1}$$
We reserve the term adversary's best estimation for the estimate with minimum $P_e$ among all possible estimators.
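To make the MAP error in (1) concrete, the following Python sketch (our own illustration, not part of the letter; the file popularities and the caching pmf are assumed toy values) builds the joint distribution of the request index and the observed delivery load for the two-file setting of Section III, applies the MAP rule, and evaluates $P_e$.

```python
import numpy as np

# Assumed toy setup: M = 2 files of N bits each, popularities [P_A, P_B],
# and an assumed pmf over Z, the number of cached bits of file A.
N = 7
p = np.array([0.7, 0.3])                 # Pr[k = 0] = P_A, Pr[k = 1] = P_B
p_z = np.full(N + 1, 1.0 / (N + 1))      # candidate caching pmf of Z

# Joint distribution Pr[k = i, Y = j]: if file A (k = 0) is requested, Y = N - Z;
# if file B (k = 1) is requested, its uncached portion has size Z, so Y = Z.
joint = np.zeros((2, N + 1))
for j in range(N + 1):
    joint[0, j] = p[0] * p_z[N - j]      # Pr[Y = j | k = 0] = Pr[Z = N - j]
    joint[1, j] = p[1] * p_z[j]          # Pr[Y = j | k = 1] = Pr[Z = j]

# MAP rule: for each observed load j, pick the index with the largest joint mass.
map_est = joint.argmax(axis=0)

# Error probability of the MAP estimator, P_e = Pr[k_hat != k].
P_e = sum(joint[1 - map_est[j], j] for j in range(N + 1))
print(f"MAP error probability P_e = {P_e:.4f}")
```

With the uniform caching pmf assumed above, the observed load carries no information about the request, and the computed $P_e$ equals $1 - P_A$, matching the dashed line of Fig. 2.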
III. PROPOSED SECURE CCP APPROACH
In this section, we will describe our approach towards
characterizing the trade-off between privacy and delivery
efficiency. For the sake of illustration, consider a server that stores two distinct files, namely $A$ and $B$, with request probabilities $P_A$ and $P_B$ ($P_A \ge P_B$), respectively. If the file popularity is known to the adversary, then he/she can select the most popular file as his/her estimate without having any information about the CDP. In this case, $P_e = 1 - P_A$, which corresponds to the dashed line in Fig. 2.
Intuitively, an efficient CCP strategy is to minimize the total
number of transferred bits by considering the popularity of
the files requested by the user. The file popularity distribution $(p_k)$, however, is known to all entities in the system, including the adversary node. Additionally, the adversary is also aware
that the server in traditional network caching preloads the
most popular file into the user’s cache. In such a setting,
it becomes trivial to guess the index of the file (solid line
in Fig. 2). In what follows, we devise a CCP strategy that
achieves maximum ambiguity (or degree of privacy) for a
given delivery load.
A. Adversary Error Probability Bounds
To obtain the analytical upper and lower bounds on the adversary's best estimation error, we adopt the Fano-Kovalevskij inequality for binary variables in our two-file scenario, whereas the results of [14] are used for $M > 2$ in the following theorem.
Theorem 1. For any estimator $\hat{k}$ such that $k \to Y \to \hat{k}$ is a Markov chain, with $P_e = \Pr[\hat{k} \neq k]$, we have:
$$\Psi \le P_e \le \frac{H(k|Y)}{2} \tag{2}$$
with
$$\Psi \triangleq \begin{cases} H^{-1}(H(k|Y)), & \text{if } M = 2,\\[4pt] \dfrac{H(k|Y) - 1}{\log_2(M-1)}, & \text{if } M > 2, \end{cases} \tag{3}$$
where $M = |\mathcal{F}|$ and $Y$ is the random variable (r.v.) of the adversary's observation, corresponding to the number of bits transmitted from the server to the user over the network during the CDP. The conditional entropy $H(k|Y)$ is the total ambiguity in $k$ given observation $Y$, and $H^{-1}$ is the inverse of the binary entropy function $H(\vartheta) = -\vartheta \log_2(\vartheta) - (1-\vartheta)\log_2(1-\vartheta)$.
Proof. See Appendix A.
With the bounds derived for the adversary's best estimation error in Theorem 1, we now maximize the lower bound on $P_e$ in (3). Thus, exploiting the proposed approach guarantees that $P_e$ in the sense of MAP under any estimator will always be greater than a particular threshold $\xi \in \mathbb{R}_{++}$, i.e., $P_e = \Pr[\hat{k} \neq k] \ge \xi$. As a result, maximizing the error probability lower bound in (3) is equivalent to maximizing the conditional entropy $H(k|Y)$. Indeed, this maximum error probability is what we technically define as the privacy degree.
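The bounds in (2)-(3) can be evaluated directly. The following Python sketch (a minimal illustration we add here, not the authors' code) computes the lower bound $\Psi$ and the upper bound $H(k|Y)/2$ for a given conditional entropy; for $M = 2$, the inverse binary entropy $H^{-1}$ is obtained numerically by bisection on $[0, 1/2]$.

```python
import math

def binary_entropy(t: float) -> float:
    """H(t) = -t log2 t - (1 - t) log2 (1 - t), with H(0) = H(1) = 0."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def inv_binary_entropy(h: float) -> float:
    """Inverse of the binary entropy on [0, 1/2], found by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if binary_entropy(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def error_bounds(cond_entropy: float, M: int):
    """Return (lower, upper) bounds on P_e from (2)-(3) given H(k|Y) and M."""
    if M == 2:
        lower = inv_binary_entropy(cond_entropy)
    else:
        lower = (cond_entropy - 1) / math.log2(M - 1)
    upper = cond_entropy / 2
    return max(lower, 0.0), upper

# Example: maximum ambiguity for M = 2 (H(k|Y) = 1 bit) forces P_e close to 0.5.
print(error_bounds(1.0, M=2))   # approximately (0.5, 0.5)
print(error_bounds(2.0, M=8))   # assumed values for an M > 2 illustration
```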
B. Exemplary Case ($M = 2$)

In this subsection, we consider the scenario depicted in Fig. 2. We define the indicator r.v. $k$, which is zero if the user requests file $A$ (with probability $P_A$) and one otherwise (with probability $P_B = 1 - P_A$). We also let the r.v. $Z$ represent the number of bits of file $A$ stored in the cache. For discrete values of $j$, where $0 \le j \le N$, the probability mass function (pmf) of $Z$ is given as $p^z_j \triangleq \Pr[Z = j]$. Upon completion of the CPP, when the user requests one of the files, the server transmits $Y$ bits to satisfy the demand (i.e., $Y = N - Z$ when file $A$ is requested and $Y = Z$ when file $B$ is requested; cf. (15)). The term $H(k|Y)$ essentially captures the adversary's ambiguity about the identity of the requested file by knowing the size of the transmitted data and determines its error as stated in Lemma 1.
Lemma 1. The term $H(k|Y)$ is calculated as follows:
$$H(k|Y) = H(k) + H(Z) - H(Y), \tag{4}$$
where $H(k)$, $H(Z)$, and $H(Y)$ are the entropies of $k$, $Z$, and $Y$, respectively.
Proof. See Appendix B.
Fig. 3. Depiction of the fraction of each file $f_k \in \mathcal{F}$ in an $N$-bit cache using the r.v.s $Z_k$.

By letting $\boldsymbol{p}_Z = [p^z_0, p^z_1, \ldots, p^z_j, \ldots, p^z_N]$ be the distribution of $Z$, we observe that designing the CCP strategy is equivalent to determining the vector $\boldsymbol{p}_Z$. Since $H(k)$ is independent of $\boldsymbol{p}_Z$, we therefore arrive at the following optimization model from (4) to achieve maximum ambiguity without imposing any constraints on the delivery load:
$$\boldsymbol{p}_Z^{*} = \underset{\boldsymbol{p}_Z}{\arg\max}\; H(k|Y) = H(Z) - H(Y) \tag{5a}$$
$$\text{s.t.} \quad \sum_{j=0}^{N} p^z_j = 1, \quad 0 \le p^z_j \le 1, \quad 0 \le j \le N. \tag{5b}$$
Next, we prove in Proposition 1 that an optimal solution of (5) is the uniform distribution, at the cost of transmitting $N/2$ bits on average.
Proposition 1. The uniform distribution is one of the optimal
solutions of (5).
Proof. See Appendix C.
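As a quick numerical check of Lemma 1 and Proposition 1 (a sketch under the assumed values $N = 7$ and $P_A = 0.7$ later used in Section IV), the snippet below computes $H(Z)$, $H(Y)$, and $H(k|Y) = H(k) + H(Z) - H(Y)$ for the uniform caching pmf and confirms that $H(Z) - H(Y) = 0$, so the ambiguity attains its maximum value $H(k)$.

```python
import numpy as np

def entropy(pmf):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    pmf = np.asarray(pmf, dtype=float)
    nz = pmf[pmf > 0]
    return float(-(nz * np.log2(nz)).sum())

N, P_A = 7, 0.7                      # assumed values, as in Section IV
P_B = 1 - P_A
p_z = np.full(N + 1, 1 / (N + 1))    # uniform caching pmf of Z (Proposition 1)

# Delivery-load pmf from (17): Pr[Y = j] = P_A Pr[Z = N - j] + P_B Pr[Z = j].
p_y = np.array([P_A * p_z[N - j] + P_B * p_z[j] for j in range(N + 1)])

H_k = entropy([P_A, P_B])
H_Z, H_Y = entropy(p_z), entropy(p_y)
H_k_given_Y = H_k + H_Z - H_Y        # Lemma 1, eq. (4)

print(f"H(Z) - H(Y) = {H_Z - H_Y:.6f}")                    # ~ 0
print(f"H(k|Y) = {H_k_given_Y:.4f}, H(k) = {H_k:.4f}")     # equal
```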
To maximize the adversary's ambiguity subject to a specific delivery load constraint, we now add the following constraint to the model in (5). That is to say, we maximize the degree of privacy while keeping the number of bits transmitted in the CDP below some constant value $C$:
$$\boldsymbol{p}_Z^{*} = \underset{\boldsymbol{p}_Z}{\arg\max}\; H(k|Y) = H(Z) - H(Y) \tag{6a}$$
$$\text{s.t.} \quad \sum_{j=0}^{N} p^z_j = 1, \quad 0 \le p^z_j \le 1, \quad 0 \le j \le N, \tag{6b}$$
$$P_A (N - Z) + P_B Z \le C, \tag{6c}$$
where $P_A(N - Z) + P_B Z$ is the load delivered in the CDP and $C$ corresponds to the effective capacity of the communication link. Note that the augmented model in (6) is efficient in the Pareto optimality sense. Lemma 2 establishes an optimal solution for this model. For $C < N/2$, we obtain the optimal solution numerically in Section IV.
Lemma 2. If $C \ge N/2$, then the uniform distribution is one of the optimal solutions for the augmented model in (6).
Proof. See Appendix D.
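For readers who wish to reproduce the behaviour of the augmented model, the following Python sketch solves (6) numerically with scipy.optimize (our own illustration mirroring the Matlab fmincon experiment of Section IV; the parameter values are assumptions, and constraint (6c) is imposed here on the expected delivery load $P_A(N - \mathbb{E}[Z]) + P_B \mathbb{E}[Z]$).

```python
import numpy as np
from scipy.optimize import minimize

# Assumed parameters; choosing C < N/2 makes the load constraint (6c) active.
N, P_A, C = 7, 0.7, 2.5
P_B = 1 - P_A
j = np.arange(N + 1)

def entropy(pmf):
    nz = pmf[pmf > 1e-12]
    return float(-(nz * np.log2(nz)).sum())

def neg_objective(p_z):
    # Maximize H(Z) - H(Y), i.e., minimize H(Y) - H(Z); p_Y follows (17).
    p_y = P_A * p_z[::-1] + P_B * p_z
    return entropy(p_y) - entropy(p_z)

constraints = [
    {"type": "eq",   "fun": lambda p: p.sum() - 1.0},                     # (6b)
    {"type": "ineq", "fun": lambda p: C - (P_A * (N - j) + P_B * j) @ p}, # (6c) on E[Y]
]
res = minimize(neg_objective, x0=np.full(N + 1, 1 / (N + 1)),
               bounds=[(0.0, 1.0)] * (N + 1), constraints=constraints,
               method="SLSQP")

print("optimal p_Z:", np.round(res.x, 4))
print("achieved privacy H(Z) - H(Y):", round(-res.fun, 4))
```

For $C \ge N/2$ the solver should return a distribution matching the objective value of the uniform one, in line with Lemma 2; for $C < N/2$ it trades privacy for load, tracing the Pareto curve shown later in Fig. 5(b).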
C. General Case
We now extend our approach to any arbitrary value of $M$. Suppose that the user chooses file $f_k$ with probability $p_k$. We define the random process $\mathcal{Z} = \{Z_k\}$, where $Z_k$ denotes the fraction of file $f_k$ (in bits) cached at the user's end. Due to the limited cache capacity, we have $\sum_{k=1}^{M} Z_k = N$, as depicted in Fig. 3. Consequently, the pmf of the delivery load $Y$ can be expressed as follows:
$$P_Y = \Pr[Y = j] = \sum_{k=1}^{M} p_k \times \Pr[Z_k = N - j]. \tag{7}$$
Fig. 4. Example of a file set $\mathcal{F}$ with $M = 12$ split into three subsets ($\{f_1, \ldots, f_5\}$, $\{f_6, \ldots, f_8\}$, and $\{f_9, \ldots, f_{12}\}$) such that the sum of popularities in each subset equals $\frac{1}{3}$, i.e., $\sum_{k=1}^{5} p_k \approx \frac{1}{3}$, $\sum_{k=6}^{8} p_k \approx \frac{1}{3}$, and $\sum_{k=9}^{12} p_k \approx \frac{1}{3}$.
It should be noted that the r.v.s $Z_k$ are not independent, and their distribution is given by the following matrix:
$$\boldsymbol{P}_Z = \begin{bmatrix} p^z_{10} & p^z_{11} & \ldots & p^z_{1N} \\ p^z_{20} & p^z_{21} & \ldots & p^z_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ p^z_{M0} & p^z_{M1} & \ldots & p^z_{MN} \end{bmatrix}, \tag{8}$$
where element $p^z_{kj} \triangleq \Pr[Z_k = j]$. As a result, designing the CCP strategy corresponds to computing the matrix $\boldsymbol{P}_Z$. Hence, the optimal privacy-load trade-off can be formally characterized as the following optimization model:
$$\boldsymbol{P}_Z^{*} = \underset{\boldsymbol{P}_Z}{\arg\max}\; H(Z_1, \ldots, Z_M) - H(Y) \tag{9a}$$
$$\text{s.t.} \quad \sum_{k=1}^{M} p^z_{kj} = 1, \quad 0 \le p^z_{kj} \le 1, \quad 0 \le j \le N, \tag{9b}$$
$$\sum_{k=1}^{M} p_k (N - Z_k) \le C, \tag{9c}$$
$$\sum_{k=1}^{M} Z_k = N. \tag{9d}$$
Note that $H(k)$ is independent of the optimization variable and can be removed from the cost function in (9a). Constraint (9c) ensures that the communication cost imposed on the network remains below a threshold value $C$. Thus, by controlling $C$ in (9c) and then attaining the corresponding privacy degree by solving the optimization problem, the trade-off between traffic load and privacy degree can be managed. Evidently, (9) reduces to (5) when $M = 2$.
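To clarify how a candidate caching matrix $\boldsymbol{P}_Z$ from (8) is evaluated inside (9), the following Python sketch (illustrative only; the popularity vector and the caching matrix are assumed values) computes the delivery-load pmf from (7), the expected load bounded by $C$ in (9c), and the term $H(Y)$ subtracted in the objective (9a). Note that (9d) couples the $Z_k$ jointly; the sketch checks it only in expectation.

```python
import numpy as np

def entropy(pmf):
    nz = pmf[pmf > 1e-12]
    return float(-(nz * np.log2(nz)).sum())

# Assumed toy instance: M = 3 files and an N = 4-bit cache.
M, N = 3, 4
p = np.array([0.5, 0.3, 0.2])        # file popularities p_k
# Candidate caching matrix P_Z: row k is the pmf of Z_k over {0, ..., N}.
P_Z = np.array([[0.10, 0.20, 0.40, 0.20, 0.10],
                [0.20, 0.45, 0.30, 0.05, 0.00],
                [0.40, 0.40, 0.20, 0.00, 0.00]])
assert np.allclose(P_Z.sum(axis=1), 1.0)

# Delivery-load pmf from (7): Pr[Y = j] = sum_k p_k * Pr[Z_k = N - j].
P_Y = np.array([(p * P_Z[:, N - j]).sum() for j in range(N + 1)])

j = np.arange(N + 1)
expected_load = float(sum(p[k] * ((N - j) @ P_Z[k]) for k in range(M)))   # lhs of (9c)
expected_cache = float(sum(j @ P_Z[k] for k in range(M)))                 # (9d), in expectation

print("P_Y           :", np.round(P_Y, 4))
print("H(Y) in (9a)  :", round(entropy(P_Y), 4))
print("E[load] (9c)  :", round(expected_load, 4))
print("E[cache] (9d) :", round(expected_cache, 4), "(target N =", N, ")")
```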
D. Sub-Optimal Heuristic for Large $M$ and $N$

When the number of files ($M$) or the size of each file ($N$) is large, the optimization problem in (9) becomes computationally expensive to solve. For large file sizes, we can split the files into smaller portions called chunks, instead of splitting them at the bit level. As for a large number of files, in what follows, we present a heuristic that makes solving the problem feasible. Although this method yields a sub-optimal solution, it trades optimality for a reduction in computational complexity. The three steps of this library splitting heuristic are as follows (a sketch of Step 1 is given after the list):
Step 1: Split the library $\mathcal{F}$ into $q$ equi-probable subsets, such that the aggregate popularity of each subset approximately equals $1/q$. Fig. 4 illustrates an example for $q = 3$.
Step 2: Split the cache of each user into $q$ memory slots such that each slot contains approximately $N/q$ bits.
Step 3: Define a sub-problem as the optimal caching of a subset into its corresponding memory slot. Subsequently, solve the resulting optimization sub-problem once for each of the $q$ subsets.
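The sketch below illustrates Step 1 with a simple greedy rule (our own construction; the letter does not prescribe a particular splitting algorithm, and the Zipf-like popularity profile is an assumed example): files are assigned, in decreasing order of popularity, to the subset whose aggregate popularity is currently smallest, so that each of the $q$ subsets ends up with roughly $1/q$ of the total popularity.

```python
import numpy as np

def split_library(popularities, q):
    """Greedily split file indices into q subsets of near-equal total popularity."""
    subsets = [[] for _ in range(q)]
    mass = np.zeros(q)
    # Assign the most popular remaining file to the currently lightest subset.
    for k in np.argsort(popularities)[::-1]:
        s = int(mass.argmin())
        subsets[s].append(int(k))
        mass[s] += popularities[k]
    return subsets, mass

# Assumed example: M = 12 files with a Zipf-like popularity profile, q = 3.
M, q = 12, 3
pop = 1.0 / np.arange(1, M + 1)
pop /= pop.sum()

subsets, mass = split_library(pop, q)
for s, (files, m) in enumerate(zip(subsets, mass)):
    print(f"subset {s}: files {sorted(files)}, total popularity {m:.3f}")  # each ~ 1/3
```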
Fig. 5. (a) CDF of the solution of (5): simulation results versus the theoretical formula. (b) Pareto-optimal curve of the delivery load versus the privacy degree $H(k|Y)$ for (5)-(6c). (c) Adversary estimation error probability $P_e$ versus $P_A$ for $M = 2$: proposed approach versus an adversary without knowledge of the delivery load. (d) The difference $\psi$ between the cost function of the optimal and sub-optimal problems versus $q$.
IV. SIMULATION RESULTS
The proposed CCP model is implemented in Matlab, and a 95% confidence level is adopted to demonstrate the accuracy of the Monte Carlo simulation results. The setup comprises a server with library $\mathcal{F} = \{A, B\}$ and a user cache of size $N = 7$ bits. We generate 20000 samples (0 for requesting file $A$ and 1 for requesting file $B$) with $P_A = 0.7$ and $P_B = 0.3$.
To validate Proposition 1, we use the fmincon function in Matlab to compute an optimal solution of (5) in Fig. 5(a). The plot shows the cumulative distribution function (CDF) of the resulting distribution, which coincides exactly with that of the uniform distribution.
Fig. 5(b) plots the delivery load constraint ($C$) against the degree of privacy ($H(k|Y)$). As evident in the figure, $H(k|Y) = 0$ when the delivery load constraint is $P_B N$, whereas we achieve maximum privacy ($H(k|Y) = 1$ bit) when more than $N/2$ bits are transferred in the CDP.
We compare the adversary error probability ($P_e$) with respect to $P_A$ in Fig. 5(c). This figure depicts the significant increase in the lower bound of $P_e$ (as compared to Fig. 2), which thus approaches the upper bound of $1 - P_A$. It is worth noting that when $P_A = 0.5$ and $P_A = 1$, the error probability of our approach converges to that of an adversary with no knowledge of the delivery load. As the figure clearly shows, the maximum gap occurs roughly at $P_A = 0.7$, which implies that our proposed approach achieves a relatively lower degree of privacy there as compared to the reference points $P_A = 0.5$ and $P_A = 1$.
Finally, Fig. 5(d) plots the difference between the optimal solution of (9) and the sub-optimal solution discussed in Section III-D for a library $\mathcal{F}$ with $M = 30$ files. We denote this difference by $\psi$ and scale it between the optimal ($\psi = 0$) and non-optimal ($\psi = 1$) solutions. When $q = 1$, the optimal solution is obtained from (9); if, for example, $q = 2$, we divide the library into two subsets and solve the optimization problem for the two subsets independently. The figure shows that we can reduce the time complexity of our problem by a factor of 10 at the expense of obtaining a sub-optimal solution with a 20% difference from the optimal solution.
V. CONCLUSION AND FUTURE WORK
In this paper, we have proposed a caching strategy that maximizes the adversary's best estimation error while minimizing the average delivery load, which is critical in terms of energy consumption and the limited bandwidth of wireless links. In the presented approach, we have formulated an $\epsilon$-constraint optimization model to alter the statistical behavior of the server so as to mislead the adversary. Furthermore, we have maximized the Fano lower bound on the best adversary estimation using information theory to reduce the adversary's access to useful contextual information. Simulation results also validate the effectiveness of our approach. This work can be further extended to investigate the same trade-off with multiple users, where the adversary must estimate both the transmitted file and the requesting user.
APPENDIX
A. Proof of Theorem 1
For the proof of the upper bound, we refer the reader to [13]. For the lower bound, we define the error event, given estimator $\hat{k}$, as:
$$E = \begin{cases} 1 & \text{if } \hat{k} \neq k,\\ 0 & \text{if } \hat{k} = k. \end{cases} \tag{10}$$
$H(E, k \mid \hat{k})$ can be expanded as follows:
$$H(E, k \mid \hat{k}) = H(k \mid \hat{k}) + H(E \mid k, \hat{k}) = H(E \mid \hat{k}) + H(k \mid E, \hat{k}).$$
If the selected file index ($k$) and the estimated file index ($\hat{k}$) are known to the adversary, then he/she can determine the error without ambiguity, i.e., $H(E \mid k, \hat{k}) = 0$. Thus, we have:
$$H(k \mid \hat{k}) = H(E \mid \hat{k}) + H(k \mid E, \hat{k}) \overset{(a)}{\le} H(E) + H(k \mid E, \hat{k}) \overset{(b)}{=} H(P_e) + H(k \mid E, \hat{k}),$$
$$H(k \mid Y) \overset{(c)}{\le} H(k \mid \hat{k}) \le H(P_e) + H(k \mid E, \hat{k}). \tag{11}$$
Conditioning reduces entropy, which gives (a). The identity (b) stems from the fact that $E$ is a binary r.v. For inequality (c), according to the Markov chain property, we have $H(k|Y) \le H(k|\hat{k})$. Therefore, we arrive at inequality (11). We now simplify (11) for caching two and more files as below:
1) For $M = 2$: In this case, given the estimate $\hat{k}$ and the error event $E$, the requested file index is fully determined, i.e., $H(k \mid E, \hat{k}) = 0$. Using this fact and (11), we arrive at:
$$H(k|Y) \le H(P_e) \;\Longrightarrow\; H^{-1}(H(k|Y)) \le P_e, \tag{12}$$
where $H(P_e) = -P_e \log_2(P_e) - (1 - P_e)\log_2(1 - P_e)$ and $H^{-1}(\cdot)$ is the inverse of $H$.
2) For $M > 2$: We can write (11) as:
$$H(k|Y) \le H(P_e) + H(k \mid E, \hat{k}) \overset{(a)}{\le} 1 + H(k \mid E, \hat{k}) \overset{(b)}{\le} 1 + P_e \log_2(M - 1). \tag{13}$$
In (13), inequality (a) follows from the fact that $H(P_e) \le 1$. Inequality (b) is due to [14, Theorem 2.10.1], since:
$$H(k \mid E, \hat{k}) = \underbrace{\Pr\{E = 0\}\, H(k \mid E = 0, \hat{k})}_{\text{equal to zero}} + \underbrace{\Pr\{E = 1\}}_{P_e}\, H(k \mid E = 1, \hat{k}) \le P_e \log_2(M - 1).$$
Rearranging the terms results in $\frac{H(k|Y) - 1}{\log_2(M - 1)} \le P_e$. This completes the proof.
B. Proof of Lemma 1
$H(k|Y)$ can be written as follows [14, §2.2]:
$$H(k|Y) = H(k, Y) - H(Y). \tag{14}$$
To obtain the entropy $H(k|Y)$, we calculate the joint entropy $H(k, Y)$ and $H(Y)$ separately. If file $A$ is requested and $N - j$ bits of file $A$ are stored in the cache, then the server should transmit the remaining $j$ bits in the CDP. To compute $H(k, Y)$, we need the joint distribution $\Pr[k = i, Y = j]$ for $i \in \{0, 1\}$ and $j \in \{0, 1, 2, \ldots, N\}$, as given below:
$$\Pr[k = i, Y = j] = \Pr[Y = j \mid k = i] \times \Pr[k = i],$$
$$\Pr[Y = j \mid k = 0] = \Pr[Z = N - j],$$
$$\Pr[Y = j \mid k = 1] = \Pr[Z = j]. \tag{15}$$
After some mathematical manipulations, we get:
$$H(k, Y) = -P_A \log_2 P_A - P_B \log_2 P_B - P_A \sum_{j=0}^{N} \Pr[Z = N - j] \log_2 \Pr[Z = N - j] - P_B \sum_{j=0}^{N} \Pr[Z = j] \log_2 \Pr[Z = j] = H(k) + H(Z). \tag{16}$$
Substituting (16) in (14) eventually yields (4), which completes the proof.
C. Proof of Proposition 1
We first suggest the uniform distribution as a candidate solution and then prove that this solution maximizes the cost function in (5a) and satisfies (5b). For observation $Y$, we have:
$$\Pr[Y = j] = P_A \Pr[Z = N - j] + P_B \Pr[Z = j]. \tag{17}$$
Since conditioning always reduces entropy [14], we have:
$$H(k|Y) \le H(k) \;\overset{\text{using Lemma 1}}{\Longrightarrow}\; H(Z) \le H(Y). \tag{18}$$
Now, we prove that for the uniform distribution, $H(Z) - H(Y)$ becomes zero and $H(k|Y)$ attains its maximum value. Furthermore, $Z$ achieves its maximum entropy ($H(Z) = \log_2(N+1)$). Using the definition of entropy and (17), we obtain the following:
$$H(Y) = -\sum_{j=0}^{N} \frac{P_A + P_B}{N+1} \log_2 \frac{P_A + P_B}{N+1} = \log_2(N+1). \tag{19}$$
According to (19), we have:
$$H(Z) - H(Y) = \log_2(N+1) - \log_2(N+1) = 0. \tag{20}$$
The uniform distribution is thus one of the optimal solutions of (5), though not the unique solution. This completes the proof.
D. Proof of Lemma 2
Suppose that the r.v. $Z$ follows a uniform distribution. Hence,
$$\mathbb{E}[Z] = \sum_{j=0}^{N} j\, p^z_j = \sum_{j=0}^{N} \frac{j}{N+1} = \frac{N}{2}. \tag{21}$$
According to (21) and constraint (6c), we have:
$$\mathbb{E}[Z] = \frac{N}{2} \ge \frac{P_A N - C}{P_A - P_B} \;\Longleftrightarrow\; C \ge \frac{N}{2}. \tag{22}$$
We already know from Proposition 1 that the uniform distribution is an optimal (though not unique) solution of (5), i.e., the model without any delivery load constraint. By (22), it also satisfies the delivery load constraint whenever $C \ge N/2$, and therefore remains optimal for (6). This completes the proof.
REFERENCES
[1] L. Li, G. Zhao, and R. S. Blum, "A survey of caching techniques in cellular networks: Research issues and challenges in content placement and delivery strategies," IEEE Commun. Surv. Tutor., vol. 20, no. 3, pp. 1710–1732, 2018.
[2] S. Wang, T. Wang, and X. Cao, "In-network caching: An efficient content distribution strategy for mobile networks," IEEE Wireless Commun., vol. 26, no. 5, pp. 84–90, 2019.
[3] S. B. Hassanpour, A. Khonsari, S. P. Shariatpanahi, and A. Dadlani, "Hybrid coded caching in cellular networks with D2D-enabled mobile users," in Proc. IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2019, pp. 1–6.
[4] J. Zhang, B. Chen, Y. Zhao, X. Cheng, and F. Hu, "Data security and privacy-preserving in edge computing paradigm: Survey and open issues," IEEE Access, vol. 6, pp. 18209–18237, 2018.
[5] A. A. Zewail and A. Yener, "Device-to-device secure coded caching," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 1513–1524, 2020.
[6] M. Mukherjee, R. Matam, L. Shu, L. Maglaras, M. A. Ferrag, N. Choudhury, and V. Kumar, "Security and privacy in fog computing: Challenges," IEEE Access, vol. 5, pp. 19293–19304, 2017.
[7] A. Diyanat, A. Khonsari, and S. P. Shariatpanahi, "A dummy-based approach for preserving source rate privacy," IEEE Trans. Inf. Forensics Security, vol. 11, no. 6, pp. 1321–1332, 2016.
[8] Y. Wang, Z. Tian, S. Su, Y. Sun, and C. Zhu, "Preserving location privacy in mobile edge computing," in Proc. IEEE International Conference on Communications (ICC), 2019, pp. 1–6.
[9] Z. Chen, N. Pappas, and M. Kountouris, "Probabilistic caching in wireless D2D networks: Cache hit optimal versus throughput optimal," IEEE Commun. Lett., vol. 21, no. 3, pp. 584–587, 2017.
[10] G. Zheng, H. A. Suraweera, and I. Krikidis, "Optimization of hybrid cache placement for collaborative relaying," IEEE Commun. Lett., vol. 21, no. 2, pp. 442–445, 2017.
[11] D. Andreoletti, C. Rottondi, S. Giordano, G. Verticale, and M. Tornatore, "An open privacy-preserving and scalable protocol for a network-neutrality compliant caching," in Proc. IEEE International Conference on Communications (ICC), 2019, pp. 1–6.
[12] F. Shi, L. Fan, X. Liu, Z. Na, and Y. Liu, "Probabilistic caching placement in the presence of multiple eavesdroppers," Wireless Communications and Mobile Computing, vol. 2018, 2018.
[13] M. Feder and N. Merhav, "Relations between entropy and error probability," IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 259–266, 1994.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory. Hoboken, NJ, USA: Wiley, 2012.