Minimizing data redundancy for high reliable cloud storage systems

Cloud storage systems provide reliable service to users by widely deploying redundancy schemes, which bring high reliability to data storage but in turn introduce significant overhead in storage cost and energy consumption. The core of this issue is how to exploit the relationship between data redundancy and data reliability. Optimizing both at the same time is difficult, so the common approach is to fix one as a constraint and optimize the other. In this paper we pursue a storage allocation scheme that minimizes data redundancy while achieving a given (high) data reliability. For this purpose, we provide a novel model based on generating functions. With this model, we propose a practical and efficient storage allocation scheme, which is proved to minimize the data redundancy. We analytically demonstrate that the suggested solution brings several advantages, in particular a reduction of the search space and an acceleration of the computation. We also assess the savings in data redundancy experimentally using availability traces collected from the real world; encouragingly, the reduction of data redundancy achieved by our solution can exceed 30% compared to the heuristic method recently proposed in the research community.
Zhen Huang (a, 1), Jinbang Chen (b, *, 1), Yisong Lin (c), Pengfei You (a), Yuxing Peng (a)
(a) National University of Defense Technology, China
(b) East China Normal University, China
(c) Institute of GLD, China
article info
Article history:
Received 8 October 2014
Received in revised form 30 January 2015
Accepted 17 February 2015
Available online 24 February 2015
Keywords:
Cloud storage system
Redundancy
Reliability
Generating function
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
Cloud computing, with its promise to provide reliable service to users in an efficient and inexpensive manner, has attracted significant interest from both industry and academia [1]. Unlike supercomputing systems, cloud systems commonly use inexpensive commodity hardware for reasons of scalability [2]. Reliability is therefore particularly important in such systems. To ensure data reliability, a redundancy scheme is the basic solution and has been extensively deployed [3]. The intuitive idea is to store copies of data objects over a set of network nodes so that data can be recovered in case of loss or failure. One of the key issues is therefore how to allocate redundant data over a given set of network nodes, conventionally referred to as storage allocation, with the aim of achieving maximum reliability with minimum redundancy.
In general, two important factors need to be taken into account when discussing the storage allocation problem: data redundancy ("RY" for short) and data reliability ("DR" for short). Data redundancy RY answers the question of how much redundant data should be deployed in the system. Data reliability DR is the probability that data is safely stored in the system for a certain duration. On the one hand, increasing the data redundancy improves the data reliability and at the same time spreads the I/O overhead across nodes. On the other hand, and more importantly, it increases the storage overhead and the energy consumption; recall that power has become an important concern in system design. According to reports from IDC (International Data Corporation) [4,5], there is currently an evident gap between the demand created by data growth and the storage capacity of hardware devices, and this gap will grow rapidly over the coming decade. Furthermore, storage systems consume around 37–40% of the energy of all IT components [6]. Minimizing data redundancy can therefore significantly reduce the per-byte cost of a storage system. In this sense, minimizing RY is crucial to a cloud storage system.

Computer Networks 81 (2015) 164–177. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/comnet
http://dx.doi.org/10.1016/j.comnet.2015.02.013
1389-1286/© 2015 Elsevier B.V. All rights reserved.
* Corresponding author at: Room 604, Information Building, 500 Dongchuan Road, 200241 Shanghai, China. Tel.: +86 (0)21 54 34 51 88.
E-mail addresses: maths_www@163.com (Z. Huang), jbchen@cs.ecnu.edu.cn (J. Chen).
1 This study was carried out with National Laboratory of Parallel and Distributed Processing – Department of Computer – National University of Defense Technology – China, and Shanghai Key Laboratory of Multidimensional Information Processing – Department of Computer Science and Technology – East China Normal University – China.
Clearly, low data redundancy and high data reliability are in some sense contradictory, so finding a solution with both minimum data redundancy and maximum data reliability is extremely hard; there is a tradeoff between them. The usual way to cope with this is to fix one and then optimize the other, which requires understanding how the two are related. However, it is non-trivial to formulate the relations among all these factors and to derive the related properties. Storage allocation thus remains an open and challenging issue in the research community (footnote 2). In fact, optimizing data redundancy makes more sense in real cloud systems because (i) besides the redundancy scheme, such systems usually adopt many other mechanisms to guarantee data reliability; and (ii) minimizing data redundancy significantly decreases the cost of distributed systems, as discussed above. For these reasons, we aim in this paper to pursue a storage allocation scheme that minimizes data redundancy while achieving a given (high) data reliability. This objective is in line with the work of Pamies-Juarez et al. [7,8]. In their method, a Monte Carlo approximation is used to measure the data reliability, and a heuristic optimization algorithm is then applied to find the best assignment. In heterogeneous settings, their results show that data redundancy can be reduced by up to 70% compared to traditional equal-allocation schemes. Although appealing, their method is descriptive (it relies on a Monte Carlo approximation) and cannot be used for generic cases.

Our contributions are as follows. We present a novel model, built on a variant of the generating function [9], for storage allocation in cloud storage systems. The advantage of the new model is that it minimizes redundancy for a given (high) reliability. We demonstrate that the suggested method outperforms existing solutions, in particular the heuristic method recently proposed in the research community. The proposed solution is practical: with this model, we can quantitatively analyze the properties of the optimal storage allocation, which allows us to reduce the search space and accelerate the computation. Finally, we not only evaluate the reduction of the search space analytically, but also assess the savings in data redundancy experimentally using data sets collected from the real world.
The remainder of this paper is organized as follows. Section 2 illustrates the redundancy scheme and reviews previous work on storage allocation. The efficient solution we propose for storage allocation is then given in Section 3. In Section 4, we evaluate our method through analysis and through experiments using availability traces collected from the real world. We conclude the paper with a summary and future work in Section 5.
2. Background and related work
2.1. Redundancy scheme
In a cloud storage system, using a redundancy scheme to achieve data reliability is straightforward: to guard against failures, the system distributes copies of data objects to a set of storage nodes. Redundancy schemes have been widely studied in the research community [10–12]. Generally speaking, redundancy schemes can be classified into two types: replication and erasure coding. Replication is simple and intuitive: it replicates each data block into $n$ copies and distributes them to different network nodes. By contrast, an erasure code (e.g., Reed–Solomon [13]) encodes $k$ data blocks into $(n-k)$ coded blocks, resulting in $n$ blocks in total. These $n$ blocks are then distributed to different nodes [14,15].

Conventionally, the data redundancy of an erasure code is defined as $RY = n/k$. When $k = 1$, the erasure code reduces to replication, so replication can be regarded as a special case of erasure coding. When $k > 1$, an erasure code consumes less storage space than replication [16], meaning that it can improve reliability for the same storage consumption. Owing to this flexibility, erasure coding is a promising solution for next-generation cloud storage systems and has been widely discussed recently [17–19]. Some practical cloud storage systems, such as GFS [2] and HDFS [20], have started to deploy erasure codes in part. Other cloud backup systems, including Wuala [21] and CleverSafe [22], have fully deployed erasure codes. Note that we mainly focus on erasure codes in this paper when discussing cloud storage allocation; however, our method can easily be extended to replication.
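As a rough illustration of the claim that an erasure code can offer higher reliability than replication at the same storage cost, the following Python sketch compares the two under simplifying assumptions that are not part of the paper: one block per node, independent nodes, and a common node reliability `r`. The function names and the parameter choices (3-way replication versus an illustrative code with n = 6, k = 2) are hypothetical.

```python
from math import comb


def redundancy(n: int, k: int) -> float:
    """Data redundancy RY = n / k for n stored blocks, any k of which suffice."""
    return n / k


def reliability_iid(n: int, k: int, r: float) -> float:
    """Probability that at least k of n blocks survive.

    Assumption (for illustration only): one block per node, and every node is
    independently reliable with the same probability r.
    """
    return sum(comb(n, p) * r**p * (1 - r) ** (n - p) for p in range(k, n + 1))


r = 0.9                                  # assumed node reliability
rep_dr = reliability_iid(3, 1, r)        # replication: n = 3 copies, k = 1
ec_dr = reliability_iid(6, 2, r)         # erasure code with the same RY = 3
print(redundancy(3, 1), redundancy(6, 2), rep_dr, ec_dr)
```

Under these assumptions both layouts have RY = 3, yet the erasure-coded layout survives up to four node failures whereas replication survives at most two.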
2.2. Storage allocation problem
Storage allocation problems arise from the emergence of coding schemes that store more than one redundant block in the same storage node, such as Regenerating Codes [23] and ER-Hierarchical Codes [24], which reduce the repair cost for lost data. In this case, computing the data reliability can be very complex and hard to analyze. The storage allocation problem aims to obtain the optimal allocation scheme for a set of nodes under storage and reliability constraints. Here, "optimal" can mean achieving either maximum reliability or minimum redundancy.

2 See http://storagewiki.ece.utexas.edu/doku.php?id=wiki:open_problems.

Many works [7,8,25–27] have been devoted to the cloud storage allocation issue. Leong et al. [25–27] identified many storage allocation problems and addressed some of them in theory. They mainly focus on finding the maximum data reliability for a fixed redundancy. However, minimizing data redundancy appears to be more meaningful in cloud storage systems, as discussed in Section 1. Pamies-Juarez et al. [7,8] focused on finding the minimum data redundancy for a given data reliability. Their objective is similar to ours, but they use a heuristic method, and the complexity of their model hinders the analysis of allocation properties. In addition, unlike many allocation schemes, which usually fix $k$ first and then tune $n$ for optimization, their method is designed to find the maximum $k$ for a fixed $n$, which limits its applicability. We incorporate both cases in our scheme and further prove that our scheme is optimal.
Note that the reliability of the nodes strongly affects the data reliability of the whole system. Although some systems are assumed to be homogeneous, i.e. all nodes have the same or similar features, real systems are in most cases heterogeneous and difficult to characterize. Therefore, different combinations of nodes, even with the same number of nodes, may result in different data reliability for the system. Moreover, with advances in network coding, more than one redundant block can be stored in one node, which greatly increases the variety of node combinations and thus the complexity of the analysis. For all these reasons, characterizing and analyzing the relations that determine data reliability is a big challenge. In this paper, we first propose a novel model based on generating functions, which allows us to analyze the storage properties efficiently, and we then simplify the problem by proving several theorems.
2.3. The generating function for storage allocation
The core of the storage allocation issue lies in exploiting the relationship between minimizing data redundancy and maximizing data reliability. In general, there are two kinds of methods, targeting either the optimization of data redundancy or the optimization of data reliability. For either objective, the common difficulty is how to represent the data reliability and how to analyze it [7,8,27]. In this paper we use a variant of the generating function for storage allocation. It has the following properties: (i) It is simple to represent, since the generating function is obtained by multiplying polynomials and its expansion is sufficient for the calculation. (ii) Related properties are easy to derive by factorization and analysis, thanks to its simple form. (iii) It is convenient when the coefficient of each term in the expanded form is exactly the quantity we need. (iv) As the number of polynomials increases, computing their product becomes more complex. To address the last point, we provide a method in Section 4.1 that simplifies the computation using Theorems 4 and 5. We now use a "calibration weights" example to explain the concept of the generating function and its application.
The weight problem: There are three kinds of weights, 1 g, 2 g and 5 g, with an unlimited supply of each kind. The question is how many combinations of weights have a total weight of exactly 10 g.

To solve this problem, we have the following two methods:

Method 1: enumeration by class. We first list all possible classes of combinations according to which kinds of weights are used: 1 g only; 2 g only; 5 g only; 1 g and 2 g; 1 g and 5 g; 2 g and 5 g; and 1 g, 2 g and 5 g. There are seven classes in total. Then, for each class, we enumerate all the possibilities. For example, using weights of both 1 g and 2 g, we obtain four solutions: (a) $10 = 1+1+1+1+1+1+1+1+2$, (b) $10 = 1+1+1+1+1+1+2+2$, (c) $10 = 1+1+1+1+2+2+2$, (d) $10 = 1+1+2+2+2+2$. By repeating this step for every class, we finally get the total number of solutions, that is, $1+1+1+4+1+0+2 = 10$.
Method 2: the generating function. With weights of 1 g, the totals that can be represented are 1 g, 2 g, ..., 10 g. So we introduce the factor $(1 + x^1 + x^2 + \dots + x^{10})$; similarly, we introduce the factor $(1 + x^2 + x^4 + \dots + x^{10})$ for the weight of 2 g, and the factor $(1 + x^5 + x^{10})$ for the weight of 5 g. By multiplying all the factors, we obtain the generating function $f(x)$:

$$
\begin{aligned}
f(x) &= (1 + x^1 + x^2 + \dots + x^{10})(1 + x^2 + x^4 + \dots + x^{10})(1 + x^5 + x^{10}) \\
&= 1 + x + 2x^2 + 2x^3 + 3x^4 + 4x^5 + 5x^6 + 6x^7 + 7x^8 + 8x^9 + 10x^{10} \\
&\quad + 10x^{11} + 11x^{12} + 11x^{13} + 12x^{14} + 12x^{15} + 12x^{16} + 11x^{17} + 11x^{18} + 10x^{19} + 10x^{20} \\
&\quad + 8x^{21} + 7x^{22} + 6x^{23} + 5x^{24} + 4x^{25} + 3x^{26} + 2x^{27} + 2x^{28} + x^{29} + x^{30}
\end{aligned}
\tag{1}
$$
From the expansion of the generating function $f(x)$ given in Eq. (1), the coefficient of $x^{10}$ is the number of solutions we are looking for, i.e. there are 10 combinations whose total weight is exactly 10 g. The generating function works because, when the factors are multiplied, $x^i$ from $(1 + x^1 + x^2 + \dots + x^{10})$, $x^j$ from $(1 + x^2 + x^4 + \dots + x^{10})$ and $x^k$ from $(1 + x^5 + x^{10})$ produce the term $x^{i+j+k}$ in the final expansion. Since $i+j+k$ is the total weight represented by that combination of 1 g, 2 g and 5 g weights, the coefficient of $x^{i+j+k}$ is the total number of solutions for that weight. As we will present in Section 3.1, we extend the coefficient of each term in the polynomial from integers to decimals, introducing a variant of the generating function which is then used to model the data reliability.
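The weight example can be reproduced with a few lines of Python. This is a minimal sketch rather than code from the paper: polynomials are represented as coefficient lists (index = exponent), and the helper names `poly_mul`, `weight_factor` and `expand` are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = exponent)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def weight_factor(weight, target):
    """Coefficients of 1 + x^w + x^(2w) + ..., truncated at x^target."""
    return [1 if e % weight == 0 else 0 for e in range(target + 1)]


def expand(weights, target):
    """Product of the per-weight factors, as in Eq. (1)."""
    poly = [1]
    for w in weights:
        poly = poly_mul(poly, weight_factor(w, target))
    return poly


f = expand([1, 2, 5], 10)
print(f[10])  # coefficient of x^10 = number of ways to reach exactly 10 g
```

Reading off `f[10]` corresponds to picking the coefficient of $x^{10}$ in Eq. (1); the other entries of `f` are the remaining coefficients of the expansion.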
3. Storage allocation scheme
A storage allocation scheme specifies how redundant data blocks are distributed and stored over a set of network nodes. To store one file with a set of $n$ redundant blocks, we allocate these blocks to a given set of nodes $S = \{N_1, \dots, N_{|S|}\}$, in which the total number of nodes is $|S|$. For the sake of discussion, the redundancy scheme is denoted as $RS(n,k)$, where $n$ is the total number of redundant blocks and $k$ indicates that any subset of $k$ redundant blocks is sufficient to recover the original file. Node $N_i$, where $i \in \{1, \dots, |S|\}$, has reliability $r_i$ and stores $l_i$ blocks; here $r_i$ is the probability that all blocks hosted by $N_i$ are usable for data recovery, and with probability $1 - r_i$ none of them is. Then we have $n = \sum_{i=1}^{|S|} l_i$, and the storage allocation scheme is mapped to a set of partitions $SA = \{l_1, \dots, l_{|S|}\}$. We now illustrate the problem with a simple example. As depicted in Fig. 1(a), the original file with $k$ original blocks is encoded into five redundant blocks in total (in this case, $n = 5$), and the original file can be reconstructed from any subset of $k$ redundant blocks. How should we allocate these five redundant blocks to a given set of three storage nodes whose reliabilities are $r_1$, $r_2$ and $r_3$ respectively? Two possible storage allocation schemes for this example are given in Fig. 1(b). The question is then: which one is better, and why?
3.1. Modeling the data reliability
Before deriving the model based on the generating function, we first take the scheme $SA_1$ in Fig. 1(b) as an example. There are three nodes $N_1$, $N_2$ and $N_3$, with reliabilities $r_1$, $r_2$ and $r_3$ respectively. Nodes $N_1$ and $N_2$ each host two redundant blocks, whereas $N_3$ hosts only one. We define $dr(p)$ as the probability that $p$ blocks are reliable and the other $5-p$ blocks are unreliable, where $p \in \{0, \dots, 5\}$. The values of $dr(p)$ for $SA_1$ are as follows:

$dr(5) = r_1 r_2 r_3$, where all three nodes $N_1$, $N_2$ and $N_3$ are reliable;

$dr(4) = r_1 r_2 (1-r_3)$, where nodes $N_1$ and $N_2$ are both reliable and node $N_3$ is unreliable;

$dr(3) = r_1 (1-r_2) r_3 + (1-r_1) r_2 r_3$, where node $N_3$ is reliable and one of the two nodes in $\{N_1, N_2\}$ is unreliable;

$dr(2) = r_1 (1-r_2)(1-r_3) + (1-r_1) r_2 (1-r_3)$, where node $N_3$ is unreliable and one of the two nodes in $\{N_1, N_2\}$ is reliable;

$dr(1) = (1-r_1)(1-r_2) r_3$, where node $N_3$ is reliable and the other two nodes $N_1$ and $N_2$ are unreliable;

$dr(0) = (1-r_1)(1-r_2)(1-r_3)$, where all three nodes $N_1$, $N_2$ and $N_3$ are unreliable.

Then, the data reliability would be $dr(5) + dr(4) + dr(3)$ if $k = 3$, since at least $k$ blocks are needed for successful data recovery. This example shows that, to calculate the data reliability, we have to consider all possible combinations of blocks over nodes and calculate the corresponding $dr(p)$ one by one. The computational cost is therefore high, especially when the number of blocks is large [7,8]. To overcome this issue, we propose a generating function to formulate the data reliability.
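The enumeration described above can be sketched in Python as follows. This is only an illustration: the reliabilities `(0.9, 0.8, 0.7)` are assumed values (the numbers used in Fig. 1 are not given in the text), and the function names are hypothetical. The loop visits all $2^{|S|}$ up/down states of the nodes, which is what makes the direct approach expensive.

```python
from itertools import product


def dr_by_enumeration(blocks, reliabilities):
    """Return [dr(0), ..., dr(n)] by enumerating every up/down state of the nodes.

    blocks[i] is l_i, reliabilities[i] is r_i; nodes are assumed independent.
    """
    n = sum(blocks)
    dr = [0.0] * (n + 1)
    for state in product([True, False], repeat=len(blocks)):
        prob = 1.0
        usable = 0
        for is_up, l, r in zip(state, blocks, reliabilities):
            prob *= r if is_up else (1 - r)
            if is_up:
                usable += l
        dr[usable] += prob
    return dr


def data_reliability(dr, k):
    """Probability that at least k blocks are usable."""
    return sum(dr[k:])


# SA_1 from the example: N1 and N2 host two blocks each, N3 hosts one.
dr = dr_by_enumeration((2, 2, 1), (0.9, 0.8, 0.7))  # assumed reliabilities
print(dr, data_reliability(dr, 3))
```

With these assumed reliabilities the six entries of `dr` correspond term by term to the expressions for $dr(0), \dots, dr(5)$ listed above.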
For generic storage allocation, our generating function model is described as follows. For each node $N_i$, we introduce a factor $GFNR(N_i(b))$, which expresses that node $N_i$ is reliable and provides $l_i$ blocks with probability $r_i$, and is unreliable (provides no blocks) with probability $1 - r_i$. The reliability $GFDR(SA,b)$ is then defined as the generating function given by the product of all factors $GFNR(N_i(b))$ (footnote 3), which forms Eq. (2):

$$
\begin{cases}
GFNR(N_i(b)) = r_i b^{l_i} + (1 - r_i) b^0 \\
GFDR(SA,b) = \prod_{i=1}^{|S|} \left[ r_i b^{l_i} + (1 - r_i) b^0 \right]
\end{cases}
\tag{2}
$$

Fig. 1. An example for storage allocation problem. (a) Steps of Storage Allocation. (b) Two schemes and which one is better?
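Taking $SA_1$ of Fig. 1(b) as an example, we can obtain $GFDR(SA_1,b)$ as follows:

$$
\begin{aligned}
GFDR(SA_1,b) &= \left[ r_1 b^2 + (1-r_1) b^0 \right]\left[ r_2 b^2 + (1-r_2) b^0 \right]\left[ r_3 b + (1-r_3) b^0 \right] \\
&= r_1 r_2 r_3 b^5 + r_1 r_2 (1-r_3) b^4 \\
&\quad + \left[ r_1 r_3 (1-r_2) + r_2 r_3 (1-r_1) \right] b^3 \\
&\quad + \left[ r_1 (1-r_2)(1-r_3) + r_2 (1-r_1)(1-r_3) \right] b^2 \\
&\quad + r_3 (1-r_1)(1-r_2) b^1 + (1-r_1)(1-r_2)(1-r_3) b^0
\end{aligned}
$$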
Note that the expanded form of $GFDR(SA,b)$ is a polynomial in which, for each term, the exponent denotes the number of reliable blocks (the same meaning as $p$ above) and the coefficient is the sum of the probabilities of the cases in which $p$ blocks are reliable and the rest are unreliable. This corresponds to the $dr(p)$ calculated before. From this example, we can see that the coefficients are the sequence we are looking for. To refer to the coefficients of a polynomial, say $f(b)$, we define $[b^p] f(b)$ as the coefficient of the term $b^p$. Then, supposing that the polynomial $GFDR(SA,b)$ is expanded in the form $\sum_{p=0}^{n} com(p)\, b^p$, we have $com(p) = [b^p]\, GFDR(SA,b)$.
For generic erasure codes, data is reliable only if at least $k$ blocks are reliable. Therefore, we can easily obtain the data reliability $DR(SA,k)$ as a function of $k$ by summing the terms with $p \ge k$ in the expanded form of $GFDR(SA,b)$, which forms Eq. (3):

$$
DR(SA,k) = \sum_{p=k}^{n} com(p) = \sum_{p=k}^{n} [b^p]\, GFDR(SA,b)
\tag{3}
$$
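Eqs. (2) and (3) translate directly into a polynomial product followed by a partial sum of coefficients. The sketch below is one possible way to compute them in Python; the function names `gfdr` and `data_reliability` and the reliabilities `(0.9, 0.8, 0.7)` are assumptions made for illustration, not values or code from the paper.

```python
def gfdr(blocks, reliabilities):
    """Coefficient list of GFDR(SA, b): entry p is com(p) = [b^p] GFDR(SA, b).

    Each node contributes the factor r_i * b^(l_i) + (1 - r_i) * b^0 (Eq. (2)).
    """
    poly = [1.0]
    for l, r in zip(blocks, reliabilities):
        nxt = [0.0] * (len(poly) + l)
        for p, c in enumerate(poly):
            nxt[p] += (1 - r) * c      # node unreliable: contributes b^0
            nxt[p + l] += r * c        # node reliable: contributes b^(l_i)
        poly = nxt
    return poly


def data_reliability(blocks, reliabilities, k):
    """DR(SA, k) = sum of com(p) for p >= k (Eq. (3))."""
    return sum(gfdr(blocks, reliabilities)[k:])


coeffs = gfdr((2, 2, 1), (0.9, 0.8, 0.7))          # SA_1 with assumed r_i
print(coeffs, data_reliability((2, 2, 1), (0.9, 0.8, 0.7), 3))
```

Each factor is multiplied in with work proportional to the current polynomial length, so the expansion avoids explicitly listing every node state.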
Before we continue, we list the main symbols used
throughout the whole paper in Table 1.
3.2. Properties of the model
In order to find an optimal storage allocation scheme (OSA) for a given set of network nodes, we first generate a set of partitions $SA = \{l_1, \dots, l_{|S|}\}$ in which $n$ blocks are divided into $|S|$ parts; we then assign the partitions to the storage nodes, and finally find a scheme with minimum redundancy that achieves a given high reliability $ADR$, using a method specifically designed for this purpose. As before, we use $RS(n,k)$ to denote the redundancy scheme for a generic erasure code, where $n$ is the total number of redundant blocks and $k$ is the number of original blocks. Since redundancy is defined as $RY = n/k$, we can obtain its minimum value either by finding the maximum $k$ for a fixed $n$ or by finding the minimum $n$ for a fixed $k$. Before introducing the proposed algorithm, we analyze the properties of the model by proving a lemma followed by several theorems and propositions. How these formal results relate to the process of finding an optimal scheme is explained in Section 3.3.
3.2.1. Upper-bound for allocation
Previous studies [7,8] have demonstrated that the search space in which to find the optimal solution is usually huge, which makes it an important issue to address. There are many ways to allocate $n$ data blocks to a set of nodes $S = \{N_1, \dots, N_{|S|}\}$, in particular when $n$ and $|S|$ are both large, as is commonly observed in real cloud storage systems. It is therefore very useful to bound the number of blocks assigned to each node within a suitably small range. In this section we provide upper bounds for the allocation in Theorems 2 and 3. Before that, we introduce Lemma 1 and Theorem 1. Lemma 1 provides the foundation for the proofs of the theorems, while Theorem 1 specifies the ordering of the allocation.
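Lemma 1. Suppose that $f(b) = \sum_{p=l_j}^{\mathit{up}+l_j} c_{p-l_j} b^p - \sum_{p=l_i}^{\mathit{up}+l_i} c_{p-l_i} b^p = \left( \sum_{p=0}^{\mathit{up}} c_p b^p \right)\left( b^{l_j} - b^{l_i} \right)$, where $\forall s,\ c_s \ge 0$. If $l_j > l_i$, then $\sum_{p=k}^{n} [b^p] f(b) \ge 0$.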
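Proof. Let us analyze all the cases for $l_j$ and $l_i$ one by one:

(1) If $l_j > l_i \ge k$, then we can obtain Eq. (4):

$$
\sum_{p=k}^{n} [b^p] f(b) = \sum_{p=l_j}^{\mathit{up}+l_j} c_{p-l_j} - \sum_{p=l_i}^{\mathit{up}+l_i} c_{p-l_i} = \sum_{p=0}^{\mathit{up}} c_p - \sum_{p=0}^{\mathit{up}} c_p = 0
\tag{4}
$$

(2) If $l_j \ge k > l_i \ge k - \mathit{up}$, then $\mathit{up} + l_i \ge k$; so we can obtain Eq. (5):

$$
\sum_{p=k}^{n} [b^p] f(b) = \sum_{p=l_j}^{\mathit{up}+l_j} c_{p-l_j} - \sum_{p=k}^{\mathit{up}+l_i} c_{p-l_i} = \sum_{p=0}^{\mathit{up}} c_p - \sum_{p=k-l_i}^{\mathit{up}} c_p = \sum_{p=0}^{k-l_i-1} c_p \ge 0
\tag{5}
$$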
Table 1
Meaning of symbols.

| Symbol | Meaning |
| --- | --- |
| $n$ | Number of blocks in total |
| $k$ | Number of sufficient blocks for recovery |
| $RY$ | Data redundancy, $RY = n/k$ |
| $\lvert S \rvert$ | Total number of nodes |
| $DR$ | Data reliability |
| $ADR$ | Data reliability to achieve |
| $N_i$ | Node $N_i$, $i \in \{1, 2, \dots, \lvert S \rvert\}$ |
| $S$ | Set of nodes, $S = \{N_1, \dots, N_{\lvert S \rvert}\}$ |
| $l_i$ | Number of blocks assigned to node $N_i$ |
| $SA$ | A storage allocation, $SA = \{l_1, \dots, l_{\lvert S \rvert}\}$ |
| $r_i$ | The reliability of node $N_i$ |
| $R$ | Set of nodes' reliability, $R = \{r_1, \dots, r_{\lvert S \rvert}\}$ |

3 In our discussion, we suppose that all nodes are independent from each other, meaning that whether a node is reliable or not has nothing to do with others.
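(3) If $l_j \ge k > k - \mathit{up} > l_i$, then $\mathit{up} + l_i < k$; so we can obtain Eq. (6):

$$
\sum_{p=k}^{n} [b^p] f(b) = \sum_{p=l_j}^{\mathit{up}+l_j} c_{p-l_j} \ge 0
\tag{6}
$$

(4) If $k > l_j > l_i \ge k - \mathit{up}$, then $\mathit{up} + l_j > \mathit{up} + l_i \ge k$, so we can obtain Eq. (7):

$$
\sum_{p=k}^{n} [b^p] f(b) = \sum_{p=k}^{\mathit{up}+l_j} c_{p-l_j} - \sum_{p=k}^{\mathit{up}+l_i} c_{p-l_i} = \sum_{p=k-l_j}^{\mathit{up}} c_p - \sum_{p=k-l_i}^{\mathit{up}} c_p = \sum_{p=k-l_j}^{k-l_i-1} c_p \ge 0
\tag{7}
$$

(5) If $k > l_j \ge k - \mathit{up} > l_i$, then $\mathit{up} + l_j \ge k > \mathit{up} + l_i$; so we can obtain Eq. (8):

$$
\sum_{p=k}^{n} [b^p] f(b) = \sum_{p=k}^{\mathit{up}+l_j} c_{p-l_j} \ge 0
\tag{8}
$$

(6) If $k - \mathit{up} > l_j > l_i$, then $k > \mathit{up} + l_j > \mathit{up} + l_i$, so we can obtain Eq. (9):

$$
\sum_{p=k}^{n} [b^p] f(b) = 0
\tag{9}
$$

To sum up all these cases, we can confirm that $\sum_{p=k}^{n} [b^p] f(b) \ge 0$, so Lemma 1 holds. □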
Theorem 1 [The correlated order of the optimal allocation]. For a given allocation $SA = \{l_1, \dots, l_{|S|}\}$ which is assigned to a given set of nodes $S = \{N_1, \dots, N_{|S|}\}$ with reliability set $R = \{r_1, \dots, r_{|S|}\}$, $SA$ and $R$ have the same order if the allocation is optimal, meaning that $\forall r_i > r_j$, we have $l_i > l_j$.
Proof. Assume that for the optimal allocation $SA = \{l_1, \dots, l_{|S|}\}$, there exist $i, j \in \{1, \dots, |S|\}$ with $r_i > r_j$ but $l_i < l_j$. Let us consider another allocation $SA' = \{l'_1, \dots, l'_{|S|}\}$, where $l'_i = l_j$, $l'_j = l_i$ and $\forall s \ne i, j$, $l'_s = l_s$. Let $F(b) = \prod_{s=1,\, s \ne i,\, s \ne j}^{|S|} \left[ r_s b^{l_s} + (1-r_s) b^0 \right]$; then we can obtain Eq. (10) in the following steps:

$$
\begin{aligned}
&GFDR(SA,b) - GFDR(SA',b) \\
&= F(b)\left[ r_i b^{l_i} + (1-r_i) b^0 \right]\left[ r_j b^{l_j} + (1-r_j) b^0 \right] - F(b)\left[ r_i b^{l'_i} + (1-r_i) b^0 \right]\left[ r_j b^{l'_j} + (1-r_j) b^0 \right] \\
&= F(b)\left[ r_i b^{l_i} + (1-r_i) b^0 \right]\left[ r_j b^{l_j} + (1-r_j) b^0 \right] - F(b)\left[ r_i b^{l_j} + (1-r_i) b^0 \right]\left[ r_j b^{l_i} + (1-r_j) b^0 \right] \\
&= F(b)\left[ r_i r_j b^{l_i+l_j} + (1-r_i) r_j b^{l_j} + r_i (1-r_j) b^{l_i} + (1-r_i)(1-r_j) b^0 \right] \\
&\quad - F(b)\left[ r_i r_j b^{l_i+l_j} + (1-r_i) r_j b^{l_i} + r_i (1-r_j) b^{l_j} + (1-r_i)(1-r_j) b^0 \right] \\
&= F(b)\left\{ \left[ (1-r_i) r_j - r_i (1-r_j) \right] b^{l_j} + \left[ r_i (1-r_j) - (1-r_i) r_j \right] b^{l_i} \right\} \\
&= F(b)\left[ (r_j - r_i) b^{l_j} + (r_i - r_j) b^{l_i} \right] \\
&= F(b)(r_j - r_i)\left( b^{l_j} - b^{l_i} \right)
\end{aligned}
\tag{10}
$$

Since $\sum_{i=1}^{|S|} l_i = n$, $F(b)$ can be rewritten as $F(b) = \sum_{p=0}^{n-l_i-l_j} c_p b^p$ with $c_p \ge 0$. Hence Eq. (10) becomes:

$$
\begin{aligned}
GFDR(SA,b) - GFDR(SA',b) &= \left( \sum_{p=0}^{n-l_j-l_i} c_p b^p \right)(r_j - r_i)\left( b^{l_j} - b^{l_i} \right) \\
&= (r_j - r_i) \sum_{p=0}^{n-l_j-l_i} c_p b^p \left( b^{l_j} - b^{l_i} \right)
\end{aligned}
\tag{11}
$$

Since $r_j - r_i < 0$ and $l_j > l_i$, Lemma 1 and Eq. (3) give $DR(SA,k) - DR(SA',k) \le 0$. So the allocation $SA'$ is better than $SA$, which contradicts the assumption. Thus Theorem 1 holds. □
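The swap argument in the proof can be checked numerically on small instances. The following Python sketch is an illustration under assumed values: the reliabilities `(0.9, 0.8, 0.7)` and the helper names are hypothetical. It compares the allocation whose block counts are sorted in the same order as the node reliabilities against every permutation of the same block counts.

```python
from itertools import permutations


def data_reliability(blocks, reliabilities, k):
    """DR(SA, k): sum of the coefficients of b^p, p >= k, in the product of
    the per-node factors r_i * b^(l_i) + (1 - r_i) * b^0."""
    poly = [1.0]
    for l, r in zip(blocks, reliabilities):
        nxt = [0.0] * (len(poly) + l)
        for p, c in enumerate(poly):
            nxt[p] += (1 - r) * c
            nxt[p + l] += r * c
        poly = nxt
    return sum(poly[k:])


def sorted_assignment_is_best(blocks, reliabilities, k, tol=1e-12):
    """True if giving more blocks to more reliable nodes is at least as good
    as every other assignment of the same block counts."""
    rel = sorted(reliabilities, reverse=True)
    aligned = data_reliability(sorted(blocks, reverse=True), rel, k)
    return all(data_reliability(perm, rel, k) <= aligned + tol
               for perm in permutations(blocks))


rel = (0.9, 0.8, 0.7)                              # assumed reliabilities
aligned = data_reliability((3, 1, 1), rel, k=3)    # most blocks on best node
swapped = data_reliability((1, 3, 1), rel, k=3)    # most blocks on N2 instead
print(aligned, swapped, sorted_assignment_is_best((2, 2, 1), rel, 3))
```

This is a spot check on a few instances, not a substitute for the proof; it only confirms that the ordering claimed by the theorem is consistent with direct computation for the chosen values.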
For example, consider the scheme $SA_2$ given in Fig. 1(b) and a new scheme $SA_3$ in which nodes $N_1$, $N_2$ and $N_3$ store one, three and one block respectively; we have $DR(SA_2) = 0.9 > DR(SA_3) = 0.85$. Theorem 1 suggests that the more reliable a node is, the more data blocks should be assigned to and stored in it. Following this logic to the extreme, is it feasible to give all the data blocks to the single most reliable node? In fact, the number of blocks that a node can host is upper-bounded; Theorems 2 and 3 address this question.
Theorem 2 [Upper-bound in SA for optimal allocation]. For an optimal allocation $SA = \{l_1, \dots, l_{|S|}\}$, which is assigned to a set of nodes $S = \{N_1, \dots, N_{|S|}\}$ with a given $k$, then $\forall l_j \in SA$, we have $l_j \le k$.
Proof. Assume that in the optimal allocation $SA$, $\exists\, l_j \in SA$ with $l_j > k$. Let us consider another allocation $SA' = \{l'_1, \dots, l'_{|S|}\}$ such that $l'_j = l_j - 1$ and $\forall s \ne j$, $l'_s = l_s$. Obviously, the redundancy of $SA'$ is smaller than that of $SA$. Let $F(b) = \prod_{s=1,\, s \ne j}^{|S|} \left[ r_s b^{l_s} + (1-r_s) b^0 \right]$; we can obtain Eq. (12):

$$
\begin{aligned}
&GFDR(SA,b) - GFDR(SA',b) \\
&= F(b)\left[ r_j b^{l_j} + (1-r_j) b^0 \right] - F(b)\left[ r_j b^{l'_j} + (1-r_j) b^0 \right] \\
&= F(b)\left( r_j b^{l_j} - r_j b^{l'_j} \right) \\
&= F(b)\, r_j \left( b^{l_j} - b^{l_j - 1} \right)
\end{aligned}
\tag{12}
$$

Since $l_j > k$ and $l_j - 1 \ge k$, we can see that $DR(SA,k) - DR(SA',k) = 0$. This means that $SA'$ achieves the same data reliability as $SA$ with smaller redundancy. So the allocation $SA'$ is better than $SA$. This leads to a contradiction with the assumption. Thus Theorem 2 holds. □
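A small numerical check makes the argument concrete: blocks beyond $k$ on a single node add redundancy without changing the data reliability. The Python sketch below uses assumed reliabilities `(0.9, 0.8, 0.7)` and hypothetical allocations; it is an illustration, not code from the paper.

```python
def data_reliability(blocks, reliabilities, k):
    """DR(SA, k): sum of the coefficients of b^p, p >= k, in the product of
    the per-node factors r_i * b^(l_i) + (1 - r_i) * b^0."""
    poly = [1.0]
    for l, r in zip(blocks, reliabilities):
        nxt = [0.0] * (len(poly) + l)
        for p, c in enumerate(poly):
            nxt[p] += (1 - r) * c
            nxt[p + l] += r * c
        poly = nxt
    return sum(poly[k:])


rel = (0.9, 0.8, 0.7)   # assumed node reliabilities
k = 3

oversized = (5, 1, 1)   # first node holds more than k blocks
trimmed = (3, 1, 1)     # same allocation with the first node cut back to k

dr_oversized = data_reliability(oversized, rel, k)
dr_trimmed = data_reliability(trimmed, rel, k)
ry_oversized = sum(oversized) / k
ry_trimmed = sum(trimmed) / k
print(dr_oversized, dr_trimmed, ry_oversized, ry_trimmed)
```

In this instance both allocations recover the file exactly when the first node is reliable, so trimming that node to $k$ blocks lowers $RY$ while leaving $DR$ unchanged.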
We use the example in Fig. 1(b) for demonstration. By enumeration, it is easy to see that the allocation scheme $SA_1$ is optimal, with $DR(SA_1) = 0.941$. Suppose that $k = 3$; then the maximum data reliability is 0.9 ($< DR(SA_1)$) if we assign more than 3 blocks to any one node. Theorem 2 thus shows that each node can host at most $k$ redundant data blocks in an optimal allocation. Indeed, by definition the original data/file can be reconstructed from any $k$ redundant blocks, which is what we call "sufficiency". However, if $k$ is quite large, in particular when $k$ approaches $n$, this upper bound on the elements of $SA$ is less useful for the allocation problem. It is therefore necessary to tighten the upper bound on the elements of $SA$.
Theorem 3 [Upper-bound in SA for optimal allocation]. For an optimal allocation $SA = \{l_1, \dots, l_{|S|}\}$ which is assigned to a set of nodes $S = \{N_1, \dots, N_{|S|}\}$ with a given $k$, $SA$ should either (i) have the property that $\forall l_j \in SA$, $l_j \le n - k$; or (ii) assign $k$ blocks to node $N_m$ (the most reliable node, i.e. $r_m = \max\{r_1, \dots, r_{|S|}\}$) and zero blocks to the other nodes.
Proof. Consider the inverse proposition of case (i): we assume that $\exists\, l_j, l_i \in SA$, where $l_j > n-k$. By Theorem 1, we have $l_m \ge l_j > n-k$ since $r_m \ge r_j$. We construct another allocation $SA' = \{l'_1, \ldots, l'_{|S|}\}$, where $l'_i = l_i - 1$, $l'_m = l_m + 1$ and $\forall s \ne i, s \ne m,\ l'_s = l_s$. Let $F(b) = \prod_{s=1,\, s \ne i,\, s \ne m}^{|S|} \left[ r_s b^{l_s} + (1-r_s) b^0 \right]$, then we can obtain Eq. (13):

$$
\begin{aligned}
GFDR(SA,b) - GFDR(SA',b)
&= F(b)\left[r_i b^{l_i} + (1-r_i)b^0\right]\left[r_m b^{l_m} + (1-r_m)b^0\right] \\
&\quad - F(b)\left[r_i b^{l'_i} + (1-r_i)b^0\right]\left[r_m b^{l'_m} + (1-r_m)b^0\right] \\
&= F(b)\left[r_i b^{l_i} + (1-r_i)b^0\right]\left[r_m b^{l_m} + (1-r_m)b^0\right] \\
&\quad - F(b)\left[r_i b^{l_i-1} + (1-r_i)b^0\right]\left[r_m b^{l_m+1} + (1-r_m)b^0\right] \\
&= F(b)\left[r_i r_m b^{l_i+l_m} + (1-r_i) r_m b^{l_m} + r_i (1-r_m) b^{l_i} + (1-r_i)(1-r_m) b^0\right] \\
&\quad - F(b)\left[r_i r_m b^{l_i+l_m} + (1-r_i) r_m b^{l_m+1} + r_i (1-r_m) b^{l_i-1} + (1-r_i)(1-r_m) b^0\right] \\
&= F(b)\left[(1-r_i)\, r_m \left(b^{l_m} - b^{l_m+1}\right) + (1-r_m)\, r_i \left(b^{l_i} - b^{l_i-1}\right)\right]
\end{aligned}
\tag{13}
$$
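Since $\sum_{t=1}^{|S|} l_t = n$, $F(b)$ can then be rewritten as $F(b) = \sum_{p=0}^{n-l_i-l_m} c_p b^p$, with $c_p \ge 0$ for all $p$. Based on this, Eq. (13) can be rewritten as Eq. (14) in the following steps:

$$
\begin{aligned}
GFDR(SA,b) - GFDR(SA',b)
&= F(b)\left[(1-r_i)\, r_m \left(b^{l_m} - b^{l_m+1}\right) + (1-r_m)\, r_i \left(b^{l_i} - b^{l_i-1}\right)\right] \\
&= F(b)(1-r_i)\, r_m \left(b^{l_m} - b^{l_m+1}\right) + F(b)(1-r_m)\, r_i \left(b^{l_i} - b^{l_i-1}\right) \\
&= (1-r_i)\, r_m \left(\sum_{p=0}^{n-l_i-l_m} c_p b^p\right)\left(b^{l_m} - b^{l_m+1}\right)
 + (1-r_m)\, r_i \left(\sum_{p=0}^{n-l_i-l_m} c_p b^p\right)\left(b^{l_i} - b^{l_i-1}\right) \\
&= (1-r_i)\, r_m \sum_{p=0}^{n-l_i-l_m} c_p \left(b^{p+l_m} - b^{p+l_m+1}\right)
 + (1-r_m)\, r_i \sum_{p=0}^{n-l_i-l_m} c_p \left(b^{p+l_i} - b^{p+l_i-1}\right) \\
&= (1-r_i)\, r_m \sum_{p=0}^{n-l_i-l_m} c_p \left(b^{p+l_m} - b^{p+l_m+1}\right)
 + (1-r_m)\, r_i \left(\sum_{p=l_i}^{n-l_m} c_{p-l_i} b^p - \sum_{p=l_i-1}^{n-l_m-1} c_{p-l_i+1} b^p\right)
\end{aligned}
\tag{14}
$$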
Since $l_m > n-k$, we have $k > n-l_m > n-l_m-1$. Considering that the exponent of the power in each item should be larger than $k$, we abandon the second term. For the first term $(1-r_i)\, r_m \sum_{p=0}^{n-l_i-l_m} c_p \left(b^{p+l_m} - b^{p+l_m+1}\right)$, we can obtain $DR(SA,k) - DR(SA',k) \le 0$ by Lemma 1. So $SA'$ is better than SA. This leads to a contradiction with the assumption. And for $SA'$, all nodes in $S \setminus N_m$ hold less than $k$ blocks in total. So the data reliability completely depends on the reliability of $N_m$. Then we have to satisfy case (ii) by assigning $k$ blocks to $N_m$ and zero blocks to other nodes, which allows us to use a minimum redundancy with a maximum data reliability $r_m$. So Theorem 3 holds. □
For the example given in Fig. 1(b), we know that $SA_1$ is optimal and $DR(SA_1) = 0.941$. Then we observe that $\forall l_j \in SA_1,\ l_j \le n-k = 2$. By Theorem 3, if the maximum data reliability $r_m$ among all nodes (for distributed storage) is smaller than the given (global) data reliability ADR, then a node is assigned at most $(n-k)$ redundant data blocks.
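The bound of Theorem 3 can be checked by brute force on a small instance. The Python sketch below enumerates every placement of $n = 5$ blocks on three nodes with $k = 3$ and verifies that the best placements never put more than $n-k$ blocks on one node. The reliabilities (0.9, 0.85, 0.8) are an assumption: they are not stated in this section, but they reproduce the values 0.612, 0.765 and 0.941 quoted for $SA_1$. The function names are hypothetical.

```python
from itertools import product

def data_reliability(alloc, rel, k):
    """Probability that at least k blocks sit on available nodes (enumeration)."""
    total = 0.0
    for states in product((0, 1), repeat=len(rel)):
        prob = 1.0
        blocks = 0
        for up, l, r in zip(states, alloc, rel):
            prob *= r if up else (1.0 - r)
            blocks += l if up else 0
        if blocks >= k:
            total += prob
    return total

def compositions(n, parts):
    """Yield every tuple of `parts` non-negative integers that sums to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

# Assumed node reliabilities (not given in this section).
REL = (0.9, 0.85, 0.8)
N, K = 5, 3

scores = {a: data_reliability(a, REL, K) for a in compositions(N, len(REL))}
best = max(scores.values())
best_allocs = [a for a, dr in scores.items() if abs(dr - best) < 1e-9]

print(round(best, 3))                               # 0.941 under the assumed reliabilities
print(best_allocs)                                  # every placement reaching that value
print(all(max(a) <= N - K for a in best_allocs))    # Theorem 3, case (i)
```

Because the enumeration visits all placements and all node states, it is only practical for very small $n$ and $|S|$; it serves as a reference check, not as a search method.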
3.2.2. Improvement for computation
During the computation of data reliability, the number of terms can double after each multiplication. Reducing the number of terms is therefore necessary to lower the complexity and accelerate the whole calculation. We address this issue with Theorems 4 and 5.
Theorem 4 [The sum of the coefficients equals 1]. $\forall s \in \{1, \ldots, |S|\}$, suppose that $\prod_{i=1}^{s} GFNR(N_i(b))$ can be expanded as $\sum_{p=0}^{sums} c_p b^p$, where $sums = \sum_{i=1}^{s} l_i$ and $\forall p,\ c_p \ge 0$, then $\sum_{p=0}^{sums} c_p = 1$.
Proof. According to Eq. (2), we have $\forall i,\ GFNR(N_i(1)) = 1$. To obtain the sum of the coefficients in the expanded form of the polynomial $\prod_{i=1}^{s} GFNR(N_i(b))$, we substitute $b = 1$. By Eq. (2), we can obtain Eq. (15):

$$
\sum_{p=0}^{sums} c_p = \sum_{p=0}^{sums} [b^p] \left( \prod_{i=1}^{s} GFNR(N_i(b)) \right) = \prod_{i=1}^{s} GFNR(N_i(1)) = 1 \tag{15}
$$

Therefore, Theorem 4 holds. The enumeration method for $dr(p)$ in Section 3.1 gives an example for this theorem. □
Theorem 5 [To simplify the multiplication of polynomials]. $\forall s \in \{1, \ldots, |S|-1\}$, suppose that $\prod_{i=1}^{s} GFNR(N_i(b))$ can be expanded as $\sum_{p=0}^{sums} c_p b^p$, where $sums = \sum_{i=1}^{s} l_i$ and $\forall p,\ c_p \ge 0$, then

$$
\sum_{p=0}^{k-1} [b^p] \left[ \left( \sum_{p=0}^{k-1} c_p b^p \right) GFNR(N_{s+1}(b)) \right] = \sum_{p=0}^{k-1} [b^p] \left[ \prod_{i=1}^{s+1} GFNR(N_i(b)) \right].
$$
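Proof. Since $GFNR(N_{s+1}(b)) = r_{s+1} b^{l_{s+1}} + (1-r_{s+1})$, we then obtain Eqs. (16) and (17):

$$
\begin{aligned}
\left( \sum_{p=0}^{k-1} c_p b^p \right) GFNR(N_{s+1}(b))
&= \left( \sum_{p=0}^{k-1} c_p b^p \right) \left[ r_{s+1} b^{l_{s+1}} + (1-r_{s+1}) \right] \\
&= r_{s+1} \sum_{p=0}^{k-1} c_p b^{p+l_{s+1}} + (1-r_{s+1}) \sum_{p=0}^{k-1} c_p b^p
\end{aligned}
\tag{16}
$$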
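$$
\begin{aligned}
\prod_{i=1}^{s+1} GFNR(N_i(b))
&= \left( \sum_{p=0}^{sums} c_p b^p \right) \left[ r_{s+1} b^{l_{s+1}} + (1-r_{s+1}) \right] \\
&= r_{s+1} \sum_{p=0}^{sums} c_p b^{p+l_{s+1}} + (1-r_{s+1}) \sum_{p=0}^{sums} c_p b^p
\end{aligned}
\tag{17}
$$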
We discuss two cases for $l_{s+1}$ as follows:

If $l_{s+1} < k$, by Eq. (16), we can obtain Eq. (18):

$$
\sum_{p=0}^{k-1} [b^p] \left[ \left( \sum_{p=0}^{k-1} c_p b^p \right) GFNR(N_{s+1}(b)) \right] = r_{s+1} \sum_{p=0}^{k-1-l_{s+1}} c_p + (1-r_{s+1}) \sum_{p=0}^{k-1} c_p \tag{18}
$$

By Eq. (17), we can obtain Eq. (19):

$$
\sum_{p=0}^{k-1} [b^p] \left[ \prod_{i=1}^{s+1} GFNR(N_i(b)) \right] = r_{s+1} \sum_{p=0}^{k-1-l_{s+1}} c_p + (1-r_{s+1}) \sum_{p=0}^{k-1} c_p \tag{19}
$$

Using Eqs. (18) and (19), we know that Theorem 5 holds when $l_{s+1} < k$.

If $l_{s+1} \ge k$, by Eq. (16), we can obtain Eq. (20):

$$
\sum_{p=0}^{k-1} [b^p] \left[ \left( \sum_{p=0}^{k-1} c_p b^p \right) GFNR(N_{s+1}(b)) \right] = (1-r_{s+1}) \sum_{p=0}^{k-1} c_p \tag{20}
$$

By Eq. (17), we can obtain Eq. (21):

$$
\sum_{p=0}^{k-1} [b^p] \left[ \prod_{i=1}^{s+1} GFNR(N_i(b)) \right] = (1-r_{s+1}) \sum_{p=0}^{k-1} c_p \tag{21}
$$

Using Eqs. (20) and (21), we know that Theorem 5 holds when $l_{s+1} \ge k$. Combining the two cases, we confirm that Theorem 5 holds. □
From Theorems 4 and 5, we know that one only needs to keep the terms whose exponent is smaller than $k$ after each multiplication. Finally, we obtain the data reliability by subtracting the sum of the remaining coefficients from 1. We use the example given before, i.e. the allocation scheme $SA_1$ in Fig. 1, to illustrate the improvement.
The 1st multiplication:

$$
\left[ r_1 b^2 + (1-r_1) b^0 \right] \left[ r_2 b^2 + (1-r_2) b^0 \right] = r_1 r_2 b^4 + \left[ r_1 (1-r_2) + r_2 (1-r_1) \right] b^2 + (1-r_1)(1-r_2) b^0 .
$$

We drop the term $r_1 r_2 b^4$ and keep the rest, $\left[ r_1 (1-r_2) + r_2 (1-r_1) \right] b^2 + (1-r_1)(1-r_2) b^0$. We then continue to the next step.

The 2nd multiplication:

$$
\begin{aligned}
&\left\{ \left[ r_1 (1-r_2) + r_2 (1-r_1) \right] b^2 + (1-r_1)(1-r_2) b^0 \right\} \left[ r_3 b^1 + (1-r_3) b^0 \right] \\
&\quad = \left[ r_1 r_3 (1-r_2) + r_2 r_3 (1-r_1) \right] b^3 + \left[ r_1 (1-r_2)(1-r_3) + r_2 (1-r_1)(1-r_3) \right] b^2 \\
&\qquad + (1-r_1)(1-r_2)\, r_3\, b^1 + (1-r_1)(1-r_2)(1-r_3)\, b^0 .
\end{aligned}
$$

We throw away the term $\left[ r_1 r_3 (1-r_2) + r_2 r_3 (1-r_1) \right] b^3$ and preserve the terms $\left[ r_1 (1-r_2)(1-r_3) + r_2 (1-r_1)(1-r_3) \right] b^2 + (1-r_1)(1-r_2)\, r_3\, b^1 + (1-r_1)(1-r_2)(1-r_3)\, b^0$. We then continue to the next step.
After getting the coefficients, we subtract them from 1:

$$
DR(SA_1, 3) = 1 - \left[ r_1 (1-r_2)(1-r_3) + r_2 (1-r_1)(1-r_3) \right] - (1-r_1)(1-r_2)\, r_3 - (1-r_1)(1-r_2)(1-r_3) .
$$

Referring to the results for $dr(p)$ in Section 3.1, we have $DR(SA_1, 3) = 1 - dr(2) - dr(1) - dr(0)$. In fact, by expanding all the $dr(p)$, one gets the sum $\sum_{p=0}^{5} dr(p) = 1$. Hence, $DR(SA_1, 3) = dr(5) + dr(4) + dr(3)$, which verifies the result.

From the above example, we see that at most $k$ ($k = 3$) terms are multiplied by a two-term polynomial at each step. With this strategy, the computation is simplified and its complexity is significantly reduced.
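The truncation described above can be sketched in a few lines of Python. The function keeps only the coefficients of $b^p$ with $p < k$ after each multiplication (Theorem 5) and returns one minus their sum (Theorem 4). The function name is hypothetical, and the reliabilities (0.9, 0.85, 0.8) are assumed values chosen because they reproduce $DR(SA_1, 3) = 0.941$; the block counts (2, 2, 1) are read off the exponents in the worked example.

```python
def data_reliability_truncated(alloc, rel, k):
    """DR(SA, k) computed while keeping at most k coefficients per step."""
    # low[p] is the coefficient of b^p for p < k; higher powers are dropped.
    low = [0.0] * k
    low[0] = 1.0
    for l, r in zip(alloc, rel):
        nxt = [0.0] * k
        for p, c in enumerate(low):
            if c == 0.0:
                continue
            nxt[p] += c * (1.0 - r)       # node unavailable: exponent unchanged
            if p + l < k:
                nxt[p + l] += c * r       # node available: exponent grows by l
        low = nxt
    # By Theorem 4 all coefficients sum to 1, so the dropped mass is 1 - sum(low).
    return 1.0 - sum(low)

# Allocation SA_1 = (2, 2, 1) with assumed reliabilities.
print(round(data_reliability_truncated((2, 2, 1), (0.9, 0.85, 0.8), 3), 3))  # 0.941
```

Each step touches at most $k$ stored coefficients, so the cost is on the order of $k \cdot |S|$ multiplications rather than growing with the number of terms of the full expansion.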
3.2.3. Stop condition for searching process
For a given $n$, increasing $k$ is equivalent to decreasing the data redundancy RY, since $RY = n/k$. Accordingly, the data reliability degrades, and vice versa. Therefore, the data reliability, as well as the data redundancy, depends on two factors: $n$ and $k$. Hereafter, we discuss the stop condition for the search in two cases: (i) finding a maximum $k$ with fixed $n$, for which Proposition 3 is used (Theorem 6 and Propositions 1 and 2 are prepared in advance for Proposition 3); (ii) finding a minimum $n$ with fixed $k$, for which Proposition 5 is adopted (Proposition 4 supports Proposition 5).
Theorem 6 [The monotonicity of the data reliability]. When $n$ is fixed, the data reliability monotonically decreases as $k$ increases.

Proof. As mentioned before, data is reliable only if at least $k$ blocks are available. Suppose that $k_1 > k_2$; by Eq. (3), we can obtain Eq. (22) as follows:

$$
DR(SA, k_1) - DR(SA, k_2) = \sum_{p=k_1}^{n} [b^p]\, GFDR(SA, b) - \sum_{p=k_2}^{n} [b^p]\, GFDR(SA, b) = - \sum_{p=k_2}^{k_1 - 1} [b^p]\, GFDR(SA, b) \le 0 \tag{22}
$$

As each term (i.e. $[b^p]\, GFDR(SA, b)$) in the expanded form is larger than or equal to 0, we have $- \sum_{p=k_2}^{k_1 - 1} [b^p]\, GFDR(SA, b) \le 0$. We then have $DR(SA, k_1) \le DR(SA, k_2)$. Hence, Theorem 6 holds. For the scheme $SA_1$ given in Fig. 1(b), we have: (i) if $k = 5$, then $DR(SA_1, 5) = 0.612$; (ii) if $k = 4$, then $DR(SA_1, 4) = 0.612 + 0.153 = 0.765$; (iii) if $k = 3$, then $DR(SA_1, 3) = 0.941$. Note that Theorem 6 is similar to Lemma 2 in [7]. □
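The monotonicity stated in Theorem 6 can be observed numerically by expanding the generating function in full and summing the coefficients from $b^k$ upward, as in Eq. (22). The sketch below is illustrative only: the function names are hypothetical and the reliabilities (0.9, 0.85, 0.8) are assumed values that reproduce the three figures quoted for $SA_1$.

```python
def gfdr_coefficients(alloc, rel):
    """Coefficients c_p of the product of (r_s * b^{l_s} + (1 - r_s)) over all nodes."""
    coeffs = [1.0]
    for l, r in zip(alloc, rel):
        nxt = [0.0] * (len(coeffs) + l)
        for p, c in enumerate(coeffs):
            nxt[p] += c * (1.0 - r)   # node unavailable
            nxt[p + l] += c * r       # node available, contributes l blocks
        coeffs = nxt
    return coeffs

def data_reliability(alloc, rel, k):
    """DR(SA, k): sum of the coefficients of b^p for p >= k."""
    return sum(gfdr_coefficients(alloc, rel)[k:])

alloc, rel = (2, 2, 1), (0.9, 0.85, 0.8)   # SA_1 with assumed reliabilities
values = [data_reliability(alloc, rel, k) for k in range(0, 6)]
print([round(v, 3) for v in values])
# The sequence never increases as k grows.
print(all(values[i] >= values[i + 1] for i in range(len(values) - 1)))
```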
Proposition 1. Given the total number of redundant blocks $n$ and a data reliability ADR to achieve, if $\forall k'' = k' + 1,\ \exists SA'$, where $\max(SA') \le k''$, $DR(SA', k'') \ge ADR$, then $\forall k',\ \exists SA$, where $\max(SA) \le k'$, $DR(SA, k') \ge ADR$.
Proof. We consider two cases for $\max(SA')$ and $k''$:

1) If $\max(SA') < k''$, we set $SA = SA'$. By Theorem 6, we can further obtain that $DR(SA, k') \ge DR(SA', k'') \ge ADR$ since $k'' > k'$.

2) If $\max(SA') = k''$, suppose that $SA'$ is in descending order, then $l'_1 \ge l'_2 \ge \cdots \ge l'_{|S|}$. Since $\max(SA') = k''$, $\exists q$ such that $l'_1 = \cdots = l'_q = k''$ and $l'_{q+1} < k''$.

We consider another allocation $SA^{*}$, in descending order as well, where $l^{*}_1 = \cdots = l^{*}_q = k'' - 1 = k'$ and $\forall s \in \{q+1, \ldots, |S|\},\ l^{*}_s = l'_s$. We suppose that $\prod_{s=q+1}^{|S|} \left[ r_s b^{l'_s} + (1-r_s) \right] = \sum_{p=0}^{suml} c_p b^p$, where $suml = \sum_{i=q+1}^{|S|} l'_i$ and $\forall p,\ c_p \ge 0$. And also $\prod_{s=1}^{q} \left[ r_s b^{k''} + (1-r_s) \right] = \left[ \sum_{i=1}^{q} c'_i b^{i k''} + \prod_{s=1}^{q} (1-r_s) \right]$, where $\forall i,\ c'_i \ge 0$. Similar to the case of $k''$, we have $\prod_{s=1}^{q} \left[ r_s b^{k'} + (1-r_s) \right] = \left[ \sum_{i=1}^{q} c'_i b^{i k'} + \prod_{s=1}^{q} (1-r_s) \right]$ for the case of $k'$. Then we can obtain Eqs. (23) and (24):

$$
\begin{aligned}
GFDR(SA', b) &= \prod_{s=1}^{|S|} \left[ r_s b^{l'_s} + (1-r_s) \right]
= \prod_{s=1}^{q} \left[ r_s b^{k''} + (1-r_s) \right] \prod_{s=q+1}^{|S|} \left[ r_s b^{l'_s} + (1-r_s) \right] \\
&= \left[ \sum_{i=1}^{q} c'_i b^{i k''} + \prod_{s=1}^{q} (1-r_s) \right] \sum_{p=0}^{suml} c_p b^p
= \sum_{i=1}^{q} c'_i b^{i k''} \sum_{p=0}^{suml} c_p b^p + \prod_{s=1}^{q} (1-r_s) \sum_{p=0}^{suml} c_p b^p
\end{aligned}
\tag{23}
$$

$$
\begin{aligned}
GFDR(SA^{*}, b) &= \prod_{s=1}^{|S|} \left[ r_s b^{l^{*}_s} + (1-r_s) \right]
= \prod_{s=1}^{q} \left[ r_s b^{k'} + (1-r_s) \right] \prod_{s=q+1}^{|S|} \left[ r_s b^{l'_s} + (1-r_s) \right] \\
&= \left[ \sum_{i=1}^{q} c'_i b^{i k'} + \prod_{s=1}^{q} (1-r_s) \right] \sum_{p=0}^{suml} c_p b^p
= \sum_{i=1}^{q} c'_i b^{i k'} \sum_{p=0}^{suml} c_p b^p + \prod_{s=1}^{q} (1-r_s) \sum_{p=0}^{suml} c_p b^p
\end{aligned}
\tag{24}
$$

We can therefore obtain Eq. (25) as follows:

$$
\begin{aligned}
DR(SA', k'') - DR(SA^{*}, k')
&= \left[ \sum_{i=1}^{q} c'_i \sum_{p=0}^{suml} c_p + \prod_{s=1}^{q} (1-r_s) \sum_{p=k''}^{suml} c_p \right]
 - \left[ \sum_{i=1}^{q} c'_i \sum_{p=0}^{suml} c_p + \prod_{s=1}^{q} (1-r_s) \sum_{p=k'}^{suml} c_p \right] \\
&\overset{k''-1=k'}{=} - \prod_{s=1}^{q} (1-r_s)\, c_{k'} \le 0
\end{aligned}
\tag{25}
$$

In deriving Eq. (25), we have assumed that $suml \ge k''$. Actually, if $suml = k' < k''$, the result also holds. And if $suml < k' < k''$, then $c_{k'} = 0$, so the result still holds. Given that $DR(SA', k'') \ge ADR$, we have $DR(SA^{*}, k') \ge DR(SA', k'') \ge ADR$. Furthermore, we consider another allocation SA, where $\forall s \in \{1, \ldots, |S|-1\},\ l_s = l^{*}_s$ and $l_{|S|} = l^{*}_{|S|} + q$. Suppose that $\prod_{i=1}^{|S|-1} \left[ r_i b^{l_i} + (1-r_i) \right] = \sum_{p=0}^{sumls} c_p b^p$, where $sumls = \sum_{i=1}^{|S|-1} l_i$ and $\forall p,\ c_p \ge 0$, then we can obtain Eq. (26) as follows:

$$
GFDR(SA, b) - GFDR(SA^{*}, b) = r_{|S|} \sum_{p=0}^{sumls} c_p b^p \left( b^{l_{|S|}} - b^{l^{*}_{|S|}} \right) \tag{26}
$$

By Lemma 1, we can see that $DR(SA, k') - DR(SA^{*}, k') \ge 0$ since $l_{|S|} > l^{*}_{|S|}$. Then we have $DR(SA, k') \ge DR(SA^{*}, k') \ge DR(SA', k'') \ge ADR$. Thus $DR(SA, k') \ge ADR$.

To sum up case (1) and case (2), Proposition 1 holds. □
By Proposition 1, we can quickly obtain an inverse proposition:

Proposition 2. Given the total number of redundant blocks $n$ and a data reliability ADR to achieve, if $\exists k',\ \forall SA$ where $\max(SA) \le k'$, $DR(SA, k') < ADR$, then $\exists k'' = k' + 1,\ \forall SA'$, where $\max(SA') \le k''$, $DR(SA', k'') < ADR$.

Since $\exists k'' = k' + 1$, by iteratively using Proposition 2, we can easily obtain the following proposition:

Proposition 3 [Stop condition for searching process with fixed n]. Given the total number of redundant blocks $n$ and a data reliability ADR to achieve, if $\exists k',\ \forall SA$, where $\max(SA) \le k'$, $DR(SA, k') < ADR$, then $\exists k'' > k',\ \forall SA'$, where $\max(SA') \le k''$, $DR(SA', k'') < ADR$.

From Proposition 3, we know that for a certain $k'$, if no allocation scheme can reach the given data reliability ADR, then there is no need to continue searching for a larger $k''$, i.e. the search stops.
Proposition 4. Given $k$ (meaning that $k$ blocks are sufficient for the recovery) and a data reliability ADR to achieve, if $\exists n',\ \exists SA$, where $DR(SA, k) \ge ADR$, then $\forall n'' > n',\ \exists SA'$, where $DR(SA', k) \ge ADR$.
Proof. Assume that allocation SA is organized as $SA = \{l_1, \ldots, l_{|S|}\}$. $\forall n'' > n'$, let us consider another allocation $SA' = \{l'_1, \ldots, l'_{|S|}\}$ such that $l'_j = l_j + n'' - n'$ and $\forall s \ne j,\ l'_s = l_s$. Let $F(b) = \prod_{s=1,\, s \ne j}^{|S|} \left[ r_s b^{l_s} + (1-r_s) b^0 \right]$, then we can obtain Eq. (27):

$$
\begin{aligned}
GFDR(SA, b) - GFDR(SA', b)
&= F(b) \left[ r_j b^{l_j} + (1-r_j) b^0 \right] - F(b) \left[ r_j b^{l'_j} + (1-r_j) b^0 \right] \\
&= F(b)\, r_j \left( b^{l_j} - b^{l'_j} \right)
\end{aligned}
\tag{27}
$$

Since $l'_j = l_j + n'' - n' > l_j$, we then have $DR(SA, k) \le DR(SA', k)$ by Lemma 1. So $DR(SA', k) \ge ADR$. Thus Proposition 4 holds. □
Then we can obtain the following inverse proposition:

Proposition 5 [Stop condition for searching process with fixed k]. Given $k$ (meaning that $k$ blocks are sufficient for the recovery) and a data reliability ADR to achieve, if $\exists n'' > n',\ \forall SA'$, where $DR(SA', k) < ADR$, then $\forall n',\ \forall SA$, where $DR(SA, k) < ADR$.
3.3. Finding the optimal storage allocation
By Eqs. (2) and (3), we list the objective and constraints to formulate the optimization problem, leading to Eq. (28):

$$
\begin{aligned}
\text{obj}:\ & \text{minimize } n/k \\
\text{subject to}:\ & \begin{cases} \sum_{i=1}^{|S|} l_i = n \\ DR(SA, k) \ge ADR \end{cases}
\end{aligned}
\tag{28}
$$
Here, ADR is a (given) high data reliability to achieve, for instance 0.9999. In the process of finding the optimal storage allocation, we can either find a maximum $k$ with fixed $n$ or find a minimum $n$ with fixed $k$, both of which allow us to obtain a minimum redundancy $RY = n/k$. For these two cases, we use different solutions:

(1) Finding a maximum $k$ with fixed $n$: To do so, we proceed in three phases: (i) partition $n$ blocks into $|S|$ parts, obtaining an allocation SA, (ii) assign the allocation SA to the set of nodes $S$, (iii) find the maximum $k$ that meets the requirement on data reliability. We analyze the phases one by one:

• Initialize $k_0$: Here, we adopt the method in [7] to generate $k_0$, which assigns $\left\lceil \frac{r_i}{\sum_{i=1}^{|S|} r_i}\, n \right\rceil$ redundant blocks to a general node $N_i$, where $\lceil \cdot \rceil$ denotes the nearest integer function (more precisely, the ceiling). Then $k_0$ is obtained through the subsequent steps, including the data reliability computation and the search for the maximum $k$.
• Partition: By Theorems 2 and 3, we can obtain the upper bound for each part of a partition. So we have $\forall i \in \{1, \ldots, |S|\},\ 1 \le l_i \le \min\{k, n-k\}$. Moreover, for $l_i > n-k$, we have to check at the beginning whether the maximum reliability of the nodes is larger than ADR.
• Assignment: By Theorem 1, we should assign the partitioned parts to storage nodes according to the order of the nodes' reliability. So the way to allocate is fixed for a given partition set.
• Reliability calculation: In this process, we can simplify the computation by using Theorems 4 and 5.
• Finding the maximum $k$: To find a maximum $k$, we can increase $k$ by a policy specified in Theorem 6, which allows us to quickly find a $k$ that reaches the given ADR. As $k$ increases, we can terminate the search as soon as ADR can no longer be achieved, according to Proposition 3.

In doing so, we can easily obtain an algorithm for the model from the above steps.
(2) Finding a minimum $n$ with fixed $k$: In this case, we cannot generate partitions without a fixed $n$. The first step is therefore to generate an approximately optimal $n$; generating partitions and assigning the allocation to the given set of storage nodes then follow. We analyze the phases one by one:

• Initialize $n_0$: First, we generate the approximately optimal $n$ using a method similar to the one proposed in [7]. We allocate one redundant block to the nodes with minimum reliability, denoted as $r_{min}$. Then for any storage node $N_i$, we allocate $\left\lceil \frac{r_i}{r_{min}} \right\rceil$ redundant blocks. So the total number of redundant blocks is $n = \sum_{i=1}^{|S|} \left\lceil \frac{r_i}{r_{min}} \right\rceil$. We will later check whether this is the minimum value for $n$.
• Partition: By Theorems 2 and 3, we can obtain the upper bound for each part of a partition. So we have $\forall i \in \{1, \ldots, |S|\},\ 1 \le l_i \le \min\{k, n-k\}$. For $l_i > n-k$, we have to check at the beginning whether the maximum reliability among the nodes is larger than ADR. If yes, we assign $k = n$ data blocks to the node that has the maximum reliability, so that a minimum redundancy of 1 is obtained in this case.
• Assignment: As illustrated above, the allocation step is fixed for a given partition set by Theorem 1, as the partitioned parts are assigned according to the order of the nodes' reliability.
• Reliability calculation: In this process, we can simplify the computation by using Theorems 4 and 5.
• Finding the minimum $n$: This step checks whether the reliability of all possible allocations is less than ADR. If yes, we increase $n$ according to Proposition 5. If not, we decrease $n$ and continue from the Partition step again.
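The phases of case (1) can be put together in a small Python sketch. It is a simplified reading of the steps above, not the authors' implementation: it enumerates all admissible partitions instead of using the initialization from [7], and it starts the search at $k = \lceil n/|S| \rceil$, which is an assumption made here because below that value the per-node cap $\min\{k, n-k\}$ cannot hold all $n$ blocks. All function names are hypothetical.

```python
def bounded_partitions(n, parts, cap):
    """Yield non-increasing tuples of `parts` integers in [1, cap] that sum to n."""
    def rec(remaining, slots, largest):
        if slots == 0:
            if remaining == 0:
                yield ()
            return
        hi = min(largest, remaining - (slots - 1))   # leave >= 1 block per later slot
        lo = max(1, -(-remaining // slots))          # largest part >= average (ceiling)
        for first in range(hi, lo - 1, -1):
            for rest in rec(remaining - first, slots - 1, first):
                yield (first,) + rest
    yield from rec(n, parts, cap)

def dr_truncated(alloc, rel, k):
    """DR(SA, k) keeping only coefficients of b^p with p < k (Theorems 4 and 5)."""
    low = [1.0] + [0.0] * (k - 1)
    for l, r in zip(alloc, rel):
        nxt = [0.0] * k
        for p, c in enumerate(low):
            nxt[p] += c * (1.0 - r)
            if p + l < k:
                nxt[p + l] += c * r
        low = nxt
    return 1.0 - sum(low)

def find_max_k(n, rel, adr):
    """Return (k, allocation per original node) for the largest feasible k, or None."""
    size = len(rel)
    # Theorem 1: larger parts go to more reliable nodes.
    order = sorted(range(size), key=lambda i: rel[i], reverse=True)
    sorted_rel = [rel[i] for i in order]
    if sorted_rel[0] >= adr:
        # One node alone meets ADR: put all n = k blocks there (redundancy 1).
        alloc = [0] * size
        alloc[order[0]] = n
        return n, tuple(alloc)
    best = None
    k = -(-n // size)                    # assumed starting point, ceil(n / |S|)
    while k <= n:
        found = None
        for part in bounded_partitions(n, size, min(k, n - k)):
            if dr_truncated(part, sorted_rel, k) >= adr:
                found = part
                break
        if found is None:
            break                        # Proposition 3: larger k cannot succeed
        alloc = [0] * size
        for pos, node in enumerate(order):
            alloc[node] = found[pos]
        best = (k, tuple(alloc))
        k += 1
    return best

print(find_max_k(5, [0.9, 0.85, 0.8], 0.94))   # reliabilities are assumed values
```

With five blocks on three nodes and a target of 0.94, the sketch reports $k = 3$ with the allocation (2, 2, 1), i.e. a redundancy of $5/3$, which agrees with the $SA_1$ example used throughout this section.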
4. Performance evaluation
In this section, we evaluate the performance improvement brought by the proposed cloud storage allocation solution, which consists of the aforementioned theorems and propositions. We first discuss analytically the reduction of the search space and the simplification of the computation process. As an important complement, we also address the redundancy issue through experiments with real traces, which confirm our expectation that, for a given data reliability, the improvement in data redundancy achieved by our solution is significant.
4.1. Analytical evaluation on the search space
The advantages of our allocation model are demonstrated as follows:

(1) Reduction in partition: To illustrate the reduction in the partition phase when the number of data blocks assigned to each host is upper bounded by $\min\{k, n-k\}$ instead of $n$, we define the ratio of the number of partitions $R_p$: $R_p = \frac{Nup(n, |S|, \min\{k, n-k\})}{Nup(n, |S|, n)}$, where $Nup(n, |S|, m)$ denotes the number of partitions of $n$ into $|S|$ parts with an upper bound $m$ for each part. To expand $\min\{k, n-k\}$, we use the data redundancy $RY = n/k$. One can easily find that
$\min\{k, n-k\} = k$ if $RY \ge 2$ and $\min\{k, n-k\} = k(RY - 1)$ if $RY < 2$. Note that $R_p$ ($0 < R_p \le 1$, according to its definition) is expected to be as small as possible so that the set of possible partitions can be reduced. We have plotted the relationship between $R_p$ and the redundancy RY in Fig. 2 with $n$ blocks⁴ and $|S|$ nodes⁵ (expressed as $\langle n, |S| \rangle$) in two cases: (a) the number of nodes $|S|$ is fixed; (b) the number of blocks $n$ is fixed.
Interestingly, $R_p$ is observed to be fairly small (close to 0) when RY is quite large (for example $RY = 10$ in our study) or quite small ($RY = 1.1$). It is important to note that in current RAID systems [20], the redundancy RY is not larger than 1.5 for RAID-5 and not larger than 1.67 for RAID-6; the results of $R_p$ in both systems are particularly encouraging. In addition, Fig. 2(a) shows that $R_p$ behaves very similarly when we vary the number of data blocks $n$ (60, 80 and 100) while keeping the number of nodes $|S|$ constant. Conversely, when we change $|S|$ while keeping $n$ fixed, we observe from Fig. 2(b) that for the same RY (between 2 and 9), $R_p$ decreases as $|S|$ decreases, meaning that the smaller the number of nodes, the better our solution performs.
(2) Reduction in assignment: When using Theorem 1, the assignment is fixed by the rule, so that randomly allocating data blocks to a set of storage nodes can be avoided. We define the ratio $R_a = \frac{\text{no. of assignments by the model}}{\text{no. of assignments in a random manner}}$. In this case, we obtain the ratio $R_a = 1/|S|!$, where $|S|! = |S| \cdot (|S|-1) \cdots 2 \cdot 1$. For instance, if $|S| = 15$, we achieve a significant saving as $R_a = 7.65 \times 10^{-13}$.
(3) Simplification in calculation: By Theorems 4 and 5, we are able to strictly limit the number of terms to $k$ in each multiplication. Hence, the worst case for our solution in the calculation is the multiplication between a polynomial with $k$ terms and a polynomial with two terms. Without this simplification, the computation would be extremely complex, as one has to deal with the multiplication between a polynomial with $2^{|S|-1}$ terms and a polynomial with two terms. Taking $|S| = 15$ and $k = 15$ as an example, we see that the ratio of the improvement by simplification can reach $\frac{k}{2^{|S|-1}} = 9.16 \times 10^{-4}$.
Combining Advantages (1) and (2) presented above, we see that the search space can be reduced to $R_p R_a$ of the original search space. Furthermore, when taking into account Advantage (3), the computation process is greatly simplified. In addition, we use Proposition 3 or Proposition 5 to quickly terminate the search process, which saves time in finding the optimal allocation.
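The three ratios above can be evaluated with a short Python sketch. It assumes that $Nup(n, |S|, m)$ counts unordered partitions of $n$ into exactly $|S|$ positive parts, each at most $m$, which matches the constraint $1 \le l_i \le \min\{k, n-k\}$ of Section 3.3; the function names are hypothetical.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def nup(n, parts, cap):
    """Count partitions of n into exactly `parts` positive parts, each <= cap."""
    if parts == 0:
        return 1 if n == 0 else 0
    if n < parts or cap < 1 or n > parts * cap:
        return 0
    # Fix the largest part, then bound the remaining parts by it.
    return sum(nup(n - first, parts - 1, first)
               for first in range(1, min(cap, n) + 1))

def ratio_partition(n, parts, k):
    """R_p: bounded partition count over the unbounded (cap = n) count."""
    return nup(n, parts, min(k, n - k)) / nup(n, parts, n)

def ratio_assignment(parts):
    """R_a = 1 / |S|!, one fixed assignment instead of all orderings."""
    return 1.0 / factorial(parts)

def ratio_terms(k, parts):
    """k kept terms versus 2^(|S|-1) terms without truncation."""
    return k / 2 ** (parts - 1)

print(ratio_partition(5, 3, 3))    # small instance: n = 5, |S| = 3, k = 3
print(ratio_assignment(15))        # about 7.65e-13, as quoted for |S| = 15
print(ratio_terms(15, 15))         # about 9.16e-4, as quoted for |S| = k = 15
```

For the small instance, only (2, 2, 1) survives the cap $\min\{k, n-k\} = 2$ out of the two partitions (3, 1, 1) and (2, 2, 1), so the first printed ratio is 0.5.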
4.2. Experimental evaluation
To obtain a more comprehensive and realistic evaluation of our solution, in particular for the assessment of the redundancy, we use the availability datasets from [31], a public repository of traces used for the study of distributed systems. Each trace records the active-inactive state of the nodes (so-called "churn" in the area of distributed computing) and is collected from the real world. We choose four different traces for our study: Microsoft (corporate PCs, 2000), Skype (Skype superpeers, 2006), WebSites (web servers, 2002) and PlanetLab (P2P, 2005). A summary of each trace is given in Table 2. More details concerning the traces can be found in [32].
We randomly select a set of $|S|$ nodes from each trace, and we then allocate $n$ blocks to these nodes. For each node
[Fig. 2. Savings by partition with ⟨n, |S|⟩ (n blocks and |S| nodes): R_p (vertical axis, 0 to 1) versus Redundancy (horizontal axis, 1 to 10). (a) R_p with same number of nodes: ⟨100,15⟩, ⟨80,15⟩, ⟨60,15⟩. (b) R_p with same number of blocks: ⟨80,18⟩, ⟨80,15⟩, ⟨80,12⟩.]