
Minimizing data redundancy for high reliable cloud storage systems

Zhen Huang^a,1, Jinbang Chen^b,*,1, Yisong Lin^c, Pengfei You^a, Yuxing Peng^a

^a National University of Defense Technology, China
^b East China Normal University, China
^c Institute of GLD, China

Article info

Article history:
Received 8 October 2014
Received in revised form 30 January 2015
Accepted 17 February 2015
Available online 24 February 2015

Keywords:
Cloud storage system
Redundancy
Reliability
Generating function

Abstract

Cloud storage systems provide reliable service to users by widely deploying redundancy schemes – which brings high reliability to the data storage, but inversely introduces significant overhead to the system, consisting of storage cost and energy consumption. The core of this issue is how to leverage the relationship between data redundancy and data reliability. Optimizing both concurrently is apparently difficult; as such, fixing one as a constraint and then optimizing the other has become the consensus. We aim in this paper to pursue a storage allocation scheme that minimizes the data redundancy while achieving a given (high) data reliability. For this purpose, we provide a novel model based on the generating function. With this model, we propose a practical and efficient storage allocation scheme, which is proved to minimize the data redundancy. We analytically demonstrate that the suggested solution brings several advantages, in particular the reduction of the search space and the acceleration of the computation. We also assess the improvement in the savings of data redundancy experimentally by adopting availability traces collected from the real world – which encouragingly shows that the reduction of data redundancy by our solution can reach more than 30% as compared to the heuristic method recently proposed in the research community.

© 2015 Elsevier B.V. All rights reserved.

http://dx.doi.org/10.1016/j.comnet.2015.02.013
1389-1286/© 2015 Elsevier B.V. All rights reserved.

* Corresponding author at: Room 604, Information Building, 500 Dongchuan Road, 200241 Shanghai, China. Tel.: +86 (0)21 54 34 51 88.
E-mail addresses: maths_www@163.com (Z. Huang), jbchen@cs.ecnu.edu.cn (J. Chen).
1 This study was carried out with the National Laboratory of Parallel and Distributed Processing, Department of Computer, National University of Defense Technology, China, and the Shanghai Key Laboratory of Multidimensional Information Processing, Department of Computer Science and Technology, East China Normal University, China.

Computer Networks 81 (2015) 164–177

1. Introduction

Cloud computing, with its promise to provide reliable service to users in an efficient and cheap manner, has attracted significant interest from both industry and academia [1]. Different from supercomputing systems, inexpensive commodity machines are commonly used in cloud systems out of consideration for scalability [2]. The reliability issue in such systems is thus particularly important. To ensure data reliability, the redundancy scheme is a basic solution and has been extensively deployed [3]. The intuitive idea of this scheme is to store copies of data objects over a set of network nodes so that data can be recovered in case of loss or failure. As such, one of the key issues is how to allocate redundant data over a given set of network nodes – conventionally referred to as storage allocation – aiming at achieving maximum reliability using minimum redundancy.

In general, two important factors need to be taken into account when discussing the storage allocation problem: data redundancy (‘‘RY’’ for short) and data reliability

(‘‘DR’’ for short). Data redundancy RY answers how much redundant data should be deployed in the system. Data reliability DR gives the probability that data is safely stored in the system for a certain duration. On one hand, increasing the data redundancy can improve the data reliability and at the same time disperse the I/O overhead in a distributed way. On the other hand, it increases the storage overhead and the energy consumption – recall that power has become an important concern in system design today. According to reports [4,5] from IDC (International Data Corporation), there currently exists an evident gap between the demand for data creation and the storage capacity of hardware devices, and this gap will grow rapidly in the coming decade. Furthermore, storage systems consume around 37–40% of the energy of all IT components [6]. Therefore, minimizing data redundancy can significantly reduce the per-byte cost of a storage system. In this sense, minimizing RY is crucial to a cloud storage system.

It is clear that low data redundancy and high data reliability are in some sense contradictory, meaning that finding a solution with both minimum data redundancy and maximum data reliability is extremely hard. Thus, there exists a tradeoff between them. The usual way to cope with this is to fix one and then find the optimum for the other, for which figuring out their relations is necessary. However, it is non-trivial to formulate the relations among all these factors and further to derive the related properties. Storage allocation thus remains an open and challenging issue in the research community.² In fact,

optimizing data redundancy makes more sense in real cloud systems because: (i) besides the redundancy scheme, such systems usually adopt many other mechanisms to guarantee data reliability; (ii) minimizing data redundancy brings a significant decrease in the cost of distributed systems, as presented before. For these reasons, we aim in this paper to pursue a storage allocation scheme that minimizes data redundancy while achieving a given (high) data reliability – this objective is in line with what Pamies-Juarez et al. work toward in [7,8]. In their method, Monte Carlo approximation is used to measure the data reliability, and a heuristic optimization algorithm then finds the best assignment. In heterogeneous settings, their results have shown that data redundancy can be reduced by up to 70% as compared to traditional equal-allocation schemes. Although appealing, their method is descriptive (using a Monte Carlo approximation) and cannot be used for generic cases. Our contributions are as follows: we report in this paper new results in the form of a novel model built on a variant of the generating function [9] and suggested for storage allocation in cloud storage systems. The advantage of the new model is that it achieves minimum redundancy for a given (high) reliability. We demonstrate that our method outperforms existing solutions, in particular the heuristic method recently proposed in the research community. The solution we propose is practical – with this model, we are able to quantitatively analyze the properties of optimal storage allocation – which allows us to reduce the search space and accelerate the computation. In the end, we not only analytically evaluate the reduction of the search space, but also experimentally assess the improvement in the savings of data redundancy using data sets collected from the real world.

The remainder of this paper is organized as follows: Section 2 illustrates the redundancy scheme and reviews previous work on storage allocation. The efficient solution we propose for storage allocation is then given in Section 3. In Section 4, we evaluate our method through analysis and experiments, adopting availability traces collected from the real world. We finally conclude the paper with a summary and future work in Section 5.

2. Background and related work

2.1. Redundancy scheme

In a cloud storage system, using a redundancy scheme to achieve data reliability is straightforward. To fight against failures, it distributes copies of data objects to a set of storage nodes. Redundancy schemes have been widely studied in the research community [10–12]. Generally speaking, redundancy schemes can be classified into two types: replication and erasure code. The replication scheme is simple and intuitive: it replicates each data block into n copies and then distributes them to different network nodes. By contrast, an erasure code (e.g., Reed–Solomon [13]) encodes k data blocks into (n − k) coded blocks, resulting in n blocks in total. These n blocks are then distributed to different nodes [14,15].

Conventionally, the data redundancy of an erasure code is defined as RY = n/k. It is clear that when k = 1, the erasure code turns into replication; thus, replication can be regarded as a special case of erasure code. When k > 1, an erasure code consumes less storage space than replication [16], meaning that an erasure code can improve the reliability with the same consumption of storage. Due to its flexibility, erasure code has become a promising solution for next-generation cloud storage systems and has been widely discussed recently [17–19]. Some practical cloud storage systems have started to partly deploy erasure codes, such as GFS [2] and HDFS [20]. Some other cloud backup systems have fully deployed erasure codes, including Wuala [21] and CleverSafe [22]. Note that we mainly focus on erasure codes in this paper when discussing cloud storage allocation; however, our method can be easily extended to the case of the replication scheme.

2.2. Storage allocation problem

Storage allocation problems arise from the emergence of coding schemes that store more than one redundant block on the same storage node, such as the Regenerating Code [23] and ER-Hierarchical Code [24], where the repair cost for lost data can be reduced. In this case, the computation of data reliability can be very complex and hard to analyze. The storage allocation problem aims to obtain the optimal allocation scheme for a set of nodes, under storage and reliability constraints. The optimal scheme here can mean achieving either maximum reliability or minimum redundancy.

Many works [7,8,25–27] have been devoted to the cloud storage allocation issue. Leong et al. [25–27] have pointed out many problems in storage allocation and have addressed some of them in theory. They mainly focus on finding the maximum data reliability with a fixed redundancy. However, minimizing data redundancy appears to be more meaningful in cloud storage systems, as we argued in Section 1. Pamies-Juarez et al. [7,8] have paid much attention to finding the minimum data redundancy for a given data reliability. Their objective is similar to ours but pursued with a heuristic method, and the complexity of their model hampers the analysis of the allocation properties. In addition, different from many allocation schemes which usually fix k first and then tune n for the purpose of optimization, their method is designed to find the maximum k with fixed n – which therefore has its limitations. We incorporate both cases in our suggested scheme, and further prove that our scheme is optimal.

Note that the reliability of the nodes highly affects the data reliability of the whole system. Although some systems are supposed to be homogeneous – all nodes have the same or similar features – real systems in most cases are heterogeneous and difficult to characterize. Therefore, different combinations of nodes, even with the same number of nodes, may result in distinct data reliability for the system. Moreover, with the advance of network coding, more than one redundant block can be stored on one node – which brings a high variety of node combinations and thus increases the complexity of the analysis. For all these reasons, characterizing and analyzing the relations related to data reliability becomes a big challenge. In this paper, we first propose a novel model based on the generating function – which allows us to efficiently analyze the storage properties – and then we provide a way to simplify the problem via the proofs of several theorems.

2 See http://storagewiki.ece.utexas.edu/doku.php?id=wiki:open_problems.

2.3. The generating function for storage allocation

The core of the storage allocation issue lies in leveraging the relationship between the minimization of data redundancy and the maximization of data reliability. In general, there are two kinds of methods, targeting either the optimization of data redundancy or the optimization of data reliability. For either objective, the common difficulty is how to represent the data reliability and, furthermore, how to analyze it [7,8,27]. In this paper we use a variant of the generating function for the purpose of storage allocation, which features the following properties: (i) it is simple to represent, as the generating function can be obtained by the multiplication of polynomials, and the expansion is sufficient for calculation; (ii) it is easy to derive the related properties by factorization and analysis thanks to its simple form of representation; (iii) it brings convenience when the coefficient of each term in its expanded form is exactly what we need; (iv) as the number of polynomials increases, the computation of their multiplication becomes more complex. For the last feature, we provide a method in Section 4.1 to simplify it via Theorems 4 and 5. We here take the example of ‘‘calibration weights’’ to explain the concept of the generating function and its applications.

The weight problem: There are three kinds of weights – 1 g, 2 g and 5 g – and the number of each kind is infinite. The question is: how many combinations of the three kinds of weights are there such that the total weight is exactly 10 g?

To solve this problem, we have the following two different methods:

Method 1: enumeration by class. We first find all the possible combinations by class: weights of 1 g only; weights of 2 g only; weights of 5 g only; weights of 1 g and 2 g; weights of 1 g and 5 g; weights of 2 g and 5 g; weights of 1 g, 2 g and 5 g – seven combinations in total. Then, for each combination, we analyze all the possibilities. For example, using weights of 1 g and 2 g, we obtain four solutions: (a) 10 = 1+1+1+1+1+1+1+1+2, (b) 10 = 1+1+1+1+1+1+2+2, (c) 10 = 1+1+1+1+2+2+2, (d) 10 = 1+1+2+2+2+2. By repeating the above step, we finally get the total number of solutions, that is 1+1+1+4+1+0+2 = 10.

Method 2: the generating function. With the 1 g weights alone, the achievable totals are 1 g, 2 g, ..., 10 g, so we introduce the factor (1 + x^1 + x^2 + ... + x^10); similarly, we introduce the factor (1 + x^2 + x^4 + ... + x^10) for the 2 g weights, and the factor (1 + x^5 + x^10) for the 5 g weights. By multiplying all the factors, we obtain the generating function f(x):

f(x) = (1 + x^1 + x^2 + ... + x^10)(1 + x^2 + x^4 + ... + x^10)(1 + x^5 + x^10)
     = 1 + x + 2x^2 + 2x^3 + 3x^4 + 4x^5 + 5x^6 + 6x^7 + 7x^8 + 8x^9 + 10x^10 + 10x^11 + 11x^12 + 11x^13 + 12x^14 + 12x^15 + 12x^16 + 11x^17 + 11x^18 + 10x^19 + 10x^20 + 8x^21 + 7x^22 + 6x^23 + 5x^24 + 4x^25 + 3x^26 + 2x^27 + 2x^28 + x^29 + x^30    (1)

From the expansion of the generating function f(x) (given in Eq. (1)), we know that the coefficient of the power with exponent 10 is the number of solutions we are looking for, i.e. there are 10 possibilities in which the total weight is exactly 10 g. In fact, the generating function makes sense because, when multiplying the factors, x^i in (1 + x^1 + x^2 + ... + x^10), x^j in (1 + x^2 + x^4 + ... + x^10) and x^k in (1 + x^5 + x^10) produce the term x^{i+j+k} in the final expansion. Furthermore, i + j + k is the total weight that the combination of {1 g, 2 g, 5 g} weights represents, so the coefficient of x^{i+j+k} is the total number of solutions. As we will show later in Section 3.1, we extend the coefficient of each term in the polynomial from integer to decimal, introducing a variant of the generating function – which is then used to model the data reliability.
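The weight example above can be checked mechanically: multiplying the three factors of Eq. (1) as coefficient lists and reading off the coefficient of x^10 reproduces the answer 10. The following is a minimal sketch (not from the paper), with coefficients stored as lists indexed by exponent:

```python
# Sketch: computing the weight-problem answer by multiplying the three
# polynomial factors of Eq. (1). Coefficient lists are indexed by exponent.
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out

factor_1g = [1] * 11                                     # 1 + x + ... + x^10
factor_2g = [1 if e % 2 == 0 else 0 for e in range(11)]  # 1 + x^2 + ... + x^10
factor_5g = [1 if e % 5 == 0 else 0 for e in range(11)]  # 1 + x^5 + x^10

f = poly_mul(poly_mul(factor_1g, factor_2g), factor_5g)
print(f[10])  # coefficient of x^10 -> 10 combinations
```

The full list `f` matches the expansion printed in Eq. (1), including its symmetry (the coefficients read the same backwards).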

3. Storage allocation scheme

A storage allocation scheme gives the way redundant data blocks are distributed and stored over a set of network nodes. To store one file with a set of n redundant blocks, we allocate these blocks to a given set of nodes S = {N_1, ..., N_|S|}, in which the total number of nodes is |S|. For the sake of discussion, the redundancy scheme is denoted as RS(n, k), where n is the total number of redundant blocks and k indicates that any subset of k redundant blocks is sufficient to recover the original file. For node N_i, its reliability is r_i and it stores l_i blocks, where i ∈ {1, ..., |S|} – note that r_i means that with this probability all blocks that N_i hosts are usable for data recovery, and vice versa. Then we have n = Σ_{i=1}^{|S|} l_i, and the storage allocation scheme is mapped into a set of partitions SA = {l_1, ..., l_|S|}. We now illustrate the problem with a simple example. As depicted in Fig. 1(a), the original file with k original blocks is encoded into five redundant blocks in total (in this case, n = 5), where the original file can be reconstructed using any subset of k redundant blocks. How can we allocate these five redundant blocks to a given set of three storage nodes whose reliabilities are r_1, r_2 and r_3 respectively? Two possible storage allocation schemes for this example are given in Fig. 1(b). Then the question is: which one is better, and why?

3.1. Modeling the data reliability

Before deriving the model based on the generating function, we first take the scheme SA_1 in Fig. 1(b) as an example to illustrate: there are three nodes N_1, N_2 and N_3, with reliability probabilities r_1, r_2 and r_3 respectively. Nodes N_1 and N_2 each host two redundant blocks, whereas N_3 hosts only one. We define dr(p) as the probability that p blocks are reliable and the other 5 − p blocks are unreliable, where p ∈ {0, ..., 5}. We list all the results of dr(p) for SA_1 as follows:

dr(5) = r_1 r_2 r_3, where all three nodes N_1, N_2 and N_3 are reliable;
dr(4) = r_1 r_2 (1 − r_3), where nodes N_1 and N_2 are both reliable and node N_3 is unreliable;
dr(3) = r_1 (1 − r_2) r_3 + (1 − r_1) r_2 r_3, where node N_3 is reliable and one of the two nodes in {N_1, N_2} is unreliable;
dr(2) = r_1 (1 − r_2)(1 − r_3) + (1 − r_1) r_2 (1 − r_3), where node N_3 is unreliable and one of the two nodes in {N_1, N_2} is reliable;
dr(1) = (1 − r_1)(1 − r_2) r_3, where node N_3 is reliable and the other two nodes N_1 and N_2 are unreliable;
dr(0) = (1 − r_1)(1 − r_2)(1 − r_3), where all three nodes N_1, N_2 and N_3 are unreliable.

Then, the data reliability would be dr(5) + dr(4) + dr(3) if k = 3, since at least k blocks are needed for successful data recovery. This example shows that to calculate the data reliability, we have to consider all the possible combinations of blocks over nodes and calculate the corresponding dr(p) one by one. The computational cost is therefore fairly expensive, especially when the number of blocks is large [7,8]. To overcome this issue, we propose a generating function to formulate the data reliability.
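The brute-force calculation just described can be sketched as follows (not from the paper; the node reliabilities are hypothetical example values). It enumerates every reliable/unreliable pattern of the nodes of SA_1 and accumulates each pattern's probability into dr(p):

```python
# Sketch: brute-force computation of dr(p) for the allocation
# SA_1 = {2, 2, 1} by enumerating every subset of reliable nodes.
from itertools import product

blocks = [2, 2, 1]        # l_i: blocks hosted by N_1, N_2, N_3
rel = [0.9, 0.8, 0.7]     # r_i: hypothetical node reliabilities

dr = [0.0] * (sum(blocks) + 1)
for alive in product([0, 1], repeat=len(blocks)):
    # Probability of this exact reliable/unreliable pattern ...
    prob = 1.0
    for a, r in zip(alive, rel):
        prob *= r if a else (1 - r)
    # ... contributes to dr(p), where p is the number of usable blocks.
    p = sum(l for a, l in zip(alive, blocks) if a)
    dr[p] += prob

print(dr[5] + dr[4] + dr[3])  # data reliability for k = 3
```

The loop body runs 2^|S| times, which illustrates why this direct enumeration becomes expensive as the system grows.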

For the generic storage allocation, our generating function model is described as follows. For each node N_i, we introduce a factor GFNR(N_i(b)), which denotes that node N_i is reliable and provides l_i blocks with probability r_i, while node N_i is unreliable (provides no blocks) with probability 1 − r_i. The reliability GFDR(SA, b) can then be defined as a generating function, the multiplication of all factors GFNR(N_i(b)),³ which forms Eq. (2):

GFNR(N_i(b)) = r_i b^{l_i} + (1 − r_i) b^0
GFDR(SA, b) = Π_{i=1}^{|S|} [r_i b^{l_i} + (1 − r_i) b^0]    (2)

Fig. 1. An example for the storage allocation problem: (a) steps of storage allocation; (b) two schemes – which one is better?

Taking SA_1 of Fig. 1(b) as an example, we can obtain GFDR(SA, b) as follows:

GFDR(SA_1, b) = [r_1 b^2 + (1 − r_1) b^0][r_2 b^2 + (1 − r_2) b^0][r_3 b + (1 − r_3) b^0]
= r_1 r_2 r_3 b^5 + r_1 r_2 (1 − r_3) b^4 + [r_1 r_3 (1 − r_2) + r_2 r_3 (1 − r_1)] b^3 + [r_1 (1 − r_2)(1 − r_3) + r_2 (1 − r_1)(1 − r_3)] b^2 + r_3 (1 − r_1)(1 − r_2) b^1 + (1 − r_1)(1 − r_2)(1 − r_3) b^0

Note that the expanded form of GFDR(SA, b) is a polynomial where, in each term, the exponent of the power denotes the number of reliable blocks (with the same meaning as p above) and the coefficient is the sum of the probabilities of the cases in which p blocks are reliable and the rest are unreliable. This corresponds to dr(p) as calculated before. From this example, we can see that the coefficients are the sequences we are looking for. To formulate the coefficients of a polynomial – for example f(b) – we define [b^p] f(b) as the coefficient of the term b^p. Then, supposing that the polynomial GFDR(SA, b) is expanded in the form Σ_{p=0}^{n} com(p) b^p, we have com(p) = [b^p] GFDR(SA, b).

For generic erasure codes, data is reliable only if at least k blocks are reliable. Therefore, we can easily obtain the data reliability DR(SA, k) as a function of k, which forms Eq. (3), by considering the terms with p ≥ k in the expanded form of GFDR(SA, b):

DR(SA, k) = Σ_{p=k}^{n} com(p) = Σ_{p=k}^{n} [b^p] GFDR(SA, b)    (3)

Before we continue, we list the main symbols used throughout the paper in Table 1.
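Eqs. (2) and (3) translate directly into an incremental polynomial multiplication: each node's factor r_i b^{l_i} + (1 − r_i) b^0 is folded into the running coefficient list, and DR(SA, k) is the tail sum of the coefficients. A minimal sketch (not from the paper; reliabilities are hypothetical example values):

```python
# Sketch: computing DR(SA, k) from Eq. (2)-(3) by multiplying the
# per-node factors r_i b^{l_i} + (1 - r_i) b^0 as coefficient lists.
def gfdr(allocation, reliabilities):
    """Return the coefficients com(0..n) of GFDR(SA, b)."""
    coeffs = [1.0]                       # the constant polynomial 1
    for l_i, r_i in zip(allocation, reliabilities):
        nxt = [0.0] * (len(coeffs) + l_i)
        for p, c in enumerate(coeffs):
            nxt[p] += c * (1 - r_i)      # node unreliable: b^0 factor
            nxt[p + l_i] += c * r_i      # node reliable: b^{l_i} factor
        coeffs = nxt
    return coeffs

def dr(allocation, reliabilities, k):
    """DR(SA, k): sum of com(p) for p >= k, as in Eq. (3)."""
    return sum(gfdr(allocation, reliabilities)[k:])

print(dr([2, 2, 1], [0.9, 0.8, 0.7], 3))  # DR for SA_1 with k = 3
```

Unlike the 2^|S| enumeration of Section 3.1, this multiplication touches each coefficient once per node, which is the computational advantage the model is built on.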

3.2. Properties of the model

In order to find an optimal storage allocation scheme (OSA) for a given set of network nodes, we first generate a set of partitions SA = {l_1, ..., l_|S|} in which n blocks are divided into |S| parts; we then assign the partitions to the storage nodes, and finally find a scheme with minimum redundancy that achieves a given high reliability ADR, using a method specifically designed for this purpose. We use RS(n, k) to denote the redundancy scheme as before for the generic erasure code, where n is the total number of redundant blocks and k is the number of original blocks. As redundancy is defined as RY = n/k, to obtain its minimum value we can either find the maximum k with fixed n or find the minimum n with fixed k. Before introducing our algorithm, we analyze the properties of the model by proving several theorems and propositions, starting with the proof of a lemma. The details of the relations between these formalized proofs and the process of finding an optimal scheme are illustrated later in Section 3.3.

3.2.1. Upper-bound for allocation

Previous and recent studies [7,8] have explicitly demonstrated that the search space in which to find the optimal solution is usually huge – it thus becomes an important issue to address. Obviously, there exist many possibilities when allocating n data blocks to a set of nodes S = {N_1, ..., N_|S|}, in particular when n and |S| are both large – which is commonly observed in real cloud storage systems. Therefore, it makes much sense if we can bound the number of blocks assigned to each node within a suitably small range. We provide in this section the upper-bound for the allocation via Theorems 2 and 3. Before that, we introduce Lemma 1 and Theorem 1 – Lemma 1 provides the foundation for the proofs of the theorems, while Theorem 1 specifies the order issue for the allocation.

Lemma 1. Suppose that f(b) = Σ_{p=l_j}^{up+l_j} c_{p−l_j} b^p − Σ_{p=l_i}^{up+l_i} c_{p−l_i} b^p = (Σ_{p=0}^{up} c_p b^p)(b^{l_j} − b^{l_i}), where ∀ s, c_s ≥ 0. If l_j > l_i, then Σ_{p=k}^{n} [b^p] f(b) ≥ 0.

Proof. Let us analyze all the cases for l_j and l_i one by one:

(1) If l_j > l_i ≥ k, then we can obtain Eq. (4):

Σ_{p=k}^{n} [b^p] f(b) = Σ_{p=l_j}^{up+l_j} c_{p−l_j} − Σ_{p=l_i}^{up+l_i} c_{p−l_i} = Σ_{p=0}^{up} c_p − Σ_{p=0}^{up} c_p = 0    (4)

(2) If l_j ≥ k > l_i ≥ k − up, then up + l_i ≥ k; so we can obtain Eq. (5):

Σ_{p=k}^{n} [b^p] f(b) = Σ_{p=l_j}^{up+l_j} c_{p−l_j} − Σ_{p=k}^{up+l_i} c_{p−l_i} = Σ_{p=0}^{up} c_p − Σ_{p=k−l_i}^{up} c_p = Σ_{p=0}^{k−l_i−1} c_p ≥ 0    (5)

Table 1
Meaning of symbols.

Symbol | Meaning
n      | Number of blocks in total
k      | Number of sufficient blocks for recovery
RY     | Data redundancy, RY = n/k
|S|    | Total number of nodes
DR     | Data reliability
ADR    | Data reliability to achieve
N_i    | Node N_i, i ∈ {1, 2, ..., |S|}
S      | Set of nodes, S = {N_1, ..., N_|S|}
l_i    | Number of blocks assigned to node N_i
SA     | A storage allocation, SA = {l_1, ..., l_|S|}
r_i    | The reliability of node N_i
R      | Set of nodes' reliabilities, R = {r_1, ..., r_|S|}

3 In our discussion, we suppose that all nodes are independent of each other – meaning that whether a node is reliable or not has nothing to do with the others.


(3) If l_j ≥ k > k − up > l_i, then up + l_i < k; so we can obtain Eq. (6):

Σ_{p=k}^{n} [b^p] f(b) = Σ_{p=l_j}^{up+l_j} c_{p−l_j} ≥ 0    (6)

(4) If k > l_j > l_i ≥ k − up, then up + l_j > up + l_i ≥ k; so we can obtain Eq. (7):

Σ_{p=k}^{n} [b^p] f(b) = Σ_{p=k}^{up+l_j} c_{p−l_j} − Σ_{p=k}^{up+l_i} c_{p−l_i} = Σ_{p=k−l_j}^{up} c_p − Σ_{p=k−l_i}^{up} c_p = Σ_{p=k−l_j}^{k−l_i−1} c_p ≥ 0    (7)

(5) If k > l_j ≥ k − up > l_i, then up + l_j ≥ k > up + l_i; so we can obtain Eq. (8):

Σ_{p=k}^{n} [b^p] f(b) = Σ_{p=k}^{up+l_j} c_{p−l_j} ≥ 0    (8)

(6) If k − up > l_j > l_i, then k > up + l_j > up + l_i, so we can obtain Eq. (9):

Σ_{p=k}^{n} [b^p] f(b) = 0    (9)

Summing up all these cases, we confirm that Σ_{p=k}^{n} [b^p] f(b) ≥ 0, so Lemma 1 holds. □
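Lemma 1 can also be spot-checked numerically: for any non-negative coefficients c_p and any l_j > l_i, the tail coefficient sum of (Σ_p c_p b^p)(b^{l_j} − b^{l_i}) from p = k onward is non-negative. A small sketch (not from the paper; the coefficient list and parameter ranges are arbitrary example choices):

```python
# Sketch: a numerical spot-check of Lemma 1. For
# f(b) = (sum_p c_p b^p)(b^{l_j} - b^{l_i}) with c_p >= 0 and l_j > l_i,
# the tail coefficient sum from p = k onward should be non-negative.
from itertools import product

def tail_sum(c, l_j, l_i, k):
    """Sum of [b^p] f(b) for p >= k, with f as in Lemma 1."""
    n = len(c) - 1 + l_j
    coeff = [0.0] * (n + 1)
    for p, cp in enumerate(c):
        coeff[p + l_j] += cp         # the b^{l_j} part
        coeff[p + l_i] -= cp         # minus the b^{l_i} part
    return sum(coeff[k:])

# Exhaustive check over small arbitrary parameter ranges.
ok = all(
    tail_sum([1.0, 0.5, 2.0], l_j, l_i, k) >= -1e-12
    for l_j, l_i, k in product(range(6), range(6), range(8))
    if l_j > l_i
)
print(ok)  # True
```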

Theorem 1 (The correlated order of the optimal allocation). For a given allocation SA = {l_1, ..., l_|S|} assigned to a given set of nodes S = {N_1, ..., N_|S|} with reliability set R = {r_1, ..., r_|S|}, SA and R have the same order if the allocation is optimal, meaning that ∀ r_i > r_j, we have l_i > l_j.

Proof. Assume that for the optimal allocation SA = {l_1, ..., l_|S|}, there exist i, j ∈ {1, ..., |S|} such that r_i > r_j but l_i < l_j. Consider another allocation SA' = {l'_1, ..., l'_|S|}, where l'_i = l_j, l'_j = l_i and ∀ s ≠ i, j, l'_s = l_s.

Let F(b) = Π_{s=1, s≠i, s≠j}^{|S|} [r_s b^{l_s} + (1 − r_s) b^0]. Then we can obtain Eq. (10) in the following steps:

GFDR(SA, b) − GFDR(SA', b)
= F(b)[r_i b^{l_i} + (1 − r_i) b^0][r_j b^{l_j} + (1 − r_j) b^0] − F(b)[r_i b^{l'_i} + (1 − r_i) b^0][r_j b^{l'_j} + (1 − r_j) b^0]
= F(b)[r_i b^{l_i} + (1 − r_i) b^0][r_j b^{l_j} + (1 − r_j) b^0] − F(b)[r_i b^{l_j} + (1 − r_i) b^0][r_j b^{l_i} + (1 − r_j) b^0]
= F(b)[r_i r_j b^{l_i+l_j} + (1 − r_i) r_j b^{l_j} + r_i (1 − r_j) b^{l_i} + (1 − r_i)(1 − r_j) b^0] − F(b)[r_i r_j b^{l_i+l_j} + (1 − r_i) r_j b^{l_i} + r_i (1 − r_j) b^{l_j} + (1 − r_i)(1 − r_j) b^0]
= F(b){[(1 − r_i) r_j − r_i (1 − r_j)] b^{l_j} + [r_i (1 − r_j) − (1 − r_i) r_j] b^{l_i}}
= F(b)[(r_j − r_i) b^{l_j} + (r_i − r_j) b^{l_i}]
= F(b)(r_j − r_i)(b^{l_j} − b^{l_i})    (10)

Since Σ_{i=1}^{|S|} l_i = n, F(b) can be rewritten as F(b) = Σ_{p=0}^{n−l_i−l_j} c_p b^p with c_p ≥ 0. Hence Eq. (10) becomes:

GFDR(SA, b) − GFDR(SA', b) = (r_j − r_i)(Σ_{p=0}^{n−l_j−l_i} c_p b^p)(b^{l_j} − b^{l_i})    (11)

By Lemma 1 and Eq. (3) – note that r_j − r_i < 0 while the tail coefficient sum of the remaining product is non-negative – we know that DR(SA, k) − DR(SA', k) ≤ 0. So the allocation SA' is better than SA. This leads to a contradiction with the assumption. Thus Theorem 1 holds. □

For example, considering the scheme SA_2 given in Fig. 1(b), and a new scheme SA_3 in which nodes N_1, N_2 and N_3 store one, three and one blocks respectively, we have DR(SA_2) = 0.9 > DR(SA_3) = 0.85. Theorem 1 suggests that the larger the reliability a node possesses, the more data blocks should be assigned to and stored on it. Following this logic, is it feasible, in the extreme case, to give all the data blocks to the single most reliable node? In fact, the number of blocks that a node can host is upper-bounded. Theorems 2 and 3 provide the answers for this concern.

Theorem 2 (Upper-bound in SA for optimal allocation). For an optimal allocation SA = {l_1, ..., l_|S|} assigned to a set of nodes S = {N_1, ..., N_|S|} with a given k, then ∀ l_j ∈ SA, we have l_j ≤ k.

Proof. Assume that in the optimal allocation SA, ∃ l_j ∈ SA with l_j > k. Consider another allocation SA' = {l'_1, ..., l'_|S|} such that l'_j = l_j − 1 and ∀ s ≠ j, l'_s = l_s. Obviously, the redundancy of SA' is smaller than that of SA. Let F(b) = Π_{s=1, s≠j}^{|S|} [r_s b^{l_s} + (1 − r_s) b^0]; we can obtain Eq. (12):

GFDR(SA, b) − GFDR(SA', b)
= F(b)[r_j b^{l_j} + (1 − r_j) b^0] − F(b)[r_j b^{l'_j} + (1 − r_j) b^0]
= F(b)(r_j b^{l_j} − r_j b^{l'_j})
= F(b) r_j (b^{l_j} − b^{l_j−1})    (12)

Since l_j > k and l_j − 1 ≥ k, we can see that DR(SA, k) − DR(SA', k) = 0. This means that SA' uses smaller redundancy to achieve the same data reliability as SA. So the allocation SA' is better than SA. This leads to a contradiction with the assumption. Thus Theorem 2 holds. □

We use the example in Fig. 1(b) for demonstration. By enumeration, it is easy to see that the allocation scheme SA_1 is optimal, with DR(SA_1) = 0.941. Suppose that k = 3; then the maximum data reliability is 0.9 (< DR(SA_1)) if we assign more than 3 blocks to any single node. Theorem 2 therefore shows that each node can host at most k redundant data blocks in an optimal allocation. In fact, the original data/file can be reconstructed from any k redundant blocks by definition – what we call ‘‘sufficiency’’. Therefore, if k is quite large, in particular when k approaches n, the upper-bound in SA makes less sense for the allocation issue. As such, it becomes necessary to further restrict the upper-bound of the elements in SA.
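The intuition behind Theorem 2 can be illustrated numerically: any blocks beyond k on a single node never change DR(SA, k), so they are pure waste. A sketch (not from the paper; reliabilities are hypothetical example values):

```python
# Sketch: illustrating Theorem 2. Trimming any node's share down to k
# blocks leaves DR(SA, k) unchanged, so l_j > k only wastes redundancy.
def dr_of(allocation, reliabilities, k):
    """DR(SA, k) computed via the generating function of Eq. (2)-(3)."""
    coeffs = [1.0]
    for l, r in zip(allocation, reliabilities):
        nxt = [0.0] * (len(coeffs) + l)
        for p, c in enumerate(coeffs):
            nxt[p] += c * (1 - r)
            nxt[p + l] += c * r
        coeffs = nxt
    return sum(coeffs[k:])

k = 3
rel = [0.9, 0.8, 0.7]
oversized = [5, 2, 1]                     # l_1 = 5 > k
trimmed = [min(l, k) for l in oversized]  # -> [3, 2, 1]
print(abs(dr_of(oversized, rel, k) - dr_of(trimmed, rel, k)) < 1e-12)  # True
```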

Theorem 3 (Upper-bound in SA for optimal allocation). For an optimal allocation SA = {l_1, ..., l_|S|} assigned to a set of nodes S = {N_1, ..., N_|S|} with a given k, SA should either (i) have the property that ∀ l_j ∈ SA, l_j ≤ n − k; or (ii) assign k blocks to node N_m (the most reliable node, i.e. r_m = max{r_1, ..., r_|S|}) and zero blocks to the other nodes.

Proof. Consider an inverse proposition of case (i), we

assume that 9l

j

;l

i

2SA, where l

j

>nk.ByTheorem 1,

we have l

m

Pl

j

>nksince r

m

Pr

j

. We construct anoth-

er allocation SA

0

¼l

0

1

;...;l

0

jSj

no

, where l

0

i

¼l

i

1;l

0

m

¼l

m

þ1

and

8

s–i;s–m;l

0

s

¼l

s

. Let FðbÞ¼Q

jSj

s¼1;s–i;s–m

r

s

b

l

s

þð1r

s

Þb

0

, then we can obtain Eq. (13):

GFDR(SA, b) − GFDR(SA′, b)
  = F(b) [r_i b^{l_i} + (1 − r_i) b^0] [r_m b^{l_m} + (1 − r_m) b^0] − F(b) [r_i b^{l′_i} + (1 − r_i) b^0] [r_m b^{l′_m} + (1 − r_m) b^0]
  = F(b) [r_i b^{l_i} + (1 − r_i) b^0] [r_m b^{l_m} + (1 − r_m) b^0] − F(b) [r_i b^{l_i − 1} + (1 − r_i) b^0] [r_m b^{l_m + 1} + (1 − r_m) b^0]
  = F(b) [r_i r_m b^{l_i + l_m} + (1 − r_i) r_m b^{l_m} + r_i (1 − r_m) b^{l_i} + (1 − r_i)(1 − r_m) b^0]
    − F(b) [r_i r_m b^{l_i + l_m} + (1 − r_i) r_m b^{l_m + 1} + r_i (1 − r_m) b^{l_i − 1} + (1 − r_i)(1 − r_m) b^0]
  = F(b) [(1 − r_i) r_m (b^{l_m} − b^{l_m + 1}) + (1 − r_m) r_i (b^{l_i} − b^{l_i − 1})]    (13)

Since ∑_{t=1}^{|S|} l_t = n, F(b) can then be rewritten as F(b) = ∑_{p=0}^{n − l_i − l_m} c_p b^p, with c_p ≥ 0 for all p. Based on this, Eq. (13) can be rewritten as Eq. (14) in the following steps:

GFDR(SA, b) − GFDR(SA′, b)
  = F(b) [(1 − r_i) r_m (b^{l_m} − b^{l_m + 1}) + (1 − r_m) r_i (b^{l_i} − b^{l_i − 1})]
  = F(b) (1 − r_i) r_m (b^{l_m} − b^{l_m + 1}) + F(b) (1 − r_m) r_i (b^{l_i} − b^{l_i − 1})
  = (1 − r_i) r_m (∑_{p=0}^{n − l_i − l_m} c_p b^p) (b^{l_m} − b^{l_m + 1}) + (1 − r_m) r_i (∑_{p=0}^{n − l_i − l_m} c_p b^p) (b^{l_i} − b^{l_i − 1})
  = (1 − r_i) r_m ∑_{p=0}^{n − l_i − l_m} c_p (b^{p + l_m} − b^{p + l_m + 1}) + (1 − r_m) r_i ∑_{p=0}^{n − l_i − l_m} c_p (b^{p + l_i} − b^{p + l_i − 1})
  = (1 − r_i) r_m ∑_{p=0}^{n − l_i − l_m} c_p (b^{p + l_m} − b^{p + l_m + 1}) + (1 − r_m) r_i (∑_{p=l_i}^{n − l_m} c_{p − l_i} b^p − ∑_{p=l_i − 1}^{n − l_m − 1} c_{p − l_i + 1} b^p)    (14)

Since l_m > n − k, we have k > n − l_m > n − l_m − 1. As only the terms whose exponent is at least k contribute to the data reliability, and every exponent in the second term is at most n − l_m < k, we can abandon the second term. For the first term, (1 − r_i) r_m ∑_{p=0}^{n − l_i − l_m} c_p (b^{p + l_m} − b^{p + l_m + 1}), we can obtain DR(SA, k) − DR(SA′, k) ≤ 0 by Lemma 1. So SA′ is better than SA, which leads to a contradiction with the assumption. Moreover, for SA′, all nodes in S∖N_m hold less than k blocks in total, so the data reliability completely depends on the reliability of N_m. We then have to satisfy case (ii) by assigning k blocks to N_m and zero blocks to the other nodes, which allows us to use a minimum redundancy with a maximum data reliability r_m. So Theorem 3 holds. □

For the example given in Fig. 1(b), we know that SA_1 is optimal and DR(SA_1) = 0.941. We then observe that ∀ l_j ∈ SA_1, l_j ≤ n − k = 2. By Theorem 3, if the maximum data reliability r_m among all nodes (for distributed storage) is smaller than the given (global) data reliability ADR, then a node is assigned at most (n − k) redundant data blocks.

3.2.2. Improvement for computation

During the computation of the data reliability, the number of terms doubles after each multiplication. It therefore becomes necessary to reduce the number of terms, so as to decrease the complexity and accelerate the whole calculation. We address this issue through Theorems 4 and 5.

Theorem 4 [The sum of the coefficients equals 1]. ∀ s ∈ {1, …, |S|}, suppose that ∏_{i=1}^{s} GFNR(N_i(b)) can be expanded as ∑_{p=0}^{sums} c_p b^p, where sums = ∑_{i=1}^{s} l_i and ∀ p, c_p ≥ 0; then ∑_{p=0}^{sums} c_p = 1.

Proof. According to Eq. (2), we have ∀ i, GFNR(N_i(1)) = 1. To obtain the sum of the coefficients in the expanded form of the polynomial ∏_{i=1}^{s} GFNR(N_i(b)), we use the substitution b = 1. By Eq. (2), we can obtain Eq. (15):

∑_{p=0}^{sums} c_p = ∑_{p=0}^{sums} [b^p] (∏_{i=1}^{s} GFNR(N_i(b))) = ∏_{i=1}^{s} GFNR(N_i(1)) = 1    (15)

Therefore, Theorem 4 holds. The enumeration method for dr(p) in Section 3.1 gives an example of this theorem. □

Theorem 5 [To simplify the multiplication of polynomials]. ∀ s ∈ {1, …, |S| − 1}, suppose that ∏_{i=1}^{s} GFNR(N_i(b)) can be expanded as ∑_{p=0}^{sums} c_p b^p, where sums = ∑_{i=1}^{s} l_i and ∀ p, c_p ≥ 0; then

∑_{p=0}^{k−1} [b^p] ((∑_{p=0}^{k−1} c_p b^p) · GFNR(N_{s+1}(b))) = ∑_{p=0}^{k−1} [b^p] (∏_{i=1}^{s+1} GFNR(N_i(b))).

Proof. Since GFNR(N_{s+1}(b)) = r_{s+1} b^{l_{s+1}} + (1 − r_{s+1}), we obtain Eqs. (16) and (17):

(∑_{p=0}^{k−1} c_p b^p) · GFNR(N_{s+1}(b)) = (∑_{p=0}^{k−1} c_p b^p) [r_{s+1} b^{l_{s+1}} + (1 − r_{s+1})]
  = r_{s+1} ∑_{p=0}^{k−1} c_p b^{p + l_{s+1}} + (1 − r_{s+1}) ∑_{p=0}^{k−1} c_p b^p    (16)

∏_{i=1}^{s+1} GFNR(N_i(b)) = (∑_{p=0}^{sums} c_p b^p) [r_{s+1} b^{l_{s+1}} + (1 − r_{s+1})]
  = r_{s+1} ∑_{p=0}^{sums} c_p b^{p + l_{s+1}} + (1 − r_{s+1}) ∑_{p=0}^{sums} c_p b^p    (17)

We discuss two cases for l_{s+1} as follows.

If l_{s+1} < k, by Eq. (16), we can obtain Eq. (18):

∑_{p=0}^{k−1} [b^p] ((∑_{p=0}^{k−1} c_p b^p) · GFNR(N_{s+1}(b))) = r_{s+1} ∑_{p=0}^{k−1−l_{s+1}} c_p + (1 − r_{s+1}) ∑_{p=0}^{k−1} c_p    (18)

By Eq. (17), we can obtain Eq. (19):

∑_{p=0}^{k−1} [b^p] (∏_{i=1}^{s+1} GFNR(N_i(b))) = r_{s+1} ∑_{p=0}^{k−1−l_{s+1}} c_p + (1 − r_{s+1}) ∑_{p=0}^{k−1} c_p    (19)

Using Eqs. (18) and (19), we know that Theorem 5 holds when l_{s+1} < k.

If l_{s+1} ≥ k, by Eq. (16), we can obtain Eq. (20):

∑_{p=0}^{k−1} [b^p] ((∑_{p=0}^{k−1} c_p b^p) · GFNR(N_{s+1}(b))) = (1 − r_{s+1}) ∑_{p=0}^{k−1} c_p    (20)

By Eq. (17), we can obtain Eq. (21):

∑_{p=0}^{k−1} [b^p] (∏_{i=1}^{s+1} GFNR(N_i(b))) = (1 − r_{s+1}) ∑_{p=0}^{k−1} c_p    (21)

Using Eqs. (20) and (21), we know that Theorem 5 holds when l_{s+1} ≥ k. Summing up the two cases, we confirm that Theorem 5 holds. □

From Theorems 4 and 5, we know that at each multiplication one only needs to keep the terms whose exponent is smaller than k. Finally, the data reliability is obtained by subtracting the sum of the coefficients of the resulting polynomial from 1. We use the earlier example, i.e. the allocation scheme SA_1 in Fig. 1, to illustrate the improvement.

The 1st multiplication: [r_1 b^2 + (1 − r_1) b^0] [r_2 b^2 + (1 − r_2) b^0] = r_1 r_2 b^4 + [r_1 (1 − r_2) + r_2 (1 − r_1)] b^2 + (1 − r_1)(1 − r_2) b^0. We drop the term r_1 r_2 b^4 and keep [r_1 (1 − r_2) + r_2 (1 − r_1)] b^2 + (1 − r_1)(1 − r_2) b^0. We then continue and go to the next step.

The 2nd multiplication: {[r_1 (1 − r_2) + r_2 (1 − r_1)] b^2 + (1 − r_1)(1 − r_2) b^0} [r_3 b^1 + (1 − r_3) b^0] = [r_1 r_3 (1 − r_2) + r_2 r_3 (1 − r_1)] b^3 + [r_1 (1 − r_2)(1 − r_3) + r_2 (1 − r_1)(1 − r_3)] b^2 + (1 − r_1)(1 − r_2) r_3 b^1 + (1 − r_1)(1 − r_2)(1 − r_3) b^0. We throw away the term [r_1 r_3 (1 − r_2) + r_2 r_3 (1 − r_1)] b^3, and preserve the terms [r_1 (1 − r_2)(1 − r_3) + r_2 (1 − r_1)(1 − r_3)] b^2 + (1 − r_1)(1 − r_2) r_3 b^1 + (1 − r_1)(1 − r_2)(1 − r_3) b^0. We then continue and go to the next step.

After getting the coefficients, we subtract them from 1: DR(SA_1, 3) = 1 − [r_1 (1 − r_2)(1 − r_3) + r_2 (1 − r_1)(1 − r_3)] − (1 − r_1)(1 − r_2) r_3 − (1 − r_1)(1 − r_2)(1 − r_3). Referring to the results for dr(p) in Section 3.1, we have DR(SA_1, 3) = 1 − dr(2) − dr(1) − dr(0). In fact, by expanding all the dr(p)s, one gets the sum ∑_{p=0}^{5} dr(p) = 1. Hence, DR(SA_1, 3) = dr(5) + dr(4) + dr(3) – the result is verified to be correct.

From the above example, we see that at most k (k = 3) terms are carried into each multiplication. With this strategy, the computation is efficiently simplified and the complexity is significantly improved.
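The truncation strategy of Theorems 4 and 5 can be sketched in code. The following Python function is an illustrative sketch of our own (the function name and the node reliabilities used below are assumptions, not taken from the paper): it multiplies the per-node generating functions while keeping only the coefficients of b^0, …, b^{k−1}, exactly as in the worked example above.

```python
def truncated_reliability(alloc, rel, k):
    """Compute DR(SA, k) = P(at least k of the n blocks are available).

    alloc[i] blocks are stored on node i, which is up with probability
    rel[i].  Following Theorems 4 and 5, after each multiplication only
    the coefficients of b^0 .. b^(k-1) are kept; their final sum is the
    probability that fewer than k blocks survive, so DR = 1 - sum.
    """
    coeffs = [1.0] + [0.0] * (k - 1)   # truncated polynomial in b
    for l, r in zip(alloc, rel):
        nxt = [0.0] * k
        for p, c in enumerate(coeffs):
            if c == 0.0:
                continue
            nxt[p] += (1.0 - r) * c    # node down: contributes b^0
            if p + l < k:              # node up: contributes b^l;
                nxt[p + l] += r * c    # exponents >= k are dropped (Theorem 5)
        coeffs = nxt
    return 1.0 - sum(coeffs)           # Theorem 4: total coefficient mass is 1
```

For the allocation SA_1 = {2, 2, 1} with k = 3 and illustrative node reliabilities (0.9, 0.8, 0.7), `truncated_reliability([2, 2, 1], [0.9, 0.8, 0.7], 3)` returns DR ≈ 0.902, and at every step the polynomial carries at most k coefficients.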

3.2.3. Stop condition for the searching process

For a given n, increasing k is equivalent to decreasing the data redundancy RY, as RY = n/k; accordingly, the data reliability degenerates, and vice versa. Therefore, the data reliability, as well as the data redundancy, depends on two factors: n and k. Hereafter, we discuss the stop condition for the search in two cases: (i) finding a maximum k with fixed n, in which Proposition 3 is used (note that Theorem 6 and Propositions 1 and 2 are prepared in advance for Proposition 3); (ii) finding a minimum n with fixed k, where Proposition 5 is adopted (Proposition 4 provides the basis for Proposition 5).

Theorem 6 [The monotonicity of the data reliability]. When n is fixed, the data reliability monotonically decreases as k increases.

Proof. As mentioned before, the data is reliable only if at least k blocks are available. Suppose that k_1 > k_2; by Eq. (3), we can obtain Eq. (22) as follows:

DR(SA, k_1) − DR(SA, k_2) = ∑_{p=k_1}^{n} [b^p] GFDR(SA, b) − ∑_{p=k_2}^{n} [b^p] GFDR(SA, b) = −∑_{p=k_2}^{k_1 − 1} [b^p] GFDR(SA, b) ≤ 0    (22)

As each term [b^p] GFDR(SA, b) in the expanded form is larger than or equal to 0, we have ∑_{p=k_2}^{k_1 − 1} [b^p] GFDR(SA, b) ≥ 0. We then have DR(SA, k_1) ≤ DR(SA, k_2). Hence, Theorem 6 holds. For the scheme SA_1 given in Fig. 1(b), we have: (i) if k = 5, then DR(SA_1, 5) = 0.612; (ii) if k = 4, then DR(SA_1, 4) = 0.612 + 0.153 = 0.765; (iii) if k = 3, then DR(SA_1, 3) = 0.941. Note that Theorem 6 is similar to Lemma 2 in [7]. □
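Theorem 6 can also be checked numerically by brute-force enumeration over node up/down states. The sketch below is illustrative only (the node reliabilities are our own assumption, not the values behind Fig. 1): it computes DR(SA, k) exactly and exhibits the monotone decrease in k.

```python
from itertools import product

def dr_exact(alloc, rel, k):
    """Exact DR(SA, k): sum the probability of every node up/down
    pattern that leaves at least k blocks available."""
    total = 0.0
    for up in product([0, 1], repeat=len(alloc)):
        prob = 1.0
        for u, r in zip(up, rel):
            prob *= r if u else 1.0 - r
        if sum(l for u, l in zip(up, alloc) if u) >= k:
            total += prob
    return total

# n = 5 blocks fixed on 3 nodes; DR shrinks as k grows
vals = [dr_exact([2, 2, 1], [0.9, 0.8, 0.7], k) for k in (3, 4, 5)]
```

Here `vals` is non-increasing, matching the monotonicity stated by Theorem 6.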

Proposition 1. Given the total number of redundant blocks n and a data reliability ADR to achieve, if for k″ = k′ + 1 there exists SA′ with max(SA′) ≤ k″ and DR(SA′, k″) ≥ ADR, then there exists SA with max(SA) ≤ k′ and DR(SA, k′) ≥ ADR.

Proof. We consider two cases for max(SA′) and k″:

1) If max(SA′) < k″, we set SA = SA′. By Theorem 6, we can further obtain that DR(SA, k′) ≥ DR(SA′, k″) ≥ ADR, since k″ > k′.

2) If max(SA′) = k″, suppose that SA′ is in descending order; then l′_1 ≥ l′_2 ≥ … ≥ l′_{|S|}. Since max(SA′) = k″, ∃ q such that l′_1 = … = l′_q = k″ and l′_{q+1} < k″. We consider another allocation SA* in descending order as well, where l*_1 = … = l*_q = k″ − 1 = k′ and ∀ s ∈ [q + 1, |S|], l*_s = l′_s. We suppose that ∏_{s=q+1}^{|S|} [r_s b^{l′_s} + (1 − r_s)] = ∑_{p=0}^{suml} c_p b^p, where suml = ∑_{i=q+1}^{|S|} l′_i and ∀ p, c_p ≥ 0. We also have ∏_{s=1}^{q} [r_s b^{k″} + (1 − r_s)] = ∑_{i=1}^{q} c′_i b^{i k″} + ∏_{s=1}^{q} (1 − r_s), where ∀ i, c′_i ≥ 0. Similarly to the case of k″, we have ∏_{s=1}^{q} [r_s b^{k′} + (1 − r_s)] = ∑_{i=1}^{q} c′_i b^{i k′} + ∏_{s=1}^{q} (1 − r_s) for the case of k′. Then we can obtain Eqs. (23) and (24):

GFDR(SA′, b) = ∏_{s=1}^{|S|} [r_s b^{l′_s} + (1 − r_s)]
  = ∏_{s=1}^{q} [r_s b^{k″} + (1 − r_s)] · ∏_{s=q+1}^{|S|} [r_s b^{l′_s} + (1 − r_s)]
  = [∑_{i=1}^{q} c′_i b^{i k″} + ∏_{s=1}^{q} (1 − r_s)] ∑_{p=0}^{suml} c_p b^p
  = ∑_{i=1}^{q} c′_i b^{i k″} ∑_{p=0}^{suml} c_p b^p + ∏_{s=1}^{q} (1 − r_s) ∑_{p=0}^{suml} c_p b^p    (23)

GFDR(SA*, b) = ∏_{s=1}^{|S|} [r_s b^{l*_s} + (1 − r_s)]
  = ∏_{s=1}^{q} [r_s b^{k′} + (1 − r_s)] · ∏_{s=q+1}^{|S|} [r_s b^{l′_s} + (1 − r_s)]
  = [∑_{i=1}^{q} c′_i b^{i k′} + ∏_{s=1}^{q} (1 − r_s)] ∑_{p=0}^{suml} c_p b^p
  = ∑_{i=1}^{q} c′_i b^{i k′} ∑_{p=0}^{suml} c_p b^p + ∏_{s=1}^{q} (1 − r_s) ∑_{p=0}^{suml} c_p b^p    (24)

We can therefore obtain Eq. (25) as follows:

DR(SA′, k″) − DR(SA*, k′) = [∑_{i=1}^{q} c′_i ∑_{p=0}^{suml} c_p + ∏_{s=1}^{q} (1 − r_s) ∑_{p=k″}^{suml} c_p] − [∑_{i=1}^{q} c′_i ∑_{p=0}^{suml} c_p + ∏_{s=1}^{q} (1 − r_s) ∑_{p=k′}^{suml} c_p] = −∏_{s=1}^{q} (1 − r_s) c_{k′} ≤ 0    (25)

since k″ − 1 = k′.

In the derivation of Eq. (25), we have assumed that suml ≥ k″. Actually, if suml = k′ < k″, the result also holds; and if suml < k′ < k″, then c_{k′} = 0, so the result still holds.

Given that DR(SA′, k″) ≥ ADR, we have DR(SA*, k′) ≥ DR(SA′, k″) ≥ ADR. Furthermore, we consider another allocation SA, where ∀ s ∈ [1, |S| − 1], l_s = l*_s and l_{|S|} = l*_{|S|} + q. Suppose that ∏_{i=1}^{|S|−1} [r_i b^{l_i} + (1 − r_i)] = ∑_{p=0}^{sumls} c_p b^p, where sumls = ∑_{i=1}^{|S|−1} l_i and ∀ p, c_p ≥ 0; then we can obtain Eq. (26) as follows:

GFDR(SA, b) − GFDR(SA*, b) = r_{|S|} (∑_{p=0}^{sumls} c_p b^p) (b^{l_{|S|}} − b^{l*_{|S|}})    (26)

By Lemma 1, we can see that DR(SA, k′) − DR(SA*, k′) ≥ 0 since l_{|S|} > l*_{|S|}. Then we have DR(SA, k′) ≥ DR(SA*, k′) ≥ DR(SA′, k″) ≥ ADR. Thus DR(SA, k′) ≥ ADR.

Summing up case (1) and case (2), Proposition 1 holds. □

By Proposition 1, we can quickly obtain the contrapositive proposition:

Proposition 2. Given the total number of redundant blocks n and a data reliability ADR to achieve, if ∃ k′ such that ∀ SA with max(SA) ≤ k′, DR(SA, k′) < ADR, then for k″ = k′ + 1 and ∀ SA′ with max(SA′) ≤ k″, DR(SA′, k″) < ADR.

Since k″ = k′ + 1, by iteratively applying Proposition 2 we can easily obtain the following proposition:

Proposition 3 [Stop condition for the searching process with fixed n]. Given the total number of redundant blocks n and a data reliability ADR to achieve, if ∃ k′ such that ∀ SA with max(SA) ≤ k′, DR(SA, k′) < ADR, then ∀ k″ > k′ and ∀ SA′ with max(SA′) ≤ k″, DR(SA′, k″) < ADR.

From Proposition 3, we know that if, for a certain k′, no allocation scheme can reach the given data reliability ADR, then there is no need to continue searching over larger k″; i.e., we stop searching.

Proposition 4. Given k (meaning that k blocks are sufficient for the recovery) and a data reliability ADR to achieve, if ∃ n′ and ∃ SA with DR(SA, k) ≥ ADR, then ∀ n″ > n′, ∃ SA′ with DR(SA′, k) ≥ ADR.

Proof. Assume that the allocation SA is organized as SA = {l_1, …, l_{|S|}}. ∀ n″ > n′, let us consider another allocation SA′ = {l′_1, …, l′_{|S|}} such that l′_j = l_j + n″ − n′ and ∀ s ≠ j, l′_s = l_s. Let F(b) = ∏_{s=1, s≠j}^{|S|} [r_s b^{l_s} + (1 − r_s) b^0]; then we can obtain Eq. (27):

GFDR(SA, b) − GFDR(SA′, b) = F(b) [r_j b^{l_j} + (1 − r_j) b^0] − F(b) [r_j b^{l′_j} + (1 − r_j) b^0] = F(b) r_j (b^{l_j} − b^{l′_j})    (27)

Since l′_j = l_j + n″ − n′ > l_j, we have DR(SA, k) ≤ DR(SA′, k) by Lemma 1. So DR(SA′, k) ≥ ADR. Thus Proposition 4 holds. □

Then we can obtain the following contrapositive proposition:

Proposition 5 [Stop condition for the searching process with fixed k]. Given k (meaning that k blocks are sufficient for the recovery) and a data reliability ADR to achieve, if there exists n″ such that every allocation SA′ of n″ blocks has DR(SA′, k) < ADR, then for every n′ < n″, every allocation SA of n′ blocks has DR(SA, k) < ADR.

3.3. Finding the optimal storage allocation

By Eqs. (2) and (3), we list the objective and the constraints to formulate the optimization problem, leading to Eq. (28):

obj: minimize n/k
subject to: ∑_{i=1}^{|S|} l_i = n,  DR(SA, k) ≥ ADR    (28)

Here, ADR is a (given) high data reliability to achieve, for instance 0.9999. In the process of finding the optimal storage allocation, we can either find a maximum k with fixed n or find a minimum n with fixed k, both of which allow us to obtain a minimum redundancy RY = n/k. For these two cases, we use different solutions:

(1) Finding a maximum k with fixed n. To do so, we proceed in three phases: (i) partition the n blocks into |S| parts, obtaining an allocation SA; (ii) assign the allocation SA to the set of nodes S; (iii) find the maximum k that reaches the requirement on data reliability. We analyze the phases one by one:

- Initialize k′: here, we adopt the method in [7] to generate k′, which assigns ⌈(r_i / ∑_{i=1}^{|S|} r_i) · n⌉ redundant blocks to a general node N_i, where ⌈·⌉ denotes the ceiling function. Then k′ is continuously refined by the subsequent steps, including the data reliability computation and the search for the maximum k.

- Partition: by Theorems 2 and 3, we can obtain the upper bound for each part of the partition, so we have ∀ i ∈ [1, …, |S|], 1 ≤ l_i ≤ min{k, n − k}. Moreover, for l_i > n − k, we have to check at the beginning whether the maximum reliability among the nodes is larger than ADR or not.

- Assignment: by Theorem 1, we should assign the partitioned parts to the storage nodes according to the order of the nodes' reliability, so the way to allocate is fixed for a given partition set.

- Reliability calculation: in this process, we can simplify the computation by using Theorems 4 and 5.

- Finding the maximum k: to find a maximum k, we increase k by the policy specified in Theorem 6, which allows us to quickly find a k that reaches the given ADR. As k increases, we can terminate the search as soon as ADR cannot be achieved by any allocation, according to Proposition 3.

In doing so, we can easily obtain an algorithm for the model from the above steps.
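The steps above can be sketched as follows. This is a brute-force illustration under our own naming (`max_k`, `reliability` and `partitions` are not from the paper) rather than the paper's optimized algorithm: it enumerates bounded partitions (Theorems 2 and 3), relies on the fixed reliability-ordered assignment (Theorem 1), computes DR with the truncated product (Theorems 4 and 5), and stops per Proposition 3.

```python
def reliability(alloc, rel, k):
    """DR(SA, k) via the truncated generating-function product:
    only coefficients of b^0 .. b^(k-1) are kept (Theorems 4 and 5)."""
    coeffs = [1.0] + [0.0] * (k - 1)
    for l, r in zip(alloc, rel):
        nxt = [0.0] * k
        for p, c in enumerate(coeffs):
            nxt[p] += (1.0 - r) * c            # node down
            if p + l < k:
                nxt[p + l] += r * c            # node up; exponents >= k dropped
        coeffs = nxt
    return 1.0 - sum(coeffs)

def partitions(n, parts, bound):
    """Non-increasing partitions of n into `parts` parts, each <= bound."""
    if parts == 1:
        if n <= bound:
            yield (n,)
        return
    for first in range(min(n, bound), -1, -1):
        if first * parts < n:
            break
        for rest in partitions(n - first, parts - 1, first):
            yield (first,) + rest

def max_k(n, rel, adr):
    """Largest k such that some allocation of n blocks reaches ADR."""
    rel = sorted(rel, reverse=True)            # Theorem 1: best node first
    if rel[0] >= adr:
        return n                               # Theorem 3(ii): one node suffices
    best = None
    k = -(-n // len(rel))                      # smallest k with k*|S| >= n
    while k < n:
        bound = min(k, n - k)                  # Theorems 2 and 3
        if any(reliability(sa, rel, k) >= adr
               for sa in partitions(n, len(rel), bound)):
            best = k                           # feasible; try a larger k
            k += 1
        else:
            break                              # Proposition 3: stop searching
    return best
```

For instance, with n = 5 blocks, three nodes of (assumed) reliability 0.9, 0.8, 0.7 and ADR = 0.95, the search accepts k = 2 (allocation (2, 2, 1), DR = 0.98) and stops at k = 3 (DR = 0.902 < ADR), so `max_k` returns 2.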

(2) Finding a minimum n with fixed k. In this case, we cannot generate partitions without a fixed n. The first step is therefore to generate an approximately optimal n; generating the partitions and then assigning the allocation to a given set of storage nodes follow. We analyze the phases one by one:

- Initialize n′: at the first step, we generate an approximately optimal n using a method similar to the one proposed in [7]. We allocate one redundant block to the node with the minimum reliability, denoted r_min. Then, for any storage node N_i, we allocate ⌈r_i / r_min⌉ redundant blocks. The total number of redundant blocks is thus n = ∑_{i=1}^{|S|} ⌈r_i / r_min⌉. We will later check whether this is the minimum value for n.

- Partition: by Theorems 2 and 3, we can obtain the upper bound for each part of the partition, so we have ∀ i ∈ {1, …, |S|}, 1 ≤ l_i ≤ min{k, n − k}. For l_i > n − k, we have to check at the beginning whether the maximum reliability among the nodes is larger than ADR or not. If it is, we assign k = n data blocks to the node with the maximum reliability, so that the minimum redundancy 1 is obtained in this case.

- Assignment: as illustrated above, the allocation step is fixed for a given partition set by Theorem 1, as the partitioned parts are assigned according to the order of the nodes' reliability.

- Reliability calculation: in this process, we can simplify the computation by using Theorems 4 and 5.

- Finding the minimum n: this step checks whether the reliability of all possible allocations is less than ADR. If it is, we increase n according to Proposition 5; if it is not, we decrease n and continue from the Partition step again.
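The initialization of n′ described above can be written compactly; the function name `initial_n` is ours, and the rule (one block for the least reliable node, ⌈r_i / r_min⌉ blocks for node N_i) follows the description, which adapts the method of [7].

```python
import math

def initial_n(rel):
    """Approximately optimal starting n: node i receives
    ceil(r_i / r_min) redundant blocks, so the least reliable
    node gets exactly one."""
    r_min = min(rel)
    return sum(math.ceil(r / r_min) for r in rel)
```

For reliabilities (0.9, 0.8, 0.7), `initial_n` returns 2 + 2 + 1 = 5; the subsequent search then verifies whether a smaller n already satisfies ADR.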

4. Performance evaluation

In this section, we evaluate the performance improvement brought by the proposed cloud storage allocation solution, which consists of the aforementioned theorems and propositions. We first analytically discuss the reduction of the search space and the simplification of the computation process. As an important complement, we also address the redundancy issue through experiments with real traces, which confirm our expectation that, for a given data reliability, the improvement on data redundancy obtained by our solution is significant.

4.1. Analytical evaluation of the search space

The advantages of our allocation model are demonstrated as follows:

(1) Reduction in partition: in order to illustrate the reduction in the partition phase when the number of data blocks assigned to each host is upper-bounded by min{k, n − k} instead of n, we define the ratio of the number of partitions R_p = N_up(n, |S|, min{k, n − k}) / N_up(n, |S|, n), where N_up(n, |S|, m) denotes the number of partitions of n into |S| parts with an upper bound m for each part. To expand min{k, n − k}, we use the data redundancy RY = n/k. One can easily find that min{k, n − k} = k if RY ≥ 2 and min{k, n − k} = k(RY − 1) if RY < 2. Note that R_p (0 < R_p ≤ 1, according to its definition) is expected to be as small as possible, so that the set of possible partitions is reduced. We plot the relationship between R_p and the redundancy RY in Fig. 2 with n blocks⁴ and |S| nodes⁵ (expressed as ⟨n, |S|⟩) in two cases: (a) the number of nodes |S| is fixed; (b) the number of blocks n is fixed.

Interestingly, R_p is observed to be fairly small (close to 0) when RY is quite large (for example RY = 10 in our study) or quite small (RY = 1.1). It is important to note that in current RAID systems [20], the redundancy RY is not larger than 1.5 for RAID-5 and not larger than 1.67 for RAID-6; the results for R_p in both systems are thus particularly encouraging. In addition, Fig. 2(a) shows that R_p behaves highly similarly when we vary the number of data blocks n (from 60 and 80 to 100) while keeping the number of nodes |S| constant. Inversely, when we change |S| while keeping the value of n, we observe from Fig. 2(b) that, for the same RY (between 2 and 9), R_p decreases as |S| decreases – meaning that the smaller the number of nodes, the better the performance our solution obtains.

(2) Reduction in assignment: when using Theorem 1, the assignment is fixed by the rule, so that randomly allocating data blocks to a set of storage nodes can be avoided. We define the ratio R_a = (number of assignments by the model) / (number of assignments in a random manner). In this case, we obtain R_a = 1/|S|!, where |S|! = |S| · (|S| − 1) ⋯ 2 · 1. For instance, if |S| = 15, we achieve the significant saving R_a = 7.65 × 10⁻¹³.
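The figure for |S| = 15 can be checked directly (a quick verification of ours, not code from the paper):

```python
import math

# one fixed assignment vs. 15! random orderings of the 15 nodes
ratio = 1 / math.factorial(15)   # ≈ 7.65e-13, as reported above
```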

(3) Simplification in calculation: by Theorems 4 and 5, we are able to strictly limit the number of terms to k in each multiplication. Hence, the worst case for our solution in the calculation is the multiplication of a polynomial with k terms by a polynomial with two terms. Without this simplification, the computation would be extremely complex, as one has to deal with the multiplication of a polynomial with 2^{|S|−1} terms by a polynomial with two terms. Taking |S| = 15 and k = 15 as an example, the improvement ratio of the simplification reaches k / 2^{|S|−1} = 9.16 × 10⁻⁴.

Summing up Advantages (1) and (2) presented above, the search space is efficiently reduced to a fraction R_p · R_a of the original search space. Furthermore, when Advantage (3) is taken into account, the computation process is highly simplified. In fact, we also use Proposition 3 or Proposition 5 to quickly terminate the search process, which saves the time needed to find the optimum.

4.2. Experimental evaluation

To get a more comprehensive and realistic evaluation of our solution, in particular for the assessment of the redundancy, we use the availability datasets from [31], a public repository of traces used for the study of distributed systems. Each trace defines the active/inactive state of the nodes (so-called "churn" in the area of distributed computing) and is collected from the real world. We choose four different traces for our study: Microsoft (corporate PCs, 2000), Skype (Skype superpeers, 2006), WebSites (web servers, 2002) and PlanetLab (P2P, 2005). A summary of each trace is given in Table 2; for more details concerning the traces, refer to [32].

We randomly select a set of |S| nodes from each trace, and we then allocate n blocks to these nodes. For each node

[Fig. 2. Savings by partition with n redundant blocks and |S| storage nodes: (a) R_p with the same number of nodes, for ⟨100, 15⟩, ⟨80, 15⟩ and ⟨60, 15⟩; (b) R_p with the same number of blocks, for ⟨80, 18⟩, ⟨80, 15⟩ and ⟨80, 12⟩. Both panels plot R_p (0 to 1) against the redundancy RY (1 to 10).]

Table 2
Basic information of each trace.

System     | Type          | # of Nodes | Period    | Year
-----------|---------------|------------|-----------|-----
Microsoft  | Corporate PCs | 51,663     | 35 days   | 2000
Skype      | P2P           | 4000       | 1 month   | 2006
WebSites   | Web servers   | 129        | 8 months  | 2002
PlanetLab  | P2P           | 200–400    | 1.5 years | 2005

⁴ Note that, in the broad discussion in the literature [28,29,12,24,30], the number of data blocks for allocation is usually a few tens.
⁵ In existing practical systems such as WUALA [21] and CLEVERSAFE [22], the number of nodes is not more than 16. Referring to [7,8], the number of nodes is set to 14 in their discussion. For these considerations, we range this number from 12 to 18 throughout our analysis.
