Conference Paper

A Fundamental Trade-Off between the Download Cost and Repair Bandwidth in Distributed Storage Systems

Dept. of Electr. Eng., Shahed Univ., Tehran, Iran
DOI: 10.1109/NETCOD.2010.5487685 Conference: Network Coding (NetCod), 2010 IEEE International Symposium on
Source: IEEE Xplore

ABSTRACT Distributed storage systems are mainly justified by the limited storage capacity of individual nodes and by the improved reliability obtained from distributing data over multiple storage nodes. However, the data may be stored on unreliable nodes, while the end user should still have reliable access to it. Therefore, when a node is damaged, a new node storing the same amount of data as the damaged node must be regenerated, so that the number of storage nodes, and with it the previous level of reliability, is retained. This requires the new node to connect to some of the existing nodes and download the required information, which occupies bandwidth referred to as the repair bandwidth. In addition, the cost of downloading is likely to vary across nodes. This paper investigates the fundamental trade-off between the download cost and the repair bandwidth and, more importantly, shows that any point on this trade-off curve can be achieved using so-called generalized regenerating codes, an enhancement of the regenerating codes introduced by Dimakis et al.
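As background for the regenerating codes that the abstract builds on, the sketch below computes the two extreme points of the storage versus repair-bandwidth trade-off commonly attributed to Dimakis et al.: minimum storage regenerating (MSR) and minimum bandwidth regenerating (MBR). It does not model the download-cost trade-off studied in this paper. The function names and the symbols `M` (file size), `k` (nodes needed for recovery), and `d` (helper nodes contacted during repair) are assumptions chosen for illustration.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """Return (alpha, gamma) at the minimum-storage point.

    alpha is the storage per node, gamma the total repair bandwidth.
    Assumes integer M, k, d with k <= d.
    """
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    alpha = Fraction(M, k)
    gamma = Fraction(M * d, k * (d - k + 1))
    return alpha, gamma


def mbr_point(M, k, d):
    """Return (alpha, gamma) at the minimum-bandwidth point.

    At this point the storage per node equals the repair bandwidth.
    """
    if not 1 <= k <= d:
        raise ValueError("expected 1 <= k <= d")
    gamma = Fraction(2 * M * d, 2 * k * d - k * k + k)
    return gamma, gamma


if __name__ == "__main__":
    # Illustrative parameters only: unit-size file, k = 5, d = 9 helpers.
    print("MSR:", msr_point(1, 5, 9))
    print("MBR:", mbr_point(1, 5, 9))
```

For these illustrative parameters, both points need less repair traffic than downloading the whole file, and the MBR point trades extra storage per node for the smallest repair bandwidth.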

  • ABSTRACT: Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any subset of k nodes within the n-node network. In addition, regenerating codes can repair a failed node by connecting to an arbitrary subset of d nodes. It has been shown that, for the case of functional repair, there is a tradeoff between the amount of data stored per node and the bandwidth required to repair a failed node. A special case of functional repair is exact repair, where the replacement node is required to store data identical to that in the failed node. Exact repair is of interest because it greatly simplifies system implementation. The first result of this paper is an explicit, exact-repair code for the point on the storage-bandwidth tradeoff corresponding to the minimum possible repair bandwidth, for the case when d = n - 1. This code has a particularly simple graphical description and, most interestingly, can carry out exact repair without any arithmetic operations. We term this ability of the code to perform repair through mere transfer of data repair by transfer. The second result of this paper shows that the interior points on the storage-bandwidth tradeoff cannot be achieved under exact repair, thus pointing to the existence of a separate tradeoff under exact repair. Specifically, we identify a set of scenarios, which we term "helper node pooling," and show that it is the necessity of satisfying such scenarios that overconstrains the system.
    IEEE Transactions on Information Theory 03/2012; 58(3):1837-1852. DOI:10.1109/TIT.2011.2173792 · 2.65 Impact Factor
  • ABSTRACT: Distributed storage systems provide large-scale reliable data storage services by spreading redundancy across a large group of storage nodes. In such large systems, node failures take place on a regular basis. When a node fails or leaves the system, the redundant data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary over a wide range, minimizing network traffic does not always mean minimizing regeneration time. Considering the heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, to bypass the low-capacity links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data to be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm for a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can reduce the regeneration time significantly.
    IEEE INFOCOM 2014 - IEEE Conference on Computer Communications; 04/2014
  • ABSTRACT: Erasure codes are applied in distributed storage systems to provide data robustness against server failures by storing redundant data across many storage servers. An (n, k) erasure code encodes a data object, represented as k elements, into a codeword of n elements such that any k of these n codeword elements suffice to recover the data object. Decentralized erasure codes are proposed for distributed storage systems without a central authority. Decentralization makes the resulting storage systems more scalable and better suited to loosely organized networking environments. However, unlike conventional erasure codes, decentralized erasure codes trade some probability of successful data retrieval for decentralization. Although the theoretical lower bounds on this probability are overwhelming, it is essential to know what the data retrievability is in real applications. We focus on storage systems based on decentralized erasure codes and investigate data retrievability from both theoretical and practical aspects. We simulate the random processes of storage systems to evaluate data retrievability and then compare the simulation results with the analytical values from the theoretical bounds. The comparison shows that those bounds underestimate data retrievability. Data retrievability is over 99% in most cases in our simulations, where the order of the finite field used is an 8-bit prime. Data retrievability can be increased by using a larger finite field. We believe that the data retrievability of storage systems based on decentralized erasure codes is acceptable for real applications.
    Software Security and Reliability (SERE), 2013 IEEE 7th International Conference on; 01/2013
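
The (n, k) property described in the last abstract above, that any k of the n codeword elements recover the data object, can be illustrated with a minimal sketch. It uses a conventional polynomial-evaluation (Reed-Solomon style) code, not the decentralized erasure codes of that paper. The prime 251, the function names `encode` and `decode`, and the sample data are assumptions made for illustration; it needs Python 3.8 or later for the modular inverse via `pow`.

```python
from itertools import combinations

P = 251  # an 8-bit prime, assumed here; data symbols must be below P


def encode(data, n):
    """Treat the k data symbols as polynomial coefficients and
    return n codeword elements (x, f(x)) for x = 1..n over GF(P)."""
    return [
        (x, sum(c * pow(x, i, P) for i, c in enumerate(data)) % P)
        for x in range(1, n + 1)
    ]


def decode(shares, k):
    """Recover the k data symbols from any k codeword elements
    by Lagrange interpolation over GF(P)."""
    shares = shares[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(shares):
        basis = [1]  # coefficients of prod_{m != j} (x - xm)
        denom = 1    # prod_{m != j} (xj - xm)
        for m, (xm, _) in enumerate(shares):
            if m == j:
                continue
            nxt = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                nxt[i] = (nxt[i] - xm * b) % P
                nxt[i + 1] = (nxt[i + 1] + b) % P
            basis = nxt
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse of denom
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs


if __name__ == "__main__":
    data = [7, 42, 200]            # k = 3 data symbols (illustrative)
    codeword = encode(data, n=6)   # n = 6 codeword elements
    ok = all(decode(list(s), 3) == data for s in combinations(codeword, 3))
    print("every 3-subset recovers the data:", ok)
```

In this centralized sketch every k-subset decodes successfully by construction, whereas the decentralized codes in the abstract give up some of that certainty in exchange for not needing a central authority.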