A Fundamental Trade-Off between the Download Cost and Repair Bandwidth in Distributed Storage Systems
ABSTRACT Distributed storage systems are mainly justified by limited storage capacity and by the reliability gained from distributing data over multiple storage nodes. However, the data may be stored on unreliable nodes, while the end user should still have reliable access to it. Therefore, when a node is damaged, a new node storing the same amount of data as the damaged one must be regenerated so that the number of storage nodes, and hence the system reliability, is retained. To do so, the new node connects to some of the existing nodes and downloads the required information, which occupies some bandwidth, called the repair bandwidth. In addition, the cost of downloading is likely to vary across nodes. This paper investigates the fundamental trade-off between the download cost and the repair bandwidth and, more importantly, shows that any point on the trade-off curve can be achieved using so-called generalized regenerating codes, an enhancement of the regenerating codes introduced by Dimakis et al.
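As a rough illustration of the two quantities discussed in the abstract, the Python sketch below computes the repair bandwidth at the two extreme points of the standard regenerating-code trade-off of Dimakis et al. (minimum storage and minimum bandwidth), and then prices one repair under a simple per-node cost model. The function names, the cost model (each helper charges a fixed cost per unit downloaded, and the newcomer downloads equally from the d cheapest helpers), and the example numbers are assumptions made for illustration. They are not the generalized regenerating codes of the paper, which are not specified in the abstract.

```python
from fractions import Fraction


def msr_point(file_size, k, d):
    """Return (alpha, gamma) at the minimum-storage regenerating point.

    alpha is the storage per node, gamma the total repair bandwidth,
    for a file of size file_size recoverable from any k nodes and
    repaired by contacting d helper nodes (k <= d).
    """
    m = Fraction(file_size)
    alpha = m / k
    gamma = m * d / (k * (d - k + 1))
    return alpha, gamma


def mbr_point(file_size, k, d):
    """Return (alpha, gamma) at the minimum-bandwidth regenerating point.

    At this point the newcomer stores everything it downloads,
    so alpha equals gamma.
    """
    m = Fraction(file_size)
    gamma = 2 * m * d / (k * (2 * d - k + 1))
    return gamma, gamma


def uniform_download_cost(gamma, d, node_costs):
    """Hypothetical cost model (an assumption, not taken from the paper).

    The newcomer downloads gamma / d from each of the d cheapest
    helpers, and helper i charges node_costs[i] per unit of data.
    """
    if d > len(node_costs):
        raise ValueError("need at least d candidate helper nodes")
    beta = Fraction(gamma) / d
    cheapest = sorted(node_costs)[:d]
    return beta * sum(cheapest)


if __name__ == "__main__":
    # Example values chosen for illustration: unit-size file, k = 2, d = 3.
    alpha, gamma = msr_point(1, 2, 3)
    print("MSR:", alpha, gamma)
    print("MBR:", *mbr_point(1, 2, 3))
    print("cost:", uniform_download_cost(gamma, 3, [1, 2, 3]))
```

Under this uniform-download assumption the cost is fixed once the helpers are chosen. The trade-off studied in the paper presumably arises when the newcomer is allowed to download unequal amounts from helpers with different costs.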
Article: Network information flow
ABSTRACT: We introduce a new class of problems called network information flow which is inspired by computer network applications. Consider a point-to-point communication network on which a number of information sources are to be multicast to certain sets of destinations. We assume that the information sources are mutually independent. The problem is to characterize the admissible coding rate region. This model subsumes all previously studied models along the same line. We study the problem with one information source, and we have obtained a simple characterization of the admissible coding rate region. Our result can be regarded as the max-flow min-cut theorem for network information flow. Contrary to one's intuition, our work reveals that it is in general not optimal to regard the information to be multicast as a “fluid” which can simply be routed or replicated. Rather, by employing coding at the nodes, which we refer to as network coding, bandwidth can in general be saved. This finding may have significant impact on future design of switching systems. IEEE Transactions on Information Theory 08/2000; 2.62 Impact Factor
Conference Proceeding: High Availability in DHTs: Erasure Coding vs. Replication.
ABSTRACT: High availability in peer-to-peer DHTs requires data redundancy. This paper compares two popular redundancy schemes: replication and erasure coding. Unlike previous comparisons, we take the characteristics of the nodes that comprise the overlay into account, and conclude that in some cases the benefits from coding are limited, and may not be worth its disadvantages. Peer-to-Peer Systems IV, 4th International Workshop, IPTPS 2005, Ithaca, NY, USA, February 24-25, 2005, Revised Selected Papers; 01/2005