A Fundamental Trade-Off between the Download Cost and Repair Bandwidth in Distributed Storage Systems
ABSTRACT Distributed storage systems are mainly motivated by limited storage capacity and by the improved reliability gained from distributing data over multiple storage nodes. However, the data may be stored on unreliable nodes, while the end user still expects reliable access to it. Therefore, when a node fails, a new node holding the same amount of data as the failed node must be regenerated, so that the number of storage nodes, and hence the previous level of reliability, is retained. To do so, the new node connects to some of the existing nodes and downloads the required information, which consumes bandwidth referred to as the repair bandwidth. On the other hand, the cost of downloading is likely to vary across nodes. This paper investigates the fundamental trade-off between the download cost and the repair bandwidth and, more importantly, shows that any point on this trade-off curve can be achieved using so-called generalized regenerating codes, an enhancement of the regenerating codes introduced by Dimakis et al. in.
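To make the trade-off concrete, the following sketch uses a deliberately simple model that is an assumption, not the generalized regenerating code construction of this paper. Every helper sends the same amount β given by the minimum-storage regenerating (MSR) point of Dimakis et al., β = B/(k(d−k+1)); the repair bandwidth is γ = dβ; and each surviving node has a hypothetical per-unit download cost, with the newcomer using the d cheapest nodes. The function names and cost values are illustrative only.

```python
from fractions import Fraction

def msr_beta(B, k, d):
    """Per-helper download at the MSR point of Dimakis et al.: B / (k (d - k + 1))."""
    return Fraction(B, k * (d - k + 1))

def repair_tradeoff(B, k, costs):
    """For each helper count d in [k, len(costs)], return (d, repair bandwidth, download cost).

    Hypothetical cost model: every helper sends the same beta and the
    newcomer contacts the d cheapest surviving nodes.
    """
    cheapest = sorted(costs)
    rows = []
    for d in range(k, len(costs) + 1):
        beta = msr_beta(B, k, d)
        bandwidth = d * beta                 # gamma = d * beta
        cost = beta * sum(cheapest[:d])      # sum of c_i * beta over chosen helpers
        rows.append((d, bandwidth, cost))
    return rows

if __name__ == "__main__":
    # Hypothetical: a 12-unit file, any k = 3 nodes suffice, and 6 surviving
    # nodes: three cheap (cost 1 per unit) and three expensive (cost 5 per unit).
    for d, bw, cost in repair_tradeoff(12, 3, [1, 1, 1, 5, 5, 5]):
        print(d, bw, cost)
```

With these hypothetical costs, raising d from 3 to 6 lowers the repair bandwidth from 12 to 6 units while the download cost rises from 12 to 18, because extra helpers must be drawn from the expensive nodes.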
ABSTRACT: Distributed storage systems aim to provide reliable storage over unreliable nodes by introducing redundancy. In these systems, when a node fails, a newcomer connects to existing nodes and downloads information to recover the same amount of data as the failed node, so that the previous reliability is retained. Thus, a large amount of data transfer, called the repair bandwidth, is imposed on the system. Recently, regenerating codes have been introduced to reduce the repair bandwidth using the notion of network coding. Furthermore, generalized regenerating codes, an extension of regenerating codes to the case where the download cost differs across surviving nodes, have been proposed in. The current paper provides a real-world example to explain the main difference between regenerating codes and generalized regenerating codes.
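The gap between conventional repair and regenerating-code repair can be checked numerically. This sketch assumes the standard model of Dimakis et al. (file size B, any k nodes reconstruct the file, a newcomer contacts d helpers that each send β, per-node storage α) and verifies the information-flow bound Σ_{i=0}^{k−1} min(α, (d−i)β) ≥ B at the minimum-storage (MSR) and minimum-bandwidth (MBR) points. The "naive" baseline, downloading k fragments of size B/k, is the usual MDS-code repair; the helper names are hypothetical.

```python
from fractions import Fraction

def min_cut(alpha, beta, k, d):
    """Information-flow bound of Dimakis et al.: sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))

def msr(B, k, d):
    """Minimum-storage point: (alpha, beta)."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))

def mbr(B, k, d):
    """Minimum-bandwidth point: (alpha, beta), where alpha equals the repair bandwidth d * beta."""
    beta = Fraction(2 * B, k * (2 * d - k + 1))
    return d * beta, beta

def compare(B, k, d):
    """Map each scheme to (storage per node, repair bandwidth)."""
    out = {"naive": (Fraction(B, k), Fraction(B))}  # fetch k fragments of size B/k
    for name, (alpha, beta) in (("MSR", msr(B, k, d)), ("MBR", mbr(B, k, d))):
        if min_cut(alpha, beta, k, d) < B:
            raise ValueError(f"{name} point violates the min-cut bound")
        out[name] = (alpha, d * beta)
    return out

if __name__ == "__main__":
    for name, (alpha, gamma) in compare(12, 3, 5).items():
        print(f"{name}: storage {alpha}, repair bandwidth {gamma}")
```

For B = 12, k = 3 and d = 5, conventional repair moves 12 units, the MSR point moves 20/3 units with the same per-node storage of 4, and the MBR point moves 5 units at the price of storing 5 units per node.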
ABSTRACT: Erasure-code-based distributed storage systems provide data robustness by storing encoded fragments across servers. To maintain data robustness, a repair mechanism recovers the storage system from server failures by repairing encoded fragments. For decentralized erasure-code-based storage systems, we propose a decentralized repair mechanism with the following features. First, an encoded fragment is replenished by combining u randomly chosen encoded fragments. Second, the number u depends on the number of available encoded fragments and is independent of the pattern of missing encoded fragments. Third, multiple encoded fragments are replenished simultaneously, in parallel. We measure the communication cost in terms of the number u of network connections required to replenish an encoded fragment. We then conduct a numerical analysis using traces of real systems and find that our requirement on u is smaller than that of existing methods. Both theoretical and numerical results show that our decentralized repair mechanism outperforms existing ones in terms of communication cost at the same storage efficiency.
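The replenishment step can be illustrated with random linear combinations over a prime field, a common way to build decentralized erasure codes; this is an assumption, since the abstract only says that fragments are "combined". Each encoded fragment is represented by its coefficient vector over the k original data symbols, and the data stays decodable as long as the stored vectors span all k dimensions. The field size P = 257, the function names, and the fixed u are hypothetical; the paper's rule for choosing u from the number of available fragments is not reproduced here.

```python
import random

P = 257  # prime field size (hypothetical choice)

def rank_mod_p(rows, p=P):
    """Rank of a list of coefficient vectors over GF(p), via Gauss-Jordan elimination."""
    m = [list(r) for r in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] % p), None)
        if pivot is None:
            continue  # no nonzero entry left in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # modular inverse via Fermat's little theorem
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] % p:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def replenish(fragments, u, rng, p=P):
    """New coefficient vector: a random nonzero-weighted sum of u randomly chosen fragments."""
    chosen = rng.sample(fragments, u)
    weights = [rng.randrange(1, p) for _ in chosen]
    k = len(fragments[0])
    return [sum(w * f[j] for w, f in zip(weights, chosen)) % p for j in range(k)]

if __name__ == "__main__":
    rng = random.Random(1)
    k, n, u = 4, 8, 4
    # Each stored fragment is a random combination of the k original data symbols.
    fragments = [[rng.randrange(P) for _ in range(k)] for _ in range(n)]
    survivors = fragments[2:]  # two fragments are lost
    # Both lost fragments are rebuilt independently from the survivors (in parallel).
    repaired = survivors + [replenish(survivors, u, rng) for _ in range(2)]
    print("rank of repaired set:", rank_mod_p(repaired), "of", k)
```

Because the weights are random, a repaired set spans all k dimensions only with high probability, which is why the demo prints the rank rather than assuming it.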
Article: Selective Regenerating Codes
ABSTRACT: Regenerating codes are mainly motivated by their ability to reduce the repair bandwidth incurred by a newcomer node. A newcomer is initiated when a node fails or leaves the network, and it connects to existing nodes to reconstruct the lost data. This paper investigates the case in which the newcomer can selectively choose which existing nodes to connect to, so as to reduce the repair bandwidth. Accordingly, selective regenerating codes are proposed, and the corresponding repair bandwidth is shown to be dramatically reduced compared with that of existing codes.
IEEE Communications Letters 08/2011; 15(8):854-856. DOI:10.1109/LCOMM.2011.061611.102271 · 1.46 Impact Factor