Article
PDF available

Abstract

A new family of maximum distance separable (MDS) array codes is presented. The code arrays contain p information columns and r independent parity columns, each column consisting of p − 1 bits, where p is a prime. We extend a previously known construction for the case r=2 to three and more parity columns. It is shown that when r=3 such an extension is possible for any prime p. For larger values of r, we give necessary and sufficient conditions for our codes to be MDS, and then prove that if p belongs to a certain class of primes these conditions are satisfied for r ≤ 8. One of the advantages of the new codes is that encoding and decoding may be accomplished using simple cyclic shifts and XOR operations on the columns of the code array. We develop efficient decoding procedures for the case of two- and three-column errors, again extending the previously known results for the case of a single-column error. Another primary advantage of our codes is related to the problem of efficient information updates. We present upper and lower bounds on the average number of parity bits which have to be updated in an MDS code over GF(2^m) following an update in a single information bit. This average number is important in many storage applications which require frequent updates of information. We show that the upper bound obtained from our codes is close to the lower bound and, most importantly, does not depend on the size of the code symbols.
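To make the shift-and-XOR encoding concrete, below is a minimal Python sketch of the underlying ring arithmetic. It assumes columns are treated as polynomials modulo M_p(x) = 1 + x + ... + x^(p-1) and that parity column l is the sum of the data columns multiplied by x^(l·j); it illustrates the style of computation, not the paper's exact construction or indexing.

```python
# Minimal sketch: r parity columns computed from k data columns of p-1 bits
# each, using only cyclic shifts and XORs. Columns are handled as p-bit
# vectors, i.e. as polynomials reduced modulo M_p(x) = 1 + x + ... + x^(p-1).

def shift(col, s, p):
    """Cyclic shift of a p-bit column: multiplication by x^s mod (x^p + 1)."""
    return [col[(i - s) % p] for i in range(p)]

def reduce_mod_Mp(col, p):
    """Reduce mod M_p(x), using x^(p-1) = 1 + x + ... + x^(p-2) over GF(2)."""
    if col[p - 1]:
        col = [b ^ 1 for b in col[:p - 1]] + [0]
    return col

def encode(data, r, p):
    """data: list of k columns, each p-1 bits. Returns r parity columns."""
    padded = [col + [0] for col in data]          # append the implicit bit
    parities = []
    for l in range(r):
        acc = [0] * p
        for j, col in enumerate(padded):
            shifted = shift(col, (l * j) % p, p)  # column j times x^(l*j)
            acc = [a ^ b for a, b in zip(acc, shifted)]
        parities.append(reduce_mod_Mp(acc, p)[:p - 1])
    return parities

# Example with p = 5, three data columns of 4 bits, r = 2 parity columns.
data = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]]
print(encode(data, r=2, p=5))
```

Decoding column erasures or errors would likewise reduce to shift and XOR operations on syndromes computed in the same ring; that part is omitted here.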
... In an application like SSS, if the size of the secret is relatively large, implementation of an RS code has to be done multiple times. The array codes described in [9], [16]-[18], [20], [21] have symbols (which correspond to columns in the array) of size p − 1, where p is a prime number. Certainly, p can be as large as needed, while large symbols in an RS code require a large look-up table for the corresponding finite field and may not be practical. ...
... Code C in Algorithm 2 is a generalization of the EVENODD code [20] and has different names in the literature: generalized EVENODD code [21], independent parity (IP) code [21], and Blaum-Bruck-Vardy code [22]. We will refer to it as an IP code. ...
Article
The Shamir secret sharing (SSS) scheme requires a Maximum Distance Separable (MDS) code, and in its most common implementation, a Reed-Solomon (RS) code is used. In this letter, we observe that the encoding procedure can be made simpler and faster by dropping the MDS condition and specifying the possible symbols that can be shared. In particular, the process can be made even faster by using array codes based on exclusive-or (XOR) operations instead of RS codes.
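For context, here is a minimal textbook sketch of classical Shamir (k, n) secret sharing over a prime field (the modulus below is an arbitrary illustrative choice); the letter's modification, which drops the MDS requirement and swaps RS encoding for XOR-based array codes, is not shown.

```python
# Classical Shamir (k, n) secret sharing over a prime field.
import random

Q = 2**127 - 1  # a prime modulus (illustrative choice)

def share(secret, k, n):
    """Split 'secret' into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(Q) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, e, Q) for e, c in enumerate(coeffs)) % Q
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from exactly k shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % Q
                den = den * (xj - xm) % Q
        secret = (secret + yj * num * pow(den, Q - 2, Q)) % Q
    return secret

shares = share(secret=42, k=3, n=5)
print(reconstruct(shares[:3]))  # 42, recoverable from any 3 of the 5 shares
```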
... In Section III, we consider the advantages of using the modified Shamir scheme of Section II with array codes as opposed to RS codes. In particular, we illustrate the ideas with Generalized EVENODD codes [3]. In Section IV, we address other possibilities, like adapting the modified Shamir scheme to Generalized Row-Diagonal Parity (GRDP) codes [1], [7] and identifying cases in which some participants report incorrect symbols. ...
... In an application like the Shamir scheme described in Section II, if the size of the secret is relatively large, implementation of an RS code has to be done multiple times. Array codes like the ones described in [1]-[3], [5]-[7] can have symbols (which correspond to columns in the array) of size p − 1, where p is a prime number. Certainly, p can be as large as needed, while large symbols in an RS code require a large look-up table in the corresponding finite field and may not be practical. ...
... In order to apply our particular version of the Shamir scheme as described in Section II, we need to consider the parity-check matrix H′ as given by (1), while H is given by (2). The resulting code is a generalization of the EVENODD code [2] and has different names in the literature: generalized EVENODD code [3], independent parity (IP) code [3], or Blaum-Bruck-Vardy code [9]. The MDS condition of these codes has been extensively studied for r ≥ 4 [3], [9], but for our purpose the modified Shamir scheme will always work for r < k, as in the case of the RS codes we studied in Section II. ...
Preprint
Full-text available
The Shamir secret sharing scheme requires a Maximum Distance Separable (MDS) code, and in its most common implementation, a Reed-Solomon (RS) code is used. In this paper, we observe that the encoding procedure can be made simpler and faster by dropping the MDS condition and specifying the possible symbols that can be shared. In particular, the process can be made even faster by using array codes based on XOR operations instead of RS codes.
... When g(x) = 1 and q = 2, the ring C_{pτ}(1, 2, d) has been used in the literature to give efficient repair for a family of binary MDS array codes [9], [18] and to provide new constructions of regenerating codes with lower computational complexity [19]. When g(x) = 1, q = 2, and τ = 1, the ring is discussed in [12], [13], [15], [20]-[22]. When τ = 1, the ring is used to construct array codes with local properties [2]. ...
... Therefore, GEIP(p, τ, k, r, q, g(x) = 1) codes are MDS for r ≤ 3 if 1 + x^i and h(x) are relatively prime. When 4 ≤ r ≤ 8, we can list the prime numbers p for which GEIP(p, τ, k, r, q, g(x) = 1) codes are MDS, with a proof similar to that of the MDS condition in [20]. ...
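The coprimality condition quoted above can be checked mechanically with a gcd of binary polynomials. In the sketch below, h(x) is stood in for by M_p(x) = 1 + x + ... + x^(p-1) purely as an assumption for illustration; the cited construction defines its own h(x) and parameters.

```python
# Coprimality check of binary polynomials, represented as Python ints
# whose bits are the GF(2) coefficients.

def gf2_mod(a, b):
    """Remainder of binary polynomial a modulo b."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a, b):
    """Euclidean gcd over GF(2)[x]."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

p = 7
h = (1 << p) - 1                      # stand-in h(x) = 1 + x + ... + x^(p-1)
for i in range(1, p):
    g = gf2_gcd((1 << i) | 1, h)      # gcd(1 + x^i, h(x))
    print(i, "coprime" if g == 1 else "not coprime")
```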
Preprint
Full-text available
A maximum distance separable (MDS) array code is composed of $m\times (k+r)$ arrays such that any $k$ out of $k+r$ columns suffice to retrieve all the information symbols. Expanded-Blaum-Roth (EBR) codes and Expanded-Independent-Parity (EIP) codes are two classes of MDS array codes that can repair any one symbol in a column by locally accessing some other symbols within the column, where the number of symbols $m$ in a column is a prime number. By generalizing the constructions of EBR and EIP codes, we propose new MDS array codes, such that any one symbol can be locally recovered and the number of symbols in a column can be not only a prime number but also a power of an odd prime number. Also, we present an efficient encoding/decoding method for the proposed generalized EBR (GEBR) and generalized EIP (GEIP) codes based on the LU factorization of a Vandermonde matrix. We show that the proposed decoding method has less computational complexity than existing methods. Furthermore, we show that the proposed GEBR codes have both a larger minimum symbol distance and a larger recovery ability of erased lines for some parameters when compared to EBR codes. We show that EBR codes can recover any $r$ erased lines of a slope for any parameter $r$, which was an open problem in [2].
... The update complexity of an array code, denoted as θ, is defined as the average number of parity symbols affected by updating a single data symbol [8]. For an (n, k, m) irregular array code, the definition implies the lower bound θ ≥ n − k. ...
... Previous results on update complexity indicate that the lower bound n − k is not attainable by (n, k) horizontal MDS array codes with 1 < k < n − 1 [8]. Later, Xu and Bruck [10] introduced an (n, k) vertical MDS array code that can achieve θ = n − k. ...
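As a concrete reading of this definition, the sketch below computes θ for a binary code given by a systematic generator matrix [I | P]: updating data symbol i forces an update of every parity symbol whose column of P has a 1 in row i, so θ is the average row weight of P. The matrices are hypothetical toy examples, not codes from the cited papers.

```python
# Update complexity of a systematic code G = [I | P], computed as the
# average number of parity symbols touched when one data symbol changes.

def update_complexity(P):
    """P: list of k rows, each a list of (n - k) bits (0/1)."""
    k = len(P)
    total = sum(sum(row) for row in P)
    return total / k

# Single-parity toy example (n = 4, k = 3): every data bit touches the one
# parity bit, so theta = 1 = n - k, meeting the trivial lower bound.
P_single = [[1], [1], [1]]
print(update_complexity(P_single))  # 1.0

# Hypothetical two-parity example where each data bit touches both parities.
P_double = [[1, 1], [1, 1], [1, 1]]
print(update_complexity(P_double))  # 2.0
```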
Preprint
Full-text available
In this paper, we consider the update bandwidth in distributed storage systems (DSSs). The update bandwidth, which measures the transmission efficiency of the update process in DSSs, is defined as the total amount of data symbols transferred in the network when the data symbols stored in a node are updated. This paper contains the following contributions. First, we establish the closed-form expression of the minimum update bandwidth attainable by irregular array codes. Second, after defining a class of irregular array codes, called Minimum Update Bandwidth (MUB) codes, which achieve the minimum update bandwidth of irregular array codes, we determine the smallest code redundancy attainable by MUB codes. Third, the code parameters, with which the minimum code redundancy of irregular array codes and the smallest code redundancy of MUB codes can be equal, are identified, which allows us to define MR-MUB codes as a class of irregular array codes that simultaneously achieve the minimum code redundancy and the minimum update bandwidth. Fourth, we introduce explicit code constructions of MR-MUB codes and MUB codes with the smallest code redundancy. Fifth, we establish a lower bound of the update complexity of MR-MUB codes, which can be used to prove that the minimum update complexity of irregular array codes may not be achieved by MR-MUB codes. Last, we construct a class of $(n = k + 2, k)$ vertical maximum-distance separable (MDS) array codes that can achieve all of the minimum code redundancy, the minimum update bandwidth and the optimal repair bandwidth of irregular array codes.
... We note that one can choose any family of MDS codes for the above theorem, e.g., Reed-Solomon codes [30] and vector codes [31]. In the case of vector codes, the codeword symbols of the MDS codes are drawn from a vector space rather than a finite field. ...
Preprint
Full-text available
This paper presents flexible storage codes, a class of error-correcting codes that can recover information from a flexible number of storage nodes. As a result, one can make better use of the available storage nodes in the presence of unpredictable node failures and reduce the data access latency. Let us assume a storage system encodes $k\ell$ information symbols over a finite field $\mathbb{F}$ into $n$ nodes, each of size $\ell$ symbols. The code is parameterized by a set of tuples $\{(R_j,k_j,\ell_j): 1 \le j \le a\}$, satisfying $k_1\ell_1=k_2\ell_2=...=k_a\ell_a$ and $k_1>k_2>...>k_a = k, \ell_a=\ell$, such that the information symbols can be reconstructed from any $R_j$ nodes, each node accessing $\ell_j$ symbols. In other words, the code allows a flexible number of nodes for decoding to accommodate the variance in the data access time of the nodes. Code constructions are presented for different storage scenarios, including LRC (locally recoverable) codes, PMDS (partial MDS) codes, and MSR (minimum storage regenerating) codes. We analyze the latency of accessing information and perform simulations on Amazon clusters to show the efficiency of the presented codes.
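As a hypothetical illustration of this parameterization (numbers chosen only to satisfy the stated constraints, not taken from the paper): with $a = 2$, $(k_1, \ell_1) = (4, 1)$ and $(k_2, \ell_2) = (2, 2)$, one has $k_1\ell_1 = k_2\ell_2 = 4$, $k = k_2 = 2$ and $\ell = \ell_2 = 2$, so the same $k\ell = 4$ information symbols can be retrieved either from any $R_1$ nodes reading one symbol each or from any $R_2$ nodes reading two symbols each.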
... In an (n, k) erasure-code-based DSS, the whole data can be recreated by downloading the data stored on any k nodes. Examples of erasure-code-based DSSs include Redundant Array of Inexpensive Disks (RAID) [3], [4], OceanStore [5], and Windows Azure [6]. ...
... Among the various XOR-based erasure codes, the STAR code [12] stands out due to its high performance and efficiency in both encoding and decoding [6, 7]. It is a special case of the extended EVENODD code with three parities [3, 4]. The STAR code can tolerate up to three erasures [12], i.e., m = 3, while k can be any integer the system requires. ...
Article
Full-text available
Data protection is essential in large-scale storage systems. Over the years, erasure codes, which give a system the ability to reconstruct data when damage occurs, have proven effective and have been integrated into various large storage systems. With the emergence of new data storage technologies such as SSD and NVMe [33, 34], the performance of erasure codes may soon become a bottleneck in the whole system. While the encoding performance of XOR-based codes has been studied and optimized [7, 19, 20], decoding performance needs to match. This paper presents new methods for improving the decoding speed of XOR-based erasure codes. A new decoding algorithm is proposed that uses the CPU cache more efficiently. Experiments conducted on different platforms show that the new decoding algorithm yields considerable improvements in general decoding speed.
Chapter
EVENODD, RDP, and X-code are Maximum Distance Separable (MDS) codes in that they incur the minimum level of redundancy needed to tolerate two disk failures with parity coding. EVENODD outperforms Reed-Solomon codes, and RDP is shown to outperform EVENODD in the number of XORs required. Two-dimensional codes are also discussed.
Article
Full-text available
A crucial issue in the design of very large disk arrays is the protection of data against catastrophic disk failures. Although today single disks are highly reliable, when a disk array consists of 100 or 1000 disks, the probability that at least one disk will fail within a day or a week is high. In this paper we address the problem of designing erasure-correcting binary linear codes that protect against the loss of data caused by disk failures in large disk arrays. We describe how such codes can be used to encode data in disk arrays, and give a simple method for data reconstruction. We discuss important reliability and performance constraints of these codes, and show how these constraints relate to properties of the parity check matrices of the codes. In so doing, we transform code design problems into combinatorial problems. Using this combinatorial framework, we present codes and prove they are optimal with respect to various reliability and performance constraints.
Article
Increasing performance of CPUs and memories will be squandered if not matched by a similar performance increase in I/O. While the capacity of Single Large Expensive Disks (SLED) has grown rapidly, the performance improvement of SLED has been modest. Redundant Arrays of Inexpensive Disks (RAID), based on the magnetic disk technology developed for personal computers, offers an attractive alternative to SLED, promising improvements of an order of magnitude in performance, reliability, power consumption, and scalability. This paper introduces five levels of RAIDs, giving their relative cost/performance, and compares RAID to an IBM 3380 and a Fujitsu Super Eagle.
Article
A stable method is proposed for the numerical solution of a linear system of equations having a generalized Vandermonde matrix. The method is based on Gaussian elimination and establishes explicit expressions for the elements of the resulting upper triangular matrix. These elements can be computed by means of sums of exclusively positive terms. In an important special case these sums can be reduced to simple recursions. Finally the method is retraced for the case of a confluent type of generalized Vandermonde matrix.
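For comparison, here is a short Python sketch of the classical Björck-Pereyra recursion for an ordinary (non-generalized) Vandermonde system; it illustrates the same idea of replacing general Gaussian elimination with explicit structured recursions, but it is not the generalized-Vandermonde method of this paper.

```python
# Björck-Pereyra-style solution of the dual Vandermonde system V^T a = b,
# where V[i][j] = x[i]**j: the result a holds the coefficients of the
# polynomial interpolating the points (x[i], b[i]).

def vandermonde_solve(x, b):
    n = len(x)
    c = list(b)
    # Stage 1: Newton divided differences.
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (x[i] - x[i - k])
    # Stage 2: convert the Newton form to monomial coefficients.
    a = list(c)
    for k in range(n - 2, -1, -1):
        for i in range(k, n - 1):
            a[i] -= x[k] * a[i + 1]
    return a

# p(t) = t^2 interpolates (1, 1), (2, 4), (3, 9): expect [0, 0, 1].
print(vandermonde_solve([1.0, 2.0, 3.0], [1.0, 4.0, 9.0]))
```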
Conference Paper
The author presents a method for encoding an array of disks in such a way that the information is protected against two disk failures. The encoding requires two redundant disks, and this performance is optimal. The encoding and decoding circuits require only exclusive-OR operations, giving them an advantage over other methods involving finite field operations, like Reed-Solomon codes.
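As an illustration of XOR-only encoding with two redundant disks, here is a short Python sketch in the spirit of the EVENODD construction (row parity plus diagonal parity with an adjuster term); the indexing conventions are a sketch and may differ from the paper's.

```python
# Two-parity, XOR-only encoding of a (p-1) x p data array, p prime.

def evenodd_encode(D, p):
    """D: (p-1) x p bit array of data. Returns the row-parity column P and
    the diagonal-parity column Q, each of p-1 bits."""
    rows = D + [[0] * p]                       # imaginary all-zero row p-1
    P = [0] * (p - 1)
    for i in range(p - 1):
        for j in range(p):
            P[i] ^= rows[i][j]                 # row parity
    # Adjuster S: XOR along the diagonal through the imaginary cell (p-1, 0).
    S = 0
    for j in range(1, p):
        S ^= rows[p - 1 - j][j]
    Q = [S] * (p - 1)
    for i in range(p - 1):
        for j in range(p):
            Q[i] ^= rows[(i - j) % p][j]       # diagonal parity plus adjuster
    return P, Q

# Toy example with p = 5: four rows, five data columns (shorter arrays would
# be padded with all-zero columns up to p).
D = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0]]
P, Q = evenodd_encode(D, p=5)
print(P, Q)
```

Decoding one or two erased columns likewise uses only XOR operations; that part is omitted here.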