Moshe Schwartz's research while affiliated with Ben-Gurion University of the Negev and other places

Publications (147)

Preprint
Motivated by applications to DNA-storage, flash memory, and magnetic recording, we study perfect burst-correcting codes for the limited-magnitude error channel. These codes are lattices that tile the integer grid with the appropriate error ball. We construct two classes of such perfect codes correcting a single burst of length $2$ for $(1,0)$-limit...
Preprint
We prove a new lower bound on the field size of locally repairable codes (LRCs). Additionally, we construct maximally recoverable (MR) codes which are cyclic. While a known construction for MR codes has the same parameters, it produces non-cyclic codes. Furthermore, we prove necessary and sufficient conditions that specify when the known non-cyclic...
Preprint
Motivated by DNA storage in living organisms, and by known biological mutation processes, we study the reverse-complement string-duplication system. We fully classify the conditions under which the system has full expressiveness, for all alphabets and all fixed duplication lengths. We then focus on binary systems with duplication length $2$ and pro...
Article
Error-correcting codes over sets, with applications to DNA storage, are studied. The DNA-storage channel receives a set of sequences, and produces a corrupted version of the set, including sequence loss, symbol substitution, symbol insertion/deletion, and limited-magnitude errors in symbols. Various parameter regimes are studied. New bounds on code...
Article
We study whether an asymmetric limited-magnitude ball may tile $\mathbb{Z}^n$. This ball generalizes previously studied shapes: crosses, semi-crosses, and quasi-crosses. Such tilings act as perfect error-correcting codes in a channel which changes a transmitted integer vector in a bounded number of entries by limited-magnitude errors. A construction of lattice...
Article
Motivated by an application to database linear querying, such as private information-retrieval protocols, we suggest a fundamental property of linear codes – the generalized covering radius. The generalized covering-radius hierarchy of a linear code characterizes the trade-off between storage amount, latency, and access complexity, in such database...
Preprint
Motivated by applications to DNA storage, we study reconstruction and list-reconstruction schemes for integer vectors that suffer from limited-magnitude errors. We characterize the asymptotic size of the intersection of error balls in relation to the code's minimum distance. We also devise efficient reconstruction algorithms for various limited-mag...
Preprint
We study generalized covering radii, a fundamental property of linear codes that characterizes the trade-off between storage, latency, and access in linear data-query protocols such as PIR. We prove lower and upper bounds on the generalized covering radii of Reed-Muller codes, as well as finding their exact value in certain extreme cases. With the...
Article
We construct integer error-correcting codes and covering codes for the limited-magnitude error channel with more than one error. The codes are lattices that pack or cover the space with the appropriate error ball. Some of the constructions attain an asymptotic packing/covering density that is constant. The results are obtained via various methods,...
Article
We propose a list-decoding scheme for reconstruction codes in the context of uniform-tandem-duplication noise, which can be viewed as an application of the associative memory model to this setting. We find the uncertainty associated with $m>2$ strings (where a previous paper considered $m=2$) in asymptotic terms, where code-words are taken fro...
Article
We study scalar-linear and vector-linear solutions of the generalized combination network. We derive new upper and lower bounds on the maximum number of nodes in the middle layer, depending on the network parameters and the alphabet size. These bounds improve and extend the parameter range of known bounds. Using these new bounds we present a lower...
Preprint
We study permutations over the set of $\ell$-grams, that are feasible in the sense that there is a sequence whose $\ell$-gram frequency has the same ranking as the permutation. Codes, which are sets of feasible permutations, protect information stored in DNA molecules using the rank-modulation scheme, and read using the shotgun sequencing technique...
Preprint
Motivated by an application to database linear querying, such as private information-retrieval protocols, we suggest a fundamental property of linear codes -- the generalized covering radius. The generalized covering-radius hierarchy of a linear code characterizes the trade-off between storage amount, latency, and access complexity, in such databas...
Article
Linear codes over finite extension fields have widespread applications in theory and practice. In some scenarios, the decoder has a sequential access to the codeword symbols, giving rise to a hierarchical erasure structure. In this paper we develop a mathematical framework for hierarchical erasures over extension fields, provide several bounds and...
Preprint
We construct maximally recoverable codes (corresponding to partial MDS codes) which are based on linearized Reed-Solomon codes. The new codes have a smaller field size requirement compared with known constructions. For certain asymptotic regimes, the constructed codes have order-optimal alphabet size, asymptotically matching the known lower bound.
Preprint
We study the Singleton-type bound that provides an upper limit on the minimum distance of locally repairable codes. We present an improved bound by carefully analyzing the combinatorial structure of the repair sets. Thus, we show the previous bound is unachievable for certain parameters. We then also provide explicit constructions of optimal codes...
Preprint
Error-correcting codes over sets, with applications to DNA storage, are studied. The DNA-storage channel receives a set of sequences, and produces a corrupted version of the set, including sequence loss, symbol substitution, symbol insertion/deletion, and limited-magnitude errors in symbols. Various parameter regimes are studied. New bounds on code...
Article
Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitut...
Preprint
We study scalar-linear and vector-linear solutions of the generalized combination network. We derive new upper and lower bounds on the maximum number of nodes in the middle layer, depending on the network parameters and the alphabet size. These bounds improve and extend the parameter range of known bounds. Using these new bounds we present a lower...
Preprint
We study whether an asymmetric limited-magnitude ball may tile $\mathbb{Z}^n$. This ball generalizes previously studied shapes: crosses, semi-crosses, and quasi-crosses. Such tilings act as perfect error-correcting codes in a channel which changes a transmitted integer vector in a bounded number of entries by limited-magnitude errors. A constructio...
Preprint
We construct integer error-correcting codes and covering codes for the limited-magnitude error channel with more than one error. The codes are lattices that pack or cover the space with the appropriate error ball. Some of the constructions attain an asymptotic packing/covering density that is constant. The results are obtained via various methods,...
Article
Minimal multicast networks are fascinating and efficient combinatorial objects, where the removal of a single link makes it impossible for all receivers to obtain all messages. We study the structure of such networks, and prove some constraints on their possible solutions. We then focus on the combination network, which is one of the simplest and m...
Preprint
A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writin...
Article
A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writin...
Article
In this paper, locally repairable codes which have optimal minimum Hamming distance with respect to the bound presented by Prakash et al. are considered. New upper bounds on the length of such optimal codes are derived. The new bounds apply to more general cases, and have weaker requirements compared with the known ones. In this sense, they both im...
Preprint
A growing number of works have, in recent years, been concerned with in-vivo DNA as a medium for data storage. This paper extends the concept of reconstruction codes for uniform-tandem-duplication noise to the model of associative memory, by finding the uncertainty associated with $m>2$ strings (where a previous paper considered $m=2$). That uncertai...
Preprint
We study scalar-linear and vector-linear solutions to the generalized combination network. We derive new upper and lower bounds on the maximum number of nodes in the middle layer, depending on the network parameters. These bounds improve and extend the parameter range of known bounds. Using these new bounds we present a general lower bound on the g...
Preprint
Optimal locally repairable codes with information locality are considered. Optimal codes are constructed, whose length is also order-optimal with respect to a new bound on the code length derived in this paper. The length of the constructed codes is super-linear in the alphabet size, which improves upon the well known pyramid codes, whose length is...
Preprint
Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitut...
Preprint
We study restricted permutations of sets which have a geometrical structure. The study of restricted permutations is motivated by their application in coding for flash memories, and their relevance in different applications of networking technologies and various channels. We generalize the model of $\mathbb{Z}^d$-permutations with restricted moveme...
Article
Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processe...
Article
Locally repairable codes are desirable for distributed storage systems to improve the repair efficiency. In this paper, a new combination of codes with locality and codes with multiple disjoint repair sets (also called availability) is introduced. Accordingly, a Singleton-type bound is derived for the new code, which contains those bounds in [9], [...
Article
DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited...
Article
We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works that study the combinatorial capacity of string-duplication systems, or in a probabilistic setting, variou...
Conference Paper
We study random string-duplication systems, which we call Pólya string models. These are motivated by a class of mutations that are common in most organisms and lead to an abundance of repeated sequences in their genomes. Unlike previous works that study the combinatorial capacity of string-duplication systems, or in a probabilistic setting, variou...
Preprint
The generalization of de Bruijn sequences to infinite sequences with respect to the order $n$ has been studied, and it was shown that every de Bruijn sequence of order $n$ in at least three symbols can be extended to a de Bruijn sequence of order $n + 1$. A de Bruijn sequence of order $n$ in two symbols cannot be extended to order $n + 1$, but...
Article
Background: Tandem repeat sequences are common in the genomes of many organisms and are known to cause important phenomena such as gene silencing and rapid morphological changes. Due to the presence of multiple copies of the same pattern in tandem repeats and their high variability, they contain a wealth of information about the mutations that have...
Preprint
The combination network is one of the simplest and most insightful networks in coding theory. The vector network coding solutions for this network and some of its sub-networks are examined. For a fixed alphabet size of a vector network coding solution, an upper bound on the number of nodes in the network is obtained. This bound is an MDS bound for subsp...
Conference Paper
Synthesis of DNA molecules offers unprecedented advances in storage technology. Yet, the microscopic world in which these molecules reside induces error patterns that are fundamentally different from their digital counterparts. Hence, to maintain reliability in reading and writing, new coding schemes must be developed. In a reading technique called...
Preprint
Locally repairable codes which are optimal with respect to the bound presented by Prakash et al. are considered. New upper bounds on the length of such optimal codes are derived. The new bounds both improve and generalize previously known bounds. Optimal codes are constructed, whose length is order-optimal when compared with the new upper bounds. T...
Preprint
Genomic evolution can be viewed as string-editing processes driven by mutations. An understanding of the statistical properties resulting from these mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processe...
Article
We study array codes which are based on subspaces of a linear space over a finite field, using spreads, q-Steiner systems, and subspace transversal designs. We present several constructions of such codes which are q-analogs of some known block codes such as the Hamming and simplex codes. We examine the locality and availability of the constructed co...
Preprint
We study random string-duplication systems, which we call Pólya string models. These are motivated by DNA storage in living organisms, and certain random mutation processes that affect their genome. Unlike previous works that study the combinatorial capacity of string-duplication systems, or various string statistics, this work provides exact cap...
Conference Paper
Recent advances in coding for distributed storage systems have reignited the interest in scalar codes over extension fields. In parallel, the rise of large-scale distributed systems has motivated the study of computing in the presence of stragglers, i.e., servers that are slow to respond or unavailable. This paper addresses storage systems that emp...
Article
Synthesis of DNA molecules offers unprecedented advances in storage technology. Yet, the microscopic world in which these molecules reside induces error patterns that are fundamentally different from their digital counterparts. Hence, to maintain reliability in reading and writing, new coding schemes must be developed. In a reading technique called...
Article
Private information retrieval has been reformulated in an information-theoretic perspective in recent years. The two most important parameters considered for a PIR scheme in a distributed storage system are the storage overhead and PIR rate. We take into consideration a third parameter, the access complexity of a PIR scheme, which characterizes the...
Article
Recent advances in coding for distributed storage systems have reignited the interest in scalar codes over extension fields. In parallel, the rise of large-scale distributed systems has motivated the study of computing in the presence of stragglers, i.e., servers that are slow to respond or unavailable. This paper addresses storage systems that emp...
Preprint
DNA as a data storage medium has several advantages, including far greater data density compared to electronic media. We propose that schemes for data storage in the DNA of living organisms may benefit from studying the reconstruction problem, which is applicable whenever multiple reads of noisy data are available. This strategy is uniquely suited...
Article
We consider the communication of information in the presence of synchronization errors. Specifically, we consider permutation channels in which a transmitted codeword x = (x_1, ..., x...
Article
We find a new formula for the limit of the capacity of certain sequences of multidimensional semiconstrained systems as the dimension tends to infinity. We do so by generalizing the notion of independence entropy, originally studied in the context of constrained systems, to the study of semiconstrained systems. Using the independence entropy, we ob...
Article
Synthesis of DNA molecules offers unprecedented advances in storage technology. Yet, the microscopic world in which these molecules reside induces error patterns that are fundamentally different from their digital counterparts. Hence, to maintain reliability in reading and writing, new coding schemes must be developed. In a reading technique called...
Article
We construct Gray codes over permutations for the rank-modulation scheme, which are also capable of correcting errors under the infinity-metric. These errors model limited-magnitude or spike errors, for which only single-error-detecting Gray codes are currently known. Surprisingly, the error-correcting codes we construct achieve better asymptotic r...
Article
A shift rule for the prefer-max De Bruijn sequence is formulated, for all sequence orders, and over any finite alphabet. An efficient algorithm for this shift rule is presented, which has linear (in the sequence order) time and memory complexity.
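For illustration only (this is not the shift rule from the paper), the prefer-max sequence itself can be produced by the classical greedy construction: start from $n$ zeros and repeatedly append the largest symbol that does not repeat a length-$n$ window. The Python sketch below assumes that greedy definition; the function name `prefer_max_de_bruijn` is hypothetical, and the procedure needs memory exponential in $n$, unlike the linear-complexity shift rule the abstract describes.

```python
def prefer_max_de_bruijn(k, n):
    """Greedy prefer-max construction over the alphabet {0, ..., k-1}.

    Illustrative sketch: returns one period (k**n symbols) of the cyclic
    sequence. It stores every window seen so far, so it is exponential in n.
    """
    seq = [0] * n                      # start from the all-zero window
    seen = {tuple(seq)}
    while True:
        for symbol in range(k - 1, -1, -1):        # try the largest symbol first
            # last n-1 symbols followed by the candidate symbol
            window = tuple(seq[len(seq) - n + 1:] + [symbol])
            if window not in seen:
                seen.add(window)
                seq.append(symbol)
                break
        else:
            # no symbol yields an unseen window: the construction is finished
            break
    return seq[:k ** n]                # drop the trailing wrap-around symbols


# prefer_max_de_bruijn(2, 3) -> [0, 0, 0, 1, 1, 1, 0, 1]
```

A shift rule replaces this stateful scan with a function that maps each window directly to its successor, which is what makes linear time and memory possible.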
Conference Paper
Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into acco...
Article
The problem of one-way file synchronization, henceforth called “file updates”, is studied in this work. Specifically, a client edits a file, where the edits are modeled by insertions and deletions (InDels). An old copy of the file is stored remotely at a data-centre, and is also available to the client. We consider the problem of throughput- and co...
Article
We study covering codes of permutations with the ℓ∞-metric. We provide a general code construction, which combines short building-block codes into a single long code. We focus on cyclic transitive groups as building blocks, determining their exact covering radius, and showing a linear-time algorithm for finding a covering codeword. When used in the...
Article
Ever-increasing amounts of data are created and processed in internet-scale companies such as Google, Facebook, and Amazon. The efficient storage of such copious amounts of data has thus become a fundamental and acute problem in modern computing. No single machine can possibly satisfy such immense storage demands. Therefore, distributed storage sys...
Article
We prove that there exist non-linear binary cyclic codes that attain the Gilbert-Varshamov bound.
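For context, the Gilbert-Varshamov bound in its standard binary form (stated here from the general coding-theory literature, not quoted from the paper) guarantees that the largest binary code of length $n$ and minimum distance $d$ has size at least

$$A_2(n,d) \;\ge\; \frac{2^n}{\sum_{i=0}^{d-1} \binom{n}{i}},$$

which asymptotically gives rate $R \ge 1 - H_2(\delta)$ for relative distance $\delta = d/n \le 1/2$, where $H_2$ is the binary entropy function. The abstract does not say which of these forms the paper works with.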
Article
Duplication mutations play a critical role in the generation of biological sequences. Simultaneously, they have a deleterious effect on data stored using in-vivo DNA data storage. While duplications have been studied both as a sequence-generation mechanism and in the context of error correction, for simplicity these studies have not taken into acco...
Article
We study the size (or volume) of balls in the metric space of permutations, $S_n$, under the infinity metric. We focus on the regime of balls with radius $r = \rho \cdot (n-1)$, $\rho \in [0, 1]$, i.e., a radius that is a constant fraction of the maximum possible distance. We provide new lower bounds on the size of such balls. These new lower bounds reduce the asympt...
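To make the quantity concrete, the following Python sketch counts the permutations within infinity-distance $r$ of the identity by brute force. It assumes the usual definition $d_\infty(\pi, \sigma) = \max_i |\pi(i) - \sigma(i)|$; the function names are hypothetical, and the enumeration is only feasible for very small $n$ (it is not the bounding technique of the paper). Centering the ball at the identity loses nothing, since the metric is invariant under composing both permutations with a common permutation on the right.

```python
from itertools import permutations


def linf_distance(p, q):
    """Infinity-metric distance between two permutations given as tuples."""
    return max(abs(a - b) for a, b in zip(p, q))


def ball_size(n, r):
    """Number of permutations of {0, ..., n-1} within distance r of the identity.

    Brute-force illustration for n >= 1; runs in time proportional to n!.
    """
    identity = tuple(range(n))
    return sum(
        1
        for p in permutations(range(n))
        if linf_distance(p, identity) <= r
    )


# ball_size(4, 1) -> 5    (identity, three adjacent swaps, one pair of disjoint swaps)
# ball_size(4, 3) -> 24   (radius n-1 covers all of S_4)
```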
Conference Paper
We study random string-duplication systems, called Pólya string models, motivated by certain random mutation processes in the genome of living organisms. Unlike previous works that study the combinatorial capacity of string-duplication systems, or peripheral properties such as symbol frequency, this work provides exact capacity or bounds on it, for...
Conference Paper
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected...
Article
The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected...