Eitan Yaakobi

Technion - Israel Institute of Technology | technion · Faculty of Computer Science

About

275
Publications
12,584
Reads
3,364
Citations

Publications (275)
Preprint
We consider the problem of correcting insertion and deletion errors in the $d$-dimensional space. This problem is well understood for vectors (one-dimensional space) and was recently studied for arrays (two-dimensional space). For vectors and arrays, the problem is motivated by several practical applications such as DNA-based storage and racetrack...
Preprint
The rapid development of DNA storage has brought the deletion and insertion channel to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. Similar to any other metric, the size of a ball is...
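As a rough illustration of the FLL metric mentioned in this abstract, the sketch below computes the distance between two words of equal length and counts the words in a ball by brute force. It relies on an assumption not stated in the abstract: that the smallest number t of deletions followed by t insertions turning x into y equals n minus the length of their longest common subsequence. The function names are hypothetical.

```python
from itertools import product


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (row-by-row DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def fll_distance(x, y):
    """Assumed identity: FLL distance = n - LCS(x, y) for words of equal length n."""
    if len(x) != len(y):
        raise ValueError("the FLL metric is defined for words of the same length")
    return len(x) - lcs_length(x, y)


def fll_ball_size(x, radius, alphabet="01"):
    """Brute-force count of words within FLL distance `radius` of x (small n only)."""
    return sum(
        1
        for w in product(alphabet, repeat=len(x))
        if fll_distance(x, "".join(w)) <= radius
    )
```

For example, `fll_distance("0101", "1010")` is 1 under this assumption, since deleting the leading 0 and appending a 0 turns the first word into the second.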
Preprint
de Bruijn sequences of order $\ell$, i.e., sequences that contain each $\ell$-tuple as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has rate of $1/2$. To overcome this low rate, we study $\ell$-tuples covering sequences, which impose that each $\e...
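The window property in this abstract can be checked directly. The following is a minimal sketch, assuming the acyclic reading of the definition (every $\ell$-tuple appears as a window of the sequence exactly once); the function names are illustrative and not taken from the paper.

```python
from collections import Counter
from itertools import product


def window_counts(seq, ell):
    """Count how often each length-ell window occurs in seq (no wrap-around)."""
    return Counter(seq[i:i + ell] for i in range(len(seq) - ell + 1))


def is_de_bruijn(seq, ell, alphabet="01"):
    """True if every ell-tuple over the alphabet is a window of seq exactly once."""
    counts = window_counts(seq, ell)
    every_tuple_once = all(
        counts["".join(t)] == 1 for t in product(alphabet, repeat=ell)
    )
    # No window may fall outside the set of ell-tuples over the alphabet.
    return every_tuple_once and sum(counts.values()) == len(alphabet) ** ell
```

For instance, `is_de_bruijn("0001011100", 3)` holds because the eight windows 000, 001, 010, 101, 011, 111, 110, 100 are all distinct.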
Preprint
Full-text available
Reliability is an inherent challenge for the emerging nonvolatile technology of racetrack memories, and there exists a fundamental relationship between codes designed for racetrack memories and codes with constrained periodicity. Previous works have sought to construct codes that avoid periodicity in windows, yet have either only provided existence...
Preprint
Full-text available
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which \emph{all} substrings of some fixed length are read or substrings are...
Preprint
The channel output entropy of a transmitted word is the entropy of the possible channel outputs and similarly the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the $k$-deletion, $k$-insertion channel, where exactly $k$ symbols are deleted, inserted in th...
Preprint
Full-text available
This paper studies the adversarial torn-paper channel. This problem is motivated by applications in DNA data storage where the DNA strands that carry the information may break into smaller pieces that are received out of order. Our model extends the previously researched probabilistic setting to the worst-case. We develop code constructions for any...
Preprint
Synthetic polymer-based storage seems to be a particularly promising candidate that could help to cope with the ever-increasing demand for archival storage requirements. It involves designing molecules of distinct masses to represent the respective bits $\{0,1\}$, followed by the synthesis of a polymer of molecular units that reflects the order of...
Preprint
This paper tackles two problems that are relevant to coding for insertions and deletions. These problems are motivated by several applications, among them the reconstruction of strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to outpu...
Article
Resistive memories, such as phase change memories and resistive random access memories, have attracted significant attention in recent years because they combine better scalability, speed, and rewritability with non-volatility. However, their limited endurance is still a major drawback that has to be addressed before they can be widely adopted in large-scale...
Article
In this paper, we study a model that mimics the programming operation of memory cells. This model was first introduced by Lastras-Montano et al. for continuous-alphabet channels, and later by Bunte and Lapidoth for discrete memoryless channels (DMC). Under this paradigm we assume that cells are programmed sequentially and individually. The programm...
Article
A new family of codes, called clustering-correcting codes, is presented in this paper. This family of codes is motivated by the special structure of the data that is stored in DNA-based storage systems. The data stored in these systems has the form of unordered sequences, also called strands, and every strand is synthesized thousands to millions of...
Article
Private information retrieval (PIR) protocols ensure that a user can download a file from a database without revealing any information on the identity of the requested file to the servers storing the database. While existing protocols strictly impose that no information is leaked on the file’s identity, this work initiates the study of the tradeoff...
Article
A functional PIR array code is a coding scheme which encodes some s information bits into a t × m array such that every linear combination of the s information bits has k mutually disjoint recovering sets. Every recovering set consists of some of the array’s columns while it is allowed to read at most l encoded bits from every column in order to re...
Preprint
Lifted Reed-Solomon and multiplicity codes are classes of codes, constructed from specific sets of $m$-variate polynomials. These codes allow for the design of high-rate codes that can recover every codeword or information symbol from many disjoint sets. Recently, the underlying approaches have been combined for the bi-variate case to construct lif...
Article
Lifted Reed-Solomon and multiplicity codes are classes of codes, constructed from specific sets of m-variate polynomials. These codes allow for the design of high-rate codes that can recover every codeword or information symbol from many disjoint sets. Recently, the underlying approaches have been combined for the bi-variate case to construct lifte...
Preprint
\emph{Resistive memories}, such as \emph{phase change memories} and \emph{resistive random access memories}, have attracted significant attention in recent years because they combine better scalability, speed, and rewritability with non-volatility. However, their \emph{limited endurance} is still a major drawback that has to be addressed before they can be w...
Article
The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize the window property of de Bruijn sequences, on its shorter subsequen...
Article
The sequence reconstruction problem corresponds to the model in which a sequence from some code is transmitted over several noisy channels that produce distinct outputs. Then, the channels’ outputs, received by the decoder, are used to recover the transmitted sequence, and the main problem under this paradigm is to calculate the minimum number of c...
Article
This paper studies the problem of constructing codes correcting deletions in arrays. Under this model, it is assumed that an n × n array can experience deletions of rows and columns. These deletion errors are referred to as (t_r, t_c)-criss-cross deletions if t_r rows and t_c columns are deleted, while a code correcting these deletion patterns is calle...
Preprint
The concept of DNA storage was first suggested in 1959 by Richard Feynman, who shared his vision regarding nanotechnology in the talk "There's Plenty of Room at the Bottom". Later, towards the end of the 20th century, interest in storage solutions based on DNA molecules increased as a result of the Human Genome Project, which in turn led to...
Preprint
The problem of string reconstruction based on its substrings spectrum has received significant attention recently due to its applicability to DNA data storage and sequencing. In contrast to previous works, we consider in this paper a setup of this problem where multiple strings are reconstructed together. Given a multiset $S$ of strings, all their...
Article
A private proximity retrieval (PPR) scheme is a protocol which allows a user to retrieve the identities of all records in a database that are within some distance r from the user’s record x. The user’s privacy at each server is given by the fraction of the record x that is kept private. In this paper, this research is initiated and protocols that o...
Article
In this paper we consider the problem of encoding data into repeat-free sequences, that is, sequences required to contain every k-tuple at most once (for a predefined k). First, the capacity of the repeat-free constraint is calculated. Then, an efficient algorithm, which uses two bits of redundancy, is presented to encode length-n sequences for k =...
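The repeat-free constraint described here is easy to test on a given sequence. This is a minimal sketch of such a check only, not the encoding algorithm from the paper; the function name is hypothetical.

```python
def is_repeat_free(seq, k):
    """True if no length-k window occurs more than once in seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            return False  # this k-tuple already appeared at an earlier position
        seen.add(window)
    return True
```

For example, `"0101"` is not repeat-free for k = 2 because the window 01 occurs twice, while it is trivially repeat-free for k = 3.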
Article
Partial MDS (PMDS) and sector-disk (SD) codes are classes of erasure correcting codes that combine locality with strong erasure correction capabilities. We construct PMDS and SD codes with local regeneration where each local code is a bandwidth-optimal regenerating MDS code. In the event of a node failure, these codes reduce both the number of ser...
Article
This paper studies reconstruction of strings based upon their substrings spectrum. Under this paradigm, it is assumed that all substrings of some fixed length are received and the goal is to reconstruct the string. While many existing works assumed that substrings are received error free, we follow in this paper the noisy setup of this problem that...
Preprint
The rapid development of DNA storage has brought the deletion and insertion channel, once again, to the front line of research. When the number of deletions is equal to the number of insertions, the Levenshtein metric is the right measure for the distance between two words of the same length. The size of a ball is one of the most fundamental parame...
Article
Full-text available
In this work private information retrieval (PIR) codes are studied. In a k-PIR code, s information bits are encoded in such a way that every information bit has k mutually disjoint recovery sets. The main problem under this paradigm is to minimize the number of encoded bits given the values of s and k, where this value is denoted by P(s, k). The ma...
Article
In graph theory, trees are one of the more popular families of graphs, with a wide range of applications in computer science as well as many other related fields. While there are several distance measures over the set of all trees, we consider here the one which defines the so-called tree distance, defined by the minimum number of edit operatio...
Preprint
Full-text available
Motivated by applications in machine learning and archival data storage, we introduce function-correcting codes, a new class of codes designed to protect a function evaluation on the data against errors. We show that function-correcting codes are equivalent to irregular distance codes, i.e., codes that obey some given distance requirement between e...
Preprint
Full-text available
This paper investigates the problem of correcting multiple criss-cross deletions in arrays. More precisely, we study the unique recovery of $n \times n$ arrays affected by any combination of $t_\mathrm{r}$ row and $t_\mathrm{c}$ column deletions such that $t_\mathrm{r} + t_\mathrm{c} = t$ for a given $t$. We refer to this type of deletion as $t$-...
Article
A private information retrieval (PIR) protocol guarantees that a user can privately retrieve files stored in a database without revealing any information about the identity of the requested file. Existing information-theoretic PIR protocols ensure perfect privacy, i.e., zero information leakage to the servers storing the database, but at the cost o...
Preprint
This paper studies two families of constraints for two-dimensional and multidimensional arrays. The first family requires that a multidimensional array will not contain a cube of zeros of some fixed size and the second constraint imposes that there will not be two identical cubes of a given size in the array. These constraints are natural extension...
Preprint
Locality enables storage systems to recover failed nodes from small subsets of surviving nodes. The setting where nodes are partitioned into subsets, each allowing for local recovery, is well understood. In this work we consider a generalization introduced by Gopalan et al., where, viewing the codewords as arrays, constraints are imposed on the col...
Preprint
A \textit{functional $k$-batch} code of dimension $s$ consists of $n$ servers storing linear combinations of $s$ linearly independent information bits. Any multiset request of size $k$ of linear combinations (or requests) of the information bits can be recovered by $k$ disjoint subsets of the servers. The goal under this paradigm is to find the min...
Article
The $k$-deck of a sequence is defined as the multiset of all its subsequences of length $k$. Let $D_k(n)$ denote the number of distinct $k$-decks for binary sequences of length $n$. For bina...
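The definition of the $k$-deck and of $D_k(n)$ can be made concrete with a brute-force sketch. It enumerates all binary sequences, so it is only practical for small $n$; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations, product


def k_deck(seq, k):
    """Multiset of all length-k subsequences of seq (not necessarily contiguous)."""
    return Counter("".join(c) for c in combinations(seq, k))


def num_distinct_decks(n, k):
    """D_k(n) by exhaustive search over all binary sequences of length n."""
    decks = {
        frozenset(k_deck("".join(bits), k).items())
        for bits in product("01", repeat=n)
    }
    return len(decks)
```

For example, the 2-deck of `"0110"` contains 01 twice, 10 twice, and 00 and 11 once each. For k = 1 the deck only records the number of zeros and ones, so `num_distinct_decks(n, 1)` is n + 1.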
Preprint
Full-text available
In the trace reconstruction problem, a length-$n$ string $x$ yields a collection of noisy copies, called traces, $y_1, \ldots, y_t$, where each $y_i$ is independently obtained from $x$ by passing through a deletion channel, which deletes every symbol with some fixed probability. The main goal under this paradigm is to determine the required minimum number of i....
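The channel model in this abstract can be simulated in a few lines. The sketch below only generates traces; it does not attempt reconstruction. The deletion probability `p`, the seed, and the function names are assumptions chosen for illustration.

```python
import random


def deletion_channel(x, p, rng):
    """Delete each symbol of x independently with probability p."""
    return "".join(ch for ch in x if rng.random() >= p)


def traces(x, p, t, seed=0):
    """Generate t independent traces of x; the seed makes runs reproducible."""
    rng = random.Random(seed)
    return [deletion_channel(x, p, rng) for _ in range(t)]


def is_subsequence(y, x):
    """True if y can be obtained from x by deleting symbols."""
    it = iter(x)
    return all(ch in it for ch in y)
```

Every trace is a subsequence of the transmitted string, which gives a quick sanity check such as `all(is_subsequence(y, x) for y in traces(x, 0.2, 5))`.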
Preprint
Partial MDS (PMDS) and sector-disk (SD) codes are classes of erasure correcting codes that combine locality with strong erasure correction capabilities. We construct PMDS and SD codes where each local code is a bandwidth-optimal regenerating MDS code. In the event of a node failure, these codes reduce both the number of servers that have to be con...
Article
Motivation: Recent years have seen a growing number and an expanding scope of studies using synthetic oligo libraries for a range of applications in synthetic biology. As experiments grow in number and complexity, analysis tools can facilitate quality control and support better assessment and inference. Results: We present a novel analysi...
Preprint
Full-text available
Lifted Reed-Solomon codes and multiplicity codes are two classes of evaluation codes that allow for the design of high-rate codes that can recover every codeword or information symbol from many disjoint sets. Recently, the underlying approaches have been combined to construct lifted bi-variate multiplicity codes, that can further improve on the rat...
Preprint
Private information retrieval (PIR) protocols ensure that a user can download a file from a database without revealing any information on the identity of the requested file to the servers storing the database. While existing protocols strictly impose that no information is leaked on the file's identity, this work initiates the study of the tradeoff...
Conference Paper
Full-text available
In a Private Information Retrieval (PIR) protocol, a user can download a file from a database without revealing the identity of the file to each individual server. A PIR protocol is called t-private if the identity of the file remains concealed even if t of the servers collude. Graph based replication is a simple technique, which is prevalent in bo...
Preprint
Full-text available
Correcting insertions/deletions as well as substitution errors simultaneously plays an important role in DNA-based storage systems as well as in classical communications. This paper deals with the fundamental task of constructing codes that can correct a single insertion or deletion along with a single substitution. A non-asymptotic upper bound on...
Preprint
Full-text available
The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize sequences in the de Bruijn graph are defined. These sequences can be...
Preprint
Full-text available
This paper studies the problem of constructing codes correcting deletions in arrays. Under this model, it is assumed that an $n\times n$ array can experience deletions of rows and columns. These deletion errors are referred to as $(t_{\mathrm{r}},t_{\mathrm{c}})$-criss-cross deletions if $t_{\mathrm{r}}$ rows and $t_{\mathrm{c}}$ columns are delete...
Preprint
Full-text available
The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads $N$ is fixed (K...
Article
A covering code is a set of codewords with the property that the union of balls, suitably defined, around these codewords covers an entire space. Generally, the goal is to find the covering code with the minimum size codebook. While most prior work on covering codes has focused on the Hamming metric, we consider the problem of designing covering co...
Article
A functional k-Private Information Retrieval (k-PIR) code of dimension s consists of n servers storing linear combinations of s linearly independent information symbols. Any linear combination of the s information symbols can be recovered by k disjoint subsets of servers. The goal is to find the minimum number of servers for given k and s. We provid...
Article
In this paper we study array-based codes over graphs for correcting multiple node failures. These codes have applications to neural networks, associative memories, and distributed storage systems. We assume that the information is stored on the edges of a complete undirected graph and a node failure is the event where all the edges in the neighborh...
Preprint
A functional PIR array code is a coding scheme which encodes some $s$ information bits into a $t\times m$ array such that every linear combination of the $s$ information bits has $k$ mutually disjoint recovering sets. Every recovering set consists of some of the array's columns while it is allowed to read at most $\ell$ encoded bits from every colu...
Preprint
A private information retrieval (PIR) protocol guarantees that a user can privately retrieve files stored in a database without revealing any information about the identity of the requested file. Existing information-theoretic PIR protocols ensure strong privacy, i.e., zero information leakage to the server, but at the cost of high download. In this...
Preprint
This paper studies the problem of reconstructing a word given several of its noisy copies. This setup is motivated by several applications, among them the reconstruction of strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to output th...
Preprint
Partial MDS (PMDS) and sector-disk (SD) codes are classes of erasure codes that combine locality with strong erasure correction capabilities. We construct PMDS and SD codes where each local code is a bandwidth-optimal regenerating MDS code. The constructions require significantly smaller field size than the only other construction known in literatu...
Preprint
Full-text available
In this work private information retrieval (PIR) codes are studied. In a $k$-PIR code, $s$ information bits are encoded in such a way that every information bit has $k$ mutually disjoint recovery sets. The main problem under this paradigm is to minimize the number of encoded bits given the values of $s$ and $k$, where this value is denoted by $P(s,...