Ryan Gabrys
Naval Information Warfare Center · Division 532

PhD


96
Publications
4,338
Reads
1,126
Citations

Publications (96)
Preprint
Due to its higher data density, longevity, energy efficiency, and ease of generating copies, DNA is considered a promising storage technology for satisfying future needs. However, a diverse set of errors including deletions, insertions, duplications, and substitutions may arise in DNA at different stages of data storage and retrieval. The current p...
Preprint
Polarization is an unprecedented coding technique in that it not only achieves channel capacity, but also does so at a faster speed of convergence than any other technique. This speed is measured by the "scaling exponent" and its importance is three-fold. Firstly, estimating the scaling exponent is challenging and demands a deeper understanding of...
Preprint
The $k$-deck problem is concerned with finding the smallest value $S(k)$ of a positive integer $n$ such that there exist at least two strings of length $n$ that share the same $k$-deck, i.e., the same multiset of subsequences of length $k$. We introduce the new problem of gapped $k$-deck reconstruction: For a given gap parameter $s$, we seek the sm...
Preprint
Full-text available
Trades, introduced by Hedayat, are two sets of blocks of elements which may be exchanged (traded) without altering the counts of certain subcollections of elements within their constituent blocks. They are of importance in applications where certain combinations of elements dynamically become prohibited from being placed in the same group of elemen...
Preprint
Polymerase chain reaction (PCR) testing is the gold standard for diagnosing COVID-19. PCR amplifies the virus DNA 40 times to produce measurements of viral loads that span seven orders of magnitude. Unfortunately, the outputs of these tests are imprecise and therefore quantitative group testing methods, which rely on precise measurements, are not a...
Preprint
Full-text available
We consider the problem of designing low-redundancy codes in settings where one must correct deletions in conjunction with substitutions or adjacent transpositions; a combination of errors that is usually observed in DNA-based data storage. One of the most basic versions of this problem was settled more than 50 years ago by Levenshtein, or one subs...
Chapter
Full-text available
Prior work has explored the use of defensive cyber deception to manipulate the information available to attackers and to proactively misinform on behalf of both real and decoy systems. Such approaches can provide advantages to defenders by detecting inadvertent attacker interactions with decoy systems, by delaying attacker forward progress, by decr...
Preprint
The problem of reconstructing strings from substring information has found many applications due to its importance in genomic data sequencing and DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, gener...
Article
In this paper we consider the problem of encoding data into repeat-free sequences, in which every k-tuple is required to appear at most once (for a predefined k). First, the capacity of the repeat-free constraint is calculated. Then, an efficient algorithm, which uses two bits of redundancy, is presented to encode length-n sequences for k =...
Article
There are few mathematicians whose contributions go beyond named conjectures and theorems: Vladimir Iosifovich Levenshtein (1935–2017) is one such true exception. During the five decades of his active research career, he enriched combinatorics, coding, and information theory with elegant problem formulations, ingenious algorithmic solutions, a...
Preprint
Semiquantitative group testing (SQGT) is a pooling method in which the test outcomes represent bounded intervals for the number of defectives. Alternatively, it may be viewed as an adder channel with quantized outputs. SQGT represents a natural choice for Covid-19 group testing as it allows for a straightforward interpretation of the cycle threshol...
Article
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this article, we investigate codes that correct either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two l...
Chapter
Full-text available
Prior work has explored the use of defensive cyber deception to manipulate the information available to attackers and to proactively lie on behalf of both real and decoy systems. Such approaches can provide advantages to defenders by delaying attacker forward progress and thereby decreasing or eliminating attacker payoffs. In this work, we expand p...
Preprint
The first part of the paper presents a review of the gold-standard testing protocol for Covid-19, real-time, reverse transcriptase PCR, and its properties and associated measurement data such as amplification curves that can guide the development of appropriate and accurate adaptive group testing protocols. The second part of the paper is concerned...
Article
Full-text available
Storage architectures ranging from minimum bandwidth regenerating encoded distributed storage systems to declustered-parity RAIDs can employ dense partial Steiner systems to support fast reads, writes, and recovery of failed storage units. To enhance performance, popularities of the data items should be taken into account to make frequencies of acc...
Conference Paper
Full-text available
Prior work has explored the use of defensive cyber deception to manipulate the information available to attackers and to proactively lie on behalf of both real and decoy systems. Such approaches can provide advantages to defenders by delaying attacker forward progress and thereby decreasing or eliminating attacker payoffs. In this work, we expand p...
Preprint
The problem of string reconstruction from substring information has found many applications due to its relevance in DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry read...
Article
Full-text available
Motivated by average-case trace reconstruction and coding for portable DNA-based storage systems, we initiate the study of coded trace reconstruction, the design and analysis of high-rate efficiently encodable codes that can be efficiently decoded with high probability from few reads (also called traces) corrupted by edit errors. Codes used in curr...
Article
Full-text available
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Preprint
Motivated by polymer-based data-storage platforms that use chains of binary synthetic polymers as the recording media and read the content via tandem mass spectrometers, we propose a new family of codes that allows for both unique string reconstruction and correction of multiple mass errors. We consider two approaches: The first approach pertains t...
Preprint
We consider the problem of correcting mass readout errors in information encoded in binary polymer strings. Our work builds on results for string reconstruction problems using composition multisets [Acharya et al., 2015] and the unique string reconstruction framework proposed in [Pattabiraman et al., 2019]. Binary polymer-based data storage systems...
Conference Paper
Full-text available
Cyber deception focuses on providing advantage to defenders through manipulation of the information provided to attackers. Game theory is one of the methods that has been used to model cyber deception. In this work, we first introduce a simple game theoretic model of deception that captures the essence of interactions between an attacker and defend...
Preprint
Full-text available
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two line...
Preprint
Full-text available
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two line...
Preprint
Distributed computing, in which a resource-intensive task is divided into subtasks and distributed among different machines, plays a key role in solving large-scale problems, e.g., machine learning for large datasets or massive computational problems arising in genomic research. Coded computing is a recently emerging paradigm where redundancy for d...
Chapter
Full-text available
Cyber deception focuses on providing advantage to defenders through manipulation of the information provided to attackers. Game theory is one of the methods that has been used to model cyber deception. In this work, we first introduce a simple game theoretic model of deception that captures the essence of interactions between an attacker and defend...
Article
Cassuto and Blaum recently studied the symbol-pair channel, a model where every two consecutive symbols are read together. This special channel structure is motivated by the limitations of the reading process in high density data storage systems, where it is no longer possible to read individual symbols. In this new paradigm, the errors are not ind...
Preprint
Full-text available
In this paper we consider the problem of encoding data into repeat-free sequences, in which every $k$-tuple is required to appear at most once (for a predefined $k$). First, the capacity and redundancy of the repeat-free constraint are calculated. Then, an efficient algorithm, which uses a single bit of redundancy, is presented to encode leng...
Article
In this paper, we consider the problem of synchronizing two sets of data where the size of the symmetric difference between the sets is small and, in addition, the elements in the symmetric difference are related through the Hamming distance metric. Upper and lower bounds are derived on the minimum amount of information exchange. Furthermore, expli...
Conference Paper
We consider the application of a genetic algorithm (GA) to the problem of hiding information in program executables. In a nutshell, our approach is to re-order instructions in a program in a way that aims to maximize the amount of data that can be embedded while, at the same time, ensuring the functionality of the executable is not altered. In t...
Preprint
Full-text available
Storage architectures ranging from minimum bandwidth regenerating encoded distributed storage systems to declustered-parity RAIDs can be designed using dense partial Steiner systems in order to support fast reads, writes, and recovery of failed storage units. In order to ensure good performance, popularities of the data items should be taken into a...
Preprint
Motivated by polymer-based data-storage platforms that use chains of binary synthetic polymers as the recording media and read the content via tandem mass spectrometers, we propose a new family of codes that allows for unique string reconstruction and correction of one mass error. Our approach is based on introducing redundancy that scales logarith...
Preprint
Full-text available
Motivated by average-case trace reconstruction and coding for portable DNA-based storage systems, we initiate the study of coded trace reconstruction, the design and analysis of high-rate efficiently encodable codes that can be efficiently decoded with high probability from few reads (also called traces) corrupted by edit errors. Code...
Preprint
Full-text available
We are concerned with the problem of designing large families of subsets over a common labeled ground set that have small pairwise intersections and the property that the maximum discrepancy of the label values within each of the sets is less than or equal to one. Our results, based on transversal designs, factorizations of packings and Latin recta...
Article
Partial MDS (PMDS) codes are a class of erasure-correcting array codes that combine local correction of the rows with global correction of the array. An m × n array code is called an (r; s) PMDS code if each row belongs to an [n, n - r, r + 1] MDS code and the code can correct erasure patterns consisting of r erasures in each row together with s mor...
Article
In this work, we investigate the problem of constructing codes capable of correcting two deletions. In particular, we construct a code that requires approximately $8 \log_2 n + O(\log_2 \log_2 n)$ bits of redundancy, where $n$ denotes the length of the code. To the best of the authors’ knowledge, this represents the best known construction in that...
Preprint
In this work, we consider the problem of synchronizing two sets of data where the size of the symmetric difference between the sets is small and, in addition, the elements in the symmetric difference are related through the Hamming distance metric. Upper and lower bounds are derived on the minimum amount of information exchange. Furthermore, explic...
Article
FIVE THOUSAND YEARS AGO, a man died in the Alps. It's possible he died from a blow to the head, or he may have bled to death after being shot in the shoulder with an arrow. There's a lot we don't know about Ötzi (named for the Ötztal Alps, where he was discovered), despite the fact that researchers have spent almost 30 years studying him.
Article
The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codeboo...
Article
The sequence reconstruction problem, first proposed by Levenshtein, models the setup in which a sequence from some set is transmitted over several channels, and the decoder receives the outputs from every channel. The channels are almost independent as it is only required that all outputs are different from each other. The main problem of interest...
Article
Motivated by applications in DNA-based storage, we introduce the new problem of code design in the Damerau metric. The Damerau metric is a generalization of the Levenshtein distance which, in addition to deletion, insertion, and substitution errors, also accounts for adjacent transposition edits. We first provide constructions for codes that may co...
Article
Full-text available
We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for...
Article
Full-text available
DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures en...
Article
We introduce a new family of codes, termed asymmetric Lee distance (ALD) codes, designed to correct errors arising in DNA-based storage systems and systems with parallel string transmission protocols. ALD codes are defined over a quaternary alphabet and analyzed in this particular setting, but the derived results hold for other alphabet sizes as we...
Article
Full-text available
This paper studies codes that correct a burst of deletions or insertions. Namely, a code will be called a b-burst-deletion/insertion-correcting code if it can correct a burst of deletions/insertions of any b consecutive bits. While the lower bound on the redundancy of such codes was shown by Levenshtein to be asymptotically log(n)+b−1, the redunda...
Article
We introduce a new variant of the $k$-deck problem, which in its traditional formulation asks for determining the smallest $k$ that allows one to reconstruct any binary sequence of length $n$ from the multiset of its $k$-length subsequences. In our version of the problem, termed the hybrid k-deck problem, one is given a certain number of special su...
Conference Paper
In this work, we consider a variant of the set reconciliation problem where the estimate for the size of the symmetric difference may be inaccurate. Given this setup, we propose a new method for reconciling sets of data and then compare our method to the Invertible Bloom Filter approach proposed by Eppstein et al. [2].
Preprint
Full-text available
DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency [1–6]. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently reading them via high-throughput sequencing technologies. All existing...
Article
This work studies problems in data reconstruction, an important area with numerous applications. In particular, we examine the reconstruction of binary and nonbinary sequences from synchronization (insertion/deletion-correcting) codes. These sequences have been corrupted by a fixed number of symbol insertions (larger than the minimum edit distance...
Article
Motivated by charge balancing constraints for rank modulation schemes, we introduce the notion of balanced permutations and derive the capacity of balanced permutation codes. We also describe simple interleaving methods for permutation code constructions and show that they approach capacity.
Article
Full-text available
We introduce the new problem of code design in the Damerau metric. The Damerau metric is a generalization of the Levenshtein distance which also allows for adjacent transposition edits. We first provide constructions for codes that may correct either a single deletion or a single adjacent transposition and then proceed to extend these results to co...
Article
We continue our study of a new family of asymmetric Lee codes that arise in the design and implementation of emerging DNA-based storage systems and systems which use parallel string transmission protocols. The codewords are defined over a quaternary alphabet, although the results carry over to other alphabet sizes, and have symbol distances dictate...
Conference Paper
We consider a new family of codes, termed asymmetric Lee distance codes, that arise in the design and implementation of DNA-based storage systems and systems with parallel string transmission protocols. The codewords are defined over a quaternary alphabet, although the results carry over to other alphabet sizes; furthermore, symbol confusability is...
Conference Paper
In this work, we consider the problem of synchronizing two sets of data where the size of the symmetric difference between the sets is small and, in addition, the elements in the symmetric difference are related. In this introductory work, the elements within the symmetric difference are related through the Hamming distance metric. Upper and lower...
Article
The goal of this paper is to present constructions of high-rate nonbinary write-once memory (WOM) codes for multilevel flash memories. The constructions provided here are all based on the basic idea of mapping high-rate binary codebooks to nonbinary codebooks. The proposed codes maintain the same length and encoding complexity as their underlying b...
Article
The development of good codes which are capable of correcting more than a single deletion remains an elusive task. Recent papers, such as that by Kulkarni and Kiyavash [3], instead focus on the more tractable problem of deriving upper bounds on the cardinalities of such codes. In the present work, we develop Gilbert-Varshamov-type lower bounds on t...
Article
In this work, the model introduced by Gabrys is extended to account for the presence of unreliable memory cells. Leveraging data analysis on errors taking place in a TLC Flash device, we show that memory cells can be broadly categorized into reliable and unreliable cells, where the latter are much more likely to be in error. Our approach programs u...
Conference Paper
Codes based on multiset permutations, or multipermutations, have attracted recent attention due to their applications to non-volatile memories. Most of the literature studying multipermutations is focused on codes capable of correcting errors in the Kendall tau and Ulam metrics. In this work, we make a first effort towards studying synchronization...
Conference Paper
Full-text available
Motivated by the rank modulation scheme for flash memories, we consider an information representation system with relative values (permutations) and study codes for correcting deletions. In contrast to the case of a deletion in a regular (with absolute values) representation system, a deletion in this new paradigm results in a new permutation over...
Conference Paper
Full-text available
Error-correcting codes for permutations have received considerable attention in the past few years, especially in applications of the rank modulation scheme for flash memories. While several metrics have been studied, such as the Kendall τ, Ulam, and Hamming distances, no recent research has been carried out on erasures and deletions over permutations...
Article
Full-text available
This paper studies new bounds and constructions that are applicable to the combinatorial granular channel model previously introduced by Sharov and Roth. We derive new bounds on the maximum cardinality of a grain-error-correcting code and propose constructions of codes that correct grain-errors. We demonstrate that a permutation of the classical gr...
Article
In non-volatile memories, reading stored data is typically done through the use of predetermined fixed thresholds. However, due to problems commonly affecting such memories, including voltage drift, overwriting, and inter-cell coupling, fixed threshold usage often results in significant asymmetric errors. To combat these problems, Zhou, Jiang, and...
Article
Flash memory is a promising new storage technology. Supported by empirical data collected from a Flash memory device, we propose a class of codes that exploits the asymmetric nature of the error patterns in a Flash device using tensor product operations. We call these codes graded bit-error-correcting codes. As demonstrated on the data collected fr...