About
129
Publications
10,109
Reads
1,230
Citations
Introduction
Han Mao Kiah currently works at the Division of Mathematical Sciences (DMS), Nanyang Technological University. Han Mao does research in Applied Mathematics.
Additional affiliations
February 2014 - February 2015
August 2013 - February 2014
Education
August 2010 - February 2014
August 2002 - December 2005
Publications (129)
In this work, given n,ϵ > 0, two efficient encoding (decoding) methods are presented for mapping arbitrary data to (from) n×n binary arrays in which the weight of every row and every column is within [(1/2–ϵ)n, (1/2+ϵ)n], which is referred to as the ϵ-balanced constraint. The first method combines the divide and conquer algorithm and a modification...
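The ϵ-balanced constraint defined in the abstract above can be checked directly. The following is a minimal sketch of such a check only, not of the encoding or decoding methods the abstract refers to; the function name `is_eps_balanced` and the list-of-lists representation are assumptions made for illustration.

```python
from typing import List


def is_eps_balanced(array: List[List[int]], eps: float) -> bool:
    """Return True if every row and column weight of a square binary
    array lies within [(1/2 - eps) * n, (1/2 + eps) * n]."""
    n = len(array)
    if any(len(row) != n for row in array):
        raise ValueError("expected a square n x n array")
    lo, hi = (0.5 - eps) * n, (0.5 + eps) * n
    row_weights = [sum(row) for row in array]
    # zip(*array) iterates over the columns of the array.
    col_weights = [sum(col) for col in zip(*array)]
    return all(lo <= w <= hi for w in row_weights + col_weights)


# A 4 x 4 checkerboard has weight 2 in every row and column.
checkerboard = [[(i + j) % 2 for j in range(4)] for i in range(4)]
print(is_eps_balanced(checkerboard, 0.1))
```

The check runs in time proportional to the number of cells, since each cell is visited once for its row sum and once for its column sum.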
We study and propose schemes that map messages onto constant-weight codewords using variable-length prefixes. We provide polynomial-time computable formulas that estimate the average number of redundant bits incurred by our schemes. In addition to the exact formulas, we also perform an asymptotic analysis and demonstrate that our scheme uses $\frac...
Transmit a codeword $x$, that belongs to an $(\ell-1)$-deletion-correcting code of length $n$, over a $t$-deletion channel for some $1\le \ell\le t<n$. Levenshtein, in 2001, proposed the problem of determining $N(n,\ell,t)+1$, the minimum number of distinct channel outputs required to uniquely reconstruct $x$. Prior to this work, $N(n,\ell,t)$ is k...
The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize the window property of de Bruijn sequences, on its shorter subsequen...
We propose coding techniques that simultaneously limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given ℓ, ϵ > 0, we propose simple and efficient encoders/decoders that transform binary sequences...
The problem of computing the permanent of a matrix has attracted interest since the work of Ryser (1963) and Valiant (1979). On the other hand, trellises have been extensively studied in coding theory since the 1960s. In this work, we establish a connection between the two domains. We introduce the canonical trellis $T_n$ that represents all permutations,...
We propose new repair schemes for Reed-Solomon codes that use subspace polynomials and hence generalize previous works in the literature that employ trace polynomials. The Reed-Solomon codes are over $\mathbb{F}_{q^\ell}$ and have redundancy $r = n-k \geq q^m$, $1 \leq m \leq \ell$, where $n$ and $k$ are the code length and dimension, respectively. In particular, for one erasure, we sho...
It is well known that, whenever k divides n, the complete k‐uniform hypergraph on n vertices can be partitioned into disjoint perfect matchings. Equivalently, the set of k‐subsets of an n‐set can be partitioned into parallel classes so that each parallel class is a partition of the n‐set. This result is known as Baranyai's theorem, which guarantees...
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this article, we investigate codes that correct either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two l...
The Hamming ball of radius $w$ in $\{0,1\}^n$ is the set $\mathcal{B}(n,w)$ of all binary words of length $n$ and Hamming weight at most $w$. We consider injective mappings ...
The $k$-deck of a sequence is defined as the multiset of all its subsequences of length $k$. Let $D_k(n)$ denote the number of distinct $k$-decks for binary sequences of length $n$. For bina...
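To make the k-deck definition above concrete, here is a brute-force sketch. The function names `k_deck` and `count_distinct_decks` are hypothetical and not taken from the paper. The enumeration over all binary sequences is exponential in n, so it is only meant for very small lengths.

```python
from collections import Counter
from itertools import combinations, product


def k_deck(seq: str, k: int) -> Counter:
    """Multiset of all length-k subsequences of seq (not necessarily
    contiguous), stored as a Counter mapping subsequence -> multiplicity."""
    return Counter("".join(sub) for sub in combinations(seq, k))


def count_distinct_decks(n: int, k: int) -> int:
    """Number of distinct k-decks over all binary sequences of length n,
    found by exhaustive enumeration (small n only)."""
    decks = set()
    for bits in product("01", repeat=n):
        deck = k_deck("".join(bits), k)
        # frozenset of (subsequence, count) pairs is a hashable deck key.
        decks.add(frozenset(deck.items()))
    return len(decks)


print(k_deck("0110", 2))
print(count_distinct_decks(4, 2))
```

For k = 1 the deck only records how many zeros and ones the sequence has, so there are n + 1 distinct decks; for k = n the deck is the sequence itself, giving 2^n distinct decks. Both cases are a quick sanity check for the sketch.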
The class of multiset combinatorial batch codes (MCBCs) was introduced by Zhang et al. (2018) as a generalization of combinatorial batch codes (CBCs), which are replication-based batch codes. The MCBCs allow multiple users to retrieve items in parallel in a distributed storage and a fundamental objective in this study is to determine the minimum...
We apply the generalized sphere-packing bound to two classes of subblock-constrained codes. À la Fazeli et al. (2015), we make use of automorphisms to significantly reduce the number of variables in the associated linear programming problem. In particular, we study binary constant subblock-composition codes (CSCCs), characterized by the prope...
We propose new repair schemes for Reed-Solomon codes that use subspace polynomials and hence generalize previous works in the literature that employ trace polynomials. The Reed-Solomon codes are over $\mathbb{F}_{q^\ell}$ and have redundancy $r = n-k \geq q^m$, $1\leq m\leq \ell$, where $n$ and $k$ are the code length and dimension, respectively. I...
It is well known that, whenever $k$ divides $n$, the complete $k$-uniform hypergraph on $n$ vertices can be partitioned into disjoint perfect matchings. Equivalently, the set of $k$-subsets of an $n$-set can be partitioned into parallel classes so that each parallel class is a partition of the $n$-set. This result is known as Baranyai's theorem, wh...
In this paper, we first propose coding techniques for DNA-based data storage which account for the maximum homopolymer runlength and the GC-content. In particular, for arbitrary $\ell,\epsilon > 0$, we propose simple and efficient $(\epsilon, \ell)$-constrained encoders that transform binary sequences into DNA base sequences (codewords), that satisfy t...
The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize sequences in the de Bruijn graph are defined. These sequences can be...
The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads $N$ is fixed (K...
To equip DNA-based data storage with random-access capabilities, Yazdi et al. (2018) prepended DNA strands with specially chosen address sequences called primers and provided certain design criteria for these primers. We provide explicit constructions of error-correcting codes that are suitable as primer addresses and equip these constructions wi...
In a bus with n wires, each wire has two states, '0' or '1', representing one bit of information. Whenever the state transitions from '0' to '1', or '1' to '0', joule heating causes the temperature to rise, and high temperatures have adverse effects on on-chip bus performance. Recently, the class of low-power cooling (LPC) codes was proposed to con...
We propose coding techniques that limit the length of homopolymer runs, ensure the GC-content constraint, and are capable of correcting a single edit error in strands of nucleotides in DNA-based data storage systems. In particular, for given $\ell, {\epsilon} > 0$, we propose simple and efficient encoders/decoders that transform binary sequences i...
The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. The common setup assumes the codebook to be the entire space and the problem is to determine the minimum number of distinct...
A robust positioning pattern is a large array that allows a mobile device to locate its position by reading a possibly corrupted small window around it. In this paper, we provide constructions of binary positioning patterns, equipped with efficient locating algorithms, that are robust to a constant number of errors and have redundancy within a cons...
The linear complexity of a sequence $s$ is one of the measures of its predictability. It represents the smallest degree of a linear recursion which the sequence satisfies. There are several algorithms to find the linear complexity of a periodic sequence $s$ of length $N$ (where $N$ is of some given form) over a finite field $F_q$ in $O(N)$ symbol f...
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. In this paper, we investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two line...
Write-once memory (WOM) is a storage device consisting of binary cells that can only increase their levels. A t-write WOM code is a coding scheme that makes it possible to write t times to a WOM without decreasing the levels of any of the cells. The sum-rate of a WOM code is the ratio between the total number of bits written to the memory during th...
The class of multiset combinatorial batch codes (MCBCs) was introduced by Zhang et al. (2018) as a generalization of combinatorial batch codes (CBCs). MCBCs allow multiple users to retrieve items in parallel in a distributed storage and a fundamental objective in this study is to determine the minimum total storage given certain requirements. We r...
Private Information Retrieval (PIR) array codes were introduced by Fazeli et al. (2015) to reduce the storage overhead in designing PIR protocols. Blackburn and Etzion (2017) introduced the (virtual server) rate to quantify the storage overhead of the codes, and when $s>2$ (here, $\frac{1}{s}$ is the proportion of the database stored in one server...
An indel refers to a single insertion or deletion, while an edit refers to a single insertion, deletion or substitution. We investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoder...
In 1989 we organized the first Benelux‐Japan workshop on Information and Communication theory in Eindhoven, the Netherlands. This year, 2019 we celebrate 30 years of our friendship between Asian and European scientists at the AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have man...
This year, 2019 we celebrate 30 years of our friendship between Asian and European scientists at the AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have many participants from different parts of Asia and Europe. It shows the importance of this event. It is a good tradition to pay...
To equip DNA-based data storage with random-access capabilities, Yazdi et al. (2018) prepended DNA strands with specially chosen address sequences called primers and provided certain design criteria for these primers. We provide explicit constructions of error-correcting codes that are suitable as primer addresses and equip these constructions with...
We apply the generalized sphere-packing bound to two classes of subblock-constrained codes. À la Fazeli et al. (2015), we make use of automorphisms to significantly reduce the number of variables in the associated linear programming problem. In particular, we study binary constant subblock-composition codes (CSCCs), characterized by the property tha...
We demonstrate that certain Johnson-type bounds are asymptotically exact for a variety of classes of codes, namely, constant-composition codes, nonbinary constant-weight codes, group divisible codes, and multiply constant-weight codes. We achieve this via an application of the theory of decomposition of edge-colored digraphs.
We investigate constant-composition constrained codes for mitigation of intercell interference for multilevel cell flash memories with dynamic threshold scheme. The first explicit formula for the maximum size of a q-ary F-avoiding code with a given composition and certain families of substrings F is presented. In addition, we provide methods to det...
A class of low-power cooling (LPC) codes, to control simultaneously both the peak temperature and the average power consumption of interconnects, was introduced recently. An $(n,t,w)$-LPC code is a coding scheme over $n$ wires that (A) avoids state transitions on the $t$ hottest wires (cooling), and (B) limits the number of transitions to $w$ in ea...
The Hamming ball of radius $w$ in $\{0,1\}^n$ is the set ${\cal B}(n,w)$ of all binary words of length $n$ and Hamming weight at most $w$. We consider injective mappings $\varphi: \{0,1\}^m \to {\cal B}(n,w)$ with the following domination property: every position $j \in [n]$ is dominated by some position $i \in [m]$, in the sense that "switching of...
FIVE THOUSAND YEARS AGO, a man died in the Alps. It's possible he died from a blow to the head, or he may have bled to death after being shot in the shoulder with an arrow. There's a lot we don't know about Ötzi (named for the Ötztal Alps, where he was discovered), despite the fact that researchers have spent almost 30 years studying him.
Tandem duplication is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2017) proposed the study of codes that correct tandem duplications. Known code constructions are based on irreducible words. We study efficient encoding/decodi...
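The tandem duplication operation described above can be illustrated with a short sketch. It models only the error event (a copy of a segment inserted immediately after the original), not the codes or the encoding/decoding algorithms studied in the paper; the function name `tandem_duplicate` and its parameters are assumptions made for illustration.

```python
def tandem_duplicate(seq: str, start: int, length: int) -> str:
    """Insert a copy of seq[start:start+length] immediately after the
    original segment, modelling a single tandem duplication."""
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("segment must be non-empty and lie inside seq")
    end = start + length
    segment = seq[start:end]
    return seq[:end] + segment + seq[end:]


# Duplicating the segment "CG" of "ACGT" yields "ACGCGT".
print(tandem_duplicate("ACGT", 1, 2))
```

Each duplication of a length-ℓ segment lengthens the sequence by exactly ℓ symbols.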
The class of geometric orthogonal codes (GOCs) was introduced by Doty and Winslow (2016) for more robust macrobonding in DNA origami. They observed that GOCs are closely related to optical orthogonal codes (OOCs). It is possible for GOCs to have size greater than OOCs of corresponding parameters due to slightly more relaxed constraints on correlat...
High temperatures have dramatic negative effects on interconnect performance and, hence, numerous techniques have been proposed to reduce the power consumption of on-chip buses. However, existing methods fall short of fully addressing the thermal challenges posed by high-performance interconnects. In this paper, we introduce new efficient coding sc...
We introduce the notion of weakly mutually uncorrelated (WMU) sequences, motivated by applications in DNA-based data storage systems and for synchronization of communication devices. WMU sequences are characterized by the property that no sufficiently long suffix of one sequence is the prefix of the same or another sequence. WMU sequences used for...
Tandem duplication in DNA is the process of inserting a copy of a segment of DNA adjacent to the original position. Motivated by applications that store data in living organisms, Jain et al. (2016) proposed the study of codes that correct tandem duplications to improve the reliability of data storage. We investigate algorithms associated with the s...
The subblock energy-constrained codes (SECCs) have recently been shown to be suitable candidates for simultaneous energy and information transfer, where bounds on SECC capacity were presented for communication over noisy channels. In this paper, we study binary SECCs with given error correction capability, by considering codes with a certain minimu...
The study of binary constant subblock-composition codes (CSCCs) has recently gained attention due to their application in diverse fields. These codes are a class of constrained codes where each codeword is partitioned into equal sized subblocks, and every subblock has the same fixed weight. We present novel upper and lower bounds on the asymptotic...
We introduce a new family of codes, termed asymmetric Lee distance (ALD) codes, designed to correct errors arising in DNA-based storage systems and systems with parallel string transmission protocols. ALD codes are defined over a quaternary alphabet and analyzed in this particular setting, but the derived results hold for other alphabet sizes as we...