All content in this area was uploaded by Kees Schouhamer Immink on Apr 07, 2019


Soft-Decision Decoding for DNA-Based Data Storage

Mu Zhang∗, Kui Cai∗, Kees A. Schouhamer Immink†, and Pingping Chen∗

∗Science and Math Cluster, Singapore University of Technology and Design, Singapore 487372

†Turing Machines Inc, Willemskade 15d, 3016 DK Rotterdam, The Netherlands

Abstract—This paper presents novel soft-decision decoding

(SDD) of error correction codes (ECCs) that substantially improves the reliability of DNA-based data storage systems compared with conventional hard-decision decoding (HDD). We propose a simplified system model for DNA-based data storage according to the major characteristics and different types of errors associated with the prevailing DNA synthesis and sequencing technologies. We compute analytically the error-free probability of each sequenced DNA oligonucleotide (oligo), from which the soft-decision log-likelihood ratio (LLR) of each oligo can be derived. We apply the proposed SDD algorithms to the recently proposed DNA Fountain scheme. Simulation results show that SDD achieves an error rate improvement of two to three orders of magnitude over HDD, thus demonstrating its potential to improve the information density of DNA-based data storage systems.

I. INTRODUCTION

DNA-based data storage has emerged as a promising candidate for the storage of Big Data. It features extremely high data storage density (for example, 1 exabyte/mm$^3$), long-lasting stability of hundreds to a thousand years, and ultra-low power consumption for operation and maintenance [1], [2]. Information storage in DNA has been demonstrated by several research groups [3]-[8]. Initially, owing to the limitations of DNA manipulation technologies, only a small amount of data was stored in DNA molecules. In recent years, DNA productivity has increased significantly, and storage of megabytes of data has been demonstrated.

The success of DNA-based data storage is largely attributed to the use of error correction codes (ECCs). Both information writing and reading are prone to errors due to the specific bio-chemical and bio-physical processes involved. Furthermore, the reliability of data storage is hampered by substitution, insertion, and deletion errors occurring simultaneously [3]. ECCs are therefore required to guarantee data storage reliability. In [4] and [5], repetition codes are used for data protection. In [3] and [6], two-dimensional interleaved Reed-Solomon (RS) codes with stronger error correction capability are applied. Recently, an efficient information storage architecture, named DNA Fountain [8], has been proposed. It combines Luby transform (LT) codes and RS codes, and achieves a higher information storage density than the earlier designs.

In prior-art DNA storage systems, ECCs are all decoded by hard-decision decoding (HDD), which in general requires large coding redundancy and hence lowers the information density. Although it is well known that soft-decision decoding (SDD) can provide a significant performance gain over HDD, researchers have so far not been able to apply SDD of ECCs to DNA storage systems. This is mainly because the complicated DNA synthesis and sequencing processes make it difficult to generate the soft-decision log-likelihood ratio (LLR) for each DNA oligonucleotide (or oligo for short) needed to support SDD.

In this paper, we first characterize the DNA-based data storage system with the prevailing DNA synthesis and sequencing technologies. We then propose a simplified system model, through which the LLR is derived analytically based on the number of occurrences of each sequenced oligo. We apply the proposed SDD to decode LT codes in DNA Fountain, and demonstrate its error performance gain over conventional HDD.

The rest of this paper is organized as follows. In Section II, we introduce the DNA-based data storage technology and the DNA Fountain scheme. In Section III, we present a simplified DNA storage system model, as well as the calculation of LLRs for each sequenced oligo. The proposed DNA Fountain with SDD and the simulation results are given in Section IV. Finally, Section V concludes the paper.

II. PRELIMINARIES

A. DNA-based Data Storage

A DNA strand, or oligo, is a chain of almost arbitrary combinations of four base nucleotides, namely Adenine (A), Cytosine (C), Guanine (G), and Thymine (T). Each base can represent two bits of information. Modern array-based synthesis technologies used for DNA storage can synthesize oligos with lengths up to about 200 nucleotides [5]. Large files must therefore be partitioned into small segments and written into different oligos. DNA synthesis is hampered mainly by two biochemical constraints. First, the homopolymer run length, i.e., the number of consecutive identical nucleotides, is limited, because long homopolymer runs increase the error probability. In practice, the maximum homopolymer run length is set to 1-3. Second, the GC-content of each sequence, i.e., the percentage of the bases G and C in the sequence, should be neither too high nor too low. Sequences violating these constraints cause more synthesis or sequencing errors [8]. Given the desired sequences, a DNA synthesizer can synthesize nearly $10^5$ different oligos in parallel [9], creating up to $1.2 \times 10^7$ copies of each DNA string [10], depending on the technology used. All these oligos are mixed together in a pool, which serves as the storage medium for DNA data storage.
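The two-bits-per-base representation and the two biochemical constraints can be illustrated with a minimal Python sketch. The bit-to-base table, the function names, and the default thresholds are assumptions made for illustration only; the thresholds reuse the values quoted for [8] in Section IV (run length at most 3, GC-content within [0.45, 0.55]).

```python
# Hypothetical 2-bit-per-base mapping; any fixed one-to-one table would do.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def bits_to_bases(bits):
    """Map a bit string of even length to a base sequence, two bits per base."""
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def gc_content(seq):
    """Fraction of bases that are G or C (seq must be non-empty)."""
    return sum(base in "GC" for base in seq) / len(seq)


def passes_screening(seq, max_run=3, gc_min=0.45, gc_max=0.55):
    """True if the sequence satisfies both constraints (assumed thresholds)."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)
```

For example, `passes_screening("AAAACGCG")` is rejected because of the run of four A bases, even though its GC-content is exactly 0.5.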

Reading information in DNA storage is realized by randomly and independently sequencing the oligos in the pool, with each sequenced oligo counted as one read. The number of reads is usually much smaller than the total number of oligos in the pool. For instance, only 0.1% of the oligos in the pool are consumed for sequencing in an experiment in [4]. Some synthesized sequences may not be sequenced at all in the reading process. Moreover, a portion of the sequences might be lost during the DNA manipulations. Thus, erasure codes are required to recover the input information from the limited number of sequenced oligos. In addition, polymerase chain reaction (PCR) is performed to amplify the oligos in the pool before sequencing. It increases the oligo concentration for ease of sequencing and allows the storage system to be accessed multiple times.

In DNA-based data storage, both synthesis and sequencing are prone to errors arising from the bio-chemical and bio-physical processes. Most of the recent works reported in the literature adopt array-based synthesis and next-generation sequencing techniques for DNA storage, leading to similar error patterns and raw error probabilities. It has been found that substitution, insertion, and deletion base errors, as well as missing oligos, occur in DNA storage systems. Therefore, effective error detection and correction schemes are required for improving both the reliability and the information storage density of DNA storage systems.

B. DNA Fountain architecture

The DNA Fountain is a DNA-based data storage architecture that realizes error-free data writing and reading with a high number of bits per nucleotide among the schemes reported in the literature. It consists of an RS code for each oligo as the inner code and an LT code for a set of oligos as the outer code. Because insertion and deletion errors in the oligos are problematic for efficient error correction, the RS code in DNA Fountain is only used for error detection. Oligos with undesired lengths due to insertion or deletion errors, or those violating the parity check of the RS code, are discarded and considered missing by the outer LT code. LT codes are a class of capacity-achieving codes for erasure channels [12]. Thus, they can tackle missing oligos (i.e., oligo dropout) caused by various errors. For a given set of $K$ input symbols, an LT code can create any desired number of packages, each consisting of the indices and the sum of $d$ randomly chosen input symbols. Here, $d$ follows the Robust Soliton Distribution (RSD) $\mu_{K,c,\delta}(d)$ [12], given by

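$$\mu_{K,c,\delta}(d) = \frac{\rho(d) + \tau(d)}{Z}, \tag{1}$$

where

$$\rho(d) = \begin{cases} 1/K & \text{if } d = 1; \\ \dfrac{1}{d(d-1)} & \text{for } d = 2, \ldots, K, \end{cases}$$

$$\tau(d) = \begin{cases} \dfrac{s}{Kd} & \text{for } d = 1, 2, \ldots, K/s - 1; \\ \dfrac{s \log(s/\delta)}{K} & \text{for } d = K/s; \\ 0 & \text{for } d > K/s, \end{cases}$$

and $Z = \sum_{d} (\rho(d) + \tau(d))$ is a normalization coefficient.

Fig. 1. Block diagram of data writing and reading of DNA Fountain. [Writing: binary file, partition (segments), LT enc. (random packs), RS enc. (droplets), mapping (base sequences), screening, synthesis, oligo pool; the LT enc. to screening steps repeat until enough oligos are created. Reading: oligo pool, sequencing, demapping, RS dec., LT dec., combine, recovered file.]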

Due to the randomness of LT codes, the biochemical constraints of DNA manipulations can be satisfied by discarding all the invalid sequences, at the expense of a long encoding latency.
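The distribution in (1) can be evaluated numerically with a short Python sketch. The text does not define $s$; the sketch assumes $s = c \ln(K/\delta)\sqrt{K}$, as in Luby's LT-code construction [12], and rounds $K/s$ to the nearest integer. The function name and the example parameter values are illustrative assumptions.

```python
import math


def robust_soliton(K, c, delta):
    """Return [mu(1), ..., mu(K)] for the Robust Soliton Distribution in (1).

    Assumption: s = c * ln(K / delta) * sqrt(K), as in Luby's LT codes [12];
    K / s is rounded to the nearest integer to locate the spike of tau.
    """
    s = c * math.log(K / delta) * math.sqrt(K)
    pivot = int(round(K / s))

    rho = [0.0] * (K + 1)  # index 0 is unused
    tau = [0.0] * (K + 1)

    rho[1] = 1.0 / K
    for d in range(2, K + 1):
        rho[d] = 1.0 / (d * (d - 1))

    for d in range(1, min(pivot, K + 1)):  # d = 1, ..., K/s - 1
        tau[d] = s / (K * d)
    if 1 <= pivot <= K:                    # d = K/s
        tau[pivot] = s * math.log(s / delta) / K

    Z = sum(rho[d] + tau[d] for d in range(1, K + 1))  # normalization
    return [(rho[d] + tau[d]) / Z for d in range(1, K + 1)]


# Illustrative parameters only; they are not taken from the text.
mu = robust_soliton(K=1000, c=0.1, delta=0.05)
```

The returned list sums to one by construction, and most of its mass sits on small degrees, with an extra spike near $d = K/s$.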

Fig. 1 shows a diagram of DNA Fountain. The binary source file is first partitioned into non-overlapping segments of a certain length. Packages of segments are then produced by selecting a random subset of segments, with the subset size drawn from the RSD, and adding them together bitwise over the binary field (i.e., by bitwise XOR). Each package is attached with a unique seed created by a pseudo-random number generator (PRNG). This is essentially the encoding process of the LT code. The obtained package with its seed is then encoded by an RS code to obtain a short message called a droplet. After that, the binary droplet is mapped into a DNA base sequence, and a screening process is performed in which the invalid droplets that violate the biochemical constraints are rejected. The LT-RS-screening process is repeated until a sufficient number of valid droplets is created and synthesized into an oligo pool. By sequencing the oligos in the pool, demapping the obtained DNA base sequences into binary droplets, and then decoding the RS code and the LT code, the source file can be recovered.
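The LT encoding step described above can be sketched as follows. This is a simplified illustration rather than the DNA Fountain implementation: the function name, the use of Python's `random.Random` as the PRNG, and the convention that the seed alone determines the degree and the chosen segments are all assumptions. RS encoding, base mapping, and screening are omitted.

```python
import random


def make_droplet(segments, seed, degree_pmf):
    """Build one LT package from equal-length byte segments.

    degree_pmf[d - 1] is the probability of degree d (for example an RSD);
    it must not assign mass to degrees larger than len(segments).
    Assumption: the seed fully determines the degree and segment indices,
    so a decoder can regenerate them from the seed alone.
    """
    rng = random.Random(seed)
    degrees = list(range(1, len(degree_pmf) + 1))
    d = rng.choices(degrees, weights=degree_pmf, k=1)[0]
    indices = rng.sample(range(len(segments)), d)

    payload = bytes(len(segments[0]))  # all-zero start value
    for i in indices:
        # Bitwise addition over the binary field is XOR.
        payload = bytes(a ^ b for a, b in zip(payload, segments[i]))
    return seed, sorted(indices), payload


segments = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6])]
# A degenerate pmf that always picks degree 3, so all segments are combined.
seed, indices, payload = make_droplet(segments, seed=7, degree_pmf=[0.0, 0.0, 1.0])
```

Calling the function twice with the same seed returns the same package, which is what allows the seed to stand in for the list of indices.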

III. DNA-BASED DATA STORAGE SYSTEM MODELING AND LLR CALCULATION

A. DNA-based data storage system model

In this subsection, we propose a simplified system model that characterizes DNA-based data storage, following the analyses of experimental data in the open literature [8]-[11]. Suppose that $N'$ unique input sequences, each with $n$ bases as the data payload, are synthesized, resulting in $S$ reads after oligo sequencing. Then $S'$ output sequences are obtained after inner-code parity checking and merging of identical sequences. By making a few assumptions, we can derive the simplified DNA storage system model shown in Fig. 2.

Fig. 2. Simplified DNA storage system model. [Input sequences ($N'$ $n$-tuples), removing erased sequences, sample population ($N$ $n$-tuples), random sampling with replacement, random samples ($S$ $n$-tuples), substitution, insertion, and deletion error injection, sequence reads ($S$ reads), merging identical reads, output sequences ($S'$ $n$-tuples).]

We model the synthesis and sequencing processes as a random sampling process in which the output sequences are randomly sampled from the $N'$ input sequences. The sampling consists of two stages. In the first stage, some sequences are sampled as erasures, representing sequences that are lost during the DNA manipulations; the $N$ remaining sequences form the sample population. In the second stage, reads are sampled from the remaining sequences. At this stage, we assume that all sequences in the population have the same number of copies created by the synthesizer. We further assume that the PCR amplification is ideal, such that all synthesized sequences are equally amplified and error free. Then, an input sequence will either be an erasure, as shown by the first block of Fig. 2, or be sampled with a constant probability. Since the number of reads is much smaller than the number of oligos in the pool, the second stage can be considered as uniform random sampling with replacement. Next, we note that the synthesis and sequencing errors are independent of each other, and that they occur consecutively in the DNA storage system. We thus combine the errors generated by the two processes by injecting the combined amounts of substitution, insertion, and deletion errors into the sampled input sequences. The fourth block of the system model merges all identical reads to obtain $S'$ output sequences for information recovery.
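The four blocks of the model in Fig. 2 can be mimicked with a small Monte Carlo sketch. Everything below is an illustrative assumption rather than the simulator used in this paper: errors are injected independently per base, an inserted base is taken to differ from the base it precedes, and the inner-code parity check that precedes merging is omitted.

```python
import random
from collections import Counter

BASES = "ACGT"


def corrupt(seq, p_s, p_i, p_d, rng):
    """Inject independent per-base substitution, insertion, and deletion errors."""
    out = []
    for base in seq:
        others = [b for b in BASES if b != base]
        u = rng.random()
        if u < p_d:
            continue                        # deletion: drop the base
        if u < p_d + p_i:
            out.append(rng.choice(others))  # insertion before the base
            out.append(base)
        elif u < p_d + p_i + p_s:
            out.append(rng.choice(others))  # substitution
        else:
            out.append(base)
    return "".join(out)


def simulate_reads(inputs, n_reads, erasure_rate, p_s, p_i, p_d, seed=0):
    """Return a Counter mapping each distinct read to its number of occurrences."""
    rng = random.Random(seed)
    # Block 1: remove erased sequences to form the sample population.
    population = [x for x in inputs if rng.random() >= erasure_rate]
    # Block 2: uniform random sampling with replacement.
    samples = rng.choices(population, k=n_reads)
    # Block 3: inject the combined synthesis and sequencing errors.
    reads = [corrupt(x, p_s, p_i, p_d, rng) for x in samples]
    # Block 4: merge identical reads, keeping the occurrence count r.
    return Counter(reads)
```

The occurrence counts in the returned `Counter` play the role of $r$ in the LLR calculation that follows.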

B. Log-Likelihood Ratio calculation

All DNA storage architectures proposed in the literature use HDD for error control. In general, HDD is less reliable than SDD and hence requires more redundancy to achieve a target error rate. Specifically, in DNA Fountain, insertion and deletion errors can occur simultaneously and may not be detectable by the inner RS code. The traditional HDD of DNA Fountain, i.e., the inverse LT (ILT) [12], does not have error correction capability, and a single erroneous oligo that is accepted as error free may result in a large number of decoded errors. This motivates us to seek soft information that enables SDD of ECCs and thereby increases the reliability of DNA data storage.

Recall that by sequencing the oligo pool, we may obtain multiple output sequences carrying information of the same input sequence. Since base errors occur randomly and independently in different copies of the same input sequence created by the synthesizer [4], output sequences with more occurrences are more likely to be error free. Consider that an output sequence occurs $r$ times, denoted by event $D_r$, with $r = 1, 2, \ldots, S$. We show that the LLR of the sequence can be derived explicitly as follows.

Since all output sequences have $n$ bases after RS decoding, insertion and deletion errors either do not occur or occur only in pairs. We refer to a pair of insertion and deletion errors as an i-d error, and let $E_{st}$ be the event that a sequence is corrupted by $s$ substitution errors and $t$ i-d errors. Moreover, the validity of each output sequence is checked by the inner code, and we use event $C$ to denote the case where the output sequence is a valid codeword. The LLR of an output sequence occurring $r$ times is thus given by

$$L_r = \log \frac{P(E_{00} \mid C, D_r)}{1 - P(E_{00} \mid C, D_r)}. \tag{2}$$

To compute $P(E_{00} \mid C, D_r)$, we assume that the inner code has a minimum distance $d_{\min}$. Let $B$ denote the event that the sequence has at least $d_{\min}$ code symbol errors. Applying the laws of conditional probability and total probability, we obtain

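$$P(E_{00} \mid C, D_r) = \frac{P(E_{00}, C, D_r)}{P(E_{00}, C, D_r) + P(B, C, D_r)}, \tag{3}$$

where

$$P(E_{00}, C, D_r) = P(C \mid E_{00}, D_r)\, P(E_{00})\, P(D_r \mid E_{00}),$$

and

$$P(B, C, D_r) = P(C \mid B, D_r) \sum_{s,t} P(E_{st})\, P(D_r \mid B, E_{st})\, P(B \mid E_{st}).$$

Therefore, we obtain

$$L_r = \log \left[ \frac{P(C \mid E_{00}, D_r)\, P(E_{00})\, P(D_r \mid E_{00})}{P(C \mid B, D_r) \sum_{s,t} P(E_{st})\, P(D_r \mid B, E_{st})\, P(B \mid E_{st})} \right]. \tag{4}$$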

Note that the terms associated with event $B$ in (4) depend on the error detection capability of the inner code. In the following, we use the inner code of [8] as an example to derive all the corresponding terms in (4) and obtain $L_r$. The derivations can be generalized to other inner codes in a straightforward way. In [8], the inner code is an RS code over GF(256) with $n_c$ code symbols, including 2 parity-check symbols. For simplicity, we assume that the oligo length $n$ is a multiple of 4, such that $n_c = n/4$. This code has $d_{\min} = 3$ and can detect up to two errors over GF(256). Thus, erroneous output sequences with fewer than three code symbol errors are always detected by the inner code. We can then compute the probabilities in (4) to obtain $L_r$.

Clearly, $P(C \mid E_{00}, D_r) = 1$ and $P(C \mid B, D_r) = P(C \mid B)$. The probability $P(C \mid B)$ is the undetected error rate of the inner code under the condition that the sequence has more than two code symbol errors. Due to the existence of i-d errors, each error pattern can be considered as a random $n_c$-tuple over GF(256) with weight greater than two. The total number of such $n_c$-tuples is $256^{n_c} - 1 - 255 n_c - 255^2 \frac{n_c!}{(n_c-2)!\,2!}$, with $256^{n_c-2}$ tuples forming the complete set of inner codewords. For the case of DNA Fountain, we have $n_c \gg 2$. Thus, the probability that a random vector with weight greater than 2 is a codeword is given by

$$P(C \mid B, D_r) = \frac{256^{n_c-2}}{256^{n_c} - 1 - 255 n_c - 255^2 \frac{n_c!}{(n_c-2)!\,2!}} \approx 256^{-2}. \tag{5}$$
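The approximation in (5) can be checked with exact integer arithmetic. The sketch below is only a numerical illustration of the counting argument; the function name is hypothetical, and $n_c = 38$ corresponds to $n = 152$ bases under $n_c = n/4$.

```python
from fractions import Fraction
from math import comb


def undetected_error_prob(n_c, q=256):
    """Exact value of the ratio in (5) for an inner code with two parity symbols."""
    # Number of n_c-tuples over GF(q) with weight 0, 1, or 2.
    low_weight = 1 + (q - 1) * n_c + (q - 1) ** 2 * comb(n_c, 2)
    heavy_tuples = q ** n_c - low_weight  # tuples with weight greater than two
    codewords = q ** (n_c - 2)            # size of the inner code
    return Fraction(codewords, heavy_tuples)


p = undetected_error_prob(38)
print(float(p), 256.0 ** -2)  # the two values agree to machine precision
```

For $n_c = 38$ the low-weight tuples are a vanishing fraction of all $256^{n_c}$ tuples, which is why the ratio is indistinguishable from $256^{-2}$.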

Then, we can compute $P(E_{st})$ for various base error rates, given by

$$P(E_{st}) = \frac{n!}{(n-s-2t)!\, s!\, t!\, t!}\, p_s^{s}\, p_i^{t}\, p_d^{t}\, (1-p)^{n-s-2t}, \tag{6}$$

where $p_s$, $p_i$, and $p_d$ are the raw substitution, insertion, and deletion error rates of the system, respectively, with $p = p_s + p_i + p_d$, and $s, t \geq 0$.
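Equation (6) translates directly into a few lines of Python. The function name and the error rates in the usage line are illustrative assumptions; only $n = 152$ is taken from the DNA Fountain configuration discussed in Section IV.

```python
from math import factorial


def prob_error_pattern(n, s, t, p_s, p_i, p_d):
    """P(E_st) from (6): s substitution errors and t i-d errors in n bases."""
    if s < 0 or t < 0 or s + 2 * t > n:
        return 0.0
    p = p_s + p_i + p_d
    clean = n - s - 2 * t  # bases not touched by any error
    # Multinomial coefficient n! / ((n-s-2t)! s! t! t!); the division is exact.
    coeff = factorial(n) // (factorial(clean) * factorial(s) * factorial(t) ** 2)
    return coeff * p_s ** s * p_i ** t * p_d ** t * (1 - p) ** clean


# Illustrative rates with p_d = 5 * p_i; E_00 is the error-free event.
p00 = prob_error_pattern(152, 0, 0, p_s=1e-3, p_i=5e-4, p_d=2.5e-3)
```

For $s = t = 0$ the expression reduces to $(1-p)^n$, the probability that none of the $n$ bases is in error.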

Next, we compute $P(B \mid E_{st})$, $P(D_r \mid E_{00})$, and $P(D_r \mid B, E_{st})$ in (4). Note that $P(B \mid E_{st})$ and $P(D_r \mid B, E_{st})$ depend on the error pattern associated with $E_{st}$, and the error pattern of an i-d error is related to the input sequence. As an approximation, we assume that all sequenced bases affected by the i-d errors are incorrect. Moreover, since base errors occur rarely, e.g., one error per hundred bases [6] or less [4], the probability of having more than three base errors is negligible. Therefore, we only consider three types of error patterns: $E_{01}$ (1 i-d error), $E_{11}$ (1 substitution error and 1 i-d error), and $E_{30}$ (3 substitution errors). Hence we have $P(B \mid E_{st}) \approx 1$ for $st \in \{01, 11, 30\}$.

According to our proposed system model, $P(D_r \mid E_{00})$ and $P(D_r \mid B, E_{st})$ are the probabilities that the corresponding sequences are sampled $r$ times in the random sampling with replacement process. They can be calculated from the number of samples and the population of the sampling associated with $P(D_r \mid E_{00})$ and $P(D_r \mid B, E_{st})$, denoted by $S_{st}$ and $N_{st}$, respectively. Let $P_{r,st}$ be the unified form of $P(D_r \mid E_{00})$ and $P(D_r \mid B, E_{st})$. We thus have

$$P_{r,st} = \frac{(S_{st}-1)!}{(S_{st}-r)!\,(r-1)!} \left(\frac{1}{N_{st}}\right)^{r-1} \left(1 - \frac{1}{N_{st}}\right)^{S_{st}-r}. \tag{7}$$

In (7), the number of samples is given by $S_{st} = \lfloor S \cdot P(E_{st}) \rfloor$. To determine $N_{st}$, we first need to compute the number of error patterns for all possible cases, denoted by $n_{00}$, $n_{01}$, $n_{11}$, and $n_{30}$, respectively. For $E_{00}$, the error pattern is always 0, i.e., $n_{00} = 1$. For the other cases, considering that each substitution error or insertion error has three different error patterns while each deletion error has one, we have

$$n_{01} = 3\left[\frac{n!}{(n-2)!} - n_c \frac{4!}{2!} - (n_c-1)\, 2!\, \frac{4!}{3!}\frac{4!}{3!}\right],$$

$$n_{11} = 3^2\left[\frac{n!}{(n-3)!} - n_c\, 4! - \frac{n_c!}{(n_c-2)!}\frac{4!}{3!}\frac{4!}{2!} - (n_c-1)\, 2!\, \frac{4!}{3!}\frac{4!}{3!}\frac{6!}{5!}\right],$$

$$n_{30} = 3^3\left[\frac{n!}{(n-3)!\,3!} - n_c \frac{4!}{3!} - \frac{n_c!}{(n_c-2)!\,2!}\frac{4!}{3!}\frac{4!}{2!\,2!}\right].$$

We can then obtain the population of each sampling as $N_{st} = N \cdot n_{st}$.

At this point, we have derived all the probabilities involved in (4), and thus the soft information $L_r$ of each oligo can be obtained.
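As a rough illustration of how (4) and (7) fit together, the following Python sketch evaluates $P_{r,st}$ in the log domain and assembles $L_r$. It is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, the probabilities $P(E_{st})$ and the pattern counts $n_{st}$ are supplied by the caller as dictionaries keyed by $(s, t)$, and the approximations $P(C \mid E_{00}, D_r) = 1$, $P(B \mid E_{st}) \approx 1$, and $P(C \mid B) \approx 256^{-2}$ from the text are used.

```python
import math


def occurrence_prob(r, S_st, N_st):
    """P_{r,st} from (7), computed with log-gamma to avoid huge factorials."""
    if N_st <= 1:
        raise ValueError("N_st must exceed 1")
    if r < 1 or r > S_st:
        return 0.0
    # (S_st-1)! = Gamma(S_st), (S_st-r)! = Gamma(S_st-r+1), (r-1)! = Gamma(r)
    log_comb = math.lgamma(S_st) - math.lgamma(S_st - r + 1) - math.lgamma(r)
    log_p = (log_comb
             - (r - 1) * math.log(N_st)
             + (S_st - r) * math.log1p(-1.0 / N_st))
    return math.exp(log_p)


def llr(r, S, N, p_event, n_patterns, p_c_given_b=256.0 ** -2):
    """L_r from (4) for a sequence observed r times.

    p_event[(s, t)]    : P(E_st), must include the key (0, 0)
    n_patterns[(s, t)] : number of error patterns n_st (caller supplied)
    """
    def term(st):
        S_st = math.floor(S * p_event[st])  # samples attributed to E_st
        N_st = N * n_patterns[st]           # population for E_st
        return p_event[st] * occurrence_prob(r, S_st, N_st)

    num = term((0, 0))  # P(C | E_00, D_r) = 1
    den = p_c_given_b * sum(term(st) for st in p_event if st != (0, 0))
    if den == 0.0:
        return math.inf
    if num == 0.0:
        return -math.inf
    return math.log(num / den)
```

Working in the log domain matters here because $S_{st}$ can be in the hundreds of thousands, for which the factorials in (7) cannot be represented directly in floating point.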

IV. SOFT-DECISION DECODING FOR DNA-BASED DATA STORAGE

In this section, we apply SDD to DNA-based data storage systems and investigate its performance gain over HDD. In principle, all existing DNA storage systems with ECCs can use the proposed LLR calculation to carry out SDD for more reliable data retrieval. As an example, we apply SDD to DNA Fountain [8].

In [8], the source data is stored in $n = 152$ bases per oligo. Each oligo consists of 32 bytes of data payload, 4 bytes of seed for the PRNG of the LT code, and 2 bytes of parity-check symbols of a (38, 36) shortened RS code over GF(256). During the screening stage, sequences with a homopolymer run length greater than 3 or a GC-content outside the range [0.45, 0.55] are rejected. In the writing process, 67088 segments of binary message are encoded into 72000 base sequences, and thus multiple copies of 72000 unique oligos are synthesized. In the reading process, different numbers of reads, e.g., from 750000 to 32000000, are used in [8] to evaluate the performance of DNA Fountain. For ease of simulation, we consider the case with 750000 reads in this work.

An ILT, the traditional HDD of LT codes, is essentially a simplified Gaussian elimination. It does not have error correction capability. That is, even if the ILT is successful, the recovered messages may be in error. Recall that the LT code is a binary linear block code with a random generator matrix $G = [P \; I]$, where $P$ is randomly generated with a column weight distribution following the RSD, and $I$ is an identity matrix. We can then obtain its parity-check matrix $H = [I \; P^T]$. Since the RSD produces a large number of degree-2 nodes, $H$ is a sparse matrix. Therefore, the LT code can be decoded directly by using the belief propagation algorithm (BPA) of low-density parity-check (LDPC) codes [13], with the soft information derived in Section III-B.
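The relation between $G = [P \; I]$ and $H = [I \; P^T]$ can be verified on a toy example by checking that $G H^T = 0$ over GF(2). The sketch below uses a uniformly random $P$ purely for illustration, whereas in the LT code the column weights of $P$ follow the RSD; the helper names are hypothetical, and no belief propagation decoder is included.

```python
import random


def random_binary_matrix(rows, cols, rng):
    """Uniformly random 0/1 matrix (an illustrative stand-in for P)."""
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def generator_and_parity_check(P):
    """Build G = [P I_k] and H = [I_m P^T] from a k x m binary matrix P."""
    k, m = len(P), len(P[0])
    G = [P[i] + [1 if j == i else 0 for j in range(k)] for i in range(k)]
    H = [[1 if j == i else 0 for j in range(m)] + [P[r][i] for r in range(k)]
         for i in range(m)]
    return G, H


def gf2_product_is_zero(G, H):
    """True if every row of G is orthogonal to every row of H modulo 2."""
    return all(sum(g * h for g, h in zip(g_row, h_row)) % 2 == 0
               for g_row in G for h_row in H)


rng = random.Random(0)
P = random_binary_matrix(4, 6, rng)
G, H = generator_and_parity_check(P)
print(gf2_product_is_zero(G, H))  # True for any binary P
```

Each inner product equals $P_{ab} + P_{ab}$, which vanishes modulo 2, so the check holds regardless of how $P$ is drawn.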

In our simulations, as the raw error rate of the system is not given in [8], we follow [3] and [7] to set the substitution, insertion, and deletion error rates. In particular, based on the error analyses in [3] and [7], we obtain the ranges of the different types of raw error rates, i.e., $p_s \in [6 \times 10^{-4}, 4.5 \times 10^{-3}]$, $p_i \in [5.4 \times 10^{-4}, 1 \times 10^{-3}]$, and $p_d \in [1.5 \times 10^{-3}, 5 \times 10^{-3}]$. Moreover, it has been observed that the deletion error rate $p_d$ is approximately three to six times the insertion error rate $p_i$, and that the substitution error rate $p_s$ varies, with the total raw error rate in the range $[2 \times 10^{-3}, 1 \times 10^{-2}]$. Therefore, we set $p_d = 5 p_i$ and vary the value of $p_s$ in our simulations. In addition, based on the supplementary materials of [8], the erasure rate of the input sequences of the system is set to $5 \times 10^{-3}$.

Fig. 3. FER comparison of DNA Fountain with SDD and HDD. [Plot of FER (log scale, $10^{-4}$ to $10^{-1}$) versus $p_s$ (0 to $6 \times 10^{-3}$) for HDD and SDD, with $p_d$ ranging from $3 \times 10^{-3}$ to $4.25 \times 10^{-3}$ in steps of $0.25 \times 10^{-3}$ and $p_i = p_d / 5$.]

Moreover, unlike LDPC codes, the LT code in DNA Fountain is nonsystematic, i.e., none of the information bits are written directly into oligos. Hence the soft information obtained from the DNA storage channel is associated only with the bit positions of $P$ in $G$ and, correspondingly, $I$ in $H$. In the simulations, the LLRs of the oligos associated with $I$ in $H$ are computed for each set of raw error rates according to (4), based on their number of occurrences. The LLRs of all the other oligos are set to 0.

Fig. 3 illustrates the simulated frame error rate (FER) performance of DNA Fountain with SDD and HDD, respectively. Note that the FERs are evaluated for the frames with a sufficient number of sequenced oligos such that the ILT can be carried out successfully. It can be seen from Fig. 3 that SDD outperforms HDD with an FER reduction of two to three orders of magnitude, over a wide range of substitution, insertion, and deletion error rates. This demonstrates the potential of the proposed SDD for improving the system's tolerance to various types of errors, and for increasing the information density of DNA-based data storage systems.

V. CONCLUSION

In this paper, we have investigated, for the first time, the SDD of ECCs for improving the error performance of DNA-based data storage. In particular, we have proposed a simplified system model for the DNA-based data storage system by analyzing the system's major characteristics and different types of errors. We have derived the error-free probability of each sequenced oligo, from which we obtain the LLR that enables SDD of ECCs. To demonstrate the effectiveness of the proposed SDD, we have applied it to decode the LT code of DNA Fountain. Simulation results have shown that, for DNA Fountain, the proposed SDD can effectively improve the system's tolerance to various types of errors, and that it achieves an FER reduction of two to three orders of magnitude over HDD.

ACKNOWLEDGEMENT

This work is supported by Singapore Ministry of Education Academic Research Fund Tier 2 MOE2016-T2-2-054, SUTD-ZJU grant ZJURP1500102, and SUTD SRG grant SRLS15095.

REFERENCES

[1] M. E. Allentoft, M. Collins, D. Harker, J. Haile, C. L. Oskam, M. L. Hale, P. F. Campos, J. A. Samaniego, M. T. P. Gilbert, E. Willerslev, G. Zhang, R. P. Scofield, R. N. Holdaway, and M. Bunce, “The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils,” Proceedings of the Royal Society of London B: Biological Sciences, 2012.

[2] C. Bancroft, T. Bowler, B. Bloom, and C. T. Clelland, “Long-term storage of information in DNA,” Science, vol. 293, no. 5536, pp. 1763–1765, 2001.

[3] M. Blawat, K. Gaedke, I. Huetter, X.-M. Chen, B. Turczyk, S. Inverso, B. W. Pruitt, and G. M. Church, “Forward error correction for DNA data storage,” Procedia Computer Science, vol. 80, pp. 1011–1022, 2016.

[4] N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney, “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,” Nature, vol. 494, no. 7435, pp. 77–80, 2013.

[5] J. Bornholt, R. Lopez, D. M. Carmean, L. Ceze, G. Seelig, and K. Strauss, “A DNA-based archival storage system,” SIGPLAN Not., vol. 51, no. 4, pp. 637–649, Mar. 2016.

[6] R. N. Grass, R. Heckel, M. Puddu, D. Paunescu, and W. J. Stark, “Robust chemical preservation of digital information on DNA in silica with error-correcting codes,” Angewandte Chemie International Edition, vol. 54, no. 8, pp. 2552–2555, 2015.

[7] L. Organick, S. D. Ang, Y. J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, B. Nguyen, C. Takahashi, S. Newman, H. Y. Parker, C. Rashtchian, G. G. K. Stewart, R. Carlson, J. Mulligan, D. Carmean, G. Seelig, L. Ceze, and K. Strauss, “Scaling up DNA data storage and random access retrieval,” bioRxiv, 2017.

[8] Y. Erlich and D. Zielinski, “DNA fountain enables a robust and efficient storage architecture,” Science, vol. 355, no. 6328, pp. 950–954, 2017.

[9] S. Kosuri and G. M. Church, “Large-scale de novo DNA synthesis: technologies and applications,” Nature Methods, vol. 11, no. 5, pp. 499–507, 2014.

[10] E. M. LeProust, B. J. Peck, K. Spirin, H. B. McCuen, B. Moore, E. Namsaraev, and M. H. Caruthers, “Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process,” Nucleic Acids Research, vol. 38, no. 8, pp. 2522–2540, 2010.

[11] R. Heckel, G. Mikutis, and R. N. Grass, “A characterization of the DNA data storage channel,” arXiv:1803.03322, https://arxiv.org/abs/1803.03322, 2018.

[12] M. Luby, “LT codes,” in IEEE Symp. Found. of Comp. Science, 2002, pp. 271–280.

[13] W. E. Ryan and S. Lin, Channel Codes: Classical and Modern. Cambridge University Press, 2009.