
On the reduced-complexity of LDPC decoders

for ultra-high-speed optical transmission

Ivan B. Djordjevic,1* Lei Xu,2 and Ting Wang2

1Department of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona 85721, USA

2NEC Laboratories America, Princeton, New Jersey 08540, USA

*ivan@ece.arizona.edu

Abstract: We propose two reduced-complexity (RC) LDPC decoders,
which can be used in combination with large-girth LDPC codes to enable
ultra-high-speed serial optical transmission. We show that the optimally
attenuated RC min-sum algorithm performs only 0.46 dB (at a BER of
$10^{-9}$) worse than the conventional sum-product algorithm, while having
lower storage memory requirements and much lower latency. We further study
the use of RC LDPC decoding algorithms in multilevel coded modulation with
coherent detection and show that with RC decoding algorithms we can
achieve a net coding gain larger than 11 dB at BERs below $10^{-9}$.

©2009 Optical Society of America

OCIS codes: (060.0060) Fiber optics and optical communications; (060.1660) Coherent

communications; (060.4080) Modulation; (999.9999) Forward error correction.

References and links

1. R. Saunders, M. Traverso, T. Schmidt, and C. Malouin, “Economics of 100Gb/s transport,” in Proc. OFC/NFOEC 2010, Paper No. NMB2, San Diego, CA, March 21–25, 2010.

2. I. B. Djordjevic, W. Ryan, and B. Vasic, Coding for Optical Channels (Springer, 2010).

3. I. B. Djordjevic, H. G. Batshon, L. Xu, and T. Wang, “Coded polarization-multiplexed iterative polar modulation (PM-IPM) for beyond 400 Gb/s serial optical transmission,” in Proc. OFC/NFOEC 2010, Paper No. OMK2, San Diego, CA, March 21–25, 2010.

4. J. Chen, A. Dholakia, E. Eleftheriou, M. Fossorier, and X.-Y. Hu, “Reduced-complexity decoding of LDPC codes,” IEEE Trans. Commun. 53(8), 1288–1299 (2005).

5. M. P. C. Fossorier, M. Mihaljevic, and H. Imai, “Reduced-complexity iterative decoding of low-density parity-check codes based on belief propagation,” IEEE Trans. Commun. 47(5), 673–680 (1999).

1. Introduction

In response to high-bandwidth demands due to the rapid growth of data-centric applications and
the deployment of broadband access networks, network operators are upgrading their dense
wavelength division multiplexing (DWDM) networks from 10 Gb/s per channel to more
spectrally efficient 40 Gb/s and 100 Gb/s [1]. Serial optical transmission at 400 Gb/s and
1 Tb/s is regarded as the next step after 100 Gb/s and has already started
attracting interest from the research community in both academia and industry [2]. In order to
achieve ultra-high-speed optical transmission at 400 Gb/s and beyond with commercially
available equipment operating at 40 Giga symbols/s (40 GS/s), we have recently proposed the
use of an iterative polarization quantization (IPQ)-based modulation scheme with component
codes being large-girth low-density parity-check (LDPC) codes [3]. This scheme, however,
requires the implementation of the sum-product algorithm (SPA), commonly used in decoding of
LDPC codes, at 40 Gb/s, which is challenging even with state-of-the-art
electronic integrated-circuit technology.

In order to reduce the complexity of SPA, a number of different approximate algorithms
have been proposed [2,4,5]. Their main focus was to reduce the complexity of the check-node
(c-node) update rule. Because the c-node update rule is the key step in SPA, imperfect
approximations lead to significant BER performance degradation. In this paper we follow a
different strategy. Instead of trying to reduce the complexity of the c-node update rule, we
try to reduce the complexity of the variable-node (v-node) update rule. Two reduced-complexity
(RC) LDPC decoding algorithms are introduced: (i) RC min-sum algorithm and (ii) RC a posteriori

#135803 - $15.00 USD

(C) 2010 OSA

Received 29 Sep 2010; accepted 12 Oct 2010; published 20 Oct 2010

25 October 2010 / Vol. 18, No. 22 / OPTICS EXPRESS 23371


probability (APP) algorithm. We show that even complete elimination of the v-node update rule
leads to only 0.46 dB degradation when large-girth LDPC codes are used. We further study
the use of the RC decoding algorithm in polarization-multiplexed (PolMUX) 32-IPQ with a symbol
rate of 50 Giga symbols/s (50 GS/s) and show that net coding gains (NCGs) beyond 11
dB (at a BER of $10^{-9}$) are possible.

The paper is organized as follows. In Section 2, we provide two RC LDPC decoding

algorithms and evaluate their BER performance, decoding complexity, and memory

requirements. In Section 3, we describe polarization-multiplexed IPQ scheme suitable for

ultra-high-speed serial optical transmission and evaluate its performance when the proposed

RC LDPC decoding algorithms are used. Concluding remarks are given in Section 4.

2. Reduced-complexity LDPC decoders

Before we fully describe the proposed RC algorithms, we introduce several definitions that
will facilitate the description. A regular (n, k) LDPC code is a linear block code whose
parity-check matrix H contains exactly $W_c$ 1's per column and exactly $W_r = W_c\,n/(n-k)$ 1's
per row, where $W_c \ll (n-k)$. Decoding of LDPC codes is based on the SPA, an iterative decoding
algorithm in which extrinsic probabilities are passed back and forth between the variable and
check nodes of the bipartite (Tanner) graph representation of the parity-check matrix H. The
Tanner graph of an LDPC code is drawn according to the following rule: check node c is connected
to variable node v whenever element $h_{cv}$ in H is a 1. For v-node v (c-node c) we define its
neighborhood N(v) (N(c)) as the set of c-nodes (v-nodes) connected to it. For completeness
of presentation we briefly describe the log-domain SPA (for a detailed description the interested
reader is referred to [2]).

Gallager SPA

0. Initialization: For v = 0,1,…,n-1; initialize the messages $L_{vc}$ to be sent from v-node v
to c-node c to channel log-likelihood ratios (LLRs) $L_{ch}(v)$*, namely $L_{vc} = L_{ch}(v)$.

1. c-node update rule: For c = 0,1,…,n-k-1; compute
$$L_{cv} = \underset{v' \in N(c)\setminus\{v\}}{\boxplus} L_{v'c}.$$
The box-plus operator is defined by
$$L_1 \boxplus L_2 = \prod_{k=1}^{2} \mathrm{sign}(L_k)\,\phi\!\left(\sum_{k=1}^{2} \phi(|L_k|)\right), \qquad \phi(x) = -\log\tanh(x/2).$$
The box-plus operator for |N(c)\{v}| components is obtained by recursively applying the
2-component version defined above.

2. v-node update rule: For v = 0,1,…,n-1; set
$$L_{vc} = L_{ch}(v) + \sum_{c' \in N(v)\setminus\{c\}} L_{c'v}$$
for all c-nodes for which $h_{cv} = 1$.

3. Bit decisions: Update L(v) (v = 0,…,n-1) by
$$L(v) = L_{ch}(v) + \sum_{c \in N(v)} L_{cv}$$
and set $\hat{v} = 1$ when L(v) < 0 (otherwise, $\hat{v} = 0$). If $\hat{\mathbf{v}}H^T = \mathbf{0}$ or a
pre-determined number of iterations has been reached then stop, otherwise go to step 1.
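The c-node update rule of the SPA listing above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `phi`, `box_plus` and `check_node_update`, the dictionary representation of the incoming messages, and the small clamp that avoids log(0) are assumptions made here, not part of the paper.

```python
import math
from functools import reduce

def phi(x):
    """phi(x) = -log(tanh(x/2)) for x > 0; the clamp (an assumption) avoids log(0)."""
    x = max(x, 1e-12)
    return -math.log(math.tanh(x / 2.0))

def box_plus(l1, l2):
    """Two-component box-plus: product of signs times phi(phi(|L1|) + phi(|L2|))."""
    sign = (1.0 if l1 >= 0 else -1.0) * (1.0 if l2 >= 0 else -1.0)
    return sign * phi(phi(abs(l1)) + phi(abs(l2)))

def check_node_update(incoming):
    """Given {v: L_vc} for all v in N(c), return {v: L_cv}.

    Each outgoing message combines all incoming messages except the one
    from v itself, by recursive application of the 2-component operator.
    Assumes the check node has degree of at least 2.
    """
    return {
        v: reduce(box_plus, [l for u, l in incoming.items() if u != v])
        for v in incoming
    }

# Toy check node of degree 3 with hypothetical message values.
messages = check_node_update({0: 1.0, 1: 2.0, 2: -3.0})
```

The two-component result agrees numerically with the equivalent form 2 atanh[tanh(L1/2) tanh(L2/2)], which is a convenient sanity check when replacing the operator by an approximation.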

*The channel LLR is defined by $L_{ch}(v) = \log[P(v=0|y)/P(v=1|y)]$, where y is the channel
sample. For example, for the asymmetric AWGN channel
$L_{ch}(v) = \log(\sigma_1/\sigma_0) - (y-\mu_0)^2/(2\sigma_0^2) + (y-\mu_1)^2/(2\sigma_1^2)$,
while for the symmetric AWGN ($\sigma_1 = \sigma_0 = \sigma$) channel $L_{ch}(v) = 2y/\sigma^2$,
where $\mu_i$ ($\sigma_i$) denotes the mean (standard deviation) corresponding to symbol i (i = 0,1).

Because the c-node update rule involves log and tanh functions, it is computationally
intensive, and many approximations exist. A popular one is the
min-sum-plus-correction-term approximation [4]. Namely, it can be shown that the box-plus
operator can also be calculated by
$$L_1 \boxplus L_2 = \prod_{k=1}^{2} \mathrm{sign}(L_k)\,\min(|L_1|,|L_2|) + c(x,y),$$
where c(x,y) denotes the correction factor defined by
c(x,y) = log[1 + exp(-|x + y|)] - log[1 + exp(-|x - y|)] (evaluated at x = $L_1$, y = $L_2$), commonly
implemented as a look-up table (LUT). In our approach, we concentrate on the variable-node
update rule in order to reduce the complexity of SPA. It is clear from the algorithm above that the
computational complexity of the c-node update rule is much higher than that of the v-node update
rule, while the memory storage requirements are comparable. Because the v-node update rule is
less critical than the c-node update rule, we propose to eliminate it completely. Our
reduced-complexity SPA, called here the RC-min-sum algorithm, is therefore
composed of steps 1 and 3 only. To compensate for the BER performance loss due to elimination of
the v-node update rule, we propose to reformulate the c-node update rule by introducing the
attenuation factor α in the box-plus operator as follows:
$$L_1 \boxplus L_2 = \alpha \prod_{k=1}^{2} \mathrm{sign}(L_k)\,\min(|L_1|,|L_2|).$$
In addition to reducing memory storage requirements, the proposed RC algorithm improves the
latency, which is of crucial importance for ultra-high-speed implementation. The RC min-sum
algorithm can be formulated as follows.

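Reduced-complexity min-sum algorithm

0. Initialization: For v = 0,1,…,n-1; initialize the v-node reliabilities $L_v$ to channel LLRs,
$L_v = L_{ch}(v)$.

1. c-node update rule: For c = 0,1,…,n-k-1; compute
$$L_{cv} = \underset{v' \in N(c)\setminus\{v\}}{\boxplus} L_{v'}.$$
The box-plus operator is defined by
$$L_1 \boxplus L_2 = \alpha \prod_{k=1}^{2} \mathrm{sign}(L_k)\,\min(|L_1|,|L_2|).$$

2. Bit decisions: Update L(v) (v = 0,…,n-1) by
$$L(v) = L_{ch}(v) + \sum_{c \in N(v)} L_{cv}$$
and set $\hat{v} = 1$ when L(v) < 0 (otherwise, $\hat{v} = 0$). If $\hat{\mathbf{v}}H^T = \mathbf{0}$ or a
pre-determined number of iterations has been reached then stop, otherwise go to step 1.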
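The listing above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a flooding schedule, applies the attenuation factor α once per c-node message (rather than at every pairwise min), and overwrites the reliabilities $L_v$ with the updated L(v) after each iteration. The function name `rc_min_sum_decode`, the toy (7,4) Hamming parity-check matrix and the LLR values are hypothetical and serve only to exercise the code; the Hamming code is not a large-girth code.

```python
def rc_min_sum_decode(H, llr_ch, alpha=0.44, max_iter=20):
    """RC min-sum sketch: c-node update and bit decisions only (no v-node update)."""
    # N(c): indices of the v-nodes taking part in each check (degree >= 2 assumed)
    nbrs = [[v for v, h in enumerate(row) if h] for row in H]
    L = list(llr_ch)                      # step 0: L_v = L_ch(v)
    bits = [1 if x < 0 else 0 for x in L]
    for _ in range(max_iter):
        total = list(llr_ch)              # step 2 accumulator: L_ch(v) + sum of L_cv
        for vs in nbrs:                   # step 1: attenuated min-sum c-node update
            for v in vs:
                others = [L[u] for u in vs if u != v]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                total[v] += alpha * sign * min(abs(x) for x in others)
        L = total
        bits = [1 if x < 0 else 0 for x in L]
        # stop as soon as every parity check is satisfied
        if all(sum(bits[v] for v in vs) % 2 == 0 for vs in nbrs):
            break
    return bits

# Toy example: all-zero codeword with one weakly wrong LLR (hypothetical values).
H_toy = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
llr = [2.0, 2.0, 2.0, -0.5, 2.0, 2.0, 2.0]
decoded = rc_min_sum_decode(H_toy, llr)
```

Because the c-nodes read the total reliabilities $L_v$ directly, no per-edge v-to-c messages have to be computed or stored, which is where the memory and latency savings discussed in the text come from.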

The second RC algorithm to be described here is based on the RC decoding algorithm
proposed by Fossorier et al. [5]. The original algorithm was applicable to AWGN channels
only. Our proposed algorithm, called here the reduced-complexity a posteriori
probability (RC-APP) algorithm, does not depend on the channel assumption and can be formulated
as follows.

Reduced complexity a posteriori probability algorithm

0. Initialization: For v = 0,1,…,n-1; determine hard decisions $z_v = \{1-\mathrm{sign}[L_{ch}(v)]\}/2$ and
magnitudes $m_v = |L_{ch}(v)|$ from channel LLRs $L_{ch}(v)$.

1. c-node update rule: For c = 0,1,…,n-k-1; compute the magnitudes $m_{cv}$ to be sent from
c-node c to v-node v by
$$m_{cv} = \alpha \min_{v' \in N(c)\setminus\{v\}} m_{v'},$$
where α is the attenuation factor. Compute also
$$z_{cv} = \sum_{v' \in N(c)\setminus\{v\}} z_{v'} \bmod 2.$$

2. Bit decisions: Update the v-node magnitudes $m_v$ (v = 0,…,n-1) by
$$m_v = |L_{ch}(v)| + \sum_{c \in N(v)} (1 - 2z_{cv})\,m_{cv}$$
and set $z_v = z_v \oplus 1$ for $m_v < 0$. If $\mathbf{z}H^T = \mathbf{0}$ or a pre-determined number of
iterations has been reached then stop; otherwise go to step 1.

Note that this algorithm requires only the use of summations, comparators and mod 2
adders. The attenuations can be implemented by appropriate register shifts (e.g., 0.5
corresponds to one register shift to the right). The decoding complexity can be estimated in
terms of the number of required operations per single iteration. For example, the c-node update
rule in the RC-min-sum algorithm is dominated by the number of comparators, which is ($d_c$-2), where
$d_c$ is the c-node degree. The conventional min-sum algorithm requires $d_v$ additions (where $d_v$
is the v-node degree) in the v-node update rule, while the bit-decision step requires ($d_v$ + 1)
additions. Therefore, the RC-min-sum algorithm requires $n d_v$ fewer additions than the
conventional min-sum algorithm. The complexity of the Gallager SPA is even higher since the
c-node update rule requires 15($d_c$-2) additions per check node. To study the memory allocation
requirements, we assume that a partially parallel implementation is used, with bit-processing
elements (BPEs) and check-processing elements (CPEs) being assigned as shown in Fig. 1. In
Table 1, we summarize the memory allocation for the quasi-cyclic LDPC(16935, 13550) code,
where we use the following notation: MEM B and MEM C denote the memories used to store
bit node and check node edge values; MEM E stores the codeword estimate; MEM I stores the
initial LLRs, and MEM F stores the final LLRs (required in the bit decision step). When the RC-min-sum
algorithm is used, the MEM B block is not needed at all. Finally, since the v-node update rule
does not exist in the RC-min-sum algorithm, the decoding latency will be lower. Although the exact
latency improvement depends on the implementation platform, our study indicates that the
proposed RC-min-sum algorithm has lower complexity compared to the conventional min-sum
algorithm and SPA.

Table 1: Memory allocation requirements of the LDPC(16935, 13550) code of column-weight
3 (for p = 1129 and |S| = 15)

Name/MEM                     MEM B    MEM C    MEM E    MEM I    MEM F
Data word (bits)             8        11       1        8        8
Address word (bits)          16       16       15       15       15
Memory block size (words)    50805    50805    16935    16935    16935
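The operation counts and memory figures quoted above can be combined into a short worked example. The sketch below assumes that the storage of each block is its size in words times its data-word width, and that the c-node degree equals |S| = 15; both are readings made here rather than statements from the paper, and all variable names are hypothetical.

```python
# Code parameters from the text; d_c = |S| = 15 is an assumption (row weight).
n, d_v, d_c = 16935, 3, 15

# Table 1: block size in words and data-word width in bits.
words = {"MEM B": 50805, "MEM C": 50805, "MEM E": 16935, "MEM I": 16935, "MEM F": 16935}
bits = {"MEM B": 8, "MEM C": 11, "MEM E": 1, "MEM I": 8, "MEM F": 8}

def total_bits(blocks):
    """Total storage in bits, assuming size = words x data-word width."""
    return sum(words[b] * bits[b] for b in blocks)

full_bits = total_bits(words)                              # all five blocks
rc_bits = total_bits(b for b in words if b != "MEM B")     # RC-min-sum: no MEM B

saved_additions = n * d_v                  # additions saved per iteration
comparators_per_check = d_c - 2            # RC-min-sum c-node update
spa_additions_per_check = 15 * (d_c - 2)   # Gallager SPA c-node update
```

Under these assumptions the number of saved additions, $n d_v$ = 50805, coincides with the MEM B block size, since both count the edges of the Tanner graph.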

[Figure 1: block structure of the quasi-cyclic parity-check matrix, with block-rows 0, 1, …, r-1 assigned to CPEs and block-columns 0, 1, …, c-1 assigned to BPEs:]
$$H = \begin{bmatrix}
I & I & I & \cdots & I \\
I & P^{S[1]} & P^{S[2]} & \cdots & P^{S[c-1]} \\
I & P^{2S[1]} & P^{2S[2]} & \cdots & P^{2S[c-1]} \\
\vdots & \vdots & \vdots & & \vdots \\
I & P^{(r-1)S[1]} & P^{(r-1)S[2]} & \cdots & P^{(r-1)S[c-1]}
\end{bmatrix}$$

Fig. 1. Assignment of v-nodes and c-nodes to BPEs and CPEs, respectively. I denotes the
identity matrix of size p×p (p is a prime), P is the permutation matrix given by $P = (p_{ij})_{p \times p}$,
$p_{i,i+1} = p_{p,1} = 1$ (zero otherwise), and r and c represent the number of block-rows and
block-columns in the parity-check matrix. The set of integers S is carefully chosen from the set
{0,1,…,p-1} so that cycles of short length (4 and 6) are avoided.
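The block structure described in the caption can be sketched as below. This is an illustration only: the function name `qc_parity_check`, the convention S[0] = 0 (so that the first block-column consists of identity matrices), and the toy values of p, S and r are assumptions; the set S used for the LDPC(16935, 13550) code is not given in the text, and the sketch does not check the girth.

```python
def qc_parity_check(p, S, r):
    """Build H from r x len(S) blocks, block (i, j) being P^(i*S[j]).

    P is the p x p cyclic permutation matrix with ones at (a, a+1 mod p),
    so P^s has its single 1 of row a at column (a + s) mod p.
    """
    c = len(S)
    H = [[0] * (c * p) for _ in range(r * p)]
    for i in range(r):            # block-row index
        for j in range(c):        # block-column index
            shift = (i * S[j]) % p
            for a in range(p):
                H[i * p + a][j * p + (a + shift) % p] = 1
    return H

# Toy sizes with a hypothetical shift set S.
H_qc = qc_parity_check(p=7, S=[0, 1, 3], r=2)
row_weights = {sum(row) for row in H_qc}
col_weights = {sum(col) for col in zip(*H_qc)}
```

Every block is a permutation matrix, so each row has weight c and each column has weight r regardless of the choice of S; S only affects the cycle structure.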

[Figure 2: bit-error rate, BER ($10^{-1}$ down to $10^{-9}$), versus Q-factor, Q [dB], for the Min-sum-CT, SPA, 0.9-SPA, Min-sum, RC-min-sum, RC-min-sum-CT, 0.44-RC-min-sum-CT, RC-APP, 0.4-RC-APP and 0.8-min-sum decoders.]

Fig. 2. BER performance of RC LDPC decoding algorithms in comparison with SPA.
Min-sum-CT: min-sum-plus-correction-term algorithm. (The constant in front of the algorithm
name refers to the optimum attenuation factor. The attenuation factor in SPA is introduced in
the box-plus operator only.)

We now turn our attention to the BER performance evaluation of the proposed reduced-complexity
algorithms. In Fig. 2, we show the results of Monte Carlo simulations for different
LDPC decoding algorithms and optimum attenuation factors (obtained numerically). The
large-girth LDPC(16935, 13550) code is used in simulations. It is interesting to notice that the
min-sum-plus-correction-term algorithm faces negligible performance loss, which is in
agreement with [2,4]. The min-sum algorithm with optimum attenuation factor α = 0.8 shows
similar performance to SPA, while RC-min-sum with optimum attenuation factor of 0.44
faces only 0.46 dB degradation at a BER of $10^{-9}$. The storage complexity and latency of the
optimally attenuated RC-min-sum algorithm are much lower, which makes the proposed
algorithm an excellent candidate for high-speed implementation. The optimally attenuated
RC-APP algorithm performs 0.2 dB worse than the optimally attenuated RC-min-sum algorithm.
It is interesting to notice that it is possible to improve the performance of SPA by 0.1 dB at
BER = $10^{-9}$ for an attenuation factor of 0.9. In Fig. 3 we study the influence of quantization on the
BER performance of various decoders. When 4 bits (3 bits for magnitude and one bit for sign)
are used, there is only an extra 0.05 dB penalty for both the 0.8-min-sum and 0.4-RC-APP
algorithms (at a BER of $10^{-7}$). When 3 bits are used (2 for magnitude and 1 for sign), the
corresponding degradations are 0.26 dB and 0.38 dB, respectively. Finally, when only 1 bit is
used for magnitude and 1 for sign, the 0.8-min-sum algorithm exhibits about 0.89 dB degradation,
while the degradation for 0.4-RC-APP is much worse. Given this description of the proposed RC
LDPC decoding algorithms, in the next section we study their application in PolMUX multilevel
coded modulation with coherent detection.
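To make the "magnitude bits plus sign bit" representation concrete, the sketch below shows one possible sign-magnitude quantizer. The text does not specify the quantizer used in the simulations, so the uniform step, the rounding rule, the saturation at the largest magnitude code and the name `quantize_llr` are all assumptions made for illustration.

```python
def quantize_llr(llr, mag_bits, step=1.0):
    """Uniform sign-magnitude quantization: 1 sign bit plus mag_bits magnitude bits.

    The magnitude is rounded to the nearest multiple of `step` and saturated
    at (2**mag_bits - 1) * step. Both choices are assumptions of this sketch.
    """
    max_level = (1 << mag_bits) - 1
    level = min(int(abs(llr) / step + 0.5), max_level)
    return -level * step if llr < 0 else level * step

# Hypothetical LLR values quantized with 3, 2 and 1 magnitude bits.
samples = [2.6, -9.2, 0.3, 5.0]
q3 = [quantize_llr(x, 3) for x in samples]
q2 = [quantize_llr(x, 2) for x in samples]
q1 = [quantize_llr(x, 1) for x in samples]
```

With a single magnitude bit every message collapses to one of three values (zero or plus/minus one step), which gives some intuition for why the reported degradation grows quickly at that resolution.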

[Figure 3: bit-error rate, BER ($10^{-2}$ down to $10^{-7}$), versus Q-factor, Q [dB] (per info bit), for the Min-sum-CT, 0.8-min-sum and 0.4-RC-APP decoders; legend entries: double, 3 bits+sign, 2 bits+sign, 1 bit+sign.]

Fig. 3. Quantization effect influence on BER performance.

3. PolMUX IPQ coded-modulation based on large-girth LDPC codes and RC decoders

The PolMUX IPQ-based coded modulation scheme suitable for beyond 400 Gb/s per
wavelength optical transmission is shown in Fig. 4. The $m_x + m_y$ (index x (y) corresponds to
x- (y-) polarization) independent data streams are encoded using different LDPC codes of code
rates $R_i = K_i/N$ ($i \in \{x,y\}$), where $K_x$ ($K_y$) denotes the number of information bits used in the
binary LDPC code corresponding to x- (y-) polarization, and N denotes the codeword length.
The $m_x$ ($m_y$) input bit streams from $m_x$ ($m_y$) different information sources pass through
identical encoders that use LDPC codes with code rate $R_x$ ($R_y$) designed using a quasi-cyclic
code design [2]. The outputs of the encoders are then bit-interleaved by an $m_x \times N$ ($m_y \times N$)
block-interleaver, where the sequences are written row-wise and read column-wise.
The mapper maps each $m_x$ ($m_y$) bits, taken from the interleaver, into a $2^{m_x}$-ary ($2^{m_y}$-ary)
IPQ signal constellation point based on the LUT, which is for M = 32 given in Table 2
($m_i$ denotes the ith circle radius, $L_i$ denotes the number of constellation points per ith circle, $r_i$
denotes the decision regions in amplitude coordinate). The corresponding constellation
diagram is shown in Fig. 5, as we described in [3]. The IPQ mapper x (y) assigns $m_x$ ($m_y$) bits
to a constellation point represented in polar coordinates as $s_{i,x} = |s_{i,x}|\exp(j\phi_{i,x})$
[$s_{i,y} = |s_{i,y}|\exp(j\phi_{i,y})$]. One output of the IPQ mapper is used as input of the amplitude modulator (AM), while
the second output is used as input to the phase modulator (PM), as shown in Fig. 4(a). Notice that
we use ($m_x + m_y$) encoders/decoders operating in parallel at a data rate of $R_b$ instead of only one
encoder/decoder operating at a data rate of ($m_x + m_y$)$R_b$. At the receiver side (see Fig. 4(b)), the
outputs at the I- and Q-branches in two polarizations are sampled at the symbol rate, while the