Transceiver Design for MIMO Wireless Systems Incorporating Hybrid ARQ

Jungwon Lee and Hui-Ling Lou, Marvell Semiconductor
Dimitris Toumpakaris, University of Patras
Edward W. Jang and John M. Cioffi, Stanford University

ADVANCES IN SIGNAL PROCESSING FOR COMMUNICATIONS

IEEE Communications Magazine • January 2009
0163-6804/09/$25.00 © 2009 IEEE

ABSTRACT

Hybrid ARQ, an extension of ARQ that incorporates forward error correction coding, is a retransmission scheme employed in current communications systems. The use of HARQ can contribute to efficient utilization of the available resources and the provision of reliable services in latest-generation systems. This article focuses on wireless systems using HARQ with emphasis on the multiple-input multiple-output paradigm. MIMO-HARQ offers new opportunities because of the additional degrees of freedom introduced by the multiple antennas at the transmitter and receiver. The architecture of MIMO transceivers that are based on bit-interleaved coded modulation and employ HARQ is described. Additionally, receiver implementations are presented and compared in terms of complexity, memory requirements, and performance.

INTRODUCTION

Automatic repeat request (ARQ) protocols are used to improve the reliability of communications networks. In systems employing ARQ, the receiver asks for retransmission of packets that are corrupted. Because only error detection is required to determine whether a packet should be accepted, the coding overhead is small, and the system throughput is not considerably affected, especially when the channel quality is good. However, when the channel deteriorates, the retransmissions may result in significant throughput loss. A possible remedy is to use an error correcting code separate from ARQ in order to provide a more reliable channel, but this also reduces the throughput of the system. Instead of following this layered approach, hybrid ARQ (HARQ) systems attempt to reap the benefits of both ARQ and forward error correction (FEC) by combining the two schemes [1]. The HARQ receiver handles error detection and correction as well as retransmission requests simultaneously. A retransmission is requested only when the receiver detects an uncorrectable error. Moreover, the packets are kept at the receiver to be used again for decoding after each retransmission. By combining error correction and retransmission, and appropriately choosing an FEC scheme whose aim is to correct the most frequent errors, HARQ can achieve better throughput performance than ARQ for a given channel. Because HARQ can contribute to more efficient use of the available resources, it has been included in latest-generation wireless systems such as IEEE 802.16e [2] and 3GPP-LTE [3].

The simplest HARQ scheme, called Chase (or code) combining HARQ (CC-HARQ) or type I HARQ, consists of retransmitting the same symbol sequence repeatedly until the receiver decodes the packet successfully. More sophisticated incremental redundancy (IR-) HARQ schemes (Type II/III) transmit different symbol sequences in general. The difference emanates from employing different coding schemes for the same data, using different coding polynomials, or modulating different subsets of the encoder output. The focus of this article is the use of HARQ in latest-generation wireless systems that employ bit-interleaved coded modulation (BICM). The architecture of BICM-based wireless systems employing HARQ is described by considering a special case of IR-HARQ as an example, where a different subset of bits of a mother code is sent during each transmission. Because IR-HARQ can benefit from a coding gain, it generally performs better than CC-HARQ for optimal receiver implementations. However, suboptimal practical implementations may affect the performance of IR-HARQ, especially in multiple-input multiple-output (MIMO) systems. On the other hand, when CC-HARQ is used, data are only coded once at the transmitter. Moreover, it is possible to reduce complexity and memory by combining received symbols, as explained in more detail in the following. Hence, for systems that cannot afford large complexity and storage requirements, it may be preferable to use CC-HARQ instead of IR-HARQ.

Arguably, the most important recent physical layer enhancement of wireless systems is the use of MIMO transmission. Multiple antennas provide additional degrees of freedom, leading to

significant capacity increase. Multiple antennas

can also be used to provide beamforming gains

and reduce the outage probability. Therefore, it

is of particular interest to examine how HARQ

can be incorporated into MIMO transceivers

and its impact on the system performance, com-

plexity, and storage requirements. The perfor-

mance of MIMO-HARQ depends not only on

noise and temporal channel variations that affect

SISO-HARQ as well, but also on the interfer-

ence between the signals transmitted by the mul-

tiple antennas. As described later in this article,

in some cases the design involves a trade-off

between system performance and receiver com-

plexity and memory requirements. Simplifying

the receiver of a MIMO-HARQ system to

reduce storage and complexity may increase sen-

sitivity to interstream interference.

Not only can HARQ be viewed as a retrans-

mission technique that exploits time diversity; it

can also be used in the context of systems that

employ macrodiversity. If a mobile station com-

municates with two or more base stations that

can exchange information, the system can com-

bine the signals of the base stations before

decoding using the same techniques as for

HARQ. In that case, the HARQ receiver stor-

age requirements translate to requirements on

the necessary bandwidth for the communication

between the base stations. Therefore, results

derived for HARQ can be applied to such sys-

tems, which may become increasingly common

in the future.

This article is organized as follows. In the

next section the architecture of a single-input

single-output (SISO) transceiver using BICM

and HARQ is presented. The MIMO case is

then considered with different receiver imple-

mentations. The following section contains some

discussion of MIMO system design based on the

employed HARQ scheme, receiver complexity,

and storage requirements. Finally, some con-

cluding remarks are provided.

SISO-HARQ SYSTEMS EMPLOYING BICM

Figure 1 depicts the transmitter of a MIMO sys-

tem employing BICM and HARQ. In a typical

SISO system the architecture of Fig. 1a can be

employed, with the difference that the last block,

which maps symbols to different antennas, is not

necessary because the system only uses one

antenna. A bit sequence d = [d[0], d[1], …, d[L

– 1]] of length L is encoded using a rate r moth-

er code to produce the encoded bit sequence c

= [c[0], c[1], …, c[L/r – 1]]. For example, in

IEEE 802.16e-compliant systems, a rate-1/3 con-

volutional turbo code (CTC) can be employed

[2]. For each block of L bits, 3L bits are pro-

duced. The first L (systematic) bits are the origi-

nal input bits. The encoder block also contains

the interleaving operations, if any. A subset of

the mother code bits is selected for transmission.

When CC-HARQ is used, the bit selection mod-

ule always outputs the same sequence. For IR-

HARQ, the indices of the selected bits depend

on the transmission index. An example of IR-

HARQ bit selection for IEEE 802.16 systems is

given in Fig. 2. Although the CTC case is exam-

ined in the figure, the bit selection is similar

when convolutional coding, block turbo coding,

or LDPC codes are used. In Fig. 2a, 2L bits are sent during each transmission. During the first transmission, b(0) = [c[0], c[1], …, c[2L – 1]] = [d[0], d[1], …, d[L – 1], c[L], c[L + 1], …, c[2L – 1]]. During the second transmission, b(1) = [c[2L], c[2L + 1], …, c[3L – 1], c[0], c[1], …, c[L – 1]] = [c[2L], c[2L + 1], …, c[3L – 1], d[0], d[1], …, d[L – 1]], where the fact that the first L bits of the CTC are the systematic bits is used.

Figure 1. Transmitter architectures for MIMO systems employing BICM and HARQ. For SISO systems, the blocks mapping symbols to antennas are not used. (a) MIMO-HARQ transmitter: d → Encoder → c → Bit selection → b(i) → Bits-to-symbols mapping → s(i) → Symbols-to-antennas mapping → x(i). (b) MIMO-HARQ transmitter employing symbol vector selection: d → Encoder → c → Bits-to-symbols mapping → s′ → Symbols-to-antennas mapping → x′ → Symbol vector selection → x(i).

Figure 2. Examples of bit selection for IR-HARQ transmission. The mother codeword c consists of L data bits, L parity bits 1, and L parity bits 2. (a) Rate-1/2 output: b(0) carries the data bits (L) and parity bits 1 (L); b(1) carries parity bits 2 (L) and the data bits (L). (b) Rate-5/6 output: b(0) carries the data bits (L) and parity bits 1 (L/5); b(1) carries parity bits 1 (4L/5) and parity bits 2 (2L/5); b(2) carries parity bits 2 (3L/5) and data bits (3L/5).

Therefore, the systematic bits are included in

both transmissions, whereas different parity bits

are sent during the odd and even transmissions.

In Fig. 2b, 6L/5 bits are sent during each trans-

mission. The second transmission only contains

parity bits, whereas during the third transmis-

sion, only the first 3L/5 systematic bits are sent.

The second rate-5/6 scheme is more susceptible

to errors, but requires fewer resources, because

fewer bits need to be sent during each transmission. The selected bit sequence b(i) is then sent to a bits-to-symbols mapper. Typical modulation schemes are binary phase shift keying (BPSK), quaternary PSK (QPSK), 16-quadrature amplitude modulation (QAM), and 64-QAM. The symbol sequence s(i) = [s(i)[0], s(i)[1], …, s(i)[M – 1]] is then sent to the channel using single- or multicarrier schemes. Both IEEE 802.16e and 3GPP-LTE rely on multicarrier transmission. The length M of the symbol sequence depends on the modulation scheme and is equal to the length of b(i) divided by the number of bits mapped to each symbol s(i)[m].
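The bit selection of Fig. 2 can be sketched as sequential reading of the mother codeword with wraparound. This is a minimal illustration, assuming a mother codeword ordered as data bits, parity bits 1, parity bits 2; the function name `select_bits` and the toy length L = 5 are hypothetical, and the sketch does not claim to reproduce the exact subpacket generation rules of IEEE 802.16e.

```python
def select_bits(c, tx_index, bits_per_tx):
    """Return the mother-code bits sent in transmission tx_index.

    Bits are read sequentially from c and wrap around to the start,
    which is an assumed reading of the pattern shown in Fig. 2.
    """
    n = len(c)
    start = (tx_index * bits_per_tx) % n
    return [c[(start + k) % n] for k in range(bits_per_tx)]


L = 5                               # toy data length (hypothetical)
c = list(range(3 * L))              # indices stand in for the coded bits c[0..3L-1]

# Rate-1/2 output (Fig. 2a): 2L bits per transmission.
b0 = select_bits(c, 0, 2 * L)       # c[0..2L-1]: data bits + parity bits 1
b1 = select_bits(c, 1, 2 * L)       # c[2L..3L-1], c[0..L-1]: parity bits 2 + data bits

# Rate-5/6 output (Fig. 2b): 6L/5 bits per transmission.
r56 = [select_bits(c, i, 6 * L // 5) for i in range(3)]

# Symbol sequence length M = len(b) / bits per symbol, here for QPSK (2 bits).
M = len(b0) // 2
```

For CC-HARQ, the same sketch applies with `tx_index` held at 0, so that every transmission carries the same bits.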

As mentioned previously, in general, the bit

sequence d may be re-encoded at each transmission i. For example, in [4] the re-encoded bit sequence results in a sequence c(i). By appropriate design of the coded sequence c(i) and the bit selection process, the coding gain of IR-HARQ

can be improved. Code design that also exploits

the MIMO channel to improve the performance

of HARQ is an active area of research [5].

Although IR-HARQ code design is a very inter-

esting topic per se, this article attempts to

address the implementation of a transceiver for

HARQ and the design trade-offs from a generic

point of view. Clearly, the exact performance

and complexity trade-offs will depend on the

details of the HARQ scheme employed. The

specific IEEE 802.16e IR-HARQ scheme

assumed in the remainder of the article merely

serves as an example and to facilitate the discus-

sion of the transceiver architectures.

Figure 3. Receiver architectures for MIMO systems employing BICM and HARQ. (a) MIMO-HARQ receiver. (b) Symbol-level combining MIMO-HARQ receiver. (c) Bit-level combining MIMO-HARQ receiver. (d) Pre-equalization symbol-level combining MIMO-HARQ receiver. (e) Post-equalization symbol combining MIMO-HARQ receiver. (f) Bit-level combining MIMO-HARQ receiver employing equalization. In the symbol-level combining receivers, symbol combining forms $\tilde{\underline{y}}[m] = \sum_i H^{(i)*}[m]\,\underline{y}^{(i)}[m]$ and channel combining forms $\tilde{H}[m] = \sum_i H^{(i)*}[m]\,H^{(i)}[m]$.

Figure 3a presents a receiver for the HARQ system of Fig. 1a. Flat fading is considered, and the effect of the channel can be modeled as multiplication with a complex number. Therefore, the only difference from the MIMO case shown in the figure is that for SISO-HARQ, the complex matrix H(i)[m], where m is the symbol index, comprises only one complex element, h(i)[m].

For OFDM systems, h(i)[m] equals the frequency

response of the subcarrier through which symbol

m is transmitted. A maximum likelihood (ML) detector uses all channel outputs y(i) = [y(i)[0], y(i)[1], …, y(i)[M – 1]] corresponding to different transmissions i, together with the channel estimates h(i) = [h(i)[0], h(i)[1], …, h(i)[M – 1]], to calculate the log-likelihood ratio (LLR) for each bit of the mother code c. Therefore, the detector

also incorporates knowledge of the employed

code, the bit selection pattern, and the bit-to-

symbol mapping. For example, in Fig. 2a, after

the first transmission, the LLRs of the first 2L

bits of the mother code are calculated, whereas

the LLRs for the remaining L bits are set to

zero. The LLRs are then sent to the decoder that produces an estimate $\hat{d}$ for the original bit sequence d. If the receiver determines that $\hat{d}$ is corrupted, a second transmission is requested. The decision may be based on parity bits included in d, such as a cyclic redundancy check (CRC) code, or metrics obtained while decoding. After the second transmission, both y(0) and y(1) are used, together with the channel estimates h(0) and h(1), to yield LLRs for all the bits of the mother code. The LLRs for the systematic bits are more reliable after the second transmission, because new information has been received. Decoding proceeds with LLRs for all bits of the mother code. If the decoded sequence $\hat{d}$ is still found to be corrupted, a third transmission is

requested, and so on. The standard retransmis-

sion strategies of ARQ, such as stop-and-wait,

go-back-N, and selective-repeat, can also be used

for HARQ.
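The LLR bookkeeping described above for Fig. 2a can be illustrated with a small sketch. BPSK is assumed here only because its LLR has a simple closed form and because, for BPSK, adding the LLRs of independent transmissions gives the same result as joint detection; the helper names `bpsk_llrs` and `update_llr_buffer`, the toy length L = 2, and the channel values are hypothetical.

```python
def bpsk_llrs(y, h, noise_var):
    """Per-bit LLRs for BPSK (bit 0 -> +1, bit 1 -> -1) under flat fading.

    With y = h*s + n and complex Gaussian noise of variance noise_var,
    ln p(y | s=+1) / p(y | s=-1) = 4 * Re(conj(h) * y) / noise_var.
    """
    return [4.0 * (hm.conjugate() * ym).real / noise_var for ym, hm in zip(y, h)]


def update_llr_buffer(llr_buffer, tx_llrs, bit_indices):
    """Add the LLRs of one transmission at their mother-code positions."""
    for llr, idx in zip(tx_llrs, bit_indices):
        llr_buffer[idx] += llr
    return llr_buffer


L = 2                                    # toy data length (hypothetical)
llr_buffer = [0.0] * (3 * L)             # rate-1/3 mother code; unsent bits keep LLR 0

# First transmission carries c[0..2L-1]; noiseless all-zero codeword, h = 1.
idx0 = [0, 1, 2, 3]
update_llr_buffer(llr_buffer, bpsk_llrs([1 + 0j] * 4, [1 + 0j] * 4, 1.0), idx0)
after_first = list(llr_buffer)

# Second transmission carries c[2L..3L-1] followed by c[0..L-1]; weaker channel h = 0.5.
idx1 = [4, 5, 0, 1]
update_llr_buffer(llr_buffer, bpsk_llrs([0.5 + 0j] * 4, [0.5 + 0j] * 4, 1.0), idx1)
```

After the first transmission the last L entries are still zero; after the second, every mother-code bit has a nonzero LLR and the systematic bits have accumulated information from both transmissions.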

As the number of transmissions grows, the

ML detector and LLR calculator block becomes

increasingly complex. The block has to be

designed for the maximum allowed number of

transmissions, N. The required memory is also

an issue, because all received symbols and chan-

nel estimates need to be stored until decoding

succeeds or the maximum number of transmis-

sions is reached. As explained below, when CC-

HARQ is used, the design of the receiver can be

simplified and the required memory reduced

without affecting performance. Receiver simplifi-

cation can also be achieved for IR-HARQ by

controlling the bit selection scheme at the trans-

mitter.

When CC-HARQ is employed, the same bit

sequence b is produced during all transmissions.

Therefore, the symbol sequence s(i) sent to the channel does not depend on i. It can be shown that instead of feeding directly all received sequences y(i) and channel estimates h(i) to the ML detector and LLR calculator block, they can be used to derive an equivalent sequence $\tilde{y}$ and an equivalent channel estimate sequence $\tilde{h}$ by performing maximal-ratio combining (MRC) [6]. Only $\tilde{y}$ and $\tilde{h}$ need to be used by the ML detector and LLR calculator to produce the LLRs for the mother code bits. This symbol-level combining scheme is shown in Fig. 3b. For the SISO case, the matrix sequences H(i) should be replaced by sequences h(i) of scalars, whereas sequences of received vectors $\underline{y}^{(i)}$ are replaced by sequences of scalars y(i). The symbol-level combining scheme is equivalent to the receiver of Fig. 3a and is therefore optimal. The storage requirements at the receiver are reduced by a factor equal to the limit on total transmissions, N. Combining y(i) and h(i) consists of multiplying with complex scalar values (h*(i)[m]) [6]. The

approach of Fig. 3b can also be used in IR-

HARQ systems as long as the bit selector is

designed so that the alignment between bits and

symbols does not change. For example, for 16-

QAM, if each symbol is formed using bits [c[4m],

c[4m + 1], c[4m + 2], c[4m + 3]], a new symbol

containing [c[4m], c[4m + 1], c[4m + 2], c[4m +

3]] will be generated after a certain number of

retransmissions. Then the received symbols con-

taining the same bits in the same order can be

combined before detection. For example, in one

of the modulation and coding schemes used in

IEEE 802.16e systems where the data packet

length is 54 bytes, the CTC rate is 1/3, and 64-QAM is used, 54 × 8 × 3 = 1296 bits correspond to 1296/6 = 216 64-QAM symbols, since each 64-QAM symbol carries 6 bits. The simplified receiver can be used, because bits 6m to 6m + 5 will always be mapped to symbol m. For IR-

HARQ, the required storage at the receiver is

proportional to the maximum number of differ-

ent symbols s that can be generated, which

equals L/(r × b), where r is the rate of the moth-

er code, L is the length of the original data

sequence, and b is the number of bits transmit-

ted in each symbol. Hence, more storage is

required than with CC-HARQ. This also means

that the ML detector and LLR calculator block

becomes more complex because it needs to pro-

cess an equivalent symbol sequence of length

L/(r × b) that is larger than M.
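The equivalence between the receiver of Fig. 3a and symbol-level combining for a repeated symbol can be checked numerically. The sketch below assumes a single QPSK symbol sent twice over known scalar channels; the function names and the numerical values are hypothetical. Expanding the sum of squared distances shows that both detectors minimize the same quantity up to a positive scale factor and a constant, so they return the same decision.

```python
def mrc_combine(ys, hs):
    """Combine N receptions of the same symbol into (y_tilde, h_tilde)."""
    y_tilde = sum(h.conjugate() * y for y, h in zip(ys, hs))
    h_tilde = sum(abs(h) ** 2 for h in hs)
    return y_tilde, h_tilde


def ml_joint(ys, hs, constellation):
    """ML decision using every received sample and channel estimate."""
    return min(constellation,
               key=lambda s: sum(abs(y - h * s) ** 2 for y, h in zip(ys, hs)))


def ml_combined(y_tilde, h_tilde, constellation):
    """ML decision using only the combined pair."""
    return min(constellation, key=lambda s: abs(y_tilde - h_tilde * s) ** 2)


qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
s_sent = 1 - 1j
hs = [0.3 + 0.4j, -1.0 + 0.2j]               # channel per transmission (hypothetical)
noise = [0.05 - 0.02j, -0.03 + 0.04j]        # small fixed noise samples (hypothetical)
ys = [h * s_sent + n for h, n in zip(hs, noise)]

y_tilde, h_tilde = mrc_combine(ys, hs)
decision_joint = ml_joint(ys, hs, qpsk)
decision_combined = ml_combined(y_tilde, h_tilde, qpsk)
```

Only `y_tilde` and `h_tilde` have to be kept between transmissions, regardless of N. Note that the noise in `y_tilde` has variance scaled by `h_tilde`, which an LLR calculator operating on the combined pair would need to take into account.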

If bits and symbols are not aligned, it may not

be possible to use the receiver of Fig. 3b, because

the number of different symbols s may exceed

L/(r × b) and become very large. In order to sim-

plify the receiver, ML detection and LLR calcu-

lation can be performed separately for each y(i),

as shown in Fig. 3c. The LLRs per mother code

bit and per transmission are then simply added

together to produce the LLR value sent to the

decoder. This bit-level combining scheme has

suboptimal performance. However, the perfor-

mance loss in SISO systems is generally small.

By using bit-level combining, the storage require-

ments at the receiver are reduced when the bits

of the mother code are fewer than the maximum

number of received symbols s and channel esti-

mates h that need to be stored. This is true, in

general, unless the allowed maximum number of

retransmissions is small. The complexity of the

ML detector and LLR calculator block is also

reduced.
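The suboptimality of bit-level combining can be seen in a toy case. The sketch below assumes Gray-mapped 4-PAM, a symbol sent twice over a unit-gain real channel, and a likelihood proportional to exp(-(y - h s)^2 / noise_var); the bit labels, function names, and sample values are hypothetical and chosen only for illustration.

```python
import math

# Gray-mapped 4-PAM: amplitude -> (bit 0, bit 1); the labeling is an assumption.
PAM4 = {-3.0: (0, 0), -1.0: (0, 1), 1.0: (1, 1), 3.0: (1, 0)}


def llr_joint(ys, hs, noise_var, bit):
    """Exact LLR of one bit using all receptions of the same symbol."""
    num = den = 0.0
    for s, bits in PAM4.items():
        metric = sum((y - h * s) ** 2 for y, h in zip(ys, hs)) / noise_var
        if bits[bit] == 0:
            num += math.exp(-metric)
        else:
            den += math.exp(-metric)
    return math.log(num / den)


def llr_bit_level(ys, hs, noise_var, bit):
    """Bit-level combining: one LLR per transmission, then a plain sum."""
    return sum(llr_joint([y], [h], noise_var, bit) for y, h in zip(ys, hs))


ys, hs = [0.5, 2.5], [1.0, 1.0]
joint = llr_joint(ys, hs, 1.0, 0)
blc = llr_bit_level(ys, hs, 1.0, 0)
```

Both values have the same sign here, but the summed per-transmission LLRs overstate the reliability compared with the joint computation, which is the kind of mismatch that makes bit-level combining suboptimal for higher-order constellations.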

Since the performance loss of the bit-level

combining receiver is not significant for SISO-

HARQ systems, IR-HARQ generally performs

better than CC-HARQ even when bit-level com-

bining is used. This happens because the loss

incurred by the suboptimal implementation of

the receiver is usually smaller than the coding

gain of IR-HARQ. However, as explained in the

following section, because of interstream inter-

ference, bit-level combining in MIMO-HARQ

may result in significant performance degrada-

tion. Thus, for practical MIMO receiver imple-

mentations subject to complexity and memory

constraints, the choice between CC-HARQ and

IR-HARQ may not always be straightforward.

It should also be noted that some cases have

been identified where CC-HARQ performs bet-

ter than IR-HARQ in SISO systems even when

the optimal receiver of Fig. 3a is employed.

Examples include systems where the effect of

fading on different IR-HARQ codewords varies,

especially when the codewords that contain the

systematic part of the code are severely affected

[7].

HARQ APPLIED TO MIMO SYSTEMS

As shown in Fig. 1a, compared to the SISO case,

the transmitter for MIMO-HARQ includes an

additional symbols-to-antennas mapping block

after the generation of the modulated symbols

s(i) that determines from which antenna each symbol will be transmitted. In the general case, a given symbol s may be transmitted from more than one antenna, or the antennas may transmit linear combinations of the original symbols. Specifically, the symbols-to-antennas mapper generates a sequence of $n_t \times 1$ symbol vectors $x^{(i)} = [\underline{x}^{(i)}[0], \underline{x}^{(i)}[1], \ldots, \underline{x}^{(i)}[K-1]]$ based on the symbol sequence $s^{(i)} = [s^{(i)}[0], s^{(i)}[1], \ldots, s^{(i)}[M-1]]$, where $n_t$ is the number of transmit antennas. Each vector $\underline{x}$ is sent through the MIMO channel, resulting in an $n_r \times 1$ vector $\underline{y}$ at the receiver, where $n_r$ is the number of receive antennas. As in the SISO case, flat fading is considered, and the effect of the MIMO channel is modeled using a sequence $H^{(i)}$ of $n_r \times n_t$ matrices. The capacity and diversity gains that can be achieved depend on the correlation between the received signals (i.e., the condition of the channel matrix). Ideally, a well-conditioned channel matrix is desired.

Therefore, in addition to noise and fading, the

two factors affecting transmission in SISO sys-

tems, MIMO systems are also subject to inter-

stream interference. When the interference is

high, transmission may be severely affected even

when the received power per antenna is large.

HARQ can be used in MIMO systems to com-

bat interstream interference in addition to noise

and channel gain fluctuations caused by fading.

Although not shown in the figure, the symbol-to-

antenna mapper may also employ a space-time

block code (STBC).

When CC-HARQ is employed, the simplified

transmitter of Fig. 1b can be used. The transmit-

ter can also be used for IR-HARQ, as long as

the alignment between bits and signal vectors

does not change. A bits-to-symbols mapper creates a symbol sequence s′ based on the encoded bit sequence c, and is followed by a symbols-to-antennas mapper that transforms s′ to a symbol vector sequence x′. The transmitter is simpler because the symbol vector sequence x′ can be

precomputed. However, the main benefit is the

simplification of the receiver, as described in the

following.

Similar to the SISO case, as shown in Fig. 3a, the $n_r \times 1$ received symbol vectors can be sent to

an ML detector and LLR calculator block that

combats interstream interference in addition to

compensating for noise and channel fading. The

ML detector and LLR calculator block is more

complex than the SISO case, because matrix and

vector operations are involved. Once the LLRs

are produced, decoding proceeds in exactly the

same way as in SISO systems.

Some questions now emerge. When the align-

ment between bits and symbol vectors is fixed,

can symbol-level combining be used for MIMO-

HARQ similar to the SISO case? As shown

later, that is indeed possible by an extension of

the SISO-MRC scheme. Can the receiver be

simplified if the alignment between bits and sym-

bol vectors is not fixed or symbol-level combin-

ing is not possible in early retransmissions, and

what are the implications to the system perfor-

mance and complexity? As described below, the

performance penalty when trying to simplify the

receiver of MIMO-HARQ systems using bit-

level combining is larger than in SISO systems.

The performance degradation increases further

when equalization is used instead of ML detec-

tion. Therefore, when realistic MIMO receiver

implementations are desired, a careful assess-

ment of the performance loss of IR-HARQ

because of bit-level combining should be made.

These questions are addressed in more detail in

the remainder of this section.

CC-HARQ is considered first. The observa-

tions can be extended to the case of IR-HARQ

where bit-to-symbol vector alignment is pre-

served. The same symbol vector sequence x is

sent during each retransmission. As in the SISO

case, instead of using the receiver of Fig. 3a, the

architecture of Fig. 3b can be employed. It can

be shown that an MRC-like combining scheme

can be used to form an equivalent $n_t \times 1$ symbol vector sequence $\tilde{\underline{y}}$ and an equivalent channel matrix sequence $\tilde{H}$ from the received symbol vector and channel matrix estimate sequences $\underline{y}^{(i)}$ and $H^{(i)}$, respectively. Each $\tilde{H}[m]$ is a Hermitian matrix of size $n_t \times n_t$ [6]. Thus, the MIMO-HARQ problem is converted to an equivalent single-transmission MIMO problem, because the sizes of $\tilde{\underline{y}}$ and $\tilde{H}$ remain the same after each retransmission. An ML detector and LLR calculator block that uses only one symbol vector sequence and one channel estimate sequence can then be used. Therefore, the memory requirements of the symbol-level combining receiver of Fig. 3b are reduced from those of the receiver of Fig. 3a. This simplification of the receiver is aided by reusing the same ML detector and LLR calculator block after each transmission. Moreover, numerical techniques such as QR decomposition can be used for implementation [6]. The receivers of Figs. 3a and 3b are equivalent, so there is no loss in performance. For IR-HARQ, the receiver of Fig. 3b can be used by considering all different symbol vectors that may be generated. The length of $\tilde{\underline{y}}$ and $\tilde{H}$ will be at least as large as K, the length of x(i).

When the alignment between bits and symbol vectors is not fixed, the bit-level combining receiver of Fig. 3c can be employed if using symbol-level combining is impractical. However, this architecture is not optimal and may result in significant performance degradation when the different paths of the MIMO channel are

correlated. The main cause is not the combining

of bits instead of symbol vectors, but the sepa-

rate detection and LLR calculation after each

transmission. When the channel matrix H is ill

conditioned, erroneous decisions may be made

about the individual elements of a symbol vector

x even when the quality of the received symbol

vector is good. On the other hand, when symbol-

level combining is used, detection and LLR cal-

culation are performed after gathering

information from all retransmissions. From the viewpoint of the architecture of Fig. 3b, the condition of the equivalent matrix $\tilde{H}[m]$ is better than that of some of the matrices $H^{(i)}[m]$.
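The effect of combining on the channel condition can be illustrated with a small numerical sketch. It assumes a 2 × 2 system, noiseless reception, and two hypothetical channel realizations (one nearly rank-deficient, one well conditioned); the helper names are illustrative, and the combining rule is the MRC-like sum of $H^{(i)*}\underline{y}^{(i)}$ and $H^{(i)*}H^{(i)}$ over transmissions.

```python
def herm(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def combine(ys, Hs):
    """Return (y_tilde, H_tilde) = (sum_i H_i^H y_i, sum_i H_i^H H_i)."""
    nt = len(Hs[0][0])
    y_t = [0j] * nt
    H_t = [[0j] * nt for _ in range(nt)]
    for y, H in zip(ys, Hs):
        Hh = herm(H)
        y_t = [a + b for a, b in zip(y_t, matvec(Hh, y))]
        G = matmul(Hh, H)
        H_t = [[H_t[i][j] + G[i][j] for j in range(nt)] for i in range(nt)]
    return y_t, H_t


def cond_2x2_hermitian(G):
    """Largest over smallest eigenvalue of a 2x2 Hermitian positive definite matrix."""
    a, d, b2 = G[0][0].real, G[1][1].real, abs(G[0][1]) ** 2
    root = ((a - d) ** 2 + 4 * b2) ** 0.5
    return (a + d + root) / (a + d - root)


H0 = [[1.0, 1.0], [1.0, 1.1]]        # nearly rank-deficient channel (hypothetical)
H1 = [[1.0, -1.0], [1.0, 1.0]]       # well-conditioned channel (hypothetical)
x = [1 + 1j, -1 + 1j]                # same symbol vector in both transmissions
ys = [matvec(H0, x), matvec(H1, x)]  # noiseless, for clarity

y_t, H_t = combine(ys, [H0, H1])
cond_first = cond_2x2_hermitian(matmul(herm(H0), H0)) ** 0.5  # 2-norm condition number of H0
cond_equiv = cond_2x2_hermitian(H_t)                          # condition number of H_tilde
```

In this example the equivalent matrix is Hermitian, satisfies `y_t = H_t x` in the absence of noise, and is far better conditioned than the first channel realization alone.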

Although bit-level combining is suboptimal, it

is also less complex, because the same blocks are

reused in Fig. 3c regardless of the number of

retransmissions. The required storage is also

reduced because only the accumulated LLRs of

the bits of the mother code need to be stored.

In some systems the ML detector and LLR

calculator block may be too complex to imple-

ment, even in the simplified receiver of Fig. 3b.

In this case equalization across the spatial

streams can be used (recall that flat fading is

assumed). Linear or decision feedback equaliz-

ers (DFEs) (zero-forcing [ZF] or minimum

mean square error [MMSE]) can be employed.

The MIMO equalization schemes described

above are well known and not particular to

HARQ. They can be implemented efficiently,

for example, using QR decomposition. This brief

overview is given in order to facilitate the discus-

sions in the remainder of this section.
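As a reminder of what linear equalization across the spatial streams involves, the following sketch implements the textbook ZF and MMSE solutions for a 2 × 2 flat-fading channel. It assumes unit-energy, uncorrelated transmit symbols and white noise of variance `noise_var` per receive antenna; the function names and the channel values are hypothetical, and a practical receiver would more likely use a QR-based implementation.

```python
def inv_2x2(A):
    """Inverse of a 2x2 matrix stored as a list of rows."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def linear_equalize(H, y, noise_var=0.0):
    """Return (H^H H + noise_var * I)^-1 H^H y for a 2x2 channel.

    noise_var = 0 gives the zero-forcing solution; noise_var > 0 gives
    the MMSE solution under the unit-energy symbol assumption.
    """
    Hh = [[H[r][c].conjugate() for r in range(2)] for c in range(2)]
    G = [[sum(Hh[i][k] * H[k][j] for k in range(2)) + (noise_var if i == j else 0.0)
          for j in range(2)] for i in range(2)]
    Ginv = inv_2x2(G)
    matched = [sum(Hh[i][k] * y[k] for k in range(2)) for i in range(2)]
    return [sum(Ginv[i][k] * matched[k] for k in range(2)) for i in range(2)]


H = [[1 + 0.5j, 0.2], [0.3j, 0.8 - 0.1j]]            # channel estimate (hypothetical)
x = [(1 + 1j) / 2 ** 0.5, (-1 - 1j) / 2 ** 0.5]      # unit-energy QPSK symbols
y = [sum(H[i][k] * x[k] for k in range(2)) for i in range(2)]  # noiseless reception

x_zf = linear_equalize(H, y)
x_mmse = linear_equalize(H, y, noise_var=0.1)
```

Without noise, ZF recovers the transmitted vector exactly, whereas the MMSE output is slightly biased toward zero; the outputs are soft values that would be passed to the per-stream LLR calculators.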

When CC-HARQ is employed, the receiver

of Fig. 3d can be used. First, the spatial streams

are decoupled using an equalizer. Then, for each

element $\hat{x}_j$ of the equalized symbol vector sequence $\hat{x}$, separate LLR calculators are used that take into account the mapping of the transmit symbols into symbol vectors and the corresponding channel estimates. In general, the $\hat{x}_j$ are soft values and are not sliced to the nearest constellation symbol. Each time symbol vectors from a new transmission arrive, they are combined with the symbol vectors of all previous transmissions, and the equivalent symbol vectors are re-equalized using the equivalent channel matrix sequence. Similar to the ML case, the pre-equalization symbol-level combining operation does not result in information loss. For this reason, the scheme of Fig. 3d exhibits the best performance among all equalization-based architectures [8]. After each retransmission, the equivalent vector sequence $\tilde{\underline{y}}$ is stored at the receiver, together with the equivalent channel matrix sequence $\tilde{H}$. Each $\tilde{H}[m]$ is Hermitian and of size $n_t \times n_t$. Hence, $K \times n_t \times (n_t + 1)/2$ complex entries are required for $\tilde{H}$ and $K \times n_t$ complex entries for $\tilde{\underline{y}}$.

In order to reduce storage, a post-equalization symbol-level combining scheme, shown in Fig. 3e, can be used. The received signal vectors $\underline{y}^{(i)}$ are equalized after each retransmission, and the resulting symbol vector sequences $\hat{x}^{(i)}$ are combined before LLR calculation. Only the $\underline{y}^{(i)}$ are used to obtain the $\hat{x}^{(i)}$. It can be shown that the optimal way to combine the $\hat{x}^{(i)}$ is using MRC, which consists of multiplying each element $\hat{x}_j^{(i)}$ of $\hat{x}^{(i)}$ with a complex weight that depends on the channel estimate $H^{(i)}$ and accumulating the result with the values from previous transmissions [8]. The resulting weighted sum is normalized before LLR calculation. Post-equalization symbol-level combining reduces receiver memory because instead of a sequence of K Hermitian matrices, only $K \times n_t$ normalization weights $\gamma_j^{(i)}$ need to be stored in addition to the weighted and accumulated $\hat{x}^{(i)}$. However, post-equalization symbol-level combining exhibits performance loss compared to pre-equalization combining [8, 9]. Therefore, for fixed bit-to-symbol vector alignment, use of post-equalization combining is motivated by the need to reduce the storage at the receiver. Even when $n_t$ is small, the savings can be significant when K is large. The storage requirements can be reduced further (by $K \times n_t$ complex values per transmission) by combining the $\hat{x}_j^{(i)}$ using equal weights [9] at the cost of additional performance degradation. The receiver of Fig. 3e can also be used for IR-HARQ. The difference is that K should be replaced by the number of all possible symbol vectors x that may be sent from the transmitter before reaching the transmission limit N.

When the bit-to-symbol vector alignment is not fixed or the number of symbol vectors is large, the structure of Fig. 3f can be employed, which differs from that of Fig. 3e in that the LLRs are calculated directly after equalization. The performance of the receiver of Fig. 3f is inferior to that of the other schemes. The largest part of the performance degradation is caused by the separate equalization after each transmission without combining information from different transmissions. What needs to be stored now are the LLRs of the bits of the mother code c that are sent to the channel. Hence, if use of the receiver is considered for HARQ with fixed bit-to-symbol vector alignment, in order to determine whether memory reduction can be achieved compared to other architectures, the total number of different symbols that are sent to the channel needs to be taken into account.

Table 1 summarizes the MIMO-HARQ receiver architectures presented in this section and their memory requirements.
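The storage figures quoted in this section for pre- and post-equalization symbol-level combining can be compared with a few lines of arithmetic. The sketch below simply counts stored entries as stated in the text and ignores word lengths; the function names and the example values of K and n_t are hypothetical.

```python
def pre_eq_storage(K, nt):
    """Entries kept by pre-equalization symbol-level combining."""
    h_entries = K * nt * (nt + 1) // 2   # Hermitian equivalent matrices, per the text
    y_entries = K * nt                   # equivalent received vectors
    return h_entries + y_entries


def post_eq_storage(K, nt):
    """Entries kept by post-equalization combining: weights plus accumulated symbols."""
    return K * nt + K * nt


examples = {(K, nt): (pre_eq_storage(K, nt), post_eq_storage(K, nt))
            for K, nt in [(108, 2), (54, 4)]}
```

Under this simple count, the saving of post-equalization combining grows with the number of transmit antennas, since the Hermitian matrix term scales quadratically in n_t.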

COMPARISON OF RECEIVER ARCHITECTURES AND EXAMPLES

In the previous section it was argued that the receiver implementation depends on the transmission scheme (CC- or IR-HARQ), whether the alignment between bits and symbol vectors is fixed, and the constraints in memory and complexity. Simplifying the receiver may come at a price.

As an example of the performance degradation caused by suboptimal receiver implementations, an IEEE 802.16e compliant system using partial usage of subchannels (PUSC) and spatial multiplexing (Matrix B) is considered [2]. Two transmit and two receive antennas are employed, communicating through a vehicular Type A channel with a high degree of spatial correlation and Doppler speed equal to 120 km/h. The data are encoded using the mother rate-1/3 CTC. Bits are punctured sequentially to produce sequences of equal length, as in Fig. 2.
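The selection of mother-code bits for each transmission can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, IR-HARQ is assumed to take consecutive equal-length blocks of the mother codeword and to wrap around to the beginning once the codeword is exhausted, and CC-HARQ is assumed to resend the first block every time. The interleaving and subpacket generation rules of IEEE 802.16e are not reproduced.

```python
def transmission_indices(mother_len, tx_len, tx_number, scheme):
    """Return the mother-codeword bit indices sent in one HARQ transmission.

    mother_len: number of bits in the mother codeword
    tx_len:     number of coded bits per transmission (equal for all)
    tx_number:  0 for the first transmission, 1 for the second, ...
    scheme:     "CC" (same bits every time) or "IR" (sequential blocks)
    """
    if scheme == "CC":
        start = 0
    elif scheme == "IR":
        # Assumed wrap-around once all mother-code bits have been sent.
        start = (tx_number * tx_len) % mother_len
    else:
        raise ValueError("scheme must be 'CC' or 'IR'")
    return [(start + k) % mother_len for k in range(tx_len)]


# Toy example: 12 mother-code bits, 8 coded bits per transmission.
for scheme in ("CC", "IR"):
    for tx in range(2):
        print(scheme, tx, transmission_indices(12, 8, tx, scheme))
```

In the toy example the second IR transmission carries the four remaining mother-code bits followed by four repeated ones, whereas the second CC transmission repeats the first eight bits.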



In Fig. 4a the bit-level combining receiver of Fig. 3f is employed using zero-forcing linear equalization (ZF-BLC). 64-QAM and the rate-1/2 code of Fig. 2a are considered. IR-HARQ has a coding gain of more than 1 dB over CC-HARQ because of the additional parity bits that are transmitted. However, when the optimal pre-equalization symbol-combining receiver of Fig. 3d is used with CC-HARQ (MRC-ZF), the system exhibits a gain of almost 2 dB over IR-HARQ. CC-HARQ also outperforms IR-HARQ when ML detection is used instead of equalization, as seen from curves MRC-ML and ML-BLC that correspond to the receivers of Figs. 3b and 3c, respectively. The performance advantage of IR-HARQ can be recaptured using the receiver of Fig. 3a at the cost of increased complexity and memory requirements.
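The MRC-ZF receiver mentioned above can be illustrated with a small sketch for two transmit antennas. It assumes one common formulation of pre-equalization symbol-level combining for CC-HARQ: the matched-filter outputs H^H y and the Gram matrices H^H H are accumulated over the transmissions, and zero-forcing is applied once to the accumulated quantities. The helper names are hypothetical, and the channel values are arbitrary numbers used only to check the arithmetic in the noiseless case.

```python
def hermitian(H):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[H[r][c].conjugate() for r in range(len(H))]
            for c in range(len(H[0]))]


def matmul(A, B):
    """Matrix product A * B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def matvec(A, v):
    """Matrix-vector product A * v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mrc_zf(channels, received):
    """Combine all transmissions before equalization, then apply ZF (nt = 2)."""
    gram = [[0j, 0j], [0j, 0j]]   # running sum of H^H H
    matched = [0j, 0j]            # running sum of H^H y
    for H, y in zip(channels, received):
        Hh = hermitian(H)
        G = matmul(Hh, H)
        m = matvec(Hh, y)
        gram = [[gram[i][j] + G[i][j] for j in range(2)] for i in range(2)]
        matched = [matched[i] + m[i] for i in range(2)]
    return matvec(inverse_2x2(gram), matched)


# Noiseless check: the same symbol vector is sent twice (CC-HARQ)
# over two different, arbitrarily chosen 2 x 2 channels.
x = [1 + 1j, -1 + 1j]
H1 = [[1 + 0.5j, 0.3 - 0.2j], [0.1 + 0.4j, 0.9 - 0.1j]]
H2 = [[0.7 - 0.3j, -0.2 + 0.6j], [0.5 + 0.1j, 0.4 + 0.8j]]
estimate = mrc_zf([H1, H2], [matvec(H1, x), matvec(H2, x)])
print(estimate)
```

Only the accumulated vector and the accumulated Hermitian matrix need to be kept between transmissions in this formulation, which is why the memory does not grow with the number of retransmissions.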

When a rate-5/6 code is used, the coding gain of IR-HARQ over CC-HARQ is much larger than for the rate-1/2 code (on the order of 4 dB, as shown in Fig. 4b). Therefore, although symbol-level combining improves the performance of CC-HARQ, IR-HARQ still achieves a gain of approximately 1 dB. The gain is attained for both equalizer-based and ML-based implementations.
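One way to see why the initial code rate matters is to compare the effective code rate after two transmissions. The sketch below is not the authors' analysis: it assumes that each IR-HARQ transmission carries previously unsent mother-code bits until the rate-1/3 mother code is exhausted, so that the effective rate cannot fall below the mother-code rate, and that CC-HARQ leaves the code rate unchanged. The function name is hypothetical.

```python
from fractions import Fraction


def effective_rate(initial_rate, transmissions, mother_rate, scheme):
    """Effective code rate after a number of equal-length transmissions."""
    initial_rate = Fraction(initial_rate)
    mother_rate = Fraction(mother_rate)
    if scheme == "CC":
        # Repetition adds received energy but no new parity bits.
        return initial_rate
    if scheme == "IR":
        # New parity lowers the rate, bounded below by the mother-code rate.
        return max(initial_rate / transmissions, mother_rate)
    raise ValueError("scheme must be 'CC' or 'IR'")


mother = Fraction(1, 3)
for rate in (Fraction(1, 2), Fraction(5, 6)):
    ir = effective_rate(rate, 2, mother, "IR")
    cc = effective_rate(rate, 2, mother, "CC")
    print(f"initial rate {rate}: CC stays at {cc}, IR reaches {ir}")
```

Under these assumptions the rate-1/2 code drops to rate 1/3 after the second IR transmission (a factor of 3/2), while the rate-5/6 code drops to rate 5/12 (a factor of 2), which is consistent with the larger IR-HARQ coding gain reported for the rate-5/6 code.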

CONCLUDING REMARKS

This article examines the implementation of HARQ in wireless systems employing BICM, mainly in the MIMO context. Because of the introduction of new dimensions, a number of different architectures can be used for the receiver. In general, the designer needs to take into account the complexity and memory constraints, channel characteristics, and maximum allowed number of retransmissions before deciding on the MIMO-HARQ scheme. In order to improve the performance of HARQ with low receiver complexity, proper bit-to-symbol vector alignment can be used to enable symbol-level combining at the receiver. Moreover, new code designs could focus on developing IR-HARQ schemes that are robust to suboptimal receiver implementations.

Table 1. Comparison of memory requirements and performance of MIMO-HARQ receiver implementations.

| Receiver implementation | Storage requirements | Comments |
|---|---|---|
| Generic (Fig. 3a) | K × N × nr × (1 + nt) | K: length of symbol vector sequence per HARQ transmission; N: maximum number of transmissions; nt/nr: number of transmit/receive antennas. Can be used with any HARQ scheme. Optimal. |
| Symbol-level combining with ML detection (Fig. 3b) | For CC-HARQ: K × nt × (1 + (nt + 1)/2). For IR-HARQ: K × C × nt × (1 + (nt + 1)/2) | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences; C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Optimal. |
| Bit-level combining with ML detection (Fig. 3c) | CC-HARQ: K × nt × b. IR-HARQ: L/r | r: rate of mother code; L: length of original data sequence (in bits); b: bits per symbol. Suboptimal. |
| Pre-equalization symbol-level combining (Fig. 3d) | For CC-HARQ: K × nt × (1 + (nt + 1)/2). For IR-HARQ: K × C × nt × (1 + (nt + 1)/2) | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences; C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Inferior to generic. |
| Post-equalization symbol-level combining (Fig. 3e) | For CC-HARQ: 2 × K × nt. For IR-HARQ: 2 × K × C × nt | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences; C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Inferior to pre-equalization combining. |
| Bit-level combining with equalization (Fig. 3f) | CC-HARQ: K × nt × b. IR-HARQ: L/r | r: rate of mother code; L: length of original data sequence (in bits); b: bits per symbol. Inferior to all the above. |

REFERENCES

[1] S. Lin, D. J. Costello, Jr., and M. J. Miller, “Automatic-Repeat Request Error-Control Schemes,” IEEE Commun. Mag., vol. 22, Dec. 1984, pp. 5–17.

[2] IEEE Std. 802.16e-2005, “IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems, Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands,” Feb. 2006.

[3] 3GPP TS 25.201 V8.0.0 (2008-03), “3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Physical Layer — General Description (Release 8).”

[4] K. R. Narayanan and G. Stüber, “A Novel ARQ Technique Using the Turbo Coding Principle,” IEEE Commun. Lett., vol. 1, no. 2, Mar. 1997, pp. 49–51.

[5] Z. Ding and M. Rice, “Hybrid-ARQ Code Combining for MIMO Using Multidimensional Space-Time Trellis Codes,” Proc. IEEE ISIT ’07, Glasgow, Scotland, June 2007.

[6] E. W. Jang et al., “Optimal Combining Schemes for MIMO Systems with Hybrid ARQ,” Proc. IEEE ISIT ’07, Nice, France, June 2007.

[7] J.-F. Cheng, “Coding Performance of Hybrid ARQ Schemes,” IEEE Trans. Commun., vol. 54, no. 6, June 2006, pp. 1017–29.

[8] D. Toumpakaris et al., “Storage-Performance Tradeoff for Receivers of MIMO Systems Using Hybrid ARQ,” Proc. 9th IEEE Int’l. Wksp. Sig. Processing Advances in Digital Commun., Recife, Brazil, July 2008.

[9] E. N. Onggosanusi et al., “Hybrid ARQ Transmission and Combining for MIMO Systems,” Proc. IEEE ICC, vol. 5, May 2003, pp. 3205–09.

BIOGRAPHIES

JUNGWON LEE [S’00, M’05] (jungwon@stanfordalumni.org) received a Ph.D. degree in electrical engineering from Stanford University in 2005. From 2000 to 2003 he worked as an intern for National Semiconductor, Telcordia Technologies, and AT&T Shannon Labs Research, and as a consultant for Ikanos Communications. Since 2003 he has worked for Marvell Semiconductor Inc., Santa Clara, California, where he is now a principal engineer/senior manager. His specific research interests are in wireless and wireline communication theory with emphasis on OFDM and single-carrier system design, transmission optimization, resource allocation, cross-layer design, and estimation and detection theory.

DIMITRIS TOUMPAKARIS [S’98, M’04] (dtouba@upatras.gr) received his Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 1997, and his M.S. and Ph.D. degrees in electrical engineering from Stanford University in 1999 and 2003, respectively. He was a senior design engineer in Marvell Semiconductor Inc., Santa Clara, California, from 2003 to 2006. He has also worked as an intern for Bell-Labs, CERN, and France Télécom, and as a consultant for Ikanos Communications and Marvell Semiconductor Inc. He is currently an assistant professor in the Wireless Telecommunications Laboratory, Department of Electrical and Computer Engineering, University of Patras, Greece. His current research interests include information theory with emphasis on multi-user communications systems, digital communication, synchronization and estimation, and cross-layer optimization.

Figure 4. MIMO system, Type-A vehicular channel, 120 km/h, high inter-stream correlation, PUSC, spatial multiplexing: a) 64-QAM, code rate = 1/2, packet size = 54 bytes; b) 64-QAM, code rate = 5/6, packet size = 60 bytes. [Plots of BER versus SNR (dB). Curves in both panels: 1 transmission, ZF; 2 transmissions, CC, MRC-ZF; 2 transmissions, CC, ZF-BLC; 2 transmissions, IR, ZF-BLC; 2 transmissions, CC, MRC-ML; 2 transmissions, IR, ML-BLC.]

EDWARD W. JANG [S’04] (ej1130@stanford.edu) received his B.S. degree in electrical engineering from Seoul National University, Korea, in 2002, and his M.S. degree in electrical engineering from Stanford University in 2004. He is currently pursuing his Ph.D. degree at Stanford University. His research interests include transmission schemes for systems with a limited feedback rate and MIMO systems with HARQ.

HUI-LING LOU (lou@stanfordalumni.org) is a senior engineering director at Marvell Semiconductor, Santa Clara, California, leading teams responsible for physical layer standards, systems, and architecture design and development for mobile WiMax chip sets, and investigating next generation wireless technologies. She has also formed and led physical layer standards and systems teams that designed, developed and productized Marvell’s first 802.11n, Bluetooth, and digital FM chip sets. Prior to Marvell, she spent nine years at Bell Laboratories Research, Murray Hill, New Jersey, where she designed algorithms, systems, and efficient hardware architectures for cellular and digital broadcasting systems. She also developed a reconfigurable trellis codec chip for Amati Communications as a consultant in 1992. She completed her M.S.E.E. and Ph.D. degrees at Stanford University in 1988 and 1992, respectively. She has more than 60 patents, granted and pending, and has published more than 50 peer-reviewed publications.

JOHN M. CIOFFI [F’96] (cioffi@stanford.edu) received his B.S. in electrical engineering in 1978 from the University of Illinois and his Ph.D. in electrical engineering in 1984 from Stanford University. He was with Bell Laboratories, 1978–1984, and IBM Research, 1984–1986. He has been a professor of electrical engineering at Stanford since 1986. He founded Amati Com. Corp in 1991 (purchased by TI in 1997) and was officer/director from 1991–1997. He currently is on the board of directors of ASSIA (Chairman), ClariPhy, Teranetics, Vector Silicon Inc., and the Marconi Foundation. He is on the advisory boards of Focus Ventures, Quantenna, and Amicus. His specific interests are in the area of high-performance digital transmission. Various awards include International Marconi Fellow (2006), Holder of Hitachi America Professorship in Electrical Engineering at Stanford (2002); Member, National Academy of Engineering (2001); IEEE Kobayashi Medal (2001); IEEE Millennium Medal (2000); IEE J. J. Thomson Medal (2000); 1999 U. of Illinois Outstanding Alumnus, 1991 and 2007 IEEE Communications Magazine best paper; 1995 ANSI T1 Outstanding Achievement Award; NSF Presidential Investigator (1987–1992), ISSLS 2004, ICC 2006, 2007, and 2008 Conference Best-Paper awards. He has published over 250 papers and holds over 80 patents, of which many are heavily licensed including key necessary patents for the international standards in ADSL, VDSL, DSM, and WiMAX.