Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing pubs-permissions@ieee.org.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Superimposed Training Based Channel Estimation and Data Detection for OFDM Amplify-and-Forward Cooperative Systems under High Mobility

Lanlan He, Yik-Chung Wu, Shaodan Ma, Tung-Sang Ng and H. Vincent Poor

Lanlan He was with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. She is now with the Huawei Tech. Investment Company (email: llhe@eee.hku.hk).

Yik-Chung Wu and Tung-Sang Ng are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong (email: {ycwu, tsng}@eee.hku.hk).

Shaodan Ma was with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong. She is now with the Department of Electrical and Computer Engineering, University of Macau, Macau (email: shaodanma@umac.mo).

H. Vincent Poor is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544 USA (email: poor@princeton.edu).

The work was supported in part by the General Research Fund (GRF) from Hong Kong Research Grant Council (Project No.: HKU 7154/08E), in part by the GRF (Project No. HKU 7191/11E), and in part by the U.S. National Science Foundation under Grant CNS-09-05398.

∗The corresponding author is Lanlan He.

September 10, 2011 DRAFT

Abstract

In this paper, joint channel estimation and data detection in orthogonal frequency division multiplexing (OFDM) amplify-and-forward (AF) cooperative systems under high mobility is investigated. Unlike previous works on cooperative systems in which a number of subcarriers are solely occupied by pilots, partial data-dependent superimposed training (PDDST) [8] is considered here, thus preserving the spectral efficiency. First, a closed-form channel estimator is developed based on the least squares (LS) method with Tikhonov regularization and a corresponding data detection algorithm is proposed using the linear minimum mean square error (LMMSE) criterion. In the derived channel estimator, the unknown data is treated as part of the noise and the resulting data detection may not meet the required performance. To address this issue, an iterative method based on the variational inference approach is derived to improve performance. Simulation results show that the data detection performance of the proposed iterative algorithm initialized by the LMMSE data detector is close to the ideal case with perfect channel state information.

Index Terms

Amplify-and-forward, Orthogonal frequency division multiplexing (OFDM), Time-varying channels.

I. INTRODUCTION

Cooperative communications has attracted much attention recently due to its advantages in

enhancing link reliability and increasing channel capacity [1]–[5]. Since orthogonal frequency

division multiplexing (OFDM) transmission has been adopted as the transmission technology

for next generation broadband wireless standards (such as IEEE 802.16 and long term evolution

(LTE)), this results in the need to develop receiver algorithms for OFDM based cooperative

communications [3]–[5]. On the other hand, another important goal for the next generation

of wireless broadband systems is to support high user mobility. With high mobility, the time-

variation of the channel within one OFDM symbol cannot be ignored and channel responses

vary sample by sample, which makes channel estimation challenging. Moreover, high mobility

destroys the orthogonality among subcarriers and induces intercarrier interference (ICI), which

also complicates data detection [6]–[9]. As a result, channel estimation and data detection for

OFDM cooperative systems under high mobility is very challenging.

Channel estimation and data detection for cooperative communications with time-invariant

channels has been studied in [1]–[5]. Specifically, channel estimation is studied in [1] and [2] un-

der the assumption of time-invariant flat fading channels. Targeting OFDM transmission, training

sequence based channel estimation is investigated in [3] and [4], and data detection performance

analysis is considered in [5] by assuming perfect channel state information. However, algorithms

applicable to time-varying channels cannot be obtained through direct extension of these previous

works. Recently, channel estimation for cooperative OFDM systems with time-varying channels

has been studied in [10], in which the amplify-and-forward (AF) scheme is adopted due to

its low complexity and minimal delay. By exploiting a time-frequency representation of the

received signals, channel estimation algorithms are proposed for two different scenarios. In the

first scenario, the corresponding channels are individually estimated at the relay and destination,

whereas in the second one, the cascaded source-relay-destination channel is jointly estimated

at the destination. Semi-blind (i.e., only certain subcarriers are occupied by pilots) channel

estimation with unknown data estimated by methods proposed for time-invariant channels is

considered, which may lead to severe modeling errors and performance degradation. Moreover,

only one relay is considered in [10] and the extension to multiple relays using the time-frequency

representation framework is by no means straightforward.

In this paper, an OFDM-based AF cooperative system with one source, multiple relays and

one destination is considered. In order to reduce the computational load on relays and time delay

of the whole system, no channel estimation is performed at the relays, which corresponds to the

second scenario in [10]. Unlike previous works in which subcarriers are occupied by either pilots

or data [10], [11], here partial data-dependent superimposed training (PDDST) [8] is adopted

for channel estimation and data detection. Notice that superimposed training (ST) based channel

estimation and data detection have been widely investigated in point-to-point systems [8], [12]–

[14]. This superimposed training is even more practical for systems over time-varying channels.

Otherwise, a number of subcarriers in each OFDM symbol need to be assigned to pilots for

channel estimation purpose, which would result in low spectral efficiency [10], [11].

The contributions of this paper are as follows. First, based on the generalized complex exponential

basis expansion model (GCE-BEM), the system model is reformulated in a nontrivial way

to obtain an expression similar to that for a conventional point-to-point OFDM system. After

that, a closed-form channel estimator is developed based on the least squares (LS) method with

Tikhonov regularization [15] and a corresponding data detection algorithm is proposed using

the linear minimum mean square error (LMMSE) criterion. In the LS-based channel estimator,

since the unknown data is treated as part of the noise, the resulting system performance may

not meet the requirements. To address this issue, an iterative method based on the variational

inference approach is derived to improve performance. The variational inference approach is

useful in cases when direct access or maximization of the posterior distribution of parameters

to be estimated is difficult, if not impossible. In particular, the variational inference approach

constructs a distribution similar to that of the true posterior but with a tractable form [16]. Since

it is basically a Bayesian framework, statistical information (such as channel statistics, power of

data and noise) is exploited to aid the estimation. Finally, computer simulations are performed to

demonstrate the effectiveness of the proposed channel estimation and data detection algorithms.

The rest of the paper is organized as follows. The channel and system model for OFDM

cooperative systems is introduced in Section II. The system model is then reformulated using

the GCE-BEM in Section III. In Section IV, a channel estimator is developed based on the

LS method with the Tikhonov regularization and a corresponding data detection algorithm is

proposed using the LMMSE criterion. In Section V, based on the variational inference approach,

an iterative enhancement algorithm is developed. Simulation results are presented in Section VI

to demonstrate the effectiveness of the proposed algorithms. Finally, conclusions are drawn in

Section VII.

Notation: Boldface uppercase and lowercase letters are used for matrices and vectors respectively. Superscripts $T$, $H$ and $\dagger$ denote transpose, Hermitian and pseudo-inverse respectively. The symbol $\mathbf{I}_N$ denotes the $N \times N$ identity matrix. The symbol $\mathrm{diag}\{\mathbf{x}\}$ signifies the diagonal matrix with vector $\mathbf{x}$ on its diagonal and $\|\mathbf{x}\|$ represents the L2 norm of $\mathbf{x}$. $\mathrm{Tr}\{\mathbf{X}\}$ and $|\mathbf{X}|$ are the trace and the determinant of a square matrix $\mathbf{X}$ respectively. Symbols $E\{\cdot\}$ and $\Re\{\cdot\}$ denote the expectation and the real part of the operand in the brackets respectively. The symbol $\otimes$ denotes convolution. The matrix $\mathbf{F}$ denotes the fast Fourier transform (FFT) matrix with $[\mathbf{F}]_{m,n} = \frac{1}{\sqrt{N}} e^{-j2\pi mn/N}$, and $\lceil a \rceil$ rounds $a$ to the nearest integer greater than or equal to $a$.

II. SYSTEM MODEL

In this paper, we consider a cooperative system with a source $S$, a destination $D$ and $K$ relays $R_k$ scattered between $S$ and $D$. Each of these elements is equipped with a single antenna. Denote the channels from the source $S$ to the $k$-th relay $R_k$ and from the $k$-th relay $R_k$ to the destination $D$ as $h_k(t,\tau)$ and $g_k(t,\tau)$ respectively. There is no direct link between source and destination. It is assumed that the $K$ relays are stationary but the source and destination are moving at high speed. Thus the propagation channels are modeled as multi-path time-varying channels.

Specifically, the source-relay channel $h_k(t,\tau)$ has $L_k^h$ independent taps with the average power of the $l$-th tap denoted by $\varrho_{l,k}^2$. The auto-correlation of the $l$-th tap follows the classical Jakes' model [6] given by $E\{h_k(mT_s,l)h_k^*(nT_s,l)\} = \varrho_{l,k}^2 J_0(2\pi f_h(n-m)T_s)$, where $h_k(nT_s,l)$ is the sample of the $l$-th tap at time $nT_s$ ($T_s$ is the sample interval), $J_0(\cdot)$ represents the zero-order Bessel function of the first kind, and $f_h$ is the maximal Doppler shift between source and the relays.

Similarly, the relay-destination channel $g_k(t,\tau)$ has $L_k^g$ independent taps with the average power of the $l$-th tap denoted by $\varsigma_{l,k}^2$. The auto-correlation of the $l$-th tap is $E\{g_k(mT_s,l)g_k^*(nT_s,l)\} = \varsigma_{l,k}^2 J_0(2\pi f_g(n-m)T_s)$, where $g_k(nT_s,l)$ is the sample of the $l$-th tap at time $nT_s$, and $f_g$ represents the maximal Doppler shift between the relays and destination.

Transmitted signal at the source: In an OFDM system, the source data in the frequency domain $\mathbf{x} = [x(0),\cdots,x(N-1)]^T$ is modulated onto $N$ parallel subcarriers to obtain the time domain signal $\mathbf{s} = \mathbf{F}^H\mathbf{x}$. In this paper, we consider partial data-dependent superimposed training [8], which is a general description of the placement of pilots and data. The $i$-th element of $\mathbf{x}$ is given by

$$
x(i) = \begin{cases} (1-\xi)\,x_d(i) + x_p(i), & \forall\, i \in \mathcal{P} \\ x_d(i), & \text{otherwise} \end{cases} \tag{1}
$$

where $\mathcal{P}$, with cardinality $N_p$, is the index set of subcarriers on which both data and pilots are transmitted. Therefore the transmitted symbol $x(i)$ at the $i$-th subcarrier is a linear combination of a pilot symbol and a data symbol. The total power for the $i$-th subcarrier ($i \in \mathcal{P}$) is $(1-\xi)^2 E|x_d(i)|^2 + E|x_p(i)|^2$. Furthermore, without loss of generality, it is assumed that the average power of $x_d(i)$ is $\sigma_x^2$. From (1), we have

$$
\mathbf{x} = \mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p, \tag{2}
$$

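where $\mathbf{A}_{\mathcal{P}}$ is a diagonal matrix with

$$
[\mathbf{A}_{\mathcal{P}}]_{i,i} = \begin{cases} 1-\xi, & \forall\, i \in \mathcal{P} \\ 1, & \text{otherwise}, \end{cases} \tag{3}
$$

$\mathbf{E}_{\mathcal{P}}$ is a matrix collecting columns of $\mathbf{I}_N$ with indices in $\mathcal{P}$, and $\mathbf{x}_p$ and $\mathbf{x}_d$ denote pilot and data vectors respectively.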

From (1), it is noticed that the PDDST includes the following three cases:

• When $0 < \xi < 1$, the data component at each subcarrier $i \in \mathcal{P}$ is $(1-\xi)x_d(i)$. In this case, both data and pilots are transmitted on subcarrier set $\mathcal{P}$.

• In the case $\xi = 1$, the data component at each subcarrier $i \in \mathcal{P}$ is nulled, and the PDDST reduces to the semi-blind case in which subcarriers are uniquely occupied by pilots or data. This case is also referred to as data-dependent superimposed training (DDST) [13].

• In the case $\xi = 0$, the PDDST reduces to traditional superimposed training [12], and $\mathbf{x}_p$ becomes the training in the frequency domain while the data $\mathbf{x}_d$ remains intact.
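The three PDDST cases can be illustrated numerically. The following is a minimal sketch, assuming NumPy and toy values that are not taken from the paper (8 subcarriers, pilots on subcarriers 0 and 4, BPSK-like data); the helper name `pddst_symbol` is hypothetical. It builds the frequency-domain symbol according to (2) and (3).

```python
import numpy as np

def pddst_symbol(x_d, x_p, pilot_idx, xi):
    """Build x = A_P x_d + E_P x_p, following (2) and (3)."""
    n = x_d.shape[0]
    a_diag = np.ones(n)
    a_diag[pilot_idx] = 1.0 - xi           # diagonal of A_P, see (3)
    e_p = np.eye(n)[:, pilot_idx]          # columns of I_N indexed by P
    return a_diag * x_d + e_p @ x_p

# Toy values (assumptions): N = 8, pilots on subcarriers 0 and 4.
x_d = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=complex)
x_p = np.array([2.0, 2.0], dtype=complex)
pilot_idx = np.array([0, 4])

x_st = pddst_symbol(x_d, x_p, pilot_idx, xi=0.0)    # traditional superimposed training
x_sb = pddst_symbol(x_d, x_p, pilot_idx, xi=1.0)    # data nulled on P (semi-blind case)
x_mix = pddst_symbol(x_d, x_p, pilot_idx, xi=0.5)   # partial superposition
```

Subcarriers outside the pilot set carry the data unchanged in all three cases; only the entries indexed by `pilot_idx` depend on `xi`.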

Upon transmission, a cyclic prefix (CP) of length $L_{cp}$ longer than $\max_{k=1,\ldots,K}(L_k^h + L_k^g - 1)$ is inserted at the beginning of the time domain OFDM signal $\mathbf{s}$ to prevent intersymbol interference (ISI) at the destination $D$.

Signal processing at relays: The signal received at the $k$-th relay $r_k(n)$ is given by ($T_s$ is omitted from the time index for notational simplicity)

$$
r_k(n) = \sum_{l=0}^{L_k^h-1} h_k(n,l)\,s(n-l) + v_k(n), \tag{4}
$$

where $v_k(n)$ denotes additive white Gaussian noise (AWGN) with average power $\sigma_{v_k}^2$. Upon reception, the $k$-th relay $R_k$ simply amplifies the incoming signal $r_k(n)$. The transmitted signal from the $k$-th relay is thus written as [3]

$$
z_k(n) = \alpha_k r_k(n) \tag{5}
$$

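where

$$
\alpha_k = \sqrt{\frac{p_k}{\sigma_s^2 \sum_{l=0}^{L_k^h-1} \varrho_{l,k}^2 + \sigma_{v_k}^2}} \tag{6}
$$

with $p_k$ being the transmission power at the relay $R_k$, $\sigma_s^2$ the average power of $\mathbf{s}$, and $\varrho_{l,k}^2$ the average power of the $l$-th tap of the source-relay channel.

Received signal at the destination: The received signal at the destination is a superposition of the signals from $K$ relays and is given by

$$
y(n) = \sum_{k=1}^{K} \sum_{l=0}^{L_k^g-1} g_k(n,l)\,z_k(n-l) + w(n). \tag{7}
$$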
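The role of the amplification factor in (6) can be checked with a short simulation. The sketch below assumes NumPy and uses hypothetical values for the tap powers, noise power and relay power. For simplicity it draws the taps independently at every sample instead of using the Jakes correlation, which leaves the average power unchanged. It forms $r_k(n)$ as in (4), scales it as in (5), and measures the average relay output power, which should be close to $p_k$.

```python
import numpy as np

def af_gain(p_k, sigma_s2, tap_powers, sigma_v2):
    """Amplification factor alpha_k of (6)."""
    return np.sqrt(p_k / (sigma_s2 * np.sum(tap_powers) + sigma_v2))

def cgauss(rng, size, var):
    """Zero-mean circular complex Gaussian samples with variance var."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

rng = np.random.default_rng(0)
n, p_k, sigma_s2, sigma_v2 = 100_000, 2.0, 1.0, 0.1   # hypothetical values
tap_powers = np.array([0.6, 0.3, 0.1])                # assumed average tap powers

alpha_k = af_gain(p_k, sigma_s2, tap_powers, sigma_v2)

n_taps = len(tap_powers)
s = cgauss(rng, n + n_taps - 1, sigma_s2)   # transmitted samples, with history
r = cgauss(rng, n, sigma_v2)                # start from the relay noise v_k(n)
for l, power in enumerate(tap_powers):
    h_l = cgauss(rng, n, power)             # tap l, redrawn per sample (simplification)
    start = n_taps - 1 - l
    r = r + h_l * s[start:start + n]        # h_k(n, l) * s(n - l), as in (4)

z = alpha_k * r                             # relay output, as in (5)
measured_power = np.mean(np.abs(z) ** 2)
print(f"target p_k = {p_k}, measured = {measured_power:.3f}")
```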

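After removing the CP, the received signal vector $\mathbf{y} = [y(0),\cdots,y(N-1)]^T$ is

$$
\mathbf{y} = \sum_{k=1}^{K} \mathbf{G}_k\mathbf{z}_k + \mathbf{w}, \tag{8}
$$

where $\mathbf{z}_k = [z_k(-(L_k^g-1)),\cdots,z_k(0),\cdots,z_k(N-1)]^T$, $\mathbf{G}_k$ is an $N \times (N + L_k^g - 1)$ channel matrix given by

$$
\mathbf{G}_k = \begin{bmatrix}
g_k(0,L_k^g-1) & \cdots & g_k(0,0) & & & \\
 & g_k(1,L_k^g-1) & \cdots & g_k(1,0) & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & g_k(N-1,L_k^g-1) & \cdots & g_k(N-1,0)
\end{bmatrix} \tag{9}
$$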

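and $\mathbf{w}$ denotes an AWGN vector $[w(0),\cdots,w(N-1)]^T$ with average power $\sigma_w^2$. Furthermore, according to (4) and (5), the signal vector $\mathbf{z}_k$ can be compactly expressed as

$$
\mathbf{z}_k = \alpha_k\mathbf{H}_k\mathbf{I}_{L_k^c}\mathbf{s} + \alpha_k\mathbf{v}_k \tag{10}
$$

where $\mathbf{H}_k$ is an $(N + L_k^g - 1) \times (N + L_k^g + L_k^h - 2)$ channel matrix given by

$$
\mathbf{H}_k = \begin{bmatrix}
h_k(-(L_k^g-1),L_k^h-1) & \cdots & h_k(-(L_k^g-1),0) & & & \\
 & h_k(-(L_k^g-2),L_k^h-1) & \cdots & h_k(-(L_k^g-2),0) & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & h_k(N-1,L_k^h-1) & \cdots & h_k(N-1,0)
\end{bmatrix}, \tag{11}
$$

$\mathbf{I}_{L_k^c} = [\mathbf{E}_{L_k^c}, \mathbf{I}_N]^T$ characterizes the effect of the CP with $\mathbf{E}_{L_k^c}$ being the last $L_k^c \triangleq L_k^g + L_k^h - 2$ columns of the identity matrix, and $\mathbf{v}_k$ denotes an AWGN vector $[v_k(-(L_k^g-1)),\cdots,v_k(0),\cdots,v_k(N-1)]^T$. Based on (8), (10) and $\mathbf{s} = \mathbf{F}^H\mathbf{x}$, the received signal vector $\mathbf{y}$ is

$$
\mathbf{y} = \sum_{k=1}^{K} \alpha_k\mathbf{G}_k\mathbf{H}_k\mathbf{I}_{L_k^c}\mathbf{F}^H\mathbf{x} + \underbrace{\sum_{k=1}^{K} \alpha_k\mathbf{G}_k\mathbf{v}_k + \mathbf{w}}_{\triangleq\, \bar{\mathbf{w}}}. \tag{12}
$$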

III. REFORMULATION WITH THE BASIS EXPANSION MODEL

Channel state information is generally required for data detection. It is clear from (9) and (11) that the numbers of unknown channel parameters in channel matrices $\mathbf{G}_k$ and $\mathbf{H}_k$ are $NL_k^g$ and $(N + L_k^g - 1)L_k^h$ respectively, which are much larger than the number of received samples $N$. This makes direct channel estimation impossible. However, due to the fact that the time-varying channel is time-correlated, the GCE-BEM [6] can be adopted to represent the channels, and the number of unknown channel parameters can be significantly reduced. With GCE-BEM, the matrix $\mathbf{H}_k$ in (11) can be approximated by

$$
\mathbf{H}_k \cong \sum_{m=-\rho_h}^{\rho_h} \boldsymbol{\Phi}(m)\mathbf{H}_k^b(m) \tag{13}
$$

where $\rho_h = \lceil C_h N f_h T_s \rceil$ with $C_h$ being the oversampling factor in the Doppler domain,

$$
\boldsymbol{\Phi}(m) = \mathrm{diag}\left\{ e^{j2\pi m(-(L_k^g-1))/(C_hN)},\cdots,e^{j2\pi m(N-1)/(C_hN)} \right\} \tag{14}
$$

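and

$$
\mathbf{H}_k^b(m) = \begin{bmatrix}
h_k^b(m,L_k^h-1) & \cdots & h_k^b(m,0) & & & \\
 & h_k^b(m,L_k^h-1) & \cdots & h_k^b(m,0) & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & h_k^b(m,L_k^h-1) & \cdots & h_k^b(m,0)
\end{bmatrix} \tag{15}
$$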

with $h_k^b(m,l)$ denoting the BEM coefficient characterizing the source-relay channel $h_k(t,\tau)$. Here the oversampling factor $C_h$ is an integer adjusting the Doppler range sampled by the BEM and the number of basis vectors used to represent the time-varying channel. Similarly, the matrix $\mathbf{G}_k$ in (9) can be approximated by

$$
\mathbf{G}_k \cong \sum_{q=-\rho_g}^{\rho_g} \boldsymbol{\Theta}(q)\mathbf{G}_k^b(q) \tag{16}
$$

where $\rho_g = \lceil C_g N f_g T_s \rceil$ with $C_g$ (similar to $C_h$) being the corresponding oversampling factor,

$$
\boldsymbol{\Theta}(q) = \mathrm{diag}\left\{ e^{j2\pi q(0)/(C_gN)},\cdots,e^{j2\pi q(N-1)/(C_gN)} \right\} \tag{17}
$$

and

$$
\mathbf{G}_k^b(q) = \begin{bmatrix}
g_k^b(q,L_k^g-1) & \cdots & g_k^b(q,0) & & & \\
 & g_k^b(q,L_k^g-1) & \cdots & g_k^b(q,0) & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & g_k^b(q,L_k^g-1) & \cdots & g_k^b(q,0)
\end{bmatrix} \tag{18}
$$

with $g_k^b(q,l)$ denoting the BEM coefficient characterizing the relay-destination channel $g_k(t,\tau)$.
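To make the parameter reduction concrete, the sketch below (assuming NumPy; the block length, oversampling factor and normalized Doppler are hypothetical) computes the BEM order $\lceil CNf T_s \rceil$ and fits a single synthetic time-varying tap with the $2\rho+1$ complex exponential basis functions by least squares. The tap is a sum of random sinusoids within the Doppler range, used here only as a stand-in for a Jakes process, and the time index runs over $0,\ldots,N-1$ as in (17).

```python
import numpy as np

def bem_order(c, n, f_max, t_s):
    """BEM order rho = ceil(C * N * f_max * T_s)."""
    return int(np.ceil(c * n * f_max * t_s))

def gce_basis(n, c, rho):
    """N x (2*rho + 1) matrix whose columns are exp(j*2*pi*m*t/(C*N)), m = -rho..rho."""
    t = np.arange(n)[:, None]
    m = np.arange(-rho, rho + 1)[None, :]
    return np.exp(2j * np.pi * m * t / (c * n))

n, c_h = 128, 2              # hypothetical block length and oversampling factor
f_h_ts = 0.2 / n             # hypothetical normalized Doppler f_h * T_s
rho_h = bem_order(c_h, n, f_h_ts, 1.0)
basis = gce_basis(n, c_h, rho_h)

# Synthetic tap: random sinusoids with frequencies inside [-f_h, f_h].
rng = np.random.default_rng(1)
freqs = rng.uniform(-f_h_ts, f_h_ts, size=16)
amps = (rng.standard_normal(16) + 1j * rng.standard_normal(16)) / np.sqrt(32)
tap = np.exp(2j * np.pi * np.outer(np.arange(n), freqs)) @ amps

# Least-squares BEM coefficients: 2*rho_h + 1 unknowns instead of N samples.
coef, *_ = np.linalg.lstsq(basis, tap, rcond=None)
energy = np.sum(np.abs(tap) ** 2)
nmse_bem = np.sum(np.abs(basis @ coef - tap) ** 2) / energy
nmse_const = np.sum(np.abs(tap - tap.mean()) ** 2) / energy   # time-invariant fit
print(rho_h, basis.shape[1], nmse_bem, nmse_const)
```

Because the constant basis function ($m = 0$) is one of the BEM columns, the BEM fit can never be worse than the time-invariant fit; how much better it is depends on the Doppler spread and the oversampling factor.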

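Substituting (13) and (16) into (12), it follows that

$$
\mathbf{y} = \sum_{k=1}^{K} \sum_{q=-\rho_g}^{\rho_g} \sum_{m=-\rho_h}^{\rho_h} \alpha_k\boldsymbol{\Theta}(q)\mathbf{G}_k^b(q)\boldsymbol{\Phi}(m)\mathbf{H}_k^b(m)\mathbf{I}_{L_k^c}\mathbf{F}^H\mathbf{x} + \bar{\mathbf{w}}. \tag{19}
$$

Clearly from (19), $\mathbf{G}_k^b(q)$ and $\mathbf{H}_k^b(m)$ are unknown and must be estimated. Notice that, from the point of view of data detection, only the combined channels are needed. Therefore, in the following, by exploiting the Toeplitz structure of $\mathbf{G}_k^b(q)$ and $\mathbf{H}_k^b(m)$, it is proved in Appendix A that (19) can be equivalently written as

$$
\mathbf{y} = \sum_{q=-\rho_g}^{\rho_g} \sum_{m=-\rho_h}^{\rho_h} \boldsymbol{\Gamma}(q,m)\mathbf{F}^H\mathrm{diag}\{\mathbf{F}_{L_e}\mathbf{u}(q,m)\}\mathbf{x} + \bar{\mathbf{w}} \tag{20}
$$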

where $\boldsymbol{\Gamma}(q,m) = \mathrm{diag}\{e^{j2\pi(0)(\kappa m+\vartheta q)/(\kappa C_hN)},\cdots,e^{j2\pi(N-1)(\kappa m+\vartheta q)/(\kappa C_hN)}\}$ with $\kappa C_h = \vartheta C_g$ being the least common multiple of $C_h$ and $C_g$, $\mathbf{F}_{L_e}$ collects the first $L_e = \max_{k=1,\ldots,K} L_k^c + 1$ columns of $\mathbf{F}$, and

$$
\mathbf{u}(q,m) = \sum_{k=1}^{K} \alpha_k \Big[ \big[g_k^b(q,0),\cdots,g_k^b(q,L_k^g-1)e^{-j2\pi(L_k^g-1)/(C_hN)}\big] \otimes \big[h_k^b(m,0),\cdots,h_k^b(m,L_k^h-1)\big],\ \mathbf{0}_{1\times(L_e-L_k^c-1)} \Big]^T.
$$

From (20), it is clear that all the channel effects are summarized in $\mathbf{u}(q,m)$.

Notice that, from the definition of $\boldsymbol{\Gamma}(q,m)$, different combinations of $(q,m)$ may result in the same diagonal matrix $\boldsymbol{\Gamma}(q,m)$, i.e., $\boldsymbol{\Gamma}(q_1,m_1) = \boldsymbol{\Gamma}(q_2,m_2)$ when $\kappa m_1 + \vartheta q_1 = \kappa m_2 + \vartheta q_2$. Combining those terms in (20) with the same $\boldsymbol{\Gamma}(q,m)$ and denoting the number of distinct matrices $\boldsymbol{\Gamma}(q,m)$ as $N_b$, the received signal vector (20) can be rewritten as

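$$
\mathbf{y} = \sum_{j=1}^{N_b} \boldsymbol{\Gamma}(\gamma_j)\mathbf{F}^H\mathrm{diag}\{\mathbf{F}_{L_e}\mathbf{u}(\gamma_j)\}\mathbf{x} + \bar{\mathbf{w}} \tag{21}
$$

where $\boldsymbol{\Gamma}(\gamma_j)$ denotes a distinct diagonal matrix $\boldsymbol{\Gamma}(q,m)$ with $\kappa m + \vartheta q = \gamma_j$ and $\mathbf{u}(\gamma_j) = \sum_{\kappa m+\vartheta q=\gamma_j} \mathbf{u}(q,m)$. Our aim is to estimate the data $\mathbf{x}_d$ (contained in $\mathbf{x}$) based on (21), while $\mathbf{u}(\gamma_j)$ is also unknown and required to be estimated.

In particular, substituting $\mathbf{x} = \mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p$ into (21), we have

$$
\mathbf{y} = \underbrace{\sum_{j=1}^{N_b} \boldsymbol{\Gamma}(\gamma_j)\mathbf{F}^H\mathrm{diag}\{\mathbf{F}_{L_e}\mathbf{u}(\gamma_j)\}\mathbf{A}_{\mathcal{P}}}_{\triangleq\, \boldsymbol{\Xi}[\mathbf{u}]} \mathbf{x}_d + \underbrace{\sum_{j=1}^{N_b} \boldsymbol{\Gamma}(\gamma_j)\mathbf{F}^H\mathrm{diag}\{\mathbf{F}_{L_e}\mathbf{u}(\gamma_j)\}\mathbf{E}_{\mathcal{P}}\mathbf{x}_p}_{\triangleq\, \boldsymbol{\eta}[\mathbf{u}]} + \bar{\mathbf{w}} \tag{22}
$$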

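with $\mathbf{u} = [\mathbf{u}^T(\gamma_1),\cdots,\mathbf{u}^T(\gamma_{N_b})]^T$. This equation can be used to estimate the unknown data $\mathbf{x}_d$ if $\mathbf{u}(\gamma_j)$ has been estimated. On the other hand, for the convenience of estimating $\mathbf{u}(\gamma_j)$, it will be more efficient to put $\mathbf{u}(\gamma_j)$ in a linear relationship with $\mathbf{y}$. Based on (21), reversing the positions of $\mathbf{u}(\gamma_j)$ and $\mathbf{x}$ and writing the summation into matrix form gives

$$
\mathbf{y} = \mathbf{B}[\mathbf{x}]\mathbf{u} + \bar{\mathbf{w}} \tag{23}
$$

with $\mathbf{B}[\mathbf{x}] = [\boldsymbol{\Gamma}(\gamma_1)\mathbf{F}^H\mathrm{diag}\{\mathbf{x}\}\mathbf{F}_{L_e},\cdots,\boldsymbol{\Gamma}(\gamma_{N_b})\mathbf{F}^H\mathrm{diag}\{\mathbf{x}\}\mathbf{F}_{L_e}]$. The unknown $\mathbf{u}$ can be estimated by using only the subcarriers containing pilots, which is detailed in the next section.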
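The step from (21) to (23) rests on the identity $\mathrm{diag}\{\mathbf{a}\}\mathbf{x} = \mathrm{diag}\{\mathbf{x}\}\mathbf{a}$. The following sketch, assuming NumPy and small hypothetical sizes ($N = 16$, $L_e = 3$, $C_h = C_g = 2$, $\rho_h = \rho_g = 1$), enumerates the distinct values $\gamma_j = \kappa m + \vartheta q$, builds $\mathbf{B}[\mathbf{x}]$, and checks numerically that the noiseless parts of (21) and (23) coincide. The function names are illustrative only.

```python
import numpy as np

n, l_e = 16, 3                         # toy sizes (assumptions)
c_h, c_g, rho_h, rho_g = 2, 2, 1, 1    # hypothetical BEM settings

lcm = int(np.lcm(c_h, c_g))            # kappa * C_h = theta * C_g
kappa, theta = lcm // c_h, lcm // c_g

# Distinct values of gamma = kappa*m + theta*q; here 9 (q, m) pairs give 5 values.
gammas = sorted({kappa * m + theta * q
                 for m in range(-rho_h, rho_h + 1)
                 for q in range(-rho_g, rho_g + 1)})
n_b = len(gammas)

idx = np.arange(n)
fft_mat = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
f_le = fft_mat[:, :l_e]                # first L_e columns of F

def gamma_mat(g):
    """Diagonal matrix Gamma(gamma_j) used in (21)."""
    return np.diag(np.exp(2j * np.pi * idx * g / (lcm * n)))

def b_of(x):
    """B[x] = [Gamma(g_1) F^H diag{x} F_Le, ..., Gamma(g_Nb) F^H diag{x} F_Le]."""
    blocks = [gamma_mat(g) @ fft_mat.conj().T @ np.diag(x) @ f_le for g in gammas]
    return np.hstack(blocks)

rng = np.random.default_rng(3)
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = rng.standard_normal(n_b * l_e) + 1j * rng.standard_normal(n_b * l_e)

y_from_23 = b_of(x) @ u
y_from_21 = sum(
    gamma_mat(g) @ fft_mat.conj().T @ np.diag(f_le @ u[j * l_e:(j + 1) * l_e]) @ x
    for j, g in enumerate(gammas)
)
print(n_b, np.allclose(y_from_23, y_from_21))
```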

IV. CHANNEL ESTIMATION AND DATA DETECTION

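Based on (23) and $\mathbf{x} = \mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p$, taking the FFT and stacking all the received samples corresponding to the subcarrier set $\mathcal{P}$ (where PDDST is located) gives

$$
\begin{aligned}
\bar{\mathbf{y}}_p &= \mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u} + \mathbf{F}_{\mathcal{P}}\bar{\mathbf{w}} \\
&= \mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u} + \underbrace{\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{A}_{\mathcal{P}}\mathbf{x}_d]\mathbf{u} + \mathbf{F}_{\mathcal{P}}\bar{\mathbf{w}}}_{\triangleq\, \boldsymbol{\delta}}
\end{aligned} \tag{24}
$$

where $\mathbf{F}_{\mathcal{P}}$ collects rows of $\mathbf{F}$ corresponding to the subcarrier set $\mathcal{P}$.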

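According to the definition of the matrix operator $\mathbf{B}[\cdot]$ in (23), it is composed of a number of submatrices $\boldsymbol{\Gamma}(\gamma_j)\mathbf{F}^H\mathrm{diag}\{\mathbf{x}\}\mathbf{F}_{L_e}$ which differ only in $\boldsymbol{\Gamma}(\gamma_j)$. Since $\boldsymbol{\Gamma}(\gamma_j) = \mathrm{diag}\{e^{j2\pi(0)\gamma_j/(\kappa C_hN)},\cdots,e^{j2\pi(N-1)\gamma_j/(\kappa C_hN)}\}$ and $\gamma_j \ll \kappa C_hN$, the matrices $\boldsymbol{\Gamma}(\gamma_j)$ for different values of $j$ turn out to be quite similar to each other, which would result in columns of $\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]$ being similar and lead to the problem being ill-conditioned. Thus if LS were directly applied to estimate $\mathbf{u}$, the estimate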

would be far away from the true value. Moreover, unlike a rank deficient problem which can

be solved via truncated singular value decomposition with a determined numerical rank, for an

ill-conditioned matrix, the singular values decay gradually to zero with no significant gap, and

therefore, there is no notion of a numerical rank [15]. To deal with this ill-conditioning problem,

we hereby employ the Tikhonov regularization method [15] to estimate u in (24). By treating

δ as an effective noise, the estimation problem using LS with Tikhonov regularization can be

stated as

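$$
\min_{\mathbf{u}} \left[ (\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u} - \bar{\mathbf{y}}_p)^H(\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u} - \bar{\mathbf{y}}_p) + \lambda^2\mathbf{u}^H\mathbf{L}^H\mathbf{L}\mathbf{u} \right] \tag{25}
$$

where $\mathbf{L}$ denotes a regularization matrix chosen according to different criteria (e.g., an identity matrix for minimum energy, a banded matrix for maximal flatness), and $\lambda$ is the regularization parameter that balances the minimization of the two terms in (25).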

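For a given $\lambda$, based on the LS criterion, the solution for (25) is readily obtained as

$$
\mathbf{u}_\lambda = (\mathbf{B}^H[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{F}_{\mathcal{P}}^H\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p] + \lambda^2\mathbf{L}^H\mathbf{L})^{-1}\mathbf{B}^H[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{F}_{\mathcal{P}}^H\bar{\mathbf{y}}_p. \tag{26}
$$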

Notice that in (26), the value of λ could significantly affect the estimation performance by either

over-regularization or under-regularization. It is therefore crucial to choose an appropriate λ.

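An intuitive approach to choosing $\lambda$ has been developed based on the concept of the L-curve [15], which is a plot of the smoothing norm $\|\mathbf{L}\mathbf{u}_\lambda\|^2$ versus the corresponding residual norm $\|\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u}_\lambda - \bar{\mathbf{y}}_p\|^2$ for different $\lambda$. Following this approach, the regularization parameter $\lambda$ is chosen as the value corresponding to the point with the maximal curvature on the curve $(\log\|\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u}_\lambda - \bar{\mathbf{y}}_p\|, \log\|\mathbf{L}\mathbf{u}_\lambda\|)$ [15].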
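The closed form (26) can be sketched in a few lines. The example below assumes NumPy and replaces $\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]$ with a generic matrix `a` whose columns are nearly collinear, which mimics the ill-conditioning described above. The regularization matrix defaults to the identity and $\lambda$ is fixed by hand rather than selected via the L-curve; both are simplifying assumptions.

```python
import numpy as np

def tikhonov_ls(a, y, lam, reg=None):
    """Minimize ||a u - y||^2 + lam^2 ||reg u||^2, cf. (25) and (26)."""
    if reg is None:
        reg = np.eye(a.shape[1])             # identity: minimum-energy criterion
    lhs = a.conj().T @ a + lam ** 2 * (reg.conj().T @ reg)
    return np.linalg.solve(lhs, a.conj().T @ y)

def crandn(rng, *shape):
    """Complex Gaussian samples with independent real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(2)

# Ill-conditioned stand-in: six nearly identical columns.
a = crandn(rng, 40, 1) + 1e-4 * crandn(rng, 40, 6)
u_true = crandn(rng, 6)
y = a @ u_true + 0.05 * crandn(rng, 40)

u_ls = tikhonov_ls(a, y, lam=0.0)     # plain LS: noise is strongly amplified
u_reg = tikhonov_ls(a, y, lam=0.1)    # regularized: bounded solution norm

print(np.linalg.norm(u_ls - u_true), np.linalg.norm(u_reg - u_true))
```

In this toy setup the unregularized estimate is dominated by amplified noise, whereas the regularized one stays bounded at the cost of some bias; a larger `lam` trades residual fit for a smaller solution norm.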

After the estimate of $\mathbf{u}$ is obtained, data can be detected based on the system model (22). Using the assumptions that the noise is AWGN and independent from the channels, and the channel responses are independent from tap to tap, it is easy to verify that

$$
\begin{aligned}
E\{\bar{\mathbf{w}}\bar{\mathbf{w}}^H\} &= E\Big\{\Big(\sum_{k=1}^{K}\alpha_k\mathbf{G}_k\mathbf{v}_k + \mathbf{w}\Big)\Big(\sum_{k=1}^{K}\alpha_k\mathbf{G}_k\mathbf{v}_k + \mathbf{w}\Big)^H\Big\} \\
&= E\Big\{\sum_{k=1}^{K}\alpha_k^2\sigma_{v_k}^2\mathbf{G}_k\mathbf{G}_k^H + \sigma_w^2\mathbf{I}_N\Big\} = \Big(\sum_{k=1}^{K}\alpha_k^2\sigma_{v_k}^2\sum_{l=0}^{L_k^g-1}\varsigma_{l,k}^2 + \sigma_w^2\Big)\mathbf{I}_N \triangleq \sigma^2\mathbf{I}_N.
\end{aligned} \tag{27}
$$

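Based on (22) and (26) and treating $\bar{\mathbf{w}}$ as the effective noise [1], [3], the approximate LMMSE solution of $\mathbf{x}_d$ is then given by

$$
\tilde{\mathbf{x}}_d = \Big(\boldsymbol{\Xi}^H[\mathbf{u}_\lambda]\boldsymbol{\Xi}[\mathbf{u}_\lambda] + \frac{\sigma^2}{\sigma_x^2}\mathbf{I}_N\Big)^{-1}\boldsymbol{\Xi}^H[\mathbf{u}_\lambda](\mathbf{y} - \boldsymbol{\eta}[\mathbf{u}_\lambda]). \tag{28}
$$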

Finally, ˆ xd= Qant[˜ xd] is a data estimator, where Qant[·] denotes the hard decision of the operand

in the brackets.
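A minimal sketch of the detector (28) followed by a hard decision is given below, assuming NumPy. The matrix `xi_mat` and vector `eta` stand in for $\boldsymbol{\Xi}[\mathbf{u}_\lambda]$ and $\boldsymbol{\eta}[\mathbf{u}_\lambda]$ and are generated randomly (an identity matrix plus weak off-diagonal terms that imitate mild ICI), QPSK is assumed for the data, and the function names are hypothetical.

```python
import numpy as np

def lmmse_detect(xi_mat, eta, y, sigma2, sigma_x2):
    """Soft estimate (Xi^H Xi + sigma^2/sigma_x^2 I)^-1 Xi^H (y - eta), cf. (28)."""
    n_d = xi_mat.shape[1]
    gram = xi_mat.conj().T @ xi_mat + (sigma2 / sigma_x2) * np.eye(n_d)
    return np.linalg.solve(gram, xi_mat.conj().T @ (y - eta))

def qpsk_hard_decision(x_soft):
    """Map each soft value to the nearest unit-power QPSK point."""
    return (np.sign(x_soft.real) + 1j * np.sign(x_soft.imag)) / np.sqrt(2)

rng = np.random.default_rng(5)
n_d, sigma2, sigma_x2 = 32, 1e-3, 1.0      # hypothetical sizes and powers

# Stand-ins for Xi[u] and eta[u]: identity plus weak ICI, random pilot term.
ici = 0.02 * (rng.standard_normal((n_d, n_d)) + 1j * rng.standard_normal((n_d, n_d)))
xi_mat = np.eye(n_d) + ici
eta = rng.standard_normal(n_d) + 1j * rng.standard_normal(n_d)

x_d = (rng.choice([-1.0, 1.0], size=n_d) + 1j * rng.choice([-1.0, 1.0], size=n_d)) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n_d) + 1j * rng.standard_normal(n_d))
y = xi_mat @ x_d + eta + noise

x_soft = lmmse_detect(xi_mat, eta, y, sigma2, sigma_x2)
x_hat = qpsk_hard_decision(x_soft)
print(np.count_nonzero(~np.isclose(x_hat, x_d)), "symbol errors")
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, although the cubic cost noted in Remark 1 is unchanged.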

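Remark 1: The complexity of the LMMSE scheme is dominated by one $N_bL_e \times N_bL_e$ matrix inversion on $\mathbf{B}^H[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{F}_{\mathcal{P}}^H\mathbf{F}_{\mathcal{P}}\mathbf{B}[\mathbf{E}_{\mathcal{P}}\mathbf{x}_p] + \lambda^2\mathbf{L}^H\mathbf{L}$ shown in (26) and one $N_d \times N_d$ matrix inversion on $\boldsymbol{\Xi}^H[\mathbf{u}_\lambda]\boldsymbol{\Xi}[\mathbf{u}_\lambda] + \frac{\sigma^2}{\sigma_x^2}\mathbf{I}_{N_d}$ shown in (28). For an $M \times M$ matrix, the complexity of inversion is $O(M^3)$. Therefore, the complexity of the LMMSE scheme is $O((N_bL_e)^3 + N_d^3)$.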

V. THE VARIATIONAL INFERENCE APPROACH TO ITERATIVE CHANNEL ESTIMATION AND DATA DETECTION

In the previous section, the unknown data is treated as noise (refer to (24)) and it becomes a bottleneck for channel estimation, which in turn affects the data detection performance. In this section, $\mathbf{u}$ and $\mathbf{x}_d$ will be jointly estimated to improve performance. Moreover, unlike the Tikhonov regularization method discussed above, we solve the ill-posed problem in a Bayesian framework. Specifically, our aim is to find the $\mathbf{u}$ and $\mathbf{x}_d$ that maximize the posterior probability density function (pdf) $p(\mathbf{u},\mathbf{x}_d|\mathbf{y})$. However, the computation of $p(\mathbf{u},\mathbf{x}_d|\mathbf{y})$ is complicated due to the discrete nature of the data, not to mention the maximization of $p(\mathbf{u},\mathbf{x}_d|\mathbf{y})$ with respect to $\mathbf{x}_d$. To overcome this problem, we consider the variational inference approach, which looks for a parameterized distribution $Q(\mathbf{u},\mathbf{x}_d)$ to closely represent the posterior pdf $p(\mathbf{u},\mathbf{x}_d|\mathbf{y})$ [17]. Once $Q(\mathbf{u},\mathbf{x}_d)$ is found, estimates of $\mathbf{u}$ and $\mathbf{x}_d$ are simply obtained by maximizing $Q(\mathbf{u},\mathbf{x}_d)$.

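To derive $Q(\mathbf{u},\mathbf{x}_d)$ closest to $p(\mathbf{u},\mathbf{x}_d|\mathbf{y})$, we minimize the following free energy function defined as in [16]:

$$
F = \int_{\mathbf{u},\mathbf{x}_d} Q(\mathbf{u},\mathbf{x}_d)\log\frac{Q(\mathbf{u},\mathbf{x}_d)}{p(\mathbf{u},\mathbf{x}_d,\mathbf{y})}\,d\mathbf{u}\,d\mathbf{x}_d. \tag{29}
$$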
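The link between the free energy (29) and the posterior can be seen in a small discrete analogue, where the integral becomes a sum over a handful of states. The sketch below assumes NumPy and an arbitrary, hypothetical joint probability table. It checks numerically that the free energy equals the Kullback-Leibler divergence between $Q$ and the posterior minus the log evidence, so that $F$ is smallest when $Q$ matches the posterior.

```python
import numpy as np

# Hypothetical joint p(theta, y = y_obs) over four discrete states of theta.
p_joint = np.array([0.10, 0.05, 0.20, 0.05])
p_y = p_joint.sum()                  # evidence p(y_obs)
p_post = p_joint / p_y               # posterior p(theta | y_obs)

def free_energy(q, p_joint):
    """Discrete counterpart of (29): sum_theta q * log(q / p(theta, y))."""
    return np.sum(q * np.log(q / p_joint))

def kl_divergence(q, p):
    """Kullback-Leibler divergence KL(q || p) for strictly positive tables."""
    return np.sum(q * np.log(q / p))

q_uniform = np.full(4, 0.25)
f_uniform = free_energy(q_uniform, p_joint)
f_post = free_energy(p_post, p_joint)

# F(Q) = KL(Q || posterior) - log p(y), hence F(Q) >= -log p(y).
print(f_uniform, kl_divergence(q_uniform, p_post) - np.log(p_y), f_post)
```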

Minimizing the free energy function is equivalent to minimizing the difference between Q(u,xd)

and p(u,xd|y). A simplification can be made by factorizing Q(u,xd) into a product form (also

known as a mean-field approximation) [18], i.e., Q(u,xd) = Q1(u)Q2(xd), which is equivalent

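to assuming that $\mathbf{u}$ and $\mathbf{x}_d$ are independent conditioned on $\mathbf{y}$. Then a simple expression for the variational free energy in (29) is given by

$$
\begin{aligned}
F &= \int_{\mathbf{u},\mathbf{x}_d} Q(\mathbf{u},\mathbf{x}_d)\log\frac{Q(\mathbf{u},\mathbf{x}_d)}{p(\mathbf{y}|\mathbf{u},\mathbf{x}_d)p(\mathbf{u})p(\mathbf{x}_d)}\,d\mathbf{u}\,d\mathbf{x}_d \\
&= \int_{\mathbf{u}} Q_1(\mathbf{u})\log Q_1(\mathbf{u})\,d\mathbf{u} + \int_{\mathbf{x}_d} Q_2(\mathbf{x}_d)\log Q_2(\mathbf{x}_d)\,d\mathbf{x}_d \\
&\quad - \int_{\mathbf{u}} Q_1(\mathbf{u})\log p(\mathbf{u})\,d\mathbf{u} - \int_{\mathbf{x}_d} Q_2(\mathbf{x}_d)\log p(\mathbf{x}_d)\,d\mathbf{x}_d \\
&\quad - \int_{\mathbf{u},\mathbf{x}_d} Q_1(\mathbf{u})Q_2(\mathbf{x}_d)\log p(\mathbf{y}|\mathbf{u},\mathbf{x}_d)\,d\mathbf{u}\,d\mathbf{x}_d.
\end{aligned} \tag{30}
$$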

A. Free Energy Function

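According to (30), the computation of the free energy function requires the likelihood function $p(\mathbf{y}|\mathbf{u},\mathbf{x}_d)$ and the prior statistics $p(\mathbf{u})$ and $p(\mathbf{x}_d)$. With $\mathbf{x} = \mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p$ and based on (23), the likelihood function $p(\mathbf{y}|\mathbf{u},\mathbf{x}_d)$ is then given by

$$
p(\mathbf{y}|\mathbf{u},\mathbf{x}_d) = \frac{1}{(\pi\sigma^2)^N}\exp\Big\{-\frac{1}{\sigma^2}\|\mathbf{y} - \mathbf{B}[\mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u}\|^2\Big\}. \tag{31}
$$

With respect to the prior statistics, we let $p(\mathbf{x}_d)$ be complex Gaussian with zero mean and covariance matrix $\boldsymbol{\Lambda}_{x_d}$, i.e., $p(\mathbf{x}_d) = \mathcal{N}(\mathbf{0},\boldsymbol{\Lambda}_{x_d})$ [17], where $\boldsymbol{\Lambda}_{x_d}$ is a diagonal matrix with diagonal elements depending on the average power of $\mathbf{x}_d$. Note that instead of defining a discrete distribution over the signal constellation, we have made a Gaussian approximation of $p(\mathbf{x}_d)$, which will lead to a linear detector. On the other hand, the statistics of $\mathbf{u}$ are difficult if not impossible to derive. However, it can be shown that $E\{\mathbf{u}\} = \mathbf{0}$ and the covariance matrix $\mathbf{R}_u$ can be obtained in closed-form (shown in Appendix B). Therefore, $\mathbf{u}$ can be approximated as being Gaussian distributed with pdf

$$
p(\mathbf{u}) = \frac{1}{\pi^{N_bL_e}|\mathbf{R}_u|}\exp\{-\mathbf{u}^H\mathbf{R}_u^{-1}\mathbf{u}\}. \tag{32}
$$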

Besides specifying the likelihood function and prior statistics above, the forms of $Q_2(\mathbf{x}_d)$ and $Q_1(\mathbf{u})$ need to be fixed. In view of the discrete nature of the data, a close approximation is [17]

$$
Q_2(\mathbf{x}_d) = \delta(\bar{\mathbf{x}}_d - \mathbf{x}_d), \tag{33}
$$

where $\delta(\cdot)$ denotes a vector Dirac delta function with the properties $\int\delta(\bar{\mathbf{x}}_d - \mathbf{x}_d)\,d\mathbf{x}_d = 1$ and $\int\delta(\bar{\mathbf{x}}_d - \mathbf{x}_d)f(\mathbf{x}_d)\,d\mathbf{x}_d = f(\bar{\mathbf{x}}_d)$ for any smooth function $f(\cdot)$. For convenience in maximization, $Q_1(\mathbf{u})$ is chosen to be a Gaussian pdf

$$
Q_1(\mathbf{u}) = \frac{1}{\pi^{N_bL_e}|\boldsymbol{\Psi}_u|}\exp\{-(\mathbf{u} - \mathbf{m}_u)^H\boldsymbol{\Psi}_u^{-1}(\mathbf{u} - \mathbf{m}_u)\} \tag{34}
$$

with $\mathbf{m}_u$ and $\boldsymbol{\Psi}_u$ being the posterior mean and covariance matrix of $\mathbf{u}$ respectively.

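With (31), (32), (33), (34) and $p(\mathbf{x}_d) = \mathcal{N}(\mathbf{0},\boldsymbol{\Lambda}_{x_d})$, the five terms in (30) can be respectively computed as

$$
\begin{aligned}
\int_{\mathbf{u}} Q_1(\mathbf{u})\log Q_1(\mathbf{u})\,d\mathbf{u} &= -N_bL_e\log\pi - \log|\boldsymbol{\Psi}_u| - \mathbf{m}_u^H\boldsymbol{\Psi}_u^{-1}\mathbf{m}_u + 2\Re\{\mathbf{m}_u^H\boldsymbol{\Psi}_u^{-1}\mathbf{m}_u\} - \mathrm{Tr}\{\boldsymbol{\Psi}_u^{-1}(\mathbf{m}_u\mathbf{m}_u^H + \boldsymbol{\Psi}_u)\} \\
&= -N_bL_e\log\pi - \log|\boldsymbol{\Psi}_u| - N_bL_e,
\end{aligned} \tag{35}
$$

$$
\int_{\mathbf{x}_d} Q_2(\mathbf{x}_d)\log Q_2(\mathbf{x}_d)\,d\mathbf{x}_d = 0, \tag{36}
$$

$$
\int_{\mathbf{u}} Q_1(\mathbf{u})\log p(\mathbf{u})\,d\mathbf{u} = -N_bL_e\log\pi - \log|\mathbf{R}_u| - \mathrm{Tr}\{\mathbf{R}_u^{-1}(\mathbf{m}_u\mathbf{m}_u^H + \boldsymbol{\Psi}_u)\}, \tag{37}
$$

$$
\int_{\mathbf{x}_d} Q_2(\mathbf{x}_d)\log p(\mathbf{x}_d)\,d\mathbf{x}_d = -N_d\log\pi - \log|\boldsymbol{\Lambda}_{x_d}| - \bar{\mathbf{x}}_d^H\boldsymbol{\Lambda}_{x_d}^{-1}\bar{\mathbf{x}}_d \tag{38}
$$

and

$$
\begin{aligned}
\int_{\mathbf{u},\mathbf{x}_d} Q_1(\mathbf{u})Q_2(\mathbf{x}_d)\log p(\mathbf{y}|\mathbf{u},\mathbf{x}_d)\,d\mathbf{u}\,d\mathbf{x}_d = &-N\log(\pi\sigma^2) - \frac{1}{\sigma^2}\Big[\mathbf{y}^H\mathbf{y} - 2\Re\{\mathbf{y}^H\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{m}_u\} \\
&+ \mathrm{Tr}\{\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p](\mathbf{m}_u\mathbf{m}_u^H + \boldsymbol{\Psi}_u)\}\Big].
\end{aligned} \tag{39}
$$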

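Substituting (35), (36), (37), (38) and (39) into (30) and dropping constant terms, we have

$$
\begin{aligned}
F(\mathbf{m}_u,\boldsymbol{\Psi}_u,\bar{\mathbf{x}}_d) \propto &-\log|\boldsymbol{\Psi}_u| + \mathrm{Tr}\{\mathbf{R}_u^{-1}(\mathbf{m}_u\mathbf{m}_u^H + \boldsymbol{\Psi}_u)\} + \bar{\mathbf{x}}_d^H\boldsymbol{\Lambda}_{x_d}^{-1}\bar{\mathbf{x}}_d \\
&+ \frac{1}{\sigma^2}\Big[\mathbf{y}^H\mathbf{y} - 2\Re\{\mathbf{y}^H\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{m}_u\} \\
&+ \mathrm{Tr}\{\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p](\mathbf{m}_u\mathbf{m}_u^H + \boldsymbol{\Psi}_u)\}\Big].
\end{aligned} \tag{40}
$$

Notice that, after integration, the free energy function depends only on $\mathbf{m}_u$, $\boldsymbol{\Psi}_u$ and $\bar{\mathbf{x}}_d$.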

B. Iterative minimization of the free energy function

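The remaining task is to obtain $(\hat{\mathbf{m}}_u,\hat{\boldsymbol{\Psi}}_u,\hat{\mathbf{x}}_d)$ by minimizing $F(\mathbf{m}_u,\boldsymbol{\Psi}_u,\bar{\mathbf{x}}_d)$. After that, a data estimate can be obtained by maximizing $Q_2(\mathbf{x}_d)$ given $\hat{\mathbf{x}}_d$. Accordingly, $\hat{\mathbf{x}}_d$ is an estimate of $\mathbf{x}_d$. Similarly, a channel estimate can be acquired by maximizing $Q_1(\mathbf{u})$ given $\hat{\mathbf{m}}_u$ and $\hat{\boldsymbol{\Psi}}_u$. Notice that, since $Q_1(\mathbf{u})$ is designed to be a complex Gaussian pdf, it is maximized at $\mathbf{u} = \hat{\mathbf{m}}_u$, which can be considered as a maximum a posteriori probability (MAP) channel estimator. For minimization of the free energy given in (40) with respect to $(\mathbf{m}_u,\boldsymbol{\Psi}_u,\bar{\mathbf{x}}_d)$, it is found that, given $\bar{\mathbf{x}}_d$, there exist closed-form solutions for $\mathbf{m}_u$ and $\boldsymbol{\Psi}_u$. On the other hand, given $\mathbf{m}_u$ and $\boldsymbol{\Psi}_u$, we can derive a closed-form solution for $\bar{\mathbf{x}}_d$. Therefore, $F(\mathbf{m}_u,\boldsymbol{\Psi}_u,\bar{\mathbf{x}}_d)$ is minimized iteratively, starting with an initial value of $(\mathbf{m}_u,\boldsymbol{\Psi}_u,\bar{\mathbf{x}}_d)$. The update at the $i$-th iteration is as follows: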

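Updating $\mathbf{m}_u$ and $\boldsymbol{\Psi}_u$ given $\hat{\mathbf{x}}_d^{i-1}$

By setting the first order derivative of the free energy (40) with respect to $\mathbf{m}_u$ to zero, we have the estimate of $\mathbf{m}_u$ as

$$
\hat{\mathbf{m}}_u^i = \big(\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p] + \sigma^2\mathbf{R}_u^{-1}\big)^{-1}\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{y}. \tag{41}
$$

Similarly, by setting the first order derivative of the free energy with respect to $\boldsymbol{\Psi}_u$ to zero, we have the estimate of $\boldsymbol{\Psi}_u$ as

$$
\hat{\boldsymbol{\Psi}}_u^i = \sigma^2\big(\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p] + \sigma^2\mathbf{R}_u^{-1}\big)^{-1}. \tag{42}
$$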
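The channel update step of (41) and (42) can be sketched as follows, assuming NumPy. Here `b_mat` stands for $\mathbf{B}[\mathbf{A}_{\mathcal{P}}\hat{\mathbf{x}}_d^{i-1} + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]$ and is replaced by a random matrix, the prior covariance is a hypothetical diagonal matrix, and the function name is illustrative. The two closed forms share the same inverse, so it is computed once.

```python
import numpy as np

def update_channel_posterior(b_mat, y, sigma2, r_u_inv):
    """One pass of (41) and (42): posterior mean and covariance of u."""
    gram_inv = np.linalg.inv(b_mat.conj().T @ b_mat + sigma2 * r_u_inv)
    m_u = gram_inv @ (b_mat.conj().T @ y)     # (41)
    psi_u = sigma2 * gram_inv                 # (42)
    return m_u, psi_u

rng = np.random.default_rng(4)
n, dim, sigma2 = 24, 6, 0.01                  # hypothetical sizes and noise power

b_mat = rng.standard_normal((n, dim)) + 1j * rng.standard_normal((n, dim))
r_u_inv = np.linalg.inv(np.diag(rng.uniform(0.5, 1.5, dim)))   # assumed prior R_u
u_true = (rng.standard_normal(dim) + 1j * rng.standard_normal(dim)) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = b_mat @ u_true + noise

m_u, psi_u = update_channel_posterior(b_mat, y, sigma2, r_u_inv)
print(np.linalg.norm(m_u - u_true))
```

The result coincides with the posterior of a linear Gaussian model with prior covariance $\mathbf{R}_u$, whose inverse covariance is $\mathbf{R}_u^{-1} + \mathbf{B}^H\mathbf{B}/\sigma^2$.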

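Updating $\bar{\mathbf{x}}_d$ given $\hat{\mathbf{m}}_u^i$ and $\hat{\boldsymbol{\Psi}}_u^i$

Notice that the free energy given by (40) depends on $\bar{\mathbf{x}}_d$ in a non-linear way. To obtain a closed-form solution for $\bar{\mathbf{x}}_d$, we first transform (40) into a linear function of $\bar{\mathbf{x}}_d$. Given the eigen-decomposition of $\hat{\boldsymbol{\Psi}}_u^i = \sum_{j=1}^{N_b}\beta_j^i\boldsymbol{\psi}_j^i(\boldsymbol{\psi}_j^i)^H$, we have

$$
\mathrm{Tr}\{\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\hat{\boldsymbol{\Psi}}_u^i\} = \sum_{j=1}^{N_b}\beta_j^i(\boldsymbol{\psi}_j^i)^H\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\boldsymbol{\psi}_j^i. \tag{43}
$$

Putting (43) into (40), minimizing the free energy given by (40) with respect to $\bar{\mathbf{x}}_d$ is equivalent to minimizing

$$
\begin{aligned}
\tilde{F}(\bar{\mathbf{x}}_d) \propto\ &\bar{\mathbf{x}}_d^H\boldsymbol{\Lambda}_{x_d}^{-1}\bar{\mathbf{x}}_d + \frac{1}{\sigma^2}\Big[-2\Re\{\mathbf{y}^H\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\hat{\mathbf{m}}_u^i\} \\
&+ (\hat{\mathbf{m}}_u^i)^H\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\hat{\mathbf{m}}_u^i \\
&+ \sum_{j=1}^{N_b}\beta_j^i(\boldsymbol{\psi}_j^i)^H\mathbf{B}^H[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{B}[\mathbf{A}_{\mathcal{P}}\bar{\mathbf{x}}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\boldsymbol{\psi}_j^i\Big].
\end{aligned} \tag{44}
$$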

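Due to the fact that the right hand side of (22) is equivalent to that of (23), we have $\boldsymbol{\Xi}[\mathbf{u}]\mathbf{x}_d + \boldsymbol{\eta}[\mathbf{u}] = \mathbf{B}[\mathbf{A}_{\mathcal{P}}\mathbf{x}_d + \mathbf{E}_{\mathcal{P}}\mathbf{x}_p]\mathbf{u}$, which still holds when $\mathbf{x}_d$ is replaced by $\bar{\mathbf{x}}_d$ and $\mathbf{u}$ is replaced by any vector having compatible dimensions with $\boldsymbol{\Xi}[\cdot]$, $\boldsymbol{\eta}[\cdot]$ and $\mathbf{B}[\cdot]$. Therefore, (44) can be rewritten as

$$
\begin{aligned}
\tilde{F}(\bar{\mathbf{x}}_d) \propto\ &\bar{\mathbf{x}}_d^H\boldsymbol{\Lambda}_{x_d}^{-1}\bar{\mathbf{x}}_d - \frac{1}{\sigma^2}\Big[2\Re\{\mathbf{y}^H(\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i]\bar{\mathbf{x}}_d + \boldsymbol{\eta}[\hat{\mathbf{m}}_u^i])\} \\
&- (\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i]\bar{\mathbf{x}}_d + \boldsymbol{\eta}[\hat{\mathbf{m}}_u^i])^H(\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i]\bar{\mathbf{x}}_d + \boldsymbol{\eta}[\hat{\mathbf{m}}_u^i]) \\
&- \sum_{j=1}^{N_b}\beta_j^i(\boldsymbol{\Xi}[\boldsymbol{\psi}_j^i]\bar{\mathbf{x}}_d + \boldsymbol{\eta}[\boldsymbol{\psi}_j^i])^H(\boldsymbol{\Xi}[\boldsymbol{\psi}_j^i]\bar{\mathbf{x}}_d + \boldsymbol{\eta}[\boldsymbol{\psi}_j^i])\Big] \\
\propto\ &-2\Re\Big\{\Big(\mathbf{y}^H\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i] - \boldsymbol{\eta}^H[\hat{\mathbf{m}}_u^i]\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i] - \sum_{j=1}^{N_b}\beta_j^i\boldsymbol{\eta}^H[\boldsymbol{\psi}_j^i]\boldsymbol{\Xi}[\boldsymbol{\psi}_j^i]\Big)\bar{\mathbf{x}}_d\Big\} \\
&+ \bar{\mathbf{x}}_d^H\Big(\boldsymbol{\Xi}^H[\hat{\mathbf{m}}_u^i]\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i] + \sum_{j=1}^{N_b}\beta_j^i\boldsymbol{\Xi}^H[\boldsymbol{\psi}_j^i]\boldsymbol{\Xi}[\boldsymbol{\psi}_j^i] + \sigma^2\boldsymbol{\Lambda}_{x_d}^{-1}\Big)\bar{\mathbf{x}}_d.
\end{aligned} \tag{45}
$$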

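Although $\tilde{F}(\bar{\mathbf{x}}_d)$ in (45) is a quadratic form of $\bar{\mathbf{x}}_d$, strictly speaking, minimizing $\tilde{F}(\bar{\mathbf{x}}_d)$ with respect to $\bar{\mathbf{x}}_d$ is still a multidimensional search problem due to the discrete nature of $\bar{\mathbf{x}}_d$. To overcome this problem, we first relax $\bar{\mathbf{x}}_d$ to be continuous, which leads to a low-complexity linear solution. By setting the first order derivative of $\tilde{F}(\bar{\mathbf{x}}_d)$ with respect to $\bar{\mathbf{x}}_d$ to zero, we have

$$
\begin{aligned}
\tilde{\mathbf{x}}_d^i = &\Big(\boldsymbol{\Xi}^H[\hat{\mathbf{m}}_u^i]\boldsymbol{\Xi}[\hat{\mathbf{m}}_u^i] + \sum_{j=1}^{N_b}\beta_j^i\boldsymbol{\Xi}^H[\boldsymbol{\psi}_j^i]\boldsymbol{\Xi}[\boldsymbol{\psi}_j^i] + \sigma^2\boldsymbol{\Lambda}_{x_d}^{-1}\Big)^{-1} \\
&\times \Big(\boldsymbol{\Xi}^H[\hat{\mathbf{m}}_u^i]\mathbf{y} - \boldsymbol{\Xi}^H[\hat{\mathbf{m}}_u^i]\boldsymbol{\eta}[\hat{\mathbf{m}}_u^i] - \sum_{j=1}^{N_b}\beta_j^i\boldsymbol{\Xi}^H[\boldsymbol{\psi}_j^i]\boldsymbol{\eta}[\boldsymbol{\psi}_j^i]\Big).
\end{aligned} \tag{46}
$$

Then constellation mapping is carried out to obtain the estimate of $\bar{\mathbf{x}}_d$ as $\hat{\mathbf{x}}_d^i = Q_{ant}[\tilde{\mathbf{x}}_d^i]$.

In summary, the proposed iterative algorithm alternates among (41), (42) and (46) until $\frac{F^i - F^{i-1}}{F^i}$ is smaller than a threshold $\epsilon$. Within the framework of the variational inference approach, the proposed algorithm will approach an approximation of the desired posterior pdf. In the case that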
