
Error-Entropy based Channel State Estimation of Spatially Correlated MIMO-OFDM

H.D. Tuan¹, H.H. Kha¹ and H.H. Nguyen²

¹University of New South Wales, Sydney, Australia. Email: {h.d.tuan,h.k.ha}@unsw.edu.au

²University of Saskatchewan, Saskatoon, Canada. Email: ha.nguyen@usask.ca.

Abstract—This paper deals with optimized training sequences for estimating multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) channel states in the presence of spatial fading correlations. The optimization criterion is the minimization of the entropy of the error between the high-dimensional, correlated channel state and its estimator. The globally optimal training sequences are obtained exactly by solving a semi-definite program (SDP) of tractable computational complexity $O((M_t(M_t+1)/2)^{2.5})$, where $M_t$ is the number of transmit antennas. With new tight two-sided bounds for the objective function, the optimal value of the generic SDP can be approximately computed by the standard water-filling algorithm. Extensive simulation results are provided to illustrate the performance of our methods.

Index terms. Error-entropy, training sequences, spatial correlation, MIMO, OFDM, convex programming.

I. INTRODUCTION

Accurate channel state estimation (CSE) at both the receiver and the transmitter sides of mobile wireless multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems is the basis for greatly increasing communication capacity with space-time techniques such as precoding and beamforming. In fact, capacity gains are obtained when CSE is available not only at the receiver but also at the transmitter, and these gains clearly depend on the estimation quality [1]. For broadband wireless systems, which experience frequency-selective fading, OFDM turns the channel into parallel flat-fading sub-channels. However, it is not efficient to estimate these flat-fading sub-channels separately because (i) they are not independent but rather correlated in time and space due to inadequate antenna spacing or scattering, and (ii) their total dimension is much larger than the dimension of the underlying channel. This is why CSE techniques for flat-fading channels cannot be readily extended to MIMO-OFDM systems [2]. In fact, CSE in MIMO-OFDM is a highly structured matrix problem that cannot easily be factorized into computationally tractable solutions. The first estimator for spatially correlated MIMO-OFDM channels was obtained in [2], but its training solution is only locally optimized, and only in the extreme SNR regimes. More recently, globally optimized training sequences and the corresponding estimator have been addressed more thoroughly and correctly in [3].

All modern wireless communication systems are digital, which means that the channel estimate is further distorted by digitization, such as variable-rate quantization and compression under a limited bit budget. Thus, the traditional mean square error (MSE) is not a true distortion measure for the quality of digitized estimators. On the other hand, it is well known [4], [5] that the Shannon distortion-rate function [6], defined by the error-entropy instead of the MSE, is a much more appropriate measure, especially for jointly estimating and quantizing correlated random vectors. Only recently has there been increasing interest in error-entropy minimization for CSE of flat-fading MIMO systems [7], [8], but only locally optimized solutions in the extreme (low or high) SNR regimes have been addressed.

In the present paper, we also adopt error-entropy minimization (EEM) to optimize the training sequences for CSE of spatially correlated frequency-selective MIMO-OFDM channels, which requires a quite different convexity analysis and supporting mathematical framework than the global MSE minimization (MSEM) based CSE treated in [3]. Our contributions are two-fold:

1) We show in Section III that the problem can be losslessly recast as a tractable semi-definite program (SDP) and thus solved by available SDP solvers. The only matrix variable in this SDP has fixed dimension $M_t\times M_t$, where $M_t$ is the number of transmit antennas. Hence, the computational complexity of this SDP-based solution is $O((M_t(M_t+1)/2)^{2.5})$, which is not only low but also independent of the length of the training sequences. Moreover, the correlation between channels can be exploited to reduce the number of sub-carriers used for channel estimation while maintaining computational tractability. The CSE quality is maintained even when only a small portion of the OFDM sub-carriers is used for training (within one OFDM block, the occupancy rate of the training sequences can be reduced to 1/128).

2) We provide in Section IV tight two-sided (upper and lower) bounds for the matrix objective function in general cases, such that the corresponding bound optimization problems can be solved by standard water-filling algorithms of low numerical complexity. Our extensive simulations in Section V show that these water-filling based solutions effectively attain nearly the global minimum of the original exact EEM problem.

Notations: Bold capital and lower case letters denote matrices and column vectors, respectively. $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and Hermitian transpose operations, respectively. The symbol $\otimes$ is used for the Kronecker product of two matrices and $\mathrm{vec}(X)$ denotes the vectorization of matrix $X$. $\mathrm{tr}(\cdot)$ and $|\cdot|$ stand for the trace and determinant of a matrix, respectively. $X\ge 0$ means $X$ is Hermitian symmetric and positive semi-definite. $I_n$ is the identity matrix of dimension $n\times n$. The expectation operation is $E\{\cdot\}$, while $\mathcal{CN}(0,\sigma^2)$ denotes a circularly symmetric complex Gaussian random variable with zero mean and variance $\sigma^2$. Furthermore, $[A_{ij}]_{i,j=0,1,\dots,N}$ with matrices $A_{ij}$ means the matrix with block entries $A_{ij}$. Analogously, $\mathrm{diag}[A_i]_{i=0,1,\dots,N}$ means the matrix with diagonal blocks $A_i$ and zero off-diagonal blocks. All logarithms in the paper are base-2.

978-1-4577-0539-7/11/$26.00 ©2011 IEEE, ICASSP 2011

II. MIMO-OFDM SYSTEMS AND TRAINING DESIGN

Consider a broadband MIMO-OFDM system with $M_t$ transmit antennas and $M_r$ receive antennas, whose frequency-selective MIMO fading channel is described by the transfer matrix $H(z)=\sum_{\ell=0}^{L-1}H_\ell z^{-\ell}$, where each time-varying stationary process $H_\ell\in\mathbb{C}^{M_r\times M_t}$ represents the gains of the $\ell$th MIMO path. The elements $(H_\ell)_{m,n}$ of $H_\ell$ are (possibly correlated) circularly symmetric complex Gaussian random variables that remain unchanged over the period of channel estimation.

The spatial correlations of MIMO channels can be modeled by the Kronecker structure of their channel gain matrices [9]:

$$H_\ell=R_{r\ell}^{1/2}H_{w\ell}\left(R_{t\ell}^{1/2}\right)^T,\quad \ell=0,1,\dots,L-1,$$

where the deterministic Hermitian symmetric matrix $R_{t\ell}=R_{t\ell}^{1/2}R_{t\ell}^{1/2}\in\mathbb{C}^{M_t\times M_t}$ models the correlation between the transmit antennas, while $R_{r\ell}=R_{r\ell}^{1/2}R_{r\ell}^{1/2}\in\mathbb{C}^{M_r\times M_r}$ captures the correlation between the receive antennas. $H_{w\ell}\in\mathbb{C}^{M_r\times M_t}$ is a time-varying stationary process whose elements are independent and identically distributed zero-mean circularly symmetric complex Gaussian random variables with unit variance.

For $h:=(\mathrm{vec}^T(H_0),\mathrm{vec}^T(H_1),\dots,\mathrm{vec}^T(H_{L-1}))^T\in\mathbb{C}^{LM_tM_r}$, the correlation matrix is $R_h:=E[hh^H]=\mathrm{diag}[R_{tj}\otimes R_{rj}]_{j=0,1,\dots,L-1}$. Suppose the MIMO-OFDM system uses $M=2^{\log_2 M}$ sub-carriers (i.e., $M$ is a power of two) to turn the frequency-selective fading channel into $M$ parallel flat-fading sub-channels. Each block of length $M$ goes through an OFDM modulator to form an OFDM block and is transmitted via one transmit antenna. The OFDM cyclic prefix is chosen to be longer than the channel order $L-1$ to avoid inter-block interference (IBI). Thus, the vectors $x(j)\in\mathbb{C}^{M_t}$, $j=0,1,\dots,M-1$, are transmitted from the $M_t$ antennas on the $j$th sub-carrier. Inside just one OFDM block, $N=2^{\log_2 N}\ll M$ training symbols are inserted on the $0$th, $(2^{\log_2 M-\log_2 N})$th, $\dots$, $((N-1)2^{\log_2 M-\log_2 N})$th sub-carriers, i.e., on every $(M/N)$th sub-carrier, for channel estimation. The training sequences are thus $s(k)=x(k2^{\log_2 M-\log_2 N})\in\mathbb{C}^{M_t}$, $k=0,1,\dots,N-1$.

By defining $W_N=e^{-j2\pi/N}$ (and analogously $W_M=e^{-j2\pi/M}$), the channel transfer function corresponding to the $k2^{\log_2 M-\log_2 N}$th sub-channel is

$$H_f(k):=H\left(W_M^{-k2^{\log_2 M-\log_2 N}}\right)=\sum_{\ell=0}^{L-1}H_\ell W_M^{\ell k2^{\log_2 M-\log_2 N}}=\sum_{\ell=0}^{L-1}H_\ell W_N^{\ell k},\quad k=0,1,\dots,N-1.$$

Thus, the normalized input-output equation for each pilot sub-carrier is

$$r(k)=\sqrt{\frac{\rho}{M_t}}\,H_f(k)s(k)+n(k),\quad k=0,1,\dots,N-1,$$

where $\rho$ is the average signal-to-noise ratio (SNR), $r(k)=(r_0(k),r_1(k),\dots,r_{M_r-1}(k))^T\in\mathbb{C}^{M_r}$ is the $k$th received signal vector, $s(k)=(s_0(k),s_1(k),\dots,s_{M_t-1}(k))^T\in\mathbb{C}^{M_t}$ is the training vector, and $n(k)=(n_0(k),n_1(k),\dots,n_{M_r-1}(k))^T$ represents additive white Gaussian noise (AWGN), whose elements are i.i.d. $\mathcal{CN}(0,1)$ random variables.
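Two modeling facts used above can be spot-checked numerically. The following minimal NumPy sketch is illustrative only: the sizes, the randomly generated correlation matrices and the helper names `crandn`, `random_corr` and `herm_sqrt` are assumptions, and `vec` is taken as column stacking. It estimates $E[\mathrm{vec}(H_\ell)\mathrm{vec}^H(H_\ell)]$ by Monte Carlo for the Kronecker model and compares it with $R_{t\ell}\otimes R_{r\ell}$, then checks that the full-band response at the pilot sub-carriers $kM/N$ reduces to $\sum_\ell H_\ell W_N^{\ell k}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """i.i.d. CN(0,1) samples (real and imaginary parts each of variance 1/2)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_corr(n):
    """Hypothetical Hermitian positive-definite correlation matrix with trace n."""
    A = crandn(n, n)
    R = A @ A.conj().T + n * np.eye(n)
    return R * n / np.trace(R).real

def herm_sqrt(R):
    """Hermitian square root R^{1/2} from the eigendecomposition of R."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

# Kronecker model: H = Rr^{1/2} Hw (Rt^{1/2})^T  =>  E[vec(H) vec(H)^H] = Rt ⊗ Rr
Mt, Mr, trials = 3, 2, 200_000
Rt, Rr = random_corr(Mt), random_corr(Mr)
Hw = crandn(trials, Mr, Mt)
H = herm_sqrt(Rr) @ Hw @ herm_sqrt(Rt).T            # batched over the first axis
v = H.transpose(0, 2, 1).reshape(trials, Mr * Mt)   # each row is vec(H) (column stacking)
R_emp = v.T @ v.conj() / trials                     # sample estimate of E[vec(H) vec(H)^H]
kron_err = np.abs(R_emp - np.kron(Rt, Rr)).max()

# Pilot sub-carriers k*M/N: sum_l H_l W_M^{l k M/N} = sum_l H_l W_N^{l k}
M, N, L = 64, 8, 3
taps = crandn(L, Mr, Mt)                            # H_0, ..., H_{L-1}
pilots = np.arange(N) * (M // N)
Hf_full = np.fft.fft(taps, n=M, axis=0)             # sum_l H_l W_M^{l*kappa}, kappa = 0..M-1
phase_N = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)   # W_N^{l k}
Hf_pilot = np.einsum("kl,lij->kij", phase_N, taps)
pilot_ok = np.allclose(Hf_full[pilots], Hf_pilot)

print(f"max |R_emp - Rt⊗Rr| = {kron_err:.3f}, pilot identity holds: {pilot_ok}")
```

The Monte Carlo error shrinks roughly as $1/\sqrt{\text{trials}}$, so `kron_err` is small but not zero; the pilot identity holds to machine precision because $W_M^{\ell kM/N}=W_N^{\ell k}$.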

For convenience, define the training symbol matrix $S=[s(0)\ s(1)\ \dots\ s(N-1)]^T\in\mathbb{C}^{N\times M_t}$. Then the equation for the received training signal can be compactly represented as

$$r=\sqrt{\frac{\rho}{M_t}}\,\mathcal{M}(S)h+n,$$

where $r=(r^T(0),r^T(1),\dots,r^T(N-1))^T\in\mathbb{C}^{NM_r}$, $n=(n^T(0),n^T(1),\dots,n^T(N-1))^T\in\mathbb{C}^{NM_r}$,

$$\mathcal{M}(S)=[F_0S\ F_1S\ \dots\ F_{L-1}S]\otimes I_{M_r}\in\mathbb{C}^{NM_r\times LM_tM_r},\qquad F_\ell=\mathrm{diag}\{W_N^{k\ell}\}_{k=0,1,\dots,N-1},\ \ell=0,1,\dots,L-1,$$

and

$$\mathcal{M}^H(S)\mathcal{M}(S)=\left[(S^HF_\ell^HF_mS)\otimes I_{M_r}\right]_{\ell,m=0,1,\dots,L-1}.$$

There are $M_rN$ measurements for the estimation of $LM_tM_r$ unknown parameters, which are the entries of the matrices $H_\ell\in\mathbb{C}^{M_r\times M_t}$, $\ell=0,1,\dots,L-1$. When all the entries of $H_\ell$ are independent, the estimation problem is meaningful only if the number of measurements is not less than the number of unknowns [8], i.e., $N\ge LM_t$. However, as the entries of $H_\ell$ are correlated, using a large $N$ would make the optimization problems much less efficient because little freedom is left for optimizing the solution. In fact, $N\ge 2L$ is sufficient if the channel correlation is well exploited.

The CSE problem is to obtain an estimator $\hat h$ of $h$ with a known (deterministic) training signal $S$. As all random variables $r$, $h$ and $n$ are Gaussian with zero mean, $\hat h$ is the Wiener filter (or MMSE estimator)

$$\hat h=\sqrt{\frac{\rho}{M_t}}\,R_h\mathcal{M}^H(S)\left(\frac{\rho}{M_t}\mathcal{M}(S)R_h\mathcal{M}^H(S)+I\right)^{-1}r.$$

Obviously, the error $e=h-\hat h$ is a Gaussian random variable with zero mean and covariance

$$R_e:=R_{h-\hat h}=\left(R_h^{-1}+\frac{\rho}{M_t}\mathcal{M}^H(S)\mathcal{M}(S)\right)^{-1}.$$

In general, training design is to find the training matrix $S$ that yields the conditional mean $\hat h$ of the channel state $h$ (which is considered a time-varying stationary process) under some criterion and subject to the normalized training power constraint $\mathrm{tr}(S^HS)=NM_t$. Traditional estimation methods try to minimize the error between the channel state $h$ and the conditional mean $\hat h$ in terms of least squares [10] or mean square error [2], [3], [11]:

$$\min_{S\in\mathbb{C}^{N\times M_t},\ \mathrm{tr}(S^HS)=NM_t}\ \mathrm{tr}(R_e).$$

Here we aim at designing the training sequences such that the error-entropy $\log|\pi eR_e|$ is minimized:

$$\min_{S\in\mathbb{C}^{N\times M_t}}\ \log|R_e|\quad\text{s.t.}\quad\mathrm{tr}(S^HS)=NM_t,$$

which is equivalent to

$$\max_{S\in\mathbb{C}^{N\times M_t},\ \mathrm{tr}(S^HS)=NM_t}\ \log\left|R_h^{-1}+\frac{\rho}{M_t}\mathcal{M}^H(S)\mathcal{M}(S)\right|.\tag{1}$$

The error-entropy is nothing but the Shannon distortion function [6], which is also the distance between the probability distributions of the random variable $h$ and its estimator $\hat h$. Thus, it provides an accurate measure of mismatch between $h$ and $\hat h$ as random variables. Therefore, it is also robust against uncertainties in the correlation matrix $R_h$.
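The steps from the stacked measurement model to problem (1) can be checked numerically. The NumPy sketch below is not the authors' code; the sizes, the random correlation matrices, the random training matrix and the helper names `crandn`, `random_corr` and `measurement_matrix` are illustrative assumptions. It verifies that $\mathcal{M}(S)h$ reproduces the stacked per-sub-carrier observations $H_f(k)s(k)$, that the two expressions for $R_e$ coincide (matrix inversion lemma), and that $\log|\pi eR_e|$ equals $LM_tM_r\log(\pi e)$ minus the objective of (1).

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """i.i.d. CN(0,1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_corr(n):
    """Hypothetical Hermitian positive-definite correlation matrix with trace n."""
    A = crandn(n, n)
    R = A @ A.conj().T + n * np.eye(n)
    return R * n / np.trace(R).real

def measurement_matrix(S, L, Mr):
    """M(S) = [F_0 S, ..., F_{L-1} S] ⊗ I_Mr with F_l = diag{W_N^{kl}}_k."""
    N = S.shape[0]
    k = np.arange(N)
    blocks = [np.exp(-2j * np.pi * k * l / N)[:, None] * S for l in range(L)]
    return np.kron(np.hstack(blocks), np.eye(Mr))

Mt, Mr, L, N, rho = 2, 2, 3, 8, 10.0
S = crandn(N, Mt)
S *= np.sqrt(N * Mt / np.trace(S.conj().T @ S).real)     # tr(S^H S) = N*Mt

# R_h = diag[Rt_l ⊗ Rr_l]
n = L * Mt * Mr
Rh = np.zeros((n, n), dtype=complex)
for l in range(L):
    sl = slice(l * Mt * Mr, (l + 1) * Mt * Mr)
    Rh[sl, sl] = np.kron(random_corr(Mt), random_corr(Mr))

# 1) Stacked model vs per-sub-carrier model (noise-free, without the sqrt(rho/Mt) factor)
taps = crandn(L, Mr, Mt)
h = np.concatenate([Hl.reshape(-1, order="F") for Hl in taps])    # vec = column stacking
MS = measurement_matrix(S, L, Mr)
per_k = [sum(taps[l] * np.exp(-2j * np.pi * l * k / N) for l in range(L)) @ S[k]
         for k in range(N)]                                        # H_f(k) s(k)
stack_ok = np.allclose(MS @ h, np.concatenate(per_k))

# 2) Error covariance of the Wiener filter in two equivalent forms
c = rho / Mt
G = c * MS @ Rh @ MS.conj().T + np.eye(N * Mr)
Re_lemma = Rh - c * Rh @ MS.conj().T @ np.linalg.solve(G, MS @ Rh)
Re_info = np.linalg.inv(np.linalg.inv(Rh) + c * MS.conj().T @ MS)
cov_ok = np.allclose(Re_lemma, Re_info)

# 3) Error entropy log2|pi e R_e| versus the objective of (1)
entropy = np.linalg.slogdet(np.pi * np.e * Re_info)[1] / np.log(2)
obj1 = np.linalg.slogdet(np.linalg.inv(Rh) + c * MS.conj().T @ MS)[1] / np.log(2)
entropy_ok = np.isclose(entropy, n * np.log2(np.pi * np.e) - obj1)

print(stack_ok, cov_ok, entropy_ok)
```

Because $n\log(\pi e)$ does not depend on $S$, minimizing the error-entropy and maximizing the objective of (1) select the same training matrix, which is what the third check expresses.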

III. OPTIMIZED TRAINING SEQUENCE BY CONVEX PROGRAMMING

The key result of this section is presented in the following

theorem.


Theorem 1: Suppose that there is $Q\in\mathbb{C}^{N\times M_t}$ with $M_t$ columns $q_i\in\mathbb{C}^N$, $i=1,\dots,M_t$, such that

$$Q^HF_m^HF_\ell Q=[q_i^HF_m^HF_\ell q_j]_{i,j=1,2,\dots,M_t}=0_{M_t}\quad\text{for } 0\le m<\ell\le L-1,\tag{2}$$

and

$$Q^HQ=[q_i^Hq_j]_{i,j=1,2,\dots,M_t}=I_{M_t}.\tag{3}$$

Under the singular value decompositions (SVDs) $R_{t\ell}=U_\ell\Lambda_\ell U_\ell^H$ and $R_{r\ell}=V_\ell\Upsilon_\ell V_\ell^H$ with $\Lambda_\ell=\mathrm{diag}(\lambda_{\ell1},\dots,\lambda_{\ell M_t})$, $\Upsilon_\ell=\mathrm{diag}(\gamma_{\ell1},\dots,\gamma_{\ell M_r})$, $\ell=0,1,\dots,L-1$, the optimization problem (1) in $S\in\mathbb{C}^{N\times M_t}$ is equivalent to the following SDP in $X\in\mathbb{C}^{M_t\times M_t}$:

$$\max_{0\le X\in\mathbb{C}^{M_t\times M_t},\ \mathrm{tr}\{X\}=NM_t}\ \sum_{\ell=0}^{L-1}\log\left|R_{t\ell}^{-1}\otimes\Upsilon_\ell^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|.\tag{4}$$

The optimal solution $S_{\rm opt}$ of (1) is obtained from the optimal solution $X_{\rm opt}$ of (4) as $S_{\rm opt}=Q\bar X_{\rm opt}=QX_{\rm opt}^{1/2}$.

A matrix $Q$ of $M_t$ columns $q_i$, $i=1,2,\dots,M_t$, can be constructed as follows. Take $q_1=(q_1(0),q_1(1),\dots,q_1(N-1))^T$ with $|q_1(i)|=1/\sqrt N$, $i=0,1,\dots,N-1$, so that $\|q_1\|=1$. Then $q_i$, $i=2,\dots,M_t$, are defined from $q_1$ by $q_i(k)=q_1(k)W_N^{K(i-1)k}$, $k=0,1,\dots,N-1$. For $N\ge LM_t$, $K$ is chosen as $K=\lfloor N/M_t\rfloor$.

On the other hand, a smaller value of $N$ gives more freedom for optimization because the role of $\bar X$ in (2) becomes more relevant. However, for $N<LM_t$, the convex optimization problem (4) provides only an upper bound for the nonconvex optimization problem (1). For even $N$ with $N\ge 2L$, the choice $K=L$ results in $q_i^Hq_j=\frac{1}{N}\sum_{k=0}^{N-1}W_N^{K(j-i)k}$, which equals 1 whenever $K(j-i)$ is a multiple of $N$.
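The construction of $Q$ and the equivalence stated in Theorem 1 can be spot-checked numerically. The NumPy sketch below is illustrative only: the random phases of $q_1$, the random correlation matrices, the random feasible $X$ and all helper names (`crandn`, `random_corr`, `F`, `build_Q`, `block_diag`, `logdet2`, `check`) are assumptions. It uses $K=\lfloor N/M_t\rfloor$ for $N\ge LM_t$, measures the residuals of conditions (2) and (3), and compares the objective of (1) at $S=QX^{1/2}$ with the objective of (4) at $X$.

```python
import numpy as np

rng = np.random.default_rng(2)

def crandn(*shape):
    """i.i.d. CN(0,1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def random_corr(n):
    """Hypothetical Hermitian positive-definite correlation matrix with trace n."""
    A = crandn(n, n)
    R = A @ A.conj().T + n * np.eye(n)
    return R * n / np.trace(R).real

def F(l, N):
    """F_l = diag{W_N^{kl}}, k = 0, ..., N-1."""
    return np.diag(np.exp(-2j * np.pi * np.arange(N) * l / N))

def build_Q(N, Mt, K):
    """q_1 with entries of modulus 1/sqrt(N) (random phases: an assumption); q_i(k) = q_1(k) W_N^{K(i-1)k}."""
    k = np.arange(N)
    q1 = np.exp(2j * np.pi * rng.random(N)) / np.sqrt(N)
    return np.stack([q1 * np.exp(-2j * np.pi * K * i * k / N) for i in range(Mt)], axis=1)

def block_diag(blocks):
    """Block-diagonal matrix with the given square blocks."""
    n = sum(B.shape[0] for B in blocks)
    out, pos = np.zeros((n, n), dtype=complex), 0
    for B in blocks:
        out[pos:pos + B.shape[0], pos:pos + B.shape[0]] = B
        pos += B.shape[0]
    return out

def logdet2(A):
    """log2 |A| for a Hermitian positive-definite A."""
    return np.linalg.slogdet(A)[1] / np.log(2)

def check(N, Mt, Mr, L, rho):
    K = N // Mt                                            # K = floor(N/Mt), case N >= L*Mt
    Q = build_Q(N, Mt, K)
    cond2 = max(np.abs(Q.conj().T @ F(m, N).conj().T @ F(l, N) @ Q).max()
                for l in range(L) for m in range(l))       # residual of condition (2)
    cond3 = np.abs(Q.conj().T @ Q - np.eye(Mt)).max()      # residual of condition (3)
    A = crandn(Mt, Mt)
    X = A @ A.conj().T
    X *= N * Mt / np.trace(X).real                         # feasible: X >= 0, tr X = N*Mt
    w, V = np.linalg.eigh(X)
    S = Q @ (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T    # S = Q X^{1/2}
    c = rho / Mt
    Rt = [random_corr(Mt) for _ in range(L)]
    Rr = [random_corr(Mr) for _ in range(L)]
    MS = np.kron(np.hstack([F(l, N) @ S for l in range(L)]), np.eye(Mr))
    Rh_inv = block_diag([np.linalg.inv(np.kron(Rt[l], Rr[l])) for l in range(L)])
    obj1 = logdet2(Rh_inv + c * MS.conj().T @ MS)          # objective of (1) at S
    obj4 = sum(logdet2(np.kron(np.linalg.inv(Rt[l]), np.diag(1 / np.linalg.eigvalsh(Rr[l])))
                       + c * np.kron(X, np.eye(Mr))) for l in range(L))   # objective of (4) at X
    return cond2, cond3, obj1, obj4

results = {Mt: check(N=32, Mt=Mt, Mr=2, L=5, rho=10.0) for Mt in (4, 6)}
for Mt, (c2, c3, o1, o4) in results.items():
    print(f"Mt={Mt}: max|(2)|={c2:.1e}, max|(3)-I|={c3:.1e}, obj(1)={o1:.6f}, obj(4)={o4:.6f}")
```

With $N=32$ and $L=5$, both $M_t=4$ ($K=8$) and $M_t=6$ ($K=5$) satisfy $N\ge LM_t$, so the residuals are at machine precision and the two objective values agree.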

We end this section with a complexity analysis of SDP (4), whose matrix variable $X$ has $M_t(M_t+1)/2$ independent entries. It can be solved by existing max-det SDP software (such as YALMIP with SDP solvers [12]) in polynomial time $O((M_t(M_t+1)/2)^{2.5})$, which depends on $M_t$ only. As the number $M_t$ is typically small, the computational load is low as well. Nevertheless, in the interest of much faster computation, specialized convex programming algorithms for finding the solution of (4) are developed in the next section.
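For readers without an SDP toolbox, the sketch below maximizes the objective of (4) by projected gradient ascent with backtracking. This is a swapped-in first-order technique used only for illustration, not the interior-point max-det method of the solvers in [12]; the random correlation data, the iteration limits and the helper names are assumptions. For $B_\ell=R_{t\ell}^{-1}\otimes\Upsilon_\ell^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}$, the gradient of $\ln|B_\ell|$ with respect to Hermitian $X$ is $\frac{\rho}{M_t}$ times the partial trace of $B_\ell^{-1}$ over the receive index, and the projection onto $\{X\ge0,\ \mathrm{tr}X=NM_t\}$ projects the eigenvalues of $X$ onto a simplex.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_corr(n):
    """Hypothetical Hermitian positive-definite correlation matrix with trace n."""
    A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    R = A @ A.conj().T + n * np.eye(n)
    return R * n / np.trace(R).real

def objective4(X, Rt_inv, ups, c):
    """Objective of (4): sum_l log2 det(R_tl^{-1} ⊗ Υ_l^{-1} + c X ⊗ I_Mr)."""
    Mr = len(ups[0])
    return sum(np.linalg.slogdet(np.kron(Ri, np.diag(1 / u)) + c * np.kron(X, np.eye(Mr)))[1]
               for Ri, u in zip(Rt_inv, ups)) / np.log(2)

def gradient4(X, Rt_inv, ups, c):
    """Gradient of the natural-log objective: c * partial trace of B_l^{-1} over the receive index."""
    Mt, Mr = X.shape[0], len(ups[0])
    G = np.zeros((Mt, Mt), dtype=complex)
    for Ri, u in zip(Rt_inv, ups):
        B = np.kron(Ri, np.diag(1 / u)) + c * np.kron(X, np.eye(Mr))
        C = np.linalg.inv(B).reshape(Mt, Mr, Mt, Mr)
        G += c * np.einsum("iaja->ij", C)
    return (G + G.conj().T) / 2

def project_simplex(d, P):
    """Euclidean projection of a real vector d onto {x >= 0, sum(x) = P}."""
    u = np.sort(d)[::-1]
    css = np.cumsum(u) - P
    idx = np.arange(1, len(d) + 1)
    r = idx[u - css / idx > 0][-1]
    return np.maximum(d - css[r - 1] / r, 0.0)

def project_feasible(Y, P):
    """Projection onto {X >= 0, tr X = P} via the eigenvalues of the Hermitian part of Y."""
    w, V = np.linalg.eigh((Y + Y.conj().T) / 2)
    return (V * project_simplex(w, P)) @ V.conj().T

def solve4(Rt_inv, ups, c, P, iters=300):
    """Projected gradient ascent with backtracking, started from equi-power X = (P/Mt) I."""
    Mt = Rt_inv[0].shape[0]
    X = np.eye(Mt) * P / Mt
    f, step = objective4(X, Rt_inv, ups, c), 1.0
    for _ in range(iters):
        G = gradient4(X, Rt_inv, ups, c)
        while step > 1e-12:
            X_new = project_feasible(X + step * G, P)
            f_new = objective4(X_new, Rt_inv, ups, c)
            if f_new >= f:
                break
            step /= 2
        else:
            break                                   # no ascent step found: stop
        X, f, step = X_new, f_new, min(2 * step, 1e6)
    return X, f

Mt, Mr, L, N, rho = 4, 2, 5, 32, 10.0
Rt_inv = [np.linalg.inv(random_corr(Mt)) for _ in range(L)]
ups = [np.linalg.eigvalsh(random_corr(Mr)) for _ in range(L)]     # diagonals of Υ_l
X_opt, f_opt = solve4(Rt_inv, ups, rho / Mt, N * Mt)
f_eq = objective4(np.eye(Mt) * N, Rt_inv, ups, rho / Mt)          # equi-power X = N I
print(f"objective (4): equi-power {f_eq:.4f}, projected gradient {f_opt:.4f}")
```

Since (4) is concave in $X$ over a convex feasible set, such an ascent method approaches the SDP optimum; the dedicated solvers mentioned above remain the more reliable choice when an accurate certificate of optimality is needed.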

IV. CLOSED-FORM ITERATIVE ALGORITHMS VIA TIGHT TWO-SIDED BOUNDS

Define $\bar R=\sum_{\ell=0}^{L-1}R_{t\ell}/L$ and $\Upsilon_{\max}=\mathrm{diag}[\gamma_{i,\max}]_{i=1,2,\dots,M_r}$, $\Upsilon_{\min}=\mathrm{diag}[\gamma_{i,\min}]_{i=1,2,\dots,M_r}$, where $\gamma_{i,\max}=\max_{\ell=0,1,\dots,L-1}\Upsilon_\ell(i,i)$ and $\gamma_{i,\min}=\min_{\ell=0,1,\dots,L-1}\Upsilon_\ell(i,i)$, $i=1,2,\dots,M_r$.

Theorem 2: In terms of $\Upsilon_{\max}$, the objective function in (4) is two-sided bounded by

$$L\log\left|\bar R^{-1}\otimes\Upsilon_{\max}^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|\le\sum_{\ell=0}^{L-1}\log\left|R_{t\ell}^{-1}\otimes\Upsilon_\ell^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|\le L\log\left|\bar R^{-1}\otimes\Upsilon_{\max}^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|+\sum_{\ell=0}^{L-1}\left[M_t\log\frac{|\bar R|}{|R_{t\ell}|}-M_r\log\frac{|\Upsilon_{\max}|}{|\Upsilon_\ell|}\right].$$

Furthermore, in terms of $\Upsilon_{\min}$, the objective function in (4) is two-sided bounded by

$$L\log\left|\bar R^{-1}\otimes\Upsilon_{\min}^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|-M_r\sum_{\ell=0}^{L-1}\log\frac{|\Upsilon_\ell|}{|\Upsilon_{\min}|}\le\sum_{\ell=0}^{L-1}\log\left|R_{t\ell}^{-1}\otimes\Upsilon_\ell^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|\le L\log\left|\bar R^{-1}\otimes\Upsilon_{\min}^{-1}+\frac{\rho}{M_t}X\otimes I_{M_r}\right|+M_t\sum_{\ell=0}^{L-1}\log\frac{|\bar R|}{|R_{t\ell}|}.$$

In light of Theorem 2, and making the SVD $\sum_{\ell=0}^{L-1}R_{t\ell}=U\Lambda U^H$, $\Lambda=\mathrm{diag}(\lambda_1,\dots,\lambda_{M_t})$, to facilitate the variable change $X=U\,\mathrm{diag}(y_1,y_2,\dots,y_{M_t})U^H$, problem (4) can be approximated by the following optimization problems in $y_i\ge 0$, $i=1,2,\dots,M_t$:

$$\max_{\sum_{i=1}^{M_t}y_i=NM_t}\ \sum_{i=1}^{M_t}\sum_{j=1}^{M_r}\log\left(\lambda_i^{-1}\gamma_j^{-1}+\frac{\rho}{LM_t}y_i\right),\tag{5}$$

with $\Upsilon=\mathrm{diag}[\gamma_j]_{j=1,2,\dots,M_r}\in\{\Upsilon_{\max},\Upsilon_{\min}\}$.

Problem (5) can be solved in exactly the same manner as in [13], by the iterative water-filling procedure. Moreover, an optimal solution to its upper bound can be found in closed form: $y_i=\max\{M_t(\mu-a_i^{-1})/\rho,0\}$ with $a_i=\frac{1}{LM_r}\lambda_i\sum_{j=1}^{M_r}\gamma_j$, where $\mu>0$ is chosen so that $\sum_{i=1}^{M_t}y_i=NM_t$.

V. SIMULATION RESULTS

The simulations consider MIMO systems with uniform linear antenna arrays at both the transmitter and the receiver. For the $\ell$th path cluster with path gain $\sigma_\ell^2$, the spatial correlation matrices at the transmitter and receiver can be represented by [14]

$$[R_{t\ell}]_{m,n}=\sigma_\ell e^{-j2\pi|n-m|\Delta_t\cos(\bar\theta_{t\ell})}e^{-\frac{1}{2}\left(2\pi|n-m|\Delta_t\sin(\bar\theta_{t\ell})\sigma_{\theta_{t\ell}}\right)^2},$$

$$[R_{r\ell}]_{m,n}=\sigma_\ell e^{-j2\pi|n-m|\Delta_r\cos(\bar\theta_{r\ell})}e^{-\frac{1}{2}\left(2\pi|n-m|\Delta_r\sin(\bar\theta_{r\ell})\sigma_{\theta_{r\ell}}\right)^2},$$

where $\Delta_t=d_t/\lambda$ and $\Delta_r=d_r/\lambda$ denote the relative transmit and receive antenna spacings, $d_t$ and $d_r$ are the absolute antenna spacings, $\lambda=c/f_c$ is the wavelength at carrier frequency $f_c$, $\bar\theta_{t\ell}$ and $\bar\theta_{r\ell}$ are the mean angle of departure from the transmit array and the mean angle of arrival at the receive array, and $\sigma_{\theta_{t\ell}}=\sigma_{\theta_{r\ell}}=8.6^\circ$ are the cluster angle spreads perceived by the transmitter and receiver. In the figure legends, problem (4) and the water-filling solution of its upper bound given at the end of Section IV are referred to as “min-error-entropy” and “water-filling”, respectively, while “Υmin bound” and “Υmax bound” refer to the water-filling given at the end of Section IV for (5) with $\Upsilon=\Upsilon_{\min}$ and $\Upsilon=\Upsilon_{\max}$, respectively. For comparison, the performance of the best equi-power mode, which assigns $S=\sqrt N Q$ in (1), is also plotted. It is worth pointing out that a typical MIMO-OFDM system employs $M=2^{10}=1024$ sub-carriers. So, in all of the examples below, the multiplexing rates $N/M$ of training symbols within one OFDM block are $32/1024=1/32$, $16/1024=1/64$ and $8/1024=1/128$, which are quite low and clearly indicate the advantage of the proposed frequency-multiplexed training schemes.
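The correlation matrices can be generated directly from this model, here with the parameter values of Example 1 below. The NumPy sketch is an illustration under several assumptions: it uses the signed difference $n-m$ in the phase term (with $|n-m|$, as printed, the matrix would be complex-symmetric rather than Hermitian), takes $\sigma_\ell$ as the square root of the path gain $\sigma_\ell^2$, converts the angles from degrees to radians, and orders the eigenvalues in each $\Upsilon_\ell$ decreasingly before forming $\Upsilon_{\max}$ and $\Upsilon_{\min}$. The helper name `ula_corr` is hypothetical. The outputs are the $\lambda_i$ and $\gamma_j$ used in (5).

```python
import numpy as np

def ula_corr(n_ant, gain, spacing, mean_deg, spread_deg):
    """[R]_{m,n} = sqrt(gain) e^{-j2π(n-m)Δ cos θ} e^{-(2π(n-m)Δ sin θ σθ)^2 / 2}.
    Assumptions: signed n-m in the phase (Hermitian R), sigma_l = sqrt(path gain), angles in degrees."""
    d = np.arange(n_ant)[None, :] - np.arange(n_ant)[:, None]     # d[m, n] = n - m
    th, sp = np.deg2rad(mean_deg), np.deg2rad(spread_deg)
    phase = np.exp(-2j * np.pi * d * spacing * np.cos(th))
    spread = np.exp(-0.5 * (2 * np.pi * d * spacing * np.sin(th) * sp) ** 2)
    return np.sqrt(gain) * phase * spread

# Parameters of Example 1 (Mt = 4 case)
Mt, Mr, L = 4, 2, 5
gains = [0.3, 0.2, 0.2, 0.15, 0.15]                      # path gains sigma_l^2
theta_t = [13.0] * L                                     # mean angles of departure (deg)
theta_r = [290.0, 300.0, 315.0, 320.0, 335.0]            # mean angles of arrival (deg)
Rt = [ula_corr(Mt, g, 0.5, th, 8.6) for g, th in zip(gains, theta_t)]
Rr = [ula_corr(Mr, g, 0.5, th, 8.6) for g, th in zip(gains, theta_r)]

lam = np.linalg.eigvalsh(sum(Rt))[::-1]                       # eigenvalues of sum_l R_tl
ups = np.array([np.linalg.eigvalsh(R)[::-1] for R in Rr])     # diagonals of Υ_l, descending
ups_max, ups_min = ups.max(axis=0), ups.min(axis=0)           # diagonals of Υ_max, Υ_min
print("lambda:", np.round(lam, 4))
print("Upsilon_max:", np.round(ups_max, 4), " Upsilon_min:", np.round(ups_min, 4))
```

The Gaussian angle-spread factor makes each matrix a Toeplitz matrix built from a characteristic function, so it is positive semi-definite; with the small 8.6° spread the transmit matrices are strongly correlated and have a few very small eigenvalues.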


Example 1: MIMO-OFDM systems equipped with $M_t=4$ and $M_t=6$, $M_r=2$, $N=2^5=32$, $L=5$, $(\sigma_0^2,\sigma_1^2,\sigma_2^2,\sigma_3^2,\sigma_4^2)=(0.3,0.2,0.2,0.15,0.15)$, $d_t=d_r=0.5\lambda$, $\bar\theta_{t\ell}=13^\circ$, $\ell=0,\dots,4$, $(\bar\theta_{r0},\bar\theta_{r1},\bar\theta_{r2},\bar\theta_{r3},\bar\theta_{r4})=(290^\circ,300^\circ,315^\circ,320^\circ,335^\circ)$. Note that $N\ge LM_t$, so the SDP (4) is equivalent to (1). Figure 1 plots the error-entropy of channel state estimation, $\log|\pi eR_e|$, versus the SNR in the training phase. It can be seen from Figure 1 that all of our proposed methods yield practically the same error-entropy. This suggests that our approximate solutions are highly accurate and that the bounds employed in the approximate optimization problems are very tight. It can be clearly observed that our proposed methods are significantly better than the equi-power method. Moreover, the error-entropy is significantly reduced as the number of transmit antennas increases from $M_t=4$ to $M_t=6$.

Example 2: All parameters are the same as those in Example 1, except that the number of sub-carriers used to transmit training symbols is either $N=LM_t=16=2^4$ or reduced to $N=2L=8=2^3$. As can be seen from Figure 2, for the same total transmitted power, the smaller number of training symbols provides better performance in the low-SNR region. The reason is that a smaller number of training symbols $N$ leads to a higher signal-to-noise ratio per training symbol for a given total power. In the low-SNR region, the performance improvement in terms of the error-entropy or MSE increases linearly with the SNR.
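The “water-filling”, “Υmin bound” and “Υmax bound” allocations compared in these examples come from problem (5) and its closed-form bound in Section IV. The NumPy sketch below is illustrative: the eigenvalues `lam` and `gam` are hypothetical, and the second routine solves (5) through its KKT conditions by nested bisection, a generic multi-level water-filling that is not necessarily the iterative procedure of [13]. One way to see where the closed form comes from (our reading, not stated in the text) is Jensen's inequality in $\gamma_j$: $\sum_j\log(\lambda_i^{-1}\gamma_j^{-1}+\frac{\rho}{LM_t}y_i)\le-\sum_j\log\gamma_j+M_r\log(\lambda_i^{-1}+\frac{\rho}{LM_t}\bar\gamma y_i)$ with $\bar\gamma=\frac{1}{M_r}\sum_j\gamma_j$, whose maximization under $\sum_iy_i=NM_t$ is ordinary water-filling with levels $a_i^{-1}$.

```python
import numpy as np

def closed_form_wf(lam, gam, L, rho, P):
    """y_i = max{Mt(mu - 1/a_i)/rho, 0}, a_i = lam_i * sum(gam) / (L*Mr); mu found by bisection."""
    Mt, Mr = len(lam), len(gam)
    inv_a = L * Mr / (lam * gam.sum())
    y_of = lambda mu: np.maximum(Mt * (mu - inv_a) / rho, 0.0)
    lo, hi = 0.0, inv_a.max() + rho * P / Mt        # sum(y_of(hi)) >= P
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if y_of(mid).sum() < P else (lo, mid)
    return y_of(0.5 * (lo + hi))

def objective5(y, lam, gam, L, rho):
    """Objective of (5): sum_i sum_j log2(lam_i^{-1} gam_j^{-1} + rho*y_i/(L*Mt))."""
    Mt = len(lam)
    return np.log2(1.0 / np.outer(lam, gam) + rho / (L * Mt) * y[:, None]).sum()

def solve5(lam, gam, L, rho, P):
    """Exact (5) via KKT: sum_j c/(b_ij + c*y_i) = nu for every y_i > 0 (nested bisection)."""
    Mt, Mr = len(lam), len(gam)
    c, b = rho / (L * Mt), 1.0 / np.outer(lam, gam)

    def y_of(nu):
        y = np.zeros(Mt)
        for i in range(Mt):
            if (c / b[i]).sum() <= nu:              # marginal gain at y_i = 0 is below the level
                continue
            lo, hi = 0.0, Mr / nu                   # marginal gain at hi is below nu
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                lo, hi = (mid, hi) if (c / (b[i] + c * mid)).sum() > nu else (lo, mid)
            y[i] = 0.5 * (lo + hi)
        return y

    nu_hi = (c / b).sum(axis=1).max()               # at this level every y_i = 0
    nu_lo = nu_hi
    while y_of(nu_lo).sum() < P:                    # lower the level until the allocation exceeds P
        nu_lo /= 2
    for _ in range(100):
        nu = 0.5 * (nu_lo + nu_hi)
        nu_lo, nu_hi = (nu, nu_hi) if y_of(nu).sum() > P else (nu_lo, nu)
    return y_of(0.5 * (nu_lo + nu_hi))

# Hypothetical eigenvalues of sum_l R_tl (lam) and of Υ (gam)
lam, gam = np.array([2.5, 1.2, 0.8, 0.5]), np.array([1.4, 0.6])
L, rho, N = 5, 10.0, 32
P = N * len(lam)                                    # power budget N*Mt
y_cf, y_ex = closed_form_wf(lam, gam, L, rho, P), solve5(lam, gam, L, rho, P)
y_eq = np.full(len(lam), P / len(lam))
for name, y in [("closed form", y_cf), ("exact (5)", y_ex), ("equal", y_eq)]:
    print(f"{name:12s} y = {np.round(y, 2)}  objective (5) = {objective5(y, lam, gam, L, rho):.4f}")
```

Both routines only need the $M_t$ eigenvalues $\lambda_i$ and the $M_r$ values $\gamma_j$, which is why their cost is negligible compared with a generic SDP solve.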

VI. CONCLUSIONS

We have considered the optimal design of training sequences for spatially correlated MIMO-OFDM channels, formulated as the minimization of the entropy of the error between the channel state and its estimator. This design problem has been shown to be transformable into a semi-definite program, which can be efficiently solved by standard SDP solvers. Furthermore, we have also proposed highly accurate approximation algorithms with significantly lower complexity. The numerical results confirm that our proposed methods outperform the previously proposed methods in terms of the error-entropy.

REFERENCES

[1] E. Biglieri, R. Calderbank, A. Constantinides, A. Goldsmith, A. Paulraj, and H. V. Poor, MIMO Wireless Communications. Cambridge: Cambridge University Press, 2007.

[2] H. Zhang, Y. G. Li, A. Reid, and J. Terry, “Optimum training symbol design for MIMO OFDM in correlated fading channels,” IEEE Trans. Wireless Commun., vol. 5, pp. 2343–2347, Sep. 2006.

[3] H. D. Tuan, H. H. Kha, H. H. Nguyen, and V. J. Luong, “Optimized training sequences for spatially correlated MIMO-OFDM,” IEEE Trans. Wireless Commun., vol. 9, pp. 2768–2778, Sep. 2010.

[4] A. Gersho and R. Gray, Vector Quantization and Signal Compression. Springer, 1992.

[5] P. Viola and W. Wells, “Alignment by maximization of mutual information,” Int. J. of Computer Vision, vol. 24, pp. 137–154, Feb. 1997.

[6] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[7] M. Biguesh, S. Gazor, and M. Shariat, “Optimal training sequence for MIMO wireless systems in colored environments,” IEEE Trans. Signal Processing, vol. 57, pp. 3144–3153, Aug. 2009.

[8] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links,” IEEE Trans. Inform. Theory, vol. 49, pp. 951–963, Apr. 2003.

[9] H. Bölcskei, “Principles of MIMO-OFDM wireless systems,” in CRC Handbook on Signal Processing for Communications, M. Ibnkahla, Ed., 2004.

[10] Y. G. Li, N. Seshadri, and S. Ariyavisitakul, “Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels,” IEEE J. Select. Areas Commun., vol. 17, pp. 461–471, Mar. 1999.

[11] H. D. Tuan, V. Luong, and H. H. Nguyen, “Optimization of training sequences for spatially correlated MIMO-OFDM,” in Proc. 2009 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Taipei, Taiwan, pp. 2681–2684, Apr. 2009.

[12] J. Löfberg, “YALMIP: A toolbox for modeling and optimization in MATLAB,” in Proc. CACSD Conf., Taipei, Taiwan, 2004.

[13] V. Nguyen, H. D. Tuan, H. H. Nguyen, and N. N. Tran, “Optimal superimposed training design for spatially correlated fading MIMO channels,” IEEE Trans. Wireless Commun., vol. 7, pp. 3206–3217, Aug. 2008.

[14] H. Bölcskei, D. Gesbert, and A. J. Paulraj, “On the capacity of OFDM-based spatial multiplexing systems,” IEEE Trans. Commun., vol. 50, pp. 225–234, Feb. 2002.

Fig. 1. Error-entropy of MIMO-OFDM channel state estimation with different methods. [Plot: error-entropy versus SNR (dB) for $M_t=4$, $M_r=2$ and $M_t=6$, $M_r=2$; curves: Min-error-entropy, Water-filling, Υmin bound, Υmax bound, Equi-power.]

Fig. 2. Error-entropy of channel state estimation with different training lengths. [Plot: error-entropy versus total power $P_{\rm total}$ (dB); curves: N = 16, N = 8.]
