Reduced Complexity MMSE Detection for BLAST Architectures
Ronald Böhnke, Dirk Wübben, Volker Kühn, and Karl-Dirk Kammeyer
Department of Communications Engineering
University of Bremen
D-28359 Bremen, Germany
Abstract—Theoretical and experimental studies have shown that layered space-time architectures like the BLAST system can exploit the capacity advantage of multiple antenna systems in rich-scattering environments. In this paper, we present a new efficient algorithm for detecting such architectures with respect to the MMSE criterion. This algorithm utilizes a sorted QR decomposition of the channel matrix and leads to a simple successive detection structure. The algorithm needs only a fraction of the computational effort of the standard V-BLAST algorithm and achieves the same bit error performance.
Index Terms—BLAST, MIMO systems, Zero-Forcing and
MMSE detection, wireless communication.
I. INTRODUCTION
In rich-scattering environments, the use of multiple antenna systems provides an enormous increase in spectral efficiency compared to single antenna systems . A multiple-input multiple-output (MIMO) system that exploits this potential is the V-BLAST (Vertical Bell Labs Layered Space-Time) architecture proposed in . It uses a vertically layered coding structure, where independent code blocks (called layers) are each associated with a particular transmit antenna. At the receiver, these layers are detected by a successive interference cancellation technique which nulls the interferers by linearly weighting the received signal vector with a zero-forcing nulling vector (ZF-BLAST). A very efficient detection algorithm utilizing the QR decomposition of the channel matrix was proposed by the authors in , . It jointly calculates an optimized detection order and the QR decomposition of the channel matrix and is called ZF-SQRD (ZF Sorted QR Decomposition). An adaptation of the original ZF-BLAST to the MMSE criterion was presented in  and a version with lower complexity was introduced in .
In this paper, we propose an extension of the ZF-SQRD algorithm to the MMSE criterion, called MMSE-SQRD. As it does not always find the optimal detection order, some performance degradation may occur. If this drawback is not acceptable for a specific application, MMSE-SQRD can be used as a pre-ordering step for the optimal strategy, which then yields the optimal detection sequence with reduced computational effort.
This work was supported in part by the German Ministry of Education and Research (BMBF) under grant 01 BU 153.
The remainder of this paper is organized as follows. In Section II, the system model is introduced. In Section III, several ZF detection algorithms are reviewed. MMSE extensions of these detection algorithms are investigated and the new MMSE-SQRD is described in Section IV. The performance of the different methods is compared in Section V, and concluding remarks can be found in Section VI.
II. SYSTEM DESCRIPTION
We consider a multiple antenna system with $n_T$ transmit and $n_R \geq n_T$ receive antennas as shown in Fig. 1. The data is demultiplexed into $n_T$ substreams of equal length (called layers). These substreams are mapped onto M-PSK or M-QAM symbols $s_1, \ldots, s_{n_T}$ and simultaneously transmitted over the $n_T$ antennas.
Fig. 1. Model of a MIMO system with $n_T$ transmit and $n_R$ receive antennas.
In order to describe the MIMO system, one time slot of the time-discrete complex baseband model is investigated. Let¹ $\mathbf{s} = [s_1 \cdots s_{n_T}]^T$ denote the $n_T \times 1$ vector of transmit symbols; then the corresponding $n_R \times 1$ receive signal vector $\mathbf{x} = [x_1 \cdots x_{n_R}]^T$ is given by

$$\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{n} . \tag{1}$$

In (1), $\mathbf{n} = [n_1 \cdots n_{n_R}]^T$ represents the white Gaussian noise of variance $\sigma_n^2$ observed at the $n_R$ receive antennas, while the average transmit power of each antenna is normalized to one.

¹Throughout this paper, $(\cdot)^T$ and $(\cdot)^H$ denote matrix transposition and Hermitian transposition, respectively. Furthermore, $\mathbf{I}_\alpha$ indicates the $\alpha \times \alpha$ identity matrix and $\mathbf{0}_{\alpha,\beta}$ denotes the $\alpha \times \beta$ all-zero matrix.
The $n_R \times n_T$ channel matrix $\mathbf{H}$ contains uncorrelated complex Gaussian fading gains with unit variance. We assume a flat-fading environment, where the channel matrix $\mathbf{H}$ is constant over a frame and changes independently from frame to frame (block fading channel). The fading gains are assumed to be perfectly known at the receiver.
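The system model (1) with the channel assumptions above can be simulated in a few lines. The following is a minimal pure-Python sketch, assuming QPSK symbols with unit energy and circularly symmetric complex Gaussian fading gains and noise; the names `QPSK`, `cgauss`, and `transmit` are hypothetical helpers introduced for illustration, not part of the paper.

```python
import math
import random

S = 1 / math.sqrt(2)
# QPSK alphabet with |s|^2 = 1, i.e. unit average transmit power per antenna
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def cgauss(rng, var=1.0):
    """Circularly symmetric complex Gaussian sample with total variance `var`."""
    sd = math.sqrt(var / 2)
    return complex(rng.gauss(0.0, sd), rng.gauss(0.0, sd))


def transmit(n_t, n_r, sigma2, rng):
    """One time slot of x = H s + n, eq. (1), with i.i.d. unit-variance fading gains."""
    H = [[cgauss(rng) for _ in range(n_t)] for _ in range(n_r)]
    s = [rng.choice(QPSK) for _ in range(n_t)]        # one QPSK symbol per layer
    n = [cgauss(rng, sigma2) for _ in range(n_r)]     # white noise with variance sigma_n^2
    x = [sum(H[i][j] * s[j] for j in range(n_t)) + n[i] for i in range(n_r)]
    return H, s, x


H, s, x = transmit(n_t=4, n_r=4, sigma2=0.1, rng=random.Random(1))
```

In a block-fading simulation, `H` would be kept fixed for all time slots of a frame and redrawn only from frame to frame.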
To detect the transmit signals at the receiver, it would be optimal to use a maximum-likelihood (ML) detector. As its computational effort is of order $M^{n_T}$, ML detection is not feasible for real-time implementations. Therefore, in the following sections we present suboptimal detection schemes with reduced complexity.
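The $M^{n_T}$ complexity becomes concrete in an exhaustive ML search, which evaluates $\|\mathbf{x} - \mathbf{H}\mathbf{s}\|^2$ for every candidate vector. This sketch assumes QPSK ($M = 4$); `ml_detect` is a hypothetical helper for illustration only.

```python
import itertools
import math

S = 1 / math.sqrt(2)
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def ml_detect(H, x, alphabet=QPSK):
    """Brute-force ML detection: evaluates ||x - H s||^2 for all M**n_T candidates."""
    n_t = len(H[0])
    best, best_metric, visited = None, float("inf"), 0
    for cand in itertools.product(alphabet, repeat=n_t):
        visited += 1
        metric = sum(abs(x_i - sum(h * c for h, c in zip(row, cand))) ** 2
                     for row, x_i in zip(H, x))
        if metric < best_metric:
            best, best_metric = list(cand), metric
    return best, visited


# Noiseless 3 x 2 example: n_T = 2 layers, so 4**2 = 16 candidates are visited.
H = [[1.0 + 0.3j, 0.5], [0.2j, 0.8 - 0.1j], [0.4, 0.1 + 0.6j]]
s = [QPSK[0], QPSK[3]]
x = [sum(h * v for h, v in zip(row, s)) for row in H]
s_ml, visited = ml_detect(H, x)
```

With $n_T = 4$ this already means 256 metric evaluations per time slot, and the count grows exponentially with $n_T$.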
III. ZERO-FORCING DETECTION
In this section, different zero-forcing approaches to the estimation of the transmit signals in a V-BLAST architecture are reviewed.
A. Linear Zero-Forcing Detector (ZF)
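In a linear detector, the receive signal vector $\mathbf{x}$ is multiplied with a filter matrix $\mathbf{G}$, followed by a parallel decision on all layers. Zero-forcing means that the mutual interference between the layers shall be perfectly suppressed. This is accomplished by the Moore-Penrose pseudo-inverse (denoted by $(\cdot)^+$) of the channel matrix

$$\mathbf{G}_{ZF} = \mathbf{H}^+ = \left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H ,$$

where we assumed that $\mathbf{H}$ has full column rank. The decision step consists of mapping each element of the filter output

$$\tilde{\mathbf{s}}_{ZF} = \mathbf{G}_{ZF}\mathbf{x} = \mathbf{s} + \left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\mathbf{n} \tag{4}$$

onto an element of the symbol alphabet by a minimum distance quantization. The estimation errors of the different layers correspond to the main diagonal elements of the error covariance matrix

$$\boldsymbol{\Phi}_{ZF} = E\left\{(\tilde{\mathbf{s}}_{ZF} - \mathbf{s})(\tilde{\mathbf{s}}_{ZF} - \mathbf{s})^H\right\} = \sigma_n^2\left(\mathbf{H}^H\mathbf{H}\right)^{-1} = \sigma_n^2\,\mathbf{G}_{ZF}\mathbf{G}_{ZF}^H , \tag{5}$$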
which equals the covariance matrix of the noise after the receive filter. Small eigenvalues of $\mathbf{H}^H\mathbf{H}$ therefore lead to large errors due to noise amplification. This effect is especially pronounced in systems with the same number of transmit and receive antennas. In fact, using a result from random matrix theory , it can be shown that in the large system limit $n_T = n_R \to \infty$ the noise amplification tends to infinity almost surely.
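A minimal pure-Python sketch of the linear ZF detector follows, assuming QPSK and a channel with full column rank. It also computes the squared row norms of $\mathbf{G}_{ZF}$, which equal the diagonal of $(\mathbf{H}^H\mathbf{H})^{-1}$ and hence, scaled by $\sigma_n^2$, the per-layer error variances in (5). The small matrix helpers are hypothetical stand-ins for a linear algebra library.

```python
import math

S = 1 / math.sqrt(2)  # QPSK amplitude per real dimension (|s| = 1)


def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def quantize(z):
    """Minimum-distance decision onto the QPSK alphabet."""
    return complex(S if z.real >= 0 else -S, S if z.imag >= 0 else -S)


def zf_filter(H):
    """G_ZF = (H^H H)^{-1} H^H, the pseudo-inverse of a full-column-rank H."""
    Hh = hermitian(H)
    return matmul(inverse(matmul(Hh, H)), Hh)


def zf_detect(H, x):
    """Linear ZF: filter with G_ZF (eq. (4)), then decide all layers in parallel."""
    G = zf_filter(H)
    s_tilde = [sum(g * xi for g, xi in zip(row, x)) for row in G]
    return [quantize(z) for z in s_tilde], G


def noise_amplification(G):
    """Squared row norms of G_ZF = diag((H^H H)^{-1}); times sigma_n^2 this is diag of (5)."""
    return [sum(abs(g) ** 2 for g in row) for row in G]


# Noiseless 3 x 2 example, so ZF must return the transmitted symbols exactly.
H = [[1.0 + 0.3j, 0.5], [0.2j, 0.8 - 0.1j], [0.4, 0.1 + 0.6j]]
s = [complex(S, -S), complex(-S, S)]
x = [sum(h * v for h, v in zip(row, s)) for row in H]
s_hat, G = zf_detect(H, x)
amp = noise_amplification(G)
```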
B. Zero-Forcing BLAST (ZF-BLAST)
In , a successive interference cancellation technique based on the zero-forcing solution was proposed. Here, the signals are not detected in parallel, but one after another. Assume that layer $i$ yields the smallest estimation error or, equivalently, the largest signal-to-noise ratio (SNR) after linear nulling of the interference. From (4) and (5) it can be concluded that this layer is associated with the row $\mathbf{g}^{(i)}_{ZF}$ of $\mathbf{G}_{ZF}$ that has minimum Euclidean norm, because this vector causes the smallest noise enhancement. So, during the first step of the algorithm, only the decision statistic

$$\tilde{s}_i = \mathbf{g}^{(i)}_{ZF}\mathbf{x} = \mathbf{g}^{(i)}_{ZF}(\mathbf{H}\mathbf{s} + \mathbf{n}) = s_i + \eta_i \tag{6}$$

with the effective noise $\eta_i = \mathbf{g}^{(i)}_{ZF}\mathbf{n}$ is used to find an estimate $\hat{s}_i$ for the transmit signal $s_i$. The interference caused by this signal is then subtracted from the receive signal vector $\mathbf{x}$ and the $i$-th column is removed from the channel matrix, leading to a new system with only $n_T - 1$ transmit antennas. This procedure of nulling and cancelling is repeated for the reduced systems until all signals are detected.

Always choosing the layer with the best post-detection SNR minimizes the risk of error propagation. Moreover, in the absence of detection errors this ordering strategy maximizes the SNR of the weakest layer and is therefore optimal in the sense of minimum bit error probability .

The main computational bottleneck of the originally proposed V-BLAST algorithm is the calculation of a pseudo-inverse in each detection step. This can be avoided using one of the following schemes.
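The nulling-and-cancelling loop of ZF-BLAST can be sketched as follows, assuming QPSK and a full-column-rank channel. It deliberately recomputes the pseudo-inverse of the reduced channel in every step, which is exactly the bottleneck mentioned above; all function names are hypothetical.

```python
import math
import random

S = 1 / math.sqrt(2)
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def hermitian(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def pinv(H):
    """(H^H H)^{-1} H^H for a full-column-rank H."""
    Hh = hermitian(H)
    return matmul(inverse(matmul(Hh, H)), Hh)


def quantize(z):
    return complex(S if z.real >= 0 else -S, S if z.imag >= 0 else -S)


def zf_blast(H, x):
    """ZF-BLAST sketch: null with the minimum-norm row of the current pseudo-inverse,
    then cancel the detected layer; the pseudo-inverse is recomputed in every step."""
    x = list(x)
    active = list(range(len(H[0])))                      # layers not detected yet
    s_hat = [None] * len(H[0])
    order = []
    while active:
        Hk = [[row[j] for j in active] for row in H]     # detected columns removed
        G = pinv(Hk)
        norms = [sum(abs(g) ** 2 for g in row) for row in G]
        m = min(range(len(active)), key=norms.__getitem__)   # smallest noise enhancement
        i = active.pop(m)
        s_hat[i] = quantize(sum(g * xr for g, xr in zip(G[m], x)))   # nulling, eq. (6)
        x = [xr - H[r][i] * s_hat[i] for r, xr in enumerate(x)]     # cancelling
        order.append(i)
    return s_hat, order


rng = random.Random(3)
H = [[complex(rng.gauss(0, S), rng.gauss(0, S)) for _ in range(4)] for _ in range(4)]
s = [rng.choice(QPSK) for _ in range(4)]
x = [sum(h * v for h, v in zip(row, s)) for row in H]   # noiseless for a deterministic check
s_hat, order = zf_blast(H, x)
```

The list `order` records which layer was detected in which step; its first entry is the layer whose row of the full pseudo-inverse has minimum norm.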
C. Zero-Forcing BLAST with QR Decomposition (ZF-QRD)
It was shown in several publications (e.g. , , ) that the BLAST algorithm can be restated in terms of the QR decomposition of the channel matrix $\mathbf{H}$, i.e.

$$\mathbf{H} = \mathbf{Q}\mathbf{R} , \tag{7}$$

where the $n_R \times n_T$ matrix $\mathbf{Q}$ has orthogonal columns with unit norm and the $n_T \times n_T$ matrix $\mathbf{R}$ is upper triangular. By multiplying the received signal $\mathbf{x}$ with the Hermitian transpose of $\mathbf{Q}$, the sufficient statistic

$$\tilde{\mathbf{s}} = \mathbf{Q}^H\mathbf{x} = \mathbf{R}\mathbf{s} + \boldsymbol{\eta} \tag{8}$$

for the transmit vector $\mathbf{s}$ is obtained. Note that the statistical properties of the noise term $\boldsymbol{\eta} = \mathbf{Q}^H\mathbf{n}$ remain unchanged, since $\mathbf{Q}$ has orthonormal columns. Due to the upper triangular structure of $\mathbf{R}$, the $k$-th element of $\tilde{\mathbf{s}}$ is

$$\tilde{s}_k = r_{k,k} \cdot s_k + \sum_{i=k+1}^{n_T} r_{k,i} \cdot s_i + \eta_k . \tag{9}$$

Thus, $\tilde{s}_{n_T}$ is free of interference and can be used to estimate $s_{n_T}$ after appropriate scaling with $1/r_{n_T,n_T}$. Proceeding with $\tilde{s}_{n_T-1}, \ldots, \tilde{s}_1$ and assuming correct previous decisions, the interference can be perfectly cancelled in each step. It then follows from (9) that the SNR of layer $k$ is determined by the diagonal element $|r_{k,k}|^2$.

As already mentioned, the detection sequence is crucial due to the risk of error propagation. It can be modified by permuting the elements of $\mathbf{s}$ and the corresponding columns of $\mathbf{H}$ prior to the QR decomposition, leading to different matrices $\mathbf{Q}$ and $\mathbf{R}$. In order to find the optimum sequence, $|r_{k,k}|$, which represents the length of the component of the column vector $\mathbf{h}_k$ that is perpendicular to the space spanned by $\mathbf{h}_1, \ldots, \mathbf{h}_{k-1}$, needs to be maximized for $k = n_T, \ldots, 1$. This may be accomplished in a straightforward way by performing on the order of $n_T^2/2$ different QR decompositions of permutations of $\mathbf{H}$. A far more efficient approach is based on the easily verified relation

$$\mathbf{G}_{ZF} = \mathbf{H}^+ = \mathbf{R}^{-1}\mathbf{Q}^H \tag{10}$$

and the fact that the row norms of $\mathbf{G}_{ZF}$ equal those of $\mathbf{R}^{-1}$.
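The QR-based detection of (8) and (9) and the row-norm property behind (10) can be checked with the following sketch. It assumes QPSK, computes the QR decomposition by modified Gram-Schmidt, and detects the layers from $n_T$ down to 1 without any reordering (plain ZF-QRD); the helper names are hypothetical.

```python
import math
import random

S = 1 / math.sqrt(2)
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def hermitian(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def quantize(z):
    return complex(S if z.real >= 0 else -S, S if z.imag >= 0 else -S)


def qr_mgs(H):
    """Thin QR decomposition H = Q R, eq. (7), by modified Gram-Schmidt."""
    n_r, n_t = len(H), len(H[0])
    Q = [[complex(v) for v in row] for row in H]   # columns are orthonormalized in place
    R = [[0j] * n_t for _ in range(n_t)]
    for k in range(n_t):
        R[k][k] = complex(math.sqrt(sum(abs(Q[r][k]) ** 2 for r in range(n_r))))
        for r in range(n_r):
            Q[r][k] /= R[k][k]
        for l in range(k + 1, n_t):
            R[k][l] = sum(Q[r][k].conjugate() * Q[r][l] for r in range(n_r))
            for r in range(n_r):
                Q[r][l] -= R[k][l] * Q[r][k]
    return Q, R


def qrd_detect(Q, R, x):
    """Unsorted ZF-QRD: s~ = Q^H x (eq. (8)), then layers n_T, ..., 1 via eq. (9)."""
    n_t = len(R)
    y = [sum(Q[r][k].conjugate() * x[r] for r in range(len(x))) for k in range(n_t)]
    s_hat = [0j] * n_t
    for k in reversed(range(n_t)):
        interference = sum(R[k][i] * s_hat[i] for i in range(k + 1, n_t))  # detected layers
        s_hat[k] = quantize((y[k] - interference) / R[k][k])
    return s_hat


rng = random.Random(4)
n_r, n_t = 6, 4
H = [[complex(rng.gauss(0, S), rng.gauss(0, S)) for _ in range(n_t)] for _ in range(n_r)]
s = [rng.choice(QPSK) for _ in range(n_t)]
x = [sum(h * v for h, v in zip(row, s)) for row in H]   # noiseless
Q, R = qr_mgs(H)
s_hat = qrd_detect(Q, R, x)
Rinv = inverse(R)
G = matmul(inverse(matmul(hermitian(H), H)), hermitian(H))   # G_ZF via normal equations
norms_G = [sum(abs(v) ** 2 for v in row) for row in G]
norms_Rinv = [sum(abs(v) ** 2 for v in row) for row in Rinv]
```

Up to rounding, `norms_G` and `norms_Rinv` agree, because $\mathbf{Q}^H$ has orthonormal rows; this is what allows the ordering to be decided from $\mathbf{R}^{-1}$ alone.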
Keeping in mind that the signal $s_{n_T}$ is detected first and recalling the optimal ordering criterion from Section III-B, the last row of $\mathbf{R}^{-1}$ must have minimum norm. If necessary, rows of $\mathbf{R}^{-1}$ as well as the corresponding columns of $\mathbf{R}$ have to be exchanged, at the expense of destroying the upper triangular structure. However, by right-multiplying the permuted version of $\mathbf{R}^{-1}$ with a suitable unitary $n_T \times n_T$ Householder matrix $\boldsymbol{\Theta}$, a block triangular matrix is obtained. Finally, $\mathbf{Q}$ has to be updated to $\mathbf{Q}\boldsymbol{\Theta}$, while the permuted $\mathbf{R}$ is left-multiplied with $\boldsymbol{\Theta}^H$. These steps are then iterated for the upper left $(n_T-1) \times (n_T-1)$ submatrices of the modified matrices $\mathbf{R}^{-1}$ and $\mathbf{R}$ and the first $n_T - 1$ columns of the new matrix $\mathbf{Q}$, resulting in the QR decomposition of the optimally ordered channel matrix $\mathbf{H}$. In , a related algorithm that avoids explicit matrix inversions is presented, but the version described here will prove more useful later on.
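The ordering procedure just described can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: the initial QR decomposition is computed by modified Gram-Schmidt, $\mathbf{R}^{-1}$ is obtained by explicit inversion, and the Householder reflection is chosen so that the selected row of $\mathbf{R}^{-1}$ is mapped onto a multiple of the last unit vector. The names `householder_to_last` and `zf_post_sort` are hypothetical.

```python
import math
import random


def hermitian(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def qr_mgs(H):
    n_r, n_t = len(H), len(H[0])
    Q = [[complex(v) for v in row] for row in H]
    R = [[0j] * n_t for _ in range(n_t)]
    for k in range(n_t):
        R[k][k] = complex(math.sqrt(sum(abs(Q[r][k]) ** 2 for r in range(n_r))))
        for r in range(n_r):
            Q[r][k] /= R[k][k]
        for l in range(k + 1, n_t):
            R[k][l] = sum(Q[r][k].conjugate() * Q[r][l] for r in range(n_r))
            for r in range(n_r):
                Q[r][l] -= R[k][l] * Q[r][k]
    return Q, R


def householder_to_last(a):
    """Hermitian unitary Theta = I - 2 u u^H / (u^H u) with a @ Theta = [0, ..., 0, beta*]."""
    k = len(a)
    v = [z.conjugate() for z in a]                      # v = a^H
    norm_v = math.sqrt(sum(abs(z) ** 2 for z in v))
    phase = v[-1] / abs(v[-1]) if abs(v[-1]) > 0 else 1.0
    u = v[:-1] + [v[-1] + phase * norm_v]               # u = v - beta e_k, beta = -phase ||v||
    uu = sum(abs(z) ** 2 for z in u)
    return [[(1.0 if i == j else 0.0) - 2 * u[i] * u[j].conjugate() / uu for j in range(k)]
            for i in range(k)]


def zf_post_sort(H):
    """Optimal ZF ordering from one initial QR decomposition (sketch of Section III-C).
    Returns Q, R, perm with H[:, perm] = Q R; layer perm[-1] is detected first."""
    Q, R = qr_mgs(H)
    Rinv = inverse(R)
    n_t = len(R)
    perm = list(range(n_t))
    for k in range(n_t - 1, 0, -1):                     # leading block has size k + 1
        norms = [sum(abs(Rinv[i][j]) ** 2 for j in range(k + 1)) for i in range(k + 1)]
        m = min(range(k + 1), key=norms.__getitem__)    # minimum-norm row goes last
        Rinv[m], Rinv[k] = Rinv[k], Rinv[m]             # exchange rows of R^{-1} ...
        for row in R:                                   # ... and columns of R
            row[m], row[k] = row[k], row[m]
        perm[m], perm[k] = perm[k], perm[m]
        Theta = householder_to_last(Rinv[k][:k + 1])
        for mat in (Rinv, Q):                           # R^{-1} <- R^{-1} Theta, Q <- Q Theta
            for row in mat:
                head = row[:k + 1]
                row[:k + 1] = [sum(head[i] * Theta[i][j] for i in range(k + 1))
                               for j in range(k + 1)]
        top = [list(R[i]) for i in range(k + 1)]        # R <- Theta^H R (leading rows)
        for i in range(k + 1):
            R[i] = [sum(Theta[l][i].conjugate() * top[l][j] for l in range(k + 1))
                    for j in range(n_t)]
    return Q, R, perm


rng = random.Random(6)
H = [[complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(4)] for _ in range(4)]
Q, R, perm = zf_post_sort(H)
G = matmul(inverse(matmul(hermitian(H), H)), hermitian(H))   # for comparison only
row_norms = [sum(abs(g) ** 2 for g in row) for row in G]
```

After the loop, `R` is upper triangular again (up to rounding), and $|r_{n_T,n_T}|^2$ equals the inverse of the smallest squared row norm of $\mathbf{G}_{ZF}$, i.e. the largest post-detection SNR of the first detected layer.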
The computational effort consists of an initial QR decomposition, the inversion of $\mathbf{R}$, and the subsequent ordering, which is dominated by the multiplications of $\mathbf{R}^{-1}$, $\mathbf{R}$, and $\mathbf{Q}$ with the Householder matrix $\boldsymbol{\Theta}$ in each step. Although this is much cheaper than computing the pseudo-inverse over and over again as in the original ZF-BLAST, the effort can be reduced further: the next section reviews a suboptimal algorithm proposed by the authors  that requires only a single sorted QR decomposition.
D. Zero-Forcing Sorted QR Decomposition (ZF-SQRD)
In order to obtain the optimal detection order, first $|r_{n_T,n_T}|$ has to be maximized over all possible permutations of the columns of the channel matrix $\mathbf{H}$, followed by $|r_{n_T-1,n_T-1}|$, and so on. Unfortunately, standard algorithms for the QR decomposition compute the diagonal elements of $\mathbf{R}$ in exactly the opposite order, starting with $r_{1,1}$. This is what makes finding the optimal detection order difficult.
The sorted QR decomposition (SQRD) algorithm presented in  is basically an extension of the modified Gram-Schmidt procedure  that reorders the columns of the channel matrix prior to each orthogonalization step. The fundamental idea is that $|r_{k,k}|$ is minimized in the order it is computed (from 1 to $n_T$) instead of being maximized in the order of detection (from $n_T$ to 1). This is motivated by the fact that the layers detected last affect only few other layers through error propagation and may therefore have rather small SNRs. Since the product $\prod_{k=1}^{n_T} |r_{k,k}| = \sqrt{\det(\mathbf{H}^H\mathbf{H})}$ does not depend on the column ordering, small values for the layers detected last increase the probability of large SNRs for the layers detected first.
Now, $r_{1,1}$ is simply the norm of the column vector $\mathbf{h}_1$, so the first optimization in the SQRD algorithm consists merely of permuting the column of $\mathbf{H}$ with minimum norm to this position. During the following orthogonalization of the vectors $\mathbf{h}_2, \ldots, \mathbf{h}_{n_T}$ with respect to the normalized vector $\mathbf{h}_1$, the first row of $\mathbf{R}$ is obtained. Next, $r_{2,2}$ is determined in a similar fashion from the remaining $n_T - 1$ orthogonalized vectors, et cetera. Thereby, the channel matrix $\mathbf{H}$ is successively transformed into the matrix $\mathbf{Q}$ associated with the desired ordering, while the corresponding $\mathbf{R}$ is calculated row by row. Note that the column norms have to be calculated only once in the beginning and can easily be updated afterwards: after the $k$-th orthogonalization step, $|r_{k,i}|^2$ is simply subtracted from the squared norm of each remaining column $i$. Hence, the computational overhead due to sorting is negligible.
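The SQRD procedure can be sketched as a modified Gram-Schmidt loop with column pivoting on the smallest remaining norm, including the cheap norm update mentioned above. This is a minimal sketch assuming a full-column-rank channel; `sqrd` is a hypothetical name, and the returned permutation `perm` maps positions in $\mathbf{Q}\mathbf{R}$ back to the original layer indices.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def sqrd(H):
    """Sorted QR decomposition sketch: modified Gram-Schmidt in which the remaining
    column with minimum norm is orthogonalized next.
    Returns Q, R, perm with H[:, perm] = Q R (layer perm[-1] is detected first)."""
    n_r, n_t = len(H), len(H[0])
    Q = [[complex(v) for v in row] for row in H]
    R = [[0j] * n_t for _ in range(n_t)]
    perm = list(range(n_t))
    norms = [sum(abs(Q[r][j]) ** 2 for r in range(n_r)) for j in range(n_t)]  # computed once
    for i in range(n_t):
        k = min(range(i, n_t), key=norms.__getitem__)   # minimize |r_ii| in computation order
        for row in Q:                                   # exchange columns i and k
            row[i], row[k] = row[k], row[i]
        for row in R[:i]:
            row[i], row[k] = row[k], row[i]
        perm[i], perm[k] = perm[k], perm[i]
        norms[i], norms[k] = norms[k], norms[i]
        R[i][i] = complex(math.sqrt(norms[i]))
        for row in Q:
            row[i] /= R[i][i]
        for l in range(i + 1, n_t):
            R[i][l] = sum(row[i].conjugate() * row[l] for row in Q)
            for row in Q:
                row[l] -= R[i][l] * row[i]
            norms[l] -= abs(R[i][l]) ** 2               # norm update instead of recomputation
    return Q, R, perm


rng = random.Random(8)
H = [[complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(4)] for _ in range(4)]
Q, R, perm = sqrd(H)
```

The resulting `Q`, `R`, and `perm` can be used directly for successive detection as in (9); if the optimal order is required, the post-sorting of Section III-C can start from this decomposition instead of computing a new one.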
It should be emphasized that SQRD does not always lead to the optimal detection sequence, but in many cases of interest the performance degradation is small in view of the reduced complexity . Whenever SQRD fails to find the optimal order, the algorithm from Section III-C can be applied without having to calculate the initial QR decomposition again. In other words, the computational effort of the optimum algorithm can be decreased by using SQRD to perform a pre-ordering.
IV. MMSE DETECTION
The problem of noise enhancement caused by zero-forcing was already pointed out in Section III-A. Improved performance can be achieved by including the noise term in the design of the linear filter matrix $\mathbf{G}$. This is done by MMSE detection schemes, where the filter represents a trade-off between noise amplification and interference suppression.
A. Linear MMSE Detector (MMSE)
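Minimizing the mean squared error (MSE) between the actually transmitted symbols and the output of a linear detector leads to the filter matrix

$$\mathbf{G}_{MMSE} = \left(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{I}_{n_T}\right)^{-1}\mathbf{H}^H . \tag{11}$$

The resulting filter output is given by

$$\tilde{\mathbf{s}}_{MMSE} = \mathbf{G}_{MMSE}\mathbf{x} = \left(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{I}_{n_T}\right)^{-1}\mathbf{H}^H\mathbf{x} \tag{12}$$

and, after some manipulations, the error covariance matrix is found to be

$$\boldsymbol{\Phi}_{MMSE} = E\left\{(\tilde{\mathbf{s}}_{MMSE} - \mathbf{s})(\tilde{\mathbf{s}}_{MMSE} - \mathbf{s})^H\right\} = \sigma_n^2\left(\mathbf{H}^H\mathbf{H} + \sigma_n^2\mathbf{I}_{n_T}\right)^{-1} . \tag{13}$$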
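With the definition of an $(n_T+n_R) \times n_T$ extended channel matrix $\underline{\mathbf{H}}$ and an $(n_T+n_R) \times 1$ extended receive vector $\underline{\mathbf{x}}$ through

$$\underline{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \sigma_n\mathbf{I}_{n_T} \end{bmatrix} \quad \text{and} \quad \underline{\mathbf{x}} = \begin{bmatrix} \mathbf{x} \\ \mathbf{0}_{n_T,1} \end{bmatrix} , \tag{14}$$

the output of the MMSE filter given by (12) can be rewritten as

$$\tilde{\mathbf{s}}_{MMSE} = \left(\underline{\mathbf{H}}^H\underline{\mathbf{H}}\right)^{-1}\underline{\mathbf{H}}^H\underline{\mathbf{x}} = \underline{\mathbf{H}}^+\underline{\mathbf{x}} . \tag{15}$$

Furthermore, the error covariance matrix (13) becomes

$$\boldsymbol{\Phi}_{MMSE} = \sigma_n^2\left(\underline{\mathbf{H}}^H\underline{\mathbf{H}}\right)^{-1} = \sigma_n^2\,\underline{\mathbf{H}}^+\underline{\mathbf{H}}^{+H} . \tag{16}$$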
Comparing (15) and (16) to the corresponding zero-forcing expressions (4) and (5), the only difference is that the channel matrix $\mathbf{H}$ has been replaced by $\underline{\mathbf{H}}$ (and $\mathbf{x}$ by $\underline{\mathbf{x}}$). This observation is essential for incorporating the MMSE criterion into the previously discussed ZF algorithms.
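The equivalence between (12) and (15), and the form (16) of the error covariance, can be checked numerically. The sketch below, with hypothetical helper names, computes $\mathbf{G}_{MMSE}$ from (11), builds the extended quantities of (14), and compares both filter outputs; it also shows that $\mathbf{G}_{MMSE}$ coincides with the first $n_R$ columns of $\underline{\mathbf{H}}^+$, since the appended zeros of $\underline{\mathbf{x}}$ cancel the remaining columns.

```python
import math
import random


def hermitian(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def pinv(A):
    """(A^H A)^{-1} A^H for a full-column-rank A."""
    Ah = hermitian(A)
    return matmul(inverse(matmul(Ah, A)), Ah)


def mmse_filter(H, sigma2):
    """G_MMSE = (H^H H + sigma_n^2 I)^{-1} H^H, eq. (11)."""
    Hh = hermitian(H)
    A = matmul(Hh, H)
    for i in range(len(A)):
        A[i][i] += sigma2
    return matmul(inverse(A), Hh)


def extend(H, x, sigma2):
    """Extended channel matrix [H; sigma_n I] and receive vector [x; 0] of eq. (14)."""
    n_t, sigma = len(H[0]), math.sqrt(sigma2)
    lower = [[sigma if i == j else 0.0 for j in range(n_t)] for i in range(n_t)]
    return [list(row) for row in H] + lower, list(x) + [0.0] * n_t


rng = random.Random(9)
n_r, n_t, sigma2 = 4, 4, 0.2
H = [[complex(rng.gauss(0, 0.7), rng.gauss(0, 0.7)) for _ in range(n_t)] for _ in range(n_r)]
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_r)]   # arbitrary receive vector
G = mmse_filter(H, sigma2)
H_ext, x_ext = extend(H, x, sigma2)
P = pinv(H_ext)                                                   # pseudo-inverse of extended H
s_mmse = [sum(g * v for g, v in zip(row, x)) for row in G]         # eq. (12)
s_ext = [sum(p * v for p, v in zip(row, x_ext)) for row in P]      # eq. (15)
phi_diag = [sigma2 * sum(abs(p) ** 2 for p in row) for row in P]   # diagonal of eq. (16)
```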
At first sight, the MMSE extension of V-BLAST seems to follow from ZF-BLAST described in Section III-B by simply employing the filter matrix $\mathbf{G}_{MMSE}$ instead of $\mathbf{G}_{ZF}$, as proposed in . Although this yields the desired MMSE solution in each detection step, the order is not necessarily optimal. The reason is that the row norms of $\mathbf{G}$ only represent the noise amplification caused by filtering, which equals the mean estimation error only in the zero-forcing case. In contrast, after MMSE filtering some residual interference remains, which also affects the output signal. Therefore, the layer to be detected should be the one with the largest signal-to-interference-and-noise ratio (SINR), i.e. with the minimum estimation error. From (16) it follows that this layer corresponds to the row of the pseudo-inverse $\underline{\mathbf{H}}^+$ with minimum norm. As $\mathbf{G}_{MMSE}$ consists only of the first $n_R$ columns of $\underline{\mathbf{H}}^+$, the ordering criterion from  is in general suboptimal.
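A straightforward implementation of MMSE-BLAST would calculate the filter $\mathbf{G}_{MMSE}$ from (11) and the error covariance matrix $\boldsymbol{\Phi}_{MMSE}$ from (13) in each detection step for the respective reduced system. Fortunately, the similarity of ZF and MMSE detection noted above makes it possible to use the algorithm from Section III-C again. To this end, consider the QR decomposition of the extended channel matrix

$$\underline{\mathbf{H}} = \begin{bmatrix} \mathbf{H} \\ \sigma_n\mathbf{I}_{n_T} \end{bmatrix} = \underline{\mathbf{Q}}\,\mathbf{R} = \begin{bmatrix} \mathbf{Q}_1 \\ \mathbf{Q}_2 \end{bmatrix}\mathbf{R} , \tag{17}$$

where the $(n_T+n_R) \times n_T$ matrix $\underline{\mathbf{Q}}$ with orthonormal columns was partitioned into the $n_R \times n_T$ matrix $\mathbf{Q}_1$ and the $n_T \times n_T$ matrix $\mathbf{Q}_2$. Interestingly, the inverse matrix $\mathbf{R}^{-1}$ required to find the optimal detection sequence does not need to be calculated explicitly, because from (17) it follows that

$$\sigma_n\mathbf{I}_{n_T} = \mathbf{Q}_2\mathbf{R} \quad \Longleftrightarrow \quad \mathbf{R}^{-1} = \frac{1}{\sigma_n}\mathbf{Q}_2 , \tag{18}$$

i.e. the inverse is a byproduct of the initial QR decomposition. This exactly compensates for the additional computational effort caused by the additional rows of the extended channel matrix $\underline{\mathbf{H}}$. It seems that this relation has not been observed before. Furthermore, multiplying (17) from the left with $\underline{\mathbf{Q}}^H$ shows that

$$\underline{\mathbf{Q}}^H\underline{\mathbf{H}} = \mathbf{Q}_1^H\mathbf{H} + \sigma_n\mathbf{Q}_2^H = \mathbf{R} \tag{19}$$

holds. Using (18) and (19), the filtered receive vector becomes

$$\tilde{\mathbf{s}} = \underline{\mathbf{Q}}^H\underline{\mathbf{x}} = \mathbf{Q}_1^H\mathbf{x} = \mathbf{R}\mathbf{s} - \sigma_n^2\mathbf{R}^{-H}\mathbf{s} + \mathbf{Q}_1^H\mathbf{n} . \tag{20}$$

The second term on the right-hand side of (20), which contains the lower triangular matrix $\mathbf{R}^{-H}$, constitutes the residual interference that cannot be removed by successive interference cancellation. Since $\mathbf{Q}_2$ is proportional to the inverse of $\mathbf{R}$ and $\mathbf{Q}_1$ represents the actual filter matrix, the matrices $\mathbf{R}$, $\mathbf{R}^{-1}$, and $\mathbf{Q}$ encountered in the ZF-QRD algorithm only have to be replaced by $\mathbf{R}$, $\mathbf{Q}_2$, and $\mathbf{Q}_1$, respectively, in order to obtain the corresponding optimum MMSE solution.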
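Relations (18) to (20) can be verified numerically with a short sketch. It computes the QR decomposition of $\underline{\mathbf{H}}$ by modified Gram-Schmidt (an assumption; any QR routine with orthonormal $\underline{\mathbf{Q}}$ would do), splits $\underline{\mathbf{Q}}$ into $\mathbf{Q}_1$ and $\mathbf{Q}_2$, and compares both sides of (18) and (20) for one random channel, QPSK symbol vector, and noise realization. The helper names are hypothetical.

```python
import math
import random

S = 1 / math.sqrt(2)
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def hermitian(A):
    return [[A[r][c].conjugate() for r in range(len(A))] for c in range(len(A[0]))]


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def qr_mgs(H):
    """Thin QR decomposition by modified Gram-Schmidt."""
    n_r, n_t = len(H), len(H[0])
    Q = [[complex(v) for v in row] for row in H]
    R = [[0j] * n_t for _ in range(n_t)]
    for k in range(n_t):
        R[k][k] = complex(math.sqrt(sum(abs(Q[r][k]) ** 2 for r in range(n_r))))
        for r in range(n_r):
            Q[r][k] /= R[k][k]
        for l in range(k + 1, n_t):
            R[k][l] = sum(Q[r][k].conjugate() * Q[r][l] for r in range(n_r))
            for r in range(n_r):
                Q[r][l] -= R[k][l] * Q[r][k]
    return Q, R


rng = random.Random(5)
n_r, n_t, sigma2 = 4, 4, 0.2
sigma = math.sqrt(sigma2)
H = [[complex(rng.gauss(0, S), rng.gauss(0, S)) for _ in range(n_t)] for _ in range(n_r)]
s = [rng.choice(QPSK) for _ in range(n_t)]
n = [complex(rng.gauss(0, sigma * S), rng.gauss(0, sigma * S)) for _ in range(n_r)]
x = [sum(h * v for h, v in zip(row, s)) + ni for row, ni in zip(H, n)]

H_ext = [list(row) for row in H] + [[sigma if i == j else 0.0 for j in range(n_t)]
                                    for i in range(n_t)]
Q_ext, R = qr_mgs(H_ext)                        # eq. (17)
Q1, Q2 = Q_ext[:n_r], Q_ext[n_r:]
Rinv = inverse(R)                               # explicit inverse, only for the comparison

# (18): Q2 = sigma_n R^{-1}
err18 = max(abs(Q2[i][j] - sigma * Rinv[i][j]) for i in range(n_t) for j in range(n_t))

# (20): Q1^H x = R s - sigma_n^2 R^{-H} s + Q1^H n
Q1h, RinvH = hermitian(Q1), hermitian(Rinv)
lhs = [sum(q * v for q, v in zip(row, x)) for row in Q1h]
rhs = [sum(R[k][i] * s[i] for i in range(n_t))
       - sigma2 * sum(RinvH[k][i] * s[i] for i in range(n_t))
       + sum(q * v for q, v in zip(Q1h[k], n)) for k in range(n_t)]
err20 = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Both `err18` and `err20` are at the level of floating-point rounding, and `Q2` comes out upper triangular, as expected from (18).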
C. MMSE Sorted QR Decomposition (MMSE-SQRD)
Previous publications only treated the sorted QR decom-
position in the zero-forcing sense , . However, utilizing
the extended channel matrix H, the results from Section III-
D can be adopted to the MMSE case similar to the optimum
sorting procedure. Since H initially contains a multiple of the
identity matrix in the last nT rows, only the first nR+ k
rows are considered during the k-th step of the MMSE-
SQRD algorithm. This leads to an additional simplification.
Furthermore, it ensures the upper triangular structure of Q2.
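A possible implementation of MMSE-SQRD follows, written as the SQRD loop applied to $\underline{\mathbf{H}}$: the initial squared norms are $\|\mathbf{h}_j\|^2 + \sigma_n^2$, columns are exchanged only in the first $n_R + k - 1$ rows of the working matrix (so that its lower block keeps the structure $\sigma_n\mathbf{I}_{n_T}$), and the $k$-th orthogonalization step touches only the first $n_R + k$ rows. Detection then uses $\mathbf{Q}_1^H\mathbf{x}$ and back-substitution with $\mathbf{R}$, ignoring the residual interference in (20). This is a sketch under these assumptions; the function names are hypothetical and the row-exchange detail is one interpretation of the description above.

```python
import math
import random

S = 1 / math.sqrt(2)
QPSK = [complex(a, b) for a in (-S, S) for b in (-S, S)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def quantize(z):
    return complex(S if z.real >= 0 else -S, S if z.imag >= 0 else -S)


def mmse_sqrd(H, sigma2):
    """MMSE-SQRD sketch: sorted QR decomposition of the extended matrix [H; sigma_n I].
    Returns Q1, Q2, R, perm with [H[:, perm]; sigma_n I] = [Q1; Q2] R."""
    n_r, n_t = len(H), len(H[0])
    sigma = math.sqrt(sigma2)
    Q = [[complex(v) for v in row] for row in H] + \
        [[complex(sigma) if i == j else 0j for j in range(n_t)] for i in range(n_t)]
    R = [[0j] * n_t for _ in range(n_t)]
    perm = list(range(n_t))
    norms = [sum(abs(H[r][j]) ** 2 for r in range(n_r)) + sigma2 for j in range(n_t)]
    for i in range(n_t):                       # step k = i + 1 of the algorithm
        k = min(range(i, n_t), key=norms.__getitem__)
        # Exchange columns i and k in R (rows above i), perm, norms, and only in the
        # first n_R + i rows of Q; the rows below still hold the sigma_n * I pattern.
        for row in R[:i]:
            row[i], row[k] = row[k], row[i]
        for row in Q[:n_r + i]:
            row[i], row[k] = row[k], row[i]
        perm[i], perm[k] = perm[k], perm[i]
        norms[i], norms[k] = norms[k], norms[i]
        R[i][i] = complex(math.sqrt(norms[i]))
        rows = Q[:n_r + i + 1]                 # column i is zero below row n_R + i
        for row in rows:
            row[i] /= R[i][i]
        for l in range(i + 1, n_t):
            R[i][l] = sum(row[i].conjugate() * row[l] for row in rows)
            for row in rows:
                row[l] -= R[i][l] * row[i]
            norms[l] -= abs(R[i][l]) ** 2
    return Q[:n_r], Q[n_r:], R, perm


def mmse_sqrd_detect(H, x, sigma2):
    """Successive detection with Q1^H x (eq. (20)); the residual interference is ignored."""
    Q1, _, R, perm = mmse_sqrd(H, sigma2)
    n_t = len(R)
    y = [sum(Q1[r][k].conjugate() * x[r] for r in range(len(x))) for k in range(n_t)]
    z = [0j] * n_t
    for k in reversed(range(n_t)):
        z[k] = quantize((y[k] - sum(R[k][i] * z[i] for i in range(k + 1, n_t))) / R[k][k])
    s_hat = [0j] * n_t
    for pos, layer in enumerate(perm):         # undo the column permutation
        s_hat[layer] = z[pos]
    return s_hat


rng = random.Random(11)
sigma2 = 0.1
H_rand = [[complex(rng.gauss(0, S), rng.gauss(0, S)) for _ in range(4)] for _ in range(4)]
Q1, Q2, R, perm = mmse_sqrd(H_rand, sigma2)

# Well-conditioned, noiseless example for detection (residual interference stays small)
H = [[1.0, 0.3, 0.0, 0.1j], [0.2j, 0.9, 0.3, 0.0], [0.0, 0.1, 1.1, -0.2], [0.3, 0.0, 0.2j, 1.0]]
s = [QPSK[0], QPSK[1], QPSK[2], QPSK[3]]
x = [sum(h * v for h, v in zip(row, s)) for row in H]
s_hat = mmse_sqrd_detect(H, x, 0.01)
```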
V. PERFORMANCE ANALYSIS
In the sequel, we investigate the bit error rates (BER) for a MIMO system with $n_T = 4$ transmit and $n_R = 4$ receive antennas employing uncoded QPSK modulation. $E_b$ denotes the average energy per information bit arriving at the receiver; thus $E_b/N_0 = n_R/(\log_2(M)\,\sigma_n^2)$ holds. Fig. 2 shows the performance of various zero-forcing detection algorithms and the BER of maximum-likelihood (ML) detection. As expected, the successive detection algorithms outperform the linear ZF detector. The impact of an optimized detection order becomes obvious by comparing the unsorted ZF-QRD, ZF-SQRD, and ZF-BLAST (which achieves the optimum detection sequence). ZF-SQRD suffers a performance degradation of 1 dB compared to ZF-BLAST, as it does not always find the optimum order. This degradation decreases as the number of receive antennas increases; e.g., for a system with $n_R = 6$ the difference is only 0.5 dB at a BER of $10^{-5}$.

Fig. 2. Simulation with $n_T = 4$ and $n_R = 4$ antennas, uncoded QPSK symbols, spectral efficiency of 8 bit/s/Hz.
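The relation between $E_b/N_0$ and the noise variance can be inverted to set $\sigma_n^2$ for a given $E_b/N_0$ in dB when reproducing such simulations. A minimal sketch (the function name is hypothetical):

```python
import math


def noise_variance(ebn0_db, n_r, M):
    """sigma_n^2 from E_b/N_0 = n_R / (log2(M) * sigma_n^2), with E_b/N_0 given in dB."""
    ebn0 = 10 ** (ebn0_db / 10)
    return n_r / (math.log2(M) * ebn0)


sigma2 = noise_variance(10.0, n_r=4, M=4)   # (4,4) system with QPSK at 10 dB -> 0.2
```

For the (4,4) QPSK system, this gives $\sigma_n^2 = 4/(2 \cdot 10) = 0.2$ at $E_b/N_0 = 10$ dB.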
For the same system, Fig. 3 shows the performance of the MMSE detection algorithms. Compared with the ZF-BLAST algorithm, the successive MMSE detection procedures achieve a significant performance improvement. Up to an SNR of 10 dB, MMSE-SQRD achieves the same performance as the optimal MMSE-BLAST and also outperforms the detection scheme proposed in . In many cases of interest, MMSE-SQRD would be the first choice for implementation due to its reduced complexity. Note that for the (4,4) system, MMSE-SQRD fails to find the optimal detection sequence in only about 20% of all channel realizations. In these cases, the optimum ordering strategy can be applied afterwards, thereby closing the performance gap that grows with increasing SNR.

Fig. 3. Simulation with $n_T = 4$ and $n_R = 4$ antennas, uncoded QPSK symbols, spectral efficiency of 8 bit/s/Hz.
Fig. 4. BER per layer for a (4,4) system without error propagation (curves: first and last layer for MMSE-SQRD and MMSE-BLAST).
In Fig. 4, the BERs of the first and the last layer in the absence of error propagation (genie case) are displayed for MMSE-SQRD and MMSE-BLAST. The performance degradation is caused solely by the first detected layer, while the BER of the last layer is the same for both schemes. Hence, MMSE-SQRD may be appropriate as an initial stage in iterative schemes.
VI. CONCLUSIONS
We have reviewed several detection methods for V-BLAST architectures. First, the ZF criterion was employed. For successive algorithms, the importance of the detection sequence was pointed out. Based on a QR decomposition of the channel matrix, a way to find the optimal ordering without repeated calculation of pseudo-inverses was described. Additionally, a very efficient sorting strategy previously proposed by the authors was explained. Using an equivalence relation, these results were adapted to the MMSE criterion, leading from the previously known ZF-SQRD to the new MMSE-SQRD scheme. This algorithm performs a sorted QR decomposition that can be used for subsequent detection. In those cases where MMSE-SQRD does not find the optimal order, a reordering can easily be applied, resulting in an optimum algorithm with reduced complexity.
REFERENCES
E. Telatar, “Capacity of Multi-antenna Gaussian Channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595,
P. W. Wolniansky, G. J. Foschini, G. D. Golden, and R. A. Valenzuela, “V-BLAST: An Architecture for Realizing Very High Data Rates Over the Rich-Scattering Wireless Channel,” in Proc. ISSSE, Pisa, Italy,
D. Wübben, J. Rinas, R. Böhnke, V. Kühn, and K. D. Kammeyer, “Efficient Algorithm for Detecting Layered Space-Time Codes,” in Proc. ITG Conference on Source and Channel Coding, Berlin, Germany, January 2002, pp. 399–405.
D. Wübben, R. Böhnke, J. Rinas, V. Kühn, and K. D. Kammeyer, “Efficient Algorithm for Decoding Layered Space-Time Codes,” IEE Electronics Letters, vol. 37, no. 22, pp. 1348–1350, October 2001.
A. Benjebbour, H. Murata, and S. Yoshida, “Comparison of Ordered Successive Receivers for Space-Time Transmission,” in Proc. IEEE Vehicular Technology Conference (VTC), USA, Fall 2001.
B. Hassibi, “An Efficient Square-Root Algorithm for BLAST,” in Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, Istanbul, Turkey, June 2000, pp. 5–9.
S. Verdu, Multiuser Detection, 2nd ed. Cambridge, U.K.: Cambridge University Press, 1998.
J. Silverstein and Z. Bai, “On the Empirical Distribution of Eigenvalues of a Class of Large Dimensional Random Matrices,” Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175–192, 1995.
E. Biglieri, G. Taricco, and A. Tulino, “Decoding Space-Time Codes With BLAST Architectures,” IEEE Transactions on Signal Processing, vol. 50, no. 10, pp. 2547–2551, October 2002.
G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, “Simplified Processing for High Spectral Efficiency Wireless Communications Employing Multi-Element Arrays,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 11, pp. 1841–1852, November
G. Strang, Linear Algebra and its Applications, 3rd ed. Florida: Harcourt Brace Jovanovich College Publishers, 1988.
S. Bäro, G. Bauch, A. Pavlic, and A. Semmler, “Improving BLAST Performance using Space-Time Block Codes and Turbo Decoding,” in Proc. IEEE Globecom 2000, San Francisco, CA, November 2000.