Research Article

A Low-Complexity Decision Feedforward Equalizer Architecture for High-Speed Receivers on Highly Dispersive Channels

Ariel L. Pola,1 Juan E. Cousseau,2 Oscar E. Agazzi,3 and Mario R. Hueda1

1 Laboratorio de Comunicaciones Digitales, Universidad Nacional de Córdoba, CONICET, Avenida Vélez Sarsfield 1611, Córdoba X5016GCA, Argentina
2 Universidad Nacional del Sur, IIEE, CONICET, Avenida Alem 1253, Bahía Blanca B8000CPB, Argentina
3 ClariPhy Communications, Inc., 7585 Irvine Center Drive, Suite 100, Irvine, CA 92618, USA

Correspondence should be addressed to Ariel L. Pola; arielpola@gmail.com

Received 10 December 2012; Revised 14 February 2013; Accepted 18 February 2013

Academic Editor: Antonio G. M. Strollo

This paper presents an improved decision feedforward equalizer (DFFE) for high-speed receivers in the presence of highly dispersive channels. This decision-aided equalization technique has recently been proposed for multigigabit communication receivers, where the use of parallel processing is mandatory. Well-known parallel architectures for the typical decision feedback equalizer (DFE) have a complexity that grows exponentially with the channel memory. Instead, the new DFFE avoids that exponential increase in complexity by using tentative decisions to iteratively cancel the intersymbol interference (ISI). Here, we demonstrate that the DFFE not only achieves performance similar to that of the typical DFE but also reduces the complexity in channels with large memory. Additionally, we propose a theoretical approximation for the error probability in each iteration. In fact, as the number of iterations increases, the error probability of the DFFE approaches that of the DFE. These benefits make the DFFE an excellent choice for the next generation of high-speed receivers.

1. Introduction

Future generations of communication systems will operate at multigigabit-per-second data rates on highly dispersive channels [1, 2]. In commercial applications, the digital receiver is often implemented as a monolithic chip in CMOS technology [1]. The maximum clock frequency of state-of-the-art complex digital signal processors in 28 nm CMOS technology is limited to frequencies lower than 1 GHz. Therefore, in order to achieve multigigabit-per-second data rates, parallel processing techniques are required [1].

Maximum likelihood sequence detection (MLSD) and decision feedback equalization (DFE) are two efficient techniques used to compensate the high ISI introduced by such channels as the ones described in [3]. The complexity of the former grows exponentially with the channel memory, regardless of whether parallel processing is used or not. As for the latter, although the complexity of serial implementations grows linearly with channel memory, all presently known parallel processing implementations require that the bottleneck created by the feedback loop be broken using techniques like the ones proposed by [4–6], whose complexity again grows exponentially with the channel memory.

Some algorithms to deal with the drawbacks of the DFE in high-speed applications and parallel processing have been proposed by [4–12]. For example, parallel DFE architectures based on look-ahead pipelined multiplexer loops have been introduced in [6, 7]. These architectures can mitigate the speed limitation of feedback loops by using nested multiplexer loops, an implementation of which is reported in [10]. Some further improvements to these schemes have been proposed in [8, 9]. However, the implementation complexity of DFE parallel architectures based on look-ahead pipelined multiplexer loops still increases exponentially with the number of feedback taps. Recent works [11, 12] present the concurrent look-ahead technique for high-speed data rates. This scheme reduces the hardware complexity in comparison with the look-ahead pipelined multiplexer loop technique, but the decision loop is not broken.

Iterative interference cancellation and turbo equalization have received increasing attention in recent years [13]. For example, iterative cancellation is proposed in [14–17], where nonlinear equalizers for ISI channels are introduced. This technique uses an iterative algorithm to successively cancel ISI from a block of received data. The algorithm generates symbol decisions whose reliability increases monotonically with each iteration. To the best of the authors' knowledge, these techniques have so far not been applied to create efficient pipelined and parallel-processing implementations of equalizer structures for ultra-high-speed applications, despite their interesting characteristics. Therefore, the application of both DFE and MLSD is limited to moderate ISI channels. As a consequence, there is a need for reduced-complexity receivers which can operate efficiently on channels with large ISI.

A preliminary study of a new low-complexity iterative equalization architecture for high-speed receivers is introduced in [18]. The decision feedforward equalizer (DFFE) achieves performance similar to that of the DFE with a parallelizable architecture whose complexity increases only quadratically with the channel memory. For channels with large ISI this results in a dramatic complexity reduction compared with the DFE. The central idea behind the DFFE is the iteration of tentative decisions to improve the accuracy of the ISI estimation. We would like to highlight that tentative decisions have been used in the past to cancel FEXT interference [19].

Finally, the error probability in the DFE has been widely discussed in the literature with numerous authors who develop different methods to estimate the error probability in DFE [20–24].

In this work, we explain the concept of the DFFE and the implementation complexity of its parallel architectures. Moreover, we propose a theoretical approximation for the error probability in each iteration, from which it is easy to see that, as the number of iterations increases, the error probability of the DFFE approaches that of the DFE.

This paper is organized as follows. The concept of DFFE is explained in Section 2. In Section 3 the performance is evaluated. Section 4 analyzes parallel architectures for the DFFE and their implementation complexity. Finally, conclusions are drawn in Section 5.

2. Decision Feedforward Equalization (DFFE)

To begin with, we will explain the concept of DFFE. For simplicity, we only consider a dispersive channel with postcursor ISI. Our results can be generalized to channels with both pre- and postcursor ISI by combining the DFFE with a feedforward equalizer [3]. Let $y_n$, $\hat{a}_n^{(i)}$, and $L$ be the DFFE input sample, the tentative decision at the $i$th iteration, and the memory of the channel, respectively. At the first iteration, $i = 0$, we get the first tentative decision without any cancellation of interference:

$$\hat{a}_n^{(0)} = \mathcal{Q}(y_n),$$

where $\mathcal{Q}(\cdot)$ is the slicer function. This tentative decision can be then used to cancel the postcursor ISI introduced by the first past symbol and thus to improve the accuracy of the detection. By using proper time delays, we can obtain the tentative decision at the second iteration as follows:

$$\hat{a}_n^{(1)} = \mathcal{Q}(y_n - f_1(\hat{a}_{n-1}^{(0)})),$$

where $f_k(\cdot)$ with $0 < k \leq L$ denotes the partial postcursor ISI caused by the past $k$ symbols. This process is repeated at least until $L$ consecutive tentative decisions are available. At this point, a final decision can be obtained from

$$\hat{a}_n = \hat{a}_n^{(L)} = \mathcal{Q}(y_n - f_L(\hat{a}_{n-1}^{(L-1)}, \ldots, \hat{a}_{n-L}^{(0)})),$$

where $f_L(\cdot)$ is the total postcursor ISI of the channel. Based on an information theory metric [25], in this work we show that the reliability of the tentative decision $\hat{a}_n^{(i)}$ improves as the number of iterations $i$ grows. In this way, both the accuracy of the interference estimate and the performance of the DFFE are improved with the number of iterations. Numerical results derived from computer simulations demonstrate that the DFFE can achieve performance similar to the DFE on highly dispersive channels. Furthermore, since tentative decisions are used instead of final decisions to estimate the postcursor ISI, it is possible to implement the DFFE in a feedforward way, which leads to a direct parallel implementation. We show that the computational complexity of the DFFE grows quadratically with $L$. This results in a drastic complexity reduction in comparison to parallel architectures for the DFE, where the computational load grows exponentially with $L$. This favorable tradeoff between performance and complexity makes the DFFE an excellent alternative for implementing high-speed receivers in transmissions over highly dispersive channels.

As we expressed above, the iterative use of tentative decisions to estimate the postcursor ISI is the key to DFFE. In the following section, we use the mutual information [25] to show how the iterations impact the reliability of the tentative decisions. In addition, we study the DFFE performance in transmissions over channels with high memory.

2.1. Architecture of DFFE. The received sample is given by

$$y_n = a_n + \sum_{k=1}^{L} a_{n-k} d_k + z_n,$$

where $d_k$ with $k = 1, \ldots, L$ is the postcursor ISI tap, $a_n$ is the transmitted symbol (e.g., $a_n \in \{\pm 1\}$), and $z_n$ is white Gaussian noise with power $\sigma^2$. Assuming that the channel is known at the receiver (i.e., perfect channel estimation), the detected symbol provided by the DFFE at instant $n$ given by (3) can be rewritten as

$$\hat{a}_n = \hat{a}_n^{(R-1)} = \mathcal{Q}(y_n - \sum_{k=1}^{L} \hat{a}_{n-k}^{(R-1-k)} d_k),$$
where $R$ with $R > L$ is the total number of iterations. The first $L$ tentative decisions are calculated iteratively as follows:

$$
\hat{a}_n^{(i)} = \mathcal{Q} \left( y_n - \sum_{k=1}^{i} \hat{a}_{n-k}^{(i-k)} d_k \right), \quad 1 \leq i < L,
$$

with $\hat{a}_n^{(0)} = \mathcal{Q}(y_n)$ for $i = 0$.
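The iteration described above can be sketched in a few lines of Python. This is only an illustrative serial model for 2-PAM, not the hardware architecture of Figure 1: the function names `slicer`, `dffe`, and `channel` are hypothetical, and symbols before the start of the block are assumed to contribute no ISI.

```python
def slicer(x):
    """Binary slicer Q(.) for 2-PAM symbols in {-1, +1}."""
    return 1 if x >= 0 else -1


def dffe(y, d, R):
    """Return the tentative decisions of all R iterations.

    y : list of received samples y_n
    d : list of postcursor taps [d_1, ..., d_L]
    R : total number of iterations (R > L)
    Row i of the result holds the decisions of iteration i; row R-1 is final.
    Assumption: symbols before the start of the block contribute no ISI.
    """
    L = len(d)
    a_hat = [[slicer(v) for v in y]]          # iteration 0: no cancellation
    for i in range(1, R):
        row = []
        for n in range(len(y)):
            isi = 0.0
            # Cancel up to min(i, L) past symbols, using the decision of
            # iteration i-k for the symbol at time n-k.
            for k in range(1, min(i, L) + 1):
                if n - k >= 0:
                    isi += d[k - 1] * a_hat[i - k][n - k]
            row.append(slicer(y[n] - isi))
        a_hat.append(row)
    return a_hat


def channel(a, d):
    """Noise-free postcursor channel: y_n = a_n + sum_k d_k a_{n-k}."""
    return [a[n] + sum(d[k - 1] * a[n - k]
                       for k in range(1, len(d) + 1) if n - k >= 0)
            for n in range(len(a))]
```

For example, `dffe(channel(a, d), d, R=len(d) + 1)[-1]` returns the final decisions for a symbol list `a`; additive noise can be introduced by perturbing the output of `channel` before equalization.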

Figure 1 shows the architecture of the DFFE for a channel with memory $L = 3$ and $R = 5$. Note that the final decision $\hat{a}_n = \hat{a}_n^{(4)}$ uses past tentative decisions to estimate the postcursor interference, and not previous final decisions as in the DFE. As we will show later, this fact allows the direct parallel implementation of the DFFE.
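Because only tentative decisions are fed forward, each final decision depends on a finite window of input samples. The following sketch checks this numerically under illustrative assumptions (2-PAM, hypothetical function names, samples outside a window ignored): the decision at time $n$ computed from its own window of $R$ samples matches the decision of a serial run over the whole block, so that consecutive windows could in principle be processed by independent lanes.

```python
def slicer(x):
    """Binary slicer for 2-PAM symbols in {-1, +1}."""
    return 1 if x >= 0 else -1


def dffe_final(y, d, R):
    """Final DFFE decisions for a block y (samples before the block are ignored)."""
    L = len(d)
    prev = [[slicer(v) for v in y]]           # iteration 0
    for i in range(1, R):
        row = [slicer(y[n] - sum(d[k - 1] * prev[i - k][n - k]
                                 for k in range(1, min(i, L) + 1) if n - k >= 0))
               for n in range(len(y))]
        prev.append(row)
    return prev[-1]


def dffe_windowed(y, d, R):
    """Compute every output from its own window of at most R samples.

    The windows are independent of each other, which is the property a
    P-way parallel implementation relies on.
    """
    out = []
    for n in range(len(y)):
        start = max(0, n - (R - 1))
        out.append(dffe_final(y[start:n + 1], d, R)[-1])
    return out
```

The windowed form recomputes overlapping samples and is therefore wasteful in software; it is shown only to make the absence of a decision feedback loop explicit.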

2.2. Reliability of the Tentative Decisions. Next, we analyze the mutual information between the transmit symbol $a_n$ and the tentative decision at the $i$th iteration, $\hat{a}_n^{(i)}$, defined by

$$
I \left( a_n, \hat{a}_n^{(i)} \right) = H(a_n) - H(a_n | \hat{a}_n^{(i)}),
$$

where $H(\cdot)$ and $H(\cdot | \cdot)$ denote entropy and conditional entropy, respectively [25]. Note that $I(a_n, \hat{a}_n^{(i)})$ is the information on $a_n$ contained in $\hat{a}_n^{(i)}$. For example, for binary transmit symbols, $I(a_n, \hat{a}_n^{(i)}) = 1$ indicates that no error occurs in the tentative decisions (i.e., $\Pr(\hat{a}_n^{(i)} = a_n) = 1$). On the other hand, in the presence of a high error rate in the tentative decisions (i.e., $\Pr(\hat{a}_n^{(i)} \neq a_n) = 1/2$), the mutual information becomes $I(a_n, \hat{a}_n^{(i)}) = 0$. Thus, it can be concluded that the mutual information (7) provides a measure of the reliability of the tentative decision $\hat{a}_n^{(i)}$.
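As a simple numerical illustration of this reliability measure, the sketch below evaluates the mutual information under a simplifying assumption that is not part of the analysis above: equiprobable binary symbols and decision errors that are symmetric and independent of the transmitted value, so that $I = 1 - H_b(p)$ with $H_b$ the binary entropy function and $p$ the error probability. The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_information(p_err):
    """I(a; a_hat) in bits, assuming equiprobable symbols and symmetric errors."""
    return 1.0 - binary_entropy(p_err)
```

Under this model the mutual information equals one bit for error-free decisions and falls to zero when the decisions are wrong half of the time.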

2.3. Numerical Results. Figure 2(a) depicts the mutual information versus the signal-to-noise ratio (SNR), defined as $\text{SNR} = E(|a_n|^2)/\sigma^2$. We consider $a_n \in \{ \pm 1 \}$ and a postcursor ISI channel modeled as

$$
d_k = \begin{cases} 
\alpha^k, & 0 < k \leq L, \\
0, & \text{otherwise}
\end{cases}
$$

with $\alpha$ being a positive number smaller than one. In Figure 2(a) we consider $\alpha = 0.6$ with $L = 10$ and a DFFE with $R = 11$ iterations.

Figure 2: Reliability of the DFFE tentative decisions. (a) Mutual information versus SNR for $\alpha = 0.6$, $L = 10$, and $R = 11$. (b) Mutual information versus number of iterations for different postcursor channels with SNR = 15 dB.

3. Performance Evaluation

From (4) and (5), the slicer input signal at the $i$th iteration, $y_{n}^{(i)}$, can be expressed as

$$
y_{n}^{(i)} = \begin{cases}
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k} + z_{n}, & i = 0, \\
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k} - \sum_{k=1}^{i} \hat{a}_{n-k}^{(i-k)} d_{k} + z_{n}, & 0 < i < L, \\
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k} - \sum_{k=1}^{L} \hat{a}_{n-k}^{(i-k)} d_{k} + z_{n}, & i \geq L.
\end{cases}
$$

(9)

Let $\Psi_{n}^{(i)}$ be the DFFE-state vector at the $i$th iteration defined by

$$
\Psi_{n}^{(i)} = \begin{cases}
(a_{n-1}, a_{n-2}, \ldots, a_{n-L}), & i = 0, \\
(a_{n-1}, a_{n-2}, \ldots, a_{n-L}, \hat{a}_{n-1}^{(i-1)}, \hat{a}_{n-2}^{(i-2)}, \ldots, \hat{a}_{n-i}^{(0)}), & 0 < i < L, \\
(a_{n-1}, a_{n-2}, \ldots, a_{n-L}, \hat{a}_{n-1}^{(i-1)}, \hat{a}_{n-2}^{(i-2)}, \ldots, \hat{a}_{n-L}^{(i-L)}), & i \geq L.
\end{cases}
$$

(10)

Let $N_i$ denote the dimension of the state vector $\Psi_{n}^{(i)}$. Thus, observe that

$$
\Psi_{n}^{(i)} \in \left\{\psi^{(i,0)}, \psi^{(i,1)}, \ldots, \psi^{(i,2^{N_i}-1)}\right\},
$$

(11)

where $\psi^{(i,0)} = (+1, +1, \ldots, +1)$, $\psi^{(i,1)} = (+1, +1, \ldots, -1)$, $\ldots$, $\psi^{(i,2^{N_i}-1)} = (-1, -1, \ldots, -1)$ are $N_i$-dimensional vectors. The slicer input signal at the $i$th iteration given by (9) can be rewritten as

$$
y_{n}^{(i)} = g(a_{n}, \Psi_{n}^{(i)}) + z_{n},
$$

(12)

where

$$
g(a_{n}, \Psi_{n}^{(i)}) = \begin{cases}
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k}, & i = 0, \\
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k} - \sum_{k=1}^{i} \hat{a}_{n-k}^{(i-k)} d_{k}, & 0 < i < L, \\
a_{n} + \sum_{k=1}^{L} a_{n-k} d_{k} - \sum_{k=1}^{L} \hat{a}_{n-k}^{(i-k)} d_{k}, & i \geq L.
\end{cases}
$$

(13)

Then, the probability density function (pdf) given the transmit symbol $a_{n}$ can be expressed as

$$
f_{y|a}(y_{n}^{(i)} \mid a_{n}) = \sum_{k=0}^{2^{N_i}-1} f_{y|a,\Psi}(y_{n}^{(i)} \mid a_{n}, \psi^{(i,k)}) P(\psi^{(i,k)}),
$$

(14)
where \( P(\psi^{(i,k)}) = \Pr \{ \Psi_n^{(i)} = \psi^{(i,k)} \} \) and
\[
f_{y|a,\Psi}(y_n^{(i)} \mid a_n, \psi^{(i,k)}) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(1/2\sigma^2)(y_n^{(i)} - g(a_n, \psi^{(i,k)}))^2}.
\]
(15)

The symbol error probability at the \( i \)th iteration is
\[
\begin{aligned}
P_e^{(i)} &= \Pr \{ y_n^{(i)} < 0 \mid a_n = +1 \} \Pr \{ a_n = +1 \} \\
&\quad + \Pr \{ y_n^{(i)} \geq 0 \mid a_n = -1 \} \Pr \{ a_n = -1 \}.
\end{aligned}
\]
(16)

Note that \( \Pr \{ y_n^{(i)} < 0 \mid a_n = +1 \} \) and \( \Pr \{ y_n^{(i)} \geq 0 \mid a_n = -1 \} \) can be computed by using the pdf given by (14).
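For the first iteration ($i = 0$) the state consists only of the $L$ past transmit symbols, so the mixture pdf can be evaluated by direct enumeration. The sketch below does this for iid equiprobable 2-PAM symbols; it is a minimal illustration with hypothetical function names, and it exploits the symmetry of the two terms in the symbol error probability, so only the case $a_n = +1$ is enumerated.

```python
import math
from itertools import product


def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_iteration0(d, sigma):
    """Symbol error probability of iteration 0 (no ISI cancellation).

    d     : postcursor taps [d_1, ..., d_L]
    sigma : noise standard deviation
    Assumes iid equiprobable symbols in {-1, +1}.
    """
    L = len(d)
    total = 0.0
    for past in product((+1, -1), repeat=L):   # all 2^L states, each with probability 2^-L
        g = 1.0 + sum(dk * ak for dk, ak in zip(d, past))  # mean of y_n given a_n = +1
        total += q_func(g / sigma)             # Pr{y_n < 0 | a_n = +1, state}
    return total / 2 ** L
```

The enumeration grows as $2^L$, so this direct form is only practical for short channels; it is meant as a check of the expressions rather than as a design tool.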

3.1. Example. In the following equations we consider a postcursor channel with \( L = 1 \) and \( d_1 = 1 \) (i.e., a duobinary channel). At the first iteration, we get
\[
\Psi_n^{(0)} = (a_{n-1}),
\]
(17)
\[
g(a_n, \Psi_n^{(0)}) = a_n + a_{n-1}.
\]
(18)

Note that \( N_0 = 1 \) and
\[
\Psi_n^{(0)} \in \{ \psi^{(0,0)}, \psi^{(0,1)} \}
\]
(19)
with \( \psi^{(0,0)} = (+1) \) and \( \psi^{(0,1)} = (-1) \). The transmit symbols are assumed independent and identically distributed with
\[
\Pr \{ a_n = +1 \} = \Pr \{ a_n = -1 \} = \frac{1}{2} \quad \forall n.
\]
(20)

In this situation, from (17) and (20) note that
\[
P(\psi^{(0,k)}) = \frac{1}{2}, \quad k = 0, 1.
\]
(21)

The error probability \( P_e^{(0)} \) can be derived from (16) and
\[
f_{y|a}(y_n^{(0)} \mid a_n) = \frac{1}{2} \sum_{k=0}^{1} f_{y|a,\Psi}(y_n^{(0)} \mid a_n, \psi^{(0,k)}).
\]
(22)

At the second iteration, we get
\[
\Psi_n^{(1)} = (a_{n-1}, \hat{a}_{n-1}^{(0)}),
\]
(23)
\[
g(a_n, \Psi_n^{(1)}) = a_n + a_{n-1} - \hat{a}_{n-1}^{(0)}.
\]
(24)

In this case, notice that \( N_1 = 2 \) and
\[
\Psi_n^{(1)} \in \{ \psi^{(1,0)}, \psi^{(1,1)}, \psi^{(1,2)}, \psi^{(1,3)} \}
\]
(25)
with \( \psi^{(1,0)} = (+1, +1), \psi^{(1,1)} = (+1, -1), \psi^{(1,2)} = (-1, +1), \) and \( \psi^{(1,3)} = (-1, -1) \). From (20) and (23), we get
\[
\begin{aligned}
\Pr \{ \Psi_n^{(1)} \} &= \Pr \{ a_{n-1}, \hat{a}_{n-1}^{(0)} \} \\
&= \Pr \{ \hat{a}_{n-1}^{(0)} \mid a_{n-1} \} \Pr \{ a_{n-1} \} \\
&= \frac{1}{2} \Pr \{ \hat{a}_{n-1}^{(0)} \mid a_{n-1} \}.
\end{aligned}
\]
(26)

Since
\[
\Pr \{ \hat{a}_{n-1}^{(0)} \neq a_{n-1} \mid a_{n-1} \} = P_e^{(0)},
\]
(27)
with \( P_e^{(0)} \) being the symbol error probability of the first iteration, the probability (26) results in
\[
P(\psi^{(1,0)}) = P(\psi^{(1,3)}) = \frac{1}{2} \left( 1 - P_e^{(0)} \right),
\]
(28)
\[
P(\psi^{(1,1)}) = P(\psi^{(1,2)}) = \frac{1}{2} P_e^{(0)}.
\]
(29)

Generalizing, for \( i > 0 \) it is possible to show that
\[
P(\psi^{(i,0)}) = P(\psi^{(i,3)}) = \frac{1}{2} \left( 1 - P_e^{(i-1)} \right),
\]
(30)
\[
P(\psi^{(i,1)}) = P(\psi^{(i,2)}) = \frac{1}{2} P_e^{(i-1)}.
\]
(31)

On the other hand, taking into account that
\[
\begin{aligned}
g(a_n, \psi^{(i,0)}) &= g(a_n, \psi^{(i,3)}) = a_n, \\
g(a_n, \psi^{(i,1)}) &= a_n + 2, \\
g(a_n, \psi^{(i,2)}) &= a_n - 2,
\end{aligned}
\]
(32)

it is possible to verify that
\[
\begin{aligned}
f_{y|a,\Psi}(y_n^{(i)} \mid a_n, \psi^{(i,0)}) &= f_{y|a,\Psi}(y_n^{(i)} \mid a_n, \psi^{(i,3)}) \\
&= \frac{1}{\sqrt{2\pi}\sigma} e^{-(1/2\sigma^2)(y_n^{(i)} - a_n)^2}.
\end{aligned}
\]
(33)

Thus, at high SNR (i.e., \( 1/\sigma \gg 1 \)), from (19)–(33) it is possible to show that
\[
\begin{aligned}
P_e^{(i)} &= \frac{1}{2} \left[ \Pr \{ y_n^{(i)} < 0 \mid a_n = +1 \} + \Pr \{ y_n^{(i)} \geq 0 \mid a_n = -1 \} \right] \\
&= \left( 1 - P_e^{(i-1)} \right) Q \left( \frac{1}{\sigma} \right) + \frac{1}{2} P_e^{(i-1)} \left[ 1 - Q \left( \frac{1}{\sigma} \right) \right] + \frac{1}{2} P_e^{(i-1)} Q \left( \frac{3}{\sigma} \right) \\
&\approx Q \left( \frac{1}{\sigma} \right) + \frac{1}{2} P_e^{(i-1)},
\end{aligned}
\]
(34)

where
\[
Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-t^2/2} dt.
\]
(35)

Operating on the recursive form of the error probability (34), it is simple to verify that
\[
P_e^{(i)} \approx 2Q \left( \frac{1}{\sigma} \right), \quad i \gg 1.
\]
(36)
Since the error probability of the DFE with error propagation is given by [3]

\[ P_e^{\text{DFE}} = 2^L Q\left(\frac{1}{\sigma}\right), \]  

(37)

from (36) we can conclude that, for a sufficiently large number of iterations, the performance of the DFFE in the presence of a duobinary channel (i.e., \( L = 1 \)) approaches that achieved by the DFE with error propagation. As we shall show later, the proper number of iterations depends strongly on both the noise power and the channel dispersion. Finally, we note that the conclusions derived from this example can be extended to channels with memory \( L > 1 \).
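The convergence of the duobinary recursion can be checked numerically. The sketch below is a hedged illustration rather than the authors' evaluation code: it iterates the mixture form of the error probability for $L = 1$, $d_1 = 1$, in which the previous tentative decision is correct with probability $1 - P_e^{(i-1)}$ (slicer mean $a_n$) and wrong with probability $P_e^{(i-1)}$ (slicer mean $a_n \pm 2$ with equal probability). It assumes that the error in the previous decision is independent of the current noise sample; the function names are hypothetical.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_duobinary(sigma, iterations):
    """Error probability per iteration for the duobinary example (L = 1, d_1 = 1)."""
    q1 = q_func(1.0 / sigma)
    q3 = q_func(3.0 / sigma)
    pe = 0.25 + 0.5 * q_func(2.0 / sigma)      # iteration 0: no cancellation
    history = [pe]
    for _ in range(iterations):
        # correct previous decision, wrong with mean a_n - 2, wrong with mean a_n + 2
        pe = (1.0 - pe) * q1 + 0.5 * pe * (1.0 - q1) + 0.5 * pe * q3
        history.append(pe)
    return history
```

Each iteration roughly halves the distance to the limit, so the error probability settles close to $2Q(1/\sigma)$ after a few tens of iterations at high SNR.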

3.2. Simulation Results. A theoretically based estimation of the error probability provides an effective tool for designing the DFFE parameters. The design process is simple and consists of two main steps.

(i) Estimate the number of taps for the feedforward and feedback filters according to the expected channel response (similarly to the design of the DFE).

(ii) Estimate the number of the DFFE iterations based on performance evaluation. This task can be also achieved by using computer simulations. As initial point, set \( R = L + 1 \).

Figure 3 shows the contour of the BER as a function of the SNR and the iteration number. In this case, we use a postcursor ISI channel defined by \( d_k = \alpha^k, 0 < k \leq L \) with \( \alpha = 0.5, L = 6, \) and \( R = 20 \). We can observe that the performance of the DFFE is practically the same for all iterations beyond the sixth. Therefore, we conclude that the DFFE with \( R = L + 1 \) achieves the same performance as the traditional DFE, as can be verified from Figure 4. For the DFFE, note the excellent agreement between the values derived from computer simulations and the theoretical prediction given by (16).

The performance of the DFE and an adaptive DFFE with \( R = L + 1 \) iterations in the presence of different dispersive channels is evaluated in Figure 5. We consider four channels: \( \alpha = 0.6, 0.82, 0.92, \) and \( 0.95 \) with \( L = 10, 30, 60, \) and \( 100, \) respectively. The adaptive DFFE has been implemented with the least mean square (LMS) algorithm [3] by using the final decision to estimate the error signal. In all cases, it can be observed that DFFE and DFE achieve essentially the same performance. This result agrees with the theoretical analysis presented in the Appendix, where the impact of imperfect channel estimation is analyzed.

### 4. Parallel Implementation and Complexity

#### 4.1. Parallel-Processing DFFE Architecture

As mentioned in Section 1, the DFFE breaks the bottleneck created by the feedback loop of the DFE using tentative decisions in a feedforward fashion. This enables pipelined implementations which are able to operate at high clock rates. Moreover, parallel processing can be used to further increase the throughput and achievable data rate of the DFFE-based receiver. A $P$-way parallel implementation is shown in Figure 6. Using this architecture, the data rate and throughput may be increased by a factor $P$ with growth in complexity linear in $P$.

#### 4.2. Complexity of DFFE

Table 1 shows the numbers of adders, registers, and multiplexers for the DFFE, computed under the following assumptions. The multipliers shown in Figure 1 were considered to be 2-to-1 multiplexers (it is assumed that both the positive and negative values of the coefficients $d_k$ are available), which is a correct assumption for binary decisions with values ±1 (e.g., 2-pulse amplitude modulation (PAM) [3]). The number of adders for the DFFE was estimated assuming that the basic building block is a two-input adder.

Table 2 presents a comparison of the complexity of the DFFE with the DFE architectures proposed in [4, 7, 9, 10]. The numbers of adders and 2-to-1 multiplexers for the parallel DFE schemes were extracted from [4, 7, 9], while the number of registers was estimated based on their architectures. Figure 7 shows the numbers of the three types of components as functions of the number of feedback taps. The most important difference between the DFFE and the DFE proposals considered is that the former does not use look-ahead techniques or multiplexer loops, and this reduces the implementation complexity. In all the cases, the benefits of the DFFE are evident in the presence of highly dispersive channels (i.e., $L \gg 1$). A comparison of the complexity for $M$-PAM is shown in Table 3. We observe that the DFFE still provides a significant reduction of complexity with respect to the DFE architectures [7, 9]. (In $M$-PAM, multiplication operations are achieved by using $M - 1$ 2-to-1 muxes.) This conclusion can be extended to $M$-QAM, where the complexity of both DFE and DFFE is approximately two times the one obtained with $M$-PAM.

Table 1: Complexity of the parallel DFFE.

| Component | Number |
| --- | --- |
| Adders | $L(R - L/2 - 1/2)P$ |
| Registers | $((R - 1)R/2 + (R - L)(L + 1)L/2 + (L^2 - 1)L/6)P$ |
| 2-to-1 Mux | $L(R - L/2 - 1/2)P$ |
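The component counts listed for the DFFE can be evaluated directly. The following is a small sketch with a hypothetical function name, `dffe_components`; integer arithmetic is used because each term of the expressions is an integer for integer $L$, $R$, and $P$.

```python
def dffe_components(L, R, P):
    """Adders, registers, and 2-to-1 muxes of a P-way parallel DFFE (2-PAM)."""
    adders = L * (2 * R - L - 1) * P // 2            # L (R - L/2 - 1/2) P
    registers = ((R - 1) * R // 2
                 + (R - L) * (L + 1) * L // 2
                 + (L * L - 1) * L // 6) * P
    muxes = adders                                    # same expression as the adders
    return adders, registers, muxes


def total_components(L, R, P):
    """Sum of the three component counts (an assumed definition of 'components')."""
    return sum(dffe_components(L, R, P))
```

With $R = L + 1$, the adder count reduces to $L(L+1)P/2$ and the register count is dominated by $L^3 P/6$ for large $L$. Under the assumption that the number of components is the sum of the three counts, the totals for ($P = 32$, $L = 10$) and ($P = 32$, $L = 30$) relative to ($P = 16$, $L = 5$) come out at about 9.62 and 158.9, in line with the component column reported in Table 4.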

#### 4.3. VLSI Implementation

We consider an application-specific integrated circuit (ASIC) implementation of the proposed DFFE in a 10 Gb/s 2-PAM receiver. The DFFE architecture was successfully synthesized (i.e., no timing issues) by using 28 nm CMOS technology with standard voltage threshold (SVT) transistors for $L = 5/10/30$, $P = 16$ ($f_{\text{clock}} = 625.0$ MHz), and $P = 32$ ($f_{\text{clock}} = 312.5$ MHz) with $R = L + 1$ iterations. Multiplication operations were implemented by using 2-to-1 multiplexers. The number of bits of the input samples ($N_i$) and taps ($N_j$) has been derived from computer simulations for the different postcursor channels (i.e., $L = 5/10/30$). We used $N_i = 7$ and $N_j = 7$ for $L = 5$ and 10. For $L = 30$, the number of bits of the input samples was increased to $N_i = 8$ (see Figure 8). Adders were implemented with carry propagation, thus $N_i + \log_2(L)$ bits are required to represent the sample at the slicer input. Finally, the slicer uses the MSB of the input sample to control the muxes in order to select the positive or negative coefficient.

Table 4 shows the total number of cells and components normalized to the values of $P = 16$ and $L = 5$. Note that these results agree very well with the expected values derived from the complexity analysis developed in Section 4.2; that is, the complexity increases linearly with the parallelization factor $P$ and quadratically with the channel memory $L$.

#### 4.4. Analysis of the Critical Path

The speed of the different DFE architectures is related to their critical paths. The existing parallel DFE architectures of [4, 9, 10] are faster than the DFFE. However, they are not considered for a speed comparison because of their prohibitively high implementation complexity in the presence of channels with high ISI ($L \gg 1$). On the other hand, the critical path of the less complex DFE solution proposed in [7] is given by $T_{\text{DFE-7}} \approx (1/(L/2+1))T_{\text{add}} + \log_2(M)T_{\text{mux}}$ for $M$-PAM, where $T_{\text{mux}}$ and $T_{\text{add}}$ are the multiplexer and adder delays, respectively. Note that $T_{\text{DFE-7}}$ is nearly independent of the channel memory $L$ for $L \gg 1$. For example, for 28 nm CMOS technology, $T_{\text{mux}} \approx 0.05$ ns and $T_{\text{add}} \approx 0.10$ ns; therefore, the maximum data rates with $P = 1$ for 2-PAM and 4-PAM are $\sim 17.8$ and $\sim 18.8$ Gb/s, respectively.

The critical path for the DFFE is shown in Figure 1. Notice that the delay of the critical path, given by $T_{\text{DFFE}} \approx LT_{\text{add}} + \log_2(M)T_{\text{mux}}$, increases linearly with the channel memory. As shown in Section 4.3, no timing issues have been observed with $L = 30$ and $P = 32$ for 2-PAM with $f_{\text{clock}} = 312.5$ MHz by using 28 nm CMOS technology. Thus, the maximum data rates achieved by the DFFE for 2-PAM and 4-PAM are 10 and $\sim 20$ Gb/s, respectively (since $L \gg 1$ and $T_{\text{mux}} < T_{\text{add}}$, note that $T_{\text{DFFE}}$ is dominated by the term $LT_{\text{add}}$).
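The delay figures quoted in this subsection can be reproduced with a short calculation. The sketch below uses the two critical-path expressions and the quoted delays of 0.10 ns (adder) and 0.05 ns (multiplexer); the function names are hypothetical, and the choice $L = 30$ for the DFE of [7] is an assumption that happens to reproduce the quoted rates of about 17.8 and 18.8 Gb/s.

```python
import math


def dfe7_rate_gbps(L, M, t_add=0.10, t_mux=0.05):
    """Data rate (Gb/s) of the DFE of [7] with P = 1, delays given in ns."""
    t_crit = t_add / (L / 2 + 1) + math.log2(M) * t_mux
    return math.log2(M) / t_crit


def dffe_critical_path_ns(L, M, t_add=0.10, t_mux=0.05):
    """Approximate DFFE critical path in ns."""
    return L * t_add + math.log2(M) * t_mux


def parallel_rate_gbps(P, f_clock_mhz, M):
    """Data rate (Gb/s) of a P-way parallel receiver clocked at f_clock_mhz."""
    return P * f_clock_mhz * math.log2(M) / 1e3
```

For $L = 30$ the estimated DFFE path is 3.05 ns for 2-PAM and 3.10 ns for 4-PAM, both below the 3.2 ns clock period at 312.5 MHz, which is consistent with the 10 and 20 Gb/s figures for $P = 32$.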

Table 2: Complexity comparison between parallel DFFE and DFE architectures for 2-PAM with \( R = L + 1 \) for \( L \gg 1 \).

<table>
<thead>
<tr>
<th>Component</th>
<th>DFFE (this work)</th>
<th>DFE [4]</th>
<th>DFE [7]</th>
<th>DFE [9]</th>
<th>DFE [10]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Adders</td>
<td>\( L^2P/2 \)</td>
<td>\( 2^3P \)</td>
<td>\( 2^{L/2+1}P \)</td>
<td>\( 2^3P \)</td>
<td>\( 2^{L+1}P \)</td>
</tr>
<tr>
<td>Registers</td>
<td>\( L^3P/6 \)</td>
<td>\( \sim 2^4P \)</td>
<td>\( \sim 2^{L/2}(P+1) \)</td>
<td>\( L^2 + 2^P \)</td>
<td>\( (2^L + L)P \)</td>
</tr>
<tr>
<td>2-to-1 multiplexers</td>
<td>\( L^2P/2 \)</td>
<td>\( (2^L - 1)P \)</td>
<td>\( (2^{L/2} - 1)2P \)</td>
<td>\( 2^L(P - L/2 + P/L - 1) \)</td>
<td>\( 2^L \)</td>
</tr>
</tbody>
</table>

Table 3: Complexity comparison between parallel DFFE and DFE architectures for M-PAM with \( R = L + 1 \) for \( L \gg 1 \).

<table>
<thead>
<tr>
<th>Component</th>
<th>DFFE (this work)</th>
<th>DFE [7]</th>
<th>DFE [9]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Adders</td>
<td>\( L^2P/2 \)</td>
<td>\( M^{L/2}2P \)</td>
<td>\( M^4P \)</td>
</tr>
<tr>
<td>Registers</td>
<td>\( L^3P/6 \)</td>
<td>\( \sim M^{L/2}(P+1) \)</td>
<td>\( L^2 + M^4P \)</td>
</tr>
<tr>
<td>2-to-1 multiplexers</td>
<td>\( (M - 1)L^2P/2 \)</td>
<td>\( (M^{L/2} - 1)2P \)</td>
<td>\( M^4L(P - L/2 + P/L - 1) \)</td>
</tr>
</tbody>
</table>

Table 4: Synthesis results for parallel DFFE architecture for 2-PAM and \( R = L + 1 \) with 28 nm CMOS technology.

<table>
<thead>
<tr>
<th>\( f_{\text{clock}} \) (MHz)</th>
<th>\( P \)</th>
<th>\( L \)</th>
<th>Number of cells</th>
<th>Number of components</th>
</tr>
</thead>
<tbody>
<tr>
<td>625.0</td>
<td>16</td>
<td>5</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>625.0</td>
<td>16</td>
<td>10</td>
<td>5.19</td>
<td>4.81</td>
</tr>
<tr>
<td>312.5</td>
<td>32</td>
<td>5</td>
<td>1.96</td>
<td>2.00</td>
</tr>
<tr>
<td>312.5</td>
<td>32</td>
<td>10</td>
<td>10.03</td>
<td>9.62</td>
</tr>
<tr>
<td>312.5</td>
<td>32</td>
<td>30</td>
<td>180.65</td>
<td>159.00</td>
</tr>
</tbody>
</table>

\(^\dagger\) The total number of cells and components normalized to the values of \( P = 16 \) and \( L = 5 \).

Consequently, the impact of increasing the constellation size (2 \( \rightarrow \) 4) on the critical path of the DFFE will be small. On the other hand, for \( L = 30 \) the relative complexity of the DFE [7] with \( P = 1 \) (\( \propto 2M^{L/2} \)) with respect to the DFFE with \( P = 32 \) (\( \propto 32(M - 1)L^2 \)) is (a) \( 2 \times 2^{30/2}/(32 \times 30^2) = 2.28 \) for 2-PAM and (b) \( 2 \times 4^{30/2}/(32 \times 3 \times 30^2) = 2.49 \times 10^4 \) for 4-PAM. Therefore, the DFFE is able to provide high data rates (e.g., >10 Gb/s) by using existing CMOS technology with an implementation complexity lower than that of the least complex parallel DFE proposed in [7].
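The two complexity ratios quoted for $L = 30$ follow from simple arithmetic, as the sketch below shows. The function name `relative_complexity` is hypothetical; the two proportionality expressions, $2M^{L/2}$ for the DFE of [7] with $P = 1$ and $32(M-1)L^2$ for the DFFE with $P = 32$, are the ones used in the comparison.

```python
def relative_complexity(M, L=30, P_dffe=32):
    """Ratio of the DFE [7] complexity (P = 1) to the DFFE complexity (P = P_dffe)."""
    dfe7 = 2 * M ** (L // 2)              # proportional to 2 M^(L/2); L assumed even
    dffe = P_dffe * (M - 1) * L ** 2      # proportional to P (M - 1) L^2
    return dfe7 / dffe
```

Evaluating the function for $M = 2$ and $M = 4$ gives approximately 2.28 and $2.49 \times 10^4$, respectively.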

5. Conclusions

In this paper we have proposed and analyzed the DFFE, a low-complexity iterative equalization architecture for high-speed receivers which uses tentative decisions in a feedforward way to estimate postcursor ISI. This central feature lends itself well to a simple parallel implementation, resulting in a reduction of complexity. Using typical examples, we have shown that the DFFE achieves performance similar to that of the DFE architecture. Moreover, we have proposed a theoretical approximation to estimate the error probability which allows us to demonstrate that the DFFE reaches the same performance as the DFE when the number of iterations increases. These advantages make the DFFE an attractive choice for high-speed receivers required to operate over highly dispersive channels. Furthermore, owing to the DFFE flexibility, the architecture can be combined with traditional linear feedforward equalizers or the Viterbi algorithm (VA) [3] to compensate channel impairments in the presence of both pre- and postcursor ISI.

Appendix

Impact of Imperfect Channel Estimation

Since the DFFE is an attractive solution in the presence of channels with high ISI (i.e., \( L \gg 1 \)), it is possible to show that the impact of an imperfect channel estimation is similar in both equalizers, that is, DFE and DFFE. The received input sample \( y_n \) can be expressed as

\[
y_n = a_n + \sum_{k=1}^{L} a_{n-k} d_k + z_n, \tag{A.1}
\]

where \( d_k \) with \( k = 1, \ldots, L \) is the postcursor ISI tap, \( a_n \) is the transmitted symbol, and \( z_n \) is white Gaussian noise with power \( \sigma^2 \). The signal (A.1) can be rewritten as

\[
y_n = a_n + \sum_{k=1}^{L} a_{n-k} \tilde{d}_k + \sum_{k=1}^{L} a_{n-k} \Delta_k + z_n, \tag{A.2}
\]

where \( \tilde{d}_k \) and \( \Delta_k \) denote the taps estimated at the receiver and the error estimation, respectively (i.e., \( d_k = \tilde{d}_k + \Delta_k \)). Since \( L \gg 1 \) and symbols \( a_n \) are assumed independent and identically distributed (iid), from the central limit theorem note that the term

\[
r_n = \sum_{k=1}^{L} a_{n-k} \Delta_k \tag{A.3}
\]

can be modeled as a zero mean Gaussian random variable with variance \( \sigma_r^2 \). Therefore, the signal at the input of the receiver with imperfect channel estimation can be seen as

\[
y_n = a_n + \sum_{k=1}^{L} a_{n-k} \tilde{d}_k + \bar{z}_n, \tag{A.4}
\]

where

\[
\bar{z}_n = r_n + z_n \tag{A.5}
\]

is zero mean Gaussian noise with power $\sigma_r^2 + \sigma^2$. Thus, from (A.4) and (A.5) we can conclude that the impact of the imperfect channel estimation on the performance of DFE and DFFE will be similar.

Figure 7: Number of adders, registers, and 2-to-1 multiplexers versus the number of feedback taps $L$, for the parallel DFFE with $R = L + 1$ and DFE architectures proposed in [4, 7, 9, 10]. Parallelization factor: $P = 16$. Modulation format: 2-PAM.
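For iid equiprobable symbols $a_n \in \{\pm 1\}$, the variance of the residual term $r_n$ in (A.3) equals $\sum_{k=1}^{L} \Delta_k^2$, which gives the extra noise power added by the estimation error. The sketch below verifies this by exhaustive enumeration for a short error vector; the function name and the example values of $\Delta_k$ are illustrative assumptions, not values from this work.

```python
from itertools import product


def residual_isi_variance(delta):
    """Exact variance of r_n = sum_k a_{n-k} Delta_k for iid equiprobable +/-1 symbols."""
    values = [sum(a * dk for a, dk in zip(signs, delta))
              for signs in product((-1, 1), repeat=len(delta))]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical tap estimation errors, used only for illustration
example_delta = [0.02, -0.01, 0.005]
example_variance = residual_isi_variance(example_delta)
```

The enumeration covers all $2^L$ sign patterns, so it is only practical for small $L$; for the long channels considered here the closed form $\sum_k \Delta_k^2$ would be used instead.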

Acknowledgments

This paper has been supported in part by the ANPCyT (PICT2008-1256, PRH-203), Fundación Tarpuy, and Fundación Fulgor.

References


