All content in this area was uploaded by Elad Alon on Jul 30, 2015
A 60Gb/s 173mW Receiver Frontend in 65nm CMOS Technology
Jaeduk Han¹, Yue Lu², Nicholas Sutardja¹, Kwangmo Jung¹, Elad Alon¹
¹University of California, Berkeley, CA 94704, USA; ²Qualcomm Atheros Inc., San Jose, CA 95110, USA
Abstract
This paper presents a 65nm CMOS 60Gb/s receiver frontend
incorporating a CTLE, FFE, DFE, output slicers, and clock
generation and distribution circuits. Current integration
along with cascode gain control is used to maintain equalizer
linearity under varying gain without increasing power
consumption. Interleaved deserializing slicers achieve the high
gain required for adaptive error sampling. The receiver
operates error free over >1e12 bits at 60Gb/s, occupies
0.16mm², and consumes 173mW.
Introduction
The tremendous growth in data traffic has placed
increasingly stringent demands on high-speed wireline
communication systems. Since the total power budget assigned
to I/O remains fixed, the energy-efficiency of such high-speed
transceivers must be improved as their data-rates are increased
to the 40-60Gb/s range. Several groups have demonstrated
56-80Gb/s DFE and CTLE circuits [1-3] to address these needs.
However, in addition to high-speed equalization, support
for equalizer adaptation and clocking must ultimately be
included, and integrating multiple equalizer types (including
an FFE) with these functions while retaining energy
efficiency is critical. This work (Fig. 1) therefore presents a
60Gb/s receiver frontend that includes all of these components
while dissipating a total of 173mW.
Receiver Architecture and Circuit Implementation
Multiple types of equalizers are necessary to achieve
sufficient signal quality over a realistic channel operating at
60+Gb/s. Thus, in this design, three types of equalizers are
combined to cancel various types of ISI, as shown in Fig. 2.
First, the input ports are connected to a CTLE that removes
long-tail ISI and de-multiplexes the input signal for the
half-rate operation of subsequent stages. After the
CTLE+DMUX, a 2-tap FFE and 3-tap DFE are utilized to
cancel pre-cursor and post-cursor ISI, respectively.
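As a rough illustration of this division of labor, the sketch below applies a 2-tap FFE and a 3-tap DFE to a made-up, baud-spaced pulse response. The pulse values, the variable names, and the zero-forcing choice of the FFE tap are assumptions for illustration only; they are not taken from the measured channel or from the adaptation used on chip, and the CTLE is not modeled.

```python
import numpy as np

# Hypothetical baud-spaced pulse response (illustrative values, not measured):
# one pre-cursor, the main cursor, then three post-cursors.
pulse = np.array([0.10, 1.00, 0.45, 0.20, 0.10])
cursor = 1  # index of the main cursor in `pulse`

def eye_height(p, cur):
    """Worst-case vertical eye opening: cursor minus the sum of |ISI| terms."""
    isi = np.delete(p, cur)
    return p[cur] - np.sum(np.abs(isi))

# 2-tap FFE: the main tap acts on the one-UI-delayed sample and the extra tap
# on the undelayed sample, so the extra tap can cancel the first pre-cursor.
c_pre = -pulse[cursor - 1] / pulse[cursor]   # zero-forcing choice (assumption)
after_ffe = np.convolve(pulse, [c_pre, 1.0])
cursor_ffe = cursor + 1                      # the FFE delays the cursor by 1 UI

# 3-tap DFE: subtract the first three post-cursors, assuming correct decisions.
dfe_taps = after_ffe[cursor_ffe + 1:cursor_ffe + 4].copy()
after_dfe = after_ffe.copy()
after_dfe[cursor_ffe + 1:cursor_ffe + 4] -= dfe_taps

print("eye before equalization:", eye_height(pulse, cursor))
print("eye after FFE          :", eye_height(after_ffe, cursor_ffe))
print("eye after FFE + DFE    :", eye_height(after_dfe, cursor_ffe))
```

In this toy model the FFE removes the first pre-cursor at the cost of a small residual term one UI earlier, while the DFE removes the three post-cursors without amplifying noise.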
In order to improve upon the energy-efficiency offered by
earlier efforts at these data-rates, this design makes extensive
use of current-integration techniques [4] (as opposed to
resistively loaded stages). For example, the CTLE integrator in
Fig. 3(a) uses a source-degenerated input pair, with clocked
cascode devices for demultiplexing. Note that the resetting
behavior of such current-integration stages is often not
desirable for proper operation of subsequent stages. Dynamic
latches are therefore inserted after the CTLE to convert the
return-to-zero signals to non-return-to-zero, and to provide a
set of UI-delayed signals to the FFE.
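The behavior described above can be sketched with a first-order model. An ideal current integrator with transconductance gm and load capacitance C develops a gain of gm*T/C over an integration window T, and a latch that tracks during evaluation and holds during reset turns the return-to-zero output into a non-return-to-zero one. The gm and C values, the 50% duty cycle, and the function names are assumptions chosen for illustration, not design values from this work.

```python
DATA_RATE = 60e9                 # b/s, from the text
UI = 1.0 / DATA_RATE             # unit interval, about 16.7 ps
F_CLK = DATA_RATE / 2            # half-rate clock, 30 GHz
T_INT = 0.5 / F_CLK              # integration window, assuming 50% duty cycle

def integrator_gain(gm, c_load, t_int):
    """Ideal integrator: delta_V / V_in = gm * t_int / c_load."""
    return gm * t_int / c_load

def integrate_and_reset(values):
    """RZ waveform: each evaluated value is followed by a reset to zero."""
    out = []
    for v in values:
        out += [v, 0.0]
    return out

def latch_hold(rz):
    """Latch that tracks during evaluation and holds during reset."""
    held, out = 0.0, []
    for i, v in enumerate(rz):
        if i % 2 == 0:           # evaluation phase: latch follows the integrator
            held = v
        out.append(held)         # reset phase: latch keeps the last value
    return out

# Hypothetical gm = 5 mS and C = 20 fF.
print("integration gain:", integrator_gain(5e-3, 20e-15, T_INT))
rz = integrate_and_reset([0.2, -0.1, 0.15])
print("RZ :", rz)
print("NRZ:", latch_hold(rz))
```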
An integration summer after the latches (Fig. 3(b)) replaces
the power-hungry continuous-time summer in [1] and also
includes a 2-tap FFE to correct pre-cursor ISI. The clocked
current source in the main integration branch turns off the tail
currents during the reset phase, thus reducing the power
consumption further. A key challenge in integrating an FFE
into the receiver is that FFEs require a mechanism to realize
tunable analog gain while retaining a sufficiently linear
transfer function across all such gain settings. Thus, unlike
in a DFE, one generally cannot set the magnitude of an equalizer
coefficient simply by changing the tail current of a branch in
the circuit, since this would result in low overdrive and hence
poor linearity at low gain. This design therefore instead varies
the cascode gate voltage bias of a cascoded differential pair
(Fig. 3(b)) to achieve variable gain; linearity over the gain
control range is maintained since the overdrive of the input
pair remains roughly constant. Common-mode (CM) control is
critical to ensure the functionality of the following stage, and
thus pull-up current sources [4] (to maintain CM during
integration) as well as a resistor DAC (to lower the CM during
reset) are utilized for this purpose.
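The linearity argument can be made concrete with a long-channel square-law model, which is only a rough guide in a 65nm process. In that model gm scales as the square root of the tail current and so does the overdrive, so a low-gain setting reached by starving the tail current also collapses the overdrive. The cascode path is modeled here purely abstractly as a gain factor applied with the input-pair bias untouched; the device parameter BETA, the full-scale current, and the function names are hypothetical.

```python
import math

BETA = 20e-3          # hypothetical mu*Cox*W/L of each input device, A/V^2
I_TAIL_FULL = 1e-3    # hypothetical full-scale tail current, A

def overdrive(i_tail, beta=BETA):
    """Square-law overdrive with I_D = I_tail/2: V_ov = sqrt(I_tail / beta)."""
    return math.sqrt(i_tail / beta)

def gm(i_tail, beta=BETA):
    """Square-law transconductance with I_D = I_tail/2: gm = sqrt(beta * I_tail)."""
    return math.sqrt(beta * i_tail)

def tail_current_scaling(gain_fraction):
    """Set gain via tail current: gm ~ sqrt(I), so I must scale as fraction**2."""
    i_tail = I_TAIL_FULL * gain_fraction ** 2
    return gm(i_tail), overdrive(i_tail)

def cascode_scaling(gain_fraction):
    """Abstract cascode control: input-pair bias fixed, output gain scaled."""
    return gain_fraction * gm(I_TAIL_FULL), overdrive(I_TAIL_FULL)

for frac in (1.0, 0.5, 0.25):
    g_t, vov_t = tail_current_scaling(frac)
    g_c, vov_c = cascode_scaling(frac)
    print(f"gain {frac:4.2f}: tail-scaled V_ov = {vov_t * 1e3:6.1f} mV, "
          f"cascode-scaled V_ov = {vov_c * 1e3:6.1f} mV")
```

Both approaches reach the same effective gm in this model, but only the tail-current approach shrinks the overdrive in proportion to the gain.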
Since it has extremely stringent feedback latency
requirements, the first post-cursor tap is cancelled in a separate
dynamic-latch based stage [1] following the integrating
FFE+DFE. The dynamic latch should ideally be transparent
when the integrator output is fully evaluated; accordingly,
simulations showed that delaying the DFE clock relative to the
integrator clock by ~3ps gave the best overall performance
(Fig. 4(b)). Because this delay is small compared to the bit
period, it was implemented simply with the passive RC network
shown in Fig. 4(a).
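To give a feel for the numbers, the sketch below treats the passive network as a single-pole RC low-pass driven by a 30GHz sinusoid and solves for the RC product that yields a 3ps phase delay. The single-pole model and the sinusoidal clock are simplifying assumptions; the actual network in Fig. 4(a) and its component values are not given in the text.

```python
import math

F_CLK = 30e9            # half-rate clock frequency, Hz
UI = 1.0 / 60e9         # unit interval at 60 Gb/s, s
TARGET_DELAY = 3e-12    # delay quoted in the text, s

def rc_phase_delay(rc, f=F_CLK):
    """Phase delay of a single-pole RC low-pass at frequency f."""
    w = 2 * math.pi * f
    return math.atan(w * rc) / w

def rc_for_delay(t_d, f=F_CLK):
    """RC product that gives phase delay t_d at frequency f (t_d < 1/(4f))."""
    w = 2 * math.pi * f
    return math.tan(w * t_d) / w

def rc_attenuation(rc, f=F_CLK):
    """Amplitude ratio |H(j*2*pi*f)| of the same single-pole low-pass."""
    return 1.0 / math.sqrt(1.0 + (2 * math.pi * f * rc) ** 2)

rc = rc_for_delay(TARGET_DELAY)
print(f"delay as a fraction of one UI: {TARGET_DELAY / UI:.2f}")
print(f"required RC: {rc * 1e12:.2f} ps")
print(f"clock amplitude retained: {rc_attenuation(rc):.2f}")
```

Under these assumptions the delay is under a fifth of a UI and costs a modest loss of clock amplitude, which is consistent with a passive implementation being sufficient.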
Even though the output of the dynamic latch stage has
sufficient swing to force the DFE feedback devices into their
non-linear regime, significant additional gain is required after
this stage in order to support equalizer adaptation. This is
because the error slicer required for such adaptation is
intentionally driven by the adaptation loop to have essentially
zero input. Leveraging their regenerative nature and small
aperture windows [6], this design therefore uses two pairs
(even/odd, data/error) of interleaved, offset-cancelled
StrongArm latches following the DFE (Fig. 5) to efficiently
realize this gain requirement.
Fig. 1 Receiver Architecture
Fig. 2 Equalizer Architecture
Fig. 3 (a) CTLE + DMUX (b) Current integrating FFE+DFE
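The reason the error slicer sees almost no input can be sketched with a sign-sign LMS loop that adapts the error-slicer reference level (dLev). The text does not specify the adaptation algorithm, so sign-sign LMS is an assumption here, as are the signal amplitude, noise level, step size, and function names.

```python
import random

def adapt_dlev(samples, decisions, mu=1e-3, dlev=0.0):
    """Sign-sign LMS update of the error-slicer reference level (dLev)."""
    for y, d in zip(samples, decisions):
        # Error slicer: 1-bit comparison of the sample against +/-dLev.
        err_sign = 1 if (y - d * dlev) > 0 else -1
        dlev += mu * err_sign * d
    return dlev

rng = random.Random(0)                # fixed seed so the run is repeatable
AMPLITUDE, NOISE_SIGMA = 0.3, 0.02    # hypothetical equalized level and noise, V
decisions = [rng.choice((-1, 1)) for _ in range(20000)]
samples = [d * AMPLITUDE + rng.gauss(0.0, NOISE_SIGMA) for d in decisions]

dlev = adapt_dlev(samples, decisions)
residual = [abs(y - d * dlev) for y, d in zip(samples, decisions)]
print("converged dLev:", dlev)
print("mean |error-slicer input|:", sum(residual) / len(residual))
```

Once the loop settles, the reference sits at the signal level, so the error slicer must resolve only the small residual around it. This is why far more gain is needed in front of the error decision than in front of the data decision.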
To complete the frontend, a 140µm by 250µm LC oscillator
with a number of digitally controlled capacitor arrays to
support CDR generates 30GHz differential clocks. The
oscillator also includes an injection locking path to support
testing, and is followed by a 220µm by 130µm resonant clock
buffer to efficiently drive the ~300fF differential capacitive
load from the equalizer circuitry.
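A back-of-the-envelope sketch shows why a resonant buffer is attractive for this load. It treats the ~300fF load as a single lumped capacitance, computes the inductance that resonates with it at 30GHz, and compares the CV²f power of a non-resonant full-swing driver with the loss of a parallel tank. The swing, the tank quality factor, and the lumped-capacitance simplification are assumptions; none of them is reported in the text.

```python
import math

F_CLK = 30e9        # clock frequency, Hz (from the text)
C_LOAD = 300e-15    # load capacitance, F (from the text, treated as lumped)
V_SWING = 1.0       # hypothetical peak-to-peak swing, V
Q_TANK = 5.0        # hypothetical loaded quality factor

def resonant_inductance(c, f):
    """Inductance that resonates with c at frequency f: L = 1 / ((2*pi*f)^2 * c)."""
    return 1.0 / ((2 * math.pi * f) ** 2 * c)

def cv2f_power(c, v_pp, f):
    """Dynamic power of a non-resonant driver charging c through v_pp each cycle."""
    return c * v_pp ** 2 * f

def tank_loss_power(c, v_pp, f, q):
    """Loss of a parallel tank: P = V_pk^2 / (2*Rp), with Rp = q / (2*pi*f*c)."""
    v_pk = v_pp / 2.0
    r_p = q / (2 * math.pi * f * c)
    return v_pk ** 2 / (2.0 * r_p)

print(f"resonating inductance: {resonant_inductance(C_LOAD, F_CLK) * 1e12:.1f} pH")
print(f"non-resonant CV^2f   : {cv2f_power(C_LOAD, V_SWING, F_CLK) * 1e3:.2f} mW")
print(f"tank loss at Q = {Q_TANK:.0f}   : "
      f"{tank_loss_power(C_LOAD, V_SWING, F_CLK, Q_TANK) * 1e3:.2f} mW")
```

In this model the ratio of the two powers is 4Q/π, so the saving grows with the tank quality factor.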
Measurement Results
As shown in Fig. 7(a), the receiver was fabricated in a 65nm
CMOS process, with the frontend occupying 0.16mm². To
characterize the design, a 60Gb/s PRBS7 pattern was
transmitted (by a separate chip/TX described in [1]) and
1/128x sub-samplers (using the same structure as Fig. 5, with
oscillator injection locking enabled) were used to reconstruct
the received pattern and measure the BER (Fig. 7(b)). Fig. 8(a)
and Fig. 8(b) show the eye diagram and pulse response of the
input signal, respectively. As shown by the timing bathtub
curve in Fig. 8(c), the receiver successfully recovers the
PRBS7 pattern, and operates error free over >1e12 bits in the
center region. The equalizer core consumes 48mW while
supporting substantially more functionality than previous
designs [1-3] (Table I); the complete frontend dissipates
173mW from 1.2V and 1.0V supplies.
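Two aspects of this measurement can be illustrated numerically. First, a PRBS7 pattern repeats every 127 bits and 128 mod 127 = 1, so taking every 128th bit replays the same pattern in its original order at a much lower rate. Second, observing zero errors over N bits bounds the BER only statistically. The sketch assumes the common x^7 + x^6 + 1 generator and a 95% confidence level; the text states neither the polynomial nor a confidence level, and whether the test setup relied on the sub-sampling property is not stated.

```python
import math

def prbs7(n, seed=0x7F):
    """First n bits of a PRBS7 sequence (x^7 + x^6 + 1, period 127)."""
    state = seed & 0x7F
    out = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out

def subsample(bits, ratio):
    """Keep every `ratio`-th bit, starting from the first one."""
    return bits[::ratio]

def ber_upper_bound(n_bits, confidence=0.95):
    """BER upper bound after n_bits with zero observed errors."""
    return -math.log(1.0 - confidence) / n_bits

full_rate = prbs7(127 * 128)
slow = subsample(full_rate, 128)
print("sub-sampled stream matches PRBS7:", slow[:127] == full_rate[:127])
print("BER bound after 1e12 error-free bits:", ber_upper_bound(1e12))
```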
Acknowledgements
The authors would like to thank Systems on Nanoscale
Information fabriCs (SONIC), BWRC, BDA, Integrand EMX,
Lorentz PeakView, the TSMC University Shuttle Program,
and B. Casper of Intel, K. Chang of Xilinx, P. Y. Chiang of
OSU, C.-K. K. Yang of UCLA, P. K. Hanumolu of UIUC, and
V. Stojanovic of UC Berkeley.
References
[1] Yue Lu, Elad Alon, JSSC, Dec. 2013
[2] Takayuki Shibasaki, et al., VLSI Symp., 2014
[3] Ahmed Awny, et al., JSSC, Feb. 2014
[4] Matt Park, et al., ISSCC, Feb. 2007
[5] Rui Bai, et al., ISSCC, 2014
[6] Jaeha Kim, et al., TCAS I, Aug. 2009
TABLE I: PERFORMANCE SUMMARY
Reference                           [1]         [2]               [3]          This work
Process                             65nm CMOS   20nm CMOS         130nm SiGe   65nm CMOS
Data-rate (Gb/s)                    66          56                80           60
Equalizer                           3-tap DFE   CTLE, 1-tap DFE   1-tap DFE    CTLE, 2-tap FFE, 3-tap DFE
VISI/VCURSOR or Channel Loss (dB)   1.65        23 dB             12 dB        1.54
Power (mW)
  Equalizer                         46          -                 1772▪        48
  Deserializer                      -           -                 -            28
  Clock generation                  -           -                 -            52◊
  Clock distribution                -           -                 2228         45
  Total                             46          177*              4000         173
Efficiency (pJ/bit)                 0.7         3.16              50           2.88
▪: Includes output buffer ◊: LC oscillator + divider + PI
*: Includes equalizer, 4:16 DES, clock distribution
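The efficiency row follows directly from the total power and data rate, since 1 mW divided by 1 Gb/s is 1 pJ/bit. The short check below recomputes it from the table values and confirms that the power breakdown of this work sums to the reported total; the dictionary layout and names are illustrative only.

```python
# Total power (mW) and data rate (Gb/s) as listed in Table I.
designs = {
    "[1]": (46, 66),
    "[2]": (177, 56),
    "[3]": (4000, 80),
    "This work": (173, 60),
}

def energy_per_bit_pj(power_mw, rate_gbps):
    """mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit."""
    return power_mw / rate_gbps

efficiency = {name: energy_per_bit_pj(p, r) for name, (p, r) in designs.items()}

# Power breakdown of this work (mW), as listed in Table I.
this_work_breakdown_mw = {
    "Equalizer": 48,
    "Deserializer": 28,
    "Clock generation": 52,
    "Clock distribution": 45,
}

for name, value in efficiency.items():
    print(f"{name:10s} {value:.2f} pJ/bit")
print("breakdown total (mW):", sum(this_work_breakdown_mw.values()))
```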
Fig. 4 (a) Latched summer 1-tap DFE (b) DFE behavior without clock
delay (dashed) and with clock delay (solid)
Fig. 5 Interleaved deserializing slicers
Fig. 6 Clock path architecture
Fig. 7 (a) Die photo (b) Measurement setup
Fig. 8 (a) Eye diagram at channel output (b) Estimated pulse response
(Volts vs. time index in UI) (c) Bathtub curve after equalization
(BER vs. Tx phase in UI)