Design Methodology of High Performance On-Chip Global Interconnect
Using Terminated Transmission-Line
Yulei Zhang¹, Ling Zhang², Alina Deutsch³, George A. Katopis⁴,
Daniel M. Dreps⁵, James F. Buckwalter¹, Ernest S. Kuh⁶, Chung-Kuan Cheng²
¹ECE Dept., ²CSE Dept., University of California, San Diego, CA
³IBM T. J. Watson Research Center, Yorktown Heights, NY; ⁴IBM System Group, Poughkeepsie, NY
⁵IBM System and Technology Group, Austin, TX; ⁶University of California, Berkeley, CA
¹y1zhang@ucsd.edu, ²lizhang@cs.ucsd.edu
³deutsch@us.ibm.com, ⁴katopis@us.ibm.com, ⁵drepsdm@us.ibm.com
¹buckwalter@ece.ucsd.edu, ⁶kuh@eecs.berkeley.edu, ²ckcheng@ucsd.edu
Abstract—We explore two schemes using transmission-
line (T-line) to achieve high-performance global interconnects on VLSI chips. For both schemes, we select wire dimensions that ensure T-line effects are present, and we employ inverter chains as drivers and receivers. To achieve high throughput and alleviate Inter-Symbol Interference (ISI), a high termination resistance is used in the second scheme. For the two schemes, we discuss how to optimize the wire dimensions, as well as the effects of driver impedance and termination resistance on the wire bandwidth. We then propose a design methodology to determine the optimal design variables for three objectives. We apply the proposed methodology and compare the resulting performance metrics with repeated RC wires. Simulation results show that, under the min-ddp (delay²-power product) objective, the proposed T-line schemes reduce the delay by as much as 82% and improve the throughput by as much as 63%.
Keywords—On-chip transmission line, global intercon-
nect, termination resistance, design methodology
I. Introduction
As semiconductor technology advances, interconnect becomes a critical factor in determining digital system performance and power consumption. According to the ITRS 2007 roadmap [1], at the 45 nm technology node the RC delay of a 1 mm minimum-pitch Cu global wire is 542 ps, whereas the clock frequency will reach 10 GHz (a 100 ps cycle time). There is thus a large gap between interconnect delay and the required clock rate.
Interconnects, especially global wires, also consume a significant portion of total power. In [2], Magen et al. found that interconnect power accounts for half of the total dynamic power of a 0.13 μm microprocessor, and that nearly 50% of the interconnect power is consumed by global wires.
To improve global wire delay, buffer (repeater) insertion is commonly used today; the resulting structure is referred to as repeated RC wires [3]. By inserting buffers repeatedly along a long wire, the wire is divided into several RC segments, which reduces the load driven by each buffer and makes the total delay grow linearly, rather than quadratically, with wire length. How to design such repeated wire structures under different objective functions has been well studied in previous works [4], [5], [6]. Although buffers improve wire performance, they also bring overhead in terms of extra power consumption, chip area, and wiring complexity. In [6], Zhang et al. pointed out that, to minimize the total delay, the gate capacitance of a buffer should equal the wire segment capacitance, which means that half of the dynamic power is dissipated in the buffers.
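To illustrate why repeater insertion linearizes delay, the sketch below uses a first-order Elmore-style model in the spirit of Bakoglu [3]: each of k segments is a repeater (output resistance r_drv, input capacitance c_in) driving a distributed RC segment, and the total delay is minimized over integer k. The wire and repeater values are hypothetical round numbers chosen only for illustration, not values from this paper, and the model ignores slew and inductance.

```python
# First-order (Elmore-style) delay of a repeated RC wire, in the spirit of
# Bakoglu's model [3]. All parameter values used below are hypothetical.

def repeated_wire_delay(r, c, length, r_drv, c_in, k):
    """Total 50% delay (s) of a wire split into k equal repeater-driven segments.

    r, c   : wire resistance (ohm/m) and capacitance (F/m)
    length : wire length (m)
    r_drv  : repeater output resistance (ohm)
    c_in   : repeater input capacitance (F)
    """
    seg = length / k
    stage = (0.69 * r_drv * (c * seg + c_in)               # driver charges segment + next gate
             + r * seg * (0.38 * c * seg + 0.69 * c_in))   # distributed wire + gate load
    return k * stage


def best_repeated_delay(r, c, length, r_drv, c_in, max_segments=200):
    """Minimum delay over the number of segments k (k = 1 means a single driver)."""
    return min(repeated_wire_delay(r, c, length, r_drv, c_in, k)
               for k in range(1, max_segments + 1))


if __name__ == "__main__":
    params = dict(r=2e5, c=2e-10, r_drv=100.0, c_in=1e-13)  # hypothetical values
    for length_mm in (5, 10):
        wire_len = length_mm * 1e-3
        single = repeated_wire_delay(length=wire_len, k=1, **params)
        best = best_repeated_delay(length=wire_len, **params)
        print(f"{length_mm} mm: single driver {single * 1e12:.0f} ps, "
              f"repeated {best * 1e12:.0f} ps")
```

With these numbers, doubling the length from 5 mm to 10 mm raises the single-driver delay by about 3.4× (from roughly 525 ps to 1803 ps), while the optimally repeated wire only doubles (from roughly 241 ps to 481 ps).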
As a potential alternative, on-chip global wiring using transmission lines (T-lines) has recently attracted much research attention. It has been shown in [7] that, under some conditions, transmission-line effects must be considered for on-chip wires. If the wires operate in the LC region, the wire delay is determined by the wave propagation delay, which is much smaller than the wire RC delay. Wave propagation also reduces power consumption by eliminating the full-swing charging and discharging of wire and gate capacitance. However, on-chip T-lines normally need larger dimensions than RC wires and can suffer from Inter-Symbol Interference (ISI) caused by resistive loss, which makes T-line structures less cost-efficient in terms of throughput density. To break this barrier, various approaches have been proposed. In [8], [9], termination resistance was added to minimize distortion, and analytical formulas for the optimal resistance value were derived. In [10], [11], passive and active equalization were used to alleviate ISI. In [12], [13], [14], [15], different transceiver schemes were adopted on long global wires, and the overall performance was compared with repeated RC wires.
In this work, we explore two T-line schemes for achieving high-performance on-chip global interconnects. The first scheme adopts a tapered inverter chain (with a constant size-progression ratio) as the driver and receiver of a long global wire, based on the work of [7] and [16]. By choosing a larger wire geometry and a proper driver impedance, this scheme achieves better total delay and Far-End Noise (FEN) than repeated RC wires. However, it cannot achieve high throughput because it requires a full-swing signal at the wire end. To push for higher bandwidth, we add a termination resistance at the far end of the T-line and devise a non-tapered inverter chain (described in Section II-A) to amplify the received small-swing signal back to full swing. The two T-line schemes are designed and simulated, and their performance metrics, including delay, power, and throughput, are compared with repeated RC wires under different objectives.
(a) On-chip T-line scheme w/o termination resistance.
(b) On-chip T-line scheme w/ termination resistance.
Fig. 1. On-chip global interconnection used in this work.
The contributions of this work are: 1) a new on-chip global interconnect structure using a terminated transmission line with a non-tapered inverter chain as the transceiver; 2) a design methodology for the unified design of the termination resistance and the inverter chain in the proposed interconnect structure; 3) a case study in a predictive 45 nm process that verifies the potential of the proposed structure and compares it with repeated RC wires under different design objectives.
The rest of this paper is organized as follows. Section II introduces the two schemes studied in this work and discusses how to choose the wire dimensions and driver impedance according to T-line theory. Section III describes the design methodology for each of the two schemes, emphasizing the determination of the bandwidth of the T-line scheme with termination resistance. We apply these methodologies to design the T-line schemes and summarize all the results in Section IV, where the performance metrics are also evaluated and compared with repeated RC wires. Finally, Section V concludes the paper.
II. On-Chip Global Interconnect
The two global signaling schemes using T-lines are shown in Figure 1. For both schemes, we adopt a single-ended T-line structure and two identical inverter chains as the driver and receiver. We add a termination resistance in the second scheme to push for higher bandwidth. The two schemes are designed to be repeatable, so the total delay and power consumption are contributed by the T-line and the following receiver, as shown in the box outlined by the dashed line. The detailed features of the two schemes are discussed in the following sections.
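[Figure: cross section of the single-ended stripline structure with ground planes above and below and signal lines S1, S2, S3 between ground bars (G); dimensions W, T, S, H; εr = 3.1, tan θ = 0.00068, ρCu = 2.2 × 10⁻⁶ Ω·cm.]
Fig. 2. The wire structure used in this work.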
TABLE I
Parameters of typical on-chip global wires
Wire Case   W (μm)   S (μm)   H (μm)   T (μm)   R (Ω/mm)   Z0 (Ω)   f_LC* (GHz)
8X          0.8      0.8      1.6      1.2      23.9       40       7.15
16X         1.6      1.6      3.2      2.4      6.0        40       1.79
*f_LC indicates the corner frequency between the RC and LC regions.
A. Interconnect schemes using T-lines
The scheme without termination resistance is shown in Figure 1(a); it follows the structure in [7] and [16]. For identification, we refer to this scheme as T-line Scheme A. In Scheme A, a tapered inverter chain is adopted in which the inverter sizes increase progressively, providing a low impedance RS to drive the T-line for high bandwidth (see Section II-C). As in conventional repeated wire design, a full-swing signal must be guaranteed at the T-line output, which limits the overall bandwidth of Scheme A.
By adding the termination resistance RLoad, we obtain the other signaling scheme, referred to as T-line Scheme B. The termination resistance lowers the DC voltage at the wire output to match the attenuated high-frequency component of the input signal, as discussed in [8], resulting in a considerable far-end eye-opening at high data rates. The inverter chain used in Scheme B is not a simple tapered chain; it consists of an equal-sized chain followed by a tapered chain, as shown in Figure 1(b). The equal-sized inverter chain forms the first stage of the receiver and recovers the received high-speed, low-swing signal back to full swing; the following tapered chain then improves the slew rate and provides a low impedance to drive the T-line. By simulation, we found that two equal-sized inverter stages are enough to recover the signal as long as the far-end eye-opening is larger than the threshold of the equal-sized inverter. In this situation, the output slew of the first inverter limits the bandwidth of the whole inverter chain. Because the output slew of the first inverter depends on the worst-case eye-opening, which is determined by the bit rate, the optimal cycle time TC must balance the bandwidth of the T-line and that of the inverter chain receiver. This issue is discussed in Section III.
B. On-chip T-lines
Given that on-chip T-lines are very lossy due to the miniaturization of the wire cross section, they can operate in either the RC or the LC region depending on the frequency [17]. In the RC region, the frequency is low, so that ωL ≪ R, and the propagation constant can be written as
$$\gamma = \sqrt{\frac{\omega RC}{2}} + j\sqrt{\frac{\omega RC}{2}}.$$
In this situation, high-frequency components of the signal travel faster but with more attenuation, resulting in distortion of the received signal and limiting the bit rate. If the frequency increases such that ωL ≫ R, the wire operates in the LC region and the propagation constant becomes
$$\gamma = \frac{R}{2\sqrt{L/C}} + j\omega\sqrt{LC}.$$
Therefore, the attenuation constant is
$$\alpha = \frac{R}{2\sqrt{L/C}} = \frac{R}{2Z_0} \tag{1}$$
where Z0 is the characteristic impedance of the T-line, and the phase velocity is v = ω/β = 1/√(LC). In the LC region, all frequency components of the signal travel at the same speed and experience the same attenuation, which enables fast, distortionless communication.
In practice, on-chip global wires normally have lengths of 5-10 mm, which satisfy l ≈ λ, or Tof ≈ tr (Tof is the time of flight and tr is the input signal rise time), as the operating frequency goes up to tens of gigahertz. The wire resistance is kept low such that R < ωL, so inductive effects must be taken into account. The voltage step response of such a wire can be expressed as [7]
$$V(l,t) = \left[ e^{-\frac{R}{2Z_0}} + B(l,t) \right] u(t - T_{of}) \tag{2}$$
where B(l,t) is a slowly rising modified Bessel function term; since the exponent must be dimensionless, R in (2) and (4) denotes the total resistance of the line (the per-unit-length resistance of (1) multiplied by l). To exploit the fast transition of the LC mode, the first term in (2) must dominate; that is, the resistive attenuation must be kept low and the incident-wave amplitude increased. These requirements are summarized in the following conditions [16]:
$$T_{of} \ge 0.5\, t_r \tag{3}$$
$$\frac{R}{2Z_0} < 1 \tag{4}$$
$$R_S < Z_0 \tag{5}$$
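Conditions (3)-(5) can be checked numerically for the two wire cases of Table I. The sketch below is our own illustration under stated assumptions: TEM propagation in a homogeneous dielectric (so the wave speed is c/√εr with εr = 3.1 from Figure 2), the tabulated DC resistance instead of the frequency-dependent extracted values, R in (4) taken as the total line resistance, a 5 mm wire, RS = 10 Ω, and an input rise time of 4 ps (10% of the 40 ps cycle at 25 Gbps). The function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def tline_conditions(r_ohm_per_mm, length_mm, z0, eps_r, r_s, t_rise_ps):
    """Evaluate conditions (3)-(5) for a lossy on-chip T-line.

    Assumptions: homogeneous dielectric, so v = c / sqrt(eps_r); DC resistance
    from Table I; R in (4) is the total line resistance r * l.
    """
    v = SPEED_OF_LIGHT / math.sqrt(eps_r)
    t_of_ps = length_mm * 1e-3 / v * 1e12                # time of flight (ps)
    attenuation = r_ohm_per_mm * length_mm / (2.0 * z0)  # R_total / (2 Z0)
    return {
        "t_of_ps": t_of_ps,
        "incident_fraction": math.exp(-attenuation),  # first term of (2)
        "cond3_tof": t_of_ps >= 0.5 * t_rise_ps,
        "cond4_loss": attenuation < 1.0,
        "cond5_driver": r_s < z0,
    }


# 5 mm wires from Table I, R_S = 10 ohm, t_r = 4 ps (10% of a 40 ps cycle)
for name, r in (("8X", 23.9), ("16X", 6.0)):
    print(name, tline_conditions(r, 5.0, 40.0, 3.1, 10.0, 4.0))
```

Under these assumptions the time of flight is about 29 ps, so (3) holds easily. For the 16X wire the loss term is 0.375, leaving roughly 69% of the incident step, whereas for the 8X wire it is about 1.49 and condition (4) fails, which is consistent with the later choice of the 16X wire.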
In this work, we use the single-ended stripline structure shown in Figure 2 to model on-chip T-lines. The ground planes on the top and bottom represent the neighboring metal layers when calculating the wire capacitance and inductance, and we place 3 signal lines between power-ground bars to study crosstalk effects. Due to the skin effect (the skin depth of copper is 0.65 μm at 10 GHz) and other non-ideal factors, the resistance and inductance of on-chip T-lines are frequency dependent, especially in the high-frequency region. Therefore, we extract the frequency-dependent RLGC parameters of the wire using an EM field solver to capture the T-line characteristics. Some assumptions are introduced to simplify the extraction, as follows:
1) The random switching activity of the signal lines on the orthogonal layers above and below (layers n+1 and n−1 if the T-line is located on layer n) does not change the capacitance of the T-line, due to the statistical cancellation of opposite switching directions, so ground planes are used to represent the orthogonal layers when extracting the T-line capacitance.
[Figure: wire full-swing bandwidth (GHz) vs. Rs (Ω) for the 8X and 16X wires, with Rs from 10 Ω to 40 Ω.]
Fig. 3. The effect of driver impedance on wire bandwidth for Scheme A.
2) The power/ground wires on layers n+2 and n−2, which run parallel to the T-line on layer n, are considered as the current return paths when calculating the inductance, so ground planes are used to represent these parallel wires when extracting the T-line inductance.
The dimensions and other parameters of the T-line are summarized in Table I. The two wire cases in the table are typically used for global signaling over distances of several millimeters. The wire length is chosen to be 5 mm to represent the critical path between CPU and cache.
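The 0.65 μm skin depth quoted in this subsection can be checked with the classical formula δ = √(ρ/(π f μ0)). In the sketch below, the bulk copper resistivity of 1.68 μΩ·cm is our assumption; with it the formula gives about 0.65 μm at 10 GHz, while the 2.2 μΩ·cm of Figure 2 gives about 0.75 μm, so the quoted value appears to correspond to bulk copper. In either case the skin depth is comparable to the micrometer-scale cross sections in Table I, which is why frequency-dependent extraction matters.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability (H/m); copper is non-magnetic


def skin_depth_um(resistivity_ohm_m, freq_hz):
    """Classical skin depth delta = sqrt(rho / (pi * f * mu0)), in micrometers."""
    return math.sqrt(resistivity_ohm_m / (math.pi * freq_hz * MU0)) * 1e6


print(round(skin_depth_um(1.68e-8, 10e9), 2))  # assumed bulk Cu: 0.65
print(round(skin_depth_um(2.2e-8, 10e9), 2))   # rho_Cu from Fig. 2: 0.75
```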
C. Effects of driver impedance and termination resistance
For Scheme A, since a full-swing signal is guaranteed at the wire output, the wire bandwidth is defined as [16]
$$BW = \frac{1}{2.64 \times t_r} \tag{6}$$
where tr is the wire output slew. As discussed before, the driver impedance determines the amplitude of the incident wave, so it affects the wire output slew and, as a result, the bandwidth. The relation between driver impedance RS and wire bandwidth is shown in Figure 3. In this figure, the upper bound of RS is set to the characteristic impedance Z0 = 40 Ω, whereas the lower bound is 10 Ω, chosen for an achievable on-chip inverter size. For both wire cases, reducing RS increases the wire bandwidth, especially for the 16X wire because of its low resistive attenuation. With RS = 10 Ω, the bandwidth of the 16X wire reaches 14 GHz.
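Equation (6) can also be read in reverse, turning a bandwidth from Figure 3 into the wire output slew it implies. A minimal sketch (the function names are ours):

```python
def full_swing_bandwidth_ghz(t_r_ps):
    """Eq. (6): BW = 1 / (2.64 * t_r), with t_r the wire output slew."""
    return 1.0 / (2.64 * t_r_ps * 1e-12) / 1e9


def required_slew_ps(bw_ghz):
    """Inverse of Eq. (6): the output slew corresponding to a given bandwidth."""
    return 1.0 / (2.64 * bw_ghz * 1e9) * 1e12


print(round(required_slew_ps(14.0), 1))  # 16X wire with R_S = 10 ohm: 27.1 (ps)
```

So the 14 GHz reached by the 16X wire corresponds to an output slew of roughly 27 ps.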
For Scheme B, the driver impedance not only affects the wire bandwidth but also, together with the termination resistance RLoad, determines the eye-opening at the wire output. Figure 4 shows a 2D map of the worst-case eye-opening of the 16X wire at different data rates over the design space {RS, RLoad}. In general, the eye-opening shrinks as the frequency increases because of distortion and ISI. Lowering RS improves the eye by reinforcing the incident wave and sharpening the rising edge of the output signal. On the other hand, for a given bit rate and driver impedance, there exists an optimal RLoad that yields the largest eye-opening.
[Figure: worst-case eye-opening (V) of the 16X wire vs. Rs (Ω) and Rload (Ω), at 10, 20, 40, and 50 GHz.]
Fig. 4. The effects of driver impedance and termination resistance on eye-opening for Scheme B.
When designing such a scheme, we must always guarantee that the worst-case eye is larger than the threshold voltage of the following inverter, which is around 250-300 mV for most processes.
In summary, a lower driver impedance and a larger wire cross section are needed to achieve high throughput for both Scheme A and Scheme B. As a result, we choose RS = 10 Ω and the 16X wire in the following experiments.
III. Design Methodology
This section introduces the design methodologies for the two T-line schemes.
A. T-line Scheme A
Since the wire dimensions and the driver impedance have already been chosen, we take two steps to determine the remaining design variables: the first inverter size S1 (as a ratio to the minimum-size inverter) and the number of stages N.
Step 1: determine the bit rate
By simulation, we found that the inverter chain can support the desired frequency as long as a full-swing signal is guaranteed at the wire end, so the bit rate of Scheme A is limited by the wire bandwidth, which is defined in (6) using the wire output slew. For a given driver impedance and wire geometry, the bit rate can therefore be determined from Figure 3.
Step 2: choose the optimal design variables
At the bit rate found in Step 1, we explore the design space by sweeping the first inverter size S1 and the number of stages N within a physical range, and generate cost maps for three design objectives: minimum delay (min-d), minimum delay-power product (min-dp), and minimum delay²-power product (min-ddp). The optimal design variables correspond to the lowest points on each cost map.
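Step 2 amounts to an exhaustive grid search. The sketch below evaluates every {S1, N} point once and then picks the minimum for each objective. `simulate` is a hypothetical callback; in the setting of this paper it would wrap an HSPICE run returning the delay and power of the inverter chain plus T-line, and the toy table in the usage lines is made up.

```python
from typing import Callable, Dict, Iterable, Tuple

DesignPoint = Tuple[int, int]  # (first inverter size S1, number of stages N)

# Cost functions for the three objectives, given delay d and power p.
OBJECTIVES: Dict[str, Callable[[float, float], float]] = {
    "min-d": lambda d, p: d,
    "min-dp": lambda d, p: d * p,
    "min-ddp": lambda d, p: d * d * p,
}


def sweep_design_space(
    simulate: Callable[[int, int], Tuple[float, float]],
    s1_values: Iterable[int],
    n_values: Iterable[int],
) -> Dict[str, DesignPoint]:
    """Evaluate each {S1, N} point once, then return the lowest point per objective."""
    n_list = list(n_values)
    results = {(s1, n): simulate(s1, n) for s1 in s1_values for n in n_list}
    return {
        name: min(results, key=lambda pt: cost(*results[pt]))
        for name, cost in OBJECTIVES.items()
    }


# Toy usage: (delay ps, power mW) for three points; all other points are "bad".
toy = {(40, 3): (60.0, 5.0), (100, 4): (52.0, 6.0), (300, 6): (50.0, 12.0)}
best = sweep_design_space(lambda s1, n: toy.get((s1, n), (1e3, 1e3)),
                          [40, 100, 300], [3, 4, 6])
print(best)  # {'min-d': (300, 6), 'min-dp': (40, 3), 'min-ddp': (100, 4)}
```

Evaluating the grid once and reusing the results for all three objectives mirrors how a single set of simulations can produce the three cost maps.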
Fig. 5. The design flow to determine the optimal bit rate and termination resistance for Scheme B.
B. T-line Scheme B
For Scheme B, one more design variable, RLoad, is added to the first inverter size S1 and the number of stages N. Two similar steps are taken to design this scheme, but the bit rate is not as straightforward to choose as in Scheme A.
Step 1: determine the bit rate
Figure 5 shows the design flow used to choose the optimal bit rate for Scheme B. We begin with a low initial bit rate, i.e., a large cycle time TC. At this bit rate, we optimize the termination resistance RLoad for the largest worst-case eye-opening Veye by sweeping RLoad and applying the algorithm in [18] to predict the worst-case eye-opening. If Veye is larger than the threshold of the first inverter, which is set to 250 mV here, the output slew of the inverter chain is checked to see whether the overall output signal can be recovered. Otherwise, the bit rate must be reduced to enlarge Veye and satisfy the threshold constraint. At the next stage, the output slew of the inverter chain is compared with the rise time of the input signal (assumed to be 10% of the cycle time in this work). If the inverter chain recovers the signal with a better slew than the input signal, the cycle time can be reduced further; otherwise, the bit rate must be reduced to balance the bandwidth of the wire and that of the inverter chain. After some iterations, the optimal bit rate and the corresponding termination resistance RLoad are found.
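The flow of Figure 5 can be sketched as a simple search that walks the cycle time down from a conservative start until a constraint fails. `eye_opening` and `receiver_slew` are hypothetical callbacks standing in for the eye prediction of [18] and a simulation of the inverter chain; the linear scan assumes that once a cycle time is infeasible, all shorter ones are too. The toy models in the usage lines are made up.

```python
from typing import Callable, Iterable, Optional, Tuple


def find_bit_rate(
    eye_opening: Callable[[float, float], float],    # (T_C ps, R_Load ohm) -> worst-case eye (V)
    receiver_slew: Callable[[float, float], float],  # (T_C ps, R_Load ohm) -> chain output slew (ps)
    tc_start_ps: float,
    tc_step_ps: float,
    rload_candidates: Iterable[float],
    v_threshold: float = 0.25,   # first-inverter threshold used in the text
    rise_fraction: float = 0.1,  # input rise time = 10% of T_C
) -> Optional[Tuple[float, float]]:
    """Return the smallest feasible (T_C, R_Load), or None if T_C start is infeasible."""
    candidates = list(rload_candidates)
    best = None
    tc = tc_start_ps
    while tc > 0:
        # Optimize R_Load for the largest worst-case eye at this bit rate.
        rload = max(candidates, key=lambda r: eye_opening(tc, r))
        if eye_opening(tc, rload) < v_threshold:
            break  # eye too small: the previous (slower) bit rate is the answer
        if receiver_slew(tc, rload) > rise_fraction * tc:
            break  # inverter chain cannot recover the signal at this bit rate
        best = (tc, rload)
        tc -= tc_step_ps
    return best


# Toy models: the eye shrinks with T_C and peaks at R_Load = 220 ohm;
# the receiver slew is a constant 3.25 ps.
eye = lambda tc, r: 0.02 * tc - abs(r - 220) / 1000
slew = lambda tc, r: 3.25
print(find_bit_rate(eye, slew, 60, 1, range(50, 601, 10)))  # (33, 220)
```

In this toy case the slew check, not the eye threshold, sets the limit, which mirrors the balance between wire and inverter-chain bandwidth described above.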
Step 2: choose the optimal design variables
Using the optimal bit rate and RLoad found in Step 1, we explore the design space {S1, N} in the same way as in Step 2 for Scheme A, and choose the optimal variables for the min-d, min-dp, and min-ddp objectives, respectively.
IV. Experimental Results
At the 45 nm technology node, we design T-line Schemes A and B using the methodologies introduced in Section III under three design objectives: min-d, min-dp, and min-ddp. The performance metrics of Schemes A and B are compared with repeated RC wires, which are optimized under the same objectives using the approach of [6].
A. Experimental settings
We use the 2D EM field solver CZ2D of the EIP tool suite from IBM [19] to build the 2D T-line structure of Figure 2 and extract the frequency-dependent RLGC tabular model for the 16X wire case listed in Table I. During the extraction, the assumptions introduced in Section II-B are followed to calculate the wire capacitance and inductance, respectively. The dielectric constant εr, loss tangent tan θ, and resistivity ρCu follow the values shown in Figure 2. HSPICE is used to simulate the step response of the on-chip T-line, from which the worst-case eye-opening is predicted with a C-code package from [18].
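As an illustration of the general idea of predicting a worst-case eye from a step response, the sketch below applies classic peak-distortion analysis: form the single-bit pulse response p(t) = s(t) − s(t − T), then let every ISI term act against the cursor. This is a standard textbook technique and may differ from the algorithm in [18]; it assumes binary 0/1 signaling, a uniformly sampled step response that has settled by the end of the record, and no crosstalk or clock jitter.

```python
from typing import Sequence


def worst_case_eye(step: Sequence[float], samples_per_bit: int) -> float:
    """Worst-case eye height from a sampled step response (peak-distortion analysis)."""
    n = samples_per_bit
    # Pulse response of one isolated '1' bit: p(t) = s(t) - s(t - T).
    pulse = [step[i] - (step[i - n] if i >= n else 0.0) for i in range(len(step))]
    best = float("-inf")
    for i0, cursor in enumerate(pulse):  # candidate sampling instant
        # Contributions of all other bits at this sampling phase (pre- and post-cursor).
        isi = [pulse[j] for j in range(i0 % n, len(pulse), n) if j != i0]
        worst_one = cursor + sum(min(0.0, x) for x in isi)  # negative ISI pulls a '1' down
        worst_zero = sum(max(0.0, x) for x in isi)          # positive ISI pushes a '0' up
        best = max(best, worst_one - worst_zero)
    return best


# A step that overshoots to 1.2 and rings back to 0.9 before settling at 1.0:
print(worst_case_eye([0, 1.2, 0.9, 1, 1, 1], samples_per_bit=2))
```

Reflections such as the overshoot in the usage example show up as ISI terms of opposite sign, which is why a larger reflection can close the eye even when the edge is sharp.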
The 45 nm predictive transistor model [20], a Synopsys level-3 MOSFET model, is used to build the inverter chains in T-line Schemes A and B. The design flows introduced in Section III are implemented in Perl and MATLAB to find the optimal design parameters under a given objective. We simulate the whole circuit in HSPICE to evaluate delay and power consumption.
For the repeated RC wires, the minimum pitch of the global RC wire is 135 nm and the corresponding aspect ratio (AR) is 2.4 at the 45 nm node, according to the ITRS 2007 roadmap [1]. The repeater model is extracted using the same 45 nm predictive transistor model. We adopt the optimization method proposed in [6] to find the optimal wire dimensions (width and spacing), the distance between repeaters, and the repeater size for the three objectives, and then evaluate the performance metrics of the repeated global RC wires. The results are verified by HSPICE simulation, using the Π model to represent the distributed RC wires.
B. Definitions of performance metrics
We compare the delay, power consumption, and throughput of the proposed T-line schemes with those of repeated RC wires. For the delay comparison, we define the normalized delay:
$$\text{delay}_n = \frac{\text{propagation delay}}{\text{wire length}} \tag{7}$$
where the propagation delay includes both the wire delay and the gate delay. The gate delay refers to the repeater delay for RC wires and to the transceiver (inverter chain) delay for the T-line schemes.
The normalized energy per bit is used to evaluate interconnect power consumption and is defined as follows:
$$\text{power}_n = \frac{\text{energy per bit}}{\text{wire length}} = \frac{\text{power}}{\text{bit rate} \times \text{wire length}} \tag{8}$$
TABLE II
Performance metrics comparison
Design Objective   Scheme     Delay (ps/mm)   Bit Rate (Gbps)   Energy (pJ/m)   Throughput (Gbps/μm)
min-d              RC wire    45.9            4.36              121             8.22
min-d              T-line A   10.6            14.0              70.1            3.28
min-d              T-line B   10.1            25.0              103.2           5.86
min-dp             RC wire    75.0            2.65              56.8            3.12
min-dp             T-line A   10.7            14.0              68.5            3.28
min-dp             T-line B   10.5            25.0              93.6            5.86
min-ddp            RC wire    56.4            3.56              77.3            3.60
min-ddp            T-line A   10.7            14.0              68.8            3.28
min-ddp            T-line B   10.2            25.0              96.8            5.86
The bit rate of an RC wire is the inverse of its propagation delay, since one bit is transmitted only after the previous bit reaches its destination (here we assume the RC wires are not pipelined). For the T-line schemes, the bit rate is determined by the wire bandwidth in Scheme A, or by the eye-opening and the inverter chain in Scheme B, as discussed in Section III. The normalized throughput is defined as:
$$\text{throughput}_n = \frac{\text{bit rate}}{\text{wire pitch}} \tag{9}$$
which reflects the amount of data that can be transmitted through a given wiring width in a given time interval.
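As a consistency check of these definitions, the normalized entries of Table II can be converted back to absolute numbers for the 5 mm wire by inverting (7) and (8). This back-calculation is our own, not part of the paper's flow. For the Scheme B min-ddp row it gives roughly 51 ps, 0.484 pJ/bit, 12.1 mW, and a delay²-power product of about 3.15 × 10⁴ ps²·mW, close to the optimal ddp of 3.16 × 10⁴ ps²·mW reported for N = 4 in Section IV-D.

```python
WIRE_LENGTH_MM = 5.0  # wire length used throughout the experiments


def absolute_metrics(delay_ps_per_mm, energy_pj_per_m, bit_rate_gbps,
                     length_mm=WIRE_LENGTH_MM):
    """Undo the normalizations of Eqs. (7) and (8) for a wire of the given length."""
    delay_ps = delay_ps_per_mm * length_mm          # invert Eq. (7)
    energy_pj = energy_pj_per_m * length_mm * 1e-3  # invert Eq. (8): pJ per bit
    power_mw = energy_pj * bit_rate_gbps            # pJ x Gb/s = mW
    ddp = delay_ps ** 2 * power_mw                  # delay^2-power product, ps^2*mW
    return delay_ps, energy_pj, power_mw, ddp


# T-line Scheme B, min-ddp row of Table II
print(absolute_metrics(10.2, 96.8, 25.0))
```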
C. Optimal solutions and performance comparison
Using the design methodologies proposed in Section III, we perform the experiments and generate 2-D cost maps for T-line Schemes A and B under the three objectives (min-d/min-dp/min-ddp), shown in Figure 6 and Figure 7, respectively. For Scheme A, the sweeping ranges of the design variables {S1, N} are set to [40, 300] and [3, 6], whereas for Scheme B they are set to [60, 300] and [4, 6]. The lowest points on the cost maps correspond to the optimal design variables.
We summarize the performance metrics of T-line Schemes A and B under the different objectives and compare them with repeated RC wires in Table II. By adopting the on-chip T-line schemes, the normalized delay can be reduced greatly. Under the min-d objective, T-line Schemes A and B improve the delay by 76.9% and 78.0%, respectively. The energy per unit wire length is also reduced in several cases because of wave propagation on the on-chip T-line; for example, under the min-ddp objective, T-line Scheme A consumes 89.0% of the energy of repeated RC wires. Note that T-line Scheme B consumes 36.6% to 47.2% more energy than Scheme A because of the static power dissipated in the termination resistance. Regarding throughput, although the T-line schemes use larger wire dimensions, the throughput is still improved under the min-dp and min-ddp objectives because of the higher bandwidth achieved by the T-line. Under the min-dp objective, T-line Scheme A increases the throughput of RC wires by 5%, and introducing the termination resistance raises this improvement to 88%.
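The percentages quoted above, and in the abstract, follow directly from Table II. The short script below recomputes them; the dictionary layout and the helper name are ours.

```python
# Table II rows: (objective, scheme) -> (delay ps/mm, bit rate Gb/s, energy pJ/m, throughput Gb/s/um)
TABLE_II = {
    ("min-d", "RC"): (45.9, 4.36, 121.0, 8.22),
    ("min-d", "A"): (10.6, 14.0, 70.1, 3.28),
    ("min-d", "B"): (10.1, 25.0, 103.2, 5.86),
    ("min-dp", "RC"): (75.0, 2.65, 56.8, 3.12),
    ("min-dp", "A"): (10.7, 14.0, 68.5, 3.28),
    ("min-dp", "B"): (10.5, 25.0, 93.6, 5.86),
    ("min-ddp", "RC"): (56.4, 3.56, 77.3, 3.60),
    ("min-ddp", "A"): (10.7, 14.0, 68.8, 3.28),
    ("min-ddp", "B"): (10.2, 25.0, 96.8, 5.86),
}
DELAY, BIT_RATE, ENERGY, THROUGHPUT = range(4)


def change_pct(obj, scheme, ref, metric):
    """Percent change of `metric` for `scheme` relative to `ref` under objective `obj`."""
    new, old = TABLE_II[(obj, scheme)][metric], TABLE_II[(obj, ref)][metric]
    return 100.0 * (new - old) / old


print(round(-change_pct("min-d", "A", "RC", DELAY), 1))          # 76.9
print(round(-change_pct("min-d", "B", "RC", DELAY), 1))          # 78.0
print(round(100 + change_pct("min-ddp", "A", "RC", ENERGY), 1))  # 89.0
print(round(change_pct("min-dp", "B", "A", ENERGY), 1))          # 36.6
print(round(change_pct("min-d", "B", "A", ENERGY), 1))           # 47.2
print(round(change_pct("min-dp", "A", "RC", THROUGHPUT)))        # 5
print(round(change_pct("min-dp", "B", "RC", THROUGHPUT)))        # 88
```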
[Figure: cost maps vs. first inverter size S1 and stage number N: (a) Delay (ps); (b) Delay-Power Product (ps·mW); (c) Delay²-Power Product (ps²·mW).]
Fig. 6. Performance objectives within the design space for T-line Scheme A.
[Figure: the same three cost maps for Scheme B: (a) Delay; (b) Delay-Power Product; (c) Delay²-Power Product.]
Fig. 7. Performance objectives within the design space for T-line Scheme B.
To better understand the effects of the driver impedance RS and the termination resistance RLoad in Scheme B, we show the wire step response in Figure 8. As shown in Figure 8(a), a larger RS leads to a slower rising edge and a lower saturation voltage, resulting in poor eye quality at the wire output. With RS = 10 Ω, the effect of RLoad is shown in Figure 8(b). A larger RLoad produces a sharper rising edge but also introduces larger reflections, which can in turn deteriorate the eye-opening. Balancing these two effects, an optimal RLoad (220 Ω in this case) is chosen to produce the largest eye-opening.
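Two first-order quantities help explain Figure 8(b): the far-end DC level set by the resistive divider RS + R_wire + RLoad, and the far-end reflection coefficient (RLoad − Z0)/(RLoad + Z0). The sketch below evaluates both under a DC, lossless-reflection simplification that is our own, using RS = 10 Ω, Z0 = 40 Ω, and the 16X wire's DC resistance (6.0 Ω/mm × 5 mm = 30 Ω). It shows both quantities rising with RLoad, but it does not predict the 220 Ω optimum, which comes from the eye simulations.

```python
def termination_figures(r_load, r_s=10.0, r_wire=30.0, z0=40.0):
    """First-order quantities behind the R_Load tradeoff (DC divider, lossless reflection).

    dc_level : far-end DC voltage as a fraction of the full swing
    gamma    : far-end reflection coefficient (R_Load - Z0) / (R_Load + Z0)
    """
    dc_level = r_load / (r_s + r_wire + r_load)
    gamma = (r_load - z0) / (r_load + z0)
    return dc_level, gamma


for r_load in (60.0, 220.0, 600.0):
    dc, g = termination_figures(r_load)
    print(f"R_Load = {r_load:.0f} ohm: DC level {dc:.2f}, reflection {g:.2f}")
```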
D. Tradeoff between performance metrics
For Scheme B, there is a tradeoff between achievable bandwidth and interconnect performance when choosing the number of stages N. Figure 9 shows this relation by plotting the cycle time TC and the optimal delay²-power product (ddp) versus N on the same figure. In the previous design of T-line Scheme B, we studied the performance metrics in the design space {S1, N} at an optimal bit rate, which is actually a lower bound on the bit rate that such a scheme can achieve within the given space. As shown in Figure 9, a higher bit rate can be achieved by increasing the number of stages in the inverter chain, because more stages further improve the output slew rate. The bit-rate improvement saturates when N exceeds 7; that is, the highest achievable bit rate for Scheme B is around 30 Gbps (TC = 33 ps). On the other hand, the optimal ddp product increases from 3.16×10⁴ ps²·mW to 7.35×10⁴ ps²·mW as N changes from 4 to 8, as indicated in the figure. In summary, increasing the number of stages can improve the bit rate by as much as 20%, but it also incurs about 2.3 times overhead in optimal ddp product. Therefore, from a designer's perspective, choosing fewer stages generally gives better performance with considerable bandwidth.
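The tradeoff can be quantified with simple arithmetic. Assuming the N = 4 design runs at the 25 Gbps listed in Table II (TC = 40 ps, our reading) and the saturated design at TC = 33 ps from Figure 9, the bit-rate gain is about 21% (the "as much as 20%" above) against a 2.3× increase in optimal ddp:

```python
tc_n4_ps, tc_min_ps = 40.0, 33.0  # 25 Gb/s (Table II, assumed N = 4) and ~30 Gb/s (Fig. 9)
ddp_n4, ddp_n8 = 3.16e4, 7.35e4   # optimal delay^2-power product, ps^2*mW

bit_rate_gain_pct = 100.0 * (tc_n4_ps / tc_min_ps - 1.0)  # bit rate is 1 / T_C
ddp_ratio = ddp_n8 / ddp_n4
print(f"bit rate +{bit_rate_gain_pct:.0f}%, ddp x{ddp_ratio:.2f}")  # bit rate +21%, ddp x2.33
```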
E. Crosstalk effects
By applying different PRBS input patterns to the adjacent lines, which were quiet in the previous experiments, we investigate the crosstalk effects on the two T-line schemes. We choose the optimal design under the min-ddp objective to represent a typical application that trades off performance against power consumption. The simulation results show that, with crosstalk, the normalized delay of T-line Schemes A and B increases by 9.6% and 2%, respectively. Due to the coupling capacitance to the adjacent lines, the power consumption also increases by 37.0% and 25.7% for Schemes A and B, respectively. Adding the termination resistance therefore alleviates crosstalk effects, because it provides a DC path at the wire output.
Figure 10 shows the eye diagrams at the output of the wire and of the inverter chain for Scheme B. Comparing Figure 10(a) and 10(b), even with crosstalk Scheme B works at its original bit rate with little degradation in received signal quality: the eye-opening at the wire output is reduced from 820 mV to 750 mV, and the jitter of the received signal increases from 3.6 ps to 6.9 ps, with the scheme operating at 25 Gbps in both scenarios.
[Figure: eye diagrams at the wire output and at the inverter chain output.]
(a) Eye-diagrams w/o crosstalk effects (820 mV, 3.6 ps).
(b) Eye-diagrams w/ crosstalk effects (750 mV, 6.9 ps).
Fig. 10. Eye-diagrams of Scheme B using solution of min-ddp.
[Figure: wire step responses of Scheme B.]
(a) Effect of driver impedance RS on step response.
(b) Effect of termination resistance RLoad on step response.
Fig. 8. Wire step response of Scheme B.
[Figure: cycle time (ps) and optimal delay²-power product (ps²·mW) vs. number of stages in the inverter chain.]
Fig. 9. Tradeoff between achievable bandwidth and performance in Scheme B.
V. Conclusions and Future Works
A. Conclusions
In this work, we study two T-line schemes for on-chip global interconnects. Design methodologies for the two schemes are proposed and applied in experiments to determine the optimal design variables for the design objectives min-d, min-dp, and min-ddp. Compared with optimized repeated RC wires in the same process, the T-line schemes can improve the delay, reduce the power consumption, and achieve comparable or even higher throughput by utilizing wave propagation. Adding the termination resistance further increases the bandwidth by reducing signal distortion, at the cost of additional power consumption. To balance the tradeoff between bandwidth and interconnect performance in the T-line scheme, an inverter chain with fewer stages is preferred. When crosstalk effects are taken into account, the termination resistance alleviates the performance degradation by adding a DC path at the wire end. Therefore, the proposed scheme with a terminated T-line offers designers a potential alternative for achieving high-performance, low-power, and robust on-chip global interconnects.
B. Future works
Future work includes building a more practical model of on-chip T-lines that considers the real three-dimensional BEOL (Back End Of Line) interconnect stack. This model needs to account for orthogonal wires on the adjacent layers (layers n±1) and parallel wires on the sub-adjacent layers (layers n±2) when extracting the T-line capacitance and inductance. In addition, the adjacent in-plane power/ground bars should be considered when calculating the frequency-dependent inductance. In summary, a practical, three-dimensional, frequency-dependent T-line model is needed.
Another possible direction is to consider crosstalk effects during the optimization of the design variables. This requires extending the worst-case eye prediction algorithm, which is currently based on the step response of the victim line, to handle crosstalk. Once the eye quality can be evaluated with crosstalk included, a new optimization methodology can be developed to generate more robust designs.
VI. Acknowledgement
The authors would like to acknowledge the support of
NSF CCF-0811794 and California MICRO Program.
References
[1] Semiconductor Industry Association, “International technology roadmap for semiconductors,” 2004, 2006, 2007.
[2] N. Magen, A. Kolodny, U. Weiser, and N. Shamir, “Interconnect power dissipation in a microprocessor,” in IEEE/ACM Int. Workshop on System Level Interconnect Prediction, Feb. 2004, pp. 7–13.
[3] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, 1990.
[4] A. Nalamalpu and W. Burleson, “Repeater insertion in deep sub-micron CMOS: ramp-based analytical model and placement sensitivity analysis,” in IEEE Int. Symp. on Circuits and Systems, May 2000, pp. 766–769.
[5] P. Kapur, G. Chandra, and K. C. Saraswat, “Power estimation in global interconnects and its reduction using a novel repeater optimization methodology,” in IEEE/ACM Design Automation Conf., June 2002, pp. 461–466.
[6] L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. K. Cheng, “Repeated on-chip interconnect analysis and evaluation of delay, power and bandwidth metrics under different design goals,” in IEEE Int. Symp. on Quality Electronic Design, Mar. 2007, pp. 251–256.
[7] A. Deutsch, P. W. Coteus, G. V. Kopcsay, H. H. Smith, C. W. Surovic, B. L. Krauter, D. C. Edelstein, and P. L. Restle, “On-chip wiring design challenges for gigahertz operation,” Proceedings of the IEEE, vol. 89, no. 4, pp. 529–555, April 2001.
[8] M. P. Flynn and J. J. Kang, “Global signaling over lossy transmission lines,” in IEEE/ACM Int. Conf. on Computer-Aided Design, Nov. 2005, pp. 985–992.
[9] A. Tsuchiya, M. Hashimoto, and H. Onodera, “Design guideline for resistive termination of on-chip high-speed interconnects,” in IEEE Custom Integrated Circuits Conf., Sept. 2005, pp. 613–616.
[10] H. Chen, R. Shi, and C. K. Cheng, “Surfliner: A distortionless electrical signaling scheme for speed-of-light on-chip communication,” in IEEE Int. Conf. on Computer Design, Oct. 2005, pp. 497–502.
[11] B. Kim and V. Stojanovic, “Equalized interconnects for on-chip networks: Modeling and optimization framework,” in IEEE Int. Conf. on Computer-Aided Design, Nov. 2007, pp. 552–559.
[12] M. Hashimoto, A. Tsuchiya, and H. Onodera, “On-chip global signaling by wave pipelining,” in IEEE Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 2004, pp. 311–314.
[13] M. Hashimoto, A. Tsuchiya, A. Shinmyo, and H. Onodera, “Performance prediction of on-chip high-throughput global signaling,” in IEEE Topical Meeting on Electrical Performance of Electronic Packaging, Oct. 2005, pp. 79–82.
[14] H. Ito, J. Inoue, S. Gomi, H. Sugita, K. Okada, and K. Masu, “On-chip transmission line for long global interconnects,” in IEEE Int. Electron Devices Meeting, Dec. 2004, pp. 677–680.
[15] S. Gomi, K. Nakamura, H. Ito, K. Okada, and K. Masu, “Differential transmission line interconnect for high speed and low power global wiring,” in IEEE Custom Integrated Circuits Conf., Oct. 2004, pp. 325–328.
[16] A. Deutsch, H. H. Smith, C. Vakirtzis, J. Kozhaya, and L. M. Greenberg, “Effect of noise on timing or data-pattern dependent delay variation when transmission-line effects are taken into account for on-chip wiring,” in IEEE Workshop on Signal Propagation on Interconnects, May 2007, pp. 7–10.
[17] H. Johnson and M. Graham, High-speed signal propagation, Prentice Hall, 2003.
[18] R. Shi, W. Yu, Y. Zhu, E. S. Kuh, and C. K. Cheng, “Efficient and accurate eye diagram prediction for high speed signaling,” in IEEE/ACM Int. Conf. on Computer-Aided Design, Nov. 2008, pp. 655–661.
[19] IBM, “IBM electromagnetic field solver suite of tools,” http://www.alphaworks.ibm.com/tech/eip.
[20] S. Uemura, A. Tsuchiya, and H. Onodera, “A predictive transistor model based on ITRS roadmap,” in General Conference of IEICE, Mar. 2006, p. 81.