A Power-Efficient, Low-Complexity, Memoryless Coding Scheme for Buses with Dominating Inter-Wire Capacitances
Tina Lindkvist, Jacob Löfvenberg, Henrik Ohlsson, Kenny Johansson and Lars Wanhammar
Department of Electrical Engineering, Linköpings Universitet, SE-581 83 Sweden
{tina, jacob, henriko, kennyj, larsw}@isy.liu.se
Abstract
In this paper we present a simplified model of parallel, on-chip buses, motivated by the movement toward CMOS technologies where the ratio between inter-wire capacitance and wire-to-ground capacitance is very large. We also introduce a ternary bus state representation, suitable for the bus model. Using this representation we propose a coding scheme without memory which reduces energy dissipation in the bus model by approximately 20-30% compared to an uncoded system. At the same time the proposed coding scheme is easy to realize, in terms of standard cells needed, compared to several previously proposed solutions.
1. Introduction
1.1. Background
The continuing decrease in the minimum feature size of modern CMOS circuits, and the corresponding increase in chip density and operating frequency, have made power consumption a major concern in ULSI design. Chip area and throughput may no longer be the primary system-limiting factors, except in very high-volume integrated circuits (tens of millions of circuits per year) and in general-purpose computing.
Power consumption and dissipation are becoming increasingly important factors in CMOS circuits. One reason for this is the growing number of hand-held devices, which require low-power electronic circuits to extend battery life.
In this paper our concern is not high speed but very low power. The buses we consider are therefore optimized for energy, even if this means sacrificing throughput. As a result, problems with cross-talk and inductive coupling will not be discussed.
One important factor in power consumption, and one that needs addressing, is leakage. This problem is, however, not a topic of this paper. We will instead focus on the energy that is dissipated due to parasitic capacitances between nodes in the circuit.
1.2. Bus model
In on-chip parallel buses, the energy dissipation stems from parasitic capacitances between wires (inter-wire capacitances) and between wires and other metal layers (or the substrate), which have to be charged and discharged as the bus state changes.
Figure 1 shows a cross section of the metal layers in a 180nm process, together with the different capacitances of a wire in metal layer 4. A more detailed figure would also have shown the capacitances between non-adjacent wires. For the sake of clarity we have chosen not to do so here, and since these are much smaller than the capacitances between adjacent wires, we will disregard them.
The capacitance $C_d$ can be split into $C_{d1}$, $C_{d2}$, $C_{d3}$ and $C_{dGND}$ for the capacitances to the different layers below metal layer 4. In the same way, $C_u$ can be split into three different capacitances. Of these, the capacitive couplings to the adjacent layers are the largest.

We assume signals in different layers to be independent, implying that the energy dissipation due to such capacitive couplings depends only on the frequency of state changes on the bus wires under consideration. For every bus wire we can lump together all the capacitances to nodes in other layers and view them as a single capacitance connecting the bus wire to a single node with non-changing charge. The value of this charge does not affect the power dissipation, so we will assume it to be ground and call this lumped capacitance the wire-to-ground capacitance.

The fringe capacitance, $C_f$, is in general greater than the wire-to-ground capacitance and less than the inter-wire capacitance, but in order to simplify the model $C_f$ is often taken to be $C_i$ or zero (see [3]).
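Figure 1. A cross section of the metal layers in a 180nm process with seven metal layers.
[Figure: metal layers M1 to M7, with the capacitances $C_i$, $C_u$, $C_d$ and $C_f$ marked for a wire in metal layer 4.]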
The relation between the different capacitances is interesting. In older models the inter-wire and fringe capacitances were disregarded and only the wire-to-ground capacitances were taken into account. Such assumptions motivated the use of Gray codes for address bus coding and Bus-Invert coding for data buses. However, as processes shrink, the ratio of the inter-wire capacitances to the wire-to-ground capacitances grows, and in modern processes the inter-wire capacitances (between adjacent wires) can no longer be disregarded. Figure 2 shows the ratio of inter-wire to wire-to-ground capacitance. The numbers for the figure are taken from Table 1 in [6], which in turn is based on [2].
Figure 2. Ratio between inter-wire and wire-to-ground capacitances for different technology generations.
[Plot: inter-wire / wire-to-ground capacitance ratio versus technology generation (nm).]
As seen in Figure 2, the inter-wire capacitance is much greater than the wire-to-ground capacitance for modern processes, and the ratio keeps growing. If this trend continues, the inter-wire capacitances will soon dominate. This motivates us to use a simple model in which the wire-to-ground capacitances are disregarded. For the sake of simplicity, we also ignore the fringe capacitance. For processes with small feature sizes this is a reasonable simplification, especially if the bus under consideration is wide enough that the energy dissipated due to the fringe capacitance is small compared to what is dissipated in the rest of the bus.
These simplifications lead us to the model of a one-layer parallel bus shown in Figure 3. We will later show that the coding system derived from this simplified model also works well in a more realistic setting.
Figure 3. Model of an on-chip parallel data bus.
$V_1, V_2, \ldots, V_n$ are wire potentials and $c_{1,2}, c_{2,3}, \ldots$ the inter-wire capacitances.

Given an initial and a final state of the n-wire bus, with $V^i = (V_1^i, V_2^i, \ldots, V_n^i)^T$ and $V^f = (V_1^f, V_2^f, \ldots, V_n^f)^T$ representing the bus wire potentials, we can express the energy dissipated during the transition as

$$E(V^i, V^f) = \frac{1}{2}\,(V^f - V^i)^T\, C\,(V^f - V^i),$$

where

$$C = \begin{pmatrix}
c_{1,2} & -c_{1,2} & 0 & \cdots & 0 \\
-c_{1,2} & c_{1,2}+c_{2,3} & -c_{2,3} & \cdots & 0 \\
0 & -c_{2,3} & c_{2,3}+c_{3,4} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -c_{n-1,n} & c_{n-1,n}
\end{pmatrix}$$

is the capacitance conductance matrix [3], in which we have set the ground capacitances $c_{i,i}$ to zero. At any time the bus wire potentials define a vector of values. We consider the system only when the wires have settled, so in our model each potential is either 0 or $V_{dd}$. We represent these two states with the binary values 0 and 1.
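To make the energy expression concrete, the following minimal Python sketch builds the capacitance conductance matrix from a list of inter-wire capacitances and evaluates $E(V^i, V^f)$ directly. The function names are illustrative, and the optional `ground` argument (a stand-in for the diagonal terms $c_{i,i}$, which the simplified model sets to zero) is our own addition. With two wires, equal capacitances and potentials in units of $V_{dd}$, dividing the energy by $cV_{dd}^2/2$ gives the normalized values listed in Table 1.

```python
def conductance_matrix(coupling, ground=0.0):
    """Capacitance conductance matrix C of an n-wire bus, n = len(coupling) + 1.

    coupling[j] is the capacitance between wires j and j+1 (0-based), i.e. the
    paper's c_{j+1,j+2}. `ground` is an illustrative per-wire c_{i,i} term on
    the diagonal; the simplified model uses 0.
    """
    n = len(coupling) + 1
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        c[i][i] = ground
    for j, cap in enumerate(coupling):
        c[j][j] += cap
        c[j + 1][j + 1] += cap
        c[j][j + 1] -= cap
        c[j + 1][j] -= cap
    return c


def transition_energy(v_initial, v_final, c):
    """E(V^i, V^f) = 1/2 (V^f - V^i)^T C (V^f - V^i)."""
    d = [vf - vi for vi, vf in zip(v_initial, v_final)]
    n = len(d)
    return 0.5 * sum(d[r] * c[r][s] * d[s] for r in range(n) for s in range(n))


def normalized_energy(before, after, cap=1.0, vdd=1.0):
    """Energy of a 0/1 state transition in multiples of cap * vdd^2 / 2,
    assuming every inter-wire capacitance equals `cap` and no ground term."""
    c = conductance_matrix([cap] * (len(before) - 1))
    e = transition_energy([vdd * b for b in before], [vdd * b for b in after], c)
    return e / (cap * vdd ** 2 / 2)


if __name__ == "__main__":
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for after in pairs:
        # One row per final pair state, one column per initial pair state.
        print(after, [normalized_energy(before, after) for before in pairs])
```

The printed rows contain only the values 0, 1 and 4, with the value 4 appearing only for a switch between 01 and 10.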
1.3. Cost Function
The energy dissipated when the bus goes from one state to another can in our model be expressed as the sum of the energies dissipated in each pair of adjacent wires during the state transition. This follows from the structure of the capacitance conductance matrix, which is the sum of the capacitance conductance matrices of the individual pairs of adjacent wires (see Figure 4). Note that this is not possible when wire-to-ground capacitances are considered.
Figure 4. The capacitance conductance matrices of
the adjacent pairs of wires make up the total
capacitance conductance matrix.
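The per-pair decomposition can be checked numerically. In this minimal Python sketch (the function names are ours), all inter-wire capacitances are set to a common value $c$ and energies are expressed in multiples of $cV_{dd}^2/2$; a pair of adjacent wires $(j, j+1)$ then contributes $(d_j - d_{j+1})^2$, where $d$ is the bitwise difference between the final and initial states. The sketch assembles $C$ one pair at a time, as described for Figure 4, and compares the full quadratic form with the sum of pair terms for every transition of a 4-wire bus.

```python
from itertools import product


def matrix_energy(before, after):
    """Normalized energy via the full quadratic form d^T C d.

    With unit coupling capacitance and Vdd = 1, E = 1/2 d^T C d; dividing by
    the unit c * Vdd^2 / 2 = 1/2 leaves d^T C d.
    """
    n = len(before)
    d = [b - a for a, b in zip(before, after)]
    c = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        # Add the 2x2 matrix of the adjacent pair (j, j+1) into C.
        c[j][j] += 1
        c[j + 1][j + 1] += 1
        c[j][j + 1] -= 1
        c[j + 1][j] -= 1
    return sum(d[r] * c[r][s] * d[s] for r in range(n) for s in range(n))


def pairwise_energy(before, after):
    """The same energy as a sum of per-pair terms (d_j - d_{j+1})^2."""
    d = [b - a for a, b in zip(before, after)]
    return sum((d[j] - d[j + 1]) ** 2 for j in range(len(d) - 1))


if __name__ == "__main__":
    states = list(product((0, 1), repeat=4))
    # Exhaustive comparison over all 16 x 16 transitions of a 4-wire bus.
    assert all(matrix_energy(a, b) == pairwise_energy(a, b)
               for a in states for b in states)
    print("matrix form equals the sum of pair terms for all 4-wire transitions")
```

For a single pair the term $(d_j - d_{j+1})^2$ is 0 when both wires move together or neither moves, 1 when one wire switches, and 4 when the two wires switch in opposite directions.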
We assume all $c_{i,i+1}$ to be equal, and denote this capacitance $\tilde{C}$. Table 1 shows the energy dissipation, in multiples of $\tilde{C}V_{dd}^2/2$, when a pair of adjacent wires switches from one state to another.

Table 1: Energy dissipation in multiples of $\tilde{C}V_{dd}^2/2$ when a pair of wires changes state.

after/before   00 (‘0’)   01 (‘+’)   10 (‘-’)   11 (‘0’)
00 (‘0’)          0          1          1          0
01 (‘+’)          1          0          4          1
10 (‘-’)          1          4          0          1
11 (‘0’)          0          1          1          0

We call the energy dissipation in multiples of $\tilde{C}V_{dd}^2/2$ the normalized energy dissipation.

Between any two words we use as a distance measure the total normalized energy dissipation when the words are transmitted consecutively on the bus. In general we know nothing about what kind of data will be sent over the bus, which codewords will be most common, or the order in which they will be sent. We therefore choose as the code cost function the average distance between two codewords, divided by the number of bits in the uncoded representation, with the average taken over every possible ordered pair of codewords.

Since our distance measure can be calculated by adding the distances for each pair of adjacent coordinates, we choose another representation for the words, with one coordinate for each pair of adjacent coordinates in the binary representation. We represent the binary pairs 00 and 11 with ‘0’, 01 with ‘+’ and 10 with ‘-’. For example, the binary vector 0111 corresponds to +00.

Two consequences of this definition are worth noting. First, there are ternary vectors that have no corresponding binary vector, for example 0+0+ (which is the reason for introducing condition II in Section 2.1 below). Second, there are two ways of realizing the ternary all-zero vector.

This new representation transforms the problem from choosing a code of binary vectors of length n into choosing a code of ternary vectors of length n-1. Using the ternary representation in Table 1, we see that going from state ‘+’ to state ‘-’ (or vice versa) costs much more than any other state transition, so a good code will result in few such transitions.

As can be seen, this bus model yields a line of reasoning that is slightly different from the approach taken when the wire-to-ground capacitance dominates. In that case the transition activity, corresponding to the Hamming distance between consecutive words, is the relevant measure.
2. Coding system
A coding system can be either with or without memory. In a coding system with memory, one needs to know the previous state to encode the next word to be sent. The coding systems in [1], [3], [4] and [5] are with memory.
Our coding system is without memory, i.e., the encoder does not need to know the previous state to choose the next codeword to send.
2.1. The Fibonacci Code
We define our code for a given length as the ternary words fulfilling the following conditions:
I. ‘+’ is only allowed in even coordinates and ‘-’ is only allowed in odd coordinates.
II. Neither two ‘+’ nor two ‘-’ may be adjacent when zeros are disregarded.
III. The all-zero codeword may be used twice.
Condition I ensures that no coordinate ever switches between the states ‘+’ and ‘-’, the most expensive transition in Table 1. Condition II is necessary since a vector violating it would have no binary representation. Condition III is motivated by the fact that both the binary all-zero word and the binary all-one word are represented by the ternary all-zero word.
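A minimal Python sketch of the ternary representation and of conditions I and II follows; the helper names are ours and coordinates are numbered from 1, as in the text. Rather than enumerating ternary words directly, it enumerates binary words and keeps those whose ternary form satisfies the conditions. Since 00...0 and 11...1 both map to the ternary all-zero word, that word is counted twice, as condition III allows. The code sizes obtained this way follow the Fibonacci sequence, which gives the code its name.

```python
from itertools import product

# Ternary symbol for each pair of adjacent binary coordinates.
PAIR_SYMBOL = {"00": "0", "11": "0", "01": "+", "10": "-"}


def to_ternary(bits):
    """One ternary symbol per adjacent pair of binary coordinates."""
    return "".join(PAIR_SYMBOL[bits[j:j + 2]] for j in range(len(bits) - 1))


def satisfies_conditions(tern):
    """Check conditions I and II on a ternary word (coordinates start at 1)."""
    for j, s in enumerate(tern, start=1):
        if s == "+" and j % 2 == 1:   # I: '+' only in even coordinates
            return False
        if s == "-" and j % 2 == 0:   # I: '-' only in odd coordinates
            return False
    nonzero = [s for s in tern if s != "0"]
    # II: with zeros removed, no two equal nonzero symbols may be adjacent.
    return all(a != b for a, b in zip(nonzero, nonzero[1:]))


def fibonacci_code(n_wires):
    """Binary words of length n_wires whose ternary form meets I and II.

    Both 00...0 and 11...1 are kept, so the ternary all-zero word is used
    twice (condition III).
    """
    words = ("".join(p) for p in product("01", repeat=n_wires))
    return [w for w in words if satisfies_conditions(to_ternary(w))]


def fib(i):
    """Fib(1) = Fib(2) = 1, Fib(i) = Fib(i-1) + Fib(i-2)."""
    a, b = 1, 1
    for _ in range(i - 1):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(to_ternary("0111"))      # +00, the example from Section 1.3
    print(fibonacci_code(3))       # ['000', '001', '100', '101', '111']
    for n in range(2, 13):
        print(n, len(fibonacci_code(n)), fib(n + 2))
```

The vector 0+0+ satisfies condition I but not condition II, so `satisfies_conditions("0+0+")` returns `False`, in line with the observation that it has no binary counterpart.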
The number of ternary vectors of length n-1 that fulfil these conditions is $\mathrm{Fib}(n+2)$, i.e., element n+2 in the Fibonacci sequence ($\mathrm{Fib}(i) = \mathrm{Fib}(i-1) + \mathrm{Fib}(i-2)$, $\mathrm{Fib}(1) = \mathrm{Fib}(2) = 1$). Note that a ternary vector of length n-1 corresponds to a binary vector of length n, i.e., the number of physical wires is n. We call this code the Fibonacci code, regardless of whether it is binary or ternary represented.

Table 2 shows how the Fibonacci code of ternary length two is constructed.

Table 2: Example of construction of a Fibonacci code of ternary length two.

Ternary          Binary           Status
representation   representation   (violated condition)
00               000              OK
00               111              OK
0-               110              Rejected (I)
0+               001              OK
-0               100              OK
--               none             Rejected (I, II)
-+               101              OK
+0               011              Rejected (I)
+-               010              Rejected (I)
++               none             Rejected (I, II)

2.2. Code Expurgation
The number of codewords in a Fibonacci code is in general not a power of two, so we have to choose a subset, called a subcode, to represent the states of a binary bus. In our construction we choose the subcode that minimizes the cost function (see Section 1.3). For large codes it may be infeasible to find the minimizing subcode, in which case we resort to the following heuristic:
I. For each codeword, compute the average distance to the codewords in the Fibonacci code.
II. Choose from the Fibonacci code the required number of codewords with the lowest average distance.
In cases small enough for the optimal subcode to be found, the heuristic produces results that are very close in cost, as can be seen in Table 3.

Table 3: Cost for optimal and heuristic choice of subcode.

Size of Fib. code   Uncoded bits   Extra bits   Uncoded cost   Optimal subcode cost   Heuristic subcode cost
5                   2              1            0.50           0.38                   0.38
21                  4              2            0.75           0.54                   0.54
89                  6              3            0.83           ?                      0.59
377                 8              4            0.88           ?                      0.62
(?: optimal subcode not determined.)

2.3. Code Mapping
When the subcode has been chosen from the Fibonacci code, a mapping from uncoded words to codewords has to be made. Since we assume a uniform distribution of the uncoded words, the choice of this mapping does not influence the code cost. Hence we may choose this mapping arbitrarily, for example to simplify the realization of the encoder and decoder. It is not feasible to find the optimal mapping in this sense, so when we want to simplify the realization we use the following heuristic, in which the uncoded and coded words are ordered so as to minimize the number of bits that switch value between two consecutive rows.
I. The uncoded and coded words are ordered separately, starting with the word with the lowest Hamming weight, usually the all-zero word.
II. Choose the next word among those that differ in only one bit position from the previous word. If no such word exists, choose a word with two differing bits; if no such word exists, choose a word with three differing bits, and so on.
III. When a word has been chosen, a new word with a minimum number of differing bits is sought in the same way. This is repeated until all words, both coded and uncoded, have been ordered.
IV. After the ordering, the first uncoded word is mapped to the first coded word, the second to the second, and so on.
The idea behind this method is to group bits with the same value together to a high degree, which simplifies the mapping to hardware.
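The cost function of Section 1.3 and the expurgation heuristic of Section 2.2 can be sketched together in Python as follows. This is an illustrative sketch rather than the authors' implementation: the function names are ours, the distance uses the per-pair form $(d_j - d_{j+1})^2$ in units of $\tilde{C}V_{dd}^2/2$ (which reproduces Table 1), and ties in the heuristic are broken by enumeration order, a detail the text does not specify. For an uncoded k-bit bus the cost evaluates to $(k-1)/k$, consistent with the uncoded costs in Table 3, and for two uncoded bits the heuristic keeps 000, 001, 100 and 111 with cost 0.375 (0.38 in Table 3).

```python
from itertools import product


def energy(a, b):
    """Normalized energy (units of C~ * Vdd^2 / 2) of sending word b right
    after word a: each adjacent pair contributes (d_j - d_{j+1})^2."""
    d = [int(y) - int(x) for x, y in zip(a, b)]
    return sum((d[j] - d[j + 1]) ** 2 for j in range(len(d) - 1))


def code_cost(code, uncoded_bits):
    """Average distance over all ordered pairs of codewords (a word followed
    by itself included), divided by the number of uncoded bits."""
    total = sum(energy(a, b) for a in code for b in code)
    return total / (len(code) ** 2 * uncoded_bits)


def fibonacci_code(n_wires):
    """Binary words with 01 ('+') only at even and 10 ('-') only at odd
    ternary coordinates, numbered from 1."""
    def ok(w):
        return all(not (w[j - 1:j + 1] == "01" and j % 2 == 1)
                   and not (w[j - 1:j + 1] == "10" and j % 2 == 0)
                   for j in range(1, n_wires))
    words = ("".join(p) for p in product("01", repeat=n_wires))
    return [w for w in words if ok(w)]


def heuristic_subcode(fib_code, size):
    """Keep the `size` words with the lowest average distance to the whole
    Fibonacci code; sorted() is stable, so ties keep enumeration order."""
    avg = {w: sum(energy(w, v) for v in fib_code) / len(fib_code)
           for w in fib_code}
    return sorted(fib_code, key=lambda w: avg[w])[:size]


def all_words(k):
    """All k-bit words, i.e. the uncoded bus."""
    return ["".join(p) for p in product("01", repeat=k)]


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(k, code_cost(all_words(k), k))   # uncoded cost, (k - 1) / k
    sub = heuristic_subcode(fibonacci_code(3), 4)
    print(sorted(sub), code_cost(sub, 2))      # 2 uncoded bits on 3 wires
```

Because the average runs over every ordered pair of codewords, the cost depends only on which codewords are kept and not on which uncoded word is assigned to which codeword.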
3. Realization
When the mapping between uncoded words and codewords has been chosen, the encoder and decoder can be realized. They have been implemented in VHDL and mapped to hardware through logic synthesis using Design Compiler from Synopsys. The target technology has been a 0.13 µm CMOS process. The synthesis results for two different code sizes, with respect to the number of standard cells required, are shown in Table 4.

Table 4: Number of standard cells in realization of encoder and decoder.

Uncoded bits   Extra bits   Mapping type   Std cells in encoder   Std cells in decoder
7              3            Random         201                    242
7              3            Heuristic      172                    212
8              4            Random         352                    503
8              4            Heuristic      333                    479

3.1. Example
We have an on-chip parallel data bus with eight wires, and we assume the 256 uncoded words to have a uniform distribution. This yields a code cost of 0.88 (in units of $\tilde{C}V_{dd}^2/2$ per uncoded bit). In order to decrease this we choose a Fibonacci code containing 377 codewords (using twelve bits).
Note that it would have been possible to choose a larger Fibonacci code, giving more codewords to choose from. However, in practice this yields a very small improvement in code cost, but a much larger complexity.
We only need 256 of the 377 codewords, so we use the heuristic in Section 2.2 to choose a subcode of size 256. The coded bus will have twelve wires and the code cost will be 0.62. This is a 29.5% decrease in energy dissipation, since $(0.88 - 0.62)/0.88 \approx 0.295$. The mapping of the uncoded words to the subcode of the Fibonacci code is done according to the heuristic in Section 2.3, resulting in an encoder consisting of 333 standard cells and a decoder consisting of 479 standard cells in a 0.13 µm CMOS process.
3.2. Simulations
We will verify that the simplified model is relevant and that the constructed coding system also works in a more realistic setting. We do so by comparing simulations with and without wire-to-ground capacitance. Table 5 shows the results of simulations without wire-to-ground capacitance. The simulations have been done using one million randomly chosen codewords.

Table 5: Simulated code cost for coded and uncoded transmission, when disregarding wire-to-ground capacitance.

Uncoded bits   Uncoded cost   Coded cost   Code gain
2              0.50           0.38         24%
4              0.75           0.54         28%
6              0.83           0.59         29%
8              0.88           0.62         30%

The next emerging technology today is 65nm. Looking at Figure 2, we find that for 65nm the ratio between inter-wire and wire-to-ground capacitances is approximately 5, and we use this ratio in the simulations. Under these assumptions we get the capacitance conductance matrix

$$C = \begin{pmatrix}
\tilde{C} + \tilde{C}/5 & -\tilde{C} & 0 & \cdots & 0 \\
-\tilde{C} & 2\tilde{C} + \tilde{C}/5 & -\tilde{C} & \cdots & 0 \\
0 & -\tilde{C} & 2\tilde{C} + \tilde{C}/5 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & -\tilde{C} & \tilde{C} + \tilde{C}/5
\end{pmatrix}.$$

The simulated code cost under these assumptions is shown in Table 6. The simulations have been done using one million randomly chosen codewords.

Table 6: Simulated code cost for coded and uncoded transmission, when including wire-to-ground capacitance (inter-wire to wire-to-ground ratio 5).

Uncoded bits   Uncoded cost   Coded cost   Code gain
2              0.60           0.51         15%
4              0.85           0.67         21%
6              0.93           0.73         22%
8              0.98           0.75         23%

Interestingly, the Fibonacci code, although constructed with the simplified bus model in mind, also works well when the wire-to-ground capacitance is not neglected. The code gain is not quite as large as in the simplified model, but it is still considerable.
4. Comparison with other solutions
Most previously presented power-reducing coding schemes use bus models different from the one presented in Section 1.2. In older technologies the wire-to-ground capacitance is much larger than the inter-wire capacitance, which is why the latter is often disregarded in older models, resulting in schemes that try to minimize transition activity. One well-known such scheme, which is efficient on uniformly distributed data, is Bus-Invert coding [5].