Mapping CIRCAL Algorithms to Event Logic Using A Standard Cell Library

K. M. Elleithy and A. A. Amin
Computer Engineering Department
KFUPM, Dhahran 31261
Saudi Arabia
e-mail: elleithy@ccse.kfupm.edu.sa
Abstract
The work reported in this paper is part of a silicon
compiler that receives a parallel algorithm written in
CIRCAL and produces a VLSI implementation. The
implementation logic used is the asynchronous event
logic. A generated netlist of event logic modules is
used to produce the VLSI mask layout geometries
for this circuit using the standard cell approach. A
standard library of cells for event logic modules has
been designed, simulated and its layout generated.
1. Introduction
Even though most commercially available digital
systems are synchronous, there is currently a
renewed interest among researchers in asynchronous
logic. With the ongoing continuous increase in
system integration, signal propagation delays are
exceeding acceptable limits, particularly for signals
of a global nature such as the system clock. As a
result, problems associated with designing a clock
signal which traverses all system components, e.g.
clock skew, are adversely affecting system
performance. In addition to being free of global
clock signal problems, asynchronous logic enjoys
several other advantages [Lav-93]. The performance
of asynchronous systems follows the average-case
speed behavior, as compared to the worst-case
behavior characterizing synchronous systems [Sei-80].
Continued scaling of MOS devices and increased
hunger for larger chip integration have effected
another trend in the semiconductor industry, namely
the design of low-voltage low-power circuitry. The
low voltage requirement is necessary for reliable
operation as device dimensions are scaled down. The
low-power requirement is necessary for more device
integration without violating the maximum power
dissipation constraints of chip carriers/packages.
Asynchronous circuits offer an attractive solution to
the power dissipation problem since, assuming a
CMOS implementation, only active modules in a
VLSI asynchronous circuit will dissipate power.
This is in direct contrast to synchronous systems,
where the heavily loaded clock signals dissipate
power by charging and discharging the gate
capacitances of all modules every clock cycle.
Asynchronous circuits have been reported to enjoy a
substantial power saving advantage over their
synchronous counterparts which may be as high as
80% in some cases [Ber-94]. In addition, the current
push for the design of low-voltage circuits has
resulted in an associated reduction of transistor
switching speed which negatively affects the overall
system performance. Such speed reduction can be
compensated for through the use of parallel
architectures. In this regard, asynchronous systems
lend themselves more naturally to the
implementation of parallel systems architectures.
Last, but not least, contrary to synchronous systems,
replacing any module in a given asynchronous
system with a faster one improves the overall system
speed without the need to perform complex timing
analysis or replace other system modules. On the
other hand, design of individual asynchronous
modules is generally a more difficult task than
designing the corresponding synchronous module.
Reliable asynchronous circuits should ensure
hazard-free behavior under some timing model. This
generally leads to implementations with larger
silicon area.
The design methodology adopted in this work uses
event logic [Sut-89]. Our approach employs the
transition signaling convention and a
Request-Acknowledge communication protocol under
the Bundled-Data model [Bru-91].
2. Transition Signaling
Transition Signaling means that transitions from the
high state to the low state (falling transition) or from
the low state to the high state (rising transition) are
used as basic action-triggering events. For control
signals, no particular meaning is assigned to the
absolute high or absolute low states. In addition, the
sense of transition is of no particular significance,
i.e. rising and falling transitions have the same
meaning. In asynchronous circuits that follow this
signaling scheme, circuit elements respond only to
transitions of states. Generally, we will refer to these
transitions as events and to circuits designed using
this signaling convention as event logic.
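As a minimal sketch of this convention (Python is used purely for
illustration; the function name `events` is hypothetical and does not
come from the paper), every change of level on a sampled control wire
can be reported as an event, regardless of its direction:

```python
def events(samples):
    """Return the sample indices at which a control wire changes level.

    Rising and falling transitions are reported identically, because
    under transition signaling the sense of a transition has no meaning.
    """
    return [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]


# A wire that rises at sample 2 and falls at sample 5 carries two events.
trace = [0, 0, 1, 1, 1, 0, 0]
print(events(trace))  # [2, 5]
```

Both the rising and the falling transition appear in the result as
events of the same kind.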
3. Request-Acknowledge Interface
The request-acknowledge interface is one of the
simplest forms of asynchronous communication
between modules. In this model, the sender module
sends a request event (Req; see footnote 1) to the
receiver module. As soon as the receiver completes
the requested task, it sends an acknowledge event
(Ack) back to the sender module (see Fig. 4-1).
[Figure 4-1: Sender-Receiver Request/Acknowledge Interface. The Sender drives Req to the Receiver; the Receiver drives Ack back to the Sender.]
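The request-acknowledge cycle can be sketched behaviorally as follows.
This is an illustrative model only, not the paper's circuit: the class
`Wire` and the function `handshake` are hypothetical names, and the
receiver's task is represented by an ordinary Python callable.

```python
class Wire:
    """A control wire; toggling its level represents one event."""

    def __init__(self):
        self.level = 0
        self.event_count = 0

    def toggle(self):
        self.level ^= 1
        self.event_count += 1


def handshake(req, ack, task):
    """Perform one request-acknowledge cycle and return the task result."""
    req.toggle()      # sender issues a request event
    result = task()   # receiver performs the requested task
    ack.toggle()      # receiver issues an acknowledge event when done
    return result


req, ack = Wire(), Wire()
handshake(req, ack, lambda: "done")
handshake(req, ack, lambda: "done")
print(req.event_count, ack.event_count)  # 2 2
```

After two cycles both wires are back at their initial level, which
illustrates that only the transitions, not the levels, carry meaning.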
4. Bundled-Data Communication
Under this model, when the sender module sends
data to the receiver module, the request-acknowledge
protocol proceeds as follows. First, the sender places
valid data on the data wires. Then, the sender
initiates a request to the receiver. The sender must
keep the data stable until an acknowledge signal is
received from the receiver module. Finally, the data
and request wires should be connected in such a way
that the valid data reaches the receiver before the
request event does. This protocol is called the
bundled-data protocol (see Fig. 4-2).
[Figure 4-2: Bundled-Data Communication. Sender and Receiver connected by Req, Ack and Data.]
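The ordering required by the bundled-data protocol can be sketched as
below. This is a behavioral illustration under stated assumptions: the
class `BundledChannel` and its methods are hypothetical names, and a
request is treated as pending whenever the Req and Ack levels differ,
which is one common way of modeling transition signaling.

```python
class BundledChannel:
    """Behavioral model of one bundled-data link (Data, Req, Ack)."""

    def __init__(self):
        self.data = None
        self.req = 0
        self.ack = 0
        self.log = []

    def send(self, value):
        # The previous transfer must have been acknowledged, because the
        # data must stay stable until the acknowledge event arrives.
        if self.req != self.ack:
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value          # 1. place valid data on the data wires
        self.log.append("data valid")
        self.req ^= 1              # 2. then issue the request event
        self.log.append("req event")

    def receive(self):
        if self.req == self.ack:
            raise RuntimeError("no pending request")
        value = self.data          # data is guaranteed valid at this point
        self.ack ^= 1              # 3. acknowledge, releasing the data wires
        self.log.append("ack event")
        return value


channel = BundledChannel()
channel.send(0b1011)
print(channel.receive())  # 11
print(channel.log)        # ['data valid', 'req event', 'ack event']
```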
5. Event Logic Cell Library
To allow for standard cell implementation of event
logic, a library of essential event logic cells has been
developed. The operation of the cells has been
verified through simulation of the circuit extracted
from their VLSI CMOS layout. Work is ongoing to
add more of the needed cells to the library. In
the following, we present the event logic cells that
have already been implemented.
I- The Merge element: Merges two independent
events to provide an OR function for events. It
is basically an exclusive-OR gate (XOR).
[Figure: MERGE ELEMENT. Gate-level representation (XOR with inputs A and B, output Out) and transistor-level implementation.]
Footnote 1: An event is a transition on some control signal irrespective of
the sense of this transition. In this regard both high to low and
low to high transitions are considered as events indistinguishable
from one another.
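A short sketch shows why an XOR gate acts as an OR for events: a
transition on either input always changes the output level. The helper
names `merge` and `output_events` are hypothetical and used only for
this illustration.

```python
def merge(a, b):
    """Output level of the Merge element for input levels a and b (XOR)."""
    return a ^ b


def output_events(input_steps):
    """Count output events for a sequence of (a, b) input level pairs."""
    out = merge(*input_steps[0])
    count = 0
    for a, b in input_steps[1:]:
        new_out = merge(a, b)
        if new_out != out:
            count += 1
            out = new_out
    return count


# Exactly one input changes at each step: A rises, B rises, A falls.
steps = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(output_events(steps))  # 3
```

Note that if both inputs change in the same step, the XOR output does
not change, so the two events must not arrive simultaneously.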
II- The Muller C-Element (Rendezvous): Provides
the AND function for events. The output switches
to logic 1 only when both inputs are 1's, and it
switches to 0 only when both inputs are 0's.
Otherwise, the output does not change:
Out(t+1) = A . B + Out(t) . (A + B)
[Figure: MULLER C-ELEMENT. Gate-level representation (inputs A, B and CLR, output Out) and transistor-level implementation.]
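The next-state equation Out(t+1) = A . B + Out(t) . (A + B) can be
checked with a few lines of Python. The function name `c_element` is
hypothetical, and the CLR input shown in the figure is not modeled in
this sketch.

```python
def c_element(a, b, out):
    """Next output of a Muller C-element: A.B + Out.(A + B)."""
    return (a & b) | (out & (a | b))


out = 0
history = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    out = c_element(a, b, out)
    history.append(out)
print(history)  # [0, 1, 1, 0]
```

The output rises only after both inputs are 1 and falls only after both
are 0; while the inputs disagree, the previous value is held.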
III- Switch Element: A double-throw electronic
switch which is used in several other circuits, e.g. in
event registers.
IF C = 1 Then Out = Ii
Else Out = Qi
[Figure: Switch Element (Double-Throw Switch). Gate-level representation (inputs Ii and Qi, control C, output Out) and transistor-level implementation.]
IV- G-Latch (GL): A transparent latch which
latches the input data (D) on the falling edge of
the control signal (G).
[Figure: G - Transparent Latch (GL). Gate-level representation (inputs G, D and CLR, output L) and transistor-level implementation.]
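A behavioral sketch of the G-latch follows. It assumes, as the
description implies, that the latch is transparent while G is 1 and
holds its value after G falls. The class name `GLatch` is hypothetical,
and treating CLR as an active-high reset to 0 is an assumption made
only for this example.

```python
class GLatch:
    """Transparent latch: follows D while G = 1, holds when G = 0."""

    def __init__(self):
        self.q = 0

    def update(self, g, d, clr=0):
        if clr:        # assumed: active-high clear forces the output to 0
            self.q = 0
        elif g:        # transparent phase: output follows D
            self.q = d
        # otherwise (G = 0) the value captured at the falling edge is held
        return self.q


latch = GLatch()
print(latch.update(g=1, d=1))  # 1  (transparent)
print(latch.update(g=0, d=0))  # 1  (held after G fell)
print(latch.update(g=1, d=0))  # 0  (transparent again)
```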
V- The TOGGLE-element steers its input events
(IN) to one of its outputs (Out0 & Out1)
alternately. After a clear signal, the dotted
output (Out0) will get the first event.
VI- The SELECT-element steers the input
events (IN) according to the Boolean value of its
control (diamond-shaped) input (Sel).
[Figures: logic implementation with two GL latches, input IN (event), outputs Out0 and Out1, and CLR; SELECT MODULE, gate representation and logic implementation with input IN (event), control input Sel, and True/False outputs.]
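The steering behavior of the TOGGLE and SELECT elements can be sketched
at the event level as follows. The classes `Toggle` and `Select` are
hypothetical behavioral models, not the latch-based circuits of the
figures; each output is represented by a level that is toggled whenever
an event is steered to it.

```python
class Toggle:
    """Steers input events alternately to Out0 and Out1."""

    def __init__(self):
        self.clear()

    def clear(self):
        self.levels = [0, 0]   # levels of Out0 and Out1
        self.next_out = 0      # after clear, Out0 gets the first event

    def event_in(self):
        fired = self.next_out
        self.levels[fired] ^= 1
        self.next_out ^= 1
        return fired


class Select:
    """Steers input events to the True or False output according to Sel."""

    def __init__(self):
        self.levels = {True: 0, False: 0}

    def event_in(self, sel):
        branch = bool(sel)
        self.levels[branch] ^= 1
        return branch


toggle = Toggle()
print([toggle.event_in() for _ in range(4)])   # [0, 1, 0, 1]
select = Select()
print(select.event_in(1), select.event_in(0))  # True False
```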
VII- The CALL-element remembers which client
request, R1 or R2, has called the procedure RS.
After the procedure is done, the acknowledge
event AS triggers a matching acknowledge event
at either A1 or A2.
VIII- The Event-Bit is a storage bit (latch) which
latches either a 0 or a 1 depending on whether it
receives an event request for latching a 0
(Rset0) or for latching a 1 (Rset1). Latching a 0
(or 1) is acknowledged by an output event on
A0 (or A1). The stored data bit is available at
the Boolean output Q.
IX- Parallel Load Event Register: Earlier
implementations of event registers required the
use of two alternating control events, Pass and
Capture [Sut-89]. The circuit implementation
reported here requires only one Request input
control event, which internally generates
self-timed pass and capture events. The circuit
self-initializes to the state where all FFs are in the
"capture" position. Upon receiving a request
(Req) event, the FFs switch to the "pass"
position, and once the data have propagated to the
output they automatically switch back to the
capture position and an acknowledge event is
asserted. Simulation of the logic extracted from
the transistor-level layout of a 4-bit register is
shown below.
[Figure: CALL MODULE. Gate-level representation and logic implementation (inputs R1, R2, AS and CLR; outputs RS, A1 and A2).]
[Figure: Event Storage Bit. Symbolic representation and logic implementation (inputs Rset0, Rset1 and CLR; outputs A0, A1 and Q). Rset0 = Request to Set Q to 0; Rset1 = Request to Set Q to 1; A0 = Acknowledge Setting Q to 0; A1 = Acknowledge Setting Q to 1.]
[Figure: Parallel Load Event Register. Gate-level representation and logic-level implementation (inputs I0 ... In-1, Req and Clear; outputs Q0 ... Qn-1 and ACK), with the transistor-level implementation of the double-throw switch circuit (IF C = 1 Then Out = Ii, Else Out = Qi).]
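The CALL element and the Event-Bit described above can be sketched
behaviorally as follows. The classes `Call` and `EventBit` and their
method names are hypothetical; the sketch returns the name of the
output on which an event is produced instead of modeling wire levels,
and it assumes that only one client request is outstanding at a time.

```python
class Call:
    """Remembers which client (1 or 2) called the shared procedure."""

    def __init__(self):
        self.pending = None

    def request(self, client):
        if client not in (1, 2):
            raise ValueError("client must be 1 or 2")
        if self.pending is not None:
            raise RuntimeError("a call is already in progress")
        self.pending = client
        return "RS"                 # event forwarded to the procedure

    def done(self):
        """Handle the AS event and acknowledge the matching client."""
        if self.pending is None:
            raise RuntimeError("no call in progress")
        client, self.pending = self.pending, None
        return f"A{client}"


class EventBit:
    """Storage bit set by request events and acknowledged by events."""

    def __init__(self):
        self.q = 0

    def rset0(self):
        self.q = 0
        return "A0"

    def rset1(self):
        self.q = 1
        return "A1"


call = Call()
print(call.request(2), call.done())  # RS A2
bit = EventBit()
print(bit.rset1(), bit.q)            # A1 1
```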
X- The Enable module drives n-bit bundled data
(I1, I2, ..., In) onto a shared bus in response to a
request event on the Ren control input. An
enable acknowledge event is generated after the
output data bundle (Q1-Qn) is stable on the
bus. If a disable request event Rdis is received,
the data outputs are placed in the Hi-Z state, and
an event is generated at the disable acknowledge
output Adis. For proper use, the enable and
disable requests should alternate.
[Figure: ENABLE MODULE. Symbolic representation (Enable-n) and logic implementation (inputs I1 ... In, Ren, Rdis and CLR; outputs Q1 ... Qn, Aen and Adis; built from GL latches and C-elements; annotation "min p & n sizes").]
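The alternating enable/disable behavior can be sketched as below. The
class `Enable` is a hypothetical behavioral model in which the Hi-Z
state is represented by `None`, and a violation of the alternation rule
raises an exception rather than producing undefined circuit behavior.

```python
class Enable:
    """Behavioral model of an n-bit Enable module driving a shared bus."""

    def __init__(self, n):
        self.n = n
        self.outputs = [None] * n   # None stands for the Hi-Z state
        self.enabled = False

    def r_en(self, inputs):
        if self.enabled:
            raise RuntimeError("enable and disable requests must alternate")
        if len(inputs) != self.n:
            raise ValueError("expected %d input bits" % self.n)
        self.outputs = list(inputs)  # drive the data bundle onto the bus
        self.enabled = True
        return "Aen"                 # acknowledge once the outputs are set

    def r_dis(self):
        if not self.enabled:
            raise RuntimeError("enable and disable requests must alternate")
        self.outputs = [None] * self.n
        self.enabled = False
        return "Adis"


driver = Enable(4)
print(driver.r_en([1, 0, 1, 1]), driver.outputs)  # Aen [1, 0, 1, 1]
print(driver.r_dis(), driver.outputs)             # Adis [None, None, None, None]
```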
XI- Event Counter: Shown below is the logic for
one possible implementation of a 2-bit counter.
The count (C1C0) is incremented each time an
event is received at the input request
line (IN_REQ). An acknowledge signal is
issued after the count is updated.
[Figure: Event Counter. Logic built from two TOG elements, two BIT elements (SET, RESET, As, Ar, Output) and two M elements; input INC_REQ, outputs C0, C1 and Ack.]
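At the behavioral level, the counter can be sketched as follows. The
class `EventCounter2` is a hypothetical model, not the TOGGLE/BIT/MERGE
network of the figure, and wrap-around from 3 back to 0 is an
assumption of this example.

```python
class EventCounter2:
    """Behavioral model of a 2-bit event counter."""

    def __init__(self):
        self.count = 0
        self.ack = 0   # level of the Ack wire; each toggle is one event

    def inc_request(self):
        """Handle one request event and return the new (C1, C0)."""
        self.count = (self.count + 1) % 4   # assumed modulo-4 wrap-around
        self.ack ^= 1                       # acknowledge after the update
        return (self.count >> 1) & 1, self.count & 1


counter = EventCounter2()
print([counter.inc_request() for _ in range(5)])
# [(0, 1), (1, 0), (1, 1), (0, 0), (0, 1)]
```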
XII- Event Decoder: Shown below is an event logic
implementation of a 2-bit event decoder. An
input event received on the input request line is
routed to one of 4 possible outputs depending on
the value of the Boolean control variable C1C0.
XIII- Other Modules, Still more modules need to
be added, e.g. an arbiter, more data path
[Figure: EVENT DECODER (2-to-4). Representation (inputs R, C1 (MSB) and C0 (LSB); outputs 00, 01, 10 and 11) and logic implementation of the 2-bit event decoder using three SEL elements with T/F outputs (outputs 0 to 3).]
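The routing performed by the 2-to-4 event decoder can be sketched as a
two-level tree of SELECT decisions. The function name `decode_event` is
hypothetical, and the assumption that the first SELECT is steered by
C1 (the MSB) and the second level by C0 is made only for this example.

```python
def decode_event(c1, c0):
    """Return the index (0 to 3) of the output that receives the event."""
    if c1:                      # first SELECT, assumed steered by the MSB
        return 3 if c0 else 2   # second-level SELECT on the True branch
    return 1 if c0 else 0       # second-level SELECT on the False branch


for c1 in (0, 1):
    for c0 in (0, 1):
        print(c1, c0, "->", decode_event(c1, c0))
```

Each input event is delivered to exactly one output, whose index equals
the binary value of C1C0.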
6. Conclusion
A standard library of cells for event logic modules
has been designed, simulated and its layout
generated. The modules are to be used as part of a
silicon compiler which generates VLSI layout mask
geometry of formally specified parallel algorithms.
Acknowledgments
Acknowledgment is due to King Fahd University of
Petroleum & Minerals and King Abdul-Aziz City of
Science & Technology for their continued support.
We would also like to acknowledge our students Hussein
AL-Jamal & M. Al-Humaigani for their dedication.
References
[Ber-94] K. van Berkel et al., "Asynchronous
Circuits for Low Power," IEEE Design &
Test of Computers, Summer 1994, pp. 22-32.
[Bru-91] E. Brunvand, "Translating Concurrent
Communicating Programs into
Asynchronous Circuits," Ph.D. Thesis,
Carnegie Mellon Univ., 1991.
[Ell-94] K. M. Elleithy and A. A. Amin,
"Parallelism Analysis and Extraction of
Digital Signal Processing Algorithms,"
28th Asilomar Conf. on Signals, Systems,
and Computers, 1994, pp. 1041-1045.
[Lav-93] L. Lavagno and A. Sangiovanni-Vincentelli,
"Algorithms for Synthesis &
Testing of Asynchronous Circuits,"
Kluwer Academic Publishers, 1993.
[Sei-80] C. L. Seitz, "System Timing," in
Introduction to VLSI Systems, C. Mead
and L. Conway, Eds. Reading, MA:
Addison Wesley, 1980, pp. 218-262.
[Sut-89] I. E. Sutherland, "Micropipelines,"
CACM, Vol. 32, No. 6, pp. 720-738,
June 1989.