A New Synchronous circuit for Elastic
Pipeline Architecture
Md. Ashraful Islam
Baysand Inc
Dhaka, Bangladesh
ash_apee@yahoo.com
Md. Yeasin Arafath
Baysand Inc
Dhaka, Bangladesh
ara2ras@yahoo.com
Mamun-Ur-Rashid Khandker
Dept. of Applied Physics and Electronic Engg.
Rajshahi University
Rajshahi, Bangladesh
khandker@ru.ac.bd
Abstract: Pipelining is a circuit design method that divides
logic into stages separated by intermediate latches or
registers. A simple pipeline circuit for synchronous design is
presented. The main features of this pipeline architecture are
that it implements elastic communication channels and that it is
easily implementable in both ASIC (Application Specific
Integrated Circuit) and FPGA flows, independent of the EDA
(Electronic Design Automation) design tool. The pipeline uses
edge-triggered flip-flops in its datapath and a controller that
generates the clock enable signal for the datapath. With this
approach, elasticity can be introduced at the level of
functional units (e.g. ALUs, memories). The formal specification
of the protocol is defined and the implementation of elasticity
in the pipeline is discussed.
Index Terms—Elastic Pipeline, FPGA, ASIC, Synchronous
Interlocked.
I. INTRODUCTION
The early concept of interlocked pipelines was developed for
asynchronous pipeline designs [1], [2]. Asynchronous
pipelines [1] have several properties that can benefit
circuit design. The most attractive features are the
ability to activate a pipeline stage only in the presence of
valid data, and local control decisions for pipeline interlocking.
Asynchronous pipelines inherently provide elasticity, which
means that a variable number of data items can appear in the
pipeline at any time [5]. If there is no congestion and data
items are injected at wide intervals, data items are widely
spaced in the pipeline and travel rapidly through. If input rates
are higher, spacing becomes tighter between items. In the
extreme case, with a slow or stalled output environment, data
items become bunched or stalled at close intervals. In all cases,
input data items are processed as they arrive, even with an
unknown or irregular arrival rate; there is no wait for a clock
edge. Hence, the inter-token spacing and the throughput rate
are determined dynamically [6].
However, poor CAD support hinders wide acceptance of
asynchronous methodologies. Moreover, most of the
asynchronous pipelines proposed by early researchers depend
on the fabrication technology.
Latency insensitive schemes proposed in [7] separate the
communication channels from computational units. This
synchronous system uses relay stations at the interfaces
between computational units [8] [9].
Synchronous interlocked pipelines [10] were proposed to
achieve fine-grained interlocking at the level of stages. Each
stage is interlocked with its neighboring stages in both the
forward and backward directions. As far as we know, this was the
first interlocked synchronous pipeline technique. However, the
whole design was based on latches. Their proposed pipeline circuit
is shown in Fig. 1. Although latches provide some tolerance to
clock skew and allow slack passing and time borrowing between
pipeline stages, latches are not traditionally used in ASIC and
FPGA designs. Synthesis tools have provided only limited
support for latch-based designs. Latches are also not friendly
to DFT tools [3]. For scan testing, they are often replaced by
a flip-flop compatible with the scan-test shift register. Under
these conditions, a flip-flop would actually be less expensive
than a latch. Another problem is that latch-based designs are
more difficult to verify. A latch's output must be valid at the
clock edge that causes it to go opaque, so that edge can be used
as a hard clock edge boundary for formal verification tools.
Unfortunately, FPGA and ASIC verification tools don't
support this methodology. Here we propose a similar
interlocked synchronous pipeline that uses edge-triggered
flip-flops to synchronize events on the edge of a clock. For
power reduction we have also applied the clock gating
technique to the data channel. Our clock gating circuit is
supported by most EDA (Electronic Design Automation)
tools [4].
Fig. 1. Two-phase clocked interlocked synchronous pipeline implementing
forward and backward interlock [10]
International Conference on Materials, Electronics & Information Engineering, ICMEIE-2015
05-06 June, 2015, Faculty of Engineering, University of Rajshahi, Bangladesh
www.ru.ac.bd/icmeie2015/proceedings/
ISBN 978-984-33-8940--4
II. ELASTIC SYNCHRONOUS INTERLOCKED PIPELINE
A. Specification of the Handshake Protocol
Pipeline interlocking is a technique to control the flow of
data through pipelined systems. It is typically achieved
through the use of handshake techniques. We have adopted the
same handshaking protocol described in [10]. This protocol
defines the propagation of data in the forward direction with
valid bits. A valid bit indicates that the data in the channel
is valid for the associated pipeline stage. Stall bits are
propagated in the backward direction of the pipeline. These
stall bits indicate when the pipeline must halt; that is, the
destination module cannot receive further data until the stall
bit is de-asserted. The valid/stall handshake interface
model is shown in Fig. 2.
A new data item is read into a data register only when the
data item is valid and the register is not stalled. A data register
does not stall until it is filled with valid data. A stall condition
does not need to propagate backward through a stage that holds
no valid data (the absence of valid data indicates a hole).
Therefore, the whole pipeline system is not stalled
unless the pipeline completely fills up. This protocol improves
the throughput of the system in the presence of stalls, as valid
data continues to fill in the forward direction until all holes
have been filled. The behavior of the handshake protocol is
illustrated in the state diagram in Fig. 3.
Fig. 2. Interface of elastic synchronous interlocked pipeline
The possible states in a pipeline stage are:
Idle: the stage does not hold any valid data.
Valid: the stage holds valid data received from its
previous stage and is ready to deliver it to its next stage.
Stalled: the stage is not ready to receive new data from its
previous stage and remains in that state until it is ready
to receive new data again.
Fig. 3. Possible transition of valid/stall handshake protocol
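The three states and the register capture rule can be sketched in
software. The following Python fragment is a minimal behavioral sketch
and not the circuit itself. The names StageState, stall_out, classify
and next_contents are hypothetical and were chosen here for
illustration. The sketch assumes that a stage holds its contents
exactly when it contains valid data and its stall input is asserted.

```python
from enum import Enum


class StageState(Enum):
    IDLE = "idle"        # no valid data in the stage (a hole)
    VALID = "valid"      # valid data, free to move forward
    STALLED = "stalled"  # valid data, blocked by the next stage


def stall_out(valid: bool, stall_in: bool) -> bool:
    """Stall seen by the previous stage.

    Only a stage that holds valid data passes a stall backward,
    so a hole absorbs the stall.
    """
    return valid and stall_in


def classify(valid: bool, stall_in: bool) -> StageState:
    """Map the valid bit and stall input of a stage to its state."""
    if not valid:
        return StageState.IDLE
    return StageState.STALLED if stall_in else StageState.VALID


def next_contents(valid, data, valid_in, data_in, stall_in):
    """Return (valid, data) of the stage after the next clock edge."""
    if stall_out(valid, stall_in):
        return valid, data      # full and blocked: hold everything
    if valid_in:
        return True, data_in    # capture the incoming valid item
    return False, data          # nothing valid arrives: data clock is gated
```

With these definitions, a hole accepts an incoming item even while its
stall input is asserted, whereas a stage that already holds valid data
keeps it until the stall is released.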
B. Implementation of elastic synchronous interlocked pipeline
Our proposed pipeline circuit is shown in Fig. 4. The valid bit
propagates in the forward direction (left to right) on every
positive edge of the clock. The stall bit is allowed to propagate
during the low period of the clock. The data is captured by the
gated clock. The AND function between the stall input and valid
output of each stage ensures that holes in the pipeline are filled
in, by disabling the stall signal when there is no valid data present.
Two latches are used in each stage for clock gating. These
latches are enabled at the low level of the clock, which helps to
generate stable clock gating enable signals during the high level
of the clock. This ensures a glitch-free gated clock, as recommended
by most EDA tools. Data in the input channel is
captured on the positive edge of the gated clock. The block
diagram of the pipeline circuit for simulation is shown in Fig. 5.
The simulation of the pipeline circuit is shown in Fig. 6.
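The interaction between forward valid propagation and backward stall
propagation can be illustrated with a cycle-level model. The Python
sketch below is an assumption-laden abstraction, not a netlist of the
proposed circuit: the function name step and its arguments are
hypothetical, latches and gated clocks are not modeled, and the stall
chain is simply evaluated from the last stage back to the first before
each rising edge.

```python
def step(valid, data, valid_in, data_in, stall_ext):
    """Advance an N-stage elastic pipeline by one clock cycle.

    valid, data : per-stage valid bits and register contents
    valid_in, data_in : item offered at the pipeline input
    stall_ext : stall from the external consumer of the last stage
    Returns (new_valid, new_data, accepted).
    """
    n = len(valid)
    # Clock-low phase: the stall ripples backward, gated by each
    # stage's valid bit (the AND function), so holes stop it.
    stall = [False] * (n + 1)
    stall[n] = stall_ext
    for k in range(n - 1, -1, -1):
        stall[k] = valid[k] and stall[k + 1]

    # Rising edge: every stage that is not held captures from its
    # predecessor; the data register is clocked only for valid items.
    new_valid, new_data = list(valid), list(data)
    for k in range(n):
        if stall[k]:
            continue
        src_valid = valid_in if k == 0 else valid[k - 1]
        src_data = data_in if k == 0 else data[k - 1]
        new_valid[k] = src_valid
        if src_valid:
            new_data[k] = src_data

    accepted = valid_in and not stall[0]
    return new_valid, new_data, accepted


if __name__ == "__main__":
    # Three stages: item "A", a hole, item "B"; the output is stalled.
    v, d = [True, False, True], ["A", None, "B"]
    v, d, ok = step(v, d, True, "C", stall_ext=True)
    print(v, d, ok)
```

In this example the hole in the middle stage absorbs the stall, so "A"
advances and "C" is accepted even though the output is blocked. Once
all three stages hold valid data, a further stalled cycle leaves the
pipeline unchanged and the input item is not accepted.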
C. Clock Constraint
To calculate the maximum clock frequency we have to find
the worst-case path delay. The worst-case path for an N-stage
pipelined circuit is shown in Fig. 7. We make the following
assumptions: at any stage K, (a) valid_out[K] arrives at the
input of the AND gate during the clock high period, and (b)
stall_out[K] can only change during the clock low period.
Note that the valid_out signal is generated within the
stage, but the stall_in signal comes from the next stage. However,
for Stage[N], the stall_in[N] signal comes from an external
source, for example a memory.
According to our assumptions, the clock high period has the
following constraint: TCLK_H ≥ tC_Q, where tC_Q is the clock-to-
output delay of the flip-flop. If tPD is the propagation delay of
each stage (from Fig. 7, the combination of the AND gate
delay and the latch delay), then the clock low time would be
TCLK_L ≥ K*tPD. Suppose that the maximum logic delay
between consecutive stages is tLOGIC; then the clock period for
a conventional pipeline architecture [11] would be T_CLK ≥
tLOGIC + tSU_F + tC_Q + tSKEW, where tSU_F is the setup time for the
flip-flop (which is in the datapath) and tSKEW is the maximum
clock skew between pipeline stages. However, for the proposed
pipeline system the minimum clock period TCLK is the
larger of TCLK_H + TCLK_L and T_CLK; that is, TCLK ≥
MAX(T_CLK, TCLK_H + TCLK_L).
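As a worked illustration of this constraint, the following Python
sketch evaluates the bound for a given number of stages K. The function
name min_clock_period is hypothetical, and the delay values in the
example are made-up numbers in picoseconds, chosen only to show how the
stall path can become the limiting term. They are not measurements from
this design.

```python
def min_clock_period(k, t_c_q, t_pd, t_logic, t_su_f, t_skew):
    """Lower bound on the clock period of a K-stage elastic pipeline.

    All delays must use the same time unit.
    """
    t_clk_h = t_c_q                                  # TCLK_H >= tC_Q
    t_clk_l = k * t_pd                               # TCLK_L >= K*tPD
    t_conv = t_logic + t_su_f + t_c_q + t_skew       # conventional T_CLK
    return max(t_conv, t_clk_h + t_clk_l)


if __name__ == "__main__":
    # Hypothetical delays in picoseconds (illustrative only).
    delays = dict(t_c_q=100, t_pd=150, t_logic=800, t_su_f=50, t_skew=50)
    for k in (4, 8):
        print(k, min_clock_period(k, **delays))
```

With these assumed values the conventional datapath bound of 1000 ps
dominates for K = 4, while for K = 8 the stall propagation path raises
the bound to 1300 ps.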
Fig. 4. Proposed pipeline circuit
Fig. 5. Block diagram of the pipeline circuit for simulation
Fig. 6. Simulation of the pipeline circuit
Fig. 7. Worst case path
III. CONCLUSIONS
A new scheme for synchronous elastic pipeline design has
been presented. The flip-flop based implementation
combines the efficiency of synchronous implementations with
reduced power consumption through clock gating. The proposed
scheme can be applied at different levels of a system, for
example in white-box (e.g. microprocessor design) and black-box
(e.g. SoC IP) scenarios. A drawback of this design is the long
combinational path that exists in the backward direction, i.e.
the stall propagation path. This drawback can be overcome by
cutting the path with inserted buffers.
REFERENCES
[1] I.E. Sutherland, “Micropipelines,” Communications of the
ACM, vol. 32, no. 6, pp. 720-738, June 1989.
[2] E.J. McLellan, “Reducing stall delay in pipelined computer
system using queue between pipeline stages,” Digital Equipment
Corporation, U.S. patent 5325495 (1994).
[3] D. Chinnery, K. Keutzer, J. Sanghavi, E. Killian and K. Sheth,
“Automatic Replacement of Flip-Flops by Latches in ASICs,”
Closing the Gap Between ASIC & Custom, Kluwer Academic
Publishers, 2002, pp. 187-208.
[4] Synopsys. [Online]. Available:
https://www.synopsys.com/COMPANY/PUBLICATIONS/SYNOPSYSINSIGHT/Pages/Art2-reduceadvsynthesis-IssQ4-11.aspx
[5] D.E. Muller, “Asynchronous Logics and Application to
Information Processing,” Proc. Symp. the Application of
Switching Theory to Space Technology, Stanford University
Press, 1963, pp. 289-297.
[6] S.M. Nowick and M. Singh, “High-Performance Asynchronous
Pipelines: An Overview,” IEEE Design & Test of Computers,
vol. 28, no. 5, pp. 8-22, Sept.-Oct. 2011.
[7] L. Carloni, K.L. McMillan and A.L. Sangiovanni-Vincentelli,
“Theory of latency-insensitive design,” IEEE Transactions on
Computer-Aided Design, vol. 20, no. 9, pp. 1059-1076, Sept.
2001.
[8] L.P. Carloni and A.L. Sangiovanni-Vincentelli, “Coping with
latency in SoC design,” IEEE Micro, Special Issue on Systems on
Chip, vol. 22, no. 5, pp. 12, Oct. 2002.
[9] T. Chelcea and S.M. Nowick, “Robust interfaces for
mixed-timing systems with application to latency-insensitive
protocols,” Proc. ACM/IEEE Design Automation Conference,
June 2001.
[10] H.M. Jacobson, P.N. Kudva, P. Bose, P.W. Cook,
S.E. Schuster, E.G. Mercer and C.J. Myers,
“Synchronous interlocked pipelines,” Proc. International
Symposium on Advanced Research in Asynchronous Circuits
and Systems, pp. 3-12, April 2002.
[11] W.P. Burleson, M. Ciesielski, F. Klass and
W. Liu, “Wave-Pipelining: A Tutorial and Research
Survey,” IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 6, no. 3, September 1998.