A New Synchronous circuit for Elastic Pipeline Architecture

June 2015

Conference: International Conference on Materials, Electronics & Information Engineering, ICMEIE-2015

Authors:

Ashraful Islam

Tokyo Institute of Technology

Md Yeasin Arafath

Khulna University of Engineering and Technology

Mamun Khandker

University of Rajshahi

Md. Ashraful Islam

Baysand Inc

Dhaka, Bangladesh

ash_apee@yahoo.com

Md. Yeasin Arafath

Baysand Inc

Dhaka, Bangladesh

ara2ras@yahoo.com

Mamun-Ur-Rashid Khandker

Dept. of Applied Physics and Electronic Engg.

Rajshahi University

Rajshahi, Bangladesh

khandker@ru.ac.bd

Abstract— Pipelining is a circuit design method that divides logic into stages separated by intermediate latches or registers. A simple pipeline circuit for synchronous design is presented. The main features of this pipeline architecture are its elastic communication channels and its ease of implementation in both ASIC (Application Specific Integrated Circuit) and FPGA flows, independent of the EDA (Electronic Design Automation) design tool. The pipeline uses edge-triggered flip-flops in its datapath and a controller that generates the clock enable signal for the datapath. With this approach, elasticity can be introduced at the level of functional units (e.g., ALUs, memories). The formal specification of the protocol is defined, and the implementation of elasticity in the pipeline is discussed.

Index Terms—Elastic Pipeline, FPGA, ASIC, Synchronous Interlocked.

I. INTRODUCTION

The concept of interlocked pipelines originated in asynchronous pipeline design [1], [2]. Asynchronous pipelines [1] have several properties that can benefit circuit design. The most attractive are the ability to activate a pipeline stage only in the presence of valid data and the use of local control decisions for pipeline interlocking. Asynchronous pipelines inherently provide elasticity, which means that a variable number of data items can appear in the pipeline at any time [5]. If there is no congestion and data items are injected at wide intervals, the items are widely spaced in the pipeline and travel through it rapidly. If input rates are higher, the spacing between items becomes tighter. In the extreme case, with a slow or stalled output environment, data items become bunched or stalled at close intervals. In all cases, input data items are processed as they arrive, even with an unknown or irregular arrival rate; there is no wait for a clock edge. Hence, the inter-token spacing and the throughput rate are determined dynamically [6].

However, poor CAD support hinders wide acceptance of asynchronous methodologies. Moreover, most of the asynchronous pipelines proposed by early researchers depend on the fabrication technology.

The latency-insensitive schemes proposed in [7] separate the communication channels from the computational units. Such a synchronous system uses relay stations at the interfaces between computational units [8], [9].

Synchronous interlocked pipelines [10] were proposed to achieve fine-grained interlocking at the level of stages. Each stage is interlocked with its neighboring stages in both the forward and backward directions. As far as we know, this was the first interlocked synchronous pipeline technique, but the whole design was based on latches. The pipeline circuit proposed in [10] is shown in Fig. 1. Although latches offer some tolerance to clock skew and allow slack passing and time borrowing between pipeline stages, they are not traditionally used in ASIC and FPGA designs. Synthesis tools have provided only limited support for latch-based designs, and latches are not friendly to DFT tools [3]. For scan testing, latches are often replaced by a flip-flop compatible with the scan-test shift register; under these conditions, a flip-flop is actually less expensive than a latch. Another problem is that latch-based designs are more difficult to verify. Because a latch's output must be valid at the clock edge that makes it opaque, that edge could serve as a hard clock-edge boundary for formal verification tools. Unfortunately, FPGA and ASIC verification tools do not support this methodology.

Here we propose a similar interlocked synchronous pipeline that uses edge-triggered flip-flops to synchronize events on the clock edge. To reduce power, we have also applied clock gating to the data channel. Our clock gating circuit is supported by most EDA (Electronic Design Automation) tools [4].

Fig. 1. Two-phase clocked interlocked synchronous pipeline implementing forward and backward interlock [10]

International Conference on Materials, Electronics & Information Engineering, ICMEIE-2015

05-06 June, 2015, Faculty of Engineering, University of Rajshahi, Bangladesh

www.ru.ac.bd/icmeie2015/proceedings/

ISBN 978-984-33-8940--4

II. ELASTIC SYNCHRONOUS INTERLOCKED PIPELINE

A. Specification of the Handshake Protocol

Pipeline interlocking is a technique for controlling the flow of data through pipelined systems, and it is typically achieved with handshake techniques. We adopt the handshake protocol described in [10]. In this protocol, data propagates in the forward direction together with valid bits. A valid bit indicates that the data in the channel is valid for the associated pipeline stage. Stall bits propagate in the backward direction of the pipeline and indicate when the pipeline must halt; that is, the destination module cannot receive further data until the stall bit is de-asserted. The valid/stall handshake interface model is shown in Fig. 2.

A new data item is read into a data register only when the data item is valid and the register is not stalled. A data register does not stall until it is filled with valid data. A stall condition does not need to propagate backward through a stage that holds no valid data (the absence of valid data indicates a hole), because there is nothing in that stage to stall. Therefore, the pipeline as a whole does not stall unless it fills up completely. This protocol improves the throughput of the system in the presence of stalls, because valid data continues to advance in the forward direction until all holes have been filled. The behavior of the handshake protocol is illustrated by the state diagram in Fig. 3.
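As an illustration only, the rules above can be sketched as a cycle-level behavioral model in Python. This is a minimal sketch under stated assumptions, not the circuit proposed in this paper: the function names `step` and `run`, the use of `None` to represent a hole, and the three-stage example are all hypothetical choices made for the illustration.

```python
from typing import List, Optional, Tuple

Item = Optional[int]  # None models a hole (no valid data in the stage)


def step(stages: List[Item], new_item: Item,
         sink_stall: bool) -> Tuple[List[Item], bool, Item]:
    """Advance a hypothetical N-stage elastic pipeline by one clock cycle.

    Returns (next_stages, input_accepted, delivered_item).
    """
    n = len(stages)
    # Stall travels backward: a stage stalls only if it holds valid data
    # AND the stage after it (or the sink) is stalled.
    stall = [False] * n
    downstream = sink_stall
    for k in reversed(range(n)):
        stall[k] = stages[k] is not None and downstream
        downstream = stall[k]

    # Valid data travels forward: every stage that is not stalled loads
    # whatever its predecessor holds (valid data or a hole).
    nxt = list(stages)
    for k in range(n):
        if not stall[k]:
            nxt[k] = new_item if k == 0 else stages[k - 1]

    delivered = stages[-1] if not stall[-1] else None
    return nxt, not stall[0], delivered


def run(items: List[int], sink_stalls: List[bool],
        n_stages: int = 3) -> Tuple[List[int], List[List[Item]]]:
    """Feed items into the pipeline, one cycle per entry of sink_stalls."""
    stages: List[Item] = [None] * n_stages
    pending = list(items)
    delivered: List[int] = []
    history: List[List[Item]] = []
    for sink_stall in sink_stalls:
        new_item = pending[0] if pending else None
        stages, accepted, out = step(stages, new_item, sink_stall)
        if accepted and pending:
            pending.pop(0)  # the source advances only when not stalled
        if out is not None:
            delivered.append(out)
        history.append(list(stages))
    return delivered, history


if __name__ == "__main__":
    delivered, history = run([1, 2, 3, 4], [True] * 4 + [False] * 4)
    for cycle, snapshot in enumerate(history, start=1):
        print(cycle, snapshot)
    print("delivered:", delivered)
```

In this example the sink is stalled for the first four cycles. The three stages still fill up to `[3, 2, 1]` before the source is held off, and all four items leave in order once the stall is released, which is the hole-filling behavior described above.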

Fig. 2. Interface of elastic synchronous interlocked pipeline

The possible states of a pipeline stage are:

• Idle: the stage holds no valid data.

• Valid: the stage holds valid data received from its previous stage and is ready to deliver it to its next stage.

• Stalled: the stage is not ready to receive new data from its previous stage, and it remains in this state until it is ready to receive new data again.

Fig. 3. Possible transition of valid/stall handshake protocol
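The three states listed above can be written as a small classification function. This is only a sketch: it assumes that a stage is stalled exactly when it holds valid data and its successor asserts stall, and it does not reproduce the transitions of Fig. 3. The names `StageState` and `stage_state` are hypothetical.

```python
from enum import Enum


class StageState(Enum):
    IDLE = "idle"
    VALID = "valid"
    STALLED = "stalled"


def stage_state(valid: bool, stall_in: bool) -> StageState:
    """Classify one pipeline stage from its valid bit and incoming stall."""
    if not valid:
        # A hole: there is nothing to stall, so the incoming stall is ignored.
        return StageState.IDLE
    if stall_in:
        # Valid data that cannot move forward blocks the previous stage.
        return StageState.STALLED
    return StageState.VALID
```

Under this assumption an incoming stall has no effect on an idle stage, which is why a stall stops propagating backward at the first hole.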

B. Implementation of elastic synchronous interlocked pipeline

Our proposed pipeline circuit is shown in Fig. 4. The valid bit propagates in the forward direction (left to right) on every positive clock edge. The stall bit is allowed to propagate during the low period of the clock. Data is captured by the gated clock. The AND function between the stall input and the valid output of a pipeline stage ensures that holes in the pipeline are filled, because it disables the stall signal when no valid data is present. Two latches are used in each stage for clock gating. These latches are transparent while the clock is low, so the clock gating enable signals are stable while the clock is high. This yields a glitch-free gated clock, as recommended by most EDA tools. Data in the input channel is captured on the positive edge of the gated clock. A block diagram of the pipeline circuit used for simulation is given in Fig. 5, and the simulation of the pipeline circuit is shown in Fig. 6.
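The effect of latching the enable during the clock low period can be illustrated with a sampled-waveform model in Python. This is a simplified sketch, not the gate-level circuit of Fig. 4: the functions `gated_clock` and `naive_gated_clock` and the sample waveforms are hypothetical, and the enable is assumed to be an arbitrary signal that may change while the clock is high.

```python
from typing import List


def gated_clock(clk: List[int], enable: List[int]) -> List[int]:
    """Gate the clock with an enable held by a latch transparent on clock low."""
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en  # latch is transparent while the clock is low
        # While the clock is high the latch is opaque, so `latched` is stable.
        out.append(c & latched)
    return out


def naive_gated_clock(clk: List[int], enable: List[int]) -> List[int]:
    """Gate the clock directly with the raw enable (no latch), for comparison."""
    return [c & en for c, en in zip(clk, enable)]


if __name__ == "__main__":
    clk = [0, 0, 1, 1, 0, 0, 1, 1]
    enable = [1, 1, 1, 0, 0, 0, 0, 1]  # changes in the middle of both high phases
    print("latched:", gated_clock(clk, enable))
    print("naive:  ", naive_gated_clock(clk, enable))
```

With these waveforms the latched version produces one full-width pulse and then no pulse, whereas the direct AND truncates the first pulse and emits a partial pulse in the second high phase. Such partial pulses are the glitches that the low-transparent latch is meant to prevent.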

C. Clock Constraint

To calculate the maximum clock frequency, we have to find the worst-case path delay. The worst-case path for an N-stage pipelined circuit is shown in Fig. 7. We make the following assumptions: at any stage K, (a) valid_out[K] arrives at the input of the AND gate during the clock high period, and (b) stall_out[K] can change only during the clock low period.

Note that the valid_out signal is generated within the stage, whereas the stall_in signal comes from the next stage. For Stage[N], however, stall_in[N] comes from an external source, for example a memory.

According to our assumptions, the clock high period must satisfy $T_{CLK\_H} \ge t_{C\_Q}$, where $t_{C\_Q}$ is the clock-to-output delay of the flip-flop. If $t_{PD}$ is the propagation delay of each stage (from Fig. 7, the combined delay of the AND gate and the latch), then the clock low period must satisfy $T_{CLK\_L} \ge K \cdot t_{PD}$. Suppose that the maximum logic delay between consecutive stages is $t_{LOGIC}$. The clock period of a conventional pipeline architecture [11], written $T\_CLK$, would then be $T\_CLK \ge t_{LOGIC} + t_{SU\_F} + t_{C\_Q} + t_{SKEW}$, where $t_{SU\_F}$ is the setup time of the flip-flop in the datapath and $t_{SKEW}$ is the maximum clock skew between pipeline stages. For the proposed pipeline system, however, the minimum clock period $T_{CLK}$ is the larger of $T_{CLK\_H} + T_{CLK\_L}$ and $T\_CLK$; that is, $T_{CLK} \ge \max(T\_CLK,\ T_{CLK\_H} + T_{CLK\_L})$.
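The constraint can be evaluated numerically as in the following Python sketch. The function name `min_clock_period`, its parameter names, and the delay values (integers in picoseconds) are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from the proposed circuit.

```python
def min_clock_period(t_logic: int, t_su_f: int, t_c_q: int,
                     t_skew: int, t_pd: int, k: int) -> int:
    """Return the minimum clock period under the constraints stated above.

    All delays share one unit (picoseconds here); k is the factor K that
    multiplies the per-stage delay in the clock low period bound.
    """
    t_conventional = t_logic + t_su_f + t_c_q + t_skew  # conventional pipeline
    t_clk_h = t_c_q      # minimum clock high period
    t_clk_l = k * t_pd   # minimum clock low period (stall propagation)
    return max(t_conventional, t_clk_h + t_clk_l)


if __name__ == "__main__":
    # Assumed example delays in picoseconds.
    delays = dict(t_logic=2000, t_su_f=100, t_c_q=200, t_skew=100, t_pd=300)
    print(min_clock_period(k=4, **delays))   # datapath logic dominates
    print(min_clock_period(k=10, **delays))  # stall path dominates
```

With these assumed values the datapath bound of 2400 ps sets the period for K = 4, while for K = 10 the stall propagation bound of 3200 ps takes over, which shows how the backward stall path can limit the clock frequency as K grows.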

Fig. 4. Proposed pipeline circuit

Fig. 5. Block diagram of the pipeline circuit for simulation

Fig. 6. Simulation of the pipeline circuit

Fig. 7. Worst case path

III. CONCLUSIONS

A new scheme for synchronous elastic pipeline design has been presented. An efficient flip-flop-based implementation combines the efficiency of synchronous implementations with the reduced power consumption offered by clock gating. The proposed scheme can be applied at different levels of a system, in both white-box (e.g., microprocessor design) and black-box (e.g., SoC IP) scenarios. A drawback of this design is the long combinational path in the backward direction, i.e., the stall propagation path. This drawback can be overcome by cutting the path with an inserted buffer.

REFERENCES

[1] I. E. Sutherland, “Micropipelines,” Communications of the ACM, vol. 32, no. 6, pp. 720-738, June 1989.

[2] E. J. McLellan, “Reducing stall delay in pipelined computer system using queue between pipeline stages,” Digital Equipment Corporation, U.S. patent 5325495 (1994).

[3] D. Chinnery, K. Keutzer, J. Sanghavi, E. Killian, and K. Sheth, “Automatic Replacement of Flip-Flops by Latches in ASICs,” Closing the Gap Between ASIC & Custom, Kluwer Academic Publishers, 2002, pp. 187-208.

[4] Synopsys. [Online]. Available: https://www.synopsys.com/COMPANY/PUBLICATIONS/SYNOPSYSINSIGHT/Pages/Art2-reduceadvsynthesis-IssQ4-11.aspx

[5] D. E. Muller, “Asynchronous Logics and Application to Information Processing,” Proc. Symp. the Application of Switching Theory to Space Technology, Stanford University Press, 1963, pp. 289-297.

[6] S. M. Nowick and M. Singh, “High-Performance Asynchronous Pipelines: An Overview,” IEEE Design & Test of Computers, vol. 28, no. 5, pp. 8-22, Sept.-Oct. 2011.

[7] L. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli, “Theory of latency-insensitive design,” IEEE Transactions on Computer-Aided Design, vol. 20, no. 9, pp. 1059-1076, Sept. 2001.

[8] L. P. Carloni and A. L. Sangiovanni-Vincentelli, “Coping with latency in SoC design,” IEEE Micro, Special Issue on Systems on Chip, vol. 22, no. 5, pp. 12, Oct. 2002.

[9] T. Chelcea and S. M. Nowick, “Robust interfaces for mixed-timing systems with application to latency-insensitive protocols,” Proc. ACM/IEEE Design Automation Conference, June 2001.

[10] H. M. Jacobson, P. N. Kudva, P. Bose, P. W. Cook, S. E. Schuster, E. G. Mercer, and C. J. Myers, “Synchronous interlocked pipelines,” Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pp. 3-12, April 2002.

[11] W. P. Burleson, M. Ciesielski, F. Klass, and W. Liu, “Wave-Pipelining: A Tutorial and Research Survey,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, Sept. 1998.
