Figure 5 - uploaded by Clifford E. Cummings
FIFO2 partitioning with asynchronous pointer comparison logic


Source publication
Article
Full-text available
An interesting technique for doing FIFO design is to perform asynchronous comparisons between the FIFO write and read pointers that are generated in clock domains that are asynchronous to each other. The asynchronous FIFO pointer comparison technique uses fewer synchronization flip-flops to build the FIFO. The asynchronous FIFO comparison method re...

Context in source publication

Context 1
... block diagram for FIFO style #2 is shown in Figure 5. To facilitate static timing analysis of the style #2 FIFO design, the design has been partitioned into the following five Verilog modules with the following functionality and clock domains: ...


Citations

... In [12], Gray code encoding and decoding techniques were employed to mitigate metastability and ensure orderly data transfer. In [13], the Gray code address pointer is divided into four quadrants to generate the empty-full signals for the FIFO circuit. In [14], dual-port memories were adopted to ...
Article
In integrated designs of multiple IP cores across clock domains, signal metastability can occur due to unequal wiring and variations in PVT. This leads to inconsistency between the signals obtained by the target and the signals at the source. Establishing a FIFO is one of the crucial methods for addressing data inconsistency. Therefore, this paper proposes a novel array structure based on one-hot coding, where the row and column codes generated by Johnson counters are XORed to create the address pointer. This innovation reduces the area for the FIFO and enables rapid control logic using one-hot coding. Furthermore, a state-based approach is employed to mitigate the impact of memory size on the empty/full detection circuit. It only records the read-and-write addresses, enhancing the reconfigurability of the FIFO. Using the SMIC 0.18µm process, the synthesis and simulation results demonstrate that the FIFO can achieve a maximum operating frequency of 830MHz. Additionally, compared to similar synchronous FIFO, it exhibits a significant 30% reduction in area. When considering different FIFO depths and widths, the method proposed in the paper shows an area reduction of 30% to 47% compared to similar synchronous methods. For a depth of 16 and a data width of one word, the power consumption is about 6.8 mW. The FIFO presented in this paper can serve as a reference for data transmission between different clock domains.
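The abstract above mentions Johnson counters as the source of its row and column codes. The sketch below shows only the generic Johnson (twisted-ring) counter sequence, not the paper's array structure or its XOR-based address pointer, whose details are not given here. The function name `johnson_sequence` and the 4-bit width are assumptions chosen for illustration.

```python
def johnson_sequence(n_bits):
    """Return the 2*n_bits states of an n-bit Johnson (twisted-ring) counter.

    Each step shifts the register left by one and feeds the inverted
    most significant bit back into the least significant position.
    """
    mask = (1 << n_bits) - 1
    state = 0
    sequence = []
    for _ in range(2 * n_bits):
        sequence.append(state)
        msb = (state >> (n_bits - 1)) & 1
        state = ((state << 1) & mask) | (msb ^ 1)
    return sequence


def bits_changed(a, b):
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Hypothetical 4-bit example: 8 states, each differing from the next by one bit.
states = johnson_sequence(4)
for s in states:
    print(format(s, "04b"))
```

Like a Gray code, consecutive Johnson states differ in a single bit, which is the property that makes such codes attractive for pointers sampled in another clock domain. The cost is that an n-bit Johnson counter covers only 2n states rather than 2^n.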
... The server application interacts with a Linux kernel driver module [22], which writes the data into DDR memory and programs the FPGA registers using the AXI-Lite bus. The data are transferred via DMA from the memory into a transmit (TX) first-in-first-out (FIFO) buffer [26,27], which holds a maximum of 8192 samples of 128 bits each. The FIFO serves to buffer gaps in the DMA data transmission and allows efficient transfer of data between regions using different clocks (clock domains). ...
Preprint
Full-text available
We have implemented a control system for experiments in atomic, molecular and optical physics based on a commercial low-cost board, featuring a field-programmable gate array as part of a system-on-a-chip on which a Linux operating system is running. The board features Gigabit Ethernet, allowing for fast data transmission and operation of remote experimental systems. A single board can control a set of devices generating digital, analog and radio frequency signals with a precise timing given either by an external or internal clock. Contiguous output and input sampling rates of up to 40 MHz are achievable. Several boards can run synchronously with a timing error approaching 1 ns. For this purpose, a novel auto-synchronization scheme is demonstrated, with possible application in complex distributed experimental setups with demanding timing requests.
... Although customization is observed to be an optimal approach, its design overhead is considerably high [4]. In optimizing power and processing overhead, the design approach involves monitoring the communication protocol and the signal interfacing among the different units [5] in the processor. The diversity of the design units and of the components used in such designs is also a major constraint in optimizing an MPSoC [6]. ...
Article
High-speed computing is a key challenge for next-generation applications. To cope with high-speed operation, new processing architectures are evolving, and multi-processor design is one suitable approach. In the VLSI design domain, this approach has produced the Multi-Processor System-on-Chip (MPSoC). MPSoCs are designed to process multiple instructions and handle data simultaneously. This parallel processing makes them faster and well suited to upcoming applications. However, MPSoC operation suffers from latency in clock allocation and resource utilization, which affects processing efficiency and introduces delay and resource overhead at the MPSoC interface. This paper outlines mesochronous operation in MPSoC design to minimize latency in clock and resource allocation, and hence improve the speed of operation.
... Despite the higher degree in complexity, FIFO synchronizers are not sensitive to those drawbacks and therefore transfer the data between two clock domains in the proposed DC-DP-SRAM. A generic FIFO structure [14] is shown in figure 2. The data signals of the FIFO buffer are RDATAF and WDATAF (read/write data) controlled by REF, WEF (read/write enable) as well as RRST and WRST (reset in the read/write clock domain). The clock signals are RCLK and WCLK (read/write clock), while EMPTY and FULL signals indicate whether the FIFO buffer is empty or full respectively. ...
... • Based on the data availability, the EMPTY signal status changes. It takes 2 to 3 read clock cycles (T_rclkm) for the change (referred to as D_empty in figure 5), for any frequency combination of read and write clocks of the proposed memory, due to the FIFO buffer synchronizer [14]. • The EMPTY signal triggers the REF (Read Enable of FIFO buffer) and RE (Read Enable of SC-DP-SRAM). WDATA is written into the SC-DP-SRAM at address WADD. ...
... • Based on data availability, the EMPTY status changes. It takes 2 to 3 T_wclkm (= T_rclkf) for any frequency combination of the read and write clock because of the FIFO buffer architecture, as explained in section III-A [14]. It is followed by a read from the SC-DP-SRAM, requiring 1 T_wclkm. ...
Conference Paper
Full-text available
With the advancement in technology nodes, the number of components operating in different clock domains in a System on Chip (SoC) increases. Asynchronous multi-port memory with dedicated write and read ports is used to allow data to cross clock domain boundaries. The dual-port memory architecture introduced in this paper, is based on the Single-Port SRAM (SP-SRAM) that can be generated in larger capacities with better performance statistics compared to the Dual-Port SRAM (DP-SRAM). The proposed design has been evaluated by comparing existing dual-port 1R-1W and 2RW designs in 28nm Ultra Thin Body and Box Fully Depleted Silicon on Insulator (UTBB-FDSOI) technology. A memory with a capacity of 2048 words with 64 bits, shows 15%, 35%, 28% and 4.5% improvement in read power, write power, read-write power consumption and performance respectively over conventional 1R-1W DP-SRAM with equal area. The synthesis with area optimizations applied instead, shows an area advantage of 50% over conventional 1R-1W DP-SRAM, but with a degradation in performance.
... The RTL schematics resulting from implementing the program in Xilinx using VHDL are shown in Fig. 1 and Fig. 4 [8]. They are followed by their respective testbench waveforms, obtained during simulation with the ISim simulator [9]. The RTL schematic in Fig. 2 uses a 2-bit address and a 4-bit latch, whereas the RTL schematic in Fig. 5 (in which the component names are also changed) uses an 8-bit address and a 16-bit latch. ...
... The asynchronous FIFO proposed by Cummings in [7] is a very robust FIFO that uses pointers generated by Gray-code counters to control writing data into, and reading data from, a dual-port Random Access Memory (dual-port RAM). In particular, the two pointers are compared asynchronously to detect the FIFO's full/empty status, and the full/empty flags are then asserted immediately and de-asserted safely using the Asynchronous Assertion, Synchronous De-assertion (AASD) technique. ...
Article
Full-text available
The integration of a variety of IP cores into a single chip to meet the high demand of new applications leads to many challenges in timing, especially at the interface between different clock domains. The Globally Asynchronous, Locally Synchronous (GALS) approach addresses these challenges by dividing a chip into several independent subsystems working with different clock signals. In a multi-synchronous Network-on-Chip (NoC) based on the GALS architecture, the network routers run at different frequencies, so the problem is how to transfer data safely and efficiently between them. To build a synchronization unit that tackles this problem, in this paper we propose a novel, efficient asynchronous First-In-First-Out architecture targeting multi-synchronous NoCs. A token ring structure, register-based memory, and modified Asynchronous Assertion-Synchronous De-assertion techniques are applied to improve the performance of the proposed asynchronous FIFO. After simulating and verifying the design, we have implemented our asynchronous FIFO architecture in 180 nm CMOS technology from AMS. Implementation results are analyzed and compared with previous works to show the strong points of our design.
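The Gray-code pointers mentioned in the excerpt above rely on one property: consecutive count values differ in exactly one bit, so a pointer sampled by an unrelated clock is off by at most one count. The following is a minimal sketch of the standard binary/Gray conversions, not code from any of the cited papers. The names `bin_to_gray`, `gray_to_bin` and the constant `ADDR_BITS` are assumptions for illustration.

```python
def bin_to_gray(value):
    """Convert a non-negative binary count to its reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_bin(gray):
    """Convert a reflected Gray code back to the binary count."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_differing(a, b):
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


ADDR_BITS = 4  # hypothetical pointer width, chosen only for this example
codes = [bin_to_gray(i) for i in range(1 << ADDR_BITS)]

# Largest number of bits that change between successive codes,
# including the wrap from the last code back to the first.
max_step = max(
    bits_differing(codes[i], codes[(i + 1) % len(codes)])
    for i in range(len(codes))
)
print("max bits changing per increment:", max_step)
```

A plain binary counter does not have this property: the step from 0111 to 1000 changes four bits at once, so a sample taken during the transition could land on any intermediate value.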
... The micro-architectural-level merging of the bus wrappers with the respective packing/unpacking modules and the asynchronous FIFO offers a latency-free bus wrapper that achieves high-speed data transactions on the NI between various processing IP cores and the NoC router. Different encoded asynchronous FIFO schemes such as binary [6], Gray [6], Johnson [7] and one-hot [8] are designed and analyzed. The proposed NI design uses the best of the four asynchronous FIFOs, namely the Gray-encoded FIFO. ...
... Synchronization methods are used to reduce the probability of metastability. The synchronization failure probability can be reduced to an acceptable range by a carefully designed synchronizer [6,22]. The simplest and safest solution to metastability problems when crossing asynchronous clock domains is to use a flip-flop synchronizer: a double-cascaded, triple-cascaded, or multi-stage cascade of flip-flops [6]. ...
Article
In this paper, a generic asynchronous First In First Out (FIFO) based, WISHBONE-compatible, plug-and-play Network Interface (NI) for Network on Chip (NoC) is designed and verified. Four different types of encoded asynchronous FIFOs, namely binary, Gray, one-hot and Johnson, are designed and analyzed. It is found that the Gray-code asynchronous FIFO is the best at handling the asynchronous clock domain issues in the NI. The control signals of the WISHBONE bus wrappers from/to the asynchronous FIFOs and packing/unpacking modules are asserted concurrently at the same rising edge of the respective router and IP clocks to reduce the latency. The same NI has been used for transferring data between synchronous as well as asynchronous clock domains, irrespective of clock frequency and phase differences. The proposed NI ensures seamless, high data throughput between the routers and IP cores with minimal latency, and it achieves higher speed and uses less area than the existing design.
... The 10b symbols are clocked into the Elastic Buffer using the recovered clock associated with the receiver PLL. The Elastic Buffer is used for clock tolerance compensation; i.e., it adjusts for minor clock frequency variation between the recovered clock, which clocks the incoming bit stream into the Elastic Buffer, and the locally generated clock, which clocks data out of the Elastic Buffer [8]. ...
Article
Full-text available
The proposed design mainly comprises the USB 3.0 Physical Layer, which provides SuperSpeed operation along with USB 2.0 functionality. The Physical Layer mainly contains the PCI Express and PIPE interface. The design transfers data serially from transmitter to receiver at either 2.5 GT/s or 5.0 GT/s, depending on the mode and rate. It generates a clock that runs at two different frequencies, 125 MHz and 250 MHz, which is used to transfer data on the parallel interface. The design captures the incoming asynchronous serial data and locks the receiver clock to it. An architecture for the USB 3.0 Physical Layer is proposed in this paper. The proposed model is implemented and verified using Verilog HDL.
... Different encoding schemes such as Binary, Gray, Johnson and One Hot are used to encode read and write pointers to pass the data from one clock domain to another clock domain to avoid metastability [14]. ...
Article
Connecting different Intellectual Property (IP) cores with the Network on Chip (NoC) router using a Network Interface (NI) is a challenging task due to its asynchronous nature and data width. In this paper, a generic high-speed NI for NoC using Ping Pong Buffers is proposed, in order to ensure low power and seamless high throughput between the router and processing IP core. The proposed scheme uses simple control logic to handle the read and write operations simultaneously from/to the memory modules and disables the unused memory banks as per the data flow required to achieve low power. This proposed method is analyzed with the existing Asynchronous First in First Out (FIFO) based NIs with different encoding schemes like One-Hot encoding and Johnson encoding. The NI is implemented using the asynchronous FIFOs and ping pong—double buffering scheme in Altera Stratix III FPGA. The synthesis results show that the proposed architecture enhances the speed of NI by 30% when memory depth is 8 and enhances speed by 11% when memory depth is 256. The power reduction is 12% and 5% when the memory size is 8 and 30% and 17% when the memory size is 256 for the transaction from PE to Router banks and Router to PE banks respectively.
... Asynchronous FIFOs are used to transfer the data from one clock domain to another clock domain without any loss in the data [5]. This requires a memory architecture which has two memory ports, one for input (or write or push) operation and another for output (or read or pop) operation. ...
... The FIFO is said to be full when the write pointer wraps around and catches up with the read pointer, and it is said to be empty when the read pointer catches up with the write pointer [5]. Pointers must be one bit larger than the bits needed to address the FIFO memory. ...
... If both MSBs are equal, the pointers have wrapped around the same number of times and the FIFO is empty. If the MSBs are not equal, the FIFO is full [5]. ...
Conference Paper
Full-text available
In this paper, a low-power flexible Network Interface (NI) architecture for Network on Chip (NoC) is proposed. The flexible run-time configuration controller in the proposed NI plays a vital role in reducing power by enabling and disabling entire asynchronous First-In-First-Out buffers (FIFOs) based on the traffic conditions between the router and the processing elements (PE). The NI has been implemented in a Xilinx Virtex-5 XC5VLX110T FPGA. Experimental results show that the proposed low-power NI offers a power improvement of 37% when both FIFOs are inactive and 32% when only one FIFO is active.
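The excerpts above state that FIFO pointers carry one bit more than the memory address and that the extra MSB separates full from empty. A small behavioral model can illustrate that convention. This is a single-threaded sketch with binary pointers, assuming a power-of-two depth; it does not model clock-domain crossing, Gray coding, or the asynchronous comparison of the source publication. The class name `PointerFifoModel` and its methods are hypothetical.

```python
class PointerFifoModel:
    """Behavioral FIFO model using (addr_bits + 1)-bit binary pointers."""

    def __init__(self, addr_bits):
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1  # keeps the extra wrap bit
        self.wptr = 0
        self.rptr = 0
        self.mem = [None] * self.depth

    def _addr(self, ptr):
        # Lower addr_bits select the memory location.
        return ptr & (self.depth - 1)

    def _wrap(self, ptr):
        # The extra MSB toggles each time the pointer wraps around.
        return (ptr >> self.addr_bits) & 1

    @property
    def empty(self):
        # Same address and same wrap bit: reader has caught up with writer.
        return self.wptr == self.rptr

    @property
    def full(self):
        # Same address but different wrap bit: writer is one lap ahead.
        return (self._addr(self.wptr) == self._addr(self.rptr)
                and self._wrap(self.wptr) != self._wrap(self.rptr))

    def push(self, item):
        """Write one item; return False if the FIFO is full."""
        if self.full:
            return False
        self.mem[self._addr(self.wptr)] = item
        self.wptr = (self.wptr + 1) & self.ptr_mask
        return True

    def pop(self):
        """Read one item; raise IndexError if the FIFO is empty."""
        if self.empty:
            raise IndexError("pop from empty FIFO")
        item = self.mem[self._addr(self.rptr)]
        self.rptr = (self.rptr + 1) & self.ptr_mask
        return item


fifo = PointerFifoModel(addr_bits=2)  # hypothetical depth of 4 entries
accepted = [fifo.push(x) for x in "abcde"]
print(accepted, fifo.full, fifo.pop(), fifo.full)
```

Without the extra bit, equal address bits alone could not distinguish a full FIFO from an empty one, and one memory location would have to be left unused to tell the two states apart.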