All content in this area was uploaded by Alexander Fell on Feb 08, 2018
Asynchronous 1R-1W Dual-Port SRAM by using Single-Port SRAM in 28nm UTBB-FDSOI Technology
Bharath K., Alexander Fell
Indraprastha Institute of Information Technology, New Delhi, India
Email: {bharath1459, alex}@iiitd.ac.in
Harsh Rawat
ST Microelectronics, Greater Noida, India
Email: harsh.rawat@st.com
Abstract—With the advancement in technology nodes, the
number of components operating in different clock domains
in a System on Chip (SoC) increases. Asynchronous multi-port
memory with dedicated write and read ports is used to allow
data to cross clock domain boundaries. The dual-port memory
architecture introduced in this paper is based on Single-Port
SRAM (SP-SRAM), which can be generated in larger capacities and with
better performance figures than Dual-Port SRAM
(DP-SRAM). The proposed design is evaluated against
existing dual-port 1R-1W and 2RW designs in 28nm Ultra
Thin Body and Box Fully Depleted Silicon on Insulator (UTBB-FDSOI)
technology. A memory with a capacity of 2048 words
of 64 bits shows improvements of 15%, 35%, 28% and 4.5%
in read power, write power, read-write power consumption and
performance respectively over a conventional 1R-1W DP-SRAM
of equal area. When area optimizations are applied during synthesis
instead, the design shows an area advantage of 50% over the conventional
1R-1W DP-SRAM, at the cost of degraded performance.
I. INTRODUCTION
The advances in technology nodes lead to a higher density of intellectual property (IP) blocks and components integrated in a System-on-Chip (SoC). To sustain the data exchanges among these components and to avoid high look-up latencies caused by off-chip memory communication over a limited number of input and output pins (I/Os), large memories are integrated on the chip itself. According to the International Technology Roadmap for Semiconductors (ITRS), embedded memories occupy a large portion of the SoC area [1]. Therefore, to provide large memory capacities within a minimized area, the SRAM bit-cells are scaled down, which makes them prone to process variations. These variations affect the performance of the bit-cell, widening the gap between the processing speed of the SoC components and that of the memory. In addition, the rapid progress in technology has led to a boom in multimedia applications. These applications (e.g. in mobile phones and set-top boxes) demand parallel and back-to-back data processing, for which large memories have to be shared near the core area of the chip [2]. Further, these applications necessitate data transfers across different clock domains at low latencies and high throughputs. Single-Port SRAM (SP-SRAM), despite its good performance, cannot satisfy these requirements. As a consequence, asynchronous multi-port memories are required.
Unlike SP-SRAMs, which can access only one memory location per clock cycle and therefore operate sequentially, multi-port memories have more than one port and can access multiple memory locations at a time, with the ability to perform read and write operations simultaneously [3]. Dual-Port SRAM (DP-SRAM) falls in this category, offering two ports for simultaneous operations. Each port of a DP-SRAM has dedicated bit and word lines, which leads to additional metal layers in silicon. To control these lines, a total of 8 transistors (T) are required in a DP-SRAM cell, increasing the chip area compared to SP-SRAM cells consisting of 6T or fewer [4].
Fig. 1: An 8T 1R-1W Dual-Port SRAM cell (transistors M1 to M8, word lines RWL and WWL, bit lines RBL, WBL and WBLB)
The existing DP-SRAM cell is able to perform either a read or a write operation on each port and is hence named a 2RW (2 Read Write) cell. Drawbacks of this cell are instability and size. To address these disadvantages, alterations to the original cell have been proposed, such as an 8T DP-SRAM with dedicated read and write ports (1R-1W) which can be addressed simultaneously. Essentially, a 1R-1W DP-SRAM is constructed from SP-SRAM cells with an additional Read Bit-Line (RBL) that provides read-only functionality (refer to figure 1). Although the resulting cell shows a higher read and write stability than the conventional 2RW DP-SRAM cell, it consumes considerably more area and power due to the necessity of 8T and the single-ended differential amplifier [5] required for read operations.
This paper proposes a dual-clock 1R-1W DP-SRAM built from SP-SRAM, which gives better performance with a reduction in power and area, to replace the existing conventional 1R-1W DP-SRAM. The paper is organized as follows: Section II presents related work addressing existing 1R-1W DP-SRAM implementations. Section III describes the proposed design architecture utilizing SP-SRAM and its implementation. Section IV shows the results in terms of performance, area and power, followed by the conclusion in section V.
II. RELATED WORK
In existing conventional 1R-1W and 2RW DP-SRAMs, the ports are independent and each port accepts a different clock [6]. However, if both ports try to access the same memory location, a contention occurs and data integrity is lost, since, depending on the phase of the clocks, the old data is read either completely or only partially. A solution is to allow one of the ports to proceed and access the cell, while the other port is blocked and rescheduled. This behavior needs to be communicated to the accessing IP through a flag, so that the IP is able to retry the operation. The two operations are then executed sequentially, which reduces the throughput. Moreover, a 1R-1W DP-SRAM operates at lower speeds because of its cell architecture. To overcome the speed disadvantage, a banking architecture [7] can be used at the cost of an increase in area.
Several techniques have been proposed in the literature to overcome the power and area disadvantages of the conventional 1R-1W DP-SRAM. Time Division Multiplexing (TDM) [8], [9] and Replica based [10] designs are among those creating extra read and write ports [11]. In a TDM based memory, read and write operations are sequential with respect to the internal clock of the memory, although they appear to execute in parallel with respect to the SoC clock, because the internal memory clock runs at twice the highest clock frequency of the accessing device in the SoC. Hence this memory does not experience contention problems and greatly reduces area. However, it is restricted to low frequency applications. Although it has two ports, it has only one clock input and is hence limited to synchronous data transfers.
In the Replica based technique, a DP-SRAM with a capacity of $W$ words is built from blocks with a capacity of $W/2$ or $W/4$ words plus an additional empty block of the same size. The empty block is used to remap the addresses if read and write operations access the same block, essentially allowing a 1R-1W operation within the same clock cycle. This results in better performance and lower power consumption, because the design uses SP-SRAMs with half or one-fourth of the total DP-SRAM capacity. However, just like the TDM based memory design, this design cannot be used if the read and write ports require two different clock frequencies.
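The remapping idea can be illustrated with a small behavioral model. The sketch below is a toy illustration only and is not taken from [10]: the class name `ReplicaDualPort`, the per-offset location table and the use of exactly three banks of $W/2$ words are assumptions made for this example. Each bank stands in for one SP-SRAM and is touched at most once per cycle.

```python
class ReplicaDualPort:
    """Toy 1R-1W memory of `words` entries built from three single-port banks."""

    def __init__(self, words):
        if words <= 0 or words % 2:
            raise ValueError("words must be a positive even number")
        self.half = words // 2
        # Three banks of W/2 words; at every offset exactly one bank is spare.
        self.banks = [[None] * self.half for _ in range(3)]
        # location[h][off] = bank that currently holds logical half h at offset off.
        self.location = [[0] * self.half, [1] * self.half]

    def cycle(self, read_addr, write_addr, write_data):
        """Perform one read and one write in the same cycle; return the read data."""
        r_half, r_off = divmod(read_addr, self.half)
        w_half, w_off = divmod(write_addr, self.half)
        r_bank = self.location[r_half][r_off]
        w_bank = self.location[w_half][w_off]
        data = self.banks[r_bank][r_off]
        if w_bank == r_bank:
            # Bank conflict: redirect the write to the bank that is spare at
            # this offset (bank indices 0 + 1 + 2 sum to 3) and update the map.
            w_bank = 3 - self.location[0][w_off] - self.location[1][w_off]
            self.location[w_half][w_off] = w_bank
        self.banks[w_bank][w_off] = write_data
        return data


mem = ReplicaDualPort(8)
mem.cycle(0, 1, "A")         # read and write hit the same bank, write is remapped
print(mem.cycle(1, 2, "B"))  # prints A
```

Because the spare bank at an offset always differs from the bank being read, the read and the write never target the same single-port bank in one cycle. Both ports still share one clock, which is the limitation noted in the text.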
Both TDM and Replica based techniques can be used to design a 1R-1W Single Clock DP-SRAM (SC-DP-SRAM), but they do not support read and write operations at different frequencies, which is a prerequisite for the multi-clock domains often found in SoCs. Therefore they cannot replace the conventional 8T 1R-1W DP-SRAM shown in figure 1. The proposed design addresses this scenario and can be integrated in a multi-clock environment. It offers two clock inputs, one per clock domain, while at its core an SC-SP-SRAM is implemented.
III. PROPOSED DESIGN
In this section, modifications to the SC-DP-SRAM are proposed that enable multi-clock domain operation through a synchronization mechanism, essentially converting the SC-DP-SRAM into a Dual-Clock DP-SRAM (DC-DP-SRAM). The techniques available for this clock domain crossing include two-stage flip-flop, handshake protocol based and First In First Out (FIFO) based synchronization. Flip-flop synchronizers have the disadvantages of incoherency [12] and data loss, while handshake protocol based synchronizers have the drawback of high latencies [13]. Despite their higher complexity, FIFO synchronizers do not suffer from these drawbacks and are therefore used to transfer the data between the two clock domains in the proposed DC-DP-SRAM.
A. FIFO Synchronizer
Fig. 2: The basic block diagram of an asynchronous FIFO (FIFO memory, write pointer and control, read pointer and control, and an asynchronous pointer comparison that sets FULL and EMPTY)
A generic FIFO structure [14] is shown in figure 2. The data signals of the FIFO buffer are RDATAF and WDATAF (read/write data), controlled by REF and WEF (read/write enable) as well as RRST and WRST (reset in the read/write clock domain). The clock signals are RCLKF and WCLKF (read/write clock), while the EMPTY and FULL signals indicate whether the FIFO buffer is empty or full respectively.
The depth of a FIFO depends on the frequencies of the read and write clocks RCLKF and WCLKF. If the write clock frequency of the FIFO ($f_{wclkf}$) is higher than its read clock frequency ($f_{rclkf}$), data overflows can occur, which lead to loss of data [15]. Hence in this scenario the required depth $d$ of the FIFO cannot be computed. However, in the case $f_{wclkf} \le f_{rclkf}$, the maximum required depth of the FIFO buffer has the upper bound
$$d = 1 + 3\,\frac{f_{wclkf}}{f_{rclkf}} \le 4 \qquad (1)$$
as derived from the timing diagram (refer to figure 3).
Fig. 3: FIFO timing diagram for the scenario in which $f_{wclkf} \le f_{rclkf}$ (signals WCLKF, RCLKF, WEF, WDATAF, EMPTY, FULL, REF and RDATAF; the marked intervals are $T_{wclkf}$ followed by $D_{empty}$)
To execute a write operation into the FIFO buffer, one additional write clock cycle ($T_{wclkf}$) is required, represented by the first summand in equation 1. This prevents the loss of data integrity if read and write operations are executed in parallel, by ensuring that the write operation has completed before the new data is read. Following this write operation, the EMPTY signal, which indicates data availability, requires two to three read clock cycles ($2\,T_{rclkf}$ to $3\,T_{rclkf}$, depending on the clock phase) to change its status, shown in figure 3 as $D_{empty}$. This delay is caused by the two flip-flop synchronizers in the FIFO buffer shown in figure 2. For $f_{wclkf} \le f_{rclkf}$, the FIFO buffer has to be deep enough to avoid overwriting existing data before it is delivered to the read port. Hence the depth depends on the number of write operations that can occur between writing data into the FIFO buffer and the EMPTY signal triggering REF, resulting in a maximum depth of $d = 4$.
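The depth bound of equation 1 can be checked numerically. The following is a minimal sketch under stated assumptions: the function name `fifo_depth_bound` is hypothetical, and rounding the result up to a whole number of FIFO entries is an assumption of this example rather than something stated in the text.

```python
from fractions import Fraction
from math import ceil


def fifo_depth_bound(f_wclkf, f_rclkf):
    """Upper bound on the FIFO depth, d = 1 + 3 * f_wclkf / f_rclkf (equation 1).

    The bound only exists when the write clock is not faster than the read
    clock. Rounding up to whole entries is an assumption of this sketch.
    """
    if f_wclkf <= 0 or f_rclkf <= 0:
        raise ValueError("clock frequencies must be positive")
    if f_wclkf > f_rclkf:
        raise ValueError("no bound: the write clock is faster than the read clock")
    return ceil(1 + 3 * Fraction(f_wclkf) / Fraction(f_rclkf))


print(fifo_depth_bound(100, 100))  # equal clocks reach the maximum: prints 4
print(fifo_depth_bound(50, 100))   # write clock at half speed: prints 3
```

The frequencies may be given in any unit as long as both use the same one, since only their ratio enters the bound.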
Because the depth has an upper bound for $f_{wclkf} \le f_{rclkf}$, which is not the case for $f_{wclkf} \ge f_{rclkf}$ as discussed earlier, a data transfer in the proposed design always occurs from the slower to the faster clock domain. Hence the data receiver (the SC-DP-SRAM in this case) always operates on the faster clock, necessitating two designs:
1) In the first design, the SC-DP-SRAM is embedded in the faster read clock domain, while the data is written from a slower clock domain ($f_{wclkm} \le f_{rclkm}$). Hence the clock frequency of the SRAM equals $f_{clkm} = f_{rclkf}$.
2) In the other scenario ($f_{wclkm} \ge f_{rclkm}$), the SC-DP-SRAM is clocked by the faster write clock. Therefore $f_{clkm} = f_{wclkm}$.
Both designs also operate with $f_{wclkm} = f_{rclkm}$, including a potential phase shift between the two clocks. They are introduced separately in the following subsections.
B. Design for $f_{wclkm} \le f_{rclkm}$
Fig. 4: Block diagram for the design with $f_{wclkm} \le f_{rclkm}$ (a FIFO synchronizer in front of the write port of the SC-DP-SRAM, together forming the DC-DP-SRAM)
The block diagram of the design for $f_{wclkm} \le f_{rclkm}$ is shown in figure 4. The control signals are REASY and WEASY (asynchronous read/write enable) and RADD/WADD (read/write address), while the data signals are RDATA and WDATA (read/write data). In this scenario, the data to be written to the SRAM needs to cross into the read clock domain and is therefore passed through the FIFO synchronizer as shown in the figure. Since the SRAM has dedicated read and write ports (1R-1W), the data from the FIFO can always be stored immediately. The timing diagram of the read and write operations is given in figure 5.
Fig. 5: Timing diagram for $f_{wclkm} \le f_{rclkm}$ with $T = T_{rclkm}$ (marked intervals: $T_{wclkm}$, $D_{empty}$ and $T$)
The sequence of write and read operations in this design is:
• The address and data are written into the FIFO buffer in 1 $T_{wclkm}$.
• Based on the data availability, the EMPTY signal changes its status. Due to the FIFO buffer synchronizer [14], this change takes 2 to 3 read clock cycles ($T_{rclkm}$), referred to as $D_{empty}$ in figure 5, for any frequency combination of the read and write clocks of the proposed memory.
• The EMPTY signal triggers REF (Read Enable of the FIFO buffer) and WE (Write Enable of the SC-DP-SRAM). WDATA is written into the SC-DP-SRAM at address WADD. This takes 1 $T_{rclkm}$.
• The entire write operation takes a maximum of 1 $T_{wclkm}$ plus up to 4 $T_{rclkm}$ for any read and write frequency combination of the memory.
• A read operation takes only 1 $T_{rclkm}$, since the read clock is directly connected.
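The write path latency listed above can be added up for concrete clock frequencies. The sketch below is illustrative only; the function name `write_latency_bounds_ns` and the choice of MHz and ns as units are assumptions of this example.

```python
def write_latency_bounds_ns(f_wclkm_mhz, f_rclkm_mhz):
    """Return (best, worst) write latency in ns for the design with f_wclkm <= f_rclkm."""
    if not 0 < f_wclkm_mhz <= f_rclkm_mhz:
        raise ValueError("expected 0 < f_wclkm <= f_rclkm")
    t_w = 1000.0 / f_wclkm_mhz  # write clock period in ns
    t_r = 1000.0 / f_rclkm_mhz  # read clock period in ns
    fifo_write = 1 * t_w        # address and data enter the FIFO buffer
    sram_write = 1 * t_r        # FIFO output is stored in the SC-DP-SRAM
    best = fifo_write + 2 * t_r + sram_write   # EMPTY updates after 2 read cycles
    worst = fifo_write + 3 * t_r + sram_write  # EMPTY updates after 3 read cycles
    return best, worst


print(write_latency_bounds_ns(250, 500))  # prints (10.0, 12.0)
```

The worst case corresponds to the 1 write clock cycle plus 4 read clock cycles stated in the list; a read still completes in a single read clock cycle.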
C. Design for $f_{wclkm} \ge f_{rclkm}$
The block diagram of the proposed design for $f_{wclkm} \ge f_{rclkm}$ is shown in figure 6. In this design, as the SC-DP-SRAM is clocked at the faster $f_{wclkm}$, the read address needs to be transferred from the read to the write clock domain through a FIFO synchronizer (FIFO1). After the data has been read from the SRAM, it needs to cross back from the write into the read clock domain, requiring an additional FIFO synchronizer (FIFO2). Since FIFO2 is written at a frequency of $f_{wclkm}$ while it is read at the slower rate $f_{rclkm}$, an upper bound on its depth cannot be calculated from equation 1. However, since in this scenario the read address is generated by the component clocked at the same frequency at which the data is read, no overflow can occur and the FIFO depth is finite.
Figure 6 shows that the REASY input signal is delayed. After the address is given through RADDM, it takes several clock cycles until the required data is ready at QOUT due to the FIFO synchronizers and memory latencies. After the delay, REASY together with the EMPTY signal of FIFO2 can be used to enable the REF signal of FIFO2.
Fig. 6: Block diagram for the design with $f_{wclkm} \ge f_{rclkm}$ (FIFO1 carries the read address into the write clock domain and FIFO2 returns the read data to the read clock domain; WEF of FIFO2 is derived from RE through a 1 WCLK delay, and DAR is REASY delayed by max($n$) RCLK cycles)
The timing diagram of a single write followed by a read operation is shown in figure 7. The sequence of write and read operations in this design is:
• Similar to the read operation in the design for $f_{wclkm} \le f_{rclkm}$ (refer to section III-B), the write operation in this design takes 1 $T_{wclkm}$, since the SC-DP-SRAM is clocked by the write clock.
• For the read operation, RADDM is written to FIFO1, requiring 1 $T_{rclkm}$.
• Based on the data availability, the EMPTY status of FIFO1 changes. Because of the FIFO buffer architecture explained in section III-A [14], this takes 2 to 3 $T_{wclkm}$ (FIFO1 is read on the write clock, so $T_{rclkf} = T_{wclkm}$) for any frequency combination of the read and write clocks. It is followed by a read from the SC-DP-SRAM, requiring 1 $T_{wclkm}$.
• After the data is read from the SC-DP-SRAM, it needs to be transferred from the write clock domain back to the read clock domain. To enable this, the data is written into FIFO2, whose WEF is triggered by the RE signal of the SC-DP-SRAM delayed by one $T_{wclkm}$. Writing the data into FIFO2 takes 1 $T_{wclkm}$.
• An update of the EMPTY signal status of FIFO2 again takes 2 to 3 $T_{rclkm}$.
• REF of FIFO2 is triggered if the EMPTY signal of FIFO2 is inactive and the REASY signal delayed by $n \times T_{rclkm}$ (delayed asynchronous read enable signal, DAR) is active.
• The first read operation takes $n \times T_{rclkm}$, and subsequent read operations take only 1 $T_{rclkm}$.
The delay of $n \times T_{rclkm}$ applied to the REASY signal is the maximum data path latency from REASY to the point at which the EMPTY signal of FIFO2 changes its status. Therefore
$$n = 1\,T_{rclkm} + p\,T_{wclkm} + 1\,T_{wclkm} + 1\,T_{wclkm} + q\,T_{rclkm} \quad \text{with } p, q \in \{2, 3\} \qquad (2)$$
To ensure data integrity, the REASY signal is delayed by the upper bound of the latency $n$ as shown in figure 6.
Example 1. For $f_{wclkm} = 2 \times f_{rclkm}$,
$$3\,T_{rclkm} + 4\,T_{wclkm} \le n \le 4\,T_{rclkm} + 5\,T_{wclkm}$$
$$5\,T_{rclkm} \le n \le 6.5\,T_{rclkm} \qquad (3)$$
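Example 1 can be reproduced with a few lines of code. The sketch below is illustrative; the function name `read_latency_bounds` is hypothetical, and expressing the latency in multiples of the read clock period is a choice made for this example.

```python
from fractions import Fraction


def read_latency_bounds(f_wclkm, f_rclkm):
    """Return (min, max) of the latency n from equation 2, in read clock periods."""
    if not 0 < f_rclkm <= f_wclkm:
        raise ValueError("expected 0 < f_rclkm <= f_wclkm")
    # Write clock period expressed in read clock periods.
    t_w = Fraction(f_rclkm) / Fraction(f_wclkm)

    def latency(p, q):
        # 1 T_rclkm (FIFO1 write) + p T_wclkm (FIFO1 EMPTY update)
        # + 1 T_wclkm (SRAM read) + 1 T_wclkm (FIFO2 write)
        # + q T_rclkm (FIFO2 EMPTY update)
        return 1 + p * t_w + 1 * t_w + 1 * t_w + q

    return latency(2, 2), latency(3, 3)


low, high = read_latency_bounds(f_wclkm=2, f_rclkm=1)
print(float(low), float(high))  # prints 5.0 6.5
```

Using exact fractions avoids rounding noise when the two clock periods are not integer multiples of each other.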
After the read data has been written into FIFO2, it takes several clock cycles for the EMPTY signal to be updated due to the aforementioned synchronizers. In the meantime, however, at most three new data sets could be read from the SRAM and therefore be written into FIFO2. In addition, the latency of the DAR signal needs to be considered for the depth calculation of FIFO2. To summarize, the depth of FIFO2 can be expressed as
$$d_{FIFO2} = 1 + 3 + (\max(n) - \min(n)) \qquad (4)$$
The depth of FIFO2 is largest if $f_{rclkm} = f_{wclkm}$, with $\max(d_{FIFO2}) = 1 + 3 + 2 = 6$.
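The dependence of the FIFO2 depth on the clock ratio can be made explicit. This is a sketch under assumptions: the function name `fifo2_depth` is hypothetical, the latency $n$ is counted in read clock periods, and the result is left unrounded (a hardware FIFO would presumably round it up to a whole number of entries).

```python
from fractions import Fraction


def fifo2_depth(f_wclkm, f_rclkm):
    """Depth of FIFO2 from equation 4, with n counted in read clock periods."""
    if not 0 < f_rclkm <= f_wclkm:
        raise ValueError("expected 0 < f_rclkm <= f_wclkm")
    t_w = Fraction(f_rclkm) / Fraction(f_wclkm)  # T_wclkm in units of T_rclkm
    n_min = (1 + 2) + (2 + 2) * t_w  # equation 2 with p = q = 2
    n_max = (1 + 3) + (3 + 2) * t_w  # equation 2 with p = q = 3
    return 1 + 3 + (n_max - n_min)


print(fifo2_depth(1, 1))         # equal clocks: prints 6
print(float(fifo2_depth(2, 1)))  # write clock twice as fast: prints 5.5
```

Since the spread between the largest and smallest latency is 1 read period plus 1 write period, the depth shrinks as the write clock gets faster and peaks at 6 for equal clocks, in line with the text.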
Adding FIFO synchronizers with predetermined sizes converts an SC-DP-SRAM into an asynchronous dual-clock dual-port SRAM (DC-DP-SRAM) utilizing only SC-SP-SRAM embedded in additional logic. In the next section, the designs for both scenarios are implemented to obtain results regarding cycle time, area and power consumption.
IV. RESULTS
In this section, the proposed asynchronous DC-DP-SRAM designed using SC-SP-SRAM is compared with existing conventional 1R-1W DP-SRAM and 2RW DP-SRAM designs in terms of power, performance and area using Ultra-Thin Body and Box Fully Depleted Silicon on Insulator (UTBB-FDSOI) technology for both scenarios, $f_{wclkm} \le f_{rclkm}$ and $f_{wclkm} \ge f_{rclkm}$. In the literature, 1R-1W DP-SRAMs are implemented using SP-SRAMs with the Time Division Multiplexing (TDM) [8] and Replica based [10] techniques. The designs are synthesized with Synopsys Design Compiler (DC) to obtain area and clock frequency, and power is obtained using Synopsys PrimeTime. The memory capacities considered are 256, 512, 1024, 2048, 4096 and 8192 words of 64 bits each. The size of the conventional 1R-1W DP-SRAM and 2RW DP-SRAM is limited to 2048 words of 64 bits. To obtain the 4096 and 8192 word sizes, the 2048 word memory is instantiated multiple times and organized such that it can be addressed as a single memory bank, and the results are calculated for this arrangement. All values shown in the graphs are normalized to the readings of the conventional 1R-1W DP-SRAM with a capacity of 256 words of 64 bits. For the two proposed designs for $f_{wclkm} \le f_{rclkm}$ and $f_{wclkm} \ge f_{rclkm}$, the slower clock is set to half the frequency of the faster clock.
Figure 8 shows the cycle time comparison among the various designs. For RDP RCLK (Replica based DP-SRAM with the read clock faster than the write clock) and RDP WCLK (Replica based DP-SRAM with the write clock faster than the read clock), cycle times are reduced by 16.5% and 4.5% for capacities of 1024 and 2048 words respectively, compared to conventional 1R-1W DP-SRAMs of the same sizes. With increasing memory size, the gain in cycle time increases for the RDP RCLK and RDP WCLK memories up to a capacity of 1024 words, because they are built from three SP-SRAMs, each with half the capacity of the memory being built. For sizes larger than 1024 words, the gain decreases due to a change in the 1R-1W DP-SRAM architecture: the 2048 word memory uses a Bank-4 architecture [7], which is replicated to build the 4096 and 8192 word memories, whereas up to 1024 words the 1R-1W DP-SRAM uses a Bank-2 architecture. The maximum operating frequencies of both the TDM RCLK and TDM WCLK memories are approximately half of that of an SP-SRAM, resulting in a 131% higher cycle time compared to the 1R-1W DP-SRAM for a capacity of 256 words of 64 bits.
Fig. 7: Timing diagram for $f_{wclkm} \ge f_{rclkm}$ with $T = T_{wclkm}$ (marked intervals: $T_{rclkm}$, $p \times T$, $T$, $T$ and $q\,T_{rclkm}$)
Fig. 8: Memory Cycle Time (normalized cycle time versus number of 64-bit words, from 256 to 8192, for the 1R-1W DP, RDP RCLK, RDP WCLK, TDM RCLK, TDM WCLK and 2RW DP designs)
Figure 9a shows the area comparison of the various designs. The maximum area gains of 50% and 47% over the 1R-1W DP-SRAM are observed for the TDM RCLK and TDM WCLK memories respectively at a capacity of 2048 words, and the minimum area gain is observed at a capacity of 256 words for both designs. The area gain rises with increasing capacity because of the usage of SP-SRAMs. At a capacity of 2048 words, RDP RCLK has almost the same area as the 1R-1W DP-SRAM, whereas RDP WCLK occupies 3% more area. The area gain becomes noticeable for sizes larger than 2048 words for both designs. At a capacity of 8192 words, RDP RCLK and RDP WCLK show area gains of 10.5% and 9.7% respectively over a 1R-1W DP-SRAM of the same capacity. The relative area decreases as the memory capacity rises because the advantage of using half-sized SP-SRAMs grows for both the RDP RCLK and RDP WCLK designs.
Read, write and read-write power are defined as the power consumed by the memory for one read, one write and one combined read-write operation respectively (refer to figures 9b, 9c and 9d). The RDP RCLK and RDP WCLK memories consume 15%, 35% and 28% less read, write and read-write power respectively than the 1R-1W DP-SRAM at a capacity of 2048 words. For both the RDP RCLK and RDP WCLK designs, the read, write and read-write power gains increase with the memory capacity up to 2048 words, as shown in the figures, because SP-SRAMs of half the memory capacity are used. For capacities from 2048 to 8192 words, the power gains decrease, as the 1R-1W DP-SRAMs with capacities of 4096 and 8192 words are built from replicas of 2048 word memories. For the TDM RCLK and TDM WCLK designs, in contrast, the read and write power consumption is higher than that of the 1R-1W DP-SRAM and rises with increasing size because of the TDM architecture [9]. For TDM RCLK and TDM WCLK, an improvement of 15% in read-write power over the 1R-1W DP-SRAM is observed at a capacity of 2048 words because of the usage of SP-SRAM. For capacities from 2048 to 8192 words, the power gains decrease, as the 1R-1W DP-SRAMs for 4096 and 8192 words are built from replicas of 2048 word memories.
The conventional 2RW DP-SRAM is also included in the comparison. The results show that, compared to 2RW DP-SRAMs, the proposed RDP designs are better in read, write and read-write power by 52%, 40% and 28% respectively and in performance and area by 20% and 19% respectively, whereas the TDM designs are better in area, read power and read-write power by 60%, 35% and 39% respectively.
Fig. 9: Comparison of the power consumption. Panels: (a) Total Memory Area, (b) Memory Read Power, (c) Memory Write Power, (d) Memory Read-Write Power; each panel plots the normalized quantity against the number of 64-bit words (256 to 8192) for the 1R-1W DP, RDP RCLK, RDP WCLK, TDM RCLK, TDM WCLK and 2RW DP designs
However, due to the synchronization, the proposed DC-DP-SRAM suffers from a write latency that is one clock cycle higher for the design optimized for $f_{wclkm} \le f_{rclkm}$. For the scenario $f_{wclkm} \ge f_{rclkm}$, one additional clock cycle is required, increasing the read latency of the first read operation issued.
V. CONCLUSION
This paper proposes a novel dual-port (DP) 1R-1W memory with asynchronous read and write clocks, targeted at data exchange between IP blocks and components in a System-on-Chip (SoC). Single-clock single-port SRAM cells provide the data storage, enabling larger capacities and avoiding instabilities. Depending on the frequencies of the read and write clock domains, two different designs have been presented.
Compared to existing dual-port memories, the replica based design shows higher performance along with decreased power consumption over the conventional 1R-1W DP-SRAM at a capacity of 2048 words of 64 bits. The TDM based design is area efficient compared to a conventional 64 bit DP-SRAM of the same size. Additionally, the proposed design is scalable to large memory capacities, which is not possible with the dual-port memories available.
REFERENCES
[1] Y. Zorian, “Embedded memory test and repair: infrastructure IP for SoC yield,” Proceedings International Test Conference, pp. 340–349, 2002.
[2] D. Schwaderer and P. Martin, “Solving SoC shared memory resource challenges,” Sonics Inc., Jun. 2003.
[3] Y. Ishii, H. Fujiwara, K. Nii, H. Chigasaki, O. Kuromiya, T. Saiki, A. Miyanishi, and Kihara, “A 28-nm dual-port SRAM macro with active bitline equalizing circuitry against write disturb issue,” VLSI Circuits, IEEE Symposium on, pp. 99–100, June 2010.
[4] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley, 4 ed., 2010.
[5] Y. Ye, M. Khellah, and D. Somasekhar, “Evaluation of Differential vs. Single-Ended Sensing and Asymmetric Cells in 90nm Logic Technology for On-Chip Caches,” IEEE International Symposium on Circuits and Systems (ISCAS), pp. 963–966, 2006.
[6] S. Chennapnoor, “Understanding Asynchronous Dual-Port RAMs,” Cypress, 2013.
[7] T. Granberg, Handbook of Digital Techniques for High-Speed Design. Pearson Education, 1 ed., 2007.
[8] C. E. LaForest and J. G. Steffan, “Efficient multi-ported memories for FPGAs,” Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, pp. 41–50, ACM, 2010.
[9] J. Dama and A. Lines, “Pseudo dual-port SRAM and a shared memory switch using multiple memory banks and a sideband memory,” February 2003. US Patent 8370557 B2.
[10] S. Iyer and S. T. Chuang, “System and method for storing data in a virtualized high speed memory system,” April 2013. US Patent 8,433,880 B2.
[11] R. Kaur, A. Fell, and H. Rawat, “A 6T SRAM cell based Pipelined 2R/1W Memory Design using 28nm UTBB-FDSOI,” 28th IEEE International System-on-Chip Conference (SOCC), pp. 310–315, 2015.
[12] T. Dave, A. Jain, and D. Jain, “Synchronizer techniques for multi-clock domain SoCs & FPGAs,” EDN Network, Sept. 2014.
[13] M. Arora, The Art of Hardware Architecture. Springer, 1 ed., 2011.
[14] C. E. Cummings and P. Alfke, “Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons,” Synopsys User Group (SNUG), 2002.
[15] P. Satish, “Calculation of fifo depth,” GTU PG School, Nov. 2014.