<table>
<thead>
<tr>
<th>Title</th>
<th>A full current-mode sense amplifier for low-power SRAM applications.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Author(s)</td>
<td>Do, Anh Tuan.; Low, Jeremy Yung Shern.; Kong, Zhi Hui.; Yeo, Kiat Seng.; Low, Joshua Yung Lih.</td>
</tr>
<tr>
<td>Issue Date</td>
<td>2010-08-30T04:08:14Z</td>
</tr>
<tr>
<td>URL</td>
<td><a href="http://hdl.handle.net/10220/6361">http://hdl.handle.net/10220/6361</a></td>
</tr>
</tbody>
</table>

© 2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder. http://www.ieee.org/portal/site
A Full Current-mode Sense Amplifier for Low-power SRAM Applications

Anh-Tuan Do, Jeremy Low Yung Shern, Zhi-Hui Kong, Kiat-Seng Yeo, and Joshua Low Yung Lih
Division of Circuits and System (EEE2), School of Electrical and Electronics Engineering
Nanyang Technological University, NTU
Singapore
atdo@ntu.edu.sg

Abstract—A full current-mode sense amplifier (SA) is presented. It uses cross-coupled inverters in both the local and global sensing stages, and thereby achieves very low power and very high speed simultaneously. Its sensing delay and power consumption are almost independent of the bit-line and data-line capacitances. Extensive pre-layout simulation results based on a 1.8 V/0.18 μm CMOS technology from Chartered Semiconductor Manufacturing Ltd. (CHRT) verify that the new SA outperforms the best published designs, with speed and power consumption improvements of 64% and 45% respectively. Furthermore, the new design can operate down to a supply voltage of 1 V. These attributes make the proposed SA well suited to contemporary high-complexity systems, which continually demand low power and high speed.

I. INTRODUCTION

SRAM-based cache is one of the most important components of state-of-the-art VLSI systems. Ubiquitous in modern microprocessors, where it bridges the widening gap between the performance of the Central Processing Unit (CPU) and that of the DRAM-based main memory [1], fast SRAM cache increases the speed of data flow and hence the speed of the system. This trend is accentuated by the continuing market demand for sophisticated communication and multimedia applications, which calls for portable electronic devices with high performance as a requisite feature. According to the 2002 International Technology Roadmap for Semiconductors (ITRS 2002) [2], memory will occupy 90% of the chip area by 2013. Therefore, the power dissipated within the on-chip caches, both active and standby, will become a dominant part of the total power consumption of the chip. There is thus an urgent need to address the often-conflicting power and delay requirements [3-4]. While many sources of power consumption (e.g. leakage, memory cells, SA and I/O circuits) can be reduced, the total delay is mainly determined by the significant capacitances of the long wire paths routed in close proximity (commonly denoted $C_{BL}$ and $C_{DL}$) [5-6]. These highly capacitive wires are also important sources of power dissipation during read and write operations [5-6]. The current-mode Sense Amplifier (SA), which can quickly amplify a small differential signal on the Bit-Lines (BLs) and Data-Lines (DLs) to the full CMOS logic level without requiring a large voltage swing on these lines, is widely used as one of the most effective ways to reduce both the sensing delay and the power consumption of the SRAM [7-17].

In this paper, we propose a full current-mode SA that improves on both the sensing speed and the power consumption of previously published designs. Its circuit performance is then extensively simulated and graphically presented in comparison with three other current-mode SAs. From this point onwards, these designs are referred to as the high-speed [11], the ultra low-power [14] and the charge-transfer [17] designs.

II. THE PROPOSED DESIGN

The proposed SA, coupled with a simplified read-cycle-only memory system, is presented in Fig. 1. It consists of two sensing stages: local and global. The local sensing stage is formed by four pMOS (P3-P6) and three nMOS (N1, N2 and N7) transistors. While P3 and P4 act as a column switch, the rest of the transistors establish the local cross-coupled inverters, which are responsible for transferring the BL differential currents to the DLs. The global sensing stage consists of three pMOS (P7-P9) and five nMOS (N3-N6 and N8) transistors. In Fig. 1, two output inverters, which serve as buffers to drive the potentially large output loads to the full CMOS logic output levels, are also included. The operation of the proposed SA is described as follows.

During the standby period, P3 and P4 are turned off to block any BL currents. The Column Select (CS) and Global Enable (GEN) signals turn on N7 and N8, which equalize nodes A and B and nodes C and D respectively. Meanwhile, the two pre-charge transistors N5 and N6 are turned on to pre-charge both DLs to ground. At the same time, P9 is turned off to save power. Since P9 is off and the DLs are pre-charged to ground, C and D are also at low potential during standby. As a result, both nMOS transistors of the output inverters are in the cut-off region and the output buffers dissipate no DC current. This topology keeps the standby current of the circuit, and thus its power dissipation, to a minimum.

Figure 1. The proposed design coupled with a simplified read-cycle-only memory system.

Consider both RS1 and CS2 being activated during a read operation. The pre-charge signal (PRE) turns N5 and N6 off, allowing the DL voltages to change freely. The memory cell in the upper row and right column is selected, so a small current flows from the complementary bit-line (BLB, read as bit-line-bar) into the cell and discharges BLB to a voltage level lower than that of the BL. As CS2 is triggered low, P3 and P4 are turned on to transfer the bit-line potentials to the inputs of the local cross-coupled inverters. At the same time, N7 is turned off to activate the local cross-coupled inverters. This building block senses the voltage difference at the source terminals of P5 and P6 and quickly completes its latching process. Hence, node A is pulled to VDD while node B is discharged to ground. More importantly, during this latching process, the pulsed current flowing from N2 to DLB is much higher than that from N1 to DL, as shown in Fig. 2.

Figure 2. Waveforms at several nodes of the proposed SA during a read cycle.

These currents charge up the data-line capacitances $C_{DL}$, and a voltage difference is established across the DLs, which is subsequently amplified by the global sensing stage. Once the latching process of the local sensing stage is completed, P9 is turned on while N8 is turned off. A process similar to that of the local cross-coupled inverters takes place, and the intermediate outputs are obtained at nodes C and D. These two voltages are then fed to the output buffers to obtain the full CMOS logic levels. It is worth mentioning that the global sensing stage can only be activated after the latching process of the local amplifier has finished. The waveforms of several nodes of the proposed SA during a read cycle are shown in Fig. 2.
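The control sequence described above can be summarised as a small state table. The following Python sketch is illustrative only: the phase names and the grouping of devices are assumptions made for this example, and the state of P3/P4 during global sensing is assumed (the column is taken to remain selected), since the text does not state it explicitly.

```python
# Conduction state (True = on) of each device group in each phase, transcribed
# from the description of standby and read operation. Phase names are
# illustrative labels, not terms used in the paper.
PHASES = {
    "standby": {
        "P3_P4": False,  # column switch blocks BL currents
        "N7": True,      # equalizes nodes A and B
        "N8": True,      # equalizes nodes C and D
        "N5_N6": True,   # pre-charge both DLs to ground
        "P9": False,     # global stage disabled to save power
    },
    "local_sensing": {
        "P3_P4": True,   # CS low: bit-line potentials reach the local stage
        "N7": False,     # local cross-coupled inverters start latching
        "N8": True,      # global stage still equalized
        "N5_N6": False,  # PRE releases the DLs
        "P9": False,
    },
    "global_sensing": {
        "P3_P4": True,   # ASSUMPTION: column stays selected
        "N7": False,
        "N8": False,     # global cross-coupled inverters start latching
        "N5_N6": False,
        "P9": True,      # global stage enabled
    },
}


def toggled(before, after):
    """Return the sorted device groups whose state differs between two phases."""
    return sorted(
        dev for dev in PHASES[before] if PHASES[before][dev] != PHASES[after][dev]
    )


if __name__ == "__main__":
    order = ["standby", "local_sensing", "global_sensing"]
    for a, b in zip(order, order[1:]):
        print(f"{a} -> {b}: {toggled(a, b)}")
```

Listing only the devices that toggle at each transition makes the ordering constraint visible: P9 and N8 switch only in the second transition, after the local stage has latched.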

The total power dissipated in the proposed SA is limited to the cell current flowing from the BLs to the cell node where a '0' is stored, plus the switching currents of the sensing stages. After latching, the cross-coupled configuration is in a stable state and consumes no additional current, so the power dissipated on the BLs and DLs is minimized. Furthermore, the sensing delay of the new design is essentially equal to twice the switching time of the cross-coupled inverters. Therefore, the overall performance of the new design is superior to that of the other circuits in terms of both sensing delay and power consumption.
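The delay statement above can be written as a first-order relation. The symbols $t_{\text{local}}$, $t_{\text{global}}$ and $t_{\text{sw}}$ are introduced here for illustration and do not appear in the original text. The relation assumes that the two stages latch one after the other and have comparable switching times, which is what "twice" suggests.

```latex
% t_sw: switching (latching) time of one cross-coupled inverter stage (assumed symbol)
t_{\text{sense}} \;\approx\; t_{\text{local}} + t_{\text{global}} \;\approx\; 2\,t_{\text{sw}}
```

The two terms add because the global stage is enabled only after the local stage has finished latching.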

III. PERFORMANCE COMPARISON AND ANALYSIS

The proposed SA and the other existing designs [11, 14 and 17] have been optimized and extensively simulated using Cadence’s Affirma Spectre circuit simulator based on a 0.18 μm CMOS process from CHRT. Four standard 6T SRAM cells arranged in a two-column, two-row structure have also been used to test the readability of the SAs. Each memory cell was activated in turn by the corresponding Column-Select (CS1 or CS2) and Row-Select (RS1 or RS2) signals. The order in which the memory cells were activated was identical for all designs in comparison. Furthermore, their average power consumption and sensing delay were measured over a wide range of $C_{DL}$ and $C_{BL}$ to adequately inspect the actual behavior of each design. A $V_{DD}$ sensitivity analysis was then performed on each design to evaluate its robustness. Pre-layout simulation results are presented in Figs. 3 to 6.

**Figure 3.** Sensing delay and average power versus $C_{DL}$ variation for the circuits in comparison at $C_{BL} = 1 \text{pF}$ and $C_{L} = 0.1 \text{pF}$.

**Figure 4.** Sensing delay and average power versus $C_{BL}$ variation for the circuits in comparison at $C_{DL} = 1 \text{pF}$ and $C_{L} = 0.1 \text{pF}$.

**Figure 5.** Sensing delay versus $V_{DD}$ variation for the circuits in comparison at $C_{BL} = 1 \text{pF}$, $C_{DL} = 1 \text{pF}$ and $C_{L} = 0.1 \text{pF}$.

**Figure 6.** Average power versus $V_{DD}$ variation for the circuits in comparison at $C_{BL} = 1 \text{pF}$, $C_{DL} = 1 \text{pF}$ and $C_{L} = 0.1 \text{pF}$.

Figs. 3 and 4 demonstrate the superiority of the proposed design over the other circuits at 1.8 V supply voltage against $C_{DL}$ and $C_{BL}$ variations respectively. For example, at $C_{BL} = 5 \text{pF}$, $C_{DL} = 1 \text{pF}$ and $C_{L} = 0.1 \text{pF}$, its sensing delay is reduced to 36%, 26% and 35% and its power consumption is reduced to 29%, 55% and 48% of those of [11], [14] and [17] respectively. Like other current-mode sense amplifiers, the proposed circuit is insensitive to $C_{BL}$ variation. This point is illustrated clearly in Fig. 4. In addition, the new SA offers enhanced robustness to varying $C_{DL}$, with a sensitivity of only 2.5 ps/pF, compared with 30 ps/pF, 45 ps/pF and 22.5 ps/pF for [11], [14] and [17] respectively. The proposed design also offers better $V_{DD}$ sensitivity, as shown in Figs. 5 and 6. It is apparent that only the proposed circuit can operate down to a $V_{DD}$ of 1 V, while [11], [14] and [17] cease to work at a $V_{DD}$ of 1.2 V, 1.3 V and 1.3 V respectively. Furthermore, at any supply voltage, the new design outperforms the rest in terms of both sensing speed and power consumption. It is worth mentioning that although the charge-transfer design has a longer sensing delay than the high-speed circuit at 1.8 V supply voltage, it has better $V_{DD}$ sensitivity and becomes faster than both the high-speed and the ultra low-power designs at lower supply voltages, as shown in Fig. 5. Table I summarizes the performance metrics of all the SAs working at 1.8 V.
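To make the reported $C_{DL}$ sensitivities concrete, the short Python sketch below converts them into added sensing delay for a given increase in data-line capacitance. It assumes the delay grows linearly with $C_{DL}$ over the range of interest. That linearity is an assumption of this example, as are the function name and the 4 pF step used in the demonstration.

```python
# Reported sensitivity of sensing delay to C_DL, in ps/pF (values from the text).
SENSITIVITY_PS_PER_PF = {
    "proposed": 2.5,
    "high-speed [11]": 30.0,
    "ultra low-power [14]": 45.0,
    "charge-transfer [17]": 22.5,
}


def extra_delay_ps(design, delta_c_dl_pf):
    """Added sensing delay in ps for a C_DL increase, assuming a linear trend."""
    return SENSITIVITY_PS_PER_PF[design] * delta_c_dl_pf


if __name__ == "__main__":
    delta = 4.0  # hypothetical increase in C_DL, in pF
    for name in SENSITIVITY_PS_PER_PF:
        print(f"{name}: +{extra_delay_ps(name, delta):.1f} ps for +{delta} pF")
```

Under this linear assumption the proposed design adds an order of magnitude less delay per picofarad than any of the other three circuits.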

**Table I** Comparison summary of the proposed design and three existing circuits for $C_{L} = 0.1 \text{pF}$, $C_{BL} = 1 \text{pF}$ and $C_{DL} = 5 \text{pF}$ in 0.18 µm CMOS technology at 50 MHz

<table>
<thead>
<tr>
<th></th>
<th>Sensing delay, ns</th>
<th>Average power, mW</th>
<th>Total gate area, µm²</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed</td>
<td>0.38</td>
<td>0.29</td>
<td>38.6</td>
</tr>
<tr>
<td>High-speed [11]</td>
<td>1.04</td>
<td>1.03</td>
<td>48.8</td>
</tr>
<tr>
<td>Ultra low-power [14]</td>
<td>1.46</td>
<td>0.58</td>
<td>33.2</td>
</tr>
<tr>
<td>Charge-transfer [17]</td>
<td>1.10</td>
<td>0.59</td>
<td>52</td>
</tr>
</tbody>
</table>
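The entries of Table I can be compared directly with a few lines of Python. The sketch below computes the delay and power of one design relative to another, together with a power-delay product. The power-delay product is a common figure of merit that is added here for illustration and is not reported in the paper. Because Table I uses different capacitance conditions from the worked example in the text, the ratios need not match the percentages quoted there.

```python
# Table I entries: (sensing delay in ns, average power in mW, gate area in um^2).
TABLE_I = {
    "Proposed": (0.38, 0.29, 38.6),
    "High-speed [11]": (1.04, 1.03, 48.8),
    "Ultra low-power [14]": (1.46, 0.58, 33.2),
    "Charge-transfer [17]": (1.10, 0.59, 52.0),
}


def relative_to(design, baseline):
    """Return (delay ratio, power ratio) of `design` with respect to `baseline`."""
    d, p, _ = TABLE_I[design]
    bd, bp, _ = TABLE_I[baseline]
    return d / bd, p / bp


def power_delay_product_pj(design):
    """Power-delay product in pJ (ns x mW = pJ); not a metric from the paper."""
    d_ns, p_mw, _ = TABLE_I[design]
    return d_ns * p_mw


if __name__ == "__main__":
    for name in TABLE_I:
        if name == "Proposed":
            continue
        d_ratio, p_ratio = relative_to("Proposed", name)
        print(f"vs {name}: delay x{d_ratio:.2f}, power x{p_ratio:.2f}")
    for name in TABLE_I:
        print(f"{name}: PDP = {power_delay_product_pj(name):.3f} pJ")
```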

IV. CONCLUSION

A full current-mode SA has been presented. Owing to the high-speed, low-power and high-reliability properties of the cross-coupled configuration, the proposed SA offers a 64% speed improvement over the best speed performer (the high-speed design) and a 45% power improvement over the best power performer (the ultra low-power design) at 1.8 V supply voltage. Furthermore, its sensitivities to $C_{BL}$ and $C_{DL}$ are only 0.02 ps/pF and 2.5 ps/pF respectively, which are much lower than those of the designs in comparison. In addition, the new design outperforms existing designs in terms of sensitivity to $V_{DD}$. It can work down to a 1 V supply voltage, which is 0.2 V lower than the other designs. In view of the above, it can be concluded that the proposed design is appropriate for applications where ultra low voltage, ultra low power and high speed are crucial design considerations.

V. REFERENCES


