Ultra-Low-Power SRAM Design In High
Variability Advanced CMOS
AUG 07 2009
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
© Massachusetts Institute of Technology 2009. All rights reserved.
Department of Electrical Engineering and Computer Science
May 5, 2009
Certified by ...............
Anantha P. Chandrakasan
Joseph F. and Nancy P. Keithley Professor of Electrical Engineering
Accepted by ..........................
Terry P. Orlando
Chairman, Department Committee on Graduate Theses
Ultra-Low-Power SRAM Design In High Variability Advanced CMOS
Submitted to the Department of Electrical Engineering and Computer Science
on May 5, 2009, in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Embedded SRAMs are a critical component in modern digital systems, and their role
is becoming increasingly prominent. As a result, SRAMs strongly impact the overall power,
performance, and area, and, in order to manage these severely constrained trade-offs,
they must be specially designed for target applications. Highly energy-constrained
systems (e.g. implantable biomedical devices, multimedia handsets, etc.) are an
important class of applications driving ultra-low-power SRAMs.
This thesis analyzes the energy of an SRAM sub-array. Since supply- and threshold-voltage
strongly affect energy, targets for both are established in order to optimize it.
Despite the heavy emphasis on leakage-energy, analysis of a high-density
256x256 sub-array in 45nm LP CMOS points to two necessary optimizations: (1) ag-
gressive supply-voltage reduction (in addition to Vt elevation), and (2) performance
enhancement. Important SRAM metrics, including read/write/hold-margin and read-
current, are also investigated to identify trade-offs of these optimizations.
Based on the need to lower supply-voltage, a 0.35V 256kb SRAM is demonstrated
in 65nm LP CMOS. It uses an 8T bit-cell with peripheral circuit-assists to improve
write-margin and bit-line leakage. Additionally, redundancy, to manage the increas-
ing impact of variability in the periphery, is proposed to improve the area-offset
trade-off of sense-amplifiers, demonstrating promise for highly advanced technology
nodes. Based on the need to improve performance, which is limited by density con-
straints, a 64kb SRAM, using an offset-compensating sense-amplifier, is demonstrated
in 45nm LP CMOS with high-density 0.25μm² bit-cells. The sense-amplifier is regenerative,
but non-strobed, overcoming the timing uncertainties that limit performance,
and it is single-ended, for compatibility with 8T cells. Compared to a conventional
strobed sense-amplifier, it achieves 34% improvement in worst-case access-time and
4x improvement in the standard deviation of the access-time.
Thesis Supervisor: Anantha P. Chandrakasan
Title: Joseph F. and Nancy P. Keithley Professor of Electrical Engineering
MIT is truly a unique and wonderful place on this earth. For a new graduate student,
as I once was, it can easily be too wonderful and too big. The only way to realize your
place at MIT is through the guidance, encouragement, support, and friendship of an
outstanding advisor like Prof. Anantha Chandrakasan. First and foremost, I thank
Anantha. When I arrived here, I was not sure what, if anything, I could accomplish.
Anantha convinced me, by always expecting more from me, by always challenging me,
and by supporting me through every research endeavor, that I could be a contributing
member of this great community. His lessons for me have gone far beyond circuits;
he has taught me to be a critical, sincere, cooperative, and respectful researcher.
Anantha works firstly for his students, and I have learned more by watching him than
I ever will from reading volumes of journals. As I proceed in my career, Anantha will
always play an important role; he has given me something to strive for technically
and personally. Thank you, Anantha, for your always strong support and guidance.
I am eternally grateful to my thesis committee members, Prof. Charlie Sodini
and Prof. Duane Boning. Every researcher offers his work to the community hoping
it is received by someone. To be able to discuss my work with such outstanding
researchers as Charlie and Duane is the greatest honor of my career. Charlie and
Duane have given this thesis a level of attention that has made the effort more than
worthwhile. Thank you for your feedback and support, which has always aimed to
make this thesis better. Because of your input, I am much prouder of this work, and
after the many years it has consumed, that means a lot!
There are several faculty at MIT who have had a profound impact on me both
technically and non-technically. I am extremely grateful to Prof. Harry Lee, whose
mastery of circuits, and the ability to make that mastery accessible, has inspired me
to study every last aspect of my field. I am grateful to Prof. Al Oppenheim who,
by example, has shown me the impact that excellence in teaching can have and the
level of dedication that must be applied. I thank Prof. John Guttag for encouraging me
to enthusiastically and intrepidly venture into new fields to seek out for myself how I
might broaden my contributions. Finally, I thank Prof. Joel Dawson for showing me
that a newbie can have as big an impact as anyone, and he can do so without strain
or tension, smiling all the way.
By far the most rewarding aspect of MIT has been the people I have been so
fortunate to interact with. First, I must thank Margaret, who has repeatedly rescued
me from overloads and crises. Margaret keeps ananthagroup running straight even
when we students have accidentally gone in the wrong direction! Technically, the most
fun I have ever had was discussing, debating, and pondering with Brian Ginsburg
on matters of how to design an ADC (yes, many of the problems we hotly contested
were already solved, but sometimes re-inventing the wheel is an unmatchable learning
exercise!). I will always remember those years spent with Brian twisting my brain
in front of a white-board. Past members of ananthagroup, especially Benton Calhoun
and David Wentzloff, showed me the ropes of being a graduate student. This, as
they taught me, involves more than just tape-outs and paper deadlines; it involves
lunch-time business plans, political/social debates, "useless" riddles and anecdotes,
and most of all, laughs wherever they can be found. Also in this category are Alice
Wang, Frank Honore, Fred Lee, and Raul Blazquez.
I am privileged to have the current members of ananthagroup around me every-
day. I am especially grateful for the technical discussions and collaborations of Joyce
Kwong, Yogesh Ramadass, and Nigel Drego (I will have more to say about these last
two clowns shortly). I must thank my good friend Manish Bhardwaj, not just for
his technical feedback but also for his support and encouragement, which was always
on-hand when I needed it most (like when he put in a late night of chip testing with
me to get results that were due the previous week!). Daniel Finchelstein, Denis Daly,
and I arrived at MIT together, and I have had these two to lean on throughout my
time here. They are the best fellow travellers one can hope for on this sort of journey,
and I am grateful for their friendship the whole way through. It is also inspiring to
see the newer students in the group, Vivienne Sze, Mahmut Ersin Sinangil, Patrick
Mercier, and Masood Qazi, excelling and indeed becoming leaders.
I have especially been looking forward to saying something about my friend Ali Shoeb.
His hyperactivity and enthusiasm are the main reasons why I will continually seek to
expand and broaden my horizons beyond any narrow expertise I might have. Ali is
genuinely inspired, and he inspires me! Eugene Shih is more controlled, but he has
contributed equally to the fun I have had on the ninth floor of Building 32!
Thankfully, my experiences at MIT have actually gone far beyond MIT. I am
extremely grateful for the support and encouragement I have received from collabo-
rators at Texas Instruments. Most of all, Dennis Buss has been a champion of my
work throughout my Ph.D. years. His enthusiasm has been a constant driving force,
and he has spun miracles for me on more than one occasion to overcome the barriers
and hurdles that inevitably arise during research. I am also grateful to Ted Houston,
Wah-Kit Loh, Xiaowei Deng, Mike Clinton, Hugh Mair, and Alice Wang for their
constant support and feedback.
I am thankful to Intel for providing me with fellowship support during my Ph.D.
Even more importantly, Kevin Zhang of Intel has played a major role in how I have
approached SRAMs from the research perspective. In fact, much of the work in this
thesis has been inspired by his own research and the feedback he has been so generous
to me with. Kevin has been a constant supporter and a mentor who I will always
look to for stimulating discussions and input.
I am also thankful to Peter Holloway of National Semiconductor. It is much easier
to do research when one has the kind of support that Peter has given me throughout
my Ph.D. Peter has a unique perspective on circuits that is rooted in real-life; the
only way a novice like myself can appreciate such a perspective is through the very
intriguing and stimulating discussions I have had with him.
Completing a Ph.D. is far more than a test of technical execution. In fact, most
of all, it is a test of will and morale. For both of these I am eternally grateful to the
close friends I have made during my time here at MIT. Some of my most important
moments at MIT have been spent during coffee-time with Nigel Drego and Yogesh
Ramadass. Here, we got to transfer our analysis skill to all of life's great problems.
None of us knows if we ever came close or even began to solve any of these, but
we always returned from coffee less stressed, more motivated, and of course slightly
more awake... any way you cut it coffee-time is indispensable! Yogesh, Nigel, Vidya,
Anand, and Nammi are great friends, and we are truly blessed to be able to laugh,
lounge, and talk smack with them. The same, of course, goes for Daniel and Tarik
(and Minou!). Since I arrived here at MIT Raj, Ferdi, Federico, and Gabi have been
the rough-around-the-edges group with whom I could always be myself. This turns
out to be a critical outlet when the pressure begins mounting, as it frequently does.
Finally, I come to my family, without whom nothing in my life, let alone my
research, could ever have been possible. Most of all, my hard work and sincere efforts
are for Mom Ji and Dad Ji. I have always relied on your love and prayers to lift me
over obstacles. Of course, Vancouver is a continent away, but I have always felt you
here with me, and that has been the strength I have needed. This thesis is for both
of you. Thank you for your support, love, and blessings.
So far as effort put into this thesis is concerned, the first credit undoubtedly goes
to my amazing wife Anita. Ana, you are the reason behind this accomplishment, and
your smile (and occasional craziness!) are the only rewards I hope for every day.
Thank you for your love and support. I love you with all my heart.
I am blessed to also have the support and love of a second set of parents. Mom
and Bug, thank you for your prayers, wishes, jokes, and love. I do not expect you to
read this thesis, but I do hope you realize the role you have played in supporting me
towards its completion. Thank you, once again, for your support, love, and blessings.
I am anxious to thank Angelee, Serena, and Jaimini. You three remind me that
there is a lot more to my life than whatever I am busy with today. Thank you for the
relief and lightening that your support and love always provide. This thesis truly
could not have been completed without the formidable force behind me that you three
have always been.
Similarly, Ang, Jason, and Connor, I know that you are always behind me and
Ana, and we are eternally grateful for the love, laughs, and lessons (about leather-backed
turtles, etc.) that you have always provided.
1.1 Ultra-Low-Power Embedded SRAM Applications
1.2 SRAM Structure and Limitations
Thesis Contributions
2 SRAM Energy and Operating Metrics
SRAM Energy
SRAM Idle-Mode Leakage Reduction
SRAM Sub-Array Optimal Energy
2.2 SRAM Operating Margins and Metrics
2.2.2 Write-Margin
2.2.3 Hold-Margin (and Data-Retention-Voltage)
2.2.4 Cell Read-Current
2.3 SRAM Energy with Variation
2.4 Summary and Conclusions
3 Ultra-Low-Voltage SRAM Design
3.1 Low-Voltage SRAM Challenges
3.1.1 Low-Voltage Bit-Cell Array
3.1.2 Low-Voltage Periphery
3.2 Ultra-Low-Voltage SRAM Prototype
8T Bit-Cell with Low-Voltage Circuit Assists
Sense-Amplifier Redundancy
Test-Chip Architecture
Measurements and Characterization
Summary and Conclusions
4 Performance Enhancement for High-Density SRAMs
High-Density SRAM Performance Challenges
Bit-Cell Read-Current
Sense-Amplifier Delay and Uncertainty
Single-Ended Sensing
High-Density SRAM Prototype
Non-Strobed Regenerative Sense-Amplifier
Test-Chip Architecture
Measurements and Characterization
4.4 Summary and Conclusions
5.1 Summary of Contributions
5.2 Concluding Thoughts and Future Directions
6 Appendix A: Acronyms
List of Figures
1-1 SRAM bit-cell density versus technology node showing cell density scaling in-line with transistor dimension scaling (every two years corresponds to a new technology node).
1-2 Three example low-power applications demonstrating dominating area and power-consumption of SRAMs: 45nm Intel Core 2, 90nm ARM1176JZ (suitable for iPhone application processor), and 65nm custom MSP430.
1-3 SRAM trade-offs.
1-4 Die photo of ultra-low-power low-voltage MSP430 microcontroller dominated by on-chip SRAM cache.
1-5 Operating states of an SRAM where data-retention consumes energy even in the absence of active accesses.
1-6 Typical structure of modern SRAM; 6T bit-cell is composed of NMOS driver and access devices and PMOS load devices.
1-7 Six-transistor (6T) SRAM bit-cell butterfly curves showing bistable behavior during (a) hold, where access devices are "off", and during (b) read, where access devices are "on" and bit-lines are clamped
2-1 Simulated total leakage-current for 1Mb array in 45nm LP CMOS (at 1.1V); result shown includes variation and is normalized to total nom-
2-2 Active- and leakage-energy profiles in digital circuits showing trends expected in SRAMs.
2-3 Summary of parameters relevant to SRAM energy.
2-4 Normalized leakage-current reduction with respect to supply voltage for minimum-sized 90nm, 65nm, and 45nm devices due to DIBL (predictive models used).
2-5 Circuitry to enforce idle-mode biasing using (a) programmable sleep switches and (b) an operational-amplifier.
2-6 Waveforms corresponding to idle-to-active and active-to-idle mode transitions.
2-7 Summary of SRAM energy components.
2-8 Sub-array specifications for energy analysis.
2-9 Sub-array individual energy components.
2-10 Sub-array total energy (at room temperature) for various performance requirements (specified by TCYC,RTN).
2-11 Energy components for TCYC,RTN = 10ms along Vt = 0.45V axis.
2-12 Mean and 4σ drain-current for minimum sized NMOS in 45nm CMOS with respect to (a) VDD (with Vt=0.3V) and (b) Vt (with VDD=V).
2-13 Read SNM definition through butterfly plots.
2-14 45nm 0.25μm² bit-cell read SNM contours for (a) mean case, and (b) 4σ (on top of global variation) case.
2-15 45nm 0.25μm² bit-cell write-margin contours for (a) mean case, and (b) 4σ (on top of global variation) case.
2-16 Hold SNM definition through butterfly plots.
2-17 45nm 0.25μm² bit-cell hold SNM contours for (a) mean case, and (b) 4σ (on top of global variation) case.
2-18 45nm 0.25μm² bit-cell read-current contours (log-magnitude) for (a) mean case, and (b) 4σ case.
2-19 Sub-array total energy (at room temperature, with variation) for various performance requirements (specified by TCYC,RTN).
3-1 Minimum supply-voltage of specifically ultra-low-voltage designs recently reported.
3-2 Degradation of LP 65nm NMOS (predictive model) with respect to VDD showing (a) drain-current variation and (b) ION/IOFF.
3-3 6T bit-cell for low-voltage analysis.
3-4 0.5μm² 6T bit-cell degradation of (a) read/hold SNM and (b) write-margin with respect to VDD.
3-5 Electrical-β ratio definition and degradation with respect to VDD.
3-6 Bit-line leakage during read-data sensing opposing the ability to detect
3-7 Read-current degradation in the presence of variation (a) with respect to VDD scaling and (b) leading to loss of data sense-ability due to bit-line leakage.
3-8 Non-buffered bit-cells formed by (a) asymmetrically upsizing one pull-down path for rapid RdBLT discharge, and (b) addition of device (M7) to gate bit-cell feedback path against disruption.
3-9 8T bit-cell and layout (to overcome read-data-disruptions) shown beside a typical 6T bit-cell and layout.
3-10 6T bit-cell and 8T bit-cell operating margins for various size layouts (and equivalent read-current) in LP 65nm CMOS.
3-11 Bit-cell read-buffer enhancements to manage bit-line leakage using (a) PMOS/NMOS threshold-voltage skews, and (b) active pull-up on internal NCB node.
3-12 8T bit-cell uses two-port topology to eliminate read SNM and peripheral assists, controlling BffrFt and VVDD, to manage bit-line leakage and write errors.
3-13 Read-buffer bit-line leakage in (a) conventional case where unaccessed read-buffer foot is statically connected to ground and (b) this design where unaccessed read-buffer foot is pulled up to VDD.
3-14 BffrFt driver must sink the read-current from all bit-cells in accessed row, and it draws leakage-current in all unaccessed rows.
3-15 To resolve read-buffer footer limitation (a) a charge-pump circuit is used, and (b) the BFB node gets bootstrapped to approximately 2VDD, increasing the current of the BffrFt driver by over 500x.
3-16 Minimum word-line voltage resulting in a successful write with respect to the bit-cell supply voltage.
3-17 Virtual VDD scheme (a) supporting circuits, and (b) simulation waveforms.
3-18 Read-current gain as a result of read-buffer upsizing (a) via width increase, and (b) via length increase (taking advantage of reduced variability and RSCE).
3-19 8T bit-cell layout with read-buffer upsizing and BffrFt control (but no VVDD control).
3-20 Final 8T bit-cell layout and folded-row tiling.
3-21 Differential sense-amp structure cancels effects of global variation.
3-22 Monte Carlo simulations of sense-amp statistical offset; at expected input swing (i.e. 60mV), errors from offset are prominent.
3-23 With sense-amplifier redundancy, each RdBL is connected to N different sense-amplifiers.
3-24 With sense-amplifier redundancy (a) the size of each individual sense-amplifier must decrease, and (b) the individual sense-amplifier error probabilities, defined as the area under the offset distribution exceeding the magnitude of the input swing, increase.
3-25 Increased levels of redundancy significantly reduce the error probability in the overall sensing network.
3-26 Redundancy selection circuitry consisting of a dummy bit-cell and selection state-machine.
3-27 Overall error probability for implemented sense-amp redundancy scheme improves by a factor of 5 compared to a single sense-amp scheme.
3-28 Sense-amplifier redundancy overhead circuitry for the case of N = 2.
3-29 Normalized sensing-network (N = 2) error probabilities for different technologies and layout areas.
3-30 Prototype test-chip architecture, with total capacity of 256kb partitioned in eight sub-arrays.
3-31 Die photo of prototype low-voltage SRAM.
3-32 Prototype SRAM leakage-power; at the minimum VDD of 0.35V, the entire SRAM draws 2.2μW of leakage-power.
3-33 SRAM speed with respect to VDD.
3-34 Total power (solid curves) and leakage power (dotted curves) with respect to operating frequency.
4-1 Degradation in bit-line discharge time for high-density SRAMs caused by (a) reduced cell read-current and (b) increased bit-line capacitance.
4-2 Read SNM trade-off in high-density SRAMs limited by (a) cell size and (b) inverse correlation with cell read-current, caused by opposing access-device requirements.
4-3 Conventional strobed sense-amplifier topologies with (a) one input-output port and (b) separate input-output ports.
4-4 Array read-path and sense-amplifier strobe-path (a) limited by matching to 5σ bit-cell and (b) exhibiting severe delay divergence over process-voltage-temperature conditions, leading to excess overall delay.
4-5 Non-strobed regenerative sense-amplifier (NSR-SA) schematic and ideal transfer function.
4-6 NSR-SA circuit and waveforms during reset phase.
4-7 NSR-SA circuit and waveforms during detection phase (for both bit-line logic cases).
4-8 Output clocking (a) at array-level with (b) waveforms showing decoupling from internal critical read-path.
4-9 Offset compensation (a) technique and (b) analysis.
4-10 10k point Monte Carlo simulation showing improved sigma of NSR-SA access-time compared to conventional sense-amplifier access-time.
4-11 NSR-SA robustness to false-regeneration in the presence of charge-injection errors.
4-12 NSR-SA technique to set regeneration trip-point (VTRIP) for noise-rejection and sensitivity considerations.
4-13 NSR-SA (a) circuit showing noise sensitive nodes (X/Y), (b) response of X/Y due to transient spikes on VDD, and (c) response of X/Y leading to output errors on QB due to sustained step on VDD.
4-14 NSR-SA noise measurement simulation setup.
4-15 Example bit-line noise sources originating (a) from precharge, word-line, and column-select control signal coupling, and (b) substrate coupling.
4-16 NSR-SA input transfer characteristic.
4-17 Input transfer characteristic for (a) inverter and (b) two stage inverter
4-18 NSR-SA VDD noise transfer characteristic.
4-19 NSR-SA input transfer characteristic with ±50mV VDD noise.
4-20 NSR-SA transfer characteristic for (a) Vss noise and (b) input with ±50mV Vss noise.
4-21 Input errors resulting from VDD and Vss noise.
4-22 Block-diagram of prototype test-chip and access-time measurement
4-23 Dedicated circuitry to inject a controllable noise-amplitude on one set of bit-lines and independently adjust the sensitivity/noise-rejection of
4-24 IC die photo of prototype implemented in low-power 45nm CMOS to compare performance of NSR-SA with conventional sense-amplifier.
4-25 Access-time measurements from 53 chips (at 1V) showing a factor of four improvement in the NSR-SA distribution sigma compared to the conventional sense-amplifier sigma.
4-26 Measured bit-line noise-rejection with respect to access-time, showing ability to tune one at the cost of the other.
List of Tables
1.1 Key existing and emerging applications for biomedical devices
Energy collecting and harvesting options
4.1 Test-chip performance summary.
Moore's law of scaling has been the most important driving force behind the
semiconductor industry. Scaling has directly or indirectly been the root cause of the
tremendous capabilities of today's ICs and their ubiquitous use in nearly all modern
electronic systems. Though Gordon Moore recently amended his law to include a
much broader set of metrics associated with ICs, his basic statement pertains to
"components," which literally implies the number of transistors. Today, even as many
aspects of CMOS device scaling begin to saturate off the exponential trend, density-
scaling remains a primary objective of the semiconductor industry. In the face
of rapidly emerging limitations that are fundamental to continued device shrinking,
density-scaling enables circuit and architecture level parallelism, providing a
means to achieve energy-efficiency and performance improvements in lieu of the
Embedded SRAMs provide a direct means of bringing the benefits of transistor-
level density-scaling to the circuit and architecture levels and are therefore vital to
this new model of IC scaling. Due to their regular structure and broad applicability to
so many digital systems, SRAMs are carefully designed as one of the lead components
during the development of new technology nodes, and they utilize highly specialized
and aggressive layout rules that address sub-resolution fabrication limitations. This
level of design attention has allowed SRAM bit-cells to follow density trends in-line
with the transistors themselves. This is shown in Figure 1-1, where bit-cell areas
reported by Intel, IBM, TI, Sony, Renesas, and Samsung have been plotted versus
the technology node (represented by deployment year).
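A quick worked estimate makes the trend in Figure 1-1 concrete. As a simplifying assumption (not a value read from the plotted data), take bit-cell area A to scale with the square of the node's feature size F. Applied to the 90nm, 65nm, and 45nm nodes considered in this thesis, the per-node area ratio comes out to roughly one half:

$$
\frac{A_{n+1}}{A_n} \approx \left(\frac{F_{n+1}}{F_n}\right)^{2}:\quad
\left(\frac{65}{90}\right)^{2} \approx 0.52,\quad
\left(\frac{45}{65}\right)^{2} \approx 0.48
\quad\Longrightarrow\quad
A(t) \approx A_0\, 2^{-t/2}\ \ (t\ \text{in years})
$$

With a new node roughly every two years, this idealized sketch corresponds to bit-cell area halving, or bit-cell density doubling, about every two years; actual cell areas also depend on the specialized layout rules mentioned above.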
Figure 1-1: SRAM bit-cell density versus technology node showing cell density scaling in-line with transistor dimension scaling (every two years corresponds to a new technology node).
Accordingly, to benefit efficiently from transistor density-scaling, modern digi-
tal architectures increasingly emphasize the use and integration of more and more
SRAMs. As a consequence, in low-power devices SRAMs occupy a dominating
portion of the total die area and the total power consumption.
Figure 1-2 shows three state-of-the-art examples intended for increasingly low-power
applications: the Intel Core 2 processor targets mobile computing, the ARM1176JZ
processor targets hand-held computing, and the custom MSP430 microcontroller
targets remote wireless sensor and implantable biomedical computing. The important
trend observed here is that the SRAM (or memory) power becomes more and
more significant in increasingly low-power devices. The precise cause of this is dis-
cussed throughout the following chapters, but in the meantime, it is clear that SRAMs
are a fundamental platform component in the modern semiconductor industry, and
their power-consumption is a limiting factor.
Figure 1-2: Three example low-power applications demonstrating dominating area and power-consumption of SRAMs: 45nm Intel Core 2 (Penryn; 6MB SRAM L2, 64kB RF L1), 90nm ARM1176JZ (suitable for iPhone application processor), and 65nm custom MSP430.
An important evolution in the semiconductor industry is that, today, the appli-
cation space for integrated circuits is extremely broad, extending far beyond desktop
computing microprocessors to include ambient, remote, mobile, and implantable devices,
to name a few. With regard to the constituent digital circuits, all of these
applications have vastly varying and highly stringent demands that require careful
design within the associated trade-offs. In order to adhere to intense scaling trends,
SRAM design is also highly constrained, especially in the face of emerging limita-
tions ranging from device-level variability to system-level power consumption. Since
their impact on the overall system is so significant, and since their design is so con-
strained, modern embedded SRAMs must be developed with the application in mind
so that their own trade-offs can be carefully managed. Generally speaking, SRAMs
are strongly subject to the power, performance, and density trade-offs shown in Fig-
ure 1-3. The precise origins and effects of these trade-offs are discussed throughout
the following chapters, but the overall implication is that improvement in one of the
dimensions strongly stresses the others. Of course, all three dimensions are impor-
tant to some degree in all applications; as a result, embedded SRAM design involves
making judicious compromises in order to support the most important system-specific
requirements. The focus of this work is to investigate techniques that improve the ba-
sic trade-off in order to more efficiently allow optimization of the parameters relevant
for the systems considered (these are discussed in more detail below). It is important
to note that although the illustration in Figure 1-3 indicates a simple inverse relationship
between power, performance, and density, in reality, the relationships are often
much more complicated, and, importantly, aggressive emphasis on one dimension,
such as power reduction, increases the opposition imposed by the other dimensions
with much higher intensity.
Figure 1-3: SRAM trade-offs. Desktop/server computing and advanced graphics: low-Vt devices; large bit-cells, short bit-lines. Multimedia handsets and mobile, ubiquitous computing: high-Vt devices; small bit-cells, long bit-lines. Wireless sensor networks: high-Vt devices, low VMIN; medium bit-cells, short bit-lines.
1.1 Ultra-Low-Power Embedded SRAM Applications
Since SRAMs must be specially designed with their application in mind, it is worth
considering the application constraints. This work specifically considers a number
of applications where power consumption, or, more generally, energy consumption,
is paramount. Of course, the SRAM challenges associated with achieving multi-
Giga-Hertz operation in high-performance applications, including desktop and server
computing, require very targeted and innovative solutions as well. However, a few of
the highly energy-constrained applications that are the focus of this
work are considered below:
Table 1.1: Key existing and emerging applications for biomedical devices
[Partially recovered entries: <10 μW; 100-2000 μW; ...recording, 1-10 mW, n/a]
(1) Biomedical. Existing and emerging biomedical applications are shown in Ta-
ble 1.1, along with some critical system requirements. In all cases, energy
is highly constrained. In the case of implantable devices, such as pacemak-
ers/defibrillators, cochlear implants, and neural sensors/stimulators, battery
lifetime constraints determine the time between surgical replacement, thereby
limiting total system power consumption to 100μW or less. Wearable systems,
such as hearing aids and body-area sensors, have similar, though somewhat less
stringent, energy-constraints set by battery weight limitations.
Although the energy constraints in biomedical systems are severe, their perfor-
mance constraints are considerably relaxed. Table 1.1 shows that, for the most
part, processors need to operate at less than 1MHz. Additionally, the volume
of most of the highlighted applications is fairly modest, though it does range
to much higher volumes as well, especially in the case of body-area sensors.
Correspondingly, the required SRAMs must heavily emphasize low-power and,
secondarily, density, and these can be optimized at the cost of performance.
(2) Mobile multimedia. Today's portable handsets are capable of extremely sophisticated multimedia functions. In addition to rich audio and communication capabilities, they will deliver high-definition video to users. For these applications,
however, the time required between battery charges must be extended to the
order of several days, and the battery itself can weigh no more than a few tens of
grams. Accordingly, power consumption is a major concern, though it is not as
constrained as in biomedical systems. Also unlike biomedical applications, performance is critical in order to support rich multimedia operations that require processors with operating frequencies up to hundreds of Mega-Hertz. Further,
the large volume of consumer handsets implies that cost and density are also
primary concerns. As a result, very high-density SRAMs that minimize power
consumption under moderate-to-high performance constraints are needed.
(3) Wireless sensor networks. Micro/nano-scale devices providing sensing, processing, and communications capabilities can form networks, broadly referred to as wireless sensor networks. The applications for such devices include industrial and automotive sensing, environment monitoring, structural monitoring, and military surveillance/detection.
Operation of such net-
works must be largely maintenance-free due to their use in remote or inaccessi-
ble physical locations. As a result, battery lifetime constraints are critical, and
the battery must be physically small to facilitate in-situ sensing in a broad range
of uses. Alternatively, to extend the lifetime of the sensor nodes, potentially in-
definitely, energy harvesting from the ambient environment can be leveraged as
long as occasional degradation in performance quality, depending on the ambi-
ent factors, can be tolerated. Nonetheless, the power consumption of the system
is limited by the harvesting capacity. Table 1.2 shows the power harvestable by state-of-the-art energy harvesting devices, indicating a total power budget of less than 100μW for most of the sensor networks considered.
With regard to SRAM requirements, power consumption (both static and dynamic) is the primary concern, and, since most monitoring applications require
processing on low-speed signals, performance constraints are relaxed to the hun-
dreds of kilo-Hertz range. Since the nodes are meant to form high-density net-
works that are sacrificial after use, cost and density are also important concerns.
Table 1.2: Energy collecting and harvesting options
  100 μW/cm² (office), 100 mW/cm² (direct light)
  4 μW/cm³ (human motion)
  10-700 mW (walking)
  Near-field inductive energy transfer: 20 mW at 5 cm
  Far-field inductive energy transfer: 2 μW at 10 m
As in most digital systems, embedded SRAMs play a highly prominent role in these energy-constrained applications, and, as before, they pose the most critical limitation to the total power, performance, and area. Figure 1-4 shows an example of a custom MSP430 microcontroller that specifically targets highly energy-constrained biomedical and sensor applications. Operating at its minimum energy point,
its on-chip SRAM cache consumes 69% of the total energy per operation, limits the
operating frequency (which is 1.7MHz for the SRAM at 0.5V), and, as shown, occupies
a dominating portion of the total area. Consequently, to enable the applications
described above, embedded SRAM is a critical area of focus.
Figure 1-4: Die photo of an ultra-low-power, low-voltage MSP430 microcontroller dominated by on-chip SRAM cache.
Energy Versus Power
For the applications discussed above, it is important to make the distinction between
energy consumption and power consumption. Ultimately, battery powered systems
are primarily limited by the energy the battery can provide. Energy harvesting systems typically use a battery (or another form of energy storage) to buffer the power extracted from an ambient source, and, once again, average power consumption, corresponding to total energy normalized over a time period, is the critical concern. Performing any circuit operation requires energy, so energy is the fundamental metric for battery-operated and energy-harvesting systems.
This implies that in an "off" state, where the circuit is performing no operation,
it can consume extremely low energy. Such an "off" state, however, only exists in
very specific cases for SRAMs. Generally, even in the absence of active accesses,
SRAMs are expected to retain their stored data. Figure 1-5 shows this distinction,
and, in the case of the persistent storage states, data retention is an operation that
requires energy. Importantly, however, this operation is inherently tied to time by
the duration for which data retention is required. Of course, ultimately, the SRAM
will transition to the "off" state, either at the end of the device's lifetime or upon
completion of a set sequence of operations. Accordingly, the total energy can still
be considered. However, unlike in generic digital logic, the energy consumed has a component that depends on time but is unrelated to the SRAM's own circuit delay. The corresponding energy optimization is considered in detail in Chapter 2.
SRAM Structure and Limitations
Figure 1-6 shows the architecture used by modern SRAMs. A combination of row decoders and column multiplexers provides access to the bit-cells. While data-retention circuits for logic, like flip-flops and latches, typically employ between 10 and 20 devices, the 6T bit-cell shown relies on ratioed operation to achieve the required functionality with very high density. 6T CMOS bit-cells in the 65nm and 45nm nodes occupy 0.4-0.5μm² and 0.24-0.33μm², respectively. For reasons explained below, M1-2 are called the driver devices, M3-4 are called the load devices, and M5-6 are called the access devices.

Figure 1-5: Operating states of an SRAM where data-retention consumes energy even in the absence of active accesses.
Figure 1-6: Typical structure of modern SRAM; 6T bit-cell is composed of NMOS
driver and access devices and PMOS load devices.
Data is held in the 6T cell by the cross-coupled inverter structure (formed by M1-
4). Figure 1-7a shows how the 6T cell's ability to hold data depends on its butterfly
curves. Here, the transfer-functions between the data storage nodes, NT/NC, are
superimposed, and the bi-stable nature required is indicated by intersection points at
valid logic "0" and "1" levels. Strictly speaking, read-access is a non-ratioed operation
where the bit-lines, BLT/BLC, are precharged, and, after word-line (WL) assertion,
the cell read current, IRD, which is generated by the driver and access devices, causes
a droop on one bit-line which can be sensed with respect to the other to quickly
decipher the accessed data. However, the transients on NT and NC can result in
loss of the bi-stable characteristic, and their worst-case impact can be analyzed by
assuming that BLT/BLC are clamped at VDD. The corresponding butterfly curves,
shown in Figure 1-7b, now have dangerously degraded lobes, quantified by the static
noise margin (SNM), which measures the diagonal length of the largest embedded
square. An SNM less than zero implies the loss of one of the required intersection
points, indicating the cell's inability to correctly retain the corresponding data state.
Hence, proper operation requires maintaining wide lobes, which depends on the driver
devices, M1 - 2, being much stronger than the access devices, M5 - 6.
Figure 1-7: 6T bit-cell butterfly curves (axes: NC, NT (V)) showing bi-stable behavior during (a) hold, where access devices are "off", and during (b) read, where access devices are "on" and bit-lines are clamped to VDD.
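The hold/read contrast described above can be reproduced qualitatively with a toy model. The sketch below assumes a symmetric cell whose inverter transfer curve is a tanh (the `vtc` function, its `gain`, and the `v_low` read-disturb level are illustrative assumptions, not data from Figure 1-7) and estimates the largest square embedded in one butterfly lobe. Because the cell is assumed symmetric, only the upper-left lobe is scanned. The function reports the side of the square; the corresponding diagonal is √2 times longer.

```python
import math


def vtc(vin, vdd=1.0, gain=10.0, v_low=0.0):
    """Toy inverter transfer curve (assumed tanh shape, not a device model).

    The output swings from vdd (input low) down to v_low (input high);
    v_low > 0 crudely mimics the read disturbance of the "0" storage node.
    """
    vm = vdd / 2.0
    return v_low + (vdd - v_low) * 0.5 * (1.0 - math.tanh(gain * (vin - vm) / vdd))


def largest_square(f, vdd=1.0, steps=2000):
    """Side of the largest square in the upper-left lobe of the butterfly.

    Curve A is y = f(x); curve B is the mirrored curve x = f(y). For decreasing
    curves, the largest embedded square has its lower-left corner on B and its
    upper-right corner on A, so corners on B are scanned and the side s solved.
    """
    best = 0.0
    for k in range(steps + 1):
        y0 = vdd * k / steps
        x0 = f(y0)  # corner (x0, y0) lies on curve B

        def gap(s):
            # positive while the upper-right corner is still below curve A
            return f(x0 + s) - (y0 + s)

        if gap(0.0) <= 0.0:
            continue  # corner is not inside this lobe
        lo, hi = 0.0, min(vdd - x0, vdd - y0)  # keep the square inside 0..VDD
        for _ in range(60):  # bisection on the monotonically decreasing gap(s)
            mid = 0.5 * (lo + hi)
            if gap(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best


hold_snm = largest_square(lambda v: vtc(v))
read_snm = largest_square(lambda v: vtc(v, v_low=0.2))
print(f"hold SNM ~ {hold_snm:.3f} V, read SNM ~ {read_snm:.3f} V")
```

Raising `v_low` from 0 to 0.2V, a crude stand-in for the access device pulling the "0" node up during a read, shrinks the estimated margin, mirroring the degraded lobes of Figure 1-7b.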
Data is written to the 6T cell by pulling the appropriate bit-line low. The cell is
made mono-stable at only the desired data value, and, after WL gets de-asserted, the
local feedback regenerates to the correct state. Write operation is explicitly ratioed,
since the NMOS access devices are required to overpower the PMOS load devices,
M3 - 4, in order to overwrite new data.
The ratioed operation, both during read and write, leaves the 6T bit-cell highly
susceptible to both variation and manufacturing defects. In particular, since a typical
SRAM is composed of bit-cell arrays of hundreds of kilo-bits to several Mega-bits,
extreme worst-case behavior at the 4σ or 5σ level must be considered.
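To see why 4σ-5σ tails matter for arrays of this size, the sketch below assumes each cell's critical parameter is an independent Gaussian and counts how many cells of an array are expected to fall beyond kσ. The array sizes are examples, and real failure criteria are not simple Gaussian thresholds, so the numbers are only indicative.

```python
import math


def tail_probability(k_sigma):
    """One-sided Gaussian tail: P(X > mean + k_sigma * sigma)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


def expected_failing_cells(n_cells, k_sigma):
    """Expected number of cells beyond k_sigma, assuming independent Gaussian cells."""
    return n_cells * tail_probability(k_sigma)


def array_yield(n_cells, k_sigma):
    """Probability that no cell in the array lies beyond k_sigma."""
    return (1.0 - tail_probability(k_sigma)) ** n_cells


for n_cells in (256 * 1024, 1024 * 1024):  # 256kb and 1Mb arrays
    for k in (3.0, 4.0, 5.0):
        print(f"{n_cells:>8} cells, {k:.0f} sigma: "
              f"{expected_failing_cells(n_cells, k):10.3f} expected, "
              f"yield {array_yield(n_cells, k):.3f}")
```

Under these assumptions, a 256kb array is expected to contain about 8 cells beyond 4σ but fewer than 0.1 beyond 5σ, consistent with the 4σ-5σ range cited above.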
Two forms of variation affect SRAMs: inter-die (which will be called global variation) and intra-die (which will be called local variation). Global variation is the difference between the average parameter values of different dies; for instance, these can include the average NMOS/PMOS threshold voltage, dielectric thickness, or poly width. Global variation comes about due to systematic processing changes affecting individual dies. On the other hand, local variation is the difference between nominally matched devices on the same die. These can include the number of NMOS/PMOS channel-adjust doping ions, poly line-edge roughness, local-layout-dependent lithography effects, as well as transient effects such as negative bias temperature instability (NBTI). In advanced technologies, local variation sources have an increasingly dominating impact; while global variation significantly degrades the operating margins of SRAMs, local variation represents the most urgent concern regarding the increasing rate of failures observed. A complete treatment
of variation in CMOS devices, and its impact on circuits, such as SRAMs, can be
found in .
1.3 Thesis Contributions
Previous work in SRAMs has focused on their reliability with technology and density
scaling. The use and implications of technology optimizations that are generally
pursued for a broad range of high-volume and low-energy applications (e.g. mobile
processors) have also begun to be investigated. There remains, however, the need to
develop SRAM techniques to support severely energy constrained applications such
as biomedical devices, wireless sensor nodes, and much richer mobile multimedia.
Specifically, these require strategies to improve the trade-offs highlighted in Figure 1-3.
Due to its growing importance in digital systems, and its increasing sensitivity to processing and manufacturing factors, SRAM design requires some level of coordination with technology development in order to be effective. As a result, low-energy SRAM solutions must be compatible with industry methodologies, which are well suited for new technology development at the manufacturing level. For instance, optimal bit-cell layout design depends on several manufacturing details. Accordingly, this work focuses on circuit techniques that are compatible with and supportive of those approaches, particularly with regard to the most advanced technologies. It is hoped that this thesis contributes to identifying and solving some of the most critical issues facing highly energy-constrained SRAMs, though, of course, many issues will remain, and every effort is made to identify those as well.
This thesis contributes in the following areas:
(1) SRAM Energy Analysis.
Supply- and threshold-voltage strongly impact
the total energy of an SRAM sub-array. Chapter 2 presents an analysis for
the optimal supply-voltage (VDD) and threshold-voltage (Vt) targets in order to
minimize total energy considering the need to perform a given average number
of accesses within a specified time. The analysis here differs from that of generic logic in two ways: (1) the presumed need to retain the stored data for the entire time specified, and (2) the increased dependence of the energy on variation, which in SRAMs must be considered at extreme levels.
In addition to optimal targets from the perspective of minimizing energy, Chap-
ter 2 considers how the metrics that are critical to SRAM operation depend on
the supply- and threshold-voltage targets. As a result, the major obstacles to SRAM operation at the optimal energy point are established.
(2) Ultra-Low-Voltage SRAM. The analysis of Chapter 2 points to ultra-low-
voltage operation as a means to minimize sub-array energy. Chapter 3 provides
an analysis of failure sources within the SRAM that restrict low-voltage opera-
tion. Having analyzed the failure sources, techniques are proposed to overcome
them, and the techniques are analyzed for their efficiency. The techniques address two key limitations: (1) bit-cell operation and (2) sense-amplifier operation. Redundancy, which is commonly relied on to overcome bit-cell variation at the 5σ level, is analyzed for critical periphery components (namely, sense-amplifiers), where low-voltage operation exacerbates variation to an intolerable point even at the 3σ level. The proposed techniques are demonstrated in a prototype 256kb SRAM test-chip in 65nm LP CMOS that operates down to 0.35V.
(3) Low-Power High-Density SRAM Performance Enhancement. The anal-
ysis of Chapter 2 points to sub-array performance as a major limitation to en-
ergy reduction, especially in the presence of variation. Chapter 4 analyzes the
severe trade-off between sub-array performance and density. The limitations
imposed by both the bit-cells and the sense-amplifiers are investigated to al-
leviate the constraining trade-offs. Specifically, a sense-amplifier is proposed
that provides regenerative small-signal sensing. Importantly, however, it does
not require an explicit strobe signal, which, in advanced technologies, imposes
severe timing uncertainties that limit the worst-case performance. Additionally,
due to the promise of single-ended bit-cells (e.g. 8T) for ultra-low-voltage, low-
energy applications, the sense-amplifier proposed provides variation resilient
single-ended sensing. Although this enables the low-energy benefits of voltage
scalability and high read-current, it introduces increased sensitivity to noise
sources. Accordingly, the noise performance of the proposed sense-amplifier is
analyzed. A prototype test-chip in 45nm LP CMOS compares its performance
to that of a conventional strobed sense-amplifier, demonstrating improvements
in the worst-case access-time and the standard-deviation of the access-time by
34% and 4x, respectively.
SRAM Energy and Operating
Given the growing number of applications considered in Chapter 1 and the increasing dominance of SRAMs, the trade-offs that minimize SRAM energy require careful consideration. The aggressive application of these energy-reducing trade-offs, however, directly impacts the functionality and operating metrics of the SRAM (and, in turn, the system), leading to a complex effect on the achievable energy savings in a practical scenario. Of course, device variation, at the extreme levels
observed in typical SRAM arrays, plays a central role in precisely how the energy-
reducing trade-offs affect the operating metrics. Since their energy is so critical in
the overall system, SRAMs are subject to a sophisticated suite of power-management
assists spanning the device, circuit, and architecture levels. The energy, then, must
be analyzed under this power-management strategy.
Both active- and leakage-energy components contribute critically to SRAM energy,
and hence the analysis in this chapter treats them as the underlying optimization targets. For general digital circuits, it has already been shown that supply-voltage (VDD) and threshold-voltage (Vt) interact to set the active and leakage energy. Compared with general digital circuits, however, SRAMs face the operational constraint of long-term data-retention even during temporary idle periods (which may last arbitrarily long) in which it is known that active accesses will not be performed. This gives rise to the concept of a data-retention voltage (VDRV), where only idle data-storage, and no data-read or data-write, functionality must be supported. In addition to their effect on energy, which is the primary motivation for manipulating VDD, Vt, and VDRV, this chapter analyzes the fundamental effect these voltages have on SRAM functionality and performance in the presence of variation. Ultimately,
this chapter serves to determine what the optimal operating point (i.e. VDD and Vt)
target is to minimize SRAM energy and also to identify the challenges of operating
at that point.
The array nature of SRAMs has an important impact on the way their energy scales
with respect to VDD and Vt, especially during active-access modes. Specifically, com-
pared to general digital circuits, SRAM leakage-energy has increased importance due
to three factors: (1) high ratio of leakage-paths to actively-switching-nodes, (2) total
leakage set by an aggregation of intentionally minimum sized devices, and (3) critical-
path set by a single MOSFET pull-down stack with extreme variation. These factors
are considered below.
In order to maximize array area-efficiency, the trend is to use large sub-arrays with up to 256 bit-cells (or more) per row and column, as far as performance optimizations allow. For such large sub-arrays, the leakage from the bit-cells, which scales directly with the array size, dominates over that of the periphery.
Within the sub-array, the active switch capacitance from the word-lines scales with
the number of columns but not the number of rows, since only one row's word-line
switches per access. As a result, the word-line switch capacitance does not increase
in proportion to the total array size. Alternatively, the switch capacitance of the
bit-lines scales with the number of rows, and, during read-accesses, the bit-lines of all
columns switch; however, typically, their swing is significantly less than VDD. Further,
during write-accesses, the number of bit-lines that switch is reduced by the column-
multiplexer ratio (typically four or eight). Consequently, for large sub-arrays, the
ratio of leakage-energy to active-energy is higher than that of generic logic.
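The scaling argument above can be made concrete with a normalized count. The sketch assumes hypothetical unit capacitances per cell for the word-line and bit-line, counts every bit-cell as one leakage path, and considers a read access in which all bit-lines swing by a fraction `swing_frac` of VDD; it is not calibrated to any technology.

```python
def leakage_to_switching_ratio(cols, rows, swing_frac=0.1,
                               c_wl_per_cell=1.0, c_bl_per_cell=1.0):
    """Leaking bit-cells per unit of full-swing-equivalent switched capacitance.

    Read access of a cols x rows sub-array (arbitrary normalized units):
    one word-line switches (capacitance ~ cols), and every column's bit-line
    (capacitance ~ rows) swings by only swing_frac * VDD.
    """
    leaking_cells = cols * rows
    c_wordline = cols * c_wl_per_cell
    c_bitlines = cols * rows * c_bl_per_cell * swing_frac
    return leaking_cells / (c_wordline + c_bitlines)


for n in (32, 64, 128, 256):
    print(n, round(leakage_to_switching_ratio(n, n), 2))
```

In this model `cols` cancels: the ratio grows with the number of rows and with smaller bit-line swing, approaching 1/(c_bl_per_cell × swing_frac). Writes, where only one in m bit-lines switches, would push the ratio higher still.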
The use of intentionally small devices, to maximize the density of the bit-cell arrays, introduces increased variation, elevating the actual aggregate leakage-current significantly beyond the nominal aggregate leakage-current. Since leakage-current is related exponentially to threshold-voltage, the effect of Vt variation cannot be expected to average out over the linear summation of all leakage-paths in the array. Figure 2-1 shows the simulated total aggregate leakage-current (at 1.1V), normalized to the nominal aggregate leakage-current, for a 1Mb array composed of 0.25μm² cells in an LP 45nm technology. As shown, increasing σVt (even over a fairly modest range) leads to a significant increase in the total leakage-current. To simplify the description, this will be referred to as the leakage-current gain factor due to variation.
Figure 2-1: Simulated total leakage-current for 1Mb array in 45nm LP CMOS (at 1.1V) versus device σVt (mV); result shown includes variation and is normalized to total nominal leakage-current.
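The leakage-current gain factor has a simple closed form under idealized assumptions: if each cell's sub-threshold leakage is proportional to exp(-Vt/(n·kT/q)) and Vt is Gaussian with standard deviation σVt, the average leakage is the mean of a lognormal, exp(σVt²/(2(n·kT/q)²)). The sketch below compares that expression with a Monte Carlo sum. The slope factor n = 1.5 is an assumption, and DIBL, gate leakage, and multiple leaking devices per cell are ignored, so it does not reproduce Figure 2-1.

```python
import math
import random

N_VT = 1.5 * 0.0259  # assumed slope factor n times kT/q at 300 K (V)


def leakage_gain_analytic(sigma_vt):
    """E[exp(-dVt / (n*vT))] for dVt ~ N(0, sigma_vt^2): the lognormal mean."""
    return math.exp(0.5 * (sigma_vt / N_VT) ** 2)


def leakage_gain_monte_carlo(sigma_vt, n_cells=100_000, seed=1):
    """Aggregate leakage of n_cells with random Vt, normalized to the nominal sum."""
    rng = random.Random(seed)
    total = sum(math.exp(-rng.gauss(0.0, sigma_vt) / N_VT) for _ in range(n_cells))
    return total / n_cells  # nominal aggregate = n_cells * exp(0) = n_cells


for sigma_mv in (10, 30, 50):
    s = sigma_mv / 1000.0
    print(f"sigma_Vt = {sigma_mv} mV: analytic {leakage_gain_analytic(s):.2f}, "
          f"Monte Carlo {leakage_gain_monte_carlo(s):.2f}")
```

With these assumptions, σVt = 30mV gives a gain of roughly 1.35x and 50mV roughly 2.3x, illustrating how quickly the factor grows with σVt.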
The critical delay path in an SRAM is limited by the time required for the accessed
bit-cells to discharge their bit-lines beyond the required data-sensing margin. In the
presence of variation, this implies that the performance of a large array may be set
by a single bit-cell experiencing drive-current degradation at an extreme level (e.g. 5σ). The performance-degrading effect of variation in a typical circuit composed of logic paths is alleviated since the total delays are set by the sum of several constituent stages. Consequently, extreme variation on any one device has greatly reduced
impact. Unfortunately, in SRAMs the tendency towards large arrays implies the
possibility of extreme variation, and the structure of the read-path precludes the
benefit of delay averaging over many stages. As a result, the overall performance of
an SRAM suffers far more drastically in the presence of variation.
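The contrast between a single-stage SRAM read path and multi-stage logic can be sketched with Gaussian stage delays. The assumptions are illustrative: each stage delay has mean 1 and relative standard deviation `rel_sigma`, stages are independent, and the slowest of `n_instances` paths sits at the z-score that one instance is expected to exceed. Real bit-cell delay distributions are skewed (delay scales inversely with read current), so this likely understates the SRAM tail.

```python
import math
from statistics import NormalDist


def worst_case_z(n_instances):
    """z-score exceeded, on average, by one of n_instances Gaussian samples."""
    return NormalDist().inv_cdf(1.0 - 1.0 / n_instances)


def worst_case_penalty(n_instances, n_stages, rel_sigma):
    """Relative worst-case delay increase for the slowest of n_instances paths.

    Each path sums n_stages independent stage delays with mean 1 and standard
    deviation rel_sigma (a Gaussian simplification), so the path sigma grows
    only as sqrt(n_stages) while the mean grows as n_stages.
    """
    z = worst_case_z(n_instances)
    return z * rel_sigma * math.sqrt(n_stages) / n_stages


# SRAM read path: one bit-cell pull-down stack, slowest of 2**20 cells
sram = worst_case_penalty(2**20, n_stages=1, rel_sigma=0.1)
# Logic: slowest of 1000 paths, each 20 stages deep
logic = worst_case_penalty(1000, n_stages=20, rel_sigma=0.1)
print(f"SRAM read-path penalty: {sram:.1%}, logic-path penalty: {logic:.1%}")
```

With these numbers, the slowest of 2^20 single-stage paths is roughly 48% slower than nominal, while the slowest of 1000 twenty-stage paths is only about 7% slower.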
Considering the active and leakage energy profiles for a general digital circuit, the active-energy scales quadratically with supply-voltage in a straightforward manner, as CVDD². As a circuit's VDD is reduced, however, the gate-drive of the constituent MOSFETs is also reduced, degrading the switching speed. Consequently, the integration time of the leakage-currents, which is set by the time required to complete the operation, increases, raising the leakage-energy. The opposing active and leakage energy profiles are shown in Figure 2-2a for a representative
case (i.e. 32b carry-look-ahead adder in 90nm CMOS).
However, based on the factors discussed above, leakage-energy in SRAMs has increased prominence. Specifically, as sketched pictorially in Figure 2-2b, the high ratio of leakage-paths to actively-switching-nodes and the leakage-current gain factor due to variation both contribute to raising the leakage-energy curve upward relative to the active-energy curve. Additionally, the severe performance degradation due to the critical-path's dependence on a single bit-cell experiencing extreme variation causes the leakage-energy curve to shift rightward, as sketched in Figure 2-2c. This can be
understood by observing that the point at which the leakage-energy begins increas-
ing exponentially occurs at a higher supply-voltage than before; effectively, variation
raises the limiting bit-cell's threshold voltage, and, as a result, supply-voltage reduc-
tion quickly leads to sub-threshold operation, which imposes an exponential increase
in circuit delay.
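The trends sketched in Figure 2-2 can be reproduced with a toy model: active energy C·VDD² plus leakage power integrated over the delay of the critical device. The EKV-style current interpolation, Vt = 0.4V, n·kT/q = 39mV, and the `leak_ratio` standing in for the number of leaking devices per switched node are all assumptions chosen for illustration. `dvt_critical` models a critical-path device whose Vt is raised by variation, while the aggregate leakage keeps its nominal Vt.

```python
import math

N_VT = 1.5 * 0.026  # assumed n * kT/q (V)


def drive_current(vgs, vt):
    """Normalized EKV-style current, continuous from sub- to super-threshold."""
    return math.log1p(math.exp((vgs - vt) / (2.0 * N_VT))) ** 2


def energy_per_op(vdd, vt=0.4, dvt_critical=0.0, leak_ratio=1000.0):
    """Active energy C*VDD^2 plus leakage integrated over the critical delay.

    C is normalized to 1, leak_ratio stands for the number of leaking devices
    per switched node, and dvt_critical is an extra Vt on the single device
    that sets the critical path (e.g. a bit-cell at an extreme variation level).
    """
    delay = vdd / drive_current(vdd, vt + dvt_critical)  # C*VDD / I_on
    i_leak = drive_current(0.0, vt)                        # nominal off-current
    return vdd ** 2 + leak_ratio * i_leak * vdd * delay


def optimal_vdd(**kwargs):
    grid = [0.1 + 0.001 * k for k in range(1101)]  # 0.1V to 1.2V in 1mV steps
    return min(grid, key=lambda v: energy_per_op(v, **kwargs))


v_logic = optimal_vdd()
v_more_leak = optimal_vdd(leak_ratio=10_000.0)  # Figure 2-2b trend
v_variation = optimal_vdd(dvt_critical=0.1)     # Figure 2-2c trend
print(f"VDD_opt: {v_logic:.3f} V, {v_more_leak:.3f} V, {v_variation:.3f} V")
```

Raising `leak_ratio` (Figure 2-2b) or `dvt_critical` (Figure 2-2c) moves the energy-optimal VDD upward, as argued above.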
The result in Figure 2-2c seems to indicate that the optimal VDD for SRAMs
occurs at a relatively high supply-voltage. In fact, however, the energy optimization
picture must be modified by considering the practical power-management approach
discussed in Section 2.1.1. Although the importance of leakage-energy remains high, it must be considered both during active-access and idle-data-storage modes. As discussed below, raising VDD in order to reduce the SRAM access delay has reduced benefit, as leakage-energy must still be incurred in order to retain data even after the accesses are complete.

Figure 2-2: Active- and leakage-energy profiles in digital circuits showing trends expected in SRAMs (energy versus VDD (V)): (a) energy profiles representative of generic logic (90nm 32b carry-lookahead adder); (b) relative leakage-energy shift expected in SRAMs due to increased ratio of leakage-currents; (c) relative leakage-energy shift expected in SRAMs due to severe performance degradation from bit-cell variation.
The following subsections start by describing the operating modes of an SRAM.
Then, the energy components during these modes are identified and analyzed in
detail, especially with respect to the supply- and threshold-voltages. Finally, VDD and Vt targets are determined to optimize energy.
2.1.1 SRAM Idle-Mode Leakage Reduction
If the SRAM power-supply could be gated after the completion of a required number
of accesses, the picture in Figure 2-2, consisting of one leakage energy component and
one active energy component, could be used to determine the optimal total energy.
However, generally, an SRAM is required to retain its data for an arbitrary length of
time unrelated to its own access-delay. Consequently, the data-retention period can-
not be parameterized by the access-delay, and a new parameter must be introduced to
represent the total length of time data is retained. Specifically, idle data-retention con-
sumes power, and to analyze its energy, the period of the retention-cycle, TCYC,RTN,
must be considered. Accordingly, TCYC,RTN corresponds to the average duration of
time within which a required number of accesses is to be completed. The required number of accesses is designated as N. The data stored in the SRAM at the end
of TCYC,RTN must correspond to these accesses, serving as the initial state for the
subsequent set of accesses.
The actual length of time required to complete the N accesses can be set freely
to optimize energy as long as it is less than TCYC,RTN. This time to complete the
accesses is designated as the access-period, TACC. For the remainder of the retention-
cycle (i.e. TCYC,RTN - TACC) only idle-data-storage is required. As discussed in detail
in Section 2.2, the operating metrics associated with idle-data-retention are far less
stringent than those associated with active data reads and writes. As a result, during
idle-data-retention, the power can be much more aggressively reduced. The timing
parameters relevant to SRAM energy are summarized in Figure 2-3.
Figure 2-3: Summary of parameters relevant to SRAM energy (active period: N accesses).
A straightforward and highly effective implementation of the low-energy data-retention mode involves dynamically reducing the voltage across the bit-cell array. This reduces the leakage-current by alleviating drain-induced barrier lowering (DIBL), an increasingly prominent effect in advanced technologies. DIBL pertains to an effective decrease in the threshold-voltage brought on by increasing the MOSFET VDS; a large VDS causes the drain depletion region to encroach into the channel region, reducing the gate-to-bulk biasing required for channel inversion.
Figure 2-4 shows the normalized leakage-current with respect to supply-voltage scaling, which also sets the VDS of the devices. Predictive models have been used for this simulation, and, as shown, well over an order of magnitude reduction in leakage-current can easily be achieved. The leakage-power savings further benefit from the supply-voltage reduction, leading to over 100x savings with 45nm CMOS when VDD is scaled from 1.2V to 0.3V.
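The DIBL mechanism can be captured with the standard sub-threshold expression, in which the effective Vt falls linearly with VDS. The DIBL coefficient η = 0.15 V/V and slope factor n = 1.5 below are assumptions picked for illustration, not parameters of the predictive models behind Figure 2-4; note that the computed ratio does not depend on the nominal Vt.

```python
import math

V_T = 0.026   # thermal voltage kT/q at room temperature (V)
N_SS = 1.5    # subthreshold slope factor (assumed)
ETA = 0.15    # DIBL coefficient: Vt drop per volt of VDS (assumed, V/V)


def off_current(vds, vt0=0.45):
    """Normalized subthreshold off-current (VGS = 0) with DIBL-lowered Vt."""
    vt_eff = vt0 - ETA * vds
    return math.exp(-vt_eff / (N_SS * V_T)) * (1.0 - math.exp(-vds / V_T))


def leakage_savings(v_high, v_low):
    """Return (current ratio, power ratio) when VDD = VDS drops from v_high to v_low."""
    i_ratio = off_current(v_high) / off_current(v_low)
    p_ratio = i_ratio * v_high / v_low  # leakage power = VDD * I_off
    return i_ratio, p_ratio


i_ratio, p_ratio = leakage_savings(1.2, 0.3)
print(f"leakage current reduced {i_ratio:.0f}x, leakage power reduced {p_ratio:.0f}x")
```

With these assumptions, scaling VDD from 1.2V to 0.3V reduces the off-current by roughly 30x, and the additional 4x from the lower VDD brings the leakage-power reduction above 100x.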
Practically, this approach has been applied successfully both by reducing VDD and by raising VSS. It should be mentioned that an additional approach involves reverse body-biasing to further reduce the leakage-current. Nonetheless, the biasing employed in all of these cases can only be applied to the point where the data-storage margin is violated. Hence, the data-retention-voltage (VDRV) has been introduced to characterize the minimum VDD at which data can reliably be retained by the bit-cells. As discussed further in Section 2.2, however, VDRV is highly subject to variation. Consequently, closed-loop replica techniques have been employed to estimate the VDRV limit dynamically, so that maximum idle-mode energy savings can be achieved. In order to enforce a desired VDD or VSS voltage for the sub-array (i.e. VDDSUB or VSSSUB) during the idle-mode, the supporting circuits shown in Figure 2-5 have been used.
Figure 2-4: Normalized leakage-current reduction with respect to supply voltage for minimum-sized 90nm, 65nm, and 45nm devices due to DIBL (predictive models used).
Figure 2-5: Circuitry to enforce idle-mode biasing using (a) programmable sleep switches and (b) an operational-amplifier.
Regardless of the choice of the idle-mode biasing or the circuitry used to enforce
it, it is critical that transitions between idle- and active-modes be made without
compromising the biasing required in order to maintain the stringent active-mode
operating margins. Consequently, careful signal timing is required to derive the idle-mode SLEEP signal, which actuates the idle-mode biasing. Figure 2-6 shows an example of this signaling. In this case, full-cycle and half-cycle latencies corresponding to idle-to-active and active-to-idle transitions are inserted to ensure the corresponding operating margins are not violated.
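The idle/active sequencing described above can be sketched as a per-half-cycle SLEEP trace. The timing convention here (SLEEP de-asserted one full cycle before an access and re-asserted one half-cycle after it ends) is one hypothetical reading of the full-cycle and half-cycle latencies, and `sleep_waveform` is an illustrative helper, not the circuit of Figure 2-6.

```python
def sleep_waveform(access_cycles, n_cycles):
    """SLEEP level per half-cycle (True = idle-mode biasing applied).

    Hypothetical timing convention: cycle k spans half-cycles 2k and 2k+1.
    SLEEP is de-asserted one full cycle before each access cycle (idle-to-active
    latency, clipped at the start of the trace) and re-asserted one half-cycle
    after the access cycle ends (active-to-idle latency).
    """
    n_half = 2 * n_cycles
    sleep = [True] * n_half
    for k in access_cycles:
        start = max(0, 2 * k - 2)      # wake a full cycle early
        stop = min(n_half, 2 * k + 3)  # two access half-cycles + one half-cycle hold-off
        for h in range(start, stop):
            sleep[h] = False
    return sleep


wave = sleep_waveform([3, 4], n_cycles=8)
print("".join("S" if s else "A" for s in wave))  # SSSSAAAAAAASSSSS (S = idle bias, A = active)
```

Back-to-back accesses merge into a single active window, so the bias-switching overhead is paid once per burst rather than once per access.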
Figure 2-6: Waveforms corresponding to idle-to-active and active-to-idle mode transitions.
SRAM Sub-Array Optimal Energy
In this section, the average energy of an SRAM sub-array is considered, and more
specifically, how it can be minimized by judicious selection of supply-voltage, VDD, and
device threshold-voltage, Vt, is analyzed. A typical SRAM is composed of many tiled
sub-arrays, themselves consisting of a bit-cell array and access-control drivers/sensors.
Additionally, global decoding and interfacing circuitry is also required. However, due
to their very specific energy, performance, and operating characteristics (described above and further in Section 2.2), sub-arrays often employ a separate VDD and specialized devices, where the Vt is engineered for optimal operation. Because the sub-array critically determines the energy and performance of the entire SRAM, and because it offers independent control of VDD and Vt, this section focuses on how the sub-array's energy can be optimized independently of the global decoding and interfacing circuitry.
Based on the operating model considered in Section 2.1.1, total sub-array energy,
ETOT, has four components, as indicated in Equation 2.1:
$$E_{TOT} = E_{ACC} + E_{LKG} + E_{IDL} + E_{OH} \tag{2.1}$$
The active-access-energy (EACC) and the leakage-access-energy (ELKG) pertain to the active mode. EACC corresponds to the switching energy required to perform reads and
writes, and ELKG corresponds to the leakage-energy imposed by applying a supply-
voltage across the array that must be large enough to ensure reliable reads and writes.
The idle-data-retention energy (EIDL) corresponds to data storage during the idle-
mode, and it will also be referred to as the idle-mode energy. Finally, the overhead-
energy (EOH) corresponds to the overhead incurred due to altering the sub-array's
biasing in accordance with idle-mode power reduction. These components are sum-
marized in Figure 2-7, and they are described in more detail below.
Figure 2-7: Summary of SRAM energy components: during the active period (N accesses), EACC is the switching energy for reads and writes and ELKG is the leakage energy to meet the read/write margin; EIDL is the leakage energy to meet the hold margin (i.e. at VDRV); EOH is the overhead energy to switch between active and idle modes.
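Equation 2.1 and the timing parameters of Figure 2-3 can be combined into simple bookkeeping. The sketch below assumes constant leakage powers in each mode and a fixed time per access; `SubArrayParams`, `total_energy`, and all numeric values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SubArrayParams:
    """Hypothetical per-sub-array parameters; values below are placeholders."""
    e_access: float       # switching energy per access, contributes to EACC (J)
    p_leak_active: float  # leakage power at the active-mode VDD, sets ELKG (W)
    p_leak_idle: float    # leakage power at the idle bias (e.g. near VDRV), sets EIDL (W)
    e_overhead: float     # energy to enter and leave idle-mode, EOH (J)
    t_access: float       # time per access at the active-mode VDD (s)


def total_energy(p, n_accesses, t_cyc_rtn):
    """Equation 2.1: ETOT = EACC + ELKG + EIDL + EOH over one retention cycle."""
    t_acc = n_accesses * p.t_access
    if t_acc > t_cyc_rtn:
        raise ValueError("the N accesses do not fit within TCYC,RTN")
    parts = {
        "EACC": n_accesses * p.e_access,
        "ELKG": p.p_leak_active * t_acc,
        "EIDL": p.p_leak_idle * (t_cyc_rtn - t_acc),
        "EOH": p.e_overhead,
    }
    parts["ETOT"] = sum(parts.values())
    return parts


base = SubArrayParams(e_access=1e-12, p_leak_active=1e-6, p_leak_idle=1e-7,
                      e_overhead=1e-11, t_access=1e-6)
for name, value in total_energy(base, n_accesses=1000, t_cyc_rtn=1e-2).items():
    print(f"{name}: {value:.3e} J")
```

Because ELKG + EIDL = p_leak_idle·TCYC,RTN + (p_leak_active - p_leak_idle)·TACC, shortening TACC (for example, by raising VDD) saves leakage only at the rate p_leak_active - p_leak_idle, while EACC grows with VDD²; this is why a faster access has reduced benefit once idle-mode retention is accounted for.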
(1) Active-Access-Energy (EACC). This represents the energy required to switch capacitive nodes in order to generate the control and data signals required to read and write bit-cells. Signal nodes that transition over the full range from VDD to ground require an active access-energy given by CVDD², where C is
the node capacitance. Full-swing signals typically include the one-hot enabled
word-line, WL, for row selection, and the one-hot enabled column-select, cSEL,
for multiplexed column selection in a column-interleaved array. Of course,
the internal nodes of the sense-amplifiers also switch from VDD to ground. In
total, the number of sense-amplifiers is equal to the number of columns in the
sub-array divided by the column-multiplexing ratio, m.
The most significant source of active-access-energy consumption, however, is the bit-lines, BL, which are used to convey the stored read-data to the sense-amplifiers and to drive new write-data into the bit-cells. In some implementations, though, the BLs do not fully discharge during data-sensing. Strictly speaking, to resolve the read-data, the BLs need only discharge to the required sense-amplifier input margin, VSNS, which can be less than 100mV. Nonetheless, in practice, the BLs are often discharged beyond the sensing-margin to reduce the probability of data-disruption caused by sustained pulling of the bit-cell storage nodes towards the BL voltage near VDD. During read-accesses, for instance, one previously reported design actively amplifies the signal on all BLs to full logic levels in order to avoid data-disruption. Accordingly, the total active-access-energy for reads of an i x j (i.e. i-column, j-row) sub-array is given by Equation 2.2, where the strong dependence on supply-voltage is clear:
$$E_{ACC,RD} = C_{WL} V_{DD}^2 + C_{cSEL} V_{DD}^2 + \frac{i}{m} C_{SA} V_{DD}^2 + i\, C_{BL} V_{DD} V_{SNS} \tag{2.2}$$
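Similarly, the total active-access-energy for writes is approximately given by Equation 2.3, since only i/m bit-lines are driven over the full swing:
$$E_{ACC,WR} \approx C_{WL} V_{DD}^2 + C_{cSEL} V_{DD}^2 + \frac{i}{m} C_{BL} V_{DD}^2 \tag{2.3}$$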
(2) Leakage-Access-Energy (ELKG).
This represents the static energy con-
sumed, even in the absence of active-accesses, just to generate a voltage across
the sub-array that ensures the operating margins associated with active-accesses
are reliably met. It arises from sub-threshold (and other) leakage-currents
through the bit-cell devices, which, multiplied by the supply-voltage, give the
leakage-power. Since this source leads to static power dissipation, it must be
integrated over a time interval to derive its energy. At minimum, the interval
that must be considered is TACC, the period required to complete some set
number of accesses, N. Beyond this, the bit-cell biasing conditions no longer
need to support the active-access operating margins, and biasing more conducive
to minimum power-consumption can be applied. Accordingly, the leakage-access-energy
for an i x j sub-array is given by Equation 2.4, where it is assumed that the
entire sub-array is biased with a single VDD that must meet the active-access
operating margins:
$$E_{LKG} = ij\,T_{ACC}\,I_{LKG,BC}\,V_{DD} \qquad (2.4)$$
In this expression, the dependence on VDD is explicit through multiplication
with the bit-cell leakage-current, which leads to the leakage-power. However,
the dependence on VDD is also implicit in two other ways: (1) the effect of
VDD on ILKG,BC through DIBL, and (2) the effect of VDD on TACC through
the VGS available in order to generate the bit-cell drive-current needed to discharge
the BLs during data-sensing. Similarly, the dependence on threshold-voltage,
Vt, is also implicit in two ways: (1) the effect of Vt on ILKG,BC through the
sub-threshold current equation, and (2) the effect of Vt on TACC through
the gate-overdrive (i.e. VGS - Vt) necessary to generate bit-cell drive-current.
Additionally, Vt also affects the ability of the bit-cells to meet the operating
margins given a particular VDD. Consequently, as described in Section 2.2, Vt
has a direct effect on the minimum VDD allowed.
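To make the implicit dependences concrete, the sketch below pairs Equation 2.4 with a textbook-style sub-threshold leakage model that includes a linear DIBL term. The model form, the function names, and all default parameter values (I0, the sub-threshold swing S, and the DIBL coefficient eta) are assumptions for illustration, and TACC is passed in as a fixed number rather than derived from VDD and Vt.

```python
def bitcell_leakage(vt, vdd, i0=1e-7, s=0.1, eta=0.1):
    """Hypothetical sub-threshold leakage per bit-cell (amps).

    I = I0 * 10**((eta*VDD - Vt) / S): each increase of S volts in Vt
    cuts leakage 10x, while DIBL (eta) raises leakage with VDD.
    """
    return i0 * 10 ** ((eta * vdd - vt) / s)


def leakage_access_energy(i, j, t_acc, vdd, vt, **model):
    """Equation 2.4: E_LKG = i * j * T_ACC * I_LKG,BC * VDD (joules)."""
    return i * j * t_acc * bitcell_leakage(vt, vdd, **model) * vdd


# With S = 100mV/decade, raising Vt by 100mV lowers leakage about 10x,
# provided T_ACC does not grow (in practice it does, via lower drive-current).
low_vt = leakage_access_energy(256, 256, 10e-6, 0.5, 0.3)
high_vt = leakage_access_energy(256, 256, 10e-6, 0.5, 0.4)
print(low_vt / high_vt)
```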
(3) Idle-Data-Retention Energy (EIDL). This represents the static energy required
to retain the data, without any active-accesses, until the end of some
required period. Considering the power-management scenario described in Section
2.1.1, system operations will require an average number of accesses, N,
every TCYC,RTN seconds. The operating point of the sub-array may be chosen
to optimize energy as long as the N accesses are completed in a period less than
TCYC,RTN. For the remainder of the time until the end of TCYC,RTN, however,
the data must be retained so that it is available for the next set of accesses.
This cycle is shown in Figure 2-7. Accordingly, the idle-data-retention energy
is given by Equation 2.5:
$$E_{IDL} = ij\,I_{DRV,BC}\,V_{DRV}\,(T_{CYC,RTN} - T_{ACC}) \qquad (2.5)$$
Here, VDRV refers to the data-retention voltage, and IDRV,BC refers to the
leakage-current of the bit-cell at VDRV. In this expression, the dependence on
Vt is implicit since it affects IDRV,BC through the sub-threshold current equation.
Further, as described in Section 2.2, Vt also affects the minimum VDRV
achievable. Although it is possible to adjust Vt dynamically in order
to optimize the idle-mode energy, such adjustments are more difficult to make
over an aggressive range than adjustments to VDD. Finally, as mentioned previously,
both VDD and Vt affect TACC.
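Equation 2.5 can be evaluated the same way. The sketch below also enforces the constraint stated above, that the N accesses must finish within TCYC,RTN; the function name and the per-cell retention current in the usage lines are hypothetical.

```python
def idle_retention_energy(i, j, i_drv_bc, v_drv, t_cyc_rtn, t_acc):
    """Equation 2.5: E_IDL = i*j*I_DRV,BC*V_DRV*(T_CYC,RTN - T_ACC)."""
    if t_acc > t_cyc_rtn:
        # The operating point is only valid if the N accesses fit in T_CYC,RTN.
        raise ValueError("T_ACC must not exceed T_CYC,RTN")
    return i * j * i_drv_bc * v_drv * (t_cyc_rtn - t_acc)


# Hypothetical retention leakage of 1pA per cell at V_DRV = 0.4V.
for t_cyc_rtn in (10e-3, 1e-3):
    print(t_cyc_rtn, idle_retention_energy(256, 256, 1e-12, 0.4, t_cyc_rtn, 10e-6))
```

Because EIDL grows linearly with the idle interval, it is the component that takes over as TCYC,RTN lengthens.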
(4) Overhead Energy (EOH). This represents the energy consumed in order to
transition to the low-energy idle-mode state. For the idle-mode, the array
must be rebiased by changing VDD, VSS, and/or the body-bias. This involves
appropriately charging the supply, ground, or back-gate capacitance for the
entire array. For the case of changing the sub-array supply-voltage from VDD to
VDRV, the overhead energy, which is consumed once every TCYC,RTN, is given
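by Equation 2.6, where CVDD is the total power-supply capacitance, and the energy
is that drawn from the supply when the rail is restored from VDRV to VDD:
$$E_{OH} = C_{VDD}\,V_{DD}\,(V_{DD} - V_{DRV}) \qquad (2.6)$$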
In this expression, the dependence on VDD is explicit, and the dependence on
Vt, which limits the minimum achievable VDRV as mentioned above, is implicit.
It should be noted that some finite time is required to ensure complete
transition between the idle-mode and active-mode biasing, and it is critical to
account for this in order to avoid violating the different operating margins
associated with each mode. Nonetheless, the leakage-energy consumed during the
transition period is relatively insignificant compared with EOH, since CVDD is
typically very large (e.g. >100pF) while the required transition time is on the
order of only a few clock-cycles. Finally, since EOH is an unavoidable overhead
associated with transitioning to the low-energy idle-mode, it is useful to analyze
whether the resulting energy savings will exceed it. At minimum, this requires that
$$E_{OH} < ij\,(I_{LKG,BC}V_{DD} - I_{DRV,BC}V_{DRV})(T_{CYC,RTN} - T_{ACC}),$$
and, further, the overhead of the circuitry that supports the rebiasing must also
be considered.
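The minimal break-even condition above can be checked numerically. This sketch assumes that EOH is the energy drawn from the VDD source when CVDD is recharged from VDRV back to VDD, i.e. CVDD·VDD·(VDD - VDRV); that form, the function names, and the numbers in the usage lines are assumptions for illustration. It deliberately ignores the rebiasing-circuit overhead mentioned above.

```python
def overhead_energy(c_vdd, vdd, v_drv):
    """Assumed E_OH: charge C_VDD*(VDD - V_DRV) drawn from the VDD supply."""
    return c_vdd * vdd * (vdd - v_drv)


def idle_mode_pays_off(i, j, i_lkg_bc, vdd, i_drv_bc, v_drv,
                       t_cyc_rtn, t_acc, c_vdd):
    """True if the idle-mode savings exceed E_OH (the minimal condition only)."""
    # Power saved per cell by retaining at V_DRV instead of staying at VDD.
    p_saved = i_lkg_bc * vdd - i_drv_bc * v_drv
    savings = i * j * p_saved * (t_cyc_rtn - t_acc)
    return overhead_energy(c_vdd, vdd, v_drv) < savings


# Hypothetical values: 100pF supply capacitance, 10pA/5pA per-cell leakage.
args = dict(i=256, j=256, i_lkg_bc=10e-12, vdd=0.5, i_drv_bc=5e-12,
            v_drv=0.4, t_acc=10e-6, c_vdd=100e-12)
print(idle_mode_pays_off(t_cyc_rtn=1e-3, **args))          # True
print(idle_mode_pays_off(t_cyc_rtn=10e-6 + 1e-9, **args))  # False
```

With almost no idle time, the fixed cost of swinging the large supply capacitance outweighs the leakage saved, so staying at VDD is preferable.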
Sub-Array Energy Analysis
To ascertain VDD and Vt targets that lead to optimal sub-array energy, a practical case
for a low-power high-density SRAM is considered. The specifications of the sub-array
are shown in Figure 2-8. In particular, an LP 45nm CMOS technology is used. The
sub-array consists of 256 columns and 256 rows of bit-cells that have been designed to
occupy a layout area of 0.25µm² using actual SRAM design-rules for the technology.
Column-multiplexing of 4:1 is assumed, such that 64 (out of the 256) cells are accessed
each cycle. Layout extraction is performed to determine the parasitic capacitances
of the word-lines (WL), bit-lines (BL), column-select-lines (cSEL), sense-amplifiers,
and power-supply (VDDSUB, which will be referred to as VDD for the remainder of this
analysis). Finally, the total voltage-swing on the bit-lines is assumed to be 200mV
during read-accesses. All other digital control signals are assumed to be full-swing,
from ground to VDD.
To characterize the energy, simulations are performed by scaling VDD for the entire
sub-array and scaling Vt of the bit-cell devices. This is achieved by adjusting the
VTH0 parameter of the BSIM4 transistor models, which corresponds to the threshold
voltage of a long-channel device with zero substrate bias. The effects of device
variations, and how they scale with VDD and Vt, are not considered here; instead,
the goal is to establish the optimal nominal targets. The impact of variation on
parameters relevant to the energy is considered in Section 2.3, where the energy
analysis is revised.
Technology:           45nm LP CMOS
Sub-array size:       256 x 256
Column-multiplexing:  4:1 (i.e. 64 sense-amplifiers)
BL swing:             200mV
Figure 2-8: Sub-array specifications for energy analysis.
For the initial analysis, the data-retention voltage, VDRV, which in the presence of
variation depends heavily on Vt, is taken to be 0.4V.
The average number of accesses (N) required for logical operations, and the average
time required to complete a logical operation (TCYC,RTN), are application-dependent
parameters that can significantly affect the optimal total energy of the sub-array.
For instance, as TCYC,RTN becomes very long, the leakage-energy during the idle-mode
(i.e. EIDL) dominates all of the other components, largely eliminating the influence
of VDD altogether. However, for most of the low-power applications discussed in
Chapter 1, the time-scales of interest lead to a dependence on all of the energy
components. To proceed with the analysis, N is assumed to be 1024, which corresponds
to an access of every bit-cell in the 64kb sub-array (since 64 cells are accessed
each cycle). Additionally, TCYC,RTN is set to 10ms, 1ms, 100µs, and 10µs to
consider various performance constraints. For the array configuration considered,
EIDL overwhelmingly dominates when TCYC,RTN is much longer than 10ms.
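The choice of N and the TCYC,RTN sweep translate directly into a minimum access rate. The short calculation below (function names are illustrative) reproduces N = 1024 from the sub-array dimensions and shows the longest average access cycle each TCYC,RTN permits, assuming the entire period could be spent on accesses.

```python
def accesses_to_cover_array(columns, rows, mux):
    """Number of accesses needed to touch every bit-cell once."""
    cells_per_access = columns // mux  # one cell per sense-amplifier
    return (columns * rows) // cells_per_access


def max_access_cycle(t_cyc_rtn, n_accesses):
    """Longest average access cycle that still fits N accesses in T_CYC,RTN."""
    return t_cyc_rtn / n_accesses


N = accesses_to_cover_array(256, 256, 4)  # 65536 cells / 64 per access = 1024
for t_cyc_rtn in (10e-3, 1e-3, 100e-6, 10e-6):
    t_max = max_access_cycle(t_cyc_rtn, N)
    print(f"T_CYC,RTN = {t_cyc_rtn * 1e6:g} us -> cycle <= {t_max * 1e9:.2f} ns")
```

At TCYC,RTN = 10µs the accesses must average under roughly 10ns each, which is why the shortest periods act as a performance constraint on the choice of VDD and Vt.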
Before analyzing the total energy, the energy components are discussed. Figure
2-9 shows the active-mode energy (corresponding to EACC + ELKG), idle-mode energy,
and overhead energy plotted as log-magnitude contours with respect to VDD
and Vt. Here, TCYC,RTN is set to 1ms. As TCYC,RTN and N are varied, the trends
observed for each component remain the same, but the relative magnitudes of the
components change. For instance, a large TCYC,RTN and a small N elevate the
importance of idle-mode energy with respect to active-mode energy; similarly, overhead
energy has reduced prominence as N increases, since it is amortized over more
accesses.
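One simple way to see how TCYC,RTN and N shift the balance is to divide each component over the N accesses in one period. The sketch below does this under deliberately crude assumptions, stated in its docstring; the function name and all numbers are hypothetical.

```python
def per_access_energy(n_acc, e_active_per_access, p_idle, t_cyc_rtn,
                      t_cycle, e_oh):
    """Split one T_CYC,RTN period's energy into per-access shares (joules).

    Simplifying assumptions: active-mode energy is a fixed amount per
    access, T_ACC = n_acc * t_cycle, idle power p_idle is constant, and
    E_OH is paid once per period.
    """
    t_acc = n_acc * t_cycle
    if t_acc > t_cyc_rtn:
        raise ValueError("accesses do not fit in T_CYC,RTN")
    active = e_active_per_access
    idle = p_idle * (t_cyc_rtn - t_acc) / n_acc
    overhead = e_oh / n_acc
    return active, idle, overhead


# Hypothetical inputs: doubling N halves the overhead share per access.
print(per_access_energy(1024, 1e-12, 1e-7, 1e-3, 10e-9, 5e-12))
print(per_access_energy(2048, 1e-12, 1e-7, 1e-3, 10e-9, 5e-12))
```

Lengthening TCYC,RTN raises only the idle share, while increasing N dilutes both the idle and overhead shares, matching the qualitative trends described above.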
The contours observed for active-accesses (Figure 2-9a) are typical for digital
circuits. At low VDD (0.4-0.6V), the sub-array speed is significantly reduced, so
leakage-currents are integrated over a longer TACC, and increasing Vt from 0.1V
to 0.3V to minimize them favorably affects the energy; at higher VDD, the energy
is overwhelmingly dominated by capacitive switching. As Vt is increased beyond
0.4V (in the region below the dotted line of Figure 2-9a), deep sub-threshold
operation compromises the logic levels, causing artifacts that increase the energy.
For the considered array configuration and
technology, the active-mode energy points to an optimal VDD and Vt of approximately
0.5V and 0.35V, respectively.
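The trade-off behind these contours (switching energy falling with VDD while slower accesses integrate more leakage) can be reproduced qualitatively with a toy model. The sketch below is not the simulation methodology used here: it assumes an EKV-style interpolated drive-current, ignores DIBL and variation, and lumps all switching capacitance into one hypothetical value, so it will not reproduce the specific 0.5V/0.35V optimum or the logic-level artifacts above 0.4V. All names and parameter values are hypothetical.

```python
import math

PHI_T = 0.026   # thermal voltage near room temperature (V)
N_SLOPE = 1.5   # hypothetical sub-threshold slope factor


def drive_current(vgs, vt, i0=1e-6):
    """EKV-style interpolation between sub- and super-threshold (amps)."""
    x = (vgs - vt) / (2 * N_SLOPE * PHI_T)
    return i0 * math.log1p(math.exp(x)) ** 2


def active_mode_energy(vdd, vt, cells=256 * 256, n_acc=1024,
                       c_switch=1e-12, c_bl=50e-15, v_sns=0.2):
    """Toy E_ACC + E_LKG for N accesses (joules); all parameters hypothetical."""
    i_read = drive_current(vdd, vt)        # bit-cell read current
    i_leak = drive_current(0.0, vt)        # bit-cell off-current
    t_acc = n_acc * c_bl * v_sns / i_read  # time to complete N BL discharges
    e_acc = n_acc * c_switch * vdd ** 2    # lumped capacitive switching
    e_lkg = cells * i_leak * vdd * t_acc   # Equation 2.4 form
    return e_acc + e_lkg


def best_operating_point(vdds, vts):
    """Grid search returning (energy, VDD, Vt) with minimum energy."""
    return min((active_mode_energy(v, t), v, t) for v in vdds for t in vts)


vdds = [0.3 + 0.05 * k for k in range(19)]   # 0.3V to 1.2V
vts = [0.1 + 0.025 * k for k in range(13)]   # 0.1V to 0.4V
energy, vdd_opt, vt_opt = best_operating_point(vdds, vts)
print(f"toy optimum: VDD={vdd_opt:.2f} V, Vt={vt_opt:.3f} V, E={energy:.3e} J")
```

In deep sub-threshold this model's off/on current ratio becomes independent of Vt, so raising Vt further stops helping rather than hurting; capturing the energy increase noted above would require modeling the degraded logic levels.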
As expected, the idle-mode energy (Figure 2-9b) is strongly dependent on Vt, due