
25.3 A 65nm Edge-Chasing Quantizer-Based Digital LDO Featuring 4.58ps-FoM and Side-Channel-Attack Resistance

Authors:
Yan He, Kaiyuan Yang
Rice University, Houston TX 77006
2020 IEEE International Solid-State Circuits Conference (ISSCC)
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other
uses, in any current or future media, including reprinting/republishing this material for advertising or
promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse
of any copyrighted component of this work in other works.
Low-Dropout Regulators (LDOs) are commonly desired for fine-grained power management in SoCs because of their compact area, high current efficiency, and small output ripple. Digital LDOs (DLDOs) have been increasingly adopted in recent years thanks to their voltage and process scalability. However, achieving a fast response to a load change requires a conventional synchronous DLDO to either increase its sampling frequency, with a large power overhead, or include a large output capacitor (COUT), with increased chip area and cost. Various control schemes have been proposed to mitigate these drawbacks. In synchronous designs, attempts have been made to employ more complex control algorithms to achieve faster response [1], and to replace the PMOS switch array with switched capacitors for reduced dynamic
energy consumption [2]. On the other hand, asynchronous DLDOs with event-driven [4] and beat-frequency VCO-based structures [3]
show promise across the key performance, power, and area metrics. Recently, voltage regulators have also been found to be useful in enhancing the resistance of cryptographic engines and processors against power and EM side-channel attacks (SCAs) [5-7], which are physical attacks representing a severe threat to mobile and embedded devices. Employing regulators for SCA defense is promising because they are already used in most systems and require no modifications to existing computing architectures and algorithms like other circuit-level defenses [8-9]. In this paper, we present a high-performance and SCA-aware DLDO leveraging the unique properties of the Edge-Chasing Quantizer (ECQ). The 65nm DLDO prototype achieves: 1) a 101.7mV droop and 506ns settling time after a 20mA, 0.1ns step load change, with only a 0.1nF capacitor; 2) a 0.018mm² active area and 99.4% peak current efficiency; and 3) more than 14000x improvement in power SCA resistance on a 128b AES engine, with 27.5% area, 19.4% power, and 4.54% performance overheads over a standalone AES, and negligible overhead when compared to an AES design integrated with conventional DLDOs.
The proposed DLDO consists of an ECQ, a second path booster, a digital PI controller, a PMOS switch array, and a 10b output register to eliminate switching glitches (Fig. 25.3.1, top left). Inspired by the edge-pursuit comparator in [10-11], the proposed ECQ quantizes the input-voltage difference using an even-stage ring oscillator, by measuring the time to integrate the difference of two input-voltage-controlled delay paths until reaching a threshold. The time is inversely proportional to the input difference (the voltage error in LDOs), making it a continuously running quantizer with input-dependent sampling frequency and non-linear quantization (Fig. 25.3.1, top right). Both properties are desirable for DLDOs because they allow the DLDO to respond quickly after load changes, and to perform updates slower but more accurately during steady state to save power and reduce steady-state error (Fig. 25.3.1, bottom right). This breaks the direct trade-off between transient performance and power consumption in conventional DLDOs. However, due to integration, a sudden increase in error voltage (from 0 to Emax) requires the same integration time as having a constant 1/2 Emax error voltage, limiting the DLDO’s response speed, especially when the supply voltage is low (<0.9V). To alleviate the ECQ’s integration effects, a second boosting path is designed to boost the current when the droop is larger than a threshold (Fig. 25.3.1, bottom left). The boost current is provided by a PMOS switch, and an RC filter is added to slow down the recovery, in order to avoid abrupt changes in the output voltage.
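The control path described above can be sketched behaviorally. The following is a minimal Python sketch, not the authors' implementation: the coefficient values, the integrator clamp, the signed `error_code` input, and the names `PIController` and `booster_on` are all assumptions made for illustration. Only the PI structure, the 10b output range, and the threshold-triggered second path come from the text.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Event-driven digital PI controller producing a 10b switch-array code."""
    kp: float                 # proportional coefficient (hypothetical value)
    ki: float                 # integral coefficient (hypothetical value)
    integral: float = 0.0
    code_max: int = 1023      # 10b output register: codes 0..1023

    def update(self, error_code: int) -> int:
        """Run one update per ECQ conversion with a signed quantized error.

        How error_code is derived from the ECQ counter is not modeled here.
        """
        self.integral += self.ki * error_code
        # Assumed anti-windup: keep the integrator inside the 10b range.
        self.integral = min(max(self.integral, 0.0), float(self.code_max))
        raw = self.integral + self.kp * error_code
        return int(min(max(round(raw), 0), self.code_max))


def booster_on(v_ref: float, v_out: float, threshold: float) -> bool:
    """Second path: enable the boost switch when the droop exceeds a threshold."""
    return (v_ref - v_out) > threshold
```

In this sketch the controller is called once per quantizer conversion rather than on a fixed clock, which mirrors the input-dependent update rate described in the text.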
The ECQ is implemented with an even-stage ring oscillator (RO) having two NAND gates at opposite positions to inject edges simultaneously [10]. The RO is forced into oscillation until the two injected edges collapse due to different propagation paths. By connecting the current-limiting PMOS of alternating stages to different input voltages, the RO behaves like a comparator. As modeled in [10], the final collapse state is decided by the polarity of the input difference, and the collapse time is inversely proportional to the absolute voltage difference (Fig. 25.3.2, top right). However, since the reference voltage (VREF) and LDO output voltage (VOUT) in a high-performance condition (VOUT close to 1.2V) are too high to directly control the current-limiting transistors, they are first shifted down using a programmable Input Level Shifter (ILS) (Fig. 25.3.2, top left). Configuring the PMOS width in the ILS effectively changes the frequency of the ECQ. In order to further configure the gain and resolution of the quantizer, a tunable resistor (6kΩ to 200kΩ) is inserted between the output nodes of the ILS. To keep the ECQ operating continuously and asynchronously, a self-triggered resetting circuit is designed to detect oscillation collapse and restart the RO, without using an external clock (Fig. 25.3.2, bottom left). Conceptually, oscillation collapse is detected by sampling the RO outputs, UP/DN, with their properly delayed signals (Fig. 25.3.2, bottom right). In practice, the delay is implemented with an RO replica for PVT tracking, and simply delaying the original UP/DN signals fails because the last narrow pulse will disappear after the replica. The circuit shown in Fig. 25.3.2 (bottom left) produces a wider pulse to replace the narrow one for robust detection. Combined with a counter and a trigger generator for the PI controller, an asynchronous, non-linear ECQ with error-dependent conversion time is complete.
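The edge-collapse behavior can be illustrated with a simple integrate-until-collapse model. This is a behavioral sketch under stated assumptions, not a circuit-accurate model: the initial edge spacing (`gap_fs`), the per-cycle closing rate (`gain_fs_per_mv`), and the `max_cycles` cap are hypothetical values chosen so that the arithmetic stays in integers. It only shows that the cycle count at collapse scales inversely with the absolute input difference, and that the collapse direction follows its polarity.

```python
def ecq_convert(v_err_mv: int,
                gap_fs: int = 100_000,
                gain_fs_per_mv: int = 100,
                max_cycles: int = 10_000):
    """Return (direction, cycles) for one ECQ conversion.

    v_err_mv       : VREF - VOUT in millivolts (signed).
    gap_fs         : initial spacing between the two injected edges (assumed).
    gain_fs_per_mv : how much the spacing closes per RO cycle per mV (assumed).
    """
    if v_err_mv == 0:
        # With no input difference the edges never collapse in this model.
        return 0, max_cycles
    gap = gap_fs
    cycles = 0
    while gap > 0 and cycles < max_cycles:
        gap -= gain_fs_per_mv * abs(v_err_mv)  # one edge gains on the other
        cycles += 1
    direction = 1 if v_err_mv > 0 else -1      # +1: VOUT below VREF (UP)
    return direction, cycles
```

With these assumed defaults, a 10mV error collapses after 100 cycles and a 100mV error after 10 cycles, so a larger error yields a faster, coarser conversion, while a small steady-state error yields a slower, finer one.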
Compared to conventional DLDOs, the ECQ-based DLDO also presents stronger obfuscation of power signatures, thanks to its nonlinearity and varying frequency. The ECQ oscillation can also be used as a frequency-changing clock for digital blocks to add temporal distortion. Aside from these inherent randomizations, a resistance randomizer fits inside the ILS to randomize both the resolution and gain of the ECQ (Fig. 25.3.3, left). Lastly, a reference-voltage randomizer, similar to [5], is added with reasonable overhead. 11b LFSRs are employed as the randomizers in the prototype, which is expected to be sufficient because of the randomizers’ complex and difficult-to-invert impact on the current traces. However, a truly random source, such as a digital synthesizable true random number generator like [11], can also be easily deployed with small overhead and sufficient throughput. A 128b parallel AES engine is included for SCA resistance tests.
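An 11b LFSR of the kind mentioned above can be sketched as follows. The text does not give the feedback polynomial or the seed, so this sketch assumes a Fibonacci LFSR with taps at bits 11 and 9, a common maximal-length choice for 11 bits. The names `lfsr11` and `lfsr_period` are illustrative only.

```python
def lfsr11(seed: int = 0x001):
    """Yield successive 11-bit states of a Fibonacci LFSR.

    Assumed taps: bits 11 and 9 (1-indexed from the LSB). The all-zero state
    is excluded because the register would stay stuck at zero.
    """
    if not 0 < seed < 2 ** 11:
        raise ValueError("seed must be a non-zero 11-bit value")
    state = seed
    while True:
        yield state
        feedback = ((state >> 10) ^ (state >> 8)) & 1
        state = ((state << 1) | feedback) & 0x7FF


def lfsr_period(seed: int = 0x001) -> int:
    """Count the steps until the LFSR returns to its starting state."""
    gen = lfsr11(seed)
    first = next(gen)
    count = 1
    for state in gen:
        if state == first:
            return count
        count += 1
```

A maximal-length 11b sequence repeats every 2047 states, which is why the text notes that a true random number generator can be substituted when stronger randomness is required.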
The test chip is fabricated in a 65nm LP process. Fig. 25.3.4 presents the measured DLDO performance. Under a 20mA/100ps load step at nominal condition (VIN = 1.2V, VOUT = 1.15V), the DLDO achieves a minimum voltage droop (Vdroop) of 101.9mV and 180µA quiescent current (IQ) without turning on the second path (Fig. 25.3.4, top left). When VIN is 0.9V, activating the second path booster improves Vdroop by 46% (Fig. 25.3.4, top right). By reconfiguring the ILS and ECQ oscillation frequency, the DLDO can be reconfigured for performance and power trade-offs (Fig. 25.3.4, middle left). The figure also shows that the effect of the second path is limited when the input voltage is high, since the response time of the main path is comparable to that of the second path at high VIN. At 1.2V VIN and 30mA ILOAD, a maximum current efficiency of 99.4% is achieved. The PI controller works as expected, with the integration coefficient (KI) having a greater impact on the DLDO’s settling time (Fig. 25.3.4, bottom left). Load and line regulations of the DLDO demonstrate stable operation across 0.65V to 1.2V variations of VIN (Fig. 25.3.4, bottom right).
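The text does not define the figure of merit quoted in the title. Assuming the widely used DLDO metric FoM = (COUT · Vdroop / ΔILOAD) · (IQ / ΔILOAD), and assuming current efficiency is ILOAD / (ILOAD + IQ) with the same 180µA quiescent current at 30mA load, the measured values above can be checked with a short calculation. Both formulas are assumptions, as are the function names.

```python
def response_time(c_out: float, v_droop: float, i_step: float) -> float:
    """Equivalent response time T_R = C_OUT * V_droop / delta_I, in seconds."""
    return c_out * v_droop / i_step


def figure_of_merit(c_out: float, v_droop: float, i_step: float, i_q: float) -> float:
    """Assumed FoM = T_R * I_Q / delta_I, in seconds (smaller is better)."""
    return response_time(c_out, v_droop, i_step) * i_q / i_step


def current_efficiency(i_load: float, i_q: float) -> float:
    """Assumed definition: fraction of input current delivered to the load."""
    return i_load / (i_load + i_q)


# Values reported in the measurement paragraph (SI units).
C_OUT = 0.1e-9      # 0.1nF output capacitor
V_DROOP = 101.9e-3  # 101.9mV droop
I_STEP = 20e-3      # 20mA load step
I_Q = 180e-6        # 180uA quiescent current

fom_s = figure_of_merit(C_OUT, V_DROOP, I_STEP, I_Q)
eff = current_efficiency(30e-3, I_Q)
print(f"FoM = {fom_s * 1e12:.2f} ps, efficiency = {eff * 100:.1f} %")
```

Under these assumptions the result is roughly 4.6ps and 99.4%, close to the 4.58ps and 99.4% quoted in the paper.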
To evaluate the DLDO’s resistance against power side-channel attacks, 32K traces are collected in each design configuration for Test Vector Leakage Assessment (TVLA), and 7 million traces are collected for Correlation Power Analysis (CPA). The peak t-statistic in TVLA reduces significantly from 82.9 in baseline AES to less than 4.5 (the failure line for TVLA) when the voltage reference randomizer is enabled. CPA extracts the 2nd key byte with 500 traces in the baseline, while no key byte is revealed after 7 million traces for DLDO-AES with both randomizers turned on, achieving a more than 14000x improvement in Minimum Traces to Disclosure (MTD). With both randomizers turned on, the DLDO output has a 50mV ripple, effectively causing digital performance to degrade by 4.54%. The DLDO consumes 330µA current at 1.2V VIN, representing a 19.4% power overhead. The area overhead of the whole DLDO is 27.5% compared to baseline AES. In most cases, however, LDOs are already necessary in SoCs, so the overheads are amortized. Fig. 25.3.6 and Fig. 25.3.7 compare the proposed DLDO with state-of-the-art DLDOs and SCA countermeasures. A die micrograph is in Fig. 25.3.7.
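The two security metrics above can be sketched in a few lines. The text does not describe the TVLA procedure in detail, so this sketch assumes the common fixed-versus-random Welch t-test evaluated at every time sample and compared against the 4.5 threshold. The trace layout (a list of equal-length sample lists) and the function names are assumptions. The MTD ratio simply divides the number of traces survived by the baseline MTD.

```python
from math import sqrt
from statistics import mean, variance

TVLA_THRESHOLD = 4.5  # failure line quoted in the text


def welch_t(group_a, group_b) -> float:
    """Welch's t-statistic between two sets of samples (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def tvla_peak(fixed_traces, random_traces) -> float:
    """Peak |t| across all time samples; each trace is a list of samples."""
    columns = zip(zip(*fixed_traces), zip(*random_traces))
    return max(abs(welch_t(f_col, r_col)) for f_col, r_col in columns)


def mtd_improvement(traces_without_disclosure: int, baseline_mtd: int) -> float:
    """Lower bound on MTD improvement when no key byte has been revealed."""
    return traces_without_disclosure / baseline_mtd
```

For the reported numbers, 7 million traces without disclosure against a 500-trace baseline gives a ratio of 14000, which is a lower bound because the protected design's true MTD was not reached.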
References:
[1] X. Sun et al., "A 0.6-to-1.1V Computationally Regulated Digital LDO with 2.79-Cycle Mean Settling Time and Autonomous Runtime Gain Tracking in 65nm CMOS," ISSCC, pp. 230-232, 2019.
[2] L. G. Salem et al., "A Sub-1.55mV-Accuracy 36.9ps-FOM Digital-Low-Dropout Regulator Employing Switched-Capacitor Resistance," ISSCC, pp. 312-314, 2018.
[3] S. Kundu et al., "A Fully Integrated 40pF Output Capacitor Beat-Frequency-Quantizer-Based Digital LDO with Built-In Adaptive Sampling and Active Voltage Positioning," ISSCC, pp. 308-310, 2018.
[4] D. Kim et al., "A 0.5V-VIN 1.44mA-Class Event-Driven Digital LDO with a Fully Integrated 100pF Output Capacitor," ISSCC, pp. 346-347, 2017.
[5] A. Singh et al., "A 128b AES Engine with Higher Resistance to Power and Electromagnetic Side-Channel Attacks Enabled by a Security-Aware Integrated All-Digital Low-Dropout Regulator," ISSCC, pp. 404-406, 2019.
[6] M. Kar et al., "Reducing Power Side-Channel Information Leakage of AES Engines Using Fully Integrated Inductive Voltage Regulator," IEEE JSSC, vol. 53, no. 8, pp. 2399-2414, Aug. 2018.
[7] D. Das et al., "ASNI: Attenuated Signature Noise Injection for Low-Overhead Power Side-Channel Attack Immunity," TCAS-I, vol. 65, no. 10, pp. 3300-3311, Oct. 2018.
[8] S. Lu et al., "1.32GHz High-Throughput Charge-Recovery AES Core with Resistance to DPA Attacks," IEEE Symp. VLSI Circuits, pp. C246-C247, 2015.
[9] C. Tokunaga et al., "Secure AES Engine with a Local Switched-Capacitor Current Equalizer," ISSCC, pp. 64-65, 2009.
[10] M. Shim et al., "Edge-Pursuit Comparator: An Energy-Scalable Oscillator Collapse-Based Comparator with Application in a 74.1 dB SNDR and 20 kS/s 15 b SAR ADC," IEEE JSSC, vol. 52, no. 4, pp. 1077-1090, April 2017.
[11] K. Yang et al., "16.3 A 23Mb/s 23pJ/b Fully Synthesized True-Random-Number Generator in 28nm and 65nm CMOS," ISSCC, pp. 280-281, 2014.
Figure 25.3.1: Architecture and principles of the proposed DLDO with an Edge-Chasing Quantizer (ECQ) (top); diagrams of the second-path booster (bottom left); and illustrations of the DLDO's load response (bottom right).
Figure 25.3.2: Circuit diagrams and operating waveforms of the edge-chasing quantizer.
Figure 25.3.3: Diagrams of DLDO with added security modules (top); trace measurement setup and flowchart for trace processing (right); measured current traces under different configurations (bottom).
Figure 25.3.4: Measured DLDO step responses (top); droop and quiescent current at different ECQ oscillation frequencies (middle left); current efficiency (middle right); impacts of KP, KI (bottom left); load and line regulation (bottom right).
Figure 25.3.5: TVLA results for different DLDO configurations (32K traces per configuration) (top); CPA results for baseline AES and DLDO with both randomizers turned on (bottom).
Figure 25.3.6: Summary of the DLDO performance and comparison with prior art.
Figure 25.3.7: Comparison table of state-of-the-art defenses against power side-channel attacks and die micrograph of the DLDO.
Figure 25.3.S1: Testing setup using a 20GS/s oscilloscope with a 1GHz-bandwidth active probe, a source meter for measuring power consumption and LabVIEW for chip interface.
Figure 25.3.S2: Testing PCB and its simplified schematic.