Francois Atallah’s research while affiliated with Qualcomm and other places

Publications (11)


A 7nm Leakage-Current-Supply Circuit for LDO Dropout Voltage Reduction
  • Conference Paper

June 2019 · 65 Reads · 2 Citations

Keith Bowman · Samantak Gangopadhyay · Francois Atallah · [...]


A 7-nm 6R6W Register File With Double-Pumped Read and Write Operations for High-Bandwidth Memory in Machine Learning and CPU Processors

December 2018 · 46 Reads · 6 Citations

IEEE Solid-State Circuits Letters

A 7-nm register file (RF) with a 16-transistor (16T) 3-read and 3-write (3R3W) bitcell double pumps, or time multiplexes, the read and write access ports twice per clock cycle to achieve 6-read and 6-write (6R6W) operations per cycle for high-bandwidth (BW) on-die memory in high-performance machine learning and CPU processors. From silicon test-chip measurements at 0.9 V, the double-pumped (DP) 6R6W RF trades off a 19% lower maximum clock frequency (FMAX) for 2× the number of read and write operations per cycle, resulting in a 62% higher memory BW compared to a conventional single-access (SA) 3R3W RF.
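The 62% figure follows from the two measured numbers quoted in the abstract. A minimal sketch of that arithmetic, assuming memory bandwidth scales as accesses per cycle times clock frequency; the function name and the normalization to the SA design are illustrative and do not come from the paper:

```python
def relative_bandwidth(fmax_ratio: float, ops_ratio: float) -> float:
    """Bandwidth relative to a baseline, assuming BW ~ (ops per cycle) * (clock frequency)."""
    return fmax_ratio * ops_ratio


# Double-pumped 6R6W versus single-access 3R3W, using the abstract's numbers:
# 19% lower FMAX and twice the read/write operations per cycle.
dp_vs_sa = relative_bandwidth(fmax_ratio=1.0 - 0.19, ops_ratio=2.0)
gain_percent = (dp_vs_sa - 1.0) * 100.0
print(f"{gain_percent:.0f}% higher bandwidth")
```

Under this assumption the DP design delivers 0.81 × 2 = 1.62 times the SA bandwidth, which matches the reported 62% gain.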




Adaptive Power Gating of 32-bit Kogge Stone Adder

December 2015 · 44 Reads · 8 Citations

Integration

Static power consumes a significant portion of the available power budget. Consequently, leakage current reduction techniques such as power gating have become necessary. Standard global power gating approaches are an effective method to reduce idle leakage current; however, global power gating does not consider partially idle circuits and imposes significant delay and routing constraints. An adaptive power gating technique is applied locally to a 32-bit Kogge Stone adder and evaluated at the 16 nm FinFET technology node. This high-granularity adaptive power gating approach employs a local controller to lower energy use and reduce circuit overhead. The controller conserves additional power when the circuit is partially idle (based on the inputs to the adder) by adaptively powering down inactive blocks. Moreover, the local controller reduces routing complexity since a global power gating signal is not required. The proposed adaptive power gating technique exhibits significant energy savings, ranging from 8% to 21%. This technique targets partially idle circuits, and therefore complements rather than replaces global power gating techniques. A 12% delay overhead results in a 5% area overhead. This delay overhead is reduced to 5% by increasing the area overhead to 16%, and can be further reduced by trading off additional area.


A 16 nm All-Digital Auto-Calibrating Adaptive Clock Distribution for Supply Voltage Droop Tolerance Across a Wide Operating Range

September 2015 · 149 Reads · 41 Citations

IEEE Journal of Solid-State Circuits

A 16 nm all-digital auto-calibrating adaptive clock distribution (ACD) enhances processor core performance and energy efficiency by mitigating the adverse effects of high-frequency supply voltage (VDD) droops. The ACD integrates a tunable-length delay (TLD) prior to the global clock distribution to prolong the clock-data delay compensation in core paths for multiple cycles after a droop occurs, providing a sufficient response time for clock frequency (FCLK) adaptation. A dynamic variation monitor (DVM) detects the onset of the droop and interfaces with an adaptive control unit and clock divider to reduce FCLK by half at the TLD output to avoid path timing-margin failures. An auto-calibration circuit enables in-field, low-latency tuning of the DVM to accurately detect VDD droops across a wide range of operating conditions. The auto-calibration circuit maximizes the VDD-droop tolerance of the ACD while eliminating the overhead from tester calibration. From 109 die measurements across a wafer, the auto-calibrating ACD recovers a minimum of 90% of the throughput loss due to a 10% VDD droop in a conventional design for 100% of the dies. ACD measurements demonstrate simultaneous throughput gains and energy reductions ranging from 13% and 5% at 0.9 V to 30% and 13% at 0.6 V, respectively.
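The detect-then-divide behavior described in this abstract can be illustrated with a toy behavioral model. This is only a sketch under strong assumptions: the monitor is modeled as a simple voltage threshold, the function name, parameters, and trace values are hypothetical, and the tunable-length delay and auto-calibration circuit are not modeled at all.

```python
from typing import List


def adapt_clock(vdd_trace: List[float], nominal_vdd: float,
                droop_threshold: float, fclk: float) -> List[float]:
    """Return the clock frequency applied in each cycle of a VDD trace.

    Hypothetical model: a droop is flagged when VDD falls more than
    `droop_threshold` (a fraction of nominal) below `nominal_vdd`, and the
    clock divider halves FCLK for every flagged cycle.
    """
    trip_level = nominal_vdd * (1.0 - droop_threshold)
    return [fclk / 2.0 if vdd < trip_level else fclk for vdd in vdd_trace]


# Illustrative per-cycle VDD samples (volts); not measured data.
trace = [0.90, 0.89, 0.80, 0.82, 0.90]
print(adapt_clock(trace, nominal_vdd=0.90, droop_threshold=0.05, fclk=2.0e9))
```

In this toy model only the third and fourth cycles fall below the trip level, so only those cycles run at half frequency. The actual design relies on the delay inserted ahead of the clock distribution to give the control logic time to respond, which a per-cycle threshold check does not capture.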



A 16nm auto-calibrating dynamically adaptive clock distribution for maximizing supply-voltage-droop tolerance across a wide operating range

March 2015 · 80 Reads · 15 Citations

System-on-chip (SoC) processor cores experience high-frequency supply voltage (VDD) droops when the current in the power delivery network abruptly changes in response to workload variations, thus degrading performance and energy efficiency. Previous adaptive circuit techniques aim to reduce the effects of VDD droops by sensing the VDD variation with an on-die monitor and adjusting the clock frequency (FCLK) [1-2] or by directly modulating the phase-locked loop (PLL) clock output with changes in the core VDD to implicitly adapt FCLK [3]. The adaptive response time and complex analog circuits limit the benefits of these techniques for a wide range of FCLK and VDD operating conditions. The adaptive clock distribution (ACD) [4-5] exploits the path clock-data delay compensation during a VDD droop to enable a sufficient response time to proactively adapt FCLK. Although the ACD mitigates the impact of VDD droops on performance and energy efficiency, the previous designs require extensive post-silicon tester calibration of the dynamic variation monitor (DVM) to accurately detect the onset of the VDD droop. Since SoC cores operate across a wide range of FCLK, VDD, temperature, and process conditions, the DVM requires a unique calibration for each operating point, thus resulting in prohibitively expensive test time for high-volume products. This paper describes an ACD design in a 16nm [6] test chip with an auto-calibration circuit to enable in-field, low-latency tuning of the DVM across a wide range of operating conditions to maximize the ACD benefits, while eliminating the costly overhead from tester calibration.



Citations (8)


... A simpler way to add multiple ports is to modify the memory cell with additional wordlines and bitlines. Many works [1]- [3] have been proposed for dual port work, where one port is used for read and another for write operation and [4]- [8] presented multi-port memory designs. However, these ports are hard-bounded and can not change operations once fabricated. ...

Reference:

Configurable Multi-Port Memory Architecture for High-Speed Data Communication
A 7-nm 6R6W Register File With Double-Pumped Read and Write Operations for High-Bandwidth Memory in Machine Learning and CPU Processors
  • Citing Article
  • December 2018

IEEE Solid-State Circuits Letters

... To address these challenges, variation-aware voltage and frequency regulators have been proposed over the past few years [62]- [68]. Due to their main operation principle, i.e., regulating the voltage and frequency simultaneously, they are broadly known as UVFRs. ...

19.3 A 7nm All-Digital Unified Voltage and Frequency Regulator Based on a High-Bandwidth 2-Phase Buck Converter with Package Inductors
  • Citing Conference Paper
  • February 2019

... Liu et al. [14] used SiGe heterojunction bipolar transistor technology for ultrahigh speed optimization of the Regfile circuit to shorten the execution cycle of each instruction in the processor and increase the computational speed. Nguyen et al. [15,16] and Yantir et al. [17] used a double pump technology for Regfile optimization to improve the throughput and operation speed under the conventional CMOS technology. However, the high-speed design leads to the degradation of the reliability of the Regfile circuit, especially as the operation speed becomes increasingly faster with faster reliability becoming more and more important, so it is also essential to face the design of the Regfile reliability. ...

A 7NM Double-Pumped 6R6W Register File for Machine Learning Memory
  • Citing Conference Paper
  • June 2018

... Moreover, recent publications, e.g., [35,41] have reported greater PMOS than NMOS I dsat values for some devices. This has led to the suggestion that PMOS rather than NMOS bit cell access transistors may be preferred in such advanced technologies [84]. ...

A 16nm configurable pass-gate bit-cell register file for quantifying the VMIN advantage of PFET versus NFET pass-gate bit cells
  • Citing Conference Paper
  • September 2015

... An adaptive power gating technique is applied locally to a 32-bit Kogge Stone adder, and evaluated at the 16 nm FinFET technology node. This high granularity adaptive power gating approach employs a local controller to lower energy use and reduce circuit overhead [10]. In [13], the authors present a radix-4 static CMOS full adder circuit that reduces the propagation delay, PDP, and EDP in carry-based adders compared with using a standard radix-2 full adder solution. ...

Adaptive Power Gating of 32-bit Kogge Stone Adder
  • Citing Article
  • December 2015

Integration

... We summarize code parameters such as multipliers and bit assignments in Table 1. Chipkill) or for on-die caches where SEC-DED codes are commonly used to allow power savings of memory subsystem [38,41,51]. This code has perfect single error correction ability, and has double error detection rate of 77.88%. ...

Exploiting error-correcting codes for cache minimum supply voltage reduction while maintaining coverage for radiation-induced soft errors
  • Citing Conference Paper
  • September 2014

... Digital detectors typically translate droops to delay variations of logic gates. Some of these detectors utilize a delay line whose delay is dependent on the ac Vdd level [4]- [8], while others use a ring oscillator [9], [10] to modulate the Vdd droops on its output frequency. These solutions are fully digital and therefore simpler to design and implement, and provide a high-resolution indication of the supply level. ...

A 16nm auto-calibrating dynamically adaptive clock distribution for maximizing supply-voltage-droop tolerance across a wide operating range
  • Citing Article
  • March 2015

... Therefore, quantifying the impact of fast voltage-noise and the efficacy of mitigation features such as adaptive-clocking [21,22] require fine-grained temporal resolution in power-tracing [23,24,25], where a sample exists for every CPU cycle (per-cycle temporal resolution). ...

A 16 nm All-Digital Auto-Calibrating Adaptive Clock Distribution for Supply Voltage Droop Tolerance Across a Wide Operating Range
  • Citing Article
  • September 2015

IEEE Journal of Solid-State Circuits