Taegeun Yoo’s research while affiliated with Samsung and other places


Publications (32)


A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks
  • Article

November 2022 · 94 Reads · 74 Citations

IEEE Journal of Solid-State Circuits

Chengshuo Yu · Taegeun Yoo · Kevin Tshun Chuan Chai · [...] · Bongjin Kim

In this work, we present a novel 8T static random access memory (SRAM)-based compute-in-memory (CIM) macro for processing neural networks with high energy efficiency. The proposed 8T bitcell is free from disturb issues thanks to decoupled read channels formed by adding two extra transistors to the standard 6T bitcell. A 128 × 128 8T SRAM array offers massively parallel binary multiply and accumulate (MAC) operations with 64× binary inputs (0/1) and 64 × 128 binary weights (+1/–1). After parallel MAC operations, 128 column-based neurons generate 128× 1–5 bit outputs in parallel. The proposed column-based neuron comprises 64× bitcells for dot-product, 32× bitcells for analog-to-digital converter (ADC), and 32× bitcells for offset calibration. The column ADC with 32× replica SRAM bitcells converts the analog MAC results (i.e., a differential read bitline (RBL/RBLb) voltage) to the 1–5 bit output code by sweeping their reference levels in 1–31 cycles (i.e., $2^{N}-1$ cycles for an $N$-bit ADC). The measured linearity results [differential nonlinearity (DNL) and integral nonlinearity (INL)] are +0.314/–0.256 least significant bit (LSB) and +0.27/–0.116 LSB, respectively, after offset calibration. The simulated image classification results are 96.37% for the Mixed National Institute of Standards and Technology database (MNIST) using a multi-layer perceptron (MLP) with two hidden layers, and 87.1%/82.66% for CIFAR-10 using VGG-like/ResNet-18 convolutional neural networks (CNNs), demonstrating slight accuracy degradations (0.67%–1.34%) compared with the software baseline. A test chip with a 16K 8T SRAM bitcell array is fabricated using a 65-nm process. The measured energy efficiency is 490–15.8 TOPS/W for 1–5 bit ADC resolution using 0.45-/0.8-V core supply.
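The reference-sweep conversion described in this abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the authors' circuit: the names `binary_mac` and `sweep_adc` are hypothetical, the reference levels are assumed to be uniformly spaced over a MAC range of −64 to +64, and the differential bitline voltage is replaced by an integer MAC value.

```python
def binary_mac(inputs, weights):
    """Dot product of binary inputs (0/1) with binary weights (+1/-1)."""
    return sum(x * w for x, w in zip(inputs, weights))


def sweep_adc(mac_value, n_bits, full_scale=64):
    """Convert a MAC value to an n_bits code by sweeping reference levels.

    One comparison is made per cycle, for 2**n_bits - 1 cycles in total.
    Uniform reference spacing over [-full_scale, +full_scale] is an
    assumption made for this illustration.
    """
    levels = 2 ** n_bits
    step = 2 * full_scale / levels
    code = 0
    for k in range(1, levels):  # 2^N - 1 comparison cycles
        reference = -full_scale + k * step
        if mac_value >= reference:
            code += 1
    return code


if __name__ == "__main__":
    mac = binary_mac([1, 0, 1, 1], [1, 1, -1, -1])
    print("MAC result:", mac)
    print("5-bit code:", sweep_adc(mac, n_bits=5))
```

For a 5-bit setting the loop runs 31 times, which corresponds to the upper end of the 1–31 cycle range quoted in the abstract.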


A 6T SRAM Based Two-Dimensional Configurable Challenge-Response PUF for Portable Devices

June 2022 · 6 Reads · 10 Citations

IEEE Transactions on Circuits and Systems I Regular Papers

This work proposes a 2-dimensional programmable SRAM-based PUF. The selection of challenge groups, orders, and sequence lengths dominates the responses, with challenge-response pairs (CRPs) on the order of $\text{rows}^{(\text{sequence length}-1)} \times \text{columns}^{(\text{sequence length}-1)}$. The PUF bit cell has split word-lines with vertical and horizontal connections, and the bit-lines are placed orthogonally to generate one-bit data with four cells, so the entropy source is enriched to 24 transistors. The proposed PUF supports multiple data maps from a single chip. A test chip was fabricated in 65 nm CMOS technology. At 0.8 V and 20 °C (nominal point), the bit error rate reaches 3%. In a single chip, the Hamming distance achieves 42.49% within the same group and different orders of challenges, and 47.32% within the different groups of challenges (when the sequence length is 5). The measured inter-Hamming distance between chips is improved to 49.47%.
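The Hamming-distance percentages quoted in this abstract are the kind of figure obtained by comparing response bit strings pairwise. The following is a minimal sketch of that metric, assuming responses are available as equal-length strings of '0' and '1'; the function names and the sample responses are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def hamming_distance_pct(a, b):
    """Percentage of bit positions in which two responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    differing = sum(x != y for x, y in zip(a, b))
    return 100.0 * differing / len(a)


def mean_pairwise_hd(responses):
    """Average Hamming distance (in percent) over all response pairs."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(hamming_distance_pct(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up responses, for illustration only.
    sample = ["0000", "1111", "0011"]
    print(mean_pairwise_hd(sample))
```

A mean value close to 50% indicates that responses are nearly uncorrelated, which is why inter-chip results are usually judged against that ideal.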


Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks

March 2021 · 275 Reads · 129 Citations

IEEE Journal of Solid-State Circuits

This article (Colonnade) presents a fully digital bit-serial compute-in-memory (CIM) macro. The digital CIM macro is designed for processing neural networks with reconfigurable 1–16 bit input and weight precisions based on bit-serial computing architecture and a novel all-digital bitcell structure. A column of bitcells forms a column MAC and is used for computing a multiply-and-accumulate (MAC) operation. The column MACs placed in a row work as a single neuron and compute a dot-product, which is an essential building block of neural network accelerators. Several key features differentiate the proposed Colonnade architecture from the existing analog and digital implementations. First, its fully digital circuit implementation is free from process variation, noise susceptibility, and data-conversion overhead that are prevalent in prior analog CIM macros. A bitwise MAC operation in a bitcell is performed in the digital domain using a custom-designed XNOR gate and a full-adder. Second, the proposed CIM macro is fully reconfigurable in both weight and input precision from 1 to 16 bit. So far, most of the analog macros were used for processing quantized neural networks with very low input/weight precisions, mainly due to a memory density issue. Recent digital accelerators have implemented reconfigurable precisions, but they are inferior in energy efficiency due to significant off-chip memory access. We present a regular digital bitcell array that is readily reconfigured to a 1–16 bit weight-stationary bit-serial CIM macro. The macro computes parallel dot-product operations between the weights stored in memory and inputs that are serialized from LSB to MSB. Finally, the bit-serial computing scheme significantly reduces the area overhead while sacrificing latency due to bit-by-bit operation cycles.
Based on the benefits of digital CIM, reconfigurability, and bit-serial computing architecture, the Colonnade can achieve both high performance and energy efficiency (i.e., both benefits of prior analog and digital accelerators) for processing neural networks. A test chip with 128 × 128 SRAM-based bitcells for digital bit-serial computing is implemented using 65-nm technology and tested with 1–16 bit weight/input precisions. The measured energy efficiency is 117.3 TOPS/W at 1 bit and 2.06 TOPS/W at 16 bit.
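The weight-stationary, LSB-first bit-serial dot product mentioned in this abstract can be sketched in software as follows. This is an illustrative model only: `bit_serial_dot` is a hypothetical name, inputs are assumed to be unsigned integers, and a plain AND-style bit multiplication stands in for the XNOR-gate and full-adder bitcell of the actual macro.

```python
def bit_serial_dot(inputs, weights, input_bits):
    """Dot product computed one input bit-plane per cycle, LSB first.

    Weights stay fixed (weight-stationary); each cycle multiplies the
    current input bit by every weight, sums the products, and adds the
    partial sum to the accumulator with the matching binary weight.
    """
    accumulator = 0
    for bit in range(input_bits):  # cycle 0 handles the LSB
        partial = sum(((x >> bit) & 1) * w for x, w in zip(inputs, weights))
        accumulator += partial * (1 << bit)
    return accumulator


if __name__ == "__main__":
    xs = [3, 5, 2]   # unsigned 3-bit inputs (assumed for illustration)
    ws = [2, -1, 4]  # stationary weights
    print(bit_serial_dot(xs, ws, input_bits=3))
    print(sum(x * w for x, w in zip(xs, ws)))  # direct dot product for comparison
```

The number of loop iterations equals the input precision, which reflects the latency cost of bit-by-bit operation noted in the abstract.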


A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks

November 2020 · 111 Reads · 69 Citations

IEEE Transactions on Circuits and Systems I Regular Papers

A novel 4T2C ternary embedded DRAM (eDRAM) cell is proposed for computing a vector-matrix multiplication in the memory array. The proposed eDRAM-based compute-in-memory (CIM) architecture addresses the well-known von Neumann bottleneck in the traditional computer architecture and improves both latency and energy in processing neural networks. The proposed ternary eDRAM cell takes a smaller area than prior SRAM-based bitcells using 6–12 transistors. Nevertheless, the compact eDRAM cell stores a ternary state (−1, 0, or +1), while the SRAM bitcells can only store a binary state. We also present a method to mitigate the compute accuracy degradation issue due to device mismatches and variations. Besides, we extend the eDRAM cell retention time to 200 μs by adding a custom metal capacitor at the storage node. With the improved retention time, the overall energy consumption of the eDRAM macro, including a regular refresh operation, is lower than that of most prior SRAM-based CIM macros. A 128 × 128 ternary eDRAM macro computes a vector-matrix multiplication between a vector with 64 binary inputs and a matrix with 64 × 128 ternary weights. Hence, 128 outputs are generated in parallel. Note that both weight and input bit-precisions are programmable for supporting a wide range of edge computing applications with different performance requirements. The bit-precisions are readily tunable by assigning a variable number of eDRAM cells per weight or adding multiple pulses to input. An embedded column ADC based on replica cells sweeps the reference level for $2^{N}-1$ cycles and converts the analog accumulated bitline voltage to a 1–5 bit digital output. A critical bitline accumulate operation is simulated (Monte-Carlo, 3K runs). It shows a standard deviation of 2.84% that could degrade the classification accuracy of the MNIST dataset by 0.6% and the CIFAR-10 dataset by 1.3% versus a baseline with no variation.
The simulated energy is 1.81 fJ/operation, and the energy efficiency is 552.5–17.8 TOPS/W (for 1–5 bit ADC) at 200 MHz using 65-nm technology.
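The energy figures in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that 1.81 fJ/operation corresponds to the 1-bit ADC setting and that efficiency falls in proportion to the $2^{N}-1$ sweep cycles; the abstract does not state this scaling explicitly, so treat it as one plausible reading. The function names are hypothetical.

```python
def tops_per_watt(energy_per_op_joules):
    """Convert energy per operation (J) to tera-operations per second per watt."""
    ops_per_joule = 1.0 / energy_per_op_joules
    return ops_per_joule / 1e12


def adc_cycles(n_bits):
    """Number of reference-sweep cycles for an n_bits conversion."""
    return 2 ** n_bits - 1


if __name__ == "__main__":
    one_bit = tops_per_watt(1.81e-15)        # 1.81 fJ per operation
    five_bit = one_bit / adc_cycles(5)       # assumed inverse scaling with cycles
    print(round(one_bit, 1), round(five_bit, 1))
```

Under these assumptions the two results round to the 552.5 and 17.8 TOPS/W endpoints reported above.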




A 137-μW 1.78-mm² 30-Frames/s Real-Time Gesture Recognition SoC for Smart Devices

June 2020 · 25 Reads · 3 Citations

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Gesture recognition has increasingly become one of the most popular human–machine interaction techniques for smart devices. Existing gesture recognition systems suffer from either excessive power consumption or large size, limiting their applications for ultralow-power wearable devices. This article presents an accurate area-efficient and low-power real-time gesture recognition system for smart wearable devices. The proposed work utilizes an accurate peak-based gesture classification engine with less memory and a low-resolution and low-power on-chip image sensor for achieving high area efficiency and low power. In addition, the feature extraction architecture removes fixed-pattern noises from the low-power on-chip image sensor for accuracy improvement and employs parallelism for recognition speed enhancement. Thus, the proposed system accomplishes accurate real-time gesture recognition for eight motion hand gestures with an average recognition accuracy of 90.6% and latency of 4.228 ms. Measurement results of a test chip fabricated in 65-nm CMOS demonstrate that the proposed system consumes 137.0 μW at 30 frames/s while occupying only 1.78 mm², which achieves the lowest power and smallest area among the recently reported gesture recognition systems.


A 0.5 V 8-12 bit 300 KSPS SAR ADC with adaptive conversion time detection-and-control for high immunity to PVT variations
  • Article
  • Full-text available

May 2020 · 99 Reads · 10 Citations

IEEE Access

In this paper, a low-power asynchronous successive approximation register (SAR) analog-to-digital converter (ADC) with process, voltage, and temperature (PVT) compensation is presented. A proposed adaptive conversion time detection-and-control technique enhances the power efficiency, covering wide PVT variations. The proposed detection-and-control technique senses PVT variation in terms of conversion time, and adaptively controls the operation speed and power consumption. For PVT compensation, the proposed architecture includes a local supply/ground voltage. The local supply/ground voltage produces a high |VGS| for transistors in the comparator and capacitive digital-to-analog converter switches, resulting in enhanced operation speed. However, when the PVT condition becomes favorable for the conversion speed, the |VGS| decreases for low power consumption. 30 chips were measured to verify the proposed ADC. With the proposed architecture tested at 10 kHz input frequency, SNDR remained higher than 60 dB at unfavorable conditions such as −9% supply voltage variation or −20 °C temperature variation. On the other hand, at favorable conditions such as +9% supply voltage variation or 80 °C temperature variation, the power consumption of the SAR ADC decreased without performance degradation.


SRAM Radiation Hardening Through Self-Refresh Operation and Error Correction

May 2020 · 56 Reads · 27 Citations

IEEE Transactions on Device and Materials Reliability

In space applications, the scaling of transistors has made integrated circuits (ICs) more susceptible to soft errors caused by radiation strikes. When a soft error causes a bit flip in a memory device, this event is referred to as a Single Event Upset (SEU). Since SEU errors degrade system performance and eventually lead to system failure, the design of radiation-resilient memory is essential. This paper presents a radiation-resilient SRAM with a self-refresh scheme for lowering the number of errors in each row below a threshold number. The proposed self-refresh operation reads out the stored data and performs single error correction using a simple algorithm during its hold/idle mode. A 4KB SRAM test chip in 65nm CMOS technology demonstrates a significant reduction in errors with the self-refresh operation. When the SRAM test chip was exposed to accelerated proton radiation with an energy level of 39.38 MeV, the self-refresh scheme reduced the number of uncorrectable errors by 25× and 8× for fluences of $9.82\times 10^{11}$ particles/cm² and $49.1\times 10^{11}$ particles/cm², respectively.


A 0.506-pJ 16-kb 8T SRAM With Vertical Read Wordlines and Selective Dual Split Power Lines

April 2020 · 74 Reads · 11 Citations

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

This article presents an 8T static random access memory (SRAM) macro with vertical read wordline (RWL) and selective dual split power (SDSP) lines techniques. The proposed vertical RWL reduces dynamic energy consumption during read operation by charging and discharging only selected read bitlines (RBLs). The data-aware SDSP technique combined with vertical write bitlines enhances both the write margin (WM) and the static noise margin (SNM). A 16-kb SRAM test chip fabricated in 65-nm CMOS technology demonstrates the minimum energy consumption of 0.506 pJ at 0.4 V and the minimum operating voltage of 0.26 V.


Citations (23)


... The goal of this paper is to reduce the proportion of error-prone columns and increase operational throughput by introducing offset calibration [6], [7] to PUD. The offset calibration technique, especially common in SRAM-based approaches [7], allocates specific cells in the dedicated rows to counteract the column-specific offsets caused by process variations. ...

Reference:

PUDTune: Multi-Level Charging for High-Precision Calibration in Processing-Using-DRAM
A 65-nm 8T SRAM Compute-in-Memory Macro With Column ADCs for Processing Neural Networks
  • Citing Article
  • November 2022

IEEE Journal of Solid-State Circuits

... They utilized horizontal word lines to connect four cells to generate 1-bit data. They achieved a BER of less than 3% with uniqueness of 49.7% and uniformity of 42.7% by using their design [60]. [61] improved the reliability and randomness of SRAM PUF by introducing a new timing control scheme that incorporates an additional NMOS transistor to address and eliminate the mismatches between the challenge and wordline inputs to both inverter arrays of the SRAM. ...

A 6T SRAM Based Two-Dimensional Configurable Challenge-Response PUF for Portable Devices
  • Citing Article
  • June 2022

IEEE Transactions on Circuits and Systems I Regular Papers

... By virtue of using 6T SRAM, our proposed design consumes a lower area. Kim et al. [13] propose a digital XAC architecture with adder trees and XNOR gates, however, this design incurs much higher area overhead. Our proposed designs are energy efficient and also incur low latency. ...

Colonnade: A Reconfigurable SRAM-Based Digital Bit-Serial Compute-In-Memory Macro for Processing Neural Networks
  • Citing Article
  • March 2021

IEEE Journal of Solid-State Circuits

... Emerging memories that store weight values in neural networks have been widely used in previous research on neuromorphic architectures. Volatile memories (VM) such as SRAM [7]-[12] and embedded DRAM (eDRAM) [13]-[15] are forming neuromorphic architectures to perform energy-efficient MAC operations. However, due to their volatile characteristic, data is lost when the power is turned off, making it a disadvantage as weight data of the components cannot be preserved even after training. ...

A Logic-Compatible eDRAM Compute-In-Memory With Embedded ADCs for Processing Neural Networks
  • Citing Article
  • November 2020

IEEE Transactions on Circuits and Systems I Regular Papers

... For a diverse range of sensors with an integrated ADC, there are inevitable challenges to achieving high performance, where it is essential to employ an ADC with high linearity and high speed [10,11,12,13,14,15,16,17,18,19]. However, the nonlinearity caused by process, voltage, and temperature (PVT) variations can impede the ADC performance, and thus research efforts have been undertaken to conduct on-chip calibration to address the nonlinearity of ADCs [20,21,22,23,24,25,26,27,28,29]. [25] addresses the on-chip calibration of the capacitive digital-to-analog converter (CDAC) in a successive-approximation (SAR) ADC. ...

A 0.5 V 8-12 bit 300 KSPS SAR ADC with adaptive conversion time detection-and-control for high immunity to PVT variations

IEEE Access

... By leveraging off-chain storage solutions, the system can significantly reduce the storage burden on RSUs. Data is stored in a distributed manner across multiple nodes, with blockchain technology ensuring data integrity and consistency through consensus mechanisms [34,35]. This decentralized approach not only alleviates the storage demands on individual RSUs but also enhances the network's resilience by preventing data bottlenecks and ensuring that data remains accessible even if some nodes fail [36]. ...

SRAM Radiation Hardening Through Self-Refresh Operation and Error Correction
  • Citing Article
  • May 2020

IEEE Transactions on Device and Materials Reliability

... As shown in Fig. 2a, the two extra transistors are parallel with access transistors, which is different from decoupled 8T-SRAM [41]. When the 8T-SRAM bitcell is Table 1 shows the 8T-SRAM with double-wordline write operation has better WSNM, short write delay and write-trip-point (WTP) at different process corners. ...

A 16K Current-Based 8T SRAM Compute-In-Memory Macro with Decoupled Read/Write and 1-5bit Column ADC
  • Citing Conference Paper
  • March 2020

... First, previous designs in 11T [20] and 12T [18] have solved the half-select disturbance but resulted in limited write static-noise-margin (WSNM) due to partially shared transistors for both read and write ports. Although designs in [21], [22], and [23] enhance the half-select cell stability and WSNM using the write assist scheme with split power rails, they fail to eliminate the half-select disturbance. Second, designs in [16], [24], [25], and [26] eliminate the write half-select disturbance by adopting a row/column cross-point wordline write structure but cause degraded write-ability due to limited write current through series-connected access transistors. ...

A 0.506-pJ 16-kb 8T SRAM With Vertical Read Wordlines and Selective Dual Split Power Lines
  • Citing Article
  • April 2020

IEEE Transactions on Very Large Scale Integration (VLSI) Systems

... Many approaches to Logic-In-Memory can be found in literature; however, two main approaches can be distinguished. The first one can be classified as Near-Memory Computing (NMC) [8,9,12,14,16,20,23,24,26,27,11,13,25,31,32,33,7], since the memory inner array is not modified and logic circuits are added at the periphery of this; the second one can be instead denoted as Logic-in-Memory (LiM) [10,18,19,21,22,15,28,29,36,30], since the memory cell is directly modified by adding logic circuits to it. ...

A Bit-Precision Reconfigurable Digital In-Memory Computing Macro for Energy-Efficient Processing of Artificial Neural Networks
  • Citing Conference Paper
  • October 2019

... Digital CIM architectures integrate computing circuits within the memory array to perform MAC operations and can perform loss-free accumulations [12]-[15]. Such CIM macros provide full accuracy since the bit-width of accumulated output can be set adequately long to support the largest possible sum. ...

A 1-16b Precision Reconfigurable Digital In-Memory Computing Macro Featuring Column-MAC Architecture and Bit-Serial Computation
  • Citing Conference Paper
  • September 2019