Bongjin Kim’s research while affiliated with Korea Advanced Institute of Science and Technology and other places


Publications (52)


A Graph-Based Accelerator of Retinex Model With Bit-Serial Computing for Image Enhancements
  • Article

January 2025 · 3 Reads

IEEE Transactions on Circuits and Systems I Regular Papers

Zhengzhe Wei · [...] · Bongjin Kim

This work proposes the Poisson equation formulation of the Retinex model for image enhancements using a low-power graph hardware accelerator performing finite difference updates on a lattice graph processing element (PE) array. By encapsulating the underlying algorithm in a graph hardware structure, a highly localized dataflow that takes advantage of the physical placement of the PEs is enabled to minimize data movement and maximize data reuse. The on-chip dataflow that achieves data sharing and reuse among neighboring PEs during massively parallel updates is generated in each PE, driven by two external control signals. Using a custom accumulator design intended for bit-serial computing, this work enables precision on demand and extensive on-chip data reuse with minimal area overhead, accommodating a non-overlap image mapping scheme in which one 20×20 image tile at a time can be processed without external memory access. With an increasing user-configurable update count, image noise and shadow can be progressively removed, with an inevitable loss of image details. Fabricated using a 65-nm technology, the test chip occupies a 0.2955-mm² core area and consumes 2.191 mW operating at 1 V, 25.6 MHz, and a reconfigurable 10- or 14-bit precision.
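The finite difference updates on a lattice mentioned in the abstract can be illustrated with a small software sketch. This is not the chip's dataflow or its Retinex formulation: it is a plain Jacobi iteration for a discrete Poisson equation on one tile, and the function name `jacobi_poisson`, the zero boundary values, and the grid spacing `h` are all assumptions made for illustration.

```python
def jacobi_poisson(f, iterations, h=1.0):
    """Jacobi finite-difference updates for laplacian(u) = f on a 2-D tile.

    Assumption: boundary cells are held at zero (Dirichlet condition).
    """
    rows, cols = len(f), len(f[0])
    u = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_u = [row[:] for row in u]
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Each cell only needs its four lattice neighbours,
                # which is what makes the dataflow highly local.
                neighbours = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                new_u[i][j] = 0.25 * (neighbours - h * h * f[i][j])
        u = new_u
    return u


if __name__ == "__main__":
    # Hypothetical 5x5 tile with a single source term in the middle.
    source = [[0.0] * 5 for _ in range(5)]
    source[2][2] = 1.0
    solution = jacobi_poisson(source, iterations=200)
    print(solution[2][2])
```

The `iterations` argument plays the role of a configurable update count: more updates bring the tile closer to the converged solution of the discrete equation.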



FlexSpin: A CMOS Ising Machine With 256 Flexible Spin Processing Elements With 8-b Coefficients for Solving Combinatorial Optimization Problems

August 2024 · 44 Reads · 5 Citations

IEEE Journal of Solid-State Circuits

Combinatorial optimization problems (COPs) are essential in various applications, including data clustering, supply chain management, and communication networks. Many real-world COPs are non-deterministic polynomial-time hard (NP-hard) problems that are intractable on classical computers. The Ising machine, a hardware accelerator based on the Ising model and annealing operation, has gained much attention as an alternative for solving COPs. The COPs are mapped to the Ising model, and their optimal or near-optimal solutions are explored through the intrinsic convergence property of the Ising machine. However, prior Ising machines based on locally connected spins have limitations in solving hard COPs due to the significant overhead of mapping the Ising model to an inflexible hardware topology. In this work, we propose a scalable CMOS Ising machine with a network of flexible processing elements (PEs) to map and solve complex COPs with minimal overhead. The proposed Ising machine implements 256 PEs, where each PE is reconfigured to 1-to-4 spins with 28 spin interactions based on 8-bit coefficients. A 65-nm prototype chip has been fabricated, and a range of COPs have been mapped and solved, including max-cut and Boolean satisfiability problems.
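To make the mapping of a COP to the Ising model concrete, here is a minimal software sketch for max-cut. It is not the FlexSpin architecture: the energy convention (coupling equal to the negated edge weight), the geometric cooling schedule, and all function names are assumptions chosen for illustration, and the annealer is ordinary simulated annealing.

```python
import math
import random


def ising_energy(edges, spins):
    """H = sum of w_ij * s_i * s_j, so minimizing H maximizes the cut."""
    return sum(w * spins[i] * spins[j] for i, j, w in edges)


def cut_value(edges, spins):
    """Total weight of edges whose endpoints carry opposite spins."""
    return sum(w for i, j, w in edges if spins[i] != spins[j])


def anneal(edges, n, sweeps=200, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over n spins; returns the best spin vector seen."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(n)]
    nbrs = [[] for _ in range(n)]
    for i, j, w in edges:
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    best = spins[:]
    for s in range(sweeps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (s / max(1, sweeps - 1))
        for i in range(n):
            local = sum(w * spins[j] for j, w in nbrs[i])
            delta = -2 * spins[i] * local  # energy change if spin i flips
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                spins[i] = -spins[i]
        if ising_energy(edges, spins) < ising_energy(edges, best):
            best = spins[:]
    return best


if __name__ == "__main__":
    # Hypothetical 4-node ring with unit weights.
    ring = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
    result = anneal(ring, 4)
    print(result, cut_value(ring, result))
```

Under this convention the cut weight and the energy are tied by cut = (W − H) / 2, where W is the total edge weight, which is why a ground-state search also answers the max-cut question.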


A Dual 7T SRAM-Based Zero-Skipping Compute-In-Memory Macro With 1-6b Binary Searching ADCs for Processing Quantized Neural Networks

August 2024 · 19 Reads

IEEE Transactions on Circuits and Systems I Regular Papers

This article presents a novel dual 7T static random-access memory (SRAM)-based compute-in-memory (CIM) macro for processing quantized neural networks. The proposed SRAM-based CIM macro decouples read/write operations and employs a zero-input/weight skipping scheme. A 65-nm test chip with 528×128 integrated dual 7T bitcells demonstrated reconfigurable-precision multiply-and-accumulate operations with 384× binary inputs (0/1) and 384×128 programmable multi-bit weights (3/7/15 levels). Each column comprises 384× bitcells for a dot product, 48× bitcells for offset calibration, and 96× bitcells for binary-searching analog-to-digital conversion. The analog-to-digital converter (ADC) converts a voltage difference between two read bitlines (i.e., an analog dot-product result) to a 1-6b digital output code using binary searching in 1-6 conversion cycles using replica bitcells. The test chip with 66Kb embedded dual SRAM bitcells was evaluated for processing neural networks, including MNIST image classification using a multi-layer perceptron (MLP) model with a layer configuration of 784-256-256-256-10. The measured classification accuracies are 97.62%, 97.65%, and 97.72% for the 3-, 7-, and 15-level weights, respectively. The accuracy degradations are only 0.58 to 0.74% off the baseline with software simulations. For the VGG6 model using the CIFAR-10 image dataset, the accuracies are 88.59%, 88.21%, and 89.07% for the 3-, 7-, and 15-level weights, with degradations of only 0.6 to 1.32% off the software baseline. The measured energy efficiencies are 258.5, 67.9, and 23.9 TOPS/W for the 3-, 7-, and 15-level weights, respectively, measured at 0.45/0.8V supplies.
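The idea of resolving one output bit per conversion cycle by binary searching can be sketched behaviourally. The model below is an assumption-laden illustration, not the macro's circuit: it treats the input as an unsigned voltage in the range 0 to `v_ref` and compares it against ideal thresholds, whereas the chip works on a bitline voltage difference and uses replica bitcells. The name `binary_search_adc` is hypothetical.

```python
def binary_search_adc(v_in, bits, v_ref=1.0):
    """Resolve `bits` output bits in `bits` cycles, most significant bit first.

    Assumptions: unsigned input in [0, v_ref) and ideal comparison thresholds.
    """
    code = 0
    for k in reversed(range(bits)):
        trial = code | (1 << k)                    # tentatively set bit k
        threshold = trial * v_ref / (1 << bits)
        if v_in >= threshold:                      # keep the bit if the input is above
            code = trial
    return code


if __name__ == "__main__":
    # The same input converted at 1-bit to 6-bit precision.
    for n in range(1, 7):
        print(n, binary_search_adc(0.3, n))
```

Because each loop pass fixes exactly one bit, the number of cycles equals the requested precision, which matches the 1-6 conversion cycles for a 1-6b code described in the abstract.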


A Scalable and Reconfigurable Bit-Serial Compute-Near-Memory Hardware Accelerator for Solving 2-D/3-D Partial Differential Equations

August 2024 · 9 Reads · 2 Citations

IEEE Journal of Solid-State Circuits

This work presents a digital hardware accelerator with compute-near-memory to solve 2-D and 3-D partial differential equations (PDEs) using the finite difference method (FDM). The proposed hardware accelerator is reconfigured to solve 2-D/3-D Laplace and Poisson equations, and it scales to solve larger 2-D problems with no additional overhead. The reconfigurable and scalable architecture is implemented by building a 16×16 near-memory bit-serial processing element (PE) array and four 16× boundary PEs with 92-kb static random-access memory (SRAM) distributed in the PE array. The proposed near-memory bit-serial computing architecture reduces data movement and achieves higher energy efficiency than the conventional von Neumann architecture. The bit-serial computing architecture allows the PEs to communicate with neighbors via a minimal communication bandwidth (1 bit). The proposed hardware accelerator finds numerical solutions to 2-D PDEs (with up to a 64×64 grid size) and 3-D PDEs (with up to a 16×16×16 grid size) using FDM. A prototype chip is fabricated using 65 nm, occupying a 1.78-mm² die area. The measured energy to solve the 2-D/3-D PDE for updating an entire grid is 0.7 nJ/1.14 nJ at 1 V and 25.6 MHz.
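Bit-serial computing with a 1-bit link between neighbors can be pictured as an adder that consumes one bit of each operand per cycle. The following sketch is a generic least-significant-bit-first serial adder, offered under the assumption that it resembles the kind of arithmetic involved; it is not taken from the paper, and the name `bit_serial_add` is hypothetical.

```python
def bit_serial_add(a, b, bits):
    """Add two unsigned integers one bit per cycle, least significant bit first.

    Returns (sum modulo 2**bits, carry_out). A single carry flag is the only
    state kept between cycles.
    """
    carry = 0
    result = 0
    for k in range(bits):
        a_k = (a >> k) & 1            # the one bit arriving this cycle
        b_k = (b >> k) & 1
        s = a_k ^ b_k ^ carry         # full-adder sum bit
        carry = (a_k & b_k) | (carry & (a_k ^ b_k))
        result |= s << k
    return result, carry


if __name__ == "__main__":
    print(bit_serial_add(5, 6, 4))
    print(bit_serial_add(9, 8, 4))
```

The `bits` argument sets how many cycles are spent, so precision trades directly against latency while the wiring stays one bit wide.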





A Time-Domain Wavefront Computing Accelerator With a 32×32 Reconfigurable PE Array

August 2023 · 17 Reads · 2 Citations

IEEE Journal of Solid-State Circuits

This work presents a hardware accelerator realizing true time-domain wavefront computing in a massively parallel two-dimensional (2-D) processing element (PE) array. The proposed 2-D time-domain PE array is designed for multiple applications based on its scalable and reconfigurable architecture. The shortest path problem (a classical problem in graph theory) is one of the critical problems to solve using the proposed accelerator. Unlike the A* search algorithm, a heuristic method widely used in shortest path searching problems, the proposed accelerator requires only the propagation of rising-edge signals through the PE array without calculating or estimating the distances from the start to the goal. Hence, a single execution of the proposed time-domain wavefront computing provides all the optimal paths from a start point to an arbitrary goal. Besides the King’s graph model used for solving the shortest path searching, the PE array is reconfigured to a simpler lattice graph model and solves other problems, such as the maze solving used in this article as a benchmark. In addition, we used the proposed accelerator to demonstrate a scientific simulation. The propagation of circular or planar wavefronts was simulated using single or multiple start points using King’s graph configuration. A 1×1 mm² test chip with a 32×32 reconfigurable time-domain PE array is fabricated using a 65-nm process. For a 2-D map with 32×32 vertices, the proposed PE array consumes 776 pJ per task and achieves a 1.6 G edges/second search rate using 1.2-/1.0-V core supply voltages.
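A software analogue of wavefront propagation is a breadth-first expansion in which each cell records the time the first "edge" reaches it. The sketch below assumes a unit delay per hop and blocked cells marked with 1; neither detail comes from the abstract, and the names `wavefront`, `KING`, and `LATTICE` are illustrative only.

```python
from collections import deque

# Neighbour offsets: King's graph (8 neighbours) and lattice graph (4 neighbours).
KING = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
LATTICE = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def wavefront(grid, start, moves=KING):
    """Return the arrival time of the wavefront at every cell (None if unreachable).

    Assumptions: grid[r][c] == 0 means open, anything else is blocked, and
    every hop takes one time unit.
    """
    rows, cols = len(grid), len(grid[0])
    arrival = [[None] * cols for _ in range(rows)]
    arrival[start[0]][start[1]] = 0
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in moves:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and arrival[nr][nc] is None):
                # The first arrival wins; later arrivals are ignored.
                arrival[nr][nc] = arrival[r][c] + 1
                frontier.append((nr, nc))
    return arrival


if __name__ == "__main__":
    # Hypothetical 3x3 maze with a wall in the middle column.
    maze = [[0, 1, 0],
            [0, 1, 0],
            [0, 0, 0]]
    print(wavefront(maze, (0, 0), KING)[0][2])
    print(wavefront(maze, (0, 0), LATTICE)[0][2])
```

One pass fills in the arrival time for every reachable cell, so the shortest distance to any goal can be read off afterwards without a heuristic, which mirrors the contrast with A* drawn in the abstract.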



Citations (36)


... Ising solvers map COPs to the Ising Hamiltonian and mimic quantum spin dynamics to perform a ground state search [1], [5]. Existing solvers utilize networks of coupled spins [6]- [8], stochastic neural networks (NNs) [3], or simulated annealing techniques [2], [5], [9]. However, long convergence times (> 100µs) [2], [3], [6] as well as poor energy efficiency [2], [6] make these approaches insufficient for optimization at the edge. ...

Reference:

A 10.8mW Mixed-Signal Simulated Bifurcation Ising Solver using SRAM Compute-In-Memory with 0.6us Time-to-Solution
15.6 e-Chimera: A Scalable SRAM-Based Ising Macro with Enhanced-Chimera Topology for Solving Combinatorial Optimization Problems Within Memory
  • Citing Conference Paper
  • February 2024

... To alleviate the limitations of the aforementioned algorithms, which result from inflexible classical logic 28,[34][35] or excessively relaxed logical constraints 36 , we propose the transient hyperlogic circuit. The concept of hyperlogic was first introduced by Priest 37 to handle logic paradoxes. The truth table is shown in Fig. 1a. ...

30.3 VIP-Sat: A Boolean Satisfiability Solver Featuring 5×12 Variable In-Memory Processing Elements with 98% Solvability for 50-Variables 218-Clauses 3-SAT Problems
  • Citing Conference Paper
  • February 2024

... By mapping QUBO problems to the Ising formulation, the QUBO could be solved directly on the Ising machine hardware. Due to the applicability and efficiency of the Ising machine, hardware structures using MOS devices (e.g., [7][8][9][10][11]), memristor-based systems (e.g., [12]), quantum devices (e.g., [13]), classical superconductor devices (e.g., [14]), and optical devices (e.g., [15][16][17]) have been suggested. ...

FlexSpin: A CMOS Ising Machine With 256 Flexible Spin Processing Elements With 8-b Coefficients for Solving Combinatorial Optimization Problems
  • Citing Article
  • August 2024

IEEE Journal of Solid-State Circuits

... Most of the previous works are digital annealers and Ising computers implemented using CPU and GPU [7], optics [40] and FPGA [9], [38], [41], [42]. While there are many other implementations of FPGA-based and ASIC-based digital annealers in recent literature [43]- [45], we only include those which have demonstrated G-Set benchmarks for fair comparison. [13] is a CPU-based demonstration of G-Set max-cut with probabilistic computing. ...

CTLE-Ising: A Continuous-Time Latch-Based Ising Machine Featuring One-Shot Fully Parallel Spin Updates and Equalization of Spin States
  • Citing Article
  • January 2023

IEEE Journal of Solid-State Circuits

... In this line of research, [39] investigated the enhancement of cubic interactions in the Bistable Resistively-Coupled Ising Machine (BRIM) through additional architectural support, [40] implemented third-order interactions in a large-scale FPGA-based p-computer using hypergraph coloring to achieve parallelism, and [41] proposed a reconfigurable higher-order Ising machine that utilizes SRAM to store spin interaction coefficients and employs a multiply-and-accumulate (MAC) unit for spin operations. However, none of these solutions cater to OIMs as opposed to ours, and the operation cost is higher as these implementations rely on multi-cycle spin operations, which is not feasible for larger problems. ...

A Reconfigurable CMOS Ising Machine With Three-Body Spin Interactions for Solving Boolean Satisfiability With Direct Mapping
  • Citing Article
  • January 2023

IEEE Solid-State Circuits Letters

... If these cooling requirements can be removed, these techniques could become very lucrative for future high-performance mobile computing systems, but there is still much research needed to reach that goal. [1,[14][15][16]82], Adiabatic [53,62], Alternative [55,81,83,84], Analog [31,75,[85][86][87][88][89], IMC [40,61,79,90,91]). ...

282-to-607 TOPS/W, 7T-SRAM Based CiM with Reconfigurable Column SAR ADC for Neural Network Processing
  • Citing Conference Paper
  • May 2023

... Recent CMOS-based iterative Ising machines exhibit room-temperature operation while offering design simplicity and manufacturing scalability. These iterative systems are architecturally categorized into discrete-time (DT) [8], [9], [10], [11], [12] or continuous-time (CT) [7], [13], [14], [15] types. The DT Ising machines utilize embedded SRAM arrays for spin states storage and dedicated logic circuits for spin interactions. ...

A Continuous-Time Ising Machine using Coupled Inverter Chains Featuring Fully-Parallel One-Shot Spin Updates
  • Citing Conference Paper
  • April 2023

... However, the practical application of quantum annealing faces significant challenges, such as extremely low temperature operating condition and excessive power consumption. For these reasons, low-power CMOS based Ising machines have been researched to achieve practical and energy efficient spin computing acceleration [4][5][6]. ...

CTLE-Ising: A 1440-Spin Continuous-Time Latch-Based Ising Machine with One-Shot Fully-Parallel Spin Updates Featuring Equalization of Spin States
  • Citing Conference Paper
  • February 2023

... In contrast to fixed precision units, we propose runtime reconfigurable multiplication. Mu and Kim [22] propose a promising PDE solver using dynamically reconfigurable precision, showing similar vision with ours; however, it only has bit-serial adders for integer addition using 4/8/12/16 bits, with no floating point operations and no discussion about how to adjust the precision at run-time. ...

A Dynamic-Precision Bit-Serial Computing Hardware Accelerator for Solving Partial Differential Equations Using Finite Difference Method
  • Citing Article
  • February 2023

IEEE Journal of Solid-State Circuits

... Jeong et al. [9] proposed an ADC-free analog CIM processor but sacrificed noise resilience and susceptibility to PVT variations. In contrast, Kim et al. [10] adopted a digital CIM approach that eliminates the ADC and uses digital circuits to enhance system noise immunity, though this inherently falls under near-memory computing methods, requiring multi-level adders and increasing memory access overhead and computational delays. Xue et al. [11] replaced ADCs with Time-to-Digital Converters (TDCs), leading to reduced anti-interference capabilities. ...

A 1-16b Reconfigurable 80Kb 7T SRAM-Based Digital Near-Memory Computing Macro for Processing Neural Networks
  • Citing Article
  • April 2023

IEEE Transactions on Circuits and Systems I Regular Papers