Sachin S. Sapatnekar’s research while affiliated with University of Minnesota and other places

Publications (569)


Towards Designing and Deploying Ising Machines
  • Conference Paper

March 2025

Sachin S. Sapatnekar


DROID: Discrete-Time Simulation for Ring-Oscillator-Based Ising Design

February 2025 · 1 Read

Abhimanyu Kumar · Ramprasath S. · [...] · Sachin S. Sapatnekar

Many combinatorial problems can be mapped to Ising machines, i.e., networks of coupled oscillators that settle to a minimum-energy ground state, from which the problem solution is inferred. This work proposes DROID, a novel event-driven method for simulating the evolution of a CMOS Ising machine to its ground state. The approach is accurate under general delay-phase relations that include the effects of the transistor nonlinearities and is computationally efficient. On a realistic-size all-to-all coupled ring oscillator array, DROID is nearly four orders of magnitude faster than a traditional HSPICE simulation in predicting the evolution of a coupled oscillator system and is demonstrated to attain a similar distribution of solutions as the hardware.
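DROID's actual event-driven engine and its general delay-phase relations are not reproduced here. As a rough, hedged illustration of the underlying idea only, the sketch below evolves coupled ring-oscillator phases under an Ising coupling matrix using a simple Kuramoto-style update plus a sub-harmonic injection-locking term, then binarizes the phases into spins; all function names, constants, and the coupling model are assumptions for illustration.

```python
# Illustrative sketch only: a simple discrete-time phase simulation of coupled
# ring oscillators mapped to an Ising problem. This is NOT DROID's method; the
# coupling model, step size, and function names are assumptions.
import numpy as np

def simulate_coupled_oscillators(J, steps=2000, dt=0.05, noise=0.01, seed=0):
    """Evolve oscillator phases under a symmetric Ising coupling matrix J."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    phase = rng.uniform(0.0, 2 * np.pi, n)          # random initial phases
    for _ in range(steps):
        # Kuramoto-like interaction plus a sub-harmonic injection locking (SHIL)
        # term that pushes each phase toward 0 or pi (binary spin states).
        coupling = (J * np.sin(phase[None, :] - phase[:, None])).sum(axis=1)
        shil = -np.sin(2 * phase)
        phase += dt * (coupling + shil) + noise * rng.standard_normal(n)
    spins = np.where(np.cos(phase) >= 0.0, 1, -1)   # binarize phases to spins
    energy = -0.5 * spins @ J @ spins                # Ising energy of the result
    return spins, energy

# Example: a 3-spin frustrated triangle (no assignment satisfies all couplings).
J = np.array([[ 0, -1, -1],
              [-1,  0, -1],
              [-1, -1,  0]], dtype=float)
print(simulate_coupled_oscillators(J))
```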


Accelerating OTA Circuit Design: Transistor Sizing Based on a Transformer Model and Precomputed Lookup Tables
  • Preprint
  • File available

February 2025 · 27 Reads

Device sizing is crucial for meeting performance specifications in operational transconductance amplifiers (OTAs), and this work proposes an automated sizing framework based on a transformer model. The approach first leverages the driving-point signal flow graph (DP-SFG) to map an OTA circuit and its specifications into transformer-friendly sequential data. A specialized tokenization approach is applied to the sequential data to expedite the training of the transformer on a diverse range of OTA topologies, under multiple specifications. Under specific performance constraints, the trained transformer model is used to accurately predict DP-SFG parameters in the inference phase. The predicted DP-SFG parameters are then translated to transistor sizes using a precomputed look-up table-based approach inspired by the gm/Id methodology. In contrast to previous conventional or machine-learning-based methods, the proposed framework achieves significant improvements in both speed and computational efficiency by reducing the need for expensive SPICE simulations within the optimization loop; instead, almost all SPICE simulations are confined to the one-time training phase. The method is validated on a variety of unseen specifications, and the sizing solution demonstrates over 90% success in meeting specifications with just one SPICE simulation for validation, and 100% success with 3-5 additional SPICE simulations.
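The transformer and DP-SFG stages of the flow are not reproduced here; as a hedged illustration of only the final step, the sketch below translates a predicted transconductance target into a device width using a precomputed gm/Id lookup table. The table values, bias point, and function names are hypothetical.

```python
# Illustrative sketch of the lookup-table step only: translating a predicted
# transconductance target into a transistor width via precomputed gm/Id data.
# The characterization values and names below are hypothetical.
import numpy as np

# Hypothetical precomputed characterization for one device at fixed L and VDS:
# current density Id/W (A/um) tabulated against gm/Id (1/V).
gm_over_id_grid = np.array([ 5.0,   10.0,   15.0,   20.0,    25.0])    # 1/V
id_density_grid = np.array([40e-6, 12e-6,  4e-6,  1.2e-6,  0.3e-6])    # A/um

def size_device(gm_target, gm_over_id):
    """Return (width_um, bias_current_A) for a target gm at a chosen gm/Id."""
    drain_current = gm_target / gm_over_id                      # Id = gm / (gm/Id)
    id_density = np.interp(gm_over_id, gm_over_id_grid, id_density_grid)
    width = drain_current / id_density                          # W = Id / (Id/W)
    return width, drain_current

# Example: gm = 1 mS at gm/Id = 15 1/V (moderate inversion).
print(size_device(1e-3, 15.0))
```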


ML-based AIG Timing Prediction to Enhance Logic Optimization

December 2024 · 3 Reads

As circuit designs become more intricate, obtaining accurate performance estimation in early stages, for effective design space exploration, becomes more time-consuming. Traditional logic optimization approaches often rely on proxy metrics to approximate post-mapping performance and area. However, these proxies do not always correlate well with actual post-mapping delay and area, resulting in suboptimal designs. To address this issue, we explore a ground-truth-based optimization flow that directly incorporates the exact post-mapping delay and area during optimization. While this approach improves design quality, it also significantly increases computational costs, particularly for large-scale designs. To overcome the runtime challenge, we apply machine learning models to predict post-mapping delay and area using the features extracted from AIGs. Our experimental results show that the model has high prediction accuracy with good generalization to unseen designs. Furthermore, the ML-enhanced logic optimization flow significantly reduces runtime while maintaining comparable performance and area outcomes.
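As a hedged illustration of the prediction step (not the paper's exact features, model, or data), the sketch below fits a regression model on a few hypothetical AIG-derived features so that it can stand in for an expensive technology-mapping run inside an optimization loop.

```python
# Illustrative sketch: fitting a regression model on AIG-derived features to
# predict post-technology-mapping delay. The features, model choice, and data
# are assumptions, not the paper's exact setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features per AIG snapshot: node count, logic depth (levels),
# average fanout, and fraction of nodes on the critical path.
X_train = np.array([
    [12000, 34, 2.1, 0.08],
    [45000, 51, 2.4, 0.05],
    [ 8000, 22, 1.9, 0.12],
    [70000, 63, 2.6, 0.04],
])
y_train = np.array([1.35, 2.10, 0.95, 2.70])  # post-mapping delay (ns), hypothetical

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# During logic optimization, the trained model replaces an expensive mapping run:
candidate_aig = np.array([[30000, 45, 2.3, 0.06]])
print("predicted post-mapping delay (ns):", model.predict(candidate_aig)[0])
```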


MMM: Machine Learning-Based Macro-Modeling for Linear Analog ICs and ADC/DACs

December 2024 · 15 Reads · 2 Citations

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Performance modeling is a key bottleneck for analog design automation. Although machine learning-based models have advanced the state-of-the-art, they have so far suffered from huge data preparation cost, very limited reusability, and inadequate accuracy for large circuits. We introduce ML-based macro-modeling techniques to mitigate these problems for linear analog ICs and ADC/DACs. The modeling techniques are based on macro-models, which can be assembled to evaluate circuit system performance, and, more appealingly, can be reused across different circuit topologies. On representative testcases, our method achieves more than 1700× speedup for data preparation and remarkably smaller model errors compared to recent ML approaches. It also attains 3600× acceleration over SPICE simulation with very small errors and reduces data preparation time for an ADC design from 40 days to 9.6 h.
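As a rough, hedged illustration of how block-level macro-models can be assembled into a system-level estimate (not the paper's actual models or composition rules), the sketch below composes two hypothetical amplifier-stage macro-models into a cascade gain and bandwidth.

```python
# Illustrative sketch of assembling block-level macro-models into a system-level
# estimate: two amplifier stages, each described by a macro-model that maps design
# parameters to (gain, pole). The models, proxies, and the single-pole composition
# rule are assumptions for illustration only.
import numpy as np

def stage_macro_model(bias_current, load_cap):
    """Hypothetical macro-model: returns (dc_gain, pole_hz) for one stage."""
    gm = 20.0 * bias_current                  # crude gm ~ 20*Id proxy
    rout = 1.0 / (0.05 * bias_current)        # crude output-resistance proxy
    return gm * rout, 1.0 / (2 * np.pi * rout * load_cap)

def cascade(stages):
    """Compose stage macro-models: gains multiply, bandwidth ~ lowest pole."""
    gains, poles = zip(*stages)
    return np.prod(gains), min(poles)

two_stage = [stage_macro_model(50e-6, 0.5e-12), stage_macro_model(200e-6, 2e-12)]
print("system gain, bandwidth (Hz):", cascade(two_stage))
```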


Constructive Place-and-Route for FinFET-Based Transistor Arrays in Analog Circuits Under Nonlinear Gradients

December 2024 · 6 Reads

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

The design of active array structures in analog circuits requires careful matching to minimize the impact of variations. This work presents a constructive approach for building these arrays to directly incorporate shifts due to process variations, considering systematic first-order and second-order gradients; to account for systematic layout effects, including parasitic mismatch and layout-dependent effects due to stress; and to ensure that the resulting layout delivers high performance. The proposed algorithms are targeted to FinFET technologies and are validated for multiple analog blocks in a commercial 12nm FinFET process. The layouts generated by the proposed method are demonstrated to provide better matching and performance than prior methods.


On Heterogeneous Ising Machines

October 2024 · 39 Reads

Ising machines are effective solvers for complex combinatorial optimization problems. The idea is to map the optimal solution(s) of a combinatorial optimization problem to the minimum-energy state(s) of a physical system, which naturally converges to and stabilizes at a minimum-energy state after a perturbation. The underlying mathematical abstraction, the Ising model, was originally developed to explain the dynamic behavior of ferromagnetic materials and was shown to generalize to numerous other physical systems. In a generic optimization problem, each variable can interact with another in different ways. At the same time, problem sizes of practical importance are growing very fast. Unfortunately, both the number and connectivity of spins in hardware are subject to fundamental physical limits. Different problems feature different interaction patterns between variables, which may not always directly match the network topology supported by a specific Ising machine. In the presence of a mismatch, emulating generic interactions using the machine topology is usually possible; however, it comes at the cost of additional physical spins to facilitate the mapping. Furthermore, mismatches between the problem and hardware connectivity render even more physical spins necessary. Combinatorial optimization problems of practical importance come with diverse connectivity patterns, which a rigid network topology in hardware cannot efficiently cover. To bridge the gap between application demand and hardware resources, in analogy to classical heterogeneous chip multiprocessors, in this paper we make the case for heterogeneous Ising multiprocessors, where each Ising core features a different connectivity. We provide a detailed design space exploration and quantify the efficiency of different design options in terms of time or energy to solution, along with solution accuracy, compared to homogeneous alternatives.
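For reference, the Ising model abstraction mentioned above is commonly written as the energy function below; the conventions for the couplings J_ij and fields h_i vary across formulations and hardware implementations.

```latex
E(\mathbf{s}) \;=\; -\sum_{i<j} J_{ij}\, s_i s_j \;-\; \sum_i h_i\, s_i,
\qquad s_i \in \{-1, +1\}
```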


Performance Analysis of CNN Inference/Training with Convolution and Non-Convolution Operations on ASIC Accelerators

September 2024 · 7 Reads

ACM Transactions on Design Automation of Electronic Systems

Today’s performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support for training operations. This work proposes a novel open-source performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64×64 processing array, non-convolution operations constitute 59.5% of the total runtime for the ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18× performance improvement over a generic static resource allocation for ResNet-50 inference.
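SimDIT's cost model is not reproduced here. As a hedged illustration of the kind of analytical estimate such a framework computes, the sketch below derives a rough MAC-cycle count for a single convolution layer on a 64×64 processing array under an assumed tiling; all parameters and the tiling rule are assumptions.

```python
# Illustrative sketch: a simplified compute-cycle estimate for one convolution
# layer on a systolic PE array. SimDIT's actual cost model, dataflow, and memory
# modeling are not reproduced; everything below is an assumption for illustration.
import math

def conv_compute_cycles(out_h, out_w, out_ch, in_ch, k, array_rows=64, array_cols=64):
    """Rough MAC-cycle estimate: tile output pixels over array rows, output
    channels over array columns, and stream the in_ch*k*k reduction through."""
    output_pixels = out_h * out_w
    row_tiles = math.ceil(output_pixels / array_rows)
    col_tiles = math.ceil(out_ch / array_cols)
    reduction_len = in_ch * k * k
    return row_tiles * col_tiles * reduction_len

# Example: a ResNet-50-style layer, 56x56x64 output, 64 input channels, 3x3 kernel.
print(conv_compute_cycles(56, 56, 64, 64, 3))
```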



Citations (44)


... Thus, most CIM designs [33], [39], [52] provide no fault tolerance other than by replication and voting [39], [52]. The fault tolerant CIM designs [11] are not directly compatible with traditional memory ECC codes. In Section V, we propose a reliability scheme that leverages traditional ECC, like Hamming and BCH codes, to protect memory access and CIM operations in DRAM. ...

Reference:

Count2Multiply: Reliable In-memory High-Radix Counting
On Error Correction for Nonvolatile Processing-In-Memory
  • Citing Conference Paper
  • June 2024

... A spiking variant of the Legendre Memory Unit was introduced to increase the memory capacity of the network and extend the energy-efficient communication of spiking neurons. The framework was used to successfully solve physical problems such as predicting nonlinear path-dependent solid deformations and wave propagation phenomena based on short-time measurement signals. Lv et al.² experimentally demonstrated a computational random-access memory array using magnetic tunnel junctions. Several functions, e.g., up to five-input logic operations and a full adder, were demonstrated. ...

Experimental demonstration of magnetic tunnel junction-based computational random-access memory

... AIrchitect [53] is a recommendation model that automatically predicts for a given workload optimized design parameters, but AIrchitect does not account for the DLA's architectural parameters. Other data-driven models are integrated into DSE tools [13], [22], [23], [34]. ...

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators
  • Citing Article
  • May 2024

ACM Transactions on Design Automation of Electronic Systems

... Note that other designs with planar (hexagonal [19] and King's graph [7]) coupling have been proposed, but we focus on the A2A testcase because of its compactness and greater flexibility: in particular, an A2A design with N coupled ROs is equivalent to a planar hexagonal/King's graph array with ∼N² coupled ROs [20], [21] due to the need for planar arrays to replicate spins during minor embedding [8]. This family of A2A arrays has been applied to solve problems ranging from max-cut [8] to maximal independent set [17] to satisfiability [22]. For illustration, a simplified schematic, with Jᵢⱼ ∈ {−1, 0, +1}, for a three-RO system is presented in Fig. 4. ...

3SAT on an all-to-all-connected CMOS Ising solver chip

... Additionally, the technology node used in the fabrication process significantly impacts scaling trends and yield results. For a monolithic DNN accelerator die, the embodied carbon footprint is calculated based on emissions produced during the manufacturing of its logic chip area, using a specified technology node [4]. The total embodied carbon of a chip comprises two main components: the product of the Carbon Footprint Per unit Area (CFPA) of the die and its area (A_die), and the product of the CFPA of silicon (CFPA_Si) and the wasted area of the silicon wafer (A_wasted) during fabrication, as shown in Eq. 1. ...

ECO-CHIP: Estimation of Carbon Footprint of Chiplet-based Architectures for Sustainable VLSI
  • Citing Conference Paper
  • March 2024
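Based on the wording of the excerpt above, Eq. 1 of the cited work presumably takes the form below; this is a reconstruction from the prose, not a quotation from the paper.

```latex
C_{\text{embodied}} \;=\; \text{CFPA}_{\text{die}} \cdot A_{\text{die}} \;+\; \text{CFPA}_{\text{Si}} \cdot A_{\text{wasted}}
```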

... These include maximizing degree of dispersion to achieve uniform device spread, a factor affecting variation performance [24], minimizing route length to reduce parasitics and voltage drop, maximizing diffusion sharing to reduce layout area, and minimizing layout dependent effects such as Length of Diffusion (LOD) and Well Proximity Effects (WPE) to mitigate threshold voltage change. Considering these competing constraints, manually selecting the optimal CC topology becomes challenging [21] [11]. ...

Understanding Distance-Dependent Variations for Analog Circuits in a FinFET Technology
  • Citing Conference Paper
  • September 2023

... Since the values of α and β are determined by the oscillator's voltage dynamics, it is possible to adjust the device temperature and circuit variables, such as the DC voltage and load resistance values, to generate the desired phase coupling functions. Other examples [75,76] also show that many oscillators, such as LC and ring oscillators, have the flexibility to tune the dynamics to yield the desired phase coupling functions. ...

An Ising solver chip based on coupled ring oscillators with a 48-node all-to-all connected array architecture

... There are two directions in improving design reliability: high-reliability design [1][2][3][4] and write behavior control to prevent repeated memory access on the same memory cells [5] or enhance endurance in processing-in-memory architectures [6]. In practice, one could combine the two approaches. ...

On Endurance of Processing in (Nonvolatile) Memory

... This includes strategies for aligning workload demands, resource allocation, and energy balancing [24], alongside software-oriented approaches for workload management [83], and integrated power management strategies [108]. Simultaneously, algorithm design plays a critical role in improving the energy efficiency of machine learning processes, with techniques like federated edge learning [109], hardware acceleration [171], and sparsely activated deep neural networks [123] offering significant reductions in energy consumption. ...

Energy-efficient Hardware Acceleration of Shallow Machine Learning Applications
  • Citing Conference Paper
  • April 2023

... In sections III.A to III.D, we illustrated how MTJs can be used for TRNGs or SBGs for SC. In this section, we will describe how MTJ-based circuits can also perform stochastic computing functions [84], [85]. First, we will describe an approach based on the synchronous method described in section II.C, where a single MTJ can be used to perform stochastic multiplication and addition. ...

A Stochastic Computing Scheme of Embedding Random Bit Generation and Processing in Computational Random Access Memory (SC-CRAM)

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits