March 2025
Publications (569)
March 2025 · 3 Reads
February 2025 · 1 Read
Many combinatorial problems can be mapped to Ising machines, i.e., networks of coupled oscillators that settle to a minimum-energy ground state, from which the problem solution is inferred. This work proposes DROID, a novel event-driven method for simulating the evolution of a CMOS Ising machine to its ground state. The approach is accurate under general delay-phase relations that include the effects of the transistor nonlinearities and is computationally efficient. On a realistic-size all-to-all coupled ring oscillator array, DROID is nearly four orders of magnitude faster than a traditional HSPICE simulation in predicting the evolution of a coupled oscillator system and is demonstrated to attain a distribution of solutions similar to that of the hardware.
February 2025 · 27 Reads
Device sizing is crucial for meeting performance specifications in operational transconductance amplifiers (OTAs), and this work proposes an automated sizing framework based on a transformer model. The approach first leverages the driving-point signal flow graph (DP-SFG) to map an OTA circuit and its specifications into transformer-friendly sequential data. A specialized tokenization approach is applied to the sequential data to expedite the training of the transformer on a diverse range of OTA topologies, under multiple specifications. Under specific performance constraints, the trained transformer model is used to accurately predict DP-SFG parameters in the inference phase. The predicted DP-SFG parameters are then translated to transistor sizes using a precomputed look-up table-based approach inspired by the gm/Id methodology. In contrast to previous conventional or machine-learning-based methods, the proposed framework achieves significant improvements in both speed and computational efficiency by reducing the need for expensive SPICE simulations within the optimization loop; instead, almost all SPICE simulations are confined to the one-time training phase. The method is validated on a variety of unseen specifications, and the sizing solution demonstrates over 90% success in meeting specifications with just one SPICE simulation for validation, and 100% success with 3-5 additional SPICE simulations.
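The final step described above, translating predicted small-signal parameters into transistor sizes through a gm/Id-style look-up table, can be illustrated with a minimal sketch. This is not the authors' implementation: the table values, the names `GM_ID`, `ID_W`, `interp`, and `size_transistor`, and the use of linear interpolation are all assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical look-up table (illustrative values only, not from any real process):
# gm/Id in 1/V mapped to current density Id/W in A/m.
GM_ID = [5.0, 10.0, 15.0, 20.0, 25.0]
ID_W = [40.0, 10.0, 3.0, 0.8, 0.1]

def interp(x, xs, ys):
    """Piecewise-linear interpolation on sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)
    x0, x1 = xs[k - 1], xs[k]
    y0, y1 = ys[k - 1], ys[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def size_transistor(gm, gm_over_id):
    """Return (drain current in A, width in m) for a target gm and gm/Id."""
    i_d = gm / gm_over_id                      # Id = gm / (gm/Id)
    density = interp(gm_over_id, GM_ID, ID_W)  # Id/W read from the table
    width = i_d / density                      # W = Id / (Id/W)
    return i_d, width

i_d, width = size_transistor(gm=1e-3, gm_over_id=10.0)
print(i_d, width)
```

In a real flow the table would be precomputed once per device type and channel length from SPICE sweeps, so that no simulation is needed at sizing time.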
December 2024 · 3 Reads
As circuit designs become more intricate, obtaining accurate performance estimates in early stages, for effective design space exploration, becomes more time-consuming. Traditional logic optimization approaches often rely on proxy metrics to approximate post-mapping performance and area. However, these proxies do not always correlate well with actual post-mapping delay and area, resulting in suboptimal designs. To address this issue, we explore a ground-truth-based optimization flow that directly incorporates the exact post-mapping delay and area during optimization. While this approach improves design quality, it also significantly increases computational costs, particularly for large-scale designs. To overcome the runtime challenge, we apply machine learning models to predict post-mapping delay and area using features extracted from and-inverter graphs (AIGs). Our experimental results show that the model has high prediction accuracy with good generalization to unseen designs. Furthermore, the ML-enhanced logic optimization flow significantly reduces runtime while maintaining comparable performance and area outcomes.
December 2024 · 15 Reads · 2 Citations
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Performance modeling is a key bottleneck for analog design automation. Although machine learning-based models have advanced the state-of-the-art, they have so far suffered from huge data preparation cost, very limited reusability, and inadequate accuracy for large circuits. We introduce ML-based macro-modeling techniques to mitigate these problems for linear analog ICs and ADC/DACs. The modeling techniques are based on macro-models, which can be assembled to evaluate circuit system performance, and more appealingly can be reused across different circuit topologies. On representative testcases, our method achieves more than speedup for data preparation and remarkably smaller model errors compared to recent ML approaches. It also attains acceleration over SPICE simulation with very small errors and reduces data preparation time for an ADC design from 40 days to 9.6 h.
December 2024 · 6 Reads
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
The design of active array structures in analog circuits requires careful matching to minimize the impact of variations. This work presents a constructive approach for building these arrays to directly incorporate shifts due to process variations, considering systematic first-order and second-order gradients; to account for systematic layout effects, including parasitic mismatch and layout-dependent effects due to stress; and to ensure that the resulting layout delivers high performance. The proposed algorithms are targeted to FinFET technologies and are validated for multiple analog blocks in a commercial 12 nm FinFET process. The layouts generated by the proposed method are demonstrated to provide better matching and performance than prior methods.
October 2024 · 39 Reads
Ising machines are effective solvers for complex combinatorial optimization problems. The idea is to map the optimal solution(s) of a combinatorial optimization problem to the minimum energy state(s) of a physical system, which naturally converges to and stabilizes at a minimum energy state upon perturbation. The underlying mathematical abstraction, the Ising model, was originally developed to explain the dynamic behavior of ferromagnetic materials and was shown to generalize to numerous other physical systems. In a generic optimization problem, each variable can interact with another in different ways. At the same time, problem sizes of practical importance are growing very fast. Unfortunately, both the number and connectivity of spins in hardware are subject to fundamental physical limits. Different problems feature different interaction patterns between variables, which may not always directly match the network topology supported by a specific Ising machine. In the presence of a mismatch, emulating generic interactions using the machine topology is usually possible but comes at the cost of additional physical spins to facilitate the mapping. Furthermore, mismatches in the problem vs. hardware connectivity render even more physical spins necessary. Combinatorial optimization problems of practical importance come with diverse connectivity patterns, which a rigid network topology in hardware cannot efficiently cover. To bridge the gap between application demand and hardware resources, in analogy to classical heterogeneous chip multiprocessors, in this paper we make the case for heterogeneous Ising multiprocessors, where each Ising core features a different connectivity. We provide a detailed design space exploration and quantify the efficiency of different design options in terms of time or energy to solution along with solution accuracy compared to homogeneous alternatives.
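The mapping from an optimization problem to a minimum-energy state can be made concrete with a small sketch for max-cut. It assumes the common convention H(s) = −Σ_{i<j} J_ij s_i s_j with spins s_i ∈ {−1, +1} and couplings J_ij = −w_ij; the function names are hypothetical, and the exhaustive search stands in for the physical relaxation that an Ising machine performs in hardware.

```python
from itertools import product

def ising_energy(spins, J):
    """Ising energy H(s) = -sum over i<j of J[i][j] * s_i * s_j."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def cut_value(spins, W):
    """Total weight of edges whose endpoints carry opposite spins."""
    n = len(spins)
    return sum(W[i][j] for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])

def maxcut_couplings(W):
    """Antiferromagnetic couplings J = -W, so minimizing H maximizes the cut."""
    return [[-w for w in row] for row in W]

def ground_state(J):
    """Brute-force search over all 2^n spin assignments (small n only)."""
    return min(product((-1, 1), repeat=len(J)),
               key=lambda s: ising_energy(s, J))

# 4-node ring graph with unit edge weights: 0-1, 1-2, 2-3, 3-0.
W = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
best = ground_state(maxcut_couplings(W))
print(best, ising_energy(best, maxcut_couplings(W)), cut_value(best, W))
```

With this convention the energy equals the total edge weight minus twice the cut weight, so the ground state and the maximum cut coincide.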
September 2024 · 7 Reads
ACM Transactions on Design Automation of Electronic Systems
Today’s performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference and lack support for training operations. This work proposes a novel open-source performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64 × 64 processing array, non-convolution operations constitute 59.5% of total runtime for a ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18× performance improvement over a generic static resource allocation for ResNet-50 inference.
September 2024 · 7 Reads · 2 Citations
Citations (44)
... Thus, most CIM designs [33], [39], [52] provide no fault tolerance other than by replication and voting [39], [52]. The fault tolerant CIM designs [11] are not directly compatible with traditional memory ECC codes. In Section V, we propose a reliability scheme that leverages traditional ECC, like Hamming and BCH codes, to protect memory access and CIM operations in DRAM. ...
- Citing Conference Paper
June 2024
... A spiking variant of Legendre Memory Unit was introduced to increase the memory capacity of the network and extend the energy-efficient communication of spiking neurons. The framework was used to successfully solve physical problems such as predicting nonlinear path-dependent solid deformations and wave propagation phenomena based on short-time measurement signals. Lv et al. 2 experimentally demonstrated a computational random-access memory array using magnetic tunnel junctions. Several functions, e.g., up to five-input logic operations and a full adder, were demonstrated. ...
- Citing Article
- Full-text available
July 2024
... AIrchitect [53] is a recommendation model that automatically predicts for a given workload optimized design parameters, but AIrchitect does not account for the DLA's architectural parameters. Other data-driven models are integrated into DSE tools [13], [22], [23], [34]. ...
- Citing Article
May 2024
ACM Transactions on Design Automation of Electronic Systems
... Note that other designs with planar (hexagonal [19] and King's graph [7]) coupling have been proposed, but we focus on the A2A testcase because of its compactness and greater flexibility: in particular, an A2A design with N coupled ROs is equivalent to a planar hexagonal/King's graph array with ∼N² coupled ROs [20], [21] due to the need for planar arrays to replicate spins during minor embedding [8]. This family of A2A arrays has been applied to solve problems ranging from max-cut [8] to maximal independent set [17] to satisfiability [22]. For illustration, a simplified schematic, with J_ij ∈ {−1, 0, +1}, for a three-RO system is presented in Fig. 4. ...
- Citing Article
- Full-text available
May 2024
... Additionally, the technology node used in the fabrication process significantly impacts scaling trends and yield results. For a monolithic DNN accelerator die, the embodied carbon footprint is calculated based on emissions produced during the manufacturing of its logic chip area, using a specified technology node [4]. The total embodied carbon of a chip comprises two main components: the product of the Carbon Footprint Per unit Area (CFPA) of the die and its area (A_die), and the product of the CFPA of Silicon (CFPA_Si) and the wasted area of the silicon wafer (A_wasted) during fabrication, as shown in Eq. 1. ...
- Citing Conference Paper
March 2024
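The two-component embodied-carbon description in the excerpt above can be written out as a formula. The citing paper's Eq. 1 is not reproduced in the excerpt, so the following is a reconstruction from the wording alone, and the left-hand symbol C_embodied is an assumed name:

```latex
C_{\text{embodied}} = \mathrm{CFPA} \cdot A_{\text{die}} + \mathrm{CFPA}_{\text{Si}} \cdot A_{\text{wasted}}
```

Here the first term covers the manufactured die area and the second covers the silicon wafer area wasted during fabrication.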
... These include maximizing degree of dispersion to achieve uniform device spread, a factor affecting variation performance [24], minimizing route length to reduce parasitics and voltage drop, maximizing diffusion sharing to reduce layout area, and minimizing layout dependent effects such as Length of Diffusion (LOD) and Well Proximity Effects (WPE) to mitigate threshold voltage change. Considering these competing constraints, manually selecting the optimal CC topology becomes challenging [21] [11]. ...
- Citing Conference Paper
September 2023
... Since the values of α and β are determined by the oscillator's voltage dynamics, it is possible to adjust the device temperature and circuit variables, such as the DC voltage and load resistance values, to generate the desired phase coupling functions. Other examples [75,76] also show that many oscillators, such as LC and ring oscillators, have the flexibility to tune the dynamics to yield the desired phase coupling functions. ...
- Citing Article
- Publisher preview available
August 2023
... There are two directions in improving design reliability: high-reliability design [1][2][3][4] and write behavior control to prevent repeated memory access on the same memory cells [5] or enhance endurance in processing-in-memory architectures [6]. In practice, one could combine the two approaches. ...
- Citing Conference Paper
- Full-text available
June 2023
... This includes strategies for aligning workload demands, resource allocation, and energy balancing [24], alongside software-oriented approaches for workload management [83], and integrated power management strategies [108]. Simultaneously, algorithm design plays a critical role in improving the energy efficiency of machine learning processes, with techniques like federated edge learning [109], hardware acceleration [171], and sparsely activated deep neural networks [123] offering significant reductions in energy consumption. ...
Reference:
AI Governance through Markets
- Citing Conference Paper
April 2023
... In sections III.A to III.D, we illustrated how MTJs can be used for TRNGs or SBGs for SC. In this section, we will describe how MTJ-based circuits can also perform stochastic computing functions [84][85]. First, we will describe an approach based on the synchronous method described in section II.C, where a single MTJ can be used to perform stochastic multiplication and addition. ...
- Citing Article
- Full-text available
June 2023
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits