About
144 Publications · 22,596 Reads
1,754 Citations
Introduction
Current institution
Additional affiliations
July 2017 - present
June 2015 - March 2016
September 2010 - June 2015
Publications (144)
Uncertainties have become a major concern in integrated circuit design. In order to avoid the huge number of repeated simulations in conventional Monte Carlo flows, this paper presents an intrusive spectral simulator for statistical circuit analysis. Our simulator employs the recently developed generalized polynomial chaos expansion to perform unce...
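As a quick illustration of the generalized polynomial chaos idea behind this line of work, the sketch below builds a 1-D gPC surrogate of a scalar metric by Gauss-Hermite projection; the metric `f`, the expansion order, and all constants are hypothetical stand-ins, not the paper's simulator.

```python
# Hedged sketch: 1-D generalized polynomial chaos (gPC) surrogate for a
# circuit performance metric f(xi), xi ~ N(0, 1), built by projection with
# Gauss-Hermite quadrature instead of repeated Monte Carlo runs.
import numpy as np
from numpy.polynomial import hermite_e as He
from math import factorial, sqrt, pi

def f(xi):
    # stand-in for an expensive circuit simulation (hypothetical metric)
    return np.exp(0.3 * xi) + 0.1 * xi**2

order = 6
pts, wts = He.hermegauss(order + 1)          # weight exp(-x^2/2)
wts = wts / sqrt(2.0 * pi)                   # normalize to the N(0,1) density

# coefficients c_n = E[f(xi) He_n(xi)] / n!
coeffs = np.array([
    np.sum(wts * f(pts) * He.hermeval(pts, [0] * n + [1])) / factorial(n)
    for n in range(order + 1)
])

# cheap surrogate statistics from the coefficients
print("mean ~", coeffs[0])
print("var  ~", sum(factorial(n) * coeffs[n]**2 for n in range(1, order + 1)))
```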
Many critical EDA problems suffer from the curse of dimensionality, i.e., the very fast-scaling computational burden produced by a large number of parameters and/or unknown variables. This phenomenon may be caused by multiple spatial or temporal factors (e.g., 3-D field-solver discretizations and multi-rate circuit simulation), nonlinearity of devices...
Fabrication process variations are a major source of yield degradation in the nanoscale design of integrated circuits (ICs), microelectromechanical systems (MEMSs), and photonic circuits. Stochastic spectral methods are a promising class of techniques for quantifying the uncertainties caused by process variations. Despite their superior efficiency over Monte Car...
Stochastic spectral methods have achieved a great success in the uncertainty quantification of many engineering problems, including variation-aware electronic and photonic design automation. State-of-the-art techniques employ generalized polynomial-chaos expansions and assume that all random parameters are independent or Gaussian correlated. This a...
While post-training model compression can greatly reduce the inference cost of a deep neural network, uncompressed training still consumes a huge amount of hardware resources, run-time and energy. It is highly desirable to directly train a compact neural network from scratch with low memory and low computational cost. Low-rank tensor decomposition...
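The core memory argument for low-rank tensorized training can be seen in a two-factor toy case: training the factors of W ≈ UV directly scales with r(m + n) instead of mn. A minimal sketch with made-up sizes follows; it is an illustration of the idea, not the paper's training framework.

```python
# Minimal sketch of low-rank compressed training: replace a dense weight
# W (m x n) by factors U (m x r) and V (r x n) and train the factors
# directly, so memory scales with r(m + n) rather than m * n.
import numpy as np

m, n, r = 512, 512, 16
U = np.random.randn(m, r) * 0.1
V = np.random.randn(r, n) * 0.1
print("dense params:", m * n, " low-rank params:", r * (m + n))

x = np.random.randn(n)
y = U @ (V @ x)   # forward pass never materializes the m x n matrix
```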
Thermal analysis is crucial in three-dimensional integrated circuit (3D-IC) design due to increased power density and complex heat dissipation paths. Although operator learning frameworks such as DeepOHeat [1] have demonstrated promising preliminary results in accelerating thermal simulation, they face critical limitations in prediction capability...
This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient and low-complexity self-attention for the embedding network for the text information and te...
Large language model (LLM) pruning seeks to remove unimportant weights for inference speedup with minimal performance impact. However, existing methods often suffer from performance loss without full-model sparsity-aware fine-tuning. This paper presents Wanda++, a novel pruning framework that outperforms the state-of-the-art methods by utilizing...
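For context, here is a minimal sketch of the base Wanda pruning score (|weight| times input-activation norm) that Wanda++ extends; the shapes, calibration data, and 50% per-row sparsity are illustrative assumptions, and Wanda++'s additional regional-gradient machinery is not shown.

```python
# Hedged sketch of a Wanda-style pruning score: rank each weight by
# |W_ij| * ||X_j||_2 and zero the lowest-scoring half within each row.
import numpy as np

W = np.random.randn(8, 16)                       # weight matrix (out x in)
X = np.random.randn(100, 16)                     # calibration activations
score = np.abs(W) * np.linalg.norm(X, axis=0)    # broadcast norms over rows

k = W.shape[1] // 2                              # prune half of each row
thresh = np.sort(score, axis=1)[:, k - 1:k]      # k-th smallest score per row
W_pruned = np.where(score > thresh, W, 0.0)
print("sparsity:", (W_pruned == 0).mean())
```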
Large language models have demonstrated exceptional capabilities across diverse tasks, but their fine-tuning demands significant memory, posing challenges for resource-constrained environments. Zeroth-order (ZO) optimization provides a memory-efficient alternative by eliminating the need for backpropagation. However, ZO optimization suffers from hi...
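A minimal sketch of the zeroth-order gradient estimator such methods rely on: two forward passes with a shared random perturbation replace backpropagation. The toy loss, dimensions, and step sizes are assumptions, not the paper's setup.

```python
# Hedged sketch of a two-point zeroth-order (ZO) gradient estimator:
# g = (L(theta + mu*u) - L(theta - mu*u)) / (2*mu) * u, u ~ N(0, I).
import numpy as np

def zo_grad(loss, theta, mu=1e-3):
    u = np.random.randn(*theta.shape)
    return (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu) * u

theta = np.random.randn(10)
loss = lambda w: np.sum((w - 1.0) ** 2)   # toy objective
for _ in range(500):
    theta -= 0.05 * zo_grad(loss, theta)  # ZO-SGD update, no backprop
print("final loss:", loss(theta))
```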
Physics-informed neural networks (PINNs) have shown promise in solving partial differential equations (PDEs), with growing interest in their energy-efficient, real-time training on edge devices. Photonic computing offers a potential solution to achieve this goal because of its ultra-high operation speed. However, the lack of photonic memory and the...
Large language models (LLMs) are often quantized to lower precision to reduce the memory cost and latency in inference. However, quantization often degrades model performance, thus fine-tuning is required for various downstream tasks. Traditional fine-tuning methods such as stochastic gradient descent and Adam optimization require backpropagation, which...
Large language models (LLMs) are revolutionizing many science and engineering fields. However, their huge model sizes impose extremely demanding needs of computational resources in the pre-training stage. Although low-rank factorizations can reduce model parameters, their direct application in LLM pre-training often leads to non-negligible performan...
Partial differential equations (PDEs) are an important mathematical tool in science and engineering. This paper experimentally demonstrates an optical neural PDE solver by leveraging the back-propagation-free on-photonic-chip training of physics-informed neural networks.
Operator learning has become a powerful tool in machine learning for modeling complex physical systems governed by partial differential equations (PDEs). Although Deep Operator Networks (DeepONet) show promise, they require extensive data acquisition. Physics-informed DeepONets (PI-DeepONet) mitigate data scarcity but suffer from inefficient traini...
Back propagation (BP) is the default solution for gradient computation in neural network training. However, implementing BP-based training on various edge devices such as FPGAs, microcontrollers (MCUs), and analog computing platforms faces multiple major challenges, such as the lack of hardware resources, long time-to-market, and dramatic errors in a...
Fine-tuning large language models (LLMs) has achieved remarkable performance across various natural language processing tasks, yet it demands more and more memory as model sizes keep growing. To address this issue, the recently proposed Memory-efficient Zeroth-order (MeZO) methods attempt to fine-tune LLMs using only forward passes, thereby avoidin...
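The memory saving in MeZO-style methods comes from regenerating perturbations from a stored RNG seed rather than keeping them in memory. A loose sketch under that assumption follows; the names and toy objective are illustrative, not the paper's code.

```python
# Hedged sketch of the seed-replay trick used by memory-efficient ZO
# fine-tuning: store only the RNG seed, regenerate the same perturbation
# to estimate the gradient and again to apply the update in place.
import numpy as np

def perturb(theta, seed, scale):
    rng = np.random.default_rng(seed)
    return theta + scale * rng.standard_normal(theta.shape)

theta = np.random.randn(1000)
loss = lambda w: np.sum(w ** 2)           # toy objective
mu, lr, seed = 1e-3, 0.01, 1234
g = (loss(perturb(theta, seed, mu)) - loss(perturb(theta, seed, -mu))) / (2 * mu)
theta = perturb(theta, seed, -lr * g)     # same noise, scaled by -lr * g
print("loss after one ZO step:", loss(theta))
```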
Training large AI models such as deep learning recommendation systems and foundation language (or multi-modal) models consumes massive GPU resources and computing time. The high training cost has become affordable only to big tech companies, while also raising increasing concerns about the environmental impact. This paper presents CoMERA, a Computing- and...
Parallel tensor network contraction algorithms have emerged as the pivotal benchmarks for assessing the classical limits of computation, exemplified by Google's demonstration of quantum supremacy through random circuit sampling. However, the massive parallelization of the algorithm makes it vulnerable to computer node failures. In this work, we app...
Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we pre...
Reviewed on OpenReview: https://openreview.net/forum?id=Fu4mwB0XIU. Despite the effectiveness of deep neural networks in numerous natural language processing applications, recent findings have exposed the vulnerability of these language models when minor perturbations are introduced. While appearing semantically indistin...
Solving partial differential equations (PDEs) numerically often requires huge computing time, energy cost, and hardware resources in practical applications. This has limited their applications in many scenarios (e.g., autonomous systems, supersonic flows) that have a limited energy budget and require near real-time response. Leveraging optical comp...
Given their potential to demonstrate near-term quantum advantage, variational quantum algorithms (VQAs) have been extensively studied. Although numerous techniques have been developed for VQA parameter optimization, it remains a significant challenge. A practical issue is that quantum noise is highly unstable and thus it is likely to shift in real...
We introduce our recent work in applying tensor compression techniques in optical computing and highlight two applications: the tensorized integrated coherent Ising machine and the tensorized optical multimodal fusion network.
We propose Physics-Informed Fourier Networks for Electrical Properties (EP) Tomography (PIFON-EPT), a novel deep learning-based method for EP reconstruction using noisy and/or incomplete magnetic resonance (MR) measurements. Our approach leverages the Helmholtz equation to constrain two networks, responsible for the denoising and completion of the...
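For readers unfamiliar with the physics constraint, the Helmholtz residual that such methods penalize can be checked numerically. Below is a 1-D finite-difference illustration with made-up material constants; it is not the PIFON-EPT implementation.

```python
# Hedged sketch of the Helmholtz constraint used in PINN-style EPT:
# the residual  B'' + (omega^2 * mu * eps) B  should vanish, so it can
# serve as a physics loss. 1-D finite differences, illustration only.
import numpy as np

omega = 2 * np.pi * 128e6              # ~3T proton Larmor frequency (assumed)
mu0 = 4e-7 * np.pi                     # vacuum permeability
eps = 60 * 8.854e-12                   # illustrative tissue permittivity
k2 = omega**2 * mu0 * eps

x = np.linspace(0.0, 0.1, 201)
h = x[1] - x[0]
B = np.sin(np.sqrt(k2) * x)            # exact solution of B'' + k2*B = 0

lap = (B[2:] - 2 * B[1:-1] + B[:-2]) / h**2
residual = lap + k2 * B[1:-1]
print("max |Helmholtz residual|:", np.abs(residual).max())
```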
Given their potential to demonstrate near-term quantum advantage, variational quantum algorithms (VQAs) have been extensively studied. Although numerous techniques have been developed for VQA parameter optimization, it remains a significant challenge. A practical issue is the high sensitivity of quantum noise to environmental changes, and its prope...
Backward propagation (BP) is widely used to compute the gradients in neural network training. However, it is hard to implement BP on edge devices due to the lack of hardware and software resources to support automatic differentiation. This has tremendously increased the design complexity and time-to-market of on-devi...
Due to significant process variations, designers have to optimize the statistical performance distribution of nano-scale IC designs in most cases. This problem has been investigated for decades under the formulation of stochastic optimization, which minimizes the expected value of a performance metric while assuming that the distribution of proc...
Fine-tuned transformer models have shown superior performance in many natural language tasks. However, the large model size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware tensor-compressed training approach to reduce the model size, arithmetic operations, and ultima...
Thermal issues are a major concern in 3D integrated circuit (IC) design. Thermal optimization of 3D ICs often requires massive, expensive PDE simulations. Neural network-based thermal prediction models can perform real-time prediction for many unseen new designs. However, existing works either solve 2D temperature fields only or do not generalize well...
Objective: In this paper, we introduce Physics-Informed Fourier Networks (PIFONs) for Electrical Properties (EP) Tomography (EPT). Our novel deep learning-based method is capable of learning EPs globally by solving an inverse scattering problem based on noisy and/or incomplete magnetic resonance (MR) measurements. Methods: We use...
We propose the first tensorized optical multimodal fusion network architecture with a self-attention mechanism and low-rank tensor fusion. Simulation results show 51.3× less hardware requirement and 3.7 × 10¹³ MAC/J energy efficiency.
Introduction: Electrical properties (EP), namely permittivity and electric conductivity, dictate the interactions between electromagnetic waves and biological tissue [1]. EP can be potential biomarkers for pathology characterization, such as cancer, and improve therapeutic modalities, such as radiofrequency hyperthermia and ablation. MR-based electric...
Electrical properties (EP), namely permittivity and electric conductivity, dictate the interactions between electromagnetic waves and biological tissue. EP can be potential biomarkers for pathology characterization, such as cancer, and improve therapeutic modalities, such as radiofrequency hyperthermia and ablation. MR-based electrical properties tomo...
Tensor decomposition has been widely used in machine learning and high-volume data analysis. However, large-scale tensor factorization often incurs huge memory and computing costs. Meanwhile, modern computing hardware such as tensor processing units (TPUs) and Tensor Core GPUs has opened a new window of hardware-efficient computing via mixed- or...
Magnetic resonance imaging (MRI) is a powerful imaging modality that revolutionizes medicine and biology. The imaging speed of high-dimensional MRI is often limited, which constrains its practical utility. Recently, low-rank tensor models have been exploited to enable fast MR imaging with sparse sampling. Most existing methods use some pre-defined...
Physics-informed neural networks (PINNs) have been increasingly employed due to their capability of modeling complex physics systems. To achieve better expressiveness, increasingly larger network sizes are required in many problems. This has caused challenges when we need to train PINNs on edge devices with limited memory, computing and energy reso...
Despite the wide applications of neural networks, there have been increasing concerns about their vulnerability issue. While numerous attack and defense techniques have been developed, this work investigates the robustness issue from a new angle: can we design a self-healing neural network that can automatically detect and fix the vulnerability iss...
Various tensor decomposition methods have been proposed for data compression. In real-world applications of tensor decomposition, selecting the tensor shape for the given data poses a challenge, and the shape of the tensor may affect the error and the compression ratio. In this work, we study the effect of the tensor shape on the tensor decompos...
This work studies the porting and optimization of the tensor network simulator QTensor on GPUs, with the ultimate goal of simulating quantum circuits efficiently at scale on large GPU supercomputers. We implement NumPy, PyTorch, and CuPy backends and benchmark the codes to find the optimal allocation of tensor simulations to either a CPU or a GPU....
Post-training model compression can reduce the inference costs of deep neural networks, but uncompressed training still consumes enormous hardware resources and energy. To enable low-energy training on edge devices, it is highly desirable to directly train a compact neural network from scratch with a low memory cost. Low-rank tensor decomposition i...
A fundamental challenge in Bayesian inference is efficient representation of a target distribution. Many non-parametric approaches do so by sampling a large number of points using variants of Markov Chain Monte Carlo (MCMC). We propose an MCMC variant that retains only those posterior samples which exceed a KSD threshold, which we call KSD Thinning...
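A loose 1-D illustration of the kernel Stein discrepancy computation underlying such thinning, with N(0,1) as the target and an ad-hoc acceptance threshold; this is a sketch of the KSD idea, not the paper's KSD Thinning algorithm.

```python
# Hedged sketch: kernel Stein discrepancy (KSD) against N(0,1) with an
# RBF base kernel, used to decide whether to keep an incoming sample.
import numpy as np

def stein_kernel(x, y, ell=1.0):
    # Stein kernel k0 for target N(0,1): score s(x) = -x, RBF base kernel
    d = x - y
    k = np.exp(-d**2 / (2 * ell**2))
    return ((-x) * (-y) * k + (-x) * (d / ell**2) * k
            + (-y) * (-d / ell**2) * k + (1 / ell**2 - d**2 / ell**4) * k)

def ksd2(samples):
    X = np.asarray(samples)
    return stein_kernel(X[:, None], X[None, :]).mean()

rng = np.random.default_rng(0)
kept = []
for x in rng.normal(size=500):            # stand-in for an MCMC chain
    # ad-hoc rule: keep x only if it barely increases the squared KSD
    if not kept or ksd2(kept + [x]) <= ksd2(kept) + 1e-3:
        kept.append(x)
print(len(kept), "samples kept, KSD^2 =", ksd2(kept))
```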
A major challenge in many machine learning tasks is that the model expressive power depends on model size. Low-rank tensor methods are an efficient tool for handling the curse of dimensionality in many large-scale machine learning models. The major challenges in training a tensor learning model include how to process the high-volume data, how to de...
Conventional yield optimization algorithms try to maximize the success rate of a circuit under process variations. These methods often obtain a high yield but reach a design performance that is far from the optimal value. This paper investigates an alternative yield-aware optimization for photonic ICs: we will optimize the circuit design performanc...
Fabrication process variations can significantly influence the performance and yield of nanoscale electronic and photonic circuits. Stochastic spectral methods have achieved great success in quantifying the impact of process variations, but they suffer from the curse of dimensionality. Recently, low-rank tensor methods have been developed to mitiga...
Deep neural network (DNN) based AI applications on the edge require both low-cost computing platforms and high-quality services. However, the limited memory, computing resources, and power budget of edge devices constrain the effectiveness of DNN algorithms. Developing edge-oriented AI algorithms and implementations (e.g., accelerators)...
Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is a challenging task due to the difficulty of choosing a proper tensor rank. In order to address this challenge...
Various hardware accelerators have been developed for energy-efficient and real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent training neural networks on edge devices. This paper proposes a novel tensor-based training framework,...
Fabrication process variations can significantly influence the performance and yield of nano-scale electronic and photonic circuits. Stochastic spectral methods have achieved great success in quantifying the impact of process variations, but they suffer from the curse of dimensionality. Recently, low-rank tensor methods have been developed to mitig...
We consider adversarial training of deep neural networks through the lens of Bayesian learning, and present a principled framework for adversarial training of Bayesian Neural Networks (BNNs) with certifiable guarantees. We rely on techniques from constraint relaxation of non-convex optimisation problems and modify the standard cross-entropy error m...
In recent years, tensor computation has become a promising tool for solving big data analysis, machine learning, medical imaging, and EDA problems. To ease the memory and computation intensity of tensor processing, decomposition techniques, especially Tensor-train Decomposition (TTD), are widely adopted to compress the extremely high-dimensional tensor...
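A minimal sketch of the TT-SVD compression step that TTD-based methods typically build on, in plain numpy with a fixed rank cap; this is an illustration of the decomposition, not the proposed hardware's algorithm.

```python
# Hedged sketch of TT-SVD: factor a d-way tensor into a tensor train by
# repeated reshaping and truncated SVD, with a uniform rank cap.
import numpy as np

def tt_svd(T, max_rank):
    shape, cores, r = T.shape, [], 1
    M = T.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        U, S, Vt = np.linalg.svd(M, full_matrices=False)
        rk = min(max_rank, len(S))
        cores.append(U[:, :rk].reshape(r, shape[k], rk))   # TT core k
        r = rk
        M = (S[:rk, None] * Vt[:rk]).reshape(r * shape[k + 1], -1)
    cores.append(M.reshape(r, shape[-1], 1))               # last core
    return cores

T = np.random.randn(4, 4, 4, 4)
cores = tt_svd(T, max_rank=4)
print([c.shape for c in cores])
```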
Despite their success in massive engineering applications, deep neural networks are vulnerable to various perturbations due to their black-box nature. Recent studies have shown that a deep neural network can misclassify the data even if the input data is perturbed by an imperceptible amount. In this paper, we address the robustness issue of neural net...
Tensor network and tensor computation are widely applied in scientific and engineering domains like quantum physics, electronic design automation, and machine learning. As one of the most fundamental operations for tensor networks, a tensor contraction eliminates the sharing orders among tensors and produces a compact sub-network. Different contrac...
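The cost sensitivity to contraction order can be seen even in the smallest possible "network", a matrix chain: the same result computed in two orders has very different intermediate sizes. A toy illustration, with arbitrary dimensions:

```python
# Minimal illustration of why contraction order matters for tensor networks:
# the same product evaluated in two orders has different intermediate costs.
import numpy as np

A = np.random.randn(50, 50)
B = np.random.randn(50, 50)
v = np.random.randn(50)

y1 = (A @ B) @ v   # builds a 50x50 intermediate: O(n^3) work
y2 = A @ (B @ v)   # intermediates stay vector-sized: O(n^2) work
print(np.allclose(y1, y2))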
We propose a novel architecture to efficiently perform sparse tensor decomposition (SpTD). As the generalization of vectors and matrices, tensors are widely used to process high-dimensional data. SpTD is not only an emerging tensor analysis technique but also an effective tool to reduce the storage and computation costs of tensors. However, convent...
We propose a novel architecture to efficiently execute sparse tensor decomposition/completion. As the generalization of vectors and matrices, tensors are widely used to process high-dimensional data. It is a natural choice for high-dimensional big data analysis problems in areas such as machine learning and EDA (electronic design automation). Low-r...
Recommendation systems, social network analysis, medical imaging, and data mining often involve processing sparse high-dimensional data. Such high-dimensional data are naturally represented as tensors, and they cannot be efficiently processed by conventional matrix or vector computations. Sparse Tucker decomposition is an important algorithm for co...
Uncertainty quantification based on stochastic spectral methods suffers from the curse of dimensionality. This issue was mitigated recently by low-rank tensor methods. However, there exist two fundamental challenges in low-rank tensor-based uncertainty quantification: how to automatically determine the tensor rank and how to pick the simulation sam...
Uncertainty quantification has become an efficient tool for uncertainty-aware prediction, but its power in yield-aware optimization has not been well explored from either theoretical or application perspectives. Yield optimization is a much more challenging task. On one side, optimizing the generally non-convex probability measure of performance me...
Hamiltonian Monte Carlo (HMC) is an efficient Bayesian sampling method that can make distant proposals in the parameter space by simulating a Hamiltonian dynamical system. Despite its popularity in machine learning and data science, HMC is inefficient to sample from spiky and multimodal distributions. Motivated by the energy-time uncertainty relati...
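For reference, a minimal sketch of the vanilla HMC baseline (leapfrog integrator plus Metropolis correction) that such modifications start from, targeting N(0,1); the step size and trajectory length are arbitrary choices.

```python
# Hedged sketch of vanilla HMC for a 1-D standard normal target.
import numpy as np

rng = np.random.default_rng(1)
U = lambda q: 0.5 * q**2            # potential = -log target (up to const.)
gradU = lambda q: q

def hmc_step(q, eps=0.1, L=20):
    p = rng.normal()                               # resample momentum
    q_new, p_new = q, p - 0.5 * eps * gradU(q)     # initial half kick
    for _ in range(L):                             # leapfrog trajectory
        q_new = q_new + eps * p_new
        p_new = p_new - eps * gradU(q_new)
    p_new = p_new + 0.5 * eps * gradU(q_new)       # undo extra half kick
    dH = (U(q_new) + 0.5 * p_new**2) - (U(q) + 0.5 * p**2)
    return q_new if np.log(rng.uniform()) < -dH else q   # MH correction

samples, q = [], 0.0
for _ in range(2000):
    q = hmc_step(q)
    samples.append(q)
print("mean", np.mean(samples), "var", np.var(samples))
```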
Acceleration of the method of moments (MoM) solution of the volume integral equation (VIE) on unstructured meshes is performed using a precorrected tensor train (P-TT) algorithm. The elements of the MoM’s unstructured mesh are projected onto a regular Cartesian grid. This enables representation of the MoM matrix as the Toeplitz matrix of point-to-p...
Tensor methods have become a promising tool to solve high-dimensional problems in the big data era. By exploiting possible low-rank tensor factorization, many high-dimensional model-based or data-driven problems can be solved to facilitate decision making or machine learning. In this paper, we summarize the recent applications of tensor computation...
Active subspace is a model reduction method widely used in the uncertainty quantification community. In this paper, we propose analyzing the internal structure and vulnerability of deep neural networks using active subspace. Firstly, we employ the active subspace to measure the number of "active neurons" at each intermediate layer and reduce the n...
Many systems such as autonomous vehicles and quadrotors are subject to parametric uncertainties and external disturbances. These uncertainties can lead to undesired performance degradation and safety issues. Therefore, it is important to design robust control strategies to safely regulate the dynamics of a system. This paper presents a novel framew...
Uncertainty quantification has become an efficient tool for yield prediction, but its power in yield-aware optimization has not been well explored from either theoretical or application perspectives. Yield optimization is a much more challenging task. On one side, optimizing the generally non-convex probability measure of performance metrics is dif...
The post-Moore era casts a shadow of uncertainty on many aspects of computer system design. Managing that uncertainty requires new algorithmic tools to make quantitative assessments. While prior uncertainty quantification methods, such as generalized polynomial chaos (gPC), show how to work precisely under the uncertainty inherent to physical devic...
Tensor computation has emerged as a powerful mathematical tool for solving high-dimensional and/or extreme-scale problems in science and engineering. The last decade has witnessed tremendous advancement of tensor computation and its applications in machine learning and big data. However, its hardware optimization on resource-constrained devices rem...
Uncertainty quantification based on generalized polynomial chaos has been used in many applications. It has also achieved great success in variation-aware design automation. However, almost all existing techniques assume that the parameters are mutually independent or Gaussian correlated, which is rarely true in real applications. For instance, in...
Tensor decomposition is an effective approach to compress over-parameterized neural networks and to enable their deployment on resource-constrained hardware platforms. However, directly applying tensor compression in the training process is a challenging task due to the difficulty of choosing a proper tensor rank. In order to achieve this goal, thi...
Probabilistic analysis of distribution grids is an essential step in assessing daily network operability under uncertain and stress conditions. It is also instrumental to the development of new services that require load-growth capacity or to the exploitation of new energy resources affected by uncertainty. Efficient numerical tools able to fo...
Uncertainty quantification techniques based on generalized polynomial chaos have been used in many application domains. However, almost all existing algorithms and applications have a strong assumption that the parameters are mutually independent or Gaussian correlated. This assumption is rarely true in real applications. For instance, in chip manu...
As new services and business models are being associated with the power distribution network, it becomes of great importance to include load uncertainty in predictive computational tools. In this paper, an efficient uncertainty-aware load flow analysis is described which relies on generalized Polynomial Chaos and Stochastic testing methods. It is des...
This paper presents a multi-dimensional computational method to predict the spatial variation data inside and across multiple dies of a wafer. This technique is based on tensor computation. A tensor is a high-dimensional generalization of a matrix or a vector. By exploiting the hidden low-rank property of a high-dimensional data array, the large am...
Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them su...
Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. In many applications the full tensor is not known, but instead received in a slice-by-slice manner over time. Streaming factorizations aim to take advantage of inherent tem...
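A toy sketch of the streaming setting these abstracts describe: CP-model slices arrive one at a time and the temporal factor is fit per slice by least squares against fixed spatial factors. All sizes and the noise level are assumptions, and no rank-determination mechanism is shown.

```python
# Hedged sketch of slice-by-slice streaming CP factorization:
# each slice satisfies Y_t ~ A diag(c_t) B^T with fixed A, B.
import numpy as np

rng = np.random.default_rng(0)
I, J, R = 20, 30, 3
A, B = rng.standard_normal((I, R)), rng.standard_normal((J, R))
khatri_rao = np.einsum('ir,jr->ijr', A, B).reshape(I * J, R)

C = []
for t in range(50):                         # slices received one at a time
    c_true = rng.standard_normal(R)
    Y_t = (A * c_true) @ B.T + 0.01 * rng.standard_normal((I, J))
    # least-squares fit of the new temporal weights for this slice
    c_t, *_ = np.linalg.lstsq(khatri_rao, Y_t.reshape(-1), rcond=None)
    C.append(c_t)
print("recovered factor error:", np.abs(C[-1] - c_true).max())
```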
Stochastic spectral methods have achieved great success in the uncertainty quantification of many engineering systems. In the past decade, these techniques have been increasingly employed in the design automation community to predict and control the impact of process variations in nano-scale chip design. Existing stochastic spectral methods, incl...
This paper generalizes stochastic collocation methods to handle correlated non-Gaussian random parameters. The key challenge is to perform a multivariate numerical integration in a correlated parameter space when computing the coefficient of each basis function via a projection step. We propose an optimization model and a block coordinate descent s...
Since the invention of generalized polynomial chaos in 2002, uncertainty quantification has impacted many engineering fields, including variation-aware design automation of integrated circuits and integrated photonics. Due to the fast convergence rate, the generalized polynomial chaos expansion has achieved orders-of-magnitude speedup over Monte Ca...