Jiaqi Gu
University of Texas at Austin | UT · Department of Electrical & Computer Engineering
Bachelor of Engineering
About
98 Publications
5,873 Reads
1,534 Citations
Education
September 2018 - May 2023
September 2014 - June 2018
Publications (98)
Electronic-photonic integrated circuits (EPICs) offer transformative potential for next-generation high-performance AI but require interdisciplinary advances across devices, circuits, architecture, and design automation. The complexity of hybrid systems makes it challenging even for domain experts to understand distinct behaviors and interactions a...
In recent decades, the demand for computational power has surged, particularly with the rapid expansion of artificial intelligence (AI). As we navigate the post-Moore's law era, the limitations of traditional electrical digital computing, including process bottlenecks and power consumption issues, are propelling the search for alternative computing...
Nanophotonic device design aims to optimize photonic structures to meet specific requirements across various applications. Inverse design has unlocked non-intuitive, high-dimensional design spaces, enabling the discovery of high-performance devices beyond heuristic or analytic methods. The adjoint method, which calculates gradients for all variable...
Electromagnetic field simulation is central to designing, optimizing, and validating photonic devices and circuits. However, costly computation associated with numerical simulation poses a significant bottleneck, hindering scalability and turnaround time in the photonic circuit design process. Neural operators offer a promising alternative, but exi...
Optimization methods are frequently exploited in the design of silicon photonic devices. In this paper, we demonstrate that pushing the objective function to its minimum during optimization often results in devices that gradually become more sensitive to perturbations of design variables. The dominant strategy of selecting the design with the small...
The finite-difference time-domain (FDTD) method, which is important in photonic hardware design flow, is widely adopted to solve time-domain Maxwell equations. However, FDTD is known for its prohibitive runtime cost, taking minutes to hours to simulate a single device. Recently, AI has been applied to realize orders-of-magnitude speedup in partial...
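As a rough illustration of what an FDTD solver computes, here is a minimal one-dimensional Yee-scheme leapfrog update in normalized units (all names, grid sizes, and constants below are illustrative, not taken from the paper):

```python
import numpy as np

# Minimal 1-D FDTD (Yee) update for Maxwell's equations in vacuum.
# Normalized units: c = 1, dx = 1, dt = 0.5 (satisfies the CFL limit dt <= dx/c).
nx, nt, dt = 200, 400, 0.5
ez = np.zeros(nx)       # electric field, sampled at integer grid points
hy = np.zeros(nx - 1)   # magnetic field, staggered half a cell

for t in range(nt):
    hy += dt * (ez[1:] - ez[:-1])                 # update H from the curl of E
    ez[1:-1] += dt * (hy[1:] - hy[:-1])           # update E from the curl of H
    ez[nx // 2] += np.exp(-((t - 30) / 10) ** 2)  # soft Gaussian source at center

# With the CFL condition satisfied, the scheme stays stable and the
# injected pulse propagates outward from the source point.
assert np.isfinite(ez).all() and np.abs(ez).max() > 0
```

Even this toy loop hints at the cost problem the abstract describes: every cell is touched at every time step, so realistic 3-D device simulations require millions of such updates.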
Photonic computing shows promise for transformative advancements in machine learning (ML) acceleration, offering ultrafast speed, massive parallelism, and high energy efficiency. However, current photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. To ad...
Optical neural networks (ONNs) are promising hardware platforms for next-generation neuromorphic computing due to their high parallelism, low latency, and low energy consumption. However, previous integrated photonic tensor cores (PTCs) consume numerous single-operand optical modulators for signal and weight encoding, leading to large area costs an...
We present a hardware-efficient optical computing architecture for structured neural networks (OSNNs). The performance of our neural chip was validated experimentally on a photonic-electronic testing platform, demonstrating reduced optical component utilization and small deviation.
Employing integrated photonic chip-based ONNs for early pancreatic cancer detection, we achieved an 80% Dice score, demonstrating an efficient, high-speed alternative to traditional electrical training systems for medical imaging.
The optical neural network (ONN) is a promising hardware platform for next-generation neuromorphic computing due to its high parallelism, low latency, and low energy consumption. However, previous integrated photonic tensor cores (PTCs) consume numerous single-operand optical modulators for signal and weight encoding, leading to large area costs an...
Photonic computing shows promise for transformative advancements in machine learning (ML) acceleration, offering ultra-fast speed, massive parallelism, and high energy efficiency. However, current photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint. To a...
The wide adoption and significant computing resource consumption of attention-based Transformers, e.g., Vision Transformer and large language models, have driven the demands for efficient hardware accelerators. While electronic accelerators have been commonly used, there is a growing interest in exploring photonics as an alternative technology due...
Transformers have achieved great success in machine learning applications. Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root Mean Square Normalization (RMSNorm), play a critical role in accelerating and stabilizing the training of Transformers. While LayerNorm recenters and rescales input vectors, RMSNorm only rescales...
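The distinction between the two normalizations can be sketched in a few lines of NumPy (a minimal illustration, not the paper's implementation): LayerNorm recenters and rescales, while RMSNorm only rescales, so the two coincide exactly when the input is already zero-mean.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LayerNorm: recenter (subtract the mean), then rescale (divide by std).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-6):
    # RMSNorm: rescale only, by the root mean square; no recentering.
    rms = np.sqrt((x ** 2).mean(-1, keepdims=True) + eps)
    return x / rms

x = np.array([1.0, 2.0, 3.0, 4.0])
# On general inputs the two differ...
assert not np.allclose(layer_norm(x), rms_norm(x))
# ...but on a zero-mean input they agree (up to eps).
xc = x - x.mean()
assert np.allclose(layer_norm(xc), rms_norm(xc), atol=1e-3)
```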
We deploy a compact butterfly-style photonic-electronic neural chip on ResNet-20 and achieve > 85% measured accuracy on the CIFAR-10 dataset with only 3-bit weight programming resolutions, showing its practicality in implementing complicated deep learning tasks.
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the p...
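As a toy illustration of the parameter savings tensor decomposition offers, consider its simplest special case, a low-rank matrix factorization of a feedforward weight (dimensions and rank below are arbitrary, chosen only for demonstration):

```python
import numpy as np

# Factor a dense weight W (d_out x d_in) as U (d_out x r) @ V (r x d_in),
# trading one large matrix for two skinny ones.
rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 32
U = rng.standard_normal((d_out, r))
V = rng.standard_normal((r, d_in))
W = U @ V  # an exactly rank-r weight, so the factorization is lossless here

dense_params = d_out * d_in            # 262,144 parameters
factored_params = r * (d_out + d_in)   # 32,768 parameters
assert factored_params * 8 <= dense_params

x = rng.standard_normal(d_in)
# Inference uses two skinny matmuls instead of one large one.
assert np.allclose(W @ x, U @ (V @ x))
```

Real Transformer weights are only approximately low-rank, so practical schemes trade a small accuracy loss for these parameter and compute savings.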
Among different quantum algorithms, parameterized quantum circuits (PQCs) for quantum machine learning (QML) show promise on near-term devices. To facilitate QML and PQC research, a recent Python library called TorchQuantum has been released. It can construct, simulate, and train PQCs for machine learning tasks with high speed and convenient debugging support. Besides quantum for ML, we want to raise th...
Optical computing is an emerging technology for next-generation efficient artificial intelligence (AI) due to its ultra-high speed and efficiency. Electromagnetic field simulation is critical to the design, optimization, and validation of photonic devices and circuits. However, costly numerical simulation significantly hinders the scalability and t...
Analog computing has been recognized as a promising low-power alternative to digital counterparts for neural network acceleration. However, conventional analog computing mainly operates in a mixed-signal manner, and the tedious analog/digital (A/D) conversion cost significantly limits the overall system's energy efficiency. In this work, we devise an efficient a...
As deep learning models and datasets rapidly scale up, network training is extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution. Extensive research has been explored in the direction of dataset condensation, among which gradient matching achiev...
Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit desig...
In the post-Moore's era, conventional electronic digital computing platforms have encountered escalating challenges in supporting massively parallel and energy-hungry artificial intelligence (AI) workloads. Intelligent applications in data centers, edge devices, and autonomous vehicles have stringent requirements on throughput, power, and latency, wh...
Quantum Neural Network (QNN) is drawing increasing research interest thanks to its potential to achieve quantum advantage on near-term Noisy Intermediate Scale Quantum (NISQ) hardware. In order to achieve scalable QNN learning, the training process needs to be offloaded to real quantum machines instead of using exponential-cost classical simulators...
Optical phase change material (PCM) has emerged promising to enable photonic in-memory neurocomputing in optical neural network (ONN) designs. However, massive photonic tensor core (PTC) reuse is required to implement large matrix multiplication due to the limited single-core scale. The resultant large number of PCM writes during inference incurs s...
Optical neural networks (ONNs) are promising hardware platforms for next-generation artificial intelligence acceleration with ultrafast speed and low energy consumption. However, previous ONN designs are bounded by one multiply–accumulate operation per device, showing unsatisfactory scalability. In this work, we propose a scalable ONN architecture, d...
We propose a 2-bit electronic-photonic shifter based on microdisk add-drop switches with experimental demonstrations. The proposed shifter can be deployed in future high speed and energy efficient electronic-photonic arithmetic logic units.
Photonic tensor cores (PTCs) are essential building blocks for optical artificial intelligence (AI) accelerators based on programmable photonic integrated circuits. PTCs can achieve ultra-fast and efficient tensor operations for neural network (NN) acceleration. Current PTC designs are either manually constructed or based on matrix decomposition th...
With the recent advances in optical phase change material (PCM), photonic in-memory neurocomputing has demonstrated its superiority in optical neural network (ONN) designs with near-zero static power consumption, time-of-light latency, and compact footprint. However, photonic tensor cores require massive hardware reuse to implement large matrix mul...
As deep learning has shown revolutionary performance in many artificial intelligence applications, its escalating computation demand requires hardware accelerators for massive parallelism and improved throughput. The optical neural network (ONN) is a promising candidate for next-generation neurocomputing due to its high parallelism, low latency, an...
Vision transformers (ViTs) have attracted much attention for their superior performance on computer vision tasks. To address their limitations of single-scale low-resolution representations, prior work adapts ViTs to high-resolution dense prediction tasks with hierarchical architectures to generate pyramid features. However, multi-scale representat...
Silicon-photonics-based optical neural network (ONN) is a promising hardware platform that could represent a paradigm shift in efficient AI with its CMOS-compatibility, flexibility, ultra-low execution latency, and high energy efficiency. In-situ training on the online programmable photonic chips is appealing but still encounters challenging issues...
Quantum Neural Network (QNN) is a promising application towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of QNN models has a severe degradation on real quantum devices. For example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4...
Discrete cosine transform (DCT) and other Fourier-related transforms have broad applications in scientific computing. However, off-the-shelf high-performance multi-dimensional DCT (MD DCT) libraries are not readily available in parallel computing systems. Public MD DCT implementations leverage a straightforward method that decomposes the computatio...
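The decomposition the abstract refers to, computing a multi-dimensional DCT as a sequence of 1-D DCTs along each axis, can be checked numerically. The sketch below uses an unnormalized DCT-II basis and a brute-force double sum purely for verification (a minimal illustration, not the paper's library):

```python
import numpy as np

def dct_matrix(N):
    # Unnormalized DCT-II basis: C[k, n] = cos(pi * (n + 0.5) * k / N)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (n + 0.5) * k / N)

def dct2_direct(x):
    # Direct 2-D DCT-II as an explicit double sum over both dimensions.
    M, N = x.shape
    Cm, Cn = dct_matrix(M), dct_matrix(N)
    out = np.zeros((M, N))
    for u in range(M):
        for v in range(N):
            out[u, v] = np.sum(np.outer(Cm[u], Cn[v]) * x)
    return out

def dct2_separable(x):
    # Row-column method: a 1-D DCT along each axis in turn.
    Cm = dct_matrix(x.shape[0])
    Cn = dct_matrix(x.shape[1])
    return Cm @ x @ Cn.T

x = np.random.default_rng(0).standard_normal((8, 8))
assert np.allclose(dct2_direct(x), dct2_separable(x))
```

The separable (row-column) form is what makes the straightforward decomposition easy to write, and, per the abstract, also where naive implementations leave performance on the table in parallel settings.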
Deep neural networks (DNN) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully...
Quantum noise is the key challenge in Noisy Intermediate-Scale Quantum (NISQ) computers. Limited research efforts have explored a higher level of optimization by making the quantum circuit resilient to noise. We propose and experimentally implement QuantumNAS, the first comprehensive framework for noise-adaptive co-search of variational circuit and...
Integrated photonics has shown extraordinary potential in optical computing. Various optical computing modules have been investigated, among which the digital comparator is a significant element for any arithmetic logic unit. In this study, an architecture of a wavelength‐division‐multiplexing‐based (WDM‐based) electronic–photonic digital comparato...
Optical neural networks (ONNs) have demonstrated record-breaking potential in high-performance neuromorphic computing due to their ultra-high execution speed and low energy consumption. However, current learning protocols fail to provide scalable and efficient solutions to photonic circuit optimization in practical applications. In this work, we pr...
Machine learning frameworks adopt iterative optimizers to train neural networks. Conventional eager execution separates the updating of trainable parameters from forward and backward computations. However, this approach introduces nontrivial training time overhead due to the lack of data locality and computation parallelism. In this work, we propos...
We propose and experimentally demonstrate a 3-8 wavelength-division-multiplexing (WDM) based optical decoder using microring-based add-drop switches and filters. The proposed decoder has a smaller footprint and consumes lower power compared with previous designs.
Logic synthesis is a fundamental step in hardware design whose goal is to find structural representations of Boolean functions while minimizing delay and area. If the function is completely-specified, the implementation accurately represents the function. If the function is incompletely-specified, the implementation has to be true only on the care...
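The care-set distinction can be made concrete with a toy example (entirely hypothetical, not from the paper): an implementation of an incompletely-specified function only has to agree with the specification on the care set, and the don't-care inputs give the synthesizer freedom to pick a simpler circuit.

```python
# Two inputs a, b packed as the integer (a << 1) | b.
# The input 0b10 is a don't-care: the implementation may output anything there.
care_set = {0b00: 0, 0b01: 1, 0b11: 1}

def impl(ab):
    # Candidate implementation: f = a OR b. It happens to output 1 on the
    # don't-care input 0b10, which is allowed.
    return ((ab >> 1) | ab) & 1

# Correctness is checked only on the care set.
assert all(impl(x) == v for x, v in care_set.items())
```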
The recent rapid progress in integrated photonics has catalyzed the development of integrated optical computing in this post-Moore's law era. Electronic-photonic digital computing, as a new paradigm to achieve high-speed and power-efficient computation, has begun to attract attention. In this paper, we systematically investigate the optical sequent...
As machine learning models and datasets rapidly escalate in scale, the huge memory footprint impedes efficient training. Reversible operators can reduce memory consumption by discarding intermediate feature maps in forward computations and recovering them via their inverse functions in the backward propagation. They save memory at the cost of computat...
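The recover-by-inversion idea can be sketched with a RevNet-style coupling (a minimal illustration; the sub-functions f and g below are arbitrary stand-ins for learned layers):

```python
import numpy as np

# Reversible block: outputs determine inputs exactly, so intermediate
# activations need not be stored for the backward pass.
def f(x):
    return np.tanh(x)   # arbitrary sub-function

def g(x):
    return 0.5 * x      # arbitrary sub-function

def forward(x1, x2):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    x2 = y2 - g(y1)     # undo the second coupling first
    x1 = y1 - f(x2)     # then the first
    return x1, x2

rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
r1, r2 = inverse(*forward(x1, x2))
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

The inversion is exact by construction, which is what lets the backward pass recompute activations instead of caching them, at the cost of the extra f and g evaluations the abstract mentions.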
As a promising neuromorphic framework, the optical neural network (ONN) demonstrates ultra-high inference speed with low energy consumption. However, previous ONN architectures have high area overhead, which limits their practicality. In this paper, we propose an area-efficient ONN architecture based on structured neural networks, leveraging opti...
Integrated photonics offers attractive solutions for realizing combinational logic for high-performance computing. The integrated photonic chips can be further optimized using multiplexing techniques such as wavelength-division multiplexing (WDM). In this paper, we propose a WDM-based electronic-photonic switching network (EPSN) to realize the func...
Placement for very large-scale integrated (VLSI) circuits is one of the most important steps for design closure. We propose a novel GPU-accelerated placement framework, DREAMPlace, by casting the analytical placement problem equivalently to training a neural network. Implemented on top of a widely adopted deep learning toolkit, PyTorch, with custo...
The past two decades have witnessed the stagnation of the clock speed of microprocessors followed by the recent faltering of Moore’s law as nanofabrication technology approaches its unavoidable physical limit. Vigorous efforts from various research areas have been made to develop power-efficient and ultrafast computing machines in this post-Moore’s...
Placement is an important step in modern very-large-scale integrated (VLSI) designs. Detailed placement is a placement refining procedure intensively called throughout the design flow, thus its efficiency has a vital impact on design closure. However, since most detailed placement techniques are inherently greedy and sequential, they are generally...