About
66
Publications
7,315
Reads
558
Citations
Introduction
Personal website: https://zishenwan.github.io.
I’m a PhD student in Electrical and Computer Engineering at Georgia Tech. My research interests span from high-level algorithms to low-level computer architecture and VLSI, with a focus on building efficient and reliable hardware and systems for autonomous machines, edge intelligence, and ML.
Before joining Georgia Tech, I received an M.S. from Harvard University in 2020 and a B.S. from Harbin Institute of Technology in 2018, both in Electrical Engineering.
Publications (66)
Recent research on robotics has shown significant improvement, spanning algorithms, mechanics, and hardware architectures. Robots, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms poses great challenges...
Reliability and safety are critical in autonomous machine services, such as autonomous vehicles and aerial drones. In this paper, we first present an open-source Micro Aerial Vehicle (MAV) reliability analysis framework, MAVFI, to characterize the impact of transient faults on end-to-end flight metrics, e.g., flight time and success rate. Based on ou...
Quantization can help reduce the memory, compute, and energy demands of deep neural networks without significantly harming their quality. However, whether these prior techniques, traditionally applied to image-based models, work with the same efficacy on the sequential decision-making process in reinforcement learning remains an unanswered question...
This book provides a thorough overview of state-of-the-art field-programmable gate array (FPGA)-based robotic computing accelerator designs and summarizes the optimization techniques they adopt. This book consists of ten chapters, delving into the details of how FPGAs have been utilized in robotic perception, localization, planning, and multi-robot...
The next ubiquitous computing platform, following personal computers and smartphones, is poised to be inherently autonomous, encompassing technologies like drones, robots, and self-driving cars. Ensuring reliability for these autonomous machines is critical. However, current resiliency solutions make fundamental trade-offs between reliability and c...
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural networks, are facing challenges surrounding unsustainable computational trajectories, limited robustness, and a lack of explainability. To develop next-generation cognitive AI systems, neuro-symbolic AI emerges as a promising paradigm, fusing neural and sym...
Moving toward reliable autonomous machines.
The prediction accuracy of deep neural networks (DNNs) deployed at the edge can deteriorate over time due to shifts in the data distribution. For heightened robustness, it’s crucial for DNNs to continually refine and improve their predictive capabilities. However, adaptation in resource-limited edge environments is fraught with challenges: (i) new...
We introduce RobotPerf, a vendor-agnostic benchmarking suite designed to evaluate robotics computing performance across a diverse range of hardware platforms using ROS 2 as its common baseline. The suite encompasses ROS 2 packages covering the full robotics pipeline and integrates two distinct benchmarking approaches: black-box testing, which meas...
Autonomous systems, such as Unmanned Aerial Vehicles (UAVs), are expected to run complex reinforcement learning (RL) models to execute fully autonomous position-navigation-time tasks within stringent onboard weight and power constraints. We observe that reducing onboard operating voltage can benefit the energy efficiency of both the computation and...
While deep neural networks are heavily used for autonomous driving, they need to be adapted to new, unseen environmental conditions for which they were not trained. We focus on the safety-critical application of lane detection and propose a lightweight, fully unsupervised, real-time adaptation approach that only adapts the batch-normalizati...
Accurately identifying a target and tracking it at high speed using drone-mounted cameras and compute hardware has military and commercial applications. Conventional frame-based cameras and convolutional neural networks (CNNs) extract detailed spatial information to achieve high accuracy but suffer from lower throughput caused by large models...
Safety and resiliency are essential components of autonomous vehicles. In this research, we introduce ROSFI, the first robot operating system (ROS) resilience analysis methodology, to assess the effect of silent data corruption (SDC) on mission metrics. We use unmanned aerial vehicles (UAVs) as a case study to demonstrate that system-level paramete...
Robotic computing has reached a tipping point, with a myriad of robots (e.g., drones, self-driving cars, logistic robots) being widely applied in diverse scenarios. The continuous proliferation of robotics, however, critically depends on efficient computing substrates, driven by real-time requirements, robotic size-weight-and-power constraints, cyb...
We introduce an early-phase bottleneck analysis and characterization model called the F-1 for designing computing systems that target autonomous Unmanned Aerial Vehicles (UAVs). The model provides insights by exploiting the fundamental relationships between various components in the autonomous UAV, such as sensor, compute, and body dynamics. To gua...
Swarm intelligence is being increasingly deployed in autonomous systems, such as drones and unmanned vehicles. Federated reinforcement learning (FRL), a key swarm intelligence paradigm where agents interact with their own environments and cooperatively learn a consensus policy while preserving privacy, has recently shown potential advantages and ga...
As we march towards the age of ubiquitous intelligence, we note that AI and intelligence are progressively moving from the cloud to the edge. The success of Edge-AI hinges on innovative circuits and hardware that can enable inference and limited learning in resource-constrained edge autonomous systems. This paper introduces a series of ultra-lo...
Simultaneous Localization and Mapping (SLAM) estimates agents' trajectories and constructs maps, and localization is a fundamental kernel in autonomous machines at all computing scales, from drones, AR, and VR to self-driving cars. In this work, we present an energy-efficient and runtime-reconfigurable FPGA-based accelerator for robotic localization. W...
Learning-based navigation systems are widely used in autonomous applications, such as robotics, unmanned vehicles, and drones. Specialized hardware accelerators have been proposed to deliver high performance and energy efficiency for such navigational tasks. However, transient and permanent faults are increasing in hardware systems and can catastrophically...
We present a bottleneck analysis tool for designing compute systems for autonomous Unmanned Aerial Vehicles (UAVs). The tool provides insights by exploiting the fundamental relationships between various components in the autonomous UAV, such as sensor, compute, and body dynamics. To guarantee safe operation while maximizing the performance (e.g., velocit...
Aerial autonomous machines (drones) have a plethora of promising applications and use cases. While the popularity of these autonomous machines continues to grow, there are many challenges, such as endurance and agility, that could hinder the practical deployment of these machines. The closed-loop control frequency must be high to achieve high agilit...
Over our past few years of commercial deployment experience, we have identified localization as a critical task in autonomous machine applications and a great acceleration target. In this paper, based on the observation that the visual frontend is a major performance and energy consumption bottleneck, we present our design and implementation of an energy...
Stereo matching is a critical task for robot navigation and autonomous vehicles, providing the depth estimation of surroundings. Among all stereo matching algorithms, Efficient Large-scale Stereo (ELAS) offers one of the best tradeoffs between efficiency and accuracy. However, due to the inherent iterative process and unpredictable memory access pa...
Building domain-specific architectures for autonomous aerial robots is challenging due to a lack of systematic methodology for designing onboard compute. We introduce a novel performance model called the F-1 roofline to help architects understand how to build a balanced computing system for autonomous aerial robots considering both its cyber (senso...
Before we delve into utilizing FPGAs for accelerating robotic workloads, in this chapter we first provide the background of FPGA technologies so that readers without prior knowledge can gain a basic understanding of what an FPGA is and how an FPGA works. We also introduce partial reconfiguration, a technique that exploits the flexibility of FPGA...
FPGAs provide rich I/O interfaces, flexibility, and the capability to handle complex workloads with high performance and low energy consumption, making them ideal compute substrates for autonomous driving systems. In this chapter, we present a detailed case study on building a commercial autonomous driving compute system, especially the c...
Cameras are widely used in intelligent robot systems because they are lightweight and provide rich information for perception. Cameras can be used to complete a variety of basic tasks of intelligent robots, such as visual odometry (VO), place recognition, object detection, and recognition. With the development of convolutional neural networks (CNNs), we ca...
The commercialization of autonomous robots is a thriving sector and is likely to be the next major compute demand driver, after PCs, cloud computing, and mobile computing. After examining various compute substrates for robotic computing, we believe that FPGAs are currently the best compute substrate for robotic applications for several reasons: first,...
Thus far, we have focused on the utilization of FPGAs in single-robot applications. In this chapter, we consider collaborative exploration through a team of robots, in which the robots share information with each other or even with the infrastructure [301]. In particular, we discuss how FPGAs can be utilized to accelerate multi-robot acceleration work...
Localization, i.e., ego-motion estimation, is one of the most fundamental tasks for autonomous machines, in which an agent calculates its own position and orientation in a given frame of reference, i.e., a map. For general robotic software stacks, localization is the building block of many tasks. Knowing the translational pose enables a robot or...
Motion planning is the module that computes how a robot or autonomous vehicle maneuvers itself. The task of motion planning is to generate a trajectory without colliding with any obstacles and send it to the feedback control for physical robot control execution. The planned trajectory is usually specified and represented as a sequence of planned trajec...
The last decade has seen significant progress in the development of robotics, spanning algorithms, mechanics, and hardware platforms. Various robotic systems, like manipulators, legged robots, unmanned aerial vehicles, and self-driving cars, have been designed for search and rescue [1, 2], exploration [3, 4], package delivery [5], entertainment [...
Perception is related to many robotic applications where sensory data and artificial intelligence techniques are involved. The goal of perception is to sense the dynamic environment surrounding the robot and to build a reliable and detailed representation of this environment based on sensory data. Since all subsequent localization, planning, and co...
Due to radiation tolerance requirements, the compute power of space-grade ASICs is usually decades behind the state-of-the-art commercial off-the-shelf processors. On the other hand, space-grade FPGAs deliver high reliability, adaptability, processing power, and energy efficiency, and are expected to close the two-decade performance gap between com...
We introduce the "Formula-1" (F-1) roofline model to understand the role of computing in aerial autonomous machines. The model provides insights by exploiting the fundamental relationships between various components in an aerial robot, such as sensor framerate, compute performance, and body dynamics (physics). The model serves as a tool that can ai...
Time-lens technology is of significant interest in signal processing and optical communication. The impacts of group velocity dispersion (GVD) on ultrafast pulse shaping in a time-lens system based on four-wave mixing are explored in this paper. The output signals of temporal magnification and time-to-frequency conversion under different GVDs are t...
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for...