Huazhong Yang

Tsinghua University | TH · Department of Electronic Engineering

PhD

693
Publications
90,115
Reads
10,270
Citations

Publications (693)
Preprint
Full-text available
Graph convolutional network (GCN), an emerging algorithm for graph computing, has achieved promising performance in graph-structure tasks. To achieve acceleration for data-intensive and sparse graph computing, ASICs such as GCNAX have been proposed for efficient execution of aggregation and combination in GCN. GCNAX reducing 8x DRAM accesses compare...
Preprint
One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency. However, previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training because of the excessive sharing of operation parameters (i.e., large sha...
Preprint
Full-text available
Intellectual property (IP) piracy has become a non-negligible problem as the integrated circuit (IC) production supply chain is becoming increasingly globalized and separated, which enables attacks by potentially untrusted parties. Logic locking is a widely adopted method to lock the circuit module with a key and prevent hackers from cracking it. T...
Preprint
Full-text available
Computing-in-memory (CiM) is a promising technique to achieve high energy efficiency in data-intensive matrix-vector multiplication (MVM) by relieving the memory bottleneck. Unfortunately, due to the limited SRAM capacity, existing SRAM-based CiM needs to reload the weights from DRAM in large-scale networks. This undesired fact weakens the energy e...
Preprint
Full-text available
Compute-in-memory (CiM) is a promising approach to improving the computing speed and energy efficiency in data-intensive applications. Beyond existing CiM techniques of bitwise logic-in-memory operations and dot product operations, this paper extends the CiM paradigm with FAST, a new shift-based in-memory computation technique to handle high-concurre...
Preprint
Full-text available
Autonomous exploration and mapping of unknown terrains employing single or multiple robots is an essential task in mobile robotics and has therefore been widely investigated. Nevertheless, given the lack of unified data sets, metrics, and platforms to evaluate the exploration approaches, we develop an autonomous robot exploration benchmark entitled...
Preprint
Full-text available
Sparse Matrix-Matrix Multiplication (SpMM) has served as a fundamental component in various domains. Many previous studies exploit GPUs for SpMM acceleration because GPUs provide high bandwidth and parallelism. We point out that a static design does not always improve the performance of SpMM on different input data (e.g., >85% performance loss with...
Article
Thin-film transistor (TFT) technology has attracted enormous interest recently for its great potential in a wide range of edge computing applications, benefiting from its large-area, low-cost, flexible fabrication and good integration with sensors and displays. With the support of in-situ processing of sensor data, TFT-based edge systems show their advantag...
Article
In recent years, convolutional neural networks (CNNs) have achieved significant advancements in various fields. However, the computation and storage overheads of CNNs are overwhelming for IoT devices. Both network pruning algorithms and hardware accelerators have been introduced to empower CNN inference at edge. Network pruning algorithms reduce th...
Article
As one type of associative memory, content-addressable memory (CAM) has become a critical component in several applications, including caches, routers, and pattern matching. Compared with the conventional CAM that could only deliver a “matched or not-matched” result, emerging multilevel CAM (ML-CAM) is capable of delivering “the degree of match” wi...
Article
Owing to its electrical isolation, fewer switches, and easy control, the transformer-based integrated equalizer is promising for commercial applications. However, its small number of balancing energy transmission channels limits its application to long battery strings. Therefore, an integrated voltage equalizer with a modularized arc...
Preprint
Discovering hazardous scenarios is crucial in testing and further improving driving policies. However, conducting efficient driving policy testing faces two key challenges. On the one hand, the probability of naturally encountering hazardous scenarios is low when testing a well-trained autonomous driving strategy. Thus, discovering these scenarios...
Article
An energy-efficient convolutional neural network (CNN) accelerator is proposed for video applications. Previous works exploited the sparsity of differential (Diff) frame activations, but the improvement is limited because much of the Diff-frame data is small but non-zero. Processing irregular sparse data also leads to low hardware utilization. To solve th...
Article
With the rapid evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When deploying neural networks (NNs) onto devices in complex environments, there are various types of possible faults: soft errors caused by cosmic radiation and radioactive impurities, voltage in...
Article
Computing-in-memory (CIM) is a new architecture which is more energy-efficient than the Von Neumann architecture due to the fact that it performs calculation in the memory units which can reduce a large amount of data movement. Nowadays, CIM with non-volatile memory (nvCIM), such as resistive random access memory (RRAM), has become a research front...
Preprint
Full-text available
We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current task...
Article
Ternary content addressable memory (TCAM) is one type of associative memory and has been widely used in caches, routers, and many other mapping-aware applications. While the conventional SRAM-based TCAM is high speed and bulky, there have been denser but slower and less reliable nonvolatile TCAMs using nonvolatile memory (NVM) devices. Meanwhile, s...
Article
Full-text available
Computerized interpretation of electrocardiogram plays an important role in daily cardiovascular healthcare. However, inaccurate interpretations lead to misdiagnoses and delay proper treatments. In this work, we built a high-quality Chinese 12-lead resting electrocardiogram dataset with 15,357 records, and called for a community effort to improve t...
Article
Deep learning-based intelligent electrocardiogram (ECG) diagnosis algorithms heavily rely on large annotated datasets. Unfortunately, in the context of ECG diagnosis, privacy issues and the high cost of data annotations lead to a shortage of ECG datasets which severely limits the performance of the state-of-the-art ECG diagnosis algorithms. In this...
Article
The monitoring of bridge dynamic displacement under normal operation conditions has been a vital need for the assessment of the serviceability of bridges. Traditionally, it was mostly accomplished by the Global Navigation Satellite System (GNSS). However, the poor measurement accuracy and low sampling rate of the GNSS limit the use of monitored dis...
Conference Paper
This paper investigates reconfigurable physical unclonable function (PUF) design by exploiting the polarization switching variation and stochasticity in ferroelectric field-effect-transistors (FeFETs). The proposed PUFs include 1-transistor/cell (1T/C) and 2T/C designs. The denser 1T/C PUF splits random ‘0’ and ‘1’ states using a tactically pre-def...
Article
The recent development of Internet-of-Things (IoT) technologies has enabled smaller and lower-cost sensor nodes, motivating the deployment of more flexible and scalable sensor networks for infrastructure monitoring applications. However, because these nodes tend to be affected by environmental conditions and aging, they are prone to long-term drift...
Article
Always-on intelligent visual perception applications are widely deployed at the edge in the AIoT era. To eliminate the power costs of data conversion and transmission, this paper proposes Senputing, an ultra-low-power processing-in-sensor chip that completely fuses sensing and computing together for a BNN-based hierarchical processing system. Thi...
Article
With the down-scaling of CMOS technology, the design complexity of very large-scale integration (VLSI) is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 1990s, the recent breakthrough of ML and the increasing complexity of EDA tasks have aroused more interest...
Article
Full-text available
In recent years, reinforcement learning (RL) has been widely used to solve multi-agent navigation tasks, and a high-fidelity level for the simulator is critical to narrow the gap between simulation and real-world tasks. However, high-fidelity simulators have high sampling costs and bottleneck the training model-free RL algorithms. Hence, we propose...
Article
Structured light (SL) based three-dimensional (3D) imaging technology has been widely employed in many fields of computer vision. However, currently available SL based depth sensors are sensitive to imaging noises, which severely limits the performance of subsequent advanced vision tasks. In this paper, we propose a robust and practical SL illumina...
Article
Multi-Robot Exploration (MR-Exploration) is a primary task providing the location and map for many multi-robot applications. To improve system performance, recent research has introduced Convolutional Neural Networks (CNNs) into critical components of MR-Exploration, such as Feature-point Extraction (FE) and Place Recognition (PR). This CNN-based...
Article
Electrocardiography (ECG) arrhythmia heartbeat classification is essential for automatic cardiovascular diagnosis systems. However, the enormous differences in ECG signals among individuals and the high price of labeled data have brought huge challenges for current classification algorithms based on deep neural networks and prevented these models from a...
Preprint
Adversarial attacks pose high security risks to modern deep learning systems. Adversarial training can significantly enhance the robustness of neural network models by suppressing the non-robust features. However, the models often suffer from significant accuracy loss on clean data. Ensemble training methods have emerged as promising solut...
Article
Always-on keyword spotting (KWS), which detects wake-up words, has become an indispensable module in voice interaction systems. However, ultra-low-power embedded devices impose strict requirements on the energy consumption, latency, and recognition accuracy of KWS. In this work, we propose a near-sensor processing architecture of feature-config...
Preprint
Full-text available
Compute-in-memory (CiM) is a promising approach to alleviating the memory wall problem for domain-specific applications. Compared to current-domain CiM solutions, charge-domain CiM shows the opportunity for higher energy efficiency and resistance to device variations. However, the area occupation and standby leakage power of existing SRAM-based char...
Preprint
Full-text available
Ternary content addressable memory (TCAM) has been a critical component in caches, routers, etc., in which density, speed, power efficiency, and reliability are the major design targets. There have been the conventional low-write-power but bulky SRAM-based TCAM design, and also denser but less reliable or higher-write-power TCAM designs using nonvo...
Preprint
Full-text available
With the down-scaling of CMOS technology, the design complexity of very large-scale integration (VLSI) is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 1990s, the recent breakthrough of ML and the increasing complexity of EDA tasks have aroused more inte...
Article
Compute-in-memory (CiM) is a promising approach to alleviating the memory wall problem for domain-specific applications. Compared to current-domain CiM solutions, charge-domain CiM shows the opportunity for higher energy efficiency and resistance to device variations. However, the area occupation and standby leakage power of existing SRAM-based char...
Preprint
Full-text available
Convolutional neural networks (CNNs) are vulnerable to adversarial examples, and studies show that increasing the model capacity of an architecture topology (e.g., width expansion) can bring consistent robustness improvements. This reveals a clear robustness-efficiency trade-off that should be considered in architecture design. Recent studies have...
Chapter
Budgeted pruning is the problem of pruning under resource constraints. In budgeted pruning, how to distribute the resources across layers (i.e., sparsity allocation) is the key problem. Traditional methods solve it by discretely searching for the layer-wise pruning ratios, which lacks efficiency. In this paper, we propose Differentiable Sparsity Al...
Article
Convolutional neural network (CNN) has been widely deployed in various processors for intelligent visual signal processing. However, the large amount of activations and weights in CNN causes huge power consumption on SRAM access. Data-adaptive SRAM design is a widely-studied method to reduce SRAM reading power based on the utilization of data patte...
Chapter
This work proposes a novel Graph-based neural ArchiTecture Encoding Scheme, a.k.a. GATES, to improve the predictor-based neural architecture search. Specifically, different from existing graph-based schemes, GATES models the operations as the transformation of the propagating information, which mimics the actual data processing of neural architectu...
Preprint
Neural Architecture Search (NAS) has received extensive attention due to its capability to discover neural network architectures in an automated manner. aw_nas is an open-source Python framework implementing various NAS algorithms in a modularized manner. Currently, aw_nas can be used to reproduce the results of mainstream NAS algorithms of various...
Preprint
Full-text available
Binary Neural Networks (BNNs) have received significant attention due to their promising efficiency. Currently, most BNN studies directly adopt widely-used CNN architectures, which can be suboptimal for BNNs. This paper proposes a novel Binary ARchitecture Search (BARS) flow to discover superior binary architecture in a large design space. Specific...