About
Publications: 306
Reads: 25,245
Citations: 7,618
Publications (306)
Recently, Diffusion Transformers (DiTs) have emerged as a dominant architecture in video generation, surpassing U-Net-based models in terms of performance. However, the enhanced capabilities of DiTs come with significant drawbacks, including increased computational and memory costs, which hinder their deployment on resource-constrained devices. Cur...
Low-bit model quantization for image super-resolution (SR) is a longstanding task that is renowned for its surprising compression and acceleration ability. However, accuracy degradation is inevitable when compressing the full-precision (FP) model to ultra-low bit widths (2~4 bits). Experimentally, we observe that the degradation of quantization is...
Large language models (LLMs) have achieved remarkable success in natural language processing (NLP) tasks, yet their substantial memory requirements present significant challenges for deployment on resource-constrained devices. Singular Value Decomposition (SVD) has emerged as a promising compression technique for LLMs, offering considerable reducti...
While super-resolution (SR) methods based on diffusion models (DM) have demonstrated inspiring performance, their deployment is impeded by heavy memory and computation demands. Recent research applies two kinds of methods to compress or accelerate the DM. One is to compress the DM into 1-bit, aka binarization, alleviating the storage and co...
The map matching of cellular data reconstructs real trajectories of users by exploiting the sequential connections between mobile devices and cell towers. The difficulty in obtaining paired cellular-GPS data and the cellular variation compromise the accuracy and reliability of existing map matching approaches. In this paper, we propose a novel uns...
There has been a growing interest in applying machine learning to real-world tasks. However, due to the black-box nature of machine learning models, it is crucial to 1) verify important properties of a model and 2) understand the reasons behind a model’s prediction before deploying them in a production environment. Existing approaches typically han...
3D Gaussian splatting (3DGS) suggests the use of explicit point-based 3D representations for high-fidelity novel view synthesis, with training and rendering speeds that are better than prior neural radiance fields. However, 3DGS relies heavily on synthetic point clouds generated by structure from motion (SfM) or multi view stereo (MVS) techniques,...
The technology of facial expression reconstruction has paved the way for various face-centric applications such as virtual reality (VR) modeling, human-computer interaction, and affective computing. Existing vision-based solutions present challenges in privacy leakage and poor lighting conditions. In this paper, we introduce a nonintrusive facial e...
As one of the most essential accessories, headsets have been widely used in common online conversations. The metal coil vibration patterns of headset speakers/microphones have been proven to be highly correlated with the speaker-produced/microphone-received sound. This paper presents an online conversation eavesdropping system, RFSpy, which uses o...
User authentication is evolving with expanded applications and innovative techniques. New authentication approaches utilize RF signals to sense specific human characteristics, offering a contactless and nonintrusive solution. However, these RF signal-based methods struggle with challenges in open-world scenarios, i.e., dynamic environments, daily b...
The emergence of the Simultaneous Wireless Information and Power Transfer (SWIPT) technology makes it possible to achieve energy sustainability in the Wireless Sensor Networks (WSNs). However, little attention was paid to the large-scale SWIPT-enabled WSNs. To this end, we synthesize the network Energy Efficiency (EE), energy sustainability conditi...
With the improvement of edge-based autonomous systems such as mobile Industrial IoT (IIoT) networks, edge devices can capture and upload videos with increasing bitrates. Massive edge-computing end nodes are eager for adequate multimedia data to satisfy the requirements of real-time video services. However, existing encoding standards for video servi...
Recent advancements in computational chemistry have leveraged the power of transformer-based language models, such as MoLFormer, pre-trained using a vast amount of simplified molecular-input line-entry system (SMILES) sequences, to understand and predict molecular properties and activities, a critical step in fields like drug discovery and materia...
Large Language Models (LLMs) have greatly pushed forward advancements in natural language processing, yet their high memory and computational demands hinder practical deployment. Binarization, as an effective compression technique, can shrink model weights to just 1 bit, significantly reducing the high demands on computation and memory. However, cu...
Although substantial progress has been made in automatically verifying whether distributed routing configurations conform to certain requirements, diagnosing and repairing configuration errors remains manual and time-consuming. To fill this gap, we propose S^2Sim, a novel system for automatic routing configuration diagnosis and repair. Our key insi...
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search que...
Both Transformer and Graph Neural Networks (GNNs) have been employed in the domain of learning to rank (LTR). However, these approaches adhere to two distinct yet complementary problem formulations: ranking score regression based on query-webpage pairs, and link prediction within query-webpage bipartite graphs, respectively. While it is possible to...
Combination with the Simultaneous Wireless Information and Power Transfer (SWIPT) technology is expected to be a promising solution to the issue of energy constraint in Wireless Sensor Networks (WSN). However, little attention is paid to the energy sustainability in SWIPT-enabled WSN with limited Mobile Energy Access Point (MEAP) configurations. To th...
Sparse mobile crowdsensing, driven by the increasing ubiquity of smartphones, has emerged as a popular method for data collection. This approach can reconstruct the whole sensing map by inferring missing temporal data from their spatial-temporal correlations to sparse samples. Traditional methods face challenges in capturing non-linear correlations...
Cloud services have shifted from monolithic designs to microservices running on cloud-native infrastructure with monitoring systems to ensure service level agreements (SLAs). However, traditional monitoring systems no longer meet the demands of cloud-native monitoring. In Alibaba’s “double eleven” shopping festival, it is observed that the monitor...
Both Transformer and Graph Neural Networks (GNNs) have been used in learning to rank (LTR), however, they adhere to two distinct yet complementary problem formulations, i.e., ranking score regression based on query-webpage pairs and link prediction within query-webpage bipartite graphs, respectively. Though it is possible to pre-train GNNs or Trans...
While Learning to Rank (LTR) is widely employed in web searches to prioritize pertinent webpages from the retrieved contents based on input queries, traditional LTR models stumble over two principal stumbling blocks leading to subpar performance: 1) the lack of well-annotated query-webpage pairs with ranking scores to cover search queries of variou...
Wireless Sensor Networks (WSNs), which provide perception services for the Internet of Things (IoT) infrastructure, usually suffer from constrained energy resources. However, the fact that the data collected by WSNs often exhibit spatial-temporal correlation leads to the waste of energy. In addition, load imbalance among sensor nodes also makes ene...
Low-bit quantization has become widespread for compressing image super-resolution (SR) models for edge deployment, which allows advanced SR models to enjoy compact low-bit parameters and efficient integer/bitwise constructions for storage compression and inference acceleration, respectively. However, it is notorious that low-bit quantization degrad...
Advanced diffusion models (DMs) perform impressively in image super-resolution (SR), but the high memory and computational costs hinder their deployment. Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating DMs. Nonetheless, due to the model structure and the multi-step iterative attribute of DMs, existing...
Pretrained language models have shown promise in analysing nucleotide sequences, yet a versatile model excelling across diverse tasks with a single pretrained weight set remains elusive. Here we introduce RNAErnie, an RNA-focused pretrained model built upon the transformer architecture, employing two simple yet effective strategies. First, RNAErnie...
Three-dimensional convolutional neural networks (3D-CNNs) and fully connected long short-term memory networks (FC-LSTMs) have been demonstrated to be powerful non-intrusive approaches to fall detection. However, feature extraction in 3D-CNN-based methods requires a large-scale dataset. Meanwhile, the deployment of FC-LSTM to expand the input into...
Diffusion models (DMs) have recently been introduced in image deblurring and exhibited promising performance, particularly in terms of details reconstruction. However, the diffusion model requires a large number of inference iterations to recover the clean image from pure Gaussian noise, which consumes massive computational resources. Moreover, the...
Flow monitoring is widely applied in software-defined networks (SDNs) for monitoring network performance. Especially, detecting heavy hitters can prevent the Distributed Denial of Service (DDoS) attack. However, many existing approaches fall into one of two undesirable extremes: (i) inefficient collection where only accuracy is concerned in the met...
While learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from the retrieved contents based on input queries, traditional LTR models stumble over two principal stumbling blocks leading to subpar performance: (1) the lack of well-annotated query-webpage pairs with ranking scores to cover search queries of vario...
Blockchain-based vehicular edge computing (VEC) is regarded as a promising computing paradigm that can enhance the computing capabilities of mobile vehicles while ensuring security during task offloading. However, the blockchain consensus for secure task offloading inevitably increases the communication and computation resource consumption. More im...
Wireless train communication network (WLTCN) is an emerging technology for enabling intelligent rail vehicles. It is responsible for providing train control services (TCS), passenger information services (PIS), and train sensing services (TSS). These services within WLTCN have notably different quality of service (QoS) requirements from traditional...
With the rapid development of Internet of Things (IoT) technologies, side-channel sound sensing has become a key research area. Unlike traditional sound sensing methods that rely on direct acoustic signal recording, side-channel sound sensing methods utilize various indirect emissions from electronic devices, such as mechanical vibrations or electr...
While learning to rank (LTR) has been widely used in web search to prioritize most relevant webpages among the retrieved contents subject to the input queries, the traditional LTR models fail to deliver decent performance due to two main reasons: 1) the lack of well-annotated query-webpage pairs with ranking scores to cover search queries of vari...
Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional pr...
The exponential growth of Massive Open Online Courses (MOOCs) has increased the need for advanced models for personalized Online Education Services (OES). Although existing solutions successfully recommend MOOC courses via deep learning models, they generate weak “course embeddings” with original profiles, which contain noisy and few enrolled courses. On...
In this chapter, we review state-of-the-art research related to WiFi signal-based user authentication. We first survey mainstream user authentication approaches. Then, we investigate WiFi signal sensing research, and further review the latest WiFi signal-based user authentication work. Finally, we give a summary of existing research.
The development of smart homes has advanced the concept of user authentication to not only protecting user privacy but also facilitating personalized services to users. Along this direction, we propose to integrate user authentication with human-computer interactions between users and smart household appliances through widely-deployed WiFi infrastr...
Existing works utilize WiFi signals to capture a user’s activities for non-intrusive and device-free user authentication, but multi-user authentication remains a challenging task. In this chapter, we present a multi-user authentication system, MultiAuth, which can authenticate multiple users with a single commodity WiFi device. The key idea is to p...
User authentication is an essential mechanism to support various secure accesses. Although recent studies have shown initial success on authenticating users with human activities or gestures using WiFi, they rely on predefined body gestures and perform poorly when meeting undefined body gestures. This chapter aims to enable WiFi-based user authenti...
Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation ca...
While traditional Learning to Rank (LTR) models use query-webpage pairs to perform regression tasks to predict the ranking scores, they usually fail to capture the structure of interactions between queries and webpages over an extremely large bipartite graph. In recent years, Graph Convolutional Neural Networks (GCNs) have demonstrated their unique...
Artificial intelligence technology has developed rapidly in various fields and has been widely used. Education and teaching are also areas in which artificial intelligence is applied. Research on artificial intelligence-enabled (AI-enabled) education and teaching is emerging, such as educational data mining and intelligent assisted teaching systems...
Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capabilit...
While Learning to Rank (LTR) models on top of transformers have been widely adopted to achieve decent performance, it is still challenging to train the model with sufficient data, as only an extremely small number of query-webpage pairs can be annotated compared with the trillions of webpages available online and the billions of web search queries issued every day. In t...
Eavesdropping on human voice is one of the most common but harmful threats to personal privacy. Glasses are in direct contact with human face, which could sense facial motions when users speak, so human speech contents could be inferred by sensing the movements of glasses. In this paper, we present a live voice eavesdropping method, RF-Mic, which u...
Snapshot compressive imaging (SCI) cameras compress high-speed videos or hyperspectral images into measurement frames. However, decoding the data frames from measurement frames is compute-intensive. Existing state-of-the-art decoding algorithms suffer from low decoding quality or heavy running time or both, which are not practical for real-time app...
In this paper, we present a hybrid X-shaped vision Transformer, named Xformer, which performs notably on image denoising tasks. We explore strengthening the global representation of tokens from different scopes. In detail, we adopt two types of Transformer blocks. The spatial-wise Transformer block performs fine-grained local patches interactions a...
Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Because of the quadratic computational complexity of the self-attention (SA) in Transformer, existing methods tend to adopt SA in a local region to reduce overheads. However, the local design restricts the global context exploitation, which is crucial for acc...
In the last decade, MIMO spatial multiplexing and distributed beamforming have played a significant role in improving data throughput through cooperative transmission. They have been widely used in wireless communication, especially in 6G. However, the distributed uplink beamforming is still an open problem in highly dynamic environments. However, the propo...
Recent years have witnessed a mounting increase in the loss of lives and property resulting from inefficient and unsafe traffic management. However, traditional Intelligent Transportation Systems (ITS) cannot meet the strict efficiency requirements of transportation management, especially in terms of latency. The emergence of fog compu...
Task offloading helps reduce the delay and energy consumption of applications in Next Generation (NG) wireless networks. However, existing task offloading approaches are unable to exhibit low complexity and stable performance. To this end, a novel Federated Hierarchical Deep Deterministic Policy Gradient (FHDD...
Gait-based user authentication schemes have been widely explored because of their ability to sense non-invasively and avoid replay attacks. However, existing gait-based user authentication methods are environment-dependent. In this paper, we present an environment-independent gait-based user authentication system, RFPass, which can identify diff...
Recently, Transformer architecture has been introduced into image restoration to replace convolution neural network (CNN) with surprising results. Considering the high computational complexity of Transformer with global attention, some methods use the local square window to limit the scope of self-attention. However, these methods lack direct inter...
Both the multiple sources of the available in-the-wild datasets and the noisy information in images pose huge challenges for discriminating subtle distinctions between combinations of regional expressions in facial expression recognition (FER). Although deep learning-based approaches have made substantial progress in FER in recent years, small-sca...
Recently, Transformer-based image restoration networks have achieved promising improvements over convolutional neural networks due to parameter-independent global interactions. To lower computational cost, existing works generally limit self-attention computation within non-overlapping windows. However, each group of tokens are always from a dense...
Indoor vehicle localization is an underlying technology for realizing Autonomous Valet Parking (AVP), which demands high accuracy and reliability. However, existing localization technologies, such as GPS, WiFi, Bluetooth, suffer from either low availability or high cost, which are not practical in the real world. In order to put AVP into practice,...
This special issue of the IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING (T-ASE) focuses on state-of-the-art achievements and applications in the general area of artificial intelligence in automation for autonomous unmanned systems. As Guest Editors, we are very pleased to present the selected 16 articles, whose topics...
The Internet of Things (IoT) is impacting the world’s connectivity landscape. More and more IoT devices are connected, bringing many benefits to our daily lives. However, the influx of IoT devices poses non-trivial challenges for the existing cloud-based computing paradigm. In the cloud-based architecture, a large amount of IoT data is transferred...
Practical and accurate localization systems are important to mobile targets that enable promising services such as navigation and augmented reality. With the proliferation of WiFi, existing WiFi-based localization systems have leveraged RSSI, fingerprints, landmarks, time of arrival, or angle of arrival to locate targets, while no related work pays...