Running Zhao’s research while affiliated with The University of Hong Kong and other places

What is this page?


This page lists works of an author who doesn't have a ResearchGate profile or hasn't added the works to their profile yet. It is automatically generated from public (personal) data to further our legitimate goal of comprehensive and accurate scientific recordkeeping. If you are this author and want this page removed, please let us know.

Publications (14)


DietGlance: Dietary Monitoring and Personalized Analysis at a Glance with Knowledge-Empowered AI Assistant
  • Preprint

February 2025 · 18 Reads · Running Zhao · [...] · Edith C. H. Ngai

Growing awareness of wellness has prompted people to consider whether their dietary patterns align with their health and fitness goals. In response, researchers have introduced various wearable dietary monitoring systems and dietary assessment approaches. However, these solutions are either limited to identifying foods with simple ingredients or insufficient in providing analysis of individual dietary behaviors with domain-specific knowledge. In this paper, we present DietGlance, a system that automatically monitors dietary intake in daily routines and delivers personalized analysis from knowledge sources. DietGlance first detects ingestive episodes from multimodal inputs using eyeglasses, capturing privacy-preserving meal images of various dishes being consumed. Based on the inferred food items and consumed quantities from these images, DietGlance further provides nutritional analysis and personalized dietary suggestions, empowered by a retrieval-augmented generation module grounded in a reliable nutrition library. A short-term user study (N=33) and a four-week longitudinal study (N=16) demonstrate the usability and effectiveness of DietGlance.
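
The retrieval-augmented analysis step can be pictured with a minimal sketch: detected food items are matched against a nutrition library by embedding similarity, and the retrieved facts are packed into a prompt for the language model that writes suggestions. The embedding function, library entries, and prompt below are hypothetical placeholders, not the actual DietGlance implementation.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) step described above.
# The embedding model, library schema, and prompt are hypothetical placeholders.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy text embedding (stand-in for a real encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Hypothetical nutrition library: free-text facts keyed by food item.
NUTRITION_LIBRARY = {
    "grilled salmon": "Salmon (100 g): ~208 kcal, 20 g protein, rich in omega-3.",
    "white rice":     "Cooked white rice (100 g): ~130 kcal, mostly carbohydrate.",
    "broccoli":       "Broccoli (100 g): ~34 kcal, high in fiber and vitamin C.",
}
LIB_KEYS = list(NUTRITION_LIBRARY)
LIB_VECS = np.stack([embed(k) for k in LIB_KEYS])

def retrieve(food_item: str, top_k: int = 2) -> list[str]:
    """Return the top-k nutrition facts most similar to the detected food item."""
    scores = LIB_VECS @ embed(food_item)
    return [NUTRITION_LIBRARY[LIB_KEYS[i]] for i in np.argsort(-scores)[:top_k]]

def build_prompt(food_items: dict[str, float]) -> str:
    """Assemble a grounded prompt for the downstream model that writes suggestions."""
    facts = [f for item in food_items for f in retrieve(item)]
    meal = ", ".join(f"{qty} g {item}" for item, qty in food_items.items())
    return (f"Meal: {meal}\nRelevant nutrition facts:\n- " + "\n- ".join(facts)
            + "\nGive a brief personalized dietary suggestion.")

print(build_prompt({"grilled salmon": 150, "white rice": 200}))
```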


Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection

December 2024 · 13 Reads

Intrusion Detection Systems (IDS) are crucial for safeguarding digital infrastructure. In dynamic network environments, both threat landscapes and normal operational behaviors are constantly changing, resulting in concept drift. While continual learning mitigates the adverse effects of concept drift, insufficient attention to drift patterns and excessive preservation of outdated knowledge can still hinder the IDS's adaptability. In this paper, we propose SSF (Strategic Selection and Forgetting), a novel continual learning method for IDS, providing continuous model updates with a constantly refreshed memory buffer. Our approach features a strategic sample selection algorithm to select representative new samples and a strategic forgetting mechanism to drop outdated samples. The proposed strategic sample selection algorithm prioritizes new samples that cause the 'drifted' pattern, enabling the model to better understand the evolving landscape. Additionally, we introduce strategic forgetting upon detecting significant drift by discarding outdated samples to free up memory, allowing the incorporation of more recent data. SSF captures evolving patterns effectively and ensures the model stays aligned with changing data patterns, significantly enhancing the IDS's adaptability to concept drift. The state-of-the-art performance of SSF on the NSL-KDD and UNSW-NB15 datasets demonstrates its superior adaptability to concept drift for network intrusion detection.
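
A toy version of the selection-and-forgetting idea is sketched below: a bounded memory buffer admits only the new samples the current model handles worst (a crude proxy for the 'drifted' pattern) and, when drift is flagged, drops a fraction of its oldest entries. The scoring and forgetting rules are simplifications, not the paper's exact criteria.

```python
# Illustrative selection-and-forgetting memory buffer in the spirit of SSF.
# Per-sample loss is used as a simplified drift score; the forgetting rule is a stand-in.
from collections import deque
import numpy as np

class DriftAwareBuffer:
    def __init__(self, capacity: int = 1000, select_top: float = 0.2, forget_frac: float = 0.5):
        self.buf = deque(maxlen=capacity)   # (sample, label) pairs, oldest first
        self.select_top = select_top        # fraction of new samples to admit
        self.forget_frac = forget_frac      # fraction to drop when drift is detected

    def select(self, samples, labels, losses):
        """Admit only the new samples the current model handles worst (most 'drifted')."""
        k = max(1, int(len(samples) * self.select_top))
        for i in np.argsort(-np.asarray(losses))[:k]:
            self.buf.append((samples[i], labels[i]))

    def forget(self):
        """On significant drift, discard the oldest samples to make room for recent data."""
        for _ in range(int(len(self.buf) * self.forget_frac)):
            self.buf.popleft()

    def replay(self):
        return list(self.buf)
```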


USpeech: Ultrasound-Enhanced Speech with Minimal Human Effort via Cross-Modal Synthesis

October 2024 · 10 Reads

Speech enhancement is crucial in human-computer interaction, especially for ubiquitous devices. Ultrasound-based speech enhancement has emerged as an attractive choice because of its superior ubiquity and performance. However, inevitable interference from unexpected and unintended sources during audio-ultrasound data acquisition makes existing solutions rely heavily on human effort for data collection and processing. This leads to significant data scarcity that limits the full potential of ultrasound-based speech enhancement. To address this, we propose USpeech, a cross-modal ultrasound synthesis framework for speech enhancement with minimal human effort. At its core is a two-stage framework that establishes correspondence between visual and ultrasonic modalities by leveraging audible audio as a bridge. This approach overcomes challenges from the lack of paired video-ultrasound datasets and the inherent heterogeneity between video and ultrasound data. Our framework incorporates contrastive video-audio pre-training to project modalities into a shared semantic space and employs an audio-ultrasound encoder-decoder for ultrasound synthesis. We then present a speech enhancement network that enhances speech in the time-frequency domain and recovers the clean speech waveform via a neural vocoder. Comprehensive experiments show USpeech achieves remarkable performance using synthetic ultrasound data comparable to physical data, significantly outperforming state-of-the-art ultrasound-based speech enhancement baselines. USpeech is open-sourced at https://github.com/aiot-lab/USpeech/.
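
The contrastive video-audio pre-training stage can be illustrated with a standard CLIP-style symmetric InfoNCE loss that pulls paired video and audio clips together in the shared semantic space; the encoders, batch construction, and temperature below are assumptions rather than USpeech's exact training recipe.

```python
# Minimal sketch of a symmetric contrastive (InfoNCE) loss over paired video/audio clips.
# Encoder architectures are placeholders; only the loss over their embeddings is shown.
import torch
import torch.nn.functional as F

def video_audio_contrastive_loss(video_emb: torch.Tensor,
                                 audio_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """video_emb, audio_emb: (batch, dim) embeddings of paired clips."""
    v = F.normalize(video_emb, dim=-1)
    a = F.normalize(audio_emb, dim=-1)
    logits = v @ a.t() / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)    # i-th video matches i-th audio
    # Symmetric loss: video-to-audio and audio-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Example with random embeddings standing in for encoder outputs.
loss = video_audio_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```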



AOC-IDS: Autonomous Online Framework with Contrastive Learning for Intrusion Detection
  • Conference Paper
  • Full-text available

May 2024 · 26 Reads · 3 Citations

The rapid expansion of the Internet of Things (IoT) has raised increasing concern about targeted cyber attacks. Previous research primarily focused on static Intrusion Detection Systems (IDSs), which employ offline training to safeguard IoT systems. However, such static IDSs struggle with real-world scenarios where IoT system behaviors and attack strategies can undergo rapid evolution, necessitating dynamic and adaptable IDSs. In response to this challenge, we propose AOC-IDS, a novel online IDS that features an autonomous anomaly detection module (ADM) and a labor-free online framework for continual adaptation. In order to enhance data comprehension, the ADM employs an Autoencoder (AE) with a tailored Cluster Repelling Contrastive (CRC) loss function to generate distinctive representations from limited or incrementally incoming data in the online setting. Moreover, to reduce the burden of manual labeling, our online framework leverages pseudo-labels automatically generated from the decision-making process in the ADM to facilitate periodic updates of the ADM. The elimination of human intervention for labeling and decision-making boosts the system's compatibility and adaptability in the online setting to remain synchronized with dynamic environments. Experimental validation using the NSL-KDD and UNSW-NB15 datasets demonstrates the superior performance and adaptability of AOC-IDS, surpassing state-of-the-art solutions.
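
A simplified sketch of the labor-free online loop looks like this: reconstruction error from an autoencoder-style detector is thresholded into pseudo-labels, which then drive periodic model refreshes without human labeling. The threshold rule, batch schedule, and the model's reconstruct/partial_fit interface are illustrative assumptions, not the CRC-loss-based procedure of AOC-IDS.

```python
# Sketch of pseudo-label-driven online updates for an autoencoder-based detector.
# The detector interface (reconstruct / partial_fit) is an assumed abstraction.
import numpy as np

def pseudo_labels(recon_error: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Label a sample as anomalous (1) if its reconstruction error is far above the batch mean."""
    thresh = recon_error.mean() + k * recon_error.std()
    return (recon_error > thresh).astype(int)

def online_update(model, stream, batch_size=256):
    """Periodically refresh the detector using its own pseudo-labels (no human labeling)."""
    batch = []
    for x in stream:
        batch.append(x)
        if len(batch) == batch_size:
            X = np.stack(batch)                                      # (batch, features)
            err = np.mean((model.reconstruct(X) - X) ** 2, axis=1)   # per-sample error
            y_pseudo = pseudo_labels(err)
            model.partial_fit(X, y_pseudo)                           # assumed incremental API
            batch.clear()
```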


HealthPrism: A Visual Analytics System for Exploring Children's Physical and Mental Health Profiles with Multimodal Data

October 2023 · 27 Reads · 3 Citations · IEEE Transactions on Visualization and Computer Graphics

The correlation between children's personal and family characteristics (e.g., demographics and socioeconomic status) and their physical and mental health status has been extensively studied across various research domains, such as public health, medicine, and data science. Such studies can provide insights into the underlying factors affecting children's health and aid in the development of targeted interventions to improve their health outcomes. However, with the availability of multiple data sources, including context data (i.e., the background information of children) and motion data (i.e., sensor data measuring activities of children), new challenges have arisen due to the large-scale, heterogeneous, and multimodal nature of the data. Existing statistical hypothesis-based and learning model-based approaches have been inadequate for comprehensively analyzing the complex correlation between multimodal features and multi-dimensional health outcomes due to the limited information revealed. In this work, we first distill a set of design requirements from multiple levels through conducting a literature review and iteratively interviewing 11 experts from multiple domains (e.g., public health and medicine). Then, we propose HealthPrism, an interactive visual analytics system for assisting researchers in exploring the importance and influence of various context and motion features on children's health status from multi-level perspectives. Within HealthPrism, a multimodal learning model with a gate mechanism is proposed for health profiling and cross-modality feature importance comparison. A set of visualization components is designed for experts to explore and understand multimodal data freely. We demonstrate the effectiveness and usability of HealthPrism through quantitative evaluation of the model performance, case studies, and expert interviews in associated domains.
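
The gate mechanism mentioned above can be approximated by a small gated-fusion module: each modality is embedded, a sigmoid gate decides how much each contributes to the fused representation, and the gate values double as a rough modality-importance signal. Layer sizes, the single gate, and the outcome head below are assumptions, not the published HealthPrism architecture.

```python
# Illustrative gated fusion of a context-feature embedding and a motion-feature embedding.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, context_dim: int, motion_dim: int, hidden: int = 64, n_outcomes: int = 3):
        super().__init__()
        self.ctx = nn.Linear(context_dim, hidden)
        self.mot = nn.Linear(motion_dim, hidden)
        self.gate = nn.Linear(2 * hidden, hidden)   # decides how much each modality contributes
        self.head = nn.Linear(hidden, n_outcomes)   # multi-dimensional health outcomes

    def forward(self, context, motion):
        c, m = torch.relu(self.ctx(context)), torch.relu(self.mot(motion))
        g = torch.sigmoid(self.gate(torch.cat([c, m], dim=-1)))
        fused = g * c + (1.0 - g) * m               # gate value serves as a modality-importance signal
        return self.head(fused), g

model = GatedFusion(context_dim=20, motion_dim=128)
logits, gate = model(torch.randn(4, 20), torch.randn(4, 128))
```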


Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

September 2023 · 27 Reads · 7 Citations · Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies

Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. Radio2Text is based on a tailored streaming Transformer that is capable of effectively learning representations of speech-related features, paving the way for streaming ASR with a large vocabulary. To alleviate the deficiency of streaming networks unable to access entire future inputs, we propose the Guidance Initialization that facilitates the transfer of feature knowledge related to the global context from the non-streaming Transformer to the tailored streaming Transformer through weight inheritance. Further, we propose a cross-modal structure based on knowledge distillation (KD), named cross-modal KD, to mitigate the negative effect of low quality mmWave signals on recognition performance. In the cross-modal KD, the audio streaming Transformer provides feature and response guidance that inherit fruitful and accurate speech information to supervise the training of the tailored radio streaming Transformer. The experimental results show that our Radio2Text can achieve a character error rate of 5.7% and a word error rate of 9.4% for the recognition of a vocabulary consisting of over 13,000 words.
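
The cross-modal knowledge-distillation objective can be sketched as a weighted sum of the task loss with feature guidance (matching an intermediate representation of the audio teacher) and response guidance (matching its softened output distribution). The loss weights, temperature, and single-layer feature match are assumptions, not Radio2Text's exact formulation.

```python
# Sketch of a cross-modal KD objective: an audio teacher supervises the radio (mmWave)
# student through both intermediate features and output distributions.
import torch
import torch.nn.functional as F

def cross_modal_kd_loss(student_feat, teacher_feat, student_logits, teacher_logits,
                        task_loss, alpha=0.5, beta=0.5, temperature=2.0):
    """Combine the task loss (e.g., CTC) with feature and response guidance from the teacher."""
    feat_guidance = F.mse_loss(student_feat, teacher_feat.detach())
    response_guidance = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return task_loss + alpha * feat_guidance + beta * response_guidance
```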


Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning

August 2023 · 8 Reads

Federated learning (FL) inevitably confronts the challenge of system heterogeneity in practical scenarios. To enhance the capabilities of most model-homogeneous FL methods in handling system heterogeneity, we propose a training scheme that extends their capabilities to cope with this challenge. In this paper, we commence our study with a detailed exploration of homogeneous and heterogeneous FL settings and discover three key observations: (1) a positive correlation between client performance and layer similarities, (2) higher similarities in the shallow layers than in the deep layers, and (3) smoother gradient distributions indicating higher layer similarities. Building upon these observations, we propose InCo Aggregation, which leverages internal cross-layer gradients, a mixture of gradients from shallow and deep layers within a server model, to augment the similarity in the deep layers without requiring additional communication between clients. Furthermore, our method can be tailored to model-homogeneous FL methods such as FedAvg, FedProx, FedNova, Scaffold, and MOON, expanding their capabilities to handle system heterogeneity. Copious experimental results validate the effectiveness of InCo Aggregation, spotlighting internal cross-layer gradients as a promising avenue for enhancing performance in heterogeneous FL.
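
A conceptual sketch of mixing a shallow-layer gradient into a deep-layer gradient on the server is shown below. Real layers generally have different shapes, so this toy version only mixes when shapes match and otherwise leaves the deep gradient untouched; the pairing rule and mixing coefficient are assumptions, not the InCo Aggregation recipe.

```python
# Toy illustration of internal cross-layer gradient mixing on the server.
import torch

def mix_cross_layer_grads(grads: dict[str, torch.Tensor],
                          pairs: dict[str, str],
                          lam: float = 0.3) -> dict[str, torch.Tensor]:
    """grads: per-layer aggregated gradients; pairs: deep layer -> shallow layer to borrow from."""
    mixed = dict(grads)
    for deep, shallow in pairs.items():
        g_deep, g_shallow = grads[deep], grads[shallow]
        if g_deep.shape == g_shallow.shape:          # only mix shape-compatible layers in this toy
            mixed[deep] = (1.0 - lam) * g_deep + lam * g_shallow
    return mixed

# Toy usage: borrow from "block1" when updating "block3" (same-shaped square layers here).
grads = {name: torch.randn(16, 16) for name in ["block1", "block2", "block3"]}
new_grads = mix_cross_layer_grads(grads, pairs={"block3": "block1"})
```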


[Figures from Radio2Text: Fig. 3, system overview (dotted lines indicate training only, solid lines training and inference; a snowflake marks a frozen network without parameter updating, a flame a trainable network); Fig. 4, architecture of the tailored streaming Transformer; Fig. 5, matching strategy of Guidance Initialization; Fig. 7, experimental scenarios; Fig. 8, Mel-spectrograms of speech and mmWave signals, with corresponding recognition results of a microphone-based streaming ASR method and mmWave-based non-streaming ASR methods; +3 more figures.]

Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals

August 2023 · 102 Reads

Millimeter wave (mmWave) based speech recognition provides more possibility for audio-related applications, such as conference speech transcription and eavesdropping. However, considering the practicality in real scenarios, latency and recognizable vocabulary size are two critical factors that cannot be overlooked. In this paper, we propose Radio2Text, the first mmWave-based system for streaming automatic speech recognition (ASR) with a vocabulary size exceeding 13,000 words. Radio2Text is based on a tailored streaming Transformer that is capable of effectively learning representations of speech-related features, paving the way for streaming ASR with a large vocabulary. To alleviate the deficiency of streaming networks unable to access entire future inputs, we propose the Guidance Initialization that facilitates the transfer of feature knowledge related to the global context from the non-streaming Transformer to the tailored streaming Transformer through weight inheritance. Further, we propose a cross-modal structure based on knowledge distillation (KD), named cross-modal KD, to mitigate the negative effect of low quality mmWave signals on recognition performance. In the cross-modal KD, the audio streaming Transformer provides feature and response guidance that inherit fruitful and accurate speech information to supervise the training of the tailored radio streaming Transformer. The experimental results show that our Radio2Text can achieve a character error rate of 5.7% and a word error rate of 9.4% for the recognition of a vocabulary consisting of over 13,000 words.


HealthPrism: A Visual Analytics System for Exploring Children's Physical and Mental Health Profiles with Multimodal Data

July 2023 · 47 Reads

The correlation between children's personal and family characteristics (e.g., demographics and socioeconomic status) and their physical and mental health status has been extensively studied across various research domains, such as public health, medicine, and data science. Such studies can provide insights into the underlying factors affecting children's health and aid in the development of targeted interventions to improve their health outcomes. However, with the availability of multiple data sources, including context data (i.e., the background information of children) and motion data (i.e., sensor data measuring activities of children), new challenges have arisen due to the large-scale, heterogeneous, and multimodal nature of the data. Existing statistical hypothesis-based and learning model-based approaches have been inadequate for comprehensively analyzing the complex correlation between multimodal features and multi-dimensional health outcomes due to the limited information revealed. In this work, we first distill a set of design requirements from multiple levels through conducting a literature review and iteratively interviewing 11 experts from multiple domains (e.g., public health and medicine). Then, we propose HealthPrism, an interactive visual analytics system for assisting researchers in exploring the importance and influence of various context and motion features on children's health status from multi-level perspectives. Within HealthPrism, a multimodal learning model with a gate mechanism is proposed for health profiling and cross-modality feature importance comparison. A set of visualization components is designed for experts to explore and understand multimodal data freely. We demonstrate the effectiveness and usability of HealthPrism through quantitative evaluation of the model performance, case studies, and expert interviews in associated domains.


Citations (5)


... In dynamic network environments, both the threat landscapes and normal operational behaviors are constantly changing [8]. Attackers always look for new vulnerabilities, leading to new attack types. ...

Reference:

Continual Learning with Strategic Selection and Forgetting for Network Intrusion Detection
AOC-IDS: Autonomous Online Framework with Contrastive Learning for Intrusion Detection

... Research indicates that users' motions during VR interactions leak sensitive data [3,11,52]. Attackers exploit these motions to identify activities and keystroke inputs, potentially accessing critical information like banking credentials, passwords, and private messages [6,23,39,40,63,64]. Body movements reveal activity types, while hand movements, particularly during virtual keyboard use, expose keystroke details [7,8,51]. ...

Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
  • Citing Article
  • September 2023

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies

... InfoNCE [16] is the most commonly used loss function in contrastive learning, measuring the similarity between two views by comparing their representations [37]. Contrastive learning has achieved state-of-the-art results in various domains, including computer vision [38], natural language processing [39], and mobile computing [40]. ...

Human Activity Recognition From Motion and Acoustic Sensors Using Contrastive Learning
  • Citing Conference Paper
  • June 2023

... Herein, the development of smart devices such as wearable sensors can bring intelligence to healthcare systems that can be useful in a variety of fields such as infant safety and health-tracking, care for the elderly, healthcare monitoring [10], military and law enforcement, sports, and preventive medicine [11]. Even these days, mental health disorders are being observed not only in adults [12] but also in children [13] and adolescents, a challenge researchers have taken up using machine learning and deep learning algorithms. Wearable sensor-based technologies for infants have also been introduced to warn of life-threatening situations [14]. ...

A Data-Driven Context-Aware Health Inference System for Children during School Closures
  • Citing Article
  • March 2023

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies

... Intrusion Detection Systems (IDSs) serve as the primary defenders of digital infrastructures and interconnected systems, such as the Internet of Things, playing a critical role in monitoring network traffic to identify and alert on unauthorized or malicious activities [1]- [6]. They act as early warning systems, providing vital defense against the constant threat of cyberattacks and ensuring system integrity [7]. ...

Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals