Shigang Chen’s research while affiliated with University of Florida and other places

Publications (291)


XTSFormer: Cross-Temporal-Scale Transformer for Irregular-Time Event Prediction in Clinical Applications
  • Article

April 2025 · 1 Read · Proceedings of the AAAI Conference on Artificial Intelligence

Zelin Xu · [...]

Adverse clinical events related to unsafe care are among the top ten causes of death in the U.S. Accurate modeling and prediction of clinical events from electronic health records (EHRs) play a crucial role in patient safety enhancement. An example is modeling de facto care pathways that characterize common step-by-step plans for treatment or care. However, clinical event data pose several unique challenges, including the irregularity of time intervals between consecutive events, the existence of cycles, periodicity, multi-scale event interactions, and the high computational costs associated with long event sequences. Existing neural temporal point process (TPP) methods do not effectively capture the multi-scale nature of event interactions, which is common in many real-world clinical applications. To address these issues, we propose the cross-temporal-scale transformer (XTSFormer), specifically designed for irregularly timed event data. Our model consists of two vital components: a novel Feature-based Cycle-aware Time Positional Encoding (FCPE) that adeptly captures the cyclical nature of time, and a hierarchical multi-scale temporal attention mechanism, where different temporal scales are determined by a bottom-up clustering approach. Extensive experiments on several real-world EHR datasets show that our XTSFormer outperforms multiple baseline methods.
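
As a rough illustration of the cycle-aware idea (not the paper's FCPE), the sketch below encodes irregular timestamps with sin/cos features over a few candidate cycle lengths plus generic transformer-style frequencies; the period list, dimensions, and mixing are assumptions for demonstration only.

```python
import numpy as np

def cycle_aware_time_encoding(timestamps, periods=(60.0, 3600.0, 86400.0), d_model=16):
    """Encode irregular timestamps with explicit cycle features (minute, hour,
    day in seconds here) plus generic geometric frequencies. Events at the same
    phase of a cycle receive similar codes, which is the property a cycle-aware
    positional encoding is after."""
    t = np.asarray(timestamps, dtype=float)[:, None]            # (N, 1)
    phases = 2.0 * np.pi * (t / np.asarray(periods))            # (N, P)
    cyc = np.concatenate([np.sin(phases), np.cos(phases)], 1)   # (N, 2P)
    k = d_model - cyc.shape[1]                                  # remaining dims
    freq = 1.0 / (10000.0 ** (np.arange(k // 2) / max(k // 2, 1)))
    gen = np.concatenate([np.sin(t * freq), np.cos(t * freq)], 1)
    return np.concatenate([cyc, gen], axis=1)                   # (N, d_model)

# Irregularly spaced event times in seconds (hypothetical values).
print(cycle_aware_time_encoding([30.0, 95.5, 4000.0, 90000.0]).shape)
```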



Multi-Information Sampling and Mixed Estimation for Multi-Task Spread Measurement With Supercube

January 2025

Spread measurement is an essential problem in high-speed networks with broad applications, such as anomaly detection and network telemetry. Network administrators typically need to concurrently monitor the spreads of different types of flows to detect various abnormal behaviors. Although many studies have designed memory-efficient structures, such as sketches, for a specific spread measurement task, they have to deploy multiple sketches to support multiple spread measurement tasks, resulting in significant memory and computational overhead. This paper proposes an efficient multi-task information compression method to simultaneously estimate differently defined flow spreads. We introduce multi-information sampling to capture multi-task spread information from each arriving packet in one pass and store it in off-chip memory, thereby conserving on-chip memory and computational resources. Additionally, we carefully design a one-access multi-dimensional structure called Supercube to preserve as much spread information as possible while keeping up with the line rate, thereby enhancing estimation accuracy. We implement our estimator in hardware using NetFPGA. Experiments based on real Internet traces show that our method reduces the ARE of spread estimation by 83.36% compared to the state-of-the-art rSkt with 300KB of on-chip memory and increases update throughput 251.252-fold compared to Supersketch. All source code is available at https://github.com/Hanwen808/MIME.
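
For intuition only, the sketch below shows one way a single sampling decision per packet can feed two spread-measurement tasks at once (distinct destinations per source and distinct sources per destination); the hash-based sampling rule, probability, and plain dictionaries are illustrative stand-ins and do not reproduce the Supercube structure.

```python
import hashlib

def h(x):
    """Map a string to a pseudo-random value in [0, 1) deterministically."""
    return int.from_bytes(hashlib.sha1(x.encode()).digest()[:8], "big") / 2**64

SAMPLE_P = 0.25          # illustrative sampling probability
per_src = {}             # task 1: distinct destinations contacted by each source
per_dst = {}             # task 2: distinct sources contacting each destination

def process_packet(src, dst):
    """One pass per packet: a single hash-based sampling decision on the
    (src, dst) pair feeds both spread-measurement tasks consistently."""
    if h(src + "|" + dst) < SAMPLE_P:
        per_src.setdefault(src, set()).add(dst)
        per_dst.setdefault(dst, set()).add(src)

def estimate_spread(table, key):
    """Scale the sampled distinct count by 1/p (a simple correction; real
    estimators use tighter ones)."""
    return len(table.get(key, set())) / SAMPLE_P

for src, dst in [("10.0.0.1", "8.8.8.8"), ("10.0.0.1", "1.1.1.1"),
                 ("10.0.0.2", "8.8.8.8"), ("10.0.0.1", "8.8.4.4")]:
    process_packet(src, dst)
print(estimate_spread(per_src, "10.0.0.1"), estimate_spread(per_dst, "8.8.8.8"))
```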


Adaptive Denoising for Network Traffic Measurement

January 2025

IEEE Transactions on Network Science and Engineering

Traffic measurement in high-speed networks is crucial for applications like traffic engineering, network management, and surveillance. Constrained by limited on-chip memory and the need to process packets at line speed, most existing solutions use compact data structures, namely sketches, to facilitate line-speed measurement. Nevertheless, these sketches, due to their shared record units (bits/counters) among flows, inevitably introduce noise into the measurement result of each flow. While conventional average denoising strategies can mitigate noise from raw estimates, they fall short of providing sufficient accuracy for medium-sized flows, primarily due to the uneven distribution of noise. To complement prior work, we propose two algorithms, ADN and mADN, which perform denoising by considering the sizes of shared flows. ADN employs an optimization algorithm to model interconnections among flows, thereby reconstructing noise propagation and accurately restoring their sizes. mADN retains the benefits of ADN while being more memory-efficient and precise. We apply our estimators to five essential tasks: per-flow size estimation, heavy hitter detection, heavy change detection, distribution estimation, and entropy estimation. Experimental results based on real Internet traffic traces show that our measurement solutions surpass existing state-of-the-art approaches, reducing the mean absolute error by approximately an order of magnitude under the same on-chip memory constraints. The source code of ADN is available on GitHub [1].
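
As background for the denoising problem (not the ADN/mADN algorithms themselves), the sketch below shows a toy counter-sharing structure and the conventional average-denoising correction that subtracts the mean per-counter load; the uniform-noise assumption it makes is exactly what breaks down for medium-sized flows.

```python
M = 64                       # number of shared counters (tiny, for illustration)
counters = [0] * M
total_packets = 0

def update(flow_id):
    """Each packet increments one counter chosen by hashing its flow ID, so
    flows that collide on the same counter pollute each other's count."""
    global total_packets
    counters[hash(flow_id) % M] += 1
    total_packets += 1

def estimate(flow_id):
    """Conventional average denoising: subtract the mean load per counter,
    treating the noise as if it were spread uniformly. This works acceptably
    for large flows but mis-sizes medium flows when the noise is uneven."""
    raw = counters[hash(flow_id) % M]
    return max(raw - total_packets / M, 0.0)

for _ in range(1000):
    update("flowA")              # one heavy flow
for i in range(2000):
    update(f"flow{i % 200}")     # background traffic over 200 small flows
print(round(estimate("flowA")))  # near 1000, up to collision error
```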


Scout Sketch+: Finding Both Promising and Damping Items Simultaneously in Data Streams

December 2024 · 9 Reads · IEEE/ACM Transactions on Networking

Data stream processing holds great potential value in many practical application scenarios. This paper studies two new but important patterns for items in data streams, called promising and damping items. A promising item is one whose frequency over multiple consecutive time windows shows an overall upward trend, with a slight decrease allowed in some of these windows. In contrast to promising items, damping items exhibit a decreasing trend. Many applications can benefit from identifying promising or damping items, e.g., monitoring latent attacks in computer networks, pre-adjusting bandwidth allocation in communication channels, detecting potential hot events/news, or finding topics that gradually lose momentum in social networks. We first introduce how to accurately find promising items in data streams in real time under limited memory space. To this end, we propose a novel structure named Scout Sketch, which consists of Filter and Finder. The Filter is built on a Bloom filter to eliminate unqualified items with low memory overhead; the Finder records the necessary information about potential items and detects promising items at the end of each time window using tailor-made detection operations. We then enhance Scout Sketch (into Scout Sketch+) to adaptively detect both promising and damping items simultaneously. Finally, we conducted extensive experiments on four real-world datasets, which show that the F1 score and throughput of Scout Sketch(+) are about 2.02 and 5.61 times those of the compared solutions. All source code is available on GitHub (https://github.com/Aoohhh/ScoutSketch).
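
To make the promising/damping definitions concrete, the sketch below implements only the window-level trend test with hypothetical thresholds; it does not reproduce the Filter/Finder structure of Scout Sketch.

```python
def is_promising(window_counts, min_rise=2.0, max_dip=0.1):
    """Trend test over one item's per-window frequencies: the sequence must
    rise overall (last >= min_rise * first) while any single-window drop stays
    within a max_dip fraction. Thresholds here are hypothetical."""
    if len(window_counts) < 2 or window_counts[0] <= 0:
        return False
    for prev, cur in zip(window_counts, window_counts[1:]):
        if cur < prev * (1.0 - max_dip):      # too large a dip between windows
            return False
    return window_counts[-1] >= min_rise * window_counts[0]

def is_damping(window_counts, **kw):
    """Damping items are the mirror case: reverse the window order and reuse
    the same test."""
    return is_promising(list(reversed(window_counts)), **kw)

print(is_promising([10, 12, 11, 18, 25]))   # True: upward with one small dip
print(is_damping([30, 27, 20, 14, 9]))      # True: overall downward trend
```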


A Fast AI Surrogate for Coastal Ocean Circulation Models

October 2024 · 40 Reads

Nearly 900 million people live in low-lying coastal zones around the world and bear the brunt of impacts from more frequent and severe hurricanes and storm surges. Oceanographers simulate ocean current circulation along the coasts to develop early warning systems that save lives and prevent loss and damage to property from coastal hazards. Traditionally, such simulations are conducted using coastal ocean circulation models such as the Regional Ocean Modeling System (ROMS), which usually runs on an HPC cluster with multiple CPU cores. However, the process is time-consuming and energy-intensive. While coarse-grained ROMS simulations offer faster alternatives, they sacrifice detail and accuracy, particularly in complex coastal environments. Recent advances in deep learning and GPU architecture have enabled the development of faster AI (neural network) surrogates. This paper introduces an AI surrogate based on a 4D Swin Transformer to simulate coastal tidal wave propagation in an estuary for both hindcast and forecast (up to 12 days). Our approach not only accelerates simulations but also incorporates a physics-based constraint to detect and correct inaccurate results, ensuring reliability while minimizing manual intervention. We develop a fully GPU-accelerated workflow, optimizing the model training and inference pipeline on NVIDIA DGX-2 A100 GPUs. Our experiments demonstrate that our AI surrogate reduces the time cost of 12-day forecasting with traditional ROMS simulations from 9,908 seconds (on 512 CPU cores) to 22 seconds (on one A100 GPU), achieving an over 450× speedup while maintaining high-quality simulation results. This work contributes to oceanographic modeling by offering a fast, accurate, and physically consistent alternative to traditional simulation models, particularly for real-time forecasting in rapid disaster response.
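
As an illustration of the physics-based-constraint idea (the surrogate's actual constraint is not reproduced here), the sketch below computes a hypothetical continuity-style residual on predicted fields and flags time steps whose mean residual exceeds a tolerance, which could then trigger correction or a fallback run.

```python
import numpy as np

def physics_residual_flags(eta, u, v, dx, dy, dt, tol=1e-2):
    """Hypothetical sanity check on surrogate output: approximate a
    continuity-style residual d(eta)/dt + d(u)/dx + d(v)/dy on the grid and
    flag time steps whose mean absolute residual exceeds a tolerance."""
    d_eta_dt = np.gradient(eta, dt, axis=0)
    du_dx = np.gradient(u, dx, axis=2)
    dv_dy = np.gradient(v, dy, axis=1)
    residual = d_eta_dt + du_dx + dv_dy
    return np.abs(residual).mean(axis=(1, 2)) > tol   # one flag per time step

# Tiny random fields standing in for predicted (time, lat, lon) grids.
rng = np.random.default_rng(0)
eta, u, v = (rng.normal(size=(4, 16, 16)) for _ in range(3))
print(physics_residual_flags(eta, u, v, dx=100.0, dy=100.0, dt=3600.0))
```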


Multi-View Neural Differential Equations for Continuous-Time Stream Data in Long-Term Traffic Forecasting

August 2024 · 5 Reads

Long-term traffic flow forecasting plays a crucial role in intelligent transportation as it allows traffic managers to adjust their decisions in advance. However, the problem is challenging due to spatio-temporal correlations and complex dynamic patterns in continuous-time stream data. Neural Differential Equations (NDEs) are among the state-of-the-art methods for learning continuous-time traffic dynamics. However, traditional NDE models struggle with long-term traffic forecasting because they fail to capture delayed traffic patterns, dynamic edge (location-to-location correlation) patterns, and abrupt trend changes. To fill this gap, we propose a new NDE architecture called Multi-View Neural Differential Equations. Our model captures current states, delayed states, and trends in different state variables (views) by learning multiple latent representations within Neural Differential Equations. Extensive experiments conducted on several real-world traffic datasets demonstrate that our proposed method outperforms the state of the art, achieving superior prediction accuracy in long-term forecasting and robustness to noisy or missing inputs.
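
For intuition, the sketch below evolves two coupled state views with a plain Euler integrator, letting each view's derivative read the others; the coupling functions are hand-written toys, not the learned multi-view dynamics of the proposed model.

```python
import numpy as np

def multi_view_euler(f_views, states, t0, t1, dt=0.01):
    """Evolve several coupled state views with explicit Euler steps.
    f_views[i](t, states) returns d(states[i])/dt and may read every view,
    which is how current, delayed, and trend views could interact."""
    t = t0
    states = [np.array(s, dtype=float) for s in states]
    while t < t1:
        derivs = [f(t, states) for f in f_views]
        states = [s + dt * d for s, d in zip(states, derivs)]
        t += dt
    return states

# Hypothetical two-view system: the "current" view relaxes toward a "trend"
# view, while the trend view drifts slowly on its own.
f_current = lambda t, s: 0.5 * (s[1] - s[0])
f_trend = lambda t, s: np.full_like(s[1], 0.1)
current, trend = multi_view_euler([f_current, f_trend], [[0.0], [1.0]], 0.0, 5.0)
print(current, trend)
```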


Spatio-Temporal Partial Sensing Forecast for Long-term Traffic

August 2024 · 5 Reads

Traffic forecasting uses recent measurements by sensors installed at chosen locations to forecast future road traffic. Existing work either assumes all locations are equipped with sensors or focuses on short-term forecasting. This paper studies partial-sensing forecasting of long-term traffic, assuming sensors are deployed at only some locations. The study is important for lowering infrastructure investment costs in traffic management, since deploying sensors at all locations would incur prohibitively high cost. However, the problem is challenging due to the unknown distribution at unsensed locations, the intricate spatio-temporal correlation in long-term forecasting, as well as noise in data and irregularities in traffic patterns (e.g., road closures). We propose a Spatio-Temporal Partial Sensing (STPS) forecast model for long-term traffic prediction, with several novel contributions, including a rank-based embedding technique to capture irregularities and overcome noise, a spatial transfer matrix to overcome the spatial distribution shift from permanently sensed locations to unsensed locations, and a multi-step training process that utilizes all available data to successively refine the model parameters for better accuracy. Extensive experiments on several real-world traffic datasets demonstrate that STPS outperforms the state of the art and achieves superior accuracy in partial-sensing long-term forecasting.
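
As a simplified illustration of transferring information from sensed to unsensed locations (not the paper's spatial transfer matrix construction), the sketch below fits a linear map on a hypothetical historical period in which all locations had readings, then projects current sensed readings to the unsensed ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical window in which all locations briefly had readings
# (so a transfer map can be fit); every shape and value here is made up.
T, n_sensed, n_unsensed = 200, 8, 5
sensed_hist = rng.normal(size=(T, n_sensed))
unsensed_hist = (sensed_hist @ rng.normal(size=(n_sensed, n_unsensed))
                 + 0.1 * rng.normal(size=(T, n_unsensed)))

# Fit a transfer matrix W so that sensed_hist @ W approximates unsensed_hist.
W, *_ = np.linalg.lstsq(sensed_hist, unsensed_hist, rcond=None)

# At forecast time only sensed locations report; project to the unsensed ones.
sensed_now = rng.normal(size=(1, n_sensed))
print((sensed_now @ W).shape)   # (1, n_unsensed)
```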


Guest Editorial Introduction to the Special Section on Next-Generation Traffic Measurement With Network-Wide Perspective and Artificial Intelligence

May 2024 · 13 Reads · 1 Citation · IEEE Transactions on Network Science and Engineering

Traffic measurement is the bedrock of next-generation network systems. While it plays a crucial role in bringing fundamental data and support to core network functions, it also confronts the challenge of meeting the diverse demands of new network traffic characteristics and emerging applications. Network-wide measurement has received increasing attention. Given that big network data is distributed in nature, it is essential to aggregate the views of multiple measurement points to build a network-wide perception of traffic. Another recent trend involves artificial intelligence technologies that allow seamless aggregation of multifaceted network traffic data to advance traffic data analysis and support related applications. Nonetheless, a gap remains in existing methodologies, which often fail to fully address the diverse demands of network traffic measurement in this evolving landscape.


Spatial-Logic-Aware Weakly Supervised Learning for Flood Mapping on Earth Imagery

March 2024 · 29 Reads · 1 Citation · Proceedings of the AAAI Conference on Artificial Intelligence

Flood mapping on Earth imagery is crucial for disaster management, but its efficacy is hampered by the lack of high-quality training labels. Given high-resolution Earth imagery with coarse and noisy training labels, a base deep neural network model, and a spatial knowledge base with label constraints, our problem is to infer the true high-resolution labels while training neural network parameters. Traditional methods are largely based on specific physical properties and thus fall short of capturing the rich domain constraints expressed by symbolic logic. Neural-symbolic models can capture rich domain knowledge, but existing methods do not address the unique spatial challenges inherent in flood mapping on high-resolution imagery. To fill this gap, we propose a spatial-logic-aware weakly supervised learning framework. Our framework integrates symbolic spatial logic inference into probabilistic learning in a weakly supervised setting. To reduce the time cost of logic inference over vast numbers of high-resolution pixels, we propose a multi-resolution spatial reasoning algorithm to infer true labels while training neural network parameters. Evaluations on real-world flood datasets show that our model outperforms several baselines in prediction accuracy. The code is available at https://github.com/spatialdatasciencegroup/SLWSL.
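
To illustrate the coarse-to-fine flavor of multi-resolution reasoning (not the paper's algorithm), the sketch below labels confident blocks from block-level evidence and refines only ambiguous blocks per pixel; the block size and thresholds are hypothetical.

```python
import numpy as np

def coarse_to_fine_labels(prob, block=8, lo=0.3, hi=0.7):
    """Label confident blocks from their mean flood probability and refine
    only ambiguous blocks per pixel, cutting per-pixel reasoning on areas
    where the coarse evidence already decides the label."""
    H, W = prob.shape
    labels = np.zeros((H, W), dtype=np.uint8)
    for i in range(0, H, block):
        for j in range(0, W, block):
            tile = prob[i:i + block, j:j + block]
            m = tile.mean()
            if m >= hi:                              # confidently flooded block
                labels[i:i + block, j:j + block] = 1
            elif m > lo:                             # ambiguous: per-pixel refine
                labels[i:i + block, j:j + block] = (tile >= 0.5).astype(np.uint8)
            # else: confidently dry block stays 0
    return labels

prob_map = np.random.default_rng(1).random((32, 32))  # stand-in probabilities
print(coarse_to_fine_labels(prob_map).sum())
```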


Citations (61)


... Lately, transformer models have demonstrated their effectiveness in handling irregular grids (Lee and Oh 2024), point clouds (Zhao et al. 2021), as well as geospatial datasets (He et al. 2023; Jia et al. 2024; Unlu 2024). The attention mechanism selectively focuses on elements in a sequence and aggregates the information according to relation-based attention scores. ...

Reference:

GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data
A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space
  • Citing Article
  • December 2023

Advances in Neural Information Processing Systems

... Organizations recognize the influential role of AI technologies in supporting network traffic analysis methods to improve overall network performance and detect malicious activity, which improves both security and efficiency. Effective analysis of AI algorithms such as machine learning and deep learning enables proactive adjustments to network configuration, and reinforcement learning techniques can improve routing and resource allocation and enhance overall network efficiency [2], [3]. AI plays a pivotal role in improving network traffic analysis, which significantly improves both security and efficiency. ...

Guest Editorial Introduction to the Special Section on Next-Generation Traffic Measurement With Network-Wide Perspective and Artificial Intelligence
  • Citing Article
  • May 2024

IEEE Transactions on Network Science and Engineering

... Hublet et al. discussed using metric first-order temporal logic (MFOTL) for real-time policy enforcement, which allows system behavior to be monitored and modified in real time [62]. As a result of its architecture, the system is reminiscent of the software-defined middleboxes proposed by Odegbile et al., which facilitate automated policy enforcement in non-SDN networks through load balancing and traffic minimization [63]. As an added bonus, combining static and dynamic learning is like encapsulating and forwarding enforcement rules to service functions in SDN environments [64]. ...

Policy Enforcement in Traditional Non-SDN Networks
  • Citing Article
  • February 2023

Journal of Parallel and Distributed Computing

... Existing research has yet to offer a universal detection framework capable of coordinating multiple algorithms to manage a broad spectrum of attacks effectively. Universal sketches such as UnivMon [1] and Light-weight Universal Sketch (LUS) [7] take advantage of dedicated measurement modules to collect flow metrics that can be utilized for various other applications. Nevertheless, this approach is orthogonal to the detection of a range of attack patterns of volumetric DDoS attacks, as it focuses only on traffic measurement tasks. ...

Universal and Accurate Sketch for Estimating Heavy Hitters and Moments in Data Streams
  • Citing Article
  • October 2023

IEEE/ACM Transactions on Networking

... Especially time series models, such as Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) networks, have been widely applied in capturing crowd dynamics and predicting the spatio-temporal distribution of leisure activities. For example, LSTM has been widely used to predict user locations using social media data [31] and to forecast traffic flow in urban environments [32]. Gated Recurrent Unit (GRU), a variant of RNN, addresses the limitations of traditional RNNs in processing long-sequence data. ...

Enabling smart curb management with spatiotemporal deep learning
  • Citing Article
  • Full-text available
  • January 2023

Computers Environment and Urban Systems

... For instance, the works [30] and [31] propose algorithms for adjusting the sampling rate on each switch in response to variations in traffic rate. Other works in this category focus on providing better traffic estimates from the sampled packets [4], [32] or reducing the communication overhead [33]. These works, however, do not consider sampling coordination among network switches, and as such, are less efficient compared to coordinated sampling solutions. ...

Self-Adaptive Sampling Based Per-Flow Traffic Measurement
  • Citing Article
  • January 2022

IEEE/ACM Transactions on Networking

... Huang et al. [24] propose an efficient spread estimator with nonduplicate sampling to support online spread queries for any flow. Ma et al. [25] propose a virtual filter with non-duplicate sampling in estimating flow-spread, which increases throughput, reduces memory overhead, and adaptively adjusts the sampling probability according to the traffic dynamics. However, the sampling-based approaches illustrate low accuracy and high memory usage. ...

Virtual Filter for Non-Duplicate Sampling With Network Applications
  • Citing Article
  • December 2022

IEEE/ACM Transactions on Networking

... We have conducted extensive experiments to compare our proposed three sketches with vHLL [5], rSkt1 [14], rSkt2 [14], AROMA [15], and AROMA+ (online query version of AROMA). Among them, rSkt1 and AROMA+ are the recent solutions that can support online estimation of per-flow spread. ...

Randomized Error Removal for Online Spread Estimation in High-Speed Networks
  • Citing Article
  • January 2022

IEEE/ACM Transactions on Networking

... Node cardinality estimation is a crucial problem in several domains, including RFID systems, the Internet of Things, and data networks. In [58]- [61], the authors discuss the problem of node cardinality estimation in packet-switching networks for network traffic monitoring, popularity tracking on social media, and network security. Online cardinality estimation schemes are proposed in [60], [61] for automatically adapting to different stream sizes in data networks. ...

Online Cardinality Estimation by Self-morphing Bitmaps
  • Citing Conference Paper
  • May 2022

... Besides, spammers generally send a large number of spam emails to distinct email accounts by the controlled email accounts, where spammers are called super sources. Despite many efforts in super node identification over recent decades [11][12][13], super nodes still attract considerable attention in both academic and industrial fields. ...

Short-Term Memory Sampling for Spread Measurement in High-Speed Networks
  • Citing Conference Paper
  • May 2022