Feng Wu’s research while affiliated with University of Science and Technology of China and other places


Publications (169)


Learned Image Compression With Efficient Cross-Platform Entropy Coding
  • Article
  • January 2025 · 2 Reads
  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • Runyu Yang · Dong Liu · Feng Wu · Wen Gao

Learned image compression has shown remarkable compression efficiency gains over traditional image compression solutions, which is partially attributed to the learned entropy models and the adopted entropy coding engine. However, the inference of the entropy models and the sequential nature of entropy coding both incur high time complexity. Meanwhile, neural network-based entropy models usually involve floating-point computations, which cause inconsistent probability estimation and decoding failures across platforms. We address these limitations by introducing an efficient and cross-platform entropy coding method, chain coding-based latent compression (CC-LC), into learned image compression. First, we leverage classic chain coding and carefully design a block-based entropy coding procedure, significantly reducing the number of coding symbols and thus the coding time. Second, since CC-LC is not based on neural networks, we propose a rate estimation network as a surrogate of CC-LC during end-to-end training. Third, we alternately train the analysis/synthesis networks and the rate estimation network for rate-distortion optimization, making the learned latents fit CC-LC. Experimental results show that our method achieves much lower time complexity than other learned image compression methods, ensures cross-platform consistency, and has compression efficiency comparable to BPG. Our code and models are publicly available at https://github.com/Yang-Runyu/CC-LC.
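The paper's block-based procedure and rate surrogate live in the linked repository; purely to illustrate the classic chain coding that CC-LC builds on, here is a minimal Freeman chain-code sketch (the function names and the 8-direction table are my own choices, not taken from the CC-LC code):

```python
# Freeman 8-direction chain code: code -> (dy, dx), clockwise from "up".
DIRS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
CODE = {d: c for c, d in enumerate(DIRS)}

def encode_contour(points):
    """Encode an ordered boundary as a start point plus one 3-bit
    direction symbol per step, instead of a coordinate pair per pixel."""
    codes = [CODE[(y2 - y1, x2 - x1)]
             for (y1, x1), (y2, x2) in zip(points, points[1:])]
    return points[0], codes

def decode_contour(start, codes):
    """Reconstruct the boundary pixels from the chain code."""
    pts, (y, x) = [start], start
    for c in codes:
        dy, dx = DIRS[c]
        y, x = y + dy, x + dx
        pts.append((y, x))
    return pts

# Round-trip on a tiny closed contour.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
start, codes = encode_contour(square)
assert decode_contour(start, codes) == square  # codes == [2, 4, 6, 0]
```

Each boundary step costs one 3-bit symbol instead of a full coordinate pair, which is where the reduction in coding symbols (and hence coding time) comes from.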


Semantic-Aware Late-Stage Supervised Contrastive Learning for Fine-Grained Action Recognition
  • January 2025 · 1 Read
  • IEEE Transactions on Circuits and Systems for Video Technology
  • Yijun Pan · Quan Zhao · Yueyi Zhang · [...] · Feng Wu

Fine-grained action recognition typically faces challenges with lower inter-class variance and higher intra-class variance. Supervised contrastive learning is inherently suitable for this task, as it can decrease intra-class feature distances while increasing inter-class ones. However, directly applying it to fine-grained action recognition encounters two main problems. The first stems from the heavy training cost associated with supervised contrastive learning, which requires numerous training epochs, each involving two augmented views per instance. To address this issue, we propose the late-stage supervised contrastive learning (late-SC) strategy, which effectively reduces the number of training epochs needed for the contrastive learning process. The second problem is that the supervised contrastive loss does not explicitly consider the semantic distances between fine-grained actions when adjusting representation distances, resulting in less reasonable and efficient adjustments to the representation space. To overcome this limitation, we introduce the semantic-aware temperature adaptation (STA) mechanism, which makes the supervised contrastive loss better suited to fine-grained action recognition. We conduct experiments on several benchmark datasets for fine-grained action recognition, including Epic-Kitchens-55/100, Something-Something-V1, and Diving48-V2. The results demonstrate that our proposed method (referred to as LSC-STA) consistently enhances performance across various base feature extractors, without introducing additional inference overhead and with only a marginal increase in training cost.
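The paper defines STA precisely; as a rough sketch of the idea, one can modulate the contrastive temperature per pair by a semantic distance between the two classes, changing how sharply different class pairs are separated. The weighting form below (and the names `sup_con_loss_sta`, `class_dist`) is a hypothetical illustration, not the paper's exact formula:

```python
import torch

def sup_con_loss_sta(feats, labels, class_dist, base_tau=0.1):
    """Supervised contrastive loss with a per-pair temperature modulated by
    the semantic distance between the two samples' classes (hypothetical
    form for illustration; the paper's STA mechanism may differ).

    feats:      (N, D) L2-normalized embeddings
    labels:     (N,)   integer class ids
    class_dist: (C, C) semantic distance matrix with entries in [0, 1]
    """
    sim = feats @ feats.t()                                   # cosine similarity
    tau = base_tau * (1.0 + class_dist[labels][:, labels])    # (N, N) temperatures
    logits = sim / tau
    eye = torch.eye(len(labels), dtype=torch.bool, device=feats.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye  # positive pairs
    logits = logits.masked_fill(eye, float('-inf'))            # exclude self
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = -(log_prob * pos.float()).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```

In this toy version, semantically close (easily confused) class pairs get a lower temperature and are therefore separated more sharply than semantically distant ones.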




Toward Decentralized Task Offloading and Resource Allocation in User-Centric MEC
  • December 2024 · 3 Reads · 2 Citations
  • IEEE Transactions on Mobile Computing

In traditional cellular-based mobile edge computing (MEC), users at the edge of a cell are prone to severe inter-cell interference and signal attenuation, leading to low throughput or even transmission interruptions. Such an edge effect severely obstructs the offloading of tasks to MEC servers. To address this issue, we propose user-centric mobile edge computing (UCMEC), a novel MEC architecture integrating user-centric transmission, which can ensure high throughput and reliable communication for task offloading. We then formulate a long-term delay minimization problem by jointly optimizing task offloading, power allocation, and computing resource allocation in UCMEC. To solve this intractable problem, we propose two decentralized joint optimization schemes based on multi-agent deep reinforcement learning (MADRL) and convex optimization, which consider both cooperation and non-cooperation among network nodes. Simulation results demonstrate that the proposed schemes in UCMEC can improve the uplink transmission rate by at least 176.99% and reduce the long-term average total delay by at least 16.36% compared to traditional cellular-based MEC.
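As background for why the uplink rate drives the offloading decision, here is a toy two-term delay model (my own simplification; the paper's formulation jointly optimizes power and computing resources as well):

```python
def offload_delay(task_bits, cpu_cycles, rate_bps, f_edge_hz, f_local_hz):
    """Toy delay comparison (a simplification, not the paper's model):
    offloading delay = uplink transmission + edge computing,
    local delay     = local computing only."""
    d_edge = task_bits / rate_bps + cpu_cycles / f_edge_hz
    d_local = cpu_cycles / f_local_hz
    return ("edge", d_edge) if d_edge < d_local else ("local", d_local)

# A higher uplink rate (what user-centric transmission provides) tilts the
# decision toward offloading: 2 Mb task, 1 Gcycle workload.
print(offload_delay(2e6, 1e9, rate_bps=20e6, f_edge_hz=10e9, f_local_hz=1e9))
# -> ('edge', 0.2), versus 1.0 s if computed locally
```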


Object Segmentation-Assisted Inter Prediction for Versatile Video Coding
  • December 2024 · 3 Reads · 6 Citations
  • IEEE Transactions on Broadcasting

In modern video coding standards, block-based inter prediction is widely adopted and brings high compression efficiency. However, natural videos usually contain multiple moving objects of arbitrary shapes, resulting in complex motion fields that are difficult to represent compactly. This problem has been tackled by more flexible block partitioning methods in the Versatile Video Coding (VVC) standard, but the more flexible partitions require more overhead bits to signal and still cannot be made arbitrarily shaped. To address this limitation, we propose an object segmentation-assisted inter prediction method (SAIP), in which objects in the reference frames are segmented using advanced segmentation techniques. With a proper indication, the object segmentation mask is translated from the reference frame to the current frame as an arbitrary-shaped partition of different regions, without any extra signaling. Using the segmentation mask, motion compensation is performed separately for each region, achieving higher prediction accuracy. The segmentation mask is further used to code the motion vectors of different regions more efficiently. Moreover, the segmentation mask is considered in the joint rate-distortion optimization for motion estimation and partition estimation, to derive the motion vectors of the different regions and the partition more accurately. The proposed method is implemented in the VVC reference software, VTM version 12.0. Experimental results show that it achieves up to 1.98%, 1.14%, and 0.79% (and on average 0.82%, 0.49%, and 0.37%) BD-rate reduction on common test sequences under the Low-delay P, Low-delay B, and Random Access configurations, respectively.
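To make the region-wise motion compensation concrete, here is a toy per-pixel version in NumPy (illustrative only; it ignores sub-pel interpolation, the mask translation step, and the VTM integration, and all names are my own):

```python
import numpy as np

def saip_like_prediction(ref, mask, mv_fg, mv_bg):
    """Toy region-wise motion compensation: the segmentation mask partitions
    the block into two arbitrary-shaped regions, each predicted with its own
    integer motion vector (mv_fg for masked pixels, mv_bg for the rest)."""
    h, w = ref.shape
    pred = np.empty_like(ref)
    for y in range(h):
        for x in range(w):
            dy, dx = mv_fg if mask[y, x] else mv_bg
            ry = min(max(y + dy, 0), h - 1)   # clamp to frame bounds
            rx = min(max(x + dx, 0), w - 1)
            pred[y, x] = ref[ry, rx]
    return pred
```

Because the mask supplies the region shapes for free, the two motion vectors are the only extra motion information the regions need, which is what lets the partition be arbitrarily shaped without extra partition bits.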



Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
  • October 2024 · 6 Reads

Generation of plausible but incorrect factual information, often termed hallucination, has attracted significant research interest. The retrieval-augmented language model (RALM), which enhances models with up-to-date knowledge, has emerged as a promising method to reduce hallucination. However, existing RALMs may instead exacerbate hallucination when retrieving lengthy contexts. To address this challenge, we propose COFT, a novel COarse-to-Fine highlighTing method that focuses on key texts at different granularity levels, thereby avoiding getting lost in lengthy contexts. Specifically, COFT consists of three components: recaller, scorer, and selector. First, the recaller applies a knowledge graph to extract potential key entities in a given context. Second, the scorer measures the importance of each entity by calculating its contextual weight. Finally, the selector selects high-contextual-weight entities with a dynamic threshold algorithm and highlights the corresponding paragraphs, sentences, or words in a coarse-to-fine manner. Extensive experiments on the knowledge hallucination benchmark demonstrate the effectiveness of COFT, yielding a superior performance of over 30% in the F1 score metric. Moreover, COFT also exhibits remarkable versatility across various long-form tasks, such as reading comprehension and question answering.
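As a rough illustration of the selector stage, the sketch below keeps entities whose contextual weight clears a threshold derived from the weight distribution itself; the function name, the keep-ratio rule, and the example weights are hypothetical, not COFT's actual dynamic threshold algorithm:

```python
def select_key_entities(entity_weights, keep_ratio=0.5):
    """Hypothetical stand-in for COFT's selector: keep entities whose
    contextual weight clears a threshold derived from the weight
    distribution (the paper's dynamic threshold algorithm may differ)."""
    ranked = sorted(entity_weights.values(), reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    threshold = ranked[k - 1]
    return {e: w for e, w in entity_weights.items() if w >= threshold}

# Only high-weight entities survive; the spans containing them would then
# be highlighted coarse-to-fine: paragraph -> sentence -> word.
weights = {"RALM": 0.91, "hallucination": 0.84, "context": 0.40, "epoch": 0.12}
print(select_key_entities(weights))  # {'RALM': 0.91, 'hallucination': 0.84}
```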



Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias
  • September 2024 · 1 Read · 5 Citations
  • IEEE Transactions on Pattern Analysis and Machine Intelligence

Node representation learning on attributed graphs, whose nodes are associated with rich attributes (e.g., texts and protein sequences), plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where the pre-trained models serve as node encoders (NEs) to encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, they do not take the feature convolutions in GNNs into consideration during the training phase of the NEs, leading to a significant learning bias relative to joint training. To address this challenge, we propose an efficient label regularization technique, namely Label Deconvolution (LD), to alleviate the learning bias through a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping leads to an objective function that is equivalent to that of joint training, while it can effectively incorporate GNNs in the training phase of NEs against the learning bias. More importantly, we show that LD converges to the optimal objective function values obtained by joint training under mild assumptions. Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
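To convey the core idea in the simplest possible setting, the toy below assumes a linear GNN whose output is S @ H for a fixed propagation matrix S, and "deconvolves" the labels with a truncated Neumann-series approximation of the inverse of S, so a node encoder could be fit against them directly. The paper's actual approximation is more general and more scalable; everything here (the names, the propagation form, alpha) is my own illustration:

```python
import numpy as np

def deconvolved_labels(adj, labels, num_hops=2, alpha=0.5):
    """Apply an approximate inverse of a linear propagation S to the labels,
    so fitting a node encoder to the result mimics joint NE+GNN training.
    S^{-1} is approximated by a truncated Neumann series
    S^{-1} ~ I + P + P^2 + ... with P = I - S (toy linear setting only)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    S = alpha * (adj / deg) + (1 - alpha) * np.eye(n)  # row-normalized propagation
    P = np.eye(n) - S
    inv_approx, term = np.eye(n), np.eye(n)
    for _ in range(num_hops):
        term = term @ P
        inv_approx += term
    return inv_approx @ labels
```

Fitting the NE to these deconvolved labels, rather than to the raw ones, is what counters the bias of training the NE without the GNN's feature convolutions.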


Citations (39)


... Recently, machine learning-based methods have demonstrated great promise in a wide range of protein-related applications, including 3D structure prediction [4,5], mutation effects prediction [6], functionality prediction [7,8,9] and de novo protein design [10,11]. With the development of Large Language Models (LLMs), Protein Language Models (PLMs) pretrained on large-scale protein sequence corpora have succeeded in acquiring powerful protein representations, showcasing outstanding performance across diverse tasks [12,13,14]. ...

Reference:

Multi-Modal CLIP-Informed Protein Editing
Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias
  • Citing Article
  • September 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

... Thus, multi-modal image fusion is often combined with downstream tasks to achieve superior performance compared to single-modal inputs. Applications include semantic segmentation [20]–[23], object detection [24], [25], object tracking [26], and pedestrian recognition [27]. While existing deep learning-based approaches can successfully generate high-quality fusion images, most approaches only consider evaluation metrics or visual quality, which may not directly benefit high-level vision tasks [28]–[30]. ...

Object Segmentation-Assisted Inter Prediction for Versatile Video Coding
  • Citing Article
  • December 2024

IEEE Transactions on Broadcasting

... This has been considered in (Berthold et al. 2022), where a regression model is used to determine whether using local cuts at a node of the branch and bound tree can lead to a reduction in solution time. The alternative is to rely on reinforcement learning to determine which cuts to add in each node of the branch and bound tree (Tang et al. 2020;Wang et al. 2023). ...

Learning to Cut via Hierarchical Sequence/Set Model for Efficient Mixed-Integer Programming
  • Citing Article
  • July 2024

IEEE Transactions on Pattern Analysis and Machine Intelligence

... Combined with continuously growing user bases and increasing quality-of-service (QoS) requirements, the limitations of cloud computing, which centralizes service operations in the cloud, are gradually becoming apparent [6,7]. Edge computing has emerged as an advanced computing paradigm that enables the deployment of computing and storage resources closer to users, significantly improving real-time performance and reducing data traffic costs [8][9][10][11]. To fully utilize computing resources, edge computing must reallocate hardware and software resources via virtualization technologies to serve multiple users on the same hardware [12]. Nevertheless, compared to centralized cloud networks, edge networks face tougher challenges in terms of user mobility, device heterogeneity, limited resources, and geographically dispersed edge nodes [13][14][15]. ...

Toward Decentralized Task Offloading and Resource Allocation in User-Centric MEC
  • Citing Article
  • December 2024

IEEE Transactions on Mobile Computing

... Recently, Chen et al. [13] used cross-attention mechanisms to provide additional temporal and spatial context to voxelized event inputs, leading to state-of-the-art (SOTA) performance. In Motion-Guided Event-Based Stereo Disparity Estimation Network (EV-MGDispNet) [9], stacked event volumes called MES (borrowed from Conc-Net Fig. 12) are fused with time surfaces (called motion confidence maps here) to first generate "edge-aware" event frames, from which multi-scale features are extracted using deformable attention layers. ...

Event-Based Stereo Depth Estimation by Temporal-Spatial Context Learning
  • Citing Article
  • January 2024

IEEE Signal Processing Letters

... Combined with continuously growing user bases and increasing quality-of-service (QoS) requirements, the limitations of cloud computing, which centralizes service operations in the cloud, are gradually becoming apparent [6,7]. Edge computing has emerged as an advanced computing paradigm that enables the deployment of computing and storage resources closer to users, significantly improving real-time performance and reducing data traffic costs [8][9][10][11]. To fully utilize computing resources, edge computing must reallocate hardware and software resources via virtualization technologies to serve multiple users on the same hardware [12]. Nevertheless, compared to centralized cloud networks, edge networks face tougher challenges in terms of user mobility, device heterogeneity, limited resources, and geographically dispersed edge nodes [13][14][15]. ...

Energy-Efficient Blockchain-Enabled User-Centric Mobile Edge Computing
  • Citing Article
  • August 2024

IEEE Transactions on Cognitive Communications and Networking

... Specifically, early RGB-T datasets like LITIV [1] provide only limited sequences, while subsequent datasets [2]- [5] expand the scale and complexity of the benchmarks. Similarly, RGB-E tracking evolves from experiments on simulated data [6], [7] to substantial datasets such as FE108 [8] and VisEvent [9]. In the field of RGB-L tracking, pioneering works such as OTB-LANG [10] pave the way for larger datasets like LaSOT [11] and TNL2K [12]. ...

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows
  • Citing Article
  • October 2023

IEEE Transactions on Cybernetics

... Mobile edge computing (MEC) has the potential to achieve low latency services by deploying cloud capabilities to the network edge [1]. In MEC, edge servers (ESs) are equipped with communication, computing, and storage resources to cache a variety of services, such as databases and programs, from remote cloud servers (CSs). ...

Joint Optimization of Base Station Clustering and Service Caching in User-Centric MEC
  • Citing Article
  • January 2023

IEEE Transactions on Mobile Computing

... prediction accuracy can be improved. In [15], an augmented lesion network mapping (A-LNM) was presented to generate functional lesion network (FLN) maps using structural MR images of patients and the brain functional connectivity of healthy people. The FLN maps, which can reflect the influence of a brain tumor on brain function, were fed to a 3D ResNet-based network [16] to classify the OS time of patients into short, middle, and long. ...

Overall Survival Time Prediction of Glioblastoma on Preoperative MRI Using Lesion Network Mapping
  • Citing Chapter
  • October 2023

Lecture Notes in Computer Science