Tiesong Zhao’s research while affiliated with Fuzhou University and other places


Publications (143)


Structural Similarity in Deep Features: Image Quality Assessment Robust to Geometrically Disparate Reference
  • Preprint

December 2024 · 5 Reads

Keke Zhang · [...] · Tiesong Zhao · Zhou Wang

Image Quality Assessment (IQA) with references plays an important role in optimizing and evaluating computer vision tasks. Traditional methods assume that all pixels of the reference and test images are fully aligned. Such Aligned-Reference IQA (AR-IQA) approaches fail to address many real-world problems with various geometric deformations between the two images. Although significant effort has been made to attack the Geometrically-Disparate-Reference IQA (GDR-IQA) problem, it has been addressed in a task-dependent fashion, for example, by dedicated designs for image super-resolution and retargeting, or by assuming the geometric distortions to be small enough to be countered by translation-robust filters or explicit image registration. Here we rethink this problem and propose a unified, non-training-based Deep Structural Similarity (DeepSSIM) approach that addresses the above problems in a single framework: it assesses the structural similarity of deep features in a simple but efficient way and uses an attention calibration strategy to alleviate attention deviation. The proposed method, without application-specific design, achieves state-of-the-art performance on AR-IQA datasets while showing strong robustness to various GDR-IQA test cases. Interestingly, our tests also show the effectiveness of DeepSSIM as an optimization tool for training image super-resolution, enhancement and restoration, implying an even wider generalizability. (Source code will be made public after the review is completed.)
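To make the core idea concrete, here is a minimal sketch of SSIM-style comparison on deep features, which tolerates geometric disparity because it compares feature statistics rather than aligned pixels. It assumes a VGG-16 backbone and global pooling choices of our own; none of this is the authors' unreleased implementation.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG-16 up to conv3_3 as an assumed feature extractor.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()

def deep_feature_ssim(ref, test, eps=1e-8):
    """ref, test: (1, 3, H, W) tensors; H and W may differ between the two."""
    with torch.no_grad():
        f_ref, f_test = vgg(ref), vgg(test)
    # Per-channel mean/variance summarize structure without pixel alignment.
    mu_r, mu_t = f_ref.mean(dim=(2, 3)), f_test.mean(dim=(2, 3))
    var_r, var_t = f_ref.var(dim=(2, 3)), f_test.var(dim=(2, 3))
    # Covariance needs a shared grid: pool both maps to a common 8x8 size.
    pr = F.adaptive_avg_pool2d(f_ref, 8).flatten(2)
    pt = F.adaptive_avg_pool2d(f_test, 8).flatten(2)
    cov = ((pr - pr.mean(-1, keepdim=True)) * (pt - pt.mean(-1, keepdim=True))).mean(-1)
    luminance = (2 * mu_r * mu_t + eps) / (mu_r**2 + mu_t**2 + eps)
    structure = (2 * cov + eps) / (var_r + var_t + eps)
    return (luminance * structure).mean().item()
```

A full DeepSSIM would add the paper's attention calibration step, which this sketch omits.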


Figures (from the SmartEx article below): operation process of the proposed SmartEx system; the YOLO-based object recognition network and its DECM/PCFM modules; the RepVGG architecture and its convolution block structure; real examination-room surveillance videos from various environments (typical scenes); reference images for the examinee behavior dataset, covering (a) turning head, (b) writing, (c) chatting, (d) invigilator, (e) raising hand, (f) standing, (g) reading and (h) looking up. Human faces are processed for privacy protection.

Smart proctoring with automated anomaly detection
  • Article

December 2024 · 14 Reads

Education and Information Technologies

With the emergence of Artificial Intelligence (AI), smart education has become an attractive topic. In a smart education system, automated classrooms and examination rooms could help reduce the economic cost of teaching and thus improve teaching efficiency. However, existing AI algorithms suffer from low surveillance accuracies and high computational costs, which limit their practicability in real-world scenarios. To address this issue, we propose an AI-driven anomaly detection framework for smart proctoring. The proposed method, Smart Exam (SmartEx), consists of two artificial neural networks: an object recognition network to locate invigilators and examinees, and a behavior analytics network to detect anomalies of examinees during the exam. To validate the performance of our method, we construct a dataset by annotating 6,429 invigilator instances, 34,074 examinee instances and 8 types of behaviors with 267,888 instances. Comprehensive experiments on this dataset show the superiority of SmartEx, which delivers strong proctoring performance at a relatively low computational cost. We also deploy the pre-trained SmartEx in an examination room at our university, where it robustly identifies diverse anomalies in real-world scenarios.
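As a rough illustration of the two-network design described above (a detector to locate people, a behavior classifier to flag anomalies), consider the following sketch. `detector`, `behavior_net`, and the anomaly set are placeholders, not SmartEx's actual components; only the eight behavior classes come from the dataset description.

```python
import torch
import torch.nn.functional as F

# Eight behavior classes annotated in the dataset described above.
BEHAVIORS = ["turning head", "writing", "chatting", "invigilator",
             "raising hand", "standing", "reading", "looking up"]
ANOMALOUS = {"turning head", "chatting"}  # illustrative choice, not the paper's rule

def proctor_frame(frame, detector, behavior_net):
    """frame: (1, 3, H, W) tensor. detector returns person boxes; behavior_net
    classifies a cropped person into one of the eight behaviors."""
    alerts = []
    for (x1, y1, x2, y2) in detector(frame):
        crop = frame[:, :, y1:y2, x1:x2]            # crop one person
        crop = F.interpolate(crop, size=(224, 224))  # classifier input size
        label = BEHAVIORS[behavior_net(crop).argmax(dim=1).item()]
        if label in ANOMALOUS:
            alerts.append(((x1, y1, x2, y2), label))
    return alerts
```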


Target-Aware Camera Placement for Large-Scale Video Surveillance

December 2024 · 8 Reads

IEEE Transactions on Circuits and Systems for Video Technology

Hongxin Wu · [...] · Chen Guo · [...] · Chang Wen Chen

In large-scale surveillance of urban or rural areas, effective placement of cameras is critical to maximizing surveillance coverage or minimizing the economic cost of cameras. Existing Surveillance Camera Placement (SCP) methods generally focus on the physical coverage of surveillance by implicitly assuming a uniform distribution of targets of interest across all blocks, which is, however, uncommon in real-world scenarios. In this paper, we are the first to propose a target-aware SCP (tSCP) model, which optimizes placement under uneven target densities, allowing cameras to preferentially cover blocks with more targets of interest. First, we define target density as the likelihood of targets of interest occurring in a block, which is positively correlated with the importance of the block. Second, we combine aerial imagery with a lightweight object detection network to estimate target density. Third, we formulate tSCP as an optimization problem that maximizes target coverage in the surveillance area, and solve this problem with a target-guided genetic algorithm. Our method promotes the rational and economical utilization of cameras in large-scale video surveillance. Compared with state-of-the-art methods, our tSCP achieves the highest target coverage with a fixed number of cameras (8.31%-14.81% more than its peers), or uses the minimum number of cameras to achieve a preset target coverage. Codes are available at https://github.com/wu-hongxin/tSCP_main .
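The optimization step lends itself to a compact sketch: choose k camera sites that maximize density-weighted block coverage with a genetic algorithm. This toy GA and its data layout (`coverage[s]` as the set of blocks visible from site s, `density[b]` as a block's target density) are assumptions standing in for the paper's target-guided GA.

```python
import random

def fitness(sites, coverage, density):
    """Density-weighted coverage: blocks seen by any chosen site, weighted
    by how many targets of interest they are expected to contain."""
    covered = set().union(*(coverage[s] for s in sites))
    return sum(density[b] for b in covered)

def place_cameras(coverage, density, k, pop=50, gens=200, seed=0):
    random.seed(seed)
    sites = list(coverage)
    popu = [random.sample(sites, k) for _ in range(pop)]
    for _ in range(gens):
        popu.sort(key=lambda ind: -fitness(ind, coverage, density))
        parents = popu[: pop // 2]                    # keep the fitter half
        children = []
        while len(parents) + len(children) < pop:
            a, b = random.sample(parents, 2)
            # Crossover: splice half of a into b, dropping duplicate sites.
            child = list(dict.fromkeys(a[: k // 2] + b))[:k]
            if random.random() < 0.2:                 # mutation
                child[random.randrange(k)] = random.choice(sites)
            children.append(child)
        popu = parents + children
    return max(popu, key=lambda ind: fitness(ind, coverage, density))
```

Here `coverage` maps each candidate site to the set of blocks it can see, and `density` maps each block to its estimated target density from the detection stage.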





Enlarged Motion-Aware and Frequency-Aware Network for Compressed Video Artifact Reduction

October 2024 · 7 Reads · 7 Citations

IEEE Transactions on Circuits and Systems for Video Technology

Making full use of spatio-temporal information is the key to removing compressed-video artifacts. Recently, many deep learning-based compression artifact reduction methods have emerged. Among them, a series of methods based on deformable convolution have shown excellent capabilities in spatio-temporal feature extraction. However, local deformable offset prediction and pixel-wise inter-frame feature alignment in a unidirectional form limit the full utilization of temporal features in existing methods. Additionally, compressed video exhibits inconsistent degrees of distortion across different frequency components, whose restoration difficulty is likewise nonuniform. To address these problems, we propose an enlarged motion-aware and frequency-aware network (EMAFA) to further extract spatio-temporal information and enhance information in different frequency components. To perceive different degrees of motion artifacts between compressed frames as accurately as possible, we design a bidirectional dense propagation pattern with a pixel-wise and patch-wise deformable convolution (PIPA) module in the feature domain. In addition, we propose a multi-scale atrous deformable alignment (MSADA) module to enrich spatio-temporal features in the image domain. Moreover, we design a multi-direction frequency enhancement (MDFE) module with multi-directional convolutions to enhance the features of different frequency components. Experimental results show that the proposed method outperforms state-of-the-art methods in both objective evaluation and visual perception. Supplementary experiments on Internet streamed video with hybrid distortions demonstrate that our method also exhibits considerable generalizability for quality enhancement.
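Deformable alignment, the primitive underlying the PIPA and MSADA modules, can be sketched with torchvision's deformable convolution: offsets predicted from a (current, neighbor) feature pair tell the convolution where to sample the neighbor so its content lines up with the current frame. Channel sizes and the offset predictor below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformAlign(nn.Module):
    """Align a neighboring frame's features to the current frame's features."""
    def __init__(self, ch=64, k=3):
        super().__init__()
        # Offsets: 2 (x, y) values per kernel tap, predicted from both frames.
        self.offset_pred = nn.Conv2d(2 * ch, 2 * k * k, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(ch, ch, k, k) * 0.01)

    def forward(self, feat_cur, feat_nbr):
        offsets = self.offset_pred(torch.cat([feat_cur, feat_nbr], dim=1))
        # Sample feat_nbr at the offset positions, warping it toward feat_cur.
        return deform_conv2d(feat_nbr, offsets, self.weight, padding=1)
```

Bidirectional propagation then runs such alignment both forward and backward in time, so each frame aggregates features from both temporal directions.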


Facing Differences of Similarity: Intra- and Inter-Correlation Unsupervised Learning for Chest X-Ray Anomaly Detection

September 2024 · 4 Reads

IEEE Transactions on Medical Imaging

Anomaly detection can significantly aid doctors in interpreting chest X-rays. A commonly used strategy is to utilize a pre-trained network to extract features from normal data and establish feature representations. However, when a pre-trained network is applied to more detailed X-rays, differences of similarity can limit the robustness of these feature representations. Therefore, we propose an intra- and inter-correlation learning framework for chest X-ray anomaly detection. First, to better leverage the similar anatomical structure information in chest X-rays, we introduce an Anatomical-Feature Pyramid Fusion Module for feature fusion. This module aims to obtain fusion features with both local details and global contextual information. These fusion features are initialized by a trainable feature mapper and stored in a feature bank to serve as centers for learning. Furthermore, to address the Facing Differences of Similarity (FDS) problem introduced by the pre-trained network, we propose an intra- and inter-correlation learning strategy: (1) we use intra-correlation learning to establish intra-correlation between mapped features of individual images and semantic centers, thereby initially discovering lesions; (2) we employ inter-correlation learning to establish inter-correlation between mapped features of different images, further mitigating the differences of similarity introduced by the pre-trained network and achieving effective detection even across diverse chest disease environments. Finally, a comparison with 18 state-of-the-art methods on three datasets demonstrates the superiority and effectiveness of the proposed method across various scenarios.
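At inference time, the intra-correlation idea reduces to scoring how far each mapped feature falls from its nearest semantic center; the sketch below shows that scoring under assumed tensor shapes. The feature mapper, bank construction, and inter-correlation training loss are omitted, so this is the gist rather than the method itself.

```python
import torch
import torch.nn.functional as F

def anomaly_map(mapped_feats, centers):
    """mapped_feats: (N, C, H, W) mapped features of test X-rays.
    centers: (K, C) semantic centers learned from normal data.
    Returns (N, H, W) scores; high = far from every normal center."""
    n, c, h, w = mapped_feats.shape
    flat = mapped_feats.permute(0, 2, 3, 1).reshape(-1, c)   # (N*H*W, C)
    # Cosine similarity of every spatial feature against every center.
    sim = F.cosine_similarity(flat.unsqueeze(1), centers.unsqueeze(0), dim=-1)
    # Anomaly score: distance to the best-matching (nearest) center.
    return (1.0 - sim.max(dim=1).values).reshape(n, h, w)
```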


Scale-Adaptive Asymmetric Sparse Variational AutoEncoder for Point Cloud Compression

September 2024 · 11 Reads

IEEE Transactions on Broadcasting

Learning-based point cloud compression has achieved great success in Rate-Distortion (RD) efficiency. Existing methods usually utilize a Variational AutoEncoder (VAE) network, which might lead to poor detail reconstruction and high computational complexity. To address these issues, we propose a Scale-adaptive Asymmetric Sparse Variational AutoEncoder (SAS-VAE) in this work. First, we develop an Asymmetric Multiscale Sparse Convolution (AMSC), which exploits multi-resolution branches to aggregate multiscale features at the encoder and excludes symmetric feature fusion branches to control model complexity at the decoder. Second, we design a Scale Adaptive Feature Refinement Structure (SAFRS) to adaptively adjust the number of Feature Refinement Modules (FRMs), thereby improving RD performance with an acceptable computational overhead. Third, we implement our framework with AMSC and SAFRS, and train it with an RD loss based on a Fine-grained Weighted Binary Cross-Entropy (FWBCE) function. Experimental results on the 8iVFB, Owlii, and MVUB datasets show that our method outperforms several popular methods, with a 90.0% time reduction and a 51.8% BD-BR saving compared with V-PCC. The code will be available soon at https://github.com/fancj2017/SAS-VAE .
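The FWBCE-based RD loss can be pictured as a per-voxel weighted binary cross-entropy on occupancy plus a rate term. The weighting scheme and the λ trade-off below are generic assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def fwbce(logits, occupancy, weights):
    """logits, occupancy, weights: (N,) per-voxel tensors; occupancy is 0/1.
    weights emphasize hard voxels (assumed here, e.g. near fine geometry)."""
    return F.binary_cross_entropy_with_logits(
        logits, occupancy.float(), weight=weights, reduction="mean")

def rd_loss(logits, occupancy, weights, rate_bits, lam=1.0):
    # Rate-distortion objective: weighted distortion plus estimated bit rate.
    return fwbce(logits, occupancy, weights) + lam * rate_bits
```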


LightViD: Efficient Video Deblurring With Spatial–Temporal Feature Fusion

August 2024 · 11 Reads · 4 Citations

IEEE Transactions on Circuits and Systems for Video Technology

Natural video capture suffers from visual blurriness due to high motion of cameras or objects. The video blurriness removal task has been extensively explored for both human vision and machine processing; however, its computational cost remains a critical issue that has not yet been fully addressed. In this paper, we propose a novel Lightweight Video Deblurring (LightViD) method that achieves top-tier performance with an extremely small parameter size. The proposed LightViD consists of a blur detector and a deblurring network. In particular, the blur detector effectively separates blurred regions, avoiding both unnecessary computation and over-enhancement of non-blurred regions. The deblurring network is designed as a lightweight model: it employs a Spatial Feature Fusion Block (SFFB) to extract hierarchical spatial features, which are further fused by ConvLSTM for effective spatial-temporal feature representation. Comprehensive experiments with quantitative and qualitative comparisons demonstrate the effectiveness of our LightViD method, which achieves competitive performance on the GoPro and DVD datasets at reduced computational costs of 1.63M parameters and 96.8 GMACs. Trained model available: https://github.com/wgp/LightVid .
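The blur-detector gating that the abstract credits for the low cost can be sketched as follows. `blur_detector` and `deblur_net` are placeholders for the trained submodels, and a real implementation would skip the deblurring network entirely on sharp regions rather than blend afterward as this toy does.

```python
import torch

def deblur_frame(frame, blur_detector, deblur_net, thresh=0.5):
    """frame: (1, 3, H, W). blur_detector outputs a (1, 1, H, W) blurriness
    map; deblur_net is the lightweight SFFB/ConvLSTM restoration model."""
    with torch.no_grad():
        mask = (blur_detector(frame) > thresh).float()
    restored = deblur_net(frame)
    # Blend: restored pixels where blur was detected, originals elsewhere,
    # which avoids over-enhancing regions that are already sharp.
    return mask * restored + (1.0 - mask) * frame
```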


Citations (48)


... The underwater environment holds immense scientific and ecological significance, sheltering diverse habitats across over 71% of Earth's surface [1]. However, capturing clear images underwater presents a significant challenge due to the complex interaction of light with water. ...

Reference:

Seeing Through the Haze: A Comprehensive Review of Underwater Image Enhancement Techniques
Underwater image quality optimization: Researches, challenges, and future trends
  • Citing Article
  • June 2024

Image and Vision Computing

... In addition, in the field of face super-resolution (FSR), FSR may alter the identity of an individual or introduce artifacts that affect recognizability, yet existing image quality assessment (IQA) methods are not yet able to assess this problem well. Chen et al. [55] used a benchmark dataset and a simplified reference quality metric for subjective and objective assessment of FSR-IQA, effectively addressing such problems. ...

Face Super-Resolution Quality Assessment Based On Identity and Recognizability
  • Citing Article
  • July 2024

IEEE Transactions on Biometrics Behavior and Identity Science

... However, these methods have limitations including sensitivity to specific conditions, information loss, color distortions, and difficulty in recovering lost details. In recent years, deep learning-based methods for image enhancement and restoration have shown remarkable performance in various domains, including image super-resolution [8]-[12], low-light image enhancement [?], [13]-[17] and video processing [18]-[20]. These advancements have also been extended to the task of UIE [21]-[27]. ...

FUVC: A Flexible Codec for Underwater Video Transmission
  • Citing Article
  • January 2024

IEEE Transactions on Geoscience and Remote Sensing

... SSP-Deblur leverages a superpixel segmentation prior to guide blind image deblurring, using segmentation entropy as a novel metric to enhance boundary definition and optimize image clarity through a convex energy minimization algorithm [37]. LightViD introduces a lightweight video deblurring approach with a blur detector and Spatial Feature Fusion Block, achieving top-tier performance with minimal computational cost [38]. MSFS-Net proposes a frequency separation module and contrastive learning to refine image details at multiple scales [39]. ...

LightViD: Efficient Video Deblurring With Spatial–Temporal Feature Fusion
  • Citing Article
  • August 2024

IEEE Transactions on Circuits and Systems for Video Technology

... The integration of Retinex theory (Rahman, Jobson, and Woodell 2004) into deep learning is the predominant approach among current methods (Wei et al. 2018; Wang et al. 2019; Zhang et al. 2021; Ma et al. 2023; Gao et al. 2023a; Liu et al. 2023; Li et al. 2024). Further, Retinex theory has also been combined with neural architecture search (Liu, Simonyan, and Yang 2018; Liu et al. 2021b,a) and unrolling schemes to address low-light enhancement tasks. ...

Perceptual Decoupling With Heterogeneous Auxiliary Tasks for Joint Low-Light Image Enhancement and Deblurring
  • Citing Article
  • January 2024

IEEE Transactions on Multimedia

... Regarding SIQA, these techniques are particularly useful in the development of sonar technologies such as transmission, enhancement, and compression algorithms. They can also assist in optimizing the placement and configuration of sonar equipment and in developing super-resolution models to enhance the pixel density of low-resolution sonar images (Chen et al., 2024). ...

Perception-and-Cognition-Inspired Quality Assessment for Sonar Image Super-Resolution
  • Citing Article
  • January 2024

IEEE Transactions on Multimedia

... In contrast, deep-learning-based AR-IQA methods show robustness to slight geometric deformation, but these methods cannot handle IQA tasks where the resolutions of reference and test images are inconsistent. To this end, various GDR-IQA methods have been proposed for specific tasks including Super-Resolution IQA (SRIQA) [6]- [10], Image Retargeting Quality Assessment (IRQA) [11]- [15] and Geometric Transformation Similarity Quality Assessment (GTSQA) (i.e., the similarity between source and transformed images) [5], [16]- [18]. ...

Perception-Driven Similarity-Clarity Tradeoff for Image Super-Resolution Quality Assessment
  • Citing Article
  • January 2023

IEEE Transactions on Circuits and Systems for Video Technology

... In the field of dynamically bit-rate-encoded videos, some flow control strategies limit the amount of data at the encoding level. In the recent research of [27], a 360-degree video rate control (RC) algorithm based on virtual competitors is proposed, which uses game theory to build a frame-level bit allocation model based on virtual competitors. The algorithm provides a GOP-level bit allocation scheme and, on this basis, designs an overall bit rate allocation scheme to reduce bit rate fluctuation across GOPs. ...

Virtual-Competitors-Based Rate Control for 360-Degree Video Coding
  • Citing Article
  • January 2023

IEEE Transactions on Broadcasting

... FR (e.g., [8], [16]) necessitates complete information; RR (e.g., [11]) relies on partial information; and NR (e.g., [17,18,19,13]) operates without any dependence on the available source video data. Videos compressed with standards such as High Efficiency Video Coding (HEVC) and VP9 exhibit blocking, ringing, and blurring artifacts [20]. Spatial artifacts that frequently result from encoding include false contouring, mosaic patterns, and contrast distortion. ...

Video Compression Artifacts Removal With Spatial-Temporal Attention-Guided Enhancement
  • Citing Article
  • January 2023

IEEE Transactions on Multimedia

... In recent years, the application of deep learning techniques in coding has effectively improved coding performance. In end-to-end coding, neural networks are designed for different coding targets to improve coding efficiency, such as fast coding [10,20,33,40], rate control [27] and machine vision [17,21,41]. Since current international video coding standards rarely adopt deep learning-based approaches, research on applying deep learning to video coding is of forward-looking significance, especially for optimizing coding complexity. ...

ELFIC: A Learning-based Flexible Image Codec with Rate-Distortion-Complexity Optimization
  • Citing Conference Paper
  • October 2023