May 2025
Neurocomputing
May 2025
Neurocomputing
February 2025
January 2025
Journal of Image and Graphics
December 2024
October 2024 · 2 Reads
September 2024
June 2024 · 3 Reads
May 2024 · 39 Reads
Traditional video frame interpolation methods based on deep convolutional neural networks face challenges in handling large motions. Their performance is limited by the fact that convolutional operations cannot directly integrate the rich temporal and spatial information of inter-frame pixels, and these methods rely heavily on additional inputs such as optical flow to model motion. To address this issue, we develop a novel framework for video frame interpolation that uses a Transformer to efficiently model the long-range similarity of inter-frame pixels. Furthermore, to effectively aggregate spatio-temporal features, we design a novel attention mechanism divided into temporal attention and spatial attention. Specifically, spatial attention is used to aggregate intra-frame information, integrating the attention and convolution paradigms through a simple mapping approach, while temporal attention is used to model the similarity of pixels along the timeline. This design processes the two types of information in parallel without extra computational cost, aggregating information across the space-time dimensions. In addition, we introduce a context extraction network and a multi-scale prediction frame synthesis network to further optimize the performance of the Transformer. We conduct extensive quantitative and qualitative experiments comparing our method with state-of-the-art methods on various benchmark datasets. On the Vimeo90K and UCF101 datasets, our model improves PSNR over UPR-Net-large by 0.09 dB and 0.01 dB, respectively. On the Vimeo90K dataset, our model outperforms FLAVR by 0.07 dB with only 40.56% of its parameters. The qualitative results show that for complex, large-motion scenes, our method generates sharper and more realistic edges and details.
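The abstract above describes spatial attention (within each frame, mixed with a convolutional mapping) and temporal attention (across frames at each pixel) running in parallel. The sketch below is only an illustration of that idea, not the authors' implementation; the module name, the fusion layer, and the tensor layout (B, T, C, H, W) are assumptions.

```python
import torch
import torch.nn as nn

class SpaceTimeAttention(nn.Module):
    """Toy space-time attention block: spatial attention per frame plus a
    convolutional local mapping, and temporal attention per pixel location.
    Intended for small feature maps; full-resolution inputs would be costly."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # lightweight convolution whose output is added to the spatial-attention
        # stream, mixing the convolution and attention paradigms
        self.local_map = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Linear(2 * channels, channels)

    def forward(self, x):                           # x: (B, T, C, H, W)
        B, T, C, H, W = x.shape
        # spatial attention: tokens are the H*W positions of each frame
        s = x.reshape(B * T, C, H, W)
        local = self.local_map(s)                   # convolutional (local) branch
        tokens = s.flatten(2).transpose(1, 2)       # (B*T, H*W, C)
        s_attn, _ = self.spatial_attn(tokens, tokens, tokens)
        s_out = s_attn.transpose(1, 2).reshape(B * T, C, H, W) + local
        s_out = s_out.reshape(B, T, C, H, W)
        # temporal attention: tokens are the T samples at each pixel location
        t = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, T, C)
        t_attn, _ = self.temporal_attn(t, t, t)
        t_out = t_attn.reshape(B, H, W, T, C).permute(0, 3, 4, 1, 2)
        # fuse the two streams
        cat = torch.cat([s_out, t_out], dim=2)      # (B, T, 2C, H, W)
        cat = cat.permute(0, 1, 3, 4, 2)            # (B, T, H, W, 2C)
        return self.fuse(cat).permute(0, 1, 4, 2, 3)  # back to (B, T, C, H, W)
```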
April 2024 · 59 Reads · 5 Citations
Neural Computing and Applications
Images captured under low-light conditions often suffer from low contrast, high noise, and uneven brightness due to nightlight, backlight, and shadow. These challenges make it difficult to use them as high-quality inputs for visual tasks. Existing low-light enhancement methods tend to increase overall image brightness, which can cause overexposure of normal-light areas after enhancement. To solve this problem, this paper proposes an Uneven Dark Vision Network (UDVN) that consists of two sub-networks. The Luminance Domain Network (LDN) uses Direction-aware Spatial Context (DSC) and Feature Enhancement Module (FEM) to segment different light regions in the image and output the luminance domain mask. Guided by this mask, the Light Enhancement Network (LEN) uses the Cross-Domain Transformation Residual block (CDTR) to adaptively illuminate different regions with various lights. We also introduce a new region loss function to constrain the LEN to better enhance the quality of different light regions. In addition, we have constructed a new low-light synthesis dataset (UDL) that is larger, more diverse, and includes uneven lighting states in the real world. Extensive experiments on several benchmark datasets demonstrate that our proposed method is highly competitive with state-of-the-art (SOTA) methods. Specifically, it outperforms other methods in light recovery and detail preservation when processing uneven low-light images. The UDL dataset is publicly available at: https://github.com/YuhangLi-li/UDVN.
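As a rough illustration of the two-sub-network design described above (a mask-predicting LDN guiding a mask-conditioned LEN, trained with a region loss), here is a minimal PyTorch sketch. All layer choices, the loss weighting, and the module internals are placeholders and not the released UDVN code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class LuminanceDomainNet(nn.Module):
    """Placeholder for the LDN: predicts a soft luminance-domain mask (1 = dark region)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(3, 32), conv_block(32, 32),
                                  nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, x):
        return torch.sigmoid(self.body(x))

class LightEnhancementNet(nn.Module):
    """Placeholder for the LEN: enhances the image conditioned on the mask."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(conv_block(4, 64), conv_block(64, 64),
                                  nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x, mask):
        return torch.sigmoid(self.body(torch.cat([x, mask], dim=1)))

def region_loss(pred, target, mask):
    """Weight reconstruction error separately in dark and well-lit regions so dark
    areas are brightened without overexposing normal-light areas."""
    dark = F.l1_loss(pred * mask, target * mask, reduction='sum') / (mask.sum() * 3 + 1e-6)
    lit = F.l1_loss(pred * (1 - mask), target * (1 - mask), reduction='sum') / ((1 - mask).sum() * 3 + 1e-6)
    return dark + 0.5 * lit          # the 0.5 weighting is an illustrative choice

# usage: mask = ldn(low); out = len_net(low, mask); loss = region_loss(out, gt, mask)
```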
March 2024 · 11 Reads · 2 Citations
Video frame interpolation aims to generate intermediate frames in a video to showcase finer details. However, most methods are only trained and tested on low-resolution datasets, lacking research on 4K video frame interpolation problems. This limitation makes it challenging to handle high-frame-rate video processing in real-world scenarios. In this paper, we propose a 4K video dataset at 120 fps, named UHD4K120FPS, which contains large motion. We also propose a novel framework for solving the 4K video frame interpolation task, based on a multi-scale pyramid network structure. We introduce self-attention to capture long-range dependencies and self-similarities in pixel space, which overcomes the limitations of convolutional operations. To reduce computational cost, we use a simple mapping-based approach to lighten self-attention, while still allowing for content-aware aggregation weights. Through extensive quantitative and qualitative experiments, we demonstrate the excellent performance achieved by our proposed model on the UHD4K120FPS dataset, as well as illustrate the effectiveness of our method for 4K video frame interpolation. In addition, we evaluate the robustness of the model on low-resolution benchmark datasets.
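To make the "simple mapping-based approach to lighten self-attention" concrete, the following sketch predicts content-aware aggregation weights over a local window with a small convolution and applies them inside a coarse-to-fine pyramid. It is an assumption-laden illustration, not the paper's network; the layer sizes, window size, and refinement scheme are invented for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MappedAggregation(nn.Module):
    """Predict per-pixel weights over a local window with a small convolution and
    aggregate neighbours with them, a cheap stand-in for dense self-attention."""
    def __init__(self, channels, window=3):
        super().__init__()
        self.window = window
        self.to_weights = nn.Conv2d(channels, window * window, 3, padding=1)

    def forward(self, feat):                               # feat: (B, C, H, W)
        B, C, H, W = feat.shape
        k = self.window
        w = torch.softmax(self.to_weights(feat), dim=1)    # (B, k*k, H, W)
        patches = F.unfold(feat, k, padding=k // 2)        # (B, C*k*k, H*W)
        patches = patches.view(B, C, k * k, H * W)
        w = w.view(B, 1, k * k, H * W)
        return (patches * w).sum(dim=2).view(B, C, H, W)

class PyramidInterpolator(nn.Module):
    """Coarse-to-fine interpolation: estimate the middle frame at the coarsest scale
    and refine it at each finer scale (frame size assumed divisible by 2**(levels-1))."""
    def __init__(self, channels=32, levels=3):
        super().__init__()
        self.levels = levels
        self.encode = nn.Conv2d(6, channels, 3, padding=1)   # two stacked RGB frames
        self.inject = nn.Conv2d(3, channels, 3, padding=1)   # feed coarser estimate back in
        self.agg = MappedAggregation(channels)
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, f0, f1):
        x = torch.cat([f0, f1], dim=1)
        pred = None
        for lvl in reversed(range(self.levels)):             # coarse -> fine
            xs = F.interpolate(x, scale_factor=1 / 2 ** lvl, mode='bilinear',
                               align_corners=False) if lvl else x
            feat = self.encode(xs)
            if pred is not None:                             # refine the coarser estimate
                pred = F.interpolate(pred, scale_factor=2, mode='bilinear',
                                     align_corners=False)
                feat = feat + self.inject(pred)
            pred = torch.sigmoid(self.decode(self.agg(feat)))
        return pred
```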
... Liu et al. [8] introduced RAUNA; the structure comprises a decomposition network (DecNet) influenced by algorithmic unrolling and adjustment networks that take into account both global and local brightness. Li et al. [44] introduced UDVN, an enhancement algorithm that effectively handles images with uneven low-light conditions. This algorithm concentrates on the light and shadow details within the image for low-light enhancement tasks, achieving competitive performance in enhancing low-light images. ...
April 2024
Neural Computing and Applications
... The existing VFI methods can be mainly classified into two categories: flow-based [5,8,17,18] and kernel-based [19][20][21][22]. ...
March 2024
... Lu Zou et al. [83] conceptualize 2D poses as graphs, redefining 3D estimation as a graph regression problem, where GCNs infer latent structural relationships within the human body. Bing Yu et al. [84] develop a Perceptual U-shaped Graph Convolutional Network (M-UGCN) using a U-shaped network with map-aware local enhancement, extending the receptive field and intensifying local node interactions across multiple scales to improve 2D-to-3D estimation. Building on this, Hua et al. [85] combine 2D pose estimates from dual views with triangulation to produce an initial 3D pose, subsequently refining it through a Cross-view U-shape Graph Convolutional Network (CV-UGCN) under weak supervision, applicable to any preceding 2D method. ...
October 2023
... Although SSL pretext tasks can be designed and employed for many different types of data (e.g., time-series [16], text [17], video [18][19], audio [20], point clouds [21], or even multimodal data [22][23]), this article focuses on image analysis for computer vision applications. Moreover, it focuses on generic SSL methods and not ones explicitly designed for specific tasks (e.g., for multi-view clustering [24], product attribute recognition [25], etc.). The remainder of this paper is organized as follows: Section 2 briefly presents the most common categories of pretext tasks for visual SSL. ...
August 2023
Neurocomputing
... As the ground-truth version of the video signals in the new domain is not available in several real-life situations, the use of unsupervised learning techniques for carrying out the task of video-to-video translation is inevitable. Deep neural networks provide state-of-the-art performance in various computer vision tasks [3][4][5][6][7][8][9], in view of their end-to-end learning capability between the input and output data domains. In view of these considerations, the use of deep unsupervised learning-based schemes seems to be a legitimate choice for designing a high-performance video-to-video translation system. ...
July 2023
Signal Image and Video Processing
... • SGRNet [185] is a two-stage network, which first employs a generator to create a shadow mask by merging foreground and background, and then predicts shadow parameters and fills the shadow area, producing an image with realistic shadows. • Liu et al. [186] enhance shadow generation in image compositing with multi-scale feature enhancement and multi-level feature fusion. This approach improves mask prediction accuracy and minimizes information loss in shadow parameter prediction, leading to enhanced shadow shapes and ranges. ...
March 2023
... The results show that the proposed model produces sharper results closer to the ground truth, with fewer blurring effects and artifacts. Finally, Li et al. [208] (SRAGAN) design a complex GAN with local and global channel and spatial attention modules in both the generator and the discriminator network to capture short- as well as long-range dependencies between pixels. Several experiments proved the superiority of the proposed model, especially at higher scaling factors. ...
December 2022
... Chu et al. [11] proposed a two-stage network that first extracts the subtitle mask using a mask extraction network, then feeds the predicted mask and video frame into the generator to remove subtitles. Tu et al. [33] proposed a lightweight mask extraction network that uses gated convolutions to generate unsubtitled videos. Although BVDNet [10] has small model parameters, it does not perform well in removing subtitles. ...
December 2022
... Despite data limitations, recent research has explored different SLLIE [17] approaches. The end-to-end learning-based methods [2,9,11,15,22,26,29,30,39,43,53] aim to directly map low-light images to well-lit counterparts, often integrating Convolutional Neural Networks (CNNs). Several recent methods leverage transformers [7,41,52] to enhance the receptive field of vanilla convolutions for learning spatial information, inspired by advances in image restoration [48,57,58]. ...
December 2022
Machine Vision and Applications
... Assuming x is a natural image, φ is a histogram matching operator that can be widely applied to image color transfer 49,50, color correction 51,52, and contrast enhancement 53. For an input image x = (x1, x2, . . . ...
October 2022
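For reference, the histogram matching operator φ mentioned in the snippet above can be realized as follows for a single channel. This is a generic textbook formulation (mapping the source CDF onto the reference CDF), not code from the cited works.

```python
import numpy as np

def histogram_match(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap `source` so its intensity histogram approximates `reference`'s
    (single channel; apply per channel for colour images)."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts).astype(np.float64) / source.size
    ref_cdf = np.cumsum(ref_counts).astype(np.float64) / reference.size
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)   # invert the reference CDF
    return matched_vals[src_idx].reshape(source.shape)
```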