January 2025 · 6 Reads
January 2025 · 6 Reads
January 2025 · 4 Reads
IEEE Transactions on Pattern Analysis and Machine Intelligence
In this paper, we propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains under conditions of limited training data and significant visual differences. The main idea behind our approach is leveraging the domain-neutral capabilities of CLIP as a bridging mechanism, while utilizing a separate module to extract abstract, domain-agnostic semantics from the embeddings of both the source and target realms. Fusing these abstract semantics with target-specific semantics results in a transformed embedding within the CLIP space. To bridge the gap between the disparate worlds of CLIP and StyleGAN, we introduce a new non-linear mapper, the CLIP2P mapper. Utilizing CLIP embeddings, this module is tailored to approximate the latent distribution in the StyleGAN's latent space, effectively acting as a connector between these two spaces. The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translations, even in visually challenging scenarios across different visual domains. Notably, UniTranslator generates high-quality translations that showcase domain relevance, diversity, and improved image quality. UniTranslator surpasses the performance of existing general-purpose models and performs well against specialized models in representative tasks. The source code and trained models will be released to the public.
December 2024 · 3 Reads
In compute-first networking, maintaining fresh and accurate status information at the network edge is crucial for effective access to remote services. This process typically involves three phases: status updating, user accessing, and user requesting. However, current studies on status effectiveness, such as Age of Information at Query (QAoI), do not comprehensively cover all these phases. Therefore, this paper introduces a novel metric, TPAoI, aimed at optimizing update decisions by measuring the freshness of service status. The stochastic nature of edge environments, characterized by unpredictable communication delays in updating, requesting, and user access times, poses a significant modeling challenge. To address this, we model the problem as a Markov Decision Process (MDP) and employ a Dueling Double Deep Q-Network (D3QN) algorithm for optimization. Extensive experiments demonstrate that the proposed TPAoI metric effectively minimizes AoI, ensuring timely and reliable service updates in dynamic edge environments. Results indicate that TPAoI reduces AoI by an average of 47% compared to QAoI metrics and decreases update frequency by an average of 48% relative to conventional AoI metrics, showing significant improvement.
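The abstract names a Dueling Double Deep Q-Network but gives no implementation details. As a minimal sketch of the two standard ingredients that name refers to (not the authors' code), the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A) and the double-Q target can be written in plain Python. The function names and all inputs below are illustrative assumptions.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value and per-action advantages into Q-values.

    Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward: float, gamma: float, done: bool,
                    next_q_online: Sequence[float],
                    next_q_target: Sequence[float]) -> float:
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

In an update-scheduling setting such as the one described, the actions would plausibly be "update" versus "wait" and the reward would penalize stale status, but that mapping is an assumption rather than something the abstract states.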
December 2024 · 4 Reads
As deep neural networks (DNNs) are increasingly deployed on edge devices, optimizing models for constrained computational resources is critical. Existing auto-pruning methods face challenges due to the diversity of DNN models, various operators (e.g., filters), and the difficulty in balancing pruning granularity with model accuracy. To address these limitations, we introduce AutoSculpt, a pattern-based automated pruning framework designed to enhance efficiency and accuracy by leveraging graph learning and deep reinforcement learning (DRL). AutoSculpt automatically identifies and prunes regular patterns within DNN architectures that can be recognized by existing inference engines, enabling runtime acceleration. AutoSculpt comprises three key steps: (1) constructing DNNs as graphs to encode their topology and parameter dependencies, (2) embedding computationally efficient pruning patterns, and (3) utilizing DRL to iteratively refine auto-pruning strategies until the optimal balance between compression and accuracy is achieved. Experimental results demonstrate the effectiveness of AutoSculpt across various architectures, including ResNet, MobileNet, VGG, and Vision Transformer, achieving pruning rates of up to 90% and nearly 18% improvement in FLOPs reduction, outperforming all baselines. The code is available at https://anonymous.4open.science/r/AutoSculpt-DDA0
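The abstract does not say which regular patterns are pruned. To illustrate what a regular, engine-friendly pattern can look like, the sketch below applies N:M magnitude pruning (keep the N largest-magnitude weights in every group of M) to a flat weight list. The choice of N:M sparsity, the function names, and the defaults are assumptions for illustration, not the AutoSculpt method.

```python
def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m; zero the rest."""
    pruned = list(weights)
    for start in range(0, len(pruned), m):
        group = range(start, min(start + m, len(pruned)))
        # Indices in this group, ordered by descending magnitude.
        keep = set(sorted(group, key=lambda i: abs(pruned[i]), reverse=True)[:n])
        for i in group:
            if i not in keep:
                pruned[i] = 0.0
    return pruned


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)
```

A learned policy, such as the DRL agent the abstract mentions, would decide where and how aggressively to apply such patterns; here the pattern is fixed by hand.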
December 2024 · 12 Reads
Intelligent Marine Technology and Systems
Phytoplankton are crucial for aquatic ecosystems and provide valuable insights into ocean environments and changes in ecosystems. Traditional phytoplankton monitoring methods are often complex and lack timely analysis capabilities. Thus, deep learning algorithms offer a promising approach for automated phytoplankton monitoring. However, the lack of large-scale, high-quality training datasets presents a major bottleneck in advancing phytoplankton tracking. Herein, we propose a challenging benchmark dataset called multiple phytoplankton tracking (MPT), which covers diverse background information and motion variations during observation. The dataset includes 27 phytoplankton and zooplankton species, 14 different backgrounds to simulate diverse and complex underwater environments, and 140 videos. To enable accurate real-time phytoplankton observation, we introduce the deviation-corrected multiscale feature fusion tracker (DSFT), a multiobject tracking method designed to overcome key issues such as focus shifts during tracking and the loss of critical information on small targets when computing frame-to-frame similarity. To enhance efficiency, we incorporate an additional feature extractor that predicts residuals from the output of the standard feature extractor; this enables multiscale frame-to-frame similarity comparisons based on features from different extractor layers. Extensive experiments conducted on the MPT dataset validated its effectiveness and demonstrated the superior performance of the DSFT method in tracking phytoplankton, providing an effective solution for phytoplankton monitoring.
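The abstract describes frame-to-frame similarity computed from features at different extractor layers. A minimal sketch of that idea follows, assuming cosine similarity averaged over scales and a greedy one-to-one assignment; both choices, the threshold, and all names are illustrative assumptions and not the DSFT design.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def multiscale_similarity(feats_a, feats_b):
    """Average cosine similarity over scales.

    feats_a and feats_b each hold one feature vector per scale.
    """
    sims = [cosine(a, b) for a, b in zip(feats_a, feats_b)]
    return sum(sims) / len(sims)


def greedy_match(prev_objs, curr_objs, threshold=0.5):
    """Match objects across two frames, most similar pairs first."""
    pairs = sorted(
        ((multiscale_similarity(p, c), i, j)
         for i, p in enumerate(prev_objs)
         for j, c in enumerate(curr_objs)),
        reverse=True,
    )
    used_prev, used_curr, matches = set(), set(), {}
    for sim, i, j in pairs:
        if sim < threshold:
            break  # remaining pairs are even less similar
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches
```

Averaging over scales lets coarse and fine features both influence the match, which is one simple way to keep small targets from being dominated by a single layer's response.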
December 2024 · 5 Reads
Personalized image generation has made significant strides in adapting content to novel concepts. However, a persistent challenge remains: balancing the accurate reconstruction of unseen concepts with the need for editability according to the prompt, especially when dealing with the complex nuances of facial features. In this study, we delve into the temporal dynamics of the text-to-image conditioning process, emphasizing the crucial role of stage partitioning in introducing new concepts. We present PersonaMagic, a stage-regulated generative technique designed for high-fidelity face customization. Using a simple MLP network, our method learns a series of embeddings within a specific timestep interval to capture face concepts. Additionally, we develop a Tandem Equilibrium mechanism that adjusts self-attention responses in the text encoder, balancing text description and identity preservation, improving both areas. Extensive experiments confirm the superiority of PersonaMagic over state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, its robustness and flexibility are validated in non-facial domains, and it can also serve as a valuable plug-in for enhancing the performance of pretrained personalization models.
December 2024 · 19 Reads
Intelligent Marine Technology and Systems
The exponential progression in oceanic observational technology has fostered the accumulation of substantial time series data pivotal for predictions in ocean meteorology. Foremost among the phenomena observed is El Niño-Southern Oscillation (ENSO), a critical determinant in the interplay of global ocean-atmosphere interactions, with its severe manifestations inducing extreme meteorological conditions. Therefore, precisely predicting ENSO events is of great importance. Historically, predictions hinged primarily on dynamic models and statistical approaches; however, the intricate and multifaceted spatiotemporal dynamics of ENSO events have often impeded the accuracy of these traditional methodologies. A notable gap in contemporary research is the insufficient exploration of long-term dependencies within oceanic data and the suboptimal integration of spatial information derived from spatiotemporal data. To address these limitations, this study introduces an ENSO prediction framework that combines multiscale spatial features with temporal attention mechanisms. This design facilitates a more profound exploration of temporal and spatial domains, enhancing the retention of long-period data while optimizing the use of spatial information. Preliminary analyses executed on the global ocean data assimilation system dataset attest to the superior efficacy of the proposed method, underscoring a substantial improvement over established methods including SA-convolutional long short-term memory, particularly in facilitating long-term predictions. The source code and datasets are provided. The code is available at https://github.com/tse1998/ENSO-prediction.
November 2024 · 9 Reads
We describe the Forensics Adapter, an adapter network designed to transform CLIP into an effective and generalizable face forgery detector. Although CLIP is highly versatile, adapting it for face forgery detection is non-trivial as forgery-related knowledge is entangled with a wide range of unrelated knowledge. Existing methods treat CLIP merely as a feature extractor, lacking task-specific adaptation, which limits their effectiveness. To address this, we introduce an adapter to learn face forgery traces -- the blending boundaries unique to forged faces, guided by task-specific objectives. Then we enhance the CLIP visual tokens with a dedicated interaction strategy that communicates knowledge across CLIP and the adapter. Since the adapter is alongside CLIP, its versatility is highly retained, naturally ensuring strong generalizability in face forgery detection. With only trainable parameters, our method achieves a significant performance boost, improving by approximately on average across five standard datasets. We believe the proposed method can serve as a baseline for future CLIP-based face forgery detection methods.
November 2024 · 2 Reads
In this work, we concentrate on exciting the intrinsic local consistency of stereo matching through the incorporation of superpixel soft constraints, with the objective of mitigating inaccuracies at the boundaries of predicted disparity maps. Our approach capitalizes on the observation that neighboring pixels are predisposed to belong to the same object and exhibit closely similar intensities within the probability volume of superpixels. By incorporating this insight, our method encourages the network to generate consistent probability distributions of disparity within each superpixel, aiming to improve the overall accuracy and coherence of predicted disparity maps. Experimental evaluations on widely-used datasets validate the efficacy of our proposed approach, demonstrating its ability to assist cost volume-based matching networks in restoring competitive performance.
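One way to read "consistent probability distributions of disparity within each superpixel" is as a penalty on how far each pixel's distribution lies from the mean distribution of its superpixel. The sketch below implements that reading with a squared-error distance. The distance choice, the flat data layout, and the function name are assumptions for illustration; the abstract does not specify the actual loss.

```python
from collections import defaultdict


def superpixel_consistency_penalty(probs, labels):
    """Mean squared deviation of per-pixel disparity distributions
    from their superpixel's mean distribution.

    probs:  one probability vector over disparity bins per pixel
    labels: one superpixel id per pixel (same order as probs)
    """
    groups = defaultdict(list)
    for p, lab in zip(probs, labels):
        groups[lab].append(p)

    total = 0.0
    for members in groups.values():
        n_bins = len(members[0])
        # Mean distribution of this superpixel.
        mean = [sum(p[d] for p in members) / len(members) for d in range(n_bins)]
        for p in members:
            total += sum((p[d] - mean[d]) ** 2 for d in range(n_bins))
    return total / len(probs)
```

The penalty is zero when every pixel in a superpixel predicts the same distribution and grows as predictions inside a superpixel disagree, so it acts as a soft rather than hard constraint when added to a matching loss.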
November 2024 · 7 Reads
Intelligent Marine Technology and Systems
El Niño-Southern Oscillation (ENSO) is a periodic climate phenomenon in the equatorial Pacific that significantly influences global climate patterns. Accurate prediction and monitoring of ENSO events are essential for meteorological agencies and governmental institutions. This study introduces a content-guided attention module within a convolutional neural network to improve prediction accuracy. This module models inter-channel relationships and enhances information interaction by integrating channel and spatial attention weights. These advancements substantially improve prediction accuracy and help overcome the spring prediction barrier in ENSO forecasting. The research emphasizes global feature modeling and proposes a novel content-guided ENSO prediction model. It also includes an ocean data generation model utilizing global attention. Furthermore, a layered rendering technique is employed to invert ocean data, facilitating detailed analysis and contributing to the development of an ocean synthetic dataset.
... Warping-based methods warp the garment explicitly to fit the model's skeleton and then take the warped garment as input for the following generation process [12,26,28]. In particular, many methods in the stream put the warped garment onto the cloth-agnostic model image, so the generation process becomes a typical inpainting problem [10,29]. However, these methods heavily rely on the performance of the warping module, which is highly challenging itself, as can be seen in the green dashed box of Figure 1. ...
October 2024
... Different from CNNs, which rely on local operators to extract information, the Transformer employs a multi-head attention mechanism to establish a global relational model for remote sensing images, capturing long-range dependencies [16]. By converting two-dimensional image tasks into one-dimensional sequences, the Transformer efficiently models global context, pioneering a new paradigm in image processing and inspiring various innovative CD methods [34–43]. These methods can be categorized into pure Transformer architectures and hybrid CNN-Transformer architectures. ...
January 2024
IEEE Geoscience and Remote Sensing Letters
... Although a valuable resource, even video datasets such as PMOT2023 (Yu et al. 2023) have limitations in terms of sample density and scale. These datasets often include only some species or frames, which limits their variety and richness. ...
May 2023
Journal of Marine Science and Engineering
... In the context of medical image segmentation, although extensive research has been conducted on federated learning (Sheller et al., 2019; Zhang et al., 2024; Luo et al., 2023), it is assumed that all the domains have sufficient labeled data, which is not practical in real-world scenarios. In practice, labeled data is often scarce and expensive to obtain in medical image segmentation tasks. ...
October 2024
IEEE Transactions on Neural Networks and Learning Systems
... GP-Gait (Fu et al. 2023) transforms the arbitrary human pose into a unified representation and allows efficient partitions of the human graph. (Min et al. 2024) represents the coordinates of human joints as a heatmap to provide explicit structural features. (Li and Zhao 2022; Choi et al. 2019) utilize gait periodicity priors and frame-level discriminative power, respectively. ...
July 2024
... However, due to the numerous uncertainties inherent in aquatic environments and the absorption and scattering effects of water on light, the quality of raw underwater video footage often deteriorates significantly. These low-quality videos fail to meet human visual standards, impairing subsequent deep learning-based tasks such as video segmentation [1,2], object detection [3,4], multi-agent systems [5,6], image detection and classification [7,8], 3D image reconstruction [9], and medical image analysis [10]. ...
September 2024
Journal of Marine Science and Engineering
... The remarkable achievements of artificial intelligence techniques have sparked a wave of interest and research in the field of remote sensing (RS) [23]. These data-driven approaches have been widely used in object/target detection [24], image super-resolution [25], scene classification [26], [27], surface change detection [28], etc. In particular, with the aid of domain expertise, DL models have demonstrated their capability to solve some challenging detection tasks. ...
January 2024
IEEE Geoscience and Remote Sensing Letters
... Recent studies have explored the use of machine learning and deep learning methods for 2D facial palsy grading [4,6,7,24–28,38], which have shown promising results in terms of accuracy and efficiency [2,24,29–32,39–42]. Most often, they attempt to replicate existing grading [31,33,41,43–45] or attempt to utilize other facial features to estimate the patient health state [29,46–48]. One main advantage of these approaches is their ability to analyze 2D facial images, which are widely available and easily accessible, making them a cost-effective solution for facial palsy diagnosis. ...
August 2024
IEEE Transactions on Neural Systems and Rehabilitation Engineering
... Segmentation of anatomical structures and pathology within medical images holds paramount importance for clinical diagnosis (Zhou et al. 2019), treatment planning (Li et al. 2023b; Zhang et al. 2024a), and disease research (Zhang et al. 2022b). While significant progress has been achieved through deep learning-based segmentation techniques, many approaches encounter substantial bottlenecks when lacking sufficient well-annotated datasets (Zhang et al. 2024c, 2022a). Consequently, there is a critical need to develop more effective yet precise segmentation methods to decrease the dependence on large-scale pixel-wise annotated data. ...
August 2024
Pattern Recognition