Xiaozhao Fang’s research while affiliated with Guangdong University of Technology and other places


Publications (94)


Fuzzy bifocal disambiguation for partial multi-label learning
  • Article

May 2025 · 3 Reads · Neural Networks

Xiaozhao Fang · Xi Hu · Yan Hu · [...] · Na Han

Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval
  • Preprint
  • File available

Figure 1: The framework of the lightweight contrastive distilled hashing (LCDH) for online cross-modal retrieval.
Figure 2: Precision-recall and Top-N precision curves of LCDH and baselines on MIRFlickr-25K with 128-bit hash codes.
Figure 3: Precision-recall and Top-N precision curves of LCDH and baselines on IAPR TC-12 with 128-bit hash codes.
Figure 4: Precision-recall and Top-N precision curves of LCDH and baselines on NUS-WIDE with 128-bit hash codes.

February 2025 · 10 Reads

Deep online cross-modal hashing has recently gained much attention from researchers, owing to its promising applications with low storage requirements, fast retrieval, and adaptability across modalities. However, several technical hurdles still hinder its application, e.g., 1) how to extract the coexistent semantic relevance of cross-modal data, 2) how to achieve competitive performance when handling real-time data streams, and 3) how to transfer knowledge learned offline to online training in a lightweight manner. To address these problems, this paper proposes lightweight contrastive distilled hashing (LCDH) for cross-modal retrieval, which bridges offline and online cross-modal hashing through similarity matrix approximation in a knowledge distillation framework. Specifically, in the teacher network, LCDH first extracts cross-modal features with contrastive language-image pre-training (CLIP); after feature fusion, these features are fed into an attention module for representation enhancement. The output of the attention module is then passed through an FC layer to obtain hash codes, aligning the sizes of the similarity matrices used for online and offline training. In the student network, LCDH extracts visual and textual features with lightweight models and feeds them into an FC layer to generate binary codes. Finally, by approximating the similarity matrices, online hashing in the lightweight student network is enhanced under the supervision of the coexistent semantic relevance distilled from the teacher network. Experimental results on three widely used datasets demonstrate that LCDH outperforms several state-of-the-art methods.
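As a rough illustration of the similarity-matrix approximation described above, the following PyTorch-style sketch trains the student so that the similarity matrix of its relaxed hash codes approximates the teacher's. The shapes, names, and the plain squared-error objective are assumptions for illustration only, not the authors' released code.

import torch
import torch.nn.functional as F

def similarity_approximation_loss(student_codes, teacher_codes):
    # student_codes, teacher_codes: (batch, n_bits) relaxed hash codes in [-1, 1],
    # e.g., tanh outputs of the respective FC heads (illustrative assumption).
    n_bits = student_codes.size(1)
    s_student = student_codes @ student_codes.t() / n_bits   # student similarity matrix
    with torch.no_grad():                                    # teacher is trained offline and kept frozen
        s_teacher = teacher_codes @ teacher_codes.t() / n_bits
    return F.mse_loss(s_student, s_teacher)

# usage sketch: loss = similarity_approximation_loss(student_fc(fused_feats), teacher_codes)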


Collaboratively Semantic Alignment and Metric Learning for Cross-modal Hashing

January 2025 · 2 Reads · IEEE Transactions on Knowledge and Data Engineering

Cross-modal retrieval is a promising technique for finding semantically similar instances in other modalities given a query instance from one modality. However, many challenges remain: reducing the heterogeneous modality gap by effectively embedding label information into discrete hash codes, solving the binary optimization problem when generating unified hash codes, and efficiently reducing the discrepancy of data distributions during common space learning. To overcome these challenges, we propose Collaboratively Semantic alignment and Metric learning for cross-modal Hashing (CSMH). Specifically, through a kernelization operation, CSMH first extracts non-linear data features for each modality, which are projected into a latent subspace to align the marginal and conditional distributions simultaneously. Then, a maximum mean discrepancy-based metric strategy is customized to mitigate the distribution discrepancies among features from different modalities. Finally, semantic information obtained from the label similarity matrix is incorporated to embed the latent semantic structure into the discriminant subspace. Experimental results on four widely used datasets show that CSMH outperforms several state-of-the-art hashing baselines for cross-modal retrieval in both efficiency and precision.
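The maximum mean discrepancy (MMD) mentioned above has a standard empirical estimator; the NumPy sketch below shows the common RBF-kernel form as an illustration only, since CSMH's customized metric strategy may weight or combine terms differently.

import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    # Biased empirical squared MMD between samples X (n, d) and Y (m, d) with an RBF kernel.
    # Kernel choice and bandwidth are assumptions, not CSMH's exact formulation.
    def rbf(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return rbf(X, X).mean() + rbf(Y, Y).mean() - 2 * rbf(X, Y).mean()

# e.g., discrepancy between projected image and text features in the latent subspace:
# gap = rbf_mmd2(image_feats_latent, text_feats_latent)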





Dual insurance for generalized zero-shot learning

International Journal of Machine Learning and Cybernetics

Traditional zero-shot learning aims to use a trained model to accurately classify samples from unseen classes, while in the more difficult setting of generalized zero-shot learning the trained model must classify samples from both seen and unseen classes correctly. Because only seen-class samples are available during training, generalized zero-shot learning poses great challenges for classification. Generative models are one effective way to address this problem; however, the samples they produce are often of poor quality, and the generated samples contain semantic redundancies that are not conducive to classification. To solve these problems, we propose a dual insurance model (DI-GAN) for generalized zero-shot learning, comprising a feature generation module and a semantic separation module, which guarantee the high quality of generated features and good classification performance, respectively. Specifically, the first insurance is based on a generative adversarial network whose generator is constrained by a clustering method so that the generated samples stay close to real samples. The second insurance is based on a variational autoencoder and includes semantic separation, an instance network, and a classification network. Semantic separation extracts the semantically related parts that benefit classification, while the instance network acting on these parts ensures classification performance. Extensive experiments on four benchmark datasets show the competitiveness of the proposed DI-GAN.
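A minimal sketch of what a clustering-based generator constraint could look like (an assumption for illustration; DI-GAN's exact loss is not given in the abstract): each generated feature is pulled toward the centroid of real seen-class features of its conditioning class, in addition to the usual adversarial term.

import torch

def cluster_alignment_term(fake_feats, real_centers, labels):
    # fake_feats:   (batch, d) generator outputs
    # real_centers: (n_classes, d) centroids of real seen-class features
    # labels:       (batch,) class indices used to condition the generator
    # Illustrative assumption only; not DI-GAN's published objective.
    target = real_centers[labels]                      # (batch, d) matching centroids
    return ((fake_feats - target) ** 2).sum(dim=1).mean()

# total generator loss (sketch): adversarial_term + lam * cluster_alignment_term(fake, centers, y)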


Anchor-based Domain Adaptive Hashing for unsupervised image retrieval

International Journal of Machine Learning and Cybernetics

Traditional image retrieval methods suffer a significant performance drop when a model trained on one dataset is run on another. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH exploits the commonalities among domains under the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs, and then reconstructs the similarity matrix from these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes, which preserves the structure of the similarity matrix in the hash codes. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. Experimental results on four datasets demonstrate the effectiveness of the proposed method.
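A small NumPy sketch of the anchor idea (construction, shapes, and names are assumptions, not ADAH's exact formulation): each domain's samples are linked to a shared set of anchors by an anchor graph Z, and the large similarity matrix is reconstructed from these graphs instead of being computed directly.

import numpy as np

def anchor_graph(X, anchors, sigma=1.0, k=5):
    # X: (n, d) samples of one domain; anchors: (m, d) domain-shared anchors.
    # Returns a row-normalized anchor graph Z (n, m) keeping only the k strongest
    # anchor affinities per sample (a standard construction, assumed here).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    Z = np.exp(-d2 / (2 * sigma**2))
    weak = np.argsort(-Z, axis=1)[:, k:]                        # indices of non-top-k affinities
    np.put_along_axis(Z, weak, 0.0, axis=1)
    return Z / Z.sum(axis=1, keepdims=True)

# Reconstructed cross-domain similarity from domain-specific anchor graphs
# that share the same anchors:  S_approx = Z_source @ Z_target.T   # (n_s, n_t)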


CKDH: CLIP-based Knowledge Distillation Hashing for Cross-modal Retrieval

July 2024 · 42 Reads · 15 Citations · IEEE Transactions on Circuits and Systems for Video Technology

Recently, deep hashing-based cross-modal retrieval has attracted much attention from researchers due to its fast retrieval and low storage overhead. However, existing deep hashing-based cross-modal retrieval methods typically 1) capture the semantic relevance and coexistent information of cross-modal data inadequately, which may result in sub-optimal retrieval performance, 2) require a more comprehensive similarity measurement for cross-modal features to ensure high retrieval accuracy, and 3) lack scalability for lightweight deployment. To handle these issues, we propose CLIP-based knowledge distillation hashing (CKDH) for cross-modal retrieval, following the research trend of combining traditional methods with modern neural architectures to design lightweight networks on top of large pre-trained models. Specifically, to capture semantic relevance and coexistent information effectively, CLIP is fine-tuned to extract visual features, while a graph attention network enhances the textual features extracted by a bag-of-words model in the teacher model. Then, to better supervise the training of the student model, a more comprehensive similarity measurement is introduced to represent the distilled knowledge by jointly preserving the log-likelihood and the intra- and inter-modality similarities. Finally, the student model extracts deep features with a lightweight network and generates hash codes under the supervision of the similarity matrix produced by the teacher model. Experimental results on three widely used datasets demonstrate that CKDH outperforms several state-of-the-art methods, consistently delivering the best results.
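The log-likelihood term mentioned above is, in many deep cross-modal hashing papers, the negative log-likelihood of a pairwise similarity matrix under a sigmoid model of code inner products. The sketch below shows that common DCMH-style form as an assumption; CKDH's full objective also includes intra- and inter-modality similarity preservation terms not shown here.

import torch
import torch.nn.functional as F

def pairwise_loglik_loss(codes_a, codes_b, S):
    # codes_a, codes_b: (n, n_bits) relaxed codes from two modalities (or branches);
    # S: (n, n) binary similarity matrix (1 = semantically similar, 0 = dissimilar).
    theta = 0.5 * codes_a @ codes_b.t()          # pairwise inner products
    # -sum_ij [ S_ij * theta_ij - log(1 + exp(theta_ij)) ], written in a numerically stable way:
    return (F.softplus(theta) - S * theta).mean()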


Dynamically Tunable Half‐Ring Fano Resonator Based on Black Phosphorus

June 2024 · 35 Reads · physica status solidi (RRL) - Rapid Research Letters

A tunable terahertz (THz) half-ring Fano resonator based on black phosphorus (BP) is proposed, exhibiting enhanced sensitivity, tunable frequency parameters, and a flexible sensing range. A half-ring is positioned above the main channel, while a groove is excavated beneath it to produce the Fano resonance. The discrete mode of the half-ring couples with the continuous mode of the groove, leading to significantly enhanced sensitivity, so the sensor can pick up subtle changes in the surrounding environment. Additionally, incorporating BP into the half-ring above the channel enables flexible adjustment of the resonator's resonant frequency through manipulation of the electron doping concentration of the BP material. At the third-order resonance around 5.81 THz, the frequency shift margin can reach 160 GHz. Adjusting the structural parameters of the Fano resonator, such as the radius of its outer ring, the distance of this ring to the main channel, and the groove's height, significantly affects its transmission spectrum. The Fano resonator demonstrates considerable potential for applications in integrated electronics: it not only provides an innovative design perspective but also lays a foundation for the study of THz systems.
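For context, the asymmetric transmission profile produced by this kind of discrete-continuum coupling is commonly described by the standard Fano lineshape (textbook form, not the paper's fitted model):

T(\omega) \propto \frac{(q + \epsilon)^2}{1 + \epsilon^2}, \qquad \epsilon = \frac{2(\omega - \omega_0)}{\Gamma},

where q is the asymmetry parameter, \omega_0 the resonant frequency of the discrete (half-ring) mode, and \Gamma its linewidth.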


Citations (65)


... A foundation model is a model trained on a wide range of datasets and can be adapted (e.g., fine-tuned) for other downstream tasks [36], [41], [42]. Keetha et al. [2] investigate which of the existing foundation models [27], [38]- [40] suits VPR best. ...

Reference:

SciceVPR: Stable Cross-Image Correlation Enhanced Model for Visual Place Recognition
CKDH: CLIP-based Knowledge Distillation Hashing for Cross-modal Retrieval
  • Citing Article
  • July 2024

IEEE Transactions on Circuits and Systems for Video Technology

... Supervised discrete online hashing (SDOH) embeds semantic labels into the common latent space to compute, in parallel, the common representation of newly arriving data from multiple modalities, and then generates hash codes from the continuous common latent representation. Random online hashing (ROH) proposes a linear bridging strategy that simplifies the similarity factorization problem into a linear optimization problem, and then proposes an MED embedding method to learn features and preserve significant semantic information in hash codes (Jiang et al. 2023a). ...

Random Online Hashing for Cross-Modal Retrieval
  • Citing Article
  • December 2023

IEEE Transactions on Neural Networks and Learning Systems

... Probability weighted compact feature learning (PWCF) [12] proposes a focal-triplet loss and a histogram feature of neighbors to learn a domain-invariant and discriminative hash function. TSS [13] proposes a novel two-step strategy that consists of a domain adaptation step followed by a hashing step, and makes use of non-discriminant features to achieve clear discrepancy among classes. DCS-LSG [14] dynamically selects high-confidence pseudo-labels of the target domain for training, and proposes a dual-projection strategy to learn projection matrices for the source domain and the target domain respectively. ...

Two-Step Strategy for Domain Adaptation Retrieval
  • Citing Article
  • January 2023

IEEE Transactions on Knowledge and Data Engineering

... Vu et al. utilize graph convolutional networks to learn label representations for handling the tasks of multi-label image recognition and multi-label text classification [13]. Guo et al. impose low-rank constraints to address both overall domain reconstruction and alignment of data with the same label, ensuring a transferable and discriminative feature representation [14]. Luo et al. employ a projective structured double reconstruction strategy to train class-oriented sub-dictionaries from specific classes in cross-domain samples, and incorporate an adaptive geometrical structure-preserving function to maintain local manifold structures [15]. ...

Low-rank constraint-based multiple projections learning for cross-domain classification
  • Citing Article
  • June 2023

Knowledge-Based Systems

... As a generalization of real-valued and complex-valued numbers, the algebraic structure of quaternions is well recognized for its ability to give networks increased representational capacity with a reduced parameter count when addressing complex interrelationships and patterns. By integrating quaternion-based weights and activation functions, the quaternion-valued neural network (QVNN) improves the traditional mathematical representation of neurons and makes it possible to process multi-channel data, including color imagery [5], video content [6], and 3D spatial information [7]. Currently, the dynamic properties of QVNNs have attracted wide attention, such as dissipativity [8,9], stability [10,11], optimization [12], and synchronization [13,14]. ...

QMGR-Net: quaternion multi-graph reasoning network for 3D hand pose estimation

International Journal of Machine Learning and Cybernetics

... Offline cross-modal hashing methods attempt to train hash codes and functions on the full database in a batch-based manner, which suffers from the limitation that they cannot handle real-time data chunks (Huang et al. 2023). For example, deep cross-modal hashing (DCMH) adopts an end-to-end strategy to jointly learn hash codes and functions for cross-modal data (Jiang and Li 2017). ...

Two-Stage Asymmetric Similarity Preserving Hashing for Cross-Modal Retrieval
  • Citing Article
  • January 2023

IEEE Transactions on Knowledge and Data Engineering

... As reviewed by Ni, Zhang, Kang, et al. [9], many researchers have proposed innovative approaches to address this issue. One notable contribution in this field is the introduction of the Multi-label learning with Missing and Completely Unobserved Labels (MCUL) approach by Huang, Xu, Qian, et al. [4]. ...

Cross-modal hashing with missing labels
  • Citing Article
  • May 2023

Neural Networks

... In the past decades, the volume of multimedia data has explosively grown in an exponential way (Jiang et al. 2024b;Cui et al. 2023b;Jiang et al. 2023b;Xu et al. 2023;Ling et al. 2023;Cui et al. 2023a;Chen et al. 2024), especially in some artificial intelligence (AI)-based applications. Under this circumstance, it is much more difficult to provide the real time retrieval services for the cross-modal data with such huge volume. ...

Low-rank constraint based dual projections learning for dimensionality reduction
  • Citing Article
  • March 2023

Signal Processing

... SVDD generally employs kernel functions to project normal latent variables onto a latent space. In this study, a learnable Encoder 2 is used instead of a kernel function to make the projective transformation more suitable for the training data [20,21]. Figures 2 and 3 illustrate the architectures of Encoders 1 and 2 and Decoders 1 and 2. The characters f, k, and s in parentheses represent the number of filters, the kernel size, and the stride size, respectively. ...

Anomaly Detection Algorithm Based on Broad Learning System and Support Vector Domain Description

... SAS-Rec [13], Bert4Rec [27]) in order to better model user behaviour sequences. Moreover, existing approaches [3,17,36,37,39,46] also integrated the vanilla transformer with a GNN to exploit the global user-item relationships. Such approaches enhance the recommendation performance by going beyond merely aggregating local neighbours to encompassing a global context. ...

Graph Transformer Collaborative Filtering Method for Multi-Behavior Recommendations