May 2025
·
3 Reads
Neural Networks
February 2025
·
10 Reads
Deep online cross-modal hashing has recently gained much attention from researchers, owing to its promising applications with low storage requirements, fast retrieval efficiency, and cross-modality adaptivity. However, there still exist some technical hurdles that hinder its application, e.g., 1) how to extract the coexistent semantic relevance of cross-modal data, 2) how to achieve competitive performance when handling real-time data streams, and 3) how to transfer the knowledge learned from offline to online training in a lightweight manner. To address these problems, this paper proposes lightweight contrastive distilled hashing (LCDH) for cross-modal retrieval, which bridges offline and online cross-modal hashing through similarity matrix approximation in a knowledge distillation framework. Specifically, in the teacher network, LCDH first extracts the cross-modal features with contrastive language-image pre-training (CLIP); these features are fused and then fed into an attention module for representation enhancement. Then, the output of the attention module is fed into a fully connected (FC) layer to obtain hash codes, which aligns the sizes of the similarity matrices for online and offline training. In the student network, LCDH extracts the visual and textual features with lightweight models, and the features are then fed into an FC layer to generate binary codes. Finally, by approximating the similarity matrices, the performance of online hashing in the lightweight student network is enhanced under the supervision of the coexistent semantic relevance distilled from the teacher network. Experimental results on three widely used datasets demonstrate that LCDH outperforms some state-of-the-art methods.
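The similarity matrix approximation described in this abstract can be illustrated with a minimal NumPy sketch. It is not the LCDH implementation: the cosine similarity, the mean-squared-error loss, the code lengths, and all function names are assumptions chosen for illustration. The point it shows is that teacher and student codes may have different lengths, because the two similarity matrices are both n x n.

```python
import numpy as np

def cosine_similarity_matrix(codes):
    """Pairwise cosine similarity between the rows of `codes` (n x k)."""
    norms = np.linalg.norm(codes, axis=1, keepdims=True)
    unit = codes / np.maximum(norms, 1e-12)  # guard against zero rows
    return unit @ unit.T

def similarity_distillation_loss(teacher_codes, student_codes):
    """Mean squared gap between teacher and student similarity matrices.

    Assumed loss form; the paper's exact objective is not given in the abstract.
    """
    s_teacher = cosine_similarity_matrix(teacher_codes)
    s_student = cosine_similarity_matrix(student_codes)
    return float(np.mean((s_teacher - s_student) ** 2))

rng = np.random.default_rng(0)
n = 8
# Hypothetical teacher output: 32-bit binary codes in {-1, +1}.
teacher = np.sign(rng.standard_normal((n, 32)))
# Hypothetical student output: relaxed 16-dimensional codes in (-1, 1).
student = np.tanh(rng.standard_normal((n, 16)))

loss_same = similarity_distillation_loss(teacher, teacher)
loss_diff = similarity_distillation_loss(teacher, student)
```

In a training loop, a loss of this kind would be minimized with respect to the student parameters only, with the teacher codes held fixed.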
January 2025
·
2 Reads
IEEE Transactions on Knowledge and Data Engineering
Cross-modal retrieval is a promising technique for finding semantically similar instances in other modalities when a query instance is given from one modality. However, many challenges remain: reducing the heterogeneous modality gap by effectively embedding label information into discrete hash codes, solving the binary optimization problem when generating unified hash codes, and efficiently reducing the discrepancy of data distributions during common space learning. To overcome these challenges, we propose Collaboratively Semantic alignment and Metric learning for cross-modal Hashing (CSMH) in this paper. Specifically, through a kernelization operation, CSMH first extracts the non-linear data features for each modality, which are projected into a latent subspace to align both marginal and conditional distributions simultaneously. Then, a maximum mean discrepancy-based metric strategy is customized to mitigate the distribution discrepancies among features from different modalities. Finally, semantic information obtained from the label similarity matrix is further incorporated to embed the latent semantic structure into the discriminant subspace. Experimental results of CSMH and baseline methods on four widely used datasets show that CSMH outperforms some state-of-the-art hashing baselines for cross-modal retrieval in both efficiency and precision.
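As a rough illustration of the maximum mean discrepancy (MMD) mentioned in this abstract, the sketch below computes the standard biased estimate of squared MMD with an RBF kernel. The kernel choice, the `gamma` value, and the synthetic "image" and "text" features are assumptions; the abstract does not specify how CSMH customizes the metric.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negatives

def mmd_squared(x, y, gamma=0.5):
    """Biased estimate of squared MMD between sample sets x and y."""
    return float(
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )

rng = np.random.default_rng(0)
# Hypothetical features of two modalities already projected to a 2-D latent space.
image_feats = rng.standard_normal((50, 2))
text_feats_shifted = rng.standard_normal((50, 2)) + 3.0  # mismatched distribution

mmd_same = mmd_squared(image_feats, image_feats)
mmd_far = mmd_squared(image_feats, text_feats_shifted)
```

A small value indicates that the two feature distributions are close in the kernel's feature space, which is the quantity a distribution-alignment term would try to reduce.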
November 2024
·
3 Reads
Pattern Recognition
October 2024
·
6 Reads
·
1 Citation
October 2024
·
6 Reads
October 2024
·
10 Reads
International Journal of Machine Learning and Cybernetics
Traditional zero-shot learning aims to use the trained model to accurately classify samples from unseen classes, while in the more difficult task of generalized zero-shot learning, the trained model needs to classify samples from both seen and unseen classes correctly. Because only seen-class samples are available during training, generalized zero-shot learning faces great challenges in classification. Generative models are one effective way to address this problem. However, the samples they generate are often of poor quality. In addition, the generated samples contain semantic redundancies that are not conducive to classification. To solve these problems, we propose the dual insurance model (DI-GAN) for generalized zero-shot learning in this paper, which includes a feature generation module and a semantic separation module. These guarantee the high quality of generated features and good classification performance, respectively. Specifically, the first insurance is based on a generative adversarial network, whose generator is constrained by a clustering method to make the generated samples close to the real samples. The second insurance is based on a variational autoencoder and includes semantic separation, an instance network, and a classification network. Semantic separation is designed to extract the semantically related parts that are beneficial to classification, while the instance network, acting on the semantically related parts, is used to ensure classification performance. Extensive experiments on four benchmark datasets show the competitiveness of the proposed DI-GAN.
August 2024
·
7 Reads
International Journal of Machine Learning and Cybernetics
Traditional image retrieval methods suffer from a significant performance degradation when the model is trained on the target dataset and run on another dataset. To address this issue, Domain Adaptive Retrieval (DAR) has emerged as a promising solution, specifically designed to overcome domain shifts in retrieval tasks. However, existing unsupervised DAR methods still face two primary limitations: (1) they under-explore the intrinsic structure among domains, resulting in limited generalization capabilities; and (2) the models are often too complex to be applied to large-scale datasets. To tackle these limitations, we propose a novel unsupervised DAR method named Anchor-based Domain Adaptive Hashing (ADAH). ADAH aims to exploit the commonalities among domains with the assumption that a consensus latent space exists for the source and target domains. To achieve this, an anchor-based similarity reconstruction scheme is proposed, which learns a set of domain-shared anchors and domain-specific anchor graphs, and then reconstructs the similarity matrix with these anchor graphs, thereby effectively exploiting inter- and intra-domain similarity structures. Subsequently, by treating the anchor graphs as feature embeddings, we solve the Distance-Distance Difference Minimization (DDDM) problem between them and their corresponding hash codes. This preserves the similarity structure of the similarity matrix in the hash code. Finally, a two-stage strategy is employed to derive the hash function, ensuring its effectiveness and scalability. Experimental results on four datasets demonstrate the effectiveness of the proposed method.
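The anchor-based similarity reconstruction in this abstract can be sketched in the style of classical anchor graph methods. This is an assumption-laden illustration, not ADAH itself: here the anchors are simply a random subset of the data (ADAH learns domain-shared anchors), and the reconstruction S = Z diag(Zᵀ1)⁻¹ Zᵀ, the `s` nearest-anchor rule, and the Gaussian weights are borrowed from the general anchor graph literature.

```python
import numpy as np

def anchor_graph(x, anchors, s=3, sigma=1.0):
    """Row-stochastic n x m matrix linking each sample to its s nearest anchors."""
    d2 = ((x[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    z = np.zeros_like(d2)
    nearest = np.argsort(d2, axis=1)[:, :s]      # indices of the s closest anchors
    rows = np.arange(x.shape[0])[:, None]
    z[rows, nearest] = np.exp(-d2[rows, nearest] / (2.0 * sigma**2))
    return z / z.sum(axis=1, keepdims=True)      # each row sums to 1

def reconstruct_similarity(z):
    """Low-rank n x n similarity S = Z diag(Z^T 1)^{-1} Z^T."""
    col_mass = z.sum(axis=0)
    col_mass = np.where(col_mass > 0, col_mass, 1.0)  # unused anchors contribute nothing
    return (z / col_mass) @ z.T

rng = np.random.default_rng(0)
x = rng.standard_normal((40, 2))                     # hypothetical features
anchors = x[rng.choice(40, size=6, replace=False)]   # assumed: anchors sampled from data

z = anchor_graph(x, anchors)
similarity = reconstruct_similarity(z)
```

Because only the n x m matrix Z needs to be stored, with m much smaller than n, the full n x n similarity matrix never has to be materialized in practice, which is what makes anchor-based schemes attractive for large datasets.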
July 2024
·
42 Reads
·
15 Citations
IEEE Transactions on Circuits and Systems for Video Technology
Recently, deep hashing-based cross-modal retrieval has attracted much attention from researchers, owing to its advantages of fast retrieval efficiency and low storage overhead. However, existing deep hashing-based cross-modal retrieval methods typically 1) capture the semantic relevance and coexistent information of cross-modal data inadequately, which may result in sub-optimal retrieval performance, 2) require a more comprehensive similarity measurement for cross-modal features to ensure high retrieval accuracy, and 3) lack scalability for lightweight deployment. To handle these issues, we propose CLIP-based knowledge distillation hashing (CKDH) for cross-modal retrieval, following the research trend of combining traditional methods and modern neural architectures to design lightweight networks based on large language models. Specifically, to help capture the semantic relevance and coexistent information effectively, CLIP is fine-tuned to extract visual features, while a graph attention network is used to enhance textual features extracted by a bag-of-words model in the teacher model. Then, to better supervise the training of the student model, a more comprehensive similarity measurement is introduced to represent distilled knowledge by jointly preserving the log-likelihood and the intra- and inter-modality similarities. Finally, the student model extracts deep features with a lightweight network and generates the hash codes under the supervision of the similarity matrix produced by the teacher model. Experimental results on three widely used datasets demonstrate that CKDH can outperform some state-of-the-art methods, delivering the best results consistently.
June 2024
·
35 Reads
physica status solidi (RRL) - Rapid Research Letters
A terahertz (THz) half‐ring Fano resonator based on the tunable material black phosphorus (BP) is proposed, exhibiting enhanced sensitivity, tunable frequency parameters, and a flexible sensing range. A half‐ring is positioned above the main channel, while a groove is excavated beneath it to produce the Fano resonance. The discrete mode of the half‐ring is coupled with the continuous mode of the groove, leading to a significantly enhanced sensitivity. This sensor can pick up subtle changes in the surrounding environment. Additionally, the incorporation of BP into the half‐ring positioned above the channel enables flexible adjustment of the Fano resonator's resonant frequency. This adjustment is achieved by manipulating the electron doping concentration of the BP material. At the third‐order resonance around 5.81 THz, the frequency shift margin can reach 160 GHz. Adjusting the structural parameters of the Fano resonator, such as the radius of its outer ring, the distance from this ring to the main channel, and the groove's height, significantly affects its transmission spectrum. The Fano resonator demonstrates considerable potential for applications in the field of integrated electronics. It not only provides an innovative design perspective but also lays a foundation for the study of THz systems.
... A foundation model is a model trained on a wide range of datasets and can be adapted (e.g., fine-tuned) for other downstream tasks [36], [41], [42]. Keetha et al. [2] investigate which of the existing foundation models [27], [38]-[40] suits VPR best. ...
July 2024
IEEE Transactions on Circuits and Systems for Video Technology
... Supervised discrete online hashing (SDOH) embeds semantic labels into the common latent space to compute, in parallel, the common representation of newly arriving data from multiple modalities, and then generates hash codes from the continuous common latent representation. Random online hashing (ROH) proposes a linear bridging strategy to simplify the similarity factorization problem into a linear optimization problem, and then proposes a MED embedding method to learn features and preserve significant semantic information in hash codes (Jiang et al. 2023a). ...
December 2023
IEEE Transactions on Neural Networks and Learning Systems
... Probability weighted compact feature learning (PWCF) [12] proposes a focal-triplet loss and a histogram feature of neighbors to learn a domain-invariant and discriminative hash function. TSS [13] proposes a novel two-step strategy that consists of a domain adaptation step followed by a hashing step, and makes use of non-discriminant features to achieve clear discrepancy among classes. DCS-LSG [14] dynamically selects high-confidence pseudo-labels of the target domain for training, and proposes a dual-projection strategy to learn separate projection matrices for the source domain and the target domain. ...
January 2023
IEEE Transactions on Knowledge and Data Engineering
... Vu et al. utilize graph convolutional networks to learn label representations for the tasks of multi-label image recognition and multi-label text classification [13]. Guo et al. impose low-rank constraints to address both overall domain reconstruction and alignment of data with the same label, ensuring a transferable and discriminative feature representation [14]. Luo et al. employ a projective structured double reconstruction strategy to train class-oriented sub-dictionaries from specific classes in cross-domain samples, and incorporate an adaptive geometrical structure-preserving function to maintain local manifold structures [15]. ...
June 2023
Knowledge-Based Systems
... As a generalization of real-valued and complex-valued numbers, the algebraic structure of quaternions is well recognized for its ability to give networks increased representational capacity with fewer parameters when addressing complex interrelationships and patterns. By integrating quaternion-based weights and activation functions, the quaternion-valued neural network (QVNN) improves on the traditional mathematical representation of neurons and makes it possible to process multi-channel data, including color imagery [5], video content [6], and 3D spatial information [7]. Currently, the dynamic properties of QVNNs have attracted wide attention, such as dissipativity [8,9], stability [10,11], optimization [12], and synchronization [13,14]. ...
June 2023
International Journal of Machine Learning and Cybernetics
... Offline cross-modal hashing methods attempt to train hash codes and functions on the full database in a batch-based manner, and therefore cannot handle real-time data chunks (Huang et al. 2023). For example, deep cross-modal hashing (DCMH) adopts an end-to-end strategy to jointly learn hash codes and functions for cross-modal data (Jiang and Li 2017). ...
January 2023
IEEE Transactions on Knowledge and Data Engineering
... As reviewed by Ni, Zhang, Kang, et al. [9], many researchers have proposed innovative approaches to address this issue. One notable contribution in this field is the introduction of the Multi-label learning with Missing and Completely Unobserved Labels (MCUL) approach by Huang, Xu, Qian, et al. [4]. ...
May 2023
Neural Networks
... In the past decades, the volume of multimedia data has grown exponentially (Jiang et al. 2024b; Cui et al. 2023b; Jiang et al. 2023b; Xu et al. 2023; Ling et al. 2023; Cui et al. 2023a; Chen et al. 2024), especially in some artificial intelligence (AI)-based applications. Under this circumstance, it is much more difficult to provide real-time retrieval services for cross-modal data of such huge volume. ...
March 2023
Signal Processing
... SVDD generally employs kernel functions to project normal latent variables onto a latent space. In this study, a learnable Encoder 2 is used instead of a kernel function to make the projective transformation more suitable for the training data [20,21]. Figures 2 and 3 illustrate the architectures of Encoders 1 and 2 and Decoders 1 and 2. The characters f, k, and s in parentheses represent the number of filters, kernel size, and stride size, respectively. ...
September 2022
... SAS-Rec [13], Bert4Rec [27]) in order to better model user behaviour sequences. Moreover, existing approaches [3,17,36,37,39,46] also integrated the vanilla transformer with a GNN to exploit the global user-item relationships. Such approaches enhance the recommendation performance by going beyond merely aggregating local neighbours to encompassing a global context. ...
August 2022