Yilong Yin’s research while affiliated with Shandong University and other places


Publications (437)


MGRL4RE: A Multi-Graph Representation Learning Approach for Urban Region Embedding
  • Article

January 2025 · 1 Read

ACM Transactions on Intelligent Systems and Technology

Meng Chen · Zechen Li · Hongwei Jia · [...] · Yilong Yin

Using multi-modal data to learn region representations has gained popularity for its ability to reveal diverse socioeconomic features in cities. However, many studies focus solely on semantic features from points-of-interest (POIs), neglecting the issue of spatial imbalance. This paper introduces a Multi-Graph Representation Learning framework for Region Embedding (MGRL4RE), which leverages both inter-region and intra-region correlations through two main components: multi-graph construction based on various region correlations and multi-graph representation learning. The construction module creates a multi-graph reflecting various correlations among regions, utilizing geo-tagged POIs, region data, and human mobility data. Specifically, we assess a region's importance relative to its spatial context (neighborhood) and develop spatially invariant semantic features to address spatial imbalance. Further, the representation learning module generates comprehensive and effective region representations via multi-view embedding fusion. Our extensive experiments across various downstream tasks, including land use clustering, region popularity prediction, and crime prediction, confirm that our model significantly outperforms existing state-of-the-art region embedding methods.


Figure 1: The comparison of training curves between our method and ER on C-PASCAL-VOC.
Figure 2: The comparison of overall test performances (test on all tasks after learning each single task) between our method and ER on C-PASCAL-VOC.
Figure 3: Imbalance statistics of samples of each class in three tasks in C-MSCOCO.
Towards Macro-AUC oriented Imbalanced Multi-Label Continual Learning
  • Preprint
  • File available

December 2024 · 7 Reads

In Continual Learning (CL), while existing work primarily focuses on the multi-class classification task, there has been limited research on Multi-Label Learning (MLL). In practice, MLL datasets are often class-imbalanced, making it inherently challenging, a problem that is even more acute in CL. Due to its sensitivity to imbalance, Macro-AUC is an appropriate and widely used measure in MLL. However, there is no research to optimize Macro-AUC in MLCL specifically. To fill this gap, in this paper, we propose a new memory replay-based method to tackle the imbalance issue for Macro-AUC-oriented MLCL. Specifically, inspired by recent theory work, we propose a new Reweighted Label-Distribution-Aware Margin (RLDAM) loss. Furthermore, to be compatible with the RLDAM loss, a new memory-updating strategy named Weight Retain Updating (WRU) is proposed to maintain the numbers of positive and negative instances of the original dataset in memory. Theoretically, we provide superior generalization analyses of the RLDAM-based algorithm in terms of Macro-AUC, separately in batch MLL and MLCL settings. This is the first work to offer theoretical generalization analyses in MLCL to our knowledge. Finally, a series of experimental results illustrate the effectiveness of our method over several baselines. Our codes are available at https://github.com/ML-Group-SDU/Macro-AUC-CL.
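
As a rough illustration of the metric named in this abstract, Macro-AUC is commonly computed as the unweighted mean of the per-label AUC values. The sketch below assumes that standard definition; the function names `binary_auc` and `macro_auc` and the toy data are illustrative and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC for one label: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative instances")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def macro_auc(score_matrix: List[List[float]], label_matrix: List[List[int]]) -> float:
    """Unweighted mean of per-label AUCs (rows are instances, columns are labels)."""
    num_labels = len(label_matrix[0])
    per_label = []
    for k in range(num_labels):
        col_scores = [row[k] for row in score_matrix]
        col_labels = [row[k] for row in label_matrix]
        per_label.append(binary_auc(col_scores, col_labels))
    return sum(per_label) / num_labels


# Toy data (hypothetical): 4 instances, 2 labels.
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [1, 1], [0, 0], [0, 1]]
print(macro_auc(scores, labels))
```

Because every label contributes equally to the average regardless of how many positive instances it has, rare labels weigh as much as frequent ones, which is consistent with the abstract's remark that the measure is sensitive to imbalance.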


Dynamic Prompt Allocation and Tuning for Continual Test-Time Adaptation

December 2024

Continual test-time adaptation (CTTA) has recently emerged to adapt a pre-trained source model to continuously evolving target distributions, which accommodates the dynamic nature of real-world environments. To mitigate the risk of catastrophic forgetting in CTTA, existing methods typically incorporate explicit regularization terms to constrain the variation of model parameters. However, they cannot fundamentally resolve catastrophic forgetting because they rely on a single shared model to adapt across all target domains, which inevitably leads to severe inter-domain interference. In this paper, we introduce learnable domain-specific prompts that guide the model to adapt to corresponding target domains, thereby partially disentangling the parameter space of different domains. In the absence of domain identity for target samples, we propose a novel dynamic Prompt AllocatIon aNd Tuning (PAINT) method, which utilizes a query mechanism to dynamically determine whether the current samples come from a known domain or an unexplored one. For known domains, the corresponding domain-specific prompt is directly selected, while for previously unseen domains, a new prompt is allocated. Prompt tuning is subsequently performed using mutual information maximization along with structural regularization. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our PAINT method for CTTA. We have released our code at https://github.com/Cadezzyr/PAINT.





Fig. 8: Model Estimation and Adaptation Framework for the data-model fusion scenario
Unveiling the Superior Paradigm: A Comparative Study of Source-Free Domain Adaptation and Unsupervised Domain Adaptation

November 2024 · 9 Reads

In domain adaptation, there are two popular paradigms: Unsupervised Domain Adaptation (UDA), which aligns distributions using source data, and Source-Free Domain Adaptation (SFDA), which leverages pre-trained source models without accessing source data. Evaluating the superiority of UDA versus SFDA is an open and timely question with significant implications for deploying adaptive algorithms in practical applications. In this study, we demonstrate through predictive coding theory and extensive experiments on multiple benchmark datasets that SFDA generally outperforms UDA in real-world scenarios. Specifically, SFDA offers advantages in time efficiency, storage requirements, targeted learning objectives, reduced risk of negative transfer, and increased robustness against overfitting. Notably, SFDA is particularly effective in mitigating negative transfer when there are substantial distribution discrepancies between source and target domains. Additionally, we introduce a novel data-model fusion scenario, where data sharing among stakeholders varies (e.g., some provide raw data while others provide only models), and reveal that traditional UDA and SFDA methods do not fully exploit their potential in this context. To address this limitation and capitalize on the strengths of SFDA, we propose a novel weight estimation method that effectively integrates available source data into multi-SFDA (MSFDA) approaches, thereby enhancing model performance within this scenario. This work provides a thorough analysis of UDA versus SFDA and advances a practical approach to model adaptation across diverse real-world environments.


Characterizing Hierarchical Semantic-Aware Parts With Transformers for Generalized Zero-Shot Learning

November 2024 · 7 Reads · 2 Citations

IEEE Transactions on Circuits and Systems for Video Technology

This paper presents a novel Transformer architecture for zero-shot learning (ZSL), termed TransZSL, which can characterize hierarchical semantic-aware parts. It consists of an adaptive token refinement (ATR), a hierarchical token aggregation (HTA), and semantic-aware prototypes (SAP). Firstly, the ViT is used as the backbone that provides comprehensive local information without missing details. To address the different degrees of noise caused by large appearance variations, the ATR is proposed to highlight important tokens and suppress useless ones adaptively. However, due to the complex image structure, some important tokens may be incorrectly discarded. Therefore, a random perturbation is proposed to reactivate discarded tokens randomly, reducing the risk of missing discriminative information. Secondly, dataset descriptions contain both low- and high-level attributes. To this end, the HTA aggregates complementary hierarchical tokens from multiple ViT layers. Thirdly, semantically similar content may be distributed in different tokens. To overcome this issue, the SAP is proposed to group semantically identical tokens into one prototype, focusing on semantic-aware parts. Besides, diversity loss is used to encourage networks to learn diverse prototypes that discover diverse parts. Both qualitative and quantitative results on several challenging tasks demonstrate the usefulness and effectiveness of our proposed methods.




Citations (49)


... It employs a storage mechanism to calculate the average logits for each class, preparing this simple storage pattern for subsequent OOD detection tasks. Drawing inspiration from experience, DDCS [84] adapts to select suitable channels for data classification after correcting each channel in the neural network. These channels are evaluated based on inter-class similarity and variance to measure their discriminative power for ID data. ...

Reference:

Recent Advances in OOD Detection: Problems and Approaches
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
  • Citing Conference Paper
  • June 2024

... The modeling for a road path is similar, starting with a [cls] token at the beginning and placing [sep] tokens at the end of each road sub-path. For instance, a road path R( ) comprising three road sub-paths, 1 = ⟨1, 2, 3⟩, 2 = ⟨4, 5, 6⟩, and 3 = ⟨7, 8⟩, generates the node token sequence [cls, 1, 2, 3, sep, 4, 5, 6, sep, 7, 8, sep]. Then, the node tokenizer linearly projects each node from the road path R( ) to a node embedding v ∈ R, initialized using Node2vec [16]. ...
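
The tokenization described in this excerpt can be sketched in a few lines. This is only an illustration of the stated rule (one [cls] token at the start, one [sep] token after each sub-path); the function name `build_node_token_sequence` and the use of plain strings for the special tokens are assumptions, not the citing paper's implementation.

```python
from typing import List, Sequence, Union

Token = Union[int, str]


def build_node_token_sequence(sub_paths: Sequence[Sequence[int]]) -> List[Token]:
    """Flatten road sub-paths into one sequence: cls, then each sub-path followed by sep."""
    tokens: List[Token] = ["cls"]
    for sub_path in sub_paths:
        tokens.extend(sub_path)  # node identifiers of this sub-path, in order
        tokens.append("sep")     # separator closing the sub-path
    return tokens


# The three sub-paths from the excerpt's example.
sequence = build_node_token_sequence([[1, 2, 3], [4, 5, 6], [7, 8]])
print(sequence)
```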

Profiling Urban Streets: A Semi-Supervised Prediction Model Based on Street View Imagery and Spatial Topology
  • Citing Conference Paper
  • August 2024

... ProLLM has been validated on benchmark datasets, demonstrating improved accuracy and generalizability compared to existing methods for PPI prediction. The process of developing new drugs is a lengthy and costly endeavor, often requiring 10-15 years and over 2 [96]. However, models can sometimes generate molecules that are challenging to synthesize or possess undesirable properties, such as toxicity or instability. ...

Generalized Universal Domain Adaptation
  • Citing Article
  • August 2024

Knowledge-Based Systems

... In (Yoo et al 2022), deep learning analysis was performed on the segmentation of subretinal fluid lesions in fundus photos to evaluate central serous chorioretinopathy. In Wang et al (2024), a new semi-supervised network structure based on the SSL framework was proposed to capture the subtle visual differences between different CNV types, which helped to classify and treat CNV in practical clinical applications. These methods can effectively evaluate diseases by focusing on the features such as the lesion area, but they do not consider the longitudinal development of lesions. ...

Discriminative atoms embedding relation dual network for classification of choroidal neovascularization in OCT images
  • Citing Article
  • July 2024

Pattern Recognition

... Vision Transformers (ViTs) have shown remarkable success in numerous computer vision tasks [2,5,26,34,37,57,68,69], benefiting from their proficiency to capture long-range dependencies among image patches. To match or surpass the performance of CNNs of similar size trained on ImageNet [11], ViTs are usually trained on large-scale datasets. ...

Characterizing Hierarchical Semantic-Aware Parts With Transformers for Generalized Zero-Shot Learning
  • Citing Article
  • November 2024

IEEE Transactions on Circuits and Systems for Video Technology

... In healthcare, the diversity and complexity of data mean that LLMs are frequently confronted with cases that differ significantly from their training data, such as rare conditions or unique patient demographics. This is where OOD detection becomes crucial: identifying when the model is encountering new, unseen categories allows for better handling of such cases [344], [345], [346], [347]. Effective OOD detection methods can help LLMs determine when they are less confident, enabling healthcare professionals to step in and mitigate risks [348]. ...

Visual Out-of-Distribution Detection in Open-Set Noisy Environments

International Journal of Computer Vision

... Compared with VideoQA, TextVideoQA focuses on understanding local and even tiny scene texts in the video, where frame-level grounding struggles to capture subtle scene text details. In this work, we study spatio-temporal grounding [41] for the proposed Grounded TextVideoQA task. For video spatio-temporal grounding, MIST [13] designs an iterative attention module to select video segment and region features and fuse them for question answering. ...

Learning Feature Semantic Matching for Spatio-Temporal Video Grounding
  • Citing Article
  • January 2024

IEEE Transactions on Multimedia

... Enabling robots to acquire diverse manipulation tasks by learning from expert demonstrations, commonly referred to as visual imitation learning, has been a long-standing objective in robotic learning and embodied AI (Atkeson and Schaal 1997; Argall et al. 2009; Wang et al. 2024a; Zare et al. 2024; Li et al. 2024). Prior work attempts to tackle this problem either by explicitly defining it as a supervised regression task (Zhang et al. 2018; Florence, Manuelli, and Tedrake 2019; Toyer et al. 2020; Rahmatizadeh et al. 2018), or by implicitly modeling action distributions using energy-based models (Florence et al. 2022; Jarrett, Bica, and van der Schaar 2020). ...

DiffAIL: Diffusion Adversarial Imitation Learning
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... VRA (Xu et al. 2023) zeros out anomalously low activations and truncates anomalously high activations. BATS proposes rectifying activations towards their typical set, while LAPS (He et al. 2024) improves BATS by considering channel-aware typical sets. These methods only examine anomalies at the activation level, whereas managing overconfidence anomalies at a more granular parameter level is important for more effective OOD detection. ...
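
The rectification attributed to VRA in this excerpt (zero out anomalously low activations, truncate anomalously high ones) can be read as a simple piecewise function. The sketch below follows that reading only; the function name `rectify_activations` and the threshold values `low` and `high` are hypothetical and are not taken from the cited work.

```python
from typing import List, Sequence


def rectify_activations(activations: Sequence[float], low: float, high: float) -> List[float]:
    """Zero activations below `low`, clip activations above `high`, keep the rest unchanged."""
    rectified = []
    for a in activations:
        if a < low:
            rectified.append(0.0)   # anomalously low: zeroed out
        elif a > high:
            rectified.append(high)  # anomalously high: truncated to the upper threshold
        else:
            rectified.append(a)     # typical range: unchanged
    return rectified


# Hypothetical feature vector and thresholds, for illustration only.
features = [0.1, 0.5, 2.0, 5.0]
print(rectify_activations(features, low=0.3, high=3.0))
```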

Exploring Channel-Aware Typical Features for Out-of-Distribution Detection
  • Citing Article
  • March 2024

Proceedings of the AAAI Conference on Artificial Intelligence

... A Multi-axis Interactive Multidimensional Attention Network (MIMA-Net) was developed by the authors in [64] to address the challenges of re-identifying vehicles. The main intuition was to capture fine-grained discriminative information crucial for distinguishing between similar vehicles. ...

Multi-axis interactive multidimensional attention network for vehicle re-identification
  • Citing Article
  • March 2024

Image and Vision Computing