About
99
Publications
5,984
Reads
1,036
Citations
Introduction
Yongshun Gong is a professor in the School of Software, Shandong University. His principal research interests cover data science and machine learning, particularly in the areas of smart cities, intelligent transportation systems, mobility prediction, recommender systems, and sequential pattern mining. He has published about 50 papers in top journals and refereed conference proceedings, e.g., TPAMI, PR, TCYB, TKDE, TNNLS, NeurIPS, IJCAI, CVPR, AAAI, KDD, and CIKM.
Current institution
Additional affiliations
January 2021 - present
Education
March 2017 - January 2021
Publications (99)
Negative sequential patterns (NSPs), which capture both frequently occurring and nonoccurring behaviors, have become increasingly important and sometimes play a role that cannot be filled by analyzing occurring behaviors alone. Repetition sequential patterns capture repetitions of patterns in different sequences as well as within a sequence and are very important...
Crowd Flow Prediction (CFP) is one major challenge in the intelligent transportation systems of the Sydney Trains Network. However, most advanced CFP methods only focus on entrance and exit flows at the major stations or a few subway lines, neglecting Crowd Flow Distribution (CFD) forecasting problem across the entire city network. CFD prediction p...
Recently, practical applications for passenger flow prediction have brought many benefits to urban transportation development. With the development of urbanization, a real-world demand from transportation managers is to construct a new metro station in a city area that was never planned before. Authorities are interested in the picture of the future...
As a key mission of modern traffic management, crowd flow prediction (CFP) benefits many tasks of intelligent transportation services. However, most existing techniques focus solely on forecasting entrance and exit flows of metro stations, which does not provide enough useful knowledge for traffic management. In practical applications, managers...
As urbanization advances, massive amounts of urban statistical data with multiple views (e.g., views of Population and Economy) are increasingly collected and benefit diverse domains, including transportation service, regional analysis, etc. Unfortunately, these statistical data that are divided into fine-grained regions usually s...
Owing to their superior ability to model global dependencies, the transformer and its variants have become the primary choice in Masked Time-series Modeling (MTM) for time-series classification. In this paper, we experimentally show that existing transformer-based MTM methods encounter two under-explored issues when dealing with time series data:...
Discriminative features are crucial for point cloud registration. Recent methods improve feature discriminability by distinguishing between non-overlapping and overlapping region points. However, they still face challenges in distinguishing the ambiguous structures in the overlapping regions. Therefore, the ambiguous features they extracted res...
Black-box textual adversarial attacks are challenging due to the lack of model information and the discrete, non-differentiable nature of text. Existing methods often lack versatility for attacking different models, suffer from limited attacking performance due to the inefficient optimization with word saliency ranking, and frequently sacrifice sem...
Accurate urban flow prediction plays a crucial role in transportation management, as it enables optimized resource allocation and improved traffic efficiency. Although current methods have made advances, they still encounter challenges such as high computational overhead and the risk of overfitting complex models. To tackle these issues, we introdu...
Traffic prediction is an essential task in intelligent transportation systems dealing with complex and dynamic spatio-temporal correlations. To date, most work is focused on point estimation models, which only output a single value w.r.t an attribute of traffic data at a time, falling short of depicting diverse situations and uncertainty in future....
Large Language Models (LLMs) have advanced rapidly in recent years, with their applications in software engineering expanding to more complex repository-level tasks. GitHub issue resolving is a key challenge among these tasks. While recent approaches have made progress on this task, they focus on textual data within issues, neglecting visual data....
Existing conditional Denoising Diffusion Probabilistic Models (DDPMs) with a Noise-Conditional Framework (NCF) remain challenging for 3D scene understanding tasks, as the complex geometric details in scenes increase the difficulty of fitting the gradients of the data distribution (the scores) from semantic labels. This also results in longer traini...
Deep learning underpins most of the currently advanced natural language processing (NLP) tasks such as textual classification, neural machine translation (NMT), abstractive summarization and question-answering (QA). However, the robustness of the models, particularly QA models, against adversarial attacks is a critical concern that remains insuffic...
Weather forecasting is of great importance for human life and various real-world fields, e.g., traffic prediction, agricultural production, and tourist industry. Existing methods can be roughly divided into two categories: theory-driven (e.g., numerical weather prediction (NWP)) and data-driven methods. Theory-driven methods require a complex simul...
Points of interest (POIs) carry a wealth of semantic information of varying locations in cities and thus have been widely used to enable various location-based services. To understand POI semantics, existing methods usually model contextual correlations of POI categories in users' check-in sequences and embed categories into a latent space based on...
Semantic annotation for points of interest (POIs) is the process of annotating a POI with a category label, which facilitates many services related to POIs, such as POI search and recommendation. Most of the existing solutions extract features related to POIs from abundant user-generated content data (e.g., check-ins and user comments). However, su...
Evaluating the performance of Multi-modal Large Language Models (MLLMs), integrating both point cloud and language, presents significant challenges. The lack of a comprehensive assessment hampers determining whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations heavily rely on classif...
The task of code generation aims to generate code solutions based on given programming problems. Recently, code large language models (code LLMs) have shed new light on this task, owing to their formidable code generation capabilities. While these models are powerful, they seldom focus on further improving the accuracy of library-oriented API invoc...
Fine-grained urban flow inference (FUFI) is a crucial transportation service aimed at improving traffic efficiency and safety. FUFI can infer fine-grained urban traffic flows based solely on observed coarse-grained data. However, most existing methods focus on the influence of single-scale static geographic information on FUFI, neglecting the in...
Pre-training a model and then fine-tuning it on down-stream tasks has demonstrated significant success in the 2D image and NLP domains. However, due to the unordered and non-uniform density characteristics of point clouds, it is non-trivial to explore the prior knowledge of point clouds and pre-train a point cloud backbone. In this paper, we propos...
Semi-supervised learning can significantly boost model performance by leveraging unlabeled data, particularly when labeled data is scarce. However, real-world unlabeled data often contain unseen-class samples, which can hinder the classification of seen classes. To address this issue, mainstream safe SSL methods suggest detecting and discarding uns...
Recently, learning urban region representations from multi-modal data (information views) has become increasingly popular for gaining a deep understanding of the distributions of various socioeconomic features in cities. However, previous methods usually blend multi-view information in a posterior stage, falling short in learning coherent and consiste...
Detecting out-of-distribution (OOD) data is essential to ensure the reliability of machine learning models when deployed in real-world scenarios. Different from most previous test-time OOD detection methods that focus on designing OOD scores, we delve into the challenges in OOD detection from the perspective of typicality and regard the feature’s h...
Single-cell RNA sequencing (scRNA-seq) is widely used to study cellular heterogeneity in different samples. However, due to technical deficiencies, dropout events often result in zero gene expression values in the gene expression matrix. In this paper, we propose a new imputation method called scCAN, based on adaptive neighborhood clustering, to es...
Nowadays, capturing cherished moments results in an abundance of photos, which necessitates the selection of the finest one from a pool of akin images—a process both intricate and time-intensive. Thus, series photo selection (SPS) techniques have been developed to recommend the optimal moment from nearly identical photos through the use of aestheti...
Obtaining image-level class labels for Remote Sensing (RS) images is a relatively straightforward process, sparking significant interest in Weakly Supervised Semantic Segmentation (WSSS). However, RS images present challenges beyond those encountered in generic WSSS, including complex backgrounds, densely distributed small objects, and considerable...
In this paper, we propose a precise algorithm to eliminate reflections from two images by utilizing temporal and spatial priors. For the temporal prior, we compute the motion information between reflection layers in the two input reflection-contaminated images. Different from numerous popular multi-image reflection removal methods, our proposed algor...
Deep learning models are known to be extremely brittle to adversarial text examples. Existing text adversarial attack strategies can be roughly divided into character-level, word-level, and sentence-level attacks. Despite the success brought by recent text attack methods, how to induce misclassification with minimal text modifications while keeping the l...
The Sequence Alignment/Map (SAM) format file is the text file used to record alignment information. Alignment is the core of sequencing analysis, and downstream tasks accept mapping results for further processing. Given the rapid development of the sequencing industry today, a comprehensive understanding of the SAM format and related tools is neces...
Large language models (LLMs), such as Codex and GPT-4, have recently showcased their remarkable code generation abilities, facilitating a significant boost in coding efficiency. This paper will delve into utilizing LLMs for code generation in private libraries, as they are widely employed in everyday programming. Despite their remarkable capabiliti...
Data missing is very common in the spatial–temporal traffic data collected by various detectors, and how to accurately impute the missing values is particularly important in intelligent transportation systems. Because the method based on tensor decomposition has advantages in solving the problem of multidimensional data imputation, in this paper, w...
High utility sequential pattern (HUSP) mining aims to mine actionable patterns with high utilities, widely applied in real-world learning scenarios such as market basket analysis, scenic route planning and click-stream analysis. The existing HUSP mining algorithms mainly attempt to improve computation efficiency while maintaining the algorithm stab...
Many real-world problems deal with collections of data with missing values, e.g., RNA sequential analytics, image completion, video processing, etc. Usually, such missing data is a serious impediment to a good learning achievement. Existing methods tend to use a universal model for all incomplete data, resulting in a suboptimal model for each missi...
The prevalence of location-based services has generated a deluge of check-ins, enabling the task of human mobility understanding. Among the various types of information associated with the check-in venues, categories (e.g., Bar and Museum) are vital to the task, as they often serve as excellent semantic characterization of the venues. Despite its...
Numerous photos are taken in daily life, and sorting them is laborious and time consuming. The large number of similar images exacerbates the difficulty of album management, and serial photo selection (SPS) has emerged to address this scenario. As an important branch of image aesthetic quality assessment, it focuses on identifying the best image among a series...
Graph-based change point detection (CPD) plays an irreplaceable role in discovering anomalous graphs in time-varying networks. While several techniques have been proposed to detect change points by identifying whether there is a significant difference between the target network and successive previous ones, they neglect the natural evolution of t...
Traffic flow prediction is closely related to the intelligent transportation construction and public risk assessment, which is a typical spatio-temporal prediction problem. Traffic flow has complex interrelationship in the time and space perspectives, and is also affected by a large number of external factors. However, most existing methods cannot...
Open access link: https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Exploring_Domain-Invariant_Parameters_for_Source_Free_Domain_Adaptation_CVPR_2022_paper.pdf
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs). Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space. This phenomenon ignites our interest i...
Series photo selection (SPS) is an important branch of image aesthetics quality assessment, which focuses on finding the best one from a series of nearly identical photos. While great progress has been made, most of the existing SPS approaches concentrate solely on extracting features from the original image, neglecting that multiple view...
As a critical task of urban traffic services, fine-grained urban flow inference (FUFI) benefits many fields, including intelligent transportation management, urban planning, and public safety. FUFI is a technique that focuses on inferring fine-grained urban flows depending solely on observed coarse-grained data. However, existing methods always r...
The past decade has witnessed the increasingly created point-of-interest (POI) data, which are utilized to express the semantics of places. To understand the POI semantics, current methods usually embed POI categories into a latent space via certain trajectory sequential models, while neglecting the underlining spatial information. It is noteworthy...
Deep learning models are known to be extremely brittle to adversarial image examples, yet their vulnerability in text classification is insufficiently explored. Existing text adversarial attack strategies can be roughly divided into three categories, i.e., character-level attack, word-level attack, and sentence-level attack. Despite the success brought b...
Weakly supervised video moment retrieval or weakly supervised language moment retrieval aims to search the most relevant moment given a language query. In order to guide the model to capture the most matching video segments with the text description, we design a two-granularity loss function that simultaneously considers both video-level and instan...
Deep generative models have attracted a surge of interest in the deep learning community since they were proposed. These models are designed for generating new synthetic data including images, videos and texts by approximately fitting the data distribution. In the last few years, deep generative models have shown superior performance in drug discovery especially...
The sequence analysis handles sequential discrete events and behaviors, which can be represented by temporal point processes (TPPs). However, TPP models only occurring events and behaviors. This article explores an efficient method for the negative sequential pattern (NSP) mining to leverage TPP in modeling both frequently occurring and nonoccurrin...
With the explosive growth of the shared information on social media platforms, people are increasingly interested in sharing and making their travel plans by referring to others' travel experiences. However, different social media sources render the heterogeneity of these valuable data, bringing difficulties for data collection and fusion. Thus, fa...
Nonoccurring behavior (NOB) studies have attracted the growing attention of scholars as a crucial part of behavioral science. As an effective method to discover both NOB and occurring behaviors (OB), negative sequential pattern (NSP) mining is successfully used in analyzing medical treatment and abnormal behavior patterns. At this time, NSP mining...
Non-occurring behavior (NOB) studies have attracted the growing attention of scholars as a crucial part of behavioral science. As an effective method to discover both NOB and Occurring Behaviors (OB), Negative Sequential Pattern (NSP) mining is successfully used in analyzing medical treatment and abnormal behavior patterns. At this time, NSP mining...
We propose a new method for learning with multi-field categorical data. Multi-field categorical data are usually collected over many heterogeneous groups. These groups can reflect in the categories under a field. The existing methods try to learn a universal model that fits all data, which is challenging and inevitably results in learning a complex...
https://opus.lib.uts.edu.au/bitstream/10453/147358/2/02whole.pdf
Large volumes of urban statistical data with multiple views imply rich knowledge about the development degree of cities. These data present crucial statistics which play an irreplaceable role in the regional analysis and urban computing. In reality, however, the statistical data divided into fine-grained regions usually suffer from missing data pro...
Railway points are among the key components of railway infrastructure. As a part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. Meanwhile, they are also one of the most fragile parts in railway systems. Points failures cause...
Railway points are among the key components of railway infrastructure. As a part of signal equipment, points control the routes of trains at railway junctions, having a significant impact on the reliability, capacity, and punctuality of rail transport. Traditionally, maintenance of points is based on a fixed time interval or raised after the equipm...
Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, appear in our daily lives. They open up new means of interacti...
Mining negative sequential patterns (NSP) has been an important research area in data mining and knowledge discovery and it is much more challenging than mining positive sequential patterns (PSP) due to the computational complexity and search space. Only a few methods have been proposed to mine NSP and most of them only use single minimum support,...
Negative sequential patterns (NSPs), which focus on nonoccurring but interesting behaviors (e.g., missing consumption records), provide a special perspective for analyzing sequential patterns. So far, very few methods have been proposed to solve the NSP mining problem, and these methods only mine NSPs from positive sequential patterns (PSPs). However,...
Negative sequential patterns (NSPs), which contain both non-occurring and occurring items, can uncover much more interesting information than positive sequential patterns (PSPs) in many applications. NSP mining, however, has only recently attracted attention, and very few methods are available to mine NSPs. Furthermore, there is no unified definition of nega...
Taking the repetitive property into consideration can help analysts capture more useful information. However, most existing repetitive sequence mining algorithms are designed for DNA or genome data, and very few studies mine such patterns from sequence databases. So in this paper, we (1) propose a method to clearly determine the t...
Negative sequential patterns (NSPs), which contain both non-occurring and occurring items, can play more important roles than positive sequential patterns (PSPs) in many applications. NSP mining, however, has only recently attracted attention, and very few methods are available to mine NSPs. Furthermore, there is no unified definition of negative con...