About
2,215
Publications
554,892
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
140,991
Citations
Publications
Publications (2,215)
As more and more automatic vehicles, power con-1 sumption prediction becomes a vital issue for task scheduling and 2 energy management. Most research focuses on automatic vehicles 3 in transportation, but few focus on automatic ground vehicles 4 (AGVs) in smart factories, which face complex environments and 5 generate large amounts of data. There i...
In this work, we develop a family of Aligned Entropic Graph Kernels (AEGK) for graph classification. We commence by performing the Continuous-time Quantum Walk (CTQW) on each graph structure, and compute the Averaged Mixing Matrix (AMM) to describe how the CTQW visits all vertices from a starting vertex. More specifically, we show how this AMM matr...
A fundamental challenge confronting supervised graph outlier detection algorithms is the prevalent problem of class imbalance, where the scarcity of outlier instances compared to normal instances often results in suboptimal performance. Recently, generative models, especially diffusion models, have demonstrated their efficacy in synthesizing high-f...
PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the d...
Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though quite a few rumor detection models have exploited the multi-modal data, they seldom consider the inconsistent semantics between images and texts, and rarely spot the inconsistency among the post contents and background knowledg...
Recent advances in path-based explainable recommendation systems have attracted increasing attention thanks to the rich information from knowledge graphs. Most existing explainable recommendations only utilize static knowledge graphs and ignore the dynamic user-item evolutions, leading to less convincing and inaccurate explanations. Although some w...
Graph embedding is essential for graph mining tasks. With the prevalence of graph data in real-world applications, many methods have been proposed in recent years to learn high-quality graph embedding for various types of graphs, among which the Generative Adversarial Networks (GAN) based methods attract increasing attention among researchers. Howe...
User identity linkage, which aims to link identities of a natural person across different social platforms, has attracted increasing research interest recently. Existing approaches usually first embed the identities as deterministic vectors in a shared latent space, and then learn a classifier based on the available annotations. However, the format...
Although some existing models are proposed to exploit reviews for improving performance for recommender systems, few of them can handle the following issues led by the insufficient review data: (i) The regular training process does not exactly fit the scenario of preference prediction with few historical behaviors. (ii) Extracting informative and s...
Heterogeneous graphs are commonly used to describe networked data with multiple types of nodes and edges. Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for analyzing heterogeneous graphs. However, designing neural architectures of HGNNs requires extensive domain knowledge and time-consuming manual work. Recently, neural architectur...
The Internet of Behavior is a research theme that aims to analyze human behavior data on the Internet from the perspective of behavioral psychology, obtain insights about human behavior, and better understand the intention behind the behavior. In this way, the Internet of Behavior can predict human behavioral trends in the future and even change hu...
To ease the process of building Knowledge Graphs (KGs) from scratch, a cost-effective method is required to enrich a KG using the triples extracted from a corpus. However, it is challenging to enrich a KG with newly extracted triples since they contain noisy information. This paper proposes to refine a KG by leveraging information extracted from a...
With the widespread application of efficient pattern mining algorithms, sequential patterns that allow gap constraints have become a valuable tool to discover knowledge from biological data such as DNA and protein sequences. Among all kinds of gap-constrained mining, non-overlapping sequence mining can mine interesting patterns and satisfy the anti...
Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in dru...
Machine learning has attracted widespread attention and evolved into an enabling technology for a wide range of highly successful applications, such as intelligent computer vision, speech recognition, medical diagnosis, and more. Yet a special need has arisen where, due to privacy, usability, and/or the right to be forgotten , information about som...
Machine learning has attracted widespread attention and evolved into an enabling technology for a wide range of highly successful applications, such as intelligent computer vision, speech recognition, medical diagnosis, and more. Yet a special need has arisen where, due to privacy, usability, and/or the right to be forgotten, information about some...
Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though quite a few rumor detection models have exploited the multi-modal data, they seldom consider the inconsistent semantics between images and texts, and rarely spot the inconsistency among the post contents and background knowledg...
Federated learning (FL) has been gaining attention for its ability to share knowledge while maintaining user data, protecting privacy, increasing learning efficiency, and reducing communication overhead. Decentralized FL (DFL) is a decentralized network architecture that eliminates the need for a central server in contrast to centralized FL (CFL)....
Relation extraction (RE) tasks show promising performance in extracting relations from two entities mentioned in sentences, given sufficient annotations available during training. Such annotations would be labor-intensive to obtain in practice. Existing work adopts data augmentation techniques to generate pseudo-annotated sentences beyond limited a...
Evidence plays a crucial role in automated fact-checking. When verifying real-world claims, existing fact-checking systems either assume the evidence sentences are given or use the search snippets returned by the search engine. Such methods ignore the challenges of collecting evidence and may not provide sufficient information to verify real-world...
Multimodal relation extraction (MRE) is the task of identifying the semantic relationships between two entities based on the context of the sentence image pair. Existing retrieval-augmented approaches mainly focused on modeling the retrieved textual knowledge, but this may not be able to accurately identify complex relations. To improve the predict...
Cross-lingual natural language inference is a fundamental problem in cross-lingual language understanding. Many recent works have used prompt learning to address the lack of annotated parallel corpora in XNLI. However, these methods adopt discrete prompting by simply translating the templates to the target language and need external expert knowledg...
Social network alignment, aiming at linking identical identities across different social platforms, is a fundamental task in social graph mining. Most existing approaches are supervised models and require a large number of manually labeled data, which are infeasible in practice considering the yawning gap between social platforms. Recently, isomorp...
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM probl...
Social network alignment, identifying social accounts of the same individual across different social networks, shows fundamental importance in a wide spectrum of applications, such as link prediction and information diffusion. Individuals more often than not join in multiple social networks, and it is in fact much too expensive or even impossible t...
Existing recommender systems face difficulties with zero-shot items, i.e. items that have no historical interactions with users during the training stage. Though recent works extract universal item representation via pre-trained language models (PLMs), they ignore the crucial item relationships. This paper presents a novel paradigm for the Zero-Sho...
Named Entity Recognition (NER) is a well and widely studied task in natural language processing. Recently, the nested NER has attracted more attention since its practicality and difficulty. Existing works for nested NER ignore the recognition order and boundary position relation of nested entities. To address these issues, we propose a novel seq2se...
Graph clustering is a longstanding research topic, and has achieved remarkable success with the deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising s...
Real-world fact verification task aims to verify the factuality of a claim by retrieving evidence from the source document. The quality of the retrieved evidence plays an important role in claim verification. Ideally, the retrieved evidence should be faithful (reflecting the model's decision-making process in claim verification) and plausible (conv...
Relation extraction (RE) aims to extract potential relations according to the context of two entities, thus, deriving rational contexts from sentences plays an important role. Previous works either focus on how to leverage the entity information (e.g., entity types, entity verbalization) to inference relations, but ignore context-focused content, o...
With the evolution of content on the web and the Internet, there is a need for cyberspace that can be used to work, live, and play in digital worlds regardless of geography. The Metaverse provides the possibility of future Internet and represents a future trend. In the future, the Metaverse will be a space where the real and the virtual are combine...
ion optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacities resulting in essential information loss, affecting their performances on challenging tasks. In this article, we propose a novel mathematical St...
Generative models have attracted significant interest due to their ability to handle uncertainty by learning the inherent data distributions. However, two prominent generative models, namely Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), exhibit challenges that impede achieving optimal performance in sequential recommen...
Graph embedding based on deep learning is an effective approach for link prediction. However, there still remain some unsolved problems. Firstly, existing methods iteratively aggregate node embeddings from the neighborhood, which cannot retain the global structural information at lower node aggregation cost. Secondly, existing methods can hardly re...
Deep learning methods have demonstrated success in diagnosis prediction on Electronic Health Records (EHRs). Early attempts utilize sequential models to encode patient historical records, but they lack the ability to identify critical diseases for patient health conditions. Besides, some works focus on the hierarchical structure of EHR data during...
Graph collaborative filtering (GCF) is a popular technique for capturing high-order collaborative signals in recommendation systems. However, GCF's bipartite adjacency matrix, which defines the neighbors being aggregated based on user-item interactions, can be noisy for users/items with abundant interactions and insufficient for users/items with sc...
DNA motif discovery is an important issue in gene research, which aims to identify transcription factor binding sites (i.e., motifs) in DNA sequences to reveal the mechanisms that regulate gene expression. However, the phenomenon of data silos and the problem of privacy leakage have seriously hindered the development of DNA motif discovery. On the...
With the rapid growth of the Internet, human daily life has become deeply bound to the Internet. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scho...
Recent years have witnessed the success of heterogeneous graph neural networks (HGNNs) in modeling heterogeneous information networks (HINs). In this paper, we focus on the benchmark task of HGNNs, i.e., node classification, and empirically find that typical HGNNs are not good at predicting the label of a test node whose receptive field (1) has few...
With the rapid growth of the Internet, human daily life has become deeply bound to the Internet. To take advantage of massive amounts of data and information on the internet, the Web architecture is continuously being reinvented and upgraded. From the static informative characteristics of Web 1.0 to the dynamic interactive features of Web 2.0, scho...
Since the first appearance of the World Wide Web, people more rely on the Web for their cyber social activities. The second phase of World Wide Web, named Web 2.0, has been extensively attracting worldwide people that participate in building and enjoying the virtual world. Nowadays, the next internet revolution: Web3 is going to open new opportunit...
Graph Neural Networks (GNNs) are de facto solutions to structural data learning. However, it is susceptible to low-quality and unreliable structure, which has been a norm rather than an exception in real-world graphs. Existing graph structure learning (GSL) frameworks still lack robustness and interpretability. This paper proposes a general GSL fra...
This paper presents the first comprehensive analysis of ChatGPT's Text-to-SQL ability. Given the recent emergence of large-scale conversational language model ChatGPT and its impressive capabilities in both conversational abilities and code generation, we sought to evaluate its Text-to-SQL performance. We conducted experiments on 12 benchmark datas...
Recently, ChatGPT, along with DALL-E-2 and Codex,has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI) techniques belong to the category of A...
Time series classification (TSC) is an important task in time series data mining and has attracted a lot of research attention. Most TSC algorithms aim to achieve high classification accuracy while reducing the computational complexity. Currently, Time Series Combination of Heterogeneous and Integrated Embedding Forest (TS-CHIEF) is considered to b...
By transferring knowledge from a source domain, the performance of deep clustering on an unlabeled target domain can be greatly improved. When achieving this, traditional approaches assume that an adequate amount of labeled data are available in the source domain. However, this assumption is not always satisfied in practice. First, it cannot be gua...
Ce Zhou Qian Li Chen Li- [...]
Lichao Sun
The Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALLE-E, and ChatGPT, is trained on large-scale data which provides a reasonable parameter initialization for a wide range of downstream applications. The idea o...
Poverty status identification is the first obstacle to eradicating poverty. Village-level poverty identification is very challenging due to the arduous field investigation and insufficient information. The development of the Web infrastructure and its modeling tools provides fresh approaches to identifying poor villages. Upon those techniques, we b...
Social media is one of the main sources for news consumption, especially among the younger generation. With the increasing popularity of news consumption on various social media platforms, there has been a surge of misinformation which includes false information or unfounded claims. As various text- and social context-based fake news detectors are...
COVID-19 (Coronavirus disease 2019) has been quickly spreading since its outbreak, impacting financial markets and healthcare systems globally. Countries all around the world have adopted a number of extraordinary steps to restrict the spreading virus, where early COVID-19 diagnosis is essential. Medical images such as X-ray images and Computed Tom...
Anomaly detection (AD) is a crucial task in machine learning with various applications, such as detecting emerging diseases, identifying financial frauds, and detecting fake news. However, obtaining complete, accurate, and precise labels for AD tasks can be expensive and challenging due to the cost and difficulties in data annotation. To address th...
Since group activities have become very common in daily life, there is an urgent demand for generating recommendations for a group of users, referred to as group recommendation task. Existing group recommendation methods usually infer groups' preferences via aggregating diverse members' interests. Actually, groups' ultimate choice involves compromi...
Graph Neural Networks (GNNs) are currently dominating in modeling graph-structure data, while their high reliance on graph structure for inference significantly impedes them from widespread applications. By contrast, Graph-regularized MLPs (GR-MLPs) implicitly inject the graph structure information into model weights, while their performance can ha...
Self-supervised sequential recommendation significantly improves recommendation performance by maximizing mutual information with well-designed data augmentations. However, the mutual information estimation is based on the calculation of Kullback Leibler divergence with several limitations, including asymmetrical estimation, the exponential need of...
Graph Neural networks (GNNs) which are powerful and widely applied models are based on the assumption that graph topologies play key roles in the graph representation learning.However, the existing GNN methods are based on the Euclidean space embedding, which is difficult to represent a variety of graph geometric properties well. Recently, Riemanni...
Biomedical Named Entity Recognition (BioNER) aims at identifying biomedical entities such as genes, proteins, diseases, and chemical compounds in the given textual data. However, due to the issues of ethics, privacy, and high specialization of biomedical data, BioNER suffers from the more severe problem of lacking in quality labeled data than the g...
Session-based Recommendation (SBR) is to predict users' next interested items based on their previous browsing sessions. Existing methods model sessions as graphs or sequences to estimate user interests based on their interacted items to make recommendations. In recent years, graph-based methods have achieved outstanding performance on SBR. However...
Mood disorders can be difficult to diagnose, evaluate, and treat. They involve affective and cognitive components, both of which need to be closely monitored over the course of the illness. Current methods like interviews and rating scales can be cumbersome, sometimes ineffective, and oftentimes infrequently administered. Even ecological momentary...
Natural Language Inference (NLI) is a growingly essential task in natural language understanding, which requires inferring the relationship between the sentence pairs (
premise
and
hypothesis
). Recently, low-resource natural language inference has gained increasing attention, due to significant savings in manual annotation costs and a better fi...
Next-basket recommendation methods focus on the inference of the next basket by considering the corresponding basket sequence. Although many methods have been developed for the task, they usually suffer from data sparsity. The number of interactions between entities is relatively small compared to their huge bases, so it is crucial to mine as much...
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explo...
High-utility sequential pattern mining (HUSPM) has emerged as an important topic due to its wide application and considerable popularity. However, due to the combinatorial explosion of the search space when the HUSPM problem encounters a low utility threshold or large-scale data, it may be time-consuming and memory-costly to address the HUSPM probl...
Data mining research has been significantly motivated by and benefited from real-world applications in novel domains. This special issue was proposed and edited to draw attention to domain-driven data mining and disseminate research in foundations, frameworks, and applications for data-driven and actionable knowledge discovery. Along with this spec...
The discovery of utility-driven patterns is a useful and difficult research topic. It can extract significant and interesting information from specific and varied databases, increasing the value of the services provided. In practice, the measure of utility is often used to demonstrate the importance, profit, or risk of an object or a pattern. In th...
Graph Neural Network (GNN), as a powerful representation learning model on graph data, attracts much attention across various disciplines. However, recent studies show that GNN is vulnerable to adversarial attacks. How to make GNN more robust? What are the key vulnerabilities in GNN? How to address the vulnerabilities and defend GNN against the adv...