Xin Huang’s research while affiliated with Hong Kong Baptist University and other places


Publications (33)


Efficient Maximal Motif-Clique Enumeration over Large Heterogeneous Information Networks
  • Article

August 2024 · 9 Reads

Proceedings of the VLDB Endowment

Yingli Zhou · Chenhao Ma · [...] · Xin Huang

In the heterogeneous information network (HIN), a motif-clique is a "complete graph" for a given motif (or a small connected graph) that could capture the desired relationship in the motif. The maximal motif-cliques of HINs have found various applications in community discovery, recommendation, and biological network analysis. The state-of-the-art algorithm for enumerating maximal motif-cliques may have to explore all possible subgraphs of a maximal motif-clique and check whether a maximal motif-clique has been enumerated at each recursive step, which is very time-consuming. To improve the efficiency of enumeration, in this paper, we develop efficient algorithms for maximal motif-clique enumeration over large HINs. We first introduce an order-based framework to avoid duplicated enumeration, which results in lower time complexity compared to the existing algorithm. We then propose a pivot-based pruning strategy, which significantly reduces the search space. We further optimize the process of identifying the candidate sets and locating the subgraphs containing the maximal motif-cliques. Extensive experiments on five real-world HINs demonstrate that our proposed algorithm achieves high efficiency and is up to three orders of magnitude faster than the state-of-the-art algorithm.


Path-LLM: A Shortest-Path-based LLM Learning for Unified Graph Representation
  • Preprint
  • File available

August 2024 · 161 Reads

Unified graph representation learning aims to produce node embeddings that can be applied to multiple downstream applications. However, existing studies based on graph neural networks and language models either require extensive training for specific downstream predictions or capture only shallow semantic features. In this work, we propose a novel Path-LLM model to learn unified graph representation, which leverages a powerful large language model (LLM) to incorporate our proposed path features. Our Path-LLM framework consists of several well-designed techniques. First, we develop a new mechanism of long-to-short shortest path (L2SP) selection, which covers essential connections between different dense groups. An in-depth comparison of different path selection plans is offered to illustrate the strength of our designed L2SP. Then, we design path textualization to obtain L2SP-based training texts. Next, we feed the texts into a self-supervised LLM training process to learn embeddings. Extensive experiments on benchmarks validate the superiority of Path-LLM against the state-of-the-art WalkLM method on two classical graph learning tasks (node classification and link prediction) and one NP-hard graph query processing task (keyword search), while saving more than 90% of training paths.


Joint Domain Adaptive Graph Convolutional Network

August 2024 · 5 Reads · 1 Citation

In the realm of cross-network tasks, graph domain adaptation is an effective tool due to its ability to transfer abundant labels from nodes in the source domain to those in the target domain. Existing adversarial domain adaptation methods mainly focus on domain-wise alignment. These approaches, while effective in mitigating the marginal distribution shift between the two domains, often ignore the integral aspect of structural alignment, potentially leading to negative transfer. To address this issue, we propose a joint adversarial domain adaptive graph convolutional network (JDA-GCN) that is uniquely augmented with structural graph alignment, so as to enhance the efficacy of knowledge transfer. Specifically, we construct a structural graph to delineate the interconnections among nodes within identical categories across the source and target domains. To further refine node representation, we integrate the local consistency matrix with the global consistency matrix, thereby leveraging the learning of the sub-structure similarity of nodes to enable more robust and effective representation of nodes. Empirical evaluation on diverse real-world datasets substantiates the superiority of our proposed method, marking a significant advancement over existing state-of-the-art graph domain adaptation algorithms.


Fast algorithms for core maximization on large graphs

March 2022 · 11 Reads · 8 Citations

Proceedings of the VLDB Endowment

Core maximization, which enlarges the k-core as much as possible by inserting a few new edges into a graph, is particularly useful for social group engagement and network stability improvement. However, the core maximization problem has been theoretically proven to be NP-hard and even APX-hard for k ≥ 3. Existing heuristic approaches are inefficient on large graphs. To address this limitation, in this paper, we revisit this challenging yet important problem of core maximization, that is, given a graph G, a number k, and a budget b, to insert b new edges into G such that the corresponding k-core is maximized. We propose a novel algorithm FastCM+ based on several fast search strategies. The core idea is to apply graph partition to divide the (k - 1)-shell into different components. Then, FastCM+ considers each (k - 1)-shell component independently to convert different layered vertices into the k-core, either completely or partially. Based on the complete/partial conversions, FastCM+ is generalized to further handle (k - λ)-shell conversions for 2 ≤λ k. Leveraging dynamic programming combinations of different components' potential answers, FastCM+ finds a good-quality answer for edge insertions. Experimental results on eleven datasets demonstrate that our algorithm runs much faster than state-of-the-art methods on large graphs while achieving better answers.
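
The problem statement in this abstract (insert b edges so that the k-core grows) can be illustrated with a minimal sketch. The code below is not FastCM+; it only computes a k-core by the standard peeling procedure and shows, on a hypothetical toy graph, how a single inserted edge (budget b = 1) enlarges the 3-core. The function name `k_core` and the example edges are assumptions made for illustration.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the vertex set of the k-core of an undirected simple graph.

    Standard peeling: repeatedly delete vertices whose remaining degree
    is below k. This is an illustrative sketch, not the paper's algorithm.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    alive = set(adj)
    # Start with every vertex that already violates the degree bound.
    stack = [v for v in alive if len(adj[v]) < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled via an earlier stack entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                adj[w].discard(v)
                if len(adj[w]) < k:
                    stack.append(w)
    return alive


# Hypothetical toy graph: a 4-clique on {1, 2, 3, 4} plus vertex 5,
# which is attached to only two clique vertices.
base_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (5, 1), (5, 2)]

before = k_core(base_edges, 3)            # vertex 5 has degree 2 and is peeled
after = k_core(base_edges + [(5, 3)], 3)  # one inserted edge pulls 5 into the 3-core
print(sorted(before), sorted(after))
```

Here vertex 5 sits in the 2-shell, so one well-chosen edge converts it into the 3-core; choosing such edges under a budget on large graphs is the hard part the abstract addresses.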



Butterfly-core community search over labeled graphs

July 2021 · 17 Reads · 42 Citations

Proceedings of the VLDB Endowment

Community search aims at finding densely connected subgraphs for query vertices in a graph. While this task has been studied widely in the literature, most of the existing works only focus on finding homogeneous communities rather than heterogeneous communities with different labels. In this paper, we motivate a new problem of cross-group community search, namely Butterfly-Core Community (BCC), over a labeled graph, where each vertex has a label indicating its properties and an edge between two vertices indicates their cross relationship. Specifically, for two query vertices with different labels, we aim to find a densely connected cross community that contains two query vertices and consists of butterfly networks, where each wing of the butterflies is induced by a k-core search based on one query vertex and two wings are connected by these butterflies. We first develop a heuristic algorithm achieving 2-approximation to the optimal solution. Furthermore, we design fast techniques of query distance computations, leader pair identifications, and index-based BCC local explorations. Extensive experiments on seven real datasets and four useful case studies validate the effectiveness and efficiency of our BCC and its multi-labeled extension models.


Figure 1: An example of a labeled graph G in IT professional networks with three labels denoted by different shapes and colors: SE, UI, and PM. Collaborations between two employees of the same role (across different roles) are denoted by solid edges (dashed edges).
Figure 3: An example of a labeled graph G and its bipartite subgraph B.

Butterfly-Core Community Search over Labeled Graphs

May 2021 · 130 Reads

Community search aims at finding densely connected subgraphs for query vertices in a graph. While this task has been studied widely in the literature, most of the existing works only focus on finding homogeneous communities rather than heterogeneous communities with different labels. In this paper, we motivate a new problem of cross-group community search, namely Butterfly-Core Community (BCC), over a labeled graph, where each vertex has a label indicating its properties and an edge between two vertices indicates their cross relationship. Specifically, for two query vertices with different labels, we aim to find a densely connected cross community that contains two query vertices and consists of butterfly networks, where each wing of the butterflies is induced by a k-core search based on one query vertex and two wings are connected by these butterflies. Indeed, the BCC structure admits the structure cohesiveness and minimum diameter, and thus can effectively capture the heterogeneous and concise collaborative team. Moreover, we theoretically prove this problem is NP-hard and analyze its non-approximability. To efficiently tackle the problem, we develop a heuristic algorithm, which first finds a BCC containing the query vertices, then iteratively removes the farthest vertices to the query vertices from the graph. The algorithm can achieve a 2-approximation to the optimal solution. To further improve the efficiency, we design a butterfly-core index and develop a suite of efficient algorithms for butterfly-core identification and maintenance as vertices are eliminated. Extensive experiments on eight real-world networks and four novel case studies validate the effectiveness and efficiency of our algorithms.



QD-GCN: Query-Driven Graph Convolutional Networks for Attributed Community Search

April 2021 · 41 Reads

Recently, attributed community search, a related but different problem to community detection and graph clustering, has been widely studied in the literature. Compared with the community detection that finds all existing static communities from a graph, the attributed community search (ACS) is more challenging since it aims to find dynamic communities with both cohesive structures and homogeneous node attributes given arbitrary queries. To solve the ACS problem, the most popular paradigm is to simplify the problem as two sub-problems, that is, structural matching and attribute filtering and deal with them separately. However, in real-world graphs, the community structure and the node attributes are actually correlated to each other. In this vein, current studies cannot capture these correlations which are vital for the ACS problem. In this paper, we propose Query-Driven Graph Convolutional Networks (QD-GCN), an end-to-end framework that unifies the community structure as well as node attribute to solve the ACS problem. In particular, QD-GCN leverages the Graph Convolutional Networks, which is a powerful tool to encode the graph topology and node attributes concurrently, as the backbones to extract the query-dependent community information from the original graph. By utilizing this query-dependent community information, QD-GCN is able to predict the target community given any queries. Experiments on real-world graphs with ground-truth communities demonstrate that QD-GCN outperforms existing attributed community search algorithms in terms of both efficiency and effectiveness.



Citations (18)


... SDA [29] studies the problem under the open-set setting, utilizing an entropy-based separation strategy to categorize target nodes into certain and uncertain groups and specifically aligning nodes from the certain group with adversarial learning technique. Most recently, JDA-GCN [131] introduces a joint adversarial domain adaptive graph convolutional network, leveraging both local and global graph structures through structural graph alignment, improving the model's ability to capture intricate dependencies within graph data. ...

Reference:

A Survey of Deep Graph Learning under Distribution Shifts: from Graph Out-of-Distribution Generalization to Adaptation
Joint Domain Adaptive Graph Convolutional Network
  • Citing Conference Paper
  • August 2024

... Traditional methods aim to incorporate both structural and attribute information but are mostly executed in the original data space, which may overlook the quality of the representation, leading to suboptimal performance, especially with high-dimensional datasets [6], [7], [8], [9], [10]. In contrast, dimensionality reduction methods preprocess in the attributed network using techniques like spectral clustering [11], [12], [13], [14], Nonnegative Matrix Factorization (NMF) [15], [16], [17], [18], [19], [20], and deep clustering [21], [22], [23], [24], [25], [26] effectively reducing the dimensionality of the data for improved clustering results. ...

Community Detection in Attributed Graphs: An Embedding Approach
  • Citing Article
  • April 2018

Proceedings of the AAAI Conference on Artificial Intelligence

... In the context of core resilience, such nodes/edges are the weak structures with low resilience against removal and are suitable for targeted attacks. Regarding the insertion, the objective is to find new edges that can increase the user engagement [38,45] or incentivize existing nodes to stay engaged so that other nodes are kept engaged as well [9,23,41]. In the scope of core resilience, such nodes/edges are the critical graph structures that are most vulnerable to increases in core numbers or core sizes. ...

Fast algorithms for core maximization on large graphs
  • Citing Article
  • March 2022

Proceedings of the VLDB Endowment

... A straightforward operation to enhance the connectivity and robustness of graph structures is adding edges (Beygelzimer et al., 2005; Sun et al., 2021a). Besides, anchoring nodes (i.e., forcefully including some nodes in a cohesive subgraph) (Bhawalkar et al., 2015; Zhang et al., 2017a, 2018a; Laishram et al., 2020; Linghu et al., 2020) has also been widely studied. ...

Budget-constrained Truss Maximization over Large Graphs: A Component-based Approach
  • Citing Conference Paper
  • October 2021

... Additionally, common neighbor counts can help prune unpromising vertices in ( , )-biclique counting [55,61]. Other tasks that benefit from counting common neighbors in bipartite graphs include anomaly detection [43], bipartite graph projection [40,64], bipartite clustering coefficient computation [2,17], community search [1,9,46], and wedge-based motif counting [47,53]. Although computing common neighbors is straightforward in the conventional setting, it inevitably involves releasing the neighborhood information of the vertices, posing a significant privacy risk for users in real-world applications. ...

Butterfly-core community search over labeled graphs
  • Citing Article
  • July 2021

Proceedings of the VLDB Endowment

... At present, research on uncertain graphs mainly focuses on mining dense subgraphs, such as (k, η)-core [10][11][12] and (k, γ)-truss [13][14][15][16][17], etc. A (k, η)-core is a subgraph where the probability that each vertex has the degree at least k is no less than η, and a (k, γ)-truss is a subgraph where the probability that each edge is contained in at least k-2 triangles is no less than γ. ...

Efficient Probabilistic Truss Indexing on Uncertain Graphs
  • Citing Conference Paper
  • April 2021

... Attributed graphs integrate attributes into the graph structure, resulting in a richer network representation [25]. Attributed community detection [36,39] aims to find densely connected communities with homogeneous attributes by leveraging both topological and attribute information. Method like CDE [13] formulates the problem as a NMF optimization task, while ACDM [4] constructs an attributed k-NN layer to extract common node representations. ...

When Structure Meets Keywords: Cohesive Attributed Community Search
  • Citing Conference Paper
  • October 2020

... For both two datasets, DBLP and Hep-Small, the accuracies show a trend of first rising and then slowly decreasing. This is also consistent with the fact that most documents may contain one or two topics [36]. Analysis of the number of top words As for the number of edges between topic nodes and word nodes, we can see that with the growth of the numbers of edges, the performance also rises first, but the performance will drop quickly if the number of edges is larger than 10 for both two datasets, DBLP and Hep-Small, in Fig. 6d. ...

Detecting Communities with Multiplex Semantics by Distinguishing Background, General and Specialized Topics
  • Citing Article
  • September 2019

IEEE Transactions on Knowledge and Data Engineering

... Such methods typically employ a global directional cost function, which guides the process through the whole search space toward the best possible solution. However, this approach has the disadvantage of analyzing the entire search space, significantly increasing the computational resources needed to obtain a solution [6,7]. This approach becomes unsuitable; even if it computes the best path, the resources required to represent the search space require an enormous amount of computer memory. ...

A survey of community search over big graphs

The VLDB Journal