About
227
Publications
32,008
Reads
9,153
Citations
Publications (227)
Building-integrated photovoltaics (BIPV) combined with battery energy storage (BES) and building energy flexibility (BEF) systems are increasingly prevalent. During the operation and maintenance (O&M) of BIPV, BES, and BEF, a wide range of knowledge is contained and generated. This highlights information interaction among systems and the demand...
Nan Huo, Reynold Cheng, Ben Kao, [...], Ge Qu
Entity alignment (EA), a crucial task in knowledge graph (KG) research, aims to identify equivalent entities across different KGs to support downstream tasks like KG integration, text-to-SQL, and question-answering systems. Given rich semantic information within KGs, pre-trained language models (PLMs) have shown promise in EA tasks due to their exc...
Given a graph G, a motif (e.g., 3-node clique) is a fundamental building block for G. Recently, motif-based graph analysis has attracted much attention due to its efficacy in tasks such as clustering, ranking, and link prediction. These tasks require Network Motif Discovery (NMD) at the early stage to identify the motifs of G. However, existing NM...
Additive kernel SVM has been extensively used in many applications, including human activity detection and pedestrian detection. Since training an additive kernel SVM model is very time-consuming and does not scale to large datasets, many efficient solutions have been developed in the past few years. However, most of the existing methods...
Subgraphs are obtained by extracting a subset of vertices and a subset of edges from the associated original graphs, and many graph properties are known to be inherited by subgraphs. Subgraphs can be applied in many areas such as social networks, recommender systems, biochemistry and fraud discovery. Researchers from various communities have paid a...
With the increasing complexity of the building process, it is difficult for project stakeholders to retrieve large and multi-disciplinary building information models (BIMs). A natural language interface (NLI) is beneficial for users to query BIM models using natural language. However, parsing natural language queries (NLQs) is challenging due to am...
Given a directed graph G, the directed densest subgraph (DDS) problem refers to finding a subgraph from G, whose density is the highest among all subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fake follower detection and community mining. Theoretically, the DDS problem closely connects to other essential gra...
The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well to unseen databases. Recently, the pre-trained text-to-text transformer model,...
Text-to-SQL parsing, which aims at converting natural language instructions into executable SQL queries, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider and WikiSQL, focus on database schema with few rows of database conte...
The emerging Omicron variant poses a serious threat to human health. Public transport plays a critical role in the spread of infection. Based on data on nearly 4 billion smartcard uses from the Mass Transit Railway Corporation of Hong Kong between January 1, 2019 and January 31, 2021, we analyzed the subway travel behavior of different population gro...
COVID-19 continues to threaten the world. Relaxing local travel behaviours on preventing the spread of COVID-19, may increase the infection risk in subsequent waves of SARS-CoV-2 transmission. In this study, we analysed changes in the travel behaviour of different population groups (adult, child, student, elderly) during four pandemic waves in Hong...
The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is do...
The main goal of the Social Technology and Research Laboratory (STAR Lab) at the University of Hong Kong (https://star.hku.hk) is to develop novel IT technologies that serve society. Our team has more than three years of experience in project development, web, app, and game design, photography, and video production. We are interested in Data S...
Finding the densest subgraph (DS) from a graph is a fundamental problem in graph databases. The DS obtained, which reveals closely related entities, has been found to be useful in various application domains such as e-commerce, social science, and biology. However, in a big graph that contains billions of edges, it is desirable to find more than on...
Big Railway Data, such as train movement logs and timetables, have become increasingly available. By analyzing these data, insights about train movement and delay can be extracted, allowing train operators to make smarter train management decisions. In this paper, we study the problem of performing long-range analysis on Big Railway Data, such as e...
In this paper, we study anomalous trajectory detection, which aims to extract abnormal movements of vehicles on the roads. This important problem, which facilitates understanding of traffic behavior and detection of taxi fraud, is challenging due to the varying traffic conditions at different times and locations. To tackle this problem, we propose...
Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from...
Heterogeneous Information Networks (HINs) capture complex relations among entities of various kinds and have been used extensively to improve the effectiveness of various data mining tasks, such as in recommender systems. Many existing HIN-based recommendation algorithms utilize hand-crafted meta-paths to extract semantic information from the netwo...
Knowledge graph (KG) embedding methods are at the basis of many KG-based data mining tasks, such as link prediction and node clustering. However, graphs may contain confidential information about people or organizations, which may be leaked via embeddings. Research recently studied how to apply differential privacy to a number of graphs (and KG) an...
Network kernel density visualization, or NKDV, has been extensively used to visualize spatial data points in various domains, including traffic accident hotspot detection, crime hotspot detection, disease outbreak detection, and business and urban planning. Due to a wide range of applications for NKDV, some geographical software, e.g., ArcGIS, can...
[This corrects the article DOI: 10.1002/adtp.202100055.]
Background:
The occupancy of healthcare resources by the COVID-19 outbreak has led to unmet health needs for non-COVID-19 diseases. We aimed to explore whether social media information could help surveil and understand the characteristics of unmet non-COVID-19 health needs during the COVID-19 outbreak in Wuhan city.
Methods:
This was an o...
Kernel density visualization (KDV) is a commonly used visualization tool for many spatial analysis tasks, including disease outbreak detection, crime hotspot detection, and traffic accident hotspot detection. Although the most popular geographical information systems, e.g., QGIS and ArcGIS, also support this operation, these solutions are not...
Given a directed graph G, the directed densest subgraph (DDS) problem refers to the finding of a subgraph from G, whose density is the highest among all the subgraphs of G. The DDS problem is fundamental to a wide range of applications, such as fraud detection, community mining, and graph compression. However, existing DDS solutions suffer from eff...
Identifying effective drug treatments for COVID‐19 is essential to reduce morbidity and mortality. Although a number of existing drugs have been proposed as potential COVID‐19 treatments, effective data platforms and algorithms to prioritize drug candidates for evaluation and application of knowledge graph for drug repurposing have not been adequat...
COVID-19 threatens the world. Social distancing is a significant factor in determining the spread of this disease, and social distancing is strongly affected by the local travel behaviour of people in large cities. In this study, we analysed the changes in the local travel behaviour of various population groups in Hong Kong, between 1 January and 3...
Path-based solutions have been shown to be useful for various graph analysis tasks, such as link prediction and graph clustering. However, they are no longer adequate for handling complex and gigantic graphs. Recently, motif-based analysis has attracted a lot of attention. A motif, or a small graph with a few nodes, is often considered as a fundame...
Background
COVID-19 continues to threaten human life worldwide. We explored how human behaviours have been influenced by the COVID-19 pandemic in Hong Kong, and how the transmission of other respiratory diseases (e.g. influenza) has been influenced by human behaviour.
Methods
We focused on the spread of COVID-19 and influenza infections based on r...
Judgment prediction is the task of predicting various outcomes of legal cases, of which sentencing prediction is one of the most important yet difficult challenges. We study the applicability of machine learning (ML) techniques in predicting prison terms of drug trafficking cases. In particular, we study how legal domain knowledge can be integrated...
Heterogeneous information networks (HINs), which are typed graphs with labeled nodes and edges, have attracted tremendous interest from academia and industry. Given two HIN nodes $s$ and $t$, and a natural number $k$, we study the discovery of the $k$ most important meta paths in real time, which can be used to support friend sear...
Crowdsourcing can be used to determine a total order for an object set (e.g., the top-10 NBA players) based on crowd opinions. This ranking problem is often decomposed into a set of microtasks (e.g., pairwise comparisons). These microtasks are passed to a large number of workers and their answers are aggregated to infer the ranking. The number of m...
In the original article, Table 1 was published with incorrect figures. The correct Table 1 is given below.
Kernel functions support a broad range of applications that require tasks like density estimation, classification, regression or outlier detection. For these tasks, a common online operation is to compute the weighted aggregation of kernel function values with respect to a set of points. However, scalable aggregation methods are still unknown for t...
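For readers unfamiliar with the operation named in this abstract, the following is a minimal sketch of a weighted kernel aggregation computed by brute force. The Gaussian kernel, the `gamma` parameter, and the function names are illustrative assumptions; the truncated abstract does not say which kernels or algorithms the paper uses.

```python
import math

def gaussian_kernel(q, p, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||q - p||^2); the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(q, p))
    return math.exp(-gamma * sq_dist)

def kernel_aggregate(q, points, weights, gamma=1.0):
    """Brute-force weighted sum of kernel values between query q and every point."""
    return sum(w * gaussian_kernel(q, p, gamma) for p, w in zip(points, weights))

# Hypothetical toy data: two 2-D points with weights 2.0 and 1.0.
value = kernel_aggregate((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0)], [2.0, 1.0])
```

This baseline touches every point for every query, so its cost grows linearly with the size of the point set.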
With the rapid development of information technologies, various big graphs are prevalent in many real applications (e.g., social media and knowledge bases). An important component of these graphs is the network community. Essentially, a community is a group of vertices which are densely connected internally. Community retrieval can be used in many...
In this paper, we propose a Deep Reinforcement Learning (RL) framework for task arrangement, which is a critical problem for the success of crowdsourcing platforms. Previous works conduct the personalized recommendation of tasks to workers via supervised learning methods. However, the majority of them only consider the benefit of either workers or...
In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rela...
Many real-world networks (e.g., friendship network among Facebook users) generate data (e.g., friend requests) in a stream fashion. Recently, several network embedding methods have been proposed to learn embeddings on such networks incrementally. However, these methods perform incremental updates in a heuristic manner and thus fail to quantitatively rest...
In graph applications (e.g., biological and social networks), various analytics tasks (e.g., clustering and community search) are carried out to extract insight from large and complex graphs. Central to these tasks is the counting of the number of motifs, which are graphs with a few nodes. Recently, researchers have developed several fast motif co...
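As a small illustration of motif counting, the sketch below counts one specific motif, the triangle (3-node clique), in an undirected graph. It is a plain exact counter written for this listing, not the fast counting method the abstract refers to, and it assumes node labels can be compared with `<`.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Count each triangle once, at its ordered triple u < v < w.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

# Hypothetical toy graph: a 4-clique on nodes 1..4 plus a pendant node 5.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
triangles = count_triangles(toy_edges)
```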
In this paper, we study the spatial pattern matching (SPM) query. Given a set D of spatial objects (e.g., houses and shops), each with a textual description, we aim at finding all combinations of objects from D that match a user-defined spatial pattern P. A pattern P is a graph whose vertices represent spatial objects, and edges denote distance rel...
Densest subgraph discovery (DSD) is a fundamental problem in graph mining. It has been studied for decades, and is widely used in various areas, including network science, biological analysis, and graph databases. Given a graph G, DSD aims to find a subgraph D of G with the highest density (e.g., the number of edges over the number of vertices in D...
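The density notion mentioned in this abstract (edges over vertices) can be made concrete with a short sketch. The code below computes that density and applies greedy peeling, a well-known approximation heuristic that repeatedly removes a minimum-degree vertex. It is swapped in here for illustration only and is not claimed to be the method of the paper; the function names and toy graph are assumptions.

```python
from collections import defaultdict

def edge_density(edges, vertices):
    """Edges with both endpoints in `vertices`, divided by the number of vertices."""
    vs = set(vertices)
    if not vs:
        return 0.0
    inside = sum(1 for u, v in edges if u in vs and v in vs)
    return inside / len(vs)

def greedy_peel(edges):
    """Peel minimum-degree vertices; return the densest intermediate vertex set."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    remaining = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # current edge count
    best_set, best_density = set(), 0.0
    while remaining:
        density = m / len(remaining)
        if density > best_density:
            best_set, best_density = set(remaining), density
        u = min(remaining, key=lambda x: len(adj[x]))
        m -= len(adj[u])
        for w in adj[u]:
            adj[w].discard(u)
        del adj[u]
        remaining.remove(u)
    return best_set, best_density

# Hypothetical toy graph: a 4-clique on nodes 1..4 plus a pendant node 5.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
dense_set, dense_value = greedy_peel(toy_edges)
```

On this toy graph the whole graph has density 7/5, while the 4-clique alone has density 6/4, so peeling the pendant node yields the denser subgraph.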
In crowdsourcing, human workers are employed to tackle problems that are traditionally difficult for computers (e.g., data cleaning, missing value filling, and sentiment analysis). In this paper, we study the effective use of crowdsourcing in filling missing values in a given relation (e.g., a table containing different attributes of celebrity star...
Novel road-network applications often recommend interesting services or tasks to a moving object (e.g., a vehicle) on its way to a destination. A taxi-sharing system, for instance, suggests a new passenger to a taxi while it is serving another one. The traveling cost is then shared among these passengers. A fundamental query is: given two nodes...
Given a graph $G$ and a vertex $q\in G$, the community search (CS) problem aims to efficiently find a subgraph of $G$ whose vertices are closely related to $q$. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (PCS...
This book constitutes the proceedings of the 20th International Conference on Web Information Systems Engineering, WISE 2019, held in Hong Kong, China, in November 2019.
The 50 full papers presented were carefully reviewed and selected from 211 submissions. The papers are organized in the following topical sections: blockchain and crowdsourcing; ma...
Given a graph G and a vertex $q \in G$, the community search (CS) problem aims to efficiently find a subgraph of G whose vertices are closely related to q. Communities are prevalent in social and biological networks, and can be used in product advertisement and social event recommendation. In this paper, we study profiled community search (P...
Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS), extracting a dense subgraph containing a query vertex q from a graph, has received great attention. However, existing CS solutions are designed for undirected graphs, and overlook directions of edges which potential...
Query recommendation, which suggests related queries to search engine users, has attracted a lot of attention in recent years. Most of the existing solutions, which perform analysis of users’ search history (or query logs), are often insufficient for long-tail queries that rarely appear in query logs. To handle such queries, we study the use of ent...
Communities are prevalent in social networks, knowledge graphs, and biological networks. Recently, the topic of community search (CS) has received plenty of attention. The CS problem aims to look for a dense subgraph that contains a query vertex. Existing CS solutions do not consider the spatial extent of a community. They can yield communities who...
To automatically extract data records from Web pages, the data record extraction algorithm is required to be robust and efficient. However, most existing algorithms are not robust enough to cope with rich information or noisy data. In this paper, we propose a novel suffix tree-based extraction method (STEM) for this challenging task. First, we e...
Large graphs are prevalent in social networks, traffic networks, and biology. These graphs are often inexact. For example, in a friendship network, an edge between two nodes u and v indicates that users u and v have a close relationship. This edge may only exist with a probability. To model such information, the uncertain graph model has been propo...
We study the classical kNN queries on road networks. Existing solutions mostly focus on reducing query processing time. In many applications, however, system throughput is a more important measure. We devise a mathematical model that describes throughput in terms of a number of system characteristics. We show that query time is only one of the many...
Given a graph G and a vertex \(q \in G\), the community search query returns a subgraph of G that contains vertices related to q. Communities, which are prevalent in attributed graphs such as social networks and knowledge bases, can be used in emerging applications such as product advertisement and setting up of social events. In this paper, we inv...