About
183 Publications
22,399 Reads
1,277 Citations
Introduction
Current institution
Additional affiliations
February 2016 - present
February 2011 - February 2016
Publications (183)
Large language models (LLMs) have demonstrated strong capabilities in language understanding and generation, and their potential in educational contexts is increasingly being explored. One promising area is learnersourcing, where students engage in creating their own educational content, such as multiple-choice questions. A critical step in this pr...
Machine learning (ML) models have become essential tools in various scenarios. Their effectiveness, however, hinges on a substantial volume of data for satisfactory performance. Model marketplaces have thus emerged as crucial platforms bridging model consumers seeking ML solutions and data owners possessing valuable data. These marketplaces leverag...
Recently there has been a large amount of research designing mechanisms for auction scenarios where the bidders are connected in a social network. Unlike existing studies in this field, which focus on specific auction scenarios such as single-unit and multi-unit auctions, this paper considers the following question: is it possible to...
The issue of fairness in AI arises from discriminatory practices in applications like job recommendations and risk assessments, emphasising the need for algorithms that do not discriminate based on group characteristics. This concern is also pertinent to auctions, commonly used for resource allocation, which necessitate fairness considerations. Our...
The sealed-bid auction enables bidders to secretly send their bids to the auctioneer, which compares all bids and publishes the winning one on the bid-opening day. This type of auction is friendly for protecting the bid privacy, and sufficiently fair for all bidders if the auctioneer acts faithfully. Unfortunately, the auctioneer may not always be...
Lexicographic Ranking SuperMartingale (LexRSM) is a probabilistic extension of Lexicographic Ranking Function (LexRF), which is a widely accepted technique for verifying program termination. In this paper, we are the first to propose sound probabilistic extensions of LexRF with a weaker non-negativity condition, called single-component (SC) non-neg...
Learnersourcing offers great potential for scalable education through student content creation. However, predicting student performance on learnersourced questions, which is essential for personalizing the learning experience, is challenging due to the inherent noise in student-generated data. Moreover, while conventional graph-based methods can ca...
Designing suitable reward functions for numerous interacting intelligent agents is challenging in real-world applications. Inverse reinforcement learning (IRL) in mean field games (MFGs) offers a practical framework to infer reward functions from expert demonstrations. While promising, the assumption of agent homogeneity limits the capability of ex...
Better understanding the natural world is a crucial task with a wide range of applications. In environments with close proximity between humans and animals, such as zoos, it is essential to better understand the causes behind animal behaviour and what interventions are responsible for changes in their behaviours. This can help to predict unusual be...
Log analysis can diagnose software system issues. Log anomaly detection always faces the challenge of class distribution imbalance and data noise. In addition, existing methods often overlook log event structural relationships, causing instability. In this work, we propose AdvGraLog, a Generative Adversarial Network (GAN) model based on log graph r...
This paper proposes an ensemble model for the Stanford Question Answering Dataset (SQuAD) with the aim of improving performance compared to baseline models such as ALBERT and ELECTRA. The proposed ensemble model incorporates Sentence Attention (SA-Net) and Answer Attention (AA-Net) components, which leverage attention mechanisms to emphasize impor...
Logs offer vital insights into system states and contextual details, crucial for identifying anomalies. Numerous machine learning and deep learning approaches have been proposed for log anomaly detection. Recent studies reveal that distinct software systems tend to generate a substantial volume of complex and diverse logs that exhibit consi...
In the digital age, data is a valuable commodity, and data marketplaces offer lucrative opportunities for data owners to monetize their private data. However, data privacy is a significant concern, and differential privacy has become a popular solution to address this issue. Private data trading systems (PDQS) facilitate the trade of private data b...
Diffusion auction is an emerging business model where a seller aims to incentivise buyers in a social network to diffuse the auction information thereby attracting potential buyers. We focus on designing mechanisms for multi-unit diffusion auctions. Despite numerous attempts at this problem, existing mechanisms either fail to be incentive compatibl...
Non-intrusive load monitoring (NILM) enables extracting individual appliances’ power consumption data from an aggregated power signal in a cost-effective way. The extracted appliance-level power data can greatly facilitate tasks such as malfunction diagnosis and load forecasting, which are of significant importance for efficient energy use. Various...
A data marketplace is an online venue that brings data owners, data brokers, and data consumers together and facilitates commoditisation of data amongst them. Data pricing, as a key function of a data marketplace, demands quantifying the monetary value of data. A considerable number of studies on data pricing can be found in literature. This paper...
Non-intrusive load monitoring (NILM) aims to decompose an aggregated electrical usage signal into appliance-specific power consumption, and it amounts to a classical example of blind source separation tasks. Leveraging recent progress on deep learning techniques, we design a new neural NILM model, Multi-State Dual CNN (MSDC). Different from previo...
Unsupervised/self-supervised graph neural networks (GNN) are susceptible to the inherent randomness in the input graph data, which adversely affects the model's performance in downstream tasks. In this paper, we propose USER, an unsupervised and robust version of GNN based on structural entropy, to alleviate the interference of graph perturbations...
Business process management focuses on the automatic discovery and optimisation of business process models for a wide range of business scenarios. At the same time, the development of natural language processing (NLP), in particular some large-scale pre-trained language models such as BERT and GPT, has recently achieved great success and b...
Pre-trained large language models (LLMs) are being explored for NLP tasks that may require logical reasoning. Logic-driven data augmentation for representation learning has been shown to improve the performance of tasks requiring logical reasoning, but most of these data rely on designed templates and therefore lack generalization. In this r...
Earthquakes can cause severe damage to structural and non-structural elements of buildings; consequently, they pose high risks to human lives. To mitigate such risks, attention has been paid to enhancing the indoor environment for increased building safety. Yet little effort has been made to assess building occupants' evacuation behaviors in resp...
Next POI recommendation intends to forecast users' immediate future movements given their current status and historical information, yielding great values for both users and service providers. However, this problem is perceptibly complex because various data trends need to be considered together. This includes the spatial locations, temporal contex...
Diffusion auction is an emerging business model where a seller aims to incentivise buyers in a social network to diffuse the auction information thereby attracting potential buyers. We focus on designing mechanisms for multi-unit diffusion auctions. Despite several attempts at this problem, existing mechanisms are unsatisfactory in one way or anoth...
Correlated Equilibrium (CE) is a well-established solution concept that captures coordination among agents and enjoys good algorithmic properties. In real-world multi-agent systems, in addition to being in an equilibrium, agents' policies are often expected to meet requirements with respect to safety and fairness. Such additional requirements can...
Diffusion auction refers to an emerging paradigm of online marketplace where an auctioneer utilises a social network to attract potential buyers. Diffusion auction poses significant privacy risks. From the auction outcome, it is possible to infer hidden, and potentially sensitive, preferences of buyers. To mitigate such risks, we initiate the study...
Unsupervised/self-supervised graph neural networks (GNN) are vulnerable to inherent randomness in the input graph data which greatly affects the performance of the model in downstream tasks. In this paper, we alleviate the interference of graph randomness and learn appropriate representations of nodes without label information. To this end, we prop...
Non-intrusive load monitoring (NILM) aims to decompose aggregated electrical usage signal into appliance-specific power consumption and it amounts to a classical example of blind source separation tasks. Leveraging recent progress on deep learning techniques, we design a new neural NILM model Multi-State Dual CNN (MSDC). Different from previous mod...
Covert communication is a method that plays an important role in secure data transmission. The technology embeds covert information into data and propagates it through covert channels. The communication quality depends on the choice of channel and data embedding techniques. Recently, blockchain has emerged to become the preferred channel to carry...
Few-shot learning (FSL) is an emergent paradigm of learning that attempts to learn with low sample complexity to mimic the way humans can learn, generalise and extrapolate based on only a few examples. While FSL attempts to mimic these human characteristics, fundamentally, the task of FSL as conventionally described and modelled using meta-learning...
A data marketplace provides a platform for data trading by bringing together data owners and data consumers with a data broker. Recent advancements in mechanism design of data marketplaces have introduced a number of private data query mechanisms that facilitate the trading of private data. A critical assumption to ensuring that these mechanisms fu...
The problem addressed by dictionary learning (DL) is the representation of data as a sparse linear combination of columns of a matrix called dictionary. Both the dictionary and the sparse representations are learned from the data. We show how DL can be employed in the imputation of multivariate time series. We use a structured dictionary, which is...
Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language...
Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-ar...
Understanding, modelling and predicting human risky decision-making is challenging due to intrinsic individual differences and irrationality. Fuzzy trace theory (FTT) is a powerful paradigm that explains human decision-making by incorporating gists, i.e., fuzzy representations of information which capture only its quintessential meaning. Inspired b...
Contextual multi-armed bandit algorithms are widely used to solve online decision-making problems. However, traditional methods assume linear rewards and low dimensional contextual information, leading to high regrets and low online efficiency in real-world applications. In this paper, we propose a novel framework called interconnected neural-linea...
A major theme in the study of social dynamics is the formation of a community structure on a social network, i.e., the network contains several densely connected regions that are sparsely linked to each other. In this paper, we investigate the network integration process in which edges are added to dissolve the communities into a single unified...
The recent mean field game (MFG) formalism has enabled the application of inverse reinforcement learning (IRL) methods in large-scale multi-agent systems, with the goal of inferring reward signals that can explain demonstrated behaviours of large populations. The existing IRL methods for MFGs are built upon reducing an MFG to a Markov decision proc...
Many existing conversation models that are based on the encoder–decoder framework incorporate complex encoders. These powerful encoders serve to enrich the context vectors, so that the generated responses are more diverse and informative. However, these approaches face two potential challenges. First, the high complexity of the encoder means relati...
We propose a dictionary learning (DL) algorithm for signals in additive noise with generalized Gaussian distribution (GGD) by redesigning three key components used in DL for Gaussian signals: (i) the orthogonal matching pursuit algorithm, (ii) the approximate K-SVD algorithm and (iii) the information theoretic criteria. In experiments with simulate...
In the aftermath of severe earthquakes, building occupants' evacuation behaviour is a vital indicator of the performance of an indoor building design. However, earthquake evacuation has been systematically neglected in the current building design practice. Arguably, one of the primary reasons for this is that post-earthquake evacuation behaviour is...
Community detection has been widely studied from many different perspectives, which include heuristic approaches in the past and graph neural network in recent years. With increasing security and privacy concerns, community detectors have been demonstrated to be vulnerable. A slight perturbation to the graph data can greatly change the detection re...
Traffic forecasting is an integral part of intelligent transportation systems (ITS). Achieving a high prediction accuracy is a challenging task due to a high level of dynamics and complex spatial-temporal dependency of road networks. For this task, we propose Graph Attention-Convolution-Attention Networks (GACAN). The model uses a novel Att-Conv-At...
Traffic flow forecasting is a crucial task in urban computing. The challenge arises as traffic flows often exhibit intrinsic and latent spatio-temporal correlations that cannot be identified by extracting the spatial and temporal patterns of traffic data separately. We argue that such correlations are universal and play a pivotal role in traffic fl...
Basic research progress requires sustainable and healthy development of the academic community. This study aims to examine community development directed by research funding and the impact of top scientists on this development. To complement existing methods of measuring funding performance, which focus mostly on narrow factors such as citation or...
Leveraging knowledge graphs (KGs) benefits question answering tasks, as KGs contain well-structured, informative data. However, training knowledge graph-based simple question answering systems is known to be computationally expensive due to the complex predicate extraction and candidate pool generation. Moreover, the existing methods based on convolutional n...
In existing ensemble learning algorithms (e.g., random forest), each base learner’s model needs the entire dataset for sampling and training. However, this may not be practical in many real-world applications, and it incurs additional computational costs. To achieve better efficiency, we propose a decentralized framework: Multi-Agent Ensemble. The...
Conditional Variational AutoEncoder (CVAE) effectively increases the diversity and informativeness of responses in open-ended dialogue generation tasks through enriching the context vector with sampled latent variables. However, due to the inherent one-to-many and many-to-one phenomena in human dialogues, the sampled latent variables may not correc...
Simulation models are an undeniable tool to help researchers and designers forecast effects of definite policies regarding pedestrian social and collective movement behaviour. Considering both the environment's details and the complexity of human behaviour in choosing paths simultaneously is the main challenge in micro-simulation pedestrian dynamic...
Many existing conversation models that are based on the encoder-decoder framework have focused on ways to make the encoder more complicated to enrich the context vectors so as to increase the diversity and informativeness of generated responses. However, these approaches face two problems. First, the decoder is too simple to effectively utilize the...
Mean field games (MFG) facilitate the otherwise intractable reinforcement learning (RL) in large-scale multi-agent systems (MAS), through reducing interplays among agents to those between a representative individual agent and the mass of the population. However, RL agents are notoriously prone to unexpected behaviours due to reward mis-specification....
The concept of conventions has attracted much attention in multi-agent system research. In this article, we study the emergence of conventions from repeated n-player coordination games. Distributed agents learn their policies independently and are capable of observing their neighbours in a network topology. We distinguish two types of informat...
A variety of satellite authentication protocols have been proposed to meet the requirements of the security and efficiency in satellite communication networks nowadays. Most of them are variants from entity authentication technical standards published by the International Organization for Standardization (ISO), the International Electrotechnical Co...
Combining deep learning with symbolic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. However, it is as yet unknown to what degree symbolic reasoning can be acquired by end-to-end neural networks. In this paper, we explore the possibility of adapting a neural symbolic reasoner to function as a neural...
The formation of public opinions is a complex phenomenon that revolves around the aggregation of individuals’ beliefs. To accurately capture this phenomenon, one needs to build links from individualistic experiences to personal beliefs, which evolve in a social space through information exchange and belief revision. Despite many efforts to model op...
The big data generated by Industry 4.0 is expected to increase 20-fold in the next ten years and it has raised various challenges in Industrial Wireless Sensor Networks (IWSNs). Among these challenges, detecting different types of anomalies of industrial electricity consumption in an accurate and timely manner is a priority. If not handled properly...
Private data query combines mechanism design with privacy protection to produce aggregated statistics from privately-owned data records. The problem arises in a data marketplace where data owners have personalised privacy requirements and private data valuations. We focus on the case when the data owners are single-minded, i.e., they are willing to...
The gist can be viewed as an abstract concept that represents only the quintessential meaning derived from a single or multiple sources of information. We live in an age where vast quantities of information are widely available and easily accessible. Identifying the gist contextualises information which facilitates the fast disambiguation and predi...
WiFi access points are sources of considerable security risks as the wireless signals have the potential to leak important private information such as passwords. This paper examines the security issues posed by point-of-sale (POS) terminals which are widely used in WiFi-covered environments such as restaurants, banks and libraries. In particular, w...
Background: Maintenance of tight controls on circulating blood metabolites is crucial to normal, healthy tissue and organismal function. A number of single nucleotide polymorphisms (SNPs) have been associated with changes in the levels of blood metabolites. However, the impacts of the metabolite-associated SNPs are largely unknown because they fall...
Driver identification and impostor detection face several challenges, including costly and invasive data collection. Existing methods incur additional costs due to their data dependency on complex and expensive sensory systems. This article proposes an event-driven framework for driver identification and impostor detection that utilizes the Gl...
This paper proposes a chatbot framework that adopts a hybrid model which consists of a knowledge graph and a text similarity model. Based on this chatbot framework, we build HHH, an online question-and-answer (QA) Healthcare Helper system for answering complex medical questions. HHH maintains a knowledge graph constructed from medical data collecte...
Social capital captures the positional advantage gained by an individual by being in a social network. A well-known dichotomy defines two types of social capital: bonding capital, which refers to welfare such as trust and norms, and bridging capital, which refers to benefits in terms of influence and power. We present a framework where these notion...