About

87 Publications
31,904 Reads
1,604 Citations
Publications (87)
Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality)...
Semi-supervised learning (SSL) has garnered significant attention due to its ability to leverage limited labeled data and a large amount of unlabeled data to improve model generalization performance. Recent approaches achieve impressive successes by combining ideas from both consistency regularization and pseudo-labeling. However, these methods ten...
The primary challenge of multi-label active learning, which distinguishes it from multi-class active learning, lies in assessing the informativeness of an indefinite number of labels while also accounting for the inherent label correlation. Existing studies either require substantial computational resources to leverage correlations or fail to fully explore...
Topic modeling is a fundamental task in natural language processing, allowing the discovery of latent thematic structures in text corpora. While Large Language Models (LLMs) have demonstrated promising capabilities in topic discovery, their direct application to topic modeling suffers from issues such as incomplete topic coverage, misalignment of t...
Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assu...
Prompt learning has shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising...
Within the scope of natural language processing, the domain of multi-label text classification is uniquely challenging due to its expansive and uneven label distribution. The complexity deepens due to the demand for an extensive set of annotated data for training an advanced deep learning model, especially in specialized fields where the labeling t...
Knowledge distillation (KD) is a prevalent model compression technique in deep learning, aiming to leverage knowledge from a large teacher model to enhance the training of a smaller student model. It has found success in deploying compact deep models in intelligent applications like intelligent transportation, smart health, and distributed intellig...
In recent years, various distillation methods for semantic segmentation have been proposed. However, these methods typically train the student model to imitate the intermediate features or logits of the teacher model directly, thereby overlooking the high-discrepancy regions learned by both models, particularly the differences in instance edges. In...
Dialogue acts (DAs) can represent conversational actions of tutors or students that take place during tutoring dialogues. Automating the identification of DAs in tutoring dialogues is significant to the design of dialogue-based intelligent tutoring systems. Many prior studies employ machine learning models to classify DAs in tutoring dialogues and...
Dialogue Acts (DAs) can be used to explain what expert tutors do and what students know during the tutoring process. Most empirical studies adopt the random sampling method to obtain sentence samples for manual annotation of DAs, which are then used to train DA classifiers. However, these studies have paid little attention to sample informativeness...
Current work in named entity recognition (NER) uses either cross entropy (CE) or conditional random fields (CRF) as the objective/loss functions to optimize the underlying NER model. Both of these traditional objective functions for the NER problem generally produce adequate performance when the data distribution is balanced and there are sufficien...
Knowledge distillation is a simple yet effective technique for deep model compression, which aims to transfer the knowledge learned by a large teacher model to a small student model. To mimic how the teacher teaches the student, existing knowledge distillation methods mainly adopt a unidirectional knowledge transfer, where the knowledge extracted...
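The unidirectional teacher-to-student transfer described above is usually instantiated as a soft-target distillation loss. The following is a minimal numpy sketch of that classic formulation (Hinton-style); the temperature `T`, mixing weight `alpha`, and `T**2` scaling are common conventions assumed for illustration, not the exact loss of the paper above:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Unidirectional KD: hard-label cross entropy mixed with a
    KL term that pulls the student's softened outputs toward the
    teacher's softened outputs."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    # standard cross entropy against the ground-truth labels
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl
```

When student and teacher logits coincide, the KL term vanishes and only the hard-label cross entropy remains, which is a quick sanity check on any KD implementation.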
Continual learning (CL) is a machine learning paradigm that accumulates knowledge while learning sequentially. The main challenge in CL is catastrophic forgetting of previously seen tasks, which occurs due to shifts in the probability distribution. To retain knowledge, existing CL models often save some past examples and revisit them while learning...
Jianping Gou, Xin He, Lan Du, [...], Zhang Yi
Deep dictionary learning (DDL) shows good performance in visual classification tasks. However, almost all existing DDL methods ignore the locality relationships between the input data representations and the learned dictionary atoms, and learn sub-optimal representations in the feature coding stage, which are less conducive to classification. To th...
The application of Auto-Encoder (AE) to multi-view representation learning has gained traction due to advancements in deep learning. While some current AE-based multi-view representation learning algorithms incorporate the geometric structure of the input data into their feature representation learning process, their use of a shallow structured gra...
Knowledge tracing (KT) aims to leverage students' learning histories to estimate their mastery levels on a set of pre-defined skills, based on which the corresponding future performance can be accurately predicted. In practice, a student's learning history comprises answers to sets of massed questions, each known as a session, rather than merely be...
Recently, self-supervised pretraining of transformers has gained considerable attention in analyzing electronic medical records. However, systematic evaluation of different pretraining tasks in radiology applications using both images and radiology reports is still lacking. We propose PreRadE, a simple proof of concept framework that enables novel...
Domain adaptation is an effective solution to data scarcity in low-resource scenarios. However, when applied to token-level tasks such as bioNER, domain adaptation methods often suffer from the challenging linguistic characteristics that clinical narratives possess, which leads to unsatisfactory performance. In this paper, we present a simple yet e...
Recently, discrete latent variable models have received a surge of interest in both Natural Language Processing (NLP) and Computer Vision (CV), attributed to their comparable performance to the continuous counterparts in representation learning, while being more interpretable in their predictions. In this paper, we develop a topic-informed discrete...
Knowledge distillation (KD), as an efficient and effective model compression technique, has received considerable attention in deep learning. The key to its success is about transferring knowledge from a large teacher network to a small student network. However, most existing KD methods consider only one type of knowledge learned from either instan...
Uncertainty estimation is essential to make neural networks trustworthy in real-world applications. Extensive research efforts have been made to quantify and reduce predictive uncertainty. However, most existing works are designed for unimodal data, whereas multi-view uncertainty estimation has not been sufficiently investigated. Therefore, we prop...
Introduction: Proximal humeral fractures account for a significant proportion of all fractures. Detailed, accurate classification of the type and severity of the fracture is a key component of clinical decision making and treatment, and plays an important role in orthopaedic trauma research. This research aimed to assess the performance of Machine Learni...
K-nearest neighbor rule (KNN) has been regarded as one of the top 10 methods in the field of data mining. Due to its simplicity and effectiveness, it has been widely studied and applied to various classification tasks. In this article, we develop a novel representation coefficient-based k-nearest centroid neighbor method (RCKNCN), which aims to fur...
How to represent and classify a testing sample for representation-based classification (RBC) plays an important role in the field of pattern recognition. As a typical kind of representation-based classification with promising performance, collaborative representation-based classification (CRC) adopts all the training samples to collaborativ...
Deep neural networks have achieved a great success in a variety of applications, such as self-driving cars and intelligent robotics. Meanwhile, knowledge distillation has received increasing attention as an effective model compression technique for training very efficient deep models. The performance of the student network obtained through knowledg...
We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log...
This paper proposes a transformer-over-transformer framework, called Transformer², to perform neural text segmentation. It consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model based on the sentence embeddings. The bottom-level component transfers the pr...
Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more fine-grained sentence-level information is ignored, and (2) external semantic knowledge regarding documents, sentences a...
This paper presents an unsupervised extractive approach to summarize scientific long documents based on the Information Bottleneck principle. Inspired by previous work which uses the Information Bottleneck principle for sentence compression, we extend it to document level summarization with two separate steps. In the first step, we use signal(s) as...
Continual learning (CL) refers to a machine learning paradigm that uses only a small amount of training samples and previously learned knowledge to enhance learning performance. CL models learn tasks from various domains in a sequential manner. The major difficulty in CL is catastrophic forgetting of previously learned tasks, caused by shifts in...
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with nearly a hundred models developed and a wide range of applications in neural language understanding such as text generation, s...
Non-negative tensor factorization models enable predictive analysis on count data. Among them, Bayesian Poisson–Gamma models can derive full posterior distributions of latent factors and are less sensitive to sparse count data. However, current inference methods for these Bayesian models adopt restricted update rules for the posterior parameters. T...
Representation‐based classification (RBC) has been attracting a great deal of attention in pattern recognition. As a typical extension to RBC, collaborative representation‐based classification (CRC) has demonstrated its superior performance in various image classification tasks. Ideally, we expect that the learned class‐specific representations for...
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, sum...
Few/Zero-shot learning is a big challenge in many classification tasks, where a classifier is required to recognise instances of classes that have very few or even no training samples. It becomes more difficult in multi-label classification, where each instance is labelled with more than one class. In this paper, we present a simple multi-graph ag...
Obtaining training data for multi-document summarization (MDS) is time consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic a...
Graph embedding has attracted much more research interests in dimensionality reduction. In this study, based on collaborative representation and graph embedding, the authors propose a new linear dimensionality reduction method called collaborative representation‐based locality preserving projection (CRLPP). In the CRLPP, they assume that the simila...
Representation-based classification (RBC) has attracted much attention in pattern recognition. As a linear representative RBC method, collaborative representation-based classification (CRC) is very promising for classification. Although many extensions of CRC have been developed recently, the discriminative and competitive representations of differ...
Matrix factorization (MF) has been widely applied to collaborative filtering in recommendation systems. Its Bayesian variants can derive posterior distributions of user and item embeddings, and are more robust to sparse ratings. However, the Bayesian methods are restricted by their update rules for the posterior parameters due to the conjugacy of t...
Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in...
Representation-based classification (RBC) methods have recently been the promising pattern recognition technologies for object recognition. The representation coefficients of RBC as the linear reconstruction measure (LRM) can be well used for classifying objects. In this article, we propose two enhanced linear reconstruction measure-based classific...
Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count/binary) data. With the ability of handling high-dimensional and sparse discrete data, models based on probabilistic matrix factorisation and latent factor analysis have enjoyed...
Graph embedding is a very useful dimensionality reduction technique in pattern recognition. In this article, we develop a novel discriminative dimensionality reduction technique entitled sparsity and geometry preserving graph embedding (SGPGE). SGPGE can not only capture the sparse reconstructive relationships among training samples, but also disco...
Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously stu...
In this article we propose several two-phase representation-based classification (RBC) methods that are inspired by the idea of the two-phase test sample sparse representation (TPTSR) method with L2-norm. We first introduce two simple extensions of TPTSR using L1-norm alone and the combination of L1-norm and L2-norm, respectively. We then propose t...
Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and...
Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes encoded in binary form in relational models with Poisson likelihood....
Nowadays, users of social networks like Twitter and Weibo have generated massive geo-tagged records, and these records reveal their activities in the physical world together with spatio-temporal dynamics. Existing trajectory data management studies mainly focus on analyzing the spatio-temporal properties of trajectories, while leaving the understand...
The Dirichlet process and its extension, the Pitman–Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them...
In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and simil...
We develop a novel maximum neighborhood margin discriminant projection (MNMDP) technique for dimensionality reduction of high-dimensional data. It utilizes both the local information and class information to model the intraclass and interclass neighborhood scatters. By maximizing the margin between intraclass and interclass neighborhoods of all poi...
In recent years, research on measuring relationship strength among people in a social network has gained attention due to its potential applications in social network analysis. The challenge is how to learn social relationship strength from various resources such as user profiles and social interactions. In this paper we propose a KP...
To digest tremendous documents efficiently, people often resort to their titles, which normally provide a concise and semantic representation of main text. Some titles however are misleading due to lexical ambiguity or eye-catching intention. The requirement of reference summaries hampers using traditional lexical summarisation evaluation technique...
We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s)....
K-nearest neighbor (KNN) rule is a simple and effective algorithm in pattern classification. In this article, we propose a local mean-based k-nearest centroid neighbor classifier that assigns to each query pattern a class label with nearest local centroid mean vector so as to improve the classification performance. The proposed scheme not only take...
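The rule sketched above — assign the query to the class whose local centroid-neighbour mean lies nearest — can be illustrated with a minimal numpy sketch. The greedy nearest-centroid-neighbour selection and the per-class comparison below are plausible assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np

def ncn_indices(X, x, k):
    """Greedily pick k nearest-centroid neighbours of x: each new
    point is the one whose centroid with the already-chosen points
    lies closest to the query."""
    chosen, remaining = [], list(range(len(X)))
    for _ in range(min(k, len(X))):
        best, best_d = None, np.inf
        for i in remaining:
            centroid = X[chosen + [i]].mean(axis=0)
            d = np.linalg.norm(centroid - x)
            if d < best_d:
                best, best_d = i, d
        chosen.append(best)
        remaining.remove(best)
    return chosen

def lmkncn_predict(X_train, y_train, x, k=3):
    """Assign x to the class whose local centroid-neighbour mean
    vector is nearest to x."""
    best_label, best_d = None, np.inf
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]
        local_mean = Xc[ncn_indices(Xc, x, k)].mean(axis=0)
        d = np.linalg.norm(local_mean - x)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

Compared with plain KNN voting, comparing against per-class local means makes the decision less sensitive to isolated noisy neighbours.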
Understanding how topics within a document evolve over the structure of the document is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers t...
Topic models are increasingly being used for text analysis tasks, often replacing earlier semantic techniques such as latent semantic analysis. In this paper, we develop a novel adaptive topic model with the ability to adapt topics from both the previous segment and the parent document. For this proposed model, a Gibbs sampler is developed fo...
In this paper, we develop a novel Distance-weighted k-nearest Neighbor rule (DWKNN), using the dual distance-weighted function. The proposed DWKNN is motivated by the sensitivity problem of the selection of the neighborhood size k that exists in the k-nearest Neighbor rule (KNN), with the aim of improving classification performance. The experiment re...
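The dual distance-weighting idea above can be sketched in a few lines of numpy. The specific weight formula below — scaling each neighbour by its distance relative to the nearest and k-th neighbour — is one common dual-weighting scheme assumed for illustration; the paper's exact function may differ:

```python
import numpy as np

def dwknn_predict(X_train, y_train, x, k=5):
    """Distance-weighted KNN sketch: closer neighbours get larger
    votes via a dual distance-based weight."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]                 # k nearest neighbours
    d1, dk = d[idx[0]], d[idx[-1]]          # nearest / farthest of the k
    if dk == d1:
        w = np.ones(k)                      # all equidistant: plain vote
    else:
        # dual weight: linear term times a distance-ratio correction
        w = (dk - d[idx]) / (dk - d1) * (dk + d1) / (dk + d[idx])
    votes = {}
    for label, weight in zip(y_train[idx], w):
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```

Because the k-th neighbour always receives weight zero under this scheme, the prediction degrades gracefully as k grows, which is exactly the sensitivity-to-k problem the weighting is meant to address.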
Hierarchical modeling and reasoning are fundamental in machine intelligence, and for this the two-parameter Poisson-Dirichlet Process (PDP) plays an important role. The most popular MCMC sampling algorithm for the hierarchical PDP and hierarchical Dirichlet Process is to conduct an incremental sampling based on the Chinese restaurant metaphor, whic...
Understanding how topics within a document evolve over its structure is an interesting and important problem. In this paper, we address this problem by presenting a novel variant of Latent Dirichlet Allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers the underlying sequential structure, i.e., a document consists of multiple s...
Documents come naturally with structure: a section contains paragraphs, which themselves contain sentences; a blog page contains a sequence of comments and links to related blogs. Structure, of course, implies something about shared topics. In this paper we take the simplest form of structure, a document consisting of multiple segments, as the basis fo...
Exact Bayesian network inference exists for Gaussian and multinomial distributions. For other kinds of distributions, approximations or restrictions on the kind of inference done are needed. In this paper we present generalized networks of Dirichlet distributions, and show how, using the two-parameter Poisson-Dirichlet distribution and Gibbs sampli...
It is well known that either domain specific or domain independent knowledge has been adopted in Information retrieval (IR) to improve the retrieval performance. In this paper, we propose a novel IR model for digital forensics by using latent semantic indexing (LSI) and WordNet as an underlying reference ontology to retrieve suspicious emails accor...
Because of the high impact of high-tech digital crime upon our society, it is necessary to develop effective Information Retrieval (IR) tools to support digital forensic investigations. In this paper, we propose an IR system for digital forensics that targets emails. Our system incorporates WordNet (i.e. a domain independent ontology for the vocab...
For the problem of soil moisture prediction, existing approaches in literature [M. Kashif et al., 2006; Y. Shao et al., 1997] usually utilize as many decision factors as possible, e.g. rainfall, solar irradiance, drainage, etc. However, the redundancy aspect of the decision factors has not been studied rigorously. Previous research work in data min...