Wray Buntine

Monash University (Australia) · Faculty of Information Technology

PhD, UTS; Docent, Uni. Helsinki

About

266
Publications
35,879
Reads
7,094
Citations
Introduction
Generally, I do Bayesian analysis of problems with documents, text and information access. Recently, I have focused on discrete non-parametric Bayesian methods with latent variables.
Additional affiliations
February 2014 - present
Monash University (Australia)
Position
  • Professor (Full)
April 2007 - January 2014
National ICT Australia Ltd
Position
  • Principal Investigator
April 2007 - present
Australian National University
Position
  • Adjunct Professor (Associate)
Education
February 1986 - January 1990
University of Technology Sydney
Field of study
  • Computer Science


Publications (266)
Preprint
We study acquisition functions for active learning (AL) for text classification. The Expected Loss Reduction (ELR) method focuses on a Bayesian estimate of the reduction in classification error, recently updated with Mean Objective Cost of Uncertainty (MOCU). We convert the ELR framework to estimate the increase in (strictly proper) scores like log...
Preprint
Full-text available
Deep generative models have been widely used in several areas of NLP, and various techniques have been proposed to augment them or address their training challenges. In this paper, we propose a simple modification to Variational Autoencoders (VAEs) by using an Isotropic Gaussian Posterior (IGP) that allows for better utilisation of their latent rep...
Preprint
This paper proposes a transformer over transformer framework, called Transformer$^2$, to perform neural text segmentation. It consists of two components: bottom-level sentence encoders using pre-trained transformers, and an upper-level transformer-based segmentation model based on the sentence embeddings. The bottom-level component transfers the pr...
Preprint
Full-text available
Neural topic models (NTMs) apply deep neural networks to topic modelling. Despite their success, NTMs generally ignore two important aspects: (1) only document-level word count information is utilized for the training, while more fine-grained sentence-level information is ignored, and (2) external semantic knowledge regarding documents, sentences a...
Preprint
Full-text available
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages, rather than training separate models for different languages. Learning a single model can enhance the low-resource translation by leveraging data from multiple languages. However, the performance of an MNMT model is highly...
Conference Paper
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with nearly a hundred models developed and a wide range of applications in neural language understanding such as text generation, s...
Article
Recently, listwise collaborative filtering (CF) algorithms have attracted increasing interest due to their efficiency and prediction quality. Unlike rating-oriented (pointwise) CF, they recommend a preference ranking of items to each user without estimating the absolute value of the ratings. In practice, there is extensive side information...
Preprint
Full-text available
Topic modelling has been a successful technique for text analysis for almost twenty years. When topic modelling met deep neural networks, there emerged a new and increasingly popular research area, neural topic models, with over a hundred models developed and a wide range of applications in neural language understanding such as text generation, sum...
Preprint
Full-text available
Supervised learning, characterized by both discriminative and generative learning, seeks to predict the values of single (or sometimes multiple) predefined target attributes based on a predefined set of predictor attributes. For applications where the information available and predictions to be made may vary from instance to instance, we propose th...
Article
Full-text available
Bayesian network classifiers are, functionally, an interesting class of models, because they can be learnt out-of-core, i.e. without needing to hold the whole training data in main memory. The selective K-dependence Bayesian network classifier (SKDB) is state of the art in this class of models and has been shown to rival random forest (RF) on problems w...
Preprint
Full-text available
In this paper, we present a new topic modelling approach via the theory of optimal transport (OT). Specifically, we present a document with two distributions: a distribution over the words (doc-word distribution) and a distribution over the topics (doc-topic distribution). For one document, the doc-word distribution is the observed, sparse, low-lev...
Preprint
Full-text available
Modern deep learning methods have equipped researchers and engineers with incredibly powerful tools to tackle problems that previously seemed impossible. However, since deep learning methods operate as black boxes, the uncertainty associated with their predictions is often challenging to quantify. Bayesian statistics offer a formalism to understand...
Preprint
Background: In the absence of a vaccine or curative treatment, non-pharmaceutical intervention (NPI) regimes have been implemented by governments around the world to slow the spread of COVID-19. The success of these NPIs has varied between countries and is likely to relate to the degree of uptake and adherence by the community. Understanding public atti...
Article
Full-text available
Background: Nonpharmaceutical interventions (NPIs) (such as wearing masks and social distancing) have been implemented by governments around the world to slow the spread of COVID-19. To promote public adherence to these regimes, governments need to understand the public perceptions and attitudes toward NPI regimes and the factors that influence them...
Chapter
Full-text available
Decision trees are still seeing use in online, non-stationary and embedded contexts, as well as for interpretability. For applications like ranking and cost-sensitive classification, probability estimation trees (PETs) are used. These are built using smoothing or calibration techniques. Older smoothing techniques used counts local to a leaf node, b...
Conference Paper
Decision trees are still seeing use in online, non-stationary and embedded contexts, as well as for interpretability. For applications like ranking and cost-sensitive classification, probability estimation trees (PETs) are used. These are built using smoothing or calibration techniques. Older smoothing techniques used counts local to a leaf node, b...
Article
Full-text available
Besides the text content, documents usually come with rich sets of meta-information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta-information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in...
Preprint
Full-text available
Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count/binary) data. With the ability of handling high-dimensional and sparse discrete data, models based on probabilistic matrix factorisation and latent factor analysis have enjoyed...
Chapter
Full-text available
Computing the probability of unseen documents is a natural evaluation task in topic modeling. Previous work has addressed this problem for the well-known Latent Dirichlet Allocation (LDA) model. However, the same problem for a more general class of topic models, referred here to as Gamma-Poisson Factor Analysis (GaP-FA), remains unexplored, which h...
Preprint
Full-text available
Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously stu...
Article
Full-text available
This paper introduces a novel parameter estimation method for the probability tables of Bayesian Network Classifiers (BNCs), using Hierarchical Dirichlet Processes (HDPs). The main result of this paper is to show that proper parameter estimation allows BNCs to outperform leading learning methods such as Random Forest for both 0-1 loss and RMSE, alb...
Article
A rich variety of models are now in use for unsupervised modelling of text documents, and, in particular, a rich variety of graphical models exist, with and without latent variables. To date, there is inadequate understanding about the comparative performance of these, partly because they are subtly different, and they have been proposed and evalua...
Article
Full-text available
The questions in a crowdsourcing task typically exhibit varying degrees of difficulty and subjectivity. Their joint effects give rise to the variation in responses to the same question by different crowd-workers. This variation is low when the question is easy to answer and objective, and high when it is difficult and subjective. Unfortunately, cur...
Article
Full-text available
Recent advances have demonstrated substantial benefits from learning with both generative and discriminative parameters. On the one hand, generative approaches address the estimation of the parameters of the joint distribution—\(\mathrm{P}(y,\mathbf{x})\), which for most network types is very computationally efficient (a notable exception to this a...
Article
Full-text available
Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and...
Article
Full-text available
Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes encoded in binary form in relational models with Poisson likelihood....
Article
Full-text available
Twitter data is extremely noisy -- each tweet is short, unstructured and with informal language, a challenge for current topic modeling. On the other hand, tweets are accompanied by extra information such as authorship, hashtags and the user-follower network. Exploiting this additional information, we propose the Twitter-Network (TN) topic model to...
Conference Paper
Despite the growing importance of exploratory search, information retrieval (IR) systems tend to focus on lookup search. Lookup searches are well served by optimising the precision and recall of search results; however, for exploratory search this may be counterproductive if users are unable to formulate an appropriate search query. We present a sy...
Article
The Dirichlet process and its extension, the Pitman–Yor process, are stochastic processes that take probability distributions as a parameter. These processes can be stacked up to form a hierarchical nonparametric Bayesian model. In this article, we present efficient methods for the use of these processes in this hierarchical context, and apply them...
Article
Full-text available
Despite significant advances in Clinical Decision Support Systems, they have not been extensively used in nursing practice to date. One key problem is the failure of these systems to fully support actionable nursing practices that guide nurse decision-making. In addition, current workflow-related systems have failed to consider the specific workflo...
Article
Full-text available
Bibliographic analysis considers the author’s research areas, the citation network and the paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents, using a non-parametric extension of a combination of the Poisson mixed-topic link model and the auth...
Conference Paper
With the rapid development of clustering analysis technology, many application-specific clustering algorithms have emerged, such as those for text clustering. The K-Means algorithm, one of the classic clustering algorithms and one commonly used for clustering text documents, is widely used because of its simple...
Conference Paper
With the continuous development of network and database technology, sharing and converting heterogeneous data remains a complex problem. Current methods mostly rely on the similarity between fields to complete the matching process; although this approach can achieve partial matches, its time complexity is ver...
Conference Paper
Building on policy research and semantic analysis of documents, we put forward a method for mining effective latent policy lineage relationships. We apply factor space theory to policy research and propose a concept-factor decomposition method. By combining concepts, we put forward a generic extraction method to mine latent genes....
Article
Full-text available
The 5th Asian Conference on Machine Learning (ACML 2013) was held on 13-15 November 2013, at the Australian National University, Canberra, Australia. ACML aims at providing a leading international forum for researchers in machine learning and related fields to share their new ideas and achievements. The conference called for research papers reporti...
Article
In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and simil...
Conference Paper
Full-text available
Topic modeling is an unsupervised machine learning task of discovering topics, the underlying thematic structure in a text corpus. Dynamic topic models are capable of analysing the time evolution of topics. This paper explores the application of dynamic topic models on emergency department triage notes to identify particular types of disease or inj...
Conference Paper
Full-text available
Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state of the art is achieved with Latent Dirichlet Allocation (LDA)-based models. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from app...
Conference Paper
Full-text available
In topic modelling, various alternative priors have been developed, for instance asymmetric and symmetric priors for the document-topic and topic-word matrices respectively, the hierarchical Dirichlet process prior for the document-topic matrix and the hierarchical Pitman-Yor process prior for the topic-word matrix. For information retrieval, lan-...
Article
Full-text available
Bibliographic analysis considers author’s research areas, the citation network and paper content among other things. In this paper, we combine these three in a topic model that produces a bibliographic model of authors, topics and documents using a non-parametric extension of a combination of the Poisson mixed-topic link model and the author-topic...
Article
In recent years, research on location predictions by mining trajectories of users has attracted a lot of attention. Existing studies on this topic mostly treat such predictions as just a type of location recommendation, that is, they predict the next ...
Conference Paper
Full-text available
Twitter, or the world of 140 characters, poses serious challenges to the efficacy of topic models on short, messy text. While topic models such as Latent Dirichlet Allocation (LDA) have a long history of successful application to news articles and academic abstracts, they are often less coherent when applied to microblog content like Twitter. In thi...
Article
Full-text available
Decision tree induction systems are being used for knowledge acquisition in noisy domains. This paper develops a subjective Bayesian interpretation of the task tackled by these systems and the heuristic methods they use. It is argued that decision tree systems implicitly incorporate a prior belief that the simpler (in terms of decision tree complex...
Article
Full-text available
This paper presents a plausible reasoning system to illustrate some broad issues in knowledge representation: dualities between different reasoning forms, the difficulty of unifying complementary reasoning styles, and the approximate nature of plausible reasoning. These issues have a common underlying theme: there should be an underlying belief cal...
Article
Full-text available
Chain graphs combine directed and undirected graphs and their underlying mathematics combines properties of the two. This paper gives a simplified definition of chain graphs based on a hierarchical combination of Bayesian (directed) and Markov (undirected) networks. Examples of a chain graph are multivariate feed-forward networks, clustering with c...
Conference Paper
Full-text available
We present a new hierarchical Bayesian model for unsupervised topic segmentation. This new model integrates a point-wise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segment(s)....
Conference Paper
Full-text available
We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichl...
Conference Paper
Twitter data is extremely noisy – each tweet is short, unstructured and with informal language, a challenge for current topic modeling. On the other hand, tweets are accompanied by extra information such as authorship, hashtags and the user-follower network. Exploiting this additional information, we propose the Twitter-Network (TN) topic model to...
Conference Paper
Full-text available
We extend the Bayesian skill rating system of TrueSkill to accommodate score-based match outcomes. TrueSkill has proven to be a very effective algorithm for matchmaking -- the process of pairing competitors based on similar skill-level -- in competitive online gaming. However, for the case of two teams/players, TrueSkill only learns from win, lose,...
Article
Full-text available
Methods for analysis of principal components in discrete data have existed for some time under various names such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory, and present some application of these methods to some...
Article
Full-text available
We develop dependent hierarchical normalized random measures and apply them to dynamic topic modeling. The dependency arises via superposition, subsampling and point transition on the underlying Poisson processes of these measures. The measures used include normalised generalised Gamma processes that demonstrate power law properties, unlike Dirichl...
Article
Full-text available
Understanding how topics within a document evolve over the structure of the document is an interesting and potentially important problem in exploratory and predictive text analytics. In this article, we address this problem by presenting a novel variant of latent Dirichlet allocation (LDA): Sequential LDA (SeqLDA). This variant directly considers t...
Article
Full-text available
This paper presents theory for No