About
101 Publications · 15,985 Reads
3,786 Citations

Publications (101)
We propose a general-purpose approximation to the Ferguson-Klass algorithm for generating samples from Lévy processes without Gaussian components. We show that the proposed method is more than 1000 times faster than the standard Ferguson-Klass algorithm without a significant loss of precision. This method can open an avenue for computationally ef...
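As a point of reference for this line of work, the sketch below implements the classical Ferguson-Klass construction (not the paper's accelerated approximation) for a gamma completely random measure: jump sizes are found by numerically inverting the Lévy tail integral N(x) = a·E1(x) at the arrival times of a unit-rate Poisson process. The function name and parameter values are illustrative only.

```python
# Minimal sketch of the classical Ferguson-Klass construction for a
# gamma completely random measure (total mass a). Jumps are obtained by
# inverting N(x) = a * E1(x) at unit-rate Poisson arrival times.
import numpy as np
from scipy.special import exp1
from scipy.optimize import brentq

def ferguson_klass_gamma_jumps(a=1.0, n_jumps=50, seed=0):
    rng = np.random.default_rng(seed)
    # Arrival times of a unit-rate Poisson process: Gamma_1 < Gamma_2 < ...
    arrivals = np.cumsum(rng.exponential(size=n_jumps))
    jumps = np.empty(n_jumps)
    for i, g in enumerate(arrivals):
        # Solve a * E1(x) = g for x; jump sizes are decreasing in i.
        jumps[i] = brentq(lambda x: a * exp1(x) - g, 1e-300, 1e3)
    return jumps

print(ferguson_klass_gamma_jumps()[:5])  # a few of the largest jumps
```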
Capture‐recapture (CR) data and corresponding models have been used extensively to estimate the size of wildlife populations when detection probability is less than 1. When the locations of traps or cameras used to capture or detect individuals are known, spatially‐explicit CR models are used to infer the spatial pattern of the individual locations...
DNA-based biodiversity surveys involve collecting physical samples from survey sites and assaying the contents in the laboratory to detect species via their diagnostic DNA sequences. DNA-based surveys are increasingly being adopted for biodiversity monitoring. The most commonly employed method is metabarcoding, which combines PCR with high-throughp...
Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analysing these data. Although these models have been parameterised and fitted using different ap...
We introduce a Loss Discounting Framework for model and forecast combination which generalises and combines Bayesian model synthesis and generalized Bayes methodologies. We use a loss function to score the performance of different models and introduce a multilevel discounting scheme which allows a flexible specification of the dynamics of the model...
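As a rough illustration of the general idea (not the paper's multilevel discounting scheme), the sketch below forms combination weights by exponentially discounting each model's past losses; the function name, discount value and loss paths are assumptions.

```python
# A minimal illustration: combine K forecasters by exponentially
# discounting their past losses, so models with smaller recent loss
# receive larger weight.
import numpy as np

def discounted_weights(losses, delta=0.95):
    """losses: (T, K) array of per-period losses for K models.
    Returns (T, K) weights; row t uses losses up to and including t."""
    T, K = losses.shape
    cum = np.zeros(K)
    weights = np.empty((T, K))
    for t in range(T):
        cum = delta * cum + losses[t]          # discounted cumulative loss
        w = np.exp(-(cum - cum.min()))         # stabilised exponential weighting
        weights[t] = w / w.sum()
    return weights

rng = np.random.default_rng(1)
fake_losses = rng.gamma(2.0, size=(100, 3))    # hypothetical loss paths
print(discounted_weights(fake_losses)[-1])     # latest combination weights
```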
The distribution assessment and monitoring of species is key to reliable environmental impact assessments and conservation interventions. Considerable effort is directed towards survey and monitoring of great crested newts (Triturus cristatus) in England. Surveys are increasingly undertaken using indirect methodologies, such as environmental DNA (e...
Environmental DNA (eDNA) surveys have become a popular tool for assessing the distribution of species. However, it is known that false positive and false negative observation error can occur at both stages of eDNA surveys, namely the field sampling stage and laboratory analysis stage. We present an RShiny app that implements the Griffin et al. (202...
Cyber security is an important concern for all individuals, organisations and governments globally. Cyber attacks have become more sophisticated, frequent and dangerous than ever, and traditional anomaly detection methods have proven less effective when dealing with these new classes of cyber threats. In order to address this, both class...
Ecological surveys risk incurring false negative and false positive detections of the target species. With indirect survey methods, such as environmental DNA, such error can occur at two stages: sample collection and laboratory analysis. Here we analyse a large qPCR based eDNA data set using two occupancy models, one of which accounts for false pos...
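For orientation, the sketch below writes down a much simpler, single-stage occupancy likelihood with false positive and false negative detection (in the spirit of Royle and Link, 2006), not the two-stage eDNA models analysed in the paper; the simulated data and parameter names are hypothetical.

```python
# Simplified occupancy likelihood with false positives (single detection
# stage only). psi = occupancy probability, p11 = detection probability at
# occupied sites, p10 = false-positive probability at unoccupied sites.
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, y):
    """y: (n_sites, n_visits) binary detection histories."""
    psi, p11, p10 = 1 / (1 + np.exp(-np.asarray(params)))  # logit scale
    k = y.sum(axis=1)          # detections per site
    J = y.shape[1]
    lik_occ = p11 ** k * (1 - p11) ** (J - k)
    lik_unocc = p10 ** k * (1 - p10) ** (J - k)
    site_lik = psi * lik_occ + (1 - psi) * lik_unocc
    return -np.sum(np.log(site_lik))

# Hypothetical data and a crude maximum likelihood fit.
rng = np.random.default_rng(2)
z = rng.random(200) < 0.4                          # true occupancy
y = (rng.random((200, 4)) < np.where(z[:, None], 0.7, 0.05)).astype(int)
x0 = np.array([0.0, 1.0, -2.0])                    # asymmetric start separates p11, p10
fit = minimize(neg_log_lik, x0=x0, args=(y,))
print(1 / (1 + np.exp(-fit.x)))                    # psi, p11, p10 estimates
```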
This paper reviews global-local prior distributions for Bayesian inference in high-dimensional regression problems including important properties of priors and efficient Markov chain Monte Carlo methods for inference. A chemometric example in drug discovery is used to compare the predictive performance of these methods with popular methods such as...
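One concrete member of the global-local family reviewed here is the horseshoe prior (Carvalho et al., 2010); the sketch below simply draws regression coefficients from that prior to show the heavy-tailed, shrink-to-zero behaviour. It is illustrative only and not tied to the chemometric example.

```python
# Draws from the horseshoe prior: beta_j | lambda_j, tau ~ N(0, tau^2 lambda_j^2)
# with half-Cauchy local scales lambda_j and global scale tau.
import numpy as np

def sample_horseshoe_prior(p, n_draws=1, seed=0):
    rng = np.random.default_rng(seed)
    tau = np.abs(rng.standard_cauchy(size=(n_draws, 1)))   # global scale
    lam = np.abs(rng.standard_cauchy(size=(n_draws, p)))   # local scales
    return rng.normal(0.0, 1.0, size=(n_draws, p)) * tau * lam

draws = sample_horseshoe_prior(p=10, n_draws=5)
print(draws.round(2))   # heavy tails plus many values shrunk near zero
```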
The Probability of Informed Trading (PIN) is a widely used indicator of information asymmetry risk in the trading of securities. Its estimation using maximum likelihood algorithms has been shown to be problematic, resulting in biased or unavailable estimates, especially in the case of liquid and frequently traded assets. We provide an alternative a...
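For context, the sketch below evaluates the standard EKOP mixture-of-Poissons likelihood that underlies PIN (Easley et al., 1996) on daily buy/sell counts and computes PIN from its parameters; the estimation route proposed in the paper differs, and the data here are simulated placeholders.

```python
# EKOP daily likelihood: with prob. alpha an information event occurs
# (bad news with prob. delta); informed trading adds intensity mu to the
# buy or sell side on top of uninformed rates eps_b, eps_s.
import numpy as np
from scipy.stats import poisson
from scipy.special import logsumexp

def pin_log_lik(params, buys, sells):
    alpha, delta, mu, eps_b, eps_s = params
    no_event = np.log(1 - alpha) + poisson.logpmf(buys, eps_b) \
               + poisson.logpmf(sells, eps_s)
    good_news = np.log(alpha * (1 - delta)) + poisson.logpmf(buys, eps_b + mu) \
                + poisson.logpmf(sells, eps_s)
    bad_news = np.log(alpha * delta) + poisson.logpmf(buys, eps_b) \
               + poisson.logpmf(sells, eps_s + mu)
    return logsumexp(np.vstack([no_event, good_news, bad_news]), axis=0).sum()

def pin(params):
    alpha, delta, mu, eps_b, eps_s = params
    return alpha * mu / (alpha * mu + eps_b + eps_s)

theta = (0.3, 0.5, 80.0, 200.0, 210.0)      # hypothetical parameter values
rng = np.random.default_rng(3)
buys, sells = rng.poisson(200.0, size=60), rng.poisson(210.0, size=60)
print(pin_log_lik(theta, buys, sells), pin(theta))
```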
Bayesian variable selection is an important method for discovering variables which are most useful for explaining the variation in a response. The widespread use of this method has been restricted by the challenging computational problem of sampling from the corresponding posterior distribution. Recently, the use of adaptive Monte Carlo methods has...
We present a novel Bayesian nonparametric model for regression in survival analysis. Our model builds on the classical neutral to the right model of Doksum (1974) and on the Cox proportional hazards model of Kim and Lee (2003). The use of a vector of dependent Bayesian nonparametric priors allows us to efficiently model the hazard as a function of...
Environmental DNA (eDNA) surveys have become a popular tool for assessing the distribution of species. However, it is known that false positive and false negative observation error can occur at both stages of eDNA surveys, namely the field sampling stage and laboratory analysis stage. We present an RShiny app that implements the Griffin et al. (201...
Detecting and correctly classifying malicious executables has become one of the major concerns in cyber security, especially because traditional detection systems have become less effective with the increasing number and danger of threats found nowadays. One way to differentiate benign from malicious executables is to leverage on their hexadecimal...
Wildlife monitoring for open populations can be performed using a number of different survey methods. Each survey method gives rise to a type of data and, in the last five decades, a large number of associated statistical models have been developed for analysing these data. Although these models have been parameterised and fitted using different ap...
We present a novel nonparametric Bayesian approach for performing cluster analysis in a context where observational units have data arising from multiple sources. Our approach uses a particle Gibbs sampler for inference in which cluster allocations are jointly updated using a conditional particle filter within a Gibbs sampler, improving the mixing...
We consider jointly modelling a finite collection of quantiles over time. Formal Bayesian inference on quantiles is challenging since we need access to both the quantile function and the likelihood. We propose a flexible Bayesian time-varying transformation model, which allows the likelihood and the quantile function to be directly calculated. We d...
Nowadays cyber security is an important concern for all individuals, organisations and governments globally. Cyber attacks have become more sophisticated, frequent and more dangerous than ever, and traditional anomaly detection methods have been proven to be less effective when dealing with these new classes of cyber attacks. In order to address th...
The efficient use of testing resources is crucial in the fight against doping in sports. The athlete biological passport relies on identifying the right athletes to test, and the right time to test them. Here we present an approach to longitudinal tracking of athlete performance to provide an additional, more intelligence-led approach to i...
Environmental DNA is a survey tool with rapidly expanding applications for assessing the presence of a species at surveyed sites. Environmental DNA methodology is known to be prone to false negative and false positive errors at the data collection and laboratory analysis stages. Existing models for environmental DNA data require augmentation with a...
Many ecological sampling schemes do not allow for unique marking of individuals. Instead, only counts of individuals detected on each sampling occasion are available. In this paper, we propose a novel approach for modelling count data in an open population where individuals can arrive and depart from the site during the sampling period. A Bayesian...
We present particleMDI, a Julia package for performing integrative cluster analysis on multiple heterogeneous data sets, built within the framework of multiple data integration (MDI). particleMDI updates cluster allocations using a particle Gibbs approach which offers better mixing of the MCMC chain—but at greater computational cost—than the origin...
Variance estimation is central to many questions in finance and economics. Until now ex post variance estimation has been based on infill asymptotic assumptions that exploit high-frequency data. This article offers a new exact finite sample approach to estimating ex post variance using Bayesian nonparametric methods. In contrast to the classical co...
Many ecological sampling schemes do not allow for unique marking of individuals. Instead, only counts of individuals detected on each sampling occasion are available. In this paper, we propose a novel approach for modelling count data in an open population where individuals can arrive and depart from the site during the sampling period. A Bayesian...
We present particleMDI, a Julia package for performing integrative cluster analysis on multiple heterogeneous data sets, built within the framework of multiple data integration (MDI). particleMDI updates cluster allocations using a particle Gibbs approach which offers better mixing of the MCMC chain—but at greater computational cost—than the origin...
This paper shows that rejection sampling with a two-piece Lévy intensity envelope can outperform both the Ferguson-Klass algorithm and previously proposed envelopes for simulating realisations of completely random measures typically used in Bayesian nonparametric statistics.
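The sketch below shows generic rejection sampling with a dominating envelope, the basic mechanism that the paper's two-piece Lévy intensity envelope refines; the Beta target and uniform envelope are purely illustrative.

```python
# Generic rejection sampling: draw from a target density f using an
# envelope g with f <= M * g; accept a proposal x with probability
# f(x) / (M * g(x)).
import numpy as np
from scipy.stats import beta

def rejection_sample(f, g_sample, g_pdf, M, n, seed=0):
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = g_sample(rng)
        if rng.random() * M * g_pdf(x) <= f(x):
            out.append(x)
    return np.array(out)

# Example: a Beta(2, 5) target under a uniform envelope on (0, 1).
target = beta(2, 5).pdf
M = beta(2, 5).pdf(0.2) + 0.1            # any constant bounding the density
draws = rejection_sample(target, lambda r: r.random(), lambda x: 1.0, M, 1000)
print(draws.mean())                       # roughly 2 / (2 + 5)
```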
Vector autoregressive (VAR) models are the main work-horse model for macroeconomic forecasting, and provide a framework for the analysis of complex dynamics that are present between macroeconomic variables. Whether a classical or a Bayesian approach is adopted, most VAR models are linear with Gaussian innovations. This can limit the model’s ability...
The Probability of Informed Trading (PIN) is a widely used indicator of information asymmetry risk in the trading of securities. Its estimation using maximum likelihood algorithms has been shown to be problematic, resulting in biased estimates, especially in the case of liquid and frequently traded assets. We provide an alternative approach to esti...
The availability of data sets with large numbers of variables is rapidly increasing. The effective application of Bayesian variable selection methods for regression with these data sets has proved difficult since available Markov chain Monte Carlo methods do not perform well in typical problem sizes of interest. The current paper proposes new adapt...
The use of Bayesian nonparametric models has increased rapidly over the last few decades driven by increasing computational power and the development of efficient Markov chain Monte Carlo algorithms. We review some applications of these models in economics, including volatility modelling (using both stochastic volatility models and GAR...
Financial prices are usually modelled as continuous, often involving geometric Brownian motion with drift, leverage and possibly jump components. An alternative modelling approach allows financial observations to take integer values that are multiples of a fixed quantity, the ticksize - the monetary value associated with a single change during the...
Normalized random measures with independent increments are a tractable and wide class of nonparametric prior. Sequential Monte Carlo methods are developed for both conjugate and non-conjugate models. Methods for improving efficiency by including Markov chain Monte Carlo steps without increasing computational complexity are discussed. A simulation...
We consider jointly modelling a finite collection of quantiles over time under a Bayesian nonparametric framework. Formal Bayesian inference on quantiles is challenging since we need access to both the quantile function and the likelihood (which is given by the derivative of the inverse quantile function). We propose a flexible Bayesian transformat...
Normalized compound random measures are flexible nonparametric priors for related distributions. We consider building general nonparametric regression models using normalized compound random measure mixture models. We develop a general approach to the unbiased estimation of Laplace functionals of compound random measures (which include completely r...
In some linear models, such as those with interactions, it is natural to include the relationship between the regression coefficients in the analysis. In this paper, we consider how robust hierarchical continuous prior distributions can be used to express dependence between the size but not the sign of the regression coefficients. For example, to i...
We explore time variation in the shape of the conditional return distribution using a model of multiple quantiles. We propose a joint model of scale (proxied by the interquartile range) and other quantiles standardised by the scale. The model allows us to capture the scale and shape of the distribution in one step without making assumptions about t...
The importance of including jumps in prices in models of financial returns has long been understood. However, there has been relatively little work considering the dynamics of the jumps. In this paper, a stochastic volatility model with jumps is developed in which the jumps follow a Hawkes process which depends on the volatility process. Inference...
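As background, the sketch below simulates a simple self-exciting (Hawkes) process with exponential kernel via Ogata's thinning algorithm; the paper couples such jump dynamics to a volatility process, which this illustration does not attempt, and all parameter values are arbitrary.

```python
# Simulate a Hawkes process with intensity
# lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))
# using Ogata's thinning with a conservative piecewise-constant bound.
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=0):
    rng = np.random.default_rng(seed)
    events, t = [], 0.0
    while True:
        # Upper bound on the intensity until the next accepted event.
        lam_bar = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events) + alpha
        t = t + rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam_t = mu + sum(alpha * np.exp(-beta * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:   # thinning step
            events.append(t)
    return np.array(events)

ev = simulate_hawkes(mu=0.5, alpha=0.8, beta=2.0, T=100.0, seed=4)
print(len(ev), "events; stationary rate approx", 0.5 / (1 - 0.8 / 2.0))
```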
Vector autoregressive (VAR) models are the main work-horse model for macroeconomic forecasting, and provide a framework for the analysis of complex dynamics that are present between macroeconomic variables. Whether a classical or a Bayesian approach is adopted, most VAR models are linear with Gaussian innovations. This can limit the model’s ability...
The increasing size of data sets has led to variable selection in regression becoming increasingly important. Bayesian approaches are attractive since they allow uncertainty about the choice of variables to be formally included in the analysis. The application of fully Bayesian variable selection methods to large data sets is computationally chall...
This paper describes a new class of dependent random measures which we call compound random measures and the use of normalized versions of these random measures as priors in Bayesian nonparametric mixture models. Their tractability allows the properties of both compound random measures and normalized compound random measures to be derived. In partic...
This paper presents a method for adaptation in Metropolis-Hastings algorithms. A product of a proposal density and K copies of the target density is used to define a joint density which is sampled by a Gibbs sampler including a Metropolis step. This provides a framework for adaptation since the current value of all K copies of the target distribut...
A wide range of methods, Bayesian and others, tackle regression when there are many variables. In the Bayesian context, the prior is constructed to reflect ideas of variable selection and to encourage appropriate shrinkage. The prior needs to be reasonably robust to different signal to noise structures. Two simple evergreen prior constructions stem...
Many exact Markov chain Monte Carlo algorithms have been developed for posterior inference in Bayesian nonparametric models which involve infinite-dimensional priors. However, these methods are not generic and special methodology must be developed for different classes of prior or different models. Alternatively, the infinite-dimensional prior can...
Sparse regression problems, where it is usually assumed that there are many variables and that the effects of a large subset of variables are negligible, have become increasingly important. This paper describes the construction of hierarchical prior distributions when the effects are considered related. These priors allow dependence between the reg...
The MC$^3$ (Madigan and York, 1995) and Gibbs (George and McCulloch, 1997) samplers are the most widely implemented algorithms for Bayesian Model Averaging (BMA) in linear regression models. These samplers draw a variable at random in each iteration using uniform selection probabilities and then propose to update that variable. This may be computat...
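For reference, the sketch below implements a bare-bones add/delete model-space sampler of the kind described here, with uniform selection probabilities and a BIC approximation in place of a proper marginal likelihood; it illustrates the baseline scheme being discussed rather than any paper's proposal.

```python
# Add/delete sampler over regression models: flip one variable chosen
# uniformly at random, accept with a Metropolis step on an approximate
# log marginal likelihood (here -0.5 * BIC).
import numpy as np

def bic_score(X, y, gamma):
    idx = np.flatnonzero(gamma)
    n = len(y)
    if idx.size == 0:
        resid = y - y.mean()
    else:
        Xg = np.column_stack([np.ones(n), X[:, idx]])
        beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        resid = y - Xg @ beta
    rss = resid @ resid
    return -0.5 * (n * np.log(rss / n) + (idx.size + 1) * np.log(n))

def mc3(X, y, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=int)
    score = bic_score(X, y, gamma)
    incl = np.zeros(p)
    for _ in range(n_iter):
        j = rng.integers(p)                  # uniform add/delete proposal
        prop = gamma.copy()
        prop[j] = 1 - prop[j]
        prop_score = bic_score(X, y, prop)
        if np.log(rng.random()) < prop_score - score:
            gamma, score = prop, prop_score
        incl += gamma
    return incl / n_iter                     # estimated inclusion frequencies

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 2 - X[:, 3] + rng.normal(size=100)
print(mc3(X, y).round(2))
```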
This article describes methods for efficient posterior simulation for Bayesian variable selection in Generalized Linear Models with many regressors but few observations. The algorithms use a proposal on model space which contains a tuneable parameter. An adaptive approach to choosing this tuning parameter is described which allows automatic, effici...
A methodology for the simultaneous Bayesian non-parametric modelling of several distributions is developed. Our approach uses normalized random measures with independent increments and builds dependence through the superposition of shared processes. The properties of the prior are described and the modelling possibilities of this framework are expl...
We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes,...
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, wh...
This paper develops a rich class of sparsity priors for regression effects that encourage shrinkage of both regression effects and contrasts between effects to zero whilst leaving sizeable real effects largely unshrunk. The construction of these priors uses some properties of normal-gamma distributions to include design features in the prior specif...
This paper examines prior choice in probit regression through a predictive cross-validation criterion. In particular, we focus on situations where the number of potential covariates is far larger than the number of observations, such as in gene expression data. Cross-validation avoids the tendency of such models to fit perfectly. We choose the scal...
We propose a novel Bayesian method for dynamic regression models where both the value of the regression coefficients and the variables selected change over time. We focus on the parsimony and forecasting performance of these models and develop a prior which allows the shrinkage of the regression coefficients to change over time. An efficient MCMC m...
The Lasso has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as for shrinkage. We are particularly interested in the more-variables-than-observations case of characteristic importance for modern data. The Bayesian interpretation of the Lasso as the maximum a posteriori estimate of the regression co...
This article is divided into two parts. The first part considers flexible parametric models while the latter is nonparametric. It gives applications to regional growth data and semiparametric estimation of binomial proportions. It reviews methods for flexible mean regression, using either basis functions or Gaussian processes. This article also di...
In stochastic frontier analysis, firm-specific efficiencies and their distribution are often main variables of interest. If firms fall into several groups, it is natural to allow each group to have its own distribution. This paper considers a method for nonparametrically modelling these distributions using Dirichlet processes. A common problem when...
A Bayesian semiparametric stochastic volatility model for financial data is developed. This nonparametrically estimates the return distribution from the data allowing for stylized facts such as heavy tails of the distribution of returns whilst also allowing for correlation between the returns and changes in volatility, which is usually termed the l...
This paper introduces a new class of time-varying, measure-valued stochastic processes for Bayesian nonparametric inference. The class of priors generalizes the normalized random measure (Kingman 1975) construction for static problems. The unnormalized measure on any measurable set follows an Ornstein-Uhlenbeck process as described by Barndorff-Ni...
A multivariate distribution which generalizes the Dirichlet distribution is introduced and its use for modeling overdispersion in count data is discussed. The distribution is constructed by normalizing a vector of independent tempered stable random variables. General formulae for all moments and cross-moments of the distribution are derived and the...
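The simplest instance of the normalization construction is worth keeping in mind: normalizing independent Gamma variables yields a Dirichlet vector, and the paper replaces the Gamma components with tempered stable ones. The sketch below shows the Gamma case only.

```python
# Dirichlet draws obtained by normalizing independent Gamma variables.
import numpy as np

def dirichlet_via_gammas(alphas, n_draws, seed=0):
    rng = np.random.default_rng(seed)
    g = rng.gamma(shape=np.asarray(alphas), size=(n_draws, len(alphas)))
    return g / g.sum(axis=1, keepdims=True)

draws = dirichlet_via_gammas([2.0, 3.0, 5.0], n_draws=10000)
print(draws.mean(axis=0))   # close to alphas / sum(alphas) = [0.2, 0.3, 0.5]
```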
This paper describes a Bayesian nonparametric approach to volatility estimation. Volatility is assumed to follow a superposition of an infinite number of Ornstein–Uhlenbeck processes driven by a compound Poisson process with a parametric or nonparametric jump size distribution. This model allows a wide range of possible dependencies and marginal di...
This paper considers the problem of defining a time-dependent nonparametric prior for use in Bayesian nonparametric modelling of time series. A recursive construction allows the definition of priors whose marginals have a general stick-breaking form. The processes with Poisson-Dirichlet and Dirichlet process marginals are investigated in some detai...
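The static building block being generalized here is the stick-breaking (Sethuraman) representation of a Dirichlet process draw; a minimal sketch of those weights follows, with an assumed standard normal base measure.

```python
# Stick-breaking weights of a Dirichlet process draw:
# w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha).
import numpy as np

def stick_breaking_weights(alpha, n_atoms, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=n_atoms)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

w = stick_breaking_weights(alpha=2.0, n_atoms=100)
atoms = np.random.default_rng(1).normal(size=100)   # atoms from a N(0,1) base
print(w.sum(), atoms[np.argmax(w)])   # total weight kept and the heaviest atom
```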
In this paper, we consider the problem of modelling a pair of related distributions using Bayesian nonparametric methods. A representation of the distributions as weighted sums of distributions is derived through normalisation. This allows us to define several classes of nonparametric priors. The properties of these distributions are explored and...
This paper proposes a novel volatility model that draws from the existing literature on autoregressive stochastic volatility models, aggregation of autoregressive processes, and Bayesian nonparametric modelling to create a dynamic SV model that can explain long range dependence. The volatility process is assumed to be the aggregate of autoregressiv...
We propose a more efficient version of the slice sampler for Dirichlet process mixture models described by Walker (Commun. Stat., Simul. Comput. 36:45–54, 2007). This new sampler allows for the fitting of infinite mixture models with a wide range of prior specifications. To illustrate this flexibility we consider priors defined through infinite seq...
In this paper we discuss the problem of Bayesian fully nonparametric regression. A new construction of priors for nonparametric regression is discussed and a specific prior, the Dirichlet Process Regression Smoother, is proposed. We consider the problem of centring our process over a class of regression models and propose fully nonparametric regres...
This paper presents a method for Bayesian nonparametric analysis of the return distribution in a stochastic volatility model. The distribution of the logarithm of the squared return is flexibly modelled using an infinite mixture of Normal distributions. This allows efficient Markov chain Monte Carlo methods to be developed. Links between the return...
We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily...
The infinite mixture of normals model has become a popular method for density estimation problems. This paper proposes an alternative hierarchical model that leads to hyperparameters that can be interpreted as the location, scale and smoothness of the density. The priors on other parts of the model have little effect on the density estimates and ca...
This paper considers the effects of placing an absolutely continuous prior distribution on the regression coefficients of a linear model. We show that the posterior expectation is a matrix-shrunken version of the least squares estimate where the shrinkage matrix depends on the derivatives of the prior predictive density of the least squares estima...
This article describes posterior simulation methods for mixture models whose mixing distribution has a Normalized Random Measure prior. The methods use slice sampling ideas and introduce no truncation error. The approach can be easily applied to both homogeneous and nonhomogeneous Normalized Random Measures and allows the updating of the parameters...
In this article we describe Bayesian nonparametric procedures for two-sample hypothesis testing. Namely, given two sets of samples y^{(1)} iid F^{(1)} and y^{(2)} iid F^{(2)}, with F^{(1)}, F^{(2)} unknown, we wish to evaluate the evidence for the null hypothesis H_{0}:F^{(1)} = F^{(2)} versus the alternative. Our method is based upon a nonparametr...
Model search in probit regression is often conducted by simultaneously exploring the model and parameter space, using a reversible jump MCMC sampler. Standard samplers often have low model acceptance probabilities when there are many more regressors than observations. Implementing recent suggestions in the literature leads to much higher acceptan...
This paper studies the problem of covariance estimation when prices are observed non-synchronously and contaminated by i.i.d. microstructure noise. We derive closed form expressions for the bias and variance of three popular covariance estimators, namely realised covariance, realised covariance plus lead and lag adjustments, and the Hayashi and Yos...
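Two of the estimators studied can be written down compactly; the sketch below computes realised covariance on a synchronized grid and the Hayashi-Yoshida estimator on asynchronous observations, using simulated correlated Brownian paths as placeholder data (no microstructure noise is added).

```python
# Realised covariance on synchronized returns, and the Hayashi-Yoshida
# estimator, which sums cross-products of returns over overlapping intervals.
import numpy as np

def realised_covariance(rx, ry):
    """rx, ry: synchronized return series of equal length."""
    return np.sum(np.asarray(rx) * np.asarray(ry))

def hayashi_yoshida(tx, x, ty, y):
    """tx, ty: increasing observation times; x, y: prices at those times."""
    dx, dy = np.diff(x), np.diff(y)
    hy = 0.0
    for i in range(len(dx)):
        for j in range(len(dy)):
            if min(tx[i + 1], ty[j + 1]) > max(tx[i], ty[j]):  # intervals overlap
                hy += dx[i] * dy[j]
    return hy

# Hypothetical asynchronous observations of two correlated Brownian paths.
rng = np.random.default_rng(6)
t = np.linspace(0, 1, 501)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500) / np.sqrt(500)
px = np.concatenate([[0], z[:, 0].cumsum()])
py = np.concatenate([[0], z[:, 1].cumsum()])
ix = np.unique(np.concatenate(([0, 500], rng.choice(501, 200, replace=False))))
iy = np.unique(np.concatenate(([0, 500], rng.choice(501, 150, replace=False))))
print(realised_covariance(np.diff(px), np.diff(py)),       # synchronized grid
      hayashi_yoshida(t[ix], px[ix], t[iy], py[iy]))        # both near 0.6
```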
Continuous superpositions of Ornstein-Uhlenbeck processes are proposed as a model for asset return volatility. An interesting class of continuous superpositions is defined by a Gamma mixing distribution which can define long memory processes. In contrast, previously studied discrete superpositions cannot generate this behaviour. Efficient Markov ch...
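For concreteness, the sketch below simulates a single Gamma-OU volatility factor of the kind such superpositions are built from, on a discrete grid that ignores within-interval jump decay; parameter values are arbitrary and this is not the paper's continuous superposition.

```python
# Gamma-OU volatility factor: jumps arrive from a compound Poisson process
# (rate lam * nu) with Exponential(alpha_) marks and decay at rate lam,
# giving a Gamma(nu, alpha_) stationary distribution.
import numpy as np

def simulate_gamma_ou(lam, nu, alpha_, T, dt=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    sigma2 = np.empty(n)
    sigma2[0] = rng.gamma(nu, 1.0 / alpha_)       # start in stationarity
    for t in range(1, n):
        decay = np.exp(-lam * dt) * sigma2[t - 1]
        n_jumps = rng.poisson(lam * nu * dt)      # jump arrivals in (t-dt, t]
        jumps = rng.exponential(1.0 / alpha_, size=n_jumps).sum()
        sigma2[t] = decay + jumps                 # within-interval decay ignored
    return sigma2

path = simulate_gamma_ou(lam=0.5, nu=2.0, alpha_=4.0, T=200.0)
print(path.mean())     # roughly nu / alpha_ = 0.5
```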
This article introduces a new model for transaction prices in the presence of market microstructure noise in order to study the properties of the price process on two different time scales, namely, transaction time where prices are sampled with every transaction and tick time where prices are sampled with every price change. Both sampling schemes h...
This paper introduces new and flexible classes of inefficiency distributions for stochastic frontier models. We consider both generalized gamma distributions and mixtures of generalized gamma distributions. These classes cover many interesting cases and accommodate both positively and negatively skewed composed error distributions. Bayesian method...
The lasso (Tibshirani, 1996) has sparked interest in the use of penalization of the log-likelihood for variable selection, as well as shrinkage. Recently, there have been attempts to propose penalty functions which improve upon the Lasso's properties for variable selection and prediction, such as SCAD (Fan and Li, 2001) and the Adaptive Lasso (Zou,...
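As a reminder of the baseline these penalties are compared against, the sketch below runs plain coordinate descent for the lasso, where each coefficient update is a soft-thresholding step; the data and tuning value are illustrative only.

```python
# Coordinate descent for the lasso objective (1/(2n))||y - Xb||^2 + lam*||b||_1:
# each coordinate update is soft-thresholding of the partial residual fit.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]    # partial residual
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_ss[j]
    return beta

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 4] + rng.normal(size=100)
print(lasso_cd(X, y, lam=0.1).round(2))    # mostly zeros except beta_0, beta_4
```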
Markov chain Monte Carlo (MCMC) methods have become a ubiquitous tool in Bayesian analysis. This paper implements MCMC methods for Bayesian analysis of stochastic frontier models using the WinBUGS package, a freely available software. General code for cross-sectional and panel data are presented and various ways of summarizing posterior inference a...
In this paper we propose a new framework for Bayesian nonparametric modelling with continuous covariates. In particular, we allow the nonparametric distribution to depend on covariates through ordering the random variables building the weights in the stick-breaking representation. We focus mostly on the class of random distributions which induces...
The problem of variable selection in regression and the generalised linear model is addressed. We adopt a Bayesian approach with priors for the regression coefficients that are scale mixtures of normal distributions and embody a high prior probability of proximity to zero. By seeking modal estimates we generalise the lasso. Properties of the prior...
In this paper we propose a semiparametric Bayesian framework for the analysis of stochastic frontiers and efficiency measurement. The distribution of inefficiencies is modelled nonparametrically through a Dirichlet process prior. We suggest prior distributions and implement a Bayesian analysis through an efficient Markov chain Monte Carlo sampler,...