## About

- 443 Publications
- 94,447 Reads
- 39,103 Citations

*How 'reads' are measured:* a 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text.

## Publications (443)

The class of location-scale finite mixtures is of enduring interest both from applied and theoretical perspectives of probability and statistics. We establish and prove the following results: to an arbitrary degree of accuracy, (a) location-scale mixtures of a continuous probability density function (PDF) can approximate any continuous PDF, uniform...
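The location-scale mixture class referred to here has the following standard form (notation is ours, not necessarily the paper's):

```latex
% m-component location-scale mixture of a base density g
f_m(x) \;=\; \sum_{i=1}^{m} \pi_i \, \frac{1}{\sigma_i}\,
        g\!\left(\frac{x - \mu_i}{\sigma_i}\right),
\qquad \pi_i \ge 0,\quad \sum_{i=1}^{m} \pi_i = 1,\quad \sigma_i > 0 .
```

Result (a) concerns the denseness of this class: with enough components, such mixtures built from a single continuous base density g can come arbitrarily close to any continuous target density.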

There has been increasing attention in machine learning to semi-supervised learning (SSL) approaches for forming a classifier in situations where the training data consist of a limited number of classified observations but a much larger number of unclassified observations. This is because the procurement of classified data can be q...

We consider the statistical analysis of heterogeneous data for clustering and prediction purposes, in situations where the observations include functions, typically time series. We extend the modeling with Mixtures-of-Experts (ME), as a framework of choice in modeling heterogeneity in data for prediction and clustering with vectorial observations,...

We supplement the article of Meng (2006) on the EM algorithm and its applications, providing also an update on its more recent developments and applications. The expectation–maximization algorithm, popularly known as the EM algorithm, is a general‐purpose algorithm for maximum‐likelihood estimation in a wide variety of situations best described as...
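The EM iteration summarized above can be illustrated with a minimal two-component univariate Gaussian mixture; this is a generic textbook sketch (variable names and initialization are ours), not code from the article:

```python
import numpy as np

def em_gmm2(x, n_iter=200):
    """Minimal EM for a two-component univariate Gaussian mixture."""
    # crude initialization from the data range
    pi, mu1, mu2 = 0.5, x.min(), x.max()
    s1 = s2 = x.var()
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1
        d1 = pi * np.exp(-(x - mu1) ** 2 / (2 * s1)) / np.sqrt(2 * np.pi * s1)
        d2 = (1 - pi) * np.exp(-(x - mu2) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        r = d1 / (d1 + d2)
        # M-step: responsibility-weighted maximum-likelihood updates
        pi = r.mean()
        mu1, mu2 = (r * x).sum() / r.sum(), ((1 - r) * x).sum() / (1 - r).sum()
        s1 = (r * (x - mu1) ** 2).sum() / r.sum()
        s2 = ((1 - r) * (x - mu2) ** 2).sum() / (1 - r).sum()
    return pi, mu1, mu2

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
pi, mu1, mu2 = em_gmm2(x)
```

Each iteration provably does not decrease the observed-data log-likelihood, which is the key monotonicity property of EM.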

Multimorbidity constitutes a serious challenge to healthcare systems worldwide, due to its association with poorer health-related outcomes, more complex clinical management, increased health service utilization and costs, and decreased productivity. However, to date, most evidence on multimorbidity is derived from cross-sectional stud...

Thin section microscopy has historically been used for modal mineralogy in exploration and for monitoring plant performance. Despite this, the technique relies on visual detection by expert mineralogists, which is error-prone and slow. Consequently, mineralogy characterisation has been largely replaced by automated mineralogy solutions like Scanni...

The statistical file-matching problem is a data integration problem with structured missing data. The general form involves the analysis of multiple datasets that only have a strict subset of variables jointly observed across all datasets. Missing-data imputation is complicated by the fact that the joint distribution of the variables is nonidentifi...

A flexible class of multivariate distributions called scale mixtures of fragmental normal (SMFN) distributions, is introduced. Its extension to the case of a finite mixture of SMFN (FM-SMFN) distributions is also proposed. The SMFN family of distributions is convenient and effective for modelling data with skewness, discrepant observations and popu...

The literature on non-normal model-based clustering has continued to grow in recent years. The non-normal models often take the form of a mixture of component densities that offer a high degree of flexibility in distributional shapes. They handle skewness in different ways, most typically by introducing latent ‘skewing’ variable(s), while some othe...

Data fusion involves the integration of multiple related datasets. The statistical file-matching problem is a canonical data fusion problem in multivariate analysis, where the objective is to characterise the joint distribution of a set of variables when only strict subsets of marginal distributions have been observed. Estimation of the covariance...

Mixture of experts (MoE) models are widely applied to conditional probability density estimation problems. We demonstrate the richness of the class of MoE models by proving denseness results in Lebesgue spaces, when input and output variables are both compactly supported. We further prove an almost uniform convergence result when the input is un...
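The MoE conditional density discussed here is typically written as a gated combination of expert densities (a standard softmax-gated formulation; notation ours):

```latex
p(y \mid x) \;=\; \sum_{k=1}^{K} g_k(x)\, f_k(y \mid x),
\qquad
g_k(x) \;=\; \frac{\exp\!\bigl(a_k + b_k^{\top} x\bigr)}
                  {\sum_{\ell=1}^{K} \exp\!\bigl(a_\ell + b_\ell^{\top} x\bigr)},
```

where the $f_k$ are expert densities (e.g., Gaussians with input-dependent means) and the gates $g_k$ partition the input space softly.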

Comminution circuits can be modelled using phenomenological models of individual unit operations to represent the operation performance. Process optimisation and block model variability analysis require millions of simulations to fully explore process efficiency through mine scheduling which is costly and time consuming. Surrogate modelling is a te...

Parametric distributions are an important part of statistics. There is now a voluminous literature on different fascinating formulations of flexible distributions. We present a selective and brief overview of a small subset of these distributions, focusing on those that are obtained by scaling the mean and/or covariance matrix of the (multivariate)...

Finite mixture models are powerful tools for modeling and analyzing heterogeneous data. Parameter estimation is typically carried out using maximum likelihood estimation via the Expectation–Maximization (EM) algorithm. Recently, the adoption of flexible distributions as component densities has become increasingly popular. Often, the EM algorithm fo...

Existing studies on spatial linear mixed models typically assume a normal distribution for the random error components. However, such an assumption may not be appropriate in many applications. This work relaxes the normality assumption of a generalized linear mixed model with spatial correlation by using an unrestricted multivariate skew-normal dist...

Manual labelling of training examples is common practice in supervised learning. When the labelling task is of non-trivial difficulty, the supplied labels may not be equal to the ground-truth labels, and label noise is introduced into the training dataset. If the manual annotation is carried out by multiple experts, the same training example can be...

The determination of the number of mixture components (the order) of a finite mixture model has been an enduring problem in statistical inference. We prove that the closed testing principle leads to a sequential testing procedure (STP) that allows for confidence statements to be made regarding the order of a finite mixture model. We construct finit...

We consider the situation where the observed sample contains some observations whose class of origin is known (that is, they are classified with respect to the g underlying classes of interest), and where the remaining observations in the sample are unclassified (that is, their class labels are unknown). For class-conditional distributions taken to...

Cytometry is frequently used in immunological research, pre-clinical trials, clinical diagnosis, and monitoring of lymphomas, leukemia, and AIDS. However, analysis of modern high-throughput cytometric data presents great challenges for current computational tools due to the high dimensionality, large number of observations, as well as complex distr...

In recent years, several mixtures of skew factor analyzers have been proposed. These models adopt various skew distributions for either the factors or the errors, but not both. This paper examines the connections between these formulations and introduces a unified model that allows for skewness in both the factors and errors.

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that of a classified feature which has known class label. Hence in the case where the absence of class...

Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in...

Mixtures of factor analyzers (MFA) provide a powerful tool for modelling high-dimensional datasets. In recent years, several generalizations of MFA have been developed where the normality assumption of the factors and/or of the errors were relaxed to allow for skewness in the data. However, due to the form of the adopted component densities, the di...

Mini-batch algorithms have become increasingly popular due to the requirement for solving optimization problems, based on large-scale data sets. Using an existing online expectation–maximization (EM) algorithm framework, we demonstrate how mini-batch (MB) algorithms may be constructed, and propose a scheme for the stochastic stabilization of the co...
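The mini-batch idea can be sketched for a two-component Gaussian mixture with known unit variances: run the E-step on each mini-batch and blend the sufficient statistics with a decreasing step size. This is a generic online-EM sketch under our own assumptions (step-size schedule, names), not the paper's stabilization scheme:

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 1, 5000), rng.normal(2, 1, 5000)])
rng.shuffle(data)

pi, mu = 0.5, np.array([-1.0, 1.0])       # initial guesses
s0, s1 = np.array([0.5, 0.5]), mu * 0.5   # running sufficient statistics
batch = 100
for t, start in enumerate(range(0, len(data), batch), start=1):
    x = data[start:start + batch]
    # E-step on the mini-batch (unit component variances assumed known)
    d = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * np.array([pi, 1 - pi])
    r = d / d.sum(axis=1, keepdims=True)
    gamma = t ** -0.6                      # Robbins-Monro step size
    # stochastic update of the sufficient statistics, then M-step
    s0 = (1 - gamma) * s0 + gamma * r.mean(axis=0)
    s1 = (1 - gamma) * s1 + gamma * (r * x[:, None]).mean(axis=0)
    pi, mu = s0[0] / s0.sum(), s1 / s0
```

Each pass touches only one mini-batch, so memory and per-step cost stay constant in the dataset size.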

Given sufficiently many components, it is often cited that finite mixture models can approximate any other probability density function (pdf) to an arbitrary degree of accuracy. Unfortunately, the nature of this approximation result is often left unclear. We prove that finite mixture models constructed from pdfs in $\mathcal{C}_0$ can be used...

Mixtures of skew component distributions are being applied widely to model and partition data into clusters that exhibit non-normal features such as asymmetry and tails heavier than the normal. The number of contributions on skew distributions are now so many that it is beyond the scope of this paper to include them all here. However, many of these...

As the COVID-19 pandemic spread worldwide, it has become clearer that prevalence of certain comorbidities in a given population could make it more vulnerable to serious outcomes of that disease, including fatality. Indeed, it might be insightful from a health policy perspective to identify clusters of populations in terms of the associations betwee...

In the study of multiple failure time data with recurrent clinical endpoints, the classical independent censoring assumption in survival analysis can be violated when the evolution of the recurrent events is correlated with a censoring mechanism such as death. Moreover, in some situations, a cure fraction appears in the data because a tangible prop...

Scale mixtures of normal (SMN) distributions, or normal variance mixture models, refer to a class of normal distributions where the covariance matrix is weighted by (a positive function of) a scale variable with a given prior distribution. Hence, the tails of these distributions can be regulated. They thus provide a robust alternative to the normal...
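A canonical SMN example is the Student-t: if $W \sim \mathrm{Gamma}(\nu/2, \nu/2)$ and $Z \mid W \sim N(0, 1/W)$, then $Z \sim t_\nu$. A quick Monte Carlo check (our illustration, using the known variance $\nu/(\nu-2)$ of the t distribution):

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 8.0
# scale variable W ~ Gamma(nu/2, rate nu/2); numpy's gamma takes shape/scale,
# so rate nu/2 corresponds to scale 2/nu
w = rng.gamma(shape=nu / 2, scale=2 / nu, size=200_000)
z = rng.normal(size=w.size) / np.sqrt(w)   # Z | W ~ N(0, 1/W)
# a t distribution with nu d.f. has variance nu / (nu - 2)
sample_var = z.var()
```

Heavier tails than the normal arise automatically because small draws of W inflate the conditional variance.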

There has been increasing interest in using semi-supervised learning to form a classifier. As is well known, the (Fisher) information in an unclassified feature with unknown class label is less (considerably less for weakly separated classes) than that of a classified feature which has known class label. Hence assuming that the labels of the unclas...

This paper proposes a new regression model for the analysis of spatial panel data in the case of spatial heterogeneity and non-normality. In empirical economic research, the normality of error components is a routine assumption for the models with continuous responses. However, such an assumption may not be appropriate in many applications. This wo...

Many medical studies yield data on recurrent clinical events from populations which consist of a proportion of cured patients in the presence of those who experience the event at several times (uncured). A frailty mixture cure model has recently been postulated for such data, with an assumption that the random subject effect (frailty) of each uncur...

Existing studies on spatial panel data models typically assume a normal distribution for the random error components. This assumption may not be appropriate in many applications. Here we consider a more flexible and powerful approach that generalizes the traditional model. We propose a skew-normal generalized spatial panel data model that adopts a...

We comment on the paper of Murray, Browne, and McNicholas (2017), who proposed mixtures of skew distributions, which they termed hidden truncation hyperbolic (HTH). They recently made a clarification (Murray, Browne, McNicholas, 2019) concerning their claim that the so-called CFUST distribution is a special case of the HTH distribution. There are a...

We investigate model-based classification with partially labelled training data. In many biostatistical applications, labels are manually assigned by experts, who may leave some observations unlabelled due to class uncertainty. We analyse semi-supervised learning as a missing data problem and identify situations where the missing label pattern is n...

The statistical matching problem is a data integration problem with structured missing data. The general form involves the analysis of multiple datasets that only have a strict subset of variables jointly observed across all datasets. The simplest version involves two datasets, labelled A and B, with three variables of interest $X, Y$ and $Z$. Vari...
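Under the common conditional-independence assumption ($Y \perp Z \mid X$), the unidentified cross-covariance can be imputed as $\mathrm{Cov}(Y,Z) = \mathrm{Cov}(Y,X)\,\mathrm{Var}(X)^{-1}\,\mathrm{Cov}(X,Z)$. A small numerical sketch with synthetic data where that assumption holds by construction (names and values are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(scale=0.6, size=n)    # Y depends on Z only through X
z = -0.5 * x + rng.normal(scale=0.7, size=n)

# file A observes (X, Y); file B observes (X, Z); (Y, Z) is never jointly seen
xa, ya = x[: n // 2], y[: n // 2]
xb, zb = x[n // 2:], z[n // 2:]

cov_yx = np.cov(ya, xa)[0, 1]
cov_xz = np.cov(xb, zb)[0, 1]
var_x = np.concatenate([xa, xb]).var(ddof=1)
cov_yz_ci = cov_yx * cov_xz / var_x   # conditional-independence imputation
# true value here: 0.8 * (-0.5) * Var(X) = -0.4
```

Without the conditional-independence assumption, any value of Cov(Y, Z) keeping the joint covariance matrix positive semidefinite is consistent with the observed margins, which is exactly the nonidentifiability at issue.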

The important role of finite mixture models in the statistical analysis of data is underscored by the ever-increasing rate at which articles on mixture applications appear in the statistical and general scientific literature. The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the app...

Privacy is becoming increasingly important in collaborative data analysis, especially those involving personal or sensitive information commonly arising from health and commercial settings. The aim of privacy preserving statistical algorithms is to allow inference to be drawn on the joint data without disclosing private data held by each party. Thi...

Deep learning is a hierarchical inference method formed by subsequent multiple layers of learning able to more efficiently describe complex relationships. In this work, Deep Gaussian Mixture Models are introduced and discussed. A Deep Gaussian Mixture model (DGMM) is a network of multiple layers of latent variables, where, at each layer, the variab...

Kernel density estimators (KDEs) are ubiquitous tools for nonparametric estimation of probability density functions (PDFs), when data are obtained from unknown data generating processes. The KDEs that are typically available in software packages are defined, and designed, to estimate real-valued data. When applied to positive data, these typical KD...
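One standard remedy for the boundary problem with positive data is to estimate the density on the log scale and transform back via $f_X(x) = f_{\log X}(\log x)/x$. A self-contained sketch with a hand-rolled Gaussian KDE (our illustration, not the paper's estimator):

```python
import numpy as np

def log_kde(sample, x):
    """Gaussian KDE of positive data via the log transform: f_X(x) = f_L(log x)/x."""
    l = np.log(sample)
    h = 1.06 * l.std() * len(l) ** -0.2         # Silverman's rule on the log scale
    u = (np.log(x)[:, None] - l[None, :]) / h
    f_log = np.exp(-0.5 * u ** 2).sum(axis=1) / (len(l) * h * np.sqrt(2 * np.pi))
    return f_log / x                             # change of variables back to x

rng = np.random.default_rng(4)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=2000)
grid = np.linspace(0.01, 10, 1000)
dens = log_kde(sample, grid)
# the estimate places no mass below zero and should integrate to ~1 over (0, inf)
dx = grid[1] - grid[0]
area = ((dens[:-1] + dens[1:]) / 2 * dx).sum()
```

A plain Gaussian KDE applied directly to the same data would leak probability mass below zero; the transformed estimator cannot.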

In the present era of “Big Data”, data collection involving a massive number of features with a mix of variable types is commonplace. Mixture model-based techniques for statistical cluster analysis of mixed numerical and categorical feature data have their limitations, due to the difficulty in specifying appropriate component-densities when common mu...

We present a multilevel frailty model for handling serial dependence and simultaneous heterogeneity in survival data with a multilevel structure attributed to clustering of subjects and the presence of multiple failure outcomes. One commonly observes such data, for example, in multi‐institutional, randomized placebo‐controlled trials in which patie...

In the past few years, there have been a number of proposals for generalizing the factor analysis (FA) model and its mixture version (known as mixtures of factor analyzers (MFA)) using non-normal and asymmetric distributions. These models adopt various types of skew densities for either the factors or the errors. While the relationships between var...

The mitigation of false positives is an important issue when conducting multiple hypothesis testing. The most popular paradigm for false positives mitigation in high-dimensional applications is via the control of the false discovery rate (FDR). Multiple testing data from neuroimaging experiments can be very large, and reduced precision storage of s...
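FDR control here refers to procedures in the Benjamini-Hochberg family; a minimal reference implementation of the classic step-up rule (the standard algorithm, not the paper's reduced-precision variant):

```python
import numpy as np

def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of rejected hypotheses under Benjamini-Hochberg FDR control."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    # step-up: find the largest i with p_(i) <= i * alpha / m, reject all smaller p
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject
```

Note the step-up property: a sorted p-value can be rejected even if it misses its own threshold, provided some larger p-value meets its threshold.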

Randomized neural networks (NNs) are an interesting alternative to conventional NNs that are more used for data modeling. The random vector functional-link (RVFL) network is an established and theoretically well-grounded randomized learning model. A key theoretical result for RVFL networks is that they provide universal approximation for continuous...
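The RVFL idea: a random, untrained hidden layer plus direct input links, with only the linear readout fitted by (ridge) least squares. A toy regression sketch under our own choices of scales and sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-1, 1, 200)[:, None]
y = np.sin(3 * x).ravel()

# random, fixed hidden layer; only the output weights are trained
W = rng.normal(scale=4.0, size=(1, 50))
b = rng.normal(size=50)
H = np.tanh(x @ W + b)
# RVFL feature map: direct input links + hidden units + bias column
features = np.hstack([x, H, np.ones((len(x), 1))])

# ridge-regularized least-squares readout (closed form)
lam = 1e-6
beta = np.linalg.solve(features.T @ features + lam * np.eye(features.shape[1]),
                       features.T @ y)
mse = np.mean((features @ beta - y) ** 2)
```

Because only the readout is optimized, training reduces to a single linear solve, which is the practical appeal of randomized networks.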

Soft-margin support vector machines (SVMs) are an important class of classification models that are well known to be highly accurate in a variety of settings and over many applications. The training of SVMs usually requires that the data be available all at once, in batch. The Stochastic majorization–minimization (SMM) algorithm framework allows fo...

Calcium is a ubiquitous messenger in neural signaling events. An increasing number of techniques are enabling visualization of neurological activity in animal models via luminescent proteins that bind to calcium ions. These techniques generate large volumes of spatially correlated time series. A model-based functional data analysis methodology via...

Comparative profiling proteomics experiments are important tools in biological research. In such experiments, tens to hundreds of thousands of peptides are measured simultaneously, with the goal of inferring protein abundance levels. Statistical evaluation of these datasets are required to determine proteins that are differentially abundant between...

Clustering techniques are used to arrange genes in some natural way, that is, to organize genes into groups or clusters with similar behavior across relevant tissue samples (or cell lines). These techniques can also be applied to tissues rather than genes. Methods such as hierarchical agglomerative clustering, k-means clustering, the self-organizin...

This article introduces a robust extension of the mixture of factor analysis models based on the restricted multivariate skew-t distribution, called mixtures of skew-t factor analysis (MSTFA) model. This model can be viewed as a powerful tool for model-based clustering of high-dimensional data where observations in each cluster exhibit non-normal f...

Many real problems in supervised classification involve high-dimensional feature data measured for individuals of known origin from two or more classes. When the dimension of the feature vector is very large relative to the number of individuals, it presents formidable challenges to construct a discriminant rule (classifier) for assigning an unclas...

This paper presents a scheme for privacy-preserving clustering in a three-party scenario, focusing on cooperative training of multivariate mixture models. With modern-day big data often collected and stored across multiple independent parties, preservation of private data is an important issue during cross-party communications when carrying out sta...

Support vector machines (SVMs) are an important tool in modern data analysis. Traditionally, support vector machines have been fitted via quadratic programming, either using purpose-built or off-the-shelf algorithms. We present an alternative approach to SVM fitting via the majorization--minimization (MM) paradigm. Algorithms that are derived via M...
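The soft-margin SVM objective being fitted is the regularized hinge-loss problem (standard formulation; notation ours):

```latex
\min_{\beta_0,\,\beta}\;\;
\frac{\lambda}{2}\,\lVert \beta \rVert^{2}
\;+\;
\frac{1}{n}\sum_{i=1}^{n}
\max\!\bigl(0,\; 1 - y_i\,(\beta_0 + \beta^{\top} x_i)\bigr).
```

The MM approach replaces the nonsmooth hinge term at each iteration with a surrogate that lies above it and touches it at the current iterate; minimizing the surrogate (often a quadratic, yielding weighted least-squares updates) monotonically decreases the objective.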

The polygonal distributions are a class of distributions that can be defined via the mixture of triangular distributions over the unit interval. The class includes the uniform and trapezoidal distributions, and is an alternative to the beta distribution. We demonstrate that the polygonal densities are dense in the class of continuous and concave de...

Mixture models are powerful tools for density estimation and cluster and discriminant analyses. They have enjoyed widespread popularity in biostatistics, biomedicine, medical imaging, and genetics, among many other applied fields. The mixture model framework provides a formal but convenient and flexible approach to model complex heterogeneous datas...

Big Data has become a pervasive and ubiquitous component of modern data analysis. Due to the pathologies of Big Data, such as network distribution or infeasible scalings of computational time, strategies are required for conducting effective analysis under such conditions. A traditional approach of computer science that has found success in Big Dat...

In recent years, finite mixtures of skew distributions are gaining popularity as a flexible tool for modelling data with asymmetric distributional features. Parameter estimation for these mixture models via the traditional EM algorithm requires the number of components to be specified a priori. In this paper, we consider unsupervised learning of sk...

An often-cited fact regarding mixing distributions is that their densities can approximate the densities of any unknown distribution to arbitrary degrees of accuracy provided that the mixing distribution is sufficiently complex. This fact is often not made concrete. We investigate theorems that provide approximation bounds for mixing distributions....

An increasing number of techniques are enabling visualization of neurological activity in animal models, generating large volumes of spatially correlated time series. A model-based functional data analysis technique for the clustering of such data is proposed and justified. An example analysis of a zebrafish imaging experiment is presented.

Mixture distributions are commonly being applied for modelling and for discriminant and cluster analyses in a wide variety of situations. We first consider normal and t-mixture models. As they are highly parameterized, we review methods to enable them to be fitted to large datasets involving many observations and variables. Attention is then given...

Triangular distributions are a well-known class of distributions that are often used as elementary example of a probability model. In the past, enumeration and order statistic-based methods have been suggested for the maximum likelihood (ML) estimation of such distributions. A novel parametrization of triangular distributions is presented. The para...
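The order-statistic method mentioned here rests on the fact that, for the triangular distribution on the unit interval, the likelihood in the mode parameter is maximized at one of the sample points, so the MLE can be found by scanning them. A minimal sketch (our implementation of the classical method, not the paper's new parametrization):

```python
import numpy as np

def triangular_logpdf(x, c):
    """Log density of the triangular(0, c, 1) distribution on the unit interval."""
    with np.errstate(divide="ignore"):
        left = np.log(2 * x / c)                # rising edge, x <= c
        right = np.log(2 * (1 - x) / (1 - c))   # falling edge, x > c
    return np.where(x <= c, left, right)

def mode_mle(sample):
    """ML estimate of the mode: scan the sample points as candidate modes."""
    lls = [triangular_logpdf(sample, c).sum() for c in sample]
    return sample[int(np.argmax(lls))]

rng = np.random.default_rng(6)
sample = rng.triangular(0.0, 0.3, 1.0, size=2000)
c_hat = mode_mle(sample)
```

The scan costs O(n^2) density evaluations done naively; order-statistic recursions reduce this, which is part of what makes the problem interesting computationally.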

Functional data analysis (FDA) is an important modern paradigm for handling infinite-dimensional data. An important task in FDA is model-based clustering, which organizes functional populations into groups via subpopulation structures. The most common approach for model-based clustering of functional data is via mixtures of linear mixed-effects mod...

Finite mixtures of skew distributions provide a flexible tool for modelling heterogeneous data with asymmetric distributional features. However, parameter estimation via the Expectation-Maximization (EM) algorithm can become very time-consuming due to the complicated expressions involved in the E-step that are numerically expensive to evaluate. A m...

Finite mixtures of distributions have provided a mathematical-based approach to the statistical modeling of a wide variety of random phenomena. They underpin a variety of techniques in major areas of statistics, including cluster and latent class analyses, discriminant analysis, image analysis, and survival analysis, in addition to their more direc...

Triangular distributions are a well-known class of distributions that are often used as an elementary example of a probability model. Maximum likelihood estimation of the mode parameter of the triangular distribution over the unit interval can be performed via an order statistics-based method. It had been conjectured that such a method can be condu...

Recently observed departures from the classical Gaussian mixture model in real datasets have motivated the introduction of mixtures of skew t distributions, and remarkably widened the application of model-based clustering and classification to a great many real datasets. Unfortunately, when data contamination occurs, classical inference for these models could be sever...

Finite mixture models have been widely used for the modelling and analysis of data from heterogeneous populations. Maximum likelihood estimation of the parameters is typically carried out via the Expectation-Maximization (EM) algorithm. The complexity of the implementation of the algorithm depends on the parametric distribution that is adopted as t...