# Antonio Punzo

University of Catania | UNICT · Department of Economics and Business

PhD in Methodological and Applied Statistics

## About

122

Publications

9,349

Reads

1,529

Citations

Introduction

Current research interests: Mixture Models, Hidden Markov Models, Model-Based Clustering and Classification, Heavy-tailed Distributions

Additional affiliations

February 2013 - July 2013

January 2012 - present

## Publications

This paper develops a quantile hidden semi-Markov regression to jointly estimate multiple quantiles for the analysis of multivariate time series. The approach is based upon the Multivariate Asymmetric Laplace (MAL) distribution, which allows the quantiles of all univariate conditional distributions of a multivariate response to be modeled simultaneously...

Hidden Markov models (HMMs) have been extensively used in the univariate and multivariate literature. However, there has been an increased interest in the analysis of matrix-variate data over the recent years. In this manuscript we introduce HMMs for matrix-variate balanced longitudinal data, by assuming a matrix normal distribution in each hidden...

We propose the family of dimension-wise scaled normal mixtures (DSNMs) to model the joint distribution of a d-variate random variable with real-valued components. Each member of the family generalizes the multivariate normal (MN) distribution in two directions. Firstly, the DSNM has a more general type of symmetry with respect to the elliptical sym...

Two families of parsimonious mixture models are introduced for model-based clustering. They are based on two multivariate distributions-the shifted exponential normal and the tail-inflated normal-recently introduced in the literature as heavy-tailed generalizations of the multivariate normal. Parsimony is attained by the eigen-decomposition of the...

We propose a general approach to detect measurement non-invariance in latent Markov models for longitudinal data. We define different notions of differential item functioning in the context of panel data. We then present a model selection approach based on the Bayesian information criterion (BIC) to choose both the number of latent states and the m...

In the original publication of the article, the line after equation (5) has been published incorrectly.

Cluster-weighted models (CWMs) extend finite mixtures of regressions (FMRs) in order to allow the distribution of covariates to contribute to the clustering process. In a matrix-variate framework, the matrix-variate normal CWM has been recently introduced. However, problems may be encountered when data exhibit skewness or other deviations from norm...

Much work has been done in the area of the cluster weighted model (CWM), which extends the finite mixture of regression model to include modelling of the covariates. Although many types of distributions have been considered for both the response(s) and covariates, to our knowledge skewed distributions have not yet been considered in this paradigm....

Analysis of matrix-variate data is becoming ever more prevalent in the literature, especially in the area of clustering and classification. Real data, including real matrix-variate data, are often contaminated by potential outlying observations. Their detection, as well as the development of models insensitive to their presence, is particularly imp...

Hidden Markov models (HMMs) have been extensively used in the univariate and multivariate literature. However, there has been an increased interest in the analysis of matrix-variate data over the recent years. In this manuscript we introduce HMMs for matrix-variate longitudinal data, by assuming a matrix normal distribution in each hidden state. Su...

Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering methodology to deal with regression data. However, they assume assignment independence, i.e., the allocation of data points to the clusters is made independently of the distribution of the covariates. To take into account the latter aspect, finite mixtur...

Despite recent methodological advances in hidden Markov regression models and a rapid increase in their application in a wide range of empirical settings, complex clustering-based research questions that include the contribution of the covariates set to the classification and the presence of atypical observations are often addressed ignoring the po...

Many statistical problems involve the estimation of a d×d orthogonal matrix Q. Such an estimation is often challenging due to the orthonormality constraints on Q. To cope with this problem, we use the well-known PLU decomposition, which factorizes any invertible d×d matrix as the product of a d×d permutation matrix P, a d×d unit lower triangular ma...

Finite mixtures of regressions with fixed covariates are a commonly used model-based clustering methodology to deal with regression data. However, they assume assignment independence, i.e. the allocation of data points to the clusters is made independently of the distribution of the covariates. In order to take into account the latter aspect, finit...

Much work has been done in the area of the cluster weighted model (CWM), which extends the finite mixture of regression model to include modelling of the covariates. Although many types of distributions have been considered for both the response and covariates, to our knowledge skewed distributions have not yet been considered in this paradigm. Her...

One of the challenges in cluster analysis is the evaluation of the obtained clustering results without using auxiliary information. To this end, a common approach is to use internal validity criteria. For mixtures of linear regressions whose parameters are estimated via the maximum likelihood approach, we propose a three-term decomposition of the t...

The search for appropriate models for describing the currency return distribution is one of the main interests not only in finance, but also in the more recent trans-disciplinary econophysics research field. This search has recently focused on cryptocurrencies, due to their proliferation. Although there is no agreement on what theoretical models a...

In allometric studies, the joint distribution of the log-transformed morphometric variables is typically symmetric and with heavy tails. Moreover, in the bivariate case, it is customary to explain the morphometric variation of these variables by fitting a convenient line, as for example the first principal component (PC). To account for all these p...

The expectation–maximization (EM) algorithm is a familiar tool for computing the maximum likelihood estimate of the parameters in hidden Markov and semi‐Markov models. This paper carries out a detailed study on the influence that the initial values of the parameters impose on the results produced by the algorithm. We compare random starts and parti...

The research objective of this paper is to handle situations where the empirical distribution of multivariate real-valued data is elliptical and with heavy tails. Many statistical models already exist that accommodate these peculiarities. This paper enriches this branch of literature by introducing the multivariate tail-inflated normal (MTIN) distr...

A correct modelization of the insurance losses distribution is crucial in the insurance industry. This distribution is generally highly positively skewed, unimodal hump-shaped, and with a heavy right tail. Compound models are a profitable way to accommodate situations in which some of the probability masses are shifted to the tails of the distribut...

Two matrix-variate distributions, both elliptical heavy-tailed generalizations of the matrix-variate normal distribution, are introduced. They belong to the normal scale mixture family, and are respectively obtained by choosing a convenient shifted exponential or uniform as mixing distribution. Moreover, they have a closed-form for the probability d...

This paper introduces the multivariate tail-inflated normal (MTIN) distribution, an elliptical heavy-tailed generalization of the multivariate normal (MN). The MTIN belongs to the family of MN scale mixtures by choosing a convenient continuous uniform as mixing distribution. Moreover, it has a closed-form for the probability density function charact...

Analysis of three-way data is becoming ever more prevalent in the literature, especially in the area of clustering and classification. Real data, including real three-way data, are often contaminated by potential outlying observations. Their detection, as well as the development of robust models insensitive to their presence, is particularly import...

This article shows how multivariate elliptically contoured (EC) distributions, parameterized according to the mean vector and covariance matrix, can be built from univariate standard symmetric distributions. The obtained distributions are referred to as moment-parameterized EC (MEC) herein. As a further novelty, the article shows how to polynomiall...

In allometric studies, the joint distribution of the log‐transformed morphometric variables is typically elliptical and with heavy tails. To account for these peculiarities, we introduce the multivariate shifted exponential normal (MSEN) distribution, an elliptical heavy‐tailed generalization of the multivariate normal (MN). The MSEN belongs to th...

Mixtures of regression models (MRMs) are widely used to investigate the relationship between variables coming from several unknown latent homogeneous groups. Usually, the conditional distribution of the response in each mixture component is assumed to be (multivariate) normal (MN-MRM). To robustify the approach with respect to possible elliptical h...

The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers (also referred to as ‘bad’ points herein) and automatically detect bad points. The price of these advantages is two additional pa...

We propose a model-based clustering procedure where each component can take into account cluster-specific mild outliers through a flexible distributional assumption, and a proportion of observations is additionally trimmed. We propose a penalized likelihood approach for estimation and selection of the proportions of mild and gross outliers. A theor...

The Mincer human capital earnings function is a regression model that relates individual’s earnings to schooling and experience. It has been used to explain individual behavior with respect to educational choices and to indicate productivity on a large number of countries and across many different demographic groups. However, recent empirical studi...

The contaminated Gaussian distribution represents a simple heavy-tailed elliptical generalization of the Gaussian distribution; unlike the often-considered t-distribution, it also allows for automatic detection of mild outlying or “bad” points in the same way that observations are typically assigned to the groups in the finite mixture model context...

In many countries, income inequality has reached its highest level over the past half century. In the labor market, the technological progress has widened the earnings gap between high- and low-skilled workers. Changes in the structure of households, with a growing percentage of single-headed households, and in family formation, with an increased e...

While latent class (LC) models with distal outcomes are becoming popular in the literature as a consequence of the increasing use of stepwise estimators, these models still suffer from severe shortcomings. Namely, using the currently available stepwise estimators the direct effects between the distal outcome and the indicators of the LC membership cann...

One of the challenges in cluster analysis is the evaluation of the obtained clustering results without using auxiliary information. To this end, a common approach is to use internal validity criteria. For mixtures of linear regressions whose parameters are estimated by maximum likelihood, we propose a three-term decomposition of the total sum of sq...

Many statistical problems involve the estimation of a $\left(d\times d\right)$ orthogonal matrix $\textbf{Q}$. Such an estimation is often challenging due to the orthonormality constraints on $\textbf{Q}$. To cope with this problem, we propose a very simple decomposition for orthogonal matrices which we abbreviate as PLR decomposition. It produces...

The empirical distribution of the loss given default (LGD) has support [0,1], contains an excess of 0s and 1s, and is often multimodal on (0,1). Though some parametric models have been used in the credit risk literature to model the LGD distribution, these peculiarities call for more flexible approaches. Thus, we introduce a zero‐and‐one inflated m...

We introduce multivariate models for the analysis of stock market returns. Our models are developed under hidden Markov and semi-Markov settings to describe the temporal evolution of returns, whereas the marginal distribution of returns is described by a mixture of multivariate leptokurtic-normal (LN) distributions. Compared to the normal distribut...

Mixtures of multivariate contaminated shifted asymmetric Laplace distributions are developed for handling asymmetric clusters in the presence of outliers (also referred to as bad points herein). In addition to the parameters of the related non-contaminated mixture, for each (asymmetric) cluster, our model has one parameter controlling the proportio...

Insurance and economic data are often positive, and we need to take into account this peculiarity in choosing a statistical model for their distribution. An example is the inverse Gaussian (IG), which is one of the most famous and considered distributions with positive support. With the aim of increasing the use of the IG distribution on insurance...

The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers, referred to as "bad" points. The MCN can also automatically detect bad points. The price of these advantages is two additional p...

Cluster-weighted models (CWMs) are mixtures of regression models with random covariates. However, despite having recently become rather popular in statistics and data mining, there is still a lack of support for CWMs within the most popular statistical suites. In this paper, we introduce flexCWM, an R package specifically conceived for fitting CWMs...

We introduce the R package ContaminatedMixt, conceived to disseminate the use of mixtures of multivariate contaminated normal distributions as a tool for robust clustering and classification under the common assumption of elliptically contoured groups. Thirteen variants of the model are also implemented to introduce parsimony. The expectation-condit...

We explore the possibility of discovering extreme voting patterns in the U.S. Congressional voting records by drawing ideas from the mixture of contaminated normal distributions. A mixture of latent trait models via contaminated normal distributions is proposed. We assume that the low dimensional continuous latent variable comes from a contaminated...

A time‐varying latent variable model is proposed to jointly analyze multivariate mixed‐support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. H...

The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows for flexible clustering of a random vector composed of a response variable and some covariates. In each mixture component, a Gaussian distribution is adopted for both the covariates and the response given the covariates. To make the approa...

Insurance and economic data are frequently characterized by positivity, skewness, leptokurtosis, and multi-modality; although many parametric models have been used in the literature, often these peculiarities call for more flexible approaches. Here, we propose a finite mixture of contaminated gamma distributions that provides a better characterizat...

Usually in Latent Class Analysis (LCA), external predictors are taken to be cluster conditional probability predictors (LC models with covariates), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class specific distribution is of interest in the distal outcome model, when...

The distribution of insurance losses has a positive support and is often unimodal hump-shaped, right-skewed and with heavy tails. In this work, we introduce a 3-parameter compound model to account for all these peculiarities. As conditional distribution, we consider a 2-parameter unimodal hump-shaped distribution with positive support, parameterize...

The inverse Gaussian (IG) is one of the most famous and considered distributions with positive support. We propose a convenient mode-based parameterization yielding the reparametrized IG (rIG) distribution; it allows/simplifies the use of the IG distribution in various statistical fields, and we give some examples in nonparametric statistics, robus...

Surveys are used to infer the level of social integration of immigrants. Item response theory helps to describe the relationship among responses to test items and latent traits of interest. However, in the presence of nonignorable missing data, which are omitted responses depending on the latent traits to be measured, estimates of the model paramet...

The Ljung–Box test is typically used to test serial independence even if, by construction, it is generally powerful only in the presence of pairwise linear dependence between lagged variables. To overcome this problem, Bagnato et al. recently proposed a simple statistic defining a serial independence test which, unlike the Ljung–Box test, is...

This article proposes the elliptical multivariate leptokurtic‐normal (MLN) distribution to fit data with excess kurtosis. The MLN distribution is a multivariate Gram–Charlier expansion of the multivariate normal (MN) distribution and has a closed‐form representation characterized by one additional parameter denoting the excess kurtosis. It is ob...

Portmanteau tests are typically used to test serial independence even if, by construction, they are generally powerful only in the presence of pairwise dependence between lagged variables. In this paper we present a simple statistic defining a new serial independence test which is able to detect more general forms of dependence. In particular, differen...

In recent years, increasing attention has been directed toward problems inherent to quality control in healthcare services. In particular, it is necessary to measure effectiveness with respect to improving healthcare outcomes of diagnostic procedures or specific treatment episodes. The performance of hospitals is usually evaluated by multilevel mod...

The modelling of animal movement is an important ecological and environmental issue. It is well-known that animals change their movement patterns over time, according to observable and unobservable factors. To trace the dynamics of behaviors, to identify factors influencing these dynamics and unobserved characteristics driving intra-subjects correl...

BACKGROUND In broad terms, and apart from ethnic discriminatory rules enforced in some places and at some times, residential segregation may be ascribed both to economic inhomogeneities in the urban space (e.g., in the cost of rents, or in occupation opportunities) and to spatial attraction among individuals sharing the same group identity and cult...

We introduce the R package ContaminatedMixt, conceived to disseminate the use of mixtures of multivariate contaminated normal distributions as a tool for robust clustering and classification under the common assumption of elliptically contoured groups. Thirteen variants of the model are also implemented to introduce parsimony. The expectation-condi...

A class of multivariate linear models under the longitudinal setting, in which unobserved heterogeneity may evolve over time, is introduced. A latent structure is considered to model heterogeneity, having a discrete support and following a first-order Markov chain. Heavy-tailed multivariate distributions are introduced to deal with outliers. Maximu...

The autodependogram is a graphical device recently proposed in the literature to analyze autodependencies. It is defined by computing the classical Pearson $\chi^2$-statistics of independence at various lags in order to point out the presence of lag-dependencies. This paper proposes an improvement of this diagram obtained by substituting the $\chi^2$-statistics with an e...

Gaussian mixture models with eigen-decomposed covariance structures, i.e. the Gaussian parsimonious clustering models (GPCM), make up the most popular family of mixture models for clustering and classification. Although the GPCM family has been used for almost 20 years, selecting the best member of the family in a given situation remains a troubles...

The analysis of the decision boundaries plays an important role in understanding the characteristics of a classifier in the framework of model-based clustering and discriminant analysis. The wider the family of decision boundaries generated by a classifier, the greater its flexibility for classification purposes. In this paper, we present rigor...

The Gaussian hidden Markov model (HMM) is widely considered for the analysis of heterogeneous continuous multivariate longitudinal data. To robustify this approach with respect to possible elliptical heavy-tailed departures from normality, due to the presence of outliers, spurious points, or noise (collectively referred to as bad points herein), th...

The cluster-weighted model (CWM) is a mixture model with random covariates that allows for flexible clustering/classification and distribution estimation of a random vector composed of a response variable and a set of covariates. Within this class of models, the generalized linear exponential CWM is here introduced especially for modeling bivariate...

Cluster-weighted models (CWMs) are a flexible family of mixture models for fitting the joint distribution of a random vector composed of a response variable and a set of covariates. CWMs act as a convex combination of the products of the marginal distribution of the covariates and the conditional distribution of the response given the covariates. I...

Cluster-weighted models represent a convenient approach for model-based clustering, especially when the covariates contribute to defining the cluster-structure of the data. However, applicability may be limited when the number of covariates is high and performance may be affected by noise and outliers. To overcome these problems, common/uncommon \(...

Detecting and measuring lag-dependencies is very important in time-series analysis. This study is commonly carried out by focusing on the linear lag-dependencies via the well-known autocorrelogram. However, in practice, there are many situations in which the autocorrelogram fails because of the nonlinear structure of the serial dependence. To cope...

A family of parsimonious Gaussian cluster-weighted models (CWMs) is presented. This family concerns a multivariate extension to cluster-weighted modelling that can account for correlations between multivariate responses. Parsimony is attained by constraining parts of an eigen-decomposition imposed on the component covariance matrices. A sufficient c...

The Gaussian cluster-weighted model (CWM) is a mixture of regression models with random covariates that allows for flexible clustering of a random vector composed of response variables and covariates. In each mixture component, it adopts a Gaussian distribution for both the covariates and the responses given the covariates. To robustify the approac...

The contaminated Gaussian distribution represents a simple robust elliptical generalization of the Gaussian distribution; unlike the often-considered $t$-distribution, it also allows for automatic detection of outliers, spurious points, or noise (collectively referred to as bad points herein). Starting from this distribution, we propose t...

This article reviews some nonparametric serial independence tests based on measures of divergence between densities. Among others, the well-known Kullback–Leibler, Hellinger, Tsallis, and Rosenblatt divergences are analyzed. Moreover, their copula-based version is taken into account. Via a wide simulation study, the performances of the considered s...

Various parametric/nonparametric techniques have been proposed in the literature to graduate mortality data as a function of age. Nonparametric approaches, such as kernel smoothing regression, are often preferred because they do not assume any particular mortality law. Among the existing kernel smoothing approaches, the recently proposed (univar...

The dissimilarity index of Duncan and Duncan is widely used in a broad range of contexts to assess the overall extent of segregation in the allocation of two groups in two or more units. Its sensitivity to random allocation implies an upward bias with respect to the unknown amount of systematic segregation. In this article, following a multinomial...

Item response theory (IRT) models are a class of statistical models used to describe the response behaviors of individuals to a set of items having a certain number of options. They are adopted by researchers in social science, particularly in the analysis of performance or attitudinal data, in psychology, education, medicine, marketing and other...

In the context of mixture models with random covariates, this article presents the polynomial Gaussian cluster-weighted model (CWM). It extends the linear Gaussian CWM, for bivariate data, in a twofold way. First, it allows for possible nonlinear dependencies in the mixture components by considering a polynomial regression. Second, it is not restri...

Gaussian mixture models with eigen-decomposed covariance structures make up the most popular family of mixture models for clustering and classification, i.e., the Gaussian parsimonious clustering models (GPCM). Although the GPCM family has been used for almost 20 years, selecting the best member of the family in a given situation remains a troubles...

We introduce the R package DBKGrad, conceived to facilitate the use of kernel smoothing in graduating mortality rates. The package implements univariate and bivariate adaptive discrete beta kernel estimators. Discrete kernels have been preferred because, in this context, variables such as age, calendar year and duration, are pragmatically considere...

A mixture of contaminated Gaussian distributions is developed for robust mixture model-based clustering. In addition to the usual parameters, each component of our contaminated mixture has a parameter controlling the proportion of outliers, spurious points, or noise (collectively referred to as bad points herein) and one specifying the degree of co...

A novel family of twelve mixture models with random covariates, nested in the linear $t$ cluster-weighted model (CWM), is introduced for model-based clustering. The linear $t$ CWM was recently presented as a robust alternative to the better known linear Gaussian CWM. The proposed family of models provides a unified framework that also includes the...

Contaminated mixture models are developed for model-based clustering of data with asymmetric clusters as well as spurious points, outliers, and/or noise. Specifically, we introduce a contaminated mixture of contaminated shifted asymmetric Laplace distributions and a contaminated mixture of contaminated skew-normal distributions. In each case, mixtu...

It is well known that nonignorable item nonresponse may occur when the cause of the nonresponse is the value of the latent variable of interest. In these cases, a refusal by a respondent to answer specific questions in a survey should sometimes be treated as nonignorable item nonresponse. The Rasch-Rasch model (RRM) is a new two-dimensional...

This paper enlarges the covariance configurations, on which the classical linear discriminant analysis is based, by considering the four models arising from the spectral decomposition when eigenvalues and/or eigenvectors matrices are allowed to vary or not between groups. Similarly to the classical approach, the assessment of these configurations i...