Preprint

An Accelerated EM algorithm for mixture models with uncertainty for rating data

Abstract

The paper is framed within the literature around Louis' identity for the observed information matrix in incomplete data problems, with a focus on the implied acceleration of maximum likelihood estimation for mixture models. The goal is twofold: to obtain direct expressions for standard errors of parameters from the EM algorithm and to reduce the computational burden of the estimation procedure for a class of mixture models with uncertainty for rating variables. This achievement fosters the feasibility of best-subset variable selection, which is an advisable strategy to identify response patterns from regression models for all Mixtures of Experts systems. The discussion is supported by simulation experiments and a real case study.
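For orientation, the baseline EM iteration that such an acceleration starts from can be sketched for the simplest member of the class: a CUB model without covariates, i.e. a mixture of a shifted Binomial and a discrete Uniform on m categories. This is a minimal sketch under assumptions, not the authors' accelerated algorithm: the function names `cub_pmf` and `cub_em`, the starting values and the stopping rule are illustrative choices, and the symbols `pi` (mixing weight of the shifted Binomial) and `xi` (its parameter) follow the usual CUB notation.

```python
import numpy as np
from math import comb


def cub_pmf(m, pi, xi):
    """Probabilities of ratings 1..m under a CUB model without covariates."""
    r = np.arange(1, m + 1)
    binom = np.array([comb(m - 1, k - 1) for k in r], dtype=float)
    shifted_binomial = binom * (1 - xi) ** (r - 1) * xi ** (m - r)
    return pi * shifted_binomial + (1 - pi) / m


def cub_em(ratings, m, pi=0.5, xi=0.5, tol=1e-8, max_iter=5000):
    """Plain (non-accelerated) EM for (pi, xi); returns (pi, xi, loglik)."""
    ratings = np.asarray(ratings).astype(int)
    r = np.arange(1, m + 1)
    # Work with observed frequencies of the categories 1..m.
    counts = np.bincount(ratings, minlength=m + 1)[1:].astype(float)
    n = counts.sum()
    binom = np.array([comb(m - 1, k - 1) for k in r], dtype=float)
    loglik_old = -np.inf
    loglik = loglik_old
    for _ in range(max_iter):
        # E step: posterior probability that a rating r comes from
        # the shifted Binomial ("feeling") component.
        b = binom * (1 - xi) ** (r - 1) * xi ** (m - r)
        tau = pi * b / (pi * b + (1 - pi) / m)
        # M step: closed-form updates of the weighted complete-data likelihood.
        w = counts * tau
        pi = w.sum() / n
        xi = (w * (m - r)).sum() / ((m - 1) * w.sum())
        loglik = float((counts * np.log(cub_pmf(m, pi, xi))).sum())
        if loglik - loglik_old < tol:
            break
        loglik_old = loglik
    return pi, xi, loglik
```

A possible call, on simulated ratings over m = 7 categories:

```python
rng = np.random.default_rng(0)
sample = rng.choice(np.arange(1, 8), size=2000, p=cub_pmf(7, 0.7, 0.3))
pi_hat, xi_hat, loglik = cub_em(sample, m=7)
```

The closed-form M step is what makes each iteration cheap; the slow part is the number of iterations, which is the cost the paper sets out to reduce.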

Article
In sample surveys where people are asked to express their personal opinions, it is conceivable to register a high level of indecision among respondents, and this circumstance generates sub-optimal statistical analyses caused by large heterogeneity in the responses. In this paper, we discuss a model belonging to the class of generalized cub models which is worthwhile for this kind of survey. Then, we examine some real case studies where the observed heterogeneity and the subjects’ indecision can be analyzed with the proposed approach, leading to convincing interpretations. A comparison with more consolidated models and some concluding remarks end the paper.
Article
In this note, we study the first four moments of the MUB random variable, that is a mixture of two discrete random variables, recently introduced for the fitting of ranks data models. After a brief review of the location and variability indexes, the paper derives and discusses the asymmetry and the kurtosis measures, investigating the shape properties of this distribution on the admissible parametric space. Finally, the usefulness of the parameters moment estimators is shown in order to get starting values for the maximum likelihood estimation procedure.
Article
Personality measurement is based on the idea that values on an unobservable latent variable determine the distribution of answers on a manifest response scale. Typically, it is assumed in the Item Response Theory (IRT) that latent variables are related to the observed responses through continuous normal or logistic functions, determining the probability with which one of the ordered response alternatives on a Likert-scale item is chosen. Based on an analysis of 1731 self- and other-rated responses on the 240 NEO PI-3 questionnaire items, it was proposed that a viable alternative is a finite number of latent events which are related to manifest responses through a binomial function which has only one parameter: the probability with which a given statement is approved. For the majority of items, the best fit was obtained with a mixed-binomial distribution, which assumes two different subpopulations who endorse items with two different probabilities. It was shown that the fit of the binomial IRT model can be improved by assuming that about 10% of random noise is contained in the answers and by taking into account response biases toward one of the response categories. It was concluded that the binomial response model for the measurement of personality traits may be a workable alternative to the more habitual normal and logistic IRT models.
Article
In this paper, we propose preliminary estimators for the parameters of a mixture distribution introduced for the analysis of ordinal data where the mixture components are given by a Combination of a discrete Uniform and a shifted Binomial distribution (cub model). After reviewing some preliminary concepts related to the meaning of parameters which characterize such models, we introduce estimators which are related to the location and heterogeneity of the observed distributions, respectively, in order to accelerate the EM procedure for the maximum likelihood estimation. A simulation experiment has been performed to investigate their main features and to confirm their usefulness. A check of the proposal on real case studies and some comments conclude the paper.
Article
The paper deals with the numerical solution of the likelihood equations for incomplete data from exponential families, that is for data being a function of exponential family data. Illustrative examples especially studied in this paper concern grouped and censored normal samples and normal mixtures. A simple iterative method of solution is proposed and studied. It is shown that the sequence of iterates converges to a relative maximum of the likelihood function, and that the convergence is geometric with a factor of convergence which for large samples equals the maximal relative loss of Fisher information due to the incompleteness of data. This large sample factor of convergence is illustrated diagrammatically for the examples mentioned above. Experiences of practical application are mentioned.
Article
In this paper we derive the observed information matrix for MUB models, without and with covariates. After a review of this class of models for ordinal data and of the EM algorithms, we derive some closed forms for the asymptotic variance-covariance matrix of the maximum likelihood estimators of MUB models. Also, some new results about feeling and uncertainty parameters are presented. The work lingers over the computational aspects of the procedure with explicit reference to a matrix-oriented language. Finally, the finite sample performance of the asymptotic results is investigated by means of a simulation experiment. General considerations aimed at extending the application of MUB models conclude the paper.
Article
In this article we discuss the identifiability of a probability model which has been proven useful for capturing the main features of ordinal data generated by rating surveys. Specifically, we show that the mixture of a shifted Binomial and a Uniform discrete distribution is identifiable when the number of categories is greater than three.
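For reference, the mixture discussed in this abstract can be written out explicitly. The notation below is the one commonly used for CUB models and is an assumption here, since the abstract does not fix symbols: R is the rating, m the number of categories, π the mixing weight and ξ the parameter of the shifted Binomial.

```latex
\Pr(R = r \mid \pi, \xi)
  = \pi \binom{m-1}{r-1} (1-\xi)^{r-1}\, \xi^{m-r}
  + (1-\pi)\,\frac{1}{m},
\qquad r = 1, \dots, m, \quad \pi \in (0, 1], \quad \xi \in [0, 1].
```

The identifiability result stated above concerns the pair (π, ξ) and holds when m > 3.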
Article
The movie distribution company Netflix has generated considerable buzz in the statistics community by offering a million dollar prize for improvements to its movie rating system. Among the statisticians and computer scientists who have disclosed their techniques, the emphasis has been on machine learning approaches. This article has the modest goal of discussing a simple model for movie rating and other forms of democratic rating. Because the model involves a large number of parameters, it is nontrivial to carry out maximum likelihood estimation. Here we derive a straightforward EM algorithm from the perspective of the more general MM algorithm. The algorithm is capable of finding the global maximum on a likelihood landscape littered with inferior modes. We apply two variants of the model to a dataset from the MovieLens archive and compare their results. Our model identifies quirky raters, redefines the raw rankings, and permits imputation of missing ratings. The model is intended to stimulate discussion and development of better theory rather than to win the prize. It has the added benefit of introducing readers to some of the issues connected with analyzing high-dimensional data.
Article
In the applications of finite mixture of regression models, a large number of covariates are often used and their contributions toward the response variable vary from one component to another of the mixture model. This creates a complex variable selection problem. Existing methods, such as AIC and BIC, are computationally expensive as the number of covariates and the components in the mixture model increase. In this paper, we introduce a penalized likelihood approach for variable selection in finite mixture of regression models. The new method introduces a penalty which depends on the sizes of regression coefficients and the mixture structure. The new method is shown to have the desired sparsity property. A data adaptive method for selecting tuning parameters, and an EM-algorithm for efficient numerical computations are developed. Simulations show that the method has very good performance with much lower demand on computing power. The new method is also illustrated by analyzing a real data set in marketing applications.
Article
The paper is the rejoinder to a series of Discussions on the class of cub models for rating data. The main topics advanced by Discussants are reviewed and debated, with focus on the most prominent issues. As a result, the trailhead of possible future research developments is outlined.
Article
Traditional statistical models with random effects account for heterogeneity in the population with respect to the location of the response in a subject-specific way. This approach ignores that also uncertainty of the responses can vary across individuals and items: for example, subject-specific indecision may play a role in the rating process relative to questionnaire items. In this setting, a generalized mixture model is advanced that accounts for subjective heterogeneity in response behaviour for multivariate ordinal responses: to this aim, random effects are specified for the individual propensity to a structured or an uncertain response attitude. Simulations and a case study illustrate the effectiveness of the proposed model and its implications.
Article
In this paper a novel approach is proposed for ranking data analysis that is based on marginal distributions. The marginal scores for one of the items to be ranked can be considered as a proxy of preference ratings and thus they can be employed to investigate drivers of the outcome. The rationale is that a model for the univariate marginal distributions should suitably convey information about the veracity of ordinal scores given that it would disregard the ranking constraints. The method relies on CUBREMOT (CUB REgression MOdel Trees), a tool for growing trees for ordinal responses based on the local estimation of CUB models, a class of models able to explain both preferences and related uncertainty. Here we propose the Uncertainty Tree to disclose preference patterns on the basis of the model uncertainty component. Two applications on real data are discussed; comparison with other model-based trees is provided and a synthetic index for the selection of the best CUBREMOT is also advanced.
Article
This paper discusses a general framework for the analysis of rating and preference data that is rooted on a class of mixtures of discrete random variables. These models have been extensively studied and applied in the last 15 years thanks to a flexible and parsimonious parametrization of the data generating process and to prompt interpretation of results. The approach considers the final response as the combination of feeling and uncertainty, by allowing for finer model specifications to include refuge options, response styles and possible overdispersion, also in relation to subjects’ and objects’ covariates. The article establishes the state of the art of the research inherent to this paradigm, in terms of methodology, inferential procedures and fitting measures, by emphasizing capabilities and limitations yet establishing new findings. In particular, explicative power and predictive performances of cub statistical models for ordinal data are examined and new topics that could boost and support the modelling of uncertainty in this framework are provided. Possible developments are outlined throughout the whole presentation and final comments conclude the paper.
Article
The paper introduces a new technique for growing trees for ordinal responses in the model-based framework. We consider the class of CUB mixtures which is particularly appropriate to model perceptions and evaluations, as it designs the response process as the combination of a personal feeling and an inherent uncertainty. In our proposal, the partitioning process is based on the local estimation of CUB regression models to profile responses according to feeling and uncertainty conditional to the splitting variables. In this regard, two alternative splitting criteria are introduced featuring both inferential and fitting issues. Moreover, the chosen modelling framework allows for advantageous visualization of the classification results. The proposal is illustrated using real data from a survey conducted by the Italian National Statistical Office, with focus on perceived trust towards the European Parliament. A benchmark study is conducted to settle the proposal among the available tree methods for ordinal responses.
Article
Ordinal measurements as ratings, preference and evaluation data are very common in applied disciplines, and their analysis requires a proper modelling approach for interpretation, classification and prediction of response patterns. This work proposes a comparative discussion between two statistical frameworks that serve these goals: the established class of cumulative models and a class of mixtures of discrete random variables, denoted as CUB models, whose peculiar feature is the specification of an uncertainty component to deal with indecision and heterogeneity. After surveying their definition and main features, we compare the performances of the selected paradigms by means of simulation experiments and selected case studies. The paper is tailored to enrich the understanding of the two approaches by running an extensive and comparative analysis of results, relative advantages and limitations, also at graphical level. In conclusion, a summarising review of the key issues of the alternative strategies and some final remarks are given, aimed to support a unifying setting.
Article
Mixture models for ordinal responses in the tradition of cub models use the uniform distribution to account for uncertainty of respondents. A model is proposed that uses more flexible distributions in the uncertainty component: the discretized Beta distribution makes it possible to account for response styles, in particular the preference for middle or extreme categories. The proposal is compared with traditional cub models in simulation studies and its use is illustrated by two applications.
Article
The present paper deals with the robustness of estimators and tests for ordinal response models. In this context, gross-errors in the response variable, specific deviations due to some respondents’ behavior, and outlying covariates can strongly affect the reliability of the maximum likelihood estimators and that of the related test procedures. The paper highlights that the choice of the link function can affect the robustness of inferential methods, and presents a comparison among the most frequently used links. Subsequently robust M-estimators are proposed as an alternative to maximum likelihood estimators. Their asymptotic properties are derived analytically, while their performance in finite samples is investigated through extensive numerical experiments either at the model or when data contaminations occur. Wald and t-tests for comparing nested models, derived from M-estimators, are also proposed. M based inference is shown to outperform maximum likelihood inference, producing more reliable results when robustness is a concern. © 2017, Institute of Mathematical Statistics. All rights reserved.
Article
In rating surveys, people are requested to express preferences on several aspects related to a topic by selecting a category in an ordered scale. For such data, we propose a model defined by a mixture of a uniform distribution and a Sarmanov distribution with CUB (combination of uniform and shifted binomial) marginal distributions (D'Elia and Piccolo, 2005). This mixture generalizes the CUB model to the multivariate case by taking into account the association among answers of the same individual to the items of a questionnaire. It also allows us to distinguish two kinds of uncertainty: specific uncertainty, related to the indecision for single items, and global uncertainty referred to the respondent's hesitancy in completing the whole questionnaire. A simulation and a real case study highlight the usefulness of the new methodology.
Article
This book introduces basic and advanced concepts of categorical regression with a focus on the structuring constituents of regression, including regularization techniques to structure predictors. In addition to standard methods such as the logit and probit model and extensions to multivariate settings, the author presents more recent developments in flexible and high-dimensional regression, which allow weakening of assumptions on the structuring of the predictor and yield fits that are closer to the data. A generalized linear model is used as a unifying framework whenever possible in particular parametric models that are treated within this framework. Many topics not normally included in books on categorical data analysis are treated here, such as nonparametric regression; selection of predictors by regularized estimation procedures; alternative models like the hurdle model and zero-inflated regression models for count data; and non-standard tree-based ensemble methods, which provide excellent tools for prediction and the handling of both nominal and ordered categorical predictors. The book is accompanied by an R package that contains data sets and code for all the examples.
Article
A general statistical model for ordinal or rating data, which includes some existing approaches as special cases, is proposed. The focus is on the CUB models and a new class of models, called Nonlinear CUB, which generalize CUB. In the framework of the Nonlinear CUB models, it is possible to express a transition probability, i.e. the probability of increasing one rating point at a given step of the decision process. Transition probabilities and the related transition plots are able to describe the state of mind of the respondents about the response scale used to express judgments. Unlike classical CUB, the Nonlinear CUB models are able to model decision processes with non-constant transition probabilities. R code to estimate NLCUB models is available at https://www.researchgate.net/publication/277717348_The_R_code_for_Nonlinear_CUB_models
Article
The EM algorithm is a numerical technique for the evaluation of maximum likelihood estimates for parameters describing incomplete data models. It is easy to apply in many problems and is stable but slow. The algorithm fails to provide a consistent estimator of the standard errors of the maximum likelihood estimates unless the additional analysis required by the Louis method is performed. Newton‐type or other gradient methods are faster and provide error estimates but tend to be unstable and require the analytical evaluation of likelihoods to derive expressions for the score function and (at least) approximations to the Fisher information matrix. The purpose of this paper is to expand on a result by Fisher that permits a unification of EM methodology and Newton methods. The evaluation of the individual observation‐by‐observation score functions of the incomplete data is a by‐product of the application of the E step of the EM algorithm. Once these become available, the Fisher information matrix may be consistently estimated, and the M step may be replaced by a fast Newton‐type step.
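The idea summarised in this abstract (observation-by-observation scores, an empirical estimate of the Fisher information built from them, and a Newton-type step in place of the M step) can be illustrated on a CUB model without covariates. This is a hedged sketch, not code from the cited article: the CUB setting, the function names `cub_scores`, `empirical_information`, `newton_type_step` and `standard_errors`, and the choice of the centred outer-product form of the empirical information are assumptions made for illustration.

```python
import numpy as np
from math import comb


def cub_scores(m, pi, xi):
    """Category probabilities and per-observation scores for ratings 1..m.

    Row r-1 of the returned matrix is the gradient of log Pr(R = r)
    with respect to (pi, xi).
    """
    r = np.arange(1, m + 1)
    binom = np.array([comb(m - 1, k - 1) for k in r], dtype=float)
    b = binom * (1 - xi) ** (r - 1) * xi ** (m - r)
    p = pi * b + (1 - pi) / m
    d_pi = (b - 1 / m) / p
    d_xi = pi * b * ((m - r) / xi - (r - 1) / (1 - xi)) / p
    return p, np.column_stack([d_pi, d_xi])


def empirical_information(counts, m, pi, xi):
    """Total score and empirical information from category frequencies."""
    counts = np.asarray(counts, dtype=float)
    _, s = cub_scores(m, pi, xi)
    total_score = counts @ s
    # Sum of outer products of individual scores, centred by the total score.
    info = (s * counts[:, None]).T @ s
    info -= np.outer(total_score, total_score) / counts.sum()
    return total_score, info


def newton_type_step(counts, m, pi, xi):
    """One scoring-type update using the empirical information."""
    score, info = empirical_information(counts, m, pi, xi)
    step = np.linalg.solve(info, score)
    return pi + step[0], xi + step[1]


def standard_errors(counts, m, pi_hat, xi_hat):
    """Approximate standard errors at the estimate (pi_hat, xi_hat)."""
    _, info = empirical_information(counts, m, pi_hat, xi_hat)
    return np.sqrt(np.diag(np.linalg.inv(info)))
```

Because the scores depend on the data only through the observed category, the sums run over the m categories weighted by their frequencies rather than over individual respondents. The step is not safeguarded: in practice one would check that the update stays inside the parameter space and increases the likelihood.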
Article
A simple matrix formula is given for the observed information matrix when the EM algorithm is applied to categorical data with missing values. The formula requires only the design matrices, a matrix linking the complete and incomplete data, and a few simple derivatives. It can be easily programmed using a computer language with operators for matrix multiplication, element-by-element multiplication and division, matrix concatenation, and creation of diagonal and block diagonal arrays. The formula is applicable whenever the incomplete data can be expressed as a linear function of the complete data, such as when the observed counts represent the sum of latent classes, a supplemental margin, or the number censored. In addition, the formula applies to a wide variety of models for categorical data, including those with linear, logistic, and log-linear components. Examples include a linear model for genetics, a log-linear model for two variables and nonignorable nonresponse, the product of a log-linear model for two variables and a logit model for nonignorable nonresponse, a latent class model for the results of two diagnostic tests, and a product of linear models under double sampling.
Article
We argue that model selection uncertainty should be fully incorporated into statistical inference whenever estimation is sensitive to model choice and that choice is made with reference to the data. We consider different philosophies for achieving this goal and suggest strategies for data analysis. We illustrate our methods through three examples. The first is a Poisson regression of bird counts in which a choice is to be made between inclusion of one or both of two covariates. The second is a line transect data set for which different models yield substantially different estimates of abundance. The third is a simulated example in which truth is known.
Article
A procedure is derived for extracting the observed information matrix when the EM algorithm is used to find maximum likelihood estimates in incomplete data problems. The technique requires computation of a complete‐data gradient vector or second derivative matrix, but not those associated with the incomplete data likelihood. In addition, a method useful in speeding up the convergence of the EM algorithm is developed. Two examples are presented.
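The identity behind this procedure can be stated compactly. The notation is an assumption, since the abstract fixes none: y denotes the observed data, x the complete data, S_c and B_c the complete-data score and negative Hessian, and S the observed-data score.

```latex
I(\theta; y)
  = \mathbb{E}_{\theta}\!\left[ B_c(x; \theta) \mid y \right]
  - \mathbb{E}_{\theta}\!\left[ S_c(x; \theta)\, S_c(x; \theta)^{\top} \mid y \right]
  + S(y; \theta)\, S(y; \theta)^{\top},
\qquad
B_c(x; \theta) = -\frac{\partial^{2} \log L_c(\theta; x)}{\partial \theta\, \partial \theta^{\top}} .
```

At the maximum likelihood estimate the last term vanishes, because the observed-data score is zero there, so the observed information is the conditional expectation of the complete-data information minus the conditional expectation of the outer product of the complete-data score.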
Article
The expectation maximization (EM) algorithm is a popular, and often remarkably simple, method for maximum likelihood estimation in incomplete-data problems. One criticism of EM in practice is that asymptotic variance–covariance matrices for parameters (e.g., standard errors) are not automatic byproducts, as they are when using some other methods, such as Newton–Raphson. In this article we define and illustrate a procedure that obtains numerically stable asymptotic variance–covariance matrices using only the code for computing the complete-data variance–covariance matrix, the code for EM itself, and code for standard matrix operations. The basic idea is to use the fact that the rate of convergence of EM is governed by the fractions of missing information to find the increased variability due to missing information to add to the complete-data variance–covariance matrix. We call this supplemented EM algorithm the SEM algorithm. Theory and particular examples reinforce the conclusion that the SEM algorithm can be a practically important supplement to EM in many problems. SEM is especially useful in multiparameter problems where only a subset of the parameters are affected by missing information and in parallel computing environments. SEM can also be used as a tool for monitoring whether EM has converged to a (local) maximum.
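The basic idea described here is usually written as follows; the symbols are assumptions chosen for illustration. DM denotes the Jacobian of the EM map evaluated at the estimate (the matrix of fractions of missing information), I_oc the conditional expectation of the complete-data information given the observed data, and V the asymptotic variance-covariance matrix of the estimator.

```latex
V = I_{oc}^{-1} + \Delta V,
\qquad
\Delta V = I_{oc}^{-1}\, DM\, (I - DM)^{-1},
\qquad
\text{so that} \quad V = I_{oc}^{-1} (I - DM)^{-1} .
```

The term ΔV is the increase in variance due to missing information. When DM is close to the zero matrix, little information is missing and V is close to the complete-data variance.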
Article
A mixture model for preferences data, which adequately represents the composite nature of the elicitation mechanism in ranking processes, is proposed. Both probabilistic features of the mixture distribution and inferential and computational issues arising from the maximum likelihood parameters estimation are addressed. Moreover, empirical evidence from different data sets confirming the goodness of fit of the proposed model to many real preferences data is shown.
Article
A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
Article
A general class of regression models for ordinal data is developed and discussed. These models utilize the ordinal nature of the data by describing various modes of stochastic ordering and this eliminates the need for assigning scores or otherwise assuming cardinality instead of ordinality. Two models in particular, the proportional odds and the proportional hazards models are likely to be most useful in practice because of the simplicity of their interpretation. These linear models are shown to be multivariate extensions of generalized linear models. Extensions to non‐linear models are discussed and it is shown that even here the method of iteratively reweighted least squares converges to the maximum likelihood estimate, a property which greatly simplifies the necessary computation. Applications are discussed with the aid of examples.
Article
This article examines incomplete data for the class of generalized linear models, in which incompleteness is due to partially missing covariates on some observations. Under the assumption that the missing data are missing at random, it is shown that the E step of the EM algorithm for any generalized linear model can be expressed as a weighted complete data log-likelihood when the unobserved covariates are assumed to come from a discrete distribution with finite range. Expressing the E step in this manner allows for a straightforward maximization in the M step, thus leading to maximum likelihood estimates (MLE's) for the parameters. Asymptotic variances of the MLE's are also derived, and results are illustrated with two examples.
Article
Many real life problems require the classification of items into naturally ordered classes. These problems are traditionally handled by conventional methods intended for the classification of nominal classes where the order relation is ignored. This paper introduces a new machine learning paradigm intended for multi-class classification problems where the classes are ordered. The theoretical development of this paradigm is carried out under the key idea that the random variable class associated with a given query should follow a unimodal distribution. In this context, two approaches are considered: a parametric, where the random variable class is assumed to follow a specific discrete distribution; a nonparametric, where the random variable class is assumed to be distribution-free. In either case, the unimodal model can be implemented in practice by means of feedforward neural networks and support vector machines, for instance. Nevertheless, our main focus is on feedforward neural networks. We also introduce a new coefficient, r(int), to measure the performance of ordinal data classifiers. An experimental study with artificial and real datasets is presented in order to illustrate the performances of both parametric and nonparametric approaches and compare them with the performances of other methods. The superiority of the parametric approach is suggested, namely when flexible discrete distributions, a new concept introduced here, are considered.
Article
A simple explicit formula is provided for the matrix of second derivatives of the observed data log-likelihood in terms of derivatives of the criterion function (conditional expectation of the complete data log-likelihood given the observed data) invoked by the EM algorithm.
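Written out, a formula of this kind takes the following form. The notation is assumed for illustration: ℓ(φ; y) is the observed-data log-likelihood and Q(φ' | φ) is the conditional expectation, computed under parameter φ, of the complete-data log-likelihood evaluated at φ'.

```latex
\frac{\partial^{2} \ell(\phi; y)}{\partial \phi\, \partial \phi^{\top}}
  = \left[
      \frac{\partial^{2} Q(\phi' \mid \phi)}{\partial \phi'\, \partial \phi'^{\top}}
    + \frac{\partial^{2} Q(\phi' \mid \phi)}{\partial \phi\, \partial \phi'^{\top}}
    \right]_{\phi' = \phi} .
```

The first term is the curvature used by the M step. The second accounts for the dependence of the conditional expectation on the current parameter value, which is where the information lost through incompleteness enters.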
Corduas M. (2011). Assessing similarity of rating distributions by Kullback-Leibler divergence, in: Fichet A et al. (eds.) Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Berlin, Heidelberg: Springer-Verlag, pp. 221-228.
GESIS Leibniz Institute for the Social Sciences (2016). German General Social Survey (ALLBUS) - Cumulation 1980-2014, GESIS Data Archive, Cologne. ZA4584 Data file version 1.0.0. DOI: 10.4232/1.12574
Iannario M. (2008). Selecting feeling covariates in rating surveys. Statistica Applicata, 20(2), 121-134.
Iannario M., Piccolo D., Simone R. (2018). CUB: A Class of Mixture Models for Ordinal Data. (R package version 1.1.3), http://CRAN.R-project.org/package=CUB
Louis T.A. (1976). Maximum likelihood estimation using pseudo-data interactions. Boston University Research Report, No. 2-76.
Mahalanobis P.C. (1936). On the generalised distance in statistics, Proceedings of the National Institute of Sciences of India, 2(1), 49-55.
McLachlan G.J., Krishnan T. (1997). The EM Algorithm and Extensions, 2nd Edition, Wiley Series in Probability and Statistics.