Article

A Mixed Poisson-Inverse Gaussian Regression Model


Abstract

The mixed Poisson–inverse-Gaussian distribution has been used by Holla, Sankaran, Sichel, and others in univariate problems involving counts. We propose a Poisson–inverse-Gaussian regression model which can be used for regression analysis of counts. The model provides an attractive framework for incorporating random effects in Poisson regression models and in handling extra-Poisson variation. Maximum-likelihood and quasilikelihood-moment estimation is investigated and illustrated with an example involving motor-insurance claims.


... Several models have been developed to overcome overdispersion, e.g., generalized Poisson regression (GPR) [5][6][7], negative binomial regression (NBR) [8][9][10], inverse Gaussian Poisson regression (IGPR) [11][12][13] and the Poisson-generalized Lindley distribution [14]. Some of these models are derived from a mixed Poisson distribution, which is a blend of the Poisson distribution with other distributions, continuous or discrete. ...
... where E(V) = 1 and Var(V) = τ [11]. The IGP distribution has two parameters: μ (the mean), a location parameter, and τ (the dispersion parameter), a shape parameter. ...
... where τ is the overdispersion parameter Var(V), which reflects heterogeneity across observation units with specific characteristics [11]. ...
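As a short worked sketch of why τ acts as a dispersion parameter, assume the usual mixed-Poisson structure in which, given the random effect V, the count Y is Poisson with mean μV (this conditional structure and the symbol Y are assumptions; the snippets above state only the moments of V). The laws of total expectation and total variance then give:

```latex
\begin{aligned}
E(Y)   &= E\{E(Y \mid V)\} = \mu\,E(V) = \mu, \\
\operatorname{Var}(Y)
       &= E\{\operatorname{Var}(Y \mid V)\} + \operatorname{Var}\{E(Y \mid V)\}
        = \mu\,E(V) + \mu^{2}\operatorname{Var}(V)
        = \mu + \tau\mu^{2}.
\end{aligned}
```

Under these assumptions the variance exceeds the Poisson variance μ whenever τ > 0, and the two coincide when τ = 0.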
Chapter
Infant mortality has generally been increasing and has become an issue that urgently needs to be addressed. As the number of infant deaths is count data, a Poisson regression model is needed to determine the causal factors. However, the assumption of equidispersion in Poisson regression is rarely satisfied. The overdispersion issue is frequently found in real data. Thus, this research employs mixed Poisson distribution modeling to overcome the overdispersion issue, namely, the inverse Gaussian Poisson regression (IGPR) model. In this study, a simple IGPR model, a modified IGPR model, and the negative binomial regression (NBR) model are compared. The results show that the modified IGPR model and the NBR model with an exposure variable outperform the benchmark, based on the global deviance and Akaike Information Criteria (AIC) value, to model the number of infant deaths in East Nusa Tenggara, Indonesia. The significant predictors that affect the number of infant mortalities are the percentage of complete basic immunization, the percentage of low birth weight (LBW), the percentage of babies under six months who receive exclusive breastfeeding, the percentage of infants who receive vitamin A, and the percentage of births assisted by health workers in the district.
... The PIG regression model is a special form of the Sichel distribution proposed by [8]. In this model, the shape parameter in the Sichel distribution is set to a constant, namely −1/2. ...
... Therefore, the model is characterized by two parameters in contrast with the SI model. This model is more tractable and very useful when handling data exhibiting longer tails compared to those of a NB model [8,35]. A variety of Poisson mixture distributions that can be used to handle count data have been proposed [29]. ...
... y ( ) is a modified Bessel function of the third kind. This parametrization was first used in the literature with Š = , [8]. As the dispersion parameter → 0, the distribution converges to a Poisson distribution. Under this parametrization the distribution can be expressed as a multiplicative random effect model. ...
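The Bessel function does not have to be evaluated directly when computing PIG probabilities. The following is a minimal sketch, assuming the mean/dispersion parametrization with mean μ and variance μ + τμ² (the snippet's own symbols are not recoverable, so `mu`, `tau`, `pig_pmf`, and `poisson_pmf` are hypothetical names). It uses the two-term recurrence that follows from the Bessel identity K_{ν+1}(x) = K_{ν−1}(x) + (2ν/x)K_ν(x), then checks the moments and the Poisson limit numerically.

```python
import math


def pig_pmf(mu, tau, y_max):
    """Return [P(Y=0), ..., P(Y=y_max)] for a Poisson-inverse-Gaussian count.

    Assumed parametrization: mean mu > 0, dispersion tau > 0,
    so that Var(Y) = mu + tau * mu**2.
    """
    c = 1.0 + 2.0 * tau * mu
    root = math.sqrt(c)
    probs = [math.exp((1.0 - root) / tau)]      # P(Y = 0)
    if y_max >= 1:
        probs.append(mu / root * probs[0])      # P(Y = 1)
    for y in range(2, y_max + 1):
        a = 2.0 * tau * mu / c * (1.0 - 1.5 / y)
        b = mu * mu / c / (y * (y - 1))
        probs.append(a * probs[-1] + b * probs[-2])
    return probs


def poisson_pmf(mu, y_max):
    """Return [P(Y=0), ..., P(Y=y_max)] for a Poisson count with mean mu."""
    probs = [math.exp(-mu)]
    for y in range(1, y_max + 1):
        probs.append(probs[-1] * mu / y)
    return probs


# Illustrative (hypothetical) parameter values.
mu, tau = 2.0, 0.5
p = pig_pmf(mu, tau, 400)
total = sum(p)
mean = sum(y * py for y, py in enumerate(p))
var = sum((y - mean) ** 2 * py for y, py in enumerate(p))

# Poisson limit: with a very small dispersion the two pmfs nearly coincide.
small = pig_pmf(mu, 1e-4, 60)
pois = poisson_pmf(mu, 60)
max_gap = max(abs(a - b) for a, b in zip(small, pois))

print(total, mean, var, max_gap)
```

All terms in the recurrence are positive, so the forward recursion does not suffer from cancellation; this is one reason the sketch prefers it over calling a Bessel routine for each count.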
... Common types of CRMs include the Poisson regression model (PRM), which assumes that the counts follow a Poisson distribution with a mean equal to variance, and the Negative Binomial regression model (NBRM), which relaxes the equality assumption and allows for over-dispersion. Conway-Maxwell-Poisson regression model (CMPRM) is an extension of the traditional PRM, designed to address over-and underdispersion in count data, and the Poisson-inverse Gaussian regression model (PIGRM) is an alternative to the NBRM because the PIG distribution has slightly longer tails and larger kurtosis [13,46,49]. The PRM is often used with count datasets, assuming equal dispersion where the mean and variance are equivalent. ...
... To address overdispersion in count data, various regression models can be employed. Common approaches include NBRM, CMPRM, and PIGRM [13], which are used as alternatives to NBRM. Willmot [49] used the PIGRM as an alternative to the NBRM, as it is particularly well-suited for highly skewed count datasets with heterogeneity due to its capability to model long-tailed data, making it preferable over the NBRM in such cases. ...
Article
The Poisson-Inverse Gaussian regression model (PIGRM) is commonly used to analyze count datasets with over-dispersion. While the maximum likelihood estimator (MLE) is a standard choice for estimating PIGRM parameters, its performance may be suboptimal in the presence of correlated explanatory variables. To overcome this limitation, we introduce a novel Liu-type estimator for PIGRM. Our analysis includes an examination of the matrix mean square error (MMSE) and scalar mean square error (SMSE) properties of the proposed estimator, comparing them with those of the MLE, ridge, and Liu estimators. We also present several parameters of the Liu-type estimator for PIGRM. We evaluated the performance of the proposed estimator through a simulation study and application to real-life data, using SMSE as the primary evaluation criterion. Our results demonstrate that the proposed estimators outperform the MLE, ridge, and Liu estimators in both simulated and real-world scenarios.
... For example, the Negative Binomial regression model, which is a mixture of the Poisson and Gamma distributions (Greenwood & Yule [1]), is extensively used for modelling over-dispersed count data; it overcomes the equi-dispersion property of Poisson regression by describing the heterogeneity in the data with an additional parameter. The Poisson inverse Gaussian regression model (Willmot [2]) is another early Poisson mixture model proposed to model over-dispersed discrete data. These models, though widely used in all fields, cannot accommodate all types of count data and cannot always describe the relationships that exist between variables efficiently. ...
... Assume that the random variables X 1 , X 2 , ..., X n follow a TPPXG distribution, and X (1) , X (2) , ..., X (n) are the corresponding order statistics. The cdf for i th order statistic, say, Z = X (i) is given by ...
Article
Full-text available
Count data regression is one of the widely used techniques for modelling medical data and has applications in various other fields as well. Count data usually exhibits the characteristic of over-dispersion and new models are developed to open up the opportunity to model such kind of data. This paper introduces a new probability model namely two parameter Poisson-Xgamma distribution and its associated count data regression model is also developed. The structural properties of the proposed probability model like moments, skewness, kurtosis and generating functions among others have been derived. The parameter estimation for the proposed model is discussed using two well-known methods. A Monte Carlo simulation is carried out to investigate the finite sample behavior of maximum likelihood estimates for parameters of new regression model. Furthermore, the empirical applications of the proposed probability model and its associated regression model are validated using two health care data sets. Based on some famous statistical criteria, the proposed model demonstrates better fitting results as compared to other existing discrete models.
... When dealing with count time series, overdispersion is commonly observed (Barreto-Souza 2017; Gonçalves and Barreto-Souza 2020), a peculiarity that makes the Poisson not suitable, since it is equidispersed. A solution for solving this problem is to consider mixed Poisson distributions like the Negative binomial (NB) and Poisson inverse Gaussian (PIG) (Dean, Lawless, and Willmot 1989). In addition to these, alternative approaches to model correlated counts using different conditional distributions, such as the Conway-Maxwell-Poisson and Bernoulli-Geometric, have arisen in literature (Davis et al. ...
... The PIG distribution was introduced into the time series context by Barreto-Souza (2017) using the INteger-valued AutoRegressive (INAR) structure. However, the use of the PIG in regression models was proposed in Dean et al. (1989) for modeling insurance data, being considered an attractive distribution in the presence of heavy tails. For further information on modeling insurance data using the PIG regression, see Willmot (1987). ...
Article
Full-text available
Extensions of the Autoregressive Moving Average, ARMA(p, q), class for modeling non-Gaussian time series have been proposed in the literature in recent years, being applied in phenomena such as counts and rates. One of them is the Generalized Autoregressive Moving Average, GARMA(p, q), that is supported by the Generalized Linear Models theory and has been studied under the Bayesian perspective. This paper aimed to study models for time series of counts using the Poisson, Negative binomial and Poisson inverse Gaussian distributions, and adopting the Bayesian framework. To do so, we carried out a simulation study and, in addition, we showed a practical application and evaluation of these models by using a set of real data, corresponding to the number of vehicle thefts in Brazil.
... Since the inverse Gaussian distribution is more flexible than the gamma distribution, the SI distribution is considered a potential alternative to the negative binomial model. Many researchers such as Stein and Juritz (1988) and Dean et al. (1989) have investigated the linear regression model with the SI distribution. The basis of working with this distribution is the generalized Bessel function of the third type, which is defined as Equation 1. ...
Preprint
Full-text available
Planning to reduce accidents and the resulting costs often requires crash prediction processes. Since crash-frequency data often have long tails in addition to overdispersion, studies have shown that the negative binomial (NB) distribution sometimes cannot model this type of data properly. In this study, geometric, traffic, and crash data from the main intersections in the city of Qazvin, Iran, were used with NB and Sichel (SI) models, each fitted with fixed and with predictor-dependent dispersion, to analyze the effect of traffic volume and lane width on the number of crashes. The analysis found that the full generalized models better show the effect of the predictor variables on the number of crashes. Since the full generalized SI model (-Loglik = 180.03, AIC = 368.06, and BIC = 375.78) has lower values of the goodness-of-fit criteria than the full generalized NB model (-Loglik = 183.36, AIC = 374.73, and BIC = 383.45), it performs better. This conclusion does not hold for the models with reduced variables. The results show that the dispersion parameter of the SI model can estimate the level of dispersion with more accuracy and confidence.
... To better accommodate these data characteristics and improve model fit, alternatives to NB regression have been developed. For instance, Dean et al. (1989) examined the Poisson-inverse Gaussian regression model. The Poisson-inverse Gaussian distribution has a longer tail than the NB distribution. ...
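The longer-tail claim can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the cited authors' analysis: it matches a PIG and an NB distribution on mean μ and variance μ + τμ², then compares the upper-tail probabilities P(Y ≥ k). The parameter values and the function names `pig_pmf` and `nb_pmf` are hypothetical.

```python
import math


def pig_pmf(mu, tau, y_max):
    """PIG probabilities P(Y=0..y_max); mean mu, variance mu + tau*mu**2.

    Uses a two-term recurrence, so no Bessel function call is needed.
    Assumes y_max >= 1.
    """
    c = 1.0 + 2.0 * tau * mu
    probs = [math.exp((1.0 - math.sqrt(c)) / tau)]
    probs.append(mu / math.sqrt(c) * probs[0])
    for y in range(2, y_max + 1):
        probs.append(2.0 * tau * mu / c * (1.0 - 1.5 / y) * probs[-1]
                     + mu * mu / c / (y * (y - 1)) * probs[-2])
    return probs


def nb_pmf(mu, tau, y_max):
    """NB probabilities P(Y=0..y_max); mean mu, variance mu + tau*mu**2."""
    q = tau * mu / (1.0 + tau * mu)
    probs = [(1.0 + tau * mu) ** (-1.0 / tau)]
    for y in range(1, y_max + 1):
        probs.append(probs[-1] * (y - 1.0 + 1.0 / tau) / y * q)
    return probs


# Hypothetical values: both distributions have mean 2 and variance 6.
mu, tau, k = 2.0, 1.0, 30
pig = pig_pmf(mu, tau, 600)
nb = nb_pmf(mu, tau, 600)

pig_tail = sum(pig[k:])   # P(Y >= k) under the PIG model
nb_tail = sum(nb[k:])     # P(Y >= k) under the NB model
print(pig_tail, nb_tail, pig_tail / nb_tail)
```

With the first two moments matched, any difference in the printed tail probabilities comes from the shape of the mixing distribution (inverse Gaussian versus gamma) rather than from the mean or the amount of overdispersion.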
Article
Full-text available
In the transportation services industry, the proper assessment of insurance claim count distribution is an important step to determine insurance premiums based on policyholders’ risk profiles. Risk factors are identified through regression analysis. In this paper, the inverse trinomial distribution is proposed as a count data model for insurance claims characterised by having long tails and a high index of dispersion. Two regression models are developed to identify associated risk factors. Other popular models, such as the negative binomial and COM-Poisson, are fitted and compared to information criteria. The risk profiles of policyholders are determined based on the selected model. To illustrate the application of the inverse trinomial regression models, the ausprivautolong dataset of automobile claims in Australia has been fitted with identification of risk factors.
... To obtain strata, the SDI was subjected to the k-means clustering technique, in which the number of SDI bands was identified by the elbow graph. To explain the relationship between the SDI and the VDI, the Poisson-Inverse Gaussian regression model [25] was employed, which showed adjustment according to the Akaike metric (AIC) [26]. All calculations were performed using the R statistical programming language version 4.1.0, ...
Preprint
Full-text available
Lymphatic filariasis (LF) is a neglected tropical disease associated with poverty and poor environmental conditions. With the inclusion of vector control activities in LF surveillance actions, there is a need to develop simple methods to identify areas with higher mosquito density and consequent risk of W. bancrofti transmission. An ecological study was conducted in Igarassu, Metropolitan Region of Recife, Pernambuco, Brazil. The mosquitoes were captured in 2,060 houses distributed across 117 census tracts. The vector density index (VDI) was constructed, that measures the average number of lymphatic filariasis transmitting mosquitoes per number of houses collected in the risk stratum. Moreover, the social deprivation indicator (SDI) was constructed and was carried out through principal component factor analysis. The average number of female C. quinquefasciatus found in the high-risk stratum was 242, while the low-risk stratum had an average of 108. The overall VDI was 6.8 mosquitoes per household. The VDI for the high-risk stratum was 13.2 mosquitoes per household, while for the low/medium-risk stratum, it was 5.2. This study offers an SDI for the density of C. quinquefasciatus mosquitoes, which reduces the cost associated with data collection and allows for indicating priority areas for vector control actions.
... To obtain strata, the SDI was subjected to the k-means clustering technique, in which the number of SDI bands was identified by an elbow graph. To explain the relationship between the SDI and the VDI, the Poisson inverse Gaussian (PIG) regression model [26] was employed, which showed adjustment according to the generalized Akaike metric (GAIC) [27]. ...
Article
Full-text available
Lymphatic filariasis (LF) is a neglected tropical disease associated with poverty and poor environmental conditions. With the inclusion of vector control activities in LF surveillance actions, there is a need to develop simple methods to identify areas with higher mosquito density and thus a higher consequent risk of W. bancrofti transmission. An ecological study was conducted in Igarassu, which is in the metropolitan region of Recife, Pernambuco, Brazil. The mosquitoes were captured in 2060 houses distributed across 117 census tracts. The vector density index (VDI), which measures the average number of lymphatic-filariasis-transmitting mosquitoes per number of houses collected in the risk stratum, was constructed. Moreover, the social deprivation indicator (SDI) was constructed and calculated through principal component factor analysis. An average of 242 female C. quinquefasciatus were found in the high-risk stratum, while the average in the low-risk stratum was 108. The overall VDI was 6.8 mosquitoes per household. The VDI for the high-risk stratum was 13.2 mosquitoes per household, while for the low/medium-risk stratum, it was 5.2. This study offers an SDI for the density of C. quinquefasciatus mosquitoes, which can help reduce the costs associated with data collection and allows for identifying priority areas for vector control actions.
... One way to handle overdispersion is to build models that combine the Poisson distribution with other distributions, either discrete or continuous (mixed Poisson distributions). Mixed Poisson regression models are very useful for handling overdispersion (Dean et al., 1989). One mixed Poisson distribution used in research is the Poisson Inverse Gaussian (PIG) distribution, whose random effect follows an inverse Gaussian distribution. ...
Article
Full-text available
Reducing toddler mortality is one of the goals of sustainable development programs. Count data can be modeled using Poisson regression. Poisson regression assumes that the mean and variance are equal, an assumption that count data often violate: the variance is frequently greater than the mean (overdispersion). Poisson Inverse Gaussian (PIG) regression is one form of mixed Poisson regression for modeling overdispersed data. The MLE method is used to estimate the PIG regression parameters, and hypotheses are tested using the MLTR method. The best PIG regression model is chosen as the one with the smallest AIC value. Hypothesis testing showed that the percentage of under-fives who received exclusive breastfeeding had a significant effect on the number of pneumonia cases among toddlers. The PIG regression modeling in this study is complemented by a Graphical User Interface (GUI) that facilitates the process of selecting the best model.
... Notably, negative binomial regression models were discussed by Dionne and Vanasse (1989), Frees and Valdez (2008), and Wüthrich and Merz (2008). Inverse Gaussian models were studied by Dean et al. (1989) and Wang et al. (2023). Consul (1993) compared the generalized Poisson (GP) distribution with several well-known distributions and concluded that the GP distribution is a plausible model for claim frequency data. ...
Article
Full-text available
Predictive modeling has been widely used for insurance rate making. In this paper, we focus on insurance claim count data and address their common issues with more flexible modeling techniques. In particular, we study the zero-inflated and hurdle-generalized Poisson and negative binomial distributions in a functional form for modeling insurance claim count data. It is shown that these models are useful in addressing the problem of excess zeros and over-dispersion of the claim count variable. In addition, we show that including the exposure as a covariate in both the zero and the count part of the model is an effective approach to incorporating exposure information in zero-inflated and hurdle models. We illustrate the effectiveness and versatility of the introduced models using three real datasets. The results suggest their promising applications in insurance risk classification and beyond.
... Still, PIG models are not completely absent from the literature, and they have been used in a variety of fields: PIG and NB distributions have been fitted to the popular horseshoe crab data and it was found that PIG provides a better fit [21], identification of the relationship between farm management practices and risk of cow mastitis [5], comparing PIG and NB for the analysis of automobile claim frequency data [31], modeling overdispersed dengue data in the city of Campo Grande, MS, Brazil [32], application of NBI, PIG, and Negative Binomial-Inverse Gaussian (NBIG) models to car insurance claim frequency [33,34], application of generalized PIG for analysis of bibliometric data sets [35], modeling species abundance data [36], application to the bonus-malus system in car insurance claims [37], modeling infectious disease count data [38], application to health services: number of hospital stays [39], assessing microdata disclosure risk [40], modeling the number of HIV and AIDS cases in two districts in Indonesia [41], analysing long-term survival data in patients with skin cancer [42], modeling the numbers of paucibacillary and multibacillary leprosy cases with geographical condition factors in West Sumatera, Indonesia [43], application to accident data for men in a soap factory and for women working with highly explosive shells [44], number of falls in Parkinson's disease patients over a 10-week period, analysing both the mean (µ) and variability (σ) parameters [45], analysis of data from a Phase III cutaneous melanoma clinical trial (E1690 data) [46], transportation origin-destination analysis by introducing predictor-variable-based models which incorporate different transport modeling phases and also allow for direct probabilistic inference on link traffic based on Bayesian predictions [47], analysis of motor vehicle crash data [48]. ...
Preprint
Full-text available
We tested the possibility of using “adverse events count” (AEC) as a drug-risk indicator and side-effect severity indicator. Data from 3938 adverse event (AE) reports for COVID-19 vaccines (Comirnaty, Moderna, Vaxzevria, and Janssen) and 6869 AE reports for other medicines were collected from the Bulgarian Drug Agency database (01.01.2018–31.03.2022). AEC was modeled with zero-adjusted negative binomial (ZANBI) and zero-adjusted Poisson inverse Gaussian (ZAPIG) regression models, which account for zero absence and overdispersion. The models’ fit was checked with residual diagnostic plots and parametric correspondence to normality. Explanatory variables were: age, sex, sequence number (order of submission), severity of AE, and vaccine type. Average AEC was higher in severe vs. non-severe AE, and in females vs. males; it decreased with age and was lower in Comirnaty than other vaccines. Variability of AEC for COVID-19 data decreased with sequence number in severe AE and increased in non-severe AE. Full ZANBI models had greater sensitivity than parametric ZANBI or ZAPIG models. Results showed correlation between AEC and AE severity, suggesting AEC as a simple and reliable measure of drug-risk and side-effect severity for clinicians and regulators, especially for COVID-19 vaccines.
... With this in mind, we tested the abundances of immature sea lice for differences between the 2021 sampling events. To perform these tests, we used a standard inference procedure (Dean et al. 1989) for the analysis of such data where the individual observations are counts (in this instance, counts of immature sea lice on individual fish). Specifically, we modelled the distribution of these counts as being independently generated from a Poisson inverse Gaussian distribution (a computationally efficient generalization of the standard Poisson distribution, which plays a similar role to the within-sampling-event random effect in the above model). ...
Article
Full-text available
In response to a federal government order, the number of salmon farms operating in the Discovery Islands region declined from eight in 2020, to one in 2022. Over this period, 1627 juvenile pink (Oncorhynchus gorbuscha) and chum (Oncorhynchus keta) salmon captured at sites throughout the study area were examined for sea lice. The average number of sea lice per juvenile salmon declined by 96% over the study period. Such a substantial decline was not witnessed in similar samples from the nearby Broughton Archipelago. The decline could not be attributed to chance sampling, and only a small proportion of it was associated with environmental fluctuations.
... As a result, excessive overdispersion is aggravated, making the NB distribution unsuitable for this data type. Various methods have been employed to develop a new class of discrete distributions, such as mixed Poisson [7], generalized Poisson [8], zero-inflated generalized Poisson [9], mixed Poisson-inverse-Gaussian [10], and mixed NB distributions. Many mixed NB distributions have also been proposed, such as the NB-Lindley (NB-L) [11], NB-generalized exponential [12], NB-gamma [13], NB-Sushila [14], NB-generalized Lindley [15], NB-Quasi Lindley [16], and NB-modified Quasi Lindley [17] distributions. ...
Article
A new distribution was developed that mixed the negative binomial (NB) and Samade distributions, called the negative binomial-Samade (NB-SA) distribution. The properties of this distribution were studied, and the newly created distribution was applied using the framework of generalized linear models to build a time series data count model. The characteristics of overdispersion and heavy-tailed distribution of the count response variables were applied in the actual dataset modeling. Distribution parameters and the regression coefficient were estimated using a Bayesian approach. Results showed that the NB-SA model had significantly the highest efficiency compared with the classical NB and Poisson models for analyzing factors influencing the daily number of COVID-19 deaths in Thailand.
... The Poisson inverse Gaussian distribution has received a lot of attention in the statistical field and has been used in the study of regression models (see Dean et al. (1989), Holla (1967), Ord and Whitmore (1986), Rigby et al. (2008), Tremblay (1992) and Willmot (1987)). Kokonendji (1995) determined the form of the variance function of the Poisson inverse Gaussian distribution in Example 2.3. ...
... And for the over-dispersion situation caused by the unobserved heterogeneity, the mixed models provide more appropriate results. These models include the NBRM and the Poisson-Inverse Gaussian regression model (PIGRM) [41]. The NBRM is the combination of the PRM and the gamma regression model (GRM), while the PIGRM is a combined form of the PRM and the inverse-Gaussian regression model (IGRM). ...
Article
Full-text available
The Poisson Inverse Gaussian Regression model (PIGRM) is used for modeling count datasets to deal with the issue of over-dispersion. Generally, the maximum likelihood estimator (MLE) is used to estimate the PIGRM parameters. In the PIGRM, when the explanatory variables are correlated, the MLE does not provide efficient results. To overcome this problem, we propose a ridge estimator for the PIGRM. The matrix mean square error (MSE) and the scalar MSE properties are derived and then compared with the MLE. In the ridge estimator, the ridge parameter plays a significant role, so this study also proposes different ridge parameter estimators for the PIGRM. The performance of the proposed estimator is evaluated with the help of a simulation study and a real-life application using MSE as a performance evaluation criterion. The simulation study and the real-life application results show the superiority of the proposed parameter estimators as compared to the MLE.
... It can also be found in the website of Professor E. Frees. See also [19] and the references therein. The explanatory variables of interest are described below in Table 5. ...
Article
Full-text available
There are some generalizations of the classical exponential distribution in the statistical literature that have proven to be helpful in numerous scenarios. Some of these distributions are the families of distributions that were proposed by Marshall and Olkin and Gupta. The disadvantage of these models is the impossibility of fitting data of a bimodal nature or of incorporating covariates in the model in a simple way. Some empirical datasets with positive support, such as losses in insurance portfolios, show an excess of zero values and bimodality. For these cases, classical distributions, such as exponential, gamma, Weibull, or inverse Gaussian, to name a few, are unable to explain data of this nature. This paper attempts to fill this gap in the literature by introducing a family of distributions that can be unimodal or bimodal and nests the exponential distribution. Some of its more relevant properties, including moments, kurtosis, Fisher’s asymmetric coefficient, and several estimation methods, are illustrated. Different results that are related to finance and insurance, such as hazard rate function, limited expected value, and the integrated tail distribution, among other measures, are derived. Because of the simplicity of the mean of this distribution, a regression model is also derived. Finally, examples that are based on actuarial data are used to compare this new family with the exponential distribution.
... Shoukri et al., Putri et al., and Liteng Zha carried out similar studies comparing the PIG method with negative binomial regression, each in a different setting: mastitis cases in a sample of dairy farms in Ontario, horseshoe crab data, and the number of motor vehicle crashes at two sites, Texas and Washington (Shoukri et al., 2004). Dean et al. and Ouma et al. carried out similar studies and found that PIG regression gave the best model, using quasi-likelihood/moment estimation, for automobile insurance claims data and for infectious disease data (Dean, 1989; Ouma et al., 2016). Gillian Z. Heller et al. were able to model significant treatment effects on the mean and dispersion parameters in Parkinson's disease data, also using an orthogonal parametrization of the PIG distribution (Heller et al., 2019). ...
Article
Full-text available
Neonatal death is the death of a live-born infant within 28 days of birth. The number of neonatal deaths in South Sulawesi Province in 2018 was still fairly high, at 799 cases. An analysis is therefore needed to determine which factors significantly affect the number of neonatal deaths. In this study, the number of neonatal deaths can be modeled using Poisson regression analysis. However, this model suffers from overdispersion, so the analysis was continued with Poisson inverse Gaussian regression, which showed that only one variable significantly affects the number of neonatal deaths, namely births assisted by health workers (x_7). The Poisson inverse Gaussian regression model obtained for the number of neonatal deaths in South Sulawesi Province in 2018 is μ̂ = exp(4.97785 − 0.11603 x_7).
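A fitted log-link model of this kind is easiest to read through its multiplicative effect. The sketch below only evaluates the reported equation, using the two coefficients quoted in the abstract; the covariate values passed to `fitted_mean` are hypothetical and chosen purely to show the arithmetic, and the function name is an assumption.

```python
import math

# Coefficients as reported in the abstract above.
INTERCEPT = 4.97785
SLOPE_X7 = -0.11603   # coefficient of x_7 (births assisted by health workers)


def fitted_mean(x7):
    """Fitted expected count under the reported log-link model."""
    return math.exp(INTERCEPT + SLOPE_X7 * x7)


# Multiplicative change in the expected count per one-unit increase in x_7.
rate_ratio = math.exp(SLOPE_X7)

# Hypothetical covariate values, one unit apart, to show the ratio in action.
low, high = fitted_mean(30.0), fitted_mean(31.0)
print(round(rate_ratio, 4), round(high / low, 4))
```

Because the link is logarithmic, the ratio of fitted means for any two covariate values one unit apart equals `rate_ratio`, whatever the starting value; here that is a reduction of roughly 11% in the expected count per unit increase in x_7.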
... This distribution is a mixture of a Poisson distribution and an inverse-Gaussian distribution. The main advantage of this distribution is that it may properly model overdispersed long-tail data because it has a larger range of skewness than a negative binomial distribution [9][10][11][12]. For this model, we also link the expected value of the response variable to a set of p explanatory variables by using a log-linear relationship. ...
Article
Full-text available
Dengue fever is a tropical disease transmitted mainly by the female Aedes aegypti mosquito that affects millions of people every year. As there is still no safe and effective vaccine, currently the best way to prevent the disease is to control the proliferation of the transmitting mosquito. Since the proliferation and life cycle of the mosquito depend on environmental variables such as temperature and water availability, among others, statistical models are needed to understand the existing relationships between environmental variables and the recorded number of dengue cases and to predict the number of cases for some future time interval. This prediction is of paramount importance for the establishment of control policies. In general, dengue-fever datasets contain the number of cases recorded periodically (in days, weeks, months or years). Since many dengue-fever datasets tend to be of the overdispersed, long-tail type, some common models like the Poisson regression model or negative binomial regression model are not adequate to model them. For this reason, in this paper we propose modeling a dengue-fever dataset by using a Poisson-inverse-Gaussian regression model. The main advantage of this model is that it adequately models overdispersed long-tailed data because it has a wider skewness range than the negative binomial distribution. We illustrate the application of this model in a real dataset and compare its performance to that of a negative binomial regression model.
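The mixture construction described for this entry (a Poisson count whose rate is scaled by an inverse-Gaussian random effect, with a log-linear mean) can be illustrated by simulation. The following is a minimal sketch, not the authors' implementation: it assumes the random effect has mean 1 and variance tau, so that Var(Y) = mu + tau * mu^2, and all function names, coefficients and parameter values are hypothetical.

```python
import math
import random
from statistics import mean, variance


def sample_inverse_gaussian(m: float, shape: float, rng: random.Random) -> float:
    """Draw from IG(mean m, shape) via the Michael-Schucany-Haas transformation."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = m + (m * m * y) / (2.0 * shape) - (m / (2.0 * shape)) * math.sqrt(
        4.0 * m * shape * y + m * m * y * y
    )
    # Accept the smaller root with probability m / (m + x), else use the other root.
    if rng.random() <= m / (m + x):
        return x
    return m * m / x


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; only suitable for moderate rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_pig(mu: float, tau: float, rng: random.Random) -> int:
    """Poisson count with rate mu * nu, where nu ~ IG(mean 1, variance tau)."""
    nu = sample_inverse_gaussian(1.0, 1.0 / tau, rng)
    return sample_poisson(mu * nu, rng)


if __name__ == "__main__":
    rng = random.Random(2024)
    beta0, beta1, x, tau = 0.5, 0.3, 2.0, 0.5  # hypothetical values
    mu = math.exp(beta0 + beta1 * x)           # log-linear mean
    draws = [sample_pig(mu, tau, rng) for _ in range(20000)]
    print("theoretical mean, variance:", mu, mu + tau * mu**2)
    print("sample mean, variance:     ", mean(draws), variance(draws))
```

With these assumptions the sample variance should clearly exceed the sample mean, which is the overdispersion the model is meant to capture.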
... Since it is a count variable, it was decided to apply a regression model using the Poisson-inverse Gaussian (PIG) probability distribution (Dean et al. 1989), which fit the data according to Akaike's information criterion (Sakamoto et al. 1986). Table S1 presents the results of the regression analysis, where two-phase modeling was performed: (a) and (b). ...
Article
Full-text available
Culex quinquefasciatus is a vector of lymphatic filariasis. One important component in planning filariasis control activities is the mapping of vector distribution. A tool that involves socio-environmental factors and Cx. quinquefasciatus density can contribute to the identification of areas that should be prioritized in surveillance actions. This is an ecological study based on the construction and validation of a risk score of urban areas according to social and environmental variables extracted from a national database. Based on this stratification, female Cx. quinquefasciatus were captured. In total, 30,635 Cx. quinquefasciatus were captured, of which 17,161 (56%) were females. The highest vector density index was observed in households located in the high-risk stratum, and the indicator proved to be a tool that identified an association between social and environmental conditions and the areas with the highest density index of female Cx. quinquefasciatus.
... Therefore, alternative methods are needed to model data with overdispersion. Mixed Poisson models are often used as an alternative to Poisson regression models, such as the negative binomial regression (NBR) model [3][4][5] and the Poisson-inverse Gaussian regression (PIGR) model [6][7][8]. In this study, the PIGR model was chosen because this model performs better when modeling data with high overdispersion. ...
Article
Full-text available
This study aims to develop a method for multivariate spatial overdispersed count data with a mixed Poisson distribution, namely the Geographically Weighted Multivariate Poisson Inverse Gaussian Regression (GWMPIGR) model. The parameters of the GWMPIGR model are estimated locally using the maximum likelihood estimation (MLE) method by considering spatial effects. Therefore, the significance of the regression parameters differs for each location. In this study, four GWMPIGR models are evaluated based on the exposure variable and the spatial weighting function. We compare the performance of those four models in a real-world application using data on the number of infant, under-5 and maternal deaths in East Java in 2019 with five predictor variables. The GWMPIGR models use either one exposure variable or three exposure variables. Compared to the fixed kernel Gaussian weighting function, the GWMPIGR model with the fixed kernel bisquare weighting function and one exposure variable has a better fit based on the AICc value. Furthermore, according to the best GWMPIGR model, several regional groups are formed based on the predictors that significantly affected each event in East Java in 2019.
... However, it is widely known that the Poisson equidispersion (mean equal to variance) premise is usually violated. In fact, to handle the case of overdispersion (variance greater than mean), one may consider the mixed Poisson (MP) models, such as the negative binomial (Lawless, 1987; Hilbe, 2007; Cameron and Trivedi, 2013) and the Poisson-inverse Gaussian (Holla, 1966; Willmot, 1987; Dean et al., 1989) models; for a unified general class of mixed Poisson regression models with varying dispersion/precision, see Barreto-Souza and Simas (2016). Moreover, to deal with the phenomenon of underdispersion (mean greater than variance), one may use the generalized Poisson (Consul and Famoye, 1992; Famoye and Singh, 2006) and the Conway-Maxwell-Poisson (Sellers and Shmueli, 2010) models. ...
Preprint
Full-text available
The premise of independence among subjects in the same cluster/group often fails in practice, and models that rely on such untenable assumption can produce misleading results. To overcome this severe deficiency, we introduce a new regression model to handle overdispersed and correlated clustered counts. To account for correlation within clusters, we propose a Poisson regression model where the observations within the same cluster are driven by the same latent random effect that follows the Birnbaum-Saunders distribution with a parameter that controls the strength of dependence among the individuals. This novel multivariate count model is called Clustered Poisson Birnbaum-Saunders (CPBS) regression. As illustrated in this paper, the CPBS model is analytically tractable, and its moment structure can be explicitly obtained. Estimation of parameters is performed through the maximum likelihood method, and an Expectation-Maximization (EM) algorithm is also developed. Simulation results to evaluate the finite-sample performance of our proposed estimators are presented. We also discuss diagnostic tools for checking model adequacy. An empirical application concerning the number of inpatient admissions by individuals to hospital emergency rooms, from the Medical Expenditure Panel Survey (MEPS) conducted by the United States Agency for Health Research and Quality, illustrates the usefulness of our proposed methodology.
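The excerpt for this entry distinguishes equidispersion, overdispersion and underdispersion by comparing the variance with the mean. A simple descriptive check is the ratio of sample variance to sample mean, sketched below. The function names, the example data and the tolerance are hypothetical, and this is an informal screen rather than a formal test or anything taken from the cited papers.

```python
from statistics import mean, variance
from typing import Sequence


def dispersion_index(counts: Sequence[int]) -> float:
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


def classify_dispersion(counts: Sequence[int], tolerance: float = 0.2) -> str:
    """Label the data using an arbitrary tolerance band around 1."""
    d = dispersion_index(counts)
    if d > 1.0 + tolerance:
        return "overdispersed"
    if d < 1.0 - tolerance:
        return "underdispersed"
    return "approximately equidispersed"


if __name__ == "__main__":
    long_tailed = [0, 0, 1, 1, 2, 9, 14]  # hypothetical counts
    regular = [2, 3, 2, 3, 2, 3]          # hypothetical counts
    print(classify_dispersion(long_tailed))
    print(classify_dispersion(regular))
```

A ratio well above 1 points toward mixed Poisson models such as the negative binomial or Poisson-inverse Gaussian, while a ratio well below 1 points toward models such as the generalized Poisson or Conway-Maxwell-Poisson mentioned in the excerpt.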
... where and . Based on Equations (4) and (5), the PIG distribution is formed with the probability mass function as follows [15][16][17]: (6) where , , and is a modified Bessel function of the third kind. ...
Article
Full-text available
Generalized Additive Models for Location, Scale, and Shape (GAMLSS) is a robust approach used to model various types and characteristics of data. Therefore, this research aims to model the count data using the GAMLSS approach through Poisson Regression (PR), Poisson Inverse Gaussian Regression (PIGR), and Negative Binomial Regression (NBR). PIGR and NBR are the best models compared to PR based on their application to modeling the number of dengue hemorrhagic fever (DHF) cases in East Kalimantan Province, Indonesia, in 2019. Furthermore, both models produced varying results on the factors with a significant effect on DHF. Only one factor of the PIGR model, namely altitude, significantly affected these cases. Meanwhile, the NBR model produced three factors that affected the number of dengue cases: altitude, population density, and health workers.
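The excerpt for this entry refers to a PIG probability mass function written with a modified Bessel function, but its symbols did not survive extraction. As an illustration only, the sketch below computes PIG probabilities without Bessel functions, using a two-term recurrence that follows from the Bessel recurrence K_{v+1}(x) = K_{v-1}(x) + (2v/x) K_v(x). The parametrization (mean mu, dispersion tau, so that Var(Y) = mu + tau * mu^2) is an assumption and need not match the article's Equation (6); the function name is hypothetical.

```python
import math
from typing import List


def pig_pmf(mu: float, tau: float, y_max: int) -> List[float]:
    """Return P(Y=0), ..., P(Y=y_max) for a PIG distribution with mean mu."""
    if mu <= 0 or tau <= 0 or y_max < 0:
        raise ValueError("mu and tau must be positive and y_max non-negative")
    s = 1.0 + 2.0 * tau * mu
    # P(Y=0) is the Laplace transform of the inverse-Gaussian effect at mu.
    probs = [math.exp((1.0 - math.sqrt(s)) / tau)]
    if y_max >= 1:
        probs.append(mu / math.sqrt(s) * probs[0])
    for y in range(2, y_max + 1):
        probs.append(
            (2.0 * tau * mu / s) * (1.0 - 1.5 / y) * probs[y - 1]
            + (mu * mu / s) / (y * (y - 1)) * probs[y - 2]
        )
    return probs


if __name__ == "__main__":
    mu, tau = 2.0, 0.5  # hypothetical parameter values
    p = pig_pmf(mu, tau, 300)
    total = sum(p)
    m = sum(y * py for y, py in enumerate(p))
    v = sum((y - m) ** 2 * py for y, py in enumerate(p))
    print("sum of probabilities:", total)
    print("mean:", m, " variance:", v, " expected variance:", mu + tau * mu**2)
```

Because every term in the recurrence is positive for y ≥ 2, the forward recursion does not suffer from cancellation, which makes it a convenient way to evaluate a log-likelihood when a Bessel routine is unavailable.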
... The reader is also referred to Kleiber and Zeileis [18] for further details about these data. In addition to the Poisson, Bell, Negative Binomial and Bell-Touchard models, we consider the well-known Poisson Inverse Gaussian (PIG) model [see, for example, 19]. A one-parameter version of the generalized Poisson (GePo) model studied by Calderín-Ojeda et al. [20] to examine count data is also considered. ...
Article
The two-parameter Bell–Touchard distribution corresponds to a quite flexible yet tractable parametric family of discrete distributions, which arises prominently as an empirical model for count data. In this paper, we introduce a novel parametric regression model for count response variables on the basis of the mean-parameterized Bell–Touchard distribution, which allows interpretation of the regression coefficients in terms of the expectation of the count response variable, like the generalized linear model framework. The unknown parameters of the mean-parameterized Bell–Touchard regression model are estimated using the traditional maximum likelihood estimation procedure. We also consider the deviance residuals to assess departures from model assumptions. Empirical applications are provided to illustrate the mean-parameterized Bell–Touchard regression model in practice, and comparisons with some popular existing count regression models are made.
... When the count data have been generated under both demographic and environmental processes, such as with catch-curve data, the variance-mean relationship will be more likely quadratic (Engen et al. 1998), providing further support for the use of the GLM NB2 . Under such conditions, other, more complex extensions should ideally also be considered, such as the Poisson-normal (Hinde 1982), the Poisson-inverse Gaussian (Dean et al. 1989), and the hyper-Poisson (Sáez-Castillo and Conde-Sánchez 2013) extensions, on top of the mean-parametrized CMP, although a trade-off has to be made regarding the advantage of considering multiple extensions to obtain a better fit to the count data on the one hand and the analytical time this would require on the other. Our GLM-based method also offers a more reliable statistical approach that should be adopted by fisheries biologists to compare mortality rates between grouping factors, as it outperforms the one currently proposed (Miranda and Bettoli 2007;Ogle 2016), especially given the inadvisable use of the LR method (Smith et al. 2012; this study) on which it is based. ...
Article
Full-text available
Catch‐curve analyses are routinely used to estimate instantaneous mortality (Z) in fish and as the age‐frequency data are often over‐dispersed, the application of a variance bias‐correction factor has been recommended. The extensions of the Poisson generalized linear model (GLMPoisson) may however constitute a better alternative, as they model the variance (SE) in counts more adequately with their specific dispersion parameter for more accurate estimations and statistical comparisons. To test this idea, simulated age‐frequency data generated under four dispersion scenarios were analyzed according to six currently‐available methods and compared with the results of a GLMPoisson and five of its extensions to evaluate each method‐specific bias in Z ± SE estimates. Empirical age‐frequency data from sampled walleye (Sander vitreus) and Arctic charr (Salvelinus alpinus) populations in Québec, Canada, were then used to illustrate the applicability of our GLM‐based method, which relies on Pearson residuals behavior to assess model adequacy and an information‐theoretic approach for model selection. All analyses revealed that Z estimates were generally accurate among the methods considered, except under the most‐likely situation of quadratic over‐dispersion met in ecological studies, for which only the negative binomial type 2 (NB2) and the mean‐parametrized Conway‐Maxwell‐Poisson (CMP) extensions were adequate to estimate both Z and its SE. Linearly over‐dispersed data were best modeled by the negative binomial type 1 and the generalized Poisson (GLMGP) extensions, the GLMCMP and GLMGP were the most appropriate to model under‐dispersed data, whereas the GLMPoisson adequately modeled equi‐dispersed data, similarly as the Chapman and Robson (1960) method. Statistical comparisons of Z ± SE for grouping factors, such as year or site, were correctly achieved when the most adequate and statistically‐supported GLMPoisson extension was applied. 
Altogether, the proposed GLM‐based method should help to circumvent the identified issues related to SE estimation for statistical inferences about mortality rates for fisheries management decision‐making. *** See also: https://sites.google.com/view/catch-curve-analyses/accueil
... See also, Karlis and Xekalaki (2005) for general mixed-Poisson distributions. b The probability generating function can be obtained, see also, Dean et al. (1989). ...
Article
This paper provides a review of the literature regarding methods for constructing prediction intervals for counting variables, with particular focus on those whose distributions are Poisson or derived from Poisson and with an over-dispersion property. Independent and identically distributed models and regression models are both considered. The motivating problem for this review is that of predicting the number of daily and cumulative cases or deaths attributable to COVID-19 at a future date. This article is categorized under: Applications of Computational Statistics > Clinical Trials; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical Models > Generalized Linear Models. Abstract: A review of prediction interval (PI) construction for Poisson regression is provided. Starting from the independent and identically distributed setting and then proceeding to the regression setting, the focus is on the PI construction for the Poisson and over-dispersed models. Lastly, studies on PI construction pertinent to the COVID-19 pandemic are presented.
... Other works characterized and studied the statistical modeling of these subordinated processes (see Hougaard et al., 1997; Teugels, 1972). In particular, the Poisson inverse Gaussian distribution has received considerable attention in the statistical field and has been used in the study of regression models (see Dean et al., 1989; Holla, 1967; Ord & Whitmore, 1986; Rigby et al., 2008; Tremblay, 1992; Willmot, 1987). The link and the variance functions of this example are given as follows ...
Article
Generalized linear models, introduced by Nelder and Wedderburn, made it possible to model the regression of normal and non-normal data. However, these models cannot be analyzed without the explicit form of the variance function. In this paper, we determine the link and variance functions of the natural exponential family generated by the class of subordinated Lévy processes. In this framework, we introduce a class of variance functions that depends on the Lambert function. We call it the Lambert class; it covers the variance functions of the natural exponential families generated by the subordinated gamma processes and by the Lévy processes subordinated by the Poisson subordinator. Notice that the gamma process subordinated by the Poisson one is excluded from this class. The concept of reciprocity in natural exponential families was introduced in order to obtain one exponential family from another. In this context, we obtain the reciprocal class of the natural exponential family generated by the class of subordinated Lévy processes. It is well known that the variance function is an essential element for the determination of the quasi-likelihood and deviance functions. We therefore use the expression of our variance function to obtain them. This leads us to analyze the proposed generalized linear model. We illustrate some of our models with applications to the daily exchange rate returns of the Tunisian Dinar against the US Dollar and the damage incidents of ships.
... The PIG model was first presented by Holla (1967). Information on ML estimation can be found in Dean et al. (1989) and in the references therein. The PIG model has been used in actuarial science (Willmot, 1987; Carlson, 2002) and in linguistics where the zero-truncated marginal form is of particular interest (e.g. ...
Preprint
Full-text available
In this paper we present Poisson mixture approaches for origin-destination (OD) modeling in transportation analysis. We introduce covariate-based models which incorporate different transport modeling phases and also allow for direct probabilistic inference on link traffic based on Bayesian predictions. Emphasis is placed on the Poisson-inverse Gaussian as an alternative to the commonly-used Poisson-gamma and Poisson-lognormal models. We present a first full Bayesian formulation and demonstrate that the Poisson-inverse Gaussian is particularly suited for OD analysis due to desirable marginal and hierarchical properties. In addition, the integrated nested Laplace approximation (INLA) is considered as an alternative to Markov chain Monte Carlo and the two methodologies are compared under specific modeling assumptions. The case study is based on 2001 Belgian census data and focuses on a large, sparsely-distributed OD matrix containing trip information for 308 Flemish municipalities.
... The Negative Binomial Type I (NBI), or Poisson-Gamma, and Poisson-Inverse Gaussian (PIG) have been traditionally employed for modelling count data primarily due to the simplicity of their log-likelihood function that implies that the formality of parameter estimation via easy to implement Maximum Likelihood (ML) estimation procedures is straightforward. See, for instance, Lawless (1987), Cameron and Trivedi (1998) and Hilbe (2008) regarding the former and Ord and Whitmore (1986), Willmot (1987), and Dean et al. (1989) for the latter. Furthermore, alternative mixed Poisson regression models have been proposed for handling different levels of overdispersion, even if the literature on these models is not as abundant as for the NBI and PIG models due to algebraic intractability or because their densities involve special functions and appropriate numerical methods are required for their maximum likelihood (ML) estimation, such as, for example, the Poisson-Lognormal (PLN) regression model, see Denuit et al. (2007) and Boucher et al. (2007), the Poisson Exponential-Inverse Gaussian (PEIG) regression model, see Gómez-Déniz and Calderín-Ojeda (2016), the Poisson-mixed Inverse Gaussian (PMIG) distribution, where the mixed Inverse Gaussian distribution is a mixture of the Inverse Gaussian Finally, as is well known, in most actuarial applications concerning two parameter mixed Poisson distributions employed for modelling claim counts it is commonly assumed that only the mean claim frequency is modelled in terms of covariates. ...
Article
Full-text available
This article presents the Poisson-Inverse Gamma regression model with varying dispersion for approximating heavy-tailed and overdispersed claim counts. Our main contribution is that we develop an Expectation-Maximization (EM) type algorithm for maximum likelihood (ML) estimation of the Poisson-Inverse Gamma regression model with varying dispersion. The empirical analysis examines a portfolio of motor insurance data in order to investigate the efficiency of the proposed algorithm. Finally, both the a priori and a posteriori, or Bonus-Malus, premium rates that are determined by the Poisson-Inverse Gamma model are compared to those that result from the classic Negative Binomial Type I and the Poisson-Inverse Gaussian distributions with regression structures for their mean and dispersion parameters.
... We remark that other forms of potential outcomes can be constructed in a similar manner, i.e., for example the negative binomial potential outcomes can be obtained by placing Gamma priors on [·] i (Lawless, 1987;Hilbe, 2007). Another possible avenue is to attribute an inverse-Gaussian prior on [·] i that would result in a heavy-tailed count behavior (Dean et al., 1989). ...
Preprint
The literature for count modeling provides useful tools to conduct causal inference when outcomes take non-negative integer values. Applied to the potential outcomes framework, we link the Bayesian causal inference literature to statistical models for count data. We discuss the general architectural considerations for constructing the predictive posterior of the missing potential outcomes. Special considerations for estimating average treatment effects are discussed, some generalizing certain relationships and some not yet encountered in the causal inference literature.
... respectively. Willmot, 1989). The above explanation is in line with the values obtained for the Akaike and Schwarz Bayesian information criteria, both of which are lower for the zero-truncated Poisson-inverse Gaussian model, suggesting that the latter model fits the data better. ...
Article
This study sought to analyze the effect of the climate of tourists' region of origin on their length of stay in a specific inland destination as climate of origin has been ignored in previous analyses. The present study collected data from 674 valid surveys of visitors in the selected destination and applied a zero-truncated negative binomial regression model and a Poisson-inverse Gaussian regression model. The results for this destination suggest that climate of origin affects tourists' length of stay. This finding was obtained via the Poisson-inverse Gaussian regression model because of its greater tolerance to long tail distributions. Similarities and differences were found regarding results for other destinations found in the literature. The present findings further include the non-significant effect of reasons for traveling and tourists' satisfaction and the significant influence of tourists' mode of transport, income, and age on length of stay. Cheaper lodging categories also have an important impact on visitors who prefer extended stays.
Article
Coral reefs face a critical crisis worldwide because of rising ocean temperature, excessive use of resources and soil erosion. As reefs have great recreational and tourism value, the degradation of their quality may have a significant effect on tourism. This study employs a contingent behaviour approach to estimate the effect of reef extinction on the recreational demand for Kume Island, Okinawa, Japan. We propose a Poisson‐inverse Gaussian (PIG) model with correction for on‐site sampling issues to derive a more accurate estimate of consumer surplus (CS). The results show that the annual CS per‐person trip is 5898 yen (US$ 49.15 in 2015 currency) according to the random‐effects PIG model.
Chapter
This research paper provides a comprehensive analysis of three distinct Generalized Linear Models (GLMs): the traditional linear regression, the Poisson GLM, and the Poisson-Inverse Gaussian GLM. The study applies these models to the domain of Supply Chain Management for product demand modeling. To evaluate the goodness of fit of our models, we assess them by comparing their performance against the associated deviance function. Our findings indicate that the Poisson-Inverse Gaussian GLM outperforms both the Poisson GLM and the linear regression model in terms of goodness of fit.
Article
In this paper, we study the number of inpatient admissions by individuals to hospital emergency rooms reported by the 2003 Medical Expenditure Panel Survey (MEPS), which the United States Agency for Health Research and Quality conducts. Explanatory variables such as health status, access, use, and costs of health services in the USA are considered. Our main goal is to properly model the number of inpatient admissions according to the geographical US regions as a tool for measuring the volume of diagnostic procedures in the health care system. In the analysis, four clusters were determined according to the regions in the US, namely, the midwest, northeast, south, and west. The clustered analysis of this count data from the MEPS is a novel contribution to the best of our knowledge. Our analysis demonstrated that a clustered negative binomial (CNB) regression (Poisson model with latent gamma effects) might not be a suitable choice for analyzing the MEPS data. This fact motivates us to introduce a new regression model to handle clustered count data. To account for correlation within clusters, we propose a Poisson regression model where the observations within the same cluster are driven by the same latent random effect that follows a Birnbaum-Saunders distribution with a parameter that controls the strength of dependence among the individuals. This novel multivariate count model is called Clustered Poisson Birnbaum-Saunders (CPBS) regression. The CPBS model is analytically tractable and its moment structure can be explicitly obtained. We also derive theoretical/methodological studies to advise when the Birnbaum-Saunders effect should be preferred over the gamma effect (and vice-versa) in terms of probability tail. Estimation is performed through the maximum likelihood method. Here, we also developed an Expectation-Maximization (EM) algorithm for estimation. Simulation results that evaluate the finite-sample performance of our proposed estimators are presented. 
Studies on the potential impact of model misspecification were conducted and comparisons between our model and a CNB regression were also addressed. A full statistical analysis of the MEPS data reveals that, compared to the CNB model, the CPBS regression model produces better results in terms of prediction and goodness-of-fit.
Article
Poisson regression analysis assumes equidispersion, in which the variance equals the mean. When the variance is greater than the mean, the data are said to be overdispersed. An alternative that can be used is negative binomial regression. However, Poisson regression and negative binomial regression are less appropriate when applied to spatial data, or data containing geographical conditions. In this thesis, the researchers suspect that the data have spatial heterogeneity and overdispersion. The Geographically Weighted Negative Binomial Regression (GWNBR) method with Adaptive Gaussian Kernel weights is able to handle this case. The GWNBR method is applied to the number of pulmonary tuberculosis cases in each district/city, with the factors that are thought to affect it. Based on the results, the GWNBR model with Adaptive Gaussian Kernel weights is categorized as a very good model because the models formed differ across the districts/cities of West Java Province: 25 models are formed based on significant predictor variables and 2 models are formed based on insignificant predictor variables.
Article
Full-text available
In this article we present a class of mixed Poisson regression models with varying dispersion arising from mixing distributions that are non-conjugate to the Poisson, for modelling overdispersed claim counts in non-life insurance. The proposed family of models combined with the adopted modelling framework can provide sufficient flexibility for dealing with different levels of overdispersion. For illustrative purposes, the Poisson-lognormal regression model with regression structures on both its mean and dispersion parameters is employed for modelling claim count data from a motor insurance portfolio. Maximum likelihood estimation is carried out via an expectation-maximization type algorithm, which is developed for the proposed family of models and is demonstrated to perform satisfactorily.
Article
This article aims to develop a new linear model for count data, which is called the negative binomial - generalized Lindley (NB-GL) regression model. The NB-GL distribution has been proposed and applied to count data analysis, which is constructed as a mixture of the negative binomial and generalized Lindley distributions. The NB-GL distribution has the special sub-models, such as the negative binomial - Lindley, negative binomial - gamma, and negative binomial - exponential distributions. Parameters of the distribution and its regression model are estimated using a Bayesian approach. The NB-GL regression model is applied to fit real data sets. Its performance is compared with some traditional models. The results show that the generalized linear model for the NB-GL model describes the data sets better than other models.
Article
Full-text available
Multivariate Poisson regression is used to model two or more count response variables. Poisson regression has a strict assumption, namely that the mean and the variance of the response variables are equal (equidispersion). In practice, the variance can be larger than the mean (overdispersion). Thus, a suitable method for modelling this kind of data needs to be developed. One alternative model to overcome the overdispersion issue in multi-count response variables is the Multivariate Poisson Inverse Gaussian Regression (MPIGR) model, which is extended with an exposure variable. Additionally, a modification of the Bessel function that contains factorial functions is proposed in this work to make it computable. The objective of this study is to develop the parameter estimation and hypothesis testing of the MPIGR model. The parameter estimation uses the Maximum Likelihood Estimation (MLE) method, followed by the Newton–Raphson iteration. The hypothesis testing is constructed using the Maximum Likelihood Ratio Test (MLRT) method. The MPIGR model that has been developed is then applied to regress three response variables, i.e., the number of infant deaths, the number of under-five child deaths, and the number of maternal deaths, on eight predictors. The unit of observation is the cities and municipalities in Java Island, Indonesia. The empirical results show that the three response variables are significantly affected by all predictors.
Article
In this paper, we consider the use of the EM algorithm for the fitting of distributions by maximum likelihood to overdispersed count data. In the course of this, we also provide a review of various approaches that have been proposed for the analysis of such data. As the Poisson and binomial regression models, which are often adopted in the first instance for these analyses, are particular examples of a generalized linear model (GLM), the focus of the account is on the modifications and extensions to GLMs for the handling of overdispersed count data.
Article
The Sichel distribution is a three-parameter compound Poisson distribution. It is a versatile model for highly skewed frequency distributions of observed counts and has proved useful in fields as diverse as mining engineering, linguistics, ecology, industrial psychology, and market research. We propose a reparameterization of the Sichel distribution and give an algorithm, which can be implemented on a typical desktop microcomputer, for computing the maximum likelihood estimates of the new parameters. The reparameterization has a number of advantages over the old. In the important two-parameter special case of the Sichel distribution known as the inverse Gaussian Poisson the new parameters are the population mean and a shape parameter, and their maximum likelihood estimators are asymptotically uncorrelated. The reparameterization also lends itself to the convenient multivariate extension presented here. This distribution is well suited for modeling correlated count data whose marginal distributions exhibit the long sparse tails characteristic of the univariate Sichel distribution. Properties and maximum likelihood estimators of this multivariate Sichel distribution are considered. Examples of application for both univariate and bivariate cases are given. Since the Sichel distribution encompasses a number of the well-known discrete distributions as limiting forms (Sichel 1971), the estimates of the parameters sometimes suggest an appropriate limiting form for the data. This is illustrated in one of the examples.
Article
The inverse Gaussian-Poisson (two-parameter Sichel) distribution is useful in fitting overdispersed count data. We consider linear models on the mean of a response variable, where the response is in the form of counts exhibiting extra-Poisson variation, and assume an IGP error distribution. We show how maximum likelihood estimation may be carried out using iterative Newton-Raphson IRLS fitting, where GLIM is used for the IRLS part of the maximization. Approximate likelihood ratio tests are given.
Article
The first concern of this work is the development of approximations to the distributions of crude mortality rates, age-specific mortality rates, age-standardized rates, standardized mortality ratios, and the like for the case of a closed population or period study. It is found that assuming Poisson birthtimes and independent lifetimes implies that the number of deaths and the corresponding midyear population have a bivariate Poisson distribution. The Lexis diagram is seen to make direct use of the result. It is suggested that in a variety of cases, it will be satisfactory to approximate the distribution of the number of deaths given the population size, by a Poisson with mean proportional to the population size. It is further suggested that situations in which explanatory variables are present may be modelled via a doubly stochastic Poisson distribution for the number of deaths, with mean proportional to the population size and an exponential function of a linear combination of the explanatories. Such a model is fit to mortality data for Canadian females classified by age and year. A dynamic variant of the model is further fit to the time series of total female deaths alone by year. The models with extra-Poisson variation are found to lead to substantially improved fits.
Article
We study general properties of the class of exponential dispersion models, which is the multivariate generalization of the error distribution of Nelder and Wedderburn's (1972) generalized linear models. Since any given moment generating function generates an exponential dispersion model, there exists a multitude of exponential dispersion models, and some new examples are introduced. General results on convolution and asymptotic normality of exponential dispersion models are presented. Asymptotic theory is discussed, including a new small‐dispersion asymptotic framework, which extends the domain of application of large‐sample theory. Procedures for constructing new exponential dispersion models for correlated data are introduced, including models for longitudinal data and variance components. The results of the paper unify and generalize standard results for distributions such as the Poisson, the binomial, the negative binomial, the normal, the gamma, and the inverse Gaussian distributions.
Article
Poisson regression models are widely used in analyzing count data. This article develops tests for detecting extra-Poisson variation in such situations. The tests can be obtained as score tests against arbitrary mixed Poisson alternatives and are generalizations of tests of Fisher (1950) and Collings and Margolin (1985). Accurate approximations for computing significance levels are given, and the power of the tests against negative binomial alternatives is compared with those of the Pearson and deviance statistics. One way to test for extra-Poisson variation is to fit models that parametrically incorporate and then test for the absence of such variation within the models; for example, negative binomial models can be used in this way (Cameron and Trivedi 1986; Lawless 1987a). The tests in this article require only the Poisson model to be fitted. Two test statistics are developed that are motivated partly by a desire to have good distributional approximations for computing significance levels. Simulations suggest that one of the statistics should be satisfactory for testing extra-Poisson variation in most practical situations involving Poisson regression models.
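As a rough illustration of a test that needs only the fitted Poisson means, the sketch below computes a statistic of the form Σ{(y − μ̂)² − y} / √(2 Σ μ̂²) and refers it to a standard normal distribution. The abstract gives no formulas, so this particular form, the function names `overdispersion_score_stat` and `upper_tail_pvalue`, and the toy counts are assumptions made for illustration, not the article's own statistics.

```python
import math

def overdispersion_score_stat(y, mu):
    """Score-type statistic for extra-Poisson variation (assumed form).

    y  : observed counts
    mu : fitted means from an ordinary Poisson model
    """
    numerator = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    denominator = math.sqrt(2.0 * sum(mi ** 2 for mi in mu))
    return numerator / denominator

def upper_tail_pvalue(z):
    """One-sided p-value from the standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Toy data (hypothetical). With an intercept-only Poisson model the
# fitted mean of every observation is the sample mean.
counts = [0, 1, 2, 3, 10]
fitted = [sum(counts) / len(counts)] * len(counts)

stat = overdispersion_score_stat(counts, fitted)
p_value = upper_tail_pvalue(stat)
print(round(stat, 3), p_value)
```

Large positive values point towards overdispersion. The normal approximation may be poor in small samples, which is presumably why the article stresses accurate approximations for significance levels.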
Article
This paper reviews the development of the inverse Gaussian distribution and of statistical methods based upon it from the paper of Schrödinger (1915) to the present (1978). After summarizing the properties of the distribution, the paper presents tests of hypotheses, estimation, confidence intervals, regression and “analysis of variance” based upon the inverse Gaussian. Its potential role in reliability work is discussed and work on Bayesian statistics is reviewed briefly. An extensive set of references to the distribution is given.
Article
An inverse Gaussian mixture of Poisson distributions (the P-IG distribution) is considered as a model for species abundance data. Minimum chi-square and maximum likelihood methods of estimation for the zero-truncated P-IG distribution are developed. The performance of the P-IG distribution is illustrated and discussed for several well-known sets of insect abundance data.
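To make the mixture concrete, the sketch below evaluates P-IG probabilities by numerically integrating a Poisson probability against an inverse Gaussian density, then forms the zero-truncated version by dividing by 1 − P(Y = 0). The (mu, lam) parameterization of the inverse Gaussian (mean mu, variance mu³/lam), the function names, the quadrature settings, and the parameter values are all assumptions chosen for illustration; the abstract does not specify them, and this is not the estimation method of the paper.

```python
import math

def ig_logpdf(t, mu, lam):
    """Log density of an inverse Gaussian with mean mu and parameter lam."""
    return (0.5 * math.log(lam / (2.0 * math.pi * t ** 3))
            - lam * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

def pig_pmf(k, mu, lam, upper=40.0, steps=4000):
    """P(Y = k) when Y | t is Poisson(t) and t is inverse Gaussian.

    Uses a plain midpoint rule on (0, upper); the settings are
    illustrative and would need checking for other parameter values.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        log_poisson = -t + k * math.log(t) - math.lgamma(k + 1)
        total += math.exp(log_poisson + ig_logpdf(t, mu, lam))
    return total * h

def zero_truncated(pmf):
    """Probabilities for k = 1, 2, ... conditional on Y > 0."""
    p0 = pmf[0]
    return [p / (1.0 - p0) for p in pmf[1:]]

mu, lam = 1.0, 2.0  # hypothetical parameter values
pmf = [pig_pmf(k, mu, lam) for k in range(41)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k * k * p for k, p in enumerate(pmf)) - mean ** 2
truncated = zero_truncated(pmf)

# The mixture keeps the mean mu but inflates the variance to mu + mu**3 / lam.
print(round(mean, 4), round(var, 4), round(mu + mu ** 3 / lam, 4))
```

Direct quadrature is slow compared with a closed form, but it makes the definition of the mixture explicit and gives a simple check on the mean and variance.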
Article
A modification of the iterated reweighted least squares scheme of D. A. Williams conveniently accommodates extra-Poisson variation when fitting log-linear models to tables of frequencies or rates. The method is applied to the analysis of cancer death rates by age and birth cohort and to testing for mutagenic effects in a standard bioassay. A set of macros for implementing the two procedures with GLIM is given.
Article
The standard contagious distributions (see Douglas 1980) have been used in such varied fields as biology and automobile insurance, often to model various physical phenomena as well as provide a good fit to count data when other models are inadequate. Unfortunately, the parameterizations often used when working with these distributions normally lead to extremely high correlations of the maximum likelihood estimators (MLE's). This tends to lead to mathematical complexities, and causes difficulty or even errors in their interpretation. Furthermore, numerical difficulties may arise when using numerical procedures to locate the estimates. Some of these difficulties were discussed by Douglas (1980, pp. 171, 204-205), who suggested that a reparameterization to reduce or even eliminate such correlation is desirable. If the MLE's are asymptotically uncorrelated, the parameterization is orthogonal. Philpot (1964) derived an orthogonal parameterization for the Neyman Type A distribution; Stein, Zucchini, and Juritz (1987) derived one for the Poisson mixture by the inverse Gaussian distribution. Parameter orthogonality has several attractive features in the present context. Since there is no correlation asymptotically, the estimates (with their standard errors) provide a simpler summary of the data than in the absence of such orthogonality. The use of a parameterization where the MLEs are highly correlated can lead to a misleading analysis, or at best a more complicated analysis than would be necessary if an orthogonal parameterization had been used. To the extent that a high correlation exists, the parameters involved tend to measure similar quantities, and orthogonality separates information about the parameters from each other. This article gives an orthogonal parameterization for a large family of discrete distributions, including many of the contagious distributions, some Poisson mixtures, and some other generalized distributions.
The previously cited works are unified and extended, for example, to the Polya-Aeppli, Poisson-binomial, and Sichel's Poisson-generalized inverse Gaussian distribution. One of the orthogonal parameters is the mean, and in many applications it is of interest to express the mean as a function of relevant covariate information. For example, Hinde (1982) considered some of these distributions in a regression context. This article shows how the results may be extended to deal with the covariate case in a relatively straightforward manner. Consequently, a convenient parameterization exists for a large family of distributions in a wide variety of situations. Some numerical examples are given, and a simple algorithm is given to find the maximum likelihood estimates in the case of no covariates.
Article
The basic distributional properties and estimation techniques of the Poisson-Inverse Gaussian (P-IG) distribution are reviewed. Its use as both a mixed and a compound claim frequency model is also discussed, and the aggregate claims distribution when the P-IG is the claim frequency component is reviewed. The many properties which are analogous to those of the negative binomial are emphasized, and the superior fit to automobile claim frequency data is demonstrated. The P-IG merits consideration as a model for claim frequency data due to its good fit to data, physical justification, and its abundance of convenient mathematical properties.
Article
When count data show extra-Poisson variation, standard log-linear techniques to analyse the data may fail. In this paper a generalization of the log-linear modelling technique is proposed for the negative binomial model, as an extension of the Poisson model. An illustration is given by the analysis of a two-way classification of soldering failure data; extensions to more general classifications are possible.
Article
A number of methods have been proposed for dealing with extra-Poisson variation when doing regression analysis of count data. This paper studies negative-binomial regression models and examines efficiency and robustness properties of inference procedures based on them. The methods are compared with quasilikelihood methods.
Article
Godambe (1985, 1987) obtained the optimal combination of ‘orthogonal’ estimating functions. Using these results here we extend the concept and technique of quasi-likelihood estimation (Wedderburn, 1974), incorporating possible knowledge of the skewness, kurtosis and higher moments of the underlying distribution. This is done by defining the extended quasi-score function. The definition includes as a special case the quasi-score function, which was implicit in Wedderburn's (1974) quasi-likelihood function but was made explicit as a ‘pseudo-score function’ by Godambe (1985). A close parallel, conceptual and operational, between the ‘extended quasi-score function’ and the ‘score function’ is established. It is pointed out that the former is the natural substitute for the latter when the likelihood function is undefined, as for instance in semi-parametric models.
Article
A classical result due to Wilks [1] on the distribution of the likelihood ratio $\lambda$ is the following. Under suitable regularity conditions, if the hypothesis that a parameter $\theta$ lies on an $r$-dimensional hyperplane of $k$-dimensional space is true, the distribution of $-2 \log \lambda$ is asymptotically that of $\chi^2$ with $k - r$ degrees of freedom. In many important problems it is desired to test hypotheses which are not quite of the above type. For example, one may wish to test whether $\theta$ is on one side of a hyperplane, or to test whether $\theta$ is in the positive quadrant of a two-dimensional space. The asymptotic distribution of $-2 \log \lambda$ is examined when the value of the parameter is a boundary point of both the set of $\theta$ corresponding to the hypothesis and the set of $\theta$ corresponding to the alternative. First the case of a single observation from a multivariate normal distribution, with mean $\theta$ and known covariance matrix, is treated. The general case is then shown to reduce to this special case where the covariance matrix is replaced by the inverse of the information matrix. In particular, if one tests whether $\theta$ is on one side or the other of a smooth $(k - 1)$-dimensional surface in $k$-dimensional space and $\theta$ lies on the surface, the asymptotic distribution of $\lambda$ is that of a chance variable which is zero half the time and which behaves like $\chi^2$ with one degree of freedom the other half of the time.
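The closing result, a statistic that is zero half the time and behaves like chi-square with one degree of freedom otherwise, translates into a simple p-value rule. The following is a minimal sketch, assuming the value supplied is the likelihood ratio statistic for a single parameter tested on its boundary; the function names are hypothetical.

```python
import math

def chi2_one_df_upper_tail(t):
    """P(chi-square with 1 d.f. > t), using P(|Z| > sqrt(t)) for standard normal Z."""
    if t <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(t / 2.0))

def boundary_lrt_pvalue(stat):
    """p-value under an equal mixture of a point mass at 0 and chi-square(1)."""
    if stat <= 0.0:
        # The point mass at zero means every outcome is at least this large.
        return 1.0
    return 0.5 * chi2_one_df_upper_tail(stat)

# Compare the usual chi-square(1) p-value with the boundary-adjusted one.
observed = 2.705543  # hypothetical value of the statistic
print(chi2_one_df_upper_tail(observed), boundary_lrt_pvalue(observed))
```

For a positive statistic the boundary-adjusted p-value is half the usual one, so ignoring the boundary makes the test conservative.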
Article
A report is presented on some statistical properties of the family of probability density functions $\exp[-\lambda(x - \mu)^2/2\mu^2 x]\,[\lambda/2\pi x^3]^{1/2}$ for a variate $x$ and parameters $\mu$ and $\lambda$, with $x, \mu, \lambda$ each confined to $(0, \infty)$. The expectation of $x$ is $\mu$, while $\lambda$ is a measure of relative precision. The chief result is that the ml estimators of $\mu$ and $\lambda$ have stochastically independent distributions, and are of a nature which permits of the construction of an analogue of the analysis of variance for nested classifications. The ml estimator of $\mu$ is the sample mean, and for a fixed sample size $n$ its distribution is of the same family as $x$, with the same $\mu$ but with $\lambda$ replaced by $\lambda n$. The distribution of the ml estimator of the reciprocal of $\lambda$ is of the chi-square type. The probability distribution of $1/x$, and the estimation of certain functions of the parameters in heterogeneous data, are also considered.
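The density and the estimators described above can be written out directly. The sketch below is illustrative: the function names are hypothetical, the sample is made up, and the closed form used for the estimator of the reciprocal of lambda (the average of 1/x_i − 1/x̄) is the standard textbook expression rather than something quoted in the abstract.

```python
import math

def ig_logpdf(x, mu, lam):
    """Log of exp[-lam (x - mu)^2 / (2 mu^2 x)] * [lam / (2 pi x^3)]^(1/2)."""
    return (0.5 * math.log(lam / (2.0 * math.pi * x ** 3))
            - lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

def ig_mle(xs):
    """Maximum likelihood estimates (mu_hat, lam_hat) from positive data.

    mu_hat is the sample mean; 1 / lam_hat is the mean of 1/x_i - 1/mu_hat.
    Fails with a division by zero if all observations are identical.
    """
    n = len(xs)
    mu_hat = sum(xs) / n
    inv_lam_hat = sum(1.0 / x - 1.0 / mu_hat for x in xs) / n
    return mu_hat, 1.0 / inv_lam_hat

def loglik(xs, mu, lam):
    """Log-likelihood of an independent sample."""
    return sum(ig_logpdf(x, mu, lam) for x in xs)

sample = [1.0, 2.0, 4.0]  # hypothetical observations
mu_hat, lam_hat = ig_mle(sample)
print(mu_hat, lam_hat, loglik(sample, mu_hat, lam_hat))
```

Evaluating `loglik` at nearby parameter values is a quick way to confirm that the closed-form estimates sit at the maximum.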
Article
In some cases maximum quasi-likelihood estimation, which is at the core of GLIM, can fail to give reasonable results. A more general class of estimating functions is investigated which avoids such failure and also is not restricted to the particular forms of mean and variance function of GLIM.
Anscombe, F. (1950). Sampling theory of the negative binomial and logarithmic series distributions. Biometrika.
Breslow, N. (1984). Extra-Poisson variation in log-linear models. Appl. Statist., 33, 38-44.
Brillinger, D.R. (1986). The natural variability of vital rates and associated statistics (with Discussion).
Comments on “An extension of quasilikelihood estimation”, by Godambe and Thompson
  • Dean C.