Dipak Dey

University of Connecticut | UConn · Department of Statistics

Ph.D.

About

367
Publications
41,334
Reads
10,634
Citations
Citations since 2017
89 Research Items
3437 Citations
Additional affiliations
August 1985 - present
University of Connecticut
Position
  • Board of Trustees Distinguished Professor

Publications (367)
Article
Full-text available
While both zero-inflation and unobserved heterogeneity in risks are prevalent issues in modeling insurance claim counts, determining the Bayesian credibility premium of claim counts with these features is often demanding due to the high computational costs associated with the use of MCMC. This article explores a way to approximate credibility p...
Article
Full-text available
In this paper, we investigate several random structures, namely two classes of random lobster trees (RLTs) and a class of random spider trees (RSTs). The first class of RLTs grow with a fixed probability, whereas those from the second class evolve in a dynamic manner underlying a flavor of semi-opposite reinforcement. For these two classes, we char...
Article
Full-text available
The gamma distribution has been extensively used in many areas of applications. In this paper, considering a Bayesian analysis we provide necessary and sufficient conditions to check whether or not improper priors lead to proper posterior distributions. Further, we also discuss sufficient conditions to verify if the obtained posterior moments are f...
Article
Full-text available
Using objective priors in Bayesian applications has become a common practice to analyze data without subjective information. Formal rules usually obtain these prior distributions, and the data provide the dominant information in the posterior distribution. However, these priors are typically improper and may lead to improper posterior. Here, for a...
Article
Continuous clustered proportion data often arise in various areas of the social and political sciences where the response variable of interest is a proportion (or percentage). An example is the behavior of the proportion of voters favorable to a political party in municipalities (or cities) of a country over time. This behavior can be different dep...
Article
This paper develops recurrence relations for integrals that relate the density of multivariate extended skew-normal (ESN) distribution, including the well-known skew-normal (SN) distribution introduced by Azzalini and Dalla-Valle (1996) and the popular multivariate normal distribution. These recursions offer a fast computation of arbitrary order pr...
Article
Spatio-temporal Poisson models are commonly used for disease mapping. However, after incorporating the spatial and temporal variation, the data do not necessarily have equal mean and variance, suggesting either over- or under-dispersion. In this paper, we propose the Spatio-temporal Conway Maxwell Poisson model. The advantage of Conway Maxwell Pois...
Article
Nonlinear mixed effects models have received a great deal of attention in the statistical literature in recent years because of their flexibility in handling longitudinal studies, including human immunodeficiency virus viral dynamics, pharmacokinetic analyses, and studies of growth and decay. A standard assumption in nonlinear mixed effects models...
Article
In this paper we propose a statistical modeling framework that contributes to advancing methods for modeling insurance policy premium in the actuarial literature. Specification of separate frequency and severity models, accounting for territorial risk and performing accurate inference, are some of the challenges an actuary faces while modeling poli...
Article
The Heckman selection model is perhaps the most popular econometric model in the analysis of data with sample selection. The analyses of this model are based on the normality assumption for the error terms; however, in some applications, the distribution of the error term departs significantly from normality, for instance, in the presence of heavy...
Article
We present a scalable Bayesian modelling approach for identifying brain regions that respond to a certain stimulus and use them to classify subjects. More specifically, we deal with multi‐subject electroencephalography (EEG) data with a binary response distinguishing between alcoholic and control groups. The covariates are matrix‐variate with me...
Article
Full-text available
The inability to distinguish aggressive from indolent prostate cancer is a longstanding clinical problem. Prostate specific antigen (PSA) tests and digital rectal exams cannot differentiate these forms. Because only ∼10% of diagnosed prostate cancer cases are aggressive, existing practice often results in overtreatment including unnecessary surgeri...
Article
This is the birth centenary year of the living legend and giant in the world of statistics, Prof. Calyampudi Radhakrishna (C R) Rao. This article is a partial reflection of Dr. Rao’s contributions to statistical theory and methodology, including unbiased estimation, variance reduction by sufficiency, efficiency of estimation, information geometry,...
Article
Full-text available
For existing Bayesian cross-validated measure of influence of each observation on the posterior distribution, this paper considers a generalization using the Bregman Divergence (BD). We investigate various practically useful and desirable properties of these BD based measures to demonstrate the superiority of these measures compared to existing Bay...
Article
Full-text available
Multivariate regression techniques are commonly applied to explore the associations between large numbers of outcomes and predictors. In real-world applications, the outcomes are often of mixed types, including continuous measurements, binary indicators, and counts, and the observations may also be incomplete. Building upon the recent advances in m...
Chapter
One of the fundamental steps in statistical modeling is to select the best-fitting model from a set of candidate models for given data. In this paper, based on Bayesian decision theory, we introduce a new model selection criterion, called Bregman divergence criterion (BDC). The proposed criterion improves many existing Bayesian model selection meth...
Article
Full-text available
This article introduces a novel use of the vine copula which captures dependence among multi-line claim triangles, especially when an insurance portfolio consists of more than two lines of business. First, we suggest a way to choose an optimal joint loss development model for multiple lines of business that considers marginal distribution, vine cop...
Preprint
Full-text available
Multivariate regression techniques are commonly applied to explore the associations between large numbers of outcomes and predictors. In real-world applications, the outcomes are often of mixed types, including continuous measurements, binary indicators, and counts, and the observations may also be incomplete. Building upon the recent advances in m...
Preprint
Full-text available
This paper develops recurrence relations for integrals that relate the density of multivariate extended skew-normal (ESN) distribution, including the well-known skew-normal (SN) distribution introduced by Azzalini and Dalla-Valle (1996) and the popular multivariate normal distribution. These recursions offer a fast computation of arbitrary order pr...
Preprint
Full-text available
While the hurdle Poisson regression is a popular class of models for count data with excessive zeros, the link function in the binary component may be unsuitable for highly imbalanced cases. Ordinary Poisson regression is unable to handle the presence of dispersion. In this paper, we introduce the Conway-Maxwell-Poisson (CMP) distribution and integrate...
Article
In this article, mixed-effects state space models (MESSM, [Liu D, Lu T, Niu X-F, et al. Mixed-effects state-space models for analysis of longitudinal dynamic systems. Biometrics. 2011;67(2):476–485.]) are revisited. MESSM can be considered as an alternative to study the HIV dynamic in a longitudinal data environment, defining the mixed-effects comp...
Preprint
Full-text available
For a portfolio of life insurance policies observed for a stated period of time, e.g., one year, mortality is typically a rare event. When we examine the outcome of dying or not from such portfolios, we have an imbalanced binary response. The popular logistic and probit regression models can be inappropriate for imbalanced binary response as model...
Preprint
Full-text available
Nonlinear mixed effects models have received a great deal of attention in the statistical literature in recent years because of their flexibility in handling longitudinal studies, including human immunodeficiency virus viral dynamics, pharmacokinetic analyses, and studies of growth and decay. A standard assumption in nonlinear mixed effects models...
Article
With the advent of modern technologies, it is increasingly common to deal with data of large dimensions in various scientific fields of study. In this paper, we develop a Bayesian approach for the classification of multi‐subject high‐dimensional electroencephalography (EEG) data. In this EEG data, we have a matrix of covariates corresponding to eac...
Preprint
Full-text available
The Heckman selection model is perhaps the most popular econometric model in the analysis of data with sample selection. The analyses of this model are based on the normality assumption for the error terms; however, in some applications, the distribution of the error term departs significantly from normality, for instance, in the presence of heavy tail...
Preprint
Full-text available
The use of objective priors in Bayesian applications has become a common practice to analyze data without subjective information. Formal rules usually obtain these prior distributions, and the data provide the dominant information in the posterior distribution. However, these priors are typically improper and may lead to improper posterior. Here, w...
Preprint
Full-text available
This paper proposes a general modeling framework that allows for uncertainty quantification at the individual covariate level and spatial referencing, operating within a double generalized linear model (DGLM). DGLMs provide a general modeling framework allowing dispersion to depend in a link-linear fashion on chosen covariates. We focus on working...
Article
We provide a fully Bayesian approach to conduct estimation and inference for a copula model to jointly analyze bivariate mixed outcomes. To obtain posterior samples, we use Hamiltonian Monte Carlo, which avoids the random walk behavior of Metropolis and Gibbs sampling algorithms. We also provide an empirical Bayes approach to estimate the copula pa...
Article
Spatial modeling of consumer response data has gained increased interest recently in the marketing literature. In this paper we extend the (spatial) multi-scale model by incorporating both spatial and temporal dimensions in the dynamic multi-scale spatiotemporal modeling approach. Our empirical application with a US company’s catalog purchase data...
Article
Full-text available
In this paper, we introduce a new approach to generate flexible parametric families of distributions. These models arise in competitive and complementary risk scenarios, in which the lifetime associated with a particular risk is not observable; rather, we observe only the minimum/maximum lifetime value among all risks. The latent variables have a z...
Preprint
Full-text available
The Tweedie exponential dispersion family constitutes a fairly rich sub-class of the celebrated exponential family. In particular, one member, the compound Poisson gamma (CP-g) model, has seen extensive use over the past decade for modeling mixed responses featuring exact zeros with a continuous response from a gamma distribution. This paper proposes a framewo...
Article
Background and Aim: We aim to build a classifier to distinguish between malaria-infected red blood cells (RBCs) and healthy cells using the two-dimensional (2D) microscopic images of RBCs. We demonstrate the process of cell segmentation and feature extraction from the 2D images. Methods and Materials: We describe an approach to address the problem...
Preprint
Full-text available
This article introduces a novel use of the vine copula which captures dependence among multi-line claim triangles, especially when an insurance portfolio consists of more than three lines of business. First, we suggest a way to choose an optimal joint loss development model for multiple lines of business which considers marginal distribution, vine copula...
Article
Response variables in medical sciences are often bounded, e.g. proportions, rates or fractions of incidence of some disease. In this work, we are interested in studying whether some characteristics of the population, e.g. sex and race, can explain the incidence rate of colorectal cancer cases. To accommodate such responses, we propose a new class of r...
Preprint
This work develops a valid spatial block-Nearest Neighbor Gaussian process (block-NNGP) for estimation and prediction of location-referenced large spatial datasets. The key idea behind our approach is to subdivide the spatial domain into several blocks which are dependent under some constraints. The cross-blocks capture the large-scale spatial vari...
Article
An authentic way of assessing the goodness of a model is to estimate its predictive capability. In this paper, we propose the D-measure, which measures the goodness of a model by comparing how close its predictions are to the observed data based on the survival function. The proposed D-measure can be used for all kinds of survival data in the pr...
Article
Because of the immense technological advances, very often we encounter data in high dimensions. Any set of measurements taken at multiple time points for multiple subjects leads to data of more than two dimensions (a matrix of covariates for each subject). We present a Bayesian variable‐selection method to identify the active regions in the brain a...
Article
In this work, we propose a flexible cure rate model to allow for spatial correlations by including spatial frailty in the interval-censored data setting. The proposed model is quite flexible and generalizes the Bernoulli, geometric, Poisson, and logarithmic models. It can be tested for the best fit in a straightforward way. Our approach enables dif...
Article
From fitting (training) a multiple linear regression model with p basis function predictors (e.g., polynomial, trigonometric) we study a type of confidence band covering an entire set of the response means, where a constant (C) is utilized to scale the individual confidence interval of each response mean under consideration. We prove that the coverage...
Article
In binary regression, imbalanced data result from the presence of values equal to zero (or one) in a proportion that is significantly greater than the corresponding real values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results based on simulati...
Article
Full-text available
In some applications of censored regression models, the distribution of the error terms departs significantly from normality, for instance, in the presence of heavy tails, skewness and/or atypical observation. In this paper we extend the censored linear regression model with normal errors to the case where the random errors follow a finite mixture...
Article
Link functions and random effects structures are the two important components in building flexible regression models for dependent ordinal data. The power link functions include the commonly used links as special cases but have an additional skewness parameter making the probability response curves adaptive to the data structure. It overcomes the a...
Article
This paper proposes a Bayesian hierarchical cure rate survival model for spatially clustered time to event data. We consider a mixture cure rate model with covariates and a flexible (semi)parametric baseline survival distribution for uncured individuals. The spatial correlation structure is introduced in the form of frailties which follow a Multiva...
Article
We propose a penalized generalized estimating equations framework to jointly model correlated bivariate binary and continuous outcomes involving multiple predictor variables. We use sparsity-inducing penalty functions to simultaneously estimate the regression coefficients and perform variable selection on the predictors, and use cross-validation to...
Article
In this paper, we investigate the degree profile and Gini index of random caterpillar trees (RCTs). We consider RCTs which evolve in two different manners: uniform and nonuniform. The degrees of the vertices on the central path (i.e., the degree profile) of a uniform RCT follow a multinomial distribution. For nonuniform RCTs, we focus on those gro...
Article
Multivariate outcomes with multivariate features of possibly high dimension are routinely produced in various fields. In many real-world problems, the collected outcomes are of mixed types, including continuous measurements, binary indicators and counts, and a substantial proportion of values may also be missing. Regardless of their types, these mi...
Article
For sparse and high‐dimensional data analysis, a valid approximation of the ℓ0‐norm has played a key role. However, there is not much study on ℓ0‐norm approximation in the Bayesian literature. In this article, we introduce a new prior, called Gaussian and diffused‐gamma prior, which leads to a nice ℓ0‐norm approximation under the maximum a posteriori est...
Article
Full-text available
Attempts have been made to define new classes of distributions that provide more flexibility for modelling skewed data in practice. In this work we define a new extension of the generalized gamma distribution (Stacy, The Annals of Mathematical Statistics, 33, 1187-1192, 1962) for Marshall-Olkin generalized gamma (MOGG) distribution, based on the ge...
Preprint
Full-text available
In this paper, we introduce a new approach to generate flexible parametric families of distributions. These models arise in competitive and complementary risk scenarios, in which the lifetime associated with a particular risk is not observable; rather, we observe only the minimum/maximum lifetime value among all risks. The latent variables have a z...
Preprint
In this paper, we investigate the degree profile and Gini index of random caterpillar trees (RCTs). We consider RCTs which evolve in two different manners: uniform and nonuniform. The degrees of the vertices on the central path (i.e., the degree profile) of a uniform RCT follow a multinomial distribution. For nonuniform RCTs, we focus on those grow...
Preprint
Full-text available
In this paper, we showed that the no-arbitrage condition holds if the market follows the mixture of the geometric Brownian motion (GBM). The mixture of GBM can incorporate heavy-tail behavior of the market. It automatically leads us to model the risk and return of multiple asset portfolios via the nonparametric Bayesian method. We present a Dirichl...
Article
Full-text available
In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other a...
Article
This article studies autoregressive (AR) models assuming innovations with scale mixtures of skew-normal (SMSN) distributions, an attractive and flexible family of probability distributions. A Bayesian analysis considering informative prior distributions is presented. Comprehensive simulation studies are performed to support the performance of the p...
Article
In many studies that involve time series variables, limited or censored data are naturally collected. Practitioners commonly disregard censored data cases or replace these observations with some function of the limit of detection, which often results in biased estimates. In this article we propose an analytically tractable and efficient stochasti...
Article
Full-text available
It is important for a portfolio manager to estimate and analyze recent portfolio volatility to keep the portfolio's risk within limits. Although the number of financial instruments in a portfolio can be very large, sometimes in the thousands, the daily returns considered for analysis often cover only a month or even less. In this case, the rank of the portfolio co...
Article
Environmental data are often spatially correlated and sometimes include observations below or above detection limits (i.e., censored values reported as less or more than a level of detection). Existing research studies mainly concentrate on parameter estimation using Gibbs sampling, and most research studies conducted from a frequentist perspective...
Article
In this paper, we develop some clique-based methods for social network clustering. The quality of a clustering result is measured by a novel clique-based index, which is adapted from the modularity index proposed in [Newman 2006]. We design an effective algorithm based on recursive bipartition in order to maximize the objective function of the prop...
Article
We propose a novel model-based method for social network clustering in this paper. More precisely, we cluster a set of entities in a social network into disjoint communities based on a newly adopted distance function. Our model not only allows mixed membership for each entity, but also provides reliable statistical inference on network structure. We d...
Article
We develop a Bayes factor-based approach for the design of non-inferiority clinical trials with a focus on controlling type I error and power. Historical data are incorporated in the Bayesian design via the power prior discussed in Ibrahim and Chen (Stat Sci 15:46–60, 2000). The properties of the proposed method are examined in detail. An efficient...
Article
The purpose of this paper is to develop a Bayesian analysis for the zero-inflated hyper-Poisson model. Markov chain Monte Carlo methods are used to develop a Bayesian procedure for the model and the Bayes estimators are compared by simulation with the maximum likelihood estimators. Regression modeling and model selection are also discussed and case...
Article
Full-text available
In multivariate regression models, a sparse singular value decomposition of the regression component matrix is appealing for reducing dimensionality and facilitating interpretation. However, the recovery of such a decomposition remains very challenging, largely due to the simultaneous presence of orthogonality constraints and co-sparsity regulariza...