## About

Publications: 112

Reads: 20,188

Citations: 3,996

## Publications

This new edition of the successful multi-disciplinary text Statistical Modelling in GLIM takes into account new developments in both statistical software and statistical modelling. Including three new chapters on mixture and random effects models, it provides a comprehensive treatment of the theory of statistical modelling with generalised linear m...

R is now the most widely used statistical package/language in university statistics departments and many research organisations. Its great advantages are that for many years it has been the leading-edge statistical package/language and that it can be freely downloaded from the R web site. Its cooperative development and open code also attract many...

We consider discrete mortality data for groups of individuals observed over time. The fitting of cumulative mortality curves as a function of time involves the longitudinal modelling of the multinomial response. Typically such data exhibit overdispersion, that is, greater variation than predicted by the multinomial distribution. To model the extra-m...

A common feature of much survival data is censoring due to incompletely observed lifetimes. Survival analysis methods and models have been designed to take account of this and provide appropriate relevant summaries, such as the Kaplan–Meier plot and the commonly quoted median survival time of the group under consideration. However, a single summary...
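The Kaplan–Meier estimate and the median survival time mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of the paper: the function names and the small data set are made up for the example, and ties between events and censorings at the same time are handled by treating everyone observed at that time as still at risk.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  -- observed follow-up times
    events -- 1 if the event was observed, 0 if the time is censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """Smallest event time at which the estimated survival is <= 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached, e.g. because of heavy censoring


# Made-up illustrative data (not from any study).
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

When censoring is heavy the curve may never drop to 0.5, in which case the sketch returns `None`, which is one reason a single summary such as the median can be uninformative.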

A virtual interview with Murray Aitkin by Brian Francis and John Hinde, two of the original members of the Centre for Applied Statistics that Murray created at Lancaster University. The talk ranges over Murray's reflections on a career in statistical modelling and the many different collaborations across the world that have been such a significant...

We consider the analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: over‐dispersion (OD) models and zero‐inflation (ZI) models, both of which can be seen as generalisations of the Poisson distribution; we refer...
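As a rough illustration of the zero-inflation idea, the sketch below codes the usual zero-inflated Poisson (ZIP) probability function, in which a proportion `pi` of structural zeros is mixed with a Poisson(`lam`) count. The function names and parameter values are illustrative assumptions, not taken from the paper.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with zero-inflation pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either structurally or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the ZIP distribution."""
    mean = (1.0 - pi) * lam
    var = mean * (1.0 + pi * lam)
    return mean, var


# Illustrative values only.
lam, pi = 2.0, 0.3
p_zero_zip = zip_pmf(0, lam, pi)
p_zero_poisson = math.exp(-lam)
mean, var = zip_mean_var(lam, pi)
```

Because the variance equals the mean times `1 + pi * lam`, any `pi > 0` also induces overdispersion relative to the Poisson, which is why the ZI and OD views can be hard to separate in practice.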

Background and Objective: Observational studies and experiments in medicine, pharmacology, and agronomy are often concerned with assessing whether different methods/raters produce similar values over time when measuring a quantitative variable. This paper aims to describe the statistical package lcc, for R, that can be used to estimate the ex...

We consider models underlying regression analysis of count data in which the observed frequency of zero counts is unusually large, typically with respect to the Poisson distribution. We focus on two alternative modelling approaches: Over-Dispersion (OD) models, and Zero-Inflation (ZI) models, both of which can be seen as generalisations of the Pois...

Transition models are an important framework that can be used to model longitudinal categorical data. They are particularly useful when the primary interest is in prediction. The available methods for this class of models are suitable for the cases in which responses are recorded individually over time. However, in many areas, it is common for cate...

The orange variety "x11", which is a spontaneous mutant of the sweet orange, has a short juvenile period with early flowering. The data used in this paper are from a randomized design experiment that aimed to assess the plants' flowering characteristics when grafted onto two different varieties of lemon rootstock. The plants were pruned in each of...

Survival models have been extensively used to analyse time-until-event data. There is a range of extended models that incorporate different aspects, such as overdispersion/frailty, mixtures, and flexible response functions through semi-parametric models. In this work, we show how a useful tool to assess goodness-of-fit, the half-normal plot of resi...

When using univariate models, goodness-of-fit can be assessed through many different methods, including graphical tools such as half-normal plots with a simulation envelope. This is straightforward due to the notion of ordering of a univariate sample, which can readily reveal possible outliers. In the bivariate case, however, it is often difficult...

The lcc package is available from GitHub (https://github.com/Prof-ThiagoOliveira/lcc) and from CRAN (https://cran.r-project.org/web/packages/lcc/index.html).

We propose a flexible class of regression models for continuous bounded data based on second-moment assumptions. The mean structure is modelled by means of a link function and a linear predictor, while the mean and variance relationship has the form $\phi\mu^p(1-\mu)^p$, where $\mu$, $\phi$ and $p$ are the mean, dispersion and power parameters respectively. The models a...

A Weibull-model-based approach is examined to handle under- and overdispersed discrete data in a hierarchical framework. This methodology was first introduced by Nakagawa and Osaki (1975, IEEE Transactions on Reliability, 24, 300–301), and later examined for under- and overdispersion by Klakattawi et al. (2018, Entropy, 20, 142) in the univariate c...

The maturity stages of papaya fruit based on peel color are frequently characterized from a sample of four points on the equatorial region measured by a colorimeter. However, this procedure may not be suitable for assessing the papaya’s overall mean color and an alternative proposal is to use image acquisition of the whole fruit’s peel. Questions o...

We present global and local likelihood-based tests to evaluate stationarity in transition models. Three motivational studies are considered. A simulation study was carried out to assess the performance of the proposed tests. The results showed that they present good performance with the control of the type-I error, especially for ordinal responses,...

In the analysis of count data, the equidispersion assumption is often not suitable, and the Poisson regression model is then inappropriate. As a generalization of the Poisson distribution, the COM-Poisson distribution can deal with under-, equi- and overdispersed count data. It is a member of the exponential family of distributions and has well known s...

In ecological field surveys, it is often of interest to estimate the abundance of species. It is frequently the case that unmarked animals are counted on different sites over several time occasions. A natural starting point to model these data, while accounting for imperfect detection, is by using Royle’s N-mixture model (Biometrics 60:108–115, 200...

Count and proportion data may present overdispersion, i.e., greater variability than expected by the Poisson and binomial models, respectively. Different extended generalized linear models that allow for overdispersion may be used to analyze this type of data, such as models that use a generalized variance function, random-effects models, zero-infl...
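A quick check for overdispersion in counts is the Pearson dispersion estimate, which should be close to 1 if the Poisson variance assumption holds. The sketch below assumes the simplest possible setting, an intercept-only Poisson model whose fitted mean is the sample mean; the function name and the data are made up for illustration.

```python
def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom.

    Assumes an intercept-only Poisson model, so the fitted mean for
    every observation is the sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    pearson_chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return pearson_chi2 / (n - 1)


# Made-up illustrative counts (not from any study).
counts = [0, 0, 1, 1, 2, 3, 5, 12]
phi_hat = pearson_dispersion(counts)
```

Values well above 1, as with these illustrative counts, suggest that one of the extended models listed above may be worth considering.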

In evidence-based medicine, randomised trials are regarded as a gold standard in estimating relative treatment effects. Nevertheless, a potential gain in precision is forfeited by ignoring observational evidence. We describe a simple estimator that combines treatment estimates from randomised and observational data and investigate its properties by...

Transition models are an important framework that can be used to model longitudinal categorical data. A relevant issue in applying these models is the condition of stationarity, or homogeneity of transition probabilities over time. We propose two tests to assess stationarity in transition models: Wald and likelihood-ratio tests, which do not make u...

Sexually dimorphic growth models are typically estimated by fitting growth curves to individuals of known sex. Yet, macroscopically ascribing sex can be difficult, particularly for immature animals. As a result, sex-specific growth curves are often fit to known-sex individuals only, omitting unclassified immature individuals occupying an important re...

In agroecosystems, parasitoids and predators may exert top-down regulation, and predators may, for different reasons, avoid or prefer parasitised prey, i.e., become intraguild predators. The success of pest suppression with multiple natural enemies depends essentially on predator–prey dynamics and how this is affected by the interplay be...

Tree-based methods are a non-parametric modelling strategy that can be used in combination with generalized linear models or Cox proportional hazards models, mostly at an exploratory stage. Their popularity is mainly due to the simplicity of the technique along with the ease with which the resulting model can be interpreted. Variable selection bias f...

We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form $\mu + \phi\mu^p$, where $\mu$ is the mean, $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by...
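The variance form $\mu + \phi\mu^p$ quoted above is easy to explore numerically. The helper below is only a sketch of that formula (the function name and the example values are assumptions for illustration); it shows that $\phi = 0$ gives back the Poisson variance, $p = 1$ gives a variance linear in the mean, and $p = 2$ gives the quadratic form familiar from the negative binomial.

```python
def poisson_tweedie_variance(mu, phi, p):
    """Variance mu + phi * mu**p for a mean mu > 0."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return mu + phi * mu ** p


# Illustrative values only.
var_poisson = poisson_tweedie_variance(2.0, 0.0, 3)    # phi = 0: Poisson
var_linear = poisson_tweedie_variance(2.0, 0.5, 1)     # linear in mu
var_quadratic = poisson_tweedie_variance(2.0, 0.5, 2)  # quadratic in mu
```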

Chronic diseases tend to depend on a large number of risk factors, both environmental and genetic. Average attributable fractions were introduced by Eide and Gefeller as a way of partitioning overall disease burden into contributions from individual risk factors; this may be useful in deciding which risk factors to target in disease interventions....

Categorical data are quite common in many fields of science including in behaviour studies in animal science. In this article, the data concern the degree of lesions in pigs, related to the behaviour of these animals. The experimental design corresponded to two levels of environmental enrichment and four levels of genetic lineages in a completely r...

When analysing proportion data, a useful framework is that of generalized linear models. Random effects may be included in the linear predictor for different reasons, e.g., to incorporate correlation between observations taken within the same subject or to model overdispersion. In this work, we use binomial mixed models to model the occurrence of e...

We consider the analysis of time to event data from two populations undergoing life-testing under a joint progressive Type-II censoring scheme for both homogeneous and heterogeneous situations. We consider maximum likelihood estimation for this complex sampling scenario and its behaviour under different censoring schemes. For heterogeneous populati...

The mean residual life function provides a clear and simple summary of the effect of a treatment or a risk factor in units of time, avoiding hazard ratios or probability scales, which require careful interpretation. Estimation of the mean residual life is complicated by the upper tail of the survival distribution not being observed as, for example,...
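For orientation, the mean residual life at time $t$ is usually defined as the expected remaining lifetime given survival to $t$. This is the standard textbook definition, stated here as background rather than as the estimator studied in the paper; $T$ denotes the survival time and $S$ its survival function:

$$
m(t) = \mathrm{E}[T - t \mid T > t] = \frac{\int_t^{\infty} S(u)\,\mathrm{d}u}{S(t)}
$$

The integral runs over the upper tail of $S$, which is exactly the part that is not observed under censoring.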

Entomological data are often overdispersed, characterised by a larger variance than assumed by simple standard models. It is important to model overdispersion properly in order to avoid incorrect and misleading inferences. Outcomes of interest are often in the form of counts or proportions and we present extended models that incorporate overdispers...

Major trauma increases vulnerability to systemic infections due to poorly defined immunosuppressive mechanisms. It confers no evolutionary advantage. Our objective was to develop better biomarkers of post-traumatic immunosuppression (PTI); and to extend our observation that PTI was reversed by anti-coagulated salvaged blood transfusion, in the know...

In studies involving insects, it is common to observe response variables that consist of counts over a period of time. A simple model that can be used to analyse this type of data is the Poisson model, a particular case of the generalized linear model (McCullagh and Nelder, 1989) for which the mean is equal to the variance....

Finite mixture models have been used extensively in clustering applications, where each component of the mixture distribution is assumed to represent an individual cluster. The simplest example describes each cluster in terms of a multivariate Gaussian density with various covariance structures. However, using finite mixture models as a clustering...

Longitudinal data is becoming increasingly common and various methods have been developed to analyze this type of data. Profiles from time-course gene expression studies, where cluster analysis plays an important role to identify groups of co-expressed genes over time, are investigated. A number of procedures have been used to cluster time-course g...

When fitting dose–response models to entomological data it is often necessary to take account of natural mortality and/or overdispersion. The standard approach to handle natural mortality is to use Abbott’s formula, which allows for a constant underlying mortality rate. Commonly used overdispersion models include the beta-binomial model, logistic-n...
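Abbott's formula can be written out directly. In the sketch below, `natural` is the constant natural (control) mortality proportion and `p_dose` is the mortality due to the treatment alone; both function names and the numbers are illustrative assumptions rather than anything taken from the paper.

```python
def observed_mortality(p_dose, natural):
    """Overall mortality when natural mortality acts independently."""
    return natural + (1.0 - natural) * p_dose


def abbott_corrected(observed, control):
    """Abbott-corrected mortality from observed and control proportions."""
    if not 0.0 <= control < 1.0:
        raise ValueError("control mortality must be in [0, 1)")
    return (observed - control) / (1.0 - control)


# Illustrative values only: 20% natural mortality, 50% dose effect.
observed = observed_mortality(0.5, 0.2)
corrected = abbott_corrected(observed, 0.2)
```

The correction simply inverts the first function, so applying it to the observed proportion recovers the dose effect of 0.5 in this example.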

We extend the family of multivariate generalized linear mixed models to include random effects that are generated by smooth densities. We consider two such families of densities, the so-called semi-nonparametric (SNP) and smooth nonparametric (SMNP) densities. Maximum likelihood estimation, under either the SNP or the SMNP densities, is carried out...

Gene expression over time can be viewed as a continuous process and therefore represented as a continuous curve or function. Functional data analysis (FDA) is a statistical methodology used to analyze functional data that has become increasingly popular in the analysis of time-course gene expression data. Several FDA techniques have been applied to...

Session ES14: Generalized mixed models

BioconductorBuntu is a custom distribution of Ubuntu Linux that automatically installs a server-side microarray processing environment, providing a user-friendly web-based GUI to many of the tools developed by the Bioconductor Project, accessible locally or across a network. System installation is via booting off a CD image or by using a Debian pac...

The dataset faults gives the number n of faults in 32 rolls of material of length l metres; the data come from Bissell (1972), and are reproduced in Table 5.1.
The number of faults is a non-negative integer, and is naturally modelled by the Poisson distribution, the standard model for count data.
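One simple way to make this concrete, assuming the expected number of faults is proportional to roll length, is the model n ~ Poisson(theta * l) with a single fault rate theta per metre. The proportionality assumption, the function names and the data below are illustrative only; the numbers are made up and are not the Bissell (1972) data.

```python
import math


def poisson_rate_mle(faults, lengths):
    """Maximum likelihood estimate of theta in n_i ~ Poisson(theta * l_i)."""
    return sum(faults) / sum(lengths)


def poisson_loglik(theta, faults, lengths):
    """Log-likelihood of the rate model at a given theta."""
    ll = 0.0
    for n, l in zip(faults, lengths):
        mu = theta * l
        ll += n * math.log(mu) - mu - math.lgamma(n + 1)
    return ll


# Made-up illustrative data: fault counts and roll lengths in metres.
faults = [3, 7, 5, 10]
lengths = [200.0, 450.0, 300.0, 650.0]
theta_hat = poisson_rate_mle(faults, lengths)
```

The estimate is just total faults divided by total length, and the log-likelihood function lets one confirm that nearby values of theta fit worse.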

When overdispersion is present in count data, a negative binomial (NB) model is commonly used in place of the standard Poisson model. However, the model is sometimes not adequate because of the occurrence of excess zeros and a zero-inflated negative binomial (ZNB) model may be more appropriate. This article proposes a general score test statistic f...

Most patients managed in primary care have more than one condition. Multimorbidity presents challenges for the patient and the clinician, not only in terms of the process of care, but also in terms of management and risk assessment.
To examine the effect of the presence of chronic kidney disease and diabetes on mortality and morbidity among patient...

The enthalpies of formation and bond dissociation energies, D(ROO-H), D(RO-OH), D(RO-O), D(R-O2) and D(R-OOH) of alkyl hydroperoxides, ROOH, alkyl peroxy, RO2, and alkoxide radicals, RO, have been computed at CBS-QB3 and APNO levels of theory via isodesmic and atomization procedures for R = methyl, ethyl, n-propyl and isopropyl and n-butyl, tert-bu...

The importance of chronic kidney disease as an independent risk factor for morbidity and mortality in patients with cardiovascular disease in the community is not widely recognized.
A retrospective cohort study based in the West of Ireland followed a randomized practice-based sample of patients with cardiovascular disease. A database of 1609 patien...

In the analysis of morbidity and mortality data, variance component models are commonly used to provide an improvement in the estimation of rates for small regions which typically show large variability. This article investigates Irish suicide data using Poisson mixed models. The random effect distributions are estimated using Nonparametric Maximum...

Random Effect Modelling for Regression Models with Gamma-Distributed Response

Nonparametric maximum likelihood (NPML) estimation for exponential families with unspecified dispersion parameter $\phi$ suffers from computational instability, which can lead to highly fluctuating EM trajectories and suboptimal solutions, in particular when $\phi$ is allowed to vary over mixture components. In this paper, a damped version of the EM algorith...

The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-...

Negative binomial maximum likelihood regression models are commonly used to analyze overdispersed Poisson data. There are various forms of the negative binomial model with different mean-variance relationships; however, the most commonly used are those with a linear relationship, denoted NB1, and a quadratic relationship, denoted NB2. In the literature, NB1...

Purpose: Identification of the temporal pattern of diarrhea disease in children less than 5 years of age in Rio de Janeiro City (1995-1998) to provide support for decisions about prevention and control of the disease. Methods: The weekly counts of hospitalizations and deaths due to diarrhea disease were analyzed separately. An initial generalized linear model (GLM)...

This study focuses on the Arabic prose style of Qābūs ibn Wushmagīr, a Persian ruler of the 4th century AH (10th century AD). Through a textual analysis of a selection of his letters it identifies some fascinating rhythmical patterns using the statistical technique of log-linear modelling. The quantitative analyses are based specifically on syllab...

In many situations count data have a large proportion of zeros and the zero-inflated Poisson regression (ZIP) model may be appropriate. A simple score test for zero-inflation, comparing the ZIP model with a constant proportion of excess zeros to a standard Poisson regression model, was given by van den Broek (Biometrics, 51 (1995) 738–743). We exte...
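For the simplest case with no covariates, the score test for zero-inflation compares the observed number of zeros with the number expected under a fitted Poisson. The sketch below uses the intercept-only form of the statistic as it is commonly quoted, and refers it to a chi-square distribution with one degree of freedom; treat the formula, the function name and the made-up data as assumptions to be checked against the original paper before use.

```python
import math


def zero_inflation_score(counts):
    """Intercept-only score statistic for zero-inflation in Poisson counts."""
    n = len(counts)
    mean = sum(counts) / n
    p0 = math.exp(-mean)                       # fitted Poisson P(Y = 0)
    n0 = sum(1 for y in counts if y == 0)      # observed number of zeros
    numerator = (n0 - n * p0) ** 2
    denominator = n * p0 * (1.0 - p0) - n * mean * p0 ** 2
    return numerator / denominator


# Made-up illustrative counts with many zeros (not from any study).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 6]
score = zero_inflation_score(counts)
# 3.84 is the usual 5% critical value of chi-square with 1 df.
reject_poisson = score > 3.84
```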

The concavity of some Bayesian D-optimality criteria is investigated and is found in some cases to depend on the prior distribution. In the case of a non-concave criterion, the standard equivalence theorem may fail, but a local version continues to apply.

Count data often show a higher incidence of zero counts than would be expected if the data were Poisson distributed. Zero-inflated Poisson regression models are a useful class of models for such data, but parameter estimates may be seriously biased if the nonzero counts are overdispersed in relation to the Poisson distribution. We therefore provide...

Biological control of pests is an important branch of entomology, providing environmentally friendly forms of crop protection. Bioassays are used to find the optimal conditions for the production of parasites and strategies for application in the field. In some of these assays, proportions are measured and, often, these data have an inflated number...

Overdispersion models for discrete data are considered and placed in a general framework. A distinction is made between completely specified models and those with only a mean-variance specification. Different formulations for the overdispersion mechanism can lead to different variance functions which can be placed within a general family. In additi...

We consider the problem of modelling count data with excess zeros and review some possible models. Aspects of model fitting and inference are considered. An example from horticultural research is used for illustration.

Some Bayesian approaches to D-optimum design of experiments are considered from the viewpoint of invariance under reparameterization of the underlying statistical model. An invariant criterion is proposed which does not require the detailed specification of a prior, and which is shown to be equivalent to G-optimality under a Jeffreys prior. The met...

I A Beginner's Course.- 1 Un Amuse-Gueule.- 2 An XploRe Tutorial.- 2.1 Getting Started.- 2.2 Two-Dimensional Plots.- 2.3 Creating a Macro.- 2.4 The Interactive Help System.- 2.5 Three-Dimensional Plots.- 2.6 Reading and Writing Data.- 3 The Integrated Working Environment.- 3.1 Introduction.- 3.2 The Editor.- 3.3 How to Run and Debug a Program.- 3.4...

The use of classification and regression tree (CART) methodology is explored for the diagnosis of patients complaining of anterior chest pain. The results are compared with those previously obtained using correspondence analysis and independent Bayes classification. The technique is shown to be of potential value for identifying important indicator...