72 publications · 23,204 reads · 824 citations
Publications (72)
Univariate regression models for count data have a rich literature. However, this is not the case for multivariate count data. Therefore, we present the Multivariate Generalized Linear Mixed Models framework, which deals with a multivariate set of responses and measures the correlation between them through random effects that follow a multivariate no...
Owing to the growing volume of data, the search for properly qualified data scientists has become increasingly urgent. Brazilian higher education institutions (Instituições de Ensino Superior, IES) have therefore sought to meet this demand. In this context, the aim of this article is to characterize undergraduate programs in Data Science. To that end, we sought to...
Clustered competing risks data are a complex failure time data scheme. Its main characteristics are the cluster structure, which implies a latent within-cluster dependence between its elements, and its multiple variables competing to be the one responsible for the occurrence of an event, the failure. To handle this kind of data, we propose a full l...
Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. To achieve this goal, regression analysis, using either linear or generalized linear models, is widely applied. However, such models only allow users to model one response variable at a time. Moreover, it is not poss...
This article describes the R package htmcglm implemented for performing hypothesis tests on regression and dispersion parameters of multivariate covariance generalized linear models (McGLMs). McGLMs provide a general statistical modeling framework for normal and non-normal multivariate data analysis along with a wide range of correlation structures...
Clinical trials are common in medical research where multiple non-Gaussian responses and time-dependent observations are frequent. The analysis of data from these studies requires statistical modeling techniques that take these characteristics into account. We propose a general strategy based on the Wald statistics to perform hypothesis tests like...
We propose a covariance specification for modeling spatially continuous multivariate data. This model is based on a reformulation of Kronecker’s product of covariance matrices for Gaussian random fields. The structure holds for different choices of covariance functions with parameters varying in their usual domains. In comparison with classical mod...
Multivariate twin and family studies are among the most important tools for assessing disease inheritance and for studying the interrelationship between genetic and environmental factors. The multivariate analysis of twin and family data is in general based on structural equation modelling or linear mixed models that essentially decompose sources of covariation...
We propose a covariance specification for modeling spatially continuous multivariate data. This model is based on a reformulation of Kronecker’s product of covariance matrices for Gaussian random fields. We illustrate the case with the Matérn function used for specifying marginal covariances. The structure holds for other choices of covariance func...
We propose a multivariate regression model to handle multiple continuous bounded outcomes. We adopted the maximum likelihood approach for parameter estimation and inference. The model is specified by the product of univariate probability distributions and the correlation between the response variables is obtained through the correlation matrix of t...
A natural dependence among diameters measured within-tree is expected in taper data due to the hierarchical structure. The aim of this paper was to introduce the covariance generalized linear model (CGLM) framework in the context of forest biometrics for Pinus taeda stem form modeling. The CGLMs are based on marginal specification, which requires a...
We propose the unit gamma mixed regression model to deal with continuous bounded variables in the context of repeated measures and clustered data. The proposed model is based on the class of generalized linear mixed models and parameter estimates are obtained based on the maximum likelihood method. The computational implementation combines automati...
Researchers are often interested in understanding the relationship between a set of covariates and a set of response variables. In order to achieve this goal, regression analysis, using either linear or generalized linear models, is widely applied. However, such models only allow users to model one response variable at a time. Moreover, it is...
We aimed to explore the genetic and environmental contributions to variation in the risk of hematologic malignancies and characterize familial dependence within and across hematologic malignancies. The study base included 316,397 individual twins from the Nordic Twin Study of Cancer with a median of 41 years of follow-up: 88,618 (28%) of the twins...
Understanding the processes that underlie the effects of tree diversity on primary production is of foremost importance to enhance climate change mitigation by tropical forests. Here, we investigated the effects of tree diversity on light interception over space and time in two tropical tree experiments, located in Panama—Sardinilla site (monocultu...
In the analysis of multivariate spatial random fields, it is essential to define a covariance structure that adequately models the relationship between the variables under study. We propose a covariance structure with exponential correlation function for bivariate random fields, the SEC model. We compare the SEC model fits with the bivariate separable ex...
The machine learning area has recently gained prominence, and artificial neural networks are among the most popular techniques in this field. Such techniques learn during an iterative process of model fitting. The multilayer perceptron (MLP) is one of the first networks that emerged and, for this architecture, backpropagation and...
We investigated a Gaussian conditional geostatistical spatio-temporal model (CGSTM) aiming to fit data observed at non-fixed locations over discrete times, based only on the observed locations. The model specifies the process state at the current time conditioning on the process state in the recent past. Particularly, the process mean uses a weight...
Maximum likelihood estimation (MLE) applied to ranked set sampling (RSS) designs is usually based on the assumption of perfect ranking. However, it may suffer from a lack of efficiency when ranking errors are present. The main goal of this article is to investigate the performance of six alternative estimation methods to MLE for parameter estimation u...
Prediction of financial time series is a great challenge for statistical models. In general, stock market time series exhibit high volatility owing to their sensitivity to economic and political factors. Furthermore, the COVID-19 pandemic has recently caused a drastic change in stock exchange time series. In this challenging context, several...
We propose a multivariate regression model to deal with multiple outcomes along with repeated measures in the context of longitudinal data analysis. Our model allows for flexible and interpretable modelling of the covariance structure within outcomes by using a linear combination of known matrices, while the generalized Kronecker product is employe...
Human CYP3A enzymes (including CYP3A4 and CYP3A5) metabolize about 40% of all drugs and numerous other environmental and endogenous substances. CYP3A activity is highly variable within and between humans. As a consequence, therapy with standard doses often results in blood and tissue concentrations that are too low or too high, resulting in therapeutic failu...
We propose a multivariate regression model to deal with multiple continuous bounded data. The proposed model is based only on second-moment assumptions. We adopted the quasi-score and Pearson estimating functions for estimation of the regression and dispersion parameters, respectively. Thus, the proposed approach does not require a multivariate pr...
Quantifying the surviving trees in a forest stand and estimating the probability that an individual tree survives are fundamental tasks in forest management planning. Therefore, the main goal of this paper was to estimate the tree survival probability in loblolly pine (Pinus taeda L.) plantations based on generalized linear models (GLM). The data s...
Tweedie regression models (TRMs) are flexible tools to deal with non‐negative right‐skewed data and can handle semi‐continuous data, that is, continuous data with probability mass at zero. The geometric sums of Tweedie random variables lead to the geometric Tweedie distributions. Their corresponding regression models (GTRMs) provide not only additi...
Variables measured in a forest usually present some degree of correlation, so fitting models that estimate biometric variables independently is not the most suitable approach. Multivariate models are therefore more attractive because they can quantify associations between response variables. In this context, the main ob...
In this paper, we further extend the recently proposed Poisson-Tweedie regression models to include a linear predictor for the dispersion as well as for the expectation of the count response variable. The family of the considered models is specified using only second-moment assumptions, where the variance of the count response has the form $\mu +...
We propose a new class of regression models to deal with longitudinal continuous bounded data. The model is specified using second-moment assumptions, and we employ an estimating function approach for parameter estimation and inference. The main advantage of the proposed approach is that it does not need to assume a multivariate probability distribu...
We propose a flexible class of regression models for continuous bounded data based on second-moment assumptions. The mean structure is modelled by means of a link function and a linear predictor, while the mean and variance relationship has the form $\phi\mu^p(1-\mu)^p$, where $\mu$, $\phi$ and $p$ are the mean, dispersion and power parameters, respectively. The models a...
This article describes the R package mcglm implemented for fitting multivariate covariance generalized linear models (McGLMs). McGLMs provide a general statistical modeling framework for normal and non-normal multivariate data analysis, designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation s...
In the analysis of count data often the equidispersion assumption is not suitable, hence the Poisson regression model is inappropriate. As a generalization of the Poisson distribution, the COM-Poisson distribution can deal with under-, equi- and overdispersed count data. It is a member of the exponential family of distributions and has well known s...
This paper aims at the identification of black spots for traffic accidents, i.e. locations with accident counts beyond what is usual for similar locations, using spatially and temporally aggregated hospital records from Funen, Denmark. Specifically, we apply an autoregressive Poisson-Tweedie model, which covers a wide range of discrete distribution...
This paper describes the specification, estimation and comparison of double generalized linear compound Poisson models based on the likelihood paradigm. The models are motivated by insurance applications, where the distribution of the response variable is composed of a degenerate distribution at the origin and a continuous distribution on the posit...
Measuring hunting sustainability across West/Central African forests remains a challenge. Long-term assessment of trends is crucial. Via hunter-reported surveys we collected offtake data in three villages near the Dja Biosphere Reserve (southeast Cameroon). During four months (March-June) in 2003, 2009 and 2016, we gathered information on hunters,...
Count data are common in experimental studies. The Poisson regression model is widely used to analyze such data; however, owing to its equidispersion assumption, it is inadequate in many situations. A parametric alternative for the analysis of non-equidispersed counts is the COM-Poisson model which, with the addition of a parameter...
We present a general statistical modelling framework for handling multivariate mixed types of outcomes in the context of quantitative genetic analysis. The models are based on the multivariate covariance generalized linear models, where the matrix linear predictor is composed of an identity matrix combined with a relatedness matrix defined by a ped...
The helmet is a mandatory piece of safety equipment for motorcycle users, consisting of an outer shell and a retention system. This system keeps the helmet fixed to the head and can be used correctly (firmly fastened) or incorrectly (loose or unfastened). With incorrect use, ejection becomes more...
Tweedie regression models provide a flexible family of distributions to deal with non-negative highly right-skewed data as well as symmetric and heavy tailed data and can handle continuous data with probability mass at zero. The estimation and inference of Tweedie regression models based on the maximum likelihood method are challenged by the presen...
We propose a new class of discrete generalized linear models based on the class of Poisson-Tweedie factorial dispersion models with variance of the form $\mu + \phi\mu^p$, where $\mu$ is the mean, $\phi$ and $p$ are the dispersion and Tweedie power parameters, respectively. The models are fitted by using an estimating function approach obtained by...
We present a flexible statistical modelling framework to deal with multivariate count data along with longitudinal and repeated measures structures. The covariance structure for each response variable is defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. To specify the joint covariance m...
We propose a model-based geostatistical approach to deal with regionalized compositions. We combine the additive-log-ratio transformation with multivariate geostatistical models whose covariance matrix is adapted to take into account the correlation induced by the compositional structure. Such specification allows the usage of standard likelihood m...
We investigate an algorithm for maximum likelihood estimation of spatial generalized linear mixed models based on the Laplace approximation. We compare our algorithm with a set of alternative approaches for two datasets from the literature. The Rhizoctonia root rot and the Rongelap are, respectively, examples of binomial and count datasets modeled...
Background
Phosphatidylserine-containing liposomes (PSL) have been shown to reduce inflammation in experimental models of acute arthritis, by mimicking the apoptotic process. The aim of this study was to evaluate the effect of pegylated PSL (PEG-PSL) on chronic inflammation of collagen induced arthritis (CIA) in DBA/1J mice.
Methods
CIA was induce...
Master's dissertation presented to the Graduate Program in Numerical Methods in Engineering. Curitiba, February 2010.
We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor...
Generalized linear mixed models (GLMMs) are a large class of statistical models, with applications in many areas of science. GLMMs extend linear mixed models by allowing different types of response variable. There are three main data types: continuous, counts and binary. Common distributions for these types of response variables are Gaussian, Pois...
This study analyzes the objective tests of the 2009 National Student Performance Exam (Exame Nacional de Desempenho do Estudante, Enade) in the area of Statistics using Item Response Theory (IRT). Enade does not apply IRT, but this approach aims to provide an overview of its use through unidimensional and multidimensional item response theory models...
Event counts are response variables with non-negative integer values representing the number of times that an event occurs within a fixed domain such as a time interval, a geographical area or a cell of a contingency table. Analysis of counts by Gaussian regression models ignores the discreteness, asymmetry and heteroscedasticity and is inefficient,...
Beta regression models are a suitable choice for continuous response variables on the unit interval. Random effects add further flexibility to the models and accommodate data structures such as hierarchical, repeated measures and longitudinal, which typically induce extra variability and/or dependence. Closed expressions cannot be obtained for par...
This study aims at translating and validating the content of the instrument Conditions of Work Effectiveness - Questionnaire-II (CWEQ-II), developed by Laschinger, Finegan, Shamian and Wilk, modified from the original CWEQ for the Brazilian culture.
The methodological procedure consisted of the stages of translation of the instrument into the Portu...
Aedes aegypti has developed evolution-driven adaptations for surviving in the domestic human habitat. Several trap models have been designed considering these strategies and tested for monitoring this efficient vector of Dengue. Here, we report a real-scale evaluation of a system for monitoring and controlling mosquito populations based on egg samp...
In the statistical modeling of spatial variability, the parameters of spatial dependence are estimated and then used to interpolate values at unsampled locations. To that end, the modeling process must follow statistical criteria that ensure reliable predictions and represent the true local variability. This work evaluated...
Dengue is a public health problem in every region of Brazil. Since no effective vaccine is available, the vulnerable link in the epidemiological chain is the vector, the mosquito Aedes aegypti. Understanding fluctuations in the mosquito population is instrumental in reducing its proliferation and people's exposure to infection. A class of...
In the statistical modeling of spatial variability, the parameters of spatial dependence are estimated and then used to interpolate values at unsampled locations. To that end, the modeling process must follow statistical criteria that ensure reliable predictions and represent the true local variability. This work...
Regression models are widely used in a variety of application areas to describe associations between explanatory and response variables. The initially and frequently adopted Gaussian linear model was gradually extended to accommodate different kinds of response variables. These models were later described as particular cases of the generalized l...
The performances of the geostatistical model and the generalized additive model with thin plate splines in reconstructing continuous Gaussian surfaces are compared through a simulation study. The proposed comparison procedure takes into account factors related to the surface-generating process, namely variability and smoothness, as well as...
The catch of species exploited in commercial fisheries is usually modeled without any consideration of dependence in space and time. Spatio-temporal models offer a more accurate way to model data on biological populations in which there is dependence in space, in time, and possibly in the interaction...
The practice of urban planning based on consistent methods is now a demand for the country's large urban centers. In this regard, the appropriate use of statistical techniques for spatial data is necessary to support governmental decision-making on a more objective basis. One of the major current challenges is understanding the...
The practice of urban planning based on consistent methods is now a demand for the country's large urban centers. In this regard, the appropriate use of statistical techniques for spatial data is necessary to support governmental decision-making on a more objective basis. One of the major current challenges is understanding the...
Aedes aegypti is the vector of the dengue virus, a disease for which several epidemics have been observed. Entomological studies contribute to understanding the dynamics of mosquito proliferation. This article aims to propose an analysis protocol combining statistical methods to identify factors associated with the intensity...
This work is motivated by an interest in modeling spatial patterns in compositional data. The class of problems of interest involves, for example, the particle-size fractions of a soil or the chemical composition of a rock, that is, data structures in which the observations are parts of some whole and are spatially referenced. The interest...
This work describes the development of an integrated environment providing support for monitoring egg counts of Aedes aegypti, the main vector of dengue. The proposed environment combines computational resources and tools for spatial and/or temporal statistical analysis. The system is used on egg count data obtained from the...
The aim of this article is to propose beta regression models with random effects for data on the unit interval that exhibit hierarchical structure, repeated measures, or longitudinal structure, among others. Two algorithms are presented for obtaining maximum likelihood estimates of the parameters of the proposed model. The first...