Project

Generalized Linear Latent Variable Models (GLLVM) TMB PACKAGE

Goal: Generalized Linear Latent Variable Models (GLLVMs) are complex statistical models with latent variables, often applied to multivariate responses. The objective of this research is to implement GLLVMs for multivariate count outcomes using the fast automatic Laplace approximation from the TMB package. The TMB package provides high-performance libraries designed for fast computation in models with random effects. In this research, we combine and enhance the parameters of related distributions and build models that are estimated on ecological data using programs developed by Herliansyah (2017).

Methods: Generalized Linear Model, Generalized Estimating Equations, Generalized Linear Latent Variable Models, GLLVM


Project log

Rezzy Eko Caraka
added a research item
Background: At the heart of data mining and machine learning, dimension reduction is needed to remove multicollinearity. It has also been shown to improve the interpretability of model parameters. In addition, dimension reduction can improve computing time on high-dimensional data. Methods: In this paper, we perform high-dimensional ordination of event counts in hospital intensive care for the Emergency Department (ED 1), First Intensive Care Unit (ICU1), Second Intensive Care Unit (ICU2), Respiratory Care Intensive Care Unit (RICU), Surgical Intensive Care Unit (SICU), Subacute Respiratory Care Unit (RCC), Trauma and Neurosurgery Intensive Care Unit (TNCU), and Neonatal Intensive Care Unit (NICU), using Generalized Linear Latent Variable Models (GLLVMs). Results: We measure the performance and computing time of GLLVMs fitted with the variational approximation and the Laplace approximation, and compare different distributions: Negative Binomial, Poisson, Gaussian, ZIP, and Tweedie. GLLVMs, an extension of Generalized Linear Models (GLMs) with latent variables, have fast computing time. The major challenge in latent variable modelling is that the function f(Θ) = ∫ f(u | Θ) h(u) du is not trivial to solve, since the marginal likelihood involves integration over the latent variable u. Conclusions: GLLVMs give the best performance, reaching a variance of 98%, compared with the other methods. The best model uses the negative binomial distribution with the variational approximation, which gives the best values of AIC, AICc, and BIC. Our best models are GLLVM-VA Negative Binomial with AIC 7144.07 and GLLVM-LA Negative Binomial with AIC 6955.922.
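The integral over the latent variable u mentioned in this abstract can be illustrated with a small sketch of the Laplace approximation. Everything specific here is an assumption made for illustration and is not taken from the paper: a single observation of four Poisson counts with a log link, one standard normal latent variable, and made-up intercepts `beta` and loadings `lam`. The Laplace value is compared against brute-force trapezoid quadrature.

```python
import math

def log_joint(u, y, beta, lam):
    """log f(y | u) + log h(u): Poisson responses, log link, u ~ N(0, 1) (assumed model)."""
    total = -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)
    for yj, bj, lj in zip(y, beta, lam):
        eta = bj + lj * u                      # linear predictor for response j
        total += yj * eta - math.exp(eta) - math.lgamma(yj + 1)
    return total

def neg_hessian(u, beta, lam):
    """Minus the second derivative of log_joint in u (always positive here)."""
    return 1.0 + sum(lj * lj * math.exp(bj + lj * u) for bj, lj in zip(beta, lam))

def laplace_marginal(y, beta, lam, iters=50):
    """Laplace approximation: exp(g(u_hat)) * sqrt(2*pi / -g''(u_hat))."""
    u = 0.0
    for _ in range(iters):                     # Newton steps towards the mode u_hat
        grad = -u + sum(lj * (yj - math.exp(bj + lj * u))
                        for yj, bj, lj in zip(y, beta, lam))
        step = grad / neg_hessian(u, beta, lam)
        u += step
        if abs(step) < 1e-12:
            break
    return math.exp(log_joint(u, y, beta, lam)) * math.sqrt(
        2.0 * math.pi / neg_hessian(u, beta, lam))

def quadrature_marginal(y, beta, lam, lo=-8.0, hi=8.0, n=4000):
    """Reference value: trapezoid rule over the latent variable."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_joint(lo + i * h, y, beta, lam))
    return total * h

# Hypothetical data and parameters, chosen only to exercise the code.
y = [3, 0, 5, 2]
beta = [1.0, -0.5, 1.5, 0.5]
lam = [0.5, -0.3, 0.8, 0.2]

laplace_value = laplace_marginal(y, beta, lam)
quadrature_value = quadrature_marginal(y, beta, lam)
relative_error = abs(laplace_value - quadrature_value) / quadrature_value
print(laplace_value, quadrature_value, relative_error)
```

With one latent variable, quadrature is cheap, so the comparison is easy to run. The appeal of the Laplace approximation is that it needs only a mode search and a curvature, which stays practical as the number of latent variables grows.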
Rezzy Eko Caraka
added 7 research items
To account for correlated count data with excess zeros, we use a variational approximation for a multivariate latent generalized linear model. We performed two simulation studies, at the species level and at the genus level, with Poisson and negative binomial distributions, to obtain subject-specific interpretations. Methods: In this work, we use the variational approximation to estimate the parameters of the multivariate latent generalized linear model. An overdispersed count outcome exhibits many zeros, more than expected under sampling from a Poisson distribution. Results: In the simulation studies, species counts follow the negative binomial distribution and genus counts follow the Poisson distribution, and the performance of the method is evaluated by the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). Conclusion: These two sets of latent class parameters might be meaningful for certain species counts and genus counts.
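The three information criteria named in this abstract follow standard textbook formulas, sketched below. The function name and the example numbers are hypothetical and are not results from the paper; `log_lik` is the maximised log-likelihood, `k` the number of estimated parameters, and `n` the number of observations.

```python
import math

def information_criteria(log_lik, k, n):
    """Return (AIC, AICc, BIC) for a fitted model; smaller values are preferred."""
    aic = 2.0 * k - 2.0 * log_lik
    # Small-sample correction; requires n > k + 1.
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2.0 * log_lik
    return aic, aicc, bic

# Hypothetical fit: log-likelihood -100 with 5 parameters on 50 observations.
aic, aicc, bic = information_criteria(log_lik=-100.0, k=5, n=50)
print(aic, aicc, bic)
```

Because all three criteria penalise the log-likelihood by model size, they are only comparable between models fitted to the same data.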
Banteng (Bos javanicus), a species of wild cattle, is a vital and important source of germplasm in Indonesia. Various human activities currently threaten its conservation status. Nonetheless, no long-term monitoring programmes are in place for this species. Using distribution points and statistical analysis based on 46,116 camera trap days from December 2015 to January 2017, we aimed to describe habitat preferences, activity patterns and ecological data for the banteng population in Ujung Kulon National Park (UKNP). It is the largest population of banteng in Indonesia and lives in a limited habitat area. According to the best occupancy model, the most suitable areas for this species were the secondary forest located in the central portion of UKNP. The presence of the invasive cluster sugar palm, Arenga obtusifolia, in the dry season provides an alternative food for banteng when its main food is scarcer in the forest. Banteng were cathemeral all year round, and neither the proportion of cathemeral records nor the recording rate changed with the protection level of the area, moon phase or season. To reduce the probability of encountering predators, banteng avoided the areas used by dholes. Selection and avoidance of habitats was stronger than avoidance of predator activity areas. Habitat competition from domestic cattle grazing illegally in the national park appears to be a problem for the species, since zoonoses pass from domestic cattle to banteng. Therefore, effective law enforcement and an adequate conservation strategy are required to eliminate the impacts of both direct and indirect threats.
Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood based estimation, several closed form approximations for the marginal likelihood of GLLVMs have been proposed, but their efficient implementations have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estimation algorithms based on a combination of either the Laplace approximation method or variational approximation method, and automatic optimization techniques implemented in R software. An extensive set of simulation studies is used to assess the performances of different methods, from which it is shown that the variational approximation method used in conjunction with automatic optimization offers a powerful tool for estimation.
Rezzy Eko Caraka
added a research item
At vehicle insurance companies, determining the appropriate pure premium helps the business run well. In this study, we model claims frequency data by considering characteristics of the policyholder such as the policyholder's age, marital status, sex, car engine capacity, and age. The data used in this study are non-motor vehicle and non-truck motor vehicle insurance data for claims filed during 2013 at a general insurance company. We use the Generalized Linear Model Multivariate Poisson with Artificial Marginal (GLM-MPAM) to estimate the model parameters. The parameter values of this model are estimated using maximum likelihood estimation. Furthermore, the estimated parameters can serve as an alternative in the calculation of the pure premium for the next period.
Rezzy Eko Caraka
added a research item
Ecology is the branch of biology that studies the interactions and relationships between organisms and their environment. The abundance and distribution of organisms and patterns of biodiversity are of great interest to many ecologists. One interesting ecosystem to study is the cave. Caves have a distinctive environment and a vulnerable ecosystem. Many caves in Indonesia, particularly in the Gunungsewu karst area, have been developed into tourist attractions (show caves) and are managed unwisely. Such cave management has the potential to change the environment and lead to ecosystem destruction. Arthropods are the most abundant fauna in caves and play critical roles in maintaining the equilibrium of cave ecosystems. At the heart of statistical ecology, we need to analyze the differences in the Arthropod community and in abiotic (climatic-edaphic) parameters between show caves and wild caves. Statistical techniques are needed to extract such information. GLLVM is one method that is able to explain spatial information and is particularly suitable for ecology. In this paper, we use negative binomial models to examine the differences in the spatial patterns of predator and decomposer Arthropods, as well as the edaphic and climatic characteristics of each cave.
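As a minimal sketch of why negative binomial models suit overdispersed abundance counts, the log-probability below uses the common mean-dispersion form, in which the variance is mu + mu^2 / phi. This parameterisation, the function name, and the numbers are assumptions for illustration; the abstract does not state which form was used.

```python
import math

def nb_log_pmf(y, mu, phi):
    """Negative binomial log-probability with mean mu and dispersion phi (assumed form)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def poisson_log_pmf(y, mu):
    """Poisson log-probability, the limit of the negative binomial as phi grows."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical mean and dispersion.
mu, phi = 3.0, 2.0
probs = [math.exp(nb_log_pmf(y, mu, phi)) for y in range(501)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, variance)   # variance exceeds the mean, unlike the Poisson
```

A small phi gives strong overdispersion, while a very large phi brings the distribution back towards the Poisson.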
Rezzy Eko Caraka
added a research item
In general, statistical modelling is abstract: it is a simplified representation of a theory and is commonly used across science and technology. Research into the relationships among real phenomena underlies the aims of science and plays an important role in everyday life. In practice, the data encountered often fail to satisfy the assumptions required by classical linear regression. The generalized linear model (GLM) is an extension of the linear regression model that assumes the predictors have a linear effect but does not assume a particular distribution for the response variable, and it is used when the response variable is a member of the exponential family. The author explains the Generalized Linear Model (GLM), Generalized Additive Models (GAM), Generalized Additive Mixed Models (GAMM), the GLM for binary responses, and Generalized Linear Latent Variable Models (GLLVM). The author also provides guidance on carrying out the analyses in R. The book also introduces a new package named gllvm for estimating Generalized Linear Latent Variable Models. The package is available on CRAN and is the author's own work.
Rezzy Eko Caraka
added a research item
Inflation is an important benchmark for economic growth, a factor investors consider when choosing a type of investment, and a determining factor for the government in formulating the fiscal, monetary or non-monetary policy to be run. Inflation is calculated using the Consumer Price Index (CPI) as an indicator of the cost of consuming goods and services. An analysis using GAMM gave an R2 value of 0.996, which can be interpreted to mean that 99.6% of inflation is explained by the variables used in this study and 0.4% is explained by other factors.
Rezzy Eko Caraka
added a research item
The purposes of this research were: (i) to model the inflation rate in Indonesia with parametric regression; (ii) to model the inflation rate in Indonesia using non-parametric multivariable spline regression; (iii) to determine the best model of the inflation rate in Indonesia; and (iv) to explain the relationship between the parametric and the non-parametric multivariable spline regression models of inflation. Based on the analysis using the two methods, the coefficient of determination (R2) was 65.1% for parametric regression and 99.39% for non-parametric regression. The money supply (money stock), crude oil prices and the rupiah exchange rate against the dollar have a significant effect on the rate of inflation. Stable inflation is essential to support sustainable economic development and improve people's welfare. In conclusion, unstable inflation complicates the planning of business activities, both in production and investment and in the pricing of goods and services produced. DOI: 10.15408/etk.v15i2.3260
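The coefficient of determination used to compare the two regressions can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up observations and predictions, not the inflation data, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
print(r2)
```

A value near 1 means the predictions track the observations closely on the data used to compute it, which is not the same as good performance on new data.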
Rezzy Eko Caraka
added a research item
Generalised Linear Latent Variable Models (GLLVMs) are sophisticated joint models with random effects used in ecology for modelling abundances. The major challenge in GLLVMs is that the marginal likelihood involves an intractable integral over the latent variable. Hence, numerical methods are required for estimation. Numerical integration may suffer from long computation times when the number of responses is relatively large. In this thesis, our main goal is to develop fast code for estimation of GLLVMs for multivariate abundance data using the TMB package, and to evaluate it by comparison with existing R code (Niku). The study found that in our three simulations the proposed code produced significantly faster computation times, especially as the number of latent variables increased. Both the bias and the MSE of the parameters were roughly identical, with a slightly lower bias for TMB. For the implementation of GLLVMs, we used counts of bird abundance for 96 species at 37 sites in Central Kalimantan, Indonesia, with two explanatory variables. The results show that a model with one latent variable was the best model. This model was chosen based on information criteria and prediction accuracy. We conclude that TMB is a powerful package that can lead to substantial improvements in computation time when fitting latent variable models whose likelihood does not have a closed form. Keywords: GLLVM, random effects, Laplace approximation, computation time, TMB, Negative Binomial