
Variational Approximation Multivariate Generalized Linear Latent Model in Diversity Termites in Riau and Peninsular Malaysia

Authors:
  • Badan Riset dan Inovasi Nasional

Abstract and Figures

To account for correlated count data with excess zeros, we use a variational approximation for the multivariate generalized linear latent variable model. We performed two analyses, one at the species level and one at the genus level, each comparing Poisson and negative binomial responses with subject-specific interpretations. Methods: We use a variational approximation to estimate the parameters of the multivariate generalized linear latent variable model. The count outcomes are overdispersed and exhibit many zeros, more than expected under sampling from a Poisson distribution. Results: Species counts follow a negative binomial distribution and genus counts follow a Poisson distribution; model performance is evaluated with the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). Conclusion: These two sets of latent class parameters may be meaningful for certain species counts and genus counts.
[SYLWAN., 164(1)]. ISI Indexed 161
Index Terms: Variational Approximation, Latent Variables, Niche Modelling, GLLVM, Termites.
I. INTRODUCTION
Manuscript received October 11, 2019; revised January 15, 2020. This work was supported by MOST, Taiwan (ROC).
1 College of Informatics, Chaoyang University of Technology, Taiwan (ROC), 41349. Corresponding email: crching@cyut.edu.tw, rezzyekocaraka@gmail.com
2 Department of Statistics, College of Natural Sciences, Seoul National University, Shin Lim-Dong, Kwan Ak-Ku, South Korea, 151-747.
3 Department of Statistics, College of Natural Sciences, Pukyong National University, 45, Busan, South Korea.

Ecology is the study of the relationship of organisms, or groups of organisms, to their environment; it can also be described as the science of the mutual relations between living organisms and their environment. Every ecological process that occurs in nature can be represented as a mathematical equation, and these equations build a model.
The solution of such an equation can be determined with the help of computational techniques, numerical methods, and statistical modelling. Ecology began as a general body of knowledge that studied environmental relations individually, based on physiology. At that time, scholars, especially in the natural sciences, paid little attention to general sciences and instead directed the development of the sciences toward specialization. Although ecology receives less attention than other sciences, especially economics and politics, it continues to grow and has extended into other fields such as botany and zoology.
Ecological research ranges from the adaptation of organisms to ecosystem dynamics, because there are many levels and types of interactions between individuals dealing with the challenges posed by their abiotic environment. In general, research on species counts produces zeros, because no individuals of a species are found at some locations. Such data are difficult to analyse with methods that assume the response is never zero, so methods based on the Poisson distribution are useful for this analysis.
4 Department of Statistics, Padjadjaran University, Indonesia,
43563.
5 Bioinformatics and Data Science Research Center, Bina
Nusantara University, Jakarta 11480, Indonesia
6 Computer Science Department, BINUS Graduate Program
Master of Computer Science Bina Nusantara University,
Jakarta, Indonesia, 11480.
7School of Environmental and Natural Resource Science,
Universiti Kebangsaan Malaysia, 43600.
Rezzy Eko Caraka1,2,4,5, Rung Ching Chen1†, Youngjo Lee2, Maengseok Noh3,
Toni Toharudin4, Bens Pardamean5,6, Andi Saputra7
Count data describe the number of events in a given period and can only take non-negative integer values, because a number of events cannot be negative. Count data violate the assumptions of Ordinary Least Squares (OLS) regression, such as normally distributed errors (normality) with constant variance, so OLS regression is not appropriate for count data.
The modelling of count data led to the development of Generalized Linear Models (GLMs). GLMs generalize the classical regression model, or OLS regression (M Noh et al. 2019); (Kwon et al. 2016), and provide analytical methods for data that do not meet the assumption of a normal distribution (De Jong and Heller, 2008). The member of the GLM family based on the Poisson distribution is Poisson regression. Poisson regression assumes that the variance of the response variable Y equals its mean (Myers et al. 2012). In Poisson regression analysis of discrete data this assumption is often violated (Abraham et al. 2007): the variance may be smaller than the mean, which is called underdispersion, or larger than the mean, which is called overdispersion. Consul & Famoye (1992) stated that overdispersion is sometimes found in count data. Count data often have large integer values and contain many zeros, so the variance is quite large. If the assumption is violated, the resulting conclusions are invalid because the standard errors are underestimated. Overdispersion can be handled by forming models that combine the Poisson distribution with other distributions, either discrete or continuous (mixed distributions) (Kéry 2010). Only a few such Poisson mixtures are commonly used in research because the calculations are complex.
In ecological and species modelling (Warton, Foster, et al. 2015), the analysis becomes more complicated when it involves latent variables, i.e. unobserved variables, so the GLM needs to be extended to a multivariate GLM (Warton, Blanchet, et al. 2015). This type of statistical analysis handles data with many predictor variables (Warton 2015); (Joyner et al. 2019) and many response variables, which makes it very suitable for species modelling (Caraka et al. 2018); (Rahman et al. 2019); (Herliansyah & Fitia 2018). The main challenge in generalized linear latent variable model (GLLVM) estimation is that it involves integrals over the random variables that have no explicit form unless the response variable is normally distributed, so a method is needed to approximate the integral. Methods currently being developed include Laplace approximations (Bower & Savitsky 2008) and variational approximations (Hui et al. 2017). Another challenge is high-dimensional data (X. D. Wang et al. 2019): a large number of response variables and random variables leads to quite complicated computational problems.
Latent variable models are powerful probabilistic tools for extracting useful latent structure from otherwise unstructured data and have proved useful in numerous applications, especially in ecological modelling (Y. Wang et al. 2012). Latent linear models are a particular case of latent variable models in which the observations originate from a linear transformation of the latent variables. Despite their modelling simplicity, latent linear models are useful and widely used instruments for data analysis in practice and include, among others, such notable examples as probabilistic principal component analysis, correlation component analysis, and independent component analysis. However, it is well known that estimation and inference are often intractable for many latent linear models, and one has to make use of approximate methods, often with no recovery guarantees.
The remainder of the paper is organized as follows. Section II explains niche modelling and presents the multivariate latent GLM and the variational approximation. Section III presents our results and discussion. Finally, conclusions and future research directions are indicated in Section IV.
II. METHODS
Models of species distribution are generally known as envelope modelling, habitat modelling, and niche modelling. The main objective is to estimate the similarity of conditions across all regions by using occurrence and predictor data as inputs to the model. Generally, species distribution modelling uses climate data (Kurniawan, Soesilohadi, et al. 2018) as predictors (Kurniawan, Rahmadi, et al. 2018). When modelling species, the data tend to follow the Poisson distribution (Warton 2005). The Poisson distribution is a distribution for events with a small probability of occurrence, where the occurrence depends on a specific time interval or a particular area and the observations are discrete variables. Experiments that follow the Poisson distribution have the following characteristics:
1. Events occur in large populations with small probabilities.
2. Events depend on specific time intervals.
3. Events are part of a counting process.
4. Events are repetitions of trials that follow the binomial distribution.
The probability function of the Poisson distribution can be stated as follows (Ha & Lee 2003).
$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{1}$$
where $\mu$ is the mean of the Poisson random variable Y; the mean and the variance are both greater than zero. The link function used in the Poisson regression model is ln, so $\ln \mu_i = \eta_i$. Thus Poisson regression can be stated as follows.
$$\ln \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} \tag{2}$$
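As an illustration of the Poisson probability function and the log link, the following minimal Python sketch evaluates both. The function names, the coefficient values, and the covariate value are hypothetical and chosen only for illustration; they do not come from the termite data.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu > 0."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def poisson_mean(beta, x):
    """Mean under the log link: mu = exp(beta_0 + sum_k beta_k * x_k)."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return math.exp(eta)

# Hypothetical intercept, slope, and covariate value (illustration only).
mu = poisson_mean([0.5, 0.2], [1.0])
prob_zero = poisson_pmf(0, mu)  # probability of recording no individuals
```

Because the mean is obtained by exponentiating the linear predictor, it is always positive, which is the reason the log link is a natural choice for counts.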
Poisson regression is part of the Generalized Linear Model (GLM) family and is used for data whose response variable follows the Poisson distribution (Y ~ Poisson). The important assumption in this analysis is that the variance equals the mean, which is called equidispersion. In some studies this condition is not met: count data whose variance is higher than the mean are overdispersed, and count data whose variance is smaller than the mean are underdispersed.
According to Hinde & Demétrio (1998), there are several reasons why equidispersion may not hold in a model, including heterogeneity among observations (differences between individuals that are not explained by the model) and correlation between individual responses. When equidispersion does not hold, Poisson regression is not suitable for modelling the data because the fitted model produces biased parameter estimates. In addition, overdispersion leads to standard errors that are smaller than they should be (underestimated), resulting in inappropriate conclusions. Overdispersion can be checked with the deviance. The variance of the Poisson distribution is equal to the mean (σ2 = µ). Overdispersion is indicated when the deviance divided by the degrees of freedom is greater than 1, and underdispersion when it is less than 1.
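The deviance can be expressed as:
$$D = 2\sum_{i=1}^{n}\left[y_i \ln\!\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i)\right] \tag{3}$$
where
n = number of observations
$y_i$ = response variable for the i-th observation, with i = 1, 2, ..., n
$\hat{\mu}_i$ = mean of the response variable y, which is influenced by the predictor variable values of the i-th observation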
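The deviance check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the counts are hypothetical, and the fitted means come from an intercept-only model (every fitted mean equals the sample mean), not from the models fitted in this paper.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance D = 2 * sum(y*ln(y/mu) - (y - mu)).

    The y*ln(y/mu) term is taken as 0 when y = 0.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Hypothetical counts with many zeros (illustration only).
counts = [0, 0, 0, 1, 0, 7, 0, 12, 0, 2]
mean = sum(counts) / len(counts)   # intercept-only fit: every mu_i is the sample mean
fitted = [mean] * len(counts)
df = len(counts) - 1               # n minus one estimated parameter
dispersion = poisson_deviance(counts, fitted) / df
```

A ratio well above 1, as these hypothetical counts give, would point toward overdispersion and motivate a negative binomial response instead of a Poisson one.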
In a GLM, the response y follows a distribution from the exponential family (including the normal, binomial, Poisson, and gamma distributions), and its expectation is modelled as
E(y) = µ (4)
A link function g(·) connects µ with the linear predictor Xβ such that
g(µ) = Xβ (5)
The variance of y is a function of µ. For a Poisson distribution, for instance, the variance of y is equal to the mean µ (Maengseok Noh & Lee 2007); (Lee & Noh 2012). This relationship between the mean and the variance depends directly on the assumed distribution of y (Lee & Nelder 2001). For all GLMs, the variance of y is the product of a variance function V(µ) and a dispersion parameter φ (del Castillo & Lee 2008). With m being the binomial denominator, Table 1 lists the variances and variance functions V(µ).
Table 1 Variance

| Distribution | Variance of y | V(µ) |
|---|---|---|
| Normal | φ | 1 |
| Poisson | µ | µ |
| Gamma | φµ² | µ² |
| Binomial | µ(m − µ)/m | µ(m − µ)/m |
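The rule that the variance is the product of the dispersion parameter and the variance function can be written as a small lookup. The sketch below uses the standard GLM variance functions; the function name and argument names are hypothetical.

```python
def glm_variance(family, mu, phi=1.0, m=1):
    """Var(y) = phi * V(mu) for four common GLM families."""
    variance_functions = {
        "normal": lambda: 1.0,
        "poisson": lambda: mu,
        "gamma": lambda: mu ** 2,
        "binomial": lambda: mu * (m - mu) / m,
    }
    if family in ("poisson", "binomial"):
        phi = 1.0  # dispersion is fixed at 1 for these two families
    return phi * variance_functions[family]()

poisson_var = glm_variance("poisson", 3.0)        # equals the mean
gamma_var = glm_variance("gamma", 2.0, phi=0.5)   # phi * mu^2
```

Fixing the dispersion at 1 for the Poisson family is what ties its variance to its mean, which is the equidispersion assumption discussed earlier.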
For the Poisson and binomial distributions φ = 1, whereas for the normal and gamma distributions φ is a parameter to be estimated. For the normal distribution, φ is simply the residual variance.
Generalized linear latent variable models (GLLVMs) are an extension of the GLM with latent variables (Rahman et al. 2019); (Niku, Hui, et al. 2019); (Niku, Brooks, et al. 2019); (Niku et al. 2017). Suppose $y_{ij}$ are the multivariate responses across species, with $i = 1, \ldots, n$ indexing the observational units and $j = 1, \ldots, m$ indexing the species. The expectation $\mu_{ij}$ of $y_{ij}$ is modelled through the following relationship
$$g(\mu_{ij}) = \eta_{ij} \tag{6}$$
with $\eta_{ij}$ being the linear predictor and $g(\cdot)$ a link function. The linear components of the predictor are similar to those of the GLM, with the inclusion of random effects, as follows:
$$\eta_{ij} = \alpha_i + \beta_{0j} + x_i^{\top}\beta_j + u_i^{\top}\theta_j \tag{7}$$
where $\alpha_i$ represents the row effect, $\beta_{0j}$ is the species-specific intercept, $\beta_j$ contains the regression coefficients corresponding to the independent variables $x_i$, and $\theta_j$ contains the factor loadings, the quantities that describe the interactions across species and connect the unobserved variables to the responses. In many papers, the latent variables $u_i$ are assumed to follow a normal distribution with mean zero and constant variance.
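To make the estimation difficulty concrete, the marginal likelihood of a GLLVM and a generic variational lower bound can be written out. This is a sketch in standard notation, not necessarily the authors' exact formulation: $\Psi$ (all model parameters), $f$ (the response density), $h$ (the latent-variable density), and $q$ (the variational density) are symbols assumed here, and the responses are assumed conditionally independent given $u_i$.

```latex
% Marginal likelihood: the latent variables u_i are integrated out.
L(\Psi) = \prod_{i=1}^{n} \int \prod_{j=1}^{m}
          f(y_{ij} \mid u_i, \Psi)\, h(u_i)\, du_i

% Jensen's inequality gives a lower bound for any density q(u_i):
\log L(\Psi) \;\ge\; \sum_{i=1}^{n} \Big\{
    \mathbb{E}_{q}\big[\textstyle\sum_{j=1}^{m} \log f(y_{ij} \mid u_i, \Psi)\big]
  + \mathbb{E}_{q}\big[\log h(u_i)\big]
  - \mathbb{E}_{q}\big[\log q(u_i)\big] \Big\}
```

The integral has no closed form for Poisson or negative binomial responses, whereas the bound only requires expectations under $q$, which are tractable for suitable choices of $q$.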
At the heart of parameter estimation, a lower bound on the data likelihood can be estimated by variational approximation (VA). VA is known to be computationally viable, at the cost of some bias. If the random vector follows an exponential family distribution, the joint distribution can be written as follows:
 (8)
Where is latent variable X and observed
variables Y,<x,y> and have inner product between
x and y, s a vector of parameters
and  can be defined
as the realised value of the variables.
However is the log
partition function and ensures distribution is
normalised. The parameter will be estimated
using the method of moments is a generalization
of the Gaussian-Poisson model

(9)



 
 
 (10)
It is clear that the under-determination of the estimating equations (2) is a direct result of reducing the dimensionality of X via the clustering function k. By projecting the data for each cluster onto a single dimension of an auxiliary data object, the latent process can be fitted to the auxiliary data without encountering this problem:
  (11)





 






Where are some functions and
 (12)
  (13)
Are analogous to 
In terms of both interpretability and computational
convenience, restricting to be a linear function
of its arguments can be easily justified, and this
approach is taken here. Several choices of function
are available in this regard, including cluster
averages:

 (14)
Then we can rewrite to random effects:


(15)
or


(17)
And a representative dimension projection
 (18)
One dimension in each random effect is chosen to
represent the dimension   whose
average data over replicates

 was
closest in norm to the average data over replicates
 

 (19)
 



  
 (20)
Also, if more than one dimension minimizes the norm, then one of them can be chosen arbitrarily. Once a representative dimension for each cluster has been chosen, sample moments from the data can be used to approximate the expected values and estimate the parameters of the latent process.
III. SPECIES COUNTS
Indonesia is a tropical country rich in plant diversity, which strongly supports termites. About 80% of Indonesia's land area is suitable habitat for termite development. Termites belong to the order Blattodea, family Termitidae, which consists of 2000 species spread across the world. Termite species diversity on the island of Sumatra and in Peninsular Malaysia has not been fully inventoried. In the 1990s, inventories were carried out by several researchers. Nandika et al. (2003) examined the types of termites and their spread in the DKI Jakarta and Bandung regions. The study found nine species of termites, namely Microtermes insperatus, M. incertoides, Macrotermes gilvus, Odontotermes javanicus, O. malaccensis, Schedorhinotermes javanicus, Coptotermes curvignathus, C. havilandi, C. kalshoveni, C. heimi, and C. travians.
Termites also attack cocoa in the nursery phase, which is hazardous for the plants. Keng (2006) studied termites in cocoa plantations compared with primary forests in Bukit Tawau Park, Sabah. The study showed that termites also attack cocoa plantations, even though the level of diversity there is lower than in primary forests. Termites are more abundant in cocoa plantations, so the attacks become severe; plentiful food sources cause this. Termites can be identified from the family level down to the species level. Termites in tropical regions such as Indonesia have been widely studied, and there are several families, namely Kalotermitidae, Rhinotermitidae, and Termitidae. The Kalotermitidae family is a group of termites that attack and nest in living trees or in dry wood that is not in contact with the soil.
This group is commonly called dry wood termites. Dry wood termite species that usually attack settlements include Cryptotermes cynocephalus, which attacks buildings made of wood (Bong et al. 2012). The second family is Rhinotermitidae. The distinctive feature of this family is the presence of flat sclerite plates on the thorax. This family often nests in wood or other cellulose-containing materials found on the soil surface. The genera of this family that often attack settlements and plantations are Schedorhinotermes and Coptotermes. When a Coptotermes colony is disturbed, the soldiers release a milk-like liquid that paralyzes the enemy (Saputra et al. 2017). This fluid comes out of the fontanelle at the front of the head. The third family is Termitidae, the group of termites with the most species. The family Termitidae is characterized by saddle-shaped sclerite plates on the pronotum; the centre of the nest is in the ground, where the termites build sponge-like fungus galleries, and they usually build soil mounds. Examples of genera known to attack plantations are Macrotermes, Odontotermes, and Microtermes. Termites and ants are both social insects and share many characteristics.
People often call termites white ants, even though termites and ants are not closely related. Taxonomically, termites belong to the order Blattodea (Inward et al. 2007), while ants belong to the order Hymenoptera. In terms of morphology, ants have a constriction between the thorax and the abdomen, which is characteristic of the wasp group (Hymenoptera: Apocrita). This constriction is called the petiole; termites have no such constriction, and often there is no clear line between the thorax and the abdomen. Termites generally feed in closed (cryptic) areas, characterized by foraging tubes that rise from the ground and form channels connecting one tube to another, whereas ants tend to forage in open areas. Termite food is lignocellulose, which consists of cellulose, lignin, and hemicellulose.
Cellulose is a glucose polymer that is rich in fibre, while ants eat organic material containing sugar composed of polysaccharides. Social insects divide the tasks within a colony among groups commonly referred to as castes. Termites have a caste system consisting of the reproductive caste, soldiers, and workers. The reproductive caste, consisting of the king and queen as primary reproductives, has the duty of mating and laying eggs.
The reproductive caste is divided into two types, primary and neotenic (secondary). Primary reproductives originate from winged termites (alates, or laron), whereas neotenic reproductives arise when the queen or king dies or disappears from the colony (Nandika et al. 2003). The soldier caste does not reproduce, has reduced eyes, and its only job is to defend the colony against enemy attack. This caste is characterized by forward-pointing mandibles, which are usually used to attack enemies of the colony. The Nasutitermitinae are unique compared with other termite groups: the soldier caste has mandibles that are not well developed, but a more developed fontanelle (forehead) on the head. Workers can digest wood, which contains a lot of cellulose. Microbes in the termite's body aid digestion by releasing cellulolytic enzymes that facilitate the degradation of cellulose from wood.
According to Klepzig et al. (2009), the relationships between insects and symbiotic microorganisms are wide-ranging and vary from mutualism and commensalism to parasitism. Termites are one example of mutualism between a host and its symbiotic organisms. Termites have specific relationships with symbiotic microorganisms such as bacteria, fungi, and protozoa. These relationships bring both direct and indirect benefits: they help in food digestion, nutrient absorption, and protection against natural enemies.
According to Sarkar (1998), symbiosis is the living together of different organisms. Termites are a diverse collection of species, broadly divided into lower termites and higher termites. Lower termites are symbiotic with large populations of prokaryotes and protists (single-celled eukaryotes). Higher termites consist only of the family Termitidae, but this family contains more than three-quarters of all termite species, which are symbiotic mostly with bacteria. The association with cellulolytic protists in the digestion of lower termites is a known example of mutualistic symbiosis. Protists produce acetate from cellulose or wood particles taken up by endocytosis; the acetate is absorbed by the termites as an energy and carbon source.
This study was conducted at 11 sampling locations in oil palm plantations in Riau (Indonesia) and in Johor and Pahang (Peninsular Malaysia). Six locations were sampled in Riau and five in Johor and Pahang, with each sampling location representing a different type of land and farm management.
Fig. 1 Sampling locations range from the Riau palm oil ecosystem (Indonesia) to Johor-Pahang,
(Peninsular Malaysia) (Saputra et al. 2016).
Table 2 Location of Termite Sampling in Indonesia to Peninsular Malaysia

| Location | Sampling Location | Farm Management and Land Type | GPS |
|---|---|---|---|
| IdBCB | Indonesia Batang Cenaku Belilas | Private & Clay | 00°36’571” S, 102°33’592” E |
| IdSPTK | Indonesia Sako Pangean Taluk Kuantan | Company & Sandland | 00°20’314” N, 101°35’089” E |
| IdRS | Indonesia Redang Seko | Private & Clay | 00°12’805” S, 102°17’032” E |
| IdSgPK | Indonesia Sungai Pagar Kampar | Private & Peat | 00°15’559” N, 101°24’623” E |
| IdCPSK | Indonesia Central Plantation Services Kampar | Private & Peat | 00°15’379” N, 101°35’979” E |
| IdFRGB | Indonesia First Resources Group Bengkalis | Company & Peat | 01°20’968” N, 102°01’441” E |
| MyBPM | Malaysia Bukit Pasir Muar | Private & Clay | 02°06’0.97” N, 102°37’02.4” E |
| MyFKT | Malaysia Felda Kahang Timur | Company & Clay | 02°87’13.9” N, 103°28’55.9” E |
| MyFNT | Malaysia Felda Nitar Timur | Company & Clay | 02°22’14.5” N, 103°45’47.2” E |
| MyLER | Malaysia Ladang Endau Rompin | Company & Clay | 02°36’12.4” N, 103°32’47.9” E |
| MyLSK | Malaysia Ladang Sungai Kemelai | Company & Sandland | 02°36’19.4” N, 103°30’39.9” E |
3.1 Counts Based on Species
This analysis uses a non-public dataset collected in previous research on termites in Riau and Peninsular Malaysia (Saputra et al. 2016); (Nur-Atiqah et al. 2017); (Halim et al. 2018); (Saputra et al. 2018); (Zaki et al. 2019). In the first stage, the modelling considers only the groups of species, as shown in Figure 2. Coptotermitinae was not found in IdSgPK, MyBPM, MyFKT, or MyFNT but was found in IdFRGB and IdCPSK. Rhinotermitinae was not found in IdBCB or MyBPM but was found in IdCPSK. Macrotermitinae was not found in IdFRGB, IdCPSK, or MyLER, and Nasutitermitinae was not found in IdRS, IdSgPK, or MyBPM. In contrast, Termitinae was found at all research sites (Saputra et al. 2016). A total of 522 termite populations were sampled in oil palm plantations from Belilas (Riau Province, Indonesia) to Endau (Johor-Pahang, Peninsular Malaysia). The study recorded five subfamilies from two families: the subfamilies Coptotermitinae and Rhinotermitinae of the family Rhinotermitidae, and the subfamilies Termitinae, Macrotermitinae, and Nasutitermitinae of the family Termitidae.
The highest abundance was recorded for the family Termitidae (43 species; 349 populations) across three subfamilies: Termitinae (17 species; 165 populations), Macrotermitinae (12 species; 65 populations), and Nasutitermitinae (14 species; 119 populations). For Rhinotermitidae, 15 species (173 populations) were recorded from two subfamilies: Coptotermitinae (5 species; 60 populations) and Rhinotermitinae (10 species; 113 populations).
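The subfamily figures can be cross-checked against the family totals and the overall total of 522 with a short tally. The numbers are those reported in the text; the variable and function names are hypothetical.

```python
# (species, populations) per subfamily, as reported in the text.
termitidae = {
    "Termitinae": (17, 165),
    "Macrotermitinae": (12, 65),
    "Nasutitermitinae": (14, 119),
}
rhinotermitidae = {
    "Coptotermitinae": (5, 60),
    "Rhinotermitinae": (10, 113),
}

def totals(family):
    """Sum species and populations over the subfamilies of one family."""
    species = sum(s for s, _ in family.values())
    populations = sum(p for _, p in family.values())
    return species, populations

termitidae_totals = totals(termitidae)
rhinotermitidae_totals = totals(rhinotermitidae)
grand_total = termitidae_totals[1] + rhinotermitidae_totals[1]
```

The subfamily counts add up to the reported family totals, and the two families together account for all 522 sampled populations.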
By number of species, MyLSK recorded the highest count (26 species), followed by IdCPSK (21 species), MyFKT and MyLER (15 species), MyFNT (12 species), IdSPTK (11 species), MyBPM (8 species), IdSgPK and IdFRGB (7 species), and IdRS (5 species); IdBCB had the fewest, with only three species.
Fig. 2 Species Counts in Riau and Peninsular Malaysia
When species richness and diversity are compared with temperature and humidity, abundance tends to increase with increasing humidity. Figure 3 shows that abundance decreases as soon as temperature and humidity decrease. However, temperature does not increase or decrease significantly across the sampling locations. The joint pattern of temperature and humidity gives a different result: where temperature increased, humidity decreased, but this did not affect the abundance of termites. Species richness shows a pattern similar to abundance.
The structure of the termite community may be influenced by environmental factors. In the observed pattern, humidity decreases as temperature increases, but this does not affect termite abundance or species richness, even though temperature and humidity are the primary physical factors affecting termites. These differing results may arise because environmental data need to be collected more thoroughly and carefully to avoid misinterpretation.
Fig. 3. Heatmap Diversity, Richness and Climatic
Factors
Next, we fit a GLLVM to examine the distribution at the species level. Models are compared using AIC; a good model is the one with the minimum AIC among all candidate models (Joyner et al. 2019). On this basis the model with one latent variable (LV 1) is chosen. Since the best distribution is the negative binomial, in equation (6) we let the response follow the negative binomial distribution with mean μ and variance governed by a dispersion parameter, using the log link function.
Table 3. Selected distribution based on species counts

| Distribution | LV | Log-likelihood | AIC | AICc | BIC |
|---|---|---|---|---|---|
| negative.binomial | 1 | -171.9731 | 373.9463 | 277.9463 | 379.9147 |
| negative.binomial | 2 | -175.2916 | 388.5833 | 304.1388 | 396.1433 |
| Poisson | 1 | -294.5091 | 609.0182 | Inf | 612.9971 |
| Poisson | 2 | -186.0558 | 400.1117 | 295.1117 | 405.6822 |
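The three criteria can be recomputed from the log-likelihoods with the usual formulas, as in the minimal sketch below. Two inputs are assumptions rather than values stated in the text: the parameter counts k are inferred from the gap between each reported AIC and minus twice the log-likelihood, and n = 11 is taken to be the number of sampling locations.

```python
import math

def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for a model with k parameters fitted to n units."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    denom = n - k - 1
    # The small-sample correction is undefined when n - k - 1 == 0.
    aicc = math.inf if denom == 0 else aic + 2.0 * k * (k + 1) / denom
    return aic, aicc, bic

N_SITES = 11  # assumed: the number of sampling locations
# (distribution, LV, log-likelihood, inferred parameter count k)
models = [
    ("negative.binomial", 1, -171.9731, 15),
    ("negative.binomial", 2, -175.2916, 19),
    ("Poisson", 1, -294.5091, 10),
    ("Poisson", 2, -186.0558, 14),
]
results = {(d, lv): information_criteria(ll, k, N_SITES) for d, lv, ll, k in models}
best = min(results, key=lambda key: results[key][0])  # smallest AIC
```

Under these assumptions the sketch selects the negative binomial model with one latent variable, in line with the text. It also suggests why the reported AICc values fall below AIC: with more parameters than sites, n - k - 1 is negative, so the correction term is negative, and for the Poisson LV 1 model it is a division by zero, which would explain the Inf entry.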
We have the same relationship between μ and η as in the Poisson model. The conditional distribution is given by (Caraka et al. 2018):
 




 (21)
The log-likelihood function for a negative binomial response can be written:


 






(22)
Where



(23)
And




(24)
We obtain the intercept values in Table 4 and visualise the species ordination in Figure 4.
Table 4. Parameter GLLVM Based On Species Counts

| Subfamily | Intercept | theta.LV1 | Dispersion |
|---|---|---|---|
| Coptotermitinae | 0.7243423 | 1.4167729 | 2.2014351 |
| Rhinotermitinae | 1.8027498 | 1.0410484 | 1.0636533 |
| Termitinae | 2.6231070 | -0.4822085 | 0.8501716 |
| Macrotermitinae | 1.4606034 | -1.0184050 | 0.7945360 |
| Nasutitermitinae | 2.3342301 | 0.3717223 | 1.9698468 |
Fig. 4 Ordination Species Based on Negative Binomial GLLVM
Figure 5 shows that the species counts are perfectly correlated: Rhinotermitinae has a correlation of -1 with Termitinae and with Macrotermitinae, while Termitinae and Macrotermitinae have a correlation of +1.
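Correlations of exactly -1 and +1 are what a single latent variable would imply. The sketch below assumes that the residual covariance between two subfamilies is the product of their theta.LV1 loadings, with no extra diagonal adjustment; under that assumption every pairwise correlation reduces to the sign of the product. The function name is hypothetical.

```python
import math

# theta.LV1 loadings from Table 4.
loadings = {
    "Coptotermitinae": 1.4167729,
    "Rhinotermitinae": 1.0410484,
    "Termitinae": -0.4822085,
    "Macrotermitinae": -1.0184050,
    "Nasutitermitinae": 0.3717223,
}

def implied_correlation(a, b):
    """Correlation implied by the rank-one covariance theta_a * theta_b."""
    cov_ab = loadings[a] * loadings[b]
    return cov_ab / math.sqrt(loadings[a] ** 2 * loadings[b] ** 2)

rhino_vs_term = implied_correlation("Rhinotermitinae", "Termitinae")
term_vs_macro = implied_correlation("Termitinae", "Macrotermitinae")
```

Loadings with opposite signs give -1 and loadings with the same sign give +1, so with one latent variable the correlation plot carries sign information only. A model with two or more latent variables would be needed to obtain intermediate correlations.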
Fig. 5 Species Correlation
3.2 Counts Based on Genus
After obtaining the results for species counts, we run the model at the genus level, again comparing the negative binomial and Poisson distributions. Table 4 shows that the Poisson model with two latent variables fits better than the other models, with the lowest AIC of 1080.345. After fitting the Poisson model with LV 2, we obtain the genus-level parameters in Table 5. The intercept values are visualized in Figure 6 as the genus ordination.
Table 4. Selected Distribution Based on Genus Counts
Distribution         LV   log-likelihood   AIC         AICc        BIC
negative.binomial    1    -474.9195        1117.839    924.8659    1151.262
negative.binomial    2    -474.1781        1170.356    924.178     1214.523
Poisson              1    -499.4764        1110.953    972.1703    1133.235
Poisson              2    -457.1727        1080.345    889.3317    1113.371
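The selection rule used here (prefer the fit with the smallest information criterion) can be checked with a short Python sketch. The formulas are the standard definitions of AIC, AICc, and BIC. The number of sites n = 11 and the parameter counts k are not stated in this section; they are back-solved from the tabulated AIC and BIC values and should be treated as assumptions. The function names are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for a model with k parameters and n observations."""
    aic = -2.0 * loglik + 2.0 * k
    bic = -2.0 * loglik + k * math.log(n)
    denom = n - k - 1
    # The small-sample correction is undefined when n - k - 1 == 0.
    aicc = math.inf if denom == 0 else aic + 2.0 * k * (k + 1) / denom
    return aic, aicc, bic


# (distribution, LV, log-likelihood, k) for the genus-level fits.
# The log-likelihoods are copied from the genus-level table; k and N_SITES
# are assumptions inferred from the tabulated AIC and BIC.
N_SITES = 11
GENUS_FITS = [
    ("negative.binomial", 1, -474.9195, 84),
    ("negative.binomial", 2, -474.1781, 111),
    ("Poisson", 1, -499.4764, 56),
    ("Poisson", 2, -457.1727, 83),
]


def rank_by_aic(fits, n):
    """Compute the criteria for every fit and sort by AIC, smallest first."""
    rows = [(dist, lv) + information_criteria(ll, k, n) for dist, lv, ll, k in fits]
    return sorted(rows, key=lambda row: row[2])


for dist, lv, aic, aicc, bic in rank_by_aic(GENUS_FITS, N_SITES):
    print(f"{dist:18s} LV={lv}  AIC={aic:9.3f}  AICc={aicc:9.3f}  BIC={bic:9.3f}")
```

Under these assumed values the sketch reproduces the tabulated criteria up to rounding. It also shows why AICc falls below AIC in the tables: the correction term 2k(k+1)/(n-k-1) is negative whenever k exceeds n-1, and it is undefined (reported as Inf) when n-k-1 equals zero.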
The genus ordination, the genus correlations, and the Q graph are shown in Figure 6, Figure 7, and Figure 8, respectively. We can form a variational lower bound on the log-likelihood. The variational approximation then relies on maximizing this lower bound over a tractable family of distributions:
[Equation (24)]
where
[Equation (25)]
The term-by-term inequality in equation (25) can then be written:
[Equation (26)]
In our variational approximation, we choose the tractable family to be the product distributions of q-dimensional multivariate Gaussians with diagonal covariance matrices:
[Equation (27)]
In the Poisson case, the variational expectation of the non-linear part involving b, that is, the matrix of conditional expectations A, can be expressed as:
[Equation (28)]
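The generic shape of such a bound is sketched below under assumed notation: Y for the count matrix, u for the latent variables, q for the variational density, and m and S for the variational mean and covariance of u. It is an illustration of the technique, not a reconstruction of the authors' equations (24) to (28).

```latex
\begin{align}
\log p(Y) &= \log \int p(Y \mid u)\, p(u)\, du
  \;\ge\; \mathbb{E}_{q}\!\left[\log p(Y, u)\right]
        - \mathbb{E}_{q}\!\left[\log q(u)\right], \\
\mathbb{E}_{q}\!\left[\exp\!\left(\theta^{\top} u\right)\right]
  &= \exp\!\left(\theta^{\top} m + \tfrac{1}{2}\, \theta^{\top} S\, \theta\right)
  \quad \text{for } q(u) = \mathcal{N}(m, S).
\end{align}
```

The first line follows from Jensen's inequality. The second is the Gaussian moment generating function, which is what makes the expectation of the exponential term available in closed form for a Poisson response with a log link.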
The choice of a good starting value is crucial in iterative procedures, as it helps the algorithm start in the basin of attraction of a good local maximum and can substantially speed up convergence. Here we initialize the parameters by fitting a Poisson GLLVM to Y and then extracting the regression coefficients and the variance-covariance matrix of the Pearson residuals. We set the starting values from these quantities, using the best rank-q approximation of the variance-covariance matrix, as given by keeping the first q dimensions of its singular value decomposition. We set the other starting values as [...].

In general, the IdBCB region differs significantly from the other areas, as do IdFRGB and MyLER. However, MyFNT, IDRs, and IdRS show the same kinship in terms of the diversity of this species.
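The starting-value step described above (a best rank-q approximation of the residual variance-covariance matrix by truncated singular value decomposition) can be sketched in Python with NumPy. This is a simplified illustration: it replaces the Poisson GLLVM fit with an intercept-only Poisson fit, whose fitted mean for each taxon is the column mean, and it uses hypothetical toy counts rather than the termite data. The function names are hypothetical.

```python
import numpy as np


def pearson_residual_cov(counts):
    """Covariance of Pearson residuals from an intercept-only Poisson fit.

    For an intercept-only Poisson model the fitted mean of each column is
    its sample mean, so no iterative fitting is needed in this toy sketch.
    """
    counts = np.asarray(counts, dtype=float)
    mu = counts.mean(axis=0)             # fitted Poisson mean per taxon
    resid = (counts - mu) / np.sqrt(mu)  # Pearson residuals
    return np.cov(resid, rowvar=False)


def best_rank_q(sigma, q):
    """Best rank-q approximation of sigma, keeping the first q SVD dimensions."""
    u, s, vt = np.linalg.svd(sigma)
    return (u[:, :q] * s[:q]) @ vt[:q, :]


# Hypothetical toy counts (6 sites x 4 taxa), not data from the paper.
Y = [[3, 0, 1, 5],
     [2, 1, 0, 4],
     [0, 2, 3, 1],
     [1, 0, 2, 0],
     [4, 1, 0, 6],
     [0, 3, 4, 1]]

sigma = pearson_residual_cov(Y)
sigma_q = best_rank_q(sigma, q=2)
print(np.linalg.matrix_rank(sigma_q))
```

Truncating the decomposition gives the closest rank-q matrix in the Frobenius norm, so the approximation error equals the root sum of squares of the discarded singular values.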
Fig. 6 Ordination Genus Based on Poisson- GLLVM
Fig. 7 Correlation Based on Genus
Fig. 8 Q Graph Based on Genus
Table 5. Parameters Based on Genus GLLVM
                                 Intercept    theta.LV1    theta.LV2
Coptotermes curvignathus         0.4579891   -0.5180037    0.0000000
Coptotermes kalshoveni           0.4945442   -0.7563808   -0.3912430
Coptotermes sepangensis          0.7098347   -0.5713176   -0.4052082
Coptotermes havilandi            0.1429008   -0.3260377   -0.2614940
Parrhinotermes aequalis          0.1861423    0.3279931   -0.6498629
Parrhinotermes pygmaeus          0.3342082   -0.6126105   -0.3818074
Parrhinotermes spp. A            0.0000002    0.0000001   -0.0000001
Schedorhinotermes brevialatus    0.6323911   -0.6129293   -0.6993178
Schedorhinotermes javanicus      0.2002901   -0.2152474   -0.1701744
Schedorhinotermes mediobscurus   0.4900498   -0.3467540   -0.2655336
Schedorhinotermes malaccensis    0.3713058   -0.7299368   -0.5104836
Schedorhinotermes sarawakensis   0.5179787   -0.8183993    0.2943385
Prohamitermes mirabilis          0.0700554    0.0830517   -0.1603465
Microcerotermes dubius           0.2974319   -0.0924066   -0.1246402
Termes rostratus                 0.8005997    0.8291831    0.2435449
Procapritermes spp. G            0.2316985    0.0966011    0.1023581
Pericapritermes mohri            0.2982160    0.2042979   -0.3159980
Pericapritermes semarangi        0.0000001    0.0000000    0.0000001
Macrotermes gilvus               0.4152051   -0.2335550    1.1629640
Macrotermes malacensis           0.3328945    0.4642242   -0.0326502
Nasutitermes havilandi           0.3280501   -0.5767489   -0.3486288
Nasutitermes matangensis         0.5653070    0.1825307   -0.5255028
Nasutitermes neopravus           0.1529289   -0.0052302   -0.1658990
Nasutitermes proatripennis       0.6173534    0.4406987   -1.0388870
Nasutitermes roboratus           0.1723334    0.0162690   -0.3598069
Bulbitermes constrictiformis     0.3647910    0.0729814    0.1236907
Bulbitermes constrictoides       0.3067339    0.0637614    0.0542762
Bulbitermes neopasullis          0.0867176    0.0193210    0.0165708
For Termitinae, the species found are Prohamitermes mirabilis, Microcerotermes dubius, M. havilandi, Termes rostratus, Procapritermes sp. G, Pericapritermes buitenzorgi, and P. mohri. They are distributed as follows: P. mirabilis is found in IdSgPK; M. dubius in IdCPSK; M. havilandi in IdFRGB; T. rostratus in IdRS, IdBCB, IdSgPK, and IdCPSK; Procapritermes sp. G in IdSgPK; P. buitenzorgi in IdSgPK; and P. mohri in IdSPTK and IdSgPK. The genus-level diversity is shown in Figure 9: the counts contain many zeros, which means that some species were not recorded at particular locations.
Fig. 9 Heatmap Based on Genus
Species from the subfamily Termitinae vary widely across locations, as Figure 10 clearly shows. P. mirabilis, M. dubius, and M. serrula are arboreal wood-feeding species, while T. rostratus and P. sp. G are wood feeders or intermediate feeders. T. rostratus nests as an inquiline, meaning that it lodges in nests built by other species, whereas P. sp. G builds a hypogeal nest. P. buitenzorgi and P. mohri are organic-soil feeders and nest hypogeally.
Fig. 10 Heatmap Based on Family in Each Location
IV. CONCLUSION AND FUTURE WORK
In this study, we succeeded in simulating the diversity of termite species by applying multivariate latent generalized linear models (GLLVM). We estimated the GLLVM parameters by variational approximation and selected the best model using AIC, AICc, and BIC. At the species level, the best model is the negative binomial with one latent variable (AIC: 373.9463, AICc: 277.9463, BIC: 379.9147). At the genus level, in contrast, the best distribution is Poisson (AIC: 1080.345, AICc: 889.3317, BIC: 1113.371). In general, GLLVM represents the diversity and richness of termites in Riau and Peninsular Malaysia well. In further research, we will compare the variational approximation with the Laplace approximation to assess the difference in computational time. We will also try other distributions: zero-inflated Poisson (Loeys et al. 2012), zero-inflated negative binomial (Hall 2000), beta-binomial (Kim & Lee 2019), Tweedie (Shono 2008), extended Tweedie (Bonat et al. 2018), hurdle (Zeileis et al. 2008), and extended hurdle negative binomial (Lee et al. 2017; Maengseok Noh & Lee 2019).
Acknowledgement. This paper is supported by
the Ministry of Science and Technology, Taiwan,
under Grant MOST-107-2221-E-324-018-MY2
and MOST-106-2218-E-324-002 and under
collaboration with Lab Hierarchical Generalized
Linear Model (H-GLM), Department of
Statistics, College of Natural Sciences Seoul
National University and Department of Statistics,
Padjadjaran University. This research was partially
supported by the Bioinformatics Data Science
Research Center, Bina Nusantara University.
Author Contributions:
Conceptualization: Rezzy Eko Caraka, Youngjo
Lee, Rung Ching Chen, Maengseok Noh.
Data curation: Rezzy Eko Caraka, Andi Saputra.
Formal analysis: Rezzy Eko Caraka.
Investigation: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Methodology: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Software: Rezzy Eko Caraka.
Validation: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Visualization: Rezzy Eko Caraka
Writing original draft: Rezzy Eko Caraka,
Youngjo Lee, Rung Ching Chen, Maengseok
Noh.
Writing review & editing: Rezzy Eko Caraka,
Youngjo Lee, Rung Ching Chen, Maengseok
Noh, Toni Toharudin, Bens Pardamean, Andi
Saputra.
REFERENCES
Abraham, V. M., Walpole, R. E. & Myers, R. H.
2007. Probability and Statistics for
Engineers and Scientists. The Mathematical
Gazette. doi:10.2307/3616039
Bonat, W. H., Jørgensen, B., Kokonendji, C. C.,
Hinde, J. & Demétrio, C. G. B. 2018.
Extended Poisson-Tweedie: Properties and
regression models for count data. Statistical
Modelling.
doi:10.1177/1471082X17715718
Bong, M. C. F., King, P. J. H., Ong, K. H. &
Mahadi, N. M. 2012. Termites assemblages
in oil palm plantation in Sarawak, Malaysia.
Journal of Entomology.
doi:10.3923/je.2012.68.78
Bower, B. & Savitsky, T. 2008. Laplace
Approximation. Graphical Models.
Caraka, R. E., Shohaimi, S., Kurniawan, I. D.,
Herliansyah, R., Budiarto, A., Sari, S. P. &
Pardamean, B. 2018. Ecological Show Cave
and Wild Cave: Negative Binomial Gllvm’s
Arthropod Community Modelling.
Procedia Computer Science 135: 377-384.
doi:10.1016/j.procs.2018.08.188
Consul, P. C. & Famoye, F. 1992. Generalized
poisson regression model. Communications
in Statistics - Theory and Methods.
doi:10.1080/03610929208830766
del Castillo, J. & Lee, Y. 2008. GLM-methods for
volatility models. Statistical Modelling
8(3): 263-283.
doi:10.1177/1471082X0800800303
Ha, I. Do & Lee, Y. 2003. Estimating Frailty
Models via Poisson Hierarchical
Generalized Linear Models. Journal of
Computational and Graphical Statistics.
doi:10.1198/1061860032256
Halim, M., Nasir, D. M., Saputra, A., Ayob, Z. A.,
Ahmad, S. Z. S., Din, A. M. M.,
Khairuddin, W. N. W. M., et al. 2018.
Komuniti makroartropoda yang berasosiasi
dengan ekosistem sawit di atas jenis tanah
yang berbeza. Serangga 22(3): 38-55.
Hall, D. B. 2000. Zero-inflated poisson and
binomial regression with random effects: A
case study. Biometrics. doi:10.1111/j.0006-
341X.2000.01030.x
Herliansyah, R. & Fitia, I. 2018. Latent variable
models for multi-species counts modeling
in ecology. Biodiversitas Journal of
Biological Diversity 19(5): 1871-1876.
doi:10.13057/biodiv/d190538
Hinde, J. & Demétrio, C. G. B. 1998.
Overdispersion: Models and estimation.
Computational Statistics and Data
Analysis. doi:10.1016/S0167-
9473(98)00007-3
Hui, F. K. C., Warton, D. I., Ormerod, J. T.,
Haapaniemi, V. & Taskinen, S. 2017.
Variational Approximations for
Generalized Linear Latent Variable
Models. Journal of Computational and
Graphical Statistics.
doi:10.1080/10618600.2016.1164708
Inward, D., Beccaloni, G. & Eggleton, P. 2007.
Death of an order: A comprehensive
molecular phylogenetic study confirms that
termites are eusocial cockroaches. Biology
Letters. doi:10.1098/rsbl.2007.0102
Joyner, C., McMahan, C., Baurley, J. &
Pardamean, B. 2019. A two-phase Bayesian
methodology for the analysis of binary
phenotypes in genome-wide association
studies. Biometrical Journal 1-11.
doi:10.1002/bimj.201900050
Keng, W. . 2006. Spesies comparison of termite
(Isoptera) in primary forest of Tawau Hill
Park, Sabah and adjacent cocoa plantation
area. University Malaysia Sabah.
Kéry, M. 2010. Poisson Mixed-Effects Model
(Poisson GLMM). Introduction to
WinBUGS for Ecologists, pp. 203-209.
doi:10.1016/B978-0-12-378605-0.00016-8
Kim, G. & Lee, Y. 2019. Marginal versus
conditional beta-binomial regression
models. Statistical Methods in Medical
Research. doi:10.1177/0962280217735703
Klepzig, K. D., Adams, A. S., Handelsman, J. &
Raffa, K. F. 2009. Symbioses: A Key Driver
of Insect Physiological Processes,
Ecological Interactions, Evolutionary
Diversification, and Impacts on Humans.
Environmental Entomology.
doi:10.1603/022.038.0109
Kurniawan, I. D., Rahmadi, C., Caraka, R. E. &
Ardi, T. A. 2018. Short Communication:
Cave-dwelling Arthropod community of
Semedi Show Cave in Gunungsewu Karst
Area, Pacitan, East Java, Indonesia.
Biodiversitas 19(3): 857-866.
doi:10.13057/biodiv/d190314
Kurniawan, I. D., Soesilohadi, R. C. H., Rahmadi,
C., Caraka, R. E. & Pardamean, B. 2018.
The difference on Arthropod communities’
structure within show caves and wild caves
in Gunungsewu Karst area, Indonesia.
Ecology, Environment and Conservation
24(1).
Kwon, S., Oh, S. & Lee, Y. 2016. The use of
random-effect models for high-dimensional
variable selection problems. Computational
Statistics and Data Analysis 103(1): 401-412.
Lee, Y. & Nelder, J. 2001. Modelling and
analysing correlated non-normal data.
Statistical Modelling 1(1): 3-16.
doi:10.1177/1471082X0100100102
Lee, Y. & Noh, M. 2012. Modelling random
effect variance with double hierarchical
generalized linear models. Statistical
Modelling 12(6): 487-502.
doi:10.1177/1471082X12460132
Lee, Y., Rönnegård, L. & Noh, M. 2017. Data
analysis using hierarchical generalized
linear models with R. Data Analysis Using
Hierarchical Generalized Linear Models
with R. doi:10.1201/9781315211060
Loeys, T., Moerkerke, B., de Smet, O. & Buysse,
A. 2012. The analysis of zero-inflated count
data: Beyond zero-inflated Poisson
regression. British Journal of Mathematical
and Statistical Psychology.
doi:10.1111/j.2044-8317.2011.02031.x
Myers, R. H., Montgomery, D. C., Vining, G. G.
& Robinson, T. J. 2012. Generalized Linear
Models: With Applications in Engineering
and the Sciences: Second Edition.
Generalized Linear Models: With
Applications in Engineering and the
Sciences: Second Edition.
doi:10.1002/9780470556986
Nandika, D., Rismayadi, Y., Diba, F. & Harun, J.
. 2003. Rayap: Biologi dan
Pengendaliaannya. Surakarta:
Muhammadiyah University Press.
Niku, J., Brooks, W., Herliansyah, R., Hui, F. K.
C., Taskinen, S. & Warton, D. I. 2019.
Efficient estimation of generalized linear
latent variable models. PLoS ONE 14(5): 1-20.
doi:10.1371/journal.pone.0216129
Niku, J., Hui, F. K. C., Taskinen, S. & Warton, D.
I. 2019. gllvm: Fast analysis of multivariate
abundance data with generalized linear
latent variable models in r. Methods in
Ecology and Evolution 1-10.
doi:10.1111/2041-210X.13303
Niku, J., Warton, D. I., Hui, F. K. C. & Taskinen,
S. 2017. Generalized Linear Latent Variable
Models for Multivariate Count and Biomass
Data in Ecology. Journal of Agricultural,
Biological, and Environmental Statistics.
doi:10.1007/s13253-017-0304-7
Noh, M, Lee, Y., Oud, J. H. . & Toharudin. 2019.
Hierarchical likelihood approach to non-
Gaussian factor analysis. Journal of
Statistical Computation and Simulation
89(3): 1555-1573.
Noh, Maengseok & Lee, Y. 2007. Robust
modeling for inference from generalized
linear model classes. Journal of the
American Statistical Association 102(479):
1059-1072.
doi:10.1198/016214507000000518
Noh, Maengseok & Lee, Y. 2019. Extended
negative binomial hurdle models. Statistical
Methods in Medical Research.
doi:10.1177/0962280218766567
Nur-Atiqah, J., Saputra, A., Mohammad Esa, M.
F., Shafuraa, O., Billy, A. N. A., Mohd
Yaziz, N. A. A. & Faszly, R. 2017.
Coptotermes sp. (rhinotermitidae:
Coptotermitinae) infestation pattern shifts
through time in oil palm agroecosystem.
Serangga 22(2): 15-31.
Rahman, D. A., Herliansyah, R., Rianti, P.,
Rahmat, U. M., Firdaus, A. Y. &
Syamsudin, M. 2019. Ecology and
Conservation of the Endangered Banteng
(Bos javanicus) in Indonesia Tropical
Lowland Forest. HAYATI Journal of
Biosciences 26(2): 68-80.
Saputra, A., Halim, M., Jalaludin, N.-A., Hazmi,
I. R. & Faszly Rahim. 2017. Effects of Day
Time Sampling on The Activities of
Termites in Oil Palm Plantation at
Malaysia-Indonesia. Serangga 22(1): 23-32.
Saputra, A., Jalaludin, N. A., Hazmi, I. R. &
Rahim, F. 2016. Termite assemblages from
oil palm agroecosystems across Riau
Province, Sumatra, Indonesia. AIP
Conference Proceedings.
doi:10.1063/1.4966841
Saputra, A., Muhammad Nasir, D., Jalaludin, N.
A., Halim, M., Bakri, A., Mohammad Esa,
M. F., Riza Hazmi, I., et al. 2018.
Composition of termites in three different
soil types across oil palm agroecosystem
regions in Riau (Indonesia) and Johor
(Peninsular Malaysia). Journal of Oil Palm
Research. doi:10.21894/jopr.2018.0054
Sarkar, S. 1998. Evolution by association: A
history of symbiosis. Studies in History and
Philosophy of Science Part C: Studies in
History and Philosophy of Biological and
Biomedical Sciences. doi:10.1016/s1369-
8486(98)00010-7
Shono, H. 2008. Application of the Tweedie
distribution to zero-catch data in CPUE
analysis. Fisheries Research.
doi:10.1016/j.fishres.2008.03.006
Wang, X. D., Chen, R. C., Yan, F., Zeng, Z. Q. &
Hong, C. Q. 2019. Fast Adaptive K-Means
Subspace Clustering for High-Dimensional
Data. IEEE Access 7: 42639-42651.
doi:10.1109/ACCESS.2019.2907043
Wang, Y., Naumann, U., Wright, S. T. & Warton,
D. I. 2012. Mvabund- an R package for
model-based analysis of multivariate
abundance data. Methods in Ecology and
Evolution. doi:10.1111/j.2041-
210X.2012.00190.x
Warton, D. I. 2005. Many zeros does not mean
zero inflation: Comparing the goodness-of-
fit of parametric models to multivariate
abundance data. Environmetrics.
doi:10.1002/env.702
Warton, D. I. 2015. New opportunities at the
interface between ecology and statistics.
Methods in Ecology and Evolution.
doi:10.1111/2041-210X.12345
Warton, D. I., Blanchet, F. G., O’Hara, R. B.,
Ovaskainen, O., Taskinen, S., Walker, S. C.
& Hui, F. K. C. 2015. So Many Variables:
Joint Modeling in Community Ecology.
Trends in Ecology and Evolution.
doi:10.1016/j.tree.2015.09.007
Warton, D. I., Foster, S. D., De’ath, G., Stoklosa,
J. & Dunstan, P. K. 2015. Model-based
thinking for community ecology. Plant
Ecology. doi:10.1007/s11258-014-0366-3
Zaki, N. I. A., Nasir, D. M., Aziz, A., Azhari, L.
H., Saputra, A., Halim, M., Muslim, S. A.,
et al. 2019. Diversity of ground beetles
(Coleoptera: Carabidae) in oil palm
plantation in endau-rompin, Pahang,
Malaysia. Serangga 24(1): 91-102.
Zeileis, A., Kleiber, C. & Jackman, S. 2008.
Hurdle regression models in R. Journal of
Statistical Software.
REZZY EKO CARAKA received the B.S. degree (S.Si) from the Department of Statistics, Diponegoro University, and the Master of Science by Research (MSc-Res) from the School of Mathematical Sciences, the National University of Malaysia. In 2019 he started his PhD in the College of Informatics, Chaoyang University of Technology, Taiwan. He is a researcher at the Bioinformatics & Data Science Research Center, University of Bina Nusantara (BINUS), and the Department of Statistics, Padjadjaran University. He is also a fellow researcher in the GLM-H lab, Department of Statistics, Seoul National University, South Korea, and a co-founder of Statistical Calculator (STATCAL). His research interests include Statistical Climatology, Climate Modeling, Ecological Modelling, Statistical Machine Learning, and Large-scale Optimization.
Email: rezzyekocaraka@gmail.com
RUNG CHING CHEN
received a B.S. from the
Department of Electrical
Engineering in 1987, and
an M. S. from the Institute
of Computer Engineering
in 1990, both from
National Taiwan
University of Science and
Technology, Taipei,
Taiwan. In 1998, he received his Ph.D. from the
Department of Applied Mathematics in computer
science, National Chung Hsing University. He is
now a distinguished professor in the Department
of Information Management, Taichung, Taiwan.
His research interests include network
technology, pattern recognition, and knowledge
engineering, IoT and data analysis, and
applications of Artificial Intelligence.
Email: crching@cyut.edu.tw
YOUNGJO LEE is a full Professor in the Department of Statistics, College of Natural Sciences, Seoul National University (SNU). He has published four books on Generalized Linear Models with Random Effects and GLM Likelihood. His research interests are Generalized Linear Models, Hierarchical Generalized Linear Models, Random Effects, Data Science, and Statistical Software Development.
MAENGSEOK NOH received the B.S., M.S. and Ph.D. degrees from the Department of Statistics, Seoul National University, in 1996, 1998 and 2005, respectively. His thesis was on analysis of binary data and robust modelling via hierarchical likelihood. Since 2006, he has been a Professor with the Department of Statistics, Pukyong National University, Busan, Korea. His current research interests are application and software development for hierarchical generalized linear models, development of methodology for zero-inflated Poisson models with spatial correlation, and the hierarchical approach to non-Gaussian factor analysis.
TONI TOHARUDIN currently works at the Department of Statistics, Universitas Padjadjaran, where he conducts research in statistics. He received a Master of Science from the University of Leuven, Belgium (2004-2005), and a Ph.D. in Spatial Sciences from the University of Groningen (2007-2010). He is head of the research group in time series and regression.
BENS PARDAMEAN has over thirty years of global experience in information technology, bioinformatics, and education. After successfully leading the Bioinformatics Research Interest Group, he currently holds a dual appointment as the Director of the Bioinformatics & Data Science Research Center (BDSRC) and as an Associate Professor of Computer Science at the University of Bina Nusantara (BINUS) in Jakarta, Indonesia. He earned a doctoral degree in informative research from the University of Southern California (USC), as well as a master's degree in computer education and a bachelor's degree in computer science from California State University, Los Angeles.
ANDI SAPUTRA received the B.S. degree (S.Si) from the Department of Biology, Riau University, and the Master of Science by Research (M.Sc) in Zoology from the School of Environmental and Natural Resource Science, Universiti Kebangsaan Malaysia. His research interests are Zoology, Entomology, and Ecology.
... The use of regression primarily considers a classic approach that does not include prior information. Bayes regression could solve such weaknesses in the traditional regression (Caraka et al., 2020a(Caraka et al., , 2020b(Caraka et al., , 2020c(Caraka et al., , 2020d. The Bayes approach allows researchers to combine prior knowledge and information obtained from samples and then use it to estimate posterior parameters (Caraka and Tahmid, 2019). ...
... Furthermore, univariate linear regression models explain the variability of dependent variables (Lee et al., 2017). It provides information on one or more variables called independent variables by asserting a direct relationship between the dependent and independent variables (Caraka et al., 2020a(Caraka et al., , 2020b(Caraka et al., , 2020c(Caraka et al., , 2020dNugroho et al., 2020). If there are K parameters to be guessed, the model is used as follows (Güner et al., 2012): ...
... Meanwhile, in Bayes regression, the regression parameters are considered random. Thus, it is subject to certain distribution assumptions (Caraka et al., 2020a(Caraka et al., , 2020b(Caraka et al., , 2020c(Caraka et al., , 2020d. The-n observation of the dependent variable and the independent variable is described in equation (7): ...
Article
Full-text available
Purpose Despite the practice of credit card services by Islamic financial institutions (IFIs) is debatable, Islamic banks (IBs) have been offering this product. Both Muslim and non-Muslim customers have subscribed to the products. Thus, it is critical to analyse the strategy of IBs’ moral messages in reminding their Muslim and non-Muslim customers to repay their credit card debts. This paper aims to investigate this issue in Indonesia using data mining via machine learning. Design/methodology/approach This study examines the IBs’ customers across the 32 provinces of Indonesia regarding their moral status in credit card debt repayment. This work considers 6,979 observations of the variables that affect the moral status of the IBs’ customers in repaying their debt. The five types of data mining via machine learning (i.e. Boruta, logistic regression, Bayesian regression, random forest, XGBoost and spatial cluster) are used. Boruta, random forest and XGBoost are used to select the important features to investigate the moral aspects. Bayesian regression is used to get the odds and opportunity for the transition of each variable and spatially formed based on the information from the logistical intercepts. The best method is selected based on the highest accuracy value to deliver the information on the relationship between moral status categories in the selected 32 provinces in Indonesia. Findings A different variable on moral status in each province is found. The XGBoost finds an accuracy value of 93.42%, which the three provincial groups have the same information based on the importance of the variables. The strategy of IBs’ moral messages by sending the verse of al-Qur’an and al-Hadith (traditions or sayings of the Prophet Muhammad PBUH) and simple messages reminders do not impact the customers’ repaying their debts. Both Muslim and non-Muslim groups are primarily found in the non-moral group. 
Research limitations/implications This study does not consider socio-economic demographics and culture. This limitation calls future works to consider such factors when conducting a similar topic. Practical implications The industry professionals can take benefit from this study to understand the Indonesian customers’ moral status in repaying credit card debt. In addition, future works may advance the recent findings by considering socio-cultural factors to investigate the moral status approach to Islamic credit warnings that is not covered by this study. Social implications This work finds that religious text of credit card repayment reminders sent to Muslims in several provinces of Indonesia does not affect their decision to repay their debts. To some extent, this finding draws a social issue that the local IBs need to consider when implementing the strategy of credit card repayment reminders. Originality/value This study credits a novelty in the discourse of data science for Islamic finance practices. Specifically, this study pioneers an example of using data mining to investigate Islamic-moral incentives in credit card debt repayment.
... The optimization methods used in these models are Laplace (Huber et al., 2004) and Variational approximations (Caraka et al., 2020b). Laplace is a type of multidimensional integral approximation using Eq. 4. ...
... > and verify the following for all δ> 0. The Laplace method, while considering the integral (1) as the integral of b, shows the Gaussian measure for the small variance of order 1 λ . The ( ) h x in the integrand using the Taylor development Caraka et al., 2020b) in the order is presented in Eq. 5. small variance of order 1 λ . The ( ) h x in the integrand using the Taylor development (Her Caraka et al., 2020b) in the order is presented in Eq. 5. ...
... The ( ) h x in the integrand using the Taylor development Caraka et al., 2020b) in the order is presented in Eq. 5. small variance of order 1 λ . The ( ) h x in the integrand using the Taylor development (Her Caraka et al., 2020b) in the order is presented in Eq. 5. ...
Article
Full-text available
BACKGROUND AND OBJECTIVES: The classification of marine animals as protected species makes data and information on them to be very important. Therefore, this led to the need to retrieve and understand the data on the event counts for stranded marine animals based on location emergence, number of individuals, behavior, and threats to their presence. Whales are generally often stranded in very shallow areas with sloping sea floors and sand. Data were collected in this study on the incidence of stranded marine animals in 20 provinces of Indonesia from 2015 to 2019 with the focus on animals such as Balaenopteridae, Delphinidae, Lamnidae, Physeteridae, and Rhincodontidae. METHODS: Multivariate latent generalized linear model was used to compare several distributions to analyze the diversity of event counts. Two optimization models including Laplace and Variational approximations were also applied. FINDINGS: The best theta parameter in the latent multivariate latent generalized linear latent variable model was found in the Akaike Information Criterion, Akaike Information Criterion Corrected and Bayesian Information Criterion values, and the information obtained was used to create a spatial cluster. Moreover, there was a comprehensive discussion on ocean-atmosphere interaction and the reasons the animals were stranded. CONCLUSION:The changes in marine ecosystems due to climate change, pollution, overexploitation, changes in sea use, and the existence of invasive alien species deserve serious attention.
... Specifically, in one area with another the number of cases of people with certain diseases is also different so that it will produce zero data that is difficult to be modeled on parametric statistics because it will cause dispersion 9. The technique that can be used to overcome the count data is to involve Poisson distribution or negative binomials 10 . Generalized Linear models are commonly used to explain the relationship between dependent variables and independent variables by involving the link function 11,12,13 . ...
... Table 2 informs the loading values based on 17 items or indicators on latent infection disease variables. Acute Flaccid paralysis 0.7763 PY 8 Shigellosis 0.8475 PY 9 Amoebiasis 0.9576 PY 10 Dengue Fever 0.3169 PY 11 Malaria 0.9742 PY 12 Measles 0.9348 PY 13 Zika 0.9161 PY 14 Acute Heptitis A 0.8101 PY 15 Hantavirus Hemorrhagic 0.599 PY 16 Multidrug Resistant Tuberculosis 0.8783 PY 17 Chikungunya 0.9459 Table 3 shows that the three highest indicators based on loading values are malaria, amoebiasis and chikungunya. In other words, these three indicators have the closest closeness to the incidence of infectious disease. ...
Conference Paper
Full-text available
Global warming arising from climate change can increase the spread of deadly diseases. Effort is needed to develop a set of policies for the government to stem or reduce health risks from global warming. The purpose of this paper is to examine more detail and comprehensively about the relationship among climate and event disease count in Taiwan using the partial least square latent regression model. The results obtained that of the 17 types of diseases in Taiwan, that has the most significant loading factor is Amoebiasis, Malaria and Chikungunya. At the same time, climate variables that have the biggest most significant factor are Number day with max tem> 30, Number day Temp> 25, and Rainfall PH. Cronbach's Alpha infectious disease 0.9696 and climate 0.2813. At the same time, the value of Dillon Goldstein's rho infectious disease 0.974 and climate 0.6404, respectively.
... Confirmatory factor analysis is one of the multivariate analysis methods that can be used to test or confirm a hypothesized model [10]- [12] The hypothesized model consists of one or more latent variables, which are measured by one or more latent variables [6], [7]. The terminology of the confirmatory factor analysis can be explained in the equation below [13]- [17] ...
Article
Full-text available
Adolescents from families of 3 different SES groups (Low, Medium and High) have scored their Mother and Father using the format of the Bandung Family Relations Test (BFRT). The sample consisted of 349 pupils from primary and secondary schools in Bandung (Indonesia) selected by a stratified cluster design. In order to find out whether the model of the test was invariant across SES groups, a multi group confirmatory factor analysis, by means of Structural Equation Modelling (SEM) has been conducted. We found with regard to the scoring of the relationship with Mother that only one dimension (Affection) had a significant difference across SES in the comparison of Low and High SES groups. With regard to Father there were two Dimensions significantly different: Vulnerability in the comparison of Medium and High SES groups, while the Dimension of Justice gave very significant differences in all 4315 TESTING MEASUREMENT INVARIANCE ACROSS SES GROUPS three pairs. We finish with a description of the items involved in the significant different comparisons and the interpretation of the consequences of the different scoring of these items.
... Since the utilization of the inorganic element was unavoidably excessive especially in the agricultural regard 8 , the infiltration into water bodies is one of the deleterious consequence 9 one should take into account. Bioremediation is, therefore, seen to be the solution-with-wisdom for a fundamental reason that it harnesses living creatures 5 , mainly the microorganisms, to activate necessary metabolic networks in such a way to, eventually, degrade or reduce the impact or the presence of hazardous pollutants in certain environment systems 10,11 . This approach is environment-friendly thus sustainable for contamination cleanse 12 . ...
Article
Full-text available
Copper (Cu) has been used excessively for some valuable commodities, and this creates environmental problems. The inorganic element becomes toxic when present beyond the recommended tolerable concentration. Bacterial-based remediation is seen as an excellent tool to overcome this, as it reduces copper contamination without yielding any other form of contamination. Some pivotal properties of the bacteria, namely bioaccumulation and biosorption, make them candidates as bioremediation agents against copper contamination. In the present study, we ask whether these bacteria can be clustered into strong and representative groups according to their functional properties. Bacteria are mostly grouped based on their genetic profiles derived from 16S rRNA sequencing. We propose that our K-Means clustering model can be employed to identify genetically unlabelled bacteria. First, however, a prominent reference must be developed, and we are in this phase. We found that the K-Means clustering model does not pull same-genus bacteria into the same cluster. Instead, the model groups together isolates that are similar in a functional characteristic termed minimum inhibitory concentration (MIC), regardless of their origins and their position in the taxonomic hierarchy.
... Specifically, the number of cases of a given disease differs from one area to another, which produces zero counts that are difficult to model with parametric statistics because they cause dispersion 9. One technique for handling such count data is to use the Poisson or negative binomial distribution 10. Generalized linear models are commonly used to explain the relationship between dependent and independent variables through a link function 11,12,13. ...
Article
Full-text available
Global warming arising from climate change can increase the spread of deadly diseases. Effort is needed to develop a set of policies for the government to stem or reduce health risks from global warming. The purpose of this paper is to examine in more detail, and more comprehensively, the relationship between climate and disease event counts in Taiwan using the partial least squares latent regression model. The results show that, of the 17 types of disease in Taiwan, those with the most significant loading factors are Amoebiasis, Malaria and Chikungunya. The climate variables with the largest and most significant factors are Number day with max temp more than 30, Number day Temp more than 25, and Rainfall PH. Cronbach’s alpha is 0.9696 for infectious disease and 0.2813 for climate, while Dillon Goldstein’s rho is 0.974 for infectious disease and 0.6404 for climate.
... Outer model testing examines the relationship between each latent variable and its indicators (Caraka & Sugiarto, 2017), while inner model testing examines the relationships between latent variables (Caraka et al., 2020). One of the outer model tests checks the loading values to measure validity, as we can see in Table 1 below. ...
Article
The purpose of this paper is to investigate the impact of transformational leadership (TL) and psychological empowerment (PE) on the innovative work behavior (IWB) of frontline employees in the public sector in North Sumatera. This study also examines PE as a moderator of the relationship between TL and IWB. The data were collected from 786 frontline employees through an online survey. Partial least squares structural equation modeling with the bootstrap method was used for the data analysis. The results indicate that TL and PE have a positive influence on IWB; however, PE does not moderate the relationship between TL and IWB among frontline employees in North Sumatera.
Article
Full-text available
As the sharing economy has emerged, the way customers perceive service is shifting toward a combination of offline and online. Service providers need to understand the nature of this service as well as the pertinent aspects of its characteristics. Previous research validated the influence of perceived online and offline service quality on customer satisfaction and loyalty. However, given the distinctive dimensions of OFA service quality, its effects on customer satisfaction and the role of social innovativeness in the link between satisfaction and loyalty remain unexplored. Hence, this study investigates these relationships using data obtained from customers of any OFA in Malaysia. Purposive sampling was employed, and 227 collected responses were analyzed using variance-based partial least squares path modeling. The results confirm the direct effect of online and offline service quality on customer loyalty and the full mediating role of customer satisfaction. In addition, social innovativeness is found to negatively moderate the relationship between customer satisfaction and loyalty. Implications and contributions of the study are also discussed.
Article
Full-text available
Banteng, Bos javanicus, a species of wild cattle, is a vital and important source of germplasm in Indonesia. Various human activities currently threaten its conservation status. Nonetheless, no long-term monitoring programmes are in place for this species. Using distribution points and statistical analysis based on 46,116 camera trap days from December 2015 to January 2017, we aimed to provide habitat preferences, activity patterns and ecological data for the banteng population in Ujung Kulon National Park (UKNP). It is the largest population of banteng in Indonesia and lives in a limited habitat area. According to the best occupancy model, the most suitable areas for this species were the secondary forest located in the central portion of UKNP. The presence of the invasive cluster sugar palm, Arenga obtusifolia, in the dry season provides an alternative food source for banteng when its main food is scarcer in the forest. Banteng were cathemeral all year round, and neither the proportion of cathemeral records nor the recording rate changed with the protection level of the area, moon phase or season. To reduce the probability of encountering predators, banteng avoided the areas used by dholes. Selection and avoidance of habitats was stronger than avoidance of predator activity areas. Habitat competition from domestic cattle grazed illegally in the national park appears to be a problem for the species, since zoonosis is transmitted from domestic cattle to banteng. Therefore, effective law enforcement and an adequate conservation strategy are required to eliminate the impacts of both direct and indirect threats.
Article
Full-text available
Generalized linear latent variable models (GLLVM) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood based estimation, several closed form approximations for the marginal likelihood of GLLVMs have been proposed, but their efficient implementations have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estimation algorithms based on a combination of either the Laplace approximation method or variational approximation method, and automatic optimization techniques implemented in R software. An extensive set of simulation studies is used to assess the performances of different methods, from which it is shown that the variational approximation method used in conjunction with automatic optimization offers a powerful tool for estimation.
Article
Full-text available
Studies of soil community structure in monoculture plantation areas remain few and limited in scope, even though this community contributes greatly to the stability of an ecosystem. The aim of this study was to determine differences in the composition of macroarthropods associated with decaying oil palm biomass on three different soil types, based on feeding groups. The study used the line transect method with five sampling units (quadrats) on three different soil types in the oil palm ecosystem of Ladang Endau Rompin, Pahang, namely clay, shallow peat and deep peat. There were 15 sampling units in total, and sampling was carried out three times. A total of 942 individuals from 14 macroarthropod orders and 38 families were sampled at Ladang Endau Rompin. The study obtained four arthropod classes: Arachnida, Hexapoda, Malacostraca and Myriapoda. The numbers of arthropod individuals recorded by soil type were clay (432 individuals; 12 orders; 30 families), shallow peat (386 individuals; 14 orders; 30 families) and deep peat (133 individuals; 9 orders; 18 families). A significant difference (χ2 = 312.285, df = 74, p < 0.05) was identified between soil types and macroarthropod families. Through two-way cluster analysis by transect division, 11 clusters of soil macroarthropod communities were identified based on feeding groups. Omnivores (69%) were the largest feeding group, followed by predators (17.3%), scavengers (13.4%) and, lastly, phytophages (0.3%). This study succeeded in identifying the soil macroarthropod communities associated with decaying oil palm biomass on different soil types based on feeding groups, and the abundance of soil communities according to the position of the planting plots. Keywords: community ecology, invertebrates, agriculture, oil palm
Article
Full-text available
Termites are perceived as decomposers and as pests in an ecosystem. A study on the species composition of termites in different soil types (i.e. clay, sand and peat) in oil palm plantations was conducted between 6 April 2015 and 10 December 2015 in nine selected localities in Johor (Malaysia) and Riau (Indonesia). Termites were sampled using belt transects 100 m in length and 4 m in width in the oil palm plantations. Three replicates for each soil type were taken from the nine transects for each location. A total of 41 termite species from five subfamilies (i.e. Coptotermitinae, Rhinotermitinae, Termitinae, Macrotermitinae and Nasutitermitinae) and two families (i.e. Rhinotermitidae and Termitidae) were successfully sampled and recorded. Sand soil (81 colonies; 12 species; four subfamilies; two families) recorded the highest number of colonies, followed by peat soil (62 colonies; 12 species; five subfamilies; two families) and clay soil (47 colonies; nine species; four subfamilies; two families). There was a significant difference (χ2 = 618 886, df = 328, p < 0.005) between soil types and the termite species composition found in the oil palm plantations. This study identified that the diversity and abundance of termites differed between soil types in different oil palm plantations.
Conference Paper
Full-text available
Ecology is a branch of biology that studies the interactions and relationships between organisms and their environment. The abundance and distribution of organisms and patterns of biodiversity are of great interest to many ecologists. One interesting ecosystem to study is the cave. A cave has a distinctive environment with a vulnerable ecosystem. Many caves in Indonesia, particularly in the Gunungsewu karst area, have been developed into tourist attractions (show caves) and managed less than wisely. Such cave management has the potential to change the environment and leads to ecosystem destruction. Arthropods are the most abundant fauna in caves and play critical roles in maintaining the equilibrium of cave ecosystems. From the standpoint of statistical ecology, we need to analyze the differences in Arthropod communities and abiotic (climatic-edaphic) parameters between show caves and wild caves. Statistical techniques are needed to extract such information. GLLVM is one method that is able to explain spatial-based information and is particularly suitable for ecology. In this paper, we use negative binomial models to examine the differences in the spatial patterns of predator and decomposer Arthropods, as well as the edaphic and climatic characteristics of each cave.
Article
In many real-world applications, data are represented by high-dimensional features. Despite their simplicity, existing K-means subspace clustering algorithms often employ eigenvalue decomposition to generate an approximate solution, which makes the model less efficient. Besides, their loss functions are sensitive either to outliers or to small loss errors. In this paper, we propose a fast adaptive K-means (FAKM) type subspace clustering model, in which an adaptive loss function is designed to provide a flexible cluster indicator calculation mechanism, making it suitable for datasets under different distributions. To find the optimal feature subset, FAKM performs clustering and feature selection simultaneously without eigenvalue decomposition, and is therefore efficient for real-world applications. We exploit an efficient alternating optimization algorithm to solve the proposed model, together with theoretical analyses of its convergence and computational complexity. Finally, extensive experiments on several benchmark datasets demonstrate the advantages of FAKM compared to state-of-the-art clustering algorithms.
Article
1. There has been rapid development in tools for multivariate analysis based on fully specified statistical models or “joint models”. One approach attracting a lot of attention is generalized linear latent variable models (GLLVMs). However, software for fitting these models is typically slow and not practical for large datasets. 2. The R package gllvm offers relatively fast methods to fit GLLVMs via maximum likelihood, along with tools for model checking, visualization and inference. 3. The main advantage of the package over other implementations is speed, e.g. being two orders of magnitude faster, and capable of handling thousands of response variables. These advances come from using variational approximations to simplify the likelihood expression to be maximised, automatic differentiation software for model‐fitting (via the TMB package), and careful choice of initial values for parameters. 4. Examples are used to illustrate the main features and functionality of the package, such as constrained or unconstrained ordination, including functional traits in “fourth corner” models, and (if the number of environmental coefficients is not large) making inferences about environmental associations.
Article
Factor models, structural equation models (SEMs) and random-effect models share the common feature that they assume latent or unobserved random variables. Factor models and SEMs allow well-developed procedures for a rich class of covariance models with many parameters, while random-effect models allow well-developed procedures for non-normal models including heavy-tailed distributions for responses and random effects. In this paper, we show how these two developments can be combined to result in an extremely rich class of models, which can be beneficial to both areas. A new fitting procedure for binary factor models and a robust estimation approach for continuous factor models are proposed.
Book
Since their introduction, hierarchical generalized linear models (HGLMs) have proven useful in various fields by allowing random effects in regression models. Interest in the topic has grown, and various practical analytical tools have been developed. This book summarizes developments within the field and, using data examples, illustrates how to analyse various kinds of data using R. It provides a likelihood approach to advanced statistical modelling including generalized linear models with random effects, survival analysis and frailty models, multivariate HGLMs, factor and structural equation models, robust modelling of random effects, models including penalty and variable selection and hypothesis testing. This example-driven book is aimed primarily at researchers and graduate students, who wish to perform data modelling beyond the frequentist framework, and especially for those searching for a bridge between Bayesian and frequentist statistics.
Article
Poisson models are widely used for statistical inference on count data. However, zero-inflation or zero-deflation can occur together with either overdispersion or underdispersion. Currently, no available model for count data allows an excessive occurrence of zeros along with underdispersion in the non-zero counts, even though the need for such models has been reported. Furthermore, given an excessive zero rate, we need a model that allows a larger degree of overdispersion than existing models. In this paper, we use a random-effect model to produce a general statistical model that accommodates such phenomena occurring in real data analyses.