[SYLWAN., 164(1)]. ISI Indexed 161
Abstract—To account for correlated count data with excess zeros, we use a multivariate generalized linear latent variable model fitted by variational approximation. We performed two simulation-based analyses, one at the species level and one at the genus level, with Poisson and negative binomial distributions, leading to subject-specific interpretations. Methods: In this work, we use variational approximation to estimate the parameters of the multivariate generalized linear latent variable model. Such a model is needed because overdispersed count outcomes often exhibit many zeros, more than would be expected under sampling from a Poisson distribution. Results: In the simulation studies, species counts follow a negative binomial distribution and genus counts follow a Poisson distribution; the performance of the methods is evaluated by the Akaike information criterion (AIC), the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC). Conclusion: These two sets of latent class parameters might be meaningful for certain species counts and genus counts.
Index Terms—Variational Approximation,
Latent Variables, Niche Modelling, GLLVM,
Manuscript received October 11, 2019; revised January 15, 2020. This work is supported by MOST, Taiwan (ROC).
1 College of Informatics, Chaoyang University of Technology, Taiwan (ROC), 41349. Corresponding email:
2 Department of Statistics, College of Natural Sciences, Seoul National University, Shin Lim-Dong, Kwan Ak-Ku, South Korea.
3 Department of Statistics, College of Natural Sciences, Pukyong National University, 45, Busan, South Korea.
Ecology is defined as the study of the relationship of organisms, or groups of organisms, to their environment. Ecology can also be described as the science of the mutual relations between living organisms and their environment. Ecological processes that occur in nature can be represented by mathematical equations, which together build a model.
These equations can then be solved with the help of computational techniques, numerical methods, and statistical modelling. Ecology was initially a body of general knowledge and studied environmental relations only individually, based on physiology. At that time, scholars, especially in the natural sciences, paid less attention to general sciences, and more people directed the development of science toward specialization. Although ecology receives less attention than other sciences, especially economics and politics, it continues to grow, as shown by its spread into other fields such as botany and zoology.
Ecological research ranges from the adaptation of organisms to ecosystem dynamics, because there are many levels and types of interactions between individuals in dealing with challenges posed by their abiotic environment. In general, research on species counts produces zeros, because no individuals of a species are identified at some locations. Such data are difficult to analyse with methods that assume the data are nonzero (for example, models fitted to log-transformed counts), so methods based on the Poisson distribution, which assigns a positive probability to a count of zero, are beneficial in the analysis.
4 Department of Statistics, Padjadjaran University, Indonesia,
5 Bioinformatics and Data Science Research Center, Bina
Nusantara University, Jakarta 11480, Indonesia
6 Computer Science Department, BINUS Graduate Program
Master of Computer Science Bina Nusantara University,
Jakarta, Indonesia, 11480.
7 School of Environmental and Natural Resource Science,
Universiti Kebangsaan Malaysia, 43600.
Rezzy Eko Caraka1,2,4,5, Rung Ching Chen1†, Youngjo Lee2, Maengseok Noh3,
Toni Toharudin4, Bens Pardamean5,6, Andi Saputra7
Variational Approximation Multivariate Generalized
Linear Latent Variable Model in Diversity Termites
Riau and Peninsular Malaysia
Count data describe the number of events in a certain period and can only take non-negative integer values, because the number of events cannot be negative. Modelling count data with Ordinary Least Squares (OLS) regression violates its assumptions, such as normally distributed errors (normality) and constant variance, so OLS regression is not appropriate for count data.
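To make the constant-variance problem concrete, the sketch below simulates Poisson counts at three different means using only the Python standard library. The sampler (Knuth's multiplication method), the chosen means, the sample size, and the seed are illustrative assumptions, not part of the original analysis. The sample variance tracks the mean, so a single constant error variance, as OLS assumes, cannot describe all three groups.

```python
import math
import random


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


rng = random.Random(2020)
results = {}
for lam in (1.0, 5.0, 20.0):
    counts = [sample_poisson(rng, lam) for _ in range(20000)]
    results[lam] = mean_and_variance(counts)
    m, v = results[lam]
    # For Poisson data the variance grows with the mean instead of staying constant.
    print(f"true mean={lam:5.1f}  sample mean={m:6.2f}  sample variance={v:6.2f}")
```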
In its development, the modelling of count data led to Generalized Linear Models (GLMs). GLMs are generalizations of classical regression models, i.e. OLS regression (M Noh et al. 2019); (Kwon et al. 2016), and provide analytical methods for data that do not meet the assumption of a normal distribution (De Jong and Heller, 2008). One member of the GLM family, based on the Poisson distribution, is Poisson regression. The assumption that must be fulfilled in Poisson regression is that the variance of the response variable Y equals its mean (Myers et al. 2012). In Poisson regression analysis of discrete data, this assumption is often violated (Abraham et al. 2007): the variance may be smaller than the mean, which is called underdispersion, or larger than the mean, which is called overdispersion. Consul & Famoye (1992) stated that overdispersion is sometimes found in count data. Count data often contain large integer values as well as many zeros, so the variance can be quite large. If the assumption is violated, the standard errors of the estimates are underestimated, so the resulting conclusions can be invalid. One way to overcome overdispersion is to form models that combine the Poisson distribution with another distribution, either discrete or continuous (mixture distributions) (Kéry 2010). Of the possible Poisson mixtures, only a few are commonly used in research because of the complex calculations involved.
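As a hedged illustration of how mixing the Poisson with a continuous distribution produces overdispersion, the sketch below assumes a gamma-distributed Poisson mean (the classic gamma-Poisson mixture, which yields the negative binomial). The shape value k, the mean mu, the sample size, and the seed are arbitrary choices for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method for one Poisson(lam) draw."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def dispersion_index(values):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return v / m


rng = random.Random(7)
mu, k, n = 4.0, 2.0, 20000  # illustrative mean, gamma shape, and sample size

# Plain Poisson: variance equals the mean.
poisson_counts = [sample_poisson(rng, mu) for _ in range(n)]
# Gamma-Poisson mixture: lambda ~ Gamma(shape=k, scale=mu/k), so E[lambda] = mu,
# and Var(Y) = mu + mu**2 / k by the law of total variance.
mixture_counts = [sample_poisson(rng, rng.gammavariate(k, mu / k)) for _ in range(n)]

print("Poisson dispersion index:       ", round(dispersion_index(poisson_counts), 2))
print("Gamma-Poisson dispersion index: ", round(dispersion_index(mixture_counts), 2))
print("Theoretical gamma-Poisson index:", 1 + mu / k)  # 3.0
```

The mixture keeps the mean at mu but inflates the variance by the factor 1 + mu/k, which is exactly the kind of extra-Poisson variation described above.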
In ecological and species modelling (Warton, Foster, et al. 2015), the analysis becomes more complicated when it involves latent variables, i.e. unobserved variables, so the model needs to be extended to a multivariate GLM (Warton, Blanchet, et al. 2015). This type of statistical analysis handles data with many predictor variables (Warton 2015); (Joyner et al. 2019) and many response variables, and is therefore well suited to species modelling (Caraka et al. 2018); (Rahman et al. 2019); (Herliansyah & Fitia 2018). The main challenge in GLLVM modelling is that the estimation process involves integrals over random variables that have no explicit form unless the response variable is normally distributed, so a method for approximating these integrals is needed. Methods currently being developed include Laplace approximations (Bower & Savitsky 2008) and variational approximations (Hui et al. 2017). A further challenge is high-dimensional data (X. D. Wang et al. 2019): the number of response variables and random variables used leads to computational problems which are quite
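The integral problem can be seen in a one-dimensional toy case. The sketch below assumes a single Poisson response with a normal random intercept, y | u ~ Poisson(exp(beta + u)) and u ~ N(0, sigma²), and compares a Laplace approximation of the marginal likelihood with brute-force trapezoidal quadrature. The parameter values are hypothetical, and this is not the estimation code used in the paper.

```python
import math


def log_integrand(u, y, beta, sigma):
    """log[Poisson(y | exp(beta + u)) * N(u; 0, sigma^2)]."""
    eta = beta + u
    log_pois = y * eta - math.exp(eta) - math.lgamma(y + 1)
    log_norm = -0.5 * u * u / sigma**2 - 0.5 * math.log(2 * math.pi * sigma**2)
    return log_pois + log_norm


def laplace_marginal(y, beta, sigma, iters=50):
    """Newton's method finds the mode u_hat; then f(y) ~ exp(h(u_hat)) * sqrt(2*pi / -h''(u_hat))."""
    u = 0.0
    for _ in range(iters):
        grad = y - math.exp(beta + u) - u / sigma**2
        hess = -math.exp(beta + u) - 1.0 / sigma**2
        u -= grad / hess
    hess = -math.exp(beta + u) - 1.0 / sigma**2
    return math.exp(log_integrand(u, y, beta, sigma)) * math.sqrt(2 * math.pi / -hess)


def quadrature_marginal(y, beta, sigma, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal rule on a wide, fine grid (reference value)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        w = 0.5 if k in (0, steps) else 1.0
        total += w * math.exp(log_integrand(lo + k * h, y, beta, sigma))
    return total * h


y, beta, sigma = 3, 0.5, 0.8
approx = laplace_marginal(y, beta, sigma)
exact = quadrature_marginal(y, beta, sigma)
print(f"Laplace: {approx:.5f}  quadrature: {exact:.5f}  rel. error: {abs(approx - exact) / exact:.2%}")
```

Quadrature is feasible here only because the random effect is one-dimensional; with several latent variables per site the grid grows exponentially, which is why approximations such as Laplace or variational methods are used instead.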
Latent variable models are powerful probabilistic tools for extracting useful latent structure from otherwise unstructured data and have proved useful in numerous applications, especially in ecological modelling (Y. Wang et al. 2012). A particular case is the latent linear model, in which observations originate from a linear transformation of latent variables. Despite their modelling simplicity, latent linear models are useful and widely used instruments for data analysis in practice and include, among others, such notable examples as probabilistic principal component analysis, canonical correlation analysis, and independent component analysis. However, it is well known that estimation and inference are often intractable for many latent linear models, and one has to make use of approximate methods, often with no recovery
The remainder of the paper is organized as follows. Section II explains niche modelling and presents the multivariate latent GLM and the variational approximation, and Section III presents our results and discussion. Finally, conclusions and future research directions are indicated in Section IV.
Species distribution models are generally known as envelope modelling, habitat modelling, and niche modelling. The main objective is to estimate the similarity of conditions across all regions, using occurrence and predictor data as inputs to the models. Generally, species distribution modelling uses climate data as predictors (Kurniawan, Soesilohadi, et al. 2018); Kurniawan, Rahmadi, et al. (2018) outline the modelling process of species distribution. Also, when modelling species, the data tend to follow the Poisson distribution (Warton 2005). The Poisson distribution describes events with a small probability of occurrence, where the occurrence depends on a specific time interval or a particular area, with observations in the form of discrete variables. Experiments that follow the Poisson distribution have the following characteristics:
1. Events occur in large populations with small probabilities.
2. Events depend on specific time intervals.
3. Events are included in a counting process.
4. Repetition of events that follow the distribution of
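The probability function of the Poisson distribution can be stated as follows (Ha & Lee):
$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
where $\mu$ is the mean of the random variable Y with the Poisson distribution; the mean and the variance are both equal to $\mu > 0$. The link function used in the Poisson regression model is the natural logarithm, so $\ln \mu_i = \eta_i$. Thus Poisson regression can be stated as follows:
$$\ln \mu_i = \eta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \qquad \mu_i = \exp(\mathbf{x}_i^{\top}\boldsymbol{\beta}).$$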
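A minimal sketch of fitting the Poisson regression model above by Newton-Raphson, which for the canonical log link coincides with Fisher scoring (iteratively reweighted least squares). It uses one covariate and hypothetical data; the function name and the data are illustrative only.

```python
import math


def fit_poisson_regression(x, y, iters=25):
    """Newton-Raphson for ln(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X^T (y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information X^T diag(mu) X, a symmetric 2 x 2 matrix
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} g
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    return b0, b1


x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [1, 2, 2, 4, 5, 8, 11, 15]
b0, b1 = fit_poisson_regression(x, y)
print(f"ln(mu) = {b0:.3f} + {b1:.3f} x")
```

At convergence the score equations hold, so the fitted means reproduce the observed total count, a property of Poisson regression with an intercept and the log link.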
Poisson regression analysis is part of the Generalized Linear Model (GLM) framework. Poisson regression is used for data with response variables that follow the Poisson distribution (Y ~ Poisson). The important assumption in this analysis is that the variance must equal the mean, which is called equidispersion. In some studies, however, this condition is not met: count data are often found to have a variance higher than the mean, which is called overdispersion. Conversely, if the variance is smaller than the mean, this is called underdispersion.
According to Hinde & Demétrio (1998), there are several reasons why equidispersion may not hold in a model, including heterogeneity of the observations (differences between individuals that are not explained by the model) and correlations between individual responses. The consequence of not meeting equidispersion is that Poisson regression is not suitable for modelling the data, because the fitted model can give misleading parameter estimates. In addition, overdispersion results in standard errors that are smaller than they should be (underestimated), leading to inappropriate conclusions. Overdispersion can be checked using the deviance. The variance of the Poisson distribution equals its mean (σ² = µ). Overdispersion is indicated when the deviance divided by its degrees of freedom is greater than 1, whereas underdispersion is indicated when this ratio is less than 1.
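The deviance can be expressed as:
$$D = 2\sum_{i=1}^{n}\left[y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) - \left(y_i - \hat{\mu}_i\right)\right]$$
with the convention $y_i \ln(y_i/\hat{\mu}_i) = 0$ when $y_i = 0$, where
n = number of observations,
$y_i$ = the i-th response, with i = 1, 2, ..., n,
$\hat{\mu}_i$ = the mean of the response y, as influenced by the predictor values for the i-th observation.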
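The deviance and the deviance/df check described above can be computed directly. The sketch below uses a toy intercept-only example (p = 1 parameter) with hypothetical data; the function names are illustrative.

```python
import math


def poisson_deviance(y, mu):
    """D = 2 * sum[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def dispersion_ratio(y, mu, n_params):
    """Deviance divided by residual degrees of freedom (n - p); > 1 suggests overdispersion."""
    return poisson_deviance(y, mu) / (len(y) - n_params)


# Toy example: three observations with a common fitted mean of 2 (intercept-only model, p = 1).
y = [0, 2, 4]
mu = [2.0, 2.0, 2.0]
print(poisson_deviance(y, mu))     # 8 * ln 2, about 5.545
print(dispersion_ratio(y, mu, 1))  # 4 * ln 2, about 2.77 (> 1)
```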
In a GLM, the response y follows a distribution
from the exponential family of distributions
(including normal, binomial, Poisson, and
gamma), and its expectation is modelled as
E(y) = µ (4)
There is a link function g(·) connecting µ with Xβ
g(µ) = Xβ (5)
The variance of y is a function of µ. For a Poisson distribution, for instance, the variance of y equals the mean µ (Maengseok Noh & Lee 2007); (Lee & Noh 2012). This relationship between the mean and the variance depends directly on the assumed distribution of y (Lee & Nelder 2001). As summarized in Table 1, for all GLMs the variance of y is the product of a variance function V(µ) and a dispersion parameter φ, i.e. Var(y) = φV(µ) (del Castillo & Lee 2008). With m being the binomial denominator, we have the following variances and variance functions V(µ).
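Table 1 Variance
Distribution   Variance of y   V(µ)         φ
Normal         φ               1            estimated (residual variance)
Poisson        µ               µ            1
Binomial       µ(m − µ)/m      µ(1 − µ/m)   1
Gamma          φµ²             µ²           estimated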
For the Poisson and binomial distributions φ = 1, whereas for the normal and gamma distributions φ is a parameter to be estimated. For the normal distribution, φ is simply the residual variance.
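The relationship Var(y) = φV(µ) suggests a simple moment estimator for φ: the Pearson statistic divided by its residual degrees of freedom. The sketch below shows this one common choice as an assumption; other estimators, such as deviance-based ones, are also used. The family names, data, and fitted means are hypothetical.

```python
# Variance functions V(mu) for the families in Table 1 (binomial handled separately,
# because it also needs the binomial denominator m).
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
}


def binomial_variance_function(mu, m):
    """V(mu) = mu * (1 - mu / m) for a binomial count with denominator m."""
    return mu * (1.0 - mu / m)


def pearson_dispersion(y, mu, family, n_params):
    """Pearson estimate: phi_hat = sum((y - mu)**2 / V(mu)) / (n - p)."""
    v = VARIANCE_FUNCTIONS[family]
    chi2 = sum((yi - mi) ** 2 / v(mi) for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Normal model with two fitted group means (p = 2): phi_hat is the residual variance.
print(pearson_dispersion([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5], "normal", 2))  # 0.5
# Poisson model, intercept only (p = 1): a value well above 1 hints at overdispersion.
print(pearson_dispersion([0, 2, 4], [2.0, 2.0, 2.0], "poisson", 1))  # 2.0
```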
Generalized linear latent variable models (GLLVMs) are an extended version of GLMs that include latent variables (Rahman et al. 2019); (Niku, Hui, et al. 2019); (Niku, Brooks, et al. 2019); (Niku et al. 2017). Suppose $y_{ij}$ are the multivariate responses across species, with $i = 1, \ldots, n$ indexing the observational units and $j = 1, \ldots, m$ indexing the $m$ species. The expectation of $y_{ij}$, $\mu_{ij} = E(y_{ij} \mid \mathbf{u}_i)$, is modelled through the relationship
$$g(\mu_{ij}) = \eta_{ij},$$
with $\eta_{ij}$ being the linear predictor and $g(\cdot)$ a link function. The linear component of the predictor is similar to that of a GLM, with the inclusion of random effects:
$$\eta_{ij} = \alpha_i + \beta_{0j} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_j + \mathbf{u}_i^{\top}\boldsymbol{\theta}_j,$$
where $\alpha_i$ represents the row effect, $\beta_{0j}$ is the species-specific intercept, $\boldsymbol{\beta}_j$ contains the regression coefficients for the corresponding independent variables $\mathbf{x}_i$, and $\boldsymbol{\theta}_j$ contains the loading factors, quantities describing the interactions across species and connecting the unobserved latent variables $\mathbf{u}_i$ to the responses. In many papers, the latent variables $\mathbf{u}_i$ are assumed to follow a normal distribution with mean zero and constant variance.
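To see how shared latent variables induce correlation between species, the sketch below simulates a Poisson GLLVM with a single latent variable, no covariates, and no row effect, so that η_ij = β_0j + u_i θ_j with u_i ~ N(0, 1). All parameter values, the sample size, and the seed are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method for one Poisson(lam) draw."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def pearson_corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


def simulate_gllvm(n_sites, beta0, theta, seed=1):
    """Rows are sites, columns are species; one latent variable per site."""
    rng = random.Random(seed)
    y = []
    for _ in range(n_sites):
        u = rng.gauss(0.0, 1.0)  # latent variable u_i shared by all species at site i
        y.append([sample_poisson(rng, math.exp(b + u * t)) for b, t in zip(beta0, theta)])
    return y


beta0 = [1.5, 1.5, 1.5]
theta = [0.6, 0.5, -0.6]  # species 1 and 2 load in the same direction, species 3 opposite
y = simulate_gllvm(5000, beta0, theta)
cols = list(zip(*y))
print("corr(species 1, species 2):", round(pearson_corr(cols[0], cols[1]), 2))  # positive
print("corr(species 1, species 3):", round(pearson_corr(cols[0], cols[2]), 2))  # negative
```

The sign of each correlation follows the signs of the loadings, which is how the loading factors describe the interactions across species.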
At the heart of parameter estimation, the marginal likelihood of the data can be approximated by a lower bound obtained through variational approximation (VA). VA is generally computationally efficient, in exchange for some bias. If the random vector follows an exponential family, the joint distribution can be written as
$$p(\mathbf{x}, \mathbf{y}; \boldsymbol{\theta}) = \exp\{\langle \boldsymbol{\theta}, (\mathbf{x}, \mathbf{y}) \rangle - A(\boldsymbol{\theta})\},$$
where $X$ is the latent variable and $Y$ the observed variables, $\langle \cdot, \cdot \rangle$ denotes an inner product, $\boldsymbol{\theta}$ is a vector of parameters, and $(\mathbf{x}, \mathbf{y})$ are the realised values of the variables. $A(\boldsymbol{\theta})$ is the log-partition function, which ensures that the distribution is normalised. The parameters are estimated using the method of moments; this formulation is a generalization of the Gaussian-Poisson model.
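As a concrete instance of moment matching in a Gaussian-Poisson setting, the sketch below assumes the Poisson-lognormal form Y | Z ~ Poisson(exp(Z)) with Z ~ N(m, s²), whose first two moments have closed forms that can be inverted. This is an illustrative special case, not necessarily the exact estimator used here.

```python
import math


def poisson_lognormal_moments(m, s2):
    """E[Y] = exp(m + s2/2); Var(Y) = E[Y] + E[Y]**2 * (exp(s2) - 1)."""
    mean = math.exp(m + s2 / 2)
    var = mean + mean ** 2 * (math.exp(s2) - 1)
    return mean, var


def method_of_moments(sample_mean, sample_var):
    """Invert the moment equations; requires sample_var > sample_mean (overdispersion)."""
    if sample_var <= sample_mean:
        raise ValueError("Poisson-lognormal moments need variance > mean")
    s2 = math.log(1 + (sample_var - sample_mean) / sample_mean ** 2)
    m = math.log(sample_mean) - s2 / 2
    return m, s2


mean, var = poisson_lognormal_moments(1.0, 0.25)
print(method_of_moments(mean, var))  # recovers (1.0, 0.25) up to rounding
```

In practice sample_mean and sample_var would be sample moments of the observed counts; the round trip above only checks that the inversion is algebraically correct.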
It is clear that the under-determination of the estimating equations (2) is a direct result of reducing the dimensionality of X via the clustering function k. By projecting the data for each cluster onto a single dimension of an auxiliary data object, the latent process could be fitted to the auxiliary data without encountering this under-determination.
Where are some functions and
Are analogous to
In terms of both interpretability and computational convenience, restricting this function to be a linear function of its arguments can be easily justified, and this approach is taken here. Several choices of function are available in this regard, including cluster
Then we can rewrite this in terms of random effects:
And a representative dimension projection
One dimension in each random effect (cluster) is chosen as its representative: the dimension whose average data over replicates is closest in norm to the average data over replicates of the whole cluster. If more than one dimension minimizes the norm, one of them can be chosen arbitrarily. Once a representative dimension has been chosen for each cluster, sample moments from the data can be used to approximate the expected values and estimate the parameters of the latent process.
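The representative-dimension step can be sketched as follows. Assumptions: each dimension is summarised by a scalar average over replicates (so the norm reduces to an absolute difference), the target is the average over all dimensions in the cluster, and ties are broken by taking the first such dimension. The data layout and function names are hypothetical.

```python
def replicate_means(data):
    """data[r][d] is replicate r of dimension d; return the per-dimension average."""
    n_rep = len(data)
    n_dim = len(data[0])
    return [sum(row[d] for row in data) / n_rep for d in range(n_dim)]


def representative_dimensions(data, clusters):
    """For each cluster, pick the dimension whose average is closest to the cluster average."""
    means = replicate_means(data)
    reps = {}
    for label, dims in clusters.items():
        cluster_mean = sum(means[d] for d in dims) / len(dims)
        # min() returns the first minimiser, one arbitrary but valid way to break ties.
        reps[label] = min(dims, key=lambda d: abs(means[d] - cluster_mean))
    return reps


def sample_moments(data, dim):
    """Sample mean and unbiased variance of one dimension across replicates."""
    values = [row[dim] for row in data]
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


data = [
    [2, 5, 9, 0, 1],
    [4, 6, 7, 1, 3],
    [3, 4, 8, 2, 2],
]
clusters = {"A": [0, 1, 2], "B": [3, 4]}
reps = representative_dimensions(data, clusters)
print(reps)                             # {'A': 1, 'B': 3}
print(sample_moments(data, reps["A"]))  # (5.0, 1.0)
```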
III. SPECIES COUNTS
Indonesia is a tropical country rich in plant diversity, which strongly supports termites. About 80% of Indonesia's land area is a suitable habitat for their development. Termites belong to the order Blattodea; the family Termitidae consists of 2000 species spread throughout the world. Termite species diversity on the island of Sumatra and in Peninsular Malaysia has not been fully inventoried. In the 1990s, an inventory was carried out by several researchers. Nandika et al. (2003) examined the types of termites and their spread in the DKI Jakarta and Bandung regions. The study found nine species of termites, namely Microtermes insperatus, M. incertoides, Macrotermes gilvus, Odontotermes javanicus, O. malaccensis, Schedorhinotermes javanicus, Coptotermes curvignathus, C. haviliandi, C. kalshoveni, C. heimi, and C. travians.
Termites also attack cocoa in the nursery phase, which is particularly hazardous. Keng (2006) studied termites in cocoa plantations compared with primary forests in Bukit Tawau Park, Sabah. These studies found that termites also attack cocoa plantations, even though the level of diversity is lower than in primary forests. Termites are more abundant in cocoa plantations, so the attacks become severe; this is caused by plentiful food sources. Termites are more easily identified from the family level down to the species level. Termites in tropical regions such as Indonesia have been widely studied, and several families occur there, namely Kalotermitidae, Rhinotermitidae, and Termitidae. The Kalotermitidae family is a group of termites that attack and nest in living trees or in dry wood that is not connected to the soil.
This group of termites is commonly called drywood termites. Drywood termite species that usually attack settlements include Cryptotermes cynocephalus; these termites attack settlements whose buildings are made of wood (Bong et al. 2012). The second family is Rhinotermitidae. The distinctive feature of this family is the presence of sclerite pieces on the flat thorax. This family often nests in wood or other cellulose-containing materials found on the soil surface. Termites from this family that often attack settlements and plantations belong to the genera Schedorhinotermes and Coptotermes. A characteristic of the genus Coptotermes is that when the colony is disturbed, the soldiers release a milky liquid whose function is to paralyze the enemy (Saputra et al. 2017). This fluid comes out of the fontanel at the front of the head. The third family is Termitidae, the group of termites with the most species. Characteristics of the family Termitidae are the presence of saddle-shaped sclerite pieces on the pronotum; the center of the nest is in the ground, where they build mushroom-shaped, sponge-like galleries, and they usually build soil mounds. Examples of genera known to attack plantations are Macrotermes, Odontotermes, and Microtermes. Termites and ants are both groups of social insects. Both have almost the
People often call termites white ants, even though termites and ants are not closely related. Taxonomically, termites belong to the order Blattodea (Inward et al. 2007), while ants belong to the order Hymenoptera. In addition, in ant morphology there is a constriction between the thorax and the abdomen, which is characteristic of the wasp group (Hymenoptera: Apocrita). This constriction is called the petiole, whereas termites have no constriction, and there is often no clear boundary between the thorax and the abdomen. Generally, termites feed in enclosed (cryptic) areas, characterized by foraging tubes that come from the ground and form channels connecting one foraging tube to another, whereas ants tend to forage in open areas. Termite food consists of lignocellulose, composed of cellulose, lignin, and hemicellulose. Cellulose is a fibre-rich glucose polymer, while ants eat organic material that contains sugars composed of polysaccharides.
Social insects divide tasks within a colony into what are commonly referred to as castes. Termites have a caste system consisting of reproductives, soldiers, and workers. The reproductive caste, consisting of the king and queen, is responsible for mating and laying eggs. The reproductive caste is divided into two types, primary and neotenic (secondary). Primary reproductives originate from winged termites (alates, or larons), whereas neotenic reproductives appear when the queen or king dies or disappears from the colony (Nandika et al. 2003). The soldier caste does not reproduce, has reduced eyes, and its only job is to defend the colony against attack by enemies. This caste is characterized by forward-pointing mandibles, which are usually used to attack enemies that attack the colony. The termite group Nasutitermitinae is unique compared with other termite groups: its soldier caste has mandibles that are not well developed, but a more developed head fontanel (forehead). Workers can digest wood that contains a lot of cellulose. Microbes in the termite's body aid the digestive process by releasing cellulolytic enzymes that facilitate the degradation of cellulose from wood.
According to Klepzig et al. (2009), the relationships between insects and symbiotic microorganisms are diverse, with many variations such as mutualism, commensalism, or parasitism. Termites are one example of a symbiotic mutualism between an insect and its symbiotic organisms. Termites have specific relationships with their symbiotic microorganisms, such as bacteria, fungi, and protozoa, which provide benefits both directly and indirectly. The symbiotic relationships in termites can help in food digestion, nutrient absorption, and in protecting against
According to Sarkar (1998), symbiosis is a shared life between different organisms. Termites comprise a diverse collection of species, broadly divided into two groups: lower termites and higher termites. Lower termites are symbiotic with a large proportion of prokaryotic and protist (single-celled eukaryote) populations. Higher termites consist only of the family Termitidae, but its species make up more than three-quarters of all termite species, and they are symbiotic mostly with groups of bacteria. The association of cellulolytic protists with digestion in lower termites is a well-known example of mutualistic symbiosis. The protists produce acetate from wood particles taken up by endocytosis, and the acetate is absorbed by the termites as a source of energy and carbon.
This research was conducted in Indonesia and Malaysia at 11 sampling locations, all in oil palm plantations located in Riau (Indonesia) and Johor and Pahang (Peninsular Malaysia). Six different locations were sampled in Riau and five in Johor and Pahang, each sampling location representing different types of land and farm
Fig. 1 Sampling locations range from the Riau palm oil ecosystem (Indonesia) to Johor-Pahang,
(Peninsular Malaysia) (Saputra et al. 2016).
Table 2 Location of Termite Sampling in Indonesia to Peninsular Malaysia and Land Type
Private & Clay
Indonesia Sako Pangean
Indonesia Redang Seko
Private & Clay
Indonesia Sungai Pagar
Private & Peat
Private & Peat
Company & Peat
Malaysia Bukit Pasir
Private & Clay
Malaysia Felda Kahang
Company & Clay
Malaysia Felda Nitar
Company & Clay
Malaysia Ladang Endau
Company & Clay
Malaysia Ladang Sungai
3.1 Counts Based on Species
This simulation uses a non-public dataset from previous research on termites in Riau and Peninsular Malaysia (Saputra et al. 2016); (Nur-Atiqah et al. 2017); (Halim et al. 2018); (Saputra et al. 2018); (Zaki et al. 2019). In the first stage, the modelling considers only the groups of species, as shown in Figure 2 below. Coptotermitinae was not found in IdSgPK, MyBPM, MyFKT, and MyFNT but was found in IdFRGB and IdCPSK; Rhinotermitinae was not found in IdBCB and MyBPM but was found in IdCPSK. Macrotermitinae was not found in IdFRGB, IdCPSK, and MyLER, and Nasutitermitinae was not found in IdRS, IdSgPK, and MyBPM. In contrast, Termitinae was found at all research sites (Saputra et al. 2016). A total of 522 termites were successfully sampled in oil palm plantations from Belilas (Riau Province, Indonesia) to Endau (Johor-Pahang, Peninsular Malaysia). Out of the total number of individuals, this study recorded five subfamilies from two families: the subfamilies Coptotermitinae and Rhinotermitinae of the family Rhinotermitidae, and the subfamilies Termitinae, Macrotermitinae, and Nasutitermitinae of the family Termitidae.
The highest abundance of termites was recorded from the family Termitidae (43 species; 349 populations), from three subfamilies, namely Termitinae (17 species; 165 populations), Macrotermitinae (12 species; 65 populations), and Nasutitermitinae (14 species; 119 populations). For Rhinotermitidae, 15 species (173 populations) were recorded from two subfamilies: Coptotermitinae (5 species; 60 populations) and Rhinotermitinae (10 species; 113 populations).
Based on the number of species, MyLSK recorded the highest number (26 species), followed by IdCPSK (21 species), MyFKT and MyLER (15 species each), MyFNT (12 species), IdSPTK (11 species), MyBPM (8 species), IdSgPK and IdFRGB (7 species each), and IdRS (5 species); the smallest was IdBCB, where only three species were found.
Fig. 2 Species Counts in Riau and Peninsular Malaysia
Comparing species richness and diversity with temperature and humidity, abundance tends to increase with increasing humidity. Figure 3 shows that abundance decreases immediately after temperature and humidity decrease. However, temperature does not show a marked increase or decrease across the sampling locations. A different result was found in the joint pattern of temperature and humidity: where temperature increased, humidity decreased, but this did not affect the abundance of termites. The richness of the termite species is found to show similarities in
The structure of the termite community may be influenced by environmental factors. In the pattern of temperature and humidity, as temperature increases, humidity decreases, but this does not affect termite abundance or species richness. Nevertheless, temperature and humidity are the primary physical factors affecting the termite pattern. These differing results may arise because environmental data need to be collected more thoroughly and carefully to avoid misinterpreting them.
Fig. 3. Heatmap Diversity, Richness and Climatic
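Next, we fit GLLVMs to examine the distribution at the species level. Model selection is based on AIC: a good model is the one with the minimum AIC among all candidate models (Joyner et al. 2019). On this basis, the model with one latent variable (LV = 1) is chosen. Since the best distribution is the negative binomial, we can write equation (6): let $y_{ij}$ follow the negative binomial distribution with mean $\mu_{ij}$ and variance $\mu_{ij} + \phi_j\mu_{ij}^2$, using the log link function:
$$y_{ij} \mid \mathbf{u}_i \sim \mathrm{NB}(\mu_{ij}, \phi_j), \qquad \operatorname{Var}(y_{ij} \mid \mathbf{u}_i) = \mu_{ij} + \phi_j\mu_{ij}^2, \qquad \log(\mu_{ij}) = \eta_{ij} \qquad (6)$$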
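For reference, the three criteria used in this paper can be computed from a fitted model's maximized log-likelihood ℓ, its number of parameters k, and the number of observations n, using the textbook definitions AIC = 2k − 2ℓ, AICc = AIC + 2k(k + 1)/(n − k − 1), and BIC = k ln n − 2ℓ. The sketch below assumes these definitions; software packages may count k or n differently. The numbers in the example are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) from a maximized log-likelihood, k parameters, n observations."""
    if n <= k + 1:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * loglik
    return aic, aicc, bic


# Hypothetical fitted model: log-likelihood -100 with 3 parameters and 20 observations.
aic, aicc, bic = information_criteria(-100.0, 3, 20)
print(aic, aicc, round(bic, 3))  # 206.0 207.5 208.987
```

Under these definitions the AICc correction term is positive whenever n > k + 1, so in that case AICc is never smaller than AIC.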
Table 3. Selected distribution based on species counts
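We have the same relationship between $\mu_{ij}$ and $\eta_{ij}$ as in the Poisson model. The conditional distribution of $y_{ij}$ given $\mathbf{u}_i$ is given by (Caraka et al. 2018):
$$f(y_{ij} \mid \mathbf{u}_i) = \frac{\Gamma(y_{ij} + \phi_j^{-1})}{\Gamma(\phi_j^{-1})\, y_{ij}!}\left(\frac{\phi_j\mu_{ij}}{1 + \phi_j\mu_{ij}}\right)^{y_{ij}}\left(\frac{1}{1 + \phi_j\mu_{ij}}\right)^{\phi_j^{-1}}$$
The log-likelihood function for the negative binomial response can be written as:
$$\ell = \sum_{i=1}^{n}\log\int\prod_{j=1}^{m} f(y_{ij} \mid \mathbf{u}_i)\, h(\mathbf{u}_i)\, d\mathbf{u}_i,$$
where $h(\mathbf{u}_i)$ is the normal density of the latent variables and
$$\log f(y_{ij} \mid \mathbf{u}_i) = \log\Gamma(y_{ij} + \phi_j^{-1}) - \log\Gamma(\phi_j^{-1}) - \log y_{ij}! + y_{ij}\log\frac{\phi_j\mu_{ij}}{1 + \phi_j\mu_{ij}} - \phi_j^{-1}\log(1 + \phi_j\mu_{ij}).$$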
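The conditional log-density translates directly into code. The sketch below assumes the parameterisation Var(y) = µ + φµ² and checks it numerically: the probabilities sum to one, the mean is µ, and the variance is µ + φµ². As φ → 0 the distribution approaches the Poisson. The values of µ and φ are arbitrary.

```python
import math


def nb_logpmf(y, mu, phi):
    """log f(y; mu, phi) for the negative binomial with Var(y) = mu + phi * mu**2 (phi > 0)."""
    r = 1.0 / phi
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + y * (math.log(phi * mu) - math.log1p(phi * mu))
            - r * math.log1p(phi * mu))


def poisson_logpmf(y, mu):
    """log f(y; mu) for the Poisson distribution."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


mu, phi = 3.0, 0.5
probs = [math.exp(nb_logpmf(y, mu, phi)) for y in range(400)]  # tail beyond 400 is negligible
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(round(total, 6), round(mean, 6), round(var, 6))  # 1.0 3.0 7.5

# As phi -> 0 the negative binomial approaches the Poisson.
print(nb_logpmf(2, 3.0, 1e-8), poisson_logpmf(2, 3.0))
```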
We obtain the intercept values in Table 3 and visualise the species ordination in Figure 4.
Table 4. Parameter GLLVM Based On Species Counts
Fig. 4 Ordination Species Based on Negative Binomial GLLVM
Figure 5 shows that, for the species counts, Rhinotermitinae has a perfect negative correlation (−1) with Termitinae and Macrotermitinae, and there are perfect positive correlations (+1), such as Termitinae against
Fig. 5 Species Correlation
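One possible reading of the perfect ±1 correlations, offered as a hedged sketch: if the correlation between species is taken on the linear-predictor scale as the normalised cross-product of their loadings, R_jk = θ_jᵀθ_k / (‖θ_j‖ ‖θ_k‖), then with a single latent variable every R_jk is exactly +1 or −1, depending only on the signs of the loadings. The loading values below are hypothetical, and any family-specific extra variance terms are ignored.

```python
import math


def latent_correlation(loadings):
    """R_jk = (theta_j . theta_k) / (|theta_j| |theta_k|) from a species-by-LV loading matrix."""
    m = len(loadings)
    cov = [[sum(a * b for a, b in zip(loadings[j], loadings[k])) for k in range(m)] for j in range(m)]
    return [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(m)] for j in range(m)]


# One latent variable: every off-diagonal correlation is +1 or -1.
r1 = latent_correlation([[0.8], [-0.5], [1.2]])
# Two latent variables: correlations can take intermediate values.
r2 = latent_correlation([[0.8, 0.1], [-0.5, 0.4], [1.2, -0.3]])
print([[round(v, 3) for v in row] for row in r1])
print([[round(v, 3) for v in row] for row in r2])
```

Under this reading, correlations strictly between −1 and 1 require at least two latent variables.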
3.2 Counts Based on Genus
After obtaining the species-level results, we ran the same model at the genus level, again comparing the negative binomial and Poisson distributions. Table 4 shows that the Poisson model fits better than the other models, with AIC = 1080.345. After modelling with the Poisson distribution and two latent variables (LV = 2), we obtain the genus-level parameters in Table 5. The intercept values are visualized in Figure 6 as an ordination of genera.
Table 4. Selected Distribution Based on Genus Counts
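The genus-level ordination, correlations, and Q graph are shown in Figure 6, Figure 7, and Figure 8, respectively. To estimate the parameters, we form a variational lower bound on the marginal log-likelihood of each site. For any density $\tilde{q}(\mathbf{u}_i)$, Jensen's inequality gives
$$\log f(\mathbf{y}_i) = \log\int f(\mathbf{y}_i \mid \mathbf{u}_i)\, h(\mathbf{u}_i)\, d\mathbf{u}_i \ \ge\ E_{\tilde{q}}\{\log f(\mathbf{y}_i \mid \mathbf{u}_i)\} + E_{\tilde{q}}\{\log h(\mathbf{u}_i)\} - E_{\tilde{q}}\{\log \tilde{q}(\mathbf{u}_i)\}.$$
The variational approximation then relies on maximizing this lower bound over a tractable family of densities $\tilde{q}$, applying the inequality term by term as in equation (25). In our variational approximation, we choose $\tilde{q}(\mathbf{u}_i)$ to be a product of q-dimensional multivariate normal distributions with diagonal covariance, $\tilde{q}(\mathbf{u}_i) = N(\mathbf{a}_i, \mathbf{A}_i)$. In the Poisson case, the variational expectation of the non-linear part then has a closed form:
$$E_{\tilde{q}}\{\exp(\eta_{ij})\} = \exp\left(\alpha_i + \beta_{0j} + \mathbf{x}_i^{\top}\boldsymbol{\beta}_j + \mathbf{a}_i^{\top}\boldsymbol{\theta}_j + \tfrac{1}{2}\boldsymbol{\theta}_j^{\top}\mathbf{A}_i\boldsymbol{\theta}_j\right).$$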
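The Poisson closed form makes the lower bound cheap to evaluate. The sketch below assumes one site, one latent variable, no covariates or row effect, and hypothetical parameters; it maximises the bound over the variational mean a and variance A by a crude grid search (a real implementation would use gradient-based optimisation) and checks that the bound stays below the exact log marginal likelihood computed by quadrature.

```python
import math


def elbo(y, beta0, theta, a, A):
    """Lower bound for q(u) = N(a, A), prior u ~ N(0, 1), y_j | u ~ Poisson(exp(beta0_j + theta_j * u))."""
    total = 0.0
    for yj, bj, tj in zip(y, beta0, theta):
        # E_q[exp(eta)] = exp(beta0 + theta*a + theta^2 * A / 2)
        total += yj * (bj + tj * a) - math.exp(bj + tj * a + 0.5 * tj * tj * A) - math.lgamma(yj + 1)
    total += 0.5 * (math.log(A) - A - a * a + 1.0)  # equals -KL(q || N(0, 1))
    return total


def exact_log_marginal(y, beta0, theta, lo=-10.0, hi=10.0, steps=20000):
    """log of the integral of prod_j Poisson(y_j | u) * N(u; 0, 1) du by the trapezoidal rule."""
    h = (hi - lo) / steps
    s = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        logf = -0.5 * u * u - 0.5 * math.log(2 * math.pi)
        for yj, bj, tj in zip(y, beta0, theta):
            eta = bj + tj * u
            logf += yj * eta - math.exp(eta) - math.lgamma(yj + 1)
        s += w * math.exp(logf)
    return math.log(s * h)


def maximise_elbo(y, beta0, theta):
    """Crude grid search over the mean a and the log-variance of q (enough for a sketch)."""
    best, best_params = -math.inf, None
    for i in range(121):
        a = -3.0 + 0.05 * i
        for j in range(61):
            A = math.exp(-5.0 + 0.1 * j)
            val = elbo(y, beta0, theta, a, A)
            if val > best:
                best, best_params = val, (a, A)
    return best, best_params


y_site = [4, 1, 6]
beta0 = [1.0, 0.5, 1.2]
theta = [0.7, -0.4, 0.8]
lower, params = maximise_elbo(y_site, beta0, theta)
exact = exact_log_marginal(y_site, beta0, theta)
print(f"best ELBO = {lower:.4f} at a = {params[0]:.2f}, A = {params[1]:.3f}")
print(f"exact log marginal = {exact:.4f}")
```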
The choice of a good starting value is crucial in iterative procedures, as it helps the algorithm start in the basin of attraction of a good local maximum and can substantially speed up convergence. Here we initialize by fitting a Poisson GLLVM to Y, then extracting the regression coefficients and the variance-covariance matrix of the Pearson residuals. The extracted regression coefficients are used as starting values, and the starting loadings are taken from the best rank-q approximation of this variance-covariance matrix, given by keeping the first q dimensions of its singular value decomposition.
In general, the IdBCB region differs markedly from the other areas, as do IdFRGB and MyLER. However, MyFNT and IdRS have the same kinship for the diversity value of
Fig. 6 Ordination Genus Based on Poisson- GLLVM
Fig. 7 Correlation Based on Genus
Fig. 8 Q Graph Based on Genus
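The initialization step can be sketched without external libraries. The sketch assumes the residual variance-covariance matrix is symmetric positive semi-definite, in which case its singular value decomposition coincides with its eigendecomposition, so the best rank-q approximation (Eckart-Young) keeps the q largest eigenpairs; here they are found by power iteration with deflation. Power iteration converges slowly when leading eigenvalues are close, so this is a sketch rather than production code, and the example matrix is hypothetical.

```python
import math


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def top_eigenpairs(S, q, iters=1000):
    """q leading eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(S)
    M = [row[:] for row in S]
    pairs = []
    for _ in range(q):
        v = [1.0 + i for i in range(n)]  # deterministic start; must not be orthogonal to the target
        for _ in range(iters):
            w = matvec(M, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is zero: nothing left to extract
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(M, v)))
        pairs.append((lam, v))
        # Deflation: remove the found component before searching for the next one.
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def rank_q_approximation(S, q):
    """Sum of lam * v v^T over the q leading eigenpairs."""
    n = len(S)
    approx = [[0.0] * n for _ in range(n)]
    for lam, v in top_eigenpairs(S, q):
        for i in range(n):
            for j in range(n):
                approx[i][j] += lam * v[i] * v[j]
    return approx


S = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 0.0]]
print(rank_q_approximation(S, 1))  # close to [[1.5, 1.5, 0], [1.5, 1.5, 0], [0, 0, 0]]
```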
Table 5. Parameter Based on Genus GLLVM
Parrhinotermes spp. A
Procapritermes spp. G
For Termitinae, the species found are Prohamitermes mirabilis, Microcerotermes dubius, M. havilandi, Termes rostratus, Procapritermes sp. G, Pericapritermes buitenzorgi, and P. mohri. They are distributed as follows: P. mirabilis is found in IdSgPK; M. dubius in IdCPSK; M. havilandi in IdFRGB; T. rostratus in IdRS, IdBCB, IdSgPK, and IdCPSK; Procapritermes sp. G in IdSgPK; P. buitenzorgi in IdSgPK; and P. mohri in IdSPTK and IdSgPK.
Figure 9 shows the diversity at the genus level. Many of the density values are 0, which means that some species have no matching count at a particular
Fig. 9 Heatmap Based on Genus
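A quick way to quantify the excess zeros seen in the heatmap is to compare the observed fraction of zeros with the fraction implied by a fitted mean: exp(−ȳ) under the Poisson, and (1 + φȳ)^(−1/φ) under the negative binomial with Var = µ + φµ². The counts and φ below are hypothetical, and a single common mean is assumed for simplicity.

```python
import math


def zero_check(counts, phi=None):
    """Observed zero fraction versus Poisson (and optionally negative binomial) expectations."""
    n = len(counts)
    mean = sum(counts) / n
    result = {
        "observed": sum(1 for c in counts if c == 0) / n,
        "poisson": math.exp(-mean),  # P(Y = 0) under Poisson(mean)
    }
    if phi is not None:
        result["negbin"] = (1 + phi * mean) ** (-1 / phi)  # P(Y = 0) under NB(mean, phi)
    return result


counts = [0, 0, 0, 0, 5, 6, 7, 0, 0, 8]
print(zero_check(counts, phi=1.0))
```

Here 60% of the counts are zero, far above the roughly 7% a Poisson with the same mean would produce; the negative binomial allows more zeros, but a gap this large is the situation in which zero-inflated or hurdle models are usually considered.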
Species from the subfamily Termitinae show wide variation across locations, as visualized in Figure 10. P. mirabilis, M. dubius, and M. serrula are arboreal wood-feeding species, while T. rostratus and P. sp. G are wood-feeding or intermediate-feeding species. T. rostratus is an inquiline, meaning that it lives in the nests of other species, whereas P. sp. G builds a hypogeal (underground) nest. P. buitenzorgi and P. mohri are termites of organic soil and are
Fig. 10 Heatmap Based on Family in Each Location
IV. CONCLUSION AND FUTURE WORK
In this study, we simulated the diversity of termite species by applying multivariate latent generalized linear models, or GLLVM (generalized linear latent variable models). To estimate the GLLVM parameters, we employed the variational approximation and evaluated the fitted models with AIC, AICc, and BIC. In the simulations, the species level was best modelled by a negative binomial distribution with level 1, with AIC: 373.9463, AICc: 277.9463, and BIC: 379.9147. At the genus level, by contrast, the best distribution is Poisson, with AIC: 1080.345, AICc: 889.3317, and BIC: 1113.371. Overall, GLLVM represents the diversity and richness of termites in Riau and Peninsular Malaysia well. In future research, we will compare the variational approximation with the Laplace approximation in terms of computational time. We will also try other distributions: zero-inflated Poisson (Loeys et al. 2012), zero-inflated negative binomial (Hall 2000), beta-binomial (Kim & Lee 2019), Tweedie (Shono 2008), extended Tweedie (Bonat et al. 2018), hurdle (Zeileis et al. 2008), and extended hurdle negative binomial (Lee et al. 2017; Noh & Lee 2019).
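The three criteria used above have standard textbook definitions. For a model with maximized log-likelihood log L, k free parameters, and n observations, AIC = 2k - 2 log L, AICc = AIC + 2k(k + 1)/(n - k - 1), and BIC = k log(n) - 2 log L. The Python sketch below applies them to a deliberately simplified case: one column of counts fitted by an intercept-only Poisson model and an intercept-only negative binomial (NB2) model. It uses no latent variables and no variational approximation, so it illustrates only the Poisson-versus-negative-binomial selection step, not the GLLVM fit itself. The helper names (`fit_nb_theta`, `compare_poisson_nb`) and the toy counts are hypothetical. Under these textbook definitions AICc exceeds AIC whenever n > k + 1. Software for latent variable models may count n and k differently, so the sketch is not meant to reproduce the values reported above.

```python
import math


def poisson_loglik(counts, mu):
    # Log-likelihood of i.i.d. counts under Poisson(mu).
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def nb_loglik(counts, mu, theta):
    # NB2 parameterisation: mean mu, variance mu + mu**2 / theta.
    return sum(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
        for y in counts
    )


def fit_nb_theta(counts, mu, lo=-6.0, hi=8.0, iters=100):
    # Golden-section search for the theta that maximises the NB
    # log-likelihood, carried out on log(theta) within [lo, hi].
    def f(s):
        return nb_loglik(counts, mu, math.exp(s))

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) > f(d):        # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                  # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return math.exp((a + b) / 2)


def information_criteria(loglik, k, n):
    # Textbook AIC, small-sample corrected AICc, and BIC.
    if n <= k + 1:
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * loglik
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def compare_poisson_nb(counts):
    n = len(counts)
    mu = sum(counts) / n   # MLE of the mean for both intercept-only models
    if mu == 0:
        raise ValueError("all counts are zero; both models are degenerate")
    theta = fit_nb_theta(counts, mu)
    return {
        "Poisson": information_criteria(poisson_loglik(counts, mu), 1, n),
        "NB": information_criteria(nb_loglik(counts, mu, theta), 2, n),
    }


site_counts = [0, 0, 0, 0, 0, 0, 7, 12, 0, 3, 0, 9]  # hypothetical data
for model, crit in compare_poisson_nb(site_counts).items():
    print(model, {name: round(value, 2) for name, value in crit.items()})
```

Lower values indicate a better trade-off between fit and complexity. Without covariates the maximum-likelihood mean is the sample mean for both models, so only the negative binomial dispersion theta needs a numerical search. When the counts are not overdispersed, theta drifts toward the upper bound and the negative binomial approaches the Poisson, which then wins because of its smaller parameter penalty.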
Acknowledgement. This paper is supported by the Ministry of Science and Technology, Taiwan, under Grants MOST-107-2221-E-324-018-MY2 and MOST-106-2218-E-324-002, and was carried out in collaboration with the Hierarchical Generalized Linear Model (H-GLM) Lab, Department of Statistics, College of Natural Sciences, Seoul National University, and the Department of Statistics, Padjadjaran University. This research is also partially supported by the Bioinformatics & Data Science Research Center, Bina Nusantara University.
Conceptualization: Rezzy Eko Caraka, Youngjo
Lee, Rung Ching Chen, Maengseok Noh.
Data curation: Rezzy Eko Caraka, Andi Saputra.
Formal analysis: Rezzy Eko Caraka.
Investigation: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Methodology: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Software: Rezzy Eko Caraka.
Validation: Rezzy Eko Caraka, Youngjo Lee,
Rung Ching Chen, Maengseok Noh.
Visualization: Rezzy Eko Caraka.
Writing – original draft: Rezzy Eko Caraka, Youngjo Lee, Rung Ching Chen, Maengseok Noh.
Writing – review & editing: Rezzy Eko Caraka, Youngjo Lee, Rung Ching Chen, Maengseok Noh, Toni Toharudin, Bens Pardamean, Andi Saputra.
REFERENCES
Abraham, V. M., Walpole, R. E. & Myers, R. H.
2007. Probability and Statistics for
Engineers and Scientists. The Mathematical
Bonat, W. H., Jørgensen, B., Kokonendji, C. C.,
Hinde, J. & Demétrio, C. G. B. 2018.
Extended Poisson–Tweedie: Properties and
regression models for count data. Statistical
Bong, M. C. F., King, P. J. H., Ong, K. H. &
Mahadi, N. M. 2012. Termites assemblages
in oil palm plantation in Sarawak, Malaysia.
Journal of Entomology.
Bower, B. & Savitsky, T. 2008. Laplace
Approximation. Graphical Models.
Caraka, R. E., Shohaimi, S., Kurniawan, I. D.,
Herliansyah, R., Budiarto, A., Sari, S. P. &
Pardamean, B. 2018. Ecological Show Cave
and Wild Cave: Negative Binomial GLLVM’s
Arthropod Community Modelling.
Procedia Computer Science 135: 377–384.
Consul, P. C. & Famoye, F. 1992. Generalized
Poisson regression model. Communications
in Statistics - Theory and Methods.
del Castillo, J. & Lee, Y. 2008. GLM-methods for
volatility models. Statistical Modelling.
Ha, I. D. & Lee, Y. 2003. Estimating Frailty
Models via Poisson Hierarchical
Generalized Linear Models. Journal of
Computational and Graphical Statistics.
Halim, M., Nasir, D. M., Saputra, A., Ayob, Z. A.,
Ahmad, S. Z. S., Din, A. M. M.,
Khairuddin, W. N. W. M., et al. 2018.
Komuniti makroartropoda yang berasosiasi
dengan ekosistem sawit di atas jenis tanah
yang berbeza. Serangga 22(3): 38–55.
Hall, D. B. 2000. Zero-inflated Poisson and
binomial regression with random effects: A
case study. Biometrics. doi:10.1111/j.0006-
Herliansyah, R. & Fitia, I. 2018. Latent variable
models for multi-species counts modeling
in ecology. Biodiversitas Journal of
Biological Diversity 19(5): 1871–1876.
Hinde, J. & Demétrio, C. G. B. 1998.
Overdispersion: Models and estimation.
Computational Statistics and Data Analysis.
Hui, F. K. C., Warton, D. I., Ormerod, J. T.,
Haapaniemi, V. & Taskinen, S. 2017.
Variational Approximations for
Generalized Linear Latent Variable
Models. Journal of Computational and Graphical Statistics.
Inward, D., Beccaloni, G. & Eggleton, P. 2007.
Death of an order: A comprehensive
molecular phylogenetic study confirms that
termites are eusocial cockroaches. Biology
Joyner, C., McMahan, C., Baurley, J. &
Pardamean, B. 2019. A two-phase Bayesian
methodology for the analysis of binary
phenotypes in genome-wide association
studies. Biometrical Journal 1–11.
Keng, W. 2006. Species comparison of termite
(Isoptera) in primary forest of Tawau Hill
Park, Sabah and adjacent cocoa plantation
area. University Malaysia Sabah.
Kéry, M. 2010. Poisson Mixed-Effects Model
(Poisson GLMM). Introduction to
WinBUGS for Ecologists, pp. 203–209.
Kim, G. & Lee, Y. 2019. Marginal versus
conditional beta-binomial regression
models. Statistical Methods in Medical Research.
Klepzig, K. D., Adams, A. S., Handelsman, J. &
Raffa, K. F. 2009. Symbioses: A Key Driver
of Insect Physiological Processes,
Ecological Interactions, Evolutionary
Diversification, and Impacts on Humans.
Kurniawan, I. D., Rahmadi, C., Caraka, R. E. &
Ardi, T. A. 2018. Short Communication:
Cave-dwelling Arthropod community of
Semedi Show Cave in Gunungsewu Karst
Area, Pacitan, East Java, Indonesia.
Biodiversitas 19(3): 857–866.
Kurniawan, I. D., Soesilohadi, R. C. H., Rahmadi,
C., Caraka, R. E. & Pardamean, B. 2018.
The difference on Arthropod communities’
structure within show caves and wild caves
in Gunungsewu Karst area, Indonesia.
Ecology, Environment and Conservation
Kwon, S., Oh, S. & Lee, Y. 2016. The use of
random-effect models for high-dimensional
variable selection problems. Computational
Statistics and Data Analysis 103(1): 401–
Lee, Y. & Nelder, J. 2001. Modelling and
analysing correlated non-normal data.
Statistical Modelling 1(1): 3–16.
Lee, Y. & Noh, M. 2012. Modelling random
effect variance with double hierarchical
generalized linear models. Statistical
Modelling 12(6): 487–502.
Lee, Y., Rönnegård, L. & Noh, M. 2017. Data
analysis using hierarchical generalized
linear models with R.
doi:10.1201/9781315211060
Loeys, T., Moerkerke, B., de Smet, O. & Buysse,
A. 2012. The analysis of zero-inflated count
data: Beyond zero-inflated Poisson
regression. British Journal of Mathematical
and Statistical Psychology.
Myers, R. H., Montgomery, D. C., Vining, G. G.
& Robinson, T. J. 2012. Generalized Linear
Models: With Applications in Engineering
and the Sciences: Second Edition.
Nandika, D., Rismayadi, Y., Diba, F. & Harun, J.
2003. Rayap: Biologi dan
Muhammadiyah University Press.
Niku, J., Brooks, W., Herliansyah, R., Hui, F. K.
C., Taskinen, S. & Warton, D. I. 2019.
Efficient estimation of generalized linear
latent variable models. PLoS ONE 14(5): 1–
Niku, J., Hui, F. K. C., Taskinen, S. & Warton, D.
I. 2019. gllvm: Fast analysis of multivariate
abundance data with generalized linear
latent variable models in R. Methods in
Ecology and Evolution 1–10.
Niku, J., Warton, D. I., Hui, F. K. C. & Taskinen,
S. 2017. Generalized Linear Latent Variable
Models for Multivariate Count and Biomass
Data in Ecology. Journal of Agricultural,
Biological, and Environmental Statistics.
Noh, M., Lee, Y., Oud, J. H. & Toharudin, T. 2019.
Hierarchical likelihood approach to non-
Gaussian factor analysis. Journal of
Statistical Computation and Simulation
Noh, M. & Lee, Y. 2007. Robust
modeling for inference from generalized
linear model classes. Journal of the
American Statistical Association 102(479):
Noh, M. & Lee, Y. 2019. Extended
negative binomial hurdle models. Statistical
Methods in Medical Research.
Nur-Atiqah, J., Saputra, A., Mohammad Esa, M.
F., Shafuraa, O., Billy, A. N. A., Mohd
Yaziz, N. A. A. & Faszly, R. 2017.
Coptotermes sp. (Rhinotermitidae:
Coptotermitinae) infestation pattern shifts
through time in oil palm agroecosystem.
Serangga 22(2): 15–31.
Rahman, D. A., Herliansyah, R., Rianti, P.,
Rahmat, U. M., Firdaus, A. Y. &
Syamsudin, M. 2019. Ecology and
Conservation of the Endangered Banteng
(Bos javanicus) in Indonesia Tropical
Lowland Forest. HAYATI Journal of
Biosciences 26(2): 68–80.
Saputra, A., Halim, M., Jalaludin, N.-A., Hazmi,
I. R. & Rahim, F. 2017. Effects of Day
Time Sampling on The Activities of
Termites in Oil Palm Plantation at
Malaysia-Indonesia. Serangga 22(1): 23–
Saputra, A., Jalaludin, N. A., Hazmi, I. R. &
Rahim, F. 2016. Termite assemblages from
oil palm agroecosystems across Riau
Province, Sumatra, Indonesia. AIP
Saputra, A., Muhammad Nasir, D., Jalaludin, N.
A., Halim, M., Bakri, A., Mohammad Esa,
M. F., Riza Hazmi, I., et al. 2018.
Composition of termites in three different
soil types across oil palm agroecosystem
regions in Riau (Indonesia) and Johor
(Peninsular Malaysia). Journal of Oil Palm
Sarkar, S. 1998. Evolution by association: A
history of symbiosis. Studies in History and
Philosophy of Science Part C: Studies in
History and Philosophy of Biological and
Biomedical Sciences. doi:10.1016/s1369-
Shono, H. 2008. Application of the Tweedie
distribution to zero-catch data in CPUE
analysis. Fisheries Research.
Wang, X. D., Chen, R. C., Yan, F., Zeng, Z. Q. &
Hong, C. Q. 2019. Fast Adaptive K-Means
Subspace Clustering for High-Dimensional
Data. IEEE Access 7: 42639–42651.
Wang, Y., Naumann, U., Wright, S. T. & Warton,
D. I. 2012. mvabund – an R package for
model-based analysis of multivariate
abundance data. Methods in Ecology and Evolution.
Warton, D. I. 2005. Many zeros does not mean
zero inflation: Comparing the goodness-of-
fit of parametric models to multivariate
abundance data. Environmetrics.
Warton, D. I. 2015. New opportunities at the
interface between ecology and statistics.
Methods in Ecology and Evolution.
Warton, D. I., Blanchet, F. G., O’Hara, R. B.,
Ovaskainen, O., Taskinen, S., Walker, S. C.
& Hui, F. K. C. 2015. So Many Variables:
Joint Modeling in Community Ecology.
Trends in Ecology and Evolution.
Warton, D. I., Foster, S. D., De’ath, G., Stoklosa,
J. & Dunstan, P. K. 2015. Model-based
thinking for community ecology. Plant
Zaki, N. I. A., Nasir, D. M., Aziz, A., Azhari, L.
H., Saputra, A., Halim, M., Muslim, S. A.,
et al. 2019. Diversity of ground beetles
(Coleoptera: Carabidae) in oil palm
plantation in Endau-Rompin, Pahang,
Malaysia. Serangga 24(1): 91–102.
Zeileis, A., Kleiber, C. & Jackman, S. 2008.
Hurdle regression models in R. Journal of
REZZY EKO CARAKA received the B.S. degree (S.Si) from Department of University and the Master of Science by research (MSc-Res) from the School of Mathematical Sciences, the National University of Malaysia. In 2019, he started a PhD at the College of Informatics, Chaoyang University of Technology, Taiwan. He is a researcher at the Bioinformatics & Data Science Research Center, University of Bina Nusantara (BINUS), and at the Department of Statistics, Padjadjaran University. He is also a fellow researcher in the H-GLM lab, Department of Statistics, Seoul National University, South Korea, and a co-founder of Statistical Calculator (STATCAL). His research interests include statistical climatology, climate modeling, ecological modelling, statistical machine learning, and large-scale optimization.
RUNG CHING CHEN received a B.S. from the Department of Electrical Engineering in 1987 and an M.S. from the Institute of Computer Engineering in 1990, both from University of Science and Taiwan. In 1998, he received his Ph.D. from the Department of Applied Mathematics in computer science, National Chung Hsing University. He is now a distinguished professor in the Department of Information Management, Taichung, Taiwan. His research interests include network technology, pattern recognition, knowledge engineering, IoT, data analysis, and applications of artificial intelligence.
YOUNGJO LEE is a full Professor in the Department of Statistics, College of Natural Sciences, Seoul National University (SNU). He has published four books on Generalized Linear Models with Random Effects and GLM Likelihood. His research interests are generalized linear models, hierarchical generalized linear models, random effects, data science, and statistical software development.
MAENGSEOK NOH received the B.S., M.S., and Ph.D. degrees from the Department of Statistics, Seoul National University, in 1996, 1998, and 2005, respectively. His thesis was on the analysis of binary data and robust modelling via hierarchical likelihood. Since 2006, he has been a Professor with the Department of Statistics, Pukyong National University, Busan, Korea. His current research interests are applications and software development for hierarchical generalized linear models, methodology for zero-inflated Poisson models with spatial correlation, and a hierarchical approach to non-Gaussian factor analysis.
TONI TOHARUDIN currently works at the Department of Statistics, where he researches statistics. He received a Master of Science from the University of Leuven, Belgium (2004–2005) and a Ph.D. in Spatial Sciences from the University of Groningen (2007–2010). He also acts as head of the research group in time series and
BENS PARDAMEAN has over thirty years of global experience in information and education. After successfully leading the Interest Group, he currently holds a dual appointment as the Director of the Bioinformatics & Data Science Research Center (BDSRC) and as an Associate Professor of Computer Science at the University of Bina Nusantara (BINUS) in Jakarta, Indonesia. He earned a doctoral degree in informative research from the University of Southern California (USC), as well as a master's degree in computer education and a bachelor's degree in computer science from California State University, Los Angeles.
ANDI SAPUTRA received the B.S. degree (S.Si) from the Department of Biology, Riau University and the Master of Science by Research (M.Sc.) in Zoology from the School of Environmental and Natural Resource Science, Universiti Kebangsaan Malaysia. His research interests are zoology, entomology, and ecology.