Available via license: CC BY 3.0
AJS
Austrian Journal of Statistics
January 2021, Volume 50, 1–15.
http://www.ajs.or.at/
doi:10.17713/ajs.v50i2.1069
Impact of Covariates in Compositional Models and
Simplicial Derivatives
Joanna Morais
Avisia, Bordeaux, France
Christine Thomas-Agnan
Toulouse School of Economics
Abstract
In the framework of Compositional Data Analysis, vectors carrying relative information,
also called compositional vectors, can appear in regression models either as dependent
or as explanatory variables. In some situations, they can be on both sides of the regression
equation. Measuring the marginal impacts of covariates in these types of models is not
straightforward since a change in one component of a closed composition automatically
affects the rest of the composition.
Previous work by the authors has shown how to measure, compute and interpret these
marginal impacts in the case of linear regression models with compositions on both sides
of the equation. The resulting natural interpretation is in terms of an elasticity, a quantity
commonly used in econometrics and marketing applications. They also demonstrate the
link between these elasticities and simplicial derivatives.
The aim of this contribution is to extend these results to other situations, namely
when the compositional vector is on a single side of the regression equation. In these
cases, the marginal impact is related to a semi-elasticity and also linked to some simplicial
derivative. Moreover, we consider the possibility that a total variable is used as an
explanatory variable, with several possible interpretations of this total and we derive the
elasticity formulas in that case.
Keywords: compositional regression model, marginal effects, simplicial derivative, elasticity,
semi-elasticity.
1. Introduction and literature review
We consider regression models involving compositional vectors, i.e. vectors carrying relative
information. When relative information is the focus, meaningful functions are functions of
ratios of the vector’s components; using traditional regression models in such cases
is therefore not appropriate. Regression models that respect the compositional nature of such data have
been proposed in the literature, for example those introduced by Aitchison (1986) based on
log-ratio transformations. Theory for inference in these models is developed for example in
Pawlowsky-Glahn and Buccianti (2011), Van Den Boogaart and Tolosana-Delgado (2013),
Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado (2015b), and Filzmoser, Hron, and Templ
(2018).
When the compositional vectors only appear as dependent variable, we will say that the
model is of the ‘Y-compositional’ type (see e.g. Egozcue, Daunis-I-Estadella, Pawlowsky-
Glahn, Hron, and Filzmoser (2012)). When they only appear as explanatory variables, we
will say that the model is of the ‘X-compositional’ type (see e.g. Hron, Filzmoser, and
Thompson (2012)). Finally, when they appear on both sides, we will say that the model is of
the ‘YX-compositional’ type, see e.g. Kynclova, Filzmoser, and Hron (2015), Chen, Zhang,
and Li (2017), Morais, Thomas-Agnan, and Simioni (2018a) and Morais, Thomas-Agnan,
and Simioni (2018b). A simplified version of the YX-compositional type is presented in
Wang, Shangguan, Wu, and Guan (2013) and Morais, Thomas-Agnan, and Simioni (2018b)
later show that this model is equivalent to the so-called MCI (multiplicative competitive
interaction) model introduced earlier in the marketing literature by Nakanishi and Cooper
(1982). It may also be relevant to include in the model the total of the different parts
involved in the composition and we will consider each of the above models for the case with
or without a total variable, see e.g. Coenders, Martín-Fernández, and Ferrer-Rosell (2017)
and Coenders, Ferrer-Rosell, Mateu-Figueras, and Pawlowsky-Glahn (2015). Extensions with
compositional functional predictors are presented in Sun, Xu, Cong, and Chen (2018), Bui,
Loubes, Risser, and Balaresque (2018) and Combettes and Muller (2019). Case studies using
some of these models are presented in Hron, Filzmoser, and Thompson (2012) and Trinh, Morais,
Thomas-Agnan, and Simioni (2018) for the X-compositional type, and in Morais, Thomas-Agnan,
and Simioni (2017) for the YX-compositional type.
The focus of the present work is on the definition and interpretation of impacts of covariates in these models, a question addressed by far fewer papers. Muller, Hron, Fiserova, Smahaj, Cakirpaloglu, and Vancakova (2018) propose an interpretation for models of X-compositional or Y-compositional types based on a specific type of orthogonal coordinates (called pivot coordinates, see e.g. Filzmoser, Hron, and Templ (2018)). Moreover, they promote the replacement of the natural logarithm by the base-2 logarithm to enhance interpretability. A first drawback is that the resulting interpretation requires rerunning the model once for each component in the Y-compositional case. Moreover, changes in log-ratios correspond to a multiplicative increase (of the dependent or independent variables) in terms of relative dominance, i.e. the ratio of one component to the geometric mean of the others (while keeping all other log-ratios constant), which is not a very intuitive notion. This point of view is extended in Coenders and Pawlowsky-Glahn (2020) by considering changes in more general log-ratios leading to changes in any subset of components by a common factor (while reducing the remaining components accordingly).
Morais, Thomas-Agnan, and Simioni (2018b) show that a natural interpretation tool in the
YX-compositional model is the notion of elasticity. The notion of elasticity is frequently used
in econometrics: it corresponds to the percent increase of the dependent variable induced by
a percent increase of the explanatory variable and is natural in a log-log regression model
because it coincides with the explanatory variable parameter (see Section 3). Indeed, elasticities are commonly computed for the MCI model in the marketing literature (see Nakanishi and Cooper (1982)). Morais, Thomas-Agnan, and Simioni (2018b) relate it to the notion of simplicial derivatives introduced in Egozcue, Jarauta-Bragulat, and Díaz-Barrero (2011) and Barceló-Vidal, Martín-Fernández, and Mateu-Figueras (2011).
With a graphical approach, Nguyen, Laurent, Thomas-Agnan, and Ruiz-Gazen (2018) shed a different light on the evaluation of these impacts by plotting the predicted components as a function of the explanatory variables, but this graphing tool is limited to compositional dependent or explanatory variables with three components.
Finally, for the X-compositional model, Coenders and Pawlowsky-Glahn (2020) consider the introduction of the total variable among the explanatory variables and adapt the resulting interpretations, still in terms of log-ratio changes.
The objective of this paper is to extend Morais, Thomas-Agnan, and Simioni (2018b) to the Y-compositional and the X-compositional models and to allow inclusion of the total variable in the models. In Section 2, we introduce notations and define the different specifications of the considered models. In Section 3, we demonstrate the equations linking elasticities or semi-elasticities (depending on the considered model) with simplicial derivatives. Section 4 establishes the formulas for the elasticities and semi-elasticities in terms of model parameters in the simplex as well as in coordinate space. Finally, Section 5 provides examples of applications to the X-compositional and to the Y-compositional models and Section 6 concludes.
2. Compositional model specifications
Using the notation $x'$ for the transpose of a vector $x$, let us denote by $\check{X} = (\check{X}_1, \cdots, \check{X}_{D_X})' \in \mathbb{R}_+^{D_X}$ a vector of $D_X$ positive components corresponding to the components of a compositional vector expressed in original units: we call these components volumes as opposed to shares. For example, in the case studied in Morais, Thomas-Agnan, and Simioni (2017), the volumes are numbers of cars sold during a given month by the different brands of cars whereas the shares represent the corresponding proportion of cars sold during that month by each brand relative to the other brands in the study. The closure of the vector $\check{X}$ of volumes is the corresponding vector of shares
$$X = \mathcal{C}(\check{X}_1, \cdots, \check{X}_{D_X})' = \left( \frac{\check{X}_1}{\sum_{i=1}^{D_X} \check{X}_i}, \cdots, \frac{\check{X}_{D_X}}{\sum_{i=1}^{D_X} \check{X}_i} \right)' = (X_1, \cdots, X_{D_X})'$$
and belongs to the simplex space $\mathcal{S}^{D_X}$ of positive vectors in $\mathbb{R}^{D_X}$ with sum equal to 1.
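As a minimal sketch of the closure operation just defined, the following Python snippet turns volumes into shares. The function name `closure` and the sales figures are illustrative assumptions, not taken from the paper.

```python
def closure(volumes):
    """Rescale a vector of positive volumes so that its components sum to 1."""
    if any(v <= 0 for v in volumes):
        raise ValueError("all components must be strictly positive")
    total = sum(volumes)
    return [v / total for v in volumes]

# Hypothetical monthly sales volumes for three brands (illustrative numbers only).
sales = [200.0, 300.0, 500.0]
shares = closure(sales)

# Closure is scale invariant: multiplying every volume by 10 gives the same shares.
shares_scaled = closure([10.0 * v for v in sales])
```

The shares carry only relative information: any common rescaling of the volumes is lost after closure.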
In some cases, it may be relevant to include in the regression model a variable measuring a total (hence not scale-invariant) which may be $T(X)$ or $T(Y)$. Pawlowsky-Glahn, Egozcue, and Lovell (2015a) argue that different formulas can be used for this total, for example one of the following two:
• Arithmetic total: $T_A(\check{X}) = \sum_{i=1}^{D} \check{X}_i$
• Geometric total: $T_G(\check{X}) = \left(\prod_{i=1}^{D} \check{X}_i\right)^{1/\sqrt{D}}$
The general principle of simplicial regression is to use transformations to transport the simplex space $\mathcal{S}^D$, equipped with the Aitchison geometry, into the Euclidean space $\mathbb{R}^{D-1}$, thus eliminating the problem of the simplex constraints. It is generally agreed upon to use log-ratio orthonormal coordinates (Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado (2015b)). We recall that to each $D \times (D-1)$ contrast matrix $V$, constructed from an orthonormal basis of $\mathcal{S}^D$, corresponds an isometric transformation traditionally called $\mathrm{ilr}_V$. As advocated recently by Martín-Fernández (2019), we will rather use the name olr (orthogonal log ratio) for these transformations. We then have $z^* = \mathrm{olr}_V(z) = V' \log(z)$, where the natural logarithm (denoted by log) is understood componentwise, and the inverse transformation is $\mathrm{olr}_V^{-1}(z^*) = \mathcal{C}(\exp(V z^*))$. Note that olr coordinates take the same value whether they are computed from shares or from volumes. However, the inverse transformation always returns shares.
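The two properties stated above (same coordinates from shares or volumes, inverse always returning shares) can be checked numerically. The sketch below is only an illustration: the helper names are hypothetical, and the contrast matrix is one possible choice (a pivot-type matrix with orthonormal, zero-sum columns), not necessarily the one used by the authors.

```python
import math

def closure(v):
    s = sum(v)
    return [x / s for x in v]

def pivot_contrast_matrix(D):
    """One possible D x (D-1) contrast matrix V with orthonormal, zero-sum columns."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(D - 1):
        r = D - j - 1  # number of parts located after part j
        V[j][j] = math.sqrt(r / (r + 1))
        for i in range(j + 1, D):
            V[i][j] = -1.0 / math.sqrt(r * (r + 1))
    return V

def olr(z, V):
    """olr_V(z) = V' log(z), with the logarithm taken componentwise."""
    logs = [math.log(x) for x in z]
    return [sum(V[i][j] * logs[i] for i in range(len(z))) for j in range(len(V[0]))]

def olr_inv(z_star, V):
    """olr_V^{-1}(z*) = C(exp(V z*))."""
    raw = [math.exp(sum(V[i][j] * z_star[j] for j in range(len(z_star))))
           for i in range(len(V))]
    return closure(raw)

volumes = [200.0, 300.0, 500.0]  # illustrative volumes
V = pivot_contrast_matrix(3)
coords_from_volumes = olr(volumes, V)
coords_from_shares = olr(closure(volumes), V)
recovered = olr_inv(coords_from_volumes, V)  # shares, not volumes
```

Because each column of $V$ sums to zero, the common scale factor disappears in $V' \log(z)$, which is why both coordinate vectors coincide.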
Using the traditional notations for the simplex operations (see Pawlowsky-Glahn, Egozcue, and Tolosana-Delgado (2015b)) and denoting by $\langle \cdot, \cdot \rangle_A$ the Aitchison scalar product, the first row of Table 1 presents the formulation of the regression models explaining a collection of $n$ i.i.d. random variables (simplex valued or not) by corresponding explanatory variables which may be simplex valued or not. The observations are indexed by $t$, $t = 1, \cdots, n$. Because marginal effects only involve one explanatory variable at a time, if we had a model explaining a simplex valued variable by both types of explanatory variables, we would use the first and last columns of this table. The second row of Table 1 presents the corresponding model formulations in coordinate space for a given choice of olr transformation $\mathrm{olr}_V$. Parameters $a^*$, $b^*$ or $B^*$ are then estimated by maximum likelihood in coordinate space where the regression is classical. Formulas to compute the corresponding parameter estimates in the simplex $a$, $b$ or $B$ are available and it is known that these estimated parameters in the simplex are independent of the particular choice of $\mathrm{olr}_V$, i.e. of the particular choice of contrast matrix. In Table 1 the different formulations may involve a total variable $T(X)$ or $T(Y)$, which is printed in grey to indicate that it is an option. Finally, we included in the formulations the particular case of the MCI model obtained when $D_X = D_Y$ and the matrix $B^*$ is a multiple of the identity, resulting in $B \boxdot X = b \odot X$.
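Table 1: Specifications of the compositional models and notations. Optional total terms, printed in grey in the original layout, are shown here in square brackets.

| Space | Y-compositional model | X-compositional model | YX-compositional model |
|---|---|---|---|
| $\mathcal{S}^{D}$ | $Y_t = a \oplus \check{X}_t \odot b \oplus \epsilon_t$ $[\oplus\, T(\check{Y})_t \odot c]$ | $\check{Y}_t = a + \langle b, X_t \rangle_A + \epsilon_t$ $[+\, d\, T(\check{X})_t]$ | ‘CODA’ model: $Y_t = a \oplus B \boxdot X_t \oplus \epsilon_t$ $[\oplus\, T(\check{X})_t \odot d \oplus T(\check{Y})_t \odot c]$; ‘MCI’ model: $Y_t = a \oplus b \odot X_t \oplus \epsilon_t$ $[\oplus\, T(\check{X})_t \odot d \oplus T(\check{Y})_t \odot c]$ |
| $\mathbb{R}^{D-1}$ | $Y_t^* = a^* + b^* \check{X}_t + \epsilon_t^*$ $[+\, c^* T(\check{Y})_t]$ | $\check{Y}_t = a + \sum_{k=1}^{D_X - 1} b_k^* X_{t,k}^* + \epsilon_t$ $[+\, d\, T(\check{X})_t]$ | ‘CODA’ model: $Y_t^* = a^* + B^* X_t^* + \epsilon_t^*$ $[+\, d^* T(\check{X})_t + c^* T(\check{Y})_t]$; ‘MCI’ model: $Y_t^* = a^* + b X_t^* + \epsilon_t^*$ $[+\, d^* T(\check{X})_t + c^* T(\check{Y})_t]$ |
| Notations | $Y_t, a, b, \epsilon_t \in \mathcal{S}^{D_Y}$, $\check{X}_t \in \mathbb{R}$; $Y_t^*, a^*, b^*, \epsilon_t^* \in \mathbb{R}^{D_Y - 1}$ | $X_t, b \in \mathcal{S}^{D_X}$; $\check{Y}_t, \check{X}_t, a, \epsilon_t \in \mathbb{R}$; $X_t^*, b^* \in \mathbb{R}^{D_X - 1}$ | $B \in \mathbb{R}^{D_Y, D_X}$, $b \in \mathbb{R}$; $B^* \in \mathbb{R}^{D_Y - 1, D_X - 1}$ |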
3. Elasticities, semi-elasticities and simplicial partial derivatives
A marginal impact in a linear regression model is usually understood as the change in the expected value of the dependent variable $Y$ induced by an additive increase of the explanatory variable of interest $X$. In nonlinear models, it is rather understood as the infinitesimal equivalent, i.e. the derivative of the expected value of $Y$ with respect to $X$, and it may be non constant throughout the range of $X$. In some nonlinear models, an elasticity or a semi-elasticity may be more natural. Indeed in a log-log model, if $E(\log(Y))$ depends linearly on $\log(X)$, then the parameter of $\log(X)$ is equal to the derivative of $E(\log(Y))$ with respect to $\log(X)$ (also called the logarithmic derivative) and can be interpreted as the percent increase of $E(Y)$ induced by a one percent increase of $X$: this quantity is called the elasticity of $Y$ with respect to $X$. Finally, if the model is a semi-log model, the natural quantity is either the partial derivative of $E(Y)$ with respect to $\log(X)$ (if the logarithm is on the right hand side of the regression equation) or symmetrically the partial derivative of $E(\log(Y))$ with respect to $X$ in the other case (if the logarithm is on the left hand side of the regression equation): in both cases it is called a semi-elasticity. Because log-log models and semi-log models are frequent in econometrics, elasticities and semi-elasticities are often used to measure the impact of covariates.
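To make the elasticity interpretation concrete, here is a small numerical sketch. It assumes a deterministic power-law relationship $y = a x^{\beta}$ with hypothetical values $a = 2$ and $\beta = 0.7$ (a simplification of the log-log regression model, chosen only for illustration) and checks that the logarithmic derivative equals $\beta$.

```python
import math

def expected_y(x, a=2.0, beta=0.7):
    """Hypothetical log-log relationship: log y = log a + beta * log x."""
    return a * x ** beta

def log_log_derivative(f, x, h=1e-6):
    """Central finite difference of log f with respect to log x."""
    up = math.log(f(x * math.exp(h)))
    down = math.log(f(x * math.exp(-h)))
    return (up - down) / (2.0 * h)

elasticity = log_log_derivative(expected_y, 5.0)

# Effect of a one percent increase of x, expressed in percent of y.
pct_change = 100.0 * (expected_y(5.0 * 1.01) / expected_y(5.0) - 1.0)
```

Here `elasticity` recovers the slope 0.7 and `pct_change` is close to 0.7 percent, the small gap being the second-order term neglected in the "one percent" reading.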
This supports the idea that, in a simplicial regression model, one should turn attention to
simplicial derivatives to evaluate the impacts of explanatory variables. Adapting the definition
of derivative to the case where a function is simplex valued or is defined on the simplex stems
from the fact that a change in a vector of shares cannot be just reduced to a change in one
of the components since they are linked by their sum constraint: in other words, it is due to
the fact that one of the variables lies in a subspace of RD.
More precisely, the quantities of interest are
• $\frac{\partial^{\oplus} E^{\oplus} Y}{\partial X}$ in the case of the Y-compositional model,
• $\frac{\partial E Y}{\partial^{\oplus} X}$ in the case of the X-compositional model,
• $\frac{\partial^{\oplus} E^{\oplus} Y}{\partial^{\oplus} X}$ in the case of the YX-compositional model,
where $E^{\oplus}$ denotes the expectation of a simplex valued random variable (see Pawlowsky-Glahn and Buccianti (2011)) and where the symbol $\partial^{\oplus}$ indicates that the derivative should be understood in the simplicial derivative sense with respect to that variable (see Barceló-Vidal, Martín-Fernández, and Mateu-Figueras (2011) and Egozcue, Jarauta-Bragulat, and Díaz-Barrero (2011)).
For the Y-compositional and X-compositional models, we are first going to express the relevant
simplicial derivatives in terms of semi-elasticities.
Indeed, for the case of the X-compositional model, let us consider a homogeneous function of degree zero $\check{f}$ defined from $\mathbb{R}_+^D$ to $\mathbb{R}$, inducing a function $f$ on $\mathcal{S}^D$ by $f(x) = f(\mathcal{C}(\check{x})) = \check{f}(\check{x})$. Propositions (13.10) and (13.13) in Barceló-Vidal, Martín-Fernández, and Mateu-Figueras (2011) imply that the part-C derivatives of $f$, which we denote here by $\frac{\partial f(x)}{\partial^{\oplus} x}$, are given by:
$$\frac{\partial f(x)}{\partial^{\oplus} x} = \frac{\partial \check{f}(\check{x})}{\partial \log(\check{x})}$$
Therefore the derivative of a function $f$ of a simplex valued variable $x = \mathcal{C}(\check{x})$ corresponds to the ordinary semi-log derivative of the corresponding homogeneous function $\check{f}$ of the volumes $\check{x}$. Applying this result to the function expressing $EY$ as a function of the share vector $X$, we obtain the link between the simplicial derivative of this function and the semi-elasticity (or semi-log derivative) in the classical sense of the corresponding function of the volume vector $\check{X}$.
Similarly, for the case of the Y-compositional model, for a simplex-valued function $f$ of a real variable $x \in \mathbb{R}$, Theorem 12.2.6 in Egozcue, Jarauta-Bragulat, and Díaz-Barrero (2011) implies that:
$$\frac{\partial^{\oplus} f(x)}{\partial x} = \mathcal{C}\left(\exp\left(\frac{\partial \log f(x)}{\partial x}\right)\right)',$$
where $\partial^{\oplus} f$ denotes the simplicial derivative of $f$ at $x$. This result links the simplicial derivatives of a simplex-valued function $f$ to the semi-log derivatives (in the ordinary sense) of this function. Applying this result to the function expressing $E^{\oplus} Y$ as a function of $X$, we obtain the link between the simplicial derivative of this function and the semi-elasticity (or semi-log derivative) in the classical sense of $E^{\oplus} Y$ as a function of $X$.
For the YX-compositional model, Morais, Thomas-Agnan, and Simioni (2018a) linked simplicial derivatives to elasticities in the case of a model without a total and in the particular case where the number of components $D_Y$ of the Y composition is the same as that of the X composition ($D_X$). The limitation $D_Y = D_X$ in Morais, Thomas-Agnan, and Simioni (2018a) was simply due to the particular application framework of this work but there is no additional mathematical difficulty in extending the result to $D_Y \neq D_X$. The corresponding formulas are recalled in Table 2 for completeness.
Finally, considering models including a total, one would need to define infinitesimal paths in the T-space. Instead we consider three types of infinitesimal variations as described in Section 4.3.
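Table 2: Simplicial derivative and (semi-)elasticities

| Y-compositional model | X-compositional model | YX-compositional model |
|---|---|---|
| $\frac{\partial^{\oplus} E^{\oplus} Y}{\partial X} = \mathcal{C}\left(\exp\left(\frac{\partial \log E^{\oplus} Y}{\partial X}\right)\right)'$ | $\frac{\partial E Y}{\partial^{\oplus} X} = \frac{\partial E Y}{\partial \log \check{X}}$ | $\frac{\partial^{\oplus} E^{\oplus} Y}{\partial^{\oplus} X} = \mathcal{C}\left(\exp\left(\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}}\right)\right)'$ |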
For upcoming interpretations, it is interesting to consider first order Taylor approximations of such functions (of a simplex variable or simplex valued). For a function $f$ from $\mathcal{S}^D$ to $\mathbb{R}$, consider as in Barceló-Vidal, Martín-Fernández, and Mateu-Figueras (2011) the generating system $u_1, \cdots, u_D$ of $\mathcal{S}^D$ defined by
$$u_j = \frac{D-1}{D} \odot \mu_j, \qquad \mu_j = (1, \cdots, 1, \exp(1), 1, \cdots, 1),$$
where $\exp(1)$ is at the $j$th position. From Barceló-Vidal, Martín-Fernández, and Mateu-Figueras (2011), the first order Taylor approximation is given by
$$f(x \oplus \delta \odot u_j) \sim f(x) + \delta \frac{\partial \check{f}(\check{x})}{\partial \log(\check{x}_j)}. \quad (1)$$
This additive (in the simplex sense) increase of $\delta \odot u_j$ corresponds to a multiplicative increase of the $j$th component while holding constant all other ratios of remaining components. It is also equivalent in coordinate space, for a proper choice of olr transformation, to increasing additively one olr component while keeping all others constant. To summarize, note that the increment is given by the product of $\delta$ by the classical semi-elasticity, i.e., a semi-log derivative in the ordinary sense of the corresponding function of the volumes. As we will see in Section 5, $\delta$ is proportional to the rate of change of $x$.
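For a function $f$ from $\mathbb{R}$ to $\mathcal{S}^D$, Egozcue, Jarauta-Bragulat, and Díaz-Barrero (2011) obtain the following first order Taylor approximation for a small additive increase $\delta > 0$ of $x \in \mathbb{R}$
$$f(x + \delta) \sim f(x) \oplus \delta \odot \frac{\partial^{\oplus} f(x)}{\partial x}.$$
As in Morais (2017), let us go one step further in the approximation. Indeed,
$$f(x) \oplus \delta \odot \frac{\partial^{\oplus} f(x)}{\partial x} = f(x) \oplus \exp\left(\delta \frac{\partial \log f(x)}{\partial x}\right).$$
Combining with a first order approximation of the exponential in a neighborhood of zero, $\exp\left(\delta \frac{\partial \log f(x)}{\partial x}\right) \sim 1 + \delta \frac{\partial \log f(x)}{\partial x}$, we get the following approximation for the $m$th component of $f(x + \delta)$
$$f_m(x + \delta) \sim f_m(x) \left(1 + \delta \frac{\partial \log f_m(x)}{\partial x}\right). \quad (2)$$
Taking the derivative of $\sum_{m=1}^{D} f_m(x) = 1$, we get $\sum_{m=1}^{D} f_m(x) \frac{\partial \log f_m(x)}{\partial x} = 0$. Therefore the RHS vector in equation (2) belongs to $\mathcal{S}^D$. To summarize, note that in this case the percent increase of each component of $f(x)$ is given by the classical semi-elasticity, i.e., a semi-log derivative in the ordinary sense of the function.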
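The claim that the right hand side of (2) stays in the simplex can be illustrated numerically. The sketch below assumes a hypothetical simplex-valued function $f(x) = \mathcal{C}(\exp(\alpha_m + \beta_m x))$ with made-up coefficients; for this particular choice the semi-log derivative of component $m$ is $\beta_m - \sum_k f_k(x) \beta_k$.

```python
import math

def shares(x, alpha, beta):
    """Hypothetical simplex-valued function f(x) = C(exp(alpha + beta * x))."""
    weights = [math.exp(a + b * x) for a, b in zip(alpha, beta)]
    total = sum(weights)
    return [w / total for w in weights]

def semi_elasticities(x, alpha, beta):
    """d log f_m / dx = beta_m - sum_k f_k(x) beta_k for the function above."""
    f = shares(x, alpha, beta)
    weighted_mean = sum(fk * bk for fk, bk in zip(f, beta))
    return [b - weighted_mean for b in beta]

alpha = [0.0, 0.5, -0.3]     # illustrative intercepts
beta = [0.02, -0.01, 0.005]  # illustrative slopes
x, delta = 10.0, 0.1

f = shares(x, alpha, beta)
se = semi_elasticities(x, alpha, beta)

# Right hand side of approximation (2), component by component.
approx = [fm * (1.0 + delta * s) for fm, s in zip(f, se)]
exact = shares(x + delta, alpha, beta)
```

The approximated shares sum to one because the share-weighted semi-elasticities sum to zero, and they differ from the exact shares only by second-order terms in $\delta$.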
Finally, for a function $f$ from $\mathcal{S}^{D_X}$ to $\mathcal{S}^{D_Y}$, a similar approximation has been obtained in Morais (2017) for the particular case $D_X = D_Y$. Combining the above two results, we easily obtain that the Taylor approximation of a function $f$ from $\mathcal{S}^{D_X}$ to $\mathcal{S}^{D_Y}$ is given by
$$f_m(x \oplus \delta \odot u_j) \sim f_m(x) \left(1 + \delta \frac{\partial \log \check{f}_m(\check{x})}{\partial \log \check{x}_j}\right),$$
showing that a percent increase of the components of $x$, proportional to $\delta$, induces a percent increase of each component of $f(x)$ given by the classical elasticity of the corresponding component, $\frac{\partial \log \check{f}_m(\check{x})}{\partial \log \check{x}_j}$.
4. Elasticities and semi-elasticities in terms of model parameters
The aim is now to relate the elasticities/semi-elasticities of the previous section to the model
parameters. The results of this section will be based on the following two lemmas which
establish the formulas for the semi-log derivatives of an olr transformation and its inverse.
Lemma 4.1 If $z$ is a $D$-composition which is the closure of the vector $\check{z}$ of $\mathbb{R}_+^D$, and if $z^* = \mathrm{olr}_V(z) = V' \log(z)$ is the olr-transformed vector associated to the contrast matrix $V$, then
$$\frac{\partial\, \mathrm{olr}_V(z)}{\partial \log \check{z}} = V'.$$
This first lemma just results from the definition of the olr, which is linear with respect to $\log \check{z}$, and could be used for any other log-ratio linear transformation.
Lemma 4.2 If $z$ is a $D$-composition which is the closure of the vector $\check{z}$ of $\mathbb{R}_+^D$, and if $z^* = \mathrm{olr}_V(z) = V' \log(z)$ is the olr-transformed vector associated to the contrast matrix $V$, then
$$\frac{\partial \log(\mathrm{olr}_V^{-1}(z^*))}{\partial z^*} = W_z V,$$
where $z = \mathrm{olr}_V^{-1}(z^*)$ and where $W_z = I_D - 1_D z'$ with $I_D$ the $D \times D$ identity matrix and $1_D$ the $D \times 1$ vector of ones.
Let $v_{ij}$ ($i = 1, \cdots, D$ and $j = 1, \cdots, D-1$) be the general term of the matrix $V$. To prove Lemma 4.2, using the formula for the inverse transformation of an olr, one representative of $\log \check{z} = \log(\mathrm{olr}_V^{-1}(z^*)) = \log \mathcal{C}(\exp(V z^*))$ is given by $V z^*$ and therefore its derivative with respect to $z^*$ is $V$. Since $\log(z) = \log(\check{z}) - \log(S) 1_D$, where $S = T_A(\check{z}) = \sum_{i=1}^{D} \check{z}_i$, and since
$$\frac{\partial S}{\partial z_j^*} = \sum_{k=1}^{D} \frac{\partial \log(\check{z}_k)}{\partial z_j^*} \check{z}_k = \sum_{k=1}^{D} v_{kj} \check{z}_k,$$
we have
$$\frac{\partial \log(S)}{\partial z_j^*} = \frac{1}{S} \frac{\partial S}{\partial z_j^*} = \sum_{k=1}^{D} v_{kj} z_k.$$
Combining first and second terms yields, for $i = 1, \cdots, D$ and $j = 1, \cdots, D-1$,
$$\frac{\partial \log(z_i)}{\partial z_j^*} = v_{ij} - \sum_{k=1}^{D} v_{kj} z_k,$$
and this is the general term of the matrix $W_z V$.
If we define $W_z^* = W_z V$, note that $W_z^* V' = W_z$ (this will be used later on).
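Lemma 4.2 can be checked numerically by comparing $W_z V$ with finite differences of $\log(\mathrm{olr}_V^{-1}(z^*))$. This is only a sanity-check sketch: the contrast matrix is one possible pivot-type choice, the coordinates are arbitrary, and all function names are hypothetical.

```python
import math

def closure(v):
    s = sum(v)
    return [x / s for x in v]

def pivot_contrast_matrix(D):
    """One possible D x (D-1) contrast matrix with orthonormal, zero-sum columns."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(D - 1):
        r = D - j - 1
        V[j][j] = math.sqrt(r / (r + 1))
        for i in range(j + 1, D):
            V[i][j] = -1.0 / math.sqrt(r * (r + 1))
    return V

def olr_inv(z_star, V):
    raw = [math.exp(sum(V[i][j] * z_star[j] for j in range(len(z_star))))
           for i in range(len(V))]
    return closure(raw)

def analytic_jacobian(z_star, V):
    """General term of W_z V: v_ij - sum_k v_kj z_k, with z = olr_inv(z_star)."""
    z = olr_inv(z_star, V)
    D = len(z)
    return [[V[i][j] - sum(V[k][j] * z[k] for k in range(D)) for j in range(D - 1)]
            for i in range(D)]

def numeric_jacobian(z_star, V, h=1e-6):
    """Central finite differences of log(olr_inv(z_star)) with respect to z_star."""
    D = len(V)
    J = [[0.0] * (D - 1) for _ in range(D)]
    for j in range(D - 1):
        up = list(z_star)
        down = list(z_star)
        up[j] += h
        down[j] -= h
        log_up = [math.log(x) for x in olr_inv(up, V)]
        log_down = [math.log(x) for x in olr_inv(down, V)]
        for i in range(D):
            J[i][j] = (log_up[i] - log_down[i]) / (2.0 * h)
    return J

V = pivot_contrast_matrix(4)
z_star = [0.3, -0.8, 0.5]  # arbitrary olr coordinates
A = analytic_jacobian(z_star, V)
N = numeric_jacobian(z_star, V)
max_gap = max(abs(a - n) for row_a, row_n in zip(A, N) for a, n in zip(row_a, row_n))
```

A small `max_gap` (of the order of the finite-difference error) is consistent with the lemma for this choice of $V$ and $z^*$.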
4.1. Semi-elasticities for Y-compositional models and X-compositional models
In the case of Y-compositional and X-compositional models, the natural tool is semi-elasticities.
However the formulas differ in the two cases:
• X-compositional case: $se(Y, \check{X}) = \frac{\partial E Y}{\partial \log \check{X}}$
• Y-compositional case: $se(Y, \check{X}) = \frac{\partial \log E^{\oplus} Y}{\partial X}$
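Let us denote by $V_X$, respectively $V_Y$, the contrast matrices used for X, respectively Y.
The computation in the X-compositional case uses Lemma 4.1. Indeed, for $j = 1, \cdots, D_X$,
$$\frac{\partial E Y}{\partial \log \check{X}_j} = \sum_{k=1}^{D_X - 1} \frac{\partial E Y}{\partial X_k^*} \frac{\partial X_k^*}{\partial \log \check{X}_j} = \sum_{k=1}^{D_X - 1} b_k^* v_{jk}^X.$$
The result is reported in Table 3 with a matrix formulation
$$\frac{\partial E(\check{Y})}{\partial \log \check{X}} = V_X b^* = V_X V_X' \log b = \mathrm{clr}(b). \quad (3)$$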
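The computation in the Y-compositional case uses Lemma 4.2 since $E^{\oplus} Y = \mathrm{olr}_V^{-1}(E Y^*)$. We have
$$\frac{\partial \log E^{\oplus} Y}{\partial \check{X}} = \frac{\partial \log E^{\oplus} Y}{\partial E Y^*} \frac{\partial E Y^*}{\partial \check{X}} = W_z^* b^* = W_z^* V_Y' \log b = W_z V_Y V_Y' \log b = W_z\, \mathrm{clr}(b), \quad (4)$$
where $z = \mathrm{olr}_V^{-1}(E(\mathrm{olr}_V(Y))) = E^{\oplus} Y$.
Expressions (3) and (4) underline the fact that the semi-elasticities are independent of the particular contrast matrix. They are observation dependent in the Y-compositional case through $z = E^{\oplus} Y$.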
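As an illustration of expression (4), the sketch below computes $W_z\, \mathrm{clr}(b)$ for a hypothetical simplex parameter $b$ and hypothetical expected shares $z$ (both invented for the example). No contrast matrix appears in the computation, in line with the remark above.

```python
import math

def clr(b):
    """Centered log-ratio: log(b) minus the mean of log(b)."""
    logs = [math.log(v) for v in b]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def w_matrix(z):
    """W_z = I_D - 1_D z'."""
    D = len(z)
    return [[(1.0 if i == k else 0.0) - z[k] for k in range(D)] for i in range(D)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

b = [0.5, 0.3, 0.2]  # hypothetical slope parameter in the simplex
z = [0.6, 0.3, 0.1]  # hypothetical expected shares E^+ Y at one observation

se = matvec(w_matrix(z), clr(b))  # semi-elasticities of the three shares

# clr ignores the scale of b, so any rescaled representative gives the same result.
se_rescaled = matvec(w_matrix(z), clr([7.0 * v for v in b]))

# Share-weighted semi-elasticities sum to zero, as required for shares summing to 1.
weighted_sum = sum(zi * si for zi, si in zip(z, se))
```

Changing `z` changes `se`, which is the observation dependence mentioned in the text.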
4.2. Elasticities for the YX-compositional model
For the YX-compositional model, Morais, Thomas-Agnan, and Simioni (2018a) have obtained the expressions of the elasticities when the dimension of the Y composition is the same as that of the X composition. Let us extend this result to the case $D_X \neq D_Y$ using the above two lemmas.
We can see the relationship between $\log \check{X}$ and $\log E^{\oplus} Y$ as the composition of three functions (listed from inside to outside)
• the function which maps $\log \check{X} \in \mathbb{R}^{D_X}$ to $X^* \in \mathbb{R}^{D_X - 1}$
• the function which maps $X^* \in \mathbb{R}^{D_X - 1}$ to $E Y^* \in \mathbb{R}^{D_Y - 1}$
• the function which maps $E Y^* \in \mathbb{R}^{D_Y - 1}$ to $\log E^{\oplus} Y \in \mathbb{R}^{D_Y}$
Using the generalized chain rule for functions of several variables, which states that the Jacobian matrix of the composite function is the product of the Jacobian matrices of the composed functions evaluated at appropriate points, we get
$$\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}} = \frac{\partial \log E^{\oplus} Y}{\partial E Y^*} \frac{\partial E Y^*}{\partial X^*} \frac{\partial X^*}{\partial \log \check{X}}. \quad (5)$$
The rightmost term on the right hand side of (5) is obtained using Lemma 4.1:
$$\frac{\partial X^*}{\partial \log \check{X}} = V_X'.$$
The central term yields the matrix $B^*$ of parameters in coordinate space since the relationship between $E Y^*$ and $X^*$ is linear. The leftmost term on the right hand side is obtained using Lemma 4.2:
$$\frac{\partial \log E^{\oplus} Y}{\partial E Y^*} = W_z V_Y,$$
where $z = E^{\oplus} Y$. We finally get
$$\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}} = W_z V_Y B^* V_X' = W_z V_Y V_Y' B = W_z B, \quad (6)$$
using the relationships between $B$ and $B^*$ (see e.g. Nguyen, Laurent, Thomas-Agnan, and Ruiz-Gazen (2018)), and using the fact that the matrix $B$ satisfies the zero-sum property (each row and each column sums to 0).
Note that the elasticity is observation dependent through $z = E^{\oplus} Y$.
For the MCI model, we have that $B = b V_Y V_Y'$, and therefore
$$W_{E^{\oplus} Y} B = b\, W_{E^{\oplus} Y} V_Y V_Y' = b\, W_{E^{\oplus} Y}.$$
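The simplification used in equation (6) can be unpacked as follows. This worked step assumes the relation $B^* = V_Y' B V_X$ between simplex and coordinate parameters (the form usually given in the cited literature; it is not derived in this paper) and uses the identity $V V' = I_D - \frac{1}{D} 1_D 1_D'$, which holds for any contrast matrix with orthonormal, zero-sum columns.

```latex
\begin{aligned}
V_Y B^* V_X' &= V_Y V_Y' \, B \, V_X V_X' \\
             &= \left(I_{D_Y} - \tfrac{1}{D_Y} 1_{D_Y} 1_{D_Y}'\right) B
                \left(I_{D_X} - \tfrac{1}{D_X} 1_{D_X} 1_{D_X}'\right) \\
             &= B \qquad \text{since } 1_{D_Y}' B = 0 \text{ and } B\, 1_{D_X} = 0 .
\end{aligned}
```

The same identity gives $W_z V_Y V_Y' = W_z$ because $W_z 1_{D_Y} = 1_{D_Y} - 1_{D_Y} z' 1_{D_Y} = 0$, which is the step used for the MCI model.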
Table 3 summarizes the different formulas for semi-elasticities and elasticities for the three
types of models as a function of parameter estimates, in the simplex or in coordinate space.
Both expressions are important to keep in mind: the expression as a function of the simplex
parameters makes it clear that these are intrinsically simplex quantities independent of any
transformation. The expression as a function of coordinate space parameters is handy for
computations.
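Table 3: (Semi-)elasticities without total

| Y-compositional model | X-compositional model | YX-compositional model |
|---|---|---|
| $\frac{\partial \log E^{\oplus} Y}{\partial X} = W_{E^{\oplus} Y}^* b^* = W_{E^{\oplus} Y}\, \mathrm{clr}(b)$ | $\frac{\partial E(\check{Y})}{\partial \log \check{X}} = V_X b^* = \mathrm{clr}(b)$ | ‘CODA’ model: $\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}} = W_{E^{\oplus} Y}^* B^* V_X' = W_{E^{\oplus} Y} B$; ‘MCI’ model: $\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}} = b\, W_{E^{\oplus} Y}$ |

Notations: $W_z = I_D - 1_D z'$, $W_z^* = W_z V_Y$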
4.3. Models including a total
The presence of the total variable has to be taken into account in the partial impact measure
computations. We consider including among the explanatory variables
•a total of Yin the Y-compositional model (model A)
•a total of Xin the X-compositional model (model B)
•a total of Xand/or a total of Yin the YX-compositional model (model C)
The right hand sides of the model equations from Table 1 are modified as follows
• model A: add $\oplus\, T(Y_t) \odot c$, where $c \in \mathcal{S}^{D_Y}$ is the parameter corresponding to the total effect of Y
• model B: add $+\, d\, T(X_t)$, where $d \in \mathbb{R}$ is the parameter corresponding to the total effect of X
• model C: add $\oplus\, T(Y_t) \odot c \oplus T(X_t) \odot d$, where $c \in \mathcal{S}^{D_Y}$ and $d \in \mathcal{S}^{D_Y}$ are the parameters corresponding to the two total effects.
In the presence of a total, as mentioned in Section 3, we need to distinguish three types of
infinitesimal variations for a compositional variable X. The three types are as follows
•Type 1: the total T(X) remains constant and we look at infinitesimal variations of the
composition X. Such variations correspond to considering derivatives in the direction of
one of the unitary vectors of an orthonormal basis of SDX. With a proper choice of basis
and of contrast matrix as in Hron, Filzmoser, and Thompson (2012), this corresponds
to an infinitesimal change in one component, along a linear path in the simplex, keeping
all but the first olr coordinate constant.
• Type 2: the composition $X$ remains constant while the total is subject to an infinitesimal variation. Such variations correspond to considering ordinary derivatives with respect to the total $T(X)$.
• Type 3: one of the components of $X$ varies together with the total $T(X)$.
In model A, the impact of additively increasing the total of Y is the same question as the impact of a non-compositional variable and therefore the formula of Table 3 can be applied with $c$ instead of $b$.
Type 1 variations of X. In model B, the impact of a type 1 variation of X with fixed total can be computed as in the X-compositional model in Table 3. The impact of a type 1 variation of X in model C can be computed as in the YX-compositional model in Table 3.
Type 2 variations of X. Type 2 variations of X correspond to ordinary derivatives with respect to the total of X.
For model B, a type 2 variation of X results in an ordinary derivative
$$\frac{\partial E Y}{\partial T} = d.$$
In model C, a type 2 variation of X can be computed as in a Y-compositional model treating the total $T(X)$ as an ordinary variable, and the formula from Table 3 can be applied with $d$ instead of $b$.
Type 3 variations of X. In this case, evaluating the effect of the variation of X or of $T(X)$ is equivalent since they are linked together, therefore one of the two formulas is enough. For type 3 variations of X, since both total and composition vary, the easiest way out is to express the dependent variable as a function of the volumes and use ordinary derivatives of the ensuing function of the volumes.
In model B, for computing the effect of a type 3 variation of X, we need to adapt equation (3) by adding an extra term taking into account the fact that the total depends upon the volumes, and we get
$$\frac{\partial E Y}{\partial \log \check{X}} = V_X b^* + \frac{\partial E Y}{\partial T} \frac{\partial T}{\partial \log \check{X}} = V_X b^* + d\, \frac{\partial T}{\partial \log \check{X}}.$$
This result shows that the derivatives of the total with respect to the volumes play a role in the final expression of this semi-elasticity (hence we get a different formula for an arithmetic or a geometric total).
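For the two totals defined in Section 2, the derivative of the total with respect to the log-volumes can be written out explicitly. The worked step below follows directly from those two definitions; the resulting component-wise formulas for model B are a sketch obtained by substitution, not expressions quoted from the paper.

```latex
\begin{aligned}
\frac{\partial T_A(\check{X})}{\partial \log \check{X}_j}
  &= \check{X}_j \,\frac{\partial}{\partial \check{X}_j} \sum_{i=1}^{D} \check{X}_i
   = \check{X}_j ,\\[4pt]
\frac{\partial T_G(\check{X})}{\partial \log \check{X}_j}
  &= \frac{\partial}{\partial \log \check{X}_j}
     \exp\!\left(\frac{1}{\sqrt{D}} \sum_{i=1}^{D} \log \check{X}_i\right)
   = \frac{T_G(\check{X})}{\sqrt{D}} ,\\[4pt]
\text{arithmetic total:}\quad
\frac{\partial E Y}{\partial \log \check{X}_j} &= (V_X b^*)_j + d\, \check{X}_j ,
\qquad
\text{geometric total:}\quad
\frac{\partial E Y}{\partial \log \check{X}_j} = (V_X b^*)_j + d\, \frac{T_G(\check{X})}{\sqrt{D}} .
\end{aligned}
```

With an arithmetic total the extra term depends on the volume of the component being changed, whereas with a geometric total it is the same for all components.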
In model C, for a type 3 variation of X, the derivative with respect to X of the first term $B \boxdot X$ is obtained as in the YX-compositional model without total and the derivative of the second term $T(X) \odot d$ is obtained as in the X-compositional model with a $T(X)$ total (equation (6)), yielding overall
$$\frac{\partial \log E^{\oplus} Y}{\partial \log \check{X}} = W_{E^{\oplus} Y} \left(B + \log(d)\, \frac{\partial T}{\partial \log \check{X}}\right).$$
Once again, the result involves the derivatives of the total with respect to the volumes.
No additional complexity is introduced if we consider models involving a total of Y as an additional dependent variable, which makes sense in the Y-compositional or YX-compositional cases. In that case, consequently, there would be no total of Y among the explanatory variables. The impact of variations of a classical (resp. compositional) explanatory variable on the compositional part of Y would be studied as in the Y-compositional model (resp. YX-compositional model) and on the total of Y as in an ordinary linear model (resp. X-compositional model).
5. Illustration
Let us give two toy examples of interpretation to illustrate our approach. We focus on
the X-compositional and the Y-compositional models since the case of the YX-compositional
model was already illustrated in Morais, Thomas-Agnan, and Simioni (2018a). Both examples
involve time series data and would justify a compositional time series model. However, since
the focus is just on illustrating the evaluation of impacts, we will ignore the time series aspect
and pretend the observations are i.i.d. The subsequent computations were done using the R
package compositions.
5.1. Economic context and automobile market: Y-compositional model
In Morais and Thomas-Agnan (2020), the impact of the socioeconomic context on the demand for new cars by segment is investigated with a data set coming from the French Renault company for market shares and from publicly available databases. The data coming from Renault has been blurred with a small noise for confidentiality reasons. The automobile market is divided into five segments, from the smallest vehicles (A segment) to the largest vehicles (E segment). The available explanatory variables are consumption expenditure, an economic sentiment indicator, Gross Fixed Capital Formation of households, Gross Domestic Product, diesel price and short term interest rate. The data is recorded monthly from 2003 to 2015 (167 observations). The model explaining the market shares of each segment by the above explanatory variables is therefore a Y-compositional model in our terminology. We use the following sequential binary partition: B versus A, C versus A and B, D versus A, B and C, and E versus A, B, C and D to construct an orthonormal basis of the simplex and an associated olr transformation. Figure 1 displays the observed and predicted segment shares over time: the compositional model captures the general tendency and smooths the jagged patterns without overfitting, but it does not capture all the variance of the data, which differs across shares. The quality of fit of this model can be assessed by the multivariate adjusted coefficient of determination, based on the proportion of metric variance explained by the model, which is equal to 0.86. Table 4 contains the average semi-elasticities of segment shares with respect to GDP.
Assuming the fitted model is correct, let us interpret for example the effect of a small increase
of GDP on the small cars (A segment) market share. From formula (2), a small additive
increase $\delta = 1$ billion euros (this amount representing 0.6% of the average monthly GDP)
results on average in a multiplicative increase of 0.0028% of the A segment market share.
Instead of focusing on average semi-elasticities, we could concentrate on a given point in time and
compute the impact on the whole share vector of such a small increase in GDP. We could
then check easily that the new shares vector is indeed in the simplex.

Figure 1: Observed (dotted line) and predicted (solid line) segment shares over time (vertical axis: market shares; horizontal axis: time, 2005 to 2015; one curve per segment A to E)

Table 4: Average semi-elasticities of segment shares with respect to GDP
Segment   $se(S_t, GDP_t)$
A          2.88e-05
B         -0.17e-05
C         -0.96e-05
D          0.99e-05
E          1.18e-05
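The simplex check mentioned above can be sketched in Python under explicit assumptions. The sketch assumes a Y-compositional model with a single classical covariate x, written in the simplex as a ⊕ x ⊙ b, so that the expected share of part j is proportional to a_j b_j^x. Under that assumption the semi-elasticity of share j is log b_j minus the share-weighted mean of the log b_m. The parameter values a and b, the point x0 and all function names are hypothetical. They are not the fitted automobile model, and formula (2) is not reproduced here.

```python
import math

def closure(v):
    """Rescale a positive vector so that it sums to one."""
    total = sum(v)
    return [value / total for value in v]

def expected_shares(a, b, x):
    """Expected shares of the assumed model: proportional to a_j * b_j**x."""
    return closure([aj * bj ** x for aj, bj in zip(a, b)])

def semi_elasticities(a, b, x):
    """d log S_j / dx = log b_j - sum_m S_m log b_m under the assumed model."""
    s = expected_shares(a, b, x)
    log_b = [math.log(bj) for bj in b]
    weighted_mean = sum(sj * lj for sj, lj in zip(s, log_b))
    return [lj - weighted_mean for lj in log_b]

# Hypothetical parameters for five segments (illustration only).
a = [0.10, 0.35, 0.30, 0.15, 0.10]
b = closure([1.02, 0.99, 0.98, 1.00, 1.01])
x0 = 10.0     # hypothetical value of the covariate
delta = 0.01  # small additive increase of the covariate

s0 = expected_shares(a, b, x0)
se = semi_elasticities(a, b, x0)

# First-order update: each share is multiplied by (1 + delta * se_j).
s_first_order = [sj * (1.0 + delta * ej) for sj, ej in zip(s0, se)]
# Exact shares of the assumed model at x0 + delta, for comparison.
s_exact = expected_shares(a, b, x0 + delta)

sum_check = sum(s_first_order)
max_gap = max(abs(p - q) for p, q in zip(s_first_order, s_exact))
```

Because the share-weighted sum of the semi-elasticities is zero, the first-order updated shares still sum to one, and for a small delta they stay close to the exact shares of the assumed model.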
5.2. French GDP and job market: X-compositional model
In this second illustration, we are interested in the impact on French GDP of the structure
(composition) and the volume (total) of the French job market in the three main sectors
of activity: Agriculture (primary), Industry (secondary), and Services (tertiary). GDP is
expressed in million euros (current prices) and total employment in thousands of people. The
data is collected quarterly from 2004 to 2018 (source:
https://data.oecd.org/emp/employment-by-activity.htm). We use the olr transformation corresponding
to the sequential binary partition: Agriculture versus Industry and Services, and Industry
versus Services. We consider the model explaining GDP as a function of total employment
and the two olr coordinates associated with the above olr transformation. It is therefore an
X-compositional model including a total, in this case the simple arithmetic total employment. The
adjusted R-squared of this model is 0.92. Table 5 reports the semi-elasticities of GDP with
respect to the three sectors at the mean value of the sector composition, corresponding to 788,
9196 and 19385 thousand employees for Agriculture, Industry and Services respectively. To
apply formula (1) in the neighborhood of the mean sector composition in volumes, we consider
a small $\delta > 0$ and a variation of $\oplus\,\delta u_j$ of $x$, where $u_j$ is the unit vector in the direction
of the component Services. This variation of $x$ is equivalent, when $\delta$ is small, to a relative
variation of $\sqrt{3/2}\,\delta$ (i.e. multiplying $x$ by $1 + \sqrt{3/2}\,\delta$). The factor $\sqrt{3/2}$ is $\sqrt{D_X/(D_X-1)}$ in
the general case, corresponding to $\log(u_j)$ in the Taylor expansion in Barceló-Vidal,
Martín-Fernández, and Mateu-Figueras (2011). Taking $\delta = 0.01\%$ results in an increase of around
$\sqrt{3/2} \times 19385 \times 0.0001 = 2450$ people of the Services employment while the ratio between
Agriculture and Industry employments remains constant, and, assuming the fit is correct, we
can see in Table 5 that the model predicts that GDP should increase by 84 million euros. The
marginal effect of the size of the job market, assuming that its composition stays the same,
is given by the parameter estimate of total employment in the model, which is equal to
26.52. When total employment increases by 1000 people, the predicted GDP increase is 26.5
million euros.
Table 5: Semi-elasticities of GDP with respect to employment sectors composition
Sector   $se(GDP, \check{EmplSect})$
AGR       -10157.26
INDU      -51706.00
SERV      841030.75
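The perturbation in the direction of Services can be sketched in Python under explicit assumptions. The sketch takes $u_j$ to be the composition whose clr vector has unit norm and points toward component j, and reads the variation as $x \oplus (\delta \odot u_j)$. Both the interpretation and the function names are assumptions made for the example. The sector volumes, the Services semi-elasticity and the coefficient of total employment are the values quoted in the text and in Table 5.

```python
import math

def closure(v):
    """Rescale a positive vector so that it sums to one."""
    total = sum(v)
    return [value / total for value in v]

def unit_direction(j, n_parts):
    """Composition whose clr vector has unit norm and points toward part j (assumed definition)."""
    scale = math.sqrt(n_parts / (n_parts - 1))
    clr = [scale * ((1.0 if i == j else 0.0) - 1.0 / n_parts) for i in range(n_parts)]
    return closure([math.exp(c) for c in clr])

def perturb_along(x, u, delta):
    """Compute x perturbed by (delta powering u): closure of x_i * u_i**delta."""
    return closure([xi * ui ** delta for xi, ui in zip(x, u)])

# Mean sector volumes in thousand employees (Agriculture, Industry, Services), from the text.
volumes = [788.0, 9196.0, 19385.0]
x = closure(volumes)
delta = 0.0001  # delta = 0.01%

u_services = unit_direction(2, 3)
x_new = perturb_along(x, u_services, delta)

# Multiplicative change of two ratios of parts after the perturbation.
agri_indu_change = (x_new[0] / x_new[1]) / (x[0] / x[1])
serv_indu_change = (x_new[2] / x_new[1]) / (x[2] / x[1])

# First-order predicted GDP changes, in million euros.
se_services = 841030.75                     # Table 5, Services
gdp_change_services = se_services * delta   # about 84 million euros
total_coefficient = 26.52                   # estimate for total employment (thousands of people)
gdp_change_total = total_coefficient * 1.0  # effect of 1000 additional employees
```

Under these assumptions the ratio between Agriculture and Industry is unchanged, while the ratio of Services to Industry is multiplied by $\exp(\sqrt{3/2}\,\delta)$, which is close to $1 + \sqrt{3/2}\,\delta$ for small $\delta$.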
Note that using a base 2 logarithm as in Muller, Hron, Fiserova, Smahaj, Cakirpaloglu, and
Vancakova (2018) is not useful in our approach and would rather introduce an unnecessary
constant.
6. Conclusion
This contribution highlights the fact that elasticities or semi-elasticities are well-adapted to
interpret the impacts of explanatory variables in all types of compositional regression models.
It also links these elasticities or semi-elasticities to the simplicial derivatives of the expected
response with respect to the considered explanatory variable. The models may contain
compositional variables on the right hand side and/or on the left hand side of the regression equation,
and may or may not contain total variables (relative to the dependent or the explanatory
variables). Further work should be done on confidence intervals for (semi-)elasticities, which
can be computed by the Delta method, or simply using a bootstrap approach. An extension
to time series compositional models as well as to spatial compositional models involves more
complex elasticity computations which take into account the time-lag and spatial-lag
operators, as can be seen in Thomas-Agnan, Laurent, Ruiz-Gazen, Nguyen, Chakir, and Lungarska
(2020) for the spatial case.
An alternative but more complex tool used in Wang, Shangguan, Wu, and Guan (2013)
and in Morais, Thomas-Agnan, and Simioni (2018a) is the elasticity of a ratio of shares. In
the framework of an MCI model, it would directly correspond to a parameter of the model,
which is attractive, but it relates to the rate of change of a ratio of components and not of a single
component, and is therefore more difficult to explain to a non-specialist audience.
Acknowledgements
We acknowledge funding from the French National Research Agency (ANR) under the In-
vestments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.
References
Aitchison J (1986). The Statistical Analysis of Compositional Data. Monographs on statistics
and applied probability. Chapman and Hall, Reprinted in 2003 with additional material by
Blackburn Press.
Barceló-Vidal C, Martín-Fernández JA, Mateu-Figueras G (2011). “Compositional Differential
Calculus on the Simplex.” Compositional Data Analysis: Theory and Applications. John
Wiley & Sons.
Bui TTT, Loubes JM, Risser L, Balaresque P (2018). “Distribution Regression Model with a
Reproducing Kernel Hilbert Space Approach.” arXiv preprint arXiv:1806.10493.
Chen J, Zhang X, Li S (2017). “Multiple Linear Regression with Compositional Response and
Covariates.” Journal of Applied Statistics, 44(12), 2270–2285.
Coenders G, Ferrer-Rosell B, Mateu-Figueras G, Pawlowsky-Glahn V (2015). “MANOVA of
Compositional Data with a Total.” CODAWORK2015.
Coenders G, Martín-Fernández JA, Ferrer-Rosell B (2017). “When Relative and Absolute
Information Matter: Compositional Predictor with a Total in Generalized Linear Models.”
Statistical Modelling, 17(6), 494–512.
Coenders G, Pawlowsky-Glahn V (2020). “On Interpretations of Tests and Effect Sizes in
Regression Models with a Compositional Predictor.” SORT, 44(1).
Combettes PL, Muller CL (2019). “Regression Models for Compositional Data: General Log-
contrast Formulations, Proximal Optimization, and Microbiome Data Applications.” URL
https://arxiv.org/abs/1903.01050.
Egozcue JJ, Daunis-I-Estadella J, Pawlowsky-Glahn V, Hron K, Filzmoser P (2012). “Sim-
plicial Regression. The Normal Model.” Journal of Applied Probability and Statistics.
Egozcue JJ, Jarauta-Bragulat E, Díaz-Barrero J (2011). “Calculus of Simplex-valued
Functions.” Compositional Data Analysis: Theory and Applications.
Filzmoser P, Hron K, Templ M (2018). Applied Compositional Data Analysis, With Worked
Examples in R. Springer series in Statistics. Springer.
Hron K, Filzmoser P, Thompson K (2012). “Linear Regression with Compositional
Explanatory Variables.” Journal of Applied Statistics, 39(5), 1115–1128.
Kynclova P, Filzmoser P, Hron K (2015). “Modeling Compositional Time Series with Vector
Autoregressive Models.” Journal of Forecasting, 34(4), 303–314.
Martín-Fernández JA (2019). “Comments on: Compositional Data: The Sample Space and Its
Structure.” TEST, 28, 653–657. URL https://doi.org/10.1007/s11749-019-00672-4.
Morais J (2017). Impact of Media Investments on Brands’ Market Shares: A Compositional
Data Analysis Approach. Ph.D. thesis, Toulouse School of Economics (TSE).
Morais J, Thomas-Agnan C (2020). “Impact of the Economic Context on the Automobile
Market Segment Shares: A Compositional Approach.” Preprint.
Morais J, Thomas-Agnan C, Simioni M (2017). “Impact of Advertising on Brand’s Market-
shares in the Automobile Market: A Multi-channel Attraction Model with Competition
and Carryover Effects.” URL https://hal.archives-ouvertes.fr/hal-01666853/.
Morais J, Thomas-Agnan C, Simioni M (2018a). “Interpretation of Explanatory Variables
Impacts in Compositional Regression Models.” Austrian Journal of Statistics, 47(5), 1–25.
Morais J, Thomas-Agnan C, Simioni M (2018b). “Using Compositional and Dirichlet Models
for Market Share Regression.” Journal of Applied Statistics, 45(9), 1670–1689.
Muller I, Hron K, Fiserova E, Smahaj J, Cakirpaloglu P, Vancakova J (2018). “Interpretation
of Compositional Regression with Application to Time Budget Analysis.” Austrian Journal
of Statistics, 47(2), 3–19. doi:10.17713/ajs.v47i2.652.
URL https://www.ajs.or.at/index.php/ajs/article/view/vol47-2-1.
Nakanishi M, Cooper LG (1982). “Simplified Estimation Procedures for MCI Models.”
Marketing Science, 1(3), 314–322. ISSN 07322399.
URL http://www.jstor.org/stable/183931.
Nguyen THA, Laurent T, Thomas-Agnan C, Ruiz-Gazen A (2018). “Analyzing the Impacts
of Socio-economic Factors on French Departmental Elections with CODA Methods.” TSE
Working paper 18-961.
Pawlowsky-Glahn V, Buccianti A (2011). Compositional Data Analysis: Theory and Appli-
cations. John Wiley & Sons.
Pawlowsky-Glahn V, Egozcue JJ, Lovell D (2015a). “Tools for Compositional Data with a
Total.” Statistical Modelling, 15(2), 175–190.
Pawlowsky-Glahn V, Egozcue JJ, Tolosana-Delgado R (2015b). Modeling and Analysis of
Compositional Data. John Wiley & Sons.
Sun Z, Xu W, Cong X, Chen K (2018). “Log-Contrast Regression with Functional Composi-
tional Predictors: Linking Preterm Infant’s Gut Microbiome Trajectories in Early Postnatal
Period to Neurobehavioral Outcome.” arXiv preprint arXiv:1808.02403.
Thomas-Agnan C, Laurent T, Ruiz-Gazen A, Nguyen T, Chakir R, Lungarska A (2020).
“Spatial Simultaneous Autoregressive Models for Compositional Data: Application to Land
Use.” TSE Working Paper, 20(1098).
Trinh HT, Morais J, Thomas-Agnan C, Simioni M (2018). “Relations between Socio-economic
Factors and Nutritional Diet in Vietnam from 2004 to 2014: New Insights Using Composi-
tional Data Analysis.” Statistical Methods in Medical Research, p. 0962280218770223.
Van Den Boogaart KG, Tolosana-Delgado R (2013). Analysing Compositional Data with R.
Springer.
Wang H, Shangguan L, Wu J, Guan R (2013). “Multiple Linear Regression Modeling for
Compositional Data.” Neurocomputing, 122, 490–500.
Affiliation:
Christine Thomas-Agnan
Toulouse School of Economics
Esplanade de l’université
31080 Toulouse Cedex 06 France
E-mail: christine.thomas@tse-fr.eu
URL: https://www.tse-fr.eu/fr/people/christine-thomas-agnan
Austrian Journal of Statistics http://www.ajs.or.at/
published by the Austrian Society of Statistics http://www.osg.or.at/
Volume 50 Submitted: 2019-12-02
January 2021 Accepted: 2020-07-18