Available via license: CC BY 4.0

Advances in Data Analysis and Classiﬁcation

https://doi.org/10.1007/s11634-021-00469-0

REGULAR ARTICLE

Quantile composite-based path modeling: algorithms,

properties and applications

Pasquale Dolce¹ · Cristina Davino² · Domenico Vistocco³

Received: 6 September 2020 / Revised: 5 September 2021 / Accepted: 22 September 2021

© The Author(s) 2021

Abstract

Composite-based path modeling aims to study the relationships among a set of

constructs, that is a representation of theoretical concepts. Such constructs are opera-

tionalized as composites (i.e. linear combinations of observed or manifest variables).

The traditional partial least squares approach to composite-based path modeling

focuses on the conditional means of the response distributions, being based on ordinary

least squares regressions. There are several cases in which restricting the analysis to
the mean may fail to reveal interesting effects at other locations of the outcome
distributions: when response variables are highly skewed, when distributions have heavy
tails and the analysis is also concerned with the tails, when the errors are
heteroscedastic, and when distributions are characterized by outliers and other extreme
data. In such

cases, the quantile approach to path modeling is a valuable tool to complement the

traditional approach, analyzing the entire distribution of outcome variables. Previous

research has already shown the beneﬁts of Quantile Composite-based Path Model-

ing but the methodological properties of the method have never been investigated.

This paper offers a complete description of Quantile Composite-based Path Modeling,
illustrating in detail the method, the algorithms, the partial optimization criteria

along with the machinery for validating and assessing the models. The asymptotic

properties of the method are investigated through a simulation study. Moreover, an

application on chronic kidney disease in diabetic patients is used to provide guidelines
for the interpretation of results and to show the potentialities of the method to detect
heterogeneity in the variable relationships.

✉ Cristina Davino
cristina.davino@unina.it

Pasquale Dolce
pasquale.dolce@unina.it

Domenico Vistocco
domenico.vistocco@unina.it

1 Department of Public Health, University of Naples Federico II, Naples, Italy
2 Department of Economics and Statistics, University of Naples Federico II, Naples, Italy
3 Department of Political Science, University of Naples Federico II, Naples, Italy

Keywords Composite-based path modeling · OLS regression · Quantile regression

Mathematics Subject Classification 62-07: (Statistics) Data analysis · 62H99:
(Statistics) Multivariate analysis · 62G08: (Statistics) Nonparametric
regression · 62P10: (Statistics) Applications to biology and medical sciences

1 Introduction

There are several approaches to studying the relationships among different constructs
and between each construct and its corresponding observed or manifest variables (MVs).
In the most common models, each block of MVs measures a construct, and prior knowledge
is used to define the theoretical model. This type of model has two main sets of
parameters: the path coefficients and the loadings. Path coefficients represent the
relationships between constructs, while loadings measure the relationship between each
construct and its corresponding MVs.

Covariance structure analysis (Jöreskog 1978) and Partial Least Squares Path Modeling
(PLS-PM) (Esposito Vinzi et al. 2010; Hair et al. 2017) are the two mainstream
approaches. Although they are commonly regarded as alternatives, they belong to two
different families of statistical methods. Covariance structure analysis, essentially
used in factor-based Structural Equation Modeling (SEM), exploits the covariance matrix
of the MVs to estimate the model parameters. PLS-PM instead summarizes each block of
MVs in a component, or composite, namely an exact linear combination of the MVs,
focusing on the explained variance of the MVs (Wold 1982, 1985). Each composite is a
proxy of the construct associated with the corresponding block. For these reasons,
PLS-PM is commonly referred to as a component-based, composite-based or variance-based
approach. Herman Wold, who proposed PLS-PM, referred to this approach as “Soft
Modeling”. The name indicates that the method requires “soft” distributional
assumptions, in contrast to the estimation method for factor-based models, which
requires strong assumptions on the error distributions (hence the name “hard
modeling”) (Wold 1975, 1982; Tenenhaus et al. 2005; Chin 1998).

PLS-PM exploits least squares regression to estimate the model coefficients and
therefore focuses on the conditional mean of the response variables. In many cases, the
analysis of the average alone may produce an incomplete view of the complex structure
of the relationships among variables. When the least squares regression residuals are
heteroscedastic, and/or when response variables are highly skewed, the study of the
conditional distribution at locations other than the mean can complement the classical
approach and provide a richer picture of the investigated phenomenon. Following this
idea, Quantile Composite-based Path Modeling (QC-PM), proposed by Davino and Esposito
Vinzi (2016), exploits quantile regression (Koenker and Bassett 1978) to look beyond
the average. It is a valuable tool to study the relationships among variables, and to
model location, scale and shape of the responses. QC-PM can be used to complement
PLS-PM to investigate whether the effects of explanatory constructs change over the
entire distribution of the response constructs. It is worth emphasizing that PLS-PM and
QC-PM are not competing methods, and therefore their comparison in terms of performance
is of little interest. The two methods have different objectives: PLS-PM focuses on the
conditional means of the dependent variables, providing an instant summary, while QC-PM
explores relationships among variables beyond the conditional mean. Finding that
estimated coefficients vary across the conditional quantiles does not imply that the
PLS-PM results are invalid.

Previous research (Davino et al. 2016, 2017, 2018, 2020; Davino and Esposito

Vinzi 2016) was oriented to show the relative advantages of QC-PM when the interest

is in the effects of explanatory constructs on the entire distribution of the response

constructs, and to assist in the interpretation of results and in their use combined with

PLS-PM results.

The aim of this paper is to provide a complete and organic description of QC-PM, since
its methodological properties have never been investigated. This goal is pursued through
several innovative contributions: a clear and detailed explanation of the method,
including the cases of one block and two blocks; an improvement of the method that
handles the measurement invariance issue; and an application with artificial data that
highlights the potential of the method. More specifically, in order to clarify the
characteristics of the method, a step-by-step description of the algorithms, the partial
optimization criteria, and the formalization of the models are provided. A relevant part
of the present work is devoted to studying the properties of QC-PM.

An analytical discussion of the properties of the involved estimators is a daunting

task, due to the complexity of composite-based path modeling. This is a fertile ground

for the use of simulation studies, which provide information on the performance of

the method in terms of bias, efﬁciency and robustness of the estimates (Paxton et al.

2001). In particular, we exploit a Monte Carlo simulation design generating data

from a composite-based population and considering a set of different scenarios. This

allows us to assess the effects of several drivers: correlation within the blocks of MVs,

correlation among constructs, effect of heavy-tails and skewness of MV distributions,

effect of sample size.

A discussion on the asymptotic properties of the method is offered, along with

empirical evidence on the behaviour of the estimators in terms of bias and efﬁciency.

An innovative contribution to the estimation of the outer model is also provided to

fulﬁl the measurement invariance required to compare path coefﬁcients estimates over

quantiles.

Moreover, an application on Chronic Kidney Disease (CKD) in diabetic patients is
provided to show the benefit of using QC-PM as a supplement to PLS-PM. In particular,
we apply QC-PM to data already used in a study that proposed a quantile approach to
factor-based SEM (Wang et al. 2016). Because the original data were not available, they
have been artificially derived by mimicking the model, the relations among variables and
the estimates obtained in the original study (Wang et al. 2016). As the artificial data
are generated from a scenario where relationships among variables change with
quantiles, the application highlights the potential of the method in detecting
heterogeneity in the variable relationships and stresses its complementary role with
respect to the traditional methods for composite-based path modeling.

The paper is organized as follows: Sect. 2 describes QC-PM in detail, formalizing the
estimation process starting from the simplest case of one block of MVs and moving up to
the general path model for multi-block data. Section 3 illustrates the assessment
measures of QC-PM in terms of goodness of fit and statistical significance of the
estimated coefficients. Section 4 presents the simulation design and the main results,
while the applicative potential of QC-PM, along with guidelines for the interpretation
of results, is illustrated through the study on the artificial data set concerning CKD
in diabetic patients in Sect. 5. Finally, a summary of the proposal and an outline for
future developments are given in Sect. 6.

2 QC-PM: quantile composite-based path modeling

QC-PM is strongly related to PLS-PM. Therefore, the modeling and estimation procedures
have much in common, and the same holds for their properties. The theoretical

foundations are framed in the iterative algorithm proposed by Wold (1966a,b), the

Nonlinear estimation by Iterative PArtial Least Squares (NIPALS) algorithm, an alter-

native algorithm for implementing principal component analysis. More broadly, Partial

Least Squares (PLS) refers to a set of iterative alternating Ordinary Least Squares

(OLS) algorithms, extending the NIPALS algorithm to implement a large number of

multivariate statistical techniques (Esposito Vinzi and Russolillo 2013), depending on

the involved MVs. For example, in case of one block of MVs, PLS provides principal

component analysis. In case of two blocks of MVs, multivariate regularized regression

can be obtained (PLS regression). In case of multi-block data, PLS algorithm produces

PLS-PM (Lohmöller 1989).

The numerical solutions of all these methods are obtained through an iterative

algorithm, which is the ﬁrst stage of the procedure. The basic idea of the PLS iterative

algorithm, which computes the weights used to deﬁne the composites, is to partition

the set of model parameters to be estimated in subsets. At each step of the algorithm,

one subset of parameters is considered known and held ﬁxed, while the other subset

is estimated (Lohmöller 1989). A least squares criterion is adopted to estimate the

parameters in each step. The name PLS comes from the use of OLS to solve the least
squares criterion at each step. For example, in the case of multi-block data, namely
PLS-PM, the procedure comes down to a set of simple and multiple OLS regressions and
Pearson correlation computations.

The QC-PM algorithm follows exactly the same steps as the PLS-PM algorithm, but
replaces simple and multiple OLS regressions with their Quantile Regression (QR)
counterparts (Koenker 2005). Likewise, the classical Pearson correlation is replaced
with the quantile correlation (Li et al. 2014). Like PLS-PM, QC-PM is based on a
two-stage procedure. The first stage aims at computing the outer weights by an
iterative procedure (these weights are then used for computing the composites). In the
second stage, model parameters (loadings and path coefficients) are estimated through
regression analysis using the composites. As for PLS-PM, partial criteria are optimized
at each step.

The use of quantile-based tools in all phases of the algorithm shifts the focus
to the entire conditional distributions of the involved response variables, making it
possible to estimate partial conditional quantiles. Through the use of different conditional

quantiles, the whole response distribution can be inspected. Therefore QC-PM is a

valuable complement to PLS-PM, as much as quantiles are a complement to the

average. For the sake of illustration, next subsections present QC-PM for the simplest

model (one block of MVs), for two blocks of MVs, and for multi-block data (the

general model), respectively.

The presentation of the QC-PM algorithm follows the same steps and the same approach
generally used to present the PLS-PM algorithm (Lohmöller 1989; Tenenhaus et al. 2005;
Esposito Vinzi and Russolillo 2013). A basic knowledge of QR is assumed. The Appendix
provides a basic introduction to the idea and goals of QR. For a more detailed
description of QR, please refer to Koenker and Bassett (1978), Davino et al. (2013) and
Furno and Vistocco (2018).

2.1 Quantile path modeling for one block of manifest variables

The simplest model involves one block of MVs and a construct. The relationships
between them are depicted in Fig. 1. MVs are denoted by $X = \{x_{ip}\}$ and represented
through rectangles. Recall that $i = 1, \ldots, n$ refers to the observations, $n$
denoting their number, and $p = 1, \ldots, P$ refers to the MVs, $P$ denoting the number
of MVs in the block. The corresponding construct is labeled $\xi = \{\xi_i\}$ and is
placed in an oval or circle. This diagram is called a path diagram.

The relationships among MVs and construct in the path diagram can be translated
into a system of simultaneous equations. In particular, for each considered quantile
$\theta \in (0, 1)$, the link between each MV $x_p$ and $\xi$ is defined through the
following equation:

$$ x_{ip} = \alpha_p(\theta) + \lambda_p(\theta)\,\xi_i + \epsilon_{ip}(\theta), \tag{1} $$

Fig. 1 A path diagram for a hypothetical one-block model

where $\alpha_p$ is a location parameter, $\lambda_p$ is the loading coefficient,
capturing the effect of $\xi$ on $x_p$, and $\epsilon_p = \{\epsilon_{ip}\}$ is the
error term vector. The only assumption is that the generic $\theta$th conditional
quantile of $x_p$ can be expressed as:

$$ Q_\theta(x_p \mid \xi) = \alpha_p(\theta) + \lambda_p(\theta)\,\xi \tag{2} $$

which implies that the $\theta$th quantile of the error term $\epsilon_p(\theta)$ is
equal to zero and $\epsilon_p(\theta)$ is independent of $\xi$. No assumptions on the
error distribution are required.

The model captures the common variation among the MVs in $X$ that depends on the
construct $\xi$. The use of QR makes it possible to capture the effect of $\xi$ on the
whole conditional distribution of each $x_p$, an effect that might not be the same at
different locations, i.e. quantiles. In fact, the construct could be a weak factor at
some locations of the MV distributions, while exerting a stronger effect at other
conditional quantiles. In other cases the effect could be almost uniform along the
entire distribution. The quantile model offers the opportunity to investigate these
different situations.
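The idea that an effect may differ across quantiles can be illustrated with a small numerical sketch. The code below is not the QC-PM implementation: it fits a simple quantile regression by exhaustive search over the lines through pairs of observations, which is exact but only practical for tiny samples. The helper names `check_loss` and `simple_qr` and the toy data are assumptions introduced for illustration.

```python
from itertools import combinations


def check_loss(r, theta):
    """Check function: theta * r for r > 0, (theta - 1) * r otherwise."""
    return theta * r if r > 0 else (theta - 1.0) * r


def simple_qr(x, y, theta):
    """Return (intercept, slope) minimizing the summed check loss.

    There is always an optimal line passing through at least two
    observations, so every pair of points with distinct x is tried.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Toy heteroscedastic data (hypothetical): the spread of y grows with x.
x = list(range(1, 13))
noise = [-1, 0, 1] * 4
y = [xi + xi * ei for xi, ei in zip(x, noise)]

# Slope of the fitted conditional quantile line at three quantiles.
slopes = {theta: simple_qr(x, y, theta)[1] for theta in (0.1, 0.5, 0.9)}
```

With these toy data the fitted slope is 0 at θ = 0.1, 1 at θ = 0.5 and 2 at θ = 0.9, so the regressor has no effect on the lower tail and a strong effect on the upper tail, a pattern that a single mean regression would summarize in one number.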

Parameters are estimated through the classical PLS algorithm for one block of
variables, replacing OLS regression with QR at each step of the procedure. This
corresponds to iteratively optimizing a quantile partial criterion. The algorithm
consists of three main steps. In the initialization step, arbitrary values are set for
the outer weights $\hat{w}_p$, namely the coefficients used to compute the composite
$\hat{\xi}$, which expresses the construct as a linear combination of the MVs. Starting
from such initialization values, a first approximation of the composite is computed as
a linear combination of the MVs in $X$, $\hat{\xi}^{(0)} = \sum_{p=1}^{P} \hat{w}_p^{(0)} x_p$.
Then, a loop starts and at each $s$th iteration ($s = 1, 2, \ldots$), each MV $x_p$ is
regressed on the composite, minimizing the following quantile loss function:

$$ \hat{w}_p^{(s)}(\theta) = \arg\min_{w_p(\theta)} \sum_{i=1}^{n} \rho_\theta\!\left( x_{ip} - \alpha_p(\theta) - \hat{\xi}_i^{(s-1)} w_p(\theta) \right), \tag{3} $$

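where $\rho_\theta(\cdot)$ is the check function, which asymmetrically weights positive
and negative residuals, namely:

$$ \rho_\theta(r) = \begin{cases} \theta\, r & \text{if } r > 0 \\ (\theta - 1)\, r & \text{if } r \le 0. \end{cases} \tag{4} $$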
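As a quick numerical check of Eq. (4), the sketch below evaluates the check function and verifies that the constant minimizing the total check loss of a sample is one of its order statistics, i.e. a sample quantile. The function names and the sample are hypothetical and serve only as an illustration.

```python
def check_loss(r, theta):
    """Check function of Eq. (4)."""
    return theta * r if r > 0 else (theta - 1.0) * r


def total_loss(values, c, theta):
    """Sum of check losses of the residuals values[i] - c."""
    return sum(check_loss(v - c, theta) for v in values)


def quantile_by_check_loss(values, theta):
    """Observation minimizing the total check loss.

    The objective is piecewise linear in c, so a minimizer can always
    be found among the observed values.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, theta))


sample = [1, 2, 3, 4, 10]
median = quantile_by_check_loss(sample, 0.5)
lower = quantile_by_check_loss(sample, 0.25)
upper = quantile_by_check_loss(sample, 0.9)
```

For θ = 0.5 positive and negative residuals receive the same weight and the minimizer is the median; for other values of θ the asymmetric weighting moves the minimizer towards the corresponding tail.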

The outer weights are then iteratively computed until convergence through simple QR
models, where each MV is the response variable and the composite is the regressor.
In each step the estimated weights are normalized and used to update $\hat{\xi}$,
obtained as a linear combination of the response MVs $x_p$.

By weighting the MVs considering their quantile covariation with the construct, for
each quantile $\theta$ the proposed algorithm chooses a linear combination of MVs that
is a consistent quantile composite. Once convergence is reached or the maximum number
of iterations is achieved, loadings are estimated by means of simple QRs over the
corresponding scores. The pseudo-code of the QC-PM iterative algorithm for the case
of one block of MVs is provided in Algorithm 1.

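Algorithm 1 The QC-PM algorithm for the case of one block of MVs

for each quantile $\theta$ do
  STEP 1: Initialization
  $s \leftarrow 0$ (iteration counter)
  Choose arbitrary outer weights $\hat{w}_p^{(s)}(\theta)$ $(p = 1, \ldots, P)$
  Compute $\hat{\xi}^{(s)}(\theta) = \sum_{p=1}^{P} \hat{w}_p^{(s)}(\theta)\, x_p$
  STEP 2: Iteration
  repeat
    $s \leftarrow s + 1$ (increment the iteration counter)
    for all manifest variables $x_p$ do
      Compute $\hat{w}_p^{(s)}(\theta)$ solving the quantile regressions:
      $x_{ip} = \alpha_p^{(s)}(\theta) + w_p^{(s)}(\theta)\, \hat{\xi}_i^{(s-1)} + \epsilon_{ip}(\theta)$ $(i = 1, \ldots, n;\ p = 1, \ldots, P)$
    end for
    Normalize the weights:
    $\hat{w}_p^{(s)}(\theta) = \hat{w}_p^{(s)}(\theta) \,/\, \lVert X \hat{w}^{(s)}(\theta) \rVert$, where $\hat{w}^{(s)}(\theta) = \{\hat{w}_p^{(s)}(\theta)\}$
    Compute $\hat{\xi}^{(s)}(\theta)$:
    $\hat{\xi}^{(s)}(\theta) = \sum_{p=1}^{P} \hat{w}_p^{(s)}(\theta)\, x_p$
  until $\hat{w}^{(s)}(\theta) \approx \hat{w}^{(s-1)}(\theta)$
  STEP 3: Estimation
  for all manifest variables $x_p$ do
    Estimate $\hat{\lambda}_p(\theta)$ solving the quantile regressions:
    $x_{ip} = \alpha_p(\theta) + \lambda_p(\theta)\, \hat{\xi}_i^{(s)}(\theta) + \epsilon_{ip}(\theta)$ $(i = 1, \ldots, n;\ p = 1, \ldots, P)$
  end for
end for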
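The steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the simple quantile regressions are solved by exhaustive search over pairs of observations (suitable only for tiny samples), the starting weights are all equal to one, the weights are rescaled so that the composite has unit variance (following the normalization $\operatorname{var}[X_k w_k(\theta)] = 1$ stated for the multi-block case), and the names `simple_qr`, `composite` and `qcpm_one_block` are hypothetical.

```python
from itertools import combinations
from statistics import pstdev


def check_loss(r, theta):
    return theta * r if r > 0 else (theta - 1.0) * r


def simple_qr(x, y, theta):
    """(intercept, slope) of the quantile regression of y on x,
    found by trying every line through two observations."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, theta) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def composite(X, w):
    """Linear combination of the columns of X with weights w."""
    return [sum(wp * v for wp, v in zip(w, row)) for row in X]


def qcpm_one_block(X, theta, tol=1e-8, max_iter=100):
    P = len(X[0])
    cols = [[row[p] for row in X] for p in range(P)]
    w = [1.0] * P                      # STEP 1: arbitrary starting weights
    xi = composite(X, w)
    for _ in range(max_iter):          # STEP 2: iteration
        # Regress each MV on the current composite; the slope is the weight.
        w_new = [simple_qr(xi, cols[p], theta)[1] for p in range(P)]
        # Assumed normalization: unit variance of the composite.
        scale = pstdev(composite(X, w_new))
        w_new = [wp / scale for wp in w_new]
        xi = composite(X, w_new)
        converged = max(abs(a - b) for a, b in zip(w_new, w)) < tol
        w = w_new
        if converged:
            break
    # STEP 3: loadings from simple QRs of each MV on the final composite.
    loadings = [simple_qr(xi, cols[p], theta)[1] for p in range(P)]
    return w, loadings, xi


# Toy block (hypothetical): three MVs that are exact multiples of one factor.
f = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
X = [[fi, 2 * fi, 3 * fi] for fi in f]
weights, loadings, scores = qcpm_one_block(X, theta=0.5)
```

In this noise-free example the weights converge after two iterations and are proportional to 1, 2 and 3, mirroring how strongly each MV varies with the common factor. With real data the exhaustive-search solver should be replaced by a proper QR routine.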

2.2 Quantile path modeling for two blocks of manifest variables

Figure 2 depicts the path diagram for a hypothetical two-block model. In such a
case, let $X = \{x_{ip}\}$ and $Y = \{y_{ij}\}$ denote the two blocks of MVs. In
particular, the former consists of the explanatory MVs and the latter of the response
MVs. We denote by $P$ the number of explanatory MVs, as in the previous subsection, and
by $J$ the number of response MVs, $y_j$ being the generic response MV. Moreover,
$\xi = \{\xi_i\}$ is the construct representing the explanatory block, and
$\eta = \{\eta_i\}$ the construct representing the dependent block. The general model
consists of two sub-models: the inner model and the outer model. The inner model refers
to the relationships between the constructs, the outer model to those between each
construct and its block of MVs.

Referring to the outer model, for each quantile $\theta \in (0, 1)$, the MVs $x_p$ in
the explanatory block and the MVs $y_j$ in the dependent block are related to their
corresponding constructs through the following system of bilinear equations:

Fig. 2 A path diagram for a hypothetical two-block model

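$$ x_{ip} = \alpha_{x_p}(\theta) + \lambda_{x_p}(\theta)\,\xi_i + \epsilon_{ip}(\theta), \tag{5} $$

$$ y_{ij} = \alpha_{y_j}(\theta) + \lambda_{y_j}(\theta)\,\eta_i + \omega_{ij}(\theta), \tag{6} $$

where $\alpha_{x_p}$ and $\alpha_{y_j}$ are location parameters, $\lambda_{x_p}$ is the
loading coefficient capturing the effect of $\xi$ on $x_p$, $\lambda_{y_j}$ is the
loading coefficient capturing the effect of $\eta$ on $y_j$, while
$\epsilon = \{\epsilon_{ip}\}$ and $\omega = \{\omega_{ij}\}$ are the error terms. The
usual assumptions on the error terms already mentioned above are required.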

The inner model specifies the dependence relationship between the two constructs.
The dependent construct $\eta$ is linked to the explanatory construct $\xi$ by the
following model:

$$ \eta_i(\theta) = \beta_0(\theta) + \beta_1(\theta)\,\xi_i + \zeta_i(\theta), \tag{7} $$

where $\beta_1$ is the so-called path coefficient capturing the effect of $\xi$ on the
dependent construct $\eta$, and $\zeta = \{\zeta_i\}$ is the inner error variable.

The procedure for the estimation of the model parameters requires a multi-step

algorithm and follows the same structure of the PLS algorithm for two blocks of MVs,

deﬁned for example in Esposito Vinzi and Russolillo (2013), where partial criteria are

optimized iteratively. The pseudo code of the QC-PM algorithm for the case of two

blocks of MVs is detailed in Algorithm 2.

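Algorithm 2 The QC-PM algorithm for the case of two blocks of MVs

for each quantile $\theta$ do
  STEP 1: Initialization
  $s \leftarrow 0$ (iteration counter)
  Choose arbitrary outer weights $\hat{w}_{x_p}^{(s)}(\theta)$ $(p = 1, \ldots, P)$
  Compute $\hat{\xi}^{(s)}(\theta) = \sum_{p=1}^{P} \hat{w}_{x_p}^{(s)}(\theta)\, x_p$
  STEP 2: Iteration
  repeat
    $s \leftarrow s + 1$ (increment the iteration counter)
    Dependent block:
    for all manifest variables $y_j$ of the dependent block do
      Compute $\hat{w}_{y_j}^{(s)}(\theta)$ solving the quantile regressions:
      $y_{ij} = \alpha_{y_j}^{(s)}(\theta) + w_{y_j}^{(s)}(\theta)\, \hat{\xi}_i^{(s-1)} + \omega_{ij}(\theta)$ $(i = 1, \ldots, n;\ j = 1, \ldots, J)$
    end for
    Normalize the weights:
    $\hat{w}_{y_j}^{(s)}(\theta) = \hat{w}_{y_j}^{(s)}(\theta) \,/\, \lVert Y \hat{w}_y^{(s)}(\theta) \rVert$, where $\hat{w}_y^{(s)}(\theta) = \{\hat{w}_{y_j}^{(s)}(\theta)\}$
    Compute $\hat{\eta}^{(s)}(\theta)$:
    $\hat{\eta}^{(s)}(\theta) = \sum_{j=1}^{J} \hat{w}_{y_j}^{(s)}(\theta)\, y_j$
    Explanatory block:
    for all manifest variables $x_p$ of the explanatory block do
      Compute $\hat{w}_{x_p}^{(s)}(\theta)$ solving the quantile regressions:
      $x_{ip} = \alpha_{x_p}^{(s)}(\theta) + w_{x_p}^{(s)}(\theta)\, \hat{\eta}_i^{(s)}(\theta) + \epsilon_{ip}(\theta)$ $(i = 1, \ldots, n;\ p = 1, \ldots, P)$
    end for
    Normalize the weights:
    $\hat{w}_{x_p}^{(s)}(\theta) = \hat{w}_{x_p}^{(s)}(\theta) \,/\, \lVert X \hat{w}_x^{(s)}(\theta) \rVert$, where $\hat{w}_x^{(s)}(\theta) = \{\hat{w}_{x_p}^{(s)}(\theta)\}$
    Compute $\hat{\xi}^{(s)}(\theta)$:
    $\hat{\xi}^{(s)}(\theta) = \sum_{p=1}^{P} \hat{w}_{x_p}^{(s)}(\theta)\, x_p$
  until $\left(\hat{w}_x^{(s)}(\theta), \hat{w}_y^{(s)}(\theta)\right) \approx \left(\hat{w}_x^{(s-1)}(\theta), \hat{w}_y^{(s-1)}(\theta)\right)$
  STEP 3: Estimation
  for all manifest variables $x_p$ do (explanatory block)
    Estimate $\hat{\lambda}_{x_p}(\theta)$ solving the quantile regressions:
    $x_{ip} = \alpha_{x_p}(\theta) + \lambda_{x_p}(\theta)\, \hat{\xi}_i^{(s)}(\theta) + \epsilon_{ip}(\theta)$ $(i = 1, \ldots, n;\ p = 1, \ldots, P)$
  end for
  for all manifest variables $y_j$ do (dependent block)
    Estimate $\hat{\lambda}_{y_j}(\theta)$ solving the quantile regressions:
    $y_{ij} = \alpha_{y_j}(\theta) + \lambda_{y_j}(\theta)\, \hat{\eta}_i^{(s)}(\theta) + \omega_{ij}(\theta)$ $(i = 1, \ldots, n;\ j = 1, \ldots, J)$
  end for
  Estimate the path coefficient $\beta_1(\theta)$ solving the quantile regression:
  $\hat{\eta}_i(\theta) = \beta_0(\theta) + \beta_1(\theta)\, \hat{\xi}_i(\theta) + \zeta_i(\theta)$
end for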

In the initialization step of the algorithm, arbitrary values are set for the outer
weights $\hat{w}_{x_p}$ to compute a first approximation of the composite as a linear
combination of the MVs in $X$, $\hat{\xi}^{(0)} = \sum_{p=1}^{P} \hat{w}_{x_p}^{(0)} x_p$.
Then, the iterative step of the algorithm proceeds over two phases. At each $s$th
iteration ($s = 1, 2, \ldots$), the response MVs $y_j$ are regressed on the
approximation of the composite $\hat{\xi}^{(s-1)}$, minimizing the following quantile
loss function:

$$ \hat{w}_{y_j}^{(s)}(\theta) = \arg\min_{w_{y_j}(\theta)} \sum_{i=1}^{n} \rho_\theta\!\left( y_{ij} - \alpha_{y_j}(\theta) - \hat{\xi}_i^{(s-1)} w_{y_j}(\theta) \right), \tag{8} $$

where $\rho_\theta(\cdot)$ is the check function defined as above.

In the second phase, the estimated $\hat{w}_{y_j}^{(s)}(\theta)$, for $j = 1, \ldots, J$,
are used to compute the composite $\hat{\eta}^{(s)}$ as a linear combination of the
response MVs $y_j$, and then the explanatory MVs $x_p$ are regressed on the obtained
linear combination, minimizing the following quantile loss function:

$$ \hat{w}_{x_p}^{(s)}(\theta) = \arg\min_{w_{x_p}(\theta)} \sum_{i=1}^{n} \rho_\theta\!\left( x_{ip} - \alpha_{x_p}(\theta) - \hat{\eta}_i^{(s)} w_{x_p}(\theta) \right). \tag{9} $$

Finally, an updated approximation of the composite $\hat{\xi}^{(s)}$ is obtained as a
linear combination of the explanatory MVs $x_p$, using the weights
$\hat{w}_{x_p}^{(s)}(\theta)$.

These two phases are iteratively repeated until convergence of the outer weight vectors,
$w_x(\theta) = \{w_{x_p}(\theta)\}$ and $w_y(\theta) = \{w_{y_j}(\theta)\}$, is achieved.
Then loadings and path coefficients are estimated through quantile regression.

The QC-PM algorithm returns, for each quantile $\theta$, a linear combination of the
explanatory MVs $x_p$ by weighting the corresponding MVs on the basis of their quantile
covariation with the linear combination of the response MVs $y_j$.
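The alternation between the two blocks can be sketched as follows. As before, this is an illustration under assumptions rather than the authors' code: quantile regressions are solved by exhaustive search over pairs of observations, composites are rescaled to unit variance, starting weights are all one, and the names `outer_step` and `qcpm_two_blocks` are hypothetical. Loadings are omitted for brevity; only the weights and the path coefficient are returned.

```python
from itertools import combinations
from statistics import pstdev


def check_loss(r, theta):
    return theta * r if r > 0 else (theta - 1.0) * r


def simple_qr(x, y, theta):
    """(intercept, slope) of the quantile regression of y on x."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, theta) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def outer_step(block, target, theta):
    """Regress each MV of `block` on `target`, then rescale the slopes so
    that the resulting composite has unit variance (assumed normalization)."""
    n_mvs = len(block[0])
    w = [simple_qr(target, [row[k] for row in block], theta)[1]
         for k in range(n_mvs)]
    comp = [sum(wk * v for wk, v in zip(w, row)) for row in block]
    scale = pstdev(comp)
    return [wk / scale for wk in w], [c / scale for c in comp]


def qcpm_two_blocks(X, Y, theta, tol=1e-8, max_iter=100):
    w_x = [1.0] * len(X[0])            # arbitrary starting weights
    w_y = [0.0] * len(Y[0])
    xi = [sum(row) for row in X]       # composite implied by unit weights
    for _ in range(max_iter):
        new_w_y, eta = outer_step(Y, xi, theta)    # first phase, Eq. (8)
        new_w_x, xi = outer_step(X, eta, theta)    # second phase, Eq. (9)
        change = max(abs(a - b)
                     for a, b in zip(new_w_x + new_w_y, w_x + w_y))
        w_x, w_y = new_w_x, new_w_y
        if change < tol:
            break
    beta1 = simple_qr(xi, eta, theta)[1]           # path coefficient, Eq. (7)
    return w_x, w_y, beta1


# Toy blocks (hypothetical): both blocks are driven by the same factor.
f = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
X = [[fi, 2 * fi] for fi in f]
Y = [[fi, 3 * fi] for fi in f]
w_x, w_y, beta1 = qcpm_two_blocks(X, Y, theta=0.5)
```

Because both toy blocks are exact functions of the same factor, the two composites coincide after rescaling and the estimated path coefficient equals one at every quantile.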

2.3 Quantile path modeling for multi-block data

Figure 3 depicts a path model for multi-block data using the case of three blocks. The
general model for $K$ blocks follows the same logic. Let us assume that $P$ variables
are collected in a data table $X$ partitioned in $K$ blocks:
$X = [X_1, X_2, \ldots, X_K]$. Let $X_k = \{x_{ip_k}\}$ be a generic block of MVs, where
$i = 1, \ldots, n$, with $n$ denoting the number of observations, and
$p_k = 1, \ldots, P_k$, with $P_k$ being the number of MVs in the $k$th block. We denote
by $\xi_k = \{\xi_{ik}\}$ and $x_{p_k} = \{x_{ip_k}\}$ the construct and a generic MV of
the $k$th block, respectively.

A construct that never appears as a dependent variable in the model is called
exogenous, while endogenous constructs play either the role of dependent variables only
or of both dependent and explanatory variables. In Fig. 3, for example, $\xi_1$ is an
exogenous construct and $\xi_2$ and $\xi_3$ are endogenous constructs.

Fig. 3 A path model with three blocks of MVs

As for the case of two blocks of MVs, the general model consists of the inner model
and the outer model. For each quantile $\theta \in (0, 1)$, in the outer model it is
assumed that each MV $x_{p_k}$ is related to its own construct through the following
equations:

$$ x_{ip_k} = \alpha_{p_k}(\theta) + \lambda_{p_k}(\theta)\,\xi_{ik} + \epsilon_{ip_k}(\theta), \tag{10} $$

where $\alpha_{p_k}$ is the location parameter, $\xi_k = \{\xi_{ik}\}$ is the construct
representing the $k$th block of MVs, $\lambda_{p_k}$ is the loading coefficient,
capturing the effect of $\xi_k$ on $x_{p_k}$, and $\epsilon_k = \{\epsilon_{ip_k}\}$ is
the error term vector, under the usual assumption on the errors mentioned above.

The inner model captures and specifies the dependence relationships among constructs.
A generic endogenous construct, $\xi_k$, is linked to the related explanatory
constructs, $\xi_{k'}$, $k' \in J_k$, where
$J_k = \{k' : \xi_k \text{ is predicted by } \xi_{k'}\}$, by:

$$ \xi_{ik}(\theta) = \beta_{k0}(\theta) + \sum_{k' \in J_k} \beta_{kk'}(\theta)\,\xi_{ik'} + \zeta_{ik}(\theta), \tag{11} $$

where $\beta_{kk'}$ is the path coefficient capturing the effect of $\xi_{k'}$ on the
dependent construct $\xi_k$, and $\zeta_k = \{\zeta_{ik}\}$ is the inner error variable
vector, with the usual assumption on the errors.

A description of the general QC-PM algorithm is provided in Algorithm 3.

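Algorithm 3 The general QC-PM algorithm for the case of multi-block data

for each quantile $\theta$ do
  STEP 1: Initialization
  $s \leftarrow 0$ (iteration counter)
  Choose arbitrary outer weights $\hat{w}_{p_k}^{(s)}(\theta)$ $(p_k = 1, \ldots, P_k;\ k = 1, \ldots, K)$
  Compute $\hat{\xi}_k^{(s)}(\theta) = \sum_{p_k=1}^{P_k} \hat{w}_{p_k}^{(s)}(\theta)\, x_{p_k}$
  STEP 2: Iteration
  repeat
    $s \leftarrow s + 1$ (increment the iteration counter)
    Inner approximation phase $(k' \in J_k)$:
    Compute the quantile correlation:
    $\tau_{kk'}^{(s)}(\theta) = \operatorname{qcor}_{\theta}\!\left( \hat{\xi}_k^{(s-1)}(\theta),\ \hat{\xi}_{k'}^{(s-1)}(\theta) \right)$
    Compute the inner scores as
    $\hat{\xi}_k^{(s)}(\theta) = \sum_{k' \in J_k} \tau_{kk'}^{(s)}(\theta)\, \hat{\xi}_{k'}^{(s-1)}(\theta)$, where $J_k = \{k' : \xi_k \text{ is predicted by } \xi_{k'}\}$
    $\hat{\xi}_k^{(s)}(\theta) = \sum_{k' \in J_k} \tau_{kk'}^{(s)}(\theta)\, \hat{\xi}_{k'}^{(s-1)}(\theta)$, where $J_k = \{k' : \xi_k \text{ predicts } \xi_{k'}\}$
    Outer approximation phase $(k = 1, \ldots, K)$:
    for all manifest variables $x_{p_k}$ do
      Compute $\hat{w}_{p_k}^{(s)}(\theta)$ solving the quantile regressions:
      $x_{ip_k} = \alpha_{p_k}^{(s)}(\theta) + w_{p_k}^{(s)}(\theta)\, \hat{\xi}_{ik}^{(s)}(\theta) + \epsilon_{ip_k}(\theta)$ $(i = 1, \ldots, n;\ p_k = 1, \ldots, P_k)$
    end for
    Normalize the weights:
    $\hat{w}_{p_k}^{(s)}(\theta) = \hat{w}_{p_k}^{(s)}(\theta) \,/\, \lVert X_k \hat{w}_k^{(s)}(\theta) \rVert$, where $\hat{w}_k^{(s)}(\theta) = \{\hat{w}_{p_k}^{(s)}(\theta)\}$
    Compute $\hat{\xi}_k^{(s)}(\theta) = \sum_{p_k=1}^{P_k} \hat{w}_{p_k}^{(s)}(\theta)\, x_{p_k}$
  until $\hat{w}^{(s)} \approx \hat{w}^{(s-1)}$
  STEP 3: Estimation
  for all manifest variables $x_{p_k}$ do (estimation of loadings)
    Estimate $\hat{\lambda}_{p_k}(\theta)$ solving the quantile regressions:
    $x_{ip_k} = \alpha_{p_k}(\theta) + \lambda_{p_k}(\theta)\, \hat{\xi}_{ik}^{(s)}(\theta) + \epsilon_{ip_k}(\theta)$ $(i = 1, \ldots, n;\ p_k = 1, \ldots, P_k)$
  end for
  Estimate $\beta_{kk'}(\theta)$ solving the quantile regressions (estimation of path coefficients):
  $\hat{\xi}_{ik}(\theta) = \beta_{k0}(\theta) + \sum_{k' \in J_k} \beta_{kk'}(\theta)\, \hat{\xi}_{ik'}(\theta) + \zeta_{ik}(\theta)$
end for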

The weight vector $w(\theta) = \{w_{p_k}(\theta)\}$ $(p_k = 1, \ldots, P_k;\ k = 1, \ldots, K)$,
used to define the composites, is computed by an iterative algorithm that proceeds over
two phases, the so-called inner and outer approximation phases, iteratively repeated
until convergence of the outer weight vector $w(\theta)$ is achieved (i.e., the change
of the outer weights from one iteration to the next is smaller than a predefined
tolerance).

In the inner phase, composites are approximated as weighted aggregates of the
adjacent composites: two composites are adjacent if there exists a link in the inner
model connecting the corresponding constructs, that is, an arrow going from one
construct to the other in the path diagram, regardless of the direction. The inner
weights are defined as the values of the quantile correlation between the composites
obtained at the previous step. In PLS-PM terminology, this way of computing the inner
weights is called the factorial inner scheme. Another scheme, called the centroid
scheme, can also be applied: here the inner weights are computed as the signs of the
quantile correlations between the composites (Tenenhaus et al. 2005). The two schemes
generally provide very close results, but the factorial scheme is preferable when the
correlation between composites is close to zero. In this case, the correlation may
oscillate between small negative and small positive values during the iteration cycles,
and the factorial scheme takes into account the strength of the correlation instead of
just its sign. It is worth noting that, unlike the Pearson correlation, the quantile
correlation is not a symmetric measure (Li et al. 2014); hence it is necessary to
specify the role played by the involved constructs in each equation (i.e., explanatory
or dependent) for the calculation of the inner weights.
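The two inner schemes and the asymmetry of the quantile correlation can be made concrete with a short sketch. It assumes the sample analogue of the definition in Li et al., namely the covariance between $\theta - I(y < Q_\theta(y))$ and $x$, divided by $\sqrt{(\theta - \theta^2)\operatorname{var}(x)}$, with the dependent variable as first argument. The sample quantile convention and the names `sample_quantile`, `qcor` and `inner_weight` are assumptions made for illustration.

```python
import math


def sample_quantile(values, theta):
    """Smallest order statistic with at least a fraction theta of the
    observations at or below it (one of several possible conventions)."""
    ordered = sorted(values)
    k = max(1, math.ceil(theta * len(ordered)))
    return ordered[k - 1]


def qcor(y, x, theta):
    """Sample quantile correlation of the dependent variable y with x."""
    n = len(y)
    q = sample_quantile(y, theta)
    x_bar = sum(x) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    # psi_theta(y - q) = theta - I(y - q < 0)
    psi = [theta - (1.0 if yi - q < 0 else 0.0) for yi in y]
    qcov = sum(p * (xi - x_bar) for p, xi in zip(psi, x)) / n
    return qcov / math.sqrt((theta - theta ** 2) * var_x)


def inner_weight(dependent, explanatory, theta, scheme="factorial"):
    """Factorial scheme: the quantile correlation itself.
    Centroid scheme: only its sign."""
    r = qcor(dependent, explanatory, theta)
    if scheme == "factorial":
        return r
    return float((r > 0) - (r < 0))


a = [1, 2, 3, 4]
b = [0, 0, 0, 10]
r_ab = qcor(a, b, 0.5)   # a plays the dependent role
r_ba = qcor(b, a, 0.5)   # roles swapped
```

Swapping the roles of the two variables changes the result, which is why the role of each construct has to be fixed before computing the inner weights.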

In the outer estimation phase, composites are approximated through a normalized
weighted aggregate of the corresponding MVs. Outer weights are computed through simple
quantile regressions, where each MV is regressed on the corresponding inner
approximation composite. Then, the weights are normalized so that
$\operatorname{var}[X_k w_k(\theta)] = 1$. In PLS-PM terminology (Tenenhaus et al.
2005), this way of computing the outer weights is called Mode A. The so-called Mode B is
also feasible in QC-PM, computing the outer weights as the regression coefficients in
the quantile multiple regression of the inner approximation composite on its own MVs.
Basically, Mode B takes account of the collinearity among the MVs of the same block,
while Mode A ignores this collinearity.

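In the QC-PM iterative procedure, at each $s$th iteration ($s = 1, 2, \ldots$), the
following partial criteria are then optimized:

$$ \hat{w}_{p_k}^{(s)}(\theta) = \arg\min_{w_{p_k}(\theta)} \sum_{i=1}^{n} \rho_\theta\!\left( x_{ip_k} - \alpha_{p_k}(\theta) - \hat{\xi}_{ik}^{(s-1)} w_{p_k}(\theta) \right) \tag{12} $$

where $\rho_\theta(\cdot)$ is the check function defined as above.

When Mode B is applied, the following criterion is instead minimized:

$$ \hat{w}_k^{(s)}(\theta) = \arg\min_{w_k(\theta)} \sum_{i=1}^{n} \rho_\theta\!\left( \hat{\xi}_{ik}^{(s-1)}(\theta) - \alpha_k(\theta) - X_{ik} w_k(\theta) \right) \tag{13} $$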

When convergence is achieved, loadings and path coefﬁcients are estimated through

QR.

As a matter of fact, QC-PM provides, for each quantile of interest, a set of outer

weights, loadings and path coefﬁcients, offering a more complete picture of the rela-

tionships among variables both in the outer model and in the inner model.

The algorithm provides quantile-based composites, and it is useful for dealing with
heterogeneity both in the structural model and in the measurement model. In such a case
the interest is in evaluating how weights and composites vary across quantiles.
However, if the interest is in comparing estimated models over quantiles, measurement
invariance (Henseler et al. 2016) has to be fulfilled. If weights, and consequently
composites, change over quantiles, a comparison among path coefficients estimated at
different quantiles is not reliable, because the same concept may not be measured
across quantiles. To check this, a test on the weights, defined as a variant of the Wald
test described in Koenker and Bassett (1982), can be exploited. The null hypothesis of
the test states that the weights are identical across quantiles. In case of significant
differences among weights, or when the weights are required to be kept fixed, a new
variant of QC-PM can be implemented by simply setting the quantile to the median in the
iterative procedure. The use of the median in the iterative procedure (step 2) of the
algorithm is in line with the approach proposed in Wang et al. (2016) for factor-based
SEM. In such an approach the quantile varies only in step 3, to obtain
quantile-dependent path coefficients. The median approach can be generalized to the
whole iterative process to provide measurement invariance.

3 Model assessment and validation

Once the algorithm converges and estimates for loadings and path coefﬁcients are

obtained, there are many tools for assessing both the inner and outer model. Results,

namely loadings and path coefﬁcients, can also be validated from an inferential point

of view (Davino et al. 2016).

The goodness-of-fit measures most commonly used in PLS-PM cannot be directly adapted to QC-PM: moving from OLS to QR requires some amendments. The introduction of an effective goodness-of-fit approach in QR is still an open issue in the scientific literature (Koenker and Machado 1999; He and Zhu 2003). This makes it difficult to compare OLS and QR directly, also considering that the two methods optimize different criteria. Therefore, a direct comparison between PLS-PM and QC-PM is not possible.

Starting with the inner model, the coefficient of determination $R^2$ of the endogenous constructs (Esposito Vinzi et al. 2010) is the criterion most used in PLS-PM. Considering that the QR loss function is not based on a least squares criterion but rather on a least absolute deviation criterion in terms of weighted residuals, the use of $R^2$ in QC-PM goes against the underlying rationale of the method. This issue is particularly relevant since most of the assessment indexes in PLS-PM are based on the multiple linear determination coefficient or the squared Pearson correlation coefficient. Against this background, we employ the pseudo-$R^2$ proposed by Koenker and Machado (1999), so as to have a measure that mimics the role and interpretation of the $R^2$ for QC-PM assessment. It is important, however, to bear in mind that the pseudo-$R^2$ is designed differently.

QC-PM estimates a set of parameters for each conditional quantile $\theta$ of interest and, consequently, it requires a set of assessment measures for each estimated model. In particular, for each $\theta$, the pseudo-$R^2$ compares the residual absolute sum of weighted differences using the selected model (RASW) with the total absolute sum of weighted differences using a model with only the intercept (TASW). RASW corresponds to the residual sum of squares in classical regression, TASW to the total sum of squares of the dependent variable. The pseudo-$R^2$ aims to evaluate whether the full model (i.e. the model with the regressors) is better, in terms of residuals, than the "restricted" model (the model with only the intercept). More precisely, the pseudo-$R^2$ is calculated as one minus the ratio between RASW and TASW. In essence, the pseudo-$R^2$ can be considered a local measure of goodness of fit for a particular quantile, as it measures the contribution of the selected regressors to the explanation of the dependent variable with respect to the trivial model without regressors. As with $R^2$, pseudo-$R^2$ values range between 0 and 1: the closer the value is to 1, the more the model with regressors can be considered a good model (i.e., the $\theta$th conditional quantile function is significantly altered by the effect of the covariates). While the pseudo-$R^2$ will always be smaller than the $R^2$ and a direct comparison with $R^2$ in PLS-PM is not feasible, the pseudo-$R^2$ is useful for identifying locations in the distribution of the outcome variable where the model may show a better or worse fit (for example, if the model fits well in the tail, there is no guarantee that it fits well anywhere else) (Kováč and Želinský 2013).

For the sake of generality, we consider below the case of multi-block QC-PM. Once

convergence is reached and composites are obtained, several QRs are carried out in the

inner part of the model, according to the number of considered quantiles. Such QRs

estimate the path coefficients linking endogenous and exogenous constructs. As stated in Sect. 2.3, a generic endogenous construct, $\xi_k$, is linked to the related explanatory constructs, $\xi_{k'}$, $k' \in J_k$, where $J_k = \{k' : \xi_k \text{ is predicted by } \xi_{k'}\}$. For the convenience of the reader, we report again Eq. (11) that describes this relationship:

$$
\xi_{ik}(\theta) = \beta_{k0}(\theta) + \sum_{k' \in J_k} \beta_{kk'}(\theta)\, \xi_{ik'}(\theta) + \zeta_{ik}(\theta). \tag{14}
$$

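Since $\hat{\zeta}_k(\theta)$ represents the residuals of the model explaining the kth endogenous construct, for each considered quantile $\theta$, RASW is the corresponding minimizer:

$$
\mathrm{RASW}_k(\theta) = \sum_{\hat{\zeta}_k(\theta) \ge 0} \theta \left| \hat{\zeta}_k(\theta) \right| + \sum_{\hat{\zeta}_k(\theta) < 0} (1 - \theta) \left| \hat{\zeta}_k(\theta) \right|, \tag{15}
$$

where positive and negative residuals are asymmetrically weighted, respectively with weights equal to $\theta$ and $(1 - \theta)$. The TASW is instead:

$$
\mathrm{TASW}_k(\theta) = \sum_{\xi_k \ge \theta} \theta\, |\xi_k - \theta| + \sum_{\xi_k < \theta} (1 - \theta)\, |\xi_k - \theta|. \tag{16}
$$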

Therefore, the obtained pseudo-$R^2$ can be computed as follows:

$$
\text{pseudo-}R^2_k(\theta) = 1 - \frac{\mathrm{RASW}_k(\theta)}{\mathrm{TASW}_k(\theta)}. \tag{17}
$$

The pseudo-$R^2$ ranges between 0 and 1, since $\mathrm{RASW}_k(\theta)$ is always less than or equal to $\mathrm{TASW}_k(\theta)$. It indicates, for each considered quantile, whether the presence of the covariates influences the corresponding conditional quantile of the response variable. It is worth noticing that the pseudo-$R^2$ is not a symmetric measure: it assumes a different value when the roles of the variables are reversed. The index, computed for each inner equation, measures the amount of variability of a given endogenous construct explained by its explanatory constructs. The average of all the pseudo-$R^2$ indexes provides a synthesis of the evaluations regarding the inner model.
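As an illustration of Eqs. (15) to (17), the following Python sketch computes RASW, TASW and the pseudo-$R^2$ from a response and the fitted values of an already estimated quantile regression. The function names are hypothetical. The sketch assumes that the intercept-only model is fitted by a sample quantile of the response (here the order statistic of rank $\lceil n\theta \rceil$, which is one minimizer of the check loss), and that the fitted values come from a model estimated by QR at the same $\theta$; only under the latter condition is the result guaranteed to lie between 0 and 1.

```python
from math import ceil


def weighted_abs_sum(residuals, theta):
    """theta * |r| for r >= 0 plus (1 - theta) * |r| for r < 0, summed."""
    return sum(theta * abs(r) if r >= 0 else (1 - theta) * abs(r)
               for r in residuals)


def sample_quantile(values, theta):
    """Order statistic of rank ceil(n * theta): an intercept-only QR fit."""
    ordered = sorted(values)
    rank = max(ceil(theta * len(ordered)), 1)
    return ordered[rank - 1]


def pseudo_r2(response, fitted, theta):
    """Pseudo-R2 = 1 - RASW / TASW for one equation and one quantile."""
    rasw = weighted_abs_sum([y - f for y, f in zip(response, fitted)], theta)
    q = sample_quantile(response, theta)
    tasw = weighted_abs_sum([y - q for y in response], theta)
    # TASW is zero only for a constant response; that case is not handled.
    return 1.0 - rasw / tasw


# Hypothetical composite scores and fitted values at theta = 0.5.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.5, 2.0, 3.0, 4.0, 4.5]
print(pseudo_r2(scores, fitted, theta=0.5))
```

Averaging the values returned for all inner equations at a given $\theta$ would give the synthesis of the inner model described above.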

As regards the outer model, the assessment is carried out considering the relations between each construct and its own MVs and the estimate of the error term vector $\hat{\epsilon}_k$. The pseudo-$R^2$ can be used to assess convergent validity for each outer model, taking for each block the average of the pseudo-$R^2$ indexes of the related MVs, and can be used for assessing the quality of the whole outer model by computing a weighted average of all measures over all the blocks, using the number of MVs in each block as weights. In particular, a pseudo-$R^2_{pk}(\theta)$ is computed on the basis of Eq. (10) considering the kth block, for each MV and for each considered quantile $\theta$. This measure, called $\mathrm{Communality}_{pk}(\theta)$, with $p = 1, \dots, P$, $k = 1, \dots, K$, indicates how much of the MV's variance can be explained by the corresponding component. The communality of block k is:

$$
\mathrm{Communality}_k(\theta) = \frac{1}{P_k} \sum_{p_k=1}^{P_k} \text{pseudo-}R^2_{pk}(\theta). \tag{18}
$$

The quality of the whole outer model is finally obtained through the average of the Communality indexes of all the blocks.

It should be noted that, as described in Sect. 2.3, if the quantile in the iterative

procedure is set equal to the median to solve the measurement invariance issue, the

assessment of the outer model is limited to the quantile θ=0.5.

Another measure of assessment is provided by the Redundancy index, which is defined only for the endogenous blocks. Note that low levels of redundancy do not necessarily mean that the structural model is poorly specified. This index simply combines the evaluations of both the inner model and the outer model (Lohmöller 1989; Hair et al. 2011); it can thus be used as a measure of assessment of the global model, but specific measures for the two sub-models are also needed.

Redundancy can be computed for each endogenous MV or for the whole block, as an average of the redundancies of its MVs. For each MV $x_{pk}$ of the endogenous block, Redundancy is computed by multiplying its Communality measure by the pseudo-$R^2$ obtained in the corresponding inner model:

$$
\mathrm{Redundancy}_{pk}(\theta) = \mathrm{Communality}_{pk}(\theta) \times \text{pseudo-}R^2_k(\theta). \tag{19}
$$

The overall Redundancy of block k is obtained by averaging the measures associated with the MVs of the endogenous block:

$$
\mathrm{Redundancy}_k(\theta) = \frac{\sum_{p_k=1}^{P_k} \mathrm{Redundancy}_{pk}(\theta)}{P_k}. \tag{20}
$$
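The aggregation steps in Eqs. (18) to (20) can be sketched in a few lines of Python. The function names and all numeric values below are hypothetical and serve only to show the arithmetic; the MV-level pseudo-$R^2$ values are assumed to have been computed beforehand for a single quantile $\theta$.

```python
def block_communality(mv_pseudo_r2):
    """Eq. (18): average of the MV-level pseudo-R2 values of one block."""
    return sum(mv_pseudo_r2) / len(mv_pseudo_r2)


def outer_model_quality(blocks):
    """Average of block communalities weighted by the number of MVs."""
    total_mvs = sum(len(values) for values in blocks.values())
    return sum(len(values) * block_communality(values)
               for values in blocks.values()) / total_mvs


def block_redundancy(mv_pseudo_r2, inner_pseudo_r2):
    """Eqs. (19)-(20): average of Communality_pk times the inner pseudo-R2."""
    redundancies = [c * inner_pseudo_r2 for c in mv_pseudo_r2]
    return sum(redundancies) / len(redundancies)


# Made-up MV-level pseudo-R2 values for two blocks at one quantile.
blocks = {"block_1": [0.6, 0.5, 0.7], "block_2": [0.4, 0.6]}
print(block_communality(blocks["block_1"]))
print(outer_model_quality(blocks))
# block_2 is treated as endogenous, with a made-up inner pseudo-R2 of 0.3.
print(block_redundancy(blocks["block_2"], inner_pseudo_r2=0.3))
```

Since the inner pseudo-$R^2$ is constant within a block, the block Redundancy equals the block Communality multiplied by that pseudo-$R^2$.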

Possible variants could exploit different goodness-of-fit measures available in the quantile framework as well as the amendment of some assessment indexes proposed in the PLS-PM literature (Benitez et al. 2020; Hair et al. 2020, 2017, 2019; Amato et al. 2004).

Two main approaches can be used to evaluate the statistical significance of the coefficients related to the different quantiles. The first approach exploits the asymptotic normal distribution of QR estimators (Koenker and Basset 1978). Such estimators are indeed asymptotically normal, with a variance–covariance matrix depending on the model assumptions: independent and identically distributed errors, independent but not identically distributed errors, and dependent errors lead to different variance–covariance matrices. The alternative resorts to bootstrap theory (Efron and Tibshirani 1993), commonly used both in PLS-PM and QR. The bootstrap makes it possible to estimate the standard errors of the coefficients using a distribution-free approach. The QR literature counts several bootstrap procedures, the xy-pair method (Parzen et al. 1994), also known as design matrix bootstrap, being the simplest and most widespread solution. Bootstrap standard errors are exploited to compute confidence intervals and to perform hypothesis tests. Resampling methods are also useful in the case of small samples. For example, a jackknife approach could be used to estimate the standard errors of the coefficients. Statistical tests could also be easily introduced in QC-PM to test whether coefficients at different quantiles can be considered statistically different (Gould 1997). We will not expand on the details here. Interested readers can consult Davino et al. (2013) and Furno and Vistocco (2018) for a thorough explanation and all bibliographic indications on inference in QR.
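The xy-pair (design matrix) bootstrap mentioned above can be sketched as follows for a quantile regression with one regressor. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the brute-force QR solver is chosen only to keep the example self-contained, and in QC-PM the whole estimation procedure, rather than a single regression, would be repeated on each resample.

```python
import random
from itertools import combinations
from statistics import stdev


def check_loss(u, theta):
    """Check function rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def simple_qr(x, y, theta):
    """Brute-force one-regressor QR; returns (intercept, slope)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yi - intercept - slope * xi, theta)
                   for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def xy_pair_bootstrap_se(x, y, theta, n_boot=200, seed=0):
    """Bootstrap standard error of the QR slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        if len(set(xb)) < 2:
            continue  # degenerate resample: the slope is not identified
        slopes.append(simple_qr(xb, yb, theta)[1])
    return stdev(slopes)


# Hypothetical data: a linear relation plus fixed, made-up disturbances.
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.5, -0.4, 0.1, -0.6, 0.2, 0.4, -0.1, -0.3]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]
print(xy_pair_bootstrap_se(x, y, theta=0.5, n_boot=100, seed=1))
```

Resampling whole rows keeps each response attached to its own regressors, so the procedure does not rely on identically distributed errors.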

4 A simulation study

A Monte Carlo simulation study has been designed to investigate the performance of

QC-PM considering different scenarios. As already stated above, QC-PM and PLS-

PM are complementary rather than alternative approaches. Therefore, it is advisable

to use both methods in real data application: in many cases, indeed, the focus on con-

ditional mean is not sufﬁcient and a more comprehensive look at the entire conditional

distribution is necessary.

Nevertheless, since there is not a real competing method to QC-PM, namely a

composite-based approach focusing on conditional quantile, we chose to investigate

properties of the QC-PM estimators comparing them with PLS-PM results. It is well

known that PLS-PM produces consistent estimates for composite-based model param-

eters and performs well in the considered scenarios.

4.1 Simulation design and data generation

We operate in the context proposed by Schlittgen et al. (2020), generating data from

composite-based populations using the cbsem R package (Schlittgen 2019). Deter-

mination of the covariance matrix in the procedure proposed by Schlittgen et al.

(2020) can be derived considering three scenarios, named formative–formative (ff),

formative–reﬂective (fr) and reﬂective–reﬂective (rr). We used the scenario rr, where

outer weights are not required. The procedure requires path coefﬁcients, loadings and

variances and covariances of exogenous constructs. The parameters must be chosen

such that sets of weights can be found to fulﬁll the equation deﬁning the covariance

matrix (see the Vignette from the cbsem R package for further details) (Schlittgen

2019).

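We set the relationships in the model assuming the theoretical path model represented in Fig. 3 and then we simulated data for the given values of the parameters. The postulated inner model is:

$$
\begin{aligned}
\xi_2 &= \beta_{20} + \beta_{21}\xi_1 + \zeta_2 \\
\xi_3 &= \beta_{30} + \beta_{31}\xi_1 + \beta_{32}\xi_2 + \zeta_3.
\end{aligned}
$$

The outer model can be instead written as:

$$
\begin{aligned}
x_{11} &= \alpha_{11} + \lambda_{11}\xi_1 + \epsilon_{11} \\
x_{21} &= \alpha_{21} + \lambda_{21}\xi_1 + \epsilon_{21} \\
x_{31} &= \alpha_{31} + \lambda_{31}\xi_1 + \epsilon_{31} \\
x_{12} &= \alpha_{12} + \lambda_{12}\xi_2 + \epsilon_{12} \\
x_{22} &= \alpha_{22} + \lambda_{22}\xi_2 + \epsilon_{22} \\
x_{32} &= \alpha_{32} + \lambda_{32}\xi_2 + \epsilon_{32} \\
x_{13} &= \alpha_{13} + \lambda_{13}\xi_3 + \epsilon_{13} \\
x_{23} &= \alpha_{23} + \lambda_{23}\xi_3 + \epsilon_{23} \\
x_{33} &= \alpha_{33} + \lambda_{33}\xi_3 + \epsilon_{33}.
\end{aligned}
$$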

The simulation study considered different scenarios both in the outer and in the inner part of the model. Moreover, the effects of sample size and of non-normal distributions were also considered. The cases of homogeneous blocks (no differences among loadings) and heterogeneous blocks (large differences among loadings) were used to assess the outer model. For the inner model, the effect of different correlation levels between constructs was investigated.

In short, the design factors we considered for the simulation study are: sample size, homogeneity of blocks, effect size and variable distributions. In particular, we used the following levels for each design factor.

Sample sizes. We set $n \in \{50, 100, 200, 300, 400, 500, 1000\}$. The value $n = 50$ allows us to investigate the performance of the method in applications with small sample size. The other values used for n are instead typically encountered in research applications, the largest values being useful to study the asymptotic properties of QC-PM estimators.

Loadings. We set $\lambda_{pk} = 0.9$ $(p = 1, 2, 3;\ k = 1, 2, 3)$ for homogeneous blocks and loadings $\lambda_{1k} = 0.9$, $\lambda_{2k} = 0.6$, $\lambda_{3k} = 0.3$ $(k = 1, 2, 3)$ for heterogeneous blocks, the latter values reflecting very large differences among loadings.

Path coefficients. We set, for all inner relationships, $\beta_{kk'} \in \{0.2, 0.3, 0.4, 0.5\}$ to take into account different levels of correlation among constructs.

Skewness and kurtosis. We set both equal to 0 for the normal distribution, while skewness = 2 and kurtosis = 6 were used to mimic exponential distributions.

The total number of scenarios obtained from the combination of the above levels of the design factors is 112 (7 sample sizes × 2 loadings × 4 path coefficients × 2 skewness and kurtosis). For each considered scenario, we generated 500 replications.

A synthesis of the simulation design is offered in Table 1.

Table 1 Levels of the design-factors considered in the simulation design
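The full factorial design described above can be enumerated with a short Python sketch, which also confirms the number of scenarios. The variable names and the labels of the levels are hypothetical; only the levels themselves are taken from the text.

```python
from itertools import product

sample_sizes = [50, 100, 200, 300, 400, 500, 1000]
loading_patterns = {
    "homogeneous": (0.9, 0.9, 0.9),
    "heterogeneous": (0.9, 0.6, 0.3),
}
path_coefficients = [0.2, 0.3, 0.4, 0.5]
distributions = {  # label: (skewness, kurtosis)
    "normal": (0, 0),
    "exponential-like": (2, 6),
}
replications = 500

# One dictionary per scenario: every combination of the four design factors.
scenarios = [
    {"n": n, "loadings": pattern, "beta": beta, "distribution": dist}
    for n, pattern, beta, dist in product(
        sample_sizes, loading_patterns, path_coefficients, distributions)
]

print(len(scenarios))                 # 112
print(len(scenarios) * replications)  # 56000
```

With 500 replications per scenario, the design amounts to 56,000 simulated datasets in total.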

The R software environment for statistical computing (R Core Team 2020) was used to generate and analyze data.

Data with non-normal distribution were generated using the technique described in

Vale and Maurelli (1983), who extended the method proposed by Fleishman (1978).
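The target values skewness = 2 and kurtosis = 6 used for the non-normal scenarios coincide with the skewness and excess kurtosis of an exponential distribution, which can be checked from its raw moments $E[X^k] = k!/\lambda^k$. The following Python sketch performs this check; the function names are hypothetical, and reading the kurtosis value as excess kurtosis (so that the normal distribution has kurtosis 0) is an assumption consistent with the levels stated in the text. The sketch does not implement the Vale and Maurelli procedure itself.

```python
from math import comb, factorial


def raw_moment(k, rate=1.0):
    """k-th raw moment of an exponential distribution: k! / rate**k."""
    return factorial(k) / rate ** k


def central_moment(k, rate=1.0):
    """k-th central moment obtained from the raw moments (binomial expansion)."""
    mean = raw_moment(1, rate)
    return sum(comb(k, j) * raw_moment(j, rate) * (-mean) ** (k - j)
               for j in range(k + 1))


def skewness_and_excess_kurtosis(rate=1.0):
    variance = central_moment(2, rate)
    skewness = central_moment(3, rate) / variance ** 1.5
    excess_kurtosis = central_moment(4, rate) / variance ** 2 - 3.0
    return skewness, excess_kurtosis


print(skewness_and_excess_kurtosis(rate=1.0))
```

Both quantities do not depend on the rate parameter, so the same targets apply to any exponential distribution.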

The performance of QC-PM was assessed considering the Relative Bias (RBias) and the Root Mean Square Error (RMSE) of the estimates on the basis of the 500 replications. RBias was computed as:

$$
\mathrm{RBias} = \frac{1}{S} \sum_{s=1}^{S} \frac{(\hat{\theta}_s - \theta)}{\theta}, \qquad s = 1, 2, \dots, 500
$$

where S represents the number of replications in the simulation, $\hat{\theta}_s$ is the estimate for the generic replication, and $\theta$ is the corresponding population parameter. Instead, RMSE was computed as:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{S} \sum_{s=1}^{S} (\hat{\theta}_s - \theta)^2}, \qquad s = 1, 2, \dots, 500
$$

Clearly, because $\mathrm{MSE} = \mathrm{Var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^2$, RMSE entails information on both bias and variability of the estimates.
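The two performance measures can be computed as in the following Python sketch, which also checks numerically the decomposition of the MSE into variance and squared bias. The function names and the four estimates are hypothetical; in the study each measure is computed over the 500 replications of a scenario.

```python
from math import sqrt
from statistics import fmean, pvariance


def rbias(estimates, true_value):
    """Relative bias: mean of (estimate - true) / true over replications."""
    return fmean([(e - true_value) / true_value for e in estimates])


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the true value."""
    return sqrt(fmean([(e - true_value) ** 2 for e in estimates]))


# Made-up estimates of a path coefficient whose true value is 0.3.
estimates = [0.28, 0.31, 0.27, 0.30]
true_value = 0.3

bias = fmean(estimates) - true_value
mse = rmse(estimates, true_value) ** 2
print(rbias(estimates, true_value))
print(rmse(estimates, true_value))
# MSE = Var + bias^2, with the variance computed with divisor S.
print(abs(mse - (pvariance(estimates) + bias ** 2)) < 1e-12)
```

The decomposition holds exactly when the variance of the estimates is computed with divisor S rather than S − 1.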

4.2 Simulation results

In presenting simulation results, we choose to focus on the effect of sample size on

the bias and efﬁciency, and, consequently, on the consistency of the estimates. Since

the number of considered scenarios (112) is too large, in the following we focus only

on the more interesting and enlightening scenarios. In particular, we present results

for all the considered levels of loadings in the three blocks of variables for PLS-PM

and for QC-PM at quantile θ=0.5 (measurement model was indeed restricted to the

median regression model).

Additionally, since there are interesting differences between homogeneous blocks

and heterogeneous blocks, we reported results for both the cases.

For the inner model, we present results for all three path coefficient estimates obtained through PLS-PM and QC-PM at quantiles $\theta \in \{0.25, 0.5, 0.75\}$, but only for $\beta_{kk'} = 0.3$, a small/moderate effect. Indeed, as known in the literature (Tenenhaus 2008), results are similar as correlations between composites increase.

The following subsections detail results according to the three considered factors, that is, sample size, level of heterogeneity within blocks and degree of skewness/kurtosis of the distribution. The resulting scenarios are organized in three groups:

– Group 1, focusing on the effect of sample size,

– Group 2, focusing on the effect of the level of heterogeneity within blocks,

– Group 3, focusing on the effect of the degree of skewness/kurtosis of the distribution.

The effect of sample size

The first set of considered scenarios (hereinafter Group 1) allows us to focus only on the effect of sample size, neutralising, as far as possible, the effect of the other factors: outer blocks are considered homogeneous with high correlations among the MVs (all λ values set equal to 0.9), data are generated from normal distributions and path coefficients are set equal to 0.3. These settings result in seven scenarios for Group 1.

Figure 4 shows the distribution of loadings for such scenarios for the outcome block (Block 3). The results for the other two blocks are not shown since they do not differ much from those of Block 3. The two columns of Fig. 4 depict the results for PLS-PM and QC-PM (recall that we set θ = 0.5 for the outer model). The seven scenarios of the group, corresponding to the different values of the sample size n, are represented on the horizontal axis. Finally, the rows refer to the coefficients $\lambda_{13}$, $\lambda_{23}$, $\lambda_{33}$ associated with the MVs of the block.

Fig. 4 Diminutive distribution charts of the loadings of scenarios belonging to Group 1 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, homogeneous blocks, path coefficients equal to 0.3 and normally distributed data

Data are represented through diminutive distribution charts (Rudis 2019), a variant of boxplots aimed at visualizing distribution characteristics: each box ranges from the 10th percentile to the 90th percentile, the triangle indicates the mean value of the distribution and the circle the median. Both PLS-PM and QC-PM show distributions that converge to the true parameter value (horizontal line at the value 0.9) as n increases. Moreover, the variability of the estimates is rather small, although slightly higher for QC-PM. Regarding bias, it is interesting to note that, for the quantile model, the 10th percentile is closer to the true parameter than for PLS-PM, whatever the value of n.
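The four summaries displayed in a diminutive distribution chart can be obtained from the replicated estimates as in the following Python sketch. The function name is hypothetical, and the percentile definition (linear interpolation with the "inclusive" method of the standard library) is an assumption, since the text does not state which definition the plotting package uses.

```python
from statistics import fmean, median, quantiles


def chart_summary(estimates):
    """10th percentile, median, mean and 90th percentile of the estimates."""
    deciles = quantiles(estimates, n=10, method="inclusive")  # 9 cut points
    return {
        "p10": deciles[0],
        "median": median(estimates),
        "mean": fmean(estimates),
        "p90": deciles[-1],
    }


# Hypothetical "estimates": the integers 0, 1, ..., 100.
print(chart_summary(list(range(101))))
```

Applied to the 500 estimates of a loading in one scenario, these values would give the box limits (p10 and p90), the circle (median) and the triangle (mean) of one chart.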

Table 2 reports the values of RBias and RMSE for each loading distribution. In particular, the table shows the average values of RBias and RMSE for each block (columns) and each scenario (rows). The table is row-partitioned according to the model. The values of RBias and RMSE for the loading estimates do not change substantially between QC-PM and PLS-PM when blocks are homogeneous and variables are normally distributed, with large sample sizes (at least 300 observations). At the lowest considered sample size (n = 50), QC-PM always shows higher bias compared to PLS-PM, and such behavior is also confirmed when combining bias with variability of the estimates (see RMSE columns).

The distribution of the path coefficients (Fig. 5) is also affected by sample size and shows less marked differences between the two methods (the comparison between Figs. 4 and 5 must be made with caution because the two figures have different vertical scales). As in Fig. 4, each diminutive distribution chart refers to a scenario (horizontal axis), but here the three rows refer to the path coefficients ($\beta_{21}$, $\beta_{31}$, $\beta_{32}$). Each vertical block refers to a model: PLS-PM and QC-PM for θ ∈ {0.25, 0.5, 0.75}. In the case of the path coefficients, it is interesting to note the correspondence of the mean and median values of the estimates with the true value of the parameter (horizontal line at 0.3).

Table 2 Values of RBias and RMSE for loadings of the scenarios belonging to Group 1 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, homogeneous blocks, path coefficients equal to 0.3 and normally distributed data (levels of the varying factor, sample size, are in the first column)

          RBias                         RMSE
          Block X1  Block X2  Block X3  Block X1  Block X2  Block X3
PLS-PM
n=50         0.035     0.036     0.037     0.040     0.042     0.039
n=100        0.039     0.039     0.038     0.038     0.038     0.037
n=200        0.038     0.038     0.038     0.036     0.036     0.035
n=300        0.038     0.038     0.038     0.035     0.035     0.035
n=400        0.038     0.038     0.038     0.035     0.035     0.035
n=500        0.039     0.038     0.038     0.035     0.035     0.035
n=1000       0.038     0.038     0.038     0.035     0.035     0.035
QC-PM
n=50         0.043     0.039     0.044     0.066     0.082     0.066
n=100        0.040     0.043     0.042     0.056     0.052     0.050
n=200        0.041     0.039     0.041     0.044     0.042     0.043
n=300        0.039     0.039     0.040     0.040     0.040     0.040
n=400        0.039     0.039     0.039     0.039     0.039     0.039
n=500        0.040     0.039     0.039     0.039     0.038     0.038
n=1000       0.039     0.039     0.039     0.036     0.036     0.036

Results for PLS-PM and QC-PM estimated at θ = 0.5 are reported for each scenario

Fig. 5 Diminutive distribution charts of the path coefficients of scenarios belonging to Group 1 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, homogeneous blocks, path coefficients equal to 0.3 and normally distributed data

Table 3 Values of RBias and RMSE for the path coefficients of scenarios belonging to Group 1 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, homogeneous blocks, path coefficients equal to 0.3 and normally distributed data (the levels of the varying factor, the sample size, are in the first column)

           RBias                      RMSE
              β21      β31      β32      β21      β31      β32
PLS-PM
n=50       −0.031   −0.007   −0.016    0.120    0.121    0.122
n=100      −0.039   −0.037   −0.053    0.092    0.091    0.088
n=200      −0.062   −0.047   −0.068    0.067    0.066    0.069
n=300      −0.074   −0.046   −0.062    0.058    0.051    0.055
n=400      −0.068   −0.055   −0.049    0.050    0.047    0.048
n=500      −0.075   −0.062   −0.052    0.047    0.044    0.044
n=1000     −0.066   −0.053   −0.065    0.035    0.033    0.035
QC-PM, θ = 0.25
n=50        0.008   −0.015    0.026    0.163    0.160    0.163
n=100      −0.033   −0.033   −0.053    0.123    0.122    0.123
n=200      −0.057   −0.061   −0.057    0.090    0.089    0.094
n=300      −0.074   −0.033   −0.058    0.079    0.072    0.074
n=400      −0.083   −0.064   −0.054    0.071    0.065    0.062
n=500      −0.066   −0.058   −0.059    0.063    0.058    0.060
n=1000     −0.063   −0.050   −0.072    0.043    0.043    0.048
QC-PM, θ = 0.5
n=50        0.014    0.020    0.021    0.160    0.150    0.154
n=100      −0.024   −0.028   −0.025    0.118    0.111    0.115
n=200      −0.059   −0.043   −0.069    0.083    0.079    0.084
n=300      −0.066   −0.030   −0.066    0.071    0.064    0.070
n=400      −0.062   −0.050   −0.045    0.060    0.059    0.060
n=500      −0.082   −0.059   −0.051    0.060    0.053    0.055
n=1000     −0.066   −0.052   −0.063    0.043    0.039    0.042
QC-PM, θ = 0.75
n=50       −0.023    0.018    0.013    0.152    0.161    0.161
n=100      −0.025   −0.044   −0.053    0.128    0.123    0.119
n=200      −0.087   −0.027   −0.084    0.096    0.088    0.093
n=300      −0.072   −0.054   −0.063    0.077    0.074    0.071
n=400      −0.069   −0.048   −0.060    0.068    0.063    0.065
n=500      −0.074   −0.066   −0.055    0.063    0.061    0.058
n=1000     −0.073   −0.057   −0.063    0.047    0.043    0.045

Table 3 shows, for each horizontal block, the results obtained from the corresponding model in terms of RBias and RMSE values for the path coefficient estimates, in the seven scenarios. PLS-PM and QC-PM perform very similarly both in terms of bias and efficiency. In general, as sample size increases, RMSE decreases for all path coefficient estimates. The RMSE is slightly higher in QC-PM at quantiles θ = 0.25 and θ = 0.75.

The effect of the level of heterogeneity

The convergence of estimates as n increases is also confirmed in the case of heterogeneous blocks. Figure 6 shows the diminutive distribution charts of the loadings of the scenarios belonging to the second group, Group 2, still encompassing normal distributions and path coefficients equal to 0.3 but with MVs differently correlated with each construct (0.9, 0.6 and 0.3).

In this group of scenarios, unlike Group 1, the distribution of loadings gets closer to the true population parameter especially for $\lambda_{13}$. However, the heterogeneity of the blocks has a distortive effect on the estimates of the path coefficients (Fig. 7) both in the case of PLS-PM and QC-PM. Tables 4 and 5 confirm decreasing bias and variability as n increases. In this case, however, both RBias and RMSE are always higher than in the homogeneous case.

Fig. 6 Diminutive distribution charts of the loadings of scenarios belonging to Group 2 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, heterogeneous blocks, path coefficients equal to 0.3 and normally distributed data

Fig. 7 Diminutive distribution charts of the path coefficients of scenarios belonging to Group 2 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, heterogeneous blocks, path coefficients equal to 0.3 and normally distributed data

Table 4 Values of RBias and RMSE for the loadings (averages for each block) of the scenarios belonging to Group 2 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, heterogeneous blocks, path coefficients equal to 0.3 and normally distributed data (the levels of the varying factor, the sample size, are in the first column)

          RBias                         RMSE
          Block X1  Block X2  Block X3  Block X1  Block X2  Block X3
PLS-PM
n=50         0.307     0.295     0.282     0.208     0.205     0.186
n=100        0.265     0.282     0.262     0.160     0.164     0.153
n=200        0.274     0.252     0.272     0.141     0.138     0.138
n=300        0.274     0.274     0.279     0.134     0.133     0.133
n=400        0.272     0.273     0.279     0.130     0.129     0.129
n=500        0.282     0.275     0.281     0.129     0.129     0.128
n=1000       0.280     0.283     0.284     0.124     0.125     0.124
QC-PM
n=50         0.304     0.297     0.296     0.248     0.247     0.230
n=100        0.258     0.277     0.256     0.188     0.193     0.178
n=200        0.270     0.244     0.261     0.155     0.153     0.151
n=300        0.266     0.271     0.272     0.146     0.145     0.144
n=400        0.267     0.268     0.275     0.138     0.138     0.137
n=500        0.276     0.277     0.276     0.136     0.136     0.134
n=1000       0.277     0.282     0.283     0.128     0.129     0.128

For each scenario, results for PLS-PM and QC-PM estimated at θ = 0.5 are shown

Table 5 Values of RBias and RMSE for the path coefficients of scenarios belonging to Group 2 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, heterogeneous blocks, path coefficients equal to 0.3 and normally distributed data (the levels of the varying factor, the sample size, are in the first column)

           RBias                      RMSE
              β21      β31      β32      β21      β31      β32
PLS-PM
n=50       −0.015   −0.064   −0.061    0.117    0.117    0.120
n=100      −0.124   −0.130   −0.131    0.095    0.095    0.100
n=200      −0.186   −0.145   −0.167    0.084    0.075    0.080
n=300      −0.186   −0.191   −0.170    0.075    0.076    0.073
n=400      −0.213   −0.176   −0.180    0.078    0.070    0.070
n=500      −0.207   −0.188   −0.170    0.074    0.069    0.065
n=1000     −0.217   −0.172   −0.183    0.070    0.059    0.061
QC-PM, θ = 0.25
n=50       −0.009    0.010   −0.047    0.167    0.167    0.165
n=100      −0.137   −0.120   −0.119    0.129    0.128    0.133
n=200      −0.197   −0.145   −0.166    0.111    0.098    0.104
n=300      −0.183   −0.194   −0.182    0.092    0.092    0.091
n=400      −0.226   −0.183   −0.174    0.092    0.084    0.081
n=500      −0.220   −0.198   −0.169    0.087    0.080    0.076
n=1000     −0.218   −0.175   −0.183    0.076    0.065    0.067
QC-PM, θ = 0.5
n=50       −0.012    0.019   −0.063    0.148    0.158    0.156
n=100      −0.108   −0.097   −0.131    0.122    0.126    0.128
n=200      −0.190   −0.142   −0.164    0.100    0.091    0.096
n=300      −0.179   −0.194   −0.171    0.085    0.088    0.085
n=400      −0.216   −0.178   −0.183    0.087    0.081    0.080
n=500      −0.210   −0.196   −0.166    0.082    0.080    0.073
n=1000     −0.213   −0.173   −0.182    0.074    0.063    0.065
QC-PM, θ = 0.75
n=50       −0.049   −0.004   −0.082    0.161    0.162    0.167
n=100      −0.133   −0.133   −0.149    0.128    0.131    0.133
n=200      −0.196   −0.152   −0.186    0.106    0.100    0.103
n=300      −0.194   −0.198   −0.175    0.095    0.095    0.092
n=400      −0.219   −0.174   −0.185    0.092    0.083    0.083
n=500      −0.211   −0.182   −0.168    0.085    0.078    0.076
n=1000     −0.219   −0.175   −0.187    0.077    0.067    0.068

The effect of the degree of skewness/kurtosis

The third group of scenarios, Group 3, aims to show the effect of an asymmetric distribution in the data generation process. This group also includes seven scenarios (varying the sample size), with homogeneous loadings (equal to 0.9) and path coefficients equal to 0.3. Data are here generated from an exponential distribution. Convergence is confirmed for this group as well, but there are differences between PLS-PM and QC-PM. As regards loadings (Fig. 8), the distributions are always more variable in QC-PM than in PLS-PM. Nevertheless, QC-PM always manages to capture the true parameter value within the central 90% of values. Considering the average values of RBias and RMSE in all blocks (Table 6), the better performance of QC-PM in terms of efficiency and unbiasedness is confirmed, especially for larger sample sizes. Looking at the distribution of path coefficient estimates (Fig. 9), we note the ability of QC-PM to capture the positive skewness of the distribution used to generate the data: the variability of the estimates is smaller for θ = 0.25 and larger in the right tail (θ = 0.75), and the parameter is overestimated at the lowest quantile and underestimated at the highest quantile. For θ = 0.5 both methods provide unbiased estimates. The RBias values in Table 7 confirm the reduction in the bias as n increases. More complex is the interpretation of the RMSE values, which combine bias and variability: the estimates at quantile 0.25, for example, are more biased but less variable than those at quantile 0.5, so the RMSE is affected by a kind of trade-off between variability and bias.

Fig. 8 Diminutive distribution charts of the loadings of scenarios belonging to Group 3 with n ∈ {50, 100, 200, 300, 400, 500, 1000}, homogeneous blocks, path coefficients equal to 0.3 and exponentially distributed data

5 An application on Chronic Kidney Disease in diabetic patients

QC-PM potentialities are described through an artiﬁcial dataset which simulates a

study on CKD in diabetic patients. The original study was proposed by Wang et al.

(2016) who used real data to examine the potential risk factors of CKD through a

quantile approach to factor-based SEM. In particual, data were generated mimicking

the model and estimates obtained by Wang et al. (2016), since the original data were

123

P. Dolce et al.

Fig. 9 Diminutive distribution charts of the path coefﬁcients of Group 3 of scenarios belonging to

Group 3 with n∈{50,100,200,300,400,500,1000}, homogeneous blocks, path coefﬁcients equal to

0.3 and exponential distributed data

Table 6 Values of RBias and RMSE for the loading (averages for each block) of the scenarios belonging to

Group 3 with n∈{50,100,200,300,400,500,1000}, homogeneous blocks, path coefﬁcients equal to 0.3

and Exponential distributed data (the levels of the varying factor, the sample size, are in the ﬁrst column)

BIAS RMSE

Block X1 Block X2 Block X3 Block X1 Block X2 Block X3

PLS-PM

n=50 0.033 0.034 0.034 0.051 0.047 0.045

n=100 0.037 0.039 0.037 0.039 0.040 0.039

n=200 0.038 0.038 0.037 0.037 0.037 0.036

n=300 0.038 0.038 0.038 0.036 0.036 0.036

n=400 0.038 0.038 0.038 0.036 0.036 0.036

n=500 0.039 0.038 0.038 0.036 0.035 0.035

n=1000 0.039 0.038 0.038 0.035 0.035 0.035

QC-PM

n=50 0.032 0.031 0.032 0.081 0.086 0.085

n=100 0.030 0.030 0.032 0.058 0.058 0.059

n=200 0.029 0.029 0.029 0.047 0.046 0.046

n=300 0.029 0.030 0.030 0.040 0.040 0.041

n=400 0.029 0.029 0.030 0.037 0.037 0.037

n=500 0.031 0.029 0.030 0.036 0.035 0.036

n=1000 0.030 0.029 0.029 0.032 0.031 0.031

For each scenario, results for PLS-PM and QC-PM estimated at θ=0.5areshown

123

Quantile composite-based path modeling

Table 7 Values of RBias and RMSE for the Path coefﬁcients of the scenarios belonging to Group 3 with n∈

{50,100,200,300,400,500,1000}, homogeneous blocks, path coefﬁcients equal to 0.3 and Exponential

distributed data (the levels of the varying factor, the sample size, are in the ﬁrst column)

             BIAS                          RMSE
             β21       β31       β32       β21       β31       β32
PLS-PM
  n=50       −0.007    −0.032    0.012     0.139     0.148     0.144
  n=100      −0.035    −0.028    −0.061    0.108     0.111     0.114
  n=200      −0.073    −0.040    −0.057    0.080     0.079     0.082
  n=300      −0.055    −0.054    −0.067    0.066     0.067     0.069
  n=400      −0.070    −0.054    −0.045    0.058     0.057     0.058
  n=500      −0.055    −0.054    −0.055    0.053     0.053     0.053
  n=1000     −0.070    −0.055    −0.051    0.041     0.039     0.038
QC-PM θ=0.25
  n=50       −0.349    −0.376    −0.348    0.163     0.163     0.160
  n=100      −0.406    −0.403    −0.395    0.150     0.149     0.150
  n=200      −0.430    −0.432    −0.410    0.144     0.144     0.139
  n=300      −0.436    −0.430    −0.426    0.141     0.140     0.138
  n=400      −0.443    −0.430    −0.435    0.139     0.136     0.137
  n=500      −0.428    −0.435    −0.437    0.134     0.137     0.136
  n=1000     −0.449    −0.436    −0.429    0.137     0.134     0.132
QC-PM θ=0.50
  n=50       0.039     −0.001    0.049     0.175     0.169     0.164
  n=100      −0.010    −0.013    −0.052    0.133     0.124     0.129
  n=200      −0.045    −0.031    −0.048    0.092     0.085     0.088
  n=300      −0.035    −0.065    −0.064    0.080     0.075     0.076
  n=400      −0.067    −0.055    −0.059    0.066     0.065     0.065
  n=500      −0.048    −0.052    −0.062    0.062     0.059     0.061
  n=1000     −0.071    −0.055    −0.059    0.047     0.042     0.042
QC-PM θ=0.75
  n=50       0.446     0.303     0.401     0.277     0.247     0.273
  n=100      0.358     0.349     0.273     0.217     0.210     0.193
  n=200      0.351     0.347     0.300     0.171     0.169     0.159
  n=300      0.363     0.322     0.295     0.154     0.146     0.143
  n=400      0.335     0.307     0.329     0.140     0.130     0.133
  n=500      0.370     0.327     0.304     0.139     0.128     0.124
  n=1000     0.357     0.330     0.315     0.124     0.115     0.111
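As a reading aid for the simulation tables, the following Python sketch computes the two summary measures over a set of Monte Carlo replications. It assumes the usual definitions, RBias = (mean estimate − true value) / true value and RMSE = square root of the mean squared error; these definitions, the function names and the toy numbers are assumptions made for illustration and are not taken from the simulation code of the paper.

```python
import math

def rbias(estimates, true_value):
    """Relative bias: (mean estimate - true value) / true value (assumed definition)."""
    mean_est = sum(estimates) / len(estimates)
    return (mean_est - true_value) / true_value

def rmse(estimates, true_value):
    """Root mean squared error of the estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def implied_sd(rbias_value, rmse_value, true_value):
    """Sampling SD implied by the decomposition RMSE^2 = bias^2 + variance."""
    bias = rbias_value * true_value
    return math.sqrt(max(rmse_value ** 2 - bias ** 2, 0.0))

# Hypothetical replications around a true path coefficient of 0.3.
toy = [0.25, 0.35, 0.30, 0.20, 0.40]
print(rbias(toy, 0.3), rmse(toy, 0.3))

# Table 7, QC-PM at theta = 0.25, n = 1000, beta21: RBias = -0.449, RMSE = 0.137.
print(implied_sd(-0.449, 0.137, 0.3))
```

Under these assumed definitions, the last call returns a sampling standard deviation of about 0.025, which suggests that the RMSE reported at θ = 0.25 is driven almost entirely by the bias term rather than by the variability of the estimates.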


Fig. 10 Theoretical model for Chronic Kidney Disease data, following Wang et al. (2016)

not available. Even if we use artiﬁcial data, the involved variables and their relations

are in line with the study by Wang et al., allowing a clear practical interpretation of

results. Both studies are quantile based, even if Wang et al. exploited factor-based

SEM while we focus on composite-based path modeling.

The main objective of this section is to show QC-PM in action, stressing its comple-

mentarity with the traditional methods for composite-based path modeling (PLS-PM),

which focus only on conditional means. Using artificial data allows us to obtain a
scenario where relationships among variables change with quantiles (i.e., the relations
differ across the different parts of the dependent variable distributions). Our main
objective was not to recover parameters, but to evaluate whether QC-PM is able to
detect this heterogeneity in the variable relationships.

5.1 Data description

This application aims to study the effect of some risk factors on CKD. We started

from the original path model in Wang et al. (2016) and removed the non-significant

predictors. In particular, the study investigates Type 2 diabetic patients who might have

experienced CKD. Data consist of 300 patients. Diagnosis and staging of CKD were

based on urinary albumin-creatinine ratio (ACR) and estimated glomerular ﬁltration

rate (eGFR). These two variables are the MVs of the outcome block named Kidney

disease. The considered risk factors were Blood pressure and Lipid. The former was

measured by systolic blood pressure (SBP) and diastolic blood pressure (DBP), while

the latter by total cholesterol (TC), high-density lipoprotein (HDL), and triglycerides

(TG). Therefore, the inner model underpinning our design and subsequent analyses

consists of two exogenous constructs, Blood pressure (ξ1) and Lipid (ξ2), and one

endogenous construct, Kidney disease (ξ3). Figure 10 depicts the corresponding path

diagram.

The data generation process exploits the classical covariance-based approach for SEM.
As mentioned above, the results of the original study by Wang et al. (2016) represent the


Fig. 11 Theoretical models for the simulated data: patients with low (a) and high (b) severity of kidney

disease

starting point for the generation of artiﬁcial data. Therefore, the parameters of the SEM

are set to the values of the model estimated in that study. In particular, Blood pressure

was positively correlated with the severity of Kidney disease, and the correlation was

stronger for higher quantiles. Lipid was found to be positively correlated with Kidney

disease and, also for this variable, the correlation was stronger for higher quantiles.

The resulting variance–covariance matrix characterizes the multivariate distribution

used to generate data. Data were generated using the software EQS 6.1 (Bentler 2006),
while computation and analysis were carried out in R (R Core Team 2020).

Heterogeneity in the inner model was introduced in the artiﬁcial data assuming that

the exogenous constructs exert a different effect on the different parts of the endogenous
construct distribution. As a result, the path coefficients differ across quantiles. In

order to generate data with these features, we supposed that two different populations

exist, and for each population the model parameters are different. In particular, we
divided the patients into two groups. The first group was represented by patients with low

severity of kidney disease, and thus the relationship between kidney disease and each of

the two exogenous constructs is weaker (Fig. 11a). The second group was represented

by patients with high severity of kidney disease: in such a case, the relationship between

kidney disease and each exogenous construct is stronger (Fig. 11b). In order to focus

only on heterogeneity in the inner model, as in Wang et al. (2016), the loadings between

constructs and the corresponding MVs were set all equal to 1 for both the populations.

The simulation procedure was articulated in the following three steps:

1. data were generated from a multivariate normal population, X ∼ N(0, Σ), where Σ
is the population covariance matrix using the values in Fig. 11a for the parameters

of the model. The sample size was set equal to 300. For each of the two MVs of

Kidney disease block (ACR and eGFR), we removed the observations higher than

the quantile 0.6 of the same MV. In other words, once the observations were sorted

in non decreasing order with respect to the values on each MV, we kept the ﬁrst 60%

of observations, i.e. the ﬁrst 180 observations. Then, the MVs of the endogenous

block were transformed in order to have realistic values ranging from 1 to 6 for

ACR, and from 50 to 90 for eGFR;

2. data were generated from a multivariate normal population, X ∼ N(0, Σ), where Σ
is the population covariance matrix using the values in Fig. 11b for the parameters
of the model. The sample size was set again equal to 300. For each of the


Table 8 Check for block unidimensionality and internal consistency

Block            MVs   Eig. 1st   Eig. 2nd   C.alpha   DH.rho
Blood pressure   2     1.73       0.266      0.847     0.99
Lipid            3     2.34       0.359      0.860     0.99
CKD              2     1.79       0.205      0.886     0.99

two MVs of the Kidney disease block, we kept the central 40% of observations
around the MV mean (120 units), namely 20% in the left neighborhood of the
mean and the other 20% in its right neighborhood. The two resulting MVs were
transformed so as to have values ranging from 6 to 10 for ACR, and from 90 to 120
for eGFR;

3. the two data sets generated at the previous steps were stacked, obtaining a single

data set with sample size 300. Note that the MVs of the exogenous blocks in the

two models come from the same population, while obviously the same does not

hold for the MVs of the endogenous block.

According to such data generation process, we expect that QC-PM provides esti-

mates for the parameters of model (a) for quantiles smaller than 0.6, and estimates for

the parameters of model (b) for quantiles larger than 0.6.
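The three steps above can be sketched in Python as follows. This is a deliberately simplified illustration, not the EQS-based procedure used for the application: it uses a single exogenous score and a single outcome MV instead of the full blocks, and the path values 0.2 and 0.6, the seed and all function names are hypothetical choices (the actual parameter values are those shown in Fig. 11).

```python
import random

def simulate_group(n, beta, rng):
    """Draw (x, y) pairs from a standard bivariate normal with corr(x, y) = beta."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta * x + (1.0 - beta ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((x, y))
    return pairs

def rescale(values, lo, hi):
    """Linearly map values onto the interval [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

def keep_lowest(pairs, share):
    """Step 1: keep the given share of observations with the smallest outcome values."""
    ordered = sorted(pairs, key=lambda p: p[1])
    return ordered[: round(share * len(pairs))]

def keep_central(pairs, share):
    """Step 2: keep share/2 of the observations on each side of the outcome mean."""
    ordered = sorted(pairs, key=lambda p: p[1])
    mean_y = sum(p[1] for p in ordered) / len(ordered)
    split = sum(1 for p in ordered if p[1] < mean_y)  # index of the first value not below the mean
    half = round(share * len(pairs) / 2)
    return ordered[max(split - half, 0): split + half]

rng = random.Random(2021)
low = keep_lowest(simulate_group(300, 0.2, rng), 0.60)    # model (a): weaker path (hypothetical value)
high = keep_central(simulate_group(300, 0.6, rng), 0.40)  # model (b): stronger path (hypothetical value)

acr_low = rescale([y for _, y in low], 1.0, 6.0)     # low-severity range for ACR
acr_high = rescale([y for _, y in high], 6.0, 10.0)  # high-severity range for ACR

# Step 3: stack the two groups into one data set.
stacked = list(zip([x for x, _ in low + high], acr_low + acr_high))
print(len(low), len(high), len(stacked))
```

In the stacked data the low-severity group occupies the lower 60% of the outcome distribution, which is why the threshold θ = 0.6 is the natural reference point when reading the quantile-specific estimates.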

5.2 Results

This section describes a complete application of QC-PM, from the preliminary analysis

to the evaluation of the goodness of ﬁt. The aim is to illustrate the potential of the

method along with the guidelines for the interpretation of the results.

An initial inspection of unidimensionality and internal consistency of blocks was

performed. To check unidimensionality, we carried out a principal component analysis

for each block of MVs. If a block is unidimensional, the ﬁrst eigenvalue is expected

to be the only one greater than 1 and much higher than the second one.

The internal consistency of each block of MVs was instead evaluated through
Cronbach's α. This index assumes equal population covariances among the indicators
of a block, an assumption that is unlikely to be met in empirical research. However, the
index can be used as a lower bound for reliability (Benitez et al. 2020). We also considered
Dijkstra–Henseler's ρ (Dijkstra and Henseler 2015) to evaluate composite reliability.
Table 8 shows that all the blocks are unidimensional and internally consistent. The
method used to obtain the artificial data in Sect. 5 provides equal loadings and therefore
the values of Dijkstra–Henseler's ρ (DH.rho) are all equal to 0.99.
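The two preliminary checks can be illustrated with a small Python sketch. For a standardized block of two MVs the eigenvalues of the correlation matrix are simply 1 + |r| and 1 − |r|, and Cronbach's α follows from the item and total-score variances. The toy blood pressure values and the function names below are hypothetical and serve only to show the computations; they are not the data of the application.

```python
import statistics

def correlation(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ssu = sum((a - mu) ** 2 for a in u)
    ssv = sum((b - mv) ** 2 for b in v)
    return cov / (ssu * ssv) ** 0.5

def two_mv_eigenvalues(u, v):
    """Eigenvalues of the 2 x 2 correlation matrix of a two-MV block."""
    r = abs(correlation(u, v))
    return 1.0 + r, 1.0 - r

def cronbach_alpha(block):
    """Cronbach's alpha for a block given as a list of MV columns."""
    k = len(block)
    item_var = sum(statistics.variance(col) for col in block)
    totals = [sum(row) for row in zip(*block)]
    return k / (k - 1) * (1.0 - item_var / statistics.variance(totals))

# Hypothetical measurements for a two-MV block (systolic and diastolic pressure).
sbp = [120.0, 130.0, 125.0, 140.0, 150.0, 135.0]
dbp = [80.0, 85.0, 82.0, 90.0, 95.0, 88.0]

first, second = two_mv_eigenvalues(sbp, dbp)
print(first, second, cronbach_alpha([sbp, dbp]))
```

The same identity explains why the two eigenvalues of each two-MV block in Table 8 add up to 2 apart from rounding (for instance 1.73 + 0.266 for Blood pressure).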

Model parameters were estimated through PLS-PM and QC-PM by setting the

quantile in the iterative procedure to the median and considering a dense grid of

quantiles in the inner model. The two panels in Fig. 12 show the different QC-PM

path coefﬁcient estimates across quantiles and the PLS-PM path coefﬁcient estimates:

Fig. 12a depicts the path coefﬁcient connecting Blood pressure to Kidney disease,

while Fig. 12b refers to Lipid. In particular, quantiles are represented on the horizontal

axis and coefﬁcients on the vertical axis. The horizontal solid lines represent the PLS-


Fig. 12 Path coefficient estimates (y-axis) across quantiles (x-axis) linking Blood pressure (a) and Lipid (b) to Kidney disease. The horizontal solid lines represent the PLS-PM estimates and the vertical dotted line drawn at quantile 0.6 refers to the threshold used in the data generation process

Table 9 Path coefficients and corresponding standard errors from a classical PLS-PM (first row) and from a QC-PM applied on the inner model for a selected set of quantiles (θ ∈ {0.25, 0.50, 0.75})

Quantile   Blood pressure (±standard error)   Lipid (±standard error)
PLS-PM     0.320* (±0.051)                    0.426* (±0.048)
0.25       0.189* (±0.062)                    0.263* (±0.096)
0.50       0.291* (±0.090)                    0.547* (±0.101)
0.75       0.416* (±0.051)                    0.588* (±0.053)

*p < 0.05

PM estimates while the broken lines represent the QC-PM estimates over quantiles.

The vertical dotted line drawn at quantile 0.6 in each figure refers to the threshold used
in the data generation process (we expect that, for quantiles smaller than 0.6, QC-PM
produces estimates of the parameters specified in the model shown in Fig. 11a,
while for quantiles larger than 0.6 it produces estimates of the parameters
specified in the model shown in Fig. 11b). Figure 12 shows the ability of QC-PM
to detect the structure underlying the simulated data (Fig. 11). QC-PM was indeed

able to distinguish the different effects in the different parts of the Kidney disease

distribution: both path coefﬁcients increase with quantiles and results are consistent

with the true values specified in the population models shown in Fig. 11. Table 9 reports

the path coefﬁcient estimates (±standard errors) obtained using PLS-PM (ﬁrst row)

and QC-PM for the quantiles θ∈{0.25,0.50,0.75}. Standard errors of estimates were

obtained using bootstrap. All path coefﬁcients were statistically signiﬁcant. On the

whole, except for coefﬁcients at θ=0.75, PLS-PM estimates are slightly more efﬁcient

than QC-PM ones. This is in line with theory: just as the sample mean is more efficient than
the sample median under normality, OLS regression estimates are usually more efficient than QR estimates. Both

Blood pressure and Lipid have a positive impact on Kidney disease, which increases for


patients with higher levels of CKD. This positive and increasing effect is well known
in the literature: hypertension and high levels of cholesterol, lipoprotein and triglycerides

are all considered leading causes of CKD (Bakris and Ritz 2009).
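Since the standard errors in Table 9 come from the bootstrap, a minimal Python sketch of a bootstrap standard error may help fix ideas. The text does not detail the resampling scheme, so the sketch assumes a nonparametric bootstrap of observation pairs and uses an ordinary least squares slope as a stand-in for a path coefficient; the data, the number of resamples and the function names are all hypothetical.

```python
import random
import statistics

def ols_slope(x, y):
    """Least squares slope of y on x (stand-in for a path coefficient)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def bootstrap_se(x, y, estimator, n_boot=500, seed=1):
    """Standard deviation of the estimator over resamples of (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        draws.append(estimator([x[i] for i in idx], [y[i] for i in idx]))
    return statistics.stdev(draws)

# Hypothetical scores with a true slope of 0.3.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.3 * a + rng.gauss(0.0, 1.0) for a in x]

slope = ols_slope(x, y)
se = bootstrap_se(x, y, ols_slope)
print(round(slope, 3), round(se, 3), round(slope / se, 2))
```

The same function accepts any estimator with the signature `estimator(x, y)`, so a quantile-specific coefficient could be plugged in instead of the least squares slope. Under a normal approximation, the ratio of an estimate to its bootstrap standard error can be compared with 1.96; the smallest such ratio in Table 9 is 0.263/0.096 ≈ 2.74.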

The assessment of the models is carried out using the measures introduced in Sect.

3. It is worth recalling that a direct comparison of the measures of fit of the two

methods is not appropriate, since the two methods optimize different criteria. Hence,

the objective is neither to compare PLS-PM and QC-PM results nor to identify the

best model. Instead, we aim to illustrate how to use the measures defined above for the

assessment of QC-PM results.

With respect to the inner model, PLS-PM produces an R² equal to 0.214, while
QC-PM provides pseudo-R² values increasing from lower to higher quantiles (0.051,
0.123 and 0.185 for θ ∈ {0.25, 0.50, 0.75}, respectively). This result was expected and
is coherent with the data structure, as relationships among constructs strengthen with
quantiles. The assessment of the outer model is carried out in two steps. Table 10 shows,

for each block, the communality values related to each MV and to the whole block

(in bold) both for PLS-PM and QC-PM. For the latter, obviously, estimates refer only

to the median because, as speciﬁed in Sect. 2.3, quantiles are allowed to vary only in

the inner model. Overall, the communality of blocks is satisfactory. From the average

communality of each block (last row in each block—values in bold), each construct

explains much of the variability of its own MVs. Considering the individual
communality of each MV, we did not find marked differences within blocks, consistent with the way data
were generated (i.e., all loadings are equal). The global communalities are satisfactory

showing a good ﬁt of the outer model.
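The pseudo-R² values quoted for the inner model can be read through the following Python sketch. It assumes the definition of Koenker and Machado (1999), that is, one minus the ratio between the check loss of the fitted quantile model and the check loss of the intercept-only model at the same quantile. The toy data and the function names are hypothetical.

```python
def check_loss(residuals, theta):
    """Sum of asymmetrically weighted absolute residuals at quantile theta."""
    return sum(theta * r if r >= 0 else (theta - 1.0) * r for r in residuals)

def pseudo_r2(y, fitted, theta):
    """Pseudo-R2 at quantile theta (assumed Koenker-Machado definition).

    `fitted` is meant to hold the fitted values of a quantile regression at theta.
    """
    full = check_loss([a - f for a, f in zip(y, fitted)], theta)
    # The intercept-only loss is minimised at a sample theta-quantile,
    # which can always be taken among the observed values.
    null = min(check_loss([a - q for a in y], theta) for q in y)
    return 1.0 - full / null

y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pseudo_r2(y, [2.0, 2.0, 3.0, 4.0, 4.0], 0.5))  # partially informative fit
print(pseudo_r2(y, [3.0] * 5, 0.5))                   # no better than the median
print(pseudo_r2(y, y, 0.5))                           # perfect fit
```

Because the measure is local to the chosen quantile, its values at different θ are not expected to coincide with one another or with the least squares R².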

Finally, Redundancy values are reported in Table 11, with PLS-PM in the first column
and QC-PM in the subsequent columns. Results reveal a low ability of predictor

constructs to explain the variability of the outcome MVs for low quantiles, while

redundancies achieved almost moderate levels for high quantiles and for PLS-PM

(Latan and Ramli 2013; Latan and Ghozali 2015).
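The link between the reported measures can be checked with a few lines of Python. The sketch assumes the usual PLS-PM definition of redundancy as the product of the communality of an MV and the R² (or pseudo-R²) of its endogenous construct; the input values are the communalities and the inner-model fit measures reported in this section, while the dictionary layout and function name are illustrative.

```python
# Communalities of the outcome MVs (PLS-PM and QC-PM at theta = 0.5).
communality = {
    "PLS-PM": {"ACR": 0.903, "eGFR": 0.892},
    "QC-PM": {"ACR": 0.713, "eGFR": 0.692},
}
# R2 for PLS-PM and pseudo-R2 for QC-PM at the three quantiles.
fit = {"PLS-PM": 0.214, 0.25: 0.051, 0.50: 0.123, 0.75: 0.185}

def redundancy(comm, r_squared):
    """Redundancy of an MV: communality times the (pseudo-)R2 of its construct."""
    return comm * r_squared

for mv in ("ACR", "eGFR"):
    row = [redundancy(communality["PLS-PM"][mv], fit["PLS-PM"])]
    row += [redundancy(communality["QC-PM"][mv], fit[theta]) for theta in (0.25, 0.50, 0.75)]
    print(mv, [round(value, 3) for value in row])
```

Under this assumption the products for ACR round to 0.193 for PLS-PM and to 0.036, 0.088 and 0.132 for QC-PM, and the product for eGFR at θ = 0.5 rounds to 0.085. Only the median communalities enter the QC-PM columns, because quantiles vary in the inner model alone.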

Table 10 Communalities for PLS-PM and QC-PM (θ = 0.50)

Construct        MV              PLS-PM   QC-PM at θ = 0.5
Blood pressure   SBP             0.907    0.575
                 DBP             0.821    0.683
                 Block average   0.864    0.629
Lipid            TC              0.780    0.490
                 HDL             0.809    0.584
                 TG              0.745    0.534
                 Block average   0.778    0.536
CKD              ACR             0.903    0.713
                 eGFR            0.892    0.692
                 Block average   0.897    0.703
Global                           0.846    0.624


Table 11 Redundancy measures for PLS-PM and QC-PM applied on a selected set of quantiles (θ ∈ {0.25, 0.50, 0.75})

                  QC-PM
       PLS-PM     θ = 0.25   θ = 0.50   θ = 0.75
ACR    0.193      0.036      0.088      0.132
eGFR   0.191      0.035      0.085      0.128
CKD    0.192      0.036      0.086      0.130

6 Conclusions and insights for future works

The original proposal of QC-PM was presented for the ﬁrst time at the 8th Interna-

tional Symposium on PLS and Related Methods (PLS’14) which took place in 2014

in Paris (www.pls14.org). The method was proposed with the aim to extend classical

least squares methods for conditional mean to the estimation of conditional quantile

functions in the context of composite-based path modeling. QC-PM complements the

well-known and consolidated PLS-PM by exploring heterogeneous effects of explana-

tory constructs over the entire conditional distributions of the response constructs.

The present paper has formalized QC-PM and the iterative procedure for parameter

estimation, starting from the simplest case of one block of MVs and moving until the

general path model for multi-block data. In addition, a methodological variation in the

estimation phase of the outer model is also proposed. The practical potential of

QC-PM, along with guidelines for the interpretation of results, were provided through

the analysis of an artiﬁcial data set on CKD in diabetic patients. The example highlights

how QC-PM can complement traditional methods for composite-based path modeling

in presence of heterogeneity in the relationships among variables. The properties of

the method across different scenarios were investigated through a simulation study.

The simulation design took into account the factors that typically affect the results

of composite-based path modeling methods: sample size, strength of the relationship

within the blocks (homogeneous vs heterogeneous blocks), different levels of correla-

tions between constructs and shape of distributions in the outcome blocks. Data were

generated from composite-based populations. The comparison among the different

scenarios was carried out in terms of RBias and RMSE of estimates obtained from

500 replications for each scenario. Several similarities between QC-PM and PLS-PM

emerged comparing the performance of the two methods in all generated scenarios.

Nevertheless, some differences were identified. It is worth recalling, however, that the

spirit of the simulation study is to show the properties of QC-PM rather than to pro-

vide a comparison with PLS-PM. In fact, QC-PM and PLS-PM are not alternative but

complementary methods.

Simulations point out similar results for QC-PM and PLS-PM, both in terms of

bias and RMSE. This conﬁrms our insight to consider QC-PM as a supplementary

method to PLS-PM, with similar features but able to assess relationships between

variables in different parts of the distribution. However, it is noted that variability of the

QC-PM estimates is always greater even though the bias is smaller (the true population

parameter is always within the 90% range of the central values for large samples). Even

if the convergence of estimates is conﬁrmed as the sample size increases in the case of


heterogeneous blocks, both RBias and RMSE are always higher than the homogeneous

case. The new element that emerges in the case of an asymmetric distribution is the

ability of QC-PM to capture the positive skewness of the distribution used to generate

the data. The variability of the path coefficient estimates is smaller for θ = 0.25 and
larger in the right tail (θ = 0.75), and the parameter is underestimated at the lowest
quantile and overestimated at the highest quantile, as the signs of the RBias values in Table 7 show.

From a methodological point of view, a promising extension of QC-PM will accom-

modate the case of observed or unobserved heterogeneity among observations. In the

PLS-PM literature several contributions allow both kinds of heterogeneity to be treated
(Sarstedt et al. 2016, 2011b; Lamberti et al. 2016; Sarstedt et al. 2011a; Esposito

Vinzi et al. 2008). In the QR literature, Davino and Vistocco (2018) proposed an

innovative approach to identify group effects through a quantile regression model.

Future studies will be devoted to combining these approaches into QC-PM. Moreover,

since a recent work by Davino et al. (2020) exploited the ability of QC-PM for in-

sample prediction, future research will further evaluate the proposed approach from

an out-of-sample prediction perspective.

A further development, albeit a minor one, will consider the implementation of

another way of calculating outer weights based on a measure of quantile correlation.

Several contributions in the literature extend the first proposal of quantile correlation
(Li et al. 2014), introducing different alternatives to measure the linear correlation
between any two random variables for a given quantile (Tang et al. 2021; Xu et al.
2020). The introduction of a descriptive measure such as quantile correlation into

the process of calculating outer weights would have an interesting computational

advantage over traditional modes requiring the estimation of regression models.

Funding Open access funding provided by Università degli Studi di Napoli Federico II within the CRUI-CARE Agreement.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which

permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give

appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,

and indicate if changes were made. The images or other third party material in this article are included

in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If

material is not included in the article’s Creative Commons licence and your intended use is not permitted

by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the

copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Quantile Regression was proposed by Koenker and Bassett (1978) as a complementary

and robust approach to classical regression analysis. In their seminal paper, the authors

recall that “in statistical parlance the term robustness has come to connote a certain

resilience of statistical procedures to deviations from the assumptions of hypothetical

models” and that the need for robust statistics alternative to least squares estimation

dates back to the nineteenth century.

Just one year earlier, Mosteller and Tukey (1977) sensed the need to identify more

robust regression methods by stating that “What the regression curve does is give a


Fig. 13 Left-hand side: scatterplot of the simulated data (x against y), with the ordinary least squares regression line (solid line) and the quantile lines (dashed lines). Right-hand side: QR slopes (coefficients on the vertical axis) across quantiles (horizontal axis); the dashed horizontal line corresponds to the OLS slope

grand summary for the averages of the distributions corresponding to the set of X’s.

We could go further and compute several different regression curves corresponding to

the various percentage points of the distributions and thus get a more complete picture

of the set. Ordinarily this is not done, and so regression often gives a rather incomplete

picture. Just as the mean gives an incomplete picture of a single distribution, so the

regression curve gives a correspondingly incomplete picture for a set of distributions”.

Quantile Regression (QR) was introduced precisely for the purpose of going beyond

the study of average effects in a regression model and to provide a description of the

whole conditional distribution of a response variable in terms of a set of regressors. QR

can be exploited in case of location, scale and shape shifts on the dependent variable but

also when a monotone transformation of the response and/or the explanatory variables

is advisable.

In order to show QR in action, we introduce an example dealing with a sample of

n=10,000 observations generated from the following model:

$$y = 1 + 2x + (1 + x)\,e, \qquad x \sim N(10;\,1), \qquad e \sim N\!\left(-1 + 20x;\; e^{x/3}\right).$$

The scatterplot in Fig. 13 (left-hand side) shows the typical fan-like shape, a sign that
the variability around the expected value of the dependent variable for a given
value of the explanatory variable x is not the same at every level of x, but varies
systematically with the level of x.

Estimating a least-squares regression model (solid line in Fig. 13) would not fully

capture the relationship between the two variables. In such a case, it is more interest-

ing to explore the effect of the regressor at different parts of the distribution of the

dependent variable.


A QR model for a given conditional quantile and for the ith observation can be
formulated as follows:

$$y_i = x_i \beta(\theta) + \varepsilon_i(\theta), \qquad Q_\theta(\hat{y} \mid X) = X \hat{\beta}(\theta),$$

where X is the regressor matrix and x_i a row of this matrix, y the vector containing the
dependent variable, 0 < θ < 1 is a generic quantile, Q_θ(·|·) denotes the conditional
quantile function for the θth quantile and ε is the error term such that Q_θ(ε(θ)|X) = 0.

QR offers a complete view of a response variable providing a method for mod-

elling the rates of changes at multiple points (conditional quantiles) of its distribution

(Koenker 2005; Davino et al. 2013) without requiring assumptions on the errors.

Although different functional forms can be used, we will deal here only with simple

linear regression models.

For each quantile, a regression line is estimated and, as a consequence, the estimated
values of the response variable, conditional on given values of the regressors, provide
the conditional quantiles of the dependent variable.

Going back to the example with simulated data, ﬁve QR lines have been estimated

considering the following set of θ values: 0.1, 0.25, 0.5, 0.75, 0.9. The dashed lines in
Fig. 13 (left-hand side) are obtained by estimating the effect of x on the selected
conditional quantiles of y. They confirm the positive impact of the regressor and show
that this contribution increases as we move towards the upper part of the distribution of y.

In Fig. 13 (right-hand side), the slopes of the estimated quantile lines are graphically

represented. The horizontal axis displays the different quantiles while the effect of the

regressor is represented on the vertical axis. The dashed line parallel to the horizontal

axis corresponds to the ordinary least squares coefficient. This graphical representation
makes it easy to see how the effect of the regressor on y changes across quantiles.

The parameter estimates in QR linear models have the same interpretation as those

of any other linear model. The intercept measures the value of the θth conditional quantile
of the dependent variable when all the regressors are set to zero. Each slope coefficient
$\hat{\beta}_i(\theta) = \partial Q_\theta(y \mid X) / \partial x_i$ can be
interpreted as the rate of change of the θth conditional quantile of the dependent

variable per unit change in the value of the ith regressor, holding constant the other

regressors.

The conditional quantile estimator is obtained as a generalisation of the uncondi-

tional quantile estimator:

$$\hat{\beta}(\theta) = \operatorname*{argmin}_{\beta} \; E\left[\rho_\theta(y - X\beta)\right] \qquad (21)$$

where ρ_θ(·) is an asymmetric absolute loss function which uses an unbalanced weighting
system (weight equal to 1 − θ for the sum of negative deviations and weight equal
to θ for the sum of positive deviations).
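The asymmetric loss and the minimisation in (21) can be made concrete with a small Python sketch for the case of a single regressor. The sample analogue of (21) replaces the expectation with a sum over observations. Instead of a simplex-type algorithm, the sketch relies on a didactic brute-force search: since the problem is a linear program, a minimiser can always be found among the lines passing through two observations, so all such lines are enumerated. The function names and the deterministic toy data (not the simulated example of this appendix) are assumptions made for illustration, and the approach is only practical for small samples.

```python
from itertools import combinations

def check_loss(u, theta):
    """rho_theta(u): weight theta for positive deviations, 1 - theta for negative ones."""
    return theta * u if u >= 0 else (theta - 1.0) * u

def fit_simple_qr(x, y, theta):
    """Simple linear QR by enumerating the lines through pairs of observations."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(b - intercept - slope * a, theta) for a, b in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Deterministic fan-shaped toy data: at each x the responses are x * (2 + u).
x, y = [], []
for xv in range(1, 11):
    for u in (-2, -1, 0, 1, 2):
        x.append(float(xv))
        y.append(xv * (2.0 + u))

for theta in (0.1, 0.5, 0.9):
    intercept, slope = fit_simple_qr(x, y, theta)
    print(theta, intercept, slope)
```

In this toy example the spread of y grows with x, so the fitted slope increases with θ while the median line keeps the central slope, which mirrors the pattern described for the right-hand panel of Fig. 13.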

The most widespread algorithm for ﬁnding QR estimates is the one proposed in

Koenker and d’Orey (1987) as a variant of the Barrodale and Roberts (1974) simplex
algorithm. Although it is theoretically possible to estimate infinitely many quantiles, only a finite
number of them are numerically distinct in practice. This set of solutions is known as the quantile process. A fairly


accurate approximation of the whole quantile process can be obtained using a dense

grid of equally spaced quantiles in the unit interval (0, 1) (Davino et al. 2013).

References

Amato S, Esposito Vinzi V, Tenenhaus M (2004) A global goodness-of-ﬁt index for PLS structural equation

modeling. Oral Communication to PLS Club, HEC School of Management, France, March, p 24

Bakris GL, Ritz E (2009) The message for World Kidney Day 2009: hypertension and kidney disease: a

marriage that should be prevented. J Clin Hypertens 11(3):144–147

Barrodale I, Roberts FDK (1974) Solution of an overdetermined system of equations in the l1 norm. Commun

Assoc Comput Mach 17:319–320

Benitez J, Henseler J, Castillo A, Schuberth F (2020) How to perform and report an impactful analysis using

partial least squares: guidelines for confirmatory and explanatory IS research. Inf Manag 57(2):103168

Bentler PM (2006) EQS 6 structural equations program manual. Multivariate Software, Encino, CA

Chin WW (1998) The partial least squares approach to structural equation modeling. In: Marcoulides GA

(ed) Modern methods for business research. Erlbaum, Mahwah, pp 295–358

Davino C, Esposito Vinzi V (2016) Quantile composite-based path modelling. Adv Data Anal Classif

10(4):491–520

Davino C, Vistocco D (2018) Handling heterogeneity among units in quantile regression. Investigating the

impact of students’ features on University outcome. Stat Interface 11:541–556

Davino C, Furno M, Vistocco D (2013) Quantile regression: theory and applications. Wiley, Hoboken

Davino C, Esposito Vinzi V, Dolce P (2016) Assessment and validation in quantile composite-based path

modeling. In: Abdi H, Esposito Vinzi V, Russolillo G, Saporta G, Trinchera L (eds) The Multiple

facets of partial least squares methods, chapter 13. Springer proceedings in mathematics and statistics.

Springer, Berlin

Davino C, Dolce P, Taralli S (2017) Quantile composite-based model: a recent advance in PLS-PM. A

preliminary approach to handle heterogeneity in the measurement of equitable and sustainable well-

being. In: Latan H, Noonan R (eds) Partial least squares path modeling: basic concepts. Methodological

issues and applications. Springer, Cham, pp 81–108

Davino C, Dolce P, Taralli S, Esposito Vinzi V (2018) A quantile composite-indicator approach for the

measurement of equitable and sustainable well-being: a case study of the Italian provinces. Soc
Indic Res 136:999–1029

Davino C, Dolce P, Taralli S, Vistocco D (2020) Composite-based path modeling for conditional quantiles

prediction. An application to assess health differences at local level in a well-being perspective. Soc

Indic Res. https://doi.org/10.1007/s11205-020-02425-5

Dijkstra T, Henseler J (2015) Consistent partial least squares path modeling. MIS Q 39(2):297–316

Dolce P, Lauro CN (2015) Comparing maximum likelihood and PLS estimates for structural equation

modeling with formative blocks. Qual Quant 49(3):891–902

Efron B (1982) The jackknife, the bootstrap, and other resampling plans. SIAM, Philadelphia, p 38

Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman Hall, New York

Esposito Vinzi V, Russolillo G (2013) Partial least squares algorithms and methods. Wiley Interdiscip Rev

Comput Stat 5(1):1–19

Esposito Vinzi V, Trinchera L, Squillacciotti S, Tenenhaus M (2008) REBUS-PLS: a response-based pro-

cedure for detecting unit segments in PLS path modelling. Appl Stoch Models Bus Ind 24:439–458

Esposito Vinzi V, Chin WW, Henseler J, Wang H (eds) (2010) Handbook of partial least squares. Springer,

Berlin

Fleishman AI (1978) A method for simulating non-normal distributions. Psychometrika 43:521–532

Fornell C, Larcker DF (1981) Structural equation models with unobservable variables and measurement

error: algebra and statistics. J Mark Res 18(3):382–388

Furno M, Vistocco D (2018) Quantile regression: estimation and simulation. Wiley series in probability

and statistics. Wiley, Hoboken

Gould W (1997) sg70: interquantile and simultaneous-quantile regression. Stata Tech Bull 38:14–22

Hair JF, Ringle CM, Sarstedt M (2011) PLS-SEM: indeed a silver bullet. J Mark Theory Pract 19(2):139–152

Hair JF, Hult GTM, Ringle CM, Sarstedt M (2017) A primer on partial least squares structural equation

modeling (PLS-SEM), 2nd edn. Sage, Thousand Oaks


Hair JF, Risher JJ, Sarstedt M, Ringle CM (2019) When to use and how to report the results of PLS-SEM.

Eur Bus Rev 31(1):2–24

Hair JF, Howard MC, Nitzl C (2020) Assessing measurement model quality in PLS-SEM using confirmatory

composite analysis. J Bus Res 109:101–110

He XM, Zhu LX (2003) A lack-of-ﬁt test for quantile regression. J Am Stat Assoc 98:1013–1022

Henseler J, Ringle CM, Sarstedt M (2016) Testing measurement invariance of composites using partial least

squares. Int Mark Rev 33(3):405–431

Jöreskog KG (1978) Structural analysis of covariance and correlation matrices. Psychometrika 43(4):443–

477

Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge

Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46:33–50

Koenker R, Bassett G (1982) Robust tests for heteroscedasticity based on regression quantiles. Econometrica

50(1):43–61

Koenker R, d’Orey V (1987) Computing regression quantiles. Appl Stat 36:383–393

Koenker R, Machado JAF (1999) Goodness of ﬁt and related inference processes for quantile regression. J

Am Stat Assoc 94(448):1296–1310

Kováč Š, Želinský T (2013) Determinants of the Slovak enterprises profitability: quantile regression

approach. Statistika 93(3):41–55

Lamberti G, Aluja TB, Sanchez G (2016) The pathmox approach for PLS path modeling segmentation.

Appl Stoch Models Bus Ind 32(4):453–468

Latan H, Ghozali I (2015) Partial least squares: concepts, techniques and application using program Smart-

PLS 3.0, 2nd edn. Diponegoro University Press, Semarang

Latan H, Ramli NA (2013) The results of partial least squares-structural equation modeling analyses (PLS-

SEM). Available at SSRN. https://doi.org/10.2139/ssrn.2364191

Li G, Li Y, Tsai C (2014) Quantile correlations and quantile autoregressive modeling. J Am Stat Assoc

110(509):246–261

Lohmöller JB (1989) Latent variable path modeling with partial least squares. Physica-Verlag, Heidelberg

Mosteller F, Tukey J (1977) Data analysis and regression. Addison-Wesley, Reading, MA

OECD (2008) Handbook on constructing composite indicators: methodology and user guide. OECD, Paris

Parzen MI, Wei L, Ying Z (1994) A resampling method based on pivotal estimating functions. Biometrika

81:341–350

Paxton P, Curran PJ, Bollen KA, Kirby J, Chen F (2001) Monte Carlo experiments: design and

implementation. Struct Equ Model Multidiscip J 8(2):287–312

R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical

Computing, Vienna, Austria. https://www.R-project.org/

Rigdon EE (2013) Partial least squares path modeling. In: Hancock G, Mueller R (eds) Structural equation

modeling: a second course, 2nd edn. Information Age, Charlotte, pp 81–116

Rudis B (2019) ggeconodist: create diminutive distribution charts. R package version

Sarstedt M, Henseler J, Ringle CM (2011a) Multi-group analysis in partial least squares (PLS) path

modeling: alternative methods and empirical results. In: Sarstedt M, Schwaiger M, Taylor CR (eds) Advances

in international marketing, vol 22. Emerald, Bingley, pp 195–218

Sarstedt M, Becker J-M, Ringle CM, Schwaiger M (2011b) Uncovering and treating unobserved

heterogeneity with FIMIX-PLS: which model selection criterion provides an appropriate number of segments?

Schmalenbach Bus Rev 63:34–62

Sarstedt M, Ringle CM, Gudergan SP (2016) Guidelines for treating unobserved heterogeneity in tourism

research: a comment on Marques and Reis (2015). Ann Tourism Res 57:279–284

Schlittgen R (2019) R package cbsem: simulation, estimation and segmentation of composite based

structural equation models (version 1.0.0). https://cran.r-project.org/web/packages/cbsem/index.html

Schlittgen R, Sarstedt M, Ringle CM (2020) Data generation for composite-based structural equation

modeling methods. Adv Data Anal Classif. https://doi.org/10.1007/s11634-020-00396-6

Shmueli G, Sarstedt M, Hair JF, Cheah J-H, Ting H, Vaithilingam S, Ringle CM (2019) Predictive model

assessment in PLS-SEM: guidelines for using PLSpredict. Eur J Mark (forthcoming)

Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc Ser B 36:111–147

Tang W, Xie J, Lin Y, Tang N (2021) Quantile correlation-based variable selection. J Bus Econ Stat.

https://doi.org/10.1080/07350015.2021.1899932

Tenenhaus M (2008) Component-based structural equation modelling. Total Qual Manag Bus Excell

19:871–886

Tenenhaus M, Vinzi VE, Chatelin YM, Lauro C (2005) PLS path modeling. Comput Stat Data Anal 48(1):159–205

Vale C, Maurelli V (1983) Simulating multivariate nonnormal distributions. Psychometrika 48(3):465–471

Wang Y, Feng XN, Song XY (2016) Bayesian quantile structural equation models. Struct Equ Model

Multidiscip J 23(2):246–258

Wold H (1966a) Nonlinear estimation by iterative least squares procedures. Research Papers. Statistics 630

Wold H (1966b) Estimation of principal component and related models by iterative least squares. In:

Krishnaiah PR (ed) Multivariate analysis. Academic Press, New York, pp 391–420

Wold H (1975) From hard to soft modelling. In: Wold H (ed) Modelling in complex situations with soft

information. (Paper, Third World Congress on Econometrics; Toronto, Canada; 1975 August 21–26).

(Research Report 1975:5). University, Institute of Statistics, Göteborg, Sweden

Wold H (1982) Soft modeling: the basic design and some extensions. In: Jöreskog K, Wold H (eds) Systems

under indirect observation, vol 2. North-Holland, Amsterdam, pp 1–54

Wold H (1985) Partial least squares. In: Kotz S, Johnson NL (eds) Encyclopedia of statistical sciences, vol

6. Wiley, New York, pp 581–591

Xu C, Ke J, Zhao X, Zhao X (2020) Multiscale quantile correlation coefficient: measuring tail dependence

of financial time series. Sustainability 12(12):4908

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps

and institutional affiliations.
