Contents lists available at ScienceDirect
Information & Management
journal homepage: www.elsevier.com/locate/im
How to perform and report an impactful analysis using partial least squares:
Guidelines for confirmatory and explanatory IS research
Jose Benitez (a,b,⁎), Jörg Henseler (c,d), Ana Castillo (b), Florian Schuberth (c)
a Rennes School of Business, Rennes, France
b Department of Management, School of Business, University of Granada, Granada, Spain
c Department of Design, Production and Management, Faculty of Engineering Technology, University of Twente, Enschede, the Netherlands
d Nova Information Management School, Universidade Nova de Lisboa, Lisbon, Portugal
ARTICLE INFO
Keywords:
Partial least squares path modeling
Guidelines
Model validation
Composite model
Confirmatory and explanatory information systems research
ABSTRACT
Partial least squares path modeling (PLS-PM) is an estimator that has found widespread application for causal information systems (IS) research. Recently, the method has been subject to many improvements, such as consistent PLS (PLSc) for latent variable models, a bootstrap-based test for overall model fit, and the heterotrait-to-monotrait ratio of correlations for assessing discriminant validity. Scholars who would like to rigorously apply PLS-PM need updated guidelines for its use. This paper explains how to perform and report empirical analyses using PLS-PM including the latest enhancements, and illustrates its application with a fictive example on business value of social media.
1. Introduction
Structural equation modeling (SEM) has become an important statistical tool in social and behavioral sciences. It is capable of modeling nomological networks by expressing theoretical concepts through constructs and connecting these constructs via a structural model to study their relationships [1]. In doing so, random measurement errors can be taken into account and empirical evidence for postulated theories can be obtained by means of statistical testing.
Two kinds of estimators for SEM can be distinguished: covariance-based and variance-based estimators. While covariance-based estimators minimize the discrepancy between the empirical and model-implied variance–covariance matrix of the observable indicators to obtain the model parameter estimates, variance-based estimators create linear combinations of the indicators as stand-ins for the theoretical concepts and subsequently estimate the model parameters. A widely used variance-based estimator is partial least squares path modeling (PLS-PM). Originally developed by Herman O.A. Wold [2] to analyze high-dimensional data in a low-structure environment, PLS-PM has become a full-fledged estimator for SEM over the past decade [3]. Consequently, PLS-PM has been applied in various fields of business administration research such as strategy [4], marketing [5], operations management [6], human resource management [7], finance [8], tourism [9], and family business [10].
For decades, PLS-PM has been the predominant estimator for structural equation models in the field of information systems (IS) (e.g., [11–15]). IS research usually incorporates complex research problems and questions that require conceptualization and operationalization of theoretical concepts, and investigation of their relationships. Current literature suggests two types of theoretical concepts: concepts from behavioral sciences and concepts from design science [16]. Theoretical concepts from behavioral research are assumed to cause observable indicators and their relationships, i.e., the theoretical concept is the common cause of observable indicators [17]. Typically, these concepts are operationalized by a measurement model. Extant literature suggests two types of measurement models: the reflective [1] and the causal–formative measurement model [18]. Both types of measurement models assume a causal relationship between the indicators and their construct, i.e., the latent variable. In contrast, theoretical concepts from design science, so-called artifacts, are human-made creations that are shaped and built by their ingredients to serve a certain goal [19]. Due to the constructivist nature of this type of theoretical concept, recent literature suggests operationalizing artifacts by the composite model [16]. In contrast to the measurement models, in the composite model, the indicators do not cause the construct, but combine to compose the construct. To highlight this aspect and to emphasize the difference from the latent variable, we refer to constructs that are composed of indicators as emergent variables [20,21]. In summary, IS scholars can
https://doi.org/10.1016/j.im.2019.05.003
Received 24 October 2017; Received in revised form 29 April 2019; Accepted 18 May 2019
⁎
Corresponding author at: Rennes School of Business, Rennes, France.
Email addresses: jose.benitez@rennessb.com,joseba@ugr.es (J. Benitez), j.henseler@utwente.nl (J. Henseler), anacastillo@ugr.es (A. Castillo),
f.schuberth@utwente.nl (F. Schuberth).
Information & Management xxx (xxxx) xxx–xxx
0378-7206/ © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/BY/4.0/).
Please cite this article as: Jose Benitez, et al., Information & Management, https://doi.org/10.1016/j.im.2019.05.003
operationalize a theoretical concept in three different ways in their model: the reflective and the causal–formative measurement model (usually employed for behavioral concepts), and the composite model (usually employed for artifacts). See Table 1 in Henseler (2017) for a detailed explanation of the different types of operationalization.
In recent years, PLS-PM has become the subject of scholarly debate. Proponents called PLS-PM a “silver bullet” [38], while opponents criticized PLS-PM’s inconsistency for latent variable models and the absence of a test for overall model fit (e.g., [39]). This debate has stimulated the development of several enhancements to PLS-PM. These include consistent PLS (PLSc) to consistently estimate linear and nonlinear latent variable models [40,41]; a bootstrap-based test to statistically assess overall model fit [42]; measures of overall model fit, such as the standardized root mean squared residual (SRMR), based on heuristic rules to evaluate overall model fit [22]; and the heterotrait-to-monotrait (HTMT) ratio of correlations as a criterion to assess discriminant validity [35]. As a result, PLS-PM has become a full-fledged estimator for SEM that can deal with reflective and causal–formative measurement models as well as composite models. Moreover, it can be applied to confirmatory, explanatory, exploratory, descriptive, and predictive research [24].
For the field of IS to benefit from these methodological and conceptual achievements in PLS-PM, IS scholars need guidelines for their empirical studies that incorporate all these new developments and recently obtained insights. Some of the guideline papers on PLS-PM in the IS literature were published before 2013, i.e., before the debate and resulting enhancements (e.g., [12,43–45]). Although several recently published textbooks and articles (e.g., [46–48]) have provided guidelines for causal research that cover some of the latest enhancements in PLS-PM, none of these prior PLS-PM guidelines for causal research covered the full range of recent developments.
To fill this gap on guidelines in the current IS literature, this study provides updated guidelines for using PLS-PM in causal research (confirmatory and explanatory research), employing all the most recently proposed standards. In so doing, the paper addresses why and how to perform and report PLS-PM estimation in confirmatory and explanatory IS research. In confirmatory IS research, the scholar aims to understand the causal relationships between theoretical concepts of interest for the IS community. In doing so, the scholar aims to confirm a postulated theory, i.e., obtain empirical evidence for his/her description of the working mechanism of the world. This is attempted by imposing testable restrictions on the indicator variance–covariance matrix, e.g., by fixing path coefficients to a certain value; by assuming that the correlation between two indicators is the result of an underlying latent variable, as in the classical reflective measurement model; or, in the composite model, by assuming that the correlations of the indicators forming an emergent variable with a variable not forming the emergent variable are proportional. The dominant statistical tool in the context of confirmatory research is the test for overall model fit. Testing model fit only makes sense if the number of correlations among observable variables exceeds the number of model parameters to be estimated, i.e., it is indispensable to have a certain amount of parsimony (a positive number of degrees of freedom in the sense of SEM).
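The counting argument behind this requirement can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: indicators are standardized, so only the off-diagonal correlations carry information, and the number of free parameters is supplied by the analyst. The function names are hypothetical and do not belong to any PLS-PM software.

```python
def n_correlations(n_indicators: int) -> int:
    """Number of distinct off-diagonal correlations among the indicators."""
    return n_indicators * (n_indicators - 1) // 2


def degrees_of_freedom(n_indicators: int, n_free_parameters: int) -> int:
    """Available correlations minus the parameters that must be estimated."""
    return n_correlations(n_indicators) - n_free_parameters


def is_testable(n_indicators: int, n_free_parameters: int) -> bool:
    """Overall model fit can only be tested with positive degrees of freedom."""
    return degrees_of_freedom(n_indicators, n_free_parameters) > 0


# Illustrative case: one latent variable with four standardized indicators,
# so four loadings are free and six correlations are available.
print(degrees_of_freedom(4, 4), is_testable(4, 4))
# With three indicators and three loadings the model is saturated.
print(degrees_of_freedom(3, 3), is_testable(3, 3))
```

A saturated model (zero degrees of freedom) reproduces the correlations exactly, so a test of overall fit carries no information in that case.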
As in confirmatory research, in explanatory IS research, the analyst aims to understand the causal relationship among the theoretical concepts. However, this type of research primarily focuses on the explanation of a specific phenomenon which is treated as a dependent variable in the model. In doing so, the primary focus is on the
Table 1
Substantial changes in understanding of PLS-PM.

1. Traditional view of PLS-PM: PLS-PM should be used primarily for exploratory and early-stage research.
   Up-to-date view of PLS-PM: PLS-PM can be used for various types of research, e.g., confirmatory and explanatory or predictive [22–24].

2. Traditional view of PLS-PM: PLS-PM has advantages over covariance-based estimators when the sample size is small.
   Up-to-date view of PLS-PM: PLS-PM can produce estimates even for very small sample sizes. However, as for other estimators, these estimates are generally less accurate than those obtained by a larger sample [25,26]. Hence, the justification of using PLS-PM with small sample sizes should be considered cautiously.

3. Traditional view of PLS-PM: PLS-PM can only estimate recursive structural models.
   Up-to-date view of PLS-PM: PLS-PM can also consistently estimate nonrecursive structural models by using, e.g., 2SLS or 3SLS instead of OLS (Dijkstra and Henseler 2015b, [27]).

4. Traditional view of PLS-PM: Model identification plays no role when employing PLS-PM.
   Up-to-date view of PLS-PM: First, PLS-PM always estimates an underlying composite model, regardless of whether the model consists of latent variables; identification rules of composite models must thus be taken into account [28,134]. Consequently, model identification is also important in the case of PLS-PM.

5. Traditional view of PLS-PM: PLS-PM has greater statistical power than the maximum-likelihood (ML) estimator.
   Up-to-date view of PLS-PM: This statement is based on inconsistent parameter estimates and has been shown to be invalid [30]. Furthermore, an estimator has no statistical power (one refers to its efficiency, or accuracy, in estimating parameters, usually expressed by the standard error); only a statistical test can be assessed in terms of its statistical power.

6. Traditional view of PLS-PM: Mode A can be used to consistently estimate reflective measurement models.
   Up-to-date view of PLS-PM: Regardless of the mode used, PLS-PM creates linear combinations of observed indicators (composites) as proxies for the theoretical concepts [31,32]. Therefore, to consistently estimate models containing latent variables, one must correct for attenuation of the construct scores correlations. In the context of PLS-PM, this procedure is known as PLSc (Dijkstra and Henseler 2015a, 2015b).

7. Traditional view of PLS-PM: Mode B can be used to estimate causal–formative measurement models consistently.
   Up-to-date view of PLS-PM: Mode B is just another way to obtain weights to build composites; hence, it does not consistently estimate causal–formative measurement models. However, causal–formative measurement models can be estimated by means of a MIMIC model [3,18].

8. Traditional view of PLS-PM: The overall fit of models estimated by PLS-PM cannot be assessed.
   Up-to-date view of PLS-PM: The overall model can be assessed in two nonexclusive ways ([22], Dijkstra and Henseler 2015b): (1) bootstrap-based tests for overall model fit, and (2) measures of overall model fit. Both assess the discrepancy between the empirical and the model-implied indicator variance–covariance matrix. While the latter is based on heuristic rules, the former is based on statistical inferences.

9. Traditional view of PLS-PM: Reliability of construct scores obtained by PLS-PM should be assessed using Cronbach’s α and Dillon–Goldstein’s ρ (also called Jöreskog’s ρ or composite reliability).
   Up-to-date view of PLS-PM: Currently, Dijkstra–Henseler’s ρA is the only consistent reliability coefficient for PLS-PM construct scores [16]. Dillon–Goldstein’s ρ and Cronbach’s α indicate the reliability of sum scores. While Cronbach’s α is based on the indicator variance–covariance matrix, Dillon–Goldstein’s ρ is based on the factor loadings. Therefore, for the calculation of Dillon–Goldstein’s ρ, consistent factor loading estimates should be used. Moreover, Cronbach’s α assumes equal population covariances among the indicators of one block; an assumption that is likely not met in empirical research. However, it can be used as a lower bound to reliability [33,34].

10. Traditional view of PLS-PM: Discriminant validity should be examined using the Fornell–Larcker criterion.
    Up-to-date view of PLS-PM: The HTMT [35] should be considered to assess discriminant validity [36,37].
coefficient of determination (R²) and the significance of path coefficient estimates. Such models can be saturated (i.e., have zero degrees of freedom in the sense of SEM). Although the two types of research can be theoretically distinguished, in empirical IS research, scholars very often combine confirmatory and explanatory IS research, e.g., testing the measurement model (confirmatory research) and focusing on the explanation of a specific construct in the structural model (explanatory research). This paper explains why and how to perform and report empirical analyses using PLS-PM in causal IS research following the latest enhancements, and illustrates this analysis with a fictive example on business value of social media.
2. Foundations of PLS-PM
In its current form, PLS-PM is a full-fledged variance-based estimator for SEM that can estimate linear, nonlinear, recursive, and nonrecursive structural models [40,42]. Moreover, it is capable of dealing with models that contain emergent and latent variables [41], second-order emergent variables built by latent variables [49], and ordinal categorical indicators [29]. It can incorporate sampling weights, an approach known as weighted partial least squares (WPLS; see [50]), deal with correlated measurement errors within a block of indicators [51], and address multicollinearity among the constructs in the structural model [52]. It can also be used for multiple group comparison [53,54], and potential sources of endogeneity can be addressed [55]. Finally, importance-performance map analysis can be used to illustrate the results of the structural model [56]. For a recent overview of the methodological research on PLS-PM, we refer to [57].
2.1. Model specification
To employ PLS-PM, scholars must transfer their proposed theory into a statistical model [58]. In the context of SEM, this means that the theoretical concepts and their hypothesized relationships must be transferred into a structural model. “Theoretical concepts refer to ideas that have some unity or something in common. The meaning of a theoretical concept is spelled out in a theoretical definition” [59]. We distinguish between two types of theoretical concepts: behavioral concepts, and design concepts, so-called artifacts. Typically, theoretical concepts are represented by constructs in the structural model [32]. Although constructs and latent variables are often equated [60], we deliberately distinguish between a latent variable, i.e., a construct that represents a behavioral concept, and an emergent variable, i.e., a construct that represents an artifact. The operationalization of theoretical concepts, i.e., the specification of the theoretical concepts in the structural model, requires special attention because estimates are likely to be inconsistent if a concept’s operationalization is not in accordance with the concept’s nature [61].
PLS-PM can deal with two kinds of constructs: emergent variables and latent variables. Latent variables refer to variables that are not directly observed but instead inferred through a measurement model from other observed variables (directly measured; [46,62]). They usually represent theoretical concepts of behavioral research such as personality traits, individual behavior, and individual attitude [63]. This theoretical reasoning rests on the assumption that behavioral concepts of interest exist in nature, irrespective of scholarly investigation [64]. The existing literature proposes two ways to measure behavioral concepts [59]: the reflective and the causal–formative measurement model.
The reflective measurement model, also known as the common factor model, is grounded in the true score theory [65]. It assumes that a set of indicators is a measurement error–prone manifestation of an underlying latent variable [66]. Some indicators can thus be interchanged without altering the meaning of the latent variable. As the measurement errors of a block of indicators are usually assumed to be uncorrelated and independent of the latent variable, the reflective measurement model imposes restrictions on the variance–covariance matrix of indicators belonging to one latent variable. In its classical form, the correlations among the indicators of one block are zero when controlled for the latent variable, also known as the axiom of local independence [67]. This fact is typically exploited to draw conclusions about the existence of the latent variable.
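The restriction implied by local independence can be illustrated numerically. The following Python sketch assumes a single latent variable with standardized indicators, so that the model-implied correlation between two indicators is the product of their loadings; the loading values are made up for illustration and the function names are hypothetical.

```python
import numpy as np


def implied_correlation(loadings):
    """Indicator correlations implied by a one-factor model (standardized)."""
    lam = np.asarray(loadings, dtype=float)
    corr = np.outer(lam, lam)      # r_ij = lambda_i * lambda_j for i != j
    np.fill_diagonal(corr, 1.0)    # standardized indicators have unit variance
    return corr


def partial_correlation_given_factor(corr, loadings):
    """Indicator correlations after controlling for the latent variable."""
    lam = np.asarray(loadings, dtype=float)
    residual = np.asarray(corr, dtype=float) - np.outer(lam, lam)
    scale = np.sqrt(1.0 - lam ** 2)           # residual standard deviations
    partial = residual / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


loadings = [0.8, 0.7, 0.6]                    # illustrative values only
corr = implied_correlation(loadings)
print(np.round(partial_correlation_given_factor(corr, loadings), 10))
```

Under these assumptions all off-diagonal partial correlations vanish, which is the testable restriction the reflective measurement model places on the indicator correlations.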
Besides the reflective measurement model, the literature proposes the causal–formative measurement of behavioral concepts [68,69]. In contrast to the reflective measurement model, the causal–formative measurement model reverses the direction of causality between the indicators and the construct and assumes that the observed indicators cause the latent variable. This model thus does not restrict the covariances of the indicators belonging to one block. The remaining causes not represented by the indicators are captured in an error term, which is by assumption uncorrelated with the causal indicators. Although a violation of this assumption, i.e., omission of causal indicators, leads to biased parameter estimates of the causal indicators, recent literature shows that the meaning of the latent variable is not affected by omitting causal indicators and the remaining model parameters can be consistently estimated [70]. However, the causal–formative measurement model is not identified on its own, i.e., the model parameters cannot be uniquely retrieved from the population indicator variance–covariance matrix [71,72]. To obtain an identified causal–formative measurement model, the latent variable must be connected to at least two other variables not affecting the latent variable [18], for example, using a multiple-indicators, multiple-causes (MIMIC) model.
Typical examples of behavioral concepts in IS that have been operationalized by a measurement model are behavioral intention to use information technology (IT), and IT interaction behavior. Behavioral intention to use IT indicates the degree to which a person has formulated conscious plans to perform or not to perform a specified future behavior involving IT use. This concept has been operationalized by a reflective measurement model in past IS research using the following items: user’s intention, prediction, and plan to use IT in future months (e.g., [73,74]). IT interaction behavior refers to the user’s interaction with IT to accomplish an individual or organizational task. This concept has been operationalized by a causal–formative measurement model in past IS research. For example, Barki et al. [75] employed a MIMIC model to operationalize IT interaction behavior using six tasks (causes) that motivated users to interact with IT (problem solving, justifying decisions, exchanging information with people, planning or following up, coordinating activities, and serving customers); and two measurements of this behavior using two reflective indicators (importance of IT and time invested using IT).
Emergent variables are an alternative representation of theoretical concepts [20,21]. They have recently been referred to in empirical IS research as “composite constructs” (e.g., [76]). Although these labels could be used interchangeably, we recommend using the term “emergent variable” to highlight that the construct emerges from the indicators. Emergent variables can help model artifacts [3,16]. An artifact is a human- or firm-made object composed of its ingredients. Thus, in contrast to behavioral concepts, artifacts are not assumed to exist in nature, but are products of theoretical thinking and/or theoretically justified constructions usually made to fulfill a certain purpose. To operationalize these human- or firm-made concepts, the composite model can be employed [16]. Examples from IS research are IT capability and IT ambidexterity [77,78].
The composite model can be understood as a recipe for how ingredients (the components) should be mixed and matched to build the artifact. The composite model assumes a definitorial rather than a causal relationship between indicators and the emergent variable ([63], 2017). In the classical composite model, the indicators forming an emergent variable are assumed to be free of measurement errors. In contrast to the reflective measurement model, the composite model imposes no restrictions on the covariance structure of indicators belonging to the same construct. The reflective measurement model is
thus nested within the composite model, as the composite model relaxes the assumption that all covariation among a block of indicators is explained by one latent variable [22]. Yet, the composite model constrains the correlations between the indicators forming an emergent variable and variables not forming the emergent variable, i.e., they are proportional [29]. Similar to the causal–formative measurement model, the composite model is not identified when isolated in the structural model. To ensure identification, a necessary condition is that each emergent variable must be linked to at least one variable not forming the emergent variable [28,134].
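The proportionality constraint can be made concrete with a small numerical sketch in Python. It assumes two emergent variables, each a unit-variance weighted sum of its own standardized indicators, whose indicators are related only through the correlation between the two composites. Under that assumption the cross-block correlations equal the composite correlation times the outer product of the two loading vectors, so they form a rank-one matrix. All numbers and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np


def scale_weights(weights, block_corr):
    """Scale weights so that the resulting composite has unit variance."""
    w = np.asarray(weights, dtype=float)
    s = np.asarray(block_corr, dtype=float)
    return w / np.sqrt(w @ s @ w)


def implied_cross_block(weights_1, corr_1, weights_2, corr_2, rho):
    """Cross-block indicator correlations implied by two correlated composites."""
    s1 = np.asarray(corr_1, dtype=float)
    s2 = np.asarray(corr_2, dtype=float)
    loadings_1 = s1 @ scale_weights(weights_1, s1)   # corr(indicator, composite 1)
    loadings_2 = s2 @ scale_weights(weights_2, s2)   # corr(indicator, composite 2)
    return rho * np.outer(loadings_1, loadings_2)


# Within-block correlations are unrestricted; these values are illustrative.
corr_1 = [[1.0, 0.3, 0.5], [0.3, 1.0, 0.2], [0.5, 0.2, 1.0]]
corr_2 = [[1.0, 0.4], [0.4, 1.0]]
cross = implied_cross_block([0.4, 0.3, 0.5], corr_1, [0.7, 0.5], corr_2, rho=0.5)
print(np.linalg.matrix_rank(cross))   # columns are multiples of one another
```

The within-block correlation matrices can be chosen freely, while the cross-block part is restricted to rank one; this restriction is what makes the composite model testable.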
Because the artifact as a type of theoretical concept was introduced only recently, it is helpful to illustrate this type of theoretical concept with an example. Based on theory, bread is made from wheat, water, salt, and yeast. Although the correlations between the amounts of wheat, water, salt, and yeast in a sample of loaves of bread are likely to be high, one would not conclude that bread is something that should be measured, i.e., that bread causes (or is caused by) wheat, water, salt, and yeast. Rather, wheat, water, salt, and yeast are the simple entities (ingredients) combined to form the emergent variable representing the artifact we call bread. Clearly, the temporal precedence of the ingredients also suggests that bread cannot be the common cause of its ingredients.
Because IS science analyzes and aims to explain how IT affects organizations, individuals, and society, artifacts play a pivotal role in IS research. For example, the theoretical concept IT infrastructure capability refers to a firm’s ability to use and leverage its IT resource infrastructure for business activities [79–82]. IT infrastructure capability is a “human-made/firm-made” concept that can be operationalized by the composite model [76,81,83]. Of course, no single “true” recipe exists for creating this artifact. Just as different bakeries can produce different types of bread or different breweries produce different types of beer, different scholars can produce different recipes for the same concept. The beer analogy can be extraordinarily instructive. Different recipes exist worldwide to design and manufacture beer. For example, Spanish breweries use one recipe, German breweries another. Recipes can even vary by region within a country. Such diversity makes each recipe an idiosyncratic way to understand and design beer, but all of these recipes ultimately produce beer.
For example, based on the study by Melville et al. [84], Ajamieh et al. [81] define the artifact IT infrastructure capability as composed of IT technological infrastructure capability, IT managerial infrastructure capability, and IT technical infrastructure capability. Further, some prior IS research [85,86] considers IT capability, a concept similar to IT infrastructure capability, as composed of IT technical infrastructure, human IT resources, and IT-enabled intangibles. IT infrastructure flexibility and post-merger and acquisition (M&A) IT integration capability are two examples of artifacts recently considered in IS research [83]. IT infrastructure flexibility refers to the capability of the infrastructure to adapt to environmental changes. A flexible firm IT infrastructure has the following characteristics: IT compatibility, IT connectivity, modularity, and IT personnel skills flexibility [83]. Similarly, post-M&A IT integration capability is the firm’s ability to integrate the IT technical infrastructure, IT personnel, and IT and business processes of the target/acquired firm with the IT technical infrastructure, IT personnel, and IT and business processes of the acquirer after an M&A [83]. Thus, post-M&A IT integration capability can be understood as an artifact built by integrating IT technical infrastructure, IT personnel, and IT and business processes. These are two examples of artifacts that have recently been examined in the field of IS.
While this study argues for the use of the composite model to operationalize artifacts, it has recently been suggested that the composite model can also be employed to operationalize behavioral concepts [58]. This notion assumes that both latent and emergent variables serve as a proxy for behavioral concepts [61]. Following this reasoning, the validity gap occurs between the concept and its construct and not between the construct and the observable variables [32].
Once the theoretical concepts are operationalized, the constructs representing the theoretical concepts can be related via the structural model. The structural model typically represents the core of the theory proposed. The structural model generally consists of a set of regression equations, illustrating the relationship hypothesized between the theoretical concepts. In each equation, a dependent construct is explained by one or more independent constructs. Because a dependent construct is typically not fully explained by its independent constructs, an error term accounts for the remaining variance in the dependent construct. By assumption, the error term is independent of the explanatory constructs of its equation. To avoid violating this assumption, in causal IS research, the scholar should make every effort to include all relevant constructs (those that affect the dependent construct and correlate with at least one explanatory construct in the corresponding equation). Otherwise, the path coefficient estimates obtained by ordinary least squares (OLS) suffer from omitted variable bias [87]. One potential way to address this problem of endogeneity is to use the two-stage least squares (2SLS) estimator for the structural model [27,42,55]. In the following, we consider only recursive structural models, i.e., structural models without feedback loops or correlated error terms.
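Omitted variable bias can be shown at the population level without simulating data. The Python sketch below assumes standardized constructs and computes OLS path coefficients directly from a construct correlation matrix; the coefficients 0.3 and 0.4 and the predictor correlation of 0.5 are made-up values, and the function name is hypothetical.

```python
import numpy as np


def ols_path_coefficients(construct_corr, predictors, outcome):
    """Standardized OLS coefficients obtained from a correlation matrix."""
    r = np.asarray(construct_corr, dtype=float)
    r_xx = r[np.ix_(predictors, predictors)]   # correlations among predictors
    r_xy = r[predictors, outcome]              # predictor-outcome correlations
    return np.linalg.solve(r_xx, r_xy)


# Population correlations implied by y = 0.3*x1 + 0.4*x2 + error with
# corr(x1, x2) = 0.5; order of variables: x1, x2, y.
corr = np.array([
    [1.00, 0.50, 0.50],
    [0.50, 1.00, 0.55],
    [0.50, 0.55, 1.00],
])

full = ols_path_coefficients(corr, [0, 1], 2)   # both relevant constructs included
omitted = ols_path_coefficients(corr, [0], 2)   # x2 left out of the equation
print(full, omitted)
```

With both constructs included, the true coefficients are recovered. When the second construct is omitted, the coefficient of the first absorbs part of its effect (0.3 + 0.4 × 0.5), because the omitted construct affects the dependent construct and correlates with the included one.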
2.2. Parameter estimation
In its current form, PLS-PM estimates model parameters in three steps. In the first, the iterative PLS-PM algorithm determines the weights to create scores for each construct (latent variables and emergent variables; [88]). As construct scores of latent variables contain measurement errors, the second step corrects for attenuation in correlations between latent variables. In doing so, PLSc divides the construct score correlations by the geometric mean of the constructs’ reliabilities [41], making the main outcome of the second step a consistent construct correlation matrix. Finally, the third step estimates the model parameters (weights, loadings, and path coefficients). Based on the consistent construct correlation matrix, OLS can be used to estimate the path coefficients of recursive structural models. In the case of nonrecursive structural models, the 2SLS or three-stage least squares (3SLS) estimator can be used, instead of the OLS estimator, to obtain consistent path coefficient estimates [27,42,83].
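The second step can be sketched as follows. This Python fragment only illustrates the correction for attenuation described above: each construct score correlation is divided by the geometric mean of the two reliabilities involved. The reliabilities are taken as given here; how they are estimated is not shown. The function name and the numbers are assumptions for illustration, and emergent variables are treated as having a reliability of one so that their correlations are left uncorrected.

```python
import numpy as np


def disattenuate(score_corr, reliabilities):
    """Correct construct score correlations for attenuation."""
    r = np.asarray(score_corr, dtype=float)
    rel = np.asarray(reliabilities, dtype=float)
    # Geometric mean of the reliabilities of each pair of constructs.
    geometric_mean = np.sqrt(np.outer(rel, rel))
    corrected = r / geometric_mean
    np.fill_diagonal(corrected, 1.0)   # a construct correlates perfectly with itself
    return corrected


# Illustrative input: one latent variable (reliability 0.64) and one
# emergent variable (reliability set to 1, i.e., no correction).
score_corr = [[1.0, 0.4], [0.4, 1.0]]
print(disattenuate(score_corr, [0.64, 1.0]))
```

The resulting matrix plays the role of the consistent construct correlation matrix from which path coefficients are then estimated in the third step.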
2.3. Substantial changes in the understanding of PLS-PM
In recent years, PLS-PM practices have been examined, debated, and improved. The recent literature on PLS-PM has thus changed substantially, which makes it necessary to identify the changes in the understanding and practice of PLS-PM. Table 1 summarizes these changes in the understanding of PLS-PM in the context of confirmatory and explanatory research.
Traditional view 1: PLS-PM should be used primarily for exploratory and early-stage research. Although PLS-PM was originally developed for exploratory research [2], enhancements such as PLSc and the bootstrap-based test for overall model fit make PLS-PM suitable for causal research, i.e., confirmatory and explanatory research. However, as originally developed, PLS-PM can also be applied in descriptive and predictive research [23,24].
Traditional view 2: PLS-PM has advantages over covariance-based estimators in the case of small sample sizes. The application of PLS-PM has often been justified by the size of the investigated sample [26]. It is true that PLS-PM is capable of estimating models with more parameters than observations because it only estimates partial model structures, but as with every other statistical method, the standard errors of the estimates increase as the sample size decreases. Therefore, justifying the use of PLS-PM due to small sample sizes should be considered cautiously. In this sense, claiming that PLS-PM is particularly suitable for small sample sizes can be regarded as problematic [26]. However, in the case of pure emergent variable models and small sample size constellations, PLS-PM performs better in terms of accuracy in the
estimation of path coefficients compared to other variance-based estimators [89], i.e., generalized structured component analysis (GSCA; [90]) and regression with sum scores.
Traditional view 3: PLS-PM cannot be used for nonrecursive models. Although current user-friendly software packages do not yet implement approaches to analyzing nonrecursive models, the assumption of recursivity can be relaxed by estimating the structural model parameters using 2SLS or 3SLS instead of OLS [27,42]. Another approach to estimate nonrecursive structural models involves using the construct scores (in the case of emergent variables) or the disattenuated construct correlation matrix (in the case of latent variables) obtained by PLS-PM as input for the full-information maximum-likelihood (FIML) estimator (e.g., [83]).
Traditional view 4: Model identification plays no role when employing
PLSPM. PLSPM always employs composites to estimate the
model, whether the theoretical concepts are operationalized by a
measurement model or by a composite model. Therefore, the identification
rules for composite models must be applied [28,134]. In addition to
normalizing each weight vector, e.g., by fixing the variance of each
composite to one, it must be ensured that each construct is connected
(by means of a non-zero path) with at least one other construct in the
model, so that the weights can be uniquely retrieved from the
population indicator variance–covariance matrix. Even when all weight
vectors are scaled and no construct is isolated in the structural model,
the signs of the weights of a block of indicators remain ambiguous. The
dominant indicator approach is thus recommended to fix the orientation
of the construct scores and thereby uniquely determine the weights [3].
Additionally, the structural model must be identified. As long as one
only considers recursive structural models with uncorrelated error
terms, identification is straightforward, as such models are always
identified [1].
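The normalization and the dominant indicator rule described above can be illustrated with a minimal Python sketch. It assumes standardized indicators, so that the indicator correlation matrix plays the role of the variance–covariance matrix. The function names, the correlation matrix, and the weights are hypothetical and serve illustration only; they are not taken from any PLSPM software.

```python
import math


def composite_variance(weights, indicator_corr):
    """Variance of the composite w'x for standardized indicators: w' R w."""
    k = len(weights)
    return sum(weights[i] * indicator_corr[i][j] * weights[j]
               for i in range(k) for j in range(k))


def scale_to_unit_variance(weights, indicator_corr):
    """Rescale the weights so that the composite has a variance of one."""
    sd = math.sqrt(composite_variance(weights, indicator_corr))
    return [w / sd for w in weights]


def orient_by_dominant_indicator(weights, indicator_corr, dominant):
    """Flip all signs if the dominant indicator covaries negatively
    with the composite; otherwise return the weights unchanged."""
    # Covariance between the dominant indicator and the composite w'x.
    cov = sum(r * w for r, w in zip(indicator_corr[dominant], weights))
    return [-w for w in weights] if cov < 0 else list(weights)


# Hypothetical block of three standardized indicators (made-up numbers).
corr = [[1.0, 0.6, 0.5],
        [0.6, 1.0, 0.4],
        [0.5, 0.4, 1.0]]
raw_weights = [-0.5, -0.4, -0.3]  # one of two sign-equivalent solutions

oriented = orient_by_dominant_indicator(raw_weights, corr, dominant=0)
scaled = scale_to_unit_variance(oriented, corr)
print(oriented)
print(composite_variance(scaled, corr))
```

Both sign versions of the weight vector imply the same composite variance, which is why scaling alone cannot resolve the ambiguity and an orientation rule is needed.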
Traditional view 5: PLSPM has greater statistical power than the
ML estimator. For latent variable models, Reinartz et al. [91]
claim that the power of statistical tests is higher when PLSPM
estimates are employed than when ML estimates are used. However, these
findings are highly questionable, as they are based on traditional
PLSPM, which is known to produce inconsistent parameter estimates for
latent variable models. In line with Goodhue et al. [30], who show that
this alleged higher power goes along with an inflated type I error, we
conclude that preferring PLSPM over the ML estimator on grounds of
efficiency is not a valid argument for latent variable models. Similar
findings observed for GSCA are applicable to PLSPM [92]. For emergent
variable models, however, traditional PLSPM has shown favorable
properties among variance-based estimators, i.e., GSCA, sum scores,
and PLSPM [89].
Traditional view 6: Mode A can be used to consistently estimate
reflective measurement models. In its most modern form, PLSPM
can deal with models containing both emergent and latent variables.
Because PLSPM inherently estimates composite models [31,32],
it is the estimator of choice for models containing emergent variables
only [61]. In PLSPM, composite models can be consistently estimated
by Mode B [28]. To obtain consistent parameter estimates for reflective
measurement models, PLSc should be used. Estimates obtained by
traditional Mode A, as well as by Modes B and C, suffer from attenuation
bias [42]. In contrast, PLSc produces consistent and asymptotically
normal estimates for reflective measurement models by combining
Mode A estimates with a correction for attenuation [41]. Consequently,
the development of PLSc [41] enables PLSPM to analyze models
containing both emergent and latent variables. However, scholars should
use Mode B in PLSPM (instead of PLSc) when they estimate pure
emergent variable models, as PLSc has been shown to produce biased
estimates in this situation [61].
That being said, in the case of pure latent variable models,
covariance-based estimators are preferred, as they are consistent and
asymptotically efficient. However, the availability of asymptotically
efficient estimators does not mean that scholars cannot use PLSPM to
estimate models of this kind. Initial simulation studies have compared
the performance of PLSc with that of other estimators in this situation;
its use for pure latent variable models is considered acceptable, and its
finite-sample bias has been shown to be of little practical relevance,
assuming that the model is correctly specified [42,61,93].
Traditional view 7: Mode B can be used to estimate causal–formative
measurement models consistently. Mode B, or so-called
regression weights, cannot be used to consistently estimate
causal–formative measurement models, as this kind of measurement model
is not identified on its own [94]. However, with the development of PLSc,
PLSPM can consistently estimate causal–formative measurement
models by means of the MIMIC model [16].
Traditional view 8: Overall fit of models estimated by PLSPM
cannot be assessed. Due to recent developments in the context of
PLSPM, the overall fit of models estimated by PLSPM can be assessed in
two non-exclusive ways: (1) by a bootstrap-based test for overall model
fit [42], and (2) by measures of overall model fit such as the SRMR
[22]. Both ways assess the difference between the empirical indicator
variance–covariance matrix and the estimated model-implied counterpart.
While the empirical indicator variance–covariance matrix contains
the variances and covariances of the indicators based on the
sample, the estimated model-implied counterpart contains the variances
and covariances of the indicators implied by the model structure
based on the estimated model parameters. Typically, the discrepancy
between the two matrices is measured by the squared Euclidean distance
($d_{ULS}$), the geodesic distance ($d_G$), and the SRMR. The
bootstrap-based test for overall model fit relies on a bootstrap procedure
to obtain the reference distribution of the distance measures under the
null hypothesis that the population indicator variance–covariance matrix
equals the model-implied counterpart [95]. Assuming a 5% level of
significance, a discrepancy value larger than the 95% quantile of the
corresponding reference distribution leads to rejection of the null
hypothesis. In addition to the bootstrap-based test for overall model fit,
the values of the distance measures can be compared to threshold values
recommended by the literature to assess overall model fit. Measures
of fit are thus based on heuristic rules rather than on statistical
inference. Moreover, the suggested thresholds for the measures of
overall model fit in the context of PLSPM, e.g., 0.080 for the SRMR, are
preliminary and need to be examined in more detail in future research.
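To make the discrepancy measures more tangible, the following Python sketch computes the squared Euclidean distance and the SRMR for a pair of correlation matrices and applies the decision rule of the bootstrap-based test to the values reported in Table 2. It rests on several assumptions: both inputs are correlation matrices (standardized indicators), the factor 0.5 in the squared Euclidean distance is one common scaling convention, and the SRMR is averaged over the lower triangle including the diagonal. The geodesic distance is omitted because it requires an eigenvalue decomposition. The 2 × 2 matrices are made up; only the Table 2 values stem from the example.

```python
import math


def d_uls(S, Sigma):
    """Squared Euclidean distance between two matrices; the factor 0.5
    is an assumed scaling convention."""
    k = len(S)
    return 0.5 * sum((S[i][j] - Sigma[i][j]) ** 2
                     for i in range(k) for j in range(k))


def srmr(S, Sigma):
    """Root mean square of the residual correlations over the lower
    triangle (diagonal included); assumes correlation matrices."""
    k = len(S)
    squared = [(S[i][j] - Sigma[i][j]) ** 2
               for i in range(k) for j in range(i + 1)]
    return math.sqrt(sum(squared) / len(squared))


def supported(value, hi95):
    """The model is not rejected at the 5% level if the discrepancy
    stays below the 95% quantile of its bootstrap reference distribution."""
    return value < hi95


# Made-up empirical and model-implied correlation matrices.
S = [[1.0, 0.5], [0.5, 1.0]]
Sigma = [[1.0, 0.4], [0.4, 1.0]]
print(d_uls(S, Sigma), srmr(S, Sigma))

# Discrepancy values and 95% quantiles (HI95) as reported in Table 2.
table2 = {"SRMR": (0.030, 0.049), "d_ULS": (0.210, 0.546), "d_G": (0.049, 0.221)}
conclusions = {name: supported(v, hi) for name, (v, hi) in table2.items()}
print(conclusions)
```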
Traditional view 9: Reliability of the construct scores obtained by
PLSPM should be assessed using Cronbach’s α and Dillon–Goldstein’s ρ
(also called Jöreskog’s ρ or composite reliability). Traditionally, the
literature recommended determining the reliability of PLSPM construct
scores through Cronbach’s α and Dillon–Goldstein’s ρ [12]. However,
considering this recommendation, several aspects of these two measures
have been widely neglected. First, Cronbach’s α and Dillon–Goldstein’s ρ
both assess the reliability of sum scores (construct scores
obtained by equally weighted indicators) created for the latent variable.
However, PLSPM allows the indicator weights used for the calculation
of the construct scores to vary such that indicators with a smaller
amount of random measurement error take on greater weight than
indicators containing a larger amount of random measurement error.
Consequently, the PLSPM construct scores contain less measurement
error and are generally more reliable than sum scores [22]. Second,
Cronbach’s α assumes tau-equivalence, i.e., equal population covariances
among the indicators belonging to one latent variable, an assumption
that is rarely met in empirical research [34]. While Cronbach’s α
can be calculated based on the sample variance–covariance
matrix, Dillon–Goldstein’s ρ is based on factor loadings. Because
traditional PLSPM is known to produce inconsistent factor loading
estimates, Dillon–Goldstein’s ρ should be based on consistent factor loading
estimates obtained by PLSc. Furthermore, as the assumptions of
Cronbach’s α and Dillon–Goldstein’s ρ are likely to be violated in empirical
research, their use cannot be recommended. However, the reliability
obtained by Cronbach’s α can be regarded as a lower bound [33]. To
consistently estimate the reliability of latent variable scores obtained by
PLSPM’s Mode A, Dijkstra–Henseler’s $\rho_A$ should be used [41].
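For readers who want to see what the two traditional sum-score measures discussed above compute, a minimal Python sketch follows. It uses the textbook formulas for Cronbach’s α (from a variance–covariance matrix) and for Dillon–Goldstein’s ρ (from standardized loadings); Dijkstra–Henseler’s $\rho_A$ is not implemented here. The function names and the tau-equivalent matrix are hypothetical, while the loadings are those reported for social executive behavior in Table 3.

```python
def cronbach_alpha(cov):
    """Cronbach's alpha of the sum score from a variance-covariance matrix."""
    k = len(cov)
    sum_of_variances = sum(cov[i][i] for i in range(k))
    variance_of_sum = sum(cov[i][j] for i in range(k) for j in range(k))
    return k / (k - 1) * (1 - sum_of_variances / variance_of_sum)


def dillon_goldstein_rho(loadings):
    """Dillon-Goldstein's (Joereskog's) rho from standardized factor
    loadings, with error variances taken as 1 - loading**2."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Made-up tau-equivalent block: equal covariances of 0.5.
tau_equivalent = [[1.0, 0.5, 0.5],
                  [0.5, 1.0, 0.5],
                  [0.5, 0.5, 1.0]]
alpha = cronbach_alpha(tau_equivalent)

# PLSc loading estimates of SEXB1-SEXB4 from Table 3.
rho = dillon_goldstein_rho([0.905, 0.877, 0.856, 0.912])
print(alpha, rho)
```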
Traditional view 10: Discriminant validity should be examined by
the Fornell–Larcker criterion. Although the Fornell–Larcker criterion
[96] has long been recommended to assess discriminant validity of
latent variables [12], it is ineffective in combination with traditional
PLSPM because it relies on consistent factor loading estimates [22]. To
overcome this drawback, the HTMT was developed to assess discriminant
validity in the case of variance-based estimators [35]. The
HTMT can be assessed in two ways: (1) by comparing it to a threshold
value, and (2) by constructing a confidence interval to examine whether
the HTMT is significantly smaller than a certain threshold value ([35],
[37]). For the first approach, simulation studies suggest a threshold
value of 0.90 if constructs are conceptually very similar or 0.85 if the
constructs are conceptually more distinct [35–37]. For the second
approach, prior methodological research has suggested examining
whether the HTMT is significantly smaller than 1 [35] or smaller than
other, lower values, e.g., 0.85 or 0.90 [37]. The authors of [37] conclude
that the HTMT is a reliable tool for assessing discriminant validity,
whereas the Fornell–Larcker criterion has limitations that do not justify
its reputation for rigor and its widespread use in empirical research.
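As an illustration of the first approach, the sketch below computes the HTMT for two blocks of indicators from an indicator correlation matrix: the average correlation between indicators of different constructs, divided by the geometric mean of the average within-construct correlations. The function name, the index layout, and the correlation values are hypothetical; the confidence interval of the second approach would additionally require bootstrapping and is not shown.

```python
import math
from itertools import combinations


def htmt(corr, block_a, block_b):
    """Heterotrait-monotrait ratio of correlations for two indicator blocks,
    given as lists of row/column indices of the correlation matrix."""
    def mean(values):
        return sum(values) / len(values)

    heterotrait = [corr[i][j] for i in block_a for j in block_b]
    monotrait_a = [corr[i][j] for i, j in combinations(block_a, 2)]
    monotrait_b = [corr[i][j] for i, j in combinations(block_b, 2)]
    return mean(heterotrait) / math.sqrt(mean(monotrait_a) * mean(monotrait_b))


# Made-up correlations: indicators 0-1 measure construct A, 2-3 construct B.
corr = [[1.0, 0.8, 0.2, 0.2],
        [0.8, 1.0, 0.2, 0.2],
        [0.2, 0.2, 1.0, 0.5],
        [0.2, 0.2, 0.5, 1.0]]
value = htmt(corr, block_a=[0, 1], block_b=[2, 3])
print(value, value < 0.85)
```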
3. An illustrative example
3.1. Description of the example
We provide an illustrative IS example to present the latest
enhancements of PLSPM. Fig. 1 displays the proposed research model to
be estimated and tested. For this purpose, we use a simulated dataset of
300 observations, where each observation represents a firm, the unit of
analysis in the example. Because we use a simulated dataset, the
obtained results are not scientifically relevant, and any comparison of our
results to results of other empirical studies is made for purely
illustrative purposes.
Social executive behavior is the positive/negative behavior of the
firm’s top managers towards the firm’s use of social media for business
activities. Social employee behavior is the positive/negative behavior of
the firm’s employees towards the firm’s use of social media for business
activities. Social media capability refers to the firm’s ability to use and
leverage external social media platforms purposefully to execute
business activities [77,97]. Business process performance is the firm’s
relative performance in key business processes as compared with its key
competitors [98]. Fig. 1 presents the research model of the example.
Based on prior IS research on social media in organizations [77,99], it is
assumed that social executive behavior and social employee behavior
positively affect the development of a firm’s social media capability, which,
in turn, may positively influence the firm’s business process performance.
The research model represents the theory proposed by an author/team
to be tested empirically. It illustrates how the theoretical concepts
are operationalized, i.e., how the indicators are related to the constructs
representing the theoretical concepts, and how these constructs are
connected. It usually includes several hypotheses to be tested. Based on
prior literature and anecdotal evidence from the real world, authors
should explain one by one why the hypothesized relationships are
included and state expectations about their signs. These explanations are
omitted from this article because theoretical explanation of the
relationships included in the example is beyond the paper’s scope. In our
example, the following three hypotheses are tested:
Hypothesis 1 (H1). Social executive behavior has a positive impact on the
development of social media capability.
Hypothesis 2 (H2). Social employee behavior has a positive impact on the
development of social media capability.
Hypothesis 3 (H3). Social media capability has a positive impact on
business process performance.
Although prior IS studies using PLSPM have investigated more
complex models (e.g., including a greater number of constructs,
second-order constructs, moderation effects), the presented research model
seems reasonable for our purposes for the following three reasons:
(1) the goal of our study is to provide guidelines for using PLSPM in
causal IS research (confirmatory and explanatory), employing the most
recently proposed standards. For the sake of brevity, parsimony, and
pedagogical illustration for IS scholars, we think, in line with Occam’s
razor, that the simpler the research model is, the better. “Parsimonious
yet well-fitting models are more likely to be scientifically replicable,
explainable” [100]. “Parsimony is also regarded by many social scientists
as an important ingredient in theory development (e.g. [101,102]), precisely
because it ‘explains much by little’ ([103]; p.153)” [100]; (2) the
considered model contains both latent variables (ovals) and emergent
variables (hexagons), and therefore presents a situation in which PLSPM
can leverage its full capacities; and (3) the research model is
theoretically positioned in IS literature on business value of IT, where the
research models are usually parsimonious (e.g., [104,105]).
Fig. 1. Research model (CV = Control variables).
Theoretical concepts of behavioral research such as personality
traits, individual behavior, and individual attitude are usually
represented as latent variables [106]. Because social executive behavior
and social employee behavior indicate types of individual behavior and
attitude, the two theoretical concepts were operationalized by reflective
measurement models. The ovals represent the latent variables and the
connected rectangles their indicators. Social executive behavior and
social employee behavior were each measured by four indicators,
(SEXB1–SEXB4) and (SEMB1–SEMB4), respectively. To obtain
consistent estimates, the reflective measurement models were estimated by
PLSc [41].
In contrast, the theoretical concepts social media capability and
business process performance were considered as artifacts designed by
firms, executives, and/or employees. To operationalize these
theoretical concepts, the composite model was employed. In doing so, social
media capability is assumed to be composed of the following
ingredients: Facebook, Twitter, corporate blog(s), and LinkedIn
capabilities [77], which are lower-order capabilities that shape social
media capability. IS scholars and analysts from
other contexts (e.g., China) might consider WeChat
(capability) as a key ingredient of social media capability and might
remove other, less relevant social media capabilities for Chinese firms.
This illustrates the potential of including/studying different artifacts to
investigate the same phenomenon of interest for firms and society. The
artifact business process performance was also operationalized by a
composite model. It comprises supplier relations, product and service
enhancement, production and operations, marketing and sales, and
customer relations (Tallon and Pinsonneault 2011). Fig. 2 illustrates how
the artifact social media capability was operationalized. The hexagon
represents the construct, i.e., the emergent variable, while the
rectangles represent ingredients forming the construct.
Besides the variables of main interest, firm size and industry were
included as control variables in the structural model to control for
effects of extraneous variables [80,81]. Firm size was modeled as a
single-indicator composite, measured by the natural logarithm of the
number of employees, to account for the role of different firm sizes in
explaining business process performance [76]. Because of the skewed
distribution, it is advisable to also apply the logarithm when firm size
is measured through sales or total assets. Industry was incorporated as a
composite to control for an overall industry effect on business process
performance and was shaped by three indicators, i.e., industry groups 1–3.
The three industry group dummies indicate whether an observation belongs to
industry 1, 2, or 3. Each dummy takes the value 0 if the observation does
not belong to the industry and 1 if it does. For example, the
variable industry group 1 has a value of 0 for firms that do not
belong to industry group 1 and a value of 1 for firms that belong to
industry group 1. Although the dataset consists of four different
industries, industry group 4 was not included to avoid perfect
multicollinearity. Therefore, group 4 became the reference category. The
weights of the industry composite can be interpreted as a simple
contrast, i.e., the difference in contribution to the total industry effect
between the industry considered and the reference industry. Fig. 3
presents how industry, a nominal control variable, was included in the
structural model. IS scholars can use the dominant or the most
important industry as the reference group.
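The coding of the two control variables described above can be sketched in a few lines of Python. The function names and the three example firms are hypothetical; the logic (three 0/1 dummies with industry group 4 as the omitted reference category, and the natural logarithm of the number of employees) follows the description in the text.

```python
import math

# Industry group 4 is omitted and therefore acts as the reference category.
INDUSTRY_GROUPS = (1, 2, 3)


def industry_dummies(industry_group):
    """Return the indicators industry group 1-3 for one firm."""
    return [1 if industry_group == g else 0 for g in INDUSTRY_GROUPS]


def log_firm_size(number_of_employees):
    """Single indicator of firm size: natural log of the number of employees."""
    return math.log(number_of_employees)


# Made-up firms for illustration.
firms = [{"industry": 1, "employees": 250},
         {"industry": 3, "employees": 40},
         {"industry": 4, "employees": 12000}]
for firm in firms:
    print(industry_dummies(firm["industry"]), log_firm_size(firm["employees"]))
```

A firm from the reference group receives zeros on all three dummies, which is what avoids the perfect multicollinearity mentioned above.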
3.2. Statistical power analysis
A power analysis should typically be conducted before data
collection. It gives insight into the minimum sample size required to obtain
sufficient statistical accuracy to detect effects of interest existing in the
population. The power of a statistical test is the probability of correctly
rejecting a false null hypothesis, that is, of finding an effect in the
sample if it indeed exists in the population [107]. Power analysis can be
conducted in two ways: (1) using heuristic rules such as Cohen’s power
tables and the inverse square root method [108,109], and (2) conducting
a Monte Carlo simulation study [110]. The 10-times rule [111]
and the minimum $R^2$ rule are no longer recommended to estimate the
minimum sample size [26,46,109].
To apply Cohen’s power tables for multiple regression analysis, four
parameters must be considered: effect size (the extent to which the path
coefficient/weight exists in the population), power (probability of
correctly rejecting a false null hypothesis), significance level
(probability of incorrectly rejecting a true null hypothesis), and the number
of independent variables of the equation containing the considered path
coefficient/weight. Once these values are determined, Cohen’s power
tables can be used to approximate the minimum required sample size in
order to achieve a certain power level. To determine the number of
required observations, analysts can assume a small effect size
($0.020 \leq f^2 < 0.150$) for a more conservative approximation or a
medium to large effect size ($0.150 \leq f^2 < 0.350$ or $f^2 \geq 0.350$)
for a more optimistic approximation of the required sample size. The
statistical power is usually set to 0.8, and a significance level of 0.05 is
assumed [107]. Often the equation with the highest number of independent
variables is considered to determine the minimum number of observations to
reliably detect an effect. In our example, the composite model for
business process performance has the highest number of independent
variables (supplier relations, product and service enhancement,
production and operations, marketing and sales, and customer relations) in
an equation. Cohen’s power tables suggest a minimum sample size of 91
observations assuming a medium effect size ($f^2 = 0.150$), statistical
power of 0.8, and significance level of 0.05 [108]. Considering the
outcomes of the power analyses for our example, a sample size of 300
seems adequate to detect the effects of interest. The inverse square root
method assumes that the estimates are standard normally distributed,
and approximates the standard error using $\sqrt{N}$. Assuming a 5%
significance level, the required sample size to obtain a statistically
significant effect ($\hat{N}$), if it exists in the population, can be
approximated by
$\hat{N} > \left( \frac{2.486}{|\beta|_{\min}} \right)^2$,
where $|\beta|_{\min}$ represents the minimum magnitude of the
coefficient considered.
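The inverse square root rule stated above is easy to apply numerically. The following Python sketch returns the smallest integer sample size that strictly exceeds the bound; the function name and the example coefficient magnitudes are hypothetical, and the constant 2.486 is taken from the text.

```python
import math


def min_sample_size_inverse_sqrt(beta_min, constant=2.486):
    """Smallest integer N with N > (constant / |beta_min|)**2."""
    bound = (constant / abs(beta_min)) ** 2
    return math.floor(bound) + 1


# Hypothetical minimum coefficient magnitudes an analyst wants to detect.
for beta in (0.1, 0.2, 0.3):
    print(beta, min_sample_size_inverse_sqrt(beta))
```

The required sample size grows quickly as the smallest coefficient of interest shrinks, because the bound is inversely proportional to the squared magnitude.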
Fig. 2. Operationalization of the artifact social media capability.
Fig. 3. Modeling a nominal control variable.
Note: Industry group 4 is the reference group.
In addition to considering heuristic rules, IS scholars can conduct a
Monte Carlo simulation to examine the sample size required to reliably
detect effects that exist in the population. In the first step, a population
model with the same structure as the estimated model must be specified and
all population parameter values need to be determined. In the second step,
the model is estimated several times on data generated from this population
model, and the rejection rates of the null hypothesis significance test for
the coefficients under examination, i.e., the statistical power, are
considered. The appealing property of this
approach is that it can take into account various aspects of the model
and indicators incorporated, such as sample size, number of indicators,
their distribution, and magnitude of the effect. Moreover, sensitivity
analyses can be conducted by changing the assumed population model
to see how these changes affect the statistical power. While guidelines
have been proposed for pure latent variable models in the context of
PLSPM [110], development of guidelines for models containing
emergent variables is still an open issue.
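The logic of such a simulation can be conveyed with a deliberately simplified Python sketch. It is not a PLSPM simulation: as a stand-in, the assumed population model is a single standardized regression $y = \beta x + e$, estimated by OLS and tested two-sidedly against a normal critical value of 1.96. All names, the seed, and the parameter values are hypothetical; a realistic study would generate indicator data from the full population model and estimate it with PLSPM in each run.

```python
import math
import random


def estimate_power(beta, n, runs=500, seed=1, z_crit=1.96):
    """Share of simulated samples in which H0: slope = 0 is rejected."""
    rng = random.Random(seed)
    error_sd = math.sqrt(1 - beta ** 2)  # keeps the variance of y at one
    rejections = 0
    for _ in range(runs):
        # Step 1: draw a sample from the assumed population model.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [beta * xi + rng.gauss(0, error_sd) for xi in x]
        # Step 2: estimate the model (simple OLS) and test the coefficient.
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        standard_error = math.sqrt(rss / (n - 2) / sxx)
        if abs(slope / standard_error) > z_crit:
            rejections += 1
    return rejections / runs


power = estimate_power(beta=0.3, n=100)       # an effect exists
false_alarm = estimate_power(beta=0.0, n=100)  # no effect exists
print(power, false_alarm)
```

Rerunning the function for a grid of sample sizes, or for alternative population values, corresponds to the sensitivity analyses mentioned above.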
3.3. Estimation
Various software packages, such as PLS-Graph [112], SmartPLS
[113], WarpPLS [114], XLSTAT-PLS [115], and ADANCO [116], can
be used to estimate the model with PLSPM. We used ADANCO 2.0.1
Professional for Windows (http://www.compositemodeling.com/)
[116] to estimate the empirical example. In the following, we used
Mode B to estimate composite models and PLSc to estimate reflective
measurement models. Moreover, we used the factor weighting scheme
for inner weighting, and statistical inferences were based on the
bootstrap procedure, relying on 4999 bootstrap runs.
Prior to model estimation, analysts should set a dominant indicator
in each composite and reflective measurement model. As the signs of
the weight and factor loading estimates of a block of indicators are
ambiguous, the dominant indicator is used to dictate the orientation of
a construct. A dominant indicator that is expected to positively
correlate with the construct is preferable. Face validity can be used to
select the dominant indicator, i.e., the indicator that is theoretically
most relevant and thus expected to positively correlate with the
construct. In our example, we chose SEXB2, SEMB2, SMC1, and BPP4 as
dominant indicators.
Before the model assessment, the researcher has to ensure that the
estimation is technically valid, i.e., that the estimation is admissible and
no Heywood case has occurred [117]. To this end, the researcher needs to
investigate whether the PLSPM algorithm has properly converged.
Additionally, in particular in the context of PLSc, the researcher needs to
ensure that the construct correlation matrix and the model-implied indicator
correlation matrix are valid, i.e., positive semi-definite. To assess the
definiteness of a matrix, user-written Excel plugins for the calculation of
eigenvalues can be used. A symmetric matrix is positive semi-definite if
all eigenvalues are greater than or equal to 0. Finally, all factor
loading estimates (in absolute value) and reliability estimates must be
smaller than or equal to 1. For our example, the solution was technically
valid.
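As an alternative to a spreadsheet plugin, the admissibility checks described above can be scripted. The Python sketch below computes the eigenvalues of a symmetric matrix with the classical Jacobi rotation method (chosen here only because it needs no external library) and then checks positive semi-definiteness and the bounds on loadings and reliabilities. The function names, the numerical tolerance, and the example matrices are assumptions made for illustration.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that sets the element (p, q) to zero.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def is_positive_semidefinite(matrix, tolerance=1e-10):
    """All eigenvalues >= 0, up to an assumed numerical tolerance."""
    return symmetric_eigenvalues(matrix)[0] >= -tolerance


def within_bounds(loadings, reliabilities):
    """No absolute loading and no reliability estimate exceeds 1."""
    return (all(abs(l) <= 1 for l in loadings)
            and all(r <= 1 for r in reliabilities))


# Made-up matrices: the second has admissible entries but is not jointly valid.
valid = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
invalid = [[1.0, 0.9, -0.9], [0.9, 1.0, 0.9], [-0.9, 0.9, 1.0]]
print(symmetric_eigenvalues(valid), is_positive_semidefinite(valid))
print(symmetric_eigenvalues(invalid), is_positive_semidefinite(invalid))
```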
3.4. Assessment of reﬂective measurement and composite models
3.4.1. Evaluation of overall ﬁt of the saturated model
Table 4 summarizes the steps to assess reflective measurement and
composite models. Joint assessment should begin with the evaluation of
the overall fit of a model with a saturated structural model [16,118],
that is, with confirmatory factor/composite analysis. The estimated
model is as specified by analysts [118]. The saturated model
corresponds to a model in which all constructs are allowed to be freely
correlated, whereas the concept’s operationalization is exactly as
specified by the analyst. The evaluation of the overall model fit of the
saturated model is useful to assess the validity of the measurement and
the composite models, because potential model misfit can be entirely
attributed to misspecifications in the composite and/or measurement
models. Therefore, empirical support can be obtained for the
constructs, i.e., “Does a latent variable exist?” or “Do the indicators form
an emergent variable?” Table 2 contains the values of the discrepancy
measures and the 95% quantiles of their corresponding reference
distributions for our example. The value of the SRMR was below the
recommended threshold value of 0.080 [22,119]. However, the
thresholds for the overall model fit in the context of PLSPM should be
considered cautiously, as they are preliminary and need to be examined
in more detail in future methodological research. Moreover, all
discrepancy measures were below the 95% quantile of their reference
distribution ($HI_{95}$). Empirical evidence was thus obtained for the latent
variables (social executive behavior and social employee behavior) as
well as the emergent variables (social media capability and business
process performance) incorporated in the model. In case of
contradictory results for the measure of fit (SRMR) and the test of overall
model fit ($d_{ULS}$ and $d_G$), the test for overall model fit is preferred,
as it is based on statistical inference rather than heuristic rules.
Moreover, if none of the discrepancies is below the 95% quantile of the
corresponding reference distribution ($HI_{95}$), analysts can evaluate
whether the discrepancies are at least below the 99% quantile ($HI_{99}$)
before finally rejecting the model. In the next step, each measurement and
composite model must be examined separately. Authors of future
studies in IS research are encouraged to report a table like Table 2.
Scholars should assess content validity for both kinds of constructs,
i.e., latent variables and emergent variables, by carefully considering
each type of construct and how the corresponding concept has been
operationalized in prior research. In the case of emergent variables,
however, it might be desirable to modify the weighting scheme, number of
indicators, and content of the indicators as illustrated in the bread and
beer example. Finally, construct validity should be assessed. Depending
on the concept’s operationalization, this can be done in several
non-exclusive ways.
3.4.2. Assessment of the reﬂective measurement model
For reflective measurement models in which latent variables
represent behavioral concepts such as social executive behavior and
social employee behavior, composite reliability, convergent validity,
indicator reliability, and discriminant validity should be evaluated.
Dijkstra–Henseler’s $\rho_A$ should be considered in assessing composite
reliability (the correlation between latent variable and construct scores).
A value of Dijkstra–Henseler’s $\rho_A$ larger than 0.707 can be regarded as
reasonable, as more than 50% of the variance in the construct scores
can be explained by the latent variable [120]. Table 3 shows the
values of Dijkstra–Henseler’s $\rho_A$ for social executive behavior and
social employee behavior. They are 0.938 and 0.913, respectively, and thus
above the suggested threshold of 0.707, indicating reliable construct scores.
Convergent validity is the extent to which the indicators belonging
to one latent variable actually measure the same construct. The average
variance extracted (AVE), typically used to assess convergent validity
[121], indicates how much of the indicators’ variance can be explained
by the latent variable. An AVE larger than 0.5 has been suggested to
provide empirical evidence for convergent validity, as the
corresponding latent variable explains more than half of the variance in
its indicators, and consequently, all other latent variables
explain less than half [96]. In our example, all AVE values are above 0.5
(0.788 and 0.716), indicating convergent validity (see Table 3).
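The AVE values can be traced back to the standardized factor loading estimates, since the AVE is the mean of the squared loadings of a block. The short Python sketch below recomputes it from the PLSc loading estimates reported in Table 3; the function name is hypothetical.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardized factor loadings of one block."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# PLSc loading estimates from Table 3.
sexb_loadings = [0.905, 0.877, 0.856, 0.912]  # social executive behavior
semb_loadings = [0.901, 0.820, 0.769, 0.888]  # social employee behavior

ave_sexb = average_variance_extracted(sexb_loadings)
ave_semb = average_variance_extracted(semb_loadings)
print(round(ave_sexb, 3), round(ave_semb, 3))  # 0.788 0.716

# Squared loadings are the estimated indicator reliabilities.
lowest_reliability = min(l ** 2 for l in sexb_loadings + semb_loadings)
print(lowest_reliability > 0.5)  # True
```

The rounded results reproduce the AVE values of 0.788 and 0.716 shown in Table 3.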
Indicator reliability can be assessed through the factor loading
estimates. As factor loading estimates are standardized in PLSPM, the
squared factor loading estimate equals the estimated indicator
reliability. It is generally advisable for factor loadings to be greater than
0.707, indicating that more than 50% of the variance in a single
indicator can be explained by the corresponding latent variable. In this
context, the significance of the factor loading estimates should also be
investigated. Somewhat lower values are not really problematic as long
as the construct validity and reliability criteria are met. Table 3 presents
the factor loading estimates from our example. They range from 0.769
to 0.912 and are all significant at the 1‰ level, suggesting that the
measures are reliable.
Table 2
Results of the confirmatory factor/composite analysis.
Overall saturated model fit evaluation
| Discrepancy | Value | $HI_{95}$ | Conclusion |
| SRMR | 0.030 | 0.049 | Supported |
| $d_{ULS}$ | 0.210 | 0.546 | Supported |
| $d_G$ | 0.049 | 0.221 | Supported |
Discriminant validity entails that two latent variables that are meant
to represent two different theoretical concepts are statistically
sufficiently different. To obtain empirical evidence for discriminant validity,
IS scholars should consider the HTMT [35]. The HTMT should be lower
than 0.85 (stricter threshold) or 0.90 (more lenient threshold), or
significantly smaller than 1 [36,37]. In our example, the HTMT of
social executive behavior and social employee behavior is 0.322, and
thus below the recommended threshold of 0.85 (and of 0.90).
Moreover, the one-sided 95% percentile confidence interval of the HTMT does
not cover 1, that is, the HTMT is significantly smaller than 1. Scholars can
also follow the suggestion in [37] to test whether the HTMT is significantly
smaller than 0.85 or 0.90.
3.4.3. Assessment of the composite model
The composite model requires an evaluation sui generis, i.e., an
examination of the composite model with respect to multicollinearity,
weights, composite loadings, and their significance [43,76,77,83,122].
As composite models are typically estimated by Mode B (regression
weights) in PLSPM, collinearity among indicators forming an emergent
variable should be investigated by means of the variance inflation
factor (VIF), as high multicollinearity can lead to insignificant estimates
and unexpected signs of the weights. Traditionally, VIF values above 5
are regarded as indications of problematic multicollinearity [38,46].
Yet, typical phenomena of multicollinearity can also occur in case of
VIF values far below 5. For weights estimated by Mode A, an assessment
of multicollinearity is not necessary, as these equal scaled covariances
and therefore ignore multicollinearity [32].
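For standardized indicators, the VIF of each indicator in a block equals the corresponding diagonal element of the inverse of the block’s indicator correlation matrix. The Python sketch below uses this standard result together with a plain Gauss–Jordan inversion; the function names and the correlation values are hypothetical, and the sketch does not claim to reproduce how any particular PLSPM software computes the VIF.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(indicator_corr):
    """VIF of each indicator: diagonal of the inverse correlation matrix."""
    inverse = invert(indicator_corr)
    return [inverse[i][i] for i in range(len(indicator_corr))]


# Made-up correlations among three indicators of one emergent variable.
corr = [[1.0, 0.2, 0.1],
        [0.2, 1.0, 0.15],
        [0.1, 0.15, 1.0]]
vifs = variance_inflation_factors(corr)
print(vifs, all(v < 5 for v in vifs))
```

With only two indicators the result reduces to $1/(1 - r^2)$, which offers a quick plausibility check.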
While weights show the relative contribution of an indicator to its
construct, composite loadings represent the correlation between the
indicator and the corresponding emergent variable; a loading shows the
absolute contribution of an indicator to its construct [122]. As weights
show the degree of importance of each indicator (ingredient) to the
construct, analysts should examine whether all indicator weight
estimates are significant. For indicators with non-significant weight
estimates, one must investigate whether composite loading estimates are
statistically significant and consider dropping any indicators with
non-significant weight and loading estimates. However, content validity
must be considered as well, because dropping an indicator may alter the
meaning of the emergent variable. IS scholars can thus decide to keep
an indicator with non-significant weight and loading to preserve the
construct’s content validity [46].
Table 3 shows that the VIF values for the indicators of the composite
models range from 1.020 to 1.134, suggesting that multicollinearity is
not a problem in our data. Moreover, all weight and composite loading
estimates show the expected sign and are significant at a 5%
significance level except one (the estimated weight of the indicator
production and operations, BPP3, of the construct business process
performance). The weight estimate of this indicator is 0.108, and its
composite loading estimate is 0.203† (close to being significant).
Considering content validity, the indicator production and operations
may include some of the firm's key business processes. Therefore, we
decided to keep the indicator in the empirical analysis to preserve
content validity and avoid altering the meaning of the emergent variable
business process performance. In this type of situation, analysts can
also repeat the analysis, dropping the questionable indicators to
explore whether the decision to keep or drop these indicators affects
the results. We dropped BPP3 and repeated the empirical analysis. The
results obtained were qualitatively identical, suggesting that this
decision does not affect the research findings. One might ask why BPP3
should be included when the results do not change. Its inclusion is
theoretically justified: it is difficult to imagine a company's business
process performance without production and operations processes, as
these are often the heart of a company. Because all reflective
measurement and composite models from our example show desirable
properties, we proceed to evaluate the structural model.
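As a rough illustration of the multicollinearity check, the VIF of each
indicator of a composite estimated by Mode B can be obtained from the
diagonal of the inverse of the indicator correlation matrix, because
VIF_j = 1 / (1 − R_j²), where R_j² results from regressing indicator j on
the remaining indicators of the same construct. The sketch below is not
the routine of any particular PLSPM software; the function name and the
correlation values are hypothetical and do not stem from our example.

```python
import numpy as np


def vif_from_correlations(corr):
    """Return the VIF of each indicator given the indicator correlation matrix.

    VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal element of the
    inverse correlation matrix.
    """
    corr = np.asarray(corr, dtype=float)
    return np.diag(np.linalg.inv(corr))


# Hypothetical correlations among three indicators of one composite
# (illustrative values only, NOT taken from Table 3).
indicator_corr = np.array([
    [1.00, 0.20, 0.10],
    [0.20, 1.00, 0.15],
    [0.10, 0.15, 1.00],
])

vifs = vif_from_correlations(indicator_corr)
for j, v in enumerate(vifs, start=1):
    # VIF < 5 is the decision criterion listed in Table 4.
    flag = "ok" if v < 5 else "check multicollinearity"
    print(f"indicator {j}: VIF = {v:.3f} ({flag})")
```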
Table 3
Measurement model evaluation.

| Code | Construct/indicator | ρ_A | AVE | VIF | Weight | Loading |
| --- | --- | --- | --- | --- | --- | --- |
| | Social executive behavior (1: Strongly disagree, 5: Strongly agree) (reflective measurement model, Mode A consistent (PLSc), SEXB2 as dominant indicator) | 0.938 | 0.788 | | | |
| SEXB1 | Behavior of top business executives towards adoption of social media is positive | | | | 0.278*** | 0.905*** |
| SEXB2 | Top business executives are positive in adopting social media for business activities | | | | 0.269*** | 0.877*** |
| SEXB3 | Top business executives support adoption of social media for business activities | | | | 0.263*** | 0.856*** |
| SEXB4 | Top business executives are willing to support adoption of social media in the firm | | | | 0.280*** | 0.912*** |
| | Social employee behavior (1: Strongly disagree, 5: Strongly agree) (reflective measurement model, Mode A consistent, SEMB2 as dominant indicator) | 0.913 | 0.716 | | | |
| SEMB1 | Employee behavior towards adoption of social media is positive | | | | 0.301*** | 0.901*** |
| SEMB2 | Employees are positive to adopt social media in the firm | | | | 0.274*** | 0.820*** |
| SEMB3 | Employees support adoption of social media in the firm | | | | 0.257*** | 0.769*** |
| SEMB4 | Employees are willing to support adoption of social media in the firm | | | | 0.296*** | 0.888*** |
| | Social media capability: My firm has deliberately used and leveraged…for business activities (1: Strongly disagree, 5: Strongly agree) (composite model, Mode B, SMC1 as dominant indicator) | | | | | |
| SMC1 | Facebook | | | 1.037 | 0.229*** | 0.397*** |
| SMC2 | Twitter | | | 1.032 | 0.489*** | 0.627*** |
| SMC3 | Corporate blog(s) | | | 1.059 | 0.601*** | 0.751*** |
| SMC4 | LinkedIn | | | 1.020 | 0.333*** | 0.455*** |
| | Business process performance: Relative to your key competitors, what is your performance in last three years in the following business processes (1: Significantly worse, 5: Significantly better than my key competitors) (composite model, Mode B, BPP4 as dominant indicator) | | | | | |
| BPP1 | Supplier relations | | | 1.022 | 0.285** | 0.397** |
| BPP2 | Product and service enhancement | | | 1.134 | 0.553*** | 0.307** |
| BPP3 | Production and operations | | | 1.105 | 0.108 | 0.203† |
| BPP4 | Marketing and sales | | | 1.064 | 0.609*** | 0.531*** |
| BPP5 | Customer relations | | | 1.063 | 0.629*** | 0.591*** |

Note: † p < 0.10, * p < 0.05, ** p < 0.01, *** p < 0.001, one-tailed test.
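
The AVE values in Table 3 can be cross-checked from the reported loading
estimates, because the AVE of a reflectively measured construct is the
mean of its squared standardized loadings. The short sketch below does
this for the two latent variables of our example; the function name is
ours and merely illustrative.

```python
def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized factor loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Loading estimates as reported in Table 3.
table3_loadings = {
    "Social executive behavior": [0.905, 0.877, 0.856, 0.912],
    "Social employee behavior": [0.901, 0.820, 0.769, 0.888],
}

for construct, loadings in table3_loadings.items():
    ave = average_variance_extracted(loadings)
    # AVE > 0.5 is the decision criterion for convergent validity (Table 4).
    verdict = "meets" if ave > 0.5 else "misses"
    print(f"{construct}: AVE = {ave:.3f} ({verdict} the AVE > 0.5 criterion)")
```

Rounded to three decimals, the results are 0.788 and 0.716, the AVE
values reported in Table 3.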
J. Benitez, et al. Information & Management xxx (xxxx) xxx–xxx
Table 4
Steps to assess common factor and composite models.

| Steps | Type of construct | Description | Assessment criterion | Decision criterion | Interpretation |
| --- | --- | --- | --- | --- | --- |
| Testing the adequacy of reflective measurement and composite models | Latent and emergent variable | Evaluate the overall fit of the model with a saturated structural model by investigating discrepancy between empirical and model-implied indicator variance–covariance matrix | SRMR | SRMR < 0.080; SRMR < HI_95 | An SRMR value smaller than 0.080 indicates an acceptable model fit [22]; however, these thresholds are preliminary and need to be investigated in more detail |
| | | | d_ULS, d_G | d_ULS < HI_95; d_G < HI_95 | The null hypothesis that the population indicator variance–covariance matrix equals the model-implied counterpart is not rejected. Hence, empirical evidence for the model is given when the value of the discrepancy measure is below the 95% quantile of its corresponding reference distribution |
| Evaluating content validity | Latent and emergent variable | How the corresponding theoretical concepts have been operationalized (measured or built) in prior research. Flexibility in the case of artifacts represented by an emergent variable (bread and beer analogy) | | | |
| Evaluating reliability of construct scores | Latent variable | Evaluating whether the construct scores reliably represent the underlying construct | ρ_A | ρ_A > 0.707 | More than 50% of the variance in the construct scores can be explained by the underlying latent variable |
| Evaluating indicator reliability | Latent variable | Evaluating whether indicators are reliable | Factor loading estimates | Factor loading estimates > 0.707 | More than 50% of the indicator's variance is explained by the latent variable |
| | | | Factor loading significance | Significant at 5% significance level | |
| Evaluating convergent validity | Latent variable | Evaluating the share of variance in the indicators that is explained by the underlying latent variable | AVE | AVE > 0.5 | More than 50% of indicators' variance is explained by the underlying latent variable |
| Evaluating discriminant validity | Latent variable | Evaluating whether two latent variables are statistically different | HTMT | HTMT < 0.85 (or whether the HTMT is significantly smaller than 1) | Factors are statistically different and thus have discriminant validity |
| Multicollinearity | Emergent variable (estimated by Mode B) | Evaluating how the standard errors of the weight estimates are affected by the correlations of the indicators | VIF | VIF < 5 | If the estimates suffer from multicollinearity, weights obtained by Mode A or predetermined weights can be used |
| Weights | Emergent variable | Evaluating relative contribution of an indicator to its construct | Weights' value and significance | Significant at 5% significance level | Each indicator contributes significantly to the emergent variable |
| Loadings | Emergent variable | Evaluating absolute contribution of an indicator to its construct | Loading significance | Significant at 5% significance level | Each indicator contributes to the emergent variable in a statistically significant way |
3.5. Assessment of the structural model
In evaluating the structural model, the analyst should examine the
overall fit of the estimated model, the path coefficient estimates, their
significance, the effect sizes (f²), and the coefficient of determination
(R²) [3,123]. Analysts should focus primarily on overall model fit in
confirmatory research and primarily on R², the path coefficient
estimates, and the effect sizes in explanatory research [24].
Table 7 summarizes the steps to follow in evaluating the structural
model.
3.5.1. Evaluation of the overall ﬁt of the estimated model
First, analysts should evaluate the overall fit of the estimated model
through the bootstrap-based test of overall model fit and the SRMR as a
measure of approximate fit to obtain empirical evidence for the proposed
theory. An analysis in confirmatory research that does not assess the
overall model fit is incomplete, because it ignores empirical evidence
for and against the proposed model and the postulated theory [124].
Without assessing the model fit, a researcher would not obtain any signal
if he or she had incorrectly omitted an important effect in the model.
Because the test for overall model fit was introduced only recently in
the context of PLSPM, the vast majority of models estimated by PLSPM in
past IS research have not been evaluated in this respect. However,
because the overall model fit can now be tested in the context of PLSPM,
we encourage IS scholars to take this evaluation very seriously in causal
research. In our example, all values of the discrepancy measures were
below the 95% quantile of their corresponding reference distribution
(HI_95), indicating that the estimated model was not rejected at a 5%
significance level (see Table 5). Moreover, the SRMR was below the
preliminary suggested threshold of 0.080, indicating an acceptable model
fit. This result suggests that the proposed model is well suited for
confirming and explaining the development of social media capability and
business process performance among firms. While the model fit suggests
that there is a possibility that the world functions according to the
specified model, the model can still be misspecified in the sense of
overparameterization, i.e., the model contains superfluous zero-paths
[22]. Neither the bootstrap-based test of model fit nor the SRMR
penalizes unnecessary paths, i.e., neither of them rewards parsimony.
Regardless of whether one conducts confirmatory or explanatory research,
it remains indispensable to assess all path coefficients and their
significance. Table 6 presents the construct correlation matrix.
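To make the discrepancy measures more tangible, the following sketch
computes the SRMR and d_ULS for a pair of matrices and applies the
decision rule of the bootstrap-based test to the values reported in
Table 5. It assumes common definitions of the SRMR (root mean square of
the lower-triangular standardized residuals, diagonal included) and of
d_ULS (half the sum of squared differences between the two matrices);
software packages may differ in such details. The two small matrices and
the function names are hypothetical, and the bootstrap that yields HI_95
is not reproduced here.

```python
import numpy as np


def srmr(S, Sigma):
    """Standardized root mean square residual (assumed definition)."""
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    sd = np.sqrt(np.diag(S))
    resid = (S - Sigma) / np.outer(sd, sd)
    lower = resid[np.tril_indices_from(resid)]  # lower triangle incl. diagonal
    return float(np.sqrt(np.mean(lower ** 2)))


def d_uls(S, Sigma):
    """Squared Euclidean distance: half the sum of squared differences."""
    S = np.asarray(S, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    return float(0.5 * np.sum((S - Sigma) ** 2))


# Hypothetical empirical and model-implied correlation matrices.
S_toy = np.array([[1.0, 0.5], [0.5, 1.0]])
Sigma_toy = np.array([[1.0, 0.4], [0.4, 1.0]])
print(f"toy SRMR = {srmr(S_toy, Sigma_toy):.4f}, toy d_ULS = {d_uls(S_toy, Sigma_toy):.4f}")

# Decision rule with the values from Table 5: (observed value, HI_95).
table5_fit = {"SRMR": (0.032, 0.049), "d_ULS": (0.232, 0.558), "d_G": (0.052, 0.222)}
not_rejected = {name: value < hi95 for name, (value, hi95) in table5_fit.items()}
print(not_rejected)
print("SRMR below 0.080:", table5_fit["SRMR"][0] < 0.080)
```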
3.5.2. Evaluation of path coeﬃcients and their signiﬁcance levels
The path coefficient estimates are essentially standardized regression
coefficients, whose sign and absolute size can be assessed. These
coefficients are interpreted as the change in the dependent construct,
measured in standard deviations, if an independent construct is
increased by one standard deviation while keeping all other explanatory
constructs constant (ceteris paribus consideration). For example,
increasing social media capability by one standard deviation will
increase business process performance by 0.515 standard deviations if all
other variables are kept constant. Statistical tests and confidence
intervals can be used to draw conclusions about the population
parameters. For confidence intervals, the percentile bootstrap confidence
interval is recommended [125]. As shown in Fig. 4, the path coefficient
estimates for the hypothesized relationships included in the example
range from 0.396 to 0.515 and are all significant at a 5% significance
level; only the effects of the two control variables, firm size and
industry, are not significant. A path coefficient estimate is considered
statistically significantly different from zero at a 5% significance
level when its p-value is below 0.05 or when the 95% percentile bootstrap
confidence interval constructed around the estimate does not cover zero.
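Because the path coefficients are standardized regression coefficients,
they can be recovered from the construct correlations by solving the
normal equations of each structural equation. The following sketch,
which assumes that the correlations in Table 6 are the ones underlying
the structural model estimation, does so for both endogenous constructs.
The helper name and the construct abbreviations SIZE and IND are
hypothetical.

```python
import numpy as np

# Construct correlations from Table 6.
# Order: SEXB, SEMB, SMC, BPP, SIZE (firm size), IND (industry).
names = ["SEXB", "SEMB", "SMC", "BPP", "SIZE", "IND"]
R = np.array([
    [ 1.000,  0.322,  0.550, 0.216, -0.025, 0.069],
    [ 0.322,  1.000,  0.532, 0.309, -0.048, 0.073],
    [ 0.550,  0.532,  1.000, 0.515, -0.014, 0.010],
    [ 0.216,  0.309,  0.515, 1.000,  0.016, 0.036],
    [-0.025, -0.048, -0.014, 0.016,  1.000, 0.038],
    [ 0.069,  0.073,  0.010, 0.036,  0.038, 1.000],
])


def path_coefficients(R, names, dependent, predictors):
    """Solve R_xx * beta = r_xy for the standardized coefficients."""
    idx = [names.index(p) for p in predictors]
    dep = names.index(dependent)
    R_xx = R[np.ix_(idx, idx)]   # correlations among the predictors
    r_xy = R[idx, dep]           # correlations with the dependent construct
    beta = np.linalg.solve(R_xx, r_xy)
    return dict(zip(predictors, beta))


smc_paths = path_coefficients(R, names, "SMC", ["SEXB", "SEMB"])
bpp_paths = path_coefficients(R, names, "BPP", ["SMC", "SIZE", "IND"])

for label, paths in (("SMC", smc_paths), ("BPP", bpp_paths)):
    for predictor, beta in paths.items():
        print(f"{predictor} -> {label}: {beta:.4f}")
```

The results should agree with the path coefficient estimates in Table 5
up to rounding of the input correlations.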
3.5.3. Evaluation of eﬀect sizes
The practical relevance of significant effects should be investigated
by considering the effect sizes of the relationships between the
constructs. The effect size is a measure of the magnitude of an effect
that is independent of sample size. f² values from 0.020 to 0.150, from
0.150 to 0.350, and equal to or larger than 0.350 indicate a weak,
medium, or large effect size, respectively [108]. Just as all actors in
a movie cannot play a leading role, it is unusual and unlikely that most
constructs will have a large effect size in the model. We provide this
clarification because scholars often expect/self-demand that all/most of
their effect
Table 5
Structural model evaluation.

| Relationship | Path coefficient |
| --- | --- |
| Social executive behavior → Social media capability (H1) | 0.422*** (8.830) [0.327, 0.512] |
| Social employee behavior → Social media capability (H2) | 0.396*** (8.052) [0.300, 0.490] |
| Social media capability → Business process performance (H3) | 0.515*** (10.232) [0.426, 0.609] |
| Firm size → Business process performance (control variable) | 0.022 (0.305) [−0.128, 0.160] |
| Industry → Business process performance (control variable) | 0.030 (0.312) [−0.161, 0.174] |

| Endogenous variable | R² |
| --- | --- |
| Social media capability | 0.443 |
| Business process performance | 0.267 |

| Overall fit of the estimated model | Value | HI_95 |
| --- | --- | --- |
| SRMR | 0.032 | 0.049 |
| d_ULS | 0.232 | 0.558 |
| d_G | 0.052 | 0.222 |

| Effect size | f² |
| --- | --- |
| Social executive behavior → Social media capability (H1) | 0.286 |
| Social employee behavior → Social media capability (H2) | 0.252 |
| Social media capability → Business process performance (H3) | 0.362 |
| Firm size → Business process performance (control variable) | 0.001 |
| Industry → Business process performance (control variable) | 0.001 |

Note: t-values (one-tailed test) are presented in parentheses. Percentile bootstrap confidence intervals are presented in brackets.
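Table 6
Construct correlation matrix.

| Construct | 1 | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- | --- |
| 1. Social executive behavior | 1.000 | | | | | |
| 2. Social employee behavior | 0.322 | 1.000 | | | | |
| 3. Social media capability | 0.550 | 0.532 | 1.000 | | | |
| 4. Business process performance | 0.216 | 0.309 | 0.515 | 1.000 | | |
| 5. Firm size | −0.025 | −0.048 | −0.014 | 0.016 | 1.000 | |
| 6. Industry | 0.069 | 0.073 | 0.010 | 0.036 | 0.038 | 1.000 |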
magnitudes be large, an unrealistic expectation. This cautionary note
extends to supervisors' expectations for their Ph.D. students (as
illustrated by [126]). In our sample, the f² values for the hypothesized
relationships range from 0.252 to 0.362 (medium to large).
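The f² value of a predictor is commonly computed from the R² of the
dependent construct with and without that predictor:
f² = (R²_included − R²_excluded) / (1 − R²_included). The sketch below
applies this formula together with the thresholds given above; the
function names are ours. As a worked case, it assumes that the R² of
social media capability without social executive behavior equals the
squared correlation between social employee behavior and social media
capability from Table 6 (0.532²), which holds when a single predictor
remains in the equation.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size f² from the R² with and without the predictor."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def classify_effect_size(f2):
    """Label an f² value using the thresholds of [108]."""
    if f2 >= 0.350:
        return "large"
    if f2 >= 0.150:
        return "medium"
    if f2 >= 0.020:
        return "weak"
    return "no substantial effect"


# Worked case for H1 (assumption: R² without the predictor = 0.532 ** 2).
f2_h1 = f_squared(r2_included=0.443, r2_excluded=0.532 ** 2)
print(f"H1: f² = {f2_h1:.3f} ({classify_effect_size(f2_h1)})")

# Classification of the f² values reported in Table 5.
table5_f2 = {"H1": 0.286, "H2": 0.252, "H3": 0.362,
             "Firm size": 0.001, "Industry": 0.001}
for effect, value in table5_f2.items():
    print(f"{effect}: {value:.3f} -> {classify_effect_size(value)}")
```

The recomputed value for H1 is close to the 0.286 reported in Table 5;
small deviations stem from the rounded inputs.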
3.5.4. Evaluation of R²
R² is used to assess goodness of fit in regression analysis [87]. In the
case of models estimated by OLS, the R² value gives the share of variance
explained in a dependent construct. Thus, it provides insights into
a model's in-sample predictive power [127]. Moreover, R² forms the
basis for several innovative model selection criteria [37,100].
Reporting R² makes PLSPM research future-proof in this regard, because
the new model selection criteria can still be calculated ex post as long as
the R² values are given.
The expected magnitude of R² depends on the phenomenon investigated. As
some phenomena are already quite well understood, one would expect a
relatively high R². For phenomena that are less well understood, a lower
R² is acceptable. The R² values should be judged relative to studies that
investigate the same dependent variable. In our example, the R² values
for social media capability and business process performance are 0.443
and 0.267, respectively. The study of social media in organizations is in
its initial stages [77]. Braojos et al. [128] report an R² value of 0.541
for social media capability. In our example, two previously unexplored
exogenous variables, social executive behavior and social employee
behavior, explain 44.3% of the variance in the development of social
media capability. Considering the explained variance in prior IS research
and the originality of our two exogenous variables in influencing social
media capability, an R² of 0.443 seems to be an excellent value.
The models of [129,130] explain 49% and 43.9% of the variance in
business process outcomes. In our example, social media capability,
firm size, and industry explain 26.7% of the variance in business
process performance. Although this R² value is somewhat smaller than
those obtained by [129,130], it can be considered satisfactory because
our model is the first to use social media capability individually to
explain business process performance. The independent variables
explaining business process outcomes in [129,130] refer to other IT
resources (e.g., IT assets, enterprise resource planning capabilities)
different from social media capability. This subsection illustrates,
using our fictive example, how analysts can report and compare their R²
values.
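For standardized variables, the R² of a structural equation equals the
sum of the products of each path coefficient and the correlation between
the corresponding predictor and the dependent construct. As an
illustrative cross-check, assuming the rounded estimates in Tables 5 and
6 as inputs, the following sketch recomputes the two R² values of our
example; the function name is hypothetical.

```python
def r_squared(path_coefficients, correlations_with_dependent):
    """R² = sum of (path coefficient * predictor-criterion correlation)."""
    return sum(b * r for b, r in zip(path_coefficients, correlations_with_dependent))


# Social media capability: predictors are social executive behavior and
# social employee behavior (coefficients from Table 5, correlations from Table 6).
r2_smc = r_squared([0.422, 0.396], [0.550, 0.532])

# Business process performance: predictors are social media capability,
# firm size, and industry.
r2_bpp = r_squared([0.515, 0.022, 0.030], [0.515, 0.016, 0.036])

print(f"R² social media capability      = {r2_smc:.3f}")
print(f"R² business process performance = {r2_bpp:.3f}")
```

Both results round to the R² values reported in Table 5 (0.443 and
0.267).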
4. Discussion and conclusions
IS research often tackles complex research problems and questions
that require conceptualization and operationalization of different types
of theoretical concepts, i.e., behavioral concepts and artifacts, as well as
the estimation of their relationships. PLSPM is a suitable estimator for
this purpose. How can one perform and report an impactful analysis
using PLSPM in IS research following the recent improvements in PLSPM?
This study provides thorough guidelines on PLSPM in the framework of
causal (confirmatory and explanatory) research, employing the latest
recommended standards. In doing so, it addresses why and how to perform
and report a PLSPM estimation in confirmatory and explanatory IS
research, illustrated by a fictive example on the business value of
social media. This is the key contribution of this paper to the
methodological literature in IS empirical research.
In the last five years, methodologists have overcome major weaknesses
of traditional PLSPM, such as its inconsistency for latent variable
models and the lack of a test for overall model fit. To benefit from all
these enhancements, IS scholars need new guidelines for empirical
studies that incorporate these recent developments and insights, as most
of the guideline papers on PLSPM in IS research were published before
2013 (e.g., [12,43–45]). Although several recent scholarly textbooks and
articles (e.g., [46–48]) have provided guidelines for causal research
that cover some of the latest enhancements to
Table 7
Steps to follow in performing structural model evaluation.

| Steps | Description | Criterion | Suggested threshold | Interpretation |
| --- | --- | --- | --- | --- |
| Overall fit of estimated model | Evaluating overall fit of the estimated model by evaluating discrepancy between the empirical indicator variance–covariance matrix and its model-implied counterpart | SRMR, d_ULS, d_G | SRMR < 0.080; SRMR < HI_95; d_ULS < HI_95; d_G < HI_95 | Value of discrepancy measure below the 95% quantile of the corresponding reference distribution provides empirical evidence for the postulated model. In other words, it is possible that the empirical data stem from a world that functions as theorized by the model |
| Consider path coefficient estimates and their significance levels | Standardized regression coefficients are interpreted as change in standard deviations of the dependent variable if an independent variable is increased by one standard deviation while all other independent variables in the equation remain constant | Path coefficient estimates and their significance level | Significant at 5% significance level, i.e., p-value < 5% | Effect of independent variables on dependent variables is statistically significant |
| Consider effect sizes (f²) | Measure of the magnitude of an effect that is independent of sample size. Gives an indication of the practical relevance of an effect | f² value | f² < 0.020: no substantial effect; 0.020 ≤ f² < 0.150: weak effect size; 0.150 ≤ f² < 0.350: medium effect size; f² ≥ 0.350: large effect size | Degree of strength of an effect |
| Evaluate R² | Explained variance of a dependent construct | R² | When the phenomena are already quite well understood, one would expect a high R². When the phenomena are not yet well understood, a lower R² is acceptable | Degree of variance explained for phenomenon under investigation |
PLSPM, none of these guidelines for causal research covered the full
range of recent developments, nor did they introduce any new
framework for applying PLSPM and reporting its outcomes. To address
this shortcoming in the existing IS literature, this paper provides
updated guidelines on the use of PLSPM in the assessment of reflective
measurement models, composite models, and structural models. To the
best of our knowledge, the proposed guidelines take into account all
recent enhancements. An application of the guidelines is illustrated
using a parsimonious IS research example on the business value of social
media.
In contrast to prior guidelines [11,12], our article introduces the
artifact (a human-made/firm-made object) as a new kind of theoretical
concept and shows how this type of theoretical concept can be
operationalized by means of the composite model. Because a significant
proportion of theoretical concepts in IS research are
human-made/firm-made, one can expect the composite model to become the
dominant conceptualization in IS research in the coming years. Against
this background, we highlight the usefulness of model testing in
confirmatory and explanatory research using PLSPM. Without considering
its results, it is hardly possible to obtain empirical evidence for or
against a scholar's proposed theory. Finally, we strongly recommend
that scholars employ consistent estimators, using PLSc when the
theoretical concept is operationalized by a measurement model.
As our article about the use of PLSPM for causal research is limited
to linear, recursive models containing only first-order constructs, future
IS research should develop additional updated guidelines incorporating
recent developments for more complex models, such as models containing
moderation effects, second-order emergent variables of emergent
variables, and composite models that account for more complex
relationships between the indicators and the emergent variable.
Although some steps have been made using PLSPM to deal