Preliminary Analysis of Data mining Adoption in Italian SMEs using PLS-SEM Method
M. Pejić Bach*, A. Topalović ** and I. Jajić*
* University of Zagreb, Faculty of Economics and Business, Zagreb, Croatia
** AISMA s.l.r., Milan, Italy
mpejic@net.efzg.hr, tplmra@gmail.com, ijajic@net.efzg.hr
Abstract - Data mining has become the omnipresent
requirement in the business environment of abundant data.
At the beginning of data mining development, the one-
solution approach was mostly used. In the last 20 years,
mostly due to the development of various educational
programs and the availability of open-source software, data
mining has evolved into a mature-leveled software solution
that does not necessarily require high-level expertise since
numerous prototype solutions have emerged. Small and
medium enterprises (SMEs) are still lagging behind large
companies in data mining implementation, which raises the
question of the determinants of data mining adoption. This
paper aims to test the preliminary model of data mining
adoption using the Technology-Organization-Environment
framework. The survey on the sample of Italian SMEs was
conducted to test the model, and the collected company data
was analyzed using the PLS-SEM approach.
Keywords - data mining, adoption, SMEs, TOE
framework, PLS-SEM
I. INTRODUCTION
Previous research on the acceptance of data mining
using theoretical frameworks is rare, stemming from the
novelty of this technology and its complexity. [1]
researched the determinants of data mining acceptance
using the Technology Acceptance Model (TAM), but
investigated acceptance by the individual and not at the
enterprise level. In contrast, the topic of this paper is
the intensity of the use of data mining at the company
level and not at the level of the individual within the
organization.
Previous research on data mining applications in small
and medium enterprises (SMEs) mainly focuses on
presenting specific case studies of data mining but not on
researching the determinants of its acceptance. [2]
presented an analysis of a case study of official statistics
in SMEs in the UK. [3] investigates possible data mining
applications in manufacturing SMEs. [4] shows the
concrete application of data mining for financial risk
management in SMEs. [5] explores the possibilities of
applying the disclosure of knowledge from databases to
protect privacy in SMEs. [6] presents an overview of
possible data mining applications in SMEs in the transport
industry, while [7] presents case studies of data mining
applications in banking when SMEs are banks' clients.
Even though SMEs make up the largest part of
enterprises in most countries, as is the case in the Republic
of Italy, there is a lack of scientific research on knowledge
discovery in SME databases using the Technology-
Organization-Environment (TOE) framework [8].
However, the usage of the TOE framework has shown the
greatest potential for assessing the acceptance of new
technologies. So far, it has not been used to research the
acceptance of data mining.
This paper aims to present a preliminary study on the
intensity of the use of data mining in Italian SMEs.
Company data was collected on a sample defined by the
snowball method [9], using a research questionnaire
based on the TOE framework. The paper's contribution
is the development of a preliminary model of the
determinants of the intensity of data mining use in
Italian SMEs.
The paper consists of the following parts. A brief
overview of previous research and the research gap is
outlined in the introduction. The second chapter presents
the data collected, the research instrument, and the
statistical methods used. In the third chapter, the
measurement model is validated and the PLS-SEM model is
developed, while bootstrap analysis is used to test the
research hypotheses. Finally, concluding remarks are
presented in the last chapter.
II. METHODOLOGY
 
The research model is based on the TOE framework,
which includes four elements, defined as latent
variables: Technological aspects of data mining
implementation (TECH), Organizational aspects of data
mining implementation (ORG), Environmental aspects of
data mining implementation (ENVIR), and Intensity of
usage of data mining (INTENS). Additionally, Business
performance (PERF) is included in the model to
investigate the impact of the Intensity of data mining
usage on Business performance.
Figure 1 presents the research model that contains the
four hypotheses:
H1: Technological aspects of data mining
implementation affect the intensity of usage of
data mining
H2: Organizational aspects of data mining
implementation affect the intensity of usage of
data mining
H3: Environmental aspects of data mining
implementation affect the intensity of usage of
data mining
H4: Intensity of usage of data mining affects the
business performance
Figure 1. Research model
 
The focus of this research is SMEs in the Republic of
Italy, which represent the target population of the
research. This paper used the number of employees as the
main size criterion, and small enterprises were defined as
those with fewer than 50 employees and medium-sized
enterprises as those with 51 to 250 employees. A list of
companies generated using the Register of Business
Entities of the Italian Chamber of Commerce for 2018 was
used as a sample frame for a systematic random sample.
Since the paper investigates the continuous use of data
mining, companies were screened for their use of data
mining. The study included only those companies that
have used the discovery of knowledge from databases in
their business for a longer time (at least one year).
Companies were sampled using the snowball method
through a questionnaire. The snowball method is a
non-probabilistic sampling method where sample members
recommend potential sample members from among their
contacts. This method is often used for hidden
populations. Since discovering knowledge from databases
is rarely used in SMEs, the SMEs that use it can be
treated as a hidden population.
Previous research in business informatics often uses the
snowball method as a sampling method for data analyzed
by structural equations [10]. The community of data
mining experts is interconnected and shares specialized
knowledge, so it can be considered a knowledge
community [11]. Data collection was completed when the
respondents could no longer recommend new contacts,
i.e., when they started recommending contacts that were
already included [9]. In this way, the respondents'
recommendations identified 70 SMEs that agreed to
participate in the research. Such a small sample results
from the fact that SMEs still rarely use data mining
methods in their day-to-day business [2].
 
The company data was collected using a questionnaire
that contains the research instrument with five latent
variables: Technology, Organization, Environment, Usage
intensity, and Performance. Each of the latent variables
consists of several manifest variables. The manifest
variables were defined based on the initial interviews
with experts. Latent and manifest variables in the
questionnaire are described in the following part of
this subchapter. The respondents evaluated on a scale
from 1 to 5 to what extent they agree with the statements
in the manifest variables (1 = do not agree at all,
5 = fully agree).
Latent factor Technology (TECH) is composed of two
manifest factors: TECH1: The information system of our
company enables easy integration of data from different
subsystems, and TECH2: Our company provides
continuous education of IT staff from various fields (i.e.,
certificates).
Latent factor Organization (ORG) is composed of three
manifest factors: ORG1: The company's management
accepts the risk of adopting new information systems and
software solutions and their continuous application;
ORG2: Business rules (i.e., in the procurement of new
information solutions) do not create obstacles to the
acceptance of new information systems and software
solutions and their continuous application; and ORG3:
The company is using an automatic method for consistent
data collection.
Latent factor Environment (ENVIR) is composed of
two manifest factors: ENVIR1: Barriers to the entry of
new competitors into our company's market are low, and
ENVIR2: The bargaining power of our company's
suppliers is high.
Latent factor Organizational performance (PERF)
consists of three manifest factors: PERF1: The
profitability of our company is significantly higher than
the competition; PERF2: Customer satisfaction of our
company is significant; and PERF3: Innovation of
products/services of our company is significantly higher
than the competition.
Latent variable Intensity of data mining use (INTENS)
consists of two manifest variables: INTENS1: Discovery
of knowledge from databases is used continuously in our
company to improve output logistics, and INTENS2:
Discovery of knowledge from databases is used
continuously in our company to improve marketing and
sales.
 
The proposed model was evaluated using the PLS-SEM
method, considering various measures. ADANCO 2.0.1
software was used for the model development and
evaluation.
When evaluating the measurement model, it is important
to first check indicator reliability. The indicator
loading should be above 0.700, which means that the
construct explains more than 50 percent of the
indicator's variance [12]. Loadings close to -1 or 1
indicate that the indicator is strongly related to its
latent variable.
In addition to indicator reliability, it is necessary to
check internal consistency reliability using Cronbach's
alpha and composite reliability [13]. Values from 0.60 to
0.70 are considered acceptable in exploratory research.
Additionally, the Dijkstra-Henseler rho indicator was used
for the following reasons: (i) Cronbach's alpha may be
too conservative, (ii) composite reliability may be too
liberal [12].
Convergent validity is then checked using the average
variance extracted (AVE). The AVE of each construct
should be greater than the recommended value of 0.50.
Higher indicator loadings mean that the construct
explains a sufficient share of the variance of its
indicators for developing the PLS-SEM model.
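As an illustration of the reliability and convergent validity measures described above, the following minimal Python sketch computes composite reliability and AVE from standardized outer loadings. It assumes the common formulas for reflective constructs with uncorrelated measurement errors: composite reliability is the squared sum of loadings divided by that squared sum plus the sum of (1 - loading squared), and AVE is the mean of the squared loadings. The function names are hypothetical, the loadings are those reported in Table 4, and the sketch is not the ADANCO implementation.

```python
def composite_reliability(loadings):
    """Joreskog's rho_c for standardized loadings (assumes uncorrelated errors)."""
    explained = sum(loadings) ** 2
    error = sum(1 - value ** 2 for value in loadings)
    return explained / (explained + error)


def average_variance_extracted(loadings):
    """AVE as the mean of the squared standardized loadings."""
    return sum(value ** 2 for value in loadings) / len(loadings)


# Outer loadings as reported in Table 4 of this paper.
LOADINGS = {
    "TEHN": [0.9541, 0.8714],
    "ORG": [0.8717, 0.9093, 0.8522],
    "ENVIR": [0.9480, 0.9573],
    "INTENS": [0.9386, 0.9403],
    "PERF": [0.8499, 0.9192, 0.8804],
}

for construct, values in LOADINGS.items():
    rho_c = composite_reliability(values)
    ave = average_variance_extracted(values)
    print(f"{construct}: rho_c = {rho_c:.4f}, AVE = {ave:.4f}")
```

Under these assumptions, the computed values agree with Tables 1 and 5 up to rounding of the loadings.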
Discriminant validity is assessed in three ways: by
cross-loadings, which verify that the loading of each
indicator on its associated variable is greater than its
loadings on other variables; by the Fornell-Larcker
criterion, which requires that the square root of the AVE
of each variable is greater than its correlations with
all other variables in the model; and by the
Heterotrait-Monotrait (HTMT) ratio of correlations [14].
The collinearity test is performed to check for common
method bias. The multicollinearity problem does not exist
if the variance inflation factor (VIF) values are lower
than 5 [15].
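The VIF of an indicator can be written as 1 / (1 - R2), where R2 comes from regressing that indicator on the other indicators. The sketch below, with hypothetical function names, applies this to constructs with only two indicators, where R2 is simply the squared correlation between the two, and then derives the standardized Cronbach's alpha from that correlation. It assumes that the VIF values in Table 8 were computed within each construct block and that the correlation is positive; the source does not state either.

```python
import math


def vif(r_squared):
    """Variance inflation factor for a given R2 of the auxiliary regression."""
    return 1.0 / (1.0 - r_squared)


def correlation_from_vif(vif_value):
    """Two-indicator case: recover |r| from VIF = 1 / (1 - r^2)."""
    return math.sqrt(1.0 - 1.0 / vif_value)


def standardized_alpha(mean_correlation, n_items):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    return n_items * mean_correlation / (1 + (n_items - 1) * mean_correlation)


# VIF values of the two-indicator constructs as reported in Table 8.
for construct, vif_value in [("TEHN", 1.8821), ("ENVIR", 2.9835), ("INTENS", 2.4132)]:
    r = correlation_from_vif(vif_value)
    alpha = standardized_alpha(r, 2)
    print(f"{construct}: r = {r:.4f}, alpha = {alpha:.4f}, below 5: {vif_value < 5}")
```

Under these assumptions, the alpha values derived for TEHN, ENVIR, and INTENS are consistent with those reported in Table 2.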
Cohen's f2 indicators of the relative influence of
exogenous on the endogenous variable were also used.
Indicator f2 represents an estimate of the strength of the
relative influence of exogenous on the endogenous
variable [14]. The strength of the impact is classified
as follows: (i) a weak influence of the independent on
the dependent variable holds for values from 0.02 to
0.15, (ii) a medium influence for values from 0.15 to
0.35, and (iii) a strong influence for values above
0.35 [16].
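A minimal sketch of the effect size calculation follows, assuming the usual definition f2 = (R2 included - R2 excluded) / (1 - R2 included), where the two R2 values come from the model with and without the exogenous variable. The function names and the label for values below 0.02 are hypothetical. When an endogenous variable has a single predictor, as PERF does in this model, the R2 of the reduced model is zero.

```python
def cohens_f2(r2_included, r2_excluded):
    """Effect size of one exogenous variable on an endogenous variable."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def classify_f2(f2):
    """Bands as given in the text: 0.02 to 0.15, 0.15 to 0.35, above 0.35."""
    if f2 >= 0.35:
        return "strong"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "weak"
    return "below 0.02"


# PERF is explained only by INTENS, so removing INTENS leaves R2 = 0.
f2_intens_perf = cohens_f2(r2_included=0.4395, r2_excluded=0.0)
print(round(f2_intens_perf, 4), classify_f2(f2_intens_perf))
```

With the R2 of PERF reported in Table 9, this reproduces the f2 value for INTENS -> PERF in Table 10 up to rounding.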
The unweighted least squares discrepancy (dULS)
quantifies how much the empirical correlation matrix
differs from the correlation matrix implied by the model.
Theoretically, the lower the dULS value, determined for
both the estimated model and the saturated model, the
better the model fit. ADANCO 2.0.1 uses bootstrapping to
provide the 95% percentile ("HI95") and the 99%
percentile ("HI99") of dULS under the assumption that the
theoretical model is true, so the dULS measure can be
compared with these values.
The standardized root mean square residual (SRMR) is a
measure that quantifies how strongly the empirical
correlation matrix differs from the correlation matrix
implied by the model.
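The two fit measures can be sketched as follows, assuming the usual definitions: dULS as half the sum of squared differences between all elements of the two matrices, and SRMR as the root mean square of the differences over the p(p+1)/2 unique elements. The function names and the small example matrices are hypothetical. For correlation matrices the diagonal differences are zero, so SRMR can be obtained directly from dULS and the number of indicators p.

```python
import math


def d_uls(empirical, implied):
    """Half the sum of squared differences over all matrix elements."""
    p = len(empirical)
    return 0.5 * sum(
        (empirical[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(p)
    )


def srmr(empirical, implied):
    """Root mean square residual over the p(p+1)/2 unique elements."""
    p = len(empirical)
    squared = sum(
        (empirical[i][j] - implied[i][j]) ** 2
        for i in range(p)
        for j in range(i + 1)
    )
    return math.sqrt(squared / (p * (p + 1) / 2))


def srmr_from_d_uls(d, p):
    """Shortcut that holds when both matrices have unit diagonals."""
    return math.sqrt(d / (p * (p + 1) / 2))


# Hypothetical 3x3 correlation matrices, for illustration only.
S = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
SIGMA = [[1.0, 0.45, 0.35], [0.45, 1.0, 0.36], [0.35, 0.36, 1.0]]
print(d_uls(S, SIGMA), srmr(S, SIGMA))

# dULS values reported in Table 14; the model has 12 indicators.
for label, d in [("estimated", 0.6659), ("HI95", 0.5611), ("HI99", 0.6751)]:
    print(label, round(srmr_from_d_uls(d, 12), 4))
```

Under these assumptions, the SRMR values derived from the dULS values in Table 14 agree with the SRMR row of the same table up to rounding.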
The bootstrap analysis was performed to assess the
relationship between independent and dependent
variables. Bootstrapping analysis was performed in
SmartPLS 3.0 at the 1%, 5%, and 10% significance levels
to examine whether the relationships between the
independent and dependent variables were significant. The
most used critical values for two-tailed tests are 1.65
for the 10% significance level, 1.96 for the 5%
significance level, and 2.57 for the 1% significance
level [14].
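The link between a bootstrap standard error, the t-value, and the significance level can be sketched as follows. The sketch assumes a standard normal approximation for the two-tailed p-value, which is consistent with the critical values quoted above; the function names are hypothetical, and the software used in the paper may compute p-values differently.

```python
import math


def t_value(coefficient, standard_error):
    """Ratio of the path coefficient to its bootstrap standard error."""
    return coefficient / standard_error


def two_sided_p(t):
    """Two-tailed p-value under a standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


def significance_stars(p):
    """*** for 1%, ** for 5%, * for 10%, empty string otherwise."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# t-values as reported in Table 12 for the four direct effects.
for effect, t in [("TEHN -> INTENS", 0.510), ("ENVIR -> INTENS", 2.862),
                  ("ORG -> INTENS", 2.578), ("INTENS -> PERF", 12.574)]:
    p = two_sided_p(t)
    print(f"{effect}: p = {p:.3f} {significance_stars(p)}")
```

A coefficient of 0.273 with a standard error of 0.096, for instance, gives a t-value of about 2.84 via `t_value`; small differences from the reported t-values are expected because the table entries are rounded.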
III. RESULTS
 
Table 1 shows the composite reliability indicators of
latent variables.
The values of composite reliability indicators for all
latent variables should be above the critical value of
0.7, which confirms a high degree of internal consistency
[17]. As the table below shows, composite reliability for
all variables (TEHN, ENVIR, PERF, ORG, INTENS) is above
the critical value of 0.7 and ranges from 0.9098 to
0.9515, indicating a high degree of internal
consistency [14].
TABLE 1. COMPOSITE RELIABILITY INDICATORS OF LATENT VARIABLES
Construct Composite reliability
TEHN 0.9098
ENVIR 0.9515
PERF 0.9143
ORG 0.9098
INTENS 0.9377
Source: Author's research; June-December, 2019
The results of Cronbach's alpha indicators of latent
variables are shown in Table 2.
The Cronbach's alpha value should be above the critical
value of 0.7 [14], which holds for all values in the
table below. Values for all variables (TEHN, ENVIR, PERF,
ORG, INTENS) range from 0.8128 to 0.8983, indicating a
high level of reliability in measuring latent variables.
TABLE 2. CRONBACH'S ALPHA INDICATORS OF LATENT VARIABLES
Construct Cronbach's alpha
TEHN 0.8128
ENVIR 0.8983
PERF 0.8589
ORG 0.8526
INTENS 0.8670
Source: Author's research; June-December, 2019
Table 3 shows the values of Dijkstra-Henseler's rho
indicators of latent variables, which represent an
additional measure of the reliability of constructs [18]
and typically lie between Cronbach's alpha and composite
reliability.
The Dijkstra-Henseler's rho values for all variables
(TEHN, ENVIR, PERF, ORG, INTENS) are above the critical
value of 0.7 and range from 0.8589 to 0.9568.
TABLE 3. DIJKSTRA-HENSELER'S RHO INDICATORS OF LATENT
VARIABLES
Construct Dijkstra-Henseler's rho
TEHN 0.9568
ENVIR 0.9043
PERF 0.8589
ORG 0.8748
INTENS 0.8671
Source: Author's research; June-December, 2019
Table 4 shows the outer loadings of the indicators, all
of which are above the critical value of 0.50 [19].
TABLE 4. OUTER LOADINGS OF INDICATORS
Indicator TEHN ORG ENVIR INTENS PERF
TEHN1 0.9541
TEHN2 0.8714
ORG1 0.8717
ORG2 0.9093
ORG3 0.8522
ENVIR1 0.9480
ENVIR2 0.9573
INTENS1 0.9386
INTENS2 0.9403
PERF1 0.8499
PERF2 0.9192
PERF3 0.8804
Source: Author's research; June-December, 2019
Table 5. shows the values of the average extracted
variance of latent variables (AVE) that should be above
the critical value of 0.5 [20].
The values of the average extracted variance of latent
variables for all variables (TEHN, ENVIR, PERF, ORG,
INTENS) are above the critical value of 0.5 and range
from 0.7710 to 0.9075.
TABLE 5. AVERAGE VARIANCE EXTRACTED (AVE) OF LATENT VARIABLES
Construct Average variance extracted (AVE)
TEHN 0.8349
ENVIR 0.9075
PERF 0.7807
ORG 0.7710
INTENS 0.8826
Source: Author's research; June-December, 2019
Table 6 shows the assessment of discriminant validity
using the Fornell-Larcker criterion [17].
The average variance extracted (AVE) of each latent
variable needs to be compared with its squared
correlations with the other latent variables. The results
in Table 6 confirm discriminant validity because all AVE
values are higher than the squared inter-correlations.
TABLE 6. Discriminant validity by Fornell-Larcker
criterion
Construct TEHN ENVIR PERF ORG INTENS
TEHN 0.8349
ENVIR 0.2002 0.9075
PERF 0.2257 0.1414 0.7807
ORG 0.3419 0.2470 0.2128 0.7710
INTENS 0.1662 0.2348 0.4395 0.2882 0.8826
Source: Author's research; June-December, 2019
Note: The values of the AVE indicator are on the diagonal
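The comparison described above can be sketched in a few lines of Python. The sketch takes the lower triangle of Table 6 (AVE on the diagonal, squared correlations below it) and reports every pair of constructs whose squared correlation is not smaller than both AVE values; the function name is hypothetical.

```python
CONSTRUCTS = ["TEHN", "ENVIR", "PERF", "ORG", "INTENS"]

# Lower triangle of Table 6: AVE on the diagonal, squared correlations below it.
TABLE_6 = [
    [0.8349],
    [0.2002, 0.9075],
    [0.2257, 0.1414, 0.7807],
    [0.3419, 0.2470, 0.2128, 0.7710],
    [0.1662, 0.2348, 0.4395, 0.2882, 0.8826],
]


def fornell_larcker_violations(names, lower):
    """Return (construct, construct, squared correlation) for each violation."""
    violations = []
    for i in range(len(names)):
        for j in range(i):
            shared = lower[i][j]
            # The shared variance must stay below the AVE of both constructs.
            if shared >= lower[i][i] or shared >= lower[j][j]:
                violations.append((names[i], names[j], shared))
    return violations


print(fornell_larcker_violations(CONSTRUCTS, TABLE_6))
```

An empty list means that the criterion holds for every pair of constructs.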
Table 7. shows the discriminant validity using the
HTMT criteria, as the values for all variables (TEHN,
ENVIR, PERF, ORG, INTENS) are below 0.90.
HTMT values lower than 0.90 indicate discriminant
validity between the two reflective constructs [21].
TABLE 7. Discriminant validity using HTMT criteria
Construct TEHN ENVIR PERF ORG INTENS
TEHN
ENVIR 0.7330
PERF 0.6945 0.5971
ORG 0.8329 0.7411 0.7640
INTENS 0.6612 0.6939 0.8632 0.7654
Source: Author's research; June-December, 2019
Table 8 shows the variance inflation factors (VIF),
which should be below 5.
The values for all variables of the structural model
range from 1.8570 to 3.5419, which suggests no
multicollinearity problem.
TABLE 8. Variance inflation factors (VIF)
Indicator TEHN ORG ENVIR INTENS PERF
TEHN1 1.8821
TEHN2 1.8821
ORG1 2.0336
ORG2 2.2639
ORG3 2.0413
ENVIR1 2.9835
ENVIR2 2.9835
INTENS1 2.4132
INTENS2 2.4132
PERF1 1.8570
PERF2 2.9739
PERF3 2.4303
Source: Author's research; June-December, 2019
 '4.5
Table 9 shows the coefficients of determination (R2)
of the constructs [14].
The coefficients of determination for both endogenous
variables indicate good model performance: PERF
(0.4395) and INTENS (0.3549).
TABLE 9. CONSTRUCTS COEFFICIENTS OF DETERMINATION
Construct R2 Adjusted R2
PERF 0.4395 0.4313
INTENS 0.3549 0.3256
Source: Author's research; June-December, 2019
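The relationship between R2 and adjusted R2 in Table 9 can be illustrated with the standard adjustment formula, adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1), where n is the sample size and k the number of predictors of the construct. The sketch assumes n = 70 (the number of SMEs in the sample), one predictor for PERF, and three predictors for INTENS; the function name is hypothetical.

```python
def adjusted_r2(r2, n_observations, n_predictors):
    """Adjusted coefficient of determination."""
    return 1.0 - (1.0 - r2) * (n_observations - 1) / (n_observations - n_predictors - 1)


N = 70  # number of SMEs in the sample

# PERF is explained by INTENS; INTENS by TEHN, ORG, and ENVIR.
for construct, r2, k in [("PERF", 0.4395, 1), ("INTENS", 0.3549, 3)]:
    print(construct, round(adjusted_r2(r2, N, k), 4))
```

Under these assumptions, the results agree with the adjusted R2 column of Table 9 up to rounding.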
Table 10 presents Cohen's f2 indicators of the relative
influence of exogenous on the endogenous variable.
By determining the values of individual path
coefficients, it is possible to evaluate the model, where
values closer to 1.00 represent a strong positive
relationship between variables. In contrast, values
closer to 0 represent a weaker relationship [14].
The independent variables TEHN (0.0058), ENVIR
(0.0827), and ORG (0.1157) have a weak influence on the
dependent variable INTENS, while the independent
variable INTENS has a strong influence on the dependent
variable PERF (0.7841).
TABLE 10. Cohen's f2 indicators of the relative influence
of exogenous on the endogenous variable
Effect Cohen's f2 Explanation
TEHN -> INTENS 0.0058 Weak influence
ENVIR -> INTENS 0.0827 Weak influence
ORG -> INTENS 0.1157 Weak influence
INTENS -> PERF 0.7841 Strong influence
Source: Author's research; June-December, 2019
Table 11 shows the values of individual path
coefficients, which range from 0.0775 to 0.6629.
All of the individual path coefficients are positive. A
weaker positive relationship (0.2730) was found between
the independent variable ENVIR and the dependent
variable INTENS. The weakest relationship (0.0775) was
found between the independent variable TEHN and the
dependent variable INTENS. The strongest relationship
was found between the independent variable INTENS and
the dependent variable PERF (0.6629).
TABLE 11. Values of individual path coefficients
Effect Path coefficient
TEHN -> INTENS 0.0775
ENVIR -> INTENS 0.2730
ORG -> INTENS 0.3559
INTENS -> PERF 0.6629
Source: Author's research; June-December, 2019
The results of the Bootstrapping analysis are shown in
Table 12. and show that the relationships between the
variables were statistically significant, except for the
relationship between the TEHN and INTENS.
TABLE 12. RESULTS OF BOOTSTRAPPING ANALYSIS OF DIRECT
EFFECTS
Effect Original coefficient Bootstrap mean Standard error t-value p-value
TEHN -> INTENS 0.078 0.085 0.152 0.510 0.610
ENVIR -> INTENS 0.273 0.274 0.096 2.862 0.004***
ORG -> INTENS 0.356 0.359 0.138 2.578 0.010**
INTENS -> PERF 0.663 0.671 0.053 12.574 0.000***
Source: Author's research; June-December, 2019
Note: *** statistically significant with 1%, ** 5%, * 10%
The results indicate the following conclusions:
The first hypothesis, H1, was not supported by the
collected data and applied statistical analysis;
The independent variable ORG affects the dependent
variable INTENS with path coefficient 0.3559, t =
2.5779, p = 0.010, significant at the 5% level, which
confirms hypothesis H2;
The independent variable ENVIR affects the dependent
variable INTENS with path coefficient 0.2730, t =
2.8622, p = 0.004, significant at the 1% level, which
confirms hypothesis H3;
The independent variable INTENS (intensity of use)
affects the dependent variable PERF (business
performance) with path coefficient 0.6629, t =
12.5748, p < 0.001, significant at the 1% level, which
confirms hypothesis H4.
Table 13. presents a conclusion on hypotheses based
on direct effects. All tested hypotheses are accepted
except the first (H1), which is rejected.
TABLE 13. Conclusion on hypotheses based on direct
effects
Hypothesis Effect Conclusion
H1 TEHN -> INTENS Rejected
H2 ORG -> INTENS Accepted
H3 ENVIR -> INTENS Accepted
H4 INTENS -> PERF Accepted
Source: Author's research; June-December, 2019
Figure 2 presents the acceptance and rejection of the
hypotheses. Although companies usually presume that
technological factors are the most important for
implementing a new technology, our results indicate that
the organizational (H2) and environmental (H3) factors
were the statistically significant determinants of the
intensity of data mining usage. The results also confirm
numerous previous research results indicating that the
use of advanced technologies affects business
performance (H4).
Figure 2. Research model with the status of hypothesis acceptance
Table 14 shows the representativeness of the direct
effect model using SRMR and dULS measures.
TABLE 14. Representativeness of direct effect models
Measure Value HI95 HI99 Criterion Conclusion
SRMR 0.0924 0.0848 0.0930 SRMR < 0.08: very good; SRMR < 0.10: good Good
dULS 0.6659 0.5611 0.6751 HI95 < dULS < HI99: very good Very good
Source: Author's research; June-December, 2019
The results in Table 14 show that the dULS measure lies
between the HI95 and HI99 values, which means that the
model is representative.
The results of the bootstrapping analysis of indirect
effects are presented in Table 15. Based on these
results, it is possible to conclude that the relationships
between the variables are statistically significant,
except for the relationship between TEHN (technological
factors) and PERF (business performance).
The results indicate the following conclusions:
The independent variable ENVIR affects the dependent
variable PERF with path coefficient 0.1810, t =
2.7337, p = 0.006, significant at the 1% level;
The independent variable ORG affects the dependent
variable PERF with path coefficient 0.2360, t =
2.5224, p = 0.012, significant at the 5% level.
TABLE 15. Results of bootstrapping analysis of indirect
effects
Effect Original coefficient Bootstrap mean Standard error t-value p-value
TEHN -> PERF 0.051 0.057 0.102 0.506 0.613
ENVIR -> PERF 0.181 0.184 0.066 2.734 0.006***
ORG -> PERF 0.236 0.240 0.094 2.522 0.012**
Source: Author's research; June-December, 2019
Note: *** statistically significant with 1%, ** 5%
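Because INTENS is the only mediator in the model, each indirect effect on PERF can be sketched as the product of two direct path coefficients: the path from the exogenous variable to INTENS and the path from INTENS to PERF. The following minimal Python sketch uses the coefficients reported in Table 11; the function and variable names are hypothetical.

```python
# Direct path coefficients as reported in Table 11.
PATHS_TO_INTENS = {"TEHN": 0.0775, "ENVIR": 0.2730, "ORG": 0.3559}
INTENS_TO_PERF = 0.6629


def indirect_effect(first_path, second_path):
    """Indirect effect through a single mediator as a product of two paths."""
    return first_path * second_path


INDIRECT = {
    name: indirect_effect(coefficient, INTENS_TO_PERF)
    for name, coefficient in PATHS_TO_INTENS.items()
}

for name, effect in INDIRECT.items():
    print(f"{name} -> INTENS -> PERF: {effect:.3f}")
```

The products agree with the original coefficients in Table 15 up to rounding.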
IV. CONCLUSION
This paper's contribution involves researching the
application and acceptance of data mining in SMEs in
Italy. The paper analyzes the determinants of data mining
intensity of usage on a sample of SMEs. Initial random
sampling has shown that the number of SMEs using data
mining is probably very small and, as such, assumes a
hidden population. For that purpose, snowball sampling
was used. However, SMEs that have shown that they use
data mining intensively use several methods with various
software and for several different business purposes. The
professional contribution of the paper consists of
emphasizing the importance of key success factors in the
adoption of data mining in SMEs, which can help experts
and consultants in the field of software solutions and
information and communication technologies in general.
Usage of data mining is a new direction for the
development of SMEs that would support them embracing
digital transformation as the backbone of their strategy.
In the evaluation of this paper, it is necessary to
consider its shortcomings, which are primarily derived
from the sample size. SMEs still rarely use data mining,
and therefore sampling was carried out using the snowball
method, and a small number of enterprises were collected
in the sample. A recommendation for future research
arises from the above, indicating that the model should be
tested on a larger sample. The recommendation for further
research is also to explore the reasons and motivations for
implementing data mining using the upper echelon theory,
which has proven significant in explaining technology
acceptance in SMEs [22].
REFERENCES
[1] Huang, T. C. K., Liu, C. C., & Chang, D. C. (2012). An empirical
investigation of factors influencing the adoption of data mining
tools. International Journal of Information Management, 32(3),
257-270.
[2] Coleman, S. Y. (2016). Data-mining opportunities for small and
medium enterprises with official statistics in the UK. Journal of
Official Statistics, 32(4), 849-865.
[3] Packianather, M. S., Davies, A., Harraden, S., Soman, S., &
White, J. (2017). Data mining techniques applied to a
manufacturing SME. Procedia CIRP, 62, 123-128.
[4] Koyuncugil, A. S., & Ozgulbas, N. (2009, April). An intelligent
financial early warning system model based on data mining for
SMEs. In Future Computer and Communication, 2009. ICFCC
2009. International Conference on (pp. 662-666). IEEE.
[5] Grljevic, O., Bosnjak, Z., & Mekovec, R. (2011, September).
Privacy preserving in data mining-Experimental research on SMEs
data. In Intelligent Systems and Informatics (SISY), 2011 IEEE
9th International Symposium on (pp. 477-481). IEEE.
[6] Selamat, M., Aishah, S., Prakoonwit, S., Sahandi, R., Khan, W., &
Ramachandran, M. (2018), Big data analytics—A review of data‐
mining models for small and medium enterprises in the
transportation sector. Wiley Interdisciplinary Reviews: Data
Mining and Knowledge Discovery.
https://onlinelibrary.wiley.com/doi/pdf/10.1002/widm.1238
[7] Miller, M.M, & Nyauncho, E. (2015). Effective data mining &
analysis for SME banking. Nairobi, Kenya: FSD Kenya.
[8] Awa, H. O., Ojiabo, O. U., & Emecheta, B. C. (2015). Integrating
TAM, TPB and TOE frameworks and expanding their
characteristic constructs for e-commerce adoption by
SMEs. Journal of Science & Technology Policy
Management, 6(1), 76-94.
[9] Etikan, I., Alkassim, R., & Abubakar, S. (2016). Comparision of
snowball sampling and sequential sampling technique. Biometrics
and Biostatistics International Journal, 3(1), 55.
[10] Kumar, S., & Kaur, K. (2020). S-commerce: perception analysis
using PLS-SEM. International Journal of Business and
Globalisation, 26(4), 345-359.
[11] Van den Hooff, B., Elving, W., Meeuwsen, J. M., & Dumoulin, C.
(2003). Knowledge sharing in knowledge communities. In
Communities and technologies (pp. 119-141). Springer,
Dordrecht.
[12] Hair, J. F., Risher, J. J., Sarstedt, M., & Ringle, C. M. (2019).
When to use and how to report the results of PLS-SEM. European
Business Review, 31(1), 2-24.
[13] Jöreskog, K. G. (1971). Statistical analysis of sets of congeneric
tests. Psychometrika, 36(2), 109-133.
[14] Hair Jr, J. F., Matthews, L. M., Matthews, R. L., & Sarstedt, M.
(2017). PLS-SEM or CB-SEM: updated guidelines on which
method to use. International Journal of Multivariate Data
Analysis, 1(2), 107-123.
[15] Kock, N. (2015). Common method bias in PLS-SEM: A full
collinearity assessment approach. International Journal of e-
Collaboration (ijec), 11(4), 1-10.
[16] Cohen, J. (1992). Statistical power analysis. Current directions in
psychological science, 1(3), 98-101.
[17] Mikulić, J., & Prebežac, D. (2011). What drives passenger loyalty
to traditional and low-cost airlines? A formative partial least
squares approach. Journal of Air Transport Management, 17(4),
237-240.
[18] Dijkstra, T. K., & Henseler, J. (2015). Consistent partial least
squares path modeling. MIS quarterly, 39(2), 297-316.
[19] Hulland, J. (1999). Use of partial least squares (PLS) in strategic
management research: A review of four recent studies. Strategic
management journal, 20(2), 195-204.
[20] Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural
equation models. Journal of the academy of marketing science,
16(1), 74-94.
[21] Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion
for assessing discriminant validity in variance-based structural
equation modeling. Journal of the academy of marketing science,
43(1), 115-135.
[22] Chuang, T., Nakatani, K. and Zhou, D. (2009), "An exploratory
study of the extent of information technology adoption in SMEs:
an application of upper echelon theory", Journal of Enterprise
Information Management, Vol. 22 No. 1/2, pp. 183-196.
ResearchGate has not been able to resolve any citations for this publication.
Article
Full-text available
Purpose This paper provides a comprehensive, yet concise, overview of the considerations and metrics required for PLS-SEM analysis and result reporting. Preliminary considerations are summarized first, including reasons for choosing PLS-SEM, recommended sample size in selected contexts, distributional assumptions, use of secondary data, statistical power, and the need for goodness-of-fit testing. Next, the metrics, as well as the rules of thumb, that should be applied to assess the PLS-SEM results are covered. Besides covering established PLS-SEM evaluation criteria, the overview includes new guidelines for applying (1) PLSpredict, a novel approach for assessing a model’s out-of-sample prediction, (2) metrics for model comparisons, and (3) several complementary methods for checking the results’ robustness. Design/methodology/approach This paper provides an overview of previously and recently proposed metrics, as well as rules of thumb, for evaluating the results of research, based on the application of PLS-SEM. Findings Most of the previously applied metrics for evaluating PLS-SEM results are still relevant, but scholars need to be knowledgeable about recently proposed metrics (e.g., model comparison criteria) and methods (e.g., endogeneity assessment, latent class analyses, PLSpredict) and when and how to apply them. Research limitations/implications Methodological developments associated with PLS-SEM are rapidly emerging. The metrics reported in this paper are useful for current applications, but scholars need to continuously seek the latest developments in the PLS-SEM method. Originality/value In light of more recent research and methodological developments in the PLS-SEM domain, guidelines for the method’s use need to be continuously extended and updated. This paper is the most current and comprehensive summary of the PLS-SEM method and the metrics applied to assess its solutions.
This paper examines how data mining, an aspect of analytical science, can be applied to assist a Small to Medium Enterprise (SME) industry using unsupervised learning techniques, association rules and time-series analysis. Whilst recent developments have meant it is now possible for SMEs to compile large amounts of commercial data, this information is rarely utilised effectively. The study builds on a number of standard data mining techniques to produce a tailored set of analyses that provide maximum benefit to the company. Self-Organising Maps were utilised to visualise the core characteristics of the firm's customers. The study outlines a new technique to determine associations between customer variables using the arules package available within RStudio. Finally, time-series forecasting was conducted, highlighting the seasonal variations and trends for potential growth in the coming year.
There is a growing interest in data amongst small and medium enterprises (SMEs). This article looks at ways in which SMEs can combine their internal company data with open data, such as official statistics, and thereby enhance their business opportunities. Case studies are given as illustrations of the statistical and data-mining methods involved in such integrated data analytics. The article considers the barriers that prevent more SMEs from benefitting in this field and appraises some of the initiatives that are aimed at helping to overcome them. The discussion emphasizes the importance of bringing people together from the business, IT, and statistical worlds and suggests ways for statisticians to make a greater impact.
We discuss common method bias in the context of structural equation modeling employing the partial least squares method (PLS-SEM). Two datasets were created through a Monte Carlo simulation to illustrate the discussion: one contaminated by common method bias, and the other not contaminated. A practical approach is presented for the identification of common method bias based on variance inflation factors generated via a full collinearity test. Our discussion builds on an illustrative model in the field of e-collaboration, with outputs generated by the software WarpPLS. We demonstrate that the full collinearity test is successful in the identification of common method bias with a model that nevertheless passes standard convergent and discriminant validity assessment criteria based on a confirmatory factor analysis.
The need for small and medium enterprises (SMEs) to adopt data analytics has reached a critical point, given the surge of data implied by the advancement of technology. Despite data mining (DM) being widely used in the transportation sector, it is striking that few research case studies address the application of DM by SMEs, specifically in the transportation sector. From the extensive review conducted, the three most common DM models used by large enterprises in the transportation sector are identified, namely “Knowledge Discovery in Databases” (KDD), “Sample, Explore, Modify, Model and Assess” (SEMMA), and “CRoss Industry Standard Process for Data Mining” (CRISP‐DM). The same finding was revealed in the SMEs' context across the various industries. It was also uncovered that among the three models, CRISP‐DM had been widely applied commercially. However, despite CRISP‐DM being the de facto DM model in practice, a study carried out to assess the strengths and weaknesses of the models reveals that they have several limitations with respect to SMEs. This paper concludes that there is a critical need for a novel model to be developed in order to cater to SMEs' requirements, especially in the transportation sector context.
Advances in causal modeling techniques have made it possible for researchers to simultaneously examine theory and measures. However, researchers must use these new techniques appropriately. In addition to dealing with the methodological concerns associated with more traditional methods of analysis, researchers using causal modeling approaches must understand their underlying assumptions and limitations.