Contents lists available at ScienceDirect
Expert Systems With Applications
journal homepage: www.elsevier.com/locate/eswa
Human vs. Machines: Who wins in semiconductor market forecasting?
Louis Steinmeister a,b,∗, Markus Pauly a,c
a Department of Statistics, TU Dortmund University, Vogelpothsweg 87, 44227 Dortmund, Germany
b Graduate School of Logistics, Leonhard-Euler-Straße 5, 44227 Dortmund, Germany
c Research Center Trustworthy Data Science and Security, Joseph-von-Fraunhofer-Straße 25, 44227 Dortmund, Germany
ARTICLE INFO
Dataset link: wsts.org
Keywords:
Prediction
Sales forecast
Semiconductor cycle
Demand forecast
Machine learning
Statistical learning
ABSTRACT
‘‘If you ask ten experts, you will get ten different opinions.’’ This proverb illustrates the widespread
association of expert forecasts with personal bias and a lack of consistency. On the other hand, digitization
promises consistency and explainability through data-driven forecasts employing machine learning (ML) and
statistical models. Although the importance of the semiconductor industry is widely recognized, little
research has gone into forecasting the whole semiconductor market including all major product categories.
Instead, analysts have generally relied on expert forecasts such as those provided by the World Semiconductor
Trade Statistics (WSTS). In the following, we generate data-driven forecasts and evaluate whether existing
industry expert forecasts can be further enhanced through statistical and ML models. This study contributes by
systematically evaluating the accuracy of expert forecasts, examining comprehensive multi-granularity forecasts
for the entire semiconductor market, and offering performance insights through out-of-sample error measures
to guide future forecasting practitioners.
1. Executive summary
Objectives: The objective of this paper is to evaluate and con-
trast expert predictions of the World Semiconductor Trade Statistics
(WSTS), a leading semiconductor market data provider, with data-
driven forecasts. In this context, we compare expert forecasts with
different data-driven forecast approaches with respect to three research
hypotheses detailed in the introduction.
Motivation: WSTS plays a crucial role in the semiconductor in-
dustry. According to their website, WSTS is the ‘‘most respected source
of market data and forecasts for the semiconductor industry’’ and their
forecasts ‘‘are the only ones that leverage the collective experience of the
industry’s major players with the market intelligence of a large portion of the
semiconductor industry’’ (WSTS.org, 2024). As one of the top providers
of comprehensive semiconductor industry data and indicators, WSTS
plays a pivotal role in business decision making and industry analyst
research. Additionally, the well-being of the semiconductor industry,
which lies upstream in the supply chain, has been identified as
a leading indicator for the broader economy (Chow & Choy, 2006).
This highlights the importance of accurate and reliable semiconductor
industry forecasts even beyond this specific industry.
Methods: Several popular statistical and ML methods for time series
forecasting are evaluated against official forecasts provided by WSTS by
means of a time series cross validation.
∗Correspondence to: Graduate School of Logistics, TU Dortmund University, Leonhard-Euler-Straße 5, 44227 Dortmund, Germany.
E-mail addresses: louis.steinmeister@tu-dortmund.de (L. Steinmeister), pauly@statistik.tu-dortmund.de (M. Pauly).
Results: This paper finds that the expert forecasts provided by
WSTS compare favorably to ML forecasts on a quarterly horizon but
can nevertheless be enhanced by data-driven forecasts. However, the
performance of the WSTS forecasts is put into perspective once the WSTS
algorithmic updates, which are published in the quarters between expert
forecasts, are included. Furthermore, it can be argued that additional
information should be incorporated into the forecasts, which results in
a clear outperformance of the data-driven methods relative to the official
WSTS forecasts. This finding remains consistent regardless of the length
of the available history: the outcome is unaffected whether we analyze
product categories with short histories or those with long ones.
Contribution: This study contributes in the following ways: (1) It
provides a novel evaluation of the accuracy of expert forecasts within
the semiconductor market, (2) it shows that comprehensive forecasting
across various levels of granularity for the entire semiconductor indus-
try is feasible, even with simple models, and (3) it provides valuable
guidance to forecasting practitioners by supplying out-of-sample error
measures for all analyzed models and product categories.
Conclusion: While WSTS forecasts provide a strong starting point,
it is possible to improve the forecast accuracy through data-driven
approaches.
https://doi.org/10.1016/j.eswa.2024.125719
Received 2 August 2024; Received in revised form 4 October 2024; Accepted 4 November 2024
Expert Systems With Applications 263 (2025) 125719
Available online 13 November 2024
0957-4174/© 2024 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
2. Introduction
2.1. Motivation
In today’s dynamic business landscape, accurate forecasts play
an increasingly important role in shaping operative and strategic deci-
sions. The article ‘‘Bringing a real-world edge to forecasting’’ released
by McKinsey & Company in 2020 makes the case that ‘‘[a ‘good’
forecasting process] should be accurate enough to inform a range of critical
business decisions – capital reallocation, hiring, strategy, production, and
more’’ (Agrawal et al.,2020).
For example, Wang et al. (2024) propose a freight rate index forecasting
model to help industry players mitigate risks in the shipping market
and inform investment opportunities, and Wu et al. (2024) developed
a model to forecast container throughput in order to inform decision
makers in the port logistics industry. Additionally, Yu et al. (2012)
propose a forecasting model to predict fashion color trends, leading to
a higher success rate of new fashion products, and Pauly and Kuhlmann
(2023) provide an example of how resource allocation can be improved
through increased operational efficiency enabled by more accurate
short- to mid-term demand forecasts. Furthermore, forecasts play an
important role in anticipating technological change (Foster, 1986; Modis,
1999) and in estimating technology and product life cycles, which inform
important portfolio and product development decisions (Modis, 1994;
Petropoulos et al., 2022; Steinmeister et al., 2023). This is particularly
true for the semiconductor industry, with its long lead times, dynamic
technological environment, and shortening product life cycles (Lv et al.,
2018; Macher, 2006; Wu & Chien, 2008).
Accurate forecasts of the semiconductor market are relevant to the
broader economy. The global semiconductor market reached sales of
$618 billion in 2022 according to Alsop (2024b). This amounted to
about 0.61% of global GDP in 2022 compared to 0.22% in 1990,
highlighting the increasing importance of the industry to the global
economy (Alsop, 2024a; World Bank, 2024). Semiconductors are ubiquitous
in modern life: they enable AI applications, modern defense
equipment, data centers, automobiles, and wireless communications, all the
way to home appliances such as washing machines, gaming consoles,
and electronic toothbrushes. As Chow and Choy (2006) observe, the
semiconductor market is a leading indicator for the broader economy.
The strategic importance of the semiconductor industry has further
been recognized by governments. The US authorized about $280 billion
for research and manufacturing of semiconductors in the US with the
CHIPS and Science Act passed in August 2022 (Taylor,2023). Likewise,
the EU subsidizes the industry with roughly 43 billion Euros (roughly
$46.3 billion) (European Commission, 2023). Even more impressively,
Sam Altman, the current CEO of OpenAI, is seeking $5 trillion to
$7 trillion of investments to ‘‘boost the world’s chip-building capacity
and expand its ability to power AI, among other things’’ according to
Reuters (Rajan,2024). To add perspective: this amounts to the com-
bined market capitalization of Microsoft and Apple, the two largest
American companies by market capitalization, at the time of writing.1
This motivates a detailed study of semiconductor market fore-
casts, which often inform industry and financial analysts. Furthermore,
semiconductor market forecasts often inform internal goal setting and
benchmarking. Semiconductor market data also factors into the calcula-
tion of market share by product groups. Projections of these quantities
can have strategic implications.
The World Semiconductor Trade Statistics (WSTS)^2 is a premier
provider of such data and forecasts. Several academic studies and
industry reports cite WSTS as an authority for semiconductor market
forecasts, see Corder et al. (2024), Nagao (2019) and Simons (2024) to
1 Based on: https://www.tradingview.com/markets/world-stocks/worlds-largest-companies/. Accessed 03 April 2024.
2 wsts.org.
name a few. Furthermore, the standing of WSTS as a data provider is
highlighted by the common use of their data in academic fields such
as semiconductor cycle prediction, as outlined in Table 2, which
provides a selection of related works in the field.
Industry forecasts, such as those from WSTS, serve as the basis
for important strategic decisions and are often based on expert judge-
ment. A popular method for consolidating these forecasts is the Delphi
method (Armstrong,2008;Delphi,1975;Hyndman & Athanasopou-
los,2018). However, several researchers have discovered that these
collective predictions, also referred to as ‘‘wisdom of the crowds’’
or ‘‘collective intelligence’’, are susceptible to inaccuracies and low
precision, particularly when the individuals surveyed are pundits or
uninformed laypersons (Atanasov et al.,2015;Modis,1999).
However, the industry members polled by WSTS are industry ex-
perts and have access to detailed insider information, such as customer
orders and sentiment, customer-related project status, the status of
customer contracts, and new product development. Expert forecasts
are generally thought to perform well when systems are complex,
dynamic, and available history is sparse (Hyndman & Athanasopoulos,
2018). The semiconductor industry is heavily intertwined with the global
economy, with complex upstream and downstream supply chains. This
means that the semiconductor industry is exposed to the bullwhip
effect (Lee et al., 1997). It is also increasingly subject to geopolitical
considerations. While these factors highlight the importance of accurate
semiconductor industry forecasts, the dynamic technological environ-
ment, geopolitical factors, and the complex supply chains concurrently
complicate the data-driven forecast process due to the amount of poten-
tially relevant extrinsic information. Therefore, expert forecasts from
industry insiders queried by WSTS, who possess extensive access to
quantitative and qualitative insider information, are anticipated to be
highly reliable. Moreover, the lack of research into the area of detailed
semiconductor market forecasts, illustrated in the related works section
(Section 2.2), raises the question whether industry experts may simply
be more accurate than data-driven models in this field. This prompts
inquiry into the potential competitiveness of data-driven forecasts, and
if and how they might enhance existing expert forecasts. To our
knowledge, the accuracy of semiconductor industry forecasts has not been
systematically evaluated despite their regular use. The present paper
closes this gap. To this end, the following three research hypotheses
are examined:
(H1) Expert forecasts exhibit higher accuracy compared to autoregres-
sive data-driven forecasts.
WSTS publishes quarterly forecasts for each product category in an
alternating pattern: expert forecasts are issued in May and November,
while algorithmically computed updates are provided in February and
August. These algorithmic updates are derived from the preceding
quarter’s results. Industry experts also only have access to official
WSTS figures dating back to the previous quarter. Nevertheless, the
numbers for the initial month of the forecasted quarter are disclosed
simultaneously with the forecasts. Furthermore, it can be anticipated
that industry experts possess internal information regarding the first
month’s data (for a detailed account of the data, see Section 4.1). This
gives rise to the second hypothesis:
(H2) The incorporation of additional autoregressive information in
data-driven forecasts enhances their competitiveness against ex-
pert forecasts.
Fig. 1 within Section 4.1 illustrates that certain product categories
exhibit significantly shorter historical data compared to others. This ob-
servation holds significance as it aligns with the understanding that ex-
perts possess the capability to generate accurate forecasts when histor-
ical data is limited (Hyndman & Athanasopoulos,2018). Consequently,
this observation motivates the formulation of the third hypothesis:
Table 1
Selection of related works in semiconductor sales and demand forecasting including response variables and utilized forecasting methods.
Authors | Methods | Response variable
Wang and Chen (2019) | ARIMA, VAR^a | Quarterly sales time series from Taiwanese semiconductor companies from Q1 2009 to Q4 2018
Kapur et al. (2019) | Technology diffusion | Sales and price data for DRAM, LCD monitors, and room air-conditioners
Chen and Chien (2018) | Technology diffusion | 27 quarters of shipments for two technology generations of non-volatile memory products from a semiconductor company
Xu and Sharma (2017) | XGBoost, Linear model, RF^b, ARIMA, Ensemble | Weekly Intel CPU sales in 2012
Chien and Lin (2012) | Rolling grey forecasting method | Annual sales of companies in Hsinchu Science Park from 1983 to 2010
Aytac and Wu (2013) | Extended, Bayesian logistic growth model | Monthly sales data of about 5300 short life-cycle products from three semiconductor companies
Chien et al. (2013) | Technology diffusion | 36 quarters of demand data for four technologies of a leading foundry from Hsinchu Science Park
^a Vector autoregression.
^b Random Forest.
(H3) Expert forecasts outperform data-driven forecasts particularly in
the context of short time series.
Structure: The following two subsections summarize the related
work (Section 2.2) and highlight the research gap and our contributions
(Section 2.3). To investigate the hypotheses, the ML and statistical
methods used are summarized in Section 3. The results are discussed
in Section 5, with Section 5.1 addressing (H1), Section 5.2 examining
(H2), Section 5.3 covering (H3), and finally Section 5.4, which offers
insights into the results at a product category level. Section 6 completes
this work with a brief discussion of the findings.
2.2. Related works
Despite the recognition of the semiconductor industry’s importance
in politics and business, little research has been dedicated to data-driven
forecasting of the semiconductor market. A Scopus search (‘‘market
forecast’’ AND ‘‘semiconductor’’ AND (statistic* OR ‘‘machine learning’’))
yields only one search result which discusses front-end drivers of
changes in the semiconductor market (Nagao,2019). However, this
paper does not apply statistical or machine learning methods to gen-
erate forecasts directly. The small number of academic studies in the
sector of the semiconductor industry was also noted by Aubry and
Renou-Maissant (2014).
Likewise, a Scopus search for the ‘‘world semiconductor trade statis-
tics’’ yields 11 results which largely cite WSTS as an authority for
market data or forecasts. However, none of these use WSTS data as
a basis for industry wide forecasts, nor do they assess the accuracy of
WSTS’ forecasts.
More related results can be found in the field of company-specific
sales and demand forecasting and operational planning. Several
contributions in this domain are summarized in Table 1. However, it should
be noted that, while these studies often analyze different product or
technology groups, the analyses are specific to one or a few companies
and usually do not encompass the whole semiconductor market with its
different levels of granularity.
Another related field is the prediction of the semiconductor cycle;
for a short summary, see Table 2. The semiconductor cycle, similarly
to the economic cycle, describes cyclical fluctuations in the semiconductor
industry. These cycles are characterized by growth and contraction
phases. Contributions in this domain usually focus on the overall
semiconductor market, particularly WSTS global semiconductor sales, as
the target variable. However, these works usually consider total
semiconductor sales only, while a breakdown into finer product
categories is often of interest for analysts and internal benchmarking.
2.3. Contributions
(1) Evaluation of the Accuracy of Expert Semiconductor Market
Forecasts: Despite the frequent reliance on expert forecasts for semi-
conductor market forecasts, the accuracy of these forecasts has not, to
our knowledge, been openly or systematically evaluated. This study
addresses this gap by providing an evaluation of short-term forecasts
through a comparative analysis with forecasts derived from several
statistical and machine learning models.
(2) Comprehensive, Multi-Granularity Forecasting of the Semi-
conductor Market: While recent studies have focused on granular
demand and sales forecasts for specific products or companies, they
do not comprehensively cover the semiconductor market across all
segments and as a whole, see Table 1 in Section 2.2 for an overview.
Additionally, research into the semiconductor cycle often concentrates
on high-level market trends and the identification of leading indi-
cators, see Table 2 in Section 2.2. This study, however, provides a
comprehensive analysis of the semiconductor market across various
levels of granularity, utilizing the WSTS product categorization hierar-
chy. Forecasts are systematically generated for 110 product categories,
covering the entire semiconductor market. This approach captures
higher-granularity product groups as well as broader market trends,
offering a comprehensive perspective that takes a step towards unifying
these two areas of research.
(3) Guidance for Forecasting Practitioners through Model Perfor-
mance Insights: In addition to forecasting across various levels of the
semiconductor market, this study offers valuable guidance for forecast-
ing practitioners. Out-of-sample error measures for each statistical and
machine learning model are presented for all 110 product categories.
These performance insights allow practitioners to assess the accuracy
and applicability of different models in forecasting specific segments
of the semiconductor market. To the best of our knowledge, providing
such detailed model performance data for a wide range of product
categories is a novel contribution.
3. Used data-driven methods
This section gives a brief introduction to the data-driven models
used in the subsequent analysis. It starts with a description of
traditional models based on statistical time series analysis (Section 3.1),
continues with ML methods (Section 3.2), and concludes with a brief
note on ensemble methods (Section 3.3).
The selection of the models presented here is partially influenced by
their performance in the Makridakis Competitions (M-Competitions),
which are renowned forecasting competitions conducted on diverse
and realistic datasets (Hyndman,2020). The results based on the M3
Table 2
Selection of related works in semiconductor cycle prediction including goals, response variables as well as utilized forecasting methods and indicators.
Authors | Goal | Methods | Response variable | Indicators^g
Aubry and Renou-Maissant (2014) | Identification of best model for the semiconductor cycle prediction | ARMA, VAR^a, BVAR^b, VECM^c, MRSM^d, SF^e, ES^f | WSTS global semiconductor sales (Jan 1991–Jun 2010) | SOX, NI, TI, BOOK
Aubry and Renou-Maissant (2013) | Prediction and description of the global semiconductor industry cycle | VECM^c | WSTS global semiconductor sales (Jan 1991–Jun 2010) | SOX, NO, TI, IP, BOOK
Chow and Choy (2006) | Identification of leading indicators of semiconductor sales to predict the global semiconductor cycle | VAR^a, BVAR^b, BECM^c | World semiconductor sales (Feb 1992–Jan 2005) | NASDAQ, NO, SI, PPI
Liu and Chyi (2006) | Prediction of the semiconductor cycle turning points | MRSM^d | WSTS global semiconductor sales growth (Jan 1990–Aug 2003) | –
Liu (2005) | Identification of explanatory factors for the semiconductor cycle | VAR^a | WSTS global semiconductor sales growth (Jan 1990–Dec 2001) | IP, FF, CS, SOX, NO, TI, UTL, EQO, CAP, PPI, SIP, VS
^a Vector autoregression.
^b Bayesian vector autoregression.
^c Vector error correction models.
^d Markov regime switching model.
^e Spectral forecasting.
^f Exponential smoothing.
^g SOX: Philadelphia Semiconductor Index, IP: U.S. Industrial Production, FF: Federal Funds Rate, CS: U.S. Consumer Sentiment, NO: New Orders, TI: Total Inventories, UTL: Capacity Utilization, EQO: North American Equipment Orders, CAP: Capacity, PPI: Producer Price Index, SIP: Industry Production Index, VS: Value of Shipments, BOOK: Global Bookings of N.A. Semiconductor Equipment Producers, NASDAQ: NASDAQ Stock Index, SI: U.S. Shipments to Inventories Ratio.
competition are of particular interest: the dataset comprised 3003
individual time series with 14 to 126 observations featuring various
levels of seasonality (Hyndman,2020;Makridakis & Hibon,2000).
Furthermore, this dataset has served as a benchmark for evaluating
popular data-driven forecasting methods. Both Ahmed et al. (2010)
and Makridakis et al. (2018) utilized a subset of the M3 dataset (con-
sisting of 1045 time series) with a minimum length of 81 observations
for their analyses.
3.1. Statistical models
Comparisons based on the M3-Competition data have generally
favored statistical models over ML approaches (Hyndman,2020;Makri-
dakis et al.,2018). One suggested reason for this trend is the rel-
atively short length of the time series involved. Unfortunately, this
limitation is common in forecasting applications and reflects a re-
alistic constraint. The time series examined in this paper, ranging
from 92 to 392 monthly observations (further details in Section 4.1),
are longer compared to those studied in the papers based on the
M3-Competition, where lengths typically spanned from 81 to 126 ob-
servations. Nevertheless, these lengths are still comparable, especially
when contrasted with datasets such as the M5 competition, which
feature significantly longer time series, reaching up to approximately
2000 observations (Makridakis et al.,2022b).
3.1.1. SARIMA
The seasonal autoregressive integrated moving average (ARIMA)
model is a traditional statistical time series model. Its advantages are
its interpretability, its wide spread use and that many of its mathemat-
ical properties are well known (Brockwell & Davis,2002). According
to Brockwell and Davis (2002), a time series $X = \{X_t\}$ is said to be a
SARIMA$(p, d, q) \times (P, D, Q)$ process with period $s$ if
$$Y_t := (1 - B)^d (1 - B^s)^D X_t$$
is a causal ARMA process defined as
$$\phi(B)\,\Phi(B^s)\,Y_t = \theta(B)\,\Theta(B^s)\,Z_t,$$
where $B$ is the back-shift operator defined by $B Y_t = Y_{t-1}$,
$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$,
$\Phi(z) = 1 - \Phi_1 z - \cdots - \Phi_P z^P$,
$\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q$,
$\Theta(z) = 1 + \Theta_1 z + \cdots + \Theta_Q z^Q$,
and $Z = \{Z_t\}$ is a white noise process.
Generally, an ARMA$(p, q)$ process $Y = \{Y_t\}$ is characterized by
$$Y_t - \sum_{i=1}^{p} \phi_i Y_{t-i} = Z_t + \sum_{i=1}^{q} \theta_i Z_{t-i},$$
with $Z = \{Z_t\}$ being a white noise process. The left-hand side of this
equation is the autoregressive part, while the right-hand side is the
moving average part (a moving average of the error process $Z$). More
details on the SARIMA model can be found in Brockwell and Davis
(2002).
These models are often used to describe a wide range of processes and
to generate synthetic data from them, but they can also be used as
predictive models once their parameters are estimated. To this end, we use the
auto.arima function of the forecast library in R (Hyndman &
Khandakar,2008). A similar implementation for Python is available
through the StatsForecast library (Garza et al.,2022).
The inclusion of the SARIMA model in this work is motivated by
its ubiquity in time series analysis and its strong performance on the
M3-Competition dataset (Makridakis et al.,2018).
3.1.2. Simple exponential smoothing
Exponential smoothing models range back to the 1950’s (Gardner,
1985). Despite their simplicity, they often achieve high predictive
performance (Hyndman,2001;Satchell & Timmermann,1995). Simple
exponential smoothing (SES) only requires two quantities: the initial
forecast $\hat{X}_0$ and the smoothing constant $\alpha$. Consecutive forecasts can
then be calculated via
$$\hat{X}_t = (1 - \alpha)\,\hat{X}_{t-1} + \alpha\,X_{t-1},$$
where $\hat{X}_t$ denotes the one-step forecast for $X_t$ based upon the history
up to $X_{t-1}$. An R implementation is available with the ses function
of the forecast library (Hyndman & Khandakar, 2008). A similar
implementation for Python is available through the StatsForecast
library (Garza et al.,2022).
SES’ simplicity, ease of implementation, and computational ef-
ficiency make it a popular forecasting tool for practitioners. Addi-
tionally, the model performed well on the M3 and M5 Competition
datasets (Makridakis et al.,2018,2022a).
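The SES recursion above can be sketched in a few lines of pure Python (a didactic illustration, independent of the R and Python packages mentioned):

```python
def ses_forecasts(x, alpha, x0_hat):
    """Simple exponential smoothing:
    hat_x[t] = (1 - alpha) * hat_x[t-1] + alpha * x[t-1].

    Returns forecasts[0..len(x)], where forecasts[t] is the
    one-step-ahead forecast for x[t].
    """
    forecasts = [x0_hat]
    for t in range(1, len(x) + 1):
        forecasts.append((1 - alpha) * forecasts[t - 1] + alpha * x[t - 1])
    return forecasts
```

With $\alpha = 1$ the recursion reduces to the naive forecast (last observation carried forward), while small $\alpha$ values smooth heavily and react slowly to new observations.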
Expert Systems With Applications 263 (2025) 125719
4
L. Steinmeister and M. Pauly
3.1.3. Error, trend, and seasonality
Error, Trend, and Seasonality (ETS) approaches are a flexible class
of exponential smoothing models that go beyond SES (see above). As
their name suggests, they are capable of modeling time series with
trends and seasonality (Hyndman & Athanasopoulos, 2018). ETS was
the best performing model in the comparison of Makridakis et al. (2018)
based on the M3-Competition data. It is also implemented as part of
the forecast library in R (Hyndman & Khandakar, 2008). As for the
previous two models, a similar implementation for Python is available
through the StatsForecast library (Garza et al.,2022).
3.2. ML models
This subsection introduces the ML models used. Makridakis et al.
(2018) found that ML methods performed worse than classical statistical
models for relatively short time series, a finding that was
confirmed by Cerqueira et al. (2022). This is particularly the case
for artificial neural networks and deep learning models, which are
well known to require large sample sizes to produce the desired re-
sults (Goodfellow et al.,2016). This was also verified by the NN3-
Competition, which extended the M3-Competition to include neural
network approaches (Crone et al.,2011;Hyndman,2020). Therefore,
following (Cerqueira et al.,2022), this paper does not discuss neural
network models despite their considerable popularity in recent years.
Likewise, boosting models are not included.
3.2.1. Random forest
Random forests (RF) are a bagging algorithm, a specific kind of
ensemble learning, which combines the outputs of multiple decision
trees (Breiman, 2001). Ahmed et al. and Makridakis et al. included
Classification and Regression Trees (CART), which generate single
decision trees for regression or classification purposes (Breiman, 1984).
However, since its introduction, RF has proven to be an incredibly ver-
satile and successful model for both regression and classification (Biau
& Scornet,2016;Grinsztajn et al.,2022;Huang et al.,2020). We
use the ranger implementation of this model as provided by Wright
and Ziegler (2017). An alternative Python implementation is available
through the skranger library. During each CV-step (see below), a grid
search was conducted to tune the three hyper-parameters (Probst et al.,
2019):
mtry ∈ {2, 7, 12, 16, 23}
min.node.size ∈ {5, 7, 10}
splitrule ∈ {variance, extratrees}.
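The grid search over these hyper-parameters can be sketched generically. In the sketch below, the scoring function is a stand-in for the cross-validated forecast error of a fitted random forest (fitting the actual ranger model is outside the scope of this illustration):

```python
import itertools

# Hyper-parameter grid as stated in the text (ranger's mtry,
# min.node.size, and splitrule tuning parameters).
GRID = {
    "mtry": [2, 7, 12, 16, 23],
    "min_node_size": [5, 7, 10],
    "splitrule": ["variance", "extratrees"],
}

def grid_search(score_fn, grid):
    """Exhaustively evaluate every parameter combination and return
    the one minimizing score_fn (e.g. a time series CV error)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

For the grid above this evaluates 5 × 3 × 2 = 30 candidate configurations in each CV step.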
3.2.2. Extremely randomized trees
Extremely Randomized Trees (ExtraTrees, also referred to as ET
for brevity) is a model similar to Random Forest (see above). The
difference lies in randomizing the splitting point and the feature to split
on during training. It is computationally more efficient and promises
greater accuracy on a range of problems (Geurts et al.,2006). The
ranger library is also used for this model (Wright & Ziegler,2017).
However, in contrast to the RF model, the parameter splitrule
remained fixed as extratrees. A Python implementation is available
through sklearn (Pedregosa et al.,2011). During each CV-step, a
grid search was conducted to tune the hyper-parameters (Probst et al.,
2019):
mtry ∈ {2, 7, 12, 16, 23}
min.node.size ∈ {5, 7, 10}.
3.2.3. Gaussian processes regression
Gaussian Processes Regression (GPR) is a probabilistic regression
model incorporating Bayesian ideas: a prior distribution over possible
regression functions is narrowed down as evidence (observed data
points) is incorporated, yielding a posterior distribution (Wang, 2023).
It has been shown that GPR can be viewed as a limit of many artificial
neural network designs, and ARMA processes can be viewed as Gaussian
processes under the right conditions (Williams & Rasmussen, 1995).
Furthermore, due to their probabilistic nature, GPR easily provides
uncertainty quantification for the forecasts in terms of prediction in-
tervals. GPR showed a promising performance on the M3-Competition
dataset (Ahmed et al.,2010;Makridakis et al.,2018). This analysis uses
the GPR implementation of the kernlab R library (Karatzoglou et al.,
2004). The model is also implemented for Python in the sklearn
library (Pedregosa et al.,2011).
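The core computation behind a GPR point forecast, the posterior mean $k_*^\top (K + \sigma_n^2 I)^{-1} y$ with an RBF kernel, can be sketched in pure Python. This is a didactic illustration, not the kernlab or sklearn implementation; the length-scale parameterization of the kernel is an assumption for the example:

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gpr_mean(train_x, train_y, test_x, noise=1e-6, length=1.0):
    """Posterior mean k_*^T (K + noise * I)^{-1} y of a zero-mean GP."""
    K = [[rbf(xi, xj, length) + (noise if i == j else 0.0)
          for j, xj in enumerate(train_x)] for i, xi in enumerate(train_x)]
    alpha = solve(K, train_y)
    return [sum(rbf(xs, xi, length) * a for xi, a in zip(train_x, alpha))
            for xs in test_x]
```

With negligible observation noise, the posterior mean interpolates the training points, which makes the mechanism easy to verify on a toy example.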
3.2.4. K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a popular non-parametric classification
and regression model. It bases estimates on the K nearest
neighbors in the covariate space (Cover & Hart, 1967; Fix & Hodges,
1989). K represents a hyper-parameter to be tuned. For regression, the
mean of these K nearest neighbors is usually used as the predictor.
The model can be employed with a kernel; in the following,
the kernel is chosen automatically. While KNN has not been among
the best performing models on the M3-Competition data (Ahmed et al.,
2010; Makridakis et al., 2018), it is nevertheless popular as a simple
non-parametric model. Here, the implementation of the kknn library
is used (Schliep & Hechenbichler, 2016). KNN is implemented
for Python as the NearestNeighbors model in the sklearn
library (Pedregosa et al., 2011). During each CV-step, a grid search was
conducted to tune hyper-parameters among:
K ∈ {1, 2, 3, 4, 5, 7, 9}
distance ∈ {L1, L2, L3},
where the $L_1, L_2, L_3$ distances are given by
$$d_{L_p}(x, y) := \|x - y\|_p,$$
with $\|x\|_p := \left( \sum_i |x_i|^p \right)^{1/p}$ the $p$-norm.
3.2.5. Support vector regression
Support Vector Regression (SVR) was proposed as an extension to
the classical Support Vector Machine (SVM) for classification (Cortes
& Vapnik, 1995) to tackle regression problems (Drucker et al., 1996).
Instead of minimizing all residuals, as in ordinary least squares
regression, only the distances of observations lying outside a margin of
error (the $\epsilon$-insensitive tube) to this margin are minimized. Analogous
to SVMs, these observations are called support vectors, because the
regressor depends only on them. To model non-linear
dependencies, kernels can be used (Awad & Khanna, 2015). We use
the radial kernel to add another non-linear method. The used model is
implemented in the kernlab library for R or the sklearn library for
Python (Karatzoglou et al.,2004;Pedregosa et al.,2011). A grid search
was conducted to tune hyper-parameters among:
$$\sigma \in \left\{\tfrac{1}{16}, \tfrac{1}{8}, \tfrac{1}{4}, \tfrac{1}{2}, 1\right\}, \qquad C \in \left\{\tfrac{1}{4}, \tfrac{1}{2}, 1, 2, 4\right\}.$$
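A corresponding sketch for the SVR grid search, again with sklearn rather than kernlab and on synthetic stand-in data. Note that kernlab parameterizes the radial kernel as exp(−σ‖x−x′‖²); sklearn's `gamma` plays the same role, so mapping the σ grid onto `gamma` is an assumption of this sketch.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in for a monthly sales series with three lag features.
rng = np.random.default_rng(1)
series = rng.normal(size=120).cumsum() + 100.0
X = np.column_stack([series[i : i + 100] for i in range(3)])
y = series[3:103]

# kernlab's sigma grid mapped onto sklearn's gamma; C grid as in the paper.
grid = {"gamma": [1 / 16, 1 / 8, 1 / 4, 1 / 2, 1], "C": [1 / 4, 1 / 2, 1, 2, 4]}
search = GridSearchCV(
    SVR(kernel="rbf"),
    grid,
    cv=TimeSeriesSplit(n_splits=5),  # rolling splits, no look-ahead
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```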
3.3. Ensemble
Ensemble models consist of several individual models which are
combined to produce a single output (Opitz & Maclin,1999). In addi-
tion to the tree-based ensembles Random Forests and ExtraTrees, this
paper also analyzes a simple ensemble of all the employed data-driven
models by taking the median prediction.
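A median ensemble of this kind is straightforward to sketch; the model names and toy forecast values below are illustrative only.

```python
import numpy as np

def median_ensemble(forecasts):
    """Combine individual model forecasts by taking the per-step median.

    `forecasts` maps a model name to an array of h-step-ahead predictions.
    """
    stacked = np.vstack(list(forecasts.values()))  # shape: (n_models, h)
    return np.median(stacked, axis=0)

# Toy example with three models forecasting h = 3 months ahead.
preds = {
    "sarima": np.array([10.0, 11.0, 12.0]),
    "ets": np.array([9.0, 12.0, 14.0]),
    "gpr": np.array([11.0, 10.0, 13.0]),
}
print(median_ensemble(preds))  # → [10. 11. 13.]
```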
Expert Systems With Applications 263 (2025) 125719
Fig. 1. Histogram of time series lengths.
4. Experimental setup
This section provides an overview of the data used in the analysis (Section 4.1) and of the methodology behind the time series training and model evaluation (Section 4.2).
4.1. Data
This paper analyzes time series of aggregated sales for 110 WSTS product categorizations, which were reported monthly and measured in USD. The data are accessible through a WSTS membership3 or a subscription.4 A major challenge was the consistency of the data:
Given the dynamic nature of the semiconductor market, product
categorizations changed over time. The historical consistency of the
current product categorizations was investigated and resolved dating
back to Jan 2010 by merging the C7a and C7b product categorizations
to C7 (Field Effect General Purpose Power Transistors), the P51 and
P52 categorizations into P5 (Automotive and General Purpose MCU),
and the L7a, L7b, L7c, L7d and L7f categorizations into L7a/b/c/d/f
(Wireless Communication Total). The newest category (F10) dates back
to Jan 2016. Fig. 1 provides an overview of the months of history of the categorizations as used in the analysis. These are the categories that are consistent through August 2023, and together they provide a comprehensive overview of the semiconductor market. We refer to WSTS.org (2024)
for an exact description of the market and each category.
Generally, the categories positioned higher in the hierarchy exhibit
greater consistency. Fig. 2 illustrates the hierarchical structure of the
product categorizations (starting with T99 as the highest aggregation).
To effectively incorporate seasonal components, model fitting necessitated a minimum of 24 months (two seasonal cycles' worth) of training data. The shortest time series comprised 92 monthly data points, thus
leaving 68 months (or about 17.4% of the complete time series from
1991) as a test set for the first step of the rolling time series cross-
validation (CV, see below). Hence, the first training set spanned all
data from January 1991 (or whenever available) to December 2017.
Consequently, the test set spanned the time frame from January 2018
to August 2023.
Official forecasts from WSTS were released quarterly from Q1 2016 to Q3 2023 (midway through the first forecasted quarter). Expert forecasts are consolidated during a global WSTS meeting twice a year – each May and November. WSTS additionally issues forecast updates semiannually, in February and August, derived from the preceding meeting's expert forecasts and updated algorithmically with new data reported for the prior quarter. For example, the forecast for Q2 2024 relies on the upcoming global WSTS meeting scheduled for May 20th–23rd, while the February 2024 forecast update drew upon forecasts from the November 2023 meeting and the reported data from Q4 2023.
3 https://www.wsts.org/61/membership.
4 https://www.wsts.org/61/subscription.
Thus, 11 expert forecasts and 12 updated forecasts were considered in
the analyzed time span from January 2018 to August 2023.
4.2. Methodology
Training and evaluating the ML models: As described in the previous subsection, each time series is split into initial training and test sections. Time series with a longer available history consequently provide more data points for training than shorter time series, i.e., newer product categories.
To obtain reliable forecast performance estimates for all of them, rolling time series cross-validation (CV) as in Hyndman and Athanasopoulos (2018) is performed on each time series and for each model. This is illustrated in Fig. 3, which also shows the most extreme training periods for the different time series (only 24 months for the first forecast of category F10, up to 390 months for the last forecast of T99).
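The rolling scheme can be sketched as an expanding-window splitter. This is an illustrative reconstruction, assuming the window advances by one three-month forecast period at a time; the paper's exact stepping may differ.

```python
import numpy as np

def rolling_cv_splits(n_obs, min_train=24, horizon=3):
    """Yield expanding-window (train, test) index pairs for rolling CV.

    Mirrors the setup described above: at least 24 months (two seasonal
    cycles) of training data, forecasting up to `horizon` months ahead.
    """
    t = min_train
    while t + horizon <= n_obs:
        train_idx = np.arange(0, t)          # all data up to month t
        test_idx = np.arange(t, t + horizon)  # the next `horizon` months
        yield train_idx, test_idx
        t += horizon  # advance by one forecast period

# Example: a short series of 36 monthly observations.
splits = list(rolling_cv_splits(36))
print(len(splits))        # → 4 CV iterations
print(splits[0][0].size)  # → 24 (first training window)
```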
Within each iteration, the training data is automatically transformed with the Box–Cox transformation (Box & Cox, 1964), given by
$$X_t^{(\lambda)} = \begin{cases} \left(X_t^{\lambda} - 1\right)/\lambda, & \lambda \neq 0, \\ \log(X_t), & \lambda = 0. \end{cases}$$
This transformation is incorporated and automatically estimated by the libraries used, ‘‘forecast’’ and ‘‘caretForecast’’ (Akay, 2022; Hyndman & Khandakar, 2008). Applying the Box–Cox transformation is standard practice, especially when residual distributions are skewed and when non-negative forecasts are desired (Hyndman & Athanasopoulos, 2018). The considered time series report aggregated sales in USD. Thus, there is unlimited upside potential, whereas the lower bound is always zero since all time series are positive. Note that 𝜆 can always be chosen close to one if the transformation is not particularly helpful, so automatically applying it to all cases does not hurt. This is also the default setting in the forecast library (Hyndman & Khandakar, 2008).
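For illustration, the automatic estimation of 𝜆 can be reproduced with SciPy (the R libraries cited above handle this internally). This sketch uses made-up positive, skewed "sales" data: stats.boxcox estimates 𝜆 by maximum likelihood, and inverting the transform recovers the original positive scale, which is how non-negative forecasts are guaranteed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sales = rng.lognormal(mean=3.0, sigma=0.5, size=120)  # positive, skewed data

# stats.boxcox estimates lambda by maximum likelihood and transforms in one call.
transformed, lam = stats.boxcox(sales)

# Inverting the transform recovers the original (positive) scale:
# x = (lam * y + 1)^(1/lam) for lam != 0, exp(y) for lam == 0.
recovered = (
    np.exp(transformed) if lam == 0 else np.power(lam * transformed + 1, 1 / lam)
)
print(np.allclose(recovered, sales))
```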
Additionally, the hyper-parameters of all data-driven models introduced in Section 3 are optimized using the default grid search setting of the ‘‘caret’’ and ‘‘caretForecast’’ libraries (Akay, 2022; Kuhn, 2008) if no hyper-parameter optimization is conducted through the learning algorithm itself (one example where hyper-parameters are tuned automatically is the SARIMA model via the ‘‘auto.arima’’ function). After each training iteration, a forecast is generated up to three months in advance. These forecasts can then be compared against the reported numbers, and against the WSTS forecasts, providing an estimate of the performance of the model.
Fig. 2. WSTS product categorization hierarchy. The highest aggregation level is T99, the node colored in black with white print, slightly to the top of the center of the illustration.
The arrows point to the subsumed product categories.
Fig. 3. Illustration of time series cross validation for two product categories with differing lengths: T99 has a much longer history of reported aggregate sales than F10.
Evaluation and comparison with WSTS’ forecasts: The forecasts provided by WSTS are evaluated by type (‘‘meeting’’ corresponding to the two WSTS expert forecasts per year, ‘‘alg. update’’ corresponding to the two WSTS algorithmic forecasts per year, and ‘‘overall’’ for all four WSTS forecasts per year) and compared to the corresponding data-driven forecasts. The first comparison evaluates all forecasts with a forecasting horizon of ℎ = 3 months. This corresponds to the information timestamp available to WSTS’ updating algorithm and the industry experts.
At the same time, it is safe to assume that the industry experts
have access to the sales data (among many more) of the first month
of each quarter when the meeting convenes (the meeting is held in
about the middle of the first quarter to be forecasted). Additionally, the
forecasts are disclosed at the same time as the first month’s results are
published. Hence, the information time stamp of the forecast and the
first month’s results is the same. Utilizing this information reduces the
required forecast horizon to ℎ= 2 months. This is analyzed in a second
step to investigate whether using more of the available information
might boost the forecasting accuracy.
Similar to Hyndman and Koehler (2006) and Pauly and Kuhlmann
(2023), the forecasting accuracy is evaluated using the mean squared
error (MSE), mean absolute percentage error (MAPE), and the mean
absolute error (MAE). These are given by
$$MSE = \frac{1}{T} \sum_{t=1}^{T} \left(X_t - \hat{X}_t\right)^2, \qquad MAPE = \frac{1}{T} \sum_{t=1}^{T} \left| \frac{X_t - \hat{X}_t}{X_t} \right|, \qquad MAE = \frac{1}{T} \sum_{t=1}^{T} \left| X_t - \hat{X}_t \right|,$$
where, again, $\hat{X}_t$ is the one-step forecast for the $t$th observation $X_t$ of the test set and $T = 68$ is the number of evaluated forecasts. We note that the MAPE is applicable as the values of all time series are positive.
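The three error measures are easy to compute directly; a minimal sketch on made-up numbers:

```python
import numpy as np

def forecast_errors(actual, forecast):
    """Compute MSE, MAPE, and MAE as defined above.

    MAPE assumes all actual values are positive, as is the case for
    the sales series considered here.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    return {
        "MSE": np.mean(err**2),
        "MAPE": np.mean(np.abs(err / actual)),
        "MAE": np.mean(np.abs(err)),
    }

scores = forecast_errors([100.0, 120.0, 80.0], [110.0, 115.0, 84.0])
print(scores)  # MSE = 47.0, MAE ≈ 6.33, MAPE ≈ 0.064
```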
Table 3 provides a brief overview of the key methods employed in this study, with references to motivating studies and deeper methodological discussions.
5. Results
As discussed in Section 4.2, first, the quarterly forecasts (ℎ= 3
months) are discussed in Section 5.1 to examine the first research
hypothesis (H1) that expert forecasts exhibit higher accuracy compared
to data-driven forecasts. This is followed by a comparison with the
model performance when an additional month of available information
(ℎ= 2 months) is incorporated in Section 5.2, addressing the second
research hypothesis (H2). Lastly, the results are contrasted by time series length (Section 5.3).
5.1. Quarterly forecast performance
WSTS’ forecasts are provided on a quarterly basis. Each quarter, the
previous quarter’s numbers are known when the forecasts are compiled.
Therefore, as a first step, the data-driven models are compared against
WSTS’ forecasts on a forecasting horizon of ℎ= 3 months, i.e. one
quarter.
Table 4 presents the average performance of data-driven forecasts
across 110 product categories, relative to the average performance
of forecasts provided by the World Semiconductor Trade Statistics
(WSTS). Hence, the first data column (for WSTS) always reads 1.
Each row represents a different error measure, organized according to
Table 3
Overview of used methods: Purposes and motivations for chosen techniques, with references.

SARIMA, SES, ETS, RF, ET, GPR, KNN, SVR – Purpose: generating point forecasts. Motivation: the selection of these models was largely motivated by studies based on the M3-Competition dataset; these models achieved outstanding performance on time series similar to the ones in this study. References: Ahmed et al. (2010) and Makridakis et al. (2018).

Cross-Validation – Purpose: estimating out-of-sample performance. Motivation: the application of time series cross-validation is a standard procedure. Reference: Cerqueira et al. (2020).

MSE, MAPE, MAE – Purpose: measures for prediction accuracy. Motivation: these are standard measures commonly employed to assess the accuracy of (time series) models with continuous targets. Reference: Hyndman and Koehler (2006).
Table 4
Average performance of the data-driven forecasts across all 110 product categories and relative to the World Semiconductor Trade Statistics’
(WSTS). Each row refers to a different error measure, sorted by WSTS’ forecast type: algorithmic update, meeting (expert forecast), and overall.
Lower values are preferable. The best value per row is bold and italic.
WSTS SARIMA ETS ET GPR KNN RF SES SVM Ensemble
Alg. Update
MSE 1.00 0.34 0.55 1.55 0.37 4.47 1.49 0.94 3.20 0.64
MAPE 1.00 1.01 1.01 1.20 1.01 1.71 1.18 1.06 1.69 1.04
MAE 1.00 0.75 0.81 1.34 0.77 2.14 1.29 1.08 1.94 0.95
Meeting
MSE 1.00 1.61 1.57 2.78 1.33 7.23 2.49 1.79 6.91 1.77
MAPE 1.00 1.14 1.09 1.32 1.13 1.76 1.28 1.10 1.82 1.14
MAE 1.00 1.22 1.22 1.53 1.17 2.32 1.47 1.24 2.32 1.28
Overall
MSE 1.00 0.73 0.86 1.93 0.66 5.31 1.79 1.20 4.33 0.98
MAPE 1.00 1.08 1.05 1.26 1.07 1.73 1.23 1.08 1.75 1.09
MAE 1.00 0.97 1.00 1.42 0.95 2.22 1.37 1.15 2.11 1.10
WSTS’ forecast types: algorithmic update, meeting (expert forecast),
and overall. Lower values are preferable in all cases.
For WSTS’ ‘‘Algorithmic Update’’ forecasts, three error measures are
reported: Mean Squared Error (MSE), Mean Absolute Percentage Error
(MAPE), and Mean Absolute Error (MAE) – see Section 4.2. Among
these measures, the data-driven forecasts outperform WSTS in terms of
MSE and MAE, with the best-performing model indicated by bold and
italic formatting. The SARIMA model exhibits the lowest MSE (0.34
relative to WSTS) and MAE (0.75), suggesting superior performance
in this category. Almost as good was the GPR model (relative MSE of
0.37 and MAE of 0.77). In terms of MAPE, WSTS performed slightly better than several data-driven forecast methods: SARIMA, ETS, and
GPR each scored 1% worse. Overall, these results indicate potential for
improvement of WSTS’ algorithmic update protocol.
Similarly, for the ‘‘Meeting’’ forecasts, the same three error mea-
sures are provided. Consistent with the first research hypothesis (H1),
industry experts demonstrated superior performance across all three
error measures. However, among the data-driven models, the best
performers were GPR, exhibiting a 33% increase in MSE compared
to expert forecasts, while ETS and SES displayed 9% and 10% higher
MAPE values respectively. Additionally, GPR, again, was the best per-
forming data-driven model in terms of MAE, showing a 17% higher
MAE relative to the expert forecasts.
Table 4 concludes with the ‘‘Overall’’ forecasts, providing the av-
erage performance of all models relative to the average performance
of the combined algorithmic and expert forecasts by WSTS (from 2 meeting and 2 algorithmic forecasts per year). Once more, the GPR
model emerges as the top-performing data-driven model, showing a
34% improvement in MSE and a 5% enhancement in MAE compared
to WSTS. However, WSTS outperforms all models in terms of MAPE,
with ETS, the top-performing data-driven model, recording a 5% higher
error than WSTS.
Table 5 provides insights into the mean ranks of the various fore-
casting models across all 110 product categories, structured similarly
to Table 4. Contrary to Table 4, the columns in Table 5 are not
standardized by the WSTS column. A rank of 1 indicates the model
performed the best for that product category based on the respective
error measure, thus lower mean ranks are preferred.
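Mean ranks of this kind can be computed by ranking the models within each product category and averaging over categories; a toy sketch with hypothetical error values (not the paper's data):

```python
import numpy as np
from scipy.stats import rankdata

# Toy relative MSE values: rows = product categories, columns = models.
# These numbers are hypothetical, for illustration only.
models = ["WSTS", "SARIMA", "GPR"]
mse = np.array([
    [1.0, 0.8, 0.9],
    [1.0, 1.3, 1.1],
    [1.0, 0.6, 1.2],
])

# Rank models within each category (rank 1 = lowest error), then average.
ranks = rankdata(mse, axis=1)
mean_ranks = ranks.mean(axis=0)
print(dict(zip(models, mean_ranks)))
```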
In general, the observations from Table 5 are similar to those from Table 4. The forecasts provided by WSTS demonstrate excellence across
most error metrics and scenarios. However, there is an exception with
WSTS’ Algorithmic Update forecasts, where SARIMA and GPR achieved
slightly lower average ranks (3.8 vs. 3.9 for WSTS). Considering that
the average MSE for the forecasts based on the SARIMA model was 66%
lower than the MSE of WSTS’ algorithmic updates, it suggests SARIMA’s
strong performance is primarily driven by specific product categories
where it outperforms WSTS (Section 5.4 contains a detailed discussion
on individual product categories). Although several data-driven models
exhibited a superior average error in terms of MSE and MAE for the
overall forecasts (for example, SARIMA and GPR) this superiority does
not necessarily translate to lower average ranks, highlighting WSTS’
robust performance across product categories.
Comparing only the data-driven forecasts, GPR emerges as the most
accurate data-driven model in the Algorithmic Update and Overall
scenarios, while Simple Exponential Smoothing (SES) demonstrates the
best performance among data-driven models during quarters where
WSTS provided expert forecasts (Meeting), particularly in terms of MSE
and MAE. ETS achieved a slightly lower average rank in terms of Mean
Absolute Percentage Error (MAPE).
One plausible explanation for the strong performance of the industry experts is their access to insider information. In contrast, the
data-driven models rely solely on aggregated historical sales data from
the specific product category being forecasted. Another factor to con-
sider is the timing of the WSTS meetings where expert forecasts are
consolidated. These meetings typically occur in the middle of the
forecasted quarter, implying that experts are likely aware of their
first-month figures. Moreover, these figures are available simultane-
ously with the official forecast release. In contrast, the bi-quarterly
algorithmic updates do not integrate this information, though they
could potentially benefit from it. Consequently, data-driven models
that incorporate this timely information, necessitating forecasts with
a horizon of only ℎ= 2 months, are evaluated in the subsequent
subsection.
5.2. Forecast performance with additional information
Considering the timing of the WSTS meetings in the middle of
the quarter, it is reasonable to assume that industry experts take into
account the numbers and internal information pertaining to the first
month when formulating their forecasts for the first quarter. Moreover,
the consolidated results of the first month are released simultaneously
with the forecasts by WSTS. Therefore, utilizing all available infor-
mation for forecasts seems appropriate. This approach allows for an
Table 5
Mean ranks of the forecasts with horizon h = 3 months across all 110 product categories. Each row refers to a different error measure, sorted
by WSTS’ forecast type: algorithmic update, meeting (expert forecast), and overall. Lower values are preferable. The best value per row is bold
and italic.
WSTS ARIMA ETS ET GPR KNN RF SES SVM Ensemble
Alg. Update
MSE 3.9 3.8 4.2 6.6 3.8 8.7 6.2 4.4 8.8 4.5
MAPE 3.6 4.1 4.1 6.6 3.9 8.8 6.1 4.5 8.7 4.5
MAE 3.6 4.1 4.2 6.6 3.9 8.8 6.1 4.4 8.9 4.5
Meeting
MSE 2.9 4.5 3.9 7.0 4.1 8.9 6.4 3.8 8.9 4.6
MAPE 2.8 4.3 4.0 7.0 4.2 8.9 6.4 4.1 8.8 4.5
MAE 2.9 4.3 4.0 7.2 4.2 8.9 6.4 3.8 8.8 4.5
Overall
MSE 3.2 4.4 4.2 6.9 3.7 9.0 6.3 4.1 8.9 4.4
MAPE 2.8 4.1 3.9 7.0 3.9 9.1 6.5 4.2 8.9 4.5
MAE 2.9 4.2 4.0 7.0 4.0 9.1 6.4 4.1 8.8 4.5
Table 6
Average performance of the data-driven forecasts with horizon ℎ= 2 months across all 110 product categories and relative to WSTS. Each row
refers to a different error measure, sorted by WSTS’ forecast type: algorithmic update, meeting (expert forecast), and overall. Lower values are
preferable. The best value per row is bold and italic.
WSTS SARIMA ETS ET GPR KNN RF SES SVM Ensemble
Alg. Update
MSE 1.00 0.10 0.18 0.52 0.10 1.31 0.49 0.46 1.29 0.32
MAPE 1.00 0.62 0.62 0.76 0.61 1.09 0.75 0.68 1.08 0.65
MAE 1.00 0.42 0.51 0.81 0.43 1.31 0.79 0.77 1.24 0.63
Meeting
MSE 1.00 0.68 0.66 1.41 0.64 3.65 1.25 0.96 3.18 0.91
MAPE 1.00 0.61 0.62 0.82 0.65 1.15 0.79 0.65 1.15 0.68
MAE 1.00 0.72 0.73 1.05 0.75 1.54 1.00 0.88 1.46 0.85
Overall
MSE 1.00 0.28 0.32 0.80 0.27 2.03 0.72 0.61 1.86 0.50
MAPE 1.00 0.61 0.62 0.79 0.63 1.12 0.77 0.66 1.12 0.66
MAE 1.00 0.56 0.61 0.92 0.58 1.41 0.89 0.82 1.34 0.73
additional month of data to forecast the quarterly result, rendering a forecasting horizon of ℎ = 2 months sufficient. Considering that one out of three months' numbers does not require estimation, a plausible anticipation would be to observe approximately a 33% reduction in MSE (assuming an unbiased estimator and uncorrelated errors). Such a decrease could already position several data-driven methods as competitive alternatives to WSTS, which is examined in this section.
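The back-of-the-envelope reasoning behind the anticipated 33% reduction can be made explicit. Suppose the quarterly forecast error is the sum of monthly errors $e_1, e_2, e_3$ that are unbiased and uncorrelated with common variance $\sigma^2$ (both assumptions, as stated above). Then

$$\mathbb{E}\bigl[(e_1 + e_2 + e_3)^2\bigr] = 3\sigma^2 \quad (h = 3), \qquad \mathbb{E}\bigl[(e_2 + e_3)^2\bigr] = 2\sigma^2 \quad (h = 2),$$

so knowing the first month's figure removes one of three error terms, and the expected MSE drops by the factor $2/3$, i.e. by roughly 33%.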
The average performance of these forecasts relative to WSTS' is presented in Table 6, structured equivalently to Table 4: it shows the average performance of the data-driven forecasts with a horizon of ℎ = 2 months across the 110 product categories, relative to the forecasts provided by WSTS. Each row represents a different error measure, categorized by WSTS' forecast types: algorithmic update, meeting (expert forecast), and overall. As before, lower values indicate better performance, with the best value per row highlighted in bold and italic.
Contrasting these results with those presented in the previous table (Table 4), several notable differences emerge. Firstly, a cursory glance at the results makes it evident that the data-driven approaches exhibit markedly superior performance in this context.
Whereas WSTS’ expert forecasts (Meeting) previously outperformed the
data-driven forecasts in terms of MSE, MAPE, and MAE, the tables have
now turned, with the data-driven forecasts consistently showcasing superior forecast accuracy in the new scenario (with the exception of SVM and KNN for all comparisons, as well as ET and RF for the Meeting MSE and MAE
comparisons). Specifically, the GPR model achieved a 36% lower MSE
than WSTS’ expert forecasts, followed by ETS (34% lower) and SARIMA
(32% lower). In terms of MAPE, SARIMA outperformed WSTS’ experts
by 39%, followed by ETS (38% lower), and GPR and SES (both 35%
lower). Additionally, SARIMA demonstrated the best performance in
terms of MAE (28% lower than WSTS), trailed by ETS (27% lower) and
GPR (25% lower). Even the simple ensemble, which incorporates the forecasts of the worse-performing models, surpassed WSTS' experts
by 9% in MSE, 32% in MAPE, and 15% in MAE. This suggests that data-
driven models incorporating the latest available information are highly
effective in forecasting outcomes within a shorter horizon. Further-
more, WSTS’ expert forecasts attained superior average performance in
terms of MAPE across all three forecast types with a forecasting horizon
of ℎ= 3 months. However, with a reduced horizon of ℎ= 2 months, the
top-performing data-driven forecasts now outperform WSTS by up to
39%. SARIMA and ETS emerged as the top performers, closely followed by GPR.
Secondly, concerning the algorithmic updates, Table 6 shows that SARIMA and GPR once again emerge as the top-performing models. In MSE, both SARIMA and GPR achieved errors 90% lower than WSTS'. Additionally, in terms of MAE, SARIMA attained a 58% lower error, closely followed by GPR with a 57% reduction. In terms of MAPE, where WSTS previously outperformed the data-driven models, GPR achieved a 39% lower error, with SARIMA and ETS achieving a 38% lower MAPE.
Table 7 offers insights into the mean ranks of the various forecast-
ing models across all 110 product categories, organized similarly to
Tables 4 and 6. It is important to note that, in contrast to Tables 4 and
6, the columns in Table 7 are not standardized by the WSTS column.
A rank of 1 indicates the model performed the best for that product
category based on the respective error measure, thus lower mean ranks
are preferred. Overall, the observations from Table 7 parallel those
from Table 6. Models such as SARIMA, ETS, and GPR consistently
garnered high ranks. Notably, most data-driven methods outperformed
WSTS’ expert forecasts, except for KNN and SVM, which exhibited
poorer performance.
Finally, Fig. C.4 in the Appendix illustrates the frequency of the
best-performing forecasts across 110 WSTS product categories, with
colors indicating performance metrics: red for Mean Squared Error
(MSE), blue for Mean Absolute Percentage Error (MAPE), and green
for Mean Absolute Error (MAE). The left panel reflects the ℎ = 3 months forecasts, while ℎ = 2 is presented on the right side. It is evident that
WSTS forecasts rarely emerge as the top performers within any given
product category for ℎ= 2. While WSTS’ expert forecasts generally
outperform its algorithmic updates, data-driven models consistently
outshine both. When comparing against WSTS’ expert forecasts (second
row), SARIMA emerges as the best performer between 20% (MSE) and
30% (MAE) of the time, followed by ETS between 19% (MSE) and 27%
and MAPE). Additionally, GPR and SES each excel between 13% (MAE and MAPE) and 17%–18% (MSE) of the time, respectively. In contrast, WSTS'
expert forecasts demonstrate the best performance between 7% (MAE
and MAPE) and 13% (MSE) of the time.
Table 7
Mean ranks of the forecasts with horizon h = 2 months across all 110 product categories. Each row refers to a different error measure, sorted
by WSTS’ forecast type: algorithmic update, meeting (expert forecast), and overall. Lower values are preferable. The best value per row is bold
and italic.
WSTS SARIMA ETS ET GPR KNN RF SES SVM Ensemble
Alg. Update
MSE 8.0 3.6 3.7 5.8 3.3 8.4 5.7 3.9 8.7 3.7
MAPE 8.2 3.4 3.7 5.8 3.6 8.6 5.6 4.1 8.4 3.5
MAE 8.0 3.4 3.8 5.9 3.5 8.6 5.7 4.1 8.5 3.6
Meeting
MSE 6.9 3.5 3.3 6.5 3.7 8.9 5.9 3.5 8.5 4.2
MAPE 7.2 2.9 3.4 6.7 3.4 8.8 6.2 3.7 8.6 4.2
MAE 7.0 2.9 3.5 6.8 3.5 8.8 6.2 3.7 8.5 4.1
Overall
MSE 7.8 3.3 3.4 6.3 3.3 8.8 5.9 3.6 8.7 3.8
MAPE 8.0 2.9 3.2 6.4 3.3 9.0 5.9 3.8 8.7 3.7
MAE 7.8 2.9 3.3 6.4 3.4 9.0 6.0 3.8 8.6 3.8
Table 8
Overview of the time series lengths by category.
Count Av. Length Min. Length Max. Length
Long 50 392 392 392
Medium 31 324 296 359
Short 29 222 92 284
All 110 328 92 392
These findings corroborate the trends observed in the preceding
analyses of this section, suggesting the reliability and effectiveness of
certain data-driven models over expert forecasts in near-term forecast-
ing scenarios when additional available information is incorporated,
supporting the second research hypothesis (H2).
5.3. Impact of time series length on forecast performance
This section aims to investigate the possibility that expert forecasts
perform better on shorter time series due to the limited data available
for training data-driven models (Hypothesis (H3)). To explore this third
hypothesis, the 110 product categories are divided into long, medium,
and short categories based on the length of available observations. An
overview of this categorization is presented in Table 8.
In total, the time series had an average length of 328 monthly
observations. Among these, the 50 product categories with available
monthly data points for the entire examined time period were classified
as ‘‘long’’ time series. The ‘‘medium’’ category comprised 31 time series
with observations ranging from 296 to 359, averaging 324 available
data points. The remaining 29 product categories were designated
as ‘‘short’’ time series, with observations ranging from 92 to 284,
averaging 222 available data points.
Similar to Table 6, Table 9 provides an overview of the performance metrics Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE) for the 2-month forecasts. These metrics are segmented based on time series lengths: Short,
casts. These metrics are segmented based on time series lengths: Short,
Medium, and Long, as delineated in Table 8, and further categorized
by WSTS’ forecast types: algorithmic update, meeting (expert forecast),
and overall.
For short time series, GPR demonstrated superior performance in terms of both MSE and MAE, showcasing reductions of 76% and 43% respectively compared to WSTS' combined forecasts. Regarding MAPE,
ETS exhibited the lowest error rate (37% lower than WSTS), trailed
by SES (36% lower), SARIMA (34% lower), and GPR (33% lower).
When contrasting the data-driven forecasts with WSTS’ expert forecasts,
ETS emerged as the top-performing model, tying with GPR in terms
of MSE (both 43% lower than expert forecasts) and outperforming
WSTS’ experts by 32% in terms of MAE. However, SES marginally
outperformed ETS in terms of MAPE, showing reductions of 35%
and 34% respectively. When comparing data-driven forecasts solely to
algorithmic forecasts, GPR emerged as the most accurate model across
all metrics, boasting reductions of 93% in MSE, 43% in MAPE, and 58%
in MAE. This explains the robust performance of GPR relative to WSTS’
combined forecasts.
It is also noteworthy to highlight the disparity in the performance
of data-driven methods for short time series depending on whether
quarters with expert forecasts or algorithmic updates were used for
the benchmarks. Given that these are derived from the same time
series, a consistent ranking of data-driven methods might have been
expected. The best-performing models for medium and long time series
remain largely consistent, regardless of whether they are evaluated
using algorithmic forecasts or expert forecasts. Even in cases where
there are differences, the margin is minimal.
In the Appendix,Table A.10 outlines the mean ranks of the analyzed
models, once again categorized by Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE),
and segmented by time series lengths. These rankings are further
delineated by WSTS forecast type, mirroring the structure of Table 9.
Consistent with the findings in Table 9, GPR demonstrated the
most favorable performance in terms of mean ranks across MSE (2.8),
MAPE (2.8), and MAE (2.9) for short time series when compared to
WSTS’ algorithmic forecasts. However, while ETS excelled in average
MSE and MAE, and SES in MAPE, this outcome is reversed when
considering mean ranks: SES exhibited the best performance in terms
of mean ranks for MSE and MAE, while ETS performed best for MAPE.
It is worth noting that the differences between them are minimal in
each case. Furthermore, ETS attained the lowest mean rank when all
quarters were taken into account (overall), suggesting that exponential
smoothing models perform well when data is limited. Furthermore, the
high accuracy of the data-driven forecasts compared to WSTS’ expert
forecasts is evidence contrary to the third research hypothesis (H3) that
expert forecasts would outperform data-driven forecasts in the context
of short time series.
In fact, a slight increase in error relative to the expert forecasts is
observed for the best forecasts when medium-length time series are
considered in terms of MSE (0.58 vs. 0.57). Likewise, the data-driven
forecasts fared slightly worse in terms of MSE (0.64 vs. 0.57) and MAE
(0.72 vs. 0.68) for long time series. Given the additional information
which was utilized in training the models, the opposite might have
been expected. This was the case when the forecasts were evaluated
by MAPE (0.57 medium and long time series vs. 0.65 for short ones).
For medium time series, SARIMA generated the most reliable fore-
casts in terms of relative average error measures – 42% more accurate
than the experts polled by WSTS in terms of MSE, 43% more accurate
in terms of MAPE, and 36% more accurate in terms of MAE, followed
by ETS. This is also reflected in the mean ranks of the SARIMA forecasts
(2.9, 2.2, and 2.8 vs. 7.1, 7.8, and 7.5 respectively). Similarly, the
SARIMA forecasts demonstrated superior performance in comparison
to the algorithmic updates and when evaluated overall.
In the case of the long time series, the most accurate forecasts
resulted from the GPR and SARIMA models (Table 9). GPR resulted
in 36% lower MSE and a 43% lower MAPE compared to the experts.
SARIMA excelled in terms of MAE – 28% lower than WSTS. A similar
Table 9
Average performance of the data-driven forecasts with horizon h = 2 months across the different time series lengths and relative to WSTS. Each
row refers to a different error measure, sorted by WSTS' forecast type: algorithmic update, meeting (expert forecast), and overall. Lower values
are preferable. The best value per row is bold and italic.

Length  Forecast type  Metric  WSTS  SARIMA  ETS   ET    GPR   KNN   RF    SES   SVM   Ensemble
Short   Alg. Update    MSE     1.00  0.12    0.11  0.25  0.07  0.58  0.22  0.29  0.60  0.14
Short   Alg. Update    MAPE    1.00  0.63    0.61  0.73  0.57  1.08  0.72  0.63  1.07  0.63
Short   Alg. Update    MAE     1.00  0.49    0.49  0.70  0.42  1.09  0.68  0.67  1.11  0.56
Short   Meeting        MSE     1.00  0.74    0.57  0.86  0.57  1.35  0.77  0.65  1.26  0.66
Short   Meeting        MAPE    1.00  0.69    0.66  0.88  0.77  1.23  0.85  0.65  1.24  0.74
Short   Meeting        MAE     1.00  0.77    0.68  0.89  0.74  1.23  0.85  0.72  1.20  0.76
Short   Overall        MSE     1.00  0.33    0.27  0.46  0.24  0.85  0.41  0.42  0.83  0.32
Short   Overall        MAPE    1.00  0.66    0.63  0.80  0.67  1.15  0.78  0.64  1.16  0.68
Short   Overall        MAE     1.00  0.63    0.58  0.79  0.57  1.16  0.76  0.70  1.16  0.65
Medium  Alg. Update    MSE     1.00  0.14    0.17  0.42  0.14  1.33  0.39  0.27  1.23  0.27
Medium  Alg. Update    MAPE    1.00  0.53    0.57  0.76  0.54  1.08  0.75  0.61  1.06  0.63
Medium  Alg. Update    MAE     1.00  0.45    0.48  0.81  0.45  1.33  0.79  0.63  1.32  0.64
Medium  Meeting        MSE     1.00  0.58    0.61  1.05  0.66  2.17  0.93  0.73  2.21  0.78
Medium  Meeting        MAPE    1.00  0.57    0.65  0.83  0.64  1.20  0.79  0.66  1.15  0.67
Medium  Meeting        MAE     1.00  0.64    0.67  0.97  0.72  1.45  0.92  0.76  1.48  0.81
Medium  Overall        MSE     1.00  0.31    0.34  0.66  0.34  1.66  0.60  0.45  1.61  0.47
Medium  Overall        MAPE    1.00  0.55    0.61  0.79  0.59  1.14  0.77  0.63  1.10  0.65
Medium  Overall        MAE     1.00  0.54    0.57  0.89  0.58  1.39  0.85  0.69  1.39  0.72
Long    Alg. Update    MSE     1.00  0.10    0.18  0.54  0.10  1.34  0.50  0.47  1.31  0.32
Long    Alg. Update    MAPE    1.00  0.67    0.68  0.78  0.68  1.09  0.77  0.76  1.11  0.68
Long    Alg. Update    MAE     1.00  0.41    0.52  0.83  0.43  1.34  0.81  0.80  1.25  0.64
Long    Meeting        MSE     1.00  0.68    0.66  1.45  0.64  3.80  1.28  0.98  3.29  0.93
Long    Meeting        MAPE    1.00  0.58    0.58  0.78  0.57  1.07  0.75  0.64  1.09  0.64
Long    Meeting        MAE     1.00  0.72    0.75  1.08  0.76  1.60  1.03  0.93  1.50  0.88
Long    Overall        MSE     1.00  0.27    0.33  0.81  0.27  2.08  0.74  0.63  1.91  0.51
Long    Overall        MAPE    1.00  0.62    0.63  0.78  0.62  1.08  0.76  0.70  1.10  0.66
Long    Overall        MAE     1.00  0.55    0.63  0.94  0.58  1.46  0.91  0.86  1.36  0.75
picture arises when mean ranks (Table A.10) are considered. An excep-
tion is the algorithmic update category, where forecasts based on the
ensemble method achieved the lowest mean ranks.
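The two summaries used here, error ratios relative to the WSTS benchmark (Table 9) and mean ranks (Table A.10), can be illustrated with a small Python sketch. The error values below are hypothetical, ties are ignored for simplicity, and the paper's own implementation may differ:

```python
# Hypothetical per-series MAE values for the benchmark and two candidate
# models, one entry per time series.
errors = {
    "WSTS":   [4.0, 6.0, 5.0],
    "SARIMA": [3.0, 5.0, 6.0],
    "GPR":    [2.5, 7.0, 4.0],
}

# Error ratio relative to the benchmark, as in Table 9: average model error
# divided by the average WSTS error (values below 1 favor the model).
bench = sum(errors["WSTS"]) / len(errors["WSTS"])
relative = {m: (sum(v) / len(v)) / bench for m, v in errors.items()}

# Mean rank, as in Table A.10: rank the models per series (1 = best) and
# average those ranks across series.
models = list(errors)
ranks = {m: [] for m in models}
for i in range(len(errors["WSTS"])):
    for r, m in enumerate(sorted(models, key=lambda name: errors[name][i]), start=1):
        ranks[m].append(r)
mean_rank = {m: sum(r) / len(r) for m, r in ranks.items()}
```

Note that the two summaries can disagree: the ratio is dominated by series with large errors, while the mean rank weights every series equally, which is one reason Table 9 and Table A.10 can favor different models.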
Fig. C.5 in the Appendix illustrates the frequency of the best-
performing forecasts with a forecast horizon of h = 2 months. The
chart is divided into 3 columns corresponding to the time series lengths:
short, medium, and long. Rows are arranged by WSTS forecast type
(algorithmic update, meeting (expert), and overall). The colors differ-
entiate between Mean Squared Error (MSE) in green, Mean Absolute
Percentage Error (MAPE) in blue, and Mean Absolute Error (MAE)
in red. In concordance with the analysis of Tables 9 and A.10, GPR
shows the highest frequency of top model performance for short time
series when compared to the algorithmic updates. Overall and for the
expert forecasts, ETS and SES had the highest frequencies of highest
accuracy. For the medium and long time series, SARIMA, GPR, and the
exponential smoothing models excelled most often.
Additional details pertaining to the overall performance of the
various data-driven models for each product category, categorized by
time series length, are available in Table B.12.
5.4. Additional results
In addition to the aggregated results presented in Sections 5.1–5.3, this
section elaborates on the results at the product category level.
Tables B.11, B.12, and B.13 in the Appendix provide the Root Mean
Squared Errors (RMSE), Mean Absolute Percentage Errors (MAPE), and
Mean Absolute Errors (MAE), respectively, for the various forecasting
models across 110 different product categories. The RMSE, the square
root of the Mean Squared Error (MSE) described in Section 4.2, was
chosen here for readability. Each table is organized by time series
length, consistent with the categorization in Section 5.3. The error
measures for the forecasts with horizon h = 3 months are presented in
the first half of each table and those for forecasts with horizon h = 2
months in the second half. Lower RMSE, MAPE, and MAE values
indicate higher forecasting accuracy, with the best-performing model
per product category printed in bold and italic in each row of each table.
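For reference, the three error measures can be computed as in the following minimal plain-Python sketch; the actual and forecast values are hypothetical, and the study's own implementation is not shown here:

```python
import math

def rmse(actual, forecast):
    # Root Mean Squared Error: square root of the average squared deviation.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    # Mean Absolute Percentage Error; requires non-zero actuals.
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def mae(actual, forecast):
    # Mean Absolute Error: average absolute deviation.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly billings (actual) vs. a model's forecasts.
actual = [100.0, 110.0, 120.0]
forecast = [98.0, 113.0, 118.0]
print(mae(actual, forecast))  # average of |2|, |-3|, |2|
```

RMSE penalizes large deviations more heavily than MAE, while MAPE is scale-free, which is why the three measures can crown different models for the same product category.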
In the 3-month forecast category, the results are consistent with
those discussed in Section 5.1. As can be seen in Tables B.11 and
B.13, data-driven methods, particularly GPR and SARIMA, are overall
able to outperform WSTS in terms of RMSE and MAE. Table B.11
reveals that this is in large part due to the strong performance of
the GPR and SARIMA forecasts in terms of RMSE for a few product
categories such as M99 (37.72 for GPR vs. 46.77 for WSTS), T99 (56.83
for GPR vs. 71.85 for WSTS), and S2 (49.17 for SARIMA vs. 70.79
for WSTS). Similarly, scrutinizing Table B.13 indicates that among all
forecasts, WSTS’ was the most reliable for most product categories in
terms of MAE. But GPR, SARIMA, and ETS produced forecasts which
excelled for specific product categories, such as T99 (53.76 for WSTS
vs. 46.24 for SARIMA, 47.35 for GPR, and 50.85 for ETS), resulting in
a higher average performance for long time series. In terms of MAPE,
Table B.12 illustrates WSTS’ strong performance on the ℎ= 3 month
horizon across all time series lengths. Nevertheless, upon scrutinizing
Table B.12, it becomes apparent that data-driven models outperform
WSTS’ combined expert and algorithmic update forecasts in terms
of MAPE for specific product categories. For instance, in categories
such as J99 and L8a, the SARIMA model produces the lowest MAPE,
demonstrating its effectiveness in forecasting these particular products.
In some categories like L1, where the time series might exhibit unique
patterns or complexities, traditional statistical models such as SARIMA
and ETS perform inadequately compared to specific machine learning
models, in this case: RF.
Consistent with the observations in Section 5.2, these results are
reversed when the forecasts with a horizon of ℎ= 2 months are con-
sidered. In this scenario, data-driven forecasts excelled across the vast
majority of product categories in terms of RMSE (Table B.11), MAPE
(Table B.12), and MAE (Table B.13). Nevertheless, WSTS’ forecasts
were superior for some select product categories with long histories
in terms of RMSE, such as C7 (1.18 vs. 1.32 for the best performing
data-driven model: GPR), CC (1.29 vs. 1.47 for the best performing
data-driven model: GPR), and S3 (2.77 vs. 2.94 for the best performing
data-driven model: SES).
Moreover, despite the dominance of SARIMA, GPR, and ETS
among the data-driven models, it is noteworthy that other models with
poorer average performance still yielded strong forecasts for select
product categories: ET was amongst the top-performing models for the
A5 product category in terms of RMSE and MAPE, and SVM performed
best for the JCd product category in terms of RMSE and MAE. This
finding is underlined by Fig. C.4 in the Appendix. The bar chart
visualizes the frequency of the best-performing forecasts across 110
WSTS product categories. Differentiated by colors, green signifies the
best performance based on Mean Squared Error (MSE), blue represents
Mean Absolute Percentage Error (MAPE), and red indicates Mean Ab-
solute Error (MAE). The chart is divided into two sections: the left side
displays outcomes for forecasts with a horizon of ℎ= 3 months, while
the right side portrays forecasts with a horizon of ℎ= 2 months (to be
discussed in the subsequent section). Rows are arranged by algorithmic
update, expert forecasts (meeting), and overall performance.
For h = 3 months, the superior performance of the WSTS forecasts is
evident, particularly their Meeting and Overall forecasts. Depending on
the error measure, WSTS' experts outperformed all data-driven models
in 46% to 48% of all product categories. The most frequent best-
performing data-driven models were SARIMA, GPR, and the exponential
smoothing models, ETS and SES. In aggregate, the forecasts provided
by WSTS (Overall row) achieved a top performance in 38% to 39% of
the cases. In contrast, the playing field was more even when only the
algorithmic updates are considered: WSTS achieved a top performance
for 21% to 23% of all product categories, followed by SARIMA with
17% to 23% and GPR with 13% to 19% of all product categories.
It is evident that WSTS forecasts rarely emerge as the top performers
within any given product category for ℎ= 2 months. While WSTS’
expert forecasts generally outperform its algorithmic updates, data-
driven models consistently outshine both. When comparing against
WSTS’ expert forecasts (second row), SARIMA emerges as the best
performer between 20% (MSE) and 30% (MAE) of the time, followed by
ETS between 19% (MSE) and 27% (MAPE). Additionally, GPR and SES
each excel between 13% (MAE and MAPE) and 17% and 18% (MSE),
respectively. In contrast, WSTS’ expert forecasts demonstrate the best
performance between 7% (MAE and MAPE) and 13% (MSE) of the time.
This highlights that there is no single model for all situations; rather,
the model choice should depend on the individual product category and
the error measure with the greatest business relevance.
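Operationally, this per-category selection can be as simple as the following sketch; the category codes echo the WSTS naming used above, but the error values are hypothetical, with MAPE taken as the business-relevant measure:

```python
# Hypothetical error table: {category: {model: error under the chosen measure}}.
mape_by_category = {
    "J99": {"WSTS": 0.12, "SARIMA": 0.09, "ETS": 0.11},
    "L1":  {"WSTS": 0.15, "SARIMA": 0.16, "RF":  0.10},
}

def best_model(error_table):
    # For each category, select the model with the lowest error.
    return {cat: min(models, key=models.get) for cat, models in error_table.items()}

print(best_model(mape_by_category))  # {'J99': 'SARIMA', 'L1': 'RF'}
```

In practice, the errors in such a table should come from a held-out evaluation period, as in the out-of-sample comparisons reported in this study, rather than from in-sample fit.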
6. Discussion
6.1. Summary
Section 2 made the case that the semiconductor industry plays a cru-
cial role in the broader economy and stressed the importance of reliable
forecasts for operational and strategic decision making. Furthermore,
the rapidly evolving technologies, complicated geopolitical considera-
tions, and complex supply chains exposing the industry to the bullwhip
effect make data-driven forecasting more challenging. Concurrently,
industry insiders, such as those queried by the World Semiconductor
Trade Statistics (WSTS), a leading provider of semiconductor market
data and forecasts, promise reliable forecasts based on a wealth of
quantitative and qualitative insider information.
This motivated the first research hypothesis (H1) that expert fore-
casts exhibit higher accuracy compared to data-driven forecasts. This
hypothesis was extensively examined in Section 5.1 for a forecast
horizon of ℎ= 3. The analysis of the benchmark concluded that the
bi-quarterly expert forecasts indeed demonstrated superior accuracy
on a quarterly horizon in terms of Mean Squared Error (MSE), Mean
Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE).
In contrast to the superior performance of the expert forecasts, it was
also found that the bi-quarterly algorithmic forecasts provided by WSTS
showed potential for further improvement.
Furthermore, it was noted that industry insiders may have access to
additional information owing to the timing of WSTS meetings, during
which the forecasts are formulated. This observation prompted the
formulation of the second research hypothesis (H2) that the additional
information yields competitive data-driven forecasts, which was in-
vestigated in Section 5.2. It was found that the additional data and
the shorter horizon of ℎ= 2 months significantly improved forecasts
based on quantitative models. Forecasts based on the SARIMA, GPR,
ETS, and SES models consistently demonstrated superior accuracy. As
a consequence, it is recommended that practitioners complement expert
forecasts with data-driven methods to enhance forecast reliability.
General wisdom holds that experts excel in situations with limited
historical data (Hyndman & Athanasopoulos, 2018). Consequently, the
third research hypothesis (H3), which postulates that industry insiders
outperform in short time series due to restricted quantitative data
available for model training, was examined in Section 5.3. The anal-
ysis revealed that data-driven forecasts exhibited superiority across all
examined time series lengths. Nonetheless, different models showcased
higher accuracy under varying circumstances. Specifically, exponen-
tial smoothing models attained the highest accuracy for short time
series, whereas SARIMA dominated in the medium-length scenario.
Conversely, GPR outperformed for longer time series. This implies that
several diverse models should be evaluated and the one that aligns most
effectively with the given circumstances should be selected.
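As an illustration of how lightweight the best-performing methods for short series can be, here is a minimal simple exponential smoothing (SES) forecaster. The smoothing parameter is fixed by assumption; a production implementation would estimate it from the data:

```python
def ses_forecast(series, alpha=0.5, horizon=2):
    # Simple exponential smoothing: the level is updated recursively as
    # level = alpha * y + (1 - alpha) * level.
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    # SES yields a flat forecast: the final level repeated over the horizon.
    return [level] * horizon

print(ses_forecast([10.0, 12.0, 11.0]))  # flat two-step-ahead forecast
```

The flat forecast path is exactly why SES suits short, noisy series: with so few observations, estimating a trend or seasonal component, as SARIMA and ETS do, risks overfitting.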
6.2. Implications, limitations, and future directions
This paper has presented the first comprehensive comparative anal-
ysis of expert semiconductor market forecasts. The long production lead
times of semiconductors mean that orders have to be placed and
production has to be planned months in advance. As a result,
industry experts must be very accurate when it comes to short-term
sales forecasts. WSTS consolidates such industry forecasts to derive
market forecasts. Hence, we hope that the strong performance of the
data-driven methods motivates analysts and industry practitioners to
employ data-driven methods to enhance existing forecasts, even when
time series are short and the models are simple.
Additionally, this study shows that comprehensive multi-granularity
modeling of the semiconductor market is feasible. Therefore, our hope
is that this paper presents a first step in the direction of reconciling
the fields of semiconductor cycle prediction, which assumes a higher-
level view, and semiconductor sales and demand forecasting, which is
much more granular: (1) The semiconductor market consists of diverse
products from processors and memory chips to switches and sensors.
Therefore, a multi-granularity study of the semiconductor market could
yield insights into possible sub-cycles. (2) Conversely, demand and sales
forecasts could benefit from granular and reliable market forecasts as
an indicator for future sales. The same holds for semiconductor cycle
indicators. Both approaches remain open for future study.
Nevertheless, this study has several limitations. The first relates to
the short time series lengths, which had several consequences.
1. Model simplicity. We observed that simpler models such as
SARIMA, ETS, and GPR performed better than more complex
ML models such as RF. Furthermore, we abstained from the use
of even more complex models such as Long Short-Term Memory
(LSTM) networks and boosting algorithms such as XGBoost.
2. Purely autoregressive modeling. We did not incorporate explana-
tory variables and relied solely on autoregressive modeling to
keep the number of parameters at a minimum.
Table A.10
Mean ranks of the forecasts with horizon h = 2 months across the different time series lengths. Each row refers to a different error measure,
sorted by WSTS’ forecast type: algorithmic update, meeting (expert forecast), and overall.