Original Paper
Iron Ore Price Forecast based on a Multi-Echelon Tandem
Learning Model
Weixu Pan,1 Shi Qiang Liu,1,7 Mustafa Kumral,2 Andrea D'Ariano,3 Mahmoud Masoud,4 Waqar Ahmed Khan,5 and Adnan Bakather6
Received 21 November 2023; accepted 10 May 2024
Iron ore has traded in a highly globalized market since a new pricing mechanism was established in 2008. In current dollar terms, the sale price of iron ore concentrate (62% Fe), which was $39 per tonne in December 2015, reached $218 per tonne in mid-2021 and was hovering around $120 per tonne in October 2023 (cf. https://tradingeconomics.com/commodity/iron-ore). The uncertainty
associated with these fluctuations creates hardship for iron ore mine operators and steel-
makers in planning mine development and making future sale agreements. Therefore, iron
ore price forecasting is of special importance. This paper proposes a cutting-edge multi-
echelon tandem learning (METL) model to forecast iron ore prices. This model comprises
variational mode decomposition (VMD), multi-head convolutional neural network
(MCNN), stacked long short-term-memory (SLSTM) network, and attention mechanism
(AT). In the proposed METL (i.e., the combination of VMD, MCNN, SLSTM, AT) model,
the VMD decomposes the time series data into sub-sequential modes for better measuring
volatility. Then, the MCNN is applied as an encoder to extract spatial features from the
decomposed sub-sequential modes. The SLSTM network is adopted as a decoder to extract
temporal features. Finally, the AT is employed to fuse the spatial–temporal features and complete the forecasting process. Extensive computational experiments were conducted on daily and weekly iron ore price datasets with different time scales. The results validated that the proposed METL model outperformed its single-echelon and other categorized models by 10–65%. The proposed METL model can improve the
prediction accuracy of iron ore prices and thus help mining and steelmaking enterprises to
determine their sale or purchase strategies.
KEY WORDS: Iron ore price forecasting, Deep learning, Variational mode decomposition,
Decoder–encoder network, Attention mechanism.
1 School of Economics and Management, Fuzhou University, Fuzhou 350108, China.
2 Department of Mining and Materials Engineering, McGill University, 3450 University Street, Montreal, Quebec H3A 0E8, Canada.
3 Department of Civil, Computer Science and Aeronautical Technologies Engineering, Roma Tre University, 00146 Rome, Italy.
4 Department of Information Systems and Operations Management, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.
5 Department of Industrial Engineering and Engineering Management, University of Sharjah, Sharjah, United Arab Emirates.
6 Center of Finance and Digital Economy, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.
7 To whom correspondence should be addressed; e-mail: samsqliu@fzu.edu.cn
© 2024 International Association for Mathematical Geosciences
Natural Resources Research (© 2024)
https://doi.org/10.1007/s11053-024-10360-2
INTRODUCTION
Iron ore and its derivative steel are the backbones of the modern world. Countries such as Australia, Brazil, China, India, and Russia are the major suppliers (Sahoo et al., 2021). In terms of production distribution, Australia is the world's largest producer of iron ore: in 2022, Australia's iron ore production was 880 million tons, accounting for 34% of global iron ore production. Next are Brazil,
China, and India, with iron ore production of 410
million tons, 380 million tons, and 290 million tons in
2022, accounting for 16%, 14%, and 11%, respec-
tively (cf. https://www.chyxx.com). The high iron ore
production in Australia and Brazil is mainly due to
the abundance of iron ore resources, with large
amounts of iron ore resources being used for export.
The fast economic development in some developing
countries, such as China and India, also motivates
their mining enterprises to increase resource exploration and production efficiency. China also imports almost 70% of its iron ore supply. The iron ore produced in China is of relatively low quality and is blended with imported ores. It was reported in 2023 that "China imports most of its iron ore from Australia and Brazil, taking 1.1 billion metric tons every year and accounting for most of its steel-making needs" (Lin, 2023). The vast majority of the world's iron ore extraction and exports are heavily dependent on Australia and Brazil. Given that iron ore
production corporations are transnational compa-
nies, understanding market dynamics requires cor-
poration level analysis, as well as country and global
level analyses. A large quantity of iron ore is ex-
tracted by world-class corporations, namely,
Fortescue Metals Group, BHP Billiton, Rio Tinto,
and Vale. The market structure is based on the
performances of several exporters and importers to
a large extent (Kim et al., 2022). Australia, China,
and Brazil are three key players in this market. Iron
ore plays an important role in the Australian econ-
omy.
In terms of consumption, iron ore is an indispensable raw material for many end-consumer goods (Wang & Sun, 2024).
With the rapid development of downstream appli-
cation fields, the demand for iron ore continues to
expand. In recent years, the consumption of iron ore
has steadily increased. 98% of iron ore concentrate
is used in steelmaking. China is the largest producer
of steel and a major trader of iron ore (Wang et al.,
2022). In particular, China's steelmaking production accounted for 54% of world production in 2023. Iron and steel are essential inputs for China's most important industries, such as automobile manufacturing and construction (Lv et al., 2022). The import and usage of iron ore resources will continue steadily along with China's economic growth. In summary, the
market dynamics of the iron ore market depend on a
series of macro- and micro-economic factors as well
as global economic developments, speculations, and
extreme events. In line with these dynamics, iron ore
prices are highly volatile. Figure 1 shows iron ore (62% Fe) prices over the last 12 years. Regarding iron ore prices, readers are referred to the Iron Ore Fine China Import index (62% grade, spot cost and freight for delivery at the Qingdao port terminal).
Iron ore prices are governed by supply and
demand. For example, disruptions in large mines or
their supply changes, such as extreme events and
safety and environmental issues, cause price fluctu-
ations. On the demand side, the growth of the
emerging countries determines prices. China's government-backed infrastructure projects have been the main driver of iron ore prices. Fortescue Metals Group's stock price provides insight into iron ore prices, and the direction of China's real estate market is another indicator. Finally, since the US 10-year government bond yield and interest rates affect the world's growth, they are also factors affecting the iron ore market.
Historically, iron ore was traded on the annual
benchmark pricing mechanism (Kim et al., 2022).
Over the last 10–15 years, the index pricing mechanism has replaced the annual benchmark pricing mechanism, and as a result the iron ore financial derivatives market has developed rapidly (Huang et al., 2020). Long-term commitment contracts are widely used in the iron ore trade (Sauvageau & Kumral, 2017); these contracts usually range from 5 to 25 years. Growing demand from China also led to the formation of futures markets. The world's three biggest corporations dominate iron ore contract trading, while many junior corporations trade their ores through swap contracts. The pricing mechanism is
through swap contracts. The pricing mechanism is
based on bid-and-ask prices. The best price obtained
is the spot price at the end of the day. Futures markets allow iron ore operators to hedge their price risk. However, COVID-19 and the associated recession have escalated price uncertainties in recent years. In pursuit of future economic, international, and social development plans, many countries have drawn up strategic plans to ensure the secure availability of mineral resources. Even
though iron ore is not listed as a critical mineral resource by many countries, it is still essential for infrastructural development and technological progress. Alongside countries' critical mineral strategies, combating global warming also requires a multitude of mineral resources. Therefore, the field
of mining science and technology has received con-
siderable interest in recent years, especially from
analytical or mathematical perspectives (e.g., forecasting, machine learning (ML), and operations research). Investigating price trends and directions, and forecasting prices, is a key topic for the iron ore industry. Emerging ML techniques offer new tools for addressing this topic.
Existing prediction models for iron ore prices can be divided into three major categories: econometric models (e.g., ARIMA (autoregressive integrated moving average), ARCH (autoregressive conditional heteroscedasticity), and GARCH (generalized autoregressive conditional heteroscedasticity)), deep learning (DL) models (e.g., convolutional neural network (CNN), long short-term memory (LSTM) network, and attention mechanism (AT)), and ensemble learning models (e.g., CNN–LSTM, LSTM–AT). Although econometric models are straightforward, they cannot effectively capture the overall trend of prices because of their assumptions and parameter tuning (Jnr et al., 2022). In recent years, DL models have achieved satisfactory results in dealing with nonlinear data (Yin et al., 2021). However, DL models may over-fit, which decreases prediction accuracy (Williams et al., 2021). Because a single model cannot consistently achieve desirable prediction performance, the so-called ensemble learning technique was recently introduced, integrating econometric with DL methods to obtain better predictive performance (Bai et al., 2022). However, most ensemble learning models are unable to reduce the volatility of iron ore prices, as evidenced by the literature review section below.
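To make the econometric baseline concrete, the simplest member of the ARIMA family is a first-order autoregression fitted by least squares. The sketch below is illustrative only (the cited works use richer specifications such as full ARIMA and GARCH), and the function names are hypothetical:

```python
import numpy as np

def fit_ar1(prices):
    """Fit an AR(1) model p_t = c + phi * p_{t-1} + e_t by ordinary least squares."""
    y, x = prices[1:], prices[:-1]
    X = np.column_stack([np.ones_like(x), x])  # intercept column + lagged prices
    c, phi = np.linalg.lstsq(X, y, rcond=None)[0]
    return c, phi

def forecast_ar1(last_price, c, phi, steps=1):
    """Iterate the fitted recursion forward to produce multi-step forecasts."""
    out, p = [], last_price
    for _ in range(steps):
        p = c + phi * p
        out.append(p)
    return np.array(out)
```

Such a linear recursion cannot represent regime shifts or volatility clustering in iron ore prices, which is precisely the limitation that motivates the DL and ensemble approaches discussed next.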
To overcome the limitations of the above three
prediction models (i.e., econometric, DL, and
ensemble learning models), this study attempted to
design a hybrid forecasting model by answering the
following questions.
How can the overall trend of multivariate iron
ore prices be analyzed?
How can parameter tuning and over-fitting be
handled in tandem?
How can the accuracy and convergence of the forecasting model be improved?
How can the volatility of iron ore prices be
controlled in a better way?
How can the forecasting performance of various
models be evaluated using a comparative analy-
sis?
This paper proposes a novel forecasting model,
called multi-echelon tandem learning (METL),
which combines variational mode decomposition
(VMD), multi-head CNN (MCNN), stacked LSTM
(SLSTM) network, and AT in tandem. The METL
model is a tandem of high-level layers where each
single-echelon model (i.e., VMD, MCNN, SLSTM,
AT) is a tandem of the layers. Our proposed METL
combines the merits of the econometric, DL, and ensemble learning models, and adopts divide-and-conquer, data-feature-driven decomposition techniques such as VMD. The main contributions of this paper are as follows.

Figure 1. Iron ore prices over the past 12 years (2012–2023).
This study designed a novel METL model by
integrating the VMD method with DL tech-
niques (e.g., MCNN, SLSTM, and AT) for fore-
casting iron ore prices.
The VMD method is applied to decompose the iron ore price data into multiple simple modes of various frequencies, thereby reducing the volatility of the iron ore price series.
Based on the MCNN and SLSTM with AT, an
end-to-end encoder–decoder neural network is
devised to extract nonlinear features from simple
modes of various frequencies.
To avoid getting trapped in local optima, a novel
RAdam-based training algorithm is developed to
improve the accuracy and convergence of the
proposed METL model.
A comparative analysis was conducted to evalu-
ate the proposed METL model and its catego-
rized models through extensive computational
experiments.
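Regarding the RAdam-based training contribution, the sketch below shows one generic Rectified Adam (RAdam) parameter update in plain numpy, following the published RAdam formulation; it is an illustration, not necessarily the exact variant used in the METL training algorithm.

```python
import numpy as np

def radam_step(theta, grad, state, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """One generic Rectified Adam update; state holds step count t and moments m, v."""
    state["t"] += 1
    t = state["t"]
    state["m"] = b1 * state["m"] + (1 - b1) * grad       # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - b1 ** t)                   # bias-corrected momentum
    rho_inf = 2.0 / (1.0 - b2) - 1.0
    rho_t = rho_inf - 2.0 * t * b2 ** t / (1 - b2 ** t)  # approximated SMA length
    if rho_t > 4.0:
        # Variance of the adaptive learning rate is tractable: apply rectification.
        r = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                    / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        v_hat = np.sqrt(state["v"] / (1 - b2 ** t))
        return theta - lr * r * m_hat / (v_hat + eps), state
    # Early steps: fall back to an un-adapted momentum update.
    return theta - lr * m_hat, state
```

The rectification term damps the adaptive learning rate while its variance is still large in early iterations, which is what helps the optimizer avoid poor local solutions.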
The remainder of this paper is organized as fol-
lows. The next section provides a literature review of
econometric, DL, and ensemble learning models for
predicting the prices of iron ore and other bulk com-
modities (e.g., copper, gas, and oil). The methodology
section details the proposed METL model. Compu-
tational results are reported to evaluate the proposed
model in the experiment section. The final section draws conclusions and outlines future work.
LITERATURE REVIEW
A literature review relevant to prediction
models of iron ore and other bulk commodities is
divided into three main categories, namely econo-
metric, DL, and ensemble learning models.
Econometric Models
Econometric models can be thought of as a
combination of economics, statistics, and mathe-
matics to make inferences about an economic phe-
nomenon. Significant research was devoted to
developing econometric models relating to mineral
commodities. To analyze the relationship between iron ore prices and other commodity prices, Kim et al. (2023) reviewed three statistical models, namely bivariate nonlinear regression (BNLR), multiple linear regression (MLR), and multiple nonlinear regression (MNLR), to forecast iron ore prices. Using
the Baltic dry index (BDI), Chen et al. (2023) designed a statistical model to explore the dynamic volatility spillovers between the BDI and iron ore prices. Kim et al. (2022) implemented an augmented Dickey–Fuller (ADF) model to study the interdependency between the iron ore market and the
markets of other commodities (e.g., coal, copper,
oil), by analyzing the relationship between monthly
prices of iron ore and monthly prices of twelve other
commodities. To investigate the complete price
chain of global inflation, Chen and Yang (2021)
employed a structural vector autoregression
(SVAR) model to examine the influence of iron ore
price shocks. Ma (2021a) formulated a copula model, conducting a sensitivity analysis of iron ore pricing on China's steel price fluctuations, to explore the spillover risk between iron ore and steel prices. Zhang and Zhou (2021) proposed a
heterogeneous autoregressive (HAR) model to
investigate the impact of the day-of-the-week effect
on the iron ore price. Ma (2021b) developed a cop-
ula model with the spillover index to examine time-
varying spillovers and dependencies between iron
ore prices and steel prices. To investigate the non-
linear path of monthly time series for the iron ore
imported, Wang et al. (2020) proposed a hybrid
model, which integrates the empirical mode
decomposition (EMD) method, nonlinear autore-
gressive neural network (NARNN), and autore-
gressive integrated moving average (ARIMA)
model, to forecast the iron ore price. To evaluate the
performance influence of the iron ore mining
industry research and development policy, Sun &
Anwar (2019) developed a so-called coarsened exact
matching (CEM) statistical technique to analyze the
performance of the iron ore mining data. Zhu et al. (2019) developed a rolling window regression (RWR) model to analyze the causes and effects of changes among major iron ore suppliers in international iron ore exports. Wårell (2018) proposed a quantile regression (QR) model to analyze the causes and effects of changes in the iron ore pricing regime. To investigate the relationship between the iron ore price and industry concentration, demand, and supply constraints, Su et al. (2017) designed a
generalized supremum augmented Dickey–Fuller
(GSADF) model to examine whether there exist
multiple bubbles of iron ore prices. Chen et al.
(2016) developed a QR model with lagged variables
to measure key factors of iron ore import prices and
provided the reference for selecting the appropriate
prediction model based on empirical studies.
The main characteristics related to econometric
models for iron ore price prediction are given in
Table 1. Many econometric models directly investi-
gated the relationship between iron ore price and
the market. Econometric models can reasonably
explain the factors affecting the iron ore market
mechanism. However, the prediction bias and variance can be high when dealing with the nonlinear time series of iron ore prices; decomposition methods such as EMD can help handle these issues. Furthermore, most studies on econometric models considered only univariate iron ore price data. In addition, these studies focused on single-scale data (i.e., iron ore or steel), primarily based on linear assumptions and stationary data. Therefore, these works are limited in their ability to effectively capture the overall trend of iron ore prices.
Deep Learning Models
The rapid development of artificial intelligence
(AI) offers the potential to tackle the above limita-
tions of econometric methods. The early AI models
are based on ML methods, such as support vector
regression (SVR), group method of data handling
(GMDH), classification and regression tree (CRT),
and artificial neural network (ANN). With the advent of the big data era, DL models have achieved strong prediction performance in dealing with nonlinear and non-stationary data.
Many papers have investigated commodity price prediction based on DL models. For example,
Xu et al. (2023) developed a one-dimension convo-
lution neural network (1DCNN) combined with a
bidirectional LSTM (BiLSTM) network for heat
load (i.e., iron ore, coke) prediction. Feili et al.
(2023) proposed a convolutional neural network
(CNN) model with long short-term memory (LSTM)
network to predict copper prices from raw data with
attribute features and high-level semantics. Deng
et al. (2023) designed a multiple-timeframe extreme gradient boosting (MTXGBoost) model to predict crude oil prices from the perspective of multiple high-frequency timeframes. Wang and Li (2022) designed an SVR
model with an AdaBoost algorithm to predict iron
ore prices based on Dalian Commodity Exchange.
To explore the influence of various factors on the
volatility of iron ore prices, Lv et al. (2022) proposed
an ANN model with the search and rescue opti-
mization algorithm to increase the accuracy of iron
ore price prediction. Das et al. (2022) proposed an
extreme learning machine (ELM) method with a
quasi-based crow search algorithm to predict oil and
gold prices. To analyze the effect of the high-di-
mensional monthly variables on price, Boubaker
et al. (2022) proposed a recursive neural network
(RNN) method to predict crude oil prices.

Table 1. Characteristics analysis of econometric models for iron ore price prediction

Authors (year) | Data types | Duration | Variable types | Methods | Scales
Kim et al. (2023) | Iron ore price | Daily | Multivariate | BNLR, MLR, MNLR | Multi-scale
Chen et al. (2023) | Iron ore and oil price | Daily | Multivariate | Copula–VAR–GARCH | Multi-scale
Kim et al. (2022) | Iron ore price | Monthly | Multivariate | ADF | Multi-scale
Ma (2021a) | Iron ore, steel price | Daily | Multivariate | Copula | Multi-scale
Chen and Yang (2021) | Iron ore price | Daily | Univariate | SVAR | Single-scale
Zhang and Zhou (2021) | Iron ore price | Daily | Multivariate | HAR | Multi-scale
Ma (2021b) | Iron ore, steel price | Daily | Univariate | Copula | Single-scale
Wang et al. (2020) | Iron ore price | Monthly | Univariate | NARNN–ARIMA–DE | Single-scale
Sun and Anwar (2019) | Iron ore mining data | Daily | Univariate | CEM | Single-scale
Zhu et al. (2019) | Iron ore market data | Daily | Univariate | RWR | Single-scale
Wårell (2018) | Iron ore prices | Monthly | Univariate | QR | Single-scale
Su et al. (2017) | Iron ore prices | Monthly | Multivariate | GSADF | Multi-scale
Chen et al. (2016) | Iron ore prices | Monthly | Multivariate | QR | Multi-scale

* BNLR bivariate nonlinear regression; MLR multiple linear regression; MNLR multiple nonlinear regression; VAR vector autoregression; GARCH generalized autoregressive conditional heteroskedasticity; ADF augmented Dickey–Fuller; SVAR structural vector autoregression; HAR heterogeneous autoregressive; NARNN nonlinear autoregressive neural network; ARIMA autoregressive integrated moving average; DE decomposition; CEM coarsened exact matching; RWR rolling window regression; QR quantile regression; GSADF generalized supremum augmented Dickey–Fuller

Khoshalan et al. (2021) applied three DL models to predict
copper prices, which include gene expression pro-
gramming (GEP), ANN, and adaptive neuro–fuzzy
inference system (ANFIS). Huang et al. (2020) de-
signed a Bayesian network (BN) model to measure
the probability of high-risk iron ore investment. Li
et al. (2020) designed a GMDH model for predicting
iron ore prices from the relationship between the
supply and demand of iron ore. They demonstrated
that GMDH outperforms the other models such as
ARIMA, SVR, ANN, and CRT. To inform future investment and decision-making for mining projects and related companies, Ewees et al. (2020) proposed a multilayer perceptron (MLP) model with a chaotic grasshopper optimization algorithm to forecast iron ore prices. Elaziz et al.
(2020) proposed an ANN model with an adaptive
inference system to discuss the influence of crude oil
price fluctuation on the economy and country.
Alameer et al. (2019) designed an ANN model with
fuzzy logic systems to forecast the volatility of cop-
per prices. Weng et al. (2018) developed an ELM
method with a genetic algorithm to investigate the
influence of the Bayesian information criterion on
the relevant variables of iron ore prices. Ou et al.
(2016) proposed an ELM method with gray relation
analysis to predict the iron ore prices.
The characteristics of the above DL models in
terms of authors (year), data types, methods, and
results are summarized in Table 2. Most studies on
DL models can deal with nonlinear issues via a
stochastic training algorithm with ANN, RNN,
CNN, and LSTM. Compared with econometric models, most DL models analyze market information on iron ore to learn the characteristics and rules of iron ore price trend changes. However, these DL methods do not fully consider the volatility characteristics of the iron ore data, and they take a long time to train the neural network to extract high-dimensional data features. In addition, DL models are sensitive to the number of parameters because the number of nodes in hidden layers grows exponentially.
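For readers unfamiliar with the recurrent models surveyed above, the following minimal numpy sketch shows a single LSTM cell and a stack of such cells (the idea behind stacked LSTM networks). The weight shapes and gate ordering here are illustrative conventions; production models would use a DL framework.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [i, f, o, g]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def stacked_lstm(seq, params):
    """Run stacked LSTM layers over a sequence; layer l+1 consumes layer l's hidden state."""
    states = [(np.zeros(U.shape[1]), np.zeros(U.shape[1])) for _, U, _ in params]
    for x in seq:
        inp = x
        for l, (W, U, b) in enumerate(params):
            h, c = lstm_cell(inp, *states[l], W, U, b)
            states[l] = (h, c)
            inp = h
    return states[-1][0]  # final hidden state of the top layer
```

The gating structure is what lets such cells retain long-range temporal features of a price series, at the cost of the training time and parameter sensitivity noted above.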
Ensemble Learning Models
An ensemble learning (EL) model is a hybrid
learning model that combines DL models (e.g.,
ANN, CNN, or LSTM) with decomposition techniques (e.g., EMD) to obtain better predictive performance. The following papers on bulk commodity prediction with EL models are discussed.
To analyze the influence of oil price volatility, Dong et al. (2023) designed an EL model that
integrates the ensemble empirical mode decompo-
sition (EEMD), ARIMA, and LSTM models to
predict crude oil prices by decomposing the original
data into high-frequency and low-frequency terms.

Table 2. Characteristics analysis of deep learning models for commodities

Authors (year) | Data type | Methods | Results
Xu et al. (2023) | Iron ore, coke | 1DCNN–BiLSTM | Presents better performance than other methods in accuracy
Feili et al. (2023) | Copper price | CNN–LSTM | Outperforms other comparison methods such as CNN, LSTM
Deng et al. (2023) | Crude oil price | MTXGBoost | More accurate than benchmark models under RMSE metrics
Wang and Li (2022) | Iron ore price | SVR | Proves advantageous over other models under RMSE metrics
Lv et al. (2022) | Iron ore price | ANN | Shows high accuracy in forecasting iron ore prices
Das et al. (2022) | Oil, gold price | ELM | Outperforms other methods such as ANN, SVR
Khoshalan et al. (2021) | Copper price | GEP, ANN, ANFIS | Outperforms comparison models such as GEP and ANFIS
Boubaker et al. (2022) | Crude oil prices | ANN, RNN | Outperforms the benchmarks by 12.5% in terms of RMSE
Huang et al. (2020) | Iron ore price | BN | Outperforms other comparison models such as ANN, ANFIS
Li et al. (2020) | Iron ore price | GMDH | Significantly better than other predictive models under RMSE metrics
Ewees et al. (2020) | Iron ore price | MLP | Performs better than other comparison models such as NN, ANN
Elaziz et al. (2020) | Crude oil price | ANN | Proves advantageous over the other comparison models
Alameer et al. (2019) | Copper price | ANFIS | Achieves better results than other compared models such as ANN
Weng et al. (2018) | Iron ore price | ELM | Achieves state-of-the-art performance in iron ore prediction
Ou et al. (2016) | Iron ore price | ELM | Offers accurate and rapid prediction of iron ore prices

* 1DCNN one-dimensional convolutional neural network; BiLSTM bidirectional long short-term memory; CNN convolutional neural network; LSTM long short-term memory; MTXGBoost multiple timeframes extreme gradient boosting; SVR support vector regression; ANN artificial neural network; GEP gene expression programming; ANFIS adaptive neuro–fuzzy inference system; RNN recursive neural network; RMSE root mean square error; BN Bayesian network; GMDH group method of data handling; MLP multilayer perceptron; ELM extreme learning machine
Nasir et al. (2023) developed a novel EL model
that combines local mean decomposition (LMD),
ARIMA, and LSTM models to predict crude oil
prices. Ke et al. (2023) designed a hybrid model
that integrates the EEMD technique with the
LSTM model to reduce the volatility feature of
commodity futures gold and copper prices. Fig-
ueiredo & Saporito (2023) proposed an EL model
that combines the VAR model with the LSTM
neural network to investigate the influence of the
dynamic Nelson–Siegel and the Schwartz–Smith
factor of future gold prices. Yang et al. (2023a,
2023b) proposed an EL model that integrates
LSTM, ANN, and support vector machine (SVM)
to adjust the influence of price fluctuation on gas
importers and institutions. To investigate the
potential growth and cyclical fluctuations of the
copper company, Zhao et al. (2023) developed an EL model to predict future monthly copper prices by using different methods, including MLP, SVR, and extreme gradient boosting (XGBoost).
To reduce the unpredictability of fiscal dispensation and the speculative market's exacerbation, Antwi et al. (2022) applied an EMD method to
decompose the data of crude oil and gold prices.
They designed a hybrid model that combines the
back-propagation neural network with the ARIMA
model to predict crude oil and gold prices. Wei
et al. (2022) designed an improved complete
EEMD with adaptive noise (ICEEMDAN) ap-
proach with VAR technique to investigate the
Granger causality and characteristics of iron ore
daily spot and futures prices. To reduce the com-
prehensive effects of market supply and demand on
speculative trading and other factors, Hu (2021)
developed an EL model for crude oil prices that combines complete EEMD with adaptive noise (CEEMDAN) and LSTM with an attention mechanism. Jabeur et al. (2021) proposed a Shapley
additive explanations (SHAP) model with an
XGBoost algorithm to predict gold prices for
making the correct decisions for financial institutions and mining companies. Li et al. (2021) designed an EL model that combines bidirectional
gated recurrent unit (BiGRU) with variational
mode decomposition (VMD) to predict gold prices
for analyzing the influence of the internal and
exterior factors within the gold futures market
trends. Lin et al. (2021) proposed an EEMD model
with the K-nearest neighbor (KNN) method to
predict gold prices to further discuss the influence
on the internal and exterior market environment.
Table 3. Characteristics analysis of ensemble learning models for commodities

Authors (year) | Data type | Methods | Results
Dong et al. (2023) | Crude oil prices | ARIMA, LSTM | Presents better RMSE and MAE results in oil price prediction
Nasir et al. (2023) | Crude oil prices | ARIMA, LSTM | Outperforms the single models in accuracy
Ke et al. (2023) | Gold, copper prices | LSTM | Achieves better predictive performance than other models
Figueiredo and Saporito (2023) | Oil, copper prices | LSTM | Shows better RMSE and MAPE results than other models
Yang et al. (2023a, 2023b) | Natural gas prices | LSTM, ANN, SVM | Achieves higher accuracy than other models
Zhao et al. (2023) | Copper prices | MLP, SVR | Performs well in monthly copper price prediction
Antwi et al. (2022) | Crude oil, gold prices | BPNN, ARIMA | Shows better MAPE and MAE results in forecasting prices
Wei et al. (2022) | Iron ore price | VAR | Achieves high performance in iron ore price prediction
Hu (2021) | Crude oil prices | LSTM, ADD | Outperforms other methods for forecasting prices
Jabeur et al. (2021) | Gold prices | SHAP | Shows better MAPE and MAE results than other models
Li et al. (2021) | Gold prices | BiGRU | Outperforms several other trading strategies
Lin et al. (2021) | Gold prices | KNN | Outperforms other comparison models
Tuo and Zhang (2020) | Iron ore price | GORU | Provides better ability to predict iron ore prices

* ARIMA autoregressive integrated moving average; LSTM long short-term memory; ANN artificial neural network; SVM support vector machine; MLP multilayer perceptron; SVR support vector regression; BPNN back-propagation neural network; VAR vector autoregression; ADD attention; SHAP Shapley additive explanations; BiGRU bidirectional gated recurrent unit; KNN K-nearest neighbor; GORU gated orthogonal recurrent unit; RMSE root mean square error; MAE mean absolute error; MAPE mean absolute percentage error

Tuo and Zhang (2020) designed an EEMD technique
with a gated orthogonal recurrent unit (GORU)
neural network to predict iron ore prices for risk
management at mining enterprises and institutions.
The main characteristics related to ensemble
learning models for bulk commodities prediction are
analyzed in Table 3. Current studies on iron ore price forecasting rarely use two or more decomposition techniques to decompose iron ore price data. Moreover, these EL methods only decompose univariate features and thus ignore other factors affecting the iron ore price. The fluctuation of iron ore price data is jointly affected by a variety of factors, and a single decomposition is not detailed enough for an in-depth analysis of the iron ore price data. Most studies on EL models considered only univariate price data. In addition, most of them used single-scale data on bulk commodities (e.g., crude oil, copper, and gold), and only two papers studied iron ore price prediction. Therefore, it is hard for these studies to effectively capture the overall trend of iron ore prices.
[Figure 2 shows the four-echelon structure: the data decomposition echelon (a modal decomposition layer producing IMF1–IMFn from the iron ore prices with two types of datasets); the component prediction echelon (an encoder–decoder network of CNN, pooling, flatten, and fully connected layers, stacked LSTM layers, and an attention mechanism layer); the tandem optimization echelon (tandem model trained with the Rectified Adam algorithm); and the tandem evaluation echelon (model evaluation against ten comparison models under five evaluation metrics, producing the iron ore prediction).]

Figure 2. The structure of the proposed METL model.
[Figure 3 shows a flowchart: Step 1 initializes {û_k}, {ω_k}, and λ; Step 2 updates {û_k} according to Eq. (3); Step 3 updates {ω_k} according to Eq. (4); Step 4 updates λ according to Eq. (5); the loop repeats over modes k ≤ K and iterates until convergence, then outputs {û_k} and {ω_k}.]

Figure 3. The main process of the VMD model.
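The loop summarized in Figure 3 can be sketched in a few lines of numpy. The simplified implementation below is illustrative only: it omits the mirror extension and tolerance-based stopping criterion of full VMD implementations, and "Eqs. (3)–(5)" refer to the mode, center-frequency, and multiplier updates named in Figure 3.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.1, n_iter=300):
    """Simplified VMD: ADMM-style updates of Eqs. (3)-(5) in the Fourier domain."""
    T = len(signal)
    freqs = np.fft.fftfreq(T)                 # normalized frequency grid
    f_hat = np.fft.fft(signal)
    f_hat[freqs < 0] = 0.0                    # keep the one-sided (analytic) spectrum
    u_hat = np.zeros((K, T), dtype=complex)   # mode spectra
    omega = np.linspace(0.05, 0.45, K)        # initial center frequencies
    lam = np.zeros(T, dtype=complex)          # Lagrange multiplier
    pos = freqs > 0
    for _ in range(n_iter):
        for k in range(K):
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Eq. (3): Wiener-filter update of mode k around its center frequency
            u_hat[k] = (residual + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Eq. (4): center frequency = power-weighted mean over positive frequencies
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / (np.sum(power) + 1e-12)
        # Eq. (5): dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

For a signal of two well-separated tones, the recovered center frequencies should land near the true tone frequencies, isolating the low- and high-frequency components; this is the volatility-reducing behavior the data decomposition echelon relies on.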
Research Gaps
Based on the above literature review, several
main research gaps are identified as follows.
Econometric models are ill-suited to predicting iron ore prices because they rely on considerable assumptions and parameter tuning. DL models (e.g., ANN, CNN, or LSTM) are inclined to over-fit over time because they are single-echelon models; thus, they cannot be guaranteed to behave consistently across different forecasting scenarios, especially in the case of iron ore prices. Most ensemble learning models are onerous to train, as a considerable number of weights associated with several neural network layers must be fine-tuned; thus, their convergence may slow down sharply when multicollinearity exists. In general, existing econometric, DL, and ensemble learning models are unsuitable for predicting iron ore prices. Therefore, this study aimed to propose a hybrid prediction model called METL, which combines VMD, MCNN, SLSTM network, and AT in tandem.
MULTI-ECHELON TANDEM LEARNING
METHODOLOGY
Figure 2 presents the structure of the proposed METL model, which contains four echelons: the data decomposition echelon (detailed in the first subsection below), the component prediction echelon (second subsection), the tandem optimization echelon (third subsection), and the tandem evaluation echelon (fourth subsection). In the first echelon, the data decomposition echelon takes the iron ore prices with two types of datasets as input. Then, the modal decomposition layer decomposes the iron ore data into intrinsic mode functions (IMFs) and residuals of the sequence components. In the second echelon, the component prediction echelon extracts the features of the decomposed data together with multivariate factors related to the iron ore price via the encoder–decoder network. In the third echelon, the tandem optimization echelon trains the component
prediction echelon. Finally, the tandem evaluation echelon assesses the complete prediction result.

Figure 4. Structure of encoder–decoder network in the component prediction echelon.
Data Decomposition Echelon
The data decomposition echelon takes the original data as input and then uses signal decomposition technology to decompose them into sub-sequences with relatively simple frequency components. The VMD method, an adaptive signal processing method, is used to construct the modal decomposition layer. The original data can be decomposed into the IMFs and residuals of the sequence by the VMD method. The VMD method aims to decompose the real-valued input signal $f$ into a finite number $K$ of discrete sub-modes $u_k$. Each mode $u_k(x)$ is mostly compact around its frequency center $\omega_k$. The estimated bandwidth pattern of the variational optimization model is:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_k \left\| \partial_x \left[ \left( \delta(x) + \frac{j}{\pi x} \right) * u_k(x) \right] e^{-j\omega_k x} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(x) = f \tag{1}$$
where $\delta(x)$ is the unit pulse function, $\partial_x$ is the partial derivative operator, $*$ is the convolution operator, $\{u_k\}$ is the set of modal components, $\{\omega_k\}$ is the set of corresponding frequency centers, and $K$ is the number of decomposed modes. The optimal solution of the above problem is found by constructing the augmented Lagrangian function, thus:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_k \left\| \partial_x \left[ \left( \delta(x) + \frac{j}{\pi x} \right) * u_k(x) \right] e^{-j\omega_k x} \right\|_2^2 + \left\| f(x) - \sum_k u_k(x) \right\|_2^2 + \left\langle \lambda(x),\, f(x) - \sum_k u_k(x) \right\rangle \tag{2}$$
where $\alpha$ denotes the quadratic penalty factor and $\lambda$ denotes the Lagrange multiplier. The alternating direction method of multipliers is used to alternately update $u_k$, $\omega_k$, and $\lambda$, respectively, as:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^n(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k^n\right)^2} \tag{3}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega\, |\hat{u}_k(\omega)|^2\, d\omega}{\int_0^{\infty} |\hat{u}_k(\omega)|^2\, d\omega} \tag{4}$$
$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^n(\omega) + \gamma \left( \hat{f}(\omega) - \sum_k \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$
where $\gamma$ denotes the noise capacity, and $\hat{u}_i^{n+1}(\omega)$, $\hat{u}_i(\omega)$, $\hat{f}(\omega)$, and $\hat{\lambda}(\omega)$ are the Fourier transforms of $u_i^{n+1}(x)$, $u_i(x)$, $f(x)$, and $\lambda(x)$, respectively. Figure 3 shows the process of the VMD method, which contains four main steps. Convergence is expressed as $\sum_k \|\hat{u}_k^{n+1} - \hat{u}_k^n\|_2^2 / \|\hat{u}_k^n\|_2^2 < \epsilon$, where $n$ is the number of iterations and $\epsilon$ is the predetermined error.
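The update loop of Eqs. (3)–(5) can be sketched in numpy as follows. This is a minimal illustration, not the paper's implementation: the mirror (boundary) extension and analytic-signal construction of canonical VMD are omitted, the symmetric frequency penalty `(|ω| − ω_k)²` is an assumption that keeps the modes real-valued, and the parameter values (`alpha`, `gamma`, the initial centers) are illustrative.

```python
import numpy as np

def vmd(f, K=5, alpha=2000.0, gamma=0.1, eps=1e-6, max_iter=500):
    """Minimal VMD sketch: decompose signal f into K modes via Eqs. (3)-(5)."""
    T = len(f)
    freqs = np.fft.fftfreq(T)                    # normalized frequency axis
    f_hat = np.fft.fft(f)
    u_hat = np.zeros((K, T), dtype=complex)      # modes in the Fourier domain
    omega = np.linspace(0.0, 0.4, K)             # initial frequency centers (assumed)
    lam_hat = np.zeros(T, dtype=complex)         # Lagrange multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Eq. (3): Wiener-filter update; |freqs| keeps the filter symmetric
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1.0 + 2.0 * alpha * (np.abs(freqs) - omega[k]) ** 2)
            # Eq. (4): center of gravity of the mode's power spectrum
            half = slice(0, T // 2)
            power = np.abs(u_hat[k][half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        # Eq. (5): dual ascent on the reconstruction constraint
        lam_hat = lam_hat + gamma * (f_hat - u_hat.sum(axis=0))
        diff = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if diff < eps:                           # convergence criterion above
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

Applied to a toy two-tone signal, the returned modes separate the low- and high-frequency oscillations, mirroring the low-to-high ordering of the IMFs in Figure 7.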
Figure 5. Process of feature extraction in one-dimensional CNN.
Component Prediction Echelon
After the data decomposition echelon, the iron
ore price data are decomposed into subsequences
with simpler frequency components. In fact, it is difficult to extract features from the subsequence data because the subsequences may have different sizes and dimensions in the input and output layers. Therefore, we propose an encoder–decoder network to construct the component prediction echelon. The structure of the component prediction echelon contains two main modules (i.e., encoder and decoder networks; Fig. 4). The MCNN and SLSTM are applied as an encoder–decoder structure to extract the nonlinear features of each sub-mode.
The encoder network structure is composed of
input, convolutional, pooling, flatten, and fully con-
nected layers. The input layer contains the subse-
quence modes of various frequencies from the VMD
method. Each sub-mode data is flattened, con-
nected, and reshaped before entering the decoder
network structure. The MCNN is the key compo-
nent of the encoder network structure, which ex-
tracts independent convolutional features in
parallel. These independent convolutional features incorporate spatial information from the sub-mode data at a lower dimensionality than the original data. The core of the MCNN model is the convolutional layer, which performs convolution
operations (Yang et al., 2023a, 2023b). The convolution operation is a filtering process, so the convolution kernel is also called a filter. The input feature is represented as $x$, which convolves over different channels of the feature. The convolutional layer performs a linear operation that involves multiplication by a set of weights. The output of the convolution layer is called the feature map. The convolutional layer $C$ is expressed as:

$$C = \sigma_{\mathrm{sig}}\left(\sum_i w_i * x_i\right) \tag{6}$$

where $x_i$ represents the $i$th channel of the input, $w_i$ represents the $i$th channel of the convolution kernel, $*$ denotes the convolution operation, and $\sigma_{\mathrm{sig}}$ represents the sigmoid activation function. Subsequently, the pooling layer is used to compress the feature data from the convolutional layer. Then, the flatten layer can be applied to remove redundant feature information from the pooling layer. The fully connected layer aggregates the spatial feature information to produce the variable-size data. The MCNNs apply
several filters to map features in parallel.

Figure 6. Structure of the decoder network.

Specifically, the filter is applied to each overlapping part of
the input data. Every channel of the feature map has
a unique set of weights for the filter, which are
shared spatially. The fully connected layer $v$ is computed from the convolutional layer $C$, thus:

$$v = \sigma_{\mathrm{sig}}\left(W_C * C + b_C\right) \tag{7}$$

where $\sigma_{\mathrm{sig}}$ represents the sigmoid activation function, $W_C$ represents the weight of the filter, $*$ denotes the convolution operation, and $b_C$ denotes the threshold. The one-dimensional CNN reads the input feature and creates a thought vector $v$, whereby it converts long sequences into shorter sequences with a vector representation. The process of feature extraction (also called feature sequence segment extraction) by the filter is illustrated in Figure 5.
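The convolution–pooling pipeline of Eqs. (6)–(7) and Figure 5 can be sketched in numpy as follows. The filter values, the single-channel input, and the max-pooling choice are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d_layer(x, kernels, bias):
    """Feature extraction in the spirit of Eqs. (6)-(7): slide each filter
    over the sequence, sum the products, add a bias, and squash with the
    sigmoid activation to produce a feature map."""
    n, k = len(x), kernels.shape[1]
    # valid convolution: one output per window position per filter
    windows = np.stack([x[i:i + k] for i in range(n - k + 1)])   # (n-k+1, k)
    return sigmoid(windows @ kernels.T + bias)                   # (n-k+1, filters)

def max_pool(feature_map, size=2):
    """Pooling layer: compress the feature map along the time axis."""
    trimmed = feature_map[: len(feature_map) // size * size]
    return trimmed.reshape(-1, size, feature_map.shape[1]).max(axis=1)

x = np.sin(0.3 * np.arange(32))                  # toy sub-mode sequence
kernels = np.array([[1.0, -1.0], [0.5, 0.5]])    # two filters of width 2 (assumed)
fm = conv1d_layer(x, kernels, bias=0.0)
pooled = max_pool(fm)
```

Here the first filter responds to local changes and the second to local averages, illustrating how parallel filters in the MCNN extract different spatial features from the same sub-mode.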
These one-dimensional CNNs consider each input sequence segment separately. Many one-dimensional CNN models can be stacked together to understand long-term sequence segments. However, when a part of a feature is missing, one-dimensional CNN models are not very good at generating the missing part during sequence processing. To identify long-term patterns and generate a missing part of a feature, RNNs are usually applied to predict long-term sequence segments by reading the current portion of the original information. However, RNN-based models suffer from the vanishing and exploding gradients problems and fail to learn long-term dependencies. The LSTM network is an improvement over the RNN model (Hochreiter & Schmidhuber, 1997).
The structure of the decoder network contains
five hidden layers (i.e., four LSTM layers and one
fully connected layer) between the fully connected
and AT layers (Fig. 6). The decoder network ex-
tracts the long-term and temporal features for the
next layer based on the long-term dependencies of
each feature row by row and the most recent output
from the preceding timestamp. The fully connected
layer feeds the input of the encoder network to the
decoder network at each time step. The multilayer
LSTM network is also called the SLSTM-based decoder network. Each hidden layer's neurons have a weight array whose size equals the number of neurons in the preceding LSTM layer. The last hidden LSTM layer sends the long-term and temporal features to the fully connected layer. The fully connected layer is a single neuron with a linear activation function that connects to the AT layer.
The structure of a single LSTM unit contains
three gates (i.e., forget, input, and output gates). The
decoder network has access to the original data through the fully connected layer when the encoder network misses any information that should be captured. The SLSTM layers learn the sequential data and recover relationships from the information lost at the fully connected layer. The LSTM layer overcomes the vanishing and exploding gradients problems by introducing recurrent gates (also called forget gates) (Yin et al., 2022).
layers are capable of learning long-term dependen-
cies. Below, we provide an overview of a single
LSTM unit in the SLSTM layer.
A typical LSTM unit has three gates that reg-
ulate the flow of information into and out of the cell:
a forget gate; an input gate; and an output gate. The
forget gate decides which information the network
needs to retain or forget from the cellÕs state. The
input gate determines which new information the
network needs to record in the cellÕs state.
Depending on the state of the cell, the output gate
determines the output. Next, we briefly describe the forget gate $F_t$, input gate $I_t$, and output gate $O_t$ at time step $t$. We use $W_F$, $W_I$, $W_O$, and $W_A$ to denote the weights of the forget gate, input gate, output gate, and cell state, respectively, and $b_F$, $b_I$, $b_O$, and $b_A$ to denote the corresponding biases.
The forget gate discards input information passed from the previous node, because the network does not need to remember all feature information, some of which is unimportant. The forget gate takes the input $X_t$ and the hidden state $h_{t-1}$, and outputs a value $F_t$, which determines what information from the cell state $C_{t-1}$ to retain or forget, thus:

$$F_t = \sigma_{\mathrm{sig}}\left(W_F X_t + W_F h_{t-1} + W_F \odot v + b_F\right) \tag{8}$$

where $\sigma_{\mathrm{sig}}$ is the sigmoid function, $v$ represents the thought vector, and $\odot$ is the element-wise multiplication operation. The sigmoid is defined as:

$$\sigma_{\mathrm{sig}} = \frac{1}{1 + e^{-X}} \tag{9}$$
The input gate takes the input $X_t$ and the hidden state $h_{t-1}$ to compute $I_t$ in the cell state $C_t$, thus:

$$I_t = \sigma_{\mathrm{sig}}\left(W_I X_t + W_I h_{t-1} + W_I \odot v + b_I\right) \tag{10}$$

Then, the input gate takes the input $X_t$ and the hidden state $h_{t-1}$ to compute the current data $A_t$ in the cell state $C_t$ (i.e., what current data will be kept in the cell state), thus:

$$A_t = \sigma_{\tanh}\left(W_A X_t + W_A h_{t-1} + W_A \odot v + b_A\right) \tag{11}$$

where $\sigma_{\tanh}$ is the hyperbolic tangent function, which is defined as:

$$\sigma_{\tanh} = \frac{e^{X} - e^{-X}}{e^{X} + e^{-X}} \tag{12}$$

Then, the state of the cell $C_t$ is updated as:

$$C_t = F_t \odot C_{t-1} + I_t \odot A_t \tag{13}$$

The output gate $O_t$ helps to determine what portion of the cell state $C_t$ will be output, thus:

$$O_t = \sigma_{\mathrm{sig}}\left(W_O X_t + W_O h_{t-1} + W_O \odot v + b_O\right) \tag{14}$$

The output of the LSTM unit $H_t$ is computed using the output gate $O_t$ and the cell state $C_t$, thus:

$$H_t = O_t \odot \sigma_{\tanh}(C_t) \tag{15}$$
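A single LSTM step following Eqs. (8)–(15) can be sketched in numpy as below. The paper writes one weight symbol per gate; here each gate uses one weight matrix over the concatenation of the input, the previous hidden state, and the thought vector $v$, which is the usual LSTM parameterization and an assumption for shape consistency. All dimensions are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, v, p):
    """One LSTM unit step, Eqs. (8)-(15), with the encoder's thought
    vector v entering every gate alongside x_t and h_{t-1}."""
    z = np.concatenate([x_t, h_prev, v])
    f_t = sigmoid(p["W_F"] @ z + p["b_F"])     # forget gate, Eq. (8)
    i_t = sigmoid(p["W_I"] @ z + p["b_I"])     # input gate, Eq. (10)
    a_t = np.tanh(p["W_A"] @ z + p["b_A"])     # candidate cell data, Eq. (11)
    c_t = f_t * c_prev + i_t * a_t             # cell state update, Eq. (13)
    o_t = sigmoid(p["W_O"] @ z + p["b_O"])     # output gate, Eq. (14)
    h_t = o_t * np.tanh(c_t)                   # unit output, Eq. (15)
    return h_t, c_t

# Hypothetical dimensions for illustration
rng = np.random.default_rng(0)
d_in, d_h = 3, 4
z_dim = d_in + 2 * d_h                         # x_t, h_prev, and v concatenated
p = {f"W_{g}": rng.normal(scale=0.1, size=(d_h, z_dim)) for g in "FIAO"}
p.update({f"b_{g}": np.zeros(d_h) for g in "FIAO"})
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), np.zeros(d_h), p)
```

Stacking several such layers, with each layer's `h_t` sequence feeding the next, yields the SLSTM-based decoder described above.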
Tandem Optimization Echelon
The tandem optimization echelon contains two
main modules (i.e., attention mechanism and recti-
fied Adam algorithm).
Attention Mechanism Layer
The AT has been established for sequence modeling in natural language processing in recent years (Yin et al., 2023). The AT can concentrate limited attention on the most crucial aspects of a situation while disregarding less significant details (Mahmoud et al., 2023). The AT allows rapid sifting through vast amounts of historical input data and can then identify the information that is most important for the forecast. Thus, we designed the AT structure to integrate the spatial–temporal feature information. At each time step t, the AT layer can scale the
Table 4. Pseudo-code of the RAdam algorithm

Input: step size $\alpha_t$; decay rates $\beta_1$, $\beta_2$ for the moving average and the moving 2nd moment; stochastic objective function $f(\theta)$
Output: $\theta_t$
1. Initialize $m_0 \leftarrow 0$, $v_0 \leftarrow 0$
2. $\rho_\infty \leftarrow 2/(1-\beta_2) - 1$
3. For $t = 1$ to $T$ do
4. $\quad g_t \leftarrow \nabla_\theta f_t(\theta_{t-1})$
5. $\quad m_t \leftarrow \beta_1 m_{t-1} + (1-\beta_1) g_t$
6. $\quad v_t \leftarrow \beta_2 v_{t-1} + (1-\beta_2) g_t^2$
7. $\quad \hat{m}_t \leftarrow m_t / (1-\beta_1^t)$
8. $\quad \rho_t \leftarrow \rho_\infty - 2t\beta_2^t / (1-\beta_2^t)$
9. $\quad$ If $\rho_t > 4$, then
10. $\quad\quad l_t \leftarrow \sqrt{(1-\beta_2^t)/v_t}$
11. $\quad\quad r_t \leftarrow \sqrt{\dfrac{(\rho_t-4)(\rho_t-2)\rho_\infty}{(\rho_\infty-4)(\rho_\infty-2)\rho_t}}$
12. $\quad\quad \theta_t \leftarrow \theta_{t-1} - \alpha_t r_t \hat{m}_t l_t$
13. $\quad$ Else
14. $\quad\quad \theta_t \leftarrow \theta_{t-1} - \alpha_t \hat{m}_t$
15. Return $\theta_t$
previous hidden state and current state. Subsequently, the AT layer applies the state vector to generate a non-normalized score through the calculated weight. Finally, the AT layer normalizes the generated scores. The AT layer generates weights that represent the relative significance of the input features. In general, key-value pairs can be used to represent the input features, so that they can be described as a vector $(K,V) = [(k_1,v_1), \ldots, (k_m,v_m), \ldots, (k_T,v_T)]$ with length $T$, where $K$ is applied to calculate the attention distribution and $V$ is used to calculate the aggregated information. The weighted sum output $\mathrm{Att}((K,V),Q)$ is employed to obtain the attention value, thus:

$$\mathrm{Att}\left((K,V),Q\right) = \sum_{m=1}^{T} a_m v_m \tag{16}$$

where $T$ represents the capacity of the key-value vectors, and $a_m$ is the normalized attention weight of the $m$th key-value vector, thus:

$$a_m = \frac{\exp\left(s(k_m,Q)\right)}{\sum_{j=1}^{T} \exp\left(s(k_j,Q)\right)} \tag{17}$$

where $s(k_m,Q)$ represents the attention score.
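Eqs. (16)–(17) can be sketched in a few lines of numpy. The dot-product score $s(k_m, Q) = k_m \cdot Q$ is an assumption for illustration; the paper does not specify the score function.

```python
import numpy as np

def attention(K, V, q):
    """Scaled weighting per Eqs. (16)-(17): score each key against the
    query, softmax-normalize the scores, and return the weighted sum of
    the values. A dot-product score is assumed."""
    scores = K @ q                               # s(k_m, q) for m = 1..T
    scores = scores - scores.max()               # numerical stability shift
    a = np.exp(scores) / np.exp(scores).sum()    # Eq. (17)
    return a @ V, a                              # Eq. (16) and the weights

# Hypothetical key/value/query tensors
T_, d_ = 5, 3
rng = np.random.default_rng(1)
out, weights = attention(rng.normal(size=(T_, d_)), rng.normal(size=(T_, d_)),
                         rng.normal(size=d_))
```

The weight vector sums to one, so the output is a convex combination of the value vectors, emphasizing the time steps whose keys best match the query.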
Rectified Adam Algorithm
To avoid getting trapped in local optima, a
rectified adaptive moment estimation (RAdam) (Liu
et al., 2019) training algorithm is adopted to improve
the accuracy and convergence of the proposed
METL model. When the amount of input data is
large and multicollinear, the training speed of the
encoder–decoder network in the proposed METL
model will sharply decrease by using the traditional
gradient descent algorithms, such as adaptive mo-
ment estimation (Adam) algorithm (Kingma & Ba,
2015). We describe the limitations of Adam and how
RAdam improves it as follows.
Adam is one of the most popular optimization
algorithms, and it improves upon the classical gra-
dient descent algorithm by combining gradient des-
cent with momentum. However, the Adam
algorithm may converge into local optima if a
warmup method is not used. Intuitively, the warmup
method uses a much lower learning rate during the
initial period of training, which helps to offset
excessive variance at the beginning. The RAdam
algorithm addresses this limitation of the Adam
algorithm by inserting a rectifier term that rectifies
the variance of the adaptive learning rate.
The pseudo-code of the RAdam algorithm is shown in Table 4. The input parameters are the step size $\alpha_t$ and the decay rates $\beta_1$ and $\beta_2$, which are used to generate the moving average and the moving 2nd moment of a stochastic objective function. The output returns the parameters $\theta_t$. The RAdam algorithm initializes the moving first moment $m_0$ and the moving second moment $v_0$ to zero (line 1). The RAdam algorithm computes $\rho_\infty$, which refers to the maximum length of the approximated simple moving average (line 2). The exponential moving average was demonstrated to be a close approximation of the simple moving average. The RAdam algorithm then adaptively computes the learning rate for each time step $t$ as follows. The RAdam algorithm updates the exponential moving 2nd moment $v_t$ and the exponential moving 1st moment $m_t$, as shown in lines 5 and 6, where $g_t$ corresponds to the stochastic gradient for time step $t$ (line 4). It was noted by Kingma & Ba (2015) that the estimators in the Adam algorithm are biased toward zero when the averages are initialized with zeros, especially when the decay rates are low. Hence, the RAdam algorithm applies a bias correction technique by computing the bias-corrected moving average $\hat{m}_t$ (line 7), which is later used to
Table 5. Data used in our experiments

Indicators | Units | Data sources | Time terms
Dataset A: Iron ore price in DCFE | CNY/ton | Investing database | Daily-based
Dataset A: Exchange rate | USD/RMB | Wind database | Daily-based
Dataset A: Iron ore index | Index (Jan. 2005 = 100) | Prospective database | Daily-based
Dataset B: Iron ore price in SGE | $/ton | Investing database | Weekly-based
Dataset B: Exchange rate | USD/RMB | Wind database | Weekly-based
Dataset B: Iron ore index | Index (Jan. 2005 = 100) | Prospective database | Weekly-based
update the parameters. The RAdam algorithm also computes $\rho_t$, which is the length of the approximated simple moving average (line 8). Using the above-mentioned values, the RAdam algorithm addresses the variance problem as follows. If $\rho_t > 4$, the variance of the adaptive learning rate is tractable and the parameters are updated with adaptive momentum (lines 9 to 12). Specifically, the RAdam algorithm first computes the adaptive learning rate $l_t$ (line 10) and the variance rectification term $r_t$ (line 11), which are then used to update the parameters (line 12). Otherwise (i.e., during the warmup phase), the parameters are updated with unadapted momentum, without using the adaptive learning rate and the rectification term (line 14).
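A single-parameter RAdam update mirroring Table 4 can be sketched as follows. This is a minimal illustration: the default decay rates (`beta1=0.9`, `beta2=0.999`) follow common practice rather than this paper's settings, and `grad_fn` is a hypothetical gradient callback.

```python
import numpy as np

def radam_step(theta, grad_fn, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam parameter update (Liu et al., 2019), line numbers per Table 4."""
    t = state["t"] = state.get("t", 0) + 1
    g = grad_fn(theta)                                                  # line 4
    m = state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g      # line 5
    v = state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g**2   # line 6
    m_hat = m / (1 - beta1**t)                                          # line 7
    rho_inf = 2.0 / (1 - beta2) - 1.0                                   # line 2
    rho_t = rho_inf - 2 * t * beta2**t / (1 - beta2**t)                 # line 8
    if rho_t > 4:                                                       # variance tractable
        l_t = np.sqrt((1 - beta2**t) / (v + eps))                       # line 10
        r_t = np.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf /
                      ((rho_inf - 4) * (rho_inf - 2) * rho_t))          # line 11
        theta = theta - lr * r_t * m_hat * l_t                          # line 12
    else:
        theta = theta - lr * m_hat                                      # line 14 (warmup)
    return theta, state

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2*theta
state, theta = {}, 5.0
for _ in range(2000):
    theta, state = radam_step(theta, lambda th: 2.0 * th, state, lr=0.1)
```

With `beta2=0.999`, the rectified branch only activates from the fifth step onward, which is exactly the built-in warmup behavior the surrounding text describes.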
Tandem Evaluation Echelon
The tandem evaluation echelon contains 10
comparison models including the CNN, MCNN,
LSTM, BiLSTM, SLSTM, CNN with SLSTM
(CNN–SLSTM), MCNN with SLSTM (MCNN–
SLSTM), MCNN with SLSTM and AT (MCNN–
SLSTM–AT), CNN with SLSTM and AT (CNN–
SLSTM–AT), and METL, which are detailed in the
next section. In addition, the tandem evaluation
echelon also contains five evaluation metrics, including the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), the R-square (R²), and the improvement rate (IR). These metrics are defined, respectively, as:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{N}} \tag{18}$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} \left|y_i - \hat{y}_i\right|}{N} \tag{19}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \tag{20}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \tag{21}$$
where $N$ is the number of test cases, $y_i$ and $\hat{y}_i$ are the normalized actual and predicted iron ore prices, respectively, for the $i$th test case, and $\bar{y}$ represents the average of the actual values. The IR is obtained from the evaluation scores of the proposed METL model and the 10 comparison models. The IRs of RMSE, MAE, MAPE, and R² can be expressed as:
Table 7. Correlations of datasets

                            |              | Exchange rate | Iron ore index
Dataset A (iron ore price)  | Pearson's r  | 0.101**       | 0.851*
                            | Kendall's r  | 0.121**       | 0.707*
Dataset B (iron ore price)  | Pearson's r  | 0.202**       | 0.873*
                            | Kendall's r  | 0.233**       | 0.752*

** Correlation is significant at the 0.01 level (2-tailed)
* Correlation is significant at the 0.05 level (2-tailed)
Table 6. Descriptive statistics of datasets

Data sources                    | Minimum | Maximum | Mean   | Std.   | CV (%)
Dataset A: Iron ore price       | 201     | 1315    | 637.79 | 201.07 | 31.52
Dataset B: Iron ore price       | 38.65   | 219.77  | 93.18  | 36.72  | 39.41
Exchange rate (daily-based)     | 6.019   | 7.337   | 6.625  | 0.310  | 4.680
Exchange rate (weekly-based)    | 5.481   | 8.580   | 7.190  | 0.930  | 12.930
Iron ore index (daily-based)    | 41.8    | 225.6   | 89.5   | 34.1   | 38.1
Iron ore index (weekly-based)   | 57.4    | 236.4   | 113.7  | 35.0   | 30.8
$$\mathrm{IR}_{\mathrm{metric}} = \frac{\mathrm{metric}_p - \mathrm{metric}_b}{\mathrm{metric}_b} \times 100\% \tag{22}$$

where $\mathrm{metric}_p$ represents an indicator (i.e., RMSE, MAE, MAPE, or R²) of the proposed METL model, while $\mathrm{metric}_b$ represents the corresponding indicator of one of the 10 comparison models.
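The five metrics of Eqs. (18)–(22) translate directly into numpy:

```python
import numpy as np

def evaluate(y, y_hat):
    """Compute the four accuracy metrics of Eqs. (18)-(21) on (normalized)
    actual values y and predictions y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))                             # Eq. (18)
    mae = np.mean(np.abs(y - y_hat))                                      # Eq. (19)
    mape = np.mean(np.abs((y - y_hat) / y)) * 100.0                       # Eq. (20)
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)     # Eq. (21)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

def improvement_rate(metric_p, metric_b):
    """Improvement rate of the proposed model over a baseline, Eq. (22);
    negative values on RMSE/MAE/MAPE indicate the proposed model is better."""
    return (metric_p - metric_b) / metric_b * 100.0
```

For instance, `improvement_rate(3.947, 11.283)` reproduces the METL-versus-LSTM RMSE improvement on Dataset A discussed in the results.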
COMPUTATIONAL EXPERIMENTS
We present the experimental datasets in the first subsection below, establish the parameter settings of the experiments in the second subsection, and analyze the results in the third subsection. All models were implemented in PyCharm using TensorFlow 1.15.0, and the experiments were run on a PC with an Intel(R) Core(TM) i7 CPU and 32 GB of RAM.
Dataset Description
Computational experiments were conducted
based on daily and weekly iron ore price datasets.
Dataset A records the iron ore price data of the
Dalian Commodity Futures Exchange (DCFE) from
January 2, 2014, to June 30, 2023, with exchange rate
and iron ore index. We chose the daily settlement
price of the continuous iron ore futures contract of
DCFE as the iron ore futures price because the
DCFE is the world's largest market for iron ore futures trading volume. These data were divided into a
training set and a testing set. The first 1,848 days of
the price data were used as the training set, and the
prices quoted in the remaining 463 days were used as
the testing set.
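The chronological split described above can be sketched as follows; the placeholder series stands in for the real DCFE price data, which are not reproduced here.

```python
import numpy as np

def chronological_split(series, n_train):
    """Chronological train/test split: the first n_train observations train
    the model, the remainder test it. No shuffling, so no future
    information leaks into the training set."""
    series = np.asarray(series)
    return series[:n_train], series[n_train:]

# Dataset A: 2,311 daily prices -> 1,848 training days and 463 testing days
prices = np.arange(2311, dtype=float)   # placeholder for the real price series
train, test = chronological_split(prices, 1848)
```

The same function with `n_train=348` reproduces the weekly split of Dataset B described next.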
Dataset B records the iron ore price data with
the exchange rate and iron ore index of the Singa-
pore Exchange (SGE) from 2014 to 2023. The SGE
is the birthplace of international iron ore derivatives.
Index pricing is the main pricing method in the
current international iron ore trade market. The iron
ore index CFR North China (62% iron fines) ac-
counts for the largest proportion of iron ore trade
contracts under index pricing. The CFR (Cost and
Freight) is a term in which the seller assumes
responsibility for the cost and freight of shipping to
the destination port. We chose the weekly-based
iron ore price data in SGE, which ranged from
January 2014 to June 2023. These obtained data
were divided into a training set and a testing set. The
first 348 weeks of price data were used as the
training set, and the prices quoted in the remaining
149 weeks were used as the testing set. Table 5 details the data used in the experiments. The datasets
were collected from the Wind database (https://www.wind.com.cn/), the Prospective database (https://d.qianzhan.com/) and the Investing database (https://www.investing.com/), respectively. Specifically, we
used two categories of factors in the input data: ex-
change rates of USD/RMB, and the iron ore index.
Table 6 provides the descriptive statistics of the datasets, including the minimum, maximum and mean values, the standard deviation (Std.), and the coefficient of variation (CV). Table 7 shows the correlations of the iron ore prices with the other variables using the Pearson correlation coefficient and Kendall's rank correlation coefficient. It is observed that there exist statistically significant correlations (negative or positive) between the iron ore prices and the other variables.
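The two coefficients reported in Table 7 can be computed as below. The Kendall variant shown is the simple tau-a without tie correction, an assumption; published statistics packages typically apply tie-corrected variants.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two series."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

def kendall_tau(x, y):
    """Kendall's rank correlation (tau-a, no tie correction), O(n^2) sketch:
    count concordant minus discordant pairs over all n(n-1)/2 pairs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, s = len(x), 0.0
    for i in range(n):
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return 2.0 * s / (n * (n - 1))
```

Because Kendall's tau only uses rank order, it is less sensitive to the extreme price swings visible in Table 6 than Pearson's r, which is why both are reported.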
Table 8. Experimental parameters of the single-echelon and other categorized models
Models Layers Units Dropout Epoch Batches Learning rate
CNN 1 50 0.01 200 32 0.001
LSTM 1 50 0.01 200 32 0.001
BiLSTM 2 100 0.01 200 32 0.001
MCNN 2 100 0.01 200 32 0.001
SLSTM 3 150 0.01 200 32 0.001
CNN–SLSTM 4 200 0.05 500 64 0.0005
MCNN–SLSTM 5 250 0.05 500 64 0.0005
CNN–SLSTM–AT 5 300 0.09 1000 128 0.00009
MCNN–SLSTM–AT 6 350 0.09 1000 128 0.00009
METL 7 350 0.09 1000 128 0.00009
Hyperparameter Tuning
A series of single-echelon and other categorized models was constructed for cross-comparison of performance: LSTM, BiLSTM, SLSTM, CNN, MCNN, CNN–SLSTM, MCNN–SLSTM, MCNN–SLSTM–AT, CNN–SLSTM–AT, and METL. Among them, the LSTM, BiLSTM, and SLSTM networks reflect the impact of different long short-term memory layers on prediction performance. This set of experiments provides a basis for selecting the best long short-term memory layers for this study. Subsequently, the comparison between MCNN and CNN reflects how the MCNN affects feature extraction. These experiments provide a basis for selecting the multi-head layer for this study. Similarly, the comparison between CNN–SLSTM and MCNN–SLSTM reflects how the multilayer model further improves accuracy and convergence in prediction performance. Then, the
Figure 7. Decomposition results for (a) Dataset A and (b) Dataset B.
Table 9. Comparative analysis of the single-echelon and other categorized models under four metrics

Models | Dataset A: RMSE (%), MAE (%), MAPE (%), R² | Dataset B: RMSE (%), MAE (%), MAPE (%), R²
LSTM 11.283 8.255 5.891 0.917 11.341 8.462 5.937 0.919
CNN 10.951 7.973 5.779 0.918 10.737 7.821 5.691 0.922
BiLSTM 9.834 6.912 4.905 0.926 9.549 6.688 4.646 0.939
MCNN 9.616 6.731 4.698 0.929 9.441 6.517 4.551 0.941
SLSTM 9.432 6.511 4.541 0.943 9.333 6.481 4.487 0.956
CNN–SLSTM 7.621 5.471 3.816 0.965 7.43 5.297 3.548 0.969
MCNN–SLSTM 6.394 4.968 3.161 0.969 6.521 5.071 3.261 0.971
CNN–SLSTM–AT 5.525 4.577 2.694 0.972 5.91 4.869 2.963 0.976
MCNN–SLSTM–AT 4.373 3.509 2.155 0.979 4.571 3.671 2.189 0.981
METL* 3.947 2.931 1.985 0.992 3.513 2.712 1.821 0.993
*METL: VMD–MCNN–SLSTM–AT
comparison between CNN–SLSTM–AT and
MCNN–SLSTM–AT proves how the AT layer af-
fects the prediction result. The comparison between
the MCNN–SLSTM–AT and the METL model
proves how the decomposition layer affects the
prediction result. Therefore, 10 comparison models
were designed to achieve the above cross-compar-
ison tasks: CNN, MCNN, LSTM, BiLSTM, SLSTM,
CNN–SLSTM, MCNN–SLSTM, MCNN–SLSTM–
AT, CNN–SLSTM–AT, METL model. The kernel
size of the CNN was set to 1. The kernel size of the
MCNN was set to 2. The parameter configuration
for each model experiment is shown in Table 8.
For the modal decomposition layer in the METL model, the hyperparameters were set as follows. The decomposition effect of the VMD
algorithm mainly depends on the K value (He et al.,
2019). If K is too small, some important features in
the original data may be ignored, resulting in
insufficient prediction accuracy. If K is too large, the
center frequencies of adjacent modal components
may be very close, resulting in modal repetition or
additional noise. It is difficult to evaluate the K va-
lue of modal components when decomposing the
original time series using VMD decomposition be-
cause of the noise. Therefore, the best K value was
found in this study using the average instantaneous
frequency approach. To avoid the use of future
information in the prediction process, the VMD
model was applied to break down the two types of
datasets into sub-modes of different frequencies.
When K was six, the average instantaneous frequency for Dataset A declined only slightly, indicating over-decomposition. Therefore, the optimal number
of K values for Dataset A was 5. Similarly, the
optimal number of K values for Dataset B was 6.
Time series data decomposition was carried out after
parameter K was established. The K values of da-
tasets A and B were set to 5 and 6, respectively. The
decomposition results of the VMD algorithm for the
two datasets are shown in Figure 7, in which the
modal frequency goes from low to high. The first
IMF had the lowest frequency. The last IMF had the
highest frequency. Each subseries in the time series
represents a hidden oscillatory component. The
lowest frequency mode represents the relative long-
term trend of the data. The highest frequency mode
captures more sensitive short-term price fluctuations
in the data. The other VMD parameters, such as the penalty factor and convergence accuracy, were initialized to default values (Zhang et al., 2022). For example, the moderate bandwidth was set to 7000 and the control error parameter was set to 1e−5.
All single-echelon and other categorized models were run 1000 times, and the average results are reported. For the RAdam algorithm, the step size $\alpha$ was 100, the decay rate $\beta_1$ was 0.99, and the decay rate $\beta_2$ was 0.999. For each epoch, the initial learning rate was set to 0.001.
Figure 8. Comparison of improvement rates for 1-step prediction.
Analysis of Research Outcomes
A comparison between the proposed METL model and its single-echelon and other categorized models is given here. The choice of time step directly affects model performance: if the time step is too short, the single-echelon and other categorized models cannot thoroughly select the input features, while if it is too long, they may over-fit. The term 1-step-ahead means that the input window predicts only one future value at each time step; multi-step-ahead means that the input window predicts multiple future values at each time step. We first analyze the 1-step-ahead prediction results, followed by the multi-step-ahead prediction results.
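The 1-step-ahead and multi-step-ahead settings can be made concrete with a sliding-window dataset builder; the window length used here is an illustrative assumption.

```python
import numpy as np

def make_windows(series, window, horizon):
    """Build (input window, target) pairs for horizon-step-ahead prediction.
    horizon=1 gives 1-step-ahead targets; horizon=4, 7, or 10 gives the
    multi-step-ahead settings evaluated in Table 10."""
    series = np.asarray(series)
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])                       # input window
        y.append(series[start + window:start + window + horizon])    # future values
    return np.array(X), np.array(y)

# Toy series: windows of 5 past values predicting the next 4 values
X, y = make_windows(np.arange(20.0), window=5, horizon=4)
```

Each row of `X` is one input window, and the matching row of `y` holds the future values the model must forecast, so longer horizons shrink the number of usable samples.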
1-Step-Ahead Results
The results of all single-echelon and other cat-
egorized models based on two types of datasets are
shown in Table 9. It was observed that the RMSE,
MAE, and MAPE of the proposed METL model on
Dataset A were 3.947%, 2.931%, and 1.985%,
respectively; the RMSE, MAE, and MAPE of the
proposed METL model on Dataset B were 3.513%,
2.712%, and 1.821%, respectively. These error indices were significantly lower than those of the single-echelon and other categorized models. At the same time,
Table 10. Comparative analysis of models with multi-step-ahead prediction

Predicted horizon | Model | Dataset A: RMSE (%), MAE (%), MAPE (%), R² | Dataset B: RMSE (%), MAE (%), MAPE (%), R²
4-step ahead LSTM 12.191 9.073 6.101 0.891 11.337 8.457 5.915 0.889
CNN 10.283 7.265 5.291 0.901 10.413 7.614 5.351 0.892
BiLSTM 9.683 6.819 4.796 0.926 9.481 6.568 4.552 0.929
MCNN 9.432 6.511 4.541 0.933 9.313 6.415 4.459 0.936
SLSTM 8.351 5.619 3.986 0.951 8.681 5.841 4.101 0.959
CNN–SLSTM 6.525 5.126 3.262 0.955 6.432 4.997 3.191 0.965
MCNN–SLSTM 5.525 4.577 2.694 0.962 5.913 4.871 2.969 0.969
CNN–SLSTM–AT 5.167 4.116 2.431 0.967 5.368 4.213 2.517 0.973
MCNN–SLSTM–AT 4.873 3.867 2.271 0.973 4.971 3.971 2.309 0.981
METL* 4.474 3.551 2.173 0.981 4.297 3.419 2.122 0.989
7-step ahead LSTM 14.834 10.905 7.596 0.875 14.373 10.724 7.337 0.873
CNN 14.591 10.863 7.417 0.891 14.163 10.661 7.259 0.889
BiLSTM 13.513 10.473 7.026 0.903 13.313 10.266 6.868 0.916
MCNN 13.467 10.287 6.929 0.911 13.158 10.038 6.761 0.912
SLSTM 12.304 9.318 6.236 0.919 12.525 9.603 6.365 0.921
CNN–SLSTM 10.961 7.979 5.791 0.921 10.685 7.737 5.664 0.926
MCNN–SLSTM 9.257 6.357 4.352 0.929 9.112 6.223 4.293 0.932
CNN–SLSTM–AT 8.159 5.536 3.914 0.946 7.907 5.404 3.889 0.951
MCNN–SLSTM–AT 7.861 5.396 3.878 0.961 6.613 5.103 3.316 0.975
METL* 5.921 4.876 3.062 0.977 5.575 4.591 2.763 0.981
10-step ahead LSTM 17.343 12.125 8.271 0.825 17.163 12.106 8.115 0.812
CNN 17.591 12.473 8.334 0.838 17.735 12.582 8.401 0.831
BiLSTM 15.483 11.312 7.895 0.864 15.525 11.403 7.931 0.853
MCNN 15.373 11.267 7.812 0.879 15.161 11.086 7.786 0.873
SLSTM 13.261 10.201 6.774 0.898 13.358 10.293 6.874 0.901
CNN–SLSTM 11.461 8.516 5.958 0.895 11.681 8.734 5.991 0.892
MCNN–SLSTM 11.572 8.557 5.971 0.912 11.971 8.878 6.081 0.906
CNN–SLSTM–AT 9.528 6.581 4.614 0.921 9.857 6.954 4.914 0.918
MCNN–SLSTM–AT 9.112 6.223 4.293 0.931 9.647 6.745 4.756 0.925
METL* 6.421 4.981 3.225 0.958 6.059 4.924 3.076 0.953
*METL: VMD–MCNN–SLSTM–AT
Figure 9. Comparison of improvement rates for the (a) 4-step prediction, (b) 7-step prediction, and (c) 10-step prediction.
the proposed METL model had an R² of 0.992 on Dataset A and an R² of 0.993 on Dataset B, which were the highest among its single-echelon and other categorized models. Table 9 shows that our proposed METL model outperformed the single-echelon and other categorized models on all four evaluation metrics (RMSE, MAE, MAPE, and R²).
Figure 8 shows how much improvement the proposed METL model made in comparison with its single-echelon and other categorized models. It was validated that the proposed METL model consistently outperformed its single-echelon and other categorized models on the four evaluation metrics (RMSE, MAE, MAPE, and R²). However, the 1-step-ahead prediction results alone were not comprehensive enough to analyze the performance of the model.
Multi-Step-Ahead Results
Accurate multi-step-ahead prediction is significant in an iron ore price prediction model. The multi-step-ahead results show the iron ore price trend for the coming periods and thus help mining and steelmaking enterprises determine their sale or purchase strategies. Table 10 summarizes the results of the multi-step-ahead prediction based on the four evaluation metrics (RMSE, MAE, MAPE, and R²). In the 4-step-ahead prediction, the RMSE, MAE, and MAPE of the proposed METL model on Dataset A were 4.474%, 3.551%, and 2.173%, respectively; on Dataset B, they were 4.297%, 3.419%, and 2.122%, respectively. The error indices of the METL model were significantly lower than those of the single-echelon and other categorized models in the 4-step-ahead prediction. The proposed METL model had an R² of 0.981 on Dataset A and an R² of 0.989 on Dataset B in the 4-step-ahead prediction. As the length of the prediction time step increased, the performance of the single-echelon and categorized models deteriorated gradually. In the 7-step-ahead prediction, the RMSE, MAE, and MAPE of the proposed METL model on Dataset A were 5.921%, 4.876%, and 3.062%, respectively; on Dataset B, they were 5.575%, 4.591%, and 2.763%, respectively. The proposed METL model had an R² of 0.977 on Dataset A and an R² of 0.981 on Dataset B in the 7-step-ahead prediction. The performance of the proposed METL model also decreased gradually as the number of prediction steps increased. In the 10-step-ahead prediction, the RMSE, MAE, and MAPE of the proposed METL model on Dataset A were 6.421%, 4.981%, and 3.225%, respectively; on Dataset B, they were 6.059%, 4.924%, and 3.076%, respectively. The proposed METL model had an R² of 0.958 on Dataset A and an R² of 0.953 on Dataset B, which were the highest among its single-echelon and other categorized models in the 10-step-ahead prediction.
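The four evaluation metrics reported above can be computed directly from the actual and predicted prices. A minimal sketch (not the authors' code; `metrics` is a hypothetical helper name):

```python
import math

def metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (in %), and R^2 for a forecast against actual prices."""
    n = len(y_true)
    resid = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    mae = sum(abs(r) for r in resid) / n
    mape = 100.0 * sum(abs(r) / abs(t) for r, t in zip(resid, y_true)) / n
    mean_t = sum(y_true) / n
    r2 = 1.0 - sum(r * r for r in resid) / sum((t - mean_t) ** 2 for t in y_true)
    return rmse, mae, mape, r2
```

Lower RMSE, MAE, and MAPE and a higher R² indicate a better forecast, which is how the models in Table 10 are ranked.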
Figure 9 shows the improvement rates of the proposed METL model in comparison to its single-echelon and categorized models. The comparison of the single-echelon CNN with the MCNN shows that the MCNN model had lower RMSE, MAE, and MAPE and a higher R² on both datasets; that is, the MCNN model outperformed the single-echelon CNN model in multi-step-ahead prediction. Similarly, a comparison of the single-echelon LSTM with the BiLSTM and SLSTM showed that the SLSTM model was superior to the single-echelon LSTM model. Moreover, the comparison between CNN–SLSTM and MCNN–SLSTM showed that the latter model had lower RMSE, MAE, and MAPE and a higher R² on both datasets, and the comparison between CNN–SLSTM–AT and MCNN–SLSTM–AT showed that the AT layer can further improve the accuracy of the CNN–SLSTM and MCNN–SLSTM models. Figure 9a shows that the proposed METL model improved the prediction results of MCNN–SLSTM–AT by around 10%, 11%, and 9% in terms of RMSE, MAE, and MAPE, respectively. Figure 9b displays improvements of around 18%, 11%, and 17%, and Figure 9c verifies improvements of around 26%, 19%, and 25%, in terms of RMSE, MAE, and MAPE, respectively. The proposed METL model was significantly superior to the single-echelon and other categorized models based on the four evaluation metrics (RMSE, MAE, MAPE, and R²), because it combines the merits of the single-echelon and other categorized models. The proposed METL model not only has advantages in handling the volatility of iron ore prices, but also improves on the forecasting accuracy of the single-echelon and other categorized models.
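The improvement rates in Figure 9 can be read as relative reductions in each error metric. A small sketch, assuming improvement is defined as the percentage drop relative to the baseline model (the exact definition used for Figure 9 is not restated here):

```python
def improvement_rate(baseline_error, metl_error):
    """Percentage reduction of an error metric (RMSE, MAE, or MAPE)
    achieved by METL relative to a baseline such as MCNN-SLSTM-AT."""
    return 100.0 * (baseline_error - metl_error) / baseline_error

# e.g., a baseline RMSE of 10.0 reduced to 8.0 is a 20% improvement
```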
DISCUSSION
The METL model can employ multi-scale data and multi-step prediction criteria effectively. Multi-step prediction evaluates the performance of the METL model under different conditions: a 1-step prediction may perform well for one specific condition but poorly for others, so its performance is not robust. Therefore, multi-step prediction was employed. The same reasoning applies to monthly, quarterly, and annual iron ore prices. Compared to single-echelon and other categorized models, the multi-echelon tandem learning model offers several improvements. The decomposition process splits the input data into simple modes of various frequencies, which reduces its volatility characteristics and clarifies and enhances feature extraction. In addition, the component echelon learning models (CNN, LSTM, AT) are integrated to explore more multi-level deep features and capture more dependency relationships between the simple modes of various frequencies. This feature extraction over the simple modes effectively improves the speed of sequential data processing.
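The divide-and-conquer idea can be illustrated with a toy stand-in for VMD. The sketch below splits a series into a smooth low-frequency mode and a high-frequency residual; note that VMD itself solves a variational problem and produces several band-limited modes, so this two-mode moving-average split is only an assumption-laden illustration:

```python
def toy_decompose(series, window=3):
    """Split a price series into a low-frequency mode (centred moving
    average) and a high-frequency residual. The two modes sum back
    exactly to the original series, as VMD modes approximately do."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual
```

Each mode is then passed through the MCNN–SLSTM–AT echelons, and the per-mode forecasts are recombined into the final price forecast.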
The limitation of the VMD method is its sensitivity to noise and to the sampling of multiple factors. The parameters of the decomposition process directly affect the subsequent feature extraction and prediction processes. The prediction process requires a large number of parameters, such as the network topology, the initial values of the weights, and the thresholds; narrowing the ranges of these parameters is essential to speed up the prediction process. Compared with other categorized (simplified) models, the drawback of our prediction model lies in its strong dependence on the frequency characteristics of the decomposed data, which makes the prediction process considerably more sophisticated. Moreover, the learning process of our prediction model involves considerable computational complexity to make the decomposed data more dependable and widely applicable. The bias of the learning process can accumulate as the number of prediction steps increases; consequently, the proposed prediction model needs a multi-echelon tandem structure to capture the dependency relationships between variables. In this study, the METL model was effectively applied to iron ore price forecasting. The features of our METL model make it a multi-scale model that is applicable to predicting the spot prices of other kinds of commodities and to analyzing their futures markets.
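The accumulation of bias over the prediction horizon is easiest to see in a recursive multi-step scheme, where each prediction is fed back as an input. A minimal sketch (the paper's exact multi-step strategy is not restated here; `one_step_model` is a hypothetical callable):

```python
def recursive_forecast(history, one_step_model, steps):
    """Roll a one-step forecaster forward: each predicted price is appended
    to the input window, so any one-step error propagates to later steps."""
    window = list(history)
    preds = []
    for _ in range(steps):
        y = one_step_model(window)
        preds.append(y)
        window = window[1:] + [y]
    return preds

# With a naive persistence model, every horizon repeats the last observation:
# recursive_forecast([115.0, 118.0, 120.0], lambda w: w[-1], 4)
# -> [120.0, 120.0, 120.0, 120.0]
```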
CONCLUSIONS
In this paper, a novel multi-echelon tandem learning (METL) model to forecast iron ore prices was proposed. The developed METL model finds a compromise solution that combines the merits of econometric, DL, and ensemble learning models, and adopts a divide-and-conquer, data-feature-driven decomposition technique. The ability to extract features is clarified and enhanced by this decomposition method. In addition, the component echelon learning models (CNN, LSTM, AT) are integrated to explore more multi-level deep features and capture more dependency relationships between variables. Due to the complexity and volatility of iron ore price fluctuations, extracting deeper characteristics is essential to provide accurate prediction results for practitioners in the iron ore industry. Based on extensive computational experiments, the proposed METL model can improve data processing efficiency and meet the goal of providing alternative solutions for iron ore price prediction under various scenarios.
The high volatility of iron ore prices poses a serious challenge for mining companies and policymakers. The depletion of mineral resources and the climate crisis continue to be major concerns for many countries, so sustainable development and green urbanization are priorities for economic decisions. Steel is a core product for development. The life cycle of the steel industry, from iron ore sourcing to end-of-life steel, should be monitored, and environmentally compliant operations should be encouraged. For example, recycling scrap steel can help to build a circular economy for the production of new steel and cast-iron products and reduce the supply risks created by the increasing demand of downstream steelmaking sectors. The proposed METL model can be extended to assist in achieving a balance between economic growth and environmental protection. Moreover, the practical applicability of the proposed METL model can be enhanced by incorporating more components and decomposition approaches. The challenges of natural resource depletion and environmental trends for mineral resources in the wake of ongoing global decarbonization efforts will be considered in future work.
ACKNOWLEDGMENTS
This work is supported by the National Natural
Science Foundation of China under Grant No.
71871064.
DATA AVAILABILITY
The data that support the findings of this study can be found online in [Appendix of IRON-ORE.xlsx] at the URL [https://www.kdocs.cn/latest?from=docs].
DECLARATIONS
Conflict of Interest The authors declare that they have no conflict of interest regarding this work.