Citation: Ahmad, Z.; Bao, S.; Chen, M. DeepONet-Inspired Architecture for Efficient Financial Time Series Prediction. Mathematics 2024, 12, 3950. https://doi.org/10.3390/math12243950
Academic Editor: Antonella Basso
Received: 12 October 2024; Revised: 28 November 2024; Accepted: 2 December 2024; Published: 16 December 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
DeepONet-Inspired Architecture for Efficient Financial Time
Series Prediction
Zeeshan Ahmad 1, Shudi Bao 2,* and Meng Chen 1
1 School of Cyber Science and Engineering, Ningbo University of Technology, Ningbo 315211, China; azee@nbut.edu.cn (Z.A.); cm@nbut.edu.cn (M.C.)
2 Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo 315201, China
* Correspondence: sdbao@idt.eitech.edu.cn
Abstract: Financial time series prediction is a fundamental problem in investment and risk management. Deep learning models, such as multilayer perceptrons, Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM), have been widely used to model time series data by incorporating historical information. Among them, LSTM has shown excellent performance in capturing long-term temporal dependencies in time series data, owing to its enhanced internal memory mechanism. Despite the success of these models, they perform poorly in the presence of sharp change points. To address this problem, we propose an innovative financial time series prediction method inspired by the Deep Operator Network (DeepONet) architecture, which combines a transformer architecture and a one-dimensional CNN for processing feature-based information, followed by an LSTM-based network for processing temporal information. It is therefore named the CNN–LSTM–Transformer (CLT) model. It not only incorporates external information to identify latent patterns within the financial data but also excels in capturing their temporal dynamics. The CLT model adapts to evolving market conditions by leveraging diverse deep learning techniques. This dynamic adaptation plays a pivotal role in navigating abrupt changes in the financial markets. Furthermore, the CLT model improves long-term prediction accuracy and stability compared with state-of-the-art deep learning models and also mitigates the adverse effects of market volatility. The experimental results show the feasibility and superiority of the proposed CLT model in terms of prediction accuracy and robustness compared to existing prediction models. Moreover, we posit that the innovation encapsulated in the proposed DeepONet-inspired CLT model also holds promise for applications beyond finance, such as remote sensing, data mining, and natural language processing.
Keywords: deep operator networks; financial time series prediction; LSTM; neural networks; stock
price prediction; transformers
MSC: 68T07; 91B84; 62-04
1. Introduction
The prediction of financial time series, such as stock prices and digital assets, has been intensively studied in the financial literature for a long period [1]. In the past few
years, a number of time series prediction models have been proposed in the financial
literature, ranging from conventional statistical learning techniques to advanced machine
learning algorithms and cutting-edge deep learning models [2–4]. However, these models
face severe challenges since the nature of financial time series data is characterized by high
dimensionality, and the price or trend fluctuation of financial assets is usually non-linear
and non-stationary. At present, the fusion of big data analytics and deep learning has revo-
lutionized time series prediction tasks, leading to the development of more sophisticated
and accurate prediction models [4,5].
The stock market, alternatively referred to as a share or an equity market, is a public
entity for the sale and purchase of stocks, equities, bonds, securities, and their derivatives [1–3]. It can play a substantial role in any economy, particularly in the current epoch
of globalization and liberalization, thus stimulating economic growth, encouraging in-
vestment, and boosting wealth creation over the years. Meanwhile, cryptocurrencies are
experiencing explosive growth in the global financial markets [6]. Initially used for elec-
tronic transactions, cryptocurrencies have emerged as attractive investment assets, drawing
significant interest from traders and investors in recent years. Although cryptocurrencies
and stocks share certain characteristics, they are fundamentally different. Stocks are gener-
ally considered more stable, while cryptocurrencies are highly volatile due to speculative
trading and investor sentiment. Like any other financial institution, the stock and crypto markets are also affected by a diverse set of factors that play a significant role in defining their dynamics [5,6]. For example, one consequential factor that affects financial markets
the most is exchange-rate volatility. The more volatile a currency is, the more strongly it affects the financial market. The existence of economic cycles is yet another predominant factor; it causes cyclical changes in the performance of different sectors and fields, which ultimately affects the performance of the financial markets [7]. Politics and investors’ sentiments also
greatly impact the volatility and variability of the markets and their performance. These
variations happen across the globe, where regions and countries have different market
dynamics as dictated by economic factors, political (in)stability, and currency fluctuations,
among others [8]. Traditional methods used to predict financial markets, e.g., stock prices,
can be classified into two categories: fundamental and technical analysis. The former cate-
gory makes predictions by analyzing various economic, financial, and qualitative factors,
whereas the techniques in the latter category are focused on historical market data, such as
stock prices and trading volumes, to predict future movements [9].
Nowadays, financial time series prediction is an important research topic at the inter-
section of finance, data science, and artificial intelligence [10]. The key concept behind financial time series prediction (e.g., the future movements of stocks, indices, markets, and digital assets) is to extract useful patterns and insights from complex financial data [11].
While machine learning algorithms have shown remarkable success in financial time series
prediction, they are outperformed by cutting-edge deep learning models. Numerous deep
learning models, such as Multi-Layer Perceptrons (MLPs), Convolutional Neural Networks
(CNNs), Recurrent Neural Networks (RNNs), and Long Short-term Memory Neural Net-
works (LSTMs), have demonstrated exceptional prediction performance, compared to
traditional machine learning algorithms [1–3]. However, given the growing complexity
of the financial markets, conventional deep learning models are limited in their ability
to capture the inherent multiscale dynamics of financial data. For example, the presence
of sharp change points in financial time series data due to noise contamination severely
degrades the prediction performance of these models [12]. Therefore, with the rapid advancement of deep learning technology, more complex models are also being tried for financial time series prediction, including transformers [13], Generative Adversarial Networks (GANs) [14], Reinforcement Learning (RL) [15], and so on. Meanwhile, to overcome
deficiencies associated with individual models, a series of hybrid models have also been
implemented to enhance the prediction accuracy of financial time series [3,5]. Additionally,
several critical research questions stand at the forefront of investigation in stock prediction
research. The first question revolves around the development of robust methods aimed at
mitigating noise in the stock market data. Noise reduction is fundamental for enhancing
the accuracy of price movement analysis, a challenge that continues to perplex researchers
and practitioners alike. Secondly, there is a pressing need to decipher whether fundamental differences exist in the factors that influence short-term and long-term stock market predictions. To address this, a pivotal objective is to tailor prediction models that cater to the unique dynamics of each scenario, ensuring precise and context-aware forecasts.
In the ever-evolving financial landscape, adapting to the dynamic nature of markets is
a constant challenge. Thus, the third research question pertains to the adaptability of
prediction models in an environment that is characterized by rapid changes due to external
events and market sentiment fluctuations. Addressing these questions holds the potential
to advance the field and can yield refined models and strategies for more accurate and
dynamic stock market analysis.
The aim of this study is to understand the core of financial time series prediction by
exploring the data sources and features that power it, the prediction models applied, and
the evaluation metrics used to assess their accuracy. In particular, we have proposed a novel
financial time series prediction model inspired by the Deep Operator Network (DeepONet)
architecture [16]. The proposed model is designed to efficiently discern intricate patterns
in financial datasets. To enhance the model’s capacity for feature extraction and temporal
analysis, we have innovatively split it into two distinct networks in a similar fashion to
the DeepONet architecture: the branch net and the trunk net. The former is responsible
for encoding feature information, whereas the latter is tasked with modeling the temporal
aspects. This architectural division allows for optimized and focused data pre-processing,
which improves predictive capabilities. In particular, we have incorporated the transformer
model [13], which is followed by a one-dimensional Convolutional Neural Network (1D
CNN) in the branch net, to harness the power of attention mechanisms for feature im-
portance, whereas the temporal information is skillfully handled by the Long Short-Term
Memory (LSTM) network in the trunk net [17]. The proposed model is aptly named the
CNN–LSTM–Transformer (CLT). The integration of these cutting-edge frameworks yields
a highly efficient and adaptable model capable of addressing the complexity inherent in
financial datasets, thereby enhancing the predictive accuracy and robustness of tradeable
instruments such as stocks and cryptocurrencies.
The rest of this article is organized as follows. Section 2 briefly reviews related work. Section 3 introduces the theoretical foundations required in this work. In Section 4, we present the proposed CLT model architecture and the evaluation metrics employed for training the model. Section 5 presents the experimental setup used to perform the modeling and analysis. Section 6 evaluates the prediction performance of the proposed CLT model. Finally, Section 7 concludes this article.
2. Related Work
The prediction of financial trends, such as the stock market, forex, stock, and cryp-
tocurrency prices, has undergone tremendous development in recent years. This section
provides an overview of existing works related to financial time series prediction.
Statistical Models: In early studies, a number of statistical models, such as Autoregres-
sive (AR), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving
Average (ARIMA), exponential smoothing, regression models, and so on, were developed
to predict daily stock returns and prices [18]. As such, these linear models consider univari-
ate or stationary time series data as their input, posing certain limitations in identifying
the inter-dependencies among the various stocks in the complex and noisy real time series
data [19].
Machine Learning: With the emergence of machine learning models, they were also
utilized in the prediction of financial time series due to their ability to deal with nonlinear,
non-stationary, and high-dimensional data effectively. The most prominent techniques
in this category include Support Vector Machines (SVMs), decision trees, random forests,
K-means, and others [20]. However, these shallow models are not sufficient on their own to model
highly correlated data with complex structures, and their generalization ability is also poor.
Therefore, various hybrid models have been proposed to enhance the forecasting accuracy
of financial time series data [21]. For example, various combinations of linear and nonlinear
dynamics models with machine learning models, such as ARIMA with Artificial Neural
Network (ANN), ARIMA with SVM, and SVM with General Autoregressive Conditional
Heteroscedastic (GARCH), were proposed to enhance the forecasting accuracy of financial
time series data [22]. Although these models have shown tremendous success in short-term
prediction, they are not able to achieve long-term forecasting. Additionally, the problem is
more challenging when the subsequent model fails to accurately predict the residuals of
the prior model [3]. In this case, the overall prediction performance of the hybrid model is
severely degraded.
Deep Learning: In the last decade, the available computing power, easier collection of
data, and larger datasets have enabled the application of deep learning for key problems in
big data analytics [23]. Over the years, several state-of-the-art network structures, such as
MLPs, CNNs, RNNs, LSTM, and Gated Recurrent Units (GRUs), and their variations have
been incorporated to learn more complex abstractions in financial time series data. In [24],
the performances of various deep learning models are compared with regard to leading
stock markets. Although these models outperform the traditional machine learning and
statistical models, they still suffer one or more problems, such as information loss in pooling
layers of a CNN, gradient vanishing/explosion, local extremum problem, susceptibility
to overfitting, and so on [9]. In the past few years, numerous advanced deep learning
models, such as GANs, Graph Convolutional Networks (GCNs), and transformers, with
enhanced learning abilities, have revolutionized many fields. Transformers, which leverage
the self-attention mechanism, excel in handling sequential data [13]. Initially designed for natural language processing problems, they have become a hot topic across various domains, and there are plenty of existing works in time series prediction utilizing the trans-
former model. For instance, Wang et al. [9] harnessed the power of a transformer and its
self-attention mechanism for time series forecasting, conducting extensive back-testing
experiments on global stock market indices. It has been shown to achieve better prediction
performance than RNNs and LSTMs. Zhang et al. [25] introduced the TEANet framework,
a novel Transformer Encoder-based Attention Network, which adeptly leverages small-
sample feature engineering to capture temporal dependence within financial data. This
framework, driven by transformers and multiple attention mechanisms, demonstrates
remarkable efficacy in feature extraction and predictive accuracy. A novel hierarchical transformer multi-task learning (HTML) architecture is designed in [26] to predict volatility
in the prices of financial assets. It harnesses the text and audio data obtained from quarterly
earnings conference calls. The HTML model significantly improves prediction accuracy by
17–49% compared to state-of-the-art models. In [27], a novel method named AE-ACG is
proposed for stock price movement prediction, where a CNN-GRU block is embedded into
an autoencoder based on an attention mechanism and skip connection.
Reinforcement Learning: Reinforcement learning has recently become prevalent in
time series prediction tasks. Different from supervised and unsupervised learning models,
in RL, an agent can learn from its actions through trial and error to find which actions
can produce the best results. In a pioneering work, Zarkias et al. [15] proposed a novel
price trailing method by considering trading as a control problem. Using RL, robust
agents were trained to trail asset prices, thus providing a robust solution adaptable to
various applications by controlling the agent’s sensitivity to price fluctuations. Similarly,
Sathya et al. [28] harnessed the power of RL and LSTM in their proposed model to enhance
the stock price prediction accuracy, especially in competitive marketing scenarios with
partially observable states. This approach extends the model’s capability to assess stability
and trust associated with stocks by incorporating sentiment analysis. He [29] delved into
forecasting stock prices, leveraging deep RL and incorporating the Relative Strength Index
(RSI) trend indicator as a reward function, which effectively reduced trade risk. Shahbazi
and Byun [30] proposed a machine learning-based approach fortified by blockchain and RL,
focusing on Litecoin and Monero cryptocurrencies, demonstrating superior price prediction
performance. Dang [31] explored the use of a Deep Q-Network for stock trading, revealing
the efficacy of RL techniques, particularly in generating profitable strategies with limited
data. Li et al. [32] implemented a novel deep RL approach for stock transaction strategies,
demonstrating its practicality and outstanding performance compared to traditional trans-
action methods. However, RL-based financial time series prediction methods face several
challenges, such as designing an optimal reward structure, high computational demands, the agent’s limited exploration environment, and so on.
Hybrid Models: Existing standalone deep learning models often struggle to capture the high volatility present in complex financial time series data. To handle the complexities
of highly volatile financial time series data, researchers have proposed various hybrid
models by combining diverse techniques, such as CNN-LSTM, RNN-CNN, ARIMA-CNN-
LSTM, Autoencoder-LSTM, and CNN-BiLSTM, substantially improving the prediction
accuracy and the overall generalization ability of the model [33]. Hybrid models have
elicited a significant amount of interest due to their significant potential to enhance the
predictive performance of the models. Zaheer et al. [34] conducted a comprehensive
comparative study exploring various combinations of LSTM, CNN, and RNN for time
series forecasting in financial data. Their findings unveiled intriguing results, showcasing
the superiority of certain models over others. More recently, transformers have been
increasingly utilized in financial time series predictions due to their robust capabilities in
modeling complex and long sequences. While CNNs are good at capturing local contextual
information, they cannot learn long-term dependencies due to their limited receptive field.
Transformers, on the other hand, focus on global information and are good at modeling long-
range information relationships. Therefore, researchers have begun to explore combining
these two models to achieve better performance across various applications, including
time series prediction and vision tasks. Zeng et al. [35] proposed a novel method called
CNN and transformer-based time series modeling (CTTS) for the prediction of stock prices.
Additionally, different combinations of transformers with other existing deep learning
models have been presented and analyzed. For example, Li and Qian [36] developed the
innovative FDG-Transformer, a hybrid neural network that blends GRU, LSTM, and multi-
head attention transformers. This approach captures temporal information within cluttered
data, resulting in a deeper and fine-grained time series model. Wang [37] proposed a
comprehensive model, namely BiLSTM-MTRAN-TCN, which combines the strengths of
BiLSTM, Transformer, and Temporal Convolutional Network (TCN) to enhance the model’s
ability to capture both long-range dependencies and bidirectional information in sequences,
showcasing improved performance. Haryono et al. [38] presented a novel approach for
quantifying news sentiments utilized in stock price forecasting using a Transformer encoder
GRU (TEGRU) architecture. Lai et al. [39] employed a Differential Transformer Neural
Network (DTNN) to extract information from noisy high-frequency Limit Order Books
(LOBs) time series data, resulting in an improved prediction accuracy of stock prices.
Nevertheless, there is still a lot of room for further optimization to develop more innovative
and efficacious prediction models.
3. Preliminaries
This section provides a brief overview of some relevant preliminary works. We begin
with a comprehensive understanding of the price-changing framework and then introduce
several deep learning models, including CNNs, LSTM, and transformers, which are used
in the model comparisons; these are also relevant to the proposed CLT model.
3.1. Stochastic Stock Price Modeling
In the realm of stock market dynamics, understanding the stock price time series
is paramount for investors and financial analysts. The price variations of a stock can be
viewed as a stochastic process. Geometric Brownian Motion (GBM) is the most commonly
adopted model to formulate the stochastic behavior of stock prices [40]. In our modeling,
since we do not have features like the expected dividend of the next period and dividend
growth rate, we have not included the cash dividend, stock dividend, or special dividends
in our modeling framework.
Let $P(t)$ denote the price of a stock at time $t$. The GBM model for a typical stock price movement is expressed by the following Stochastic Differential Equation (SDE) [41]:

$$dP(t) = \mu P(t)\,dt + \sigma P(t)\,dW(t) \quad (1)$$
where $dP(t)$ is the change in the stock price, $\mu$ and $\sigma$ respectively denote the drift and volatility parameters, $dt$ represents the differential of time, and $dW(t)$ is a Wiener process (Brownian motion) representing the random noise or shocks in the market. Equation (1) describes the expected evolution of the stock price over time, incorporating both deterministic and stochastic components. The term $\mu P(t)\,dt$ represents deterministic growth, whereas $\sigma P(t)\,dW(t)$ accounts for random fluctuations. The logarithm of the stock price follows a Brownian motion, offering insights into the statistical properties of price changes.
Alternatively, the SDE in Equation (1) can be expressed in a discretized form as [42]

$$\frac{\Delta P(t)}{P(t)} = \mu\,\Delta t + \sigma\,\epsilon(t)\sqrt{\Delta t} \quad (2)$$

where $\Delta P(t)$ is the change in price over a short time increment $\Delta t$, and $\epsilon(t)$ is a random variable sampled from a normal distribution with zero mean and unit variance, i.e., $\epsilon \sim \mathcal{N}(0, 1)$. According to the GBM model, the relative price change, $\frac{\Delta P(t)}{P(t)}$, follows a normal distribution.
The general solution to the SDE in Equation (1) is given by Itô’s lemma as [43]

$$P(t) = P_0 \exp\!\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W(t)\right) \quad (3)$$

where $P_0 = P(0)$ is the initial price. Equation (3) is used to predict the stock price at a certain time.
The GBM model exhibits two main features: non-negativity, i.e., $P(t) > 0$ for all $t \in [0, T]$, and log-normal returns, that is, the relative price changes (returns) are proportional to the current price, making it popular for stock price prediction. It has been widely applied in financial analysis for predicting stock prices, managing risk, and gaining insights into the statistical properties of financial time series.
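To make the GBM dynamics above concrete, the following sketch simulates a single price path using the closed-form solution in Equation (3); the drift, volatility, and step size are illustrative values chosen for the example, not parameters estimated in this article.

```python
import numpy as np

def simulate_gbm(p0, mu, sigma, dt, n_steps, seed=0):
    """Simulate one GBM price path via the exact solution in Equation (3)."""
    rng = np.random.default_rng(seed)
    # Increments of the Wiener process: dW ~ N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    W = np.cumsum(dW)
    t = dt * np.arange(1, n_steps + 1)
    # P(t) = P0 * exp((mu - sigma^2 / 2) t + sigma W(t))
    return p0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

# Illustrative parameters (not fitted to any dataset used in this article)
path = simulate_gbm(p0=100.0, mu=0.05, sigma=0.2, dt=1 / 252, n_steps=252)
print(path[-1])
```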
3.2. 1D CNN
Conventional deep CNNs, often referred to as 2D CNNs, are specifically built to
handle spatial (2D) data such as images and videos [44]. They are, however, not a viable
option for time-series (1D) data [45]. In 2016, Kiranyaz et al. [46] proposed a compact and
adaptive design of 1D CNNs to process and analyze 1D sequential data. Different from 2D
CNNs, 1D CNNs are tailored for tasks such as time series analysis, audio processing, and
natural language processing [47].
Similar to deep CNNs, the key components in a 1D CNN also include several con-
volutional layers (1D), pooling layers, and fully connected layers. In the following, we
introduce the working mechanism of each layer in detail.
The convolutional layer is a key component of a 1D CNN architecture that retrieves
features from the input data by applying convolution operation and activation functions.
The convolution operation is a linear process that involves applying several kernels/filters of length $k$, denoted as $\mathbf{w} \in \mathbb{R}^{k}$, to the input sequence $\mathbf{x} \in \mathbb{R}^{p}$, where $k \le p$. This operation can be defined as follows:

$$x_j^{l} = h\!\left(\sum_{i \in M_j} x_i^{l-1} * w_{ij}^{l} + b_j^{l}\right) \quad (4)$$
where $x_j^{l}$ and $x_i^{l-1}$ respectively denote the $j$-th output feature map of layer $l$ and the input features of layer $l-1$, $w_{ij}^{l}$ denotes the weights of the $j$-th convolution kernel applied to the $i$-th input feature map, $b_j^{l}$ is the bias term associated with the $j$-th convolution kernel, $M_j$ indexes the input feature maps, $*$ represents the convolution operation, and $h(\cdot)$ denotes the activation function.
Next, the pooling operation is used to downsample the spatial dimensions of the
feature maps and reduce the number of parameters in the network. Two commonly used
pooling types are max-pooling and average-pooling. The former retains the maximum value from each pool of the feature map, while the latter computes the average value for each pool.
The output of the convolutional and pooling layers is then flattened and passed
through one or more fully connected layers, also known as dense layers, where each input
is connected to every output through learnable weights. For a multi-class classification problem, the last fully connected layer employs a softmax activation function, whose output is the final prediction.
Finally, the network is trained by optimizing a loss function using backpropagation [48] and gradient descent [49].
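The layer stack described above (1D convolution, pooling, and dense layers) can be assembled in a few lines of PyTorch; the sketch below is a minimal illustration with arbitrary layer sizes and is not the extractor configuration used in the CLT model.

```python
import torch
import torch.nn as nn

class Small1DCNN(nn.Module):
    """Minimal 1D CNN: convolution -> ReLU -> max-pooling -> dense layer."""
    def __init__(self, seq_len=64, n_filters=16, kernel_size=3, n_outputs=1):
        super().__init__()
        self.conv = nn.Conv1d(1, n_filters, kernel_size, padding=kernel_size // 2)
        self.pool = nn.MaxPool1d(2)
        self.fc = nn.Linear(n_filters * (seq_len // 2), n_outputs)

    def forward(self, x):          # x: (batch, 1, seq_len)
        h = torch.relu(self.conv(x))
        h = self.pool(h)           # downsample the temporal dimension
        return self.fc(h.flatten(1))

y = Small1DCNN()(torch.randn(8, 1, 64))   # -> shape (8, 1)
```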
3.3. LSTM
LSTM [50] is a special type of RNN model designed to capture long-range depen-
dencies in sequential/time-series data and prevent vanishing gradients. Unlike standard
RNNs [51], LSTMs have specialized mechanisms to remember and forget information over
extended sequences. A basic LSTM node is composed of a memory cell and three gates
that regulate the in/out flow of information. The forget gate determines which part of the
information to retain or discard. Then the input gate decides the information to be updated
in the cell state, and an output gate yields the relevant information to the next time step.
Let $x_t$ be the input vector to the LSTM node at any given time $t$; let $W_b$, $W_i$, $W_f$, $W_o$ and $R_b$, $R_i$, $R_f$, $R_o$ denote the input and recurrent weight matrices associated with the respective gates; and let $b_z$, $b_i$, $b_f$, $b_o$ be the corresponding bias vectors. The output of an LSTM node is computed as
$$z_t = g(W_b x_t + R_b y_{t-1} + b_z) \quad (5)$$
$$i_t = f(W_i x_t + R_i y_{t-1} + b_i) \quad (6)$$
$$f_t = f(W_f x_t + R_f y_{t-1} + b_f) \quad (7)$$
$$c_t = z_t \odot i_t + c_{t-1} \odot f_t \quad (8)$$
$$o_t = f(W_o x_t + R_o y_{t-1} + b_o) \quad (9)$$
$$y_t = g(c_t) \odot o_t \quad (10)$$
where $z_t$, $i_t$, $f_t$, and $o_t$ are the outputs of the block input, input gate, forget gate, and output gate at time $t$, $c_t$ is the cell state, and $y_t$ is the final output (hidden state). $\odot$ denotes the Hadamard product, and $f(\cdot)$ and $g(\cdot)$ represent the sigmoid and $\tanh$ activation functions, respectively.
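A single application of the gate Equations (5)–(10) reduces to straightforward matrix arithmetic. The sketch below mirrors those equations for one LSTM step, with randomly initialized weights and illustrative dimensions.

```python
import torch

def lstm_step(x_t, y_prev, c_prev, W, R, b):
    """One LSTM step following Equations (5)-(10); W, R, b are dicts of weights."""
    z = torch.tanh(x_t @ W["z"] + y_prev @ R["z"] + b["z"])      # block input
    i = torch.sigmoid(x_t @ W["i"] + y_prev @ R["i"] + b["i"])   # input gate
    f = torch.sigmoid(x_t @ W["f"] + y_prev @ R["f"] + b["f"])   # forget gate
    o = torch.sigmoid(x_t @ W["o"] + y_prev @ R["o"] + b["o"])   # output gate
    c = z * i + c_prev * f                                        # cell state, Eq. (8)
    y = torch.tanh(c) * o                                         # hidden state, Eq. (10)
    return y, c

d_in, d_hid = 4, 8                                  # illustrative dimensions
W = {k: torch.randn(d_in, d_hid) for k in "zifo"}
R = {k: torch.randn(d_hid, d_hid) for k in "zifo"}
b = {k: torch.zeros(d_hid) for k in "zifo"}
y, c = lstm_step(torch.randn(1, d_in), torch.zeros(1, d_hid), torch.zeros(1, d_hid), W, R, b)
```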
3.4. Transformers
Transformers are a powerful class of deep learning models that have revolutionized
Natural Language Processing (NLP) and other sequence-to-sequence tasks. Introduced
by Vaswani et al. [13] in 2017, transformers have a unique architecture built on the self-
attention mechanism. Their success is attributed to their ability to capture long-range
dependencies in data, making them highly effective for various applications beyond NLP,
including image processing, speech recognition, and recommendation systems.
Transformers rely on the self-attention mechanism, e.g., additive attention and dot-
product attention, to capture the interaction among all the embeddings. Given an input
sequence $X$ of length $N$, self-attention (scaled dot-product) computes the attention scores as

$$A = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \quad (11)$$

where $\frac{1}{\sqrt{d_k}}$ denotes the scaling factor, and $Q$, $K$, and $V$ are the queries, keys, and values associated with the input elements for the attention computation, respectively.
Transformers also utilize multi-head attention, which applies self-attention with dif-
ferent sets of parameters and concatenates the results:
$$\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \quad (12)$$

where $\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$.
Furthermore, to account for positions in the input sequence, positional encodings are
injected as an additional input as
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d}}\right) \quad (13)$$

where $pos$ and $i$ denote the position and the dimension, respectively. These encodings enable the model to capture positional information.
The combination of self-attention, multi-head attention, and positional encodings
forms the mathematical foundation of transformers, enabling them to excel in various
sequence-based tasks.
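The scaled dot-product attention of Equation (11) and the sinusoidal encodings of Equation (13) can be written directly as array operations; the NumPy sketch below uses illustrative dimensions and omits the multi-head projections for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Equation (11): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def positional_encoding(n_pos, d_model):
    """Equation (13): sinusoidal positional encodings (d_model assumed even)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

N, d = 6, 8
X = np.random.randn(N, d) + positional_encoding(N, d)
out = scaled_dot_product_attention(X, X, X)   # self-attention over the sequence
```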
4. Proposed Model
This section proposes a novel CLT model inspired by the DeepONet architecture
for stock price prediction. The prediction of stock prices is abstracted into three stages,
namely data analysis and preprocessing, prediction model building, and experimental
result prediction and evaluation, as shown in Figure 1.
Figure 1. Analysis flowchart of the stock price prediction.
At the core of our proposed model lies the structure of the DeepONet, a novel operator
learning architecture made up of two subnetworks: the branch network and the trunk
network, which are trained simultaneously [16]. The former encodes the input function $u$ discretized at $m$ fixed points $\{x_i\}_{i=1}^{m}$, i.e., $u = [u(x_1), u(x_2), \ldots, u(x_m)]$, as input and outputs a vector of features $\mathbf{b} = [b_1, b_2, \ldots, b_p]^{T} \in \mathbb{R}^{p}$, whereas the latter processes the location variables $y \in \mathbb{R}^{n}$ and outputs a feature embedding $\mathbf{t} = [t_1, t_2, \ldots, t_p]^{T} \in \mathbb{R}^{p}$. The DeepONet output is then obtained by merging the outputs of the two subnetworks via an inner product as follows:

$$G(u)(y) = \sum_{k=1}^{p} b_k \cdot t_k + b_0 = \sum_{k=1}^{p} \underbrace{b_k\big(u(x_1), u(x_2), \ldots, u(x_m)\big)}_{\text{Branch}} \cdot \underbrace{t_k(y)}_{\text{Trunk}} + b_0 \quad (14)$$
where the model $G$ has a set of weights and biases, which are trainable parameters. Additionally, $b_0$ is an optional bias term to reduce the generalization error.
Generally, the dimension of $y$ does not match the dimension of $u(x_i)$ in high-dimensional problems. Therefore, the essence of the DeepONet structure is to segregate the network into two subnetworks to process both inputs separately. More importantly, DeepONet is a flexible network without any specific architectures prescribed for its trunk and branch networks; Lu et al. [16] consider fully connected neural networks. Thus, we can employ any suitable deep neural network, such as a CNN or RNN, in the subnetworks of DeepONet. Figure 2 illustrates the standard DeepONet architecture.
Figure 2. Schematic diagram of the DeepONet.
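Equation (14) amounts to an inner product between the branch and trunk embeddings. A minimal sketch of that merge is shown below; the two fully connected subnetworks are placeholders standing in for whatever architectures are chosen (DeepONet does not prescribe them), and the dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    """Inner-product merge of branch and trunk outputs, as in Equation (14)."""
    def __init__(self, m=32, y_dim=1, p=16):
        super().__init__()
        # Placeholder subnetworks; any suitable architecture could be used here.
        self.branch = nn.Sequential(nn.Linear(m, 64), nn.ReLU(), nn.Linear(64, p))
        self.trunk = nn.Sequential(nn.Linear(y_dim, 64), nn.ReLU(), nn.Linear(64, p))
        self.b0 = nn.Parameter(torch.zeros(1))   # optional bias term

    def forward(self, u, y):      # u: (batch, m) sensor values, y: (batch, y_dim)
        return (self.branch(u) * self.trunk(y)).sum(-1, keepdim=True) + self.b0

G = TinyDeepONet()
out = G(torch.randn(4, 32), torch.randn(4, 1))   # -> shape (4, 1)
```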
Replicating the structure of DeepONets, the proposed CLT model is composed of two
distinct modules, the feature processing unit (branch net) and the spatiotemporal data
processing unit (trunk net), as shown in Figure 3. Each module is designed to handle
different aspects of the data processing and prediction tasks, contributing to the overall
functionality and performance of the model. The branch net includes a transformer model
and a convolutional feature extractor, i.e., a 1D CNN, whereas the trunk net incorporates a
time series processing module, i.e., an LSTM network.
Figure 3. Proposed DeepONet-inspired architecture of the CLT model.
4.1. Branch Network
The branch net is responsible for extracting rich features from the input data using a
combination of transformer encoders and convolutional layers. The input to the branch net
is the features present in the dataset, except the temporal information that is input to the
trunk net.
4.1.1. Attention Based Encoding of Features
The feature information first passes through the transformer layers. The detailed
architecture of the transformer model is introduced in Section 3.4. Generally, it has
two specific components: an encoder and a decoder. Given the time series features $X_{feat} = (X_{feat_1}, X_{feat_2}, \ldots, X_{feat_T})$ at each time step $t$, the hidden state of the encoder
is calculated using feedforward layers, which are then multiplied by multi-head attention.
The relationship between the input and the multi-head attention is given by Equation (12).
The output of the multi-head attention is converted into probabilistic values using a softmax
activation function via Equation (11). Furthermore, the softmax function computes the
attention weights $\alpha_t$, which determine how much focus each time step should receive. At sharp change points, certain time steps, $t_1, t_2, \ldots$, will have higher attention scores, indicat-
ing their importance for predicting future stock prices. These attention scores are then used
to compute a weighted sum of the input features, which forms the attention-based output
for each time step. This allows the model to capture the most relevant temporal patterns in
the data.
4.1.2. Short-Time Fourier Transform Based 1D Convolution
One of the main features of stock data is the presence of sharp points. By the term sharp point, we refer to those timestamps where the stock price exhibits a sharp decrease or increase. Mathematically, we can represent these points using
the derivative of the function. Let $P(t)$ be the function that represents the stock price with respect to time. Suppose the stock price has a sharp change at a point $t = t_0$, which can be formulated as

$$\lim_{\epsilon \to 0} \left.\frac{dP}{dt}\right|_{t = t_0 \pm \epsilon} \to \begin{cases} \mp\infty, & \text{if } \epsilon \to 0^{-} \\ \pm\infty, & \text{if } \epsilon \to 0^{+} \end{cases} \quad (15)$$
where $\epsilon$ denotes the infinitesimal change. Due to their complex non-linearity, modeling these sharp points has been one of the key research fields in stock price modeling.
We implement a 1D CNN model to extract short time period-based features from the
transformer-encoded output. In particular, we have used a Short-Time Fourier Transform
(STFT) based 1D convolution mechanism to extract features, which are modeled according
to short time windows. According to the convolution theorem, the convolution operation
in the time domain is equivalent to multiplication in the frequency domain. When working
with the STFT, this relationship still holds, but STFT adds a time-localized element due to
the windowing of the signal. Here, we define the input to the convolutional model as a first function, $f(t)$, and the kernel-based weight function, $g(t)$, which is also a time-based operator. In the following, we discuss the proposed STFT-based convolution model, which
is efficient in predicting sharp change points in the stock features.
The convolution of two functions $f(t)$ and $g(t)$ in the time domain is given by

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \quad (16)$$
where $\tau$ denotes the integration variable. The STFT of a function $f(t)$ with a window function $w(t)$ is defined as

$$\mathrm{STFT}_f(t, \omega) = \int_{-\infty}^{\infty} f(\tau)\, w(\tau - t)\, e^{-i\omega\tau}\, d\tau \quad (17)$$

Here, $\omega$ denotes the angular frequency.
Convolution Theorem for STFT: In the frequency domain, the convolution of two time-domain signals $f(t)$ and $g(t)$ becomes a multiplication of their Fourier transforms:

$$\mathcal{F}\{f * g\}(\omega) = \mathcal{F}\{f\}(\omega) \cdot \mathcal{F}\{g\}(\omega) \quad (18)$$
where $\mathcal{F}$ denotes the Fourier transform. For the STFT, we use windowed signals. The STFTs of $f(t)$ and $g(t)$ are given by Equations (17) and (19), respectively.

$$\mathrm{STFT}_g(t, \omega) = \int_{-\infty}^{\infty} g(\tau)\, w(\tau - t)\, e^{-i\omega\tau}\, d\tau \quad (19)$$
In the context of STFT, where localized Fourier transforms are performed on windowed
signals, the convolution becomes the pointwise product of their STFTs in the time-
frequency domain.
$$\mathrm{STFT}_{f*g}(t, \omega) = \mathrm{STFT}_f(t, \omega) \cdot \mathrm{STFT}_g(t, \omega) \quad (20)$$
This means that the convolution of two signals in the time domain corresponds to the
multiplication of their STFTs for each time window.
Time-Frequency Interpretation of Convolution: In STFT-based convolution, signals
are first decomposed into their time-frequency components. For each time step
t
,
we perform convolution between the corresponding time-frequency components of
two signals. The STFT allows us to track how the interaction between the signals
evolves over time and frequency, as formulated by Equation (20).
Inverse STFT of Convolution: After performing the STFT-based convolution, the
convolved signal can be reconstructed by applying the inverse STFT (ISTFT).
$$f * g = \mathrm{ISTFT}\big(\mathrm{STFT}_f(t, \omega) \cdot \mathrm{STFT}_g(t, \omega)\big) \quad (21)$$
This converts the time-frequency representation back into the time domain, which
allows us to recover the convolved signal.
The convolution operation in the time domain is represented as a pointwise multi-
plication of STFTs in the time-frequency domain. This method is particularly useful for
analyzing non-stationary signals and allows the convolution to be performed locally in both
time and frequency domains. The window size
w
determines how well the sharp changes
can be captured. A shorter window provides better time resolution, which implies that the
sharp change can be detected more precisely at the time it happens. So, considering a small
window size, the model can capture sharp change points. We have considered a rectangular
kernel for this modeling. Mathematically, this means that if a sharp change occurs at time $t_0$, the STFT will produce high-frequency components within the short window centered around $t_0$, enabling the detection of the sharp change both in time and frequency.
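The pointwise STFT product of Equations (20) and (21) can be illustrated with SciPy as follows. Note that multiplying STFTs window by window only approximates the full linear convolution (each window effectively sees a truncated convolution), and the rectangular window length chosen here is an arbitrary illustration rather than the setting used in the CLT model.

```python
import numpy as np
from scipy.signal import stft, istft

def stft_pointwise_convolve(f, g, nperseg=64):
    """Approximate (f * g) via pointwise multiplication of STFTs, Eqs. (20)-(21)."""
    _, _, F = stft(f, window="boxcar", nperseg=nperseg)   # rectangular window
    _, _, G = stft(g, window="boxcar", nperseg=nperseg)
    _, y = istft(F * G, window="boxcar", nperseg=nperseg)
    return y

t = np.linspace(0, 1, 1024)
f = np.sin(2 * np.pi * 5 * t)
f[512:] += 2.0                       # an artificial sharp change point
g = np.exp(-np.arange(1024) / 32)    # illustrative kernel function
y = stft_pointwise_convolve(f, g)
```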
4.2. Trunk Network
The timestamp information is used to model the trunk net of the proposed CLT model.
This is performed by using LSTM layers to capture the long-term dependencies and smooth
transitions in the time series data.
The input to the trunk net is the sequence of timestamps, $t_1, t_2, \ldots, t_T$, where $T$ is the
number of time steps. The LSTM processes this sequence to learn how the stock prices
evolve over time. At each time step $t$, the hidden state $H_t \in \mathbb{R}^{D}$ (where $D$ is the number of LSTM units) is updated based on the previous hidden state $H_{t-1}$ and the current timestamp $t$, which can be expressed as

$$H_t = \mathrm{LSTM}(t, H_{t-1}, C_{t-1}) \quad (22)$$
where $C_t \in \mathbb{R}^{D}$ is the cell state at time step $t$. An LSTM unit has three gates to control the
flow of information: an input gate, a forget gate, and an output gate, as formulated by
Equations (6), (7), and (9), respectively.
The LSTM hidden states evolve over time, allowing the model to learn both short-
term and long-term dependencies in the stock prices. The final output of the trunk net
is the sequence of hidden states $H = [H_1, H_2, \ldots, H_T] \in \mathbb{R}^{T \times D}$, which represents the time-dependent features of the stock prices.
4.3. Multilayer Perceptron Model
The outputs from the branch net and trunk net are concatenated to form a unified
feature representation of the stock prices.
$$\mathrm{Concat}(F_{\mathrm{BranchNet}}, H_{\mathrm{TrunkNet}}) \in \mathbb{R}^{T \times (C + D)} \quad (23)$$
This concatenated feature vector holds both the temporal significance and the time-dependent
features. This combined representation is then passed through a series of fully connected
layers to make the final stock price predictions.
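The fusion in Equation (23) followed by fully connected layers can be sketched as below; the tensors stand in for the branch and trunk outputs with hypothetical channel counts C and D, and reading out the last time step is one possible choice for producing a single prediction, not necessarily the exact readout used in the CLT model.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: T time steps, C branch channels, D LSTM units
T, C, D = 15, 32, 32
F_branch = torch.randn(8, T, C)    # output of the branch net (batch of 8)
H_trunk = torch.randn(8, T, D)     # hidden-state sequence from the trunk net

fused = torch.cat([F_branch, H_trunk], dim=-1)           # (8, T, C + D), Eq. (23)
head = nn.Sequential(nn.Linear(C + D, 32), nn.ReLU(), nn.Linear(32, 1))
prediction = head(fused[:, -1, :])                        # predict from the last step
```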
5. Experimental Setup
This section introduces the experimental setup used to perform the modeling and
analysis and presents the training configuration of the proposed CLT model.
All experiments were performed on a 10th-generation computing system equipped with
an Intel i5 processor and one NVIDIA GeForce RTX 3050 GPU (8 GB). The system has a RAM
of 16 GB and supports eight processing threads. PyCharm 2022.2.1 and Google Colab with
Python 3.9.13, alongside essential libraries such as PyTorch, Numpy, Pandas, Matplotlib, etc.,
were utilized for building the model and data manipulation and preprocessing.
5.1. Dataset
We used the G-Research Crypto dataset for modeling purposes. This dataset is sourced
from a Kaggle open-source dataset pool. It is created by combining historical trades of
several crypto assets, such as Bitcoin and Ethereum. The G-Research Crypto dataset is
composed of several files, including a training file, which contains all the historical data
for training the model, and an asset details file, which contains information about each
cryptoasset and the weights of the cryptoassets. In addition, there are supplementary data
and an unoptimized version of the time series API files for offline training. The features
present in the training set are given below:
1. timestamp: minute-by-minute coverage of data;
2. Asset_ID: ID code corresponding to a specific cryptocurrency;
3. Count: total number of trades in the time interval;
4. Open: the opening price of a crypto asset in USD;
5. High: the highest price of a crypto asset in USD during the time interval;
6. Low: the lowest price of a crypto asset in USD during the time interval;
7. Close: the closing price of a crypto asset in USD (end of the time interval);
8. Volume: units of crypto assets traded;
9. VWAP: volume weighted average price of an asset over the time interval;
10. Target: residualized log returns of the asset over a 15 min horizon.
The G-Research Crypto dataset is a massive dataset with approximately 20 million
data points. The historical asset data are stored at a minute-by-minute interval, with timestamps given as Unix epoch values (i.e., counted from 1970). There are 14 assets present in the dataset, as shown in Table 1. Based
on the experimental analysis, it is revealed that the model performance saturates with
the training data split over 55%. Moreover, considering the size of the dataset, a large
portion of the data for validation turns out to be redundant. Consequently, we split the
dataset into training (55%), validation (1%), and testing sets (44%), respectively, to ensure a
comprehensive evaluation of the model. The dataset is preprocessed for model application.
Since most of the features are numerical, the data are normalized to reduce the volatility
of datasets.
Additionally, we also use stock datasets for performing a comparative study of the
model. For this purpose, we have collected the stock data from the Yahoo Finance API
available publicly. The data are collected for a span of 20 years, and the interval considera-
tion is on a daily basis. The average number of data points is on the order of $10^4$. For the
stock indices, since the number of data points is not as large as in the G-Research Crypto data, we have considered a train-test split of 0.8 and 0.2, respectively.
Table 1. Cryptocurrencies in the G-Research Crypto Dataset.
Asset_ID Weight Asset_Name
2 2.397895 Bitcoin Cash
0 4.304065 Binance Coin
1 6.779922 Bitcoin
5 1.386294 EOS.IO
7 2.079442 Ethereum Classic
6 5.894403 Ethereum
9 2.397895 Litecoin
11 1.609438 Monero
13 1.791759 TRON
12 2.079442 Stellar
3 4.406719 Cardano
8 1.098612 IOTA
10 1.098612 Maker
4 3.555348 Dogecoin
5.2. Data Processing
The time series data considered in this study necessitates a robust representation
approach in order to preserve the temporal features of the dataset. To achieve this, we
have efficiently structured and sampled our data. Instead of considering the entire dataset
as a single entity, a frame-based approach is employed where each frame corresponds to
a specific time interval. Subsequently, the frames are shuffled based on their duration,
leading to batch representations of different time frames throughout the entire dataset,
which serve as inputs to our model. In addition, the dataset is divided into two different
parts: the temporal part and the feature part. The former holds the timestamp information,
whereas the latter contains features of the dataset. This segregation of the dataset enables
us to adopt more specific and practical feature engineering methods when operating
and improving each part separately. More importantly, this approach ensures that the
temporal characteristics of the data are maintained for the systematic execution of the
feature engineering process.
Feature engineering plays a crucial role in preparing data for modeling. This is
particularly important when dealing with time information typically provided in a date-
time format. The temporal data are converted into timestamp format to better fit our model.
Then, we apply scaling techniques to keep the temporal data within a uniform range of 0 to
1. Furthermore, this scaling procedure is extended to the entire feature set. The scaling of
the data not only ensures consistency but also enhances the numerical stability of the model
and mitigates issues related to exploding and vanishing gradients. This preprocessing of
data ensures that our chosen models can perform calculations effectively across the dataset,
enabling robust and accurate analysis. Our methodology involves applying selected models
to various data segments, yielding valuable insights and predictions.
For the trunk net, we are only feeding the timestamp information, a feature in the
dataset that contains minute-by-minute coverage of data. Since we are dealing with a time
series dataset, a modified sampling approach is applied to change the format of the input
to the trunk net. Figure 4 illustrates the procedure for preparing data that are fed to the trunk net. Given a long time series for a stock, we divide the data into smaller chunks based on a particular period $p$. Suppose the length of the original data is $L$. After dividing the dataset into periods, the resulting data shape is $(\lfloor L/p \rfloor, p)$, which is fed to the trunk net. Here, $\lfloor \cdot \rfloor$ denotes the floor function. Similarly, for the input to the branch net, we obtain a data shape of $(\lfloor L/p \rfloor, p, k)$, where $k$ is the number of features present.
Figure 4. Data sampling method.
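The reshaping described above can be expressed compactly with NumPy; the sketch below drops the incomplete tail frame so that the series splits evenly into ⌊L/p⌋ frames, with illustrative values for L, k, and p.

```python
import numpy as np

def frame_series(timestamps, features, p):
    """Split a series of length L into ⌊L/p⌋ frames of period p (Section 5.2)."""
    L = len(timestamps)
    n = (L // p) * p                                   # drop the incomplete tail frame
    trunk_input = timestamps[:n].reshape(-1, p)                       # (⌊L/p⌋, p)
    branch_input = features[:n].reshape(-1, p, features.shape[-1])    # (⌊L/p⌋, p, k)
    return trunk_input, branch_input

ts = np.arange(1000)               # illustrative minute timestamps
X = np.random.randn(1000, 8)       # k = 8 features
trunk_in, branch_in = frame_series(ts, X, p=60)
```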
The ultimate task in the proposed work is the financial time series regression. Here,
we predict the target variable, which is derived from the log returns $R_a$ over a quarter of an hour. The log return of an asset $a$ over an interval of 15 min can be expressed as

$$R_a(t) = \log\big(P_a(t+16)/P_a(t+1)\big) \quad (24)$$
where $P_a(t)$ denotes the closing price of an asset $a$ at time $t$. Since crypto asset returns are highly correlated with the overall crypto market, we perform linear residualization by removing the market signal from individual asset returns when creating the target variable. Let

$$M(t) = \frac{\sum_{a} w_a R_a(t)}{\sum_{a} w_a} \quad (25)$$

be the weighted average market return. Accordingly, the target variable can be expressed as

$$\mathrm{Target}_a(t) = R_a(t) - \beta_a M(t) \quad (26)$$
where

$$\beta_a = \frac{\langle M \cdot R_a \rangle}{\langle M^2 \rangle} \quad (27)$$

Here, $\langle \cdot \rangle$ is the rolling average over a 3750 min window. Moreover, the same asset weights $w_a$ are used for the evaluation metric.
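A pandas sketch of the target construction in Equations (24)–(27) is shown below; the DataFrame layout (one column per Asset_ID) and the handling of the rolling averages are assumptions made for illustration rather than a reference implementation.

```python
import numpy as np
import pandas as pd

def log_return_15min(close: pd.Series) -> pd.Series:
    """Equation (24): R_a(t) = log(P_a(t+16) / P_a(t+1)) on minute bars."""
    return np.log(close.shift(-16) / close.shift(-1))

def residualized_target(returns: pd.DataFrame, weights: pd.Series,
                        window: int = 3750) -> pd.DataFrame:
    """Equations (25)-(27): remove the weighted market return from each asset."""
    m = (returns * weights).sum(axis=1) / weights.sum()           # market return M(t)
    beta = (returns.mul(m, axis=0)).rolling(window).mean() \
               .div(m.pow(2).rolling(window).mean(), axis=0)      # beta_a(t)
    return returns - beta.mul(m, axis=0)

# Illustrative synthetic prices: one column per (assumed) Asset_ID
idx = pd.date_range("2021-01-01", periods=5000, freq="min")
prices = pd.DataFrame(np.exp(np.random.randn(5000, 3).cumsum(0) * 0.001),
                      index=idx, columns=[0, 1, 6])
rets = prices.apply(log_return_15min)
target = residualized_target(rets, pd.Series({0: 4.30, 1: 6.78, 6: 5.89}))
```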
5.3. Training Configuration and Hyperparameter Tuning
We conducted several experiments with different configurations on the training set to
determine optimal hyperparameters for the proposed CLT model. However, it is essential
to strike a balance between the model’s complexity and its performance due to limited
computing resources. As such, a major limitation of any deep learning model is that with
the increased complexity of the model, the probability of overfitting also increases. While
a very shallow model underfits the dataset, very deep models incorporate unnecessary
complexity and also increase the computational time. As a large-scale model, optimization
of hyperparameters based on grid search for the proposed CLT model is computationally
expensive. Therefore, we opted for manual tuning of hyperparameters.
The CLT model’s architecture is defined by several hyperparameters. We have shown
these hyperparameters and their details in Table 2. The transformer layers’ complexity
is controlled by nhead_transformer (number of attention heads) and dim_feedforward
_transformer (dimension of feedforward network). The number of transformer layers is
determined by num_layers_transformer. The extractor model’s complexity is influenced by
hidden_dims_extractor (number of hidden units) and kernel_size_extractor (convolution
kernel size). The complexity of the LSTM model is defined by num_classes_lstm (number of
output classes), hidden_size_lstm (number of hidden units), and num_layers_lstm (number
of layers). Finally, the MLP’s complexity is determined by hidden_dims_mlp (number of
hidden units). These hyperparameters significantly impact the model’s performance and
computational cost.
The optimum set of hyperparameters is selected through experiments, considering
both the performance and computational complexity of the model. Moreover, we have
observed that the optimum set of hyperparameters is different for the CLT model on
stock datasets and the Crypto assets dataset. The parameter nhead_transformer and input
dimensions for different components of the model depend on the data dimension. Table 3
shows the detailed configuration of different modules in the proposed CLT model for the
G-Research Crypto dataset. In addition, we used a dataloader with a batch size of 512, the Adam optimizer with a learning rate of $1 \times 10^{-6}$, and 500 epochs for training the model. In the next section, we conduct more thorough experiments to give some intuition behind the
chosen values of these hyperparameters.
Table 2. Description of hyperparameters for the CLT model.
Hyperparameter Name Details
nhead_transformer The number of attention heads.
dim_feedforward_transformer The dimension of the feedforward network.
num_layers_transformer The number of transformer layers.
hidden_dims_extractor The number of hidden units in the extractor.
out_dims_extractor The number of output features from the extractor model.
kernel_size_extractor The size of the convolution kernel in the extractor model.
num_classes_lstm The number of output classes for the LSTM model
hidden_size_lstm The number of hidden units in each LSTM layer.
num_layers_lstm The number of LSTM layers.
hidden_dims_mlp The number of hidden units in the MLP’s hidden layers.
Table 3. Hyperparameter settings for the CLT model.
Hyperparameter Value
nhead_transformer 8
dim_feedforward_transformer 128
num_layers_transformer 16
hidden_dims_extractor 32
kernel_size_extractor 3
num_classes_lstm 2
hidden_size_lstm 32
num_layers_lstm 1
hidden_dims_mlp 32
6. Results and Discussion
In this section, we conducted extensive experiments to evaluate the prediction per-
formance of the proposed CLT model. For evaluation, we use a series of quantitative
indicators, including Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean
Absolute Error (MAE), Mean Squared Log Error (MSLE), and the $R^2$ score, to test the prediction
performance of our proposed CLT model. In addition, we also discuss the qualitative
aspects of the model predictions by examining specific examples and patterns observed in
time series data. The discussion highlights the advantages of the proposed CLT model and
explores possible limitations and areas for improvement.
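The quantitative indicators listed above are standard regression metrics and can be computed off the shelf; the sketch below uses scikit-learn with small illustrative arrays standing in for the actual and predicted values.

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_squared_log_error, r2_score)

y_true = np.array([0.10, 0.12, 0.08, 0.15])   # illustrative ground-truth values
y_pred = np.array([0.11, 0.10, 0.09, 0.14])   # illustrative model predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
msle = mean_squared_log_error(y_true, y_pred)   # requires non-negative values
r2 = r2_score(y_true, y_pred)
```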
6.1. Model Training and Evaluation
To verify the prediction accuracy of the proposed CLT model, the loss curves for
training and validation sets are plotted in Figure 5. The convergence of training and
validation losses indicates a good generalized model with minimal bias and variance. The
initial rapid decrease followed by a transient increase in the loss curves is a characteristic
response of the model adaptation to the intricacies of the training data. Subsequently,
the curves become stable as the model converges to a lower loss (MSE) of 0.0002 after
300 epochs, emphasizing the effectiveness of the proposed CLT model for capturing intricate
patterns in the time series data.
Figure 5. Loss curve on training and validation sets.
Another group of experiments was carried out to investigate the effect of the number
of hidden layers and nodes on the prediction performance of the proposed CLT model. The
number of hidden layers and nodes are crucial parameters of a deep learning model. An
increase in the number of nodes or layers enhances the model complexity. However, it may
also make the model prone to overfitting. The prediction accuracy of the proposed CLT
model with varying numbers of nodes and hidden layers is compared in Figures 6 and 7.
From Figure 6, we can see that as the number of nodes increases, the prediction
accuracy of the proposed CLT model improves. Moreover, the alignment of the predicted
and ground truth curves is obvious, signifying the model’s proficiency in replicating the
underlying trends present in the dataset. Although minor deviations exist, particularly in
the peak positions where the model tends to overestimate slightly, the overall prediction
accuracy of the proposed CLT model is commendable. Notably, the model demonstrates a
remarkable ability to anticipate the highs and lows of the curve, demonstrating a nuanced
understanding of the temporal dynamics in the time series data. Additionally, the sharpness
of peaks in the prediction curves suggests an improved ability of the proposed model to
capture sudden fluctuations compared to ground truth values. This qualitative assessment
substantiates the quantitative success indicated by the low MSE, collectively portraying a
well-performing and reliable model for stock price prediction.
Figure 6. Comparison of the prediction accuracy with varying number of nodes in a hidden layer.
Figure 7. Comparison of the prediction accuracy with varying number of hidden layers.
Similarly, in Figure 7, we identify the optimal depth for the model. We analyze the
performance of the proposed CLT model with a varying number of hidden layers. It is
observed from Figure 7 that as the number of hidden layers increases, notable improve-
ments are observed. The model with six hidden layers demonstrates the best performance
compared to two and four hidden layers. However, as the depth of the model is further
increased to eight hidden layers, a decrease in the performance efficiency becomes apparent,
indicating the onset of vanishing gradient issues. Hence, it is inferred that the optimal
depth for our model is six hidden layers. This depth balances the model complexity and
prevents issues associated with deep architectures, enhancing predictive capabilities.
6.2. Model Comparison
In this section, we compare the prediction performance of the proposed CLT model with existing machine learning techniques, such as Random Forest, XGBoost, and LightGBM, as well as with neural network variants, including ANN, RNN, LSTM, and CNN-LSTM. The values of the evaluation metrics for the different models are tabulated in Table 4. Table 4 shows that the proposed CLT model outperforms all compared models on every evaluation metric, demonstrating its exceptional prediction accuracy. Among the neural network architectures, the CNN-LSTM model is notably effective, outperforming variants such as LSTM and RNN. Notably, the traditional ANN exhibits comparatively poor performance on the dataset, supporting the argument that ANN models may not be the most suitable choice for time series modeling, especially given the intricacies of stock price data. The comparison with state-of-the-art machine learning techniques, including LightGBM, XGBoost, and Random Forest, reaffirms the inefficacy of the ANN in this domain. These ensemble methods, particularly LightGBM with tuned hyperparameters, exhibit robust performance, showcasing their suitability for capturing complex patterns within time series data. Nevertheless, our proposed CLT model surpasses all comparative models in terms of MSE, RMSE, MAE, and MSLE, emphasizing its overall superiority. The ensemble methods, while competitive, fall short of the accuracy achieved by the proposed CLT model, reinforcing its efficacy in the intricate task of stock price prediction. This collective analysis positions the CLT model as an innovative and high-performing solution for time series modeling.
Table 4. Comparison of the prediction performance of various models.
Model MSE RMSE MAE MSLE
Random Forest 0.0098 0.098 0.0132 0.0166
XGBoost 0.0074 0.086 0.0072 0.0058
LightGBM 0.0034 0.058 0.0047 0.0044
ANN 0.0168 0.129 0.1143 0.0221
RNN 0.0086 0.092 0.0633 0.0076
LSTM 0.0026 0.050 0.0194 0.0011
CNN-LSTM 0.0008 0.0282 0.0093 0.0003
CLT 0.0002 0.0141 0.0059 0.0001
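For reference, the four evaluation metrics reported in Table 4 can be computed as sketched below; the y_true and y_pred arrays are placeholders for the normalized test targets and model predictions.

```python
# Sketch of the evaluation metrics used in Table 4 (MSE, RMSE, MAE, MSLE),
# computed with scikit-learn on placeholder prediction arrays.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, mean_squared_log_error

y_true = np.array([0.21, 0.25, 0.24, 0.30, 0.28])   # illustrative normalized targets
y_pred = np.array([0.20, 0.26, 0.25, 0.29, 0.27])   # illustrative predictions

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
msle = mean_squared_log_error(y_true, y_pred)       # requires non-negative values
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}  MSLE={msle:.4f}")
```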
6.3. Model Performance on Standard Cryptocurrencies
In this section, we demonstrate the performance of the proposed CLT model on the standard cryptocurrencies present in the G-Research crypto dataset. The dataset contains 13 different crypto assets; however, we consider the five cryptocurrencies with the highest weights according to Table 1, namely Bitcoin, Ethereum, Binance Coin, Dogecoin, and Cardano. We grouped the data by asset id and modeled each crypto asset using the CLT model. We compare the performance of the CLT model with traditional and state-of-the-art models, namely the CNN-LSTM, LSTM, RNN, and ANN models. Table 5 shows the results of the comparative study on the different cryptocurrencies; MSE, RMSE, MAE, and the R2 score were used as evaluation metrics. From Table 5, we can observe that the CLT model outperforms the state-of-the-art models. The proposed CLT model achieves an R2 score of 0.9734 for Bitcoin, and its R2 scores range from 0.95 to 0.97 for the other assets, whereas the CNN-LSTM model achieves scores ranging from 0.88 to 0.93. Additionally, the RNN and ANN models fail to generalize, and their metric values are poor. From this analysis, we infer that the proposed CLT model outperforms both the traditional and state-of-the-art techniques.
Table 5. Performance comparison of various models for different cryptocurrencies.
Index Loss/Metric CLT CNN-LSTM LSTM RNN ANN
Bitcoin MSE 0.0007 0.0021 0.0047 0.0092 0.0227
 RMSE 0.0264 0.0458 0.0685 0.0959 0.1016
 MAE 0.0237 0.0378 0.0603 0.0882 0.0812
 R2 Score 0.9734 0.9335 0.8053 0.6219 0.5729
Ethereum MSE 0.0006 0.0027 0.0031 0.0046 0.0162
 RMSE 0.0245 0.0519 0.0556 0.0678 0.127
 MAE 0.0219 0.0438 0.0683 0.0811 0.146
 R2 Score 0.9767 0.9116 0.6843 0.6229 0.5961
Binance Coin MSE 0.0009 0.0037 0.0082 0.0143 0.0372
 RMSE 0.03 0.0608 0.0905 0.1196 0.1928
 MAE 0.0218 0.0732 0.8115 0.0991 0.1741
 R2 Score 0.9611 0.9043 0.7355 0.6592 0.5066
Dogecoin MSE 0.0008 0.0026 0.0043 0.0061 0.01
 RMSE 0.0282 0.0509 0.0655 0.0781 0.1
 MAE 0.0214 0.0553 0.0626 0.0834 0.0982
 R2 Score 0.9464 0.8817 0.7291 0.6711 0.5916
Cardano MSE 0.0009 0.0031 0.0055 0.0086 0.0078
 RMSE 0.0309 0.0556 0.0741 0.0927 0.0883
 MAE 0.0384 0.0591 0.0683 0.0818 0.0947
 R2 Score 0.9518 0.8816 0.7661 0.6834 0.5871
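A schematic version of the per-asset protocol is sketched below; the file name and column names ("Asset_ID", "Close") are assumptions about the Kaggle CSV layout, and a naive persistence forecast stands in for the trained CLT and baseline models.

```python
# Sketch of the per-asset evaluation behind Table 5: group the G-Research crypto data
# by asset, hold out the last 20% of each series, and score a forecast with R2.
import pandas as pd
from sklearn.metrics import r2_score

df = pd.read_csv("train.csv")                        # assumed G-Research crypto file
for asset_id, asset_df in df.groupby("Asset_ID"):    # assumed column name
    prices = asset_df["Close"].to_numpy()            # assumed column name
    split = int(0.8 * len(prices))
    test = prices[split:]
    preds = prices[split - 1:-1]                      # persistence forecast as a stand-in
    print(asset_id, "persistence R2:", round(r2_score(test, preds), 4))
```

In the actual experiments, the persistence stand-in would be replaced by the trained CLT model and the compared baselines.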
6.4. Model Comparison on Standard Indices
We further demonstrate the performance of the proposed CLT model on different
international standard indices in comparison to the existing state-of-the-art deep learn-
ing models.
We consider eight separate indices for our analysis. The AEX index, originating
from the Amsterdam Exchange Index, represents a stock market index comprising Dutch
companies traded on Euronext Amsterdam, previously recognized as the Amsterdam Stock
Exchange. Its inception dates back to 1983, and the index typically includes up to 25 of
the most actively traded securities on the exchange. The Austrian Traded Index (ATX)
is the primary stock market indicator for the Wiener Börse. Similar to the majority of
European indices, the ATX is categorized as a price index and presently encompasses a
total of 20 stocks. The CAC 40 or FCHI serves as a stock market indicator for the French
stock exchange, which is composed of 40 companies listed on the Euronext Paris exchange.
Utilizing a free-float market capitalization approach, the index assesses the weight of each
stock. CAC 40 is an acronym for Cotation Assistée en Continu, denoting “continuous
assisted trading”. This index is a benchmark for funds engaged in investments within
the French stock market. The FTSE 100 Index, commonly known as the Financial Times
Stock Exchange 100 Index, FTSE 100, FTSE, or colloquially as the “Footsie”, represents a
stock index comprising the top 100 companies listed on the London Stock Exchange based
on their highest market capitalization. The Hang Seng Index functions as a stock market indicator in Hong Kong, employing a free-float-adjusted, market-capitalization-weighted methodology. This index tracks daily fluctuations in the most prominent companies on the Hong Kong stock market, serving as the primary gauge of the overall market's performance in the region. Bursa Malaysia functions as the primary stock exchange in Malaysia and is one of the largest trading platforms in the ASEAN region. Headquartered in Kuala Lumpur, it was formerly known as the Kuala Lumpur Stock Exchange. The S&P 100 Index is a stock market index of United States stocks maintained by Standard & Poor's. Index options on the S&P 100 are traded under the ticker symbol "OEX", and because of these options' widespread use and appeal, investors frequently refer to the index by this ticker.
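The index histories analyzed here come from Kaggle; an equivalent set of daily series could, for example, be retrieved with the yfinance package as sketched below, where the Yahoo Finance ticker symbols and date range are assumptions for illustration only.

```python
# Illustrative retrieval of the eight index series; tickers and dates are assumptions.
import yfinance as yf

tickers = {"AEX": "^AEX", "ATX": "^ATX", "FCHI": "^FCHI", "FTSE": "^FTSE",
           "HSI": "^HSI", "JKSE": "^JKSE", "KLSE": "^KLSE", "OEX": "^OEX"}
closes = {name: yf.download(symbol, start="2015-01-01", end="2023-07-01")["Close"]
          for name, symbol in tickers.items()}
print({name: len(series) for name, series in closes.items()})
```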
Table 6 presents the metric values for the various indices, reflecting the performance of the different models on the regression problem; MSE, RMSE, MAE, and the R2 score were used as evaluation metrics. Figure 8 visually compares the different models on the AEX index. From Table 6, it is evident that the proposed CLT model is the most effective, with the lowest MSE of 0.0004 and an R2 score of 0.9826. The CNN-LSTM model outperforms the LSTM and RNN models, while the ANN model performs comparatively poorly. Similarly, the results for the ATX index, illustrated in Figure 9, align with the tabulated data and showcase the superiority of the CLT model over the other models. As can be observed from Table 6, this consistent trend extends to the other indices, including FCHI, FTSE, and HSI. The results for JKSE, KLSE, and OEX show that the performance of the ANN model is notably inferior, whereas the proposed CLT model consistently performs best. We refer the reader to Appendix A for figures of the remaining indices.
Figure 8. Comparison of the results produced by various models for the AEX.
Figure 9. Comparison of the results produced by various models for the ATX.
Table 6. Performance comparison of various models for different indices.
Index Loss/Metric CLT CNN-LSTM LSTM RNN ANN
AEX MSE 0.0004 0.0016 0.0057 0.0082 0.0103
 RMSE 0.0205 0.0394 0.0753 0.0903 0.1016
 MAE 0.0161 0.0315 0.0612 0.069 0.0812
 R2 Score 0.9826 0.9357 0.7654 0.6627 0.5729
ATX MSE 0.0005 0.0018 0.0046 0.0083 0.01
 RMSE 0.0214 0.0419 0.068 0.0911 0.0998
 MAE 0.0172 0.0338 0.0543 0.0716 0.0797
 R2 Score 0.9793 0.9207 0.7914 0.6257 0.5503
FCHI MSE 0.0004 0.0019 0.0055 0.0083 0.0103
 RMSE 0.0207 0.0439 0.0741 0.0913 0.1014
 MAE 0.0165 0.034 0.0606 0.073 0.0834
 R2 Score 0.9761 0.8923 0.6938 0.5346 0.4266
FTSE MSE 0.0004 0.002 0.005 0.0091 0.01
 RMSE 0.0189 0.045 0.0705 0.0955 0.1001
 MAE 0.0142 0.0361 0.0563 0.075 0.0792
 R2 Score 0.9813 0.8941 0.7401 0.5237 0.4771
HSI MSE 0.0005 0.0017 0.0044 0.0074 0.0078
 RMSE 0.0213 0.0415 0.0661 0.086 0.0881
 MAE 0.0172 0.0334 0.054 0.0677 0.0727
 R2 Score 0.9881 0.9546 0.8847 0.8048 0.795
JKSE MSE 0.0004 0.0017 0.0061 0.0065 0.0106
 RMSE 0.0204 0.041 0.0781 0.0806 0.1031
 MAE 0.0163 0.0331 0.0631 0.0651 0.0816
 R2 Score 0.9546 0.8174 0.337 0.2923 0.156
KLSE MSE 0.0005 0.0018 0.0057 0.0082 0.0108
 RMSE 0.0215 0.0428 0.0754 0.0904 0.1037
 MAE 0.0169 0.0345 0.0609 0.0713 0.082
 R2 Score 0.9548 0.8216 0.4464 0.2034 0.0475
OEX MSE 0.0005 0.0024 0.0053 0.0079 0.0098
 RMSE 0.0212 0.0486 0.0726 0.0888 0.0992
 MAE 0.0169 0.0382 0.058 0.0705 0.0752
 R2 Score 0.9568 0.7736 0.4954 0.2462 0.0583
In addition to evaluating the prediction performance of the various models across multiple indices, we also analyze their runtimes. The runtime results are shown in Table 7; all reported values are scaled by a factor of 10³ and reflect the computational cost of running 500 epochs. Notably, the proposed CLT model exhibits a higher computational cost than simpler models, such as RNN and LSTM. This increased computational demand can be attributed to the intricacies of the CLT model: unlike the other models, CLT involves the simultaneous training of two distinct networks, the branch net and the trunk net. This dual-training requirement and the overall complexity of the model contribute to an extended runtime.
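The runtime measurements can be sketched as follows; the stand-in LSTM model, synthetic data, and batch size are assumptions, with only the 500-epoch budget taken from the text.

```python
# Sketch of the runtime protocol behind Table 7: time 500 training epochs of a model
# and report the result on the same 10^3-second scale used in the table.
import time
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 30, 8).astype("float32")    # (samples, timesteps, features)
y = np.random.rand(1000, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.LSTM(32, input_shape=(30, 8)),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

start = time.perf_counter()
model.fit(x, y, epochs=500, batch_size=64, verbose=0)
elapsed = time.perf_counter() - start
print(f"500-epoch training time: {elapsed:.1f} s  ({elapsed / 1e3:.4f} x 10^3 s)")
```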
Increasing the model size improves predictive performance, which is advantageous, but it also increases the computational complexity, which is disadvantageous. Conversely, a lighter model with fewer parameters is faster but less accurate. The model size can therefore be chosen according to the requirements of the financial modeling task. Additionally, inference is much cheaper than training; hence, if training can be performed offline, a larger model can be trained and then used for inference in the application. For real-time or online training, however, the trade-off described above must be taken into account.
Table 7. Runtime comparison of various models for different indices.
Index CLT CNN-LSTM LSTM RNN ANN
AEX 8.2 6.4 3.3 2.1 1.6
ATX 7.9 6.6 3.3 1.8 1.8
FCHI 8.6 5.8 3.8 1.9 1.4
FTSE 8.3 6.1 3.5 1.9 1.7
HSI 8.8 6.9 3.3 2.2 1.1
JKSE 8.7 6.8 3.9 2 1.1
KLSE 8.1 6.8 3.5 1.8 1.4
OEX 8 6.1 3.1 2.3 1.2
6.5. Net Value Analysis
In this section, we implement a generic trading strategy based on the predicted values to evaluate the performance and risk profile of the proposed CLT model in terms of net-value metrics, namely the return ratio, volatility, Maximum Drawdown (MDD), and Sharpe ratio. The return ratio of our model is 72.8%, surpassing the 70% threshold that the other models struggle to reach, which indicates robust profitability. The volatility of the predicted results is 1.14, suggesting moderate fluctuations in returns, which is acceptable in the context of our strategy. The MDD of our predicted result is 18.44. This relatively low MDD highlights the resilience of the investment portfolio against significant declines, thereby limiting potential losses, and underscores the robustness of our trading strategy in maintaining value during adverse market conditions. The Sharpe ratio of the predicted results from our proposed model is approximately 1.76. This high Sharpe ratio demonstrates the efficiency of our model in balancing risk and return: a ratio of this magnitude indicates that the model not only generates substantial returns but does so with a manageable level of risk, providing a more stable and rewarding investment strategy than the alternatives. These results show that the proposed model delivers consistent, high-quality trading outcomes.
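The net-value metrics reported above can be computed from an equity (net-value) curve as sketched below; the annualization factor, risk-free rate, and toy curve are assumptions.

```python
# Sketch of the net-value metrics used in this section: total return ratio, annualized
# volatility, maximum drawdown, and Sharpe ratio computed from a net-value curve.
import numpy as np

def net_value_metrics(net_value, risk_free=0.0, periods_per_year=252):
    returns = np.diff(net_value) / net_value[:-1]
    return_ratio = net_value[-1] / net_value[0] - 1.0
    volatility = returns.std(ddof=1) * np.sqrt(periods_per_year)
    running_max = np.maximum.accumulate(net_value)
    max_drawdown = ((net_value - running_max) / running_max).min()
    sharpe = (returns.mean() * periods_per_year - risk_free) / volatility
    return return_ratio, volatility, max_drawdown, sharpe

curve = np.array([1.00, 1.03, 1.01, 1.08, 1.12, 1.05, 1.15, 1.21])  # toy equity curve
print(net_value_metrics(curve))
```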
6.6. Ablation Study
In this section, we present ablation experiments to assess the impact of each module on the prediction performance of the proposed CLT model. Since the model has four different modules, i.e., the Transformer, the 1D CNN, the LSTM, and the MLP, we remove each module individually from the model and adapt the output of the preceding module to the following one to avoid modeling errors. The ablation results are shown in Table 8. It can be observed from Table 8 that the absence of any module degrades the prediction performance of the model. Removing the Transformer module results in the worst performance, which can be attributed to the ability of the Transformer to encode long-term dependencies in the features; without this layer, the model performs feature extraction without any historical context. This implies that the Transformer module contributes significantly to the prediction accuracy of the proposed CLT model. Moreover, ablating the 1D CNN and the LSTM modules yields MSE losses of 0.222 and 0.263, respectively. The reason behind this performance drop is that replacing the LSTM with a plain feedforward (ANN) network for modeling the temporal input causes inconsistency in the feature extraction. Additionally, the ANN tends to overweight certain nodes, whereas the temporal input, being an independent variable, should remain unaffected by such external disturbances. The LSTM also captures the long-term dependencies in this type of sequential data, which indicates that the LSTM in the trunk net is equally important to the performance of the proposed CLT model. Finally, omitting the MLP module yields an MSE loss of 0.164, the smallest accuracy drop among all the modules. This may be due to the fact that the preceding modules already extract rich feature information from the data; consequently, the weights associated with the MLP module remain largely invariant.
Table 8. Ablation results of the CLT model.
Case MSE RMSE MAE MSLE
CLT Model (Baseline) 0.0002 0.0141 0.0059 0.0001
CLT Model without Transformer 0.479 0.692 0.645 0.293
CLT Model without 1D CNN 0.222 0.471 0.311 0.148
CLT Model without LSTM 0.263 0.68 0.278 0.105
CLT Model without MLP 0.164 0.404 0.361 0.094
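The ablation protocol can be sketched with a configurable model builder in which each module is toggled on or off; the layer sizes, input shapes, and the builder itself are schematic assumptions rather than the exact CLT implementation.

```python
# Schematic ablation builder: each keyword switch disables one module, so, e.g.,
# build_variant(use_transformer=False) corresponds to the "CLT without Transformer" row.
import tensorflow as tf

def build_variant(use_transformer=True, use_cnn=True, use_lstm=True, use_mlp=True):
    feat_in = tf.keras.Input(shape=(30, 8))          # branch input: feature sequence
    time_in = tf.keras.Input(shape=(30, 1))          # trunk input: temporal sequence
    b = feat_in
    if use_transformer:
        b = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)(b, b)
    if use_cnn:
        b = tf.keras.layers.Conv1D(16, 3, padding="same", activation="relu")(b)
    b = tf.keras.layers.GlobalAveragePooling1D()(b)
    t = tf.keras.layers.LSTM(16)(time_in) if use_lstm else tf.keras.layers.Flatten()(time_in)
    z = tf.keras.layers.Concatenate()([b, t])
    if use_mlp:
        z = tf.keras.layers.Dense(32, activation="relu")(z)
    out = tf.keras.layers.Dense(1)(z)
    model = tf.keras.Model([feat_in, time_in], out)
    model.compile(optimizer="adam", loss="mse")
    return model

ablated = build_variant(use_transformer=False)       # variant evaluated in Table 8
```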
6.7. Uncertainty and Robustness Analysis
Finally, we evaluate the proposed CLT model in terms of prediction uncertainty and robustness to external perturbations. From Table 4, the MSE loss of the proposed CLT model on the normalized dataset is 0.0002. To characterize the prediction uncertainty of the model, we run the model 100 times and observe the variation in the prediction results; the same procedure is applied to the other models, namely LightGBM, RNN, LSTM, and CNN-LSTM. Figure 10 illustrates the prediction uncertainty of the various models. As can be seen in Figure 10, the fitted normal distributions of all the models are symmetric around their respective mean values. The CLT model shows the lowest standard deviation, 0.0029, and the CNN-LSTM model has a lower deviation, 0.0041, than the other traditional models. Although the LSTM model performs better than LightGBM, the LightGBM model exhibits less uncertainty than the LSTM model. In summary, the level of uncertainty of the proposed CLT model is lower than that of the other standard models.
Figure 10. Model performance variation.
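The uncertainty protocol can be sketched as repeated training and evaluation with different random seeds; the tiny stand-in model, synthetic data, and epoch budget are assumptions, with only the 100 repetitions taken from the text.

```python
# Sketch of the uncertainty analysis: repeat training/evaluation 100 times and report
# the spread (standard deviation) of the resulting test MSE values.
import numpy as np
import tensorflow as tf

x = np.random.rand(500, 16).astype("float32")
y = x.mean(axis=1, keepdims=True)

mses = []
for seed in range(100):                               # 100 repetitions, as in the text
    tf.keras.utils.set_random_seed(seed)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    model.fit(x[:400], y[:400], epochs=20, batch_size=32, verbose=0)
    mses.append(model.evaluate(x[400:], y[400:], verbose=0))

print("mean MSE:", np.mean(mses), " std (uncertainty):", np.std(mses))
```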
Next, we introduce noise into the dataset to evaluate the robustness of the models to external perturbations. In this case, we perturb the dataset using Gaussian noise with a mean of 0.5 and a standard deviation of 1, scaled down by a factor s, where s determines the amount of noise added to the dataset. Table 9 shows the prediction performance of the various models under different noise levels. From Table 9, we can see that the proposed CLT model is highly robust to the introduced noise: its standard deviation values are lower than those of all the other models at every noise level. In contrast, the RNN and LSTM models show low resistance to external perturbations. Moreover, the performance of the CNN-LSTM model is very close to that of the proposed CLT model, which may be due to the fact that the convolution operation significantly enhances the robustness of the model to external perturbations.
Table 9. Model robustness analysis.
Model SD (Scale = 0.01) SD (Scale = 0.001) SD (Scale = 0.002) SD (Scale = 0.005) SD (Scale = 0.0001)
LightGBM 0.0168 0.0144 0.0118 0.0092 0.0089
RNN 0.027 0.0187 0.0141 0.0137 0.0133
LSTM 0.0169 0.0152 0.124 0.0106 0.0097
CNN-LSTM 0.0087 0.0063 0.0055 0.005 0.0046
CLT 0.0085 0.0059 0.0043 0.0035 0.003
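For reference, the perturbation described above can be sketched as follows; the Gaussian parameters (mean 0.5, standard deviation 1) and the scale factors follow the text, while the data array and random seed are placeholders.

```python
# Sketch of the robustness experiment: add Gaussian noise (mean 0.5, std 1) scaled by a
# factor s to the normalized inputs before re-evaluating each model.
import numpy as np

def perturb(x, s, mean=0.5, std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    return x + s * rng.normal(loc=mean, scale=std, size=x.shape)

x = np.random.rand(1000, 16)                          # stand-in for normalized features
for s in (0.01, 0.005, 0.002, 0.001, 0.0001):         # scale factors from Table 9
    x_noisy = perturb(x, s)
    print(f"s={s}: mean absolute perturbation = {np.abs(x_noisy - x).mean():.5f}")
```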
7. Conclusions and Future Work
In this article, a novel DeepONet-inspired CLT model for financial time series prediction is proposed. The proposed CLT model combines established frameworks with an innovative architecture to navigate the complexities of financial data dynamics, infusing self-attention into a CNN-LSTM framework to improve both modeling performance and robustness. The proposed CLT model is composed of two subnetworks: the branch net and the trunk net. We choose an LSTM as the architecture of the trunk net to model temporal features, whereas the branch net incorporates a Transformer followed by a 1D CNN to model feature information. The outputs of the two subnetworks are concatenated and fed into an MLP network for the final prediction. This integrated approach leverages the strengths of the Transformer and the CNN for efficient feature extraction and of the LSTM for learning spatio-temporal features, providing a more comprehensive understanding of the intricate dynamics governing financial time series.
We conducted extensive experiments to demonstrate the prediction performance of the proposed CLT model. Compared with the competing models, the proposed CLT model achieves state-of-the-art predictive performance, attaining the lowest MSE of 0.0002 on the G-Research Crypto dataset. It also outperforms existing deep learning models on various international standard indices across all the evaluation metrics considered in this study. In addition, the proposed CLT model exhibits lower prediction uncertainty and greater robustness to external perturbations. Although the proposed CLT model demonstrates enhanced predictive performance and adaptability, its runtime is the slowest among the compared models. This is because the inherent quadratic time complexity of the self-attention mechanism in the Transformer leads to a longer runtime, even though the dual-training requirement of the model is parallelizable. In future work, it would be interesting to optimize the Transformer module of the proposed CLT model for faster processing. Additionally, we aim to develop an explainable framework for interpreting the predictions of the model; since the proposed model is an aggregate model, converting the framework into a surrogate model for traditional explainability techniques, such as LIME and SHAP, would be difficult.
Author Contributions: Conceptualization, Z.A.; methodology, Z.A. and S.B.; software, Z.A. and
M.C.; validation, Z.A. and S.B.; formal analysis, Z.A. and M.C.; investigation, M.C.; resources, M.C.;
data curation, Z.A.; writing—original draft preparation, Z.A. and S.B.; writing—review and editing,
Z.A. and S.B.; visualization, M.C.; supervision, S.B.; funding acquisition, S.B. All authors have read
and agreed to the published version of the manuscript.
Funding: This work was supported in part by the open research fund of National Mobile Communi-
cations Research Laboratory, Southeast University (No. 2023D15), and in part by the Ningbo Clinical
Research Center for Medical Imaging (No. 2022LYKFYB01).
Data Availability Statement: The datasets analyzed in this study are available at: https://www.kaggle.com/ (accessed on 7 July 2023).
Conflicts of Interest: The authors declare no conflicts of interest.
Appendix A. Additional Figures for Various Indices
Figure A1. Comparison of the results produced by various models for the following: (a) FCHI; (b) FTSE; (c) HSI; (d) JKSE; (e) KLSE; (f) OEX.