
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 216 (2023) 213–220

www.elsevier.com/locate/procedia

10.1016/j.procs.2022.12.129

1877-0509 © 2023 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
Peer-review under responsibility of the scientific committee of the 7th International Conference on Computer Science and Computational Intelligence 2022


7th International Conference on Computer Science and Computational Intelligence 2022

Long short-term memory (LSTM) model-based reinforcement

learning for nonlinear mass spring damper system control

Santo Wijaya a,*, Yaya Heryadi a, Yulyani Arifin a, Wayan Suparta b, Lukas c

a Computer Science Department, BINUS Graduate Program – Doctor of Computer Science, Bina Nusantara University, Jakarta 11480, Indonesia.

b Department of Electrical Engineering, Faculty of Industrial Technology, Institut Teknologi Nasional Yogyakarta, Yogyakarta 55281, Indonesia.

c Cognitive Engineering Research Group (CERG), Universitas Katolik Indonesia Atma Jaya, Jakarta 12930, Indonesia.

Abstract

Neural Network (NN) models incorporated into control system design have been studied, and the results show better performance than mathematical-model approaches. However, some studies consider only offline NN model learning and do not apply online NN model learning directly in the control system. As a result, the controller's performance degrades as the system environment changes over time. The Reinforcement Learning (RL) method has been investigated intensively, especially Model-based RL (Mb-RL) for predicting system dynamics; it performs well in making the system more robust to environmental changes by enabling online learning. This paper proposes online learning of local dynamics using the Mb-RL method with a Long Short-Term Memory (LSTM) model. We adopt a Model Predictive Control (MPC) scheme as the agent of the Mb-RL method to track regulatory trajectory objectives, with a random-shooting policy to search for the minimum of the objective function. A nonlinear Mass Spring Damper (NMSD) system with parameter-varying linear spring stiffness is used to demonstrate the effectiveness of the proposed method. The simulation results show that the proposed method can effectively control a highly oscillatory nonlinear system with good performance.


Keywords: Model-based Reinforcement Learning; Random-Shooting Policy; LSTM; MPC; Nonlinear System

* Corresponding author. Tel.: +62-812-9906-4921.

E-mail address: santo.wijaya001@binus.ac.id



1. Introduction

Research into NN has a lengthy history. In the 1940s, psychologist Donald Hebb [1] gave the earliest description of NN and created the Hebbian learning scheme based on the brain's plasticity process. In 1958, Frank Rosenblatt [2] invented the perceptron, now known as the primary component of neural networks, constructed as a two-layer neural network without a training method. In 1974, Paul Werbos created and published in his Ph.D. thesis the back-propagation algorithm that is now the most extensively used [3]. In the following years, several NN architectures emerged, such as feedforward neural networks [4], Recurrent Neural Networks (RNN) [5], and others. Long Short-Term Memory (LSTM), a derivative of the RNN designed to solve the vanishing-gradient problem, was invented by Hochreiter and Schmidhuber [6]. The LSTM is of great interest in control systems because it is particularly well suited to prediction based on time-series data, where there may be unpredictable delays or uncertainties between events.

Pisa et al. [7] proposed using an LSTM as the model in an Internal Model Control (IMC) scheme applied to a wastewater treatment plant. The results showed that the LSTM-based IMC approach improves the control metrics compared to a traditional PI controller. However, the authors used an offline learning approach to train the LSTM model, which is not robust enough to cope with uncertainties that may occur in the system, especially uncertainties in a nonlinear system that might degrade control performance further. Sabzevari et al. [8] proposed a model-free RL (Mf-RL) method to identify a dynamic model of a three-phase power converter, using a state-space Neural Network (ssNN) with Particle Swarm Optimization (PSO) to update the weights under parameter mismatch, and incorporated it into a Model Predictive Control (MPC) scheme for control of the converter. In that paper, the Mf-RL method succeeded in increasing the accuracy of the model used by the MPC. However, the approach did not directly calculate the control actions required by the control system.

The Reinforcement Learning (RL) method is a research topic that has been investigated intensively in recent years. RL may be utilized to learn from limited labeled data and derive additional essential insights from unlabeled data [9]. Nagabandi et al. [10] proposed combining Mb-RL and Mf-RL methods to accomplish various MuJoCo [11] locomotion tasks. The authors first utilize the Mb-RL method combined with MPC for the initial movement tasks of the locomotion model to achieve robust control with minimal sample data. The Mb-RL approach enables online learning to update a model based on reward, and the model can be used directly in the control system.

This paper proposes a combined offline and online learning approach utilizing the Mb-RL method. In offline learning, an LSTM architecture with minimal weight parameters is determined based on a dataset obtained from the controlled object of the system; a simple LSTM architecture is needed to learn faster in the online learning phase. Online learning is then performed for every defined batch size to update the LSTM model based on the applied control action $u_t$ and the measured state response of the system $x_t$. We consider an MPC scheme with a random-shooting policy that searches for the minimum of the trajectory-error objective function by evaluating sets of random control actions over the given prediction horizon; the first control action of the set that minimizes the objective function is applied to the system at every time step. We also let the LSTM learn the local dynamics, i.e., the rate of change at each time step, to overcome the difficulty of learning when the difference between successive states is too small, especially when the time step is small [10]. The NMSD system is used as the controlled object to show the effectiveness of the proposed method. The paper's primary contribution is two-fold: (i) online learning of local dynamics using the Mb-RL method, with the LSTM as the model in an MPC scheme with a random-shooting policy; and (ii) a demonstration of the effectiveness of the proposed method for trajectory control of the NMSD system with parameter variation in the spring stiffness and random disturbance.

2. Methods

The research method conducted in this paper consists of three phases: the initial phase, the design phase, and the simulation phase, as shown in Fig. 1. In the initial phase, we build a dynamic simulator of the NMSD system and run it to obtain dynamics data of the system's inputs and outputs. The data is divided into training, testing, and validation datasets to determine the minimum number of LSTM nodes through offline training. After that, in the design phase, the MPC is designed to use the LSTM model as a predictor that calculates the optimized action to minimize the objective function within the pre-determined horizon window. In addition, the MPC acts as the agent in an RL framework designed with a random-shooting policy. Lastly, in the simulation phase, we show the effectiveness of the proposed method in controlling the NMSD system to accomplish a regulatory objective.

2.1. Nonlinear Mass Spring Damper System

The NMSD system is an oscillatory system consisting of mass, spring, and damper elements that store kinetic and potential energy. The model is used in many modeling approaches to capture a system's dynamics, such as flexible bodies [12] and mechanical systems [13]. The NMSD system is a mechanical system studied intensively in control theory; however, it is still unavailable in benchmark environments such as MuJoCo.

In this paper, the arbitrary free-body diagram of the NMSD system satisfying Newton's second law of motion is shown in Fig. 2. The NMSD system is assumed to consist of three identical masses, $m_1 = m_2 = m_3 = m$ (kg), with given initial positions. The damper damping constants are also assumed to be identical, $c_1 = c_2 = c_3 = c$. Since a nonlinear spring is considered, the stiffness function is given by $k(x) = k_l + k_n x^2$, with a parameter-varying linear term $k_l$ and a nonlinear term $k_n$. The control inputs of force to the NMSD system are $u_1$ and $u_2$. Then, $d$ is a disturbance that disrupts the motion of the NMSD system; in this case, it is considered a random value within fixed bounds. It is assumed that there is no measurement data for the position $x_2$, and the control actions are bounded between fixed lower and upper limits.

Equations (1), (2), and (3) show the NMSD system's nonlinear differential equations after derivation with Newton's law for a serial-chain configuration:

$m\ddot{x}_1 = u_1 - c\dot{x}_1 - k(x_1)\,x_1 + c(\dot{x}_2 - \dot{x}_1) + k(x_2 - x_1)(x_2 - x_1)$  (1)

$m\ddot{x}_2 = -c(\dot{x}_2 - \dot{x}_1) - k(x_2 - x_1)(x_2 - x_1) + c(\dot{x}_3 - \dot{x}_2) + k(x_3 - x_2)(x_3 - x_2)$  (2)

$m\ddot{x}_3 = u_2 + d - c(\dot{x}_3 - \dot{x}_2) - k(x_3 - x_2)(x_3 - x_2)$  (3)

The SciPy library is used to obtain the dynamics of the NMSD differential equations via the scipy.integrate.odeint function, with a fixed simulation time step $\Delta t$ (seconds) and tight absolute and relative error tolerances.
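To make the simulation step concrete, the following sketch integrates a serial-chain NMSD of the form above with scipy.integrate.odeint. The chain topology follows the reconstruction in Eqs. (1)–(3); all numeric parameter values are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.integrate import odeint

# Illustrative parameters (the paper's exact values are not reproduced here).
m, c = 1.0, 0.5          # identical mass (kg) and damping constant
k_l, k_n = 2.0, 1.0      # linear and nonlinear spring coefficients

def stiffness(x):
    """Nonlinear spring stiffness k(x) = k_l + k_n * x^2."""
    return k_l + k_n * x**2

def nmsd(state, t, u1, u2, d):
    """Serial-chain NMSD dynamics; state = [x1, v1, x2, v2, x3, v3]."""
    x1, v1, x2, v2, x3, v3 = state
    a1 = (u1 - c*v1 - stiffness(x1)*x1
          + c*(v2 - v1) + stiffness(x2 - x1)*(x2 - x1)) / m
    a2 = (-c*(v2 - v1) - stiffness(x2 - x1)*(x2 - x1)
          + c*(v3 - v2) + stiffness(x3 - x2)*(x3 - x2)) / m
    a3 = (u2 + d - c*(v3 - v2) - stiffness(x3 - x2)*(x3 - x2)) / m
    return [v1, a1, v2, a2, v3, a3]

# Integrate one control interval of dt seconds from the current state.
dt = 0.01
state0 = np.zeros(6)
next_state = odeint(nmsd, state0, [0.0, dt], args=(0.1, -0.1, 0.0))[-1]
```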

2.2. LSTM Neural Network Model

The LSTM model, based on the RNN, is well suited to learning time-series or sequence data; previous works [14, 15] describe the LSTM network comprehensively. This paper's LSTM model is constructed with the TensorFlow and Keras libraries [16] in the Google Colaboratory environment. Three layers are constructed: a Lambda layer, an LSTM layer, and a Dense layer. The LSTM parameters are set up as follows: #return_sequence = True, #activation_function = Leaky ReLU, #recurrent_activation = Sigmoid, #loss_function = Mean Square Error, #optimizer = Adam, together with the chosen #look_back, learning rate, #epochs, and #batch_size.
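The following Keras sketch mirrors the Lambda-LSTM-Dense stack described above. The five-node size and the Leaky ReLU slope follow the offline-learning results in Section 3.1; look_back and the Adam settings are placeholders for values not reproduced here.

```python
import tensorflow as tf
from tensorflow import keras

look_back = 10   # assumed input sequence length
n_inputs = 9     # inputs per time step (9 input neurons, per the paper)
n_outputs = 2    # predicted state-response rates of change (2 output neurons)

model = keras.Sequential([
    # Lambda layer: placeholder for the paper's input transformation.
    keras.layers.Lambda(lambda x: x, input_shape=(look_back, n_inputs)),
    keras.layers.LSTM(
        5,
        return_sequences=True,
        activation=lambda x: tf.nn.leaky_relu(x, alpha=0.5),  # Leaky ReLU
        recurrent_activation="sigmoid",
    ),
    keras.layers.Dense(n_outputs),
])
model.compile(loss="mse", optimizer=keras.optimizers.Adam())
```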

Fig. 1. Overall research method of the paper


The parameterization of the LSTM dynamics function is defined as $\hat{f}_\theta$, where $\theta$ denotes the vector of network weights. The inputs to the model are the control actions $u_t$, the measured disturbance $d_t$, and the measured state $x_t$ of the NMSD system at every time step $t$. The output of the model is the predicted state response at the next time step, $\hat{x}_{t+1}$. The paper also employs learning a dynamics function based on the rate of change in the measurement data, $\Delta x_t = x_{t+1} - x_t$, as proposed by Nagabandi et al. [10], for better prediction results in the case of a small time step $\Delta t$. In this case, the predicted next state becomes $\hat{x}_{t+1} = x_t + \hat{f}_\theta(\cdot)$. We consider the learned dynamics function to take the rates of change as additional inputs, and the output of the model becomes the rate of change of the predicted state response at the next time step, $\Delta\hat{x}_{t+1}$. In total, the LSTM model requires 9 input neurons in the Lambda layer and 2 output neurons in the Dense layer.

The paper also preprocesses the input dataset with the Z-score normalization method [17], standardizing to mean $\mu = 0$ and standard deviation $\sigma = 1$ for fast training of the LSTM model. It is formulated as $z = (x - \mu)/\sigma$, where $z$ denotes the normalized data, $x$ denotes the source data, $\mu$ denotes the dataset mean, and $\sigma$ denotes the dataset standard deviation. Z-score normalization is chosen because the feature distribution of the dataset obtained from the NMSD system does not contain extreme outliers. Fig. 3 shows the overall configuration of the LSTM neural network model with its input-output parameters and data preprocessing.
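A minimal sketch of this preprocessing step; the training-set statistics $\mu$ and $\sigma$ are computed once and then reused for data arriving online.

```python
import numpy as np

def zscore(data, mu=None, sigma=None):
    """Z-score normalization; mu and sigma from the training set are reused online."""
    mu = data.mean(axis=0) if mu is None else mu
    sigma = data.std(axis=0) if sigma is None else sigma
    return (data - mu) / sigma, mu, sigma

# Fit on the training set, then apply the same statistics to new data.
train = np.random.randn(1000, 9)            # placeholder training features
train_norm, mu, sigma = zscore(train)
new_norm, _, _ = zscore(np.random.randn(10, 9), mu, sigma)
```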

2.3. Model Predictive Control

The MPC scheme is well known in control theory and has been investigated intensively in combination with machine learning methods and neural network models [18, 19]. The MPC scheme utilizes a model to predict the system dynamics over a pre-determined prediction horizon $H$, calculates a sequence of control actions inside the prediction window that minimizes an objective function $J$ by optimization, applies the first control action from the calculated sequence at each time step, and then repeats the procedure at the next time step. The input to the MPC controller is the measurement vector from the NMSD system.

Fig. 4 shows the design of the LSTM model-based MPC scheme. The learned dynamics function $\hat{f}_\theta$ of the LSTM model receives the input vector and calculates the predicted rate of change of the state response inside the prediction window, whose length is $H$ time steps. The MPC scheme then calculates the objective function $J$ from the error between the reference trajectory $r$ and the predicted state response $\hat{x}$. The minimization of $J$ would normally be solved with an optimization solver to calculate a set of optimized control actions within the prediction horizon; however, obtaining the solution by direct optimization is difficult given the model's nonlinearity. In this case, we employ a random-shooting policy [20]: $K$ sets of control action vectors with length $H$ are generated, the objective function $J_k$ is computed for each, the minimum objective function $\Omega = \min_k J_k$ within the set is found, and the first control action $u_t$ of the sequence with the minimum objective function is applied to the system at each time step.
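The sketch below implements one random-shooting MPC step as just described. model_predict(x, u, d) stands in for one-step prediction by the learned LSTM dynamics and is an assumed interface; K, H, and the actuator bounds are illustrative defaults.

```python
import numpy as np

def random_shooting_mpc(x_t, d_t, reference, model_predict,
                        K=100, H=10, u_min=-1.0, u_max=1.0, n_u=2):
    """Return the first action of the best of K random H-step action sequences."""
    best_cost, best_first_action = np.inf, None
    for _ in range(K):
        # Sample one candidate action sequence uniformly within actuator bounds.
        U = np.random.uniform(u_min, u_max, size=(H, n_u))
        x, cost = x_t, 0.0
        for h in range(H):
            x = model_predict(x, U[h], d_t)            # one-step LSTM prediction
            cost += np.sum((reference[h] - x) ** 2)    # trajectory-error term of J_k
        if cost < best_cost:                           # keep the minimum objective
            best_cost, best_first_action = cost, U[0]
    return best_first_action
```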

Fig. 3. LSTM Neural Network Model with Input – Output Parameters

Fig. 2. Free-body diagram of the mass spring damper model


2.4. Control Performance Evaluation

Three control performance parameters are used in the paper. The trajectory deviation performance is calculated as a root mean square error, $\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t - y_t)^2}$, where $T$ denotes the number of simulation time steps, $r_t$ the reference values, and $y_t$ the output values. The total impulse is formulated as $\sum_{t}|u_t|\,\Delta t$, and the action smoothness is formulated as $\sum_{t}(u_t - u_{t-1})^2$, where $u_t$ denotes the control action.
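A short sketch computing the three metrics for one simulation run; the impulse and smoothness expressions follow the common forms reconstructed above, not necessarily the paper's verbatim formulas.

```python
import numpy as np

def control_metrics(ref, y, u, dt):
    """RMSE, total impulse, and action smoothness for one simulation run."""
    rmse = np.sqrt(np.mean((ref - y) ** 2))        # trajectory deviation
    total_impulse = np.sum(np.abs(u)) * dt         # aggregate control effort
    smoothness = np.sum(np.diff(u, axis=0) ** 2)   # step-to-step action changes
    return rmse, total_impulse, smoothness
```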

2.5. Model-based Reinforcement Learning Framework

In the paper, the agent is the MPC controller, the model is the LSTM model, and the control inputs represent the actions applied to the environment. The maximum reward is awarded when the system follows the reference trajectory, i.e., the reward is treated as the minimization of the RMSE. The framework of the Mb-RL method is shown in Fig. 5 and Algorithm 1. Exploration with random actions is first applied to the system, and the resulting exploration dataset is normalized with the Z-score and used to train $\hat{f}_\theta$ of the LSTM model, which is then retrained for every batch of the defined size. Once the control time is reached, the MPC controller starts to calculate optimized actions and applies them to the system, yielding the exploitation dataset; retraining of $\hat{f}_\theta$ continues using the exploitation dataset. Retraining uses one epoch and warm-starts from the weights of $\hat{f}_\theta$ from the previous iteration. The algorithm stops at the maximum simulation time.

Algorithm 1 Mb-RL Framework

01: Initialize $\hat{f}_\theta$ based on offline learning.
02: Set simulation time $T$, normalization time $T_n$,
03:   batch size $B$, control time $T_c$.
04: Generate random action data points.
05: For $t = 1$ to $T$
06:   If isInteger($t/T_n$) then
07:     Do normalization to obtain new $\mu$ and $\sigma$.
08:   End If
09:   If ($t > 0$) AND isInteger($t/B$) then
10:     Retrain $\hat{f}_\theta$ and clear the batch dataset.
11:   End If
12:   If ($t < T_c$) then
13:     Set $u_t$ to a random action.
14:   Elseif ($t \ge T_c$) then
15:     Get the current dataset.
16:     Generate $K$ random sets of random action sequence vectors
17:       with length $H$.
18:     For $k = 1$ to $K$
19:       Apply $\hat{f}_\theta$ to predict the next-time-step states.
20:       Calculate the objective function $J_k$.
21:       Find the minimum objective function $\Omega$.
22:     End For
23:     Get the first control action from the sequence with minimum $\Omega$.
24:     Set $u_t$ to that action.
25:   End If
26:   Simulate the NMSD system with the
27:     ODE solver for $\Delta t$ seconds.
28:   Populate the dataset with the new batch.
29:   Reset the objective function.
30: End For
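A condensed Python sketch of Algorithm 1, reusing the sketches above (the odeint step, random_shooting_mpc, and the Keras model); model_predict and build_training_batch are assumed helper interfaces, and all numeric settings are illustrative assumptions rather than the paper's values.

```python
import numpy as np
from scipy.integrate import odeint

T, T_c, B, H = 5000, 1000, 200, 10       # illustrative time/batch settings
dt = 0.01
x_t = np.zeros(6)                        # [x1, v1, x2, v2, x3, v3]
reference = np.zeros((T + H, 6))         # regulatory reference trajectory
dataset = []

for t in range(1, T + 1):
    d_t = np.random.uniform(-0.05, 0.05)            # random disturbance (assumed bounds)
    if t < T_c:                                     # exploration: random actions
        u_t = np.random.uniform(-1.0, 1.0, size=2)
    else:                                           # exploitation: MPC with random shooting
        u_t = random_shooting_mpc(x_t, d_t, reference[t:t + H],
                                  model_predict, H=H)
    # One simulation step of the NMSD system (nmsd as sketched in Section 2.1).
    x_next = odeint(nmsd, x_t, [0.0, dt], args=(u_t[0], u_t[1], d_t))[-1]
    dataset.append((x_t, u_t, d_t, x_next - x_t))   # store local dynamics (rate of change)
    if t % B == 0:                                  # online retraining every batch
        X, y = build_training_batch(dataset)        # assumed preprocessing helper
        model.fit(X, y, epochs=1, verbose=0)        # 1 epoch, warm-started weights
        dataset.clear()
    x_t = x_next
```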

Fig. 4. Design of MPC Scheme with Random-Shooting Policy

Fig. 5. The Model-based Reinforcement Learning Framework


3. Results and Discussions

3.1. Offline Learning

In this phase, the minimum LSTM configuration is determined. Random action data points are generated to analyze the NMSD system and collect training, validation, and testing datasets through the dynamic simulator. Fig. 6 shows the response of the NMSD system using 20,000 data points for the training data. Fig. 8 shows the response of the NMSD system using 100,000 data points for the validation data, which can be regenerated for the testing data. Finally, Z-score normalization is applied to the 20,000 training data points before the dataset is fed to the model, with $\mu$ and $\sigma$ calculated from the training dataset as shown in Fig. 7.

Once the dataset is obtained, offline training of the LSTM model is conducted. Fig. 9 shows the training and validation results of the LSTM model with 3, 5, and 10 hidden nodes. Several activation functions were experimented with (tanh, sigmoid, ReLU, and Leaky ReLU with $\alpha = 0.3$); the best activation function is Leaky ReLU with $\alpha = 0.5$. Notably, the model has a reasonable convergence rate and good validation results with only five hidden nodes. However, due to space limitations, only the results for the Leaky ReLU ($\alpha = 0.5$) activation function are shown.

Employing online learning means that the model's accuracy only needs to be tested within the length of the MPC prediction horizon against the testing dataset. In this case, we use $H = 10$, so the model's accuracy is based on the predicted ten-step-ahead state output. Even with such a short horizon, it has been shown that an MPC controller can successfully control a robot's movement in a MuJoCo environment [10]. Fig. 10 shows the test result of the ten-step-ahead prediction output, with RMSE = 1.62×10⁻⁴. Thus, we selected the LSTM with five hidden nodes. The weight parameters obtained in offline training warm-start the model in the simulation phase.

Fig. 7. Z-score normalization result

Fig. 6. Response of the NMSD system for training data

Fig. 8. Response of the NMSD system for validation data

Fig. 9. Training and validation result of the LSTM model

Fig. 10. Test result of 10 step-ahead prediction using LSTM 5 nodes


3.2. Online Learning

In the simulation phase, we set the simulation time $T$, the normalization time $T_n$, the number of random exploration data points, the batch size $B$, the control time $T_c$, and the prediction horizon $H$ (in time steps). It is also assumed that the actuators have arbitrary limits, with one bound before the control action starts and another after it starts. A bounded random disturbance $d$ is applied. Lastly, we also apply randomly distributed Gaussian noise to the measurement data of the NMSD system output, and a random parameter variation is imposed on the linear spring stiffness $k_l$ (N/m). Two scenarios are then considered for the random-shooting policy to generate the sets of $K$ random actions: case A generates a larger set of random action sequences, and case B a smaller one. The control scenario is a regulatory objective that follows a pre-determined reference trajectory: three piecewise reference trajectories for the measured positions are set up after the control time $T_c$ is reached.

As shown in Fig. 11, the simulation results indicate that the proposed method effectively controls the NMSD system according to the objective control scenario. Table 1 shows the control performance results for Random-Shooting Policy (RSP) cases A and B. Notably, the RMSE and total impulse in case A were better than in case B; in contrast, the action smoothness in case B was better than in case A. The other important parameter is simulation time: case A, which searches a more extensive set of action sequences for the minimum objective function, produces better control performance but with a longer simulation time. For better insight, we also compare the proposed LSTM model with a Multi-Layer Perceptron (MLP) model, using case A of the random-shooting policy. We selected an MLP model with 30 hidden nodes, as it contains a similar number of weight parameters to the LSTM model with five hidden nodes. Table 2 shows the control performance comparison; notably, the LSTM model performs better in RMSE and action smoothness, but with a longer simulation time.

Table 1. Control performance of RSP cases A and B, reported as RMSE, total impulse, action smoothness, and simulation time (Sim_Time).

Table 2. Control performance of RSP case A for the LSTM model (5 hidden nodes) and the MLP model (30 hidden nodes), reported as RMSE, total impulse, action smoothness, and simulation time (Sim_Time).

Fig. 11. Simulation result of Random Shooting Case A


4. Conclusions

Several studies have considered only offline learning of the NN model and have not used the NN model directly in the control system. As a result, environmental changes in the system over time might not be incorporated into the model, which may degrade controller performance. This paper proposed online learning of local dynamics using the Mb-RL method with an LSTM model, making the system more robust to environmental changes by enabling online learning. The MPC, as the agent, provides a reward function calculated based on a random-shooting policy, which generates random control actions and finds the minimum objective function through a closed-loop strategy. The online learning scheme retrains the LSTM model and minimizes the model mismatch with the system. The NMSD system with varying parameters was employed to show the effectiveness of the proposed method. The simulation results showed that the proposed method effectively controls the system with good performance.

Extending the proposed method to a real-world mechanical system and investigating how the framework works in a real-time control system are considered for future research. The proposed approach requires considerable computing power to shorten the calculation time. This issue motivates the adoption of cloud-based control structures, where the controller is operated remotely as a cloud-based service, enabling a real-time control system.

References

[1] Brown RE. (2020) "Donald O. Hebb and the organization of behavior: 17 years in the writing." Molecular Brain 13 (1): 55.

[2] Rosenblatt F. (1958) "The perceptron: A probabilistic model for information storage and organization in the brain." Psychological Review 65

(6): 386–408.

[3] Werbos P. (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. Dissertation, Harvard

University, Cambridge.

[4] Glorot X, Bengio Y. (2010) "Understanding the difficulty of training deep feedforward neural networks." Journal of Machine Learning

Research 9: 249–256.

[5] Dong A, Du Z, Yan Z. (2019) "Round trip time prediction using recurrent neural networks with minimal gated unit." IEEE Communication

Letter 23 (4): 584–587.

[6] Yu Wang. (2017) "A new concept using LSTM neural networks for dynamic system identification". In: 2017 American Control Conference

(ACC) pp. 5324–5329.

[7] Pisa I, Morell A, Vicario JL, et al. (2020) "Denoising autoencoders and LSTM-based artificial neural networks data processing for its

application to internal model control in industrial environments—The wastewater treatment plant control case." Sensors 20 (13): 3743–3773.

[8] Sabzevari S, Heydari R, Mohiti M, et al. (2021) "Model-free neural network-based predictive control for robust operation of power

converters." Energies 14 (8): 1–12.

[9] Ge Z, Song Z, Ding SX, et al. (2017) "Data mining and analytics in the process industry: The role of machine learning." IEEE Access 5:

20590–20616.

[10] Nagabandi A, Kahn G, Fearing RS, et al. (2018) "Neural network dynamics for model-based deep reinforcement learning with model-free

fine-tuning". In: Proc. 2018 IEEE Int. Conf. on Robotics and Automation pp. 7559–7566.

[11] Todorov E, Erez T, Tassa Y. (2012) "MuJoCo: A physics engine for model-based control." In: 2012 IEEE/RSJ International Conference on

Intelligent Robots and Systems pp. 5026–5033.

[12] Subedi D, Tyapin I, Hovland G. (2020) "Modeling and analysis of flexible bodies using lumped parameter method." In: Proc 2020 IEEE

11th Int Conf Mech Intell Manuf Technol (ICMIMT) pp. 161–166.

[13] Tijsseling AS, Hou Q, Bozkuş Z. (2018) "Moving liquid column with entrapped gas pocket and fluid-structure interaction at a pipe’s dead

end: A nonlinear spring-mass system." In: ASME Conf. 2018 Pressure Vessels and Piping Division.

[14] Terzi E, Bonassi F, Farina M, et al. (2021) "Learning model predictive control with long short-term memory networks." International

Journal of Robust and Nonlinear Control 31 (18): 8877–8896.

[15] Nabipour M, Nayyeri P, Jabani H, et al. (2020) "Deep learning for stock market prediction." Entropy 22 (8): 840–863.

[16] Géron A. (2019) Hands-on machine learning with Scikit-Learn, Keras and TensorFlow.

[17] Altman EI. (2018) "Applications of distress prediction models: What have we learned after 50 years from the z-score models?" International

Journal of Financial Studies 6 (3): 70–85.

[18] Carlet PG, Tinazzi F, Bolognani S, et al. (2019) "An effective model-free predictive current control for synchronous reluctance motor

drives." IEEE Transactions on Industry Applications 55 (4): 3781–3790.

[19] Masti D, Bemporad A. (2021) "Learning nonlinear state–space models using autoencoders." Automatica 129: 109666.

[20] Rao A V. (2009) "A survey of numerical methods for optimal control." Advances in the Astronautical Sciences 135 (1): 497–528.