
Iterative Backpropagation Disturbance Observer with Forward Dynamics Model

Takayuki Murooka#1, Masashi Hamaya2, Felix von Drigalski2, Kazutoshi Tanaka2, and Yoshihisa Ijiri2,3
Abstract—The Disturbance Observer (DOB) has been widely used in robotic applications to eliminate various kinds of disturbances. Recently, learning-based DOBs have attracted significant attention because they can deal with complex robotic systems. In this study, we propose the Iterative Backpropagation Disturbance Observer (IB-DOB). IB-DOB learns the forward model with a neural network and calculates disturbances via iterative backpropagation, which behaves like an inverse model. Our method not only improves estimation performance owing to the iterative calculation but can also be applied to both model-free and model-based learning control. We conducted experiments on two manipulation tasks: the cart pole with Deep Deterministic Policy Gradient (DDPG) and the object-pushing task with Deep Model Predictive Control (DeepMPC). Our method achieved better task performance than baselines without DOB and with a DOB using a learned inverse model, even when disturbances of external forces and model errors were applied.
I. INTRODUCTION
For real-world robotic applications in uncertain environments containing unexpected disturbances, robust control methods have been studied for the last four decades [1]. The Disturbance Observer (DOB) is one of the most popular robust control methods owing to its simplicity [1], [2]. It has also been extended to deal with nonlinear systems [3], resulting in wide application to robotic scenarios. However, most of these methods require the identification of dynamics models manually designed as rational polynomials. As real-world environments contain many unmodeled effects, the existing methods may suffer from modeling errors.
For robotic manipulation, which involves complex dynamics, learning approaches have attracted significant attention because of the high expressiveness of deep neural networks [4]. We are interested in combining the advantages of both: the simplicity of DOB and the expressiveness of a learning-based approach. Several previous approaches have learned inverse dynamics models with neural networks to estimate disturbances [5], [6].
In this study, we propose the Iterative Backpropagation Disturbance Observer (IB-DOB), which approximates the forward dynamics model with a deep neural network, as shown in Fig. 1.
#Work done at OMRON SINIC X Corp. as a part of an internship.
1Takayuki Murooka is with the Department of Mechano-Informatics, The
University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.
t-murooka@jsk.imi.i.u-tokyo.ac.jp
2Masashi Hamaya, Felix von Drigalski, Kazutoshi Tanaka, and Yoshi-
hisa Ijiri are with OMRON SINIC X Corporation, Hongo 5-24-5,
Bunkyo-ku, Tokyo, Japan. {masashi.hamaya, f.drigalski,
kazutoshi.tanaka, yoshihisa.ijiri}@sinicx.com
3Yoshihisa Ijiri is with OMRON Corporation, Konan 2-3-13, Minato-ku,
Tokyo, Japan. yoshihisa.ijiri@omron.com
Fig. 1. Overview of the proposed IB-DOB. We propose using a forward model learned with neural networks to estimate disturbances.
IB-DOB estimates disturbances from the errors between the predicted and observed states using an optimization-based method with iterative backpropagation through the forward model. This idea is motivated by recent learning-based manipulation approaches [7], [8], [9] that use iterative backpropagation to obtain optimal actions; we apply this algorithm to disturbance estimation.
Compared to previous methods using learned inverse dynamics models, our method is more practical for two reasons. First, because our method estimates the disturbance via an iterative calculation, the estimation errors can be smaller than those of existing methods, which perform a single calculation with an inverse model. Second, although our method can be applied to both model-free and model-based learning control, it is especially useful in model-based settings because the forward model can be shared between disturbance estimation and behavior prediction.
We applied IB-DOB to two learning-based approaches: Deep Deterministic Policy Gradient (DDPG) [10], one of the most popular off-policy model-free deep reinforcement learning algorithms, and Deep Model Predictive Control (DeepMPC) [7], a supervised learning-based model predictive control method that has been applied to several robotic tasks [8], [9], [11].
We prepared experimental environments with several kinds of disturbances to evaluate our method. First, we applied disturbances of external forces and model errors to cart-pole and object-pushing tasks in simulation. In addition, for the pushing task, we applied the model learned in simulation to a real-world environment to introduce model errors. Note that modeling errors can also be considered disturbances [12]. In summary, the main contributions of this study are as follows.
• We propose a novel learning-based method named IB-DOB. It approximates the forward dynamics model with a deep neural network and estimates disturbances from the errors between the predicted and observed states via iterative backpropagation.
• We performed manipulation experiments under various conditions with two types of environments and learning methods while applying disturbances. The results demonstrated that IB-DOB can successfully deal with disturbances such as external forces and model errors.
The remainder of this paper is organized as follows. Section II presents previous works related to our research. We introduce our proposed method in Section III. Section IV describes the experiments. Section V discusses our results, and Section VI provides a conclusion.
II. RELATED WORKS
A. Disturbance observer and its applications
DOB has been widely used in several fields such as process and mechanical control, where uncertain disturbances such as external forces or model errors must be removed to achieve accurate control [2]. Many variants of DOB exist, such as nonlinear DOB [3] and optimization-based DOB [13], so that it can be applied to various kinds of dynamics models.
In the field of robotics, these kinds of DOBs have also been widely used, as real-world robot models are complex and various types of modeling errors arise between the analytical and actual models. Chen et al. applied a nonlinear DOB to the control of robotic manipulators and reported that it overcame the disadvantages of previous DOBs, which were designed or analyzed for linear systems [14]. Eom et al. achieved force control of robotic manipulators without force sensors using DOB [15]. They applied DOB to force estimation, successfully estimated the external force, and showed that DOB can be an alternative to force/torque sensors. Huang et al. proposed an adaptive controller for nonholonomic wheeled mobile robots using DOB to handle external disturbances and unknown parameters [16]. Other approaches have been applied to a quadrotor [17] and to force estimation for cooperative robots [18], [19] or assistive robots [20].
Unlike these methods, which employ manually designed dynamics models, we aim to leverage learning-based dynamics models to express more complex dynamics systems.
B. Learning-based Disturbance Observer
To compensate for disturbances with unknown and unmod-
eled effects in complicated robotic applications, learning-
based DOB approaches have attracted significant attention.
Li et al. proposed NNDOB, which represents the DOB with a neural network [5]. Its simplest form for calculating the estimated disturbance $\hat{d}_t$ can be written as follows:

$\hat{u}_t = f^{-1}(x_t, x_{t+1})$,  (1a)
$\hat{d}_t = \hat{u}_t - u_t$,  (1b)

where $f$ is the forward dynamics model ($x_{t+1} = f(x_t, u_t)$), $u_t$ is the action, $\hat{u}_t$ is the estimated action, and $x_t$ is the state.
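For concreteness, the following is a minimal PyTorch-style sketch of this single-shot, inverse-model DOB baseline. The network architecture, class names, and function names are illustrative assumptions and are not taken from the implementations of [5], [6].

```python
# Minimal sketch of the direct (inverse-model) DOB baseline in Eq. (1),
# assuming an inverse model has been trained so that u_t ≈ f_inv(x_t, x_{t+1}).
import torch
import torch.nn as nn

class InverseModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, x_t: torch.Tensor, x_next: torch.Tensor) -> torch.Tensor:
        # Estimate the action that would have produced the observed transition.
        return self.net(torch.cat([x_t, x_next], dim=-1))

def direct_dob(f_inv: InverseModel, x_t, x_next, u_t):
    """Single-shot disturbance estimate: d_hat_t = f_inv(x_t, x_{t+1}) - u_t."""
    with torch.no_grad():
        u_hat = f_inv(x_t, x_next)   # Eq. (1a)
    return u_hat - u_t               # Eq. (1b)
```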
NNDOB has been extended in various ways; for example, RBFNDOB uses a radial basis function network instead of an ordinary neural network [6]. Other learning-based DOBs approximate a part of the inverse model including the disturbance [21] or estimate the disturbance directly as an output of the network [22], [23], [24]. NNDOB and RBFNDOB have also been extended to MNNDOB and MRBFNDOB, which can deal with multi-input multi-output systems [25], and these DOBs were empirically shown to outperform a model-based MIMO linear disturbance observer.
In this study, we leverage the forward model instead of
the inverse model seen in previous studies.
III. IB-DOB: ITERATIVE BACKPROPAGATION DISTURBANCE OBSERVER
In this section, we explain IB-DOB, a learning-based DOB that utilizes the forward model, following the DOB framework in control theory [2] and backpropagation-based action optimization methods [7], [8], [9].
A. Training of Forward Model
First, we formulate a network to represent the forward model of manipulation. The network input is the current state and action, and the output is the next state. Let $x_t$ and $u_t$ be the state and action at time step $t$, respectively; then, the network of the forward model $f$ with weights $\theta$ is formulated as

$x_{t+1} = f(x_t, u_t; \theta)$.  (2)
By collecting a dataset of $x_{t+1}$, $x_t$, and $u_t$, we train the forward model in a manner similar to Stochastic Gradient Descent (SGD) [26], one of the simplest network training algorithms, with learning rate $\epsilon_f$:

$\theta \leftarrow \theta - \epsilon_f \, \partial J(x^{data}_{t+1}, x^{pred}_{t+1}) / \partial \theta$,  (3a)
$J(x^{data}_{t+1}, x^{pred}_{t+1}) = \frac{1}{2} \| x^{data}_{t+1} - x^{pred}_{t+1} \|^2$.  (3b)

$J$ is the loss function for training the network, and $x^{pred}$ and $x^{data}$ are the predicted and dataset states, respectively. $\partial J / \partial \theta$ can easily be calculated with backpropagation and the chain rule of neural networks.
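As an illustration of Eqs. (2)-(3), the following is a minimal PyTorch sketch of a forward model and its training loop. The 50-unit hidden layers loosely follow the architecture described in Section IV; the optimizer settings, function names, and data handling are assumptions (the paper trains with ADAM [29] rather than plain SGD).

```python
# Minimal sketch of forward-model training, assuming transitions
# (x_t, u_t, x_{t+1}) have been collected into tensors.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),  # no activation on the output layer
        )

    def forward(self, x_t: torch.Tensor, u_t: torch.Tensor) -> torch.Tensor:
        # x_{t+1} = f(x_t, u_t; theta), Eq. (2)
        return self.net(torch.cat([x_t, u_t], dim=-1))

def train_forward_model(model, x_t, u_t, x_next, lr=1e-3, epochs=100):
    """Minimize J = 1/2 ||x_next - f(x_t, u_t)||^2, Eq. (3b)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(x_t, u_t)
        loss = 0.5 * ((x_next - pred) ** 2).sum(dim=-1).mean()
        loss.backward()      # dJ/dtheta via backpropagation
        optimizer.step()     # parameter update (Eq. (3a); the paper uses ADAM [29])
    return model
```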
B. Disturbance Estimation
The aim of IB-DOB is to estimate the disturbance $d_t$ using the forward model described as a neural network. This is formulated as the following optimization problem:

$\hat{u}_t = \operatorname{argmin}_{u_t} \, J(x^{obs}_{t+1}, x^{pred}_{t+1})$  (4a)
$\text{s.t.} \;\; J(x^{obs}_{t+1}, x^{pred}_{t+1}) = \frac{1}{2} \| x^{obs}_{t+1} - x^{pred}_{t+1} \|^2$  (4b)
Fig. 2. IB-DOB with the policy of learning-based approaches. The disturbances are estimated via iterative backpropagation with the forward model.
$J$ is the cost function for estimating the action $\hat{u}$, and $x^{pred}$ and $x^{obs}$ are the predicted and observed states, respectively. This calculation is conducted at time step $t+1$, after observing $x^{obs}_{t+1}$. The formulation is a nonlinear optimization problem, which requires a prohibitive amount of time to solve exactly. Therefore, we solve it approximately using sequential optimization with an update rate $\epsilon_u$ as follows:

$g = \partial J(x^{obs}_{t+1}, x^{pred}_{t+1}) / \partial u_t$,  (5a)
$\hat{u}_t \leftarrow \hat{u}_t - \epsilon_u \, g / \| g \|_2$.  (5b)
$\hat{u}_t$ is initialized to $u_t$, and $\| \cdot \|_2$ denotes the L2 norm. $\partial J(x^{obs}_{t+1}, x^{pred}_{t+1}) / \partial u_t$ can be calculated by the chain rule through backpropagation of the network with the loss $J(x^{obs}_{t+1}, x^{pred}_{t+1})$. The forward propagation to predict $x^{pred}_{t+1}$ and the update of the action with the gradient of the cost function via backpropagation in Eq. (5) are repeated several times so that the estimate of the action converges. Finally, the disturbance at time step $t$ is estimated as

$\hat{d}_t \leftarrow \hat{u}_t - (u_t - k_d \hat{d}_{t-1})$,  (6)

where $k_d$ is a coefficient of the smoothing filter used to avoid noisy estimation.
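The following is a minimal PyTorch sketch of this estimation step (Eqs. (4)-(6)), assuming a trained forward model such as the one sketched above. The update rate eps_u and the filter coefficient k_d are placeholder values, and the function name is mine; N^u_update is 100 in the cart-pole experiments of Section IV.

```python
# Minimal sketch of the IB-DOB disturbance estimation by iterative backpropagation.
import torch

def ib_dob_estimate(forward_model, x_obs_t, x_obs_next, u_t, d_prev,
                    eps_u=0.1, n_update=100, k_d=0.5):
    """Estimate d_hat_t from the observed transition (x_obs_t, u_t, x_obs_next)."""
    # Initialize u_hat with the applied action u_t and optimize it.
    u_hat = u_t.clone().detach().requires_grad_(True)
    for _ in range(n_update):
        x_pred_next = forward_model(x_obs_t, u_hat)            # forward propagation
        loss = 0.5 * ((x_obs_next - x_pred_next) ** 2).sum()   # Eq. (4b)
        grad, = torch.autograd.grad(loss, u_hat)               # g = dJ/du_t, Eq. (5a)
        with torch.no_grad():
            u_hat -= eps_u * grad / (grad.norm() + 1e-8)       # normalized step, Eq. (5b)
    # Eq. (6): smoothed disturbance estimate (k_d is a placeholder value here).
    d_hat = u_hat.detach() - (u_t - k_d * d_prev)
    return d_hat
```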
C. IB-DOB with Learning-based Approaches
IB-DOB is calculated independently of the action, so it can be applied to any learning-based policy obtained from model-free or model-based learning control. Let $\pi(x_t)$ be the policy of a learning-based approach; then, the action is calculated as

$u_t = \pi(x_t)$.  (7)
We then update the action with the estimated disturbance as

$u_t \leftarrow u_t - k_d \hat{d}_{t-1}$.  (8)
The calculation process is described in Fig. 2, and the whole process of IB-DOB with a learning-based policy is summarized in Algorithm 1. $N^u_{update}$ is the number of times the forward and backward propagation are repeated to estimate the action within a control cycle.

Algorithm 1 Calculation of Action and Disturbance
Require: Policy $\pi(x)$, e.g., learned by DDPG or DeepMPC
  $(x^{obs}_{t+1}, x^{obs}_t, u_t) \leftarrow$ observe
  $\hat{u}_t \leftarrow u_t$  ▷ Initialize $\hat{u}_t$
  for $k = 1, 2, \dots, N^u_{update}$ do
    $x^{pred}_{t+1} \leftarrow f(x^{obs}_t, \hat{u}_t)$
    $\hat{u}_t \leftarrow$ UpdateControlInput$(x^{obs}_{t+1}, x^{pred}_{t+1}, \hat{u}_t)$  ▷ Eq. (5)
  end for
  $\hat{d}_t \leftarrow \hat{u}_t - (u_t - k_d \hat{d}_{t-1})$
  $u_{t+1} \leftarrow \pi(x^{obs}_{t+1})$
  $u_{t+1} \leftarrow u_{t+1} - k_d \hat{d}_t$
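Putting the pieces together, the sketch below mirrors one control cycle of Algorithm 1, reusing the forward model and the ib_dob_estimate function from the earlier sketches. The policy interface and the k_d value are assumptions.

```python
# Minimal sketch of one control cycle of Algorithm 1 (policy + IB-DOB compensation).
import torch

def control_cycle(policy, forward_model, x_obs_t, x_obs_next, u_t, d_prev,
                  k_d=0.5, n_update=100):
    """Return the compensated action u_{t+1} and the new disturbance estimate."""
    # Estimate d_hat_t from the observed transition via iterative backpropagation.
    d_hat = ib_dob_estimate(forward_model, x_obs_t, x_obs_next, u_t, d_prev,
                            n_update=n_update, k_d=k_d)
    # Eq. (7): query the learned policy (e.g., a DDPG actor or DeepMPC optimizer).
    u_next = policy(x_obs_next)
    # Eq. (8): compensate the next action with the estimated disturbance.
    u_next = u_next - k_d * d_hat
    return u_next, d_hat
```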
IV. EXPERIMENTS
To validate our method, we performed simulation and real-robot experiments. The goal of the experiments was to investigate whether our method could estimate disturbances more accurately, leading to better performance than baselines without DOB or with a DOB learned as an inverse model. We tackled cart-pole and object-pushing tasks in simulation. Then, for the object-pushing task, we tested the model learned in simulation in a real-world environment.
A. Cart Pole
We conducted experiments on the cart-pole task in simulation using CartPole-v0 from OpenAI Gym [27]. We extended it so that the action can be calculated continuously, as shown in Fig. 3 (a). The action represents the horizontal force applied to the cart, and the state represents the position of the cart, the angle of the pole, and their derivatives. First, we trained a learning-based policy and the forward model simultaneously over 1000 episodes; then, we tested the performance of cart-pole stabilization. For the learning-based policy, we adopted DDPG [10], as it is a widely used method for reinforcement learning.
We compared three types of methods:
1) DDPG with no disturbance compensation (w/o DOB),
2) DDPG with a direct DOB using the inverse model as formulated in Eq. (1), typically used in previous studies [5], [6] (w/ Direct DOB),
3) DDPG with IB-DOB (w/ IB-DOB).
We employed a DDPG implementation based on PyTorch [28] and extended it for DOB. We designed two fully-connected layers of 50 units for both the policy and value networks. ReLU activations were used after each hidden layer; the output layer of the value network consisted of linear units, while the policy network used tanh output units. The forward model of IB-DOB and the inverse model of Direct DOB have three fully-connected layers of 50 units with ReLU activations, except for the output layer. These three networks were trained with the ADAM
Fig. 3. Experimental setup: (a) cart pole, (b) pushing objects in simulation (the object is pushed forward along x toward the goal), (c) pushing objects in the real world (six motion-capture cameras and three markers).
TABLE I
DISTURBANCES OF EXTERNAL FORCES FOR CART POLE

                  amplitude [N]   width [s]
External Force1        15             1
External Force2        25             0.2
External Force3        20             0.6
TABLE II
DISTURBANCES OF MODEL ERRORS FOR CART POLE

               cart mass [kg]   pole mass [kg]   pole length [m]
Model Error1        4.5              3.0               0.3
Model Error2       10.5              0.5               0.4
Model Error3        4.5              6.0               0.4
optimizer [29]. We set the number of iterations for the forward and backward propagation, $N^u_{update}$, to 100.
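As a reference for this configuration, the following is a minimal PyTorch sketch of actor and critic networks matching the description above (two 50-unit hidden layers with ReLU, tanh policy output, linear value output). The action scaling and dimensions are assumptions; this is not the implementation of [28].

```python
# Minimal sketch of the DDPG actor/critic architectures described in the text.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, max_action: float = 1.0):
        super().__init__()
        self.max_action = max_action  # assumed action scaling
        self.net = nn.Sequential(
            nn.Linear(state_dim, 50), nn.ReLU(),
            nn.Linear(50, 50), nn.ReLU(),
            nn.Linear(50, action_dim), nn.Tanh(),  # tanh output units
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.max_action * self.net(x)

class Critic(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 50), nn.ReLU(),
            nn.Linear(50, 50), nn.ReLU(),
            nn.Linear(50, 1),  # linear output units
        )

    def forward(self, x: torch.Tensor, u: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, u], dim=-1))
```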
We evaluated six types of disturbances: three for external
forces and three for model errors as shown in Table I and
Table II. To simulate external forces, we applied a pushing
force to the cart using a step function by changing its
magnitude and width. To simulate model errors, we changed
the cart mass, pole mass, and pole length.
We show the mean squared error of the pole angle over time in Table III, along with graphs of the state error, disturbance, and estimated disturbance in Fig. 4. In almost all cases, w/ IB-DOB performed favorably compared to the other methods.
B. Pushing Objects in Simulation
We conducted simulation experiments on the object-pushing task using Gazebo [30], as shown in Fig. 3 (b). The action $u$ represents the position at which to push the edge of the object, normalized from 0 to 1, while the state represents the two-dimensional pose of the object $(x, y, \varphi)$, as shown in Fig. 5 (a). The aim of this task is to push the object forward until its $x$ position reaches the goal. First, we collected a dataset $\{(x_{\tau+1}, x_\tau, u_\tau)\}_\tau$ and trained a learning-based policy and the forward model simultaneously; then, we conducted pushing simulations. For the learning-based policy, we adopted an effective supervised-learning approach, DeepMPC [7].
TABLE III
MEAN SQUARED ERROR BETWEEN REFERENCE AND OBSERVED STATE FOR CART POLE EXPERIMENTS (×10⁻²).

Method            w/o DOB   w/ Direct DOB   w/ IB-DOB
Origin               0.30        0.29           0.30
External Force1     33.30        1.49           1.49
External Force2    130.46       79.16          45.09
External Force3     94.96        8.11           5.74
Model Error1         1.48        1.24           1.14
Model Error2         2.46        1.68           1.59
Model Error3         1.90        1.33           1.29
We calculated the optimal action by minimizing a cost function, namely the difference between the next predicted state $x^{pred}_{t+1}$ and its reference state $x^{ref}_{t+1}$. This optimization problem can be formulated as follows:

$\min_{u_t} \; J(x^{ref}_{t+1}, x^{pred}_{t+1})$  (9a)
$\text{s.t.} \;\; x^{pred}_{t+1} = f(x_t, u_t)$,  (9b)
$\quad\;\; J(x^{ref}_{t+1}, x^{pred}_{t+1}) = \frac{1}{2} \| x^{ref}_{t+1} - x^{pred}_{t+1} \|^2$.  (9c)

This problem was also solved by iterative forward and backward propagation, as in Eq. (5). The number of iterations was 50.
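The sketch below illustrates this DeepMPC-style action optimization (Eq. (9)) with the same iterative forward/backward propagation pattern as IB-DOB; the step size, initialization, and gradient normalization are assumptions, while the 50 iterations follow the text.

```python
# Minimal sketch of action optimization through the forward model, Eq. (9).
import torch

def deep_mpc_action(forward_model, x_t, x_ref_next, u_init, eps=0.05, n_iter=50):
    """Return the action minimizing J = 1/2 ||x_ref - f(x_t, u)||^2."""
    u = u_init.clone().detach().requires_grad_(True)
    for _ in range(n_iter):
        x_pred = forward_model(x_t, u)                      # forward propagation, Eq. (9b)
        loss = 0.5 * ((x_ref_next - x_pred) ** 2).sum()     # Eq. (9c)
        grad, = torch.autograd.grad(loss, u)                # dJ/du via backpropagation
        with torch.no_grad():
            u -= eps * grad / (grad.norm() + 1e-8)          # gradient step on the action
    return u.detach()
```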
For comparison, we used three types of methods (w/o DOB, w/ Direct DOB, and w/ IB-DOB), as in the cart-pole experiments.
To implement DeepMPC, we used a forward model consisting of two fully-connected layers of 50 units with ReLU activations, except for the output layer. This forward model was also used for IB-DOB. We set the number of iterations for the forward and backward propagation, $N^u_{update}$, to 30 so that the calculation time did not exceed the control period.
We evaluated four types of disturbances: two for external
forces and two for model errors. To simulate external forces,
we pushed the object during control by adding a random
torque and force. To simulate model errors, we changed
the center of gravity and shape of the object. The details
of the training and test objects are shown in Fig. 5 (b).
Fig. 4. State error, disturbance, and estimated disturbance in the cart-pole experiments: (a) External Force2, (b) External Force3, (c) Model Error3. Our method showed smaller state errors than the baselines without DOB and with Direct DOB under most of the disturbance conditions.
Fig. 5. Details of the pushing objects: (a) detail of object pushing with the pose $(x, y, \varphi)$, (b) objects in simulation (Origin, Model Error1, Model Error2), (c) objects in the real world (Origin, Model Error1, Model Error2, Model Error3). The black and white dots are the centers of gravity. The red arrow indicates the pushing action.
TABLE IV
MEAN SQUARED ERROR BETWEEN REFERENCE AND OBSERVED STATE FOR PUSHING OBJECTS IN SIMULATION (×10⁻⁴).

Method            w/o DOB          w/ Direct DOB     w/ IB-DOB
Origin             1.74 ± 0.18      1.21 ± 0.74       1.03 ± 0.35
External Force1    1.68 ± 0.16      1.06 ± 0.56       0.686 ± 0.19
External Force2   71.26 ± 20.61    70.46 ± 43.56     58.37 ± 22.83
Model Error1      68.67 ± 37.08    12.54 ± 5.97       8.03 ± 5.36
Model Error2       4.35 ± 0.60      3.78 ± 1.62       1.60 ± 0.80
First, we trained our model with the Origin object. Then, we tested under the Origin, External Force1 (random force), and External Force2 (random torque) conditions and with the Model Error1 (shifted center of mass) and Model Error2 (different shape) objects.
We show the mean squared error and standard deviation of the object angle $\varphi$ from the reference value (0 degrees) over five trials in Table IV. The variance of the errors is likely due to the latency of the Gazebo simulator's heavy calculation processes. Under all conditions, w/ IB-DOB performed favorably compared to the other methods.
C. Pushing Objects in the Real World
Finally, we conducted experiments on the object-pushing task with a robot in the real world. The details of the object-pushing task, its policy, the forward model, the comparative methods, and the calculation process were the same as in the object-pushing simulation experiments. We applied the forward model learned in simulation to the real world.
TABLE V
MEAN SQUARED ERROR BETWEEN REFERENCE AND OBSERVED STATE FOR PUSHING OBJECTS IN REAL WORLD (×10⁻⁴).

Method          w/o DOB         w/ Direct DOB    w/ IB-DOB
Origin           2.92 ± 0.51     2.54 ± 0.94      2.37 ± 0.49
Model Error1    15.60 ± 1.17    13.49 ± 2.90     11.51 ± 1.40
Model Error2    13.79 ± 4.73    11.83 ± 6.27      6.10 ± 2.28
Model Error3     4.42 ± 0.53     4.55 ± 0.81      3.03 ± 0.72
We exploited the model errors between simulation and the real world as the disturbance, since quantitative evaluation of external-force disturbances is difficult in the real world. We used a motion-capture system to detect the object pose, as shown in Fig. 3 (c).
After training the policy and the forward model in simulation, as in the object-pushing simulation experiments, we conducted real-world pushing experiments using DeepMPC with and without disturbance compensation. In addition, we evaluated three types of disturbances for model errors by changing the center of gravity and the shape of the object, as shown in Fig. 5 (c).
The mean squared errors and standard deviations of the object angle $\varphi$ from the reference value (0 degrees) over five trials are shown in Table V. Under all conditions, w/ IB-DOB performed favorably compared to the other methods.
V. DISCUSSIONS
We demonstrated that IB-DOB estimated the disturbance and increased the accuracy of learning-based approaches. It dealt with a wider range of disturbances, such as external forces and model errors, than previous studies. To compare IB-DOB with other methods, we evaluated the direct DOB used in previous studies [5], [6], as formulated in Eq. (1). As shown in the experimental tables, in some cases direct DOB performed nearly as well as IB-DOB, although in other cases its performance was less favorable. In addition, the standard deviation of the state error in direct DOB was significantly larger than that of IB-DOB. We interpret this to suggest that direct DOB can sometimes estimate the disturbance correctly, but not always.
Although our method demonstrated its effectiveness, several challenges remain. First, although our method was evaluated with several force disturbances such as constant and step inputs, we need to test it with noisier and faster disturbances. Moreover, we will investigate in what kinds of settings our method works better by increasing the number of test environments. Second, we need to address other environments with higher-dimensional systems. However, as previous approaches using iterative backpropagation [7], [8], [9] could perform complicated tasks, our method is expected to work well even in such complex systems. Although ensuring stability in a fully data-driven setting remains a grand challenge, we will explore how to analyze the robustness and stability of our method.
VI. CONCLUSION
In this study, we proposed a learning-based disturbance observer named IB-DOB to eliminate disturbances such as external forces and model errors in complex dynamics systems. Instead of estimating disturbances with an inverse dynamics model, it utilizes the forward model and iterative backpropagation. We evaluated our method on two manipulation tasks with two learning-based approaches. The results demonstrated that IB-DOB accurately estimated disturbances and achieved more accurate manipulation than both conventional learning-based DOB approaches using an inverse model and methods without any DOB.
REFERENCES
[1] Emre Sariyildiz, Roberto Oboe, and Kouhei Ohnishi. Disturbance
observer-based robust control and its applications: 35th anniversary
overview. IEEE Transactions on Industrial Electronics, Vol. 67, No. 3,
pp. 2042–2053, 2019.
[2] Wen-Hua Chen, Jun Yang, Lei Guo, and Shihua Li. Disturbance-
observer-based control and related methods—an overview. IEEE
Transactions on Industrial Electronics, Vol. 63, No. 2, pp. 1083–1095,
2015.
[3] Xinkai Chen, Satoshi Komada, and Toshio Fukuda. Design of a
nonlinear disturbance observer. IEEE Transactions on Industrial
Electronics, Vol. 47, No. 2, pp. 429–437, 2000.
[4] Julian Ibarz, Jie Tan, Chelsea Finn, Mrinal Kalakrishnan, Peter Pastor,
and Sergey Levine. How to train your robot with deep reinforcement
learning: lessons we have learned. The International Journal of
Robotics Research, 2021.
[5] Juan Li, Xisong Chen, Shihua Li, Jun Yang, and Jiancheng Zhou.
Nndob-based composite control for binary distillation column under
disturbances. In IEEE Chinese Control Conference, pp. 592–597,
2013.
[6] Juan Li, Shihua Li, and Xisong Chen. Adaptive speed control of a
pmsm servo system using an rbfn disturbance observer. Transactions
of the Institute of Measurement and Control, Vol. 34, No. 5, pp. 615–
626, 2012.
[7] Ian Lenz, Ross A Knepper, and Ashutosh Saxena. Deepmpc: Learning
deep latent features for model predictive control. In Robotics: Science
and Systems, 2015.
[8] Daisuke Tanaka, Solvi Arnold, and Kimitoshi Yamazaki. Emd net:
An encode–manipulate–decode network for cloth manipulation. IEEE
Robotics and Automation Letters, Vol. 3, No. 3, pp. 1771–1778, 2018.
[9] Takayuki Murooka, Kei Okada, and Masayuki Inaba. Diabolo
orientation stabilization by learning predictive model for unstable
unknown-dynamics juggling manipulation. In IEEE/RSJ International
Conference on Intelligent Robots and Systems, pp. 9174–9181, 2020.
[10] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas
Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra.
Continuous control with deep reinforcement learning. arXiv preprint
arXiv:1509.02971, 2015.
[11] Kento Kawaharazuka, Kei Tsuzuki, Shogo Makino, Moritaka Onit-
suka, Koki Shinjo, Yuki Asano, Kei Okada, Koji Kawasaki, and
Masayuki Inaba. Task-specific self-body controller acquisition by mus-
culoskeletal humanoids: application to pedal control in autonomous
driving. In IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 813–818, 2019.
[12] Lerrel Pinto, James Davidson, Rahul Sukthankar, and Abhinav Gupta.
Robust adversarial reinforcement learning. In International Confer-
ence on Machine Learning, pp. 2817–2826, 2017.
[13] Addisu Tesfaye, Ho Seong Lee, and Masayoshi Tomizuka. A sensitiv-
ity optimization approach to design of a disturbance observer in digital
motion control systems. IEEE/ASME Transactions on mechatronics,
Vol. 5, No. 1, pp. 32–38, 2000.
[14] Wen-Hua Chen, Donald J Ballance, Peter J Gawthrop, and John
O’Reilly. A nonlinear disturbance observer for robotic manipulators.
IEEE Transactions on industrial Electronics, Vol. 47, No. 4, pp. 932–
938, 2000.
[15] Kwang Sik Eom, Il Hong Suh, Wan Kyun Chung, and S-R Oh.
Disturbance observer based force control of robot manipulator without
force sensor. In IEEE International Conference on Robotics and
Automation, Vol. 4, pp. 3012–3017, 1998.
[16] Dawei Huang, Junyong Zhai, Weiqing Ai, and Shumin Fei. Distur-
bance observer-based robust control for trajectory tracking of wheeled
mobile robots. Neurocomputing, Vol. 198, pp. 74–79, 2016.
[17] Wei Dong, Guo-Ying Gu, Xiangyang Zhu, and Han Ding. High-
performance trajectory tracking control of a quadrotor with disturbance
observer. Sensors and Actuators A: Physical, Vol. 211, pp. 67–77,
2014.
[18] Shirin Yousefizadeh and Thomas Bak. Nonlinear disturbance observer
for external force estimation in a cooperative robot. In IEEE Interna-
tional Conference on Advanced Robotics, pp. 220–226, 2019.
[19] Akiyuki Hasegawa, Hiroshi Fujimoto, and Taro Takahashi. Filtered
disturbance observer for high backdrivable robot joint. In Annual
Conference of the IEEE Industrial Electronics Society, pp. 5086–5091,
2018.
[20] Barkan Ugurlu, Masayoshi Nishimura, Kazuyuki Hyodo, Michihiro
Kawanishi, and Tatsuo Narikiyo. Proof of concept for robot-aided up-
per limb rehabilitation using disturbance observers. IEEE Transactions
on Human-Machine Systems, Vol. 45, No. 1, pp. 110–118, 2014.
[21] Junhai Huo, Tao Meng, and Zhonghe Jin. Adaptive attitude control
using neural network observer disturbance compensation technique.
In IEEE International Conference on Recent Advances in Space
Technologies, pp. 697–701, 2019.
[22] Haibin Sun and Lei Guo. Neural network-based dobc for a class of
nonlinear systems with unmatched disturbances. IEEE Transactions on
Neural Networks and Learning Systems, Vol. 28, No. 2, pp. 482–489,
2016.
[23] Bu Xuhui, Hou Zhongsheng, Yu Fashan, and Fu Ziyi. Model
free adaptive control with disturbance observer. Journal of Control
Engineering and Applied Informatics, Vol. 14, No. 4, pp. 42–49, 2012.
[24] Juan Li, Shihua Li, Shengquan Li, Xisong Chen, and Jun Yang.
Neural-network-based composite disturbance rejection control for a
distillation column. Transactions of the Institute of Measurement and
Control, Vol. 37, No. 9, pp. 1146–1158, 2015.
[25] Juan Li, Shihua Li, and Shengquan Li. The design of robust mimo
neural network disturbance observer for multi-variable system. In
IEEE Chinese Control Conference, pp. 2379–2383, 2014.
[26] Léon Bottou. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, pp. 421–436. 2012.
[27] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider,
John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv
preprint arXiv:1606.01540, 2016.
[28] Pytorch-ddpg-naf. https://github.com/ikostrikov/pytorch-ddpg-naf.
[29] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980, 2014.
[30] Nathan Koenig and Andrew Howard. Design and use paradigms for
gazebo, an open-source multi-robot simulator. In IEEE International
Conference on Intelligent Robots and Systems, pp. 2149–2154, 2004.