All content in this area was uploaded by Yuliang Ma on Mar 15, 2024
Abstract—Industrial cyber-physical systems (ICPS) are
becoming more complex due to increasing behavioral and
structural complexity. This increases the likelihood of faults,
errors and failures, which can lead to economic losses and even
hazardous events. Fault injection is an efficient method to
estimate the potential risk of safety-critical ICPS. In this paper,
we propose a new fault injection-based risk analysis method for
Robot Operating System (ROS) and demonstrate its
applicability with a robot manipulator case study. We conducted
extensive fault injection experiments using a pick-and-place task.
We injected two types of sensor signal faults: bias and noise.
First, fault injections were implemented on a ROS/Gazebo
model of the manipulator with randomly selected fault
parameters such as fault type, location, magnitude and duration.
The experiments helped to identify potential failure scenarios
and to find critical fault locations. The most important factor
contributing to system failures was the operational phase during
which the faults were injected. We then tested our fault injection
method on a real Franka Emika Panda collaborative
manipulator to validate the effectiveness of the proposed ROS-
based fault injection method. We observed that the digital model
showed similar behavior to the real manipulator.
Index Terms—Industrial Cyber-Physical Systems, fault
injection, robot operating system, risk analysis
I. INTRODUCTION
Cyber-Physical Systems (CPS) integrate computational
and physical capabilities that enable interaction between the
cyber and physical worlds through computation,
communication and control [1]. In the industrial domain,
Industrial Cyber-Physical Systems (ICPS) play an important
role. However, due to the increasing behavioral and structural
complexity of internal components, different types of failures
are likely to occur in ICPS. Therefore, it is important to assess
the potential risks for ICPS, especially for safety-critical
applications, such as collaborative robot manipulators, which
are widely used in industrial Human-Robot Collaboration
(HRC) scenarios. Fault injection is a widely used and
promising risk analysis method. At the same time, the Robot
Operating System (ROS) [2] is a powerful open-source
software framework that helps to develop software for various
robotic applications, e.g. manipulators, mobile robots and
drones. An internal communication mechanism in ROS
provides convenient means to inject errors into target ROS
nodes. The ROS/Gazebo model allows users to simulate the
robot in any environment, making fault injection experiments
effective and safe.
*This work is funded by the Ministry of Science, Research and Arts of the
Federal State of Baden-Württemberg through its financial support of the projects
within the InnovationCampus Future Mobility (ICM).
In this paper we follow the definitions of faults, errors and
failures given in [3]. A fault activation leads to an error. An
error in turn can lead to a failure if it propagates through
different components of a system. Fig. 1 outlines such a fault
propagation process. In modern ICPS, fault tolerance
mechanisms allow systems to continue to perform their tasks
even when a fault occurs. In other words, not every fault leads
to failure. Faults with different parameters can lead to different
failure scenarios. Therefore, it is important to study critical
faults and estimate their consequences for safety-critical
applications.
Figure 1. Visualization of the fault-error-failure chain.
Contribution: In this paper, we present a new automated
fault injection method for risk assessment of a ROS-based
robotic manipulator. A Franka Emika Panda (hereafter
referred to as Panda) manipulator is used in this case study.
Fault injection is implemented for a pick-and-place task. Two
types of sensor faults are injected with six different fault
parameters: fault type, fault location, fault magnitude, fault
duration, fault activity and fault phase. Our method enables
random fault selection, injection and monitoring based on
ROS communication mechanisms. First, we extensively tested
our method on a ROS/Gazebo model. We then used it on a real
Panda manipulator. The real Panda manipulator showed
similar behavior to the simulated model after fault injection.
Experiments proved the feasibility of our fault injection
method. The main results are summarized in this paper.
The rest of the paper is organized as follows: Section II
discusses the relevant state-of-the-art methods. Section III
introduces the proposed fault injection method. Section IV
presents the experiment results. Finally, conclusions and
future steps are discussed in Section V.
Case Study: ROS-based Fault Injection for Risk Analysis of Robotic
Manipulator
Yuliang Ma, Philipp Grimmeisen, and Andrey Morozov
Yuliang Ma, Philipp Grimmeisen, and Andrey Morozov are with the
Institute of Industrial Automation and Software Engineering, University of
Stuttgart, Germany (e-mail: yuliang.ma@ias.uni-stuttgart.de).
II. STATE OF THE ART
Collaborative robot applications require a comprehensive
risk assessment of both the robot systems and the workplace
(ISO/TS 15066:2016). Fault injection is a promising method
to estimate the potential risks of HRC scenarios and to evaluate
the fault tolerance of the robot system.
A. Fault Injection for ROS
Considerable work has been done on fault injection
based on ROS and the Gazebo simulation platform for
various safety-critical robotic applications. For autonomous
mobile robots, a ROS/Gazebo-based hierarchical fault-tolerant
framework that can inject faults and analyze error propagation
chains is proposed in [4]. In [5], sensor faults of mobile robot
localization systems are described, but with relatively simple
fault parameters. A fault diagnosis method for multiple mobile
robots is proposed in [6], and its fault injector module is
implemented using the ROS service. In [7], an end-to-end fault
analysis framework called MAVFI is proposed. It shows a
comprehensive analysis of fault injection for different
algorithms, error propagation and recovery strategy in a
simulation environment. MAVFI is based on a ROS node that
uses the ROS communication protocol and Linux system
commands to inject faults.
There are also ROS-based fault injection methods and
tools for robotic manipulators. In [8] and [9], faults are injected
into teleoperated surgical robots, and experiments are
implemented in a simulator. They focus on malicious control
commands generated by different types of attacks and a
model-based framework is used to estimate the consequences.
B. Fault Injection for other Systems
In addition to ROS-based fault injection, other tools are
available for other platforms and frameworks. In [10], model-
based fault injection experiments are performed on an
exoskeleton system using a highly customizable Simulink
block called FIBlock. In [11], a safety analysis platform called
xSAP is developed, which allows customizable definition of
fault modes and implements safety analysis. With the rapid
development of Artificial Intelligence (AI), some AI-based
fault injection paradigms are also attractive [12].
Fault injection is a popular method for testing the resilience
of a system. Although many ROS-based fault injection
methods have been introduced, most of them are used to
evaluate the impact of a specific fault. In this paper, we
perform extensive Monte Carlo fault injections for a
collaborative manipulator, which helps to identify important
failure scenarios and find the critical fault parameters. Our
method has the following distinctive features:
1) It is a ROS-based fault injection method tailored to the
Panda manipulator.
2) Our method enables Monte Carlo fault injection with
random and customizable fault parameter selection and
fault monitoring in the ROS/Gazebo model.
3) It has been applied to both the ROS/Gazebo model and
the real Panda manipulator.
III. FAULT INJECTION METHOD
A. Panda and Gazebo
The Panda manipulator has seven degrees of freedom and
is becoming increasingly popular due to its ease of use and
relatively low price [13]. The end-effector has two fingers that
can perform open and close actions. In our lab, we have a
demonstrator under development that helps to perform
chemical experiments: moving test tubes, pouring chemical
reagents and shaking the beaker. In this work, we focus on the
pick-and-place task. The Panda manipulator moves a test tube
from position A to position B. We have also built a simplified
ROS/Gazebo model. Gazebo is a simulator that integrates
precise physics and advanced 3D graphics, making it a popular
simulator in the ROS research community. Fig. 2 shows the
real Panda manipulator (a) and its ROS/Gazebo model (b).
Figure 2. Real-world manipulator and simulation models
B. Fault Configuration
We introduce errors into the Panda manipulator by
manipulating the original sensor data. The Panda manipulator
has seven joints. Each of them generates time series data about
positions, velocities and torques. Since a position-based
motion planner is used (rapidly exploring random tree), we
focus on the position signal channel to inject erroneous data in
real time.
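The manipulation of one position channel can be illustrated with a minimal sketch. The paper does not state the fault equations, so the sketch assumes that a bias fault adds a constant offset and that a noise fault adds a zero-mean Gaussian sample whose standard deviation equals the fault magnitude. The function name `corrupt_positions` and the example joint values are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def corrupt_positions(
    positions: Sequence[float],
    joint: int,
    fault_type: str,
    magnitude: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a copy of `positions` with a fault on one joint (1-based index).

    Assumptions (not from the paper): bias adds a constant offset of
    `magnitude`; noise adds a zero-mean Gaussian sample with standard
    deviation `magnitude`.
    """
    if not 1 <= joint <= len(positions):
        raise ValueError("joint index out of range")
    rng = rng or random.Random()
    faulty = list(positions)  # never modify the original sensor data
    if fault_type == "bias":
        faulty[joint - 1] += magnitude
    elif fault_type == "noise":
        faulty[joint - 1] += rng.gauss(0.0, magnitude)
    else:
        raise ValueError(f"unknown fault type: {fault_type}")
    return faulty


# Seven hypothetical joint positions.
nominal = [0.0, -0.785, 0.0, -2.356, 0.0, 1.571, 0.785]
biased = corrupt_positions(nominal, joint=2, fault_type="bias", magnitude=0.1)
noisy = corrupt_positions(nominal, joint=5, fault_type="noise", magnitude=0.1,
                          rng=random.Random(0))
```

Only the selected joint changes, and the other six position signals pass through untouched, which matches the single-location fault model used here.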
We defined the fault space using the parameters listed in
Table I. The parameters are described below.
Fault type: We consider two common fault types, bias
and noise.
Fault location: The position signal that is changed.
We inject faults into one of the seven position signals.
Fault magnitude: For both fault types, the fault
injector randomly selects a value within a predefined
range.
Fault duration: How long the fault is injected.
We set the maximum duration to 1.0 s for bias and 3.0 s
for noise.
Fault activity: The activity during which a fault is injected.
The pick-and-place task consists of nine sequential
activities, such as PickHover, GoDown, CloseHand,
GoUp, and so on.
Fault phase: Each activity has a
planning period and an execution period. We define
the fault phase as an execution fault when the fault
duration falls entirely within the execution period of
one activity. Otherwise, it is defined as a
planning fault. Note that a long fault can span two
or more planning periods. In that case, we consider
only the first planning period
and its corresponding activity.
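The fault phase rule above can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' implementation: the `Activity` record, the half-open planning interval, and the example timeline are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Activity:
    name: str
    plan_start: float  # planning period is [plan_start, exec_start)
    exec_start: float  # execution period is [exec_start, exec_end]
    exec_end: float


def classify_fault(
    fault_start: float, fault_end: float, activities: List[Activity]
) -> Optional[Tuple[int, str]]:
    """Return (1-based activity index, phase) for a fault interval."""
    # Execution fault: the whole fault lies inside one execution period.
    for i, act in enumerate(activities, start=1):
        if act.exec_start <= fault_start and fault_end <= act.exec_end:
            return i, "execution"
    # Otherwise a planning fault, attributed to the first planning period hit.
    for i, act in enumerate(activities, start=1):
        if fault_start < act.exec_start and fault_end > act.plan_start:
            return i, "planning"
    return None  # the fault lies outside the recorded task timeline


# Hypothetical timeline for the first three activities (seconds).
timeline = [
    Activity("PickHover", 0.0, 0.5, 3.0),
    Activity("GoDown", 3.0, 3.4, 5.0),
    Activity("CloseHand", 5.0, 5.2, 6.0),
]
print(classify_fault(1.0, 1.8, timeline))  # (1, 'execution')
print(classify_fault(2.8, 3.2, timeline))  # (2, 'planning')
print(classify_fault(2.5, 5.1, timeline))  # (2, 'planning')
```

The third call spans the planning periods of both GoDown and CloseHand and is attributed to GoDown, the first planning period it touches.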
TABLE I. FAULT PARAMETERS
Fault parameters | Attributes: Description | Attributes: Set notations
Type | Bias, Noise | T = {1, 2}
Location | Joint 1, 2, …, 7 | L = {1, 2, 3, 4, 5, 6, 7}
Magnitude | Intensity of fault | M = (0, 1]
Duration | Time length of the fault | D = (0, 1] for bias, D = (0, 3] for noise
Activity | PickHover, GoDown, CloseHand, GoUp, PlaceHover, GoDown, OpenHand, GoUp, BacktoInit | A = {1, 2, 3, 4, 5, 6, 7, 8, 9}
Phase | Planning period, Execution period | P = {1, 2}
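A random draw from the fault space of Table I might look like the sketch below. The paper says the parameters are selected randomly but does not name a distribution, so uniform sampling is an assumption here, as are the `Fault` record and the helper names. Fault activity and fault phase are not sampled because they follow from the start and end time of the injected fault.

```python
import random
from dataclasses import dataclass

MAX_DURATION = {"bias": 1.0, "noise": 3.0}  # seconds, upper bounds from Table I


@dataclass(frozen=True)
class Fault:
    fault_type: str   # T: "bias" or "noise"
    location: int     # L: joint 1..7
    magnitude: float  # M: value in (0, 1]
    duration: float   # D: (0, 1] for bias, (0, 3] for noise


def half_open_unit(rng: random.Random) -> float:
    """Draw from (0, 1]; rng.random() itself returns values in [0, 1)."""
    return 1.0 - rng.random()


def sample_fault(rng: random.Random) -> Fault:
    """Draw one fault uniformly from the fault space (assumed distribution)."""
    fault_type = rng.choice(["bias", "noise"])
    return Fault(
        fault_type=fault_type,
        location=rng.randint(1, 7),  # randint includes both endpoints
        magnitude=half_open_unit(rng),
        duration=MAX_DURATION[fault_type] * half_open_unit(rng),
    )


rng = random.Random(42)  # fixed seed so a campaign can be repeated
campaign = [sample_fault(rng) for _ in range(1000)]
```

Seeding the generator makes a Monte Carlo campaign reproducible, which helps when a specific failure scenario has to be replayed.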
C. Fault Injection Method
As an executable program unit in ROS, a ROS node can
perform computation, publish and subscribe to ROS topics,
and provide ROS services. Our fault injector is a ROS node
that communicates with other nodes through ROS
topics, which provide one-to-many communication. The
architecture of the fault injector is shown in Fig. 3. The fault
injection process consists of three steps:
Figure 3. Overview of fault injection process. Fault injector generates faults
and Failure monitor judges failure modes.
1) Fault Selection: The fault injector extracts a set of
fault parameters including fault type, fault location, fault
duration, and fault magnitude. For each round of pick and
place, only one selected fault is injected.
2) Fault Injection: After selecting fault parameters, the
fault injector randomly selects a start time to inject faults.
Specifically, the fault injector subscribes to the normal
Joint states data and manipulates them according to
selected fault parameters. It then publishes a new ROS
topic named Faulty joint states to subscribers. Meanwhile,
the fault activity and fault phase are determined by
checking the start and end time of the injected fault. It is
worth noting that sensor data are abnormal only during the
selected fault duration, and only one fault is injected
during each round of the pick-and-place process.
3) Failure Monitoring: Failure monitor is a ROS node
that subscribes to Ros out, Joint states, and Model states
topics. Ros out topic contains information about the start
and end times of various activities and all fault parameters.
In addition, the failure monitor determines different
failure modes by monitoring the status of the Panda
manipulator (according to Joint states) and the test tube
(according to Model states). Based on our experiments
and observations, five failure modes are defined:
1) Critical Acceleration (Crit. Accel.),
2) Pick Failure (P-Failure),
3) Release Failure (R-Failure),
4) Collision,
5) Drop.
For each failure mode, a hazard level is defined to indicate
the severity of the failure. A higher hazard level indicates
a more serious failure. In addition, some reasonable
constraints are defined for the failure monitor based on the
pick-and-place task. For example, a Drop failure is possible
only while the end effector is holding the
object. Table II shows the definition of all failure modes
and their trigger conditions. The final step of the failure
monitor is to store all fault parameters and hazard levels.
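The relay behavior of the fault injection step can be sketched without a ROS installation. The `Bus` class below is only an in-process stand-in for ROS topics, and the topic names `joint_states` and `faulty_joint_states`, the class names, and the constant bias model are assumptions made for illustration. The sketch shows the one property stated above: the data are abnormal only during the selected fault duration.

```python
from typing import Callable, Dict, List

Callback = Callable[[float, List[float]], None]


class Bus:
    """Tiny in-process stand-in for ROS topics (one-to-many delivery)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callback]] = {}

    def subscribe(self, topic: str, cb: Callback) -> None:
        self._subs.setdefault(topic, []).append(cb)

    def publish(self, topic: str, stamp: float, positions: List[float]) -> None:
        for cb in self._subs.get(topic, []):
            cb(stamp, positions)


class FaultInjector:
    """Relays joint states and biases one joint inside the fault window."""

    def __init__(self, bus: Bus, joint: int, bias: float,
                 start: float, duration: float) -> None:
        self.bus, self.joint, self.bias = bus, joint, bias
        self.start, self.end = start, start + duration
        bus.subscribe("joint_states", self.on_joint_states)

    def on_joint_states(self, stamp: float, positions: List[float]) -> None:
        out = list(positions)
        if self.start <= stamp <= self.end:  # abnormal only inside the window
            out[self.joint - 1] += self.bias
        self.bus.publish("faulty_joint_states", stamp, out)


bus = Bus()
received = []
bus.subscribe("faulty_joint_states", lambda t, q: received.append((t, q)))
injector = FaultInjector(bus, joint=1, bias=0.2, start=1.0, duration=0.5)
for t in (0.5, 1.2, 2.0):  # before, during and after the fault window
    bus.publish("joint_states", t, [0.0] * 7)
```

In a real setup the subscriber of the faulty topic would be the motion planner, so that it sees corrupted positions while the original topic stays available to the failure monitor.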
TABLE II. FAILURE MODES, TRIGGERS, AND HAZARD LEVELS
Failure mode | Trigger condition | Hazard level
Success | No failures (not a failure mode) | 0
Crit. Accel. | Joint velocity is over a certain threshold. | 1
P-Failure | Object position never changed. | 2
R-Failure | Object position is still changing after the release activity. | 3
Collision | Object position changed before the pick activity. | 4
Drop | Object velocity is over a certain threshold while the manipulator is holding it. | 5
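The trigger conditions of Table II can be written as a simple classifier. The sketch below is illustrative only: the `RunObservation` fields, both threshold values, and the rule that the highest triggered hazard level wins when several conditions hold are assumptions, since the paper does not state them.

```python
from dataclasses import dataclass

# Hazard levels as listed in Table II.
HAZARD_LEVEL = {
    "Success": 0,
    "Crit. Accel.": 1,
    "P-Failure": 2,
    "R-Failure": 3,
    "Collision": 4,
    "Drop": 5,
}


@dataclass
class RunObservation:
    """Summary of one pick-and-place run (field names are hypothetical)."""
    max_joint_velocity: float
    object_ever_moved: bool
    object_moved_before_pick: bool
    object_moving_after_release: bool
    max_object_velocity_while_held: float  # encodes the "only while held" constraint


def classify_run(obs: RunObservation, joint_limit: float, object_limit: float) -> str:
    """Return the triggered failure mode with the highest hazard level."""
    triggered = ["Success"]
    if obs.max_joint_velocity > joint_limit:
        triggered.append("Crit. Accel.")
    if not obs.object_ever_moved:
        triggered.append("P-Failure")
    if obs.object_moving_after_release:
        triggered.append("R-Failure")
    if obs.object_moved_before_pick:
        triggered.append("Collision")
    if obs.max_object_velocity_while_held > object_limit:
        triggered.append("Drop")
    return max(triggered, key=lambda mode: HAZARD_LEVEL[mode])


obs = RunObservation(
    max_joint_velocity=2.6,
    object_ever_moved=True,
    object_moved_before_pick=False,
    object_moving_after_release=False,
    max_object_velocity_while_held=1.4,
)
mode = classify_run(obs, joint_limit=2.0, object_limit=1.0)  # hypothetical limits
print(mode, HAZARD_LEVEL[mode])  # Drop 5
```

Here both Critical Acceleration and Drop are triggered, and the run is recorded as Drop because it carries the higher hazard level.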
IV. EXPERIMENTS AND RESULTS
We have used two approaches to implement fault injection
experiments: 1) the complete fault space is covered for discrete
fault parameters; 2) fault parameters of continuous values are
investigated through a constrained fault space.
A. Fault injection for the complete fault space
We conducted 2900 experiments to estimate the level of
risk caused by a particular fault (Bias: 1500 samples, Noise:
1400 samples). The results of the fault injection are shown in
Fig. 4, which ignores the fault parameters defined by
continuous values, such as fault duration and fault magnitude.
For the two types of faults configured, bias, and noise, the fault
phase is the most critical parameter in determining whether a
failure will occur. Specifically, if a fault is injected during the
planning period, the system has an extremely high percentage
of failures (bias fault: 84.00%, noise fault: 68.55%). On the
other hand, if a fault is injected during the execution period,
the failure rate is much lower (bias fault: 6.87%, noise fault:
11.11%). There is a logical explanation for this: If the fault is
injected during the planning period, the Motion planner node
receives corrupted information about the current Joint states
and implements incorrect motion planning. The Panda
manipulator then receives incorrect waypoints (due to a bias
fault) or is stuck at a certain activity (due to a noise fault) thus
leading to a failure. In contrast, the failure rate is much lower
in the case of an execution period fault since Motion planner
does not compute a new path for the current activity, even
though the erroneous position data are sent to the Motion
planner.
Figure 4. Bias and noise faults during different fault phases
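The per-phase failure rates reported above are simple ratios of failed runs to all runs in a phase. A minimal sketch of that aggregation follows; the records are made up for illustration and are not the paper's data, and counting every run with a hazard level above 0 as a failure is an assumption based on Table II.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def failure_rate_by_phase(runs: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """runs: (phase, hazard_level) pairs; a run fails if hazard_level > 0."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for phase, hazard in runs:
        total[phase] += 1
        failed[phase] += hazard > 0  # True counts as 1
    return {p: 100.0 * failed[p] / total[p] for p in total}


# Made-up records for illustration only, not the paper's data.
runs = [
    ("planning", 5), ("planning", 1), ("planning", 0), ("planning", 2),
    ("execution", 0), ("execution", 0), ("execution", 0), ("execution", 1),
]
print(failure_rate_by_phase(runs))  # {'planning': 75.0, 'execution': 25.0}
```

The same grouping can be repeated with fault type, location or activity as the key to obtain the other distributions discussed in this section.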
Fig. 5 shows the distribution of the failures caused by the
faults injected during the planning period. For bias faults, the
top two failure modes are Critical Acceleration (52.67%) and
Drop (22.33%). On the other hand, when noise faults are
injected during the planning period, the Panda manipulator has
a relatively high percentage of Pick Failure (36.57%) and
Drop (28.09%).
Figure 5. Failure mode distribution of different fault types for the planning
phase. Panels: (a) Bias, (b) Noise, (c) Failure rate distribution.
Fig. 6 shows the distribution of failures for different values
of the fault location parameter during the planning period. For
bias faults, Critical Acceleration (average: 60.06%) and Drop
(average: 25.54%) are two failure modes with a relatively high
percentage of occurrence. This is because bias faults during
the planning period give the Motion planner node incorrect
current positions of the manipulator. As a result, the planner
generates incorrect waypoints that deviate from the nominal
path. The manipulator must move at a relatively high
speed to pass through all the abnormal waypoints. This makes
Critical Acceleration failures more likely, and if the speed is
extremely high, Drop failures will occur. For noise faults, Pick
Failure (average: 38.79%) and Drop (average: 30.16%) are
the two most common failure modes. When the Motion planner
node receives noisy sensor data, it cannot plan any waypoints
due to the unstable data, so the manipulator remains stuck and
does not perform the following activities. When the fault is over, the
manipulator proceeds to the next activity directly. In other
words, the noise fault changes the original activity flow and
the failure modes are more dependent on the start and end time
of faults. For example, when the manipulator is moving an
object but is stuck at a hovering position due to the noisy data,
and the fault ends before the OpenHand activity, the
manipulator will perform OpenHand activity directly and
ignore the GoDown activity. As a result, the object is released
prematurely, and a Drop failure will occur in this situation.
Figure 6. Histogram results summary of the failure rate distribution for all
joints.
Finally, the fault activity parameter is selected as a
reference for the risk analysis with two types of faults during
the planning period. The results are illustrated in Fig. 7. We
focus only on the Drop scenario, which has the highest hazard
level. For bias faults, the most dangerous failure occurs only
during the CloseHand, GoUp, PlaceHover, and GoDown activities.
This is because the manipulator is holding the object
during these activities, so a fault injected then is riskier.
For noise faults, the rate of Drop failures is very high during
the PlaceHover and GoDown activities.
Figure 7. Histogram results summary of the failure rate distribution for all
activities.
B. Fault parameters of continuous values
To investigate the influence of continuous fault parameters
(fault magnitude and fault duration) on risk estimation, we
inject planning period faults (bias faults) into a given activity.
The experiment results are shown in Fig. 8. The second
activity, GoDown, is the target of fault injection. When the
fault duration parameter is taken as a reference, there is no
clear relationship between the injected fault and the failure it
causes. In contrast, the distribution of failure modes
shows a strong dependence on the fault location parameter.
The most dangerous failure modes are often caused by faults
in the first three joints, which are closer to the manipulator root
and far away from the end effector. Due to the mechanical
structure of the Panda manipulator, the Critical Acceleration
failures that occur in the first three joints often result in a large
deviation from the correct position. As such, the more serious
failure modes are more likely to happen in this case. On the
other hand, when the fault magnitude parameter is taken into
account, faults from the first three joints are still riskier than
the others. In addition, a fault magnitude of around 0.25 is
likely to be a valid threshold for distinguishing the most severe
failure mode from other modes.
Figure 8. Monte Carlo fault injection results for the 1st GoDown activity.
Panels: (a) Bias, (b) Noise.
C. Key findings
We made the following three key observations:
1) The fault phase is the most critical parameter: if a fault
is injected during the planning period, the failure rate is
much higher than for faults in the execution period.
2) Regarding the abnormal behavior of the manipulator after
fault injection, bias faults lead to Critical Acceleration,
while noise faults cause the manipulator to get stuck and disrupt
the original activity flow of the pick-and-place task. As such, the
failure distribution also differs between these two types of faults.
3) When a particular activity is chosen for fault injection,
the fault location parameter is a more important factor
contributing to a failure. In our case study, fault duration
does not have a strong relationship with the failure
distribution, while fault magnitude shows a visible
threshold for the most severe failure.
We tested our fault injection method on a real Panda
manipulator, and it showed similar behavior after fault
injection: https://youtu.be/LKtGLkaFTPo. For
safety reasons, the fault magnitude parameter is limited to 0.1 for
both fault types in the demo video.
V. CONCLUSION AND FUTURE WORK
In this paper, we presented the results of a case study in which
we injected noise and bias faults using ROS software to
identify potential failures of a robotic manipulator. First,
we performed extensive Monte Carlo fault injections on a
ROS/Gazebo model, which showed that the fault phase is the most
critical fault parameter. Then, we tested our
fault injection method on a real Panda manipulator, which
showed very similar behavior to the simulated one after fault
injection. The method proved helpful for the risk
assessment of ICPS: it helps to identify critical faults and
can support the development of fault tolerance mechanisms.
REFERENCES
[1] R. Baheti and H. Gill, “Cyber-physical systems,” The Impact of Control
Technology, vol. 12, pp. 161-166, 2011.
[2] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs,
R. Wheeler, A. Y. Ng et al., “ROS: an open-source robot operating
system,” in ICRA workshop on open source software, vol. 3, no. 3.2,
2009, p. 5.
[3] A. Avizienis, J.-C. Laprie, B. Randell and C. Landwehr, “Basic
concepts and taxonomy of dependable and secure computing,” in IEEE
Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp.
11-33, 2004.
[4] A. Favier, A. Messioux, J. Guiochet, J.-C. Fabre and C. Lesire, “A
hierarchical fault tolerant architecture for an autonomous robot,” in
2020 50th Annual IEEE/IFIP International Conference on Dependable
Systems and Networks Workshops (DSN-W), pp. 122-129.
[5] Z. Zhao, J. Wang, J. Cao, W. Gao and Q. Ren, “A Fault-tolerant
Architecture for Mobile Robot Localization,” in 2019 IEEE 15th
International Conference on Control and Automation (ICCA), 2019, pp.
584-589.
[6] M. G. Morais, F. R. Meneguzzi, R. H. Bordini and A. M. Amory,
“Distributed fault diagnosis for multiple mobile robots using an agent
programming language,” in 2015 International Conference on
Advanced Robotics (ICAR), 2015, pp. 395-400.
[7] Y. S. Hsiao, Z. Wan, T. Jia, R. Ghosal, A. Mahmoud, A. Raychowdhury,
D. Brooks, G.-Y. Wei and V. J. Reddi, “MAVFI: An end-to-end fault
analysis framework with anomaly detection and recovery for micro
aerial vehicles,” arXiv preprint arXiv:2105.12882, 2021.
[8] H. Alemzadeh, D. Chen, X. Li, T. Kesavadas, Z. T. Kalbarczyk and R.
K. Iyer, “Targeted Attacks on Teleoperated Surgical Robots: Dynamic
Model-Based Detection and Mitigation,” in 2016 46th Annual
IEEE/IFIP International Conference on Dependable Systems and
Networks (DSN), 2016, pp. 395-406.
[9] X. Li, H. Alemzadeh, D. Chen, Z. Kalbarczyk, R. K. Iyer and T.
Kesavadas, “A hardware-in-the-loop simulator for safety training in
robotic surgery,” in 2016 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). IEEE, 2016, pp. 5291-5296.
[10] T. Fabarisov, I. Mamaev, A. Morozov and K. Janschek, “Model-based
fault injection experiments for the safety analysis of exoskeleton
system,” arXiv preprint arXiv:2101.01283, 2021.
[11] B. Bittner, M. Bozzano, R. Cavada, A. Cimatti, M. Gario, A. Griggio,
C. Mattarei, A. Micheli and G. Zampedri, “The xSAP safety analysis
platform,” in 2016 Tools and Algorithms for the Construction and
Analysis of Systems (TACAS): 22nd International Conference, 2016, pp.
533-539.
[12] A. Sedaghatbaf, M. Moradi, J. Almasizadeh, B. Sangchoolie, B. Van
Acker and J. Denil, “DELFASE: A Deep Learning Method for Fault
Space Exploration,” in 2022 18th European Dependable Computing
Conference (EDCC), 2022, pp. 57-64.
[13] C. Gaz, M. Cognetti, A. Oliva, P. Robuffo Giordano and A. De Luca,
“Dynamic identification of the Franka Emika Panda robot with retrieval
of feasible parameters using penalty-based optimization,” in IEEE
Robotics and Automation Letters, vol. 4, no. 4, pp. 4147-4154, 2019.