Distributed and Synchronized Setup towards Real-Time Robotic Control using ROS2 on Linux

Abstract

Modern industrial robots are commonly controlled by a monolithic black-box controller made by the individual robot manufacturer. The individual components of the setup are neither accessible nor exchangeable, often because they are specially tuned and adjusted to fulfill the demanding requirements of robotic control. The open-source framework ROS makes it possible to combine these monolithic controllers through simple interfaces, thereby allowing more complex robotic applications. The next generation, ROS2, targets highly modular systems of sensors, actuators and controllers, each being interchangeable, and further provides real-time capabilities by employing DDS as middleware. This study uses system-inherent tools alongside non-invasive measurements to gain comprehensive insights, thereby providing guidance for ROS2 applications on an underlying distributed and synchronized real-time Linux system.
L. Puck¹, P. Keller¹, T. Schnell¹, C. Plasberg¹, A. Tanev¹, G. Heppner¹, A. Roennau¹, R. Dillmann¹
I. INTRODUCTION
Modern industrial robots are commonly controlled by
a monolithic black box controller made by the individual
manufacturers.
Breaking up such monolithic designs is one of the conceptual
goals of the open-source ROS2 (Robot Operating System 2) [1],
which uses DDS (Data Distribution Service) [2] as real-time
capable middleware. However, choosing a modular architecture
imposes new challenges such as time synchronization between
modules. Modularity makes modules reusable in similar systems
and allows specialization for different tasks. Moreover, it
allows modules to be separated onto distributed hosts. A
modular and distributable design adheres to the core
principles of ROS2 and its predecessor.
As ROS2 is under ongoing development, new features and
improvements are integrated quickly in every new release.
Current research by Casini et al. [3] explores the response
time of ROS2 in processing chains. The authors provide an
analysis to measure the expected worst case of a robotic
application. Furthermore, they present a real-time scheduling
model for ROS2. Gutiérrez et al. [4] evaluate how CPU
load and network communication affect the latencies of
ROS2. Their work includes a thorough analysis of the Linux
network stack and different DDS versions. Moreover, they
propose methods to obtain bounded latencies.
This work explores the fundamental setup for a distributed
and time-critical real-time system, enabling future and
improved ROS2 evaluations. Current research explores the
limitations of ROS2; however, it lacks a benchmark of the
underlying system as well as a high-precision time
synchronization between the individual components of the
system. Minimizing and finding bounds for the latencies and
jitter of the ROS2 communication is the main challenge for
the distributed system. Since the ROS2 applications are
limited by the underlying operating system (OS)
configuration, this has to be set up and evaluated first. As
a possible next step, such thoroughly optimized real-time
capabilities can be used in scenarios with distributed ROS2
hardware controllers.
Fig. 1. Evaluation setup and communication structure. Time
synchronization is started by the PTP grandmaster on the top
left. The time is transferred over the switch to the PTP
slaves, i.e. the industrial PCs (IPCs) and the PTP time
converter (TICRO). The slaves generate pulses on their
digital outputs (DOs), which are measured by the
oscilloscope. The precise periodic pulse of the TICRO is used
as reference. The IPCs communicate using ROS2 with DDS
OpenSplice as underlying middleware. IPC1 runs two publisher
nodes, which publish every millisecond. On IPC1 and on IPC2 a
node subscribes to the corresponding publisher.
¹ Department of Interactive Diagnosis and Service Systems
(IDS), FZI Research Center for Information Technology,
Haid-und-Neu-Straße 10–14, 76131 Karlsruhe, Germany.
In this research the underlying OS is prepared and evaluated
for real-time requirements. Robustness of the system is
achieved by exploring different benchmark scenarios. Further,
multi-perspective evaluation methods are included, since each
method interferes differently with the system. With different
measurements, critical configurations are less likely to be
missed. A small ROS2 application designed for test purposes
serves as the high-level software example that is examined.
The long-term evaluation is conducted by measuring response
time and communication latencies externally using directly
addressed GPIOs.
Core components of this study are the real-time capable setup
and evaluation of a system and its applications, thereby
minimizing external perturbations that conflict with the
real-time constraints.
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to
servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/CASE48305.2020.9217010
IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, Hong Kong, pp. 1287-1293, 2020.
The structure of the paper is as follows:
In Section II a review of the current research will be
presented.
Section III highlights the setup of a distributed real-time
capable system.
In Section IV the presented approach will be confirmed
by multi-perspective experiments.
Afterwards, Section V will highlight the key findings.
Finally, in Section VI a conclusion and an outlook will
be given.
II. STATE OF THE ART
Real-time capabilities can be divided into soft, firm and
hard real-time, with the latter stating that a missed deadline
leads to system failure. A common approach to enhance
Linux kernels with real-time capabilities is the Preempt-RT
patch [5]. In contrast to commercial or open source operating
systems, such as FreeRTOS [6] for micro-controllers, the
kernel patch does not provide hard real-time. Hard real-time
requires proofs that task deadlines are guaranteed to be met.
The complexity of these proofs becomes infeasible for complex
and customizable operating systems. Therefore the Preempt-RT
patch for the standard Linux kernel does not come with a
mathematical proof. The goal of the patch is to minimize the
impact of general non-deterministic behavior, thereby
allowing a more predictable execution of tasks by the
operating system. The open-source accessibility and advanced
development status also attract industrial associations such
as the Open Source Automation Development Lab (OSADL) [7].
A survey of the use and evaluation methods of Preempt-RT
Linux systems is provided by Reghenzani et al. [8]. The
authors conclude that, although the kernel patch is not hard
real-time and therefore not suitable for security related
applications, the Preempt-RT patch enables research and test
applications with little effort and high usability.
Furthermore, the authors state that dual-kernel approaches
reach better performance regarding worst-case scheduling
latencies at the expense of increased development effort for
user-space applications and limited portability. A possible
scenario for real-time Linux systems is real-time networks, a
group to which distributed ROS2 applications belong.
Dantam et al. [9] and Gutiérrez et al. [10] assess the
network communication performance using the Preempt-RT patch.
They execute their benchmarks on a single host using both
Preempt-RT Linux and Xenomai, the latter as a representative
of the dual-kernel scheme. They show that Xenomai has better
worst-case latencies of 7µs compared to the Preempt-RT patch
with latencies up to 21µs. The evaluation was conducted using
the loopback devices of each system.
Local wired network communication on embedded multi-core
devices using the Linux Preempt-RT patch is evaluated by
Gutiérrez et al. [10]. The setup of the patched Linux OS is
evaluated employing cyclictest [11], detecting a maximum task
scheduling latency of 110µs (88µs with CPU isolation). For
their evaluation the authors measured the round-trip time of
a 500-byte UDP message sent at a rate of 1kHz between two
identical systems. Furthermore, CPU affinity and CPU
isolation, two tools for managing CPU usage for user-space
applications, are examined. Using a Linux system without RT
capabilities, the worst-case round-trip latency was 1.5ms
without any load on the devices. Applying the Preempt-RT
patch reduced this by more than 60% to 522µs. CPU affinity
(644µs) and isolation (592µs) could not improve the result.
However, under CPU and memory load or increased network
traffic (up to 100 Mbps) these CPU management tools showed
their potential by avoiding any missed deadline. When
increasing the concurrent network traffic on the devices
(TX/RX), high latency spikes could be observed even with
affinity or isolation applied.
The network stack of Linux is a key element in ROS2
communications. Especially the UDP transport is of impor-
tance, since most of the DDS implementations use UDP as
their default protocol [2]. Extensive tests evaluating ROS2
are performed by Maruyama et al. [12]. The authors test local
and remote communications of ROS2 nodes employing three
different DDS implementations (Connext [13], OpenSplice
[14], FastRTPS [15]). Real-time capabilities are enabled
using the FIFO scheduling policy (SCHED_FIFO) and memory
locking (mlockall). Cerqueira et al. [16] carry out an
in-depth comparison of using real-time priorities on mainline
Linux kernels and patched versions (Preempt-RT), thereby
revealing significant differences. The authors state that
task scheduling is performed faster and with increased
robustness on systems with the preemptive patch applied, even
in idle scenarios. Once CPU- or IO-bound load is generated,
latencies become infeasibly large with peaks over 1ms.
Neither these types of stress sources nor others were applied
in [12] for the benchmarks. The tests of Maruyama et al.
expose various ROS2 performance characteristics, especially
for the integrated DDS implementations. Primarily, they show
that DDS is
responsible for 70% of the time in the communication
stack. The rest is split nearly equally into type conversion
procedures regarding the message data from ROS2 to DDS
and vice versa. A negligibly small amount of the transmission
delay is caused by other processes. In addition, they notice
that there can be significant differences of communica-
tion performance in ROS2. In their evaluation OpenSplice
(Community Edition) [14] performs better than Connext
[13]. FastRTPS [15] did not support enough functionality
to allow for a fair comparison at the time of their study. The
development and improvements since 2016 certainly have
changed the performance of DDS implementations overall.
Nonetheless, it demonstrates that performance levels between
DDS vendors might be significantly different. Furthermore,
Maruyama et al. [12] evaluate ROS2 against ROS version 1.
ROS2 communication with reliable quality of service (QoS)
is comparable to the TCP communication which is commonly
used in the predecessor. In their evaluation, both versions
perform almost equally in remote communication scenarios
between multiple machines. The authors state that in local
scenarios ROS1 is faster than ROS2 by a noticeable factor.
Similar to this work, Gutiérrez et al. [4] examine the
real-time capabilities and performance of ROS2 on patched
preemptive RT Linux systems. Three different DDS
implementations, varying CPU workload and concurrent network
traffic are taken into account. The round-trip time between a
standard PC and an embedded device (dual core) is measured
using software timestamping. Despite carefully assigning
real-time priorities to all involved threads and using
real-time compatible allocation in ROS2, the measured
worst-case round-trip time with applied system stress and
1Mbps concurrent network traffic reaches 2182µs. With 40Mbps
network traffic they observe a worst-case round-trip time of
4942µs. A thorough evaluation of the real-time performance of
the underlying Preempt-RT systems is missing.
Two more adaptations of ROS2 are under development, both with
ongoing research regarding real-time ROS2 applications.
Micro-ROS [17] uses the ROS2 stack but targets embedded
devices. For this use case, dedicated real-time operating
systems are used and DDS implementations for limited hardware
resources are integrated. Apex.OS [18], a commercial OS, is
based on ROS2 and runs on a real-time operating system, where
it provides ROS2 functionality in a modified version to
guarantee hard real-time. Both projects actively contribute
to the open-source ROS2 community.
Previous work revealed insights and crucial points in
ROS2 and real-time Linux systems and their performance. In
this work we examine the real-time capabilities of the latest
ROS2 release Eloquent Elusor on a real-time Linux OS using
various evaluation scenarios and strategies. Furthermore,
ROS2 and the underlying preemptive real-time Linux OS
are subject to a holistic iterative evaluation and
optimization procedure to improve the performance.
III. DISTRIBUTED REAL-TIME SYSTEM
The used hardware and the setup for external measure-
ments is introduced in this section. A comprehensive multi-
perspective evaluation of the system capabilities follows.
To conclude this section, research methods to inspect the
network latencies are investigated.
A. Hardware and Software Setup
The off-the-shelf hardware consists of two identical PCs,
in the following abbreviated as IPCs. The IPCs and processor
type were chosen according to their promising OSADL benchmark
results. This industrial association regularly tests complete
systems for their real-time capabilities using cyclictest
[11]. Those results can further be used as baselines to
evaluate the configuration of comparable setups. Table I
details the hardware and OS specifications.
Both IPCs are initially set up with a clean "out-of-the-box"
install of Debian 10 (buster). For the kernel, Linux version
4.19 is chosen, because the already released 5.x versions
showed worse real-time capabilities according to several
OSADL benchmark results. As evaluated in previous work [19],
[16], [8], a well-performing real-time system can be set up
with the Preempt-RT Linux patch; in this work the
4.19.72-rt25 patch is applied. With the Preempt-RT patch, the
additional requirements imposed on the software are
negligible, contrary to other approaches like Xenomai, which
require special libraries [20].
TABLE I
SYSTEM SETUP
Hardware
  Platform                Shuttle SH370 R8
  Chipset                 Intel H370
  Processor               Intel Core i9-9900K (3600 MHz, 8 cores)
  RAM                     G.Skill DIMM 16 GB DDR4-3200
Setup
  OS                      Debian 10
  Kernel (Preempt patch)  4.19.72(-rt25)
  BIOS                    2.20.1271 (American Megatrends)
PTP
  Grandmaster             OTMC 100i (Omicron)
  PTP time converter      TICRO 100 - OCXO 25 (Omicron)
  Switch                  IDS 509 (Perle)
Software
  ROS2                    Eloquent Elusor
  DDS                     Vortex OpenSplice Community Edition
Regarding the time synchronization, a Precision Time Pro-
tocol (PTP) grandmaster sends out master clock messages.
The PTP switch is set up to forward mode, thereby allow-
ing pass-through of all PTP-messages within the network.
The PTP slaves, i.e. the IPCs and the PTP time converter
(TICRO), are synchronized using their network interface
controllers (NIC) with hardware timestamping. Moreover, the
PTP time converter emits a periodic pulse signal that serves
as the reference for the oscilloscope measurements. These
pulses are recorded as the ground truth with respect to the
timing of the other components, which generate pulses
themselves on their GPIOs to non-invasively measure
scheduling latencies or communication delays. The serial
interface pins on the motherboards of the IPCs are used as
GPIOs. They are set directly through the corresponding
registers to keep the overhead minimal.
Figure 1 displays the communication setup of the PTP
time synchronization and of the ROS2 test application. IPC1
contains two publishers, which are scheduled to wake up
every millisecond according to the internal synchronized
clock and publish a ROS2 message. When woken up, they
immediately enable the corresponding GPIO, then publish
the message and finally disable the GPIO again. To cover
both use cases, the local publisher has a subscriber that
runs locally on IPC1, and the remote publisher's subscriber
runs on the remote IPC2. The local one publishes every
millisecond with an offset of 150µs, to ensure that no race
conditions appear between the publishers. Figure 1 also
visualizes the different communication protocols. The network
is synchronized using PTP and the IPCs communicate on the
same network using DDS (via ROS2). Both protocols rely on UDP
messages. For this work Vortex OpenSplice (Community Edition)
is chosen; however, any DDS implementation can be evaluated
in the same manner. Furthermore, the ROS2 version Eloquent
Elusor is used. The TICRO provides pulse-per-second (PPS)
signals. Similar pulses are sent from the GPIOs of the IPCs,
thereby allowing external measurement and validation with an
oscilloscope.
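The publishing schedule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name next_wakeup_ns, the use of integer nanoseconds and the example clock reading are assumptions, and only the 1 ms period and the 150µs offset are taken from the text.

```python
PERIOD_NS = 1_000_000      # 1 ms publishing period (from the text)
LOCAL_OFFSET_NS = 150_000  # 150 µs offset of the local publisher (from the text)


def next_wakeup_ns(now_ns: int, period_ns: int = PERIOD_NS, offset_ns: int = 0) -> int:
    """Return the first instant strictly after now_ns that lies on the
    grid offset_ns + k * period_ns of the synchronized clock."""
    k = (now_ns - offset_ns) // period_ns + 1
    return offset_ns + k * period_ns


now = 5_000_400_000  # hypothetical synchronized clock reading in ns
remote = next_wakeup_ns(now)
local = next_wakeup_ns(now, offset_ns=LOCAL_OFFSET_NS)
print(remote, local, local - remote)
```

Because both publishers derive their wake-up instants from the same synchronized clock grid, the two pulses stay a fixed 150µs apart regardless of when each loop was started.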
B. Real-time evaluation
The system, once set up, has to be configured and tuned
for real-time capabilities. Starting from a clean
"out-of-the-box" install of Debian 10, the most important steps to ensure
the fulfillment of real-time requirements for the system are
demonstrated in the following. Using known methods such
as the OSADL cyclictest [11] benchmark each system is
analyzed and evaluated, showing the limitations of such tests
and how to overcome them. The main challenge for this
section is to minimize the response times for each system,
initially regarding the latencies for scheduled interrupts.
Afterwards ROS2 benchmark tests are conducted for intra-
process communication, excluding all network connections.
The systems will be time synchronized using PTP before
performing any external evaluation measurements. Those
non-invasive measurements are used to validate the response
time of the time-synchronized systems against real-time
requirements. The response times are set in relation to the
reference time from the TICRO. To comply with real-time
behavior they need to exhibit an upper bound and ideally are
as short as possible. Once established, this system can then
further be used to develop a modular robotic controller.
The proposed method will first analyze how real-time
wake up times can be measured using cyclictest benchmarks
and further ROS2 real-time tests. Hereafter, the setup is
configured and fine-tuned showing the limitations of the tests
and the need for multi-perspective analysis. Being optimally
configured, a ROS2 application is used to conduct scheduled
wake ups to measure latencies of response times for higher
level software applications. It is ensured that the ROS2 node
is running with real time priority of 95 with FIFO scheduling.
To analyze the periodic wake-up, which is triggered every
millisecond, the GPIOs instantly emit a pulse. By setting this
pulse in relation to the reference signal from the TICRO, the
response time of a time-synchronized system can be evaluated.
To further extend this research, the tests are also performed
in a stressed environment¹. Since this workload challenges
the real-time capabilities of the operating system, approaches
to specifically seclude the relevant real-time processes are
evaluated. First, CPUs are isolated using the isolcpus kernel
parameter and processes are moved onto the isolated cores
using taskset. The second approach uses shielded CPUs,
keeping the Linux scheduler and load balancer active for the
respective cores. This keeps essential kernel threads inside
the shield. The shielding is realized using cset shield.
This evaluation strategy guides the setup of a real-time
capable system and provides in-depth insight regarding the
execution of the real-time critical applications, thereby
allowing this research to be extended towards a
time-synchronized real-time capable system on consumer-based
hardware.
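As a rough illustration of the scheduling setup described above, the following sketch requests FIFO scheduling with priority 95 for the calling process and optionally pins it to a set of CPUs, comparable to what taskset does. It uses Python's standard wrappers around the Linux scheduling calls. The helper name make_realtime is hypothetical, and the test application in this work is not claimed to be configured this way.

```python
import os


def make_realtime(priority: int = 95, cpus=None) -> bool:
    """Try to switch the calling process to SCHED_FIFO and, optionally,
    pin it to the given CPU set. Returns False if the platform lacks the
    calls or the process lacks the required privileges."""
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduling API not available on this platform
    try:
        if cpus is not None:
            os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:  # includes PermissionError when run without privileges
        return False
    return True


ok = make_realtime(95)
print("SCHED_FIFO active:", ok)
```

Real-time priorities usually require root privileges or a suitable rtprio limit, which is why the sketch reports failure instead of raising.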
C. Network evaluation
ROS2 is developed with real-time capabilities in mind.
Before further evaluating ROS2 and the underlying DDS
middleware, the time-synchronized network has to be assessed.
As established in the previous section, both IPCs are
synchronized and set up for real-time applications. The Linux
network stack is potentially a risk to all distributed
real-time systems [10]. Therefore, an analysis is conducted
to evaluate the network latencies in the local network.
¹ Command used: stress -c 8 -i 8 -m 8 -d 8
This is achieved by evaluating the timestamps of the network
events on both IPCs using Wireshark. The packets can be
matched by their unique identifiers, and since both IPCs are
time-synchronized, the timestamps can be set in relation to
each other. Thus, instead of the round-trip time with its
additional overhead, as measured in most related work [10],
[4], the one-way transmission time of the messages is
measured. Furthermore, the network is then put under load
using iperf² to generate RX- and TX-traffic each.
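The matching step can be sketched in a few lines. The example assumes that the captures of both hosts have already been exported into dictionaries mapping a packet identifier to a timestamp in µs. The function name, the identifiers and all timestamps are made up for illustration and are not measurement data from this work.

```python
from statistics import mean


def one_way_delays(sent: dict, received: dict) -> list:
    """Match packets by identifier and return receive-minus-send times.
    Both dicts map a packet identifier to a capture timestamp in µs,
    taken on clocks that are assumed to be synchronized."""
    return [received[pid] - sent[pid] for pid in sent if pid in received]


# Made-up capture timestamps in µs, for illustration only.
sent_ipc1 = {"a": 1000.0, "b": 2000.0, "c": 3000.0}
recv_ipc2 = {"a": 1100.0, "b": 2125.0, "d": 4000.0}

delays = one_way_delays(sent_ipc1, recv_ipc2)
print(min(delays), mean(delays), max(delays))
```

Packets that appear in only one capture are skipped. Any residual clock offset between the hosts adds directly to the computed delay, which is why the approach depends on the quality of the time synchronization.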
By evaluating the transmission time, further research can
focus on the overlying applications and middleware for
real-time purposes. Together with the previous evaluation of
the real-time capabilities of the system, this research
enables modular approaches for real-time critical
applications using Linux.
IV. EVALUATION
Following this research, anyone should be able to set up and
validate a real-time capable system. This means that the
system is robust against perturbations and that there are
empirically confirmed upper bounds on the system's latencies.
Furthermore, the system should behave deterministically,
taking into account the already known limitations of the
Linux network stack.
The results of this research lay the foundations for upcom-
ing research of the ROS2 communication process. Therefore
the results are validated using different perspectives. This
work will introduce optimization techniques for the complete
setup. They are generally applicable for applications on
real-time Linux systems and not limited to ROS2. Further
optimizations regarding in-depth ROS2 or specific DDS
middleware implementations are only addressed briefly.
The final evaluation shows a validated stress-robust dis-
tributed network with no additional overhead in the com-
munications. Moreover, even though this research does not
include mathematical proofs, the setup withstands long-term
tests under heavy computational load.
The first view into the real-time capabilities is the OSADL
benchmark test using cyclictest [11]. The hardware and kernel
version have been chosen according to this test; thus, this
setup is expected to be real-time capable. Figure 2
visualizes the plots of different settings. Figure 2a shows
the initial setting of a clean install with only
hyperthreading and virtualization technologies disabled. The
maximum latency is rather high at 425µs. Changing the mode of
the CPU frequency scaling from powersave to performance
drastically reduces the latencies to 38µs. This shows that
the main focus of consumer-based technologies is power
saving. Further improvement is gained by disabling all kernel
debug options which are enabled by default. Response times
shrink to a maximum of 21µs. All tests are based on 10^8
cycles in cyclictest. This last result is among the best of
the real-time benchmark baselines given by OSADL. Based on
these observations, the system appears to be set up correctly.
Fig. 2. Evaluation of system response time using OSADL's
cyclictest benchmark. (a) Response time of the out-of-the-box
system. (b) Response time after changing to performance mode.
(c) Response time in performance mode with disabled debug
options. Different setups improve the response time
drastically. Panel (c) shows response times in line with the
benchmark tests, therefore the system appears to be set up
accordingly. Command used for evaluation: cyclictest
-lNUMCYCLES -m -Sp95 -i200 -h400, with NUMCYCLES being 10^8.
² Command used: iperf -u -c -b $bandwidth
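To relate the histogram output of such a run to the maximum latencies quoted above, a small parser can help. The sketch below assumes that each data line has the form "latency in µs, then one count per core" and that lines starting with "#" are comments. The sample text is made up and is not output from the systems in this work. The last lines estimate the run time implied by 10^8 cycles at a 200µs interval.

```python
def max_latency_us(histogram_text: str) -> int:
    """Return the largest latency bucket (µs) with a non-zero count.
    Assumes each data line is '<latency_us> <count_cpu0> <count_cpu1> ...'
    and that lines starting with '#' are comments."""
    worst = 0
    for line in histogram_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        latency = int(fields[0])
        counts = [int(c) for c in fields[1:]]
        if any(counts):
            worst = max(worst, latency)
    return worst


# Made-up two-core excerpt, for illustration only.
sample = """# Histogram
000003 000120 000098
000004 000045 000051
000021 000000 000001
000022 000000 000000
"""
print(max_latency_us(sample))

# Approximate run time implied by the parameters in the text.
hours = 1e8 * 200e-6 / 3600
print(round(hours, 2))
```

With these parameters a single benchmark run takes roughly five and a half hours, which puts the maximum values of the different configurations into perspective as long-term results.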
Additional experiments to examine the response times
and latencies of ROS2 applications are performed using
the pendulum real-time test provided by ROS2. It becomes
apparent that further investigations are needed to validate a
system. Figure 3a shows the result of the test before disabling
the processors’ C-states. Looking at the histogram two peaks
show up, at 4µs and at 80µs. Further in-depth insights
can be gathered using this demo combined with ftrace [21],
which revealed occurrences of idle states. These idle states,
or C-states, were already disabled in the BIOS; however, the
Linux kernel overwrites some BIOS settings with its own
configuration. They can be disabled with little effort
through the corresponding grub configuration. The latencies
after correctly disabling the C-states are visualized in
Figure 3b. Except for an initial delay, the results show
stable latencies below 5µs. These values hold true even after
inducing stress to the system (see Fig. 3c).
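Whether such boot parameters actually reached the kernel can be checked on the kernel command line. The text does not name the parameters that were used, so the two shown here (intel_idle.max_cstate and processor.max_cstate) are an assumption based on commonly used Linux kernel parameters, and the example command line is hypothetical.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.
    Flags without '=' are mapped to an empty string."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


# Hypothetical command line; on a running system, read /proc/cmdline instead.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro intel_idle.max_cstate=0 processor.max_cstate=1"
params = parse_cmdline(example)
print(params.get("intel_idle.max_cstate"), params.get("processor.max_cstate"))
```

A check of this kind only confirms that the parameters were passed at boot. Whether idle states still occur at run time is better verified with a tracing tool such as ftrace, as done in the text.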
The previously shown experiments validated the response
time of the preemptive Linux operating system using two
procedurally different benchmarking systems. To validate the
wake-up times on the application layer, non-invasive external
testing with the oscilloscope has been conducted. Triggering
the GPIOs from within the application's source code adds an
overhead of a few microseconds, which is accepted in favor of
an external measurement that otherwise does not interfere
with the system's performance. The application consists of a
simple ROS2 subscriber and its publisher, which is triggered
to publish a message each millisecond. The wake-up times of
the different scenarios can be seen in Table II and in the
corresponding Figure 4. Each scenario contains over 2×10^6
samples.
TABLE II
MEASUREMENTS OF RESPONSE TIME
Scenario          Min        Max         Mean       Std Dev
Idle              14.31 µs   43.45 µs    18.18 µs   0.294 µs
Stress            9.89 µs    45.60 µs    17.43 µs   1.49 µs
Isolated          0.0 µs     1000.0 µs   17.20 µs   1.72 µs
Shielded 2 CPUs   9.71 µs    69.56 µs    17.34 µs   1.49 µs
Shielded 4 CPUs   11.43 µs   41.47 µs    17.45 µs   1.47 µs
The results show that, on average, the systems perform
similarly under stress and in idle state. Furthermore, the
stress increases the maximum response time of the application
by only about 2µs (from 43.45µs to 45.60µs). However, core
isolation and shielding are still realized to reduce
perturbations. It has to be noted that ROS2 and
OpenSplice spawn 58 processes, with most of the computation
time consumed by the ROS2 nodes and the OpenSplice
transmitting and receiving processes. Isolation is done by
adding the isolcpus parameter to the grub configuration
and then using taskset to move the processes to the
isolated cores. For this study two cores have been isolated
for the processes of the test application. The results show
that core isolation leads to inaccurate timings, since the
scheduler and load balancer are no longer active on those
cores. As a result the system wakes up at desynchronized
times, and an underflow occurs. Due to the limitations of the
oscilloscope's measuring method, min and max cannot be
measured for isolated cores (Table II). The underflow can be
seen in Figure 4. The true maximum lies around 160µs, which
is still a loss in accuracy compared to the other scenarios.
Shielding with cset shield slightly improves over the idle
state. Tests with two and with four shielded cores were
conducted, showing that, due to the many ROS2 and DDS
processes, it is advisable to have more cores inside the
shield. However, the standard deviation increases for all
stressed systems compared to the idle state.
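One plausible reading of the 0.0µs and 1000.0µs entries for the isolated scenario is a wrap-around of the measurement window: if a pulse is measured relative to the most recent reference edge, a pulse that arrives slightly before its intended edge shows up near the end of the previous period. The sketch below illustrates this model only. It is an assumption and not a description of the oscilloscope's actual measuring method.

```python
PERIOD_US = 1000.0  # 1 ms wake-up period (from the text)


def measured_offset_us(pulse_us: float, reference_us: float) -> float:
    """Delay of a pulse relative to the most recent reference edge,
    folded into one period, as a periodic trigger would observe it."""
    return (pulse_us - reference_us) % PERIOD_US


late = measured_offset_us(17.0, 0.0)   # pulse shortly after the reference edge
early = measured_offset_us(-5.0, 0.0)  # pulse shortly before the reference edge
print(late, early)
```

Under this model an early wake-up of only a few microseconds is indistinguishable from a delay of almost a full period, which would explain why min and max lose their meaning for the isolated cores.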
Finally, the transmission times between the two network
interfaces of IPC1 and IPC2 have been evaluated. This
excludes all delays that are introduced above the physical
layer (e.g. by the application) and focuses on the network
transmissions with induced traffic on top. Traffic was
created using iperf. For the results of the evaluation in
Table III only TX-traffic is considered.
TABLE III
NETWORK DELAYS WITH INDUCED TRAFFIC
Traffic in Mbit/s   Min       Mean       Max
0                   91.1 µs   109.7 µs   299.8 µs
10                  89.1 µs   124.7 µs   435.2 µs
100                 87.5 µs   255.1 µs   622.6 µs
500                 91.5 µs   225.8 µs   553.1 µs
Fig. 3. Further evaluation of the real-time system using the
ROS2 pendulum demo for intra-process communication.
(a) Latencies of the intra-process test after system setup
solely based on cyclictest performance. (b) Response time
after disabling C-states for the CPU. (c) Response time of a
stressed system after disabling C-states. The figure shows
that, even though a system appears to be set up correctly,
there still might be deficits. The right graphs show the
results after final optimization with the processors'
C-states completely disabled.
Fig. 4. Different response times as measured by the
oscilloscope. This is a selection of the oscilloscope
recordings. Only the first half of the test interval (1ms) is
shown. At the bottom is the reference signal from the TICRO.
The other lines are the summed results of 2×10^6 measurements
for each trial. Especially striking is the wide spread of the
isolated trial with the underflow due to desynchronization,
which can be seen in Table II.
The results show that increased network traffic further
increases the latencies of the network. The latencies grow
with higher network load until the threshold of 300 Mbit/s
is exceeded. Then, instead of using interrupt requests, a
continuous polling mode is automatically activated by the
system to reduce load. This explains the minor improvement
between 100 Mbit/s and 500 Mbit/s. When the same test is
issued using RX-traffic on IPC1, PTP reaches its limitations
in a busy network. Since PTP, by design, assumes stable
round-trip times, changes in latency lead to a loss in
accuracy. This leads to the conclusion that for time
synchronization an additional network dedicated only to PTP
is preferred over a shared network.
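The sensitivity of PTP to network load can be illustrated with the standard delay request-response calculation, which estimates the clock offset under the assumption that the path delay is the same in both directions. The sketch below simulates the four timestamps for a known true offset. The function names and all numbers are illustrative and are not measurements from this work.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Delay request-response estimate.
    t1: master sends Sync, t2: slave receives it (slave clock),
    t3: slave sends Delay_Req (slave clock), t4: master receives it."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, delay


def simulate(true_offset, d_ms, d_sm, t1=0.0, turnaround=50.0):
    """Generate the four timestamps for given one-way delays (all in µs)
    and return the resulting offset and delay estimate."""
    t2 = t1 + d_ms + true_offset      # arrival, read on the slave clock
    t3 = t2 + turnaround              # slave answers a little later
    t4 = t3 - true_offset + d_sm      # arrival, read on the master clock
    return ptp_offset_and_delay(t1, t2, t3, t4)


symmetric = simulate(true_offset=10.0, d_ms=100.0, d_sm=100.0)
loaded = simulate(true_offset=10.0, d_ms=300.0, d_sm=100.0)
print(symmetric, loaded)
```

With symmetric delays the true offset of 10µs is recovered. When only one direction is delayed by load, half of the asymmetry appears as an offset error, which is consistent with the recommendation of a network dedicated to PTP.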
This research shows that a stable real-time capable system
can be configured using consumer-based hardware. The
results were confirmed in repeated long-term runs, and hold
true over a large time frame. Furthermore, it is shown that
the system interrupts and tasks on the application layer are
handled with low latencies and therefore react in a sufficient
time frame.
The results have been evaluated using an external, non-invasive
measurement method and show maximum reaction times of
45 µs on a stressed system. Even though the
control of digital outputs introduces a slight overhead, it can
be used to measure the system's latencies against a
reference signal. In terms of order of magnitude, the results
are comparable to the benchmark tests of OSADL, but were
obtained on a high-level application. Moreover, the experiments show
that, with proper shielding methods, the real-time critical
components still work correctly in an otherwise stressed
system. Shielding appears to be less important for
scheduled wake-ups; it is, however, presumed to have a
greater impact once the whole stack is examined.
As expected, the main limitation is the Linux
network stack, which introduces high latencies and jitter
in busy networks. Additionally, PTP synchronization is
easily disturbed by concurrent RX network traffic.
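The effect of induced traffic can be condensed into two figures per row of Table III: the spread between maximum and minimum delay, and the mean delay relative to the idle network. The sketch below only re-computes values already listed in Table III; the dictionary layout and the function name are assumptions made for illustration.

```python
# Table III values in microseconds, keyed by induced traffic in Mbit/s:
# (min, mean, max)
TABLE_III = {
    0: (91.1, 109.7, 299.8),
    10: (89.1, 124.7, 435.2),
    100: (87.5, 255.1, 622.6),
    500: (91.5, 225.8, 553.1),
}


def summarize_delays(table, baseline=0):
    """Return the max-min spread and the mean relative to the baseline row."""
    baseline_mean = table[baseline][1]
    summary = {}
    for traffic, (low, mean, high) in sorted(table.items()):
        summary[traffic] = {
            "spread_us": round(high - low, 1),
            "mean_vs_idle": round(mean / baseline_mean, 2),
        }
    return summary


if __name__ == "__main__":
    for traffic, row in summarize_delays(TABLE_III).items():
        print(f"{traffic:>4} Mbit/s  spread {row['spread_us']:6.1f} µs  "
              f"mean x{row['mean_vs_idle']:.2f}")
```

At 100 Mbit/s the mean delay is roughly 2.3 times the idle value and the spread exceeds 0.5 ms, which is half of the 1 ms interval mentioned in the oscilloscope caption.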
In conclusion, a real-time capable system can be set up with
a correct system configuration, and disturbances can be
reduced further with the correct shielding. Somewhat surprisingly,
isolating cores can have negative effects because the load
balancer and scheduler are no longer active on those cores.
Therefore, using cset shield is advised instead.
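Whichever shielding method is chosen, the real-time task itself still has to be placed on the reserved core. The sketch below is a simplified stand-in and not cset shield itself: it uses the per-process affinity and scheduling calls that Python's os module exposes on Linux, which restrict only the calling process. The function names and the priority value 80 are assumptions for illustration and are not taken from the setup described here.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU (Linux only).

    Returns the resulting affinity set so the caller can verify it.
    """
    allowed = os.sched_getaffinity(0)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def try_fifo_priority(priority=80):
    """Request the SCHED_FIFO real-time policy for the calling process.

    Returns False instead of raising when the process lacks the privilege.
    """
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False


if __name__ == "__main__":
    cpu = max(os.sched_getaffinity(0))  # hypothetical choice of shielded core
    print("affinity:", pin_to_cpu(cpu))
    print("SCHED_FIFO granted:", try_fifo_priority())
```

Unlike a shield, this does not move other tasks away from the chosen core; it only shows how a single process is bound to it.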
V. RESULTS
This work evaluates the setup of a distributed time-synchronized
system. Although other work has already shown
that a real-time capable system can be set up, this
work further evaluates the response time of such a system
using external measurements, which lends the measurements
higher confidence. This enables future
research to set up time-critical applications on consumer-based
hardware. In the context of this work, it allows an
in-depth evaluation of ROS2 and its underlying structures,
with the final goal of developing a modular and
distributed real-time robotic controller.
In contrast to the work of Gutiérrez et al. [10], an
evaluation of the real-time capabilities of the system was
performed first. Their focus was more on the evaluation of
the network stack, which is a secondary aspect of
this research. In their subsequent work [4], the authors further
evaluated round-trip times of the ROS2 stack using different
DDS implementations, a step yet to be conducted for this
work. Casini et al. [3] present a processing chain analysis
method, taking a theoretical approach to examining the real-time
capabilities of ROS2 applications.
Core contributions of this work are:
Real-time capabilities of consumer-based hardware
This study uses off-the-shelf hardware to set up a real-
time capable system. A very precise distributed time-
synchronization is realized using PTP.
Evaluation of real-time systems
This work has introduced a multi-perspective evaluation
of the real-time capabilities of a system. Even though
the cyclictest already showed promising results, it is
worth performing additional tests to also detect failures,
which might occur only under certain circumstances.
Tooling such as ftrace in combination with the required
applications helps in understanding what is going on in
which layer of the system.
External validation
Using the serial interface of the IPCs as GPIOs, the
time-synchronized setup can be validated. Furthermore,
the application response time can be evaluated. Although
this introduces a minor overhead, it allows precise
measurements with regard to a reference time.
Evaluation of network latencies
An evaluation of the network latencies that can be
expected under higher network load has been conducted.
The main insight is that PTP should be used on a
dedicated or low-traffic network to ensure the best possible
synchronization, since PTP becomes unstable with
variable network latencies caused by network traffic.
VI. CONCLUSIONS AND FUTURE WORKS
The presented approach evaluates the real-time capabilities
of a distributed time-synchronized system, as a potential
basis for robotic control applications with ROS2 on Linux.
A multi-perspective analysis was realized, not only relying
on single benchmark tests. Instead, various tests of the
use-case scenario were performed with different evaluation
methods. In addition, an external measurement approach for
non-invasive validation was applied using a high-precision
reference signal.
As expected, the Linux network stack causes non-deterministic
latencies and jitter. Therefore, it is advisable not
to overload the local network and to use a dedicated network
for time synchronization with PTP when possible.
This research demonstrates the configuration and versatile
validation of real-time capabilities with consumer-based
hardware and open-source software. The external measurements
make it possible to validate the robustness of the system even on
extremely stressed systems and networks. For this, the serial
ports of the hosts are utilized as digital outputs to examine
the response time and the synchronicity.
An evaluation of the real-time capabilities of the
operating system was conducted using cyclictest as a
baseline. From there, the system was further evaluated
using network and ROS2 benchmarks as well as ftrace
to analyze the current configuration in depth. Finally, the
response time was evaluated using an external measurement,
including high-workload situations. Furthermore, the limitations
of isolating cores versus the benefit of shielding cores
to cope with stress were inspected. This research thereby
allows the recreation of a stable, real-time ready Linux system
and shows where pitfalls reside and how to overcome
them.
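The principle behind a cyclictest-style baseline is to sleep until a periodic deadline and record how late each wake-up occurs. The sketch below illustrates only this principle in Python; it is not cyclictest and is not suitable for real-time measurements, since the interpreter itself adds latency. The function names, the 1 ms interval, and the loop count are assumptions for illustration.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Sleep until periodic deadlines and record each wake-up delay in µs."""
    interval_ns = interval_us * 1000
    latencies_us = []
    next_deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = next_deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        woke_at = time.monotonic_ns()
        # Latency is the distance between intended and actual wake-up time.
        latencies_us.append((woke_at - next_deadline) / 1000.0)
        # Advance by a fixed period so that delays do not accumulate.
        next_deadline += interval_ns
    return latencies_us


def latency_stats(latencies_us):
    """Return (min, mean, max), the same columns used in Table III."""
    mean = sum(latencies_us) / len(latencies_us)
    return min(latencies_us), mean, max(latencies_us)


if __name__ == "__main__":
    low, mean, high = latency_stats(measure_wakeup_latency())
    print(f"min {low:.1f} µs  mean {mean:.1f} µs  max {high:.1f} µs")
```

Running the sketch once on an idle system and once under load gives a rough impression of how strongly the maximum latency depends on the system state, which is the quantity the external measurement validates independently.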
After establishing a robust fundamental system and evalu-
ating the network latencies, an analysis of ROS2 and differ-
ent DDS implementations as middleware can be completed.
With all parts thoroughly tested and optimized, an evalua-
tion of ROS2 as real-time capable software for distributed
modular robotic control is intended.
REFERENCES
[1] “ROS2.” [Online]. Available: https://index.ros.org/doc/ros2/
[2] “DDS Interoperability Wire Protocol.” [Online]. Available:
https://www.omg.org/spec/DDSI-RTPS/
[3] D. Casini, T. Blaß, I. Lütkebohle, and B. B. Brandenburg, “Response-
time analysis of ros 2 processing chains under reservation-based
scheduling,” in 31st Euromicro Conference on Real-Time Systems
(ECRTS 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik,
2019.
[4] C. S. V. Gutiérrez, L. U. S. Juan, I. Z. Ugarte, and V. M. Vilches,
“Towards a distributed and real-time framework for robots: Evaluation
of ros 2.0 communications for real-time robotic applications,” arXiv
preprint arXiv:1809.02595, 2018.
[5] “Linux Real-Time [Wiki].” [Online]. Available:
https://wiki.linuxfoundation.org/realtime/start
[6] “FreeRTOS - Real-time operating system for microcontrollers.”
[Online]. Available: https://www.freertos.org
[7] “Open Source Automation Development Lab.” [Online]. Available:
https://www.osadl.org
[8] F. Reghenzani, G. Massari, and W. Fornaciari, “The real-time linux
kernel: A survey on preempt rt,” ACM Computing Surveys (CSUR),
vol. 52, no. 1, pp. 1–36, 2019.
[9] N. T. Dantam, D. M. Lofaro, A. Hereid, P. Y. Oh, A. D. Ames,
and M. Stilman, “The ach library: A new framework for real-time
communication,” IEEE Robotics Automation Magazine, vol. 22, no. 1,
pp. 76–85, 2015.
[10] C. S. V. Gutiérrez, L. U. S. Juan, I. Z. Ugarte, and V. M. Vilches,
“Real-time linux communications: an evaluation of the linux com-
munication stack for real-time robotic applications,” arXiv preprint
arXiv:1808.10821, 2018.
[11] “Manpages for cyclictest on Debian Buster.” [Online]. Available:
https://manpages.debian.org/buster/rt-tests/cyclictest.8.en.html
[12] Y. Maruyama, S. Kato, and T. Azumi, “Exploring the performance
of ROS2,” in Proceedings of the 13th International Conference
on Embedded Software - EMSOFT ’16. New York, New
York, USA: ACM Press, 2016, pp. 1–10. [Online]. Available:
http://dl.acm.org/citation.cfm?doid=2968478.2968502
[13] “Connext DDS Professional by RTI.” [Online]. Available:
https://www.rti.com/products/connext-dds-professional
[14] “Adlink Vortex OpenSplice.” [Online]. Available:
https://www.adlinktech.com/en/vortex-opensplice-data-distribution-service.aspx
[15] “eProsima FastRTPS.” [Online]. Available:
https://www.eprosima.com/index.php/products-all/eprosima-fast-rtps
[16] F. Cerqueira and B. Brandenburg, “A comparison of scheduling latency
in linux, preempt-rt, and litmus rt,” in 9th Annual Workshop on
Operating Systems Platforms for Embedded Real-Time Applications.
SYSGO AG, 2013, pp. 19–29.
[17] “Micro ROS.” [Online]. Available: https://micro-ros.github.io/
[18] “Apex.OS.” [Online]. Available: https://www.apex.ai/apex-os
[19] H. Fayyad-Kazan, L. Perneel, and M. Timmerman, “Linux preempt-rt
vs commercial rtoss: How big is the performance gap?” GSTF Journal
on Computing, vol. 3, no. 1, 2013.
[20] J. H. Brown and B. Martin, “How fast is fast enough? choosing
between xenomai and linux for real-time applications,” in proc. of
the 12th Real-Time Linux Workshop (RTLWS’12), 2010, pp. 1–17.
[21] “ftrace - Function Tracer.” [Online]. Available:
https://www.kernel.org/doc/Documentation/trace/ftrace.txt