Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier XX.XXXX/ACCESS.2025.XXXXXXX
SmartData: Towards the Data-Driven Design of
Critical Systems
JOSÉ L. CONRADI HOFFMANN, (Member, IEEE), ANTÔNIO A. FRÖHLICH, (Member, IEEE)
Software/Hardware Integration Lab, Federal University of Santa Catarina, Florianópolis, Santa Catarina, 88040-900 Brazil
(e-mail: [hoffmann,guto]@lisha.ufsc.br)
Corresponding author: José L. Conradi Hoffmann (e-mail: hoffmann@lisha.ufsc.br).
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) project
INCANTO PrINT - Finance Code 001, and by FUNDEP Rota 2030/Linha VI project Auto5G - Finance Code 29271.02.01/2022.01-00.
ABSTRACT Machine Learning algorithms and safety models are enabling higher levels of autonomy in
modern Cyber-Physical Systems (CPS). Ensuring safe autonomous operation requires strict adherence to
timing and security constraints, best expressed in terms of the data consumed rather than tasks executed.
This paper introduces a Data-Centric design for Data-Driven Systems using SmartData, a data construct
enriched with metadata to encapsulate origin, semantics, and relationships. SmartData interact via Interest
relationships, inheriting requirements such as freshness, periodicity, and security. We extend SmartData
with six novel stereotypes: Sensor, Storage, Transformer, Secure, Persistent, and Actuator. To facilitate
system design, we propose a method to algorithmically build a SmartData Graph (SDG), a directed graph
representing the relationships between SmartData elements. The SDG construction algorithm dynamically
updates demands for timing, security, and persistence, ensuring data production satisfies all data requirements.
A Data-Driven design can therefore be built directly from the system’s data requirements at
early stages. Starting from how actuation is expected to occur, we derive the dataflows necessary to perform
this actuation. This approach allows system designers to estimate latency, bandwidth, and data generation
periodicity while identifying critical paths requiring reliable communication and processing technologies.
The SmartData API bridges design and implementation, enabling seamless integration. We demonstrate the
proposed method through a use case of an imitation-learning-based autonomous driving system implemented
on a Linux platform and integrated with the CARLA simulator.
INDEX TERMS Data-Driven, Critical Systems Design, Cyber-Physical Systems, Data Timing, SmartData
I. INTRODUCTION
ADVANCES in Machine Learning algorithms lead to
more accurate and capable models, enabling
higher levels of autonomy, as is the case in autonomous
vehicles and smart factories of the fourth industrial revolution.
These systems are themselves driven by Data [1]–[6], not
by tasks, and the Artificial Intelligence (AI) mechanisms that
compose these critical systems can impose large and
unpredictable delays in a worst-case scenario [7]. If the complex
data flows introduced by AI are to find their place within
critical systems, then we must learn to design and implement
these systems around the data they handle instead of the tasks
they run. Deriving the performance of these systems from the
period of critical tasks and deadlines was promoted through
paradigms such as Time-Triggered Architecture (TTA) [8].
However, the complexity associated with data relationships,
like the variable latency added by security algorithms, tran-
sient network loads, AI algorithms, and data dependencies,
now represents a significant conflict for the myriad of mod-
ern critical systems designed around AI concepts, such as
autonomous vehicles [9], [10].
A data-centric or data-driven system design must be able
to model Data in a broader sense, from physical data sources,
like sensors and data storage, to the data that drives actuators
or must persist for feedback loops, safety assurance,
forensic investigation, or to power machine learning and
analytics on cloud infrastructures. In this scenario, tasks become
ephemeral elements of data transformation that exist only
to perform fusions, aggregations, and all sorts of
transformations. Data is what remains. Data is what must be
secured, exchanged, stored, and processed. Actuators have
response time estimates. Data is strongly typed, defining a size
from which bandwidth and processing time can be inferred when
coupled with timing constraints from data demands. Physical
VOLUME 11, 2023 1
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2025.3548542
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Conradi Hoffmann and Fröhlich: Data-Driven Design of Critical Systems
resources have capacity estimates to handle data, networks
have bandwidth estimates, and stream processing units have
throughput estimates. All in all, system-wide latency can be
more readily derived from data that must be used while it
is fresh enough to still represent the physical phenomena
it originally captured than from tasks that must never miss
their deadlines while manipulating data. This timeliness of
data can be expressed through an Expiry, which represents the
last moment in time the data can be used without compromising
any of the temporal requirements [11].
In this paper, we propose a method for a data-driven
design of Cyber-Physical Systems based on SmartData, a
data construct with sufficient metadata to represent origin,
security, and periodicity. We believe modeling time require-
ments in terms of data instead of the tasks that manipu-
late them may be advantageous. A data-driven design can
more promptly accommodate all the aforementioned first-
order requirements, such as timing, security and AI-readiness.
The design methodology starts with the decomposition of
the problem domain into the data used by the system to
actuate, the data necessary to build the actuation plan, up to
the data generated to provide perception of the physical
world the system interacts with. Based on the origin of this
data, we map the respective stereotypes to model the system
into SmartData. Actuators are naturally associated with an
actuation rate or event that will trigger actuation. The de-
pendency relationships derived from this data dependency
chain, from actuation to sensors, are mapped into SmartData
Interests, defined according to data origin and destination.
The temporal constraints are expressed in terms of period
and freshness, while the security constraints are represented
with the addition of new stereotypes to the data. Finally, the
period of generation of each piece of data is defined as the period that
is sufficient to supply all Interests associated with it. Thus, we
can use these definitions to combine the multiple data flows
that compose the system and provide a graph with sufficient
information to guide designers’ decisions through design-time
analysis of lower bounds for communication bandwidth
and system-wide latency, which should be mapped to processing
units and communication technologies that satisfy the
constraints derived from the data properties of the system. The
main contributions of this paper are:
1) A Data-Driven design methodology for a Cyber-
Physical System based on SmartData, including the
definition of novel SmartData stereotypes focusing on
the specification of data relationships while encom-
passing timing, sampling mode configurations, critical-
ity, and security.
2) The SmartData Graph (SDG), an illustrative represen-
tation of the system that is built algorithmically as a
directed graph. The algorithm derives the periodicity
necessary for each piece of data to supply all of its consumers inside
the system, allowing for the derivation of lower
bounds for communication bandwidth and system-wide
latency, which should be mapped to processing units
and communication technologies that satisfy the
constraints derived from the data properties of the system.
3) A use case example of the SDG specification to build a
design-time model for autonomous driving based on an
Imitation Learning model that, through the SmartData
API, demonstrates a straightforward process of deriving
an implementation from an SDG.
The remainder of the paper is organized as follows: Section
II presents an overview of the SmartData concept. Section
III presents the related works. Section
IV presents the proposed design concepts for data relations
in Data-Driven Systems using SmartData and the definition
of sampling modes. Section V presents the SmartData Graph
specification following the design concepts proposed in the
previous section. Section V-C applies the aforementioned
concepts in order to model a real autonomous driving system.
Section VII presents a discussion regarding the proposed
design and closes with an overview of previous
implementations and experiences developed based on SmartData.
Finally, Section VIII
concludes the paper.
II. SMARTDATA OVERVIEW
SmartData is a data construct proposed by Fröhlich in [11]. It
provides an alternative to represent critical systems through
the data they rely on, fitting the data-driven concept. The
SmartData Interface proposed in [11] is presented in Fig-
ure 1. SmartData comprises information regarding data
type (UNIT), origin (location() and time()), timing
(period and expiry), data fusion (fuser), specific
modes (mode, presented in Section IV-A), and transparent
integration with real-world sensors and actuators
(Transducer) and the network (local and remote transducers).
Through the wait() method, SmartData lets consumers
synchronize with the periodic update of its value in
periodic configurations. In the case of
aperiodic modes (period = 0), SmartData are updated at
each access. Whenever data is updated, the expiry is updated
accordingly.
[Figure 1: UML class diagram of SmartData (defines UNIT), linked to a Transducer (local transducer, remote transducer). Operations shown:
+SmartData(region, expiry, period, fuser)
+SmartData(dev, expiry, period, mode)
+operator Value() : Value
+operator update() : Value
+location() : Coordinates
+time() : Time
+wait()]
FIGURE 1. The SmartData Interface [11].
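The timing semantics described above can be illustrated with a minimal Python sketch. It is not the SmartData API of [11]: the class name SmartDataSketch, the read_sensor callable standing in for a Transducer, the injectable clock, and the choice to model expiry as a duration relative to the sample timestamp are all assumptions made for illustration. Raising an error when an expired sample is read is likewise only one possible policy.

```python
import time


class SmartDataSketch:
    """Toy model of a SmartData bound to a local, passive transducer (hypothetical)."""

    def __init__(self, read_sensor, expiry, period=0.0, clock=time.monotonic):
        self._read = read_sensor   # assumed callable that returns a new sample
        self.expiry = expiry       # assumed: freshness window in seconds
        self.period = period       # 0 means aperiodic: refresh on every access
        self._clock = clock
        self._value = None
        self._timestamp = None

    def update(self):
        """Take a new sample and restart its freshness window."""
        self._value = self._read()
        self._timestamp = self._clock()
        return self._value

    def time(self):
        """Timestamp of the most recent sample."""
        return self._timestamp

    def expired(self):
        """True when no sample exists or the last one is past its expiry."""
        if self._timestamp is None:
            return True
        return self._clock() - self._timestamp > self.expiry

    def value(self):
        """Aperiodic data is refreshed at each access; periodic data is only read."""
        if self.period == 0:
            self.update()
        if self.expired():
            raise RuntimeError("sample is missing or past its expiry")
        return self._value
```

In a periodic configuration, update() would be driven by a timer or by the transducer itself rather than by the consumer, so value() only checks freshness.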
In this sense, SmartData gives system designers an abstrac-
tion layer for the data management itself, where the source
can be defined as local or remote, and the necessary means
for communication are handled by the data source abstraction
(e.g., via a network, a CAN Bus, or any other connection, as
long as supported by the underlying OS). The key concepts
from SmartData that are relevant to the data-driven modeling
approach presented here are:
1) Period: The period at which data must be sampled.
This attribute is derived directly from the Critical Sys-
tem requirements, for instance, from the rate at which
an image captured by a camera must be sampled and
analyzed by the system.
2) Expiry: The freshness of a piece of data is associated with
its accuracy in representing the current condition of
the system. Whenever data is no longer fresh, it
expires. The expiry is the attribute that represents the
last moment at which data can be considered fresh. The expiry
then drives the scheduling decisions in a data-driven
system, especially for critical scenarios, where missing
an expiry implies missing a decision round. Different
components can instantiate SmartData objects bound
to the same physical transducer with different expiry
times. It is, therefore, an Interest relationship attribute.
3) Strong Typing: The semantic aspects of a SmartData
are described using a strategy inspired by the Trans-
ducer Electronic Data Sheets in the IEEE 1451 stan-
dard [12]. Each piece of data is tagged with a 32-bit type
identifier designating either an SI Physical Quantity or
plain digital data. The corresponding SI UNIT identi-
fies physical quantities. This type identifier is called
UNIT and encompasses the semantics of the data, like
size, which can be used further in the modeling to
estimate bandwidth.
4) Sensing and Actuation: A Transducer abstracts physical
sensors and actuators through a uniform interface
that enables a subsequent encapsulation as a SmartData.
As depicted in Figure 1, in the general case, a
SmartData is associated with a Transducer. A Transducer
comprises the definition of the UNIT that the
SmartData will consider. From the sensor perspective,
Transducers can include information regarding sensing
error and sensing time and value from the most up-to-
date sample (accessed through operator Value()
and time()). Moreover, Transducers can be classified
according to their sampling modes, either passive or
active. An active transducer will automatically perform
a new sensor reading following its periodic sampling
rate or the occurrence of an event. On the other hand,
a passive transducer will require a secondary polling
mechanism to trigger a new sensor reading. From the
actuator perspective, the interactions with the trans-
ducer encompass the actuation itself, where the Smart-
Data provides the actuation’s input.
5) Space-Time Coordinate System: Each piece of
SmartData is associated with a point in a space-time
coordinate system designating when and where the data
was created. When specifying the relationship among
SmartData, a space-time region is defined as the tuple
$\{x, y, z, r, t_0, t_f\}$, where $x, y, z, r$ represent the space,
with $x, y, z$ being the ECEF coordinates of the center of a sphere of
interest with radius $r$, while $[t_0, t_f]$ represents the time
interval during which this relationship is intended to last.
6) Unambiguous Identification: SmartData are unam-
biguously identified by their type, coordinates and,
in case multiple SmartData of the same type coexist
in the same space-time coordinates, a disambiguation
Id designated dev. Nevertheless, the notion of space
to specify the origin of data is not enough when
modeling a mobile system. Sensors are set at specific
positions relative to the position of the mobile system.
To this end, whenever dealing with the mobile
version of SmartData, regions are given by the tuple
$\{x, y, z, r, t_0, t_f, sig\}$, where the sphere defined by
$x, y, z, r$ has $x, y, z$ taken as coordinates relative to the
position of the system represented by the signature $sig$.
A signature is a cryptographic Id defined as a hash of the
public certificate of the mobile object. $x, y, z$ are not
discarded, since the position of the sensor inside the
system might be relevant to the system, for instance,
in data fusion and safety verification. The information
tagged to data concerning its region is referred to as
origin. It is defined by the tuple $\{x, y, z, t_s, sig\}$, where $t_s$
is the timestamp of the data generation, and $sig$ is only
tagged in mobile SmartData.
7) Efficient Aggregation: Many sensors and data
transformations produce groups of data that may lose their
semantics when presented alone. For instance, the output
of object recognition and tracking modules in autonomous
vehicles usually presents the information regarding
a nearby vehicle's position, speed, acceleration,
heading, and extent. SmartData introduces the notion
of Multi-SmartData to address this issue. There are
three types of Multi-SmartData: Multi-Value SmartData,
Multi-Unit SmartData, and their combination.
A Multi-Value SmartData is data that share the same
origin (Space-Time for Static SmartData or Time-Signature
for Mobile SmartData) and UNIT. For instance,
the data generated by a three-axis accelerometer
share the same origin and UNIT (acceleration along the x, y,
and z axes). A Multi-Unit SmartData can
be used to comprise a collection of data that share
the same origin but not the same UNIT. An example is
the motion vector of a vehicle, represented by position
(x, y, z attributes of SmartData), speed, acceleration,
yaw rate, and heading. A special case of a Multi-Value
SmartData is when it is composed of Multi-Unit SmartData.
Examples include the representation of a time series
where each sample is separated by a time offset (e.g., a
list of future way-points of a vehicle motion plan, or a
historical time series of its previous positions), or a
collection of different instances that have been produced
by a transformation (e.g., a list of objects detected by an
object recognition and tracking algorithm).
8) Asynchronous Application Interaction: The current
implementation of the SmartData framework defines
an API as presented in Figure 1 [11]. The API follows
the Concurrent Observer design pattern [13] to
promote a transparent integration between local and
remote transducers. A class implementing an Observed
interface holds a list of Observers, which subscribe
to be notified of new samples of the Observed object.
The notification invokes the operator update()
of each subscribed Observer. A SmartData observing
a local Transducer subscribes as an Observer of this
Transducer. The Transducer notifies the SmartData
by invoking operator update() whenever it produces
a new sample. A passive Transducer requires a
periodic thread associated with the SmartData period
to trigger readings and invoke operator update().
Remote Transducers apply the Concurrent
Observer design pattern to the network interface
and invoke operator update() of the SmartData
whenever a sample of the SmartData is observed.
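The subscription and notification flow of item 8 can be sketched as follows. This is a simplified Python illustration of the Observer pattern with a lock around the subscriber list, not the Concurrent Observer implementation of [13] nor the actual SmartData code; the names Observed, LocalTransducer, ObservingSmartData, attach, notify, and produce are hypothetical.

```python
import threading


class Observed:
    """Holds the Observers that subscribed to be notified of new samples."""

    def __init__(self):
        self._observers = []
        self._lock = threading.Lock()

    def attach(self, observer):
        with self._lock:
            self._observers.append(observer)

    def notify(self, sample):
        with self._lock:
            observers = list(self._observers)  # copy so callbacks run outside the lock
        for observer in observers:
            observer.update(sample)


class LocalTransducer(Observed):
    """Hypothetical active transducer: each new reading is pushed to its observers."""

    def produce(self, sample):
        self.notify(sample)


class ObservingSmartData:
    """SmartData side of the pattern: update() stores the notified sample."""

    def __init__(self, transducer):
        self.value = None
        transducer.attach(self)

    def update(self, sample):
        self.value = sample
        return self.value
```

A passive transducer would fit the same structure if a periodic thread, tied to the SmartData period, called produce() with each polled reading.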
SmartData can also include concepts to handle QoS re-
quirements like strongly typed data, Geo-location, and cor-
rectness. For instance, it enables the critical system to check
for data correctness and sensor confidence assurance [14],
which could lead to the triggering of a safe model, such as
a parking or slow-down procedure in an
autonomous car. Moreover, one can also account for secure
communication protocols, like in [15], to ensure data
confidentiality and integrity or the integrity of the network
components themselves based on the data they produce [16].
III. RELATED WORKS
SmartData and Interfaces of the Time-Triggered Architec-
ture (TTA) [8] share many common aspects. Both constructs
mediate data access, decoupling locality while preserving
timeliness. TTA describes that time progresses in a dense
timeline, consisting of an infinite set of instants, where a
happening that occurs at a given instant is called an event.
Thus, an observation of the state of the world is an event. In
this sense, an observation contains a value related to an entity
captured at a specific instant, while the event information is
transmitted through event messages. Communication
takes place periodically using pre-specified time divisions known
to all cluster nodes beforehand. These instants are the dead-
lines for tasks within a host. TTA specifies two description
levels: the architecture and node design. In the architecture
design, the application is decomposed into clusters and nodes,
followed by the specification of timing and values of the com-
munication interfaces to define the time-division schedules
of the communication network. At the node design stage, the
implementation of the software must be validated regarding
the temporal constraints established during the architecture
design to satisfy the deadlines specified.
Also addressing a time-triggered design, Klobedanz et
al. [17] proposed the Timing Augmented Description Language
to model safety-critical systems constraints over AUTOSAR.
They explore the concept of the event chain. Each element
in this chain is triggered by an incoming event and produces an
outgoing event. Timing is modeled as a Worst-Case estimation
of the delay between input and output events. Moreover, each
element in the event chain is tagged with timing elements,
such as period, sampling period, writing period, and delay,
aiming at estimating both reaction and age, thus enabling an
evaluation of load distribution between ECUs of a vehicle.
Similarly, Kim et al. [18] propose an Algebraic approach to
model timing and resource constraints for automotive archi-
tectures based on AUTOSAR. They enhance the functional
model obtained through AUTOSAR with timing capabilities
to obtain a time-constrained model, which can be further en-
hanced to encompass resource constraints and enable a timing
analysis of the system’s execution. A formal specification language
for real-time systems is proposed to communicate data
values between processes considering priority, parallelism,
period, and deadline notions, where processes communicate
through interfaces and ports, which define the Worst-Case
Response Time (WCRT) for the respective communication.
Posadas et al. [19] use a notion of data flow to model data
dependencies using UML/MARTE-based design, modeling
the data path as sequence diagrams. In the proposed design,
the authors model the system constraints over the tasks that
communicate data through ports in a client/server approach.
The authors focus on integrating a simulator to evaluate the
task’s timeliness for the proposed design.
Becker et al. [20] address the problem of end-to-end tim-
ing analysis in autonomous embedded systems, proposing
a maximum data age estimation analysis for cause-effect
chains, where age is considered the end-to-end timing of a
chain. In their approach, tasks with data dependency inside
a task-set are modeled as a Directed Acyclic Graph (DAG)
following the dependency relation. They define a notion of
minimum and maximum read intervals and data intervals
based on Worst-Case Execution Time (WCET) and tasks’
periods. Their solution considers the intersection of reading
and data intervals of dependent tasks to establish a relation
between jobs. The solution supports only data communication
taken over shared registers. Thus, data is assumed to be
available instantaneously after production. Finally, multiple
levels of knowledge are considered regarding the maximum
data age estimation, covering known schedules, worst-case
response time, task offsets, and strictness of read-execute-
write ordering.
Nevertheless, each of the above approaches models the
system over the tasks it must run and not the data it handles.
For the TTA design, the time division established for commu-
nications imposes a secondary deadline for processes inside
clusters, where communication occurs within the pre-defined
schedules, adding a new constraint to the system. The two
approaches based on the AUTOSAR model abstract the data
communication as a WCRT estimation even when no network
resource or communication protocol is defined. The end-to-end
timing analysis proposed by Becker et al. [20]
requires WCET estimates even without any knowledge of the
platform of choice. At the same time, data communication
is limited to shared registers. Moreover, the hidden latency
added by security algorithms, transient network loads, data
dependencies, and AI algorithms represents a significant con-
flict when deriving the performance of such systems.
Considering data expiry as a design constraint enables a
straightforward concept for timing that can be more easily de-
rived from the notion that only valid data must be considered
to achieve correctness. However, this notion is not promptly
available when modeling tasks with deadlines and periods
only. With a Data-Driven design, we envision providing a way
for system designers to represent the system complexity
through the specification of the data they handle and the flow of
this data in the system. This enables a more robust development
of the system requirements around the data needs and their
respective timing, and it promotes earlier design-time
estimates of bandwidth, network capacity, and resource
allocation while addressing criticality and security
alongside timing. Moreover, our design supports best-effort
configurations, which is helpful for approaches that benefit
from additional data.
Paz et al. [21] propose a systematic approach to the design
of critical systems. The authors focus on componentization
of the system into a design that combines multiple modeling
methods, such as UML, AADL, Simulink, and Stateflow.
The methodology of their approach relies on three phases:
one for eliciting the requirements and defining the modeling
languages that will be used, one for the description of the
meta-level functionalities, and an operation phase where the
selected tools are applied to the design. However, they do not
address the timing and the communication of the system. In-
stead, they focus on meta-models of the system functionalities
and the consistent design over multiple modeling languages.
Our approach differs as we focus on the data instead of
the components that will use them. Moreover, by modeling
data, our design encompasses the system complexity over the
definition of data dependencies and the constraints associated
with the data and its communication, such as timing, security,
location, and size.
Sari et al. [22] propose modeling safety-critical functions
using Electronics Architecture and Software Technology -
Architecture Description Language (EAST-ADL) and AU-
TOSAR as a solution to address safety and dependability re-
quirements in the system design. EAST-ADL is implemented
as a UML2 profile with extensions to safety and dependabil-
ity, enabling safety requirements. Nevertheless, the authors
do not address timeliness and data validity. Instead, they focus
on sensor faults and other related system hazards. In our
design, the impact of failures and errors on sensor readings is
abstracted into the criticality attribute of SmartData. Thus, we
can also provide a design-time representation of the complex-
ity associated with critical data flows, helping the designers to
decide on fault tolerance and allocation. Considering EAST-
ADL and AUTOSAR, Krawczyk et al. [23] propose modeling
complex Events for Automotive Embedded Systems over
activation patterns. Nevertheless, the activation patterns are
themselves defined as a combination of data and timing,
considering periodic or aperiodic behaviors. This concept can
also be represented as a data-driven process in our design.
Regarding resource allocation considering data constraints,
in [24], the authors propose ResCue, a Data-Driven resource
management framework to cope with the data-intensive na-
ture of Autonomous Embedded Systems. The framework in-
cludes a data scheduler to load and unload program code and
a memory reservation scheme aiming for minimal memory
reservation requirements while providing spatial data avail-
ability. The ResCue framework requires Worst-Case Execu-
tion time estimation for the task-sets and the amount of mem-
ory a task-set requires to perform its analysis. The scheduler
extracts the Least Chance Release Time of a task-set along with
the supply rate for each output period to assess the start time
at which data must be transferred from the supply to the
memory for the task-set to meet its output deadline. Such
concepts necessary to build the requirements for the ResCue
framework, for instance, can be easily derived from the Data-
Driven specification proposed in this paper.
Focusing on timing and data dependency over data dissem-
ination, in [25], the authors propose an architecture based on
a database that periodically samples the data items and stores
the most up-to-date value to be disseminated in the system.
In their design, data items are represented by the tuple (value,
expiration, update time), and data requests are represented
by a list of the tuple (requested data, send time, deadline).
Request feasibility is given by the expiration of the requested
data and the transmission time associated with the request.
Thus, the request is feasible if all requested data arrives valid
at the end of transmission. Similarly, by modeling the system
centered on the data it produces and consumes, our design
can yield the necessary timing and dependency information
at design-time for estimating the system capacity.
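The feasibility rule summarized above for [25] can be illustrated with a short Python sketch. It only follows the description given here, not the original implementation: the field names mirror the two tuples, while is_feasible, the transmission_time argument, the use of absolute timestamps, and the additional check against the request deadline are assumptions.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    value: float
    expiration: float    # assumed: absolute time after which the value is invalid
    update_time: float   # assumed: absolute time of the last update


@dataclass
class Request:
    requested: list      # names of the requested data items
    send_time: float     # assumed: absolute time at which transmission starts
    deadline: float      # assumed: absolute time by which the data must arrive


def is_feasible(request, items, transmission_time):
    """A request is feasible if every requested item is still valid on arrival."""
    arrival = request.send_time + transmission_time
    if arrival > request.deadline:   # assumption: late arrivals are infeasible
        return False
    return all(items[name].expiration >= arrival for name in request.requested)
```

For example, a request sent at 0.3 s with a 0.4 s transmission time is infeasible if any requested item expires at 0.6 s, even when the request deadline itself is met.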
Also concerned with the temporal validity of data, Goud
et al. [26] present an approach for Adaptive Cruise Control
where the distance to the vehicles in front and the speed vari-
ation change the data expiration constraint. This composition
helps reduce oversampling and unnecessary computations.
Our design also supports such configuration by exploring
dynamic expiration.
IV. DESIGNING DATA-DRIVEN SYSTEMS WITH
SMARTDATA
Object-oriented programming (OOP) employs the principles
of encapsulation, inheritance, and polymorphism to structure
and manage the domain decomposition process [27]. Within
this paradigm, each sub-domain is represented as an object
encapsulating data and methods specific to its respective
portion of the domain. These objects interact through well-
defined interfaces, ensuring the modularity, scalability, and
maintainability of the computational code. Object-oriented
domain decomposition leverages OOP’s ability to model
complex systems by assigning responsibilities and behaviors
to distinct domain regions, thereby enhancing the perfor-
mance of parallel computations and the organization of the
underlying software architecture.
A key step in domain decomposition is understanding the
actors within the system, identifying their respective domains,
and defining the interfaces through which they interact. These
interfaces encompass not only data formats but also tem-
poral constraints, such as those from real-time subsystems.
For instance, sensors produce data at a specified frequency;
however, the communication of such data to another domain
must adhere to the temporal constraints expressed by these
interfaces in order to optimize bandwidth. By combining data
structure specifications with periodicity requirements, it is
possible to derive, at design time, the bandwidth necessary
to implement these interfaces in a real system.
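As a worked illustration of this derivation, the Python sketch below computes a lower bound on the bandwidth of an interface from the payload size and period of each data flow crossing it. The function name required_bandwidth_bps and the sample figures (a 640x480 RGB frame every 50 ms and a 4-byte speed sample every 10 ms) are illustrative assumptions, not values from the paper, and protocol overhead is ignored.

```python
def required_bandwidth_bps(flows):
    """Lower bound, in bits per second, for a set of periodic data flows.

    flows: iterable of (payload_bytes, period_ms) pairs, one per data flow.
    Each flow contributes payload_bytes * 8 bits once every period_ms.
    """
    return sum(payload_bytes * 8 * 1000 / period_ms
               for payload_bytes, period_ms in flows)


# Hypothetical interface carrying a camera frame and a speed sample.
camera = (640 * 480 * 3, 50)   # bytes per frame, period in milliseconds
speed = (4, 10)                # bytes per sample, period in milliseconds
total = required_bandwidth_bps([camera, speed])
print(f"{total / 1e6:.2f} Mbit/s")
```

Because the camera flow dominates the sum, relaxing its period is the change that would most reduce the bandwidth this interface requires.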
Designing systems with data as the central focus entails
decomposing the system based on the data it produces and
consumes. The flow of data, from sensors to intermediate
transformations and finally to actuators or monitoring sys-
tems, encompasses notions of security, communication, trans-
formation, and timing. Defining these data relationships is the
objective of the approach proposed here using SmartData.
SmartData, inspired by the Application-Driven Embedded
System Design (ADESD) method [28], exploits polymor-
phism and inheritance to enable decoupling the definition
of the data structure from the physical world interaction
through the Transducer interface. Furthermore, it allows for
the transparent integration of lower level communication do-
main with the application domain through the definition of
two interfaces: Interested, a proxy for a remote data source;
and Responsive, supporting local data sources.
Each object in every sub-domain implements the Smart-
Data interface, as depicted in Figure 1. This interface specifies
attributes such as timing, data structure, location, and aggre-
gation, along with the associated transducer. The transducer
encompasses methods for data generation, actuation, and
transformation. The SmartData interface itself transparently
manages data communication through the aforementioned
Interested and Responsive interfaces.
To address the distributed nature of modern Cyber-Physical
Systems (CPS), such as autonomous vehicles, the domain
decomposition must also account for security and monitor-
ing attributes associated with the interfaces connecting sub-
domains. For example, an autonomous vehicle can be di-
vided into six primary domains: Sensing, Perception, V2X,
Planning, Control, and Actuation. Sensor data, combined
with V2X messages, feeds into the Perception domain to
construct a representation of the environment and the ve-
hicle’s state. The Planning module uses this information to
generate a route, plan a path, and devise motion strategies
to reach the destination while avoiding collisions. This plan
is then utilized by high-level controllers, such as Propor-
tional–Integral–Derivative (PID) controllers, to adjust longi-
tudinal and lateral controls. These controls are forwarded to
actuators for low-level execution.
Each of these objects interacts with its respective transducers
via specific communication channels such as CAN buses,
USB, or Ethernet interfaces, each adhering to a specific pro-
tocol, potentially including security measures. Additionally,
communication interfaces between domains may use wired or
wireless technologies, such as IEEE 802.11p [29] or LTE/5G-
V2X [30] for V2X communication, and Vehicular Ethernet
for intra-vehicle communication. While some technologies
rely on channel isolation to avoid the computational overhead
of ensuring security, others, particularly wireless channels,
necessitate secure communication protocols. These include
encryption and authentication algorithms, which define meth-
ods to ensure security. Such protocols also establish key data
structures, lifecycle procedures (e.g., represented as sequence
diagrams or automata), certificate formats, verification pro-
cesses, maximum communication frequencies, and message
headers and payload structures.
Finally, modeling efficient monitoring mechanisms is
essential in data-driven systems for auditing [31], run-time
verification [32], [33], and Machine Learning training [34],
[35]. Therefore, defining monitoring rates as well as the storage
destination is a key step in the domain decomposition.
In the following sections, we will detail the key aspects of
designing data-driven systems with SmartData. This includes
the system model, modeling of time requirements, the formal
definition of SmartData as a representation of the system,
and the novel SmartData stereotypes introduced in this work.
Table 1 presents the glossary of terms adopted throughout the
remainder of the paper.
Symbol      Description
Π           the set of data properties
X           the set of data properties in the output layer
T           the set of data properties in the transformational layer
I           the set of data properties in the input layer
Ω           a data property
ω           an instance of a data property
Ψ           the set of Interest relationships in a System
ψ           an Interest relationship
ψ_{Ωi,Ωj}   an Interest relationship from datum Ωi to datum Ωj
ρ           period
ε           expiry
Θ           the set of vertices in Algorithm 1 yet to be handled
SDG         a SmartData Graph
V           the set of vertices in an SDG
E           the set of edges in an SDG
GCD         returns the Greatest Common Divisor between two numbers
min         returns the minimum value in a list or set
max         returns the maximum value in a list or set
TABLE 1. Glossary.
A. SYSTEM MODEL
We assume a system model in which subsystems and compo-
nents interact following the Interested/Responsive interfaces
to allow the flow of sensing, processing, and actuation. Pro-
cessing is itself not limited to control and includes interfacing
with non-critical systems.
When a component or subsystem utilizes data from
another component or subsystem, it instantiates a proxy
of the required input data using the Interested inter-
face. This interface communicates with the Responsive
interface over a wired or wireless communication channel.
The Interested interface generates an Interest message,
which is defined with the following attributes:
(type, region, ρ, event, ε, mode). These attributes express
interest in a specific type of data produced during the time
interval [region.t0, region.tf] and originating from a defined
spatial region (region.x, region.y, region.z, region.r) or
a unique signature sig. Any Responsive component or subsystem
meeting these criteria will generate a Response message
at periodic intervals of ρ units of time or upon the
occurrence of the specified event, adhering to the designated
response mode. A Response message, represented as the tuple
(type, origin, ε, data), contains the requested data along with
metadata specifying its type, origin = (x, y, z, ts, sig), and ε.
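The Interest and Response tuples described above can be sketched as plain Python data classes, together with the check a Responsive component would perform before answering. This is only an illustration: the class and function names are hypothetical, and the event and signature attributes are omitted for brevity.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Region:
    """Sphere (x, y, z, r) plus the time interval [t0, tf]."""
    x: float
    y: float
    z: float
    r: float
    t0: float
    tf: float

    def contains(self, x: float, y: float, z: float, ts: float) -> bool:
        inside_sphere = dist((self.x, self.y, self.z), (x, y, z)) <= self.r
        return inside_sphere and self.t0 <= ts <= self.tf


@dataclass(frozen=True)
class Interest:
    type: str
    region: Region
    period: float        # rho; 0 means event-driven
    expiry: float        # epsilon
    mode: str = "single"


@dataclass(frozen=True)
class Response:
    type: str
    origin: tuple        # (x, y, z, ts); signature omitted in this sketch
    expiry: float
    data: object


def answers(interest: Interest, response: Response) -> bool:
    """True when the response has the requested type and originates inside the region."""
    x, y, z, ts = response.origin
    return response.type == interest.type and interest.region.contains(x, y, z, ts)
```

A producer located outside the sphere, or sampling outside [t0, tf], simply does not match the Interest and stays silent.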
Control messages are incorporated into the system
model. To manage the communication channel, system designers
must define a bandwidth reservation, expressed as a
Margin of Safety (MOS), when modeling the channel
bandwidth. Control messages carry the attribute tuple
(region, command), which triggers the system operation
mode change specified by command. This change applies
to a defined spatial region, represented as a sphere
(region.x, region.y, region.z, region.r), or identified by a
signature region.sig. The command takes effect starting at
region.t0 and concludes at region.tf. Interest messages can
be dynamically injected by components or subsystems
responding to specific circumstances or changes in the system’s
operational mode. These messages can also be revoked.
The system model is agnostic to the underlying commu-
nication protocol or channel. When security is a requirement
for data transmission over a particular communication chan-
nel, the protocol must support encapsulating the described
messages within securely transmitted packets. These packets
should employ security algorithms, such as encryption and
authentication, and specify keys and certificates as part of the
security attributes.
A SmartData Interest can specify Time-Triggered (TT)
or Event-Driven (ED) configurations
using the mode attribute in the Interest message.
In a TT configuration, the periodicity of a Response mes-
sage is determined by its Period and Expiry. Thus, a TT
SmartData Interest is characterized by having ψ.ρ > 0. In
contrast, SmartData Interests with ψ.ρ = 0 are classified as
ED SmartData. This configuration applies uniformly to all
Interest messages ψissued by the corresponding SmartData.
The supported modes for SmartData Interests are defined as
follows:
1) Single: This is the common scenario for most classical
networks, where a single piece of data is transmitted
per request. For TT SmartData interests, this mode
leads to a single data instance being transmitted every
period, which suits the case ψ.ε = ψ.ρ. For ED SmartData
interests, this mode means that at most one data instance
will be produced per Interest relation specified.
2) N: This is an extension of the Single-mode, where in-
stead of a single data, N samples will be transmitted per
period, for TT. This scenario suits well when building
a time-series of the sensor values. For instance, an
Actuator SmartData is interested in a Multi SmartData
with the length of N samples per period. For the ED
scenario, this mode is relevant when N can be derived
by a distribution function that describes the probability
of occurrence of the event, thus, representing the need
for the system to guarantee capacity to up to N data
samples, allowing for a deterministic bandwidth.
3) N+Many: This configuration extends the definition of
N mode by adding the notion of best-effort data production
after N data have been produced. After N data
productions, the remaining data produced in a period
(or Interest) are taken with r.ε = ∞, enabling a best-effort
behavior when forwarding and transforming this
data, as their priority is the lowest in the system. Thus,
only the first N data productions are accounted for when
modeling the system capacity.
4) TT-ED-N Hybrid: This configuration enables a TT-N
behavior for a specific duration ψ.region.tf − ψ.region.t0
after the occurrence of an ED triggering.
This configuration reserves bandwidth for a TT-N
execution ED-N times.
5) Urgent: Except for the Many data samples in N+Many
configurations, each of the modes described above can
be extended to enable high criticality data forwarding.
For instance, to model an urgent alarm for evacuation
in case of an emergency in an industry. In our design,
this is represented by setting ψ.ε = 0. A zero expiry
definition has a different semantic from data that ex-
pires. Instead, the zero expiry definition configures the
highest priority during communication when consider-
ing routing priority based on expiry.
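The classification rules in the list above can be summarized in a few lines of Python. This is a minimal sketch under stated assumptions: the enum and helper names are hypothetical, the TT-ED-N Hybrid mode is left out, and the expiry of best-effort samples is modeled as infinity, as in the N+Many description.

```python
import math
from enum import Enum


class Mode(Enum):
    SINGLE = "single"
    N = "n"
    N_MANY = "n+many"


def is_time_triggered(period: float) -> bool:
    """TT Interests have a positive period; ED Interests use period 0."""
    return period > 0


def is_urgent(expiry: float) -> bool:
    """A zero expiry marks the highest forwarding priority, not expired data."""
    return expiry == 0


def reserved_samples(mode: Mode, n: int = 1) -> int:
    """Samples per period (TT) or per Interest (ED) counted in the capacity model."""
    return 1 if mode is Mode.SINGLE else n


def sample_expiry(mode: Mode, index: int, expiry: float, n: int = 1) -> float:
    """Expiry of the index-th sample (0-based) produced within one period."""
    if mode is Mode.N_MANY and index >= n:
        return math.inf  # best effort: lowest priority, not counted for capacity
    return expiry
```

For an N+Many Interest with N = 4, `reserved_samples` still returns 4, and every sample from the fifth onward gets an infinite expiry.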
Nevertheless, for SmartData following an ED mode, if no
information is available regarding a closed finite interval
(e.g., ψ.region.t0 = −∞ and ψ.region.tf = ∞), interests
can be issued considering an approximated resource reservation,
for instance, specifying a reservation over bandwidth
for communication. In our previous work [36], we have
demonstrated that through the SmartData principles and the
definition of period and expiry, we can obtain a design-time
analysis of schedulability of time-triggered wireless sensor
networks while considering a reservation for ED modes.
B. MODELING TIME REQUIREMENTS
Within the flow of data production and consumption in a data-
driven system, interests can be modeled with periods equal to,
greater than, or smaller than the expiry time. An expiry equal
to a period implies that a datum must be updated at least at the
same rate as the data consuming it. An expiry smaller than
a period imposes a freshness requirement, where a datum
is produced only once per period but must be completed
before its input expires. Conversely, an expiry greater than
a period indicates that a datum can be updated at a slower
rate than the rate at which it is consumed. However, this is
valid only if the data consuming it also relies on another input
that is updated at least once per period of the consumer. Thus,
for the last case, the period of at least one of the data shall
be adjusted to either match the expiry in a data relationship
if the expiry exceeds the period or conform to the period
otherwise.
In the proposed extension of SmartData, we abstract a
system into three layers: input, transformation, and output.
From a system architecture perspective, the input layer com-
prises sensors and other components that observe the phys-
ical subsystem and generate data. The transformation layer
includes programmable devices that process input data to
produce new data. Lastly, the output layer consists of devices
that utilize data to act on the physical subsystem or generate
global outputs. These three layers form the foundation for
new stereotypes integrated into SmartData in this work.
This abstraction is formalized using the following defini-
tion, where atomic elements are denoted as data properties.
We assume an abstract set Πof data properties, denoted as
{Ω1,Ω2,...,Ωn}.
Periodic data (ρ > 0) is assumed to be produced at
least once per period, with the Period expressed in the datum’s
properties (e.g., "1920x1200 RGB images generated by a
camera at 60 frames per second" or "acceleration along the
x-axis measured in m/s² every 2 ms"). If at least one data
production adheres to its timing requirements, and subsequent
components use this data, the system satisfies timing correctness.
For event-driven data properties, the Period is ρ = 0. To
estimate network bandwidth, event-driven data must provide
a maximum event frequency.
The period of a datum can also be defined based on its
data relationships. Using the SmartData interest message, we
define the datum period as:
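$$P_{\psi_{\_,\Omega_i}} = \max(\psi_{\_,\Omega_i}.\rho,\ \psi_{\_,\Omega_i}.\varepsilon) \tag{1}$$
$$\Omega_i.\rho = \mathrm{GCD}\left(P_{\psi_{\_,\Omega_i}} \mid \forall \psi_{\_,\Omega_i} \in \Psi,\ \psi_{\_,\Omega_i}.\rho > 0\right) \tag{2}$$
where $P_{\psi_{\_,\Omega_i}}$ represents the maximum of the period and expiry
in a data relationship involving $\Omega_i$, and GCD denotes the
greatest common divisor. Event-driven interests are ignored
since they do not establish a periodic data relationship.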
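Equations (1) and (2) translate almost directly into code. The sketch below assumes that periods and expiries are finite integers in a common time unit (e.g., milliseconds) so that the GCD is well defined; the function name and the list-of-tuples input are illustrative choices.

```python
from functools import reduce
from math import gcd


def datum_period(interests):
    """Period of a datum from the (rho, epsilon) pairs of the Interests issued to it.

    Event-driven Interests (rho == 0) are ignored. Returns 0 when no
    time-triggered Interest targets the datum.
    """
    effective = [max(rho, eps) for rho, eps in interests if rho > 0]  # Eq. (1)
    if not effective:
        return 0
    return reduce(gcd, effective)  # Eq. (2)


# Three Interests on the same datum: two time-triggered, one event-driven.
period_ms = datum_period([(100, 100), (40, 60), (0, 10)])
```

Here the effective values are 100 ms and 60 ms, so the datum must be produced every 20 ms to serve both consumers on their own schedules.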
When the expiry exceeds the period, we recommend align-
ing the expiry with the period (i.e., setting the expiry as a
multiple of the period). Similarly, we recommend aligning
periods across dependencies on a datum. Following this ap-
proach, a datum that only consumes data and is not consumed
further (e.g., an Actuator) must specify at least one depen-
dency with ε≤ρ. Otherwise, this configuration may permit
the production of a datum with repeated inputs.
To prioritize data communication, expiry is defined as the
minimum among all expiries issued to a datum:
$$\Omega_i.\varepsilon = \min(\psi_{\_,\Omega_i}.\varepsilon \mid \forall \psi_{\_,\Omega_i} \in \Psi) \tag{3}$$
Expiry is transitive in a chain of data dependencies, implying
that the expiry at a lower level must not exceed the
expiry at higher levels. For instance, if $C$ depends on $D$ with
expiry $\varepsilon_D > \varepsilon_A$ and on $F$ with $\varepsilon_A < \varepsilon_F < \varepsilon_D$, only $\varepsilon_F$ needs
adjustment to $\varepsilon_A$ to satisfy the expiry constraints imposed by
$A$.
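One way to read Equation (3) together with the transitivity rule is as a fixed-point computation over the Interest relationships. The sketch below is an interpretation, not the authors' algorithm: it assumes that an Interest issued by a datum may not carry an expiry larger than the expiry imposed on that datum, and it uses a hypothetical dictionary keyed by (consumer, producer).

```python
import math


def propagate_expiry(interests):
    """Return (datum_expiry, adjusted_interests).

    interests maps (consumer, producer) to the expiry the consumer imposes.
    """
    adjusted = dict(interests)
    changed = True
    while changed:
        changed = False
        # Eq. (3): the expiry of a datum is the minimum expiry issued to it.
        datum_expiry = {}
        for (_, producer), eps in adjusted.items():
            datum_expiry[producer] = min(datum_expiry.get(producer, math.inf), eps)
        # Transitivity (assumed reading): cap each Interest by its issuer's expiry.
        for (consumer, producer), eps in list(adjusted.items()):
            cap = datum_expiry.get(consumer, math.inf)
            if eps > cap:
                adjusted[(consumer, producer)] = cap
                changed = True
    return datum_expiry, adjusted


# Hypothetical chain: A uses C within 50; C uses D within 80 and F within 30.
expiry, adjusted = propagate_expiry({("A", "C"): 50, ("C", "D"): 80, ("C", "F"): 30})
```

In this example only the Interest from C to D is tightened (from 80 to 50), because the Interest from C to F is already stricter than what A demands of C.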
C. FORMAL DEFINITION OF THE SMARTDATA
REPRESENTATION OF THE SYSTEM
In our formalization, we assume each data property $\Omega_i \in \Pi$
to include at least an attribute $\Omega_i.\rho$ describing the production
Period (0 for ED modes). In addition, we define the relationship
element based on the SmartData Interest message,
simplified here to Timed Data as $\psi_{\Omega_i,\Omega_j} = (\rho, \varepsilon)$, to represent
that a datum $\Omega_i$ depends on another datum $\Omega_j$ and imposes
a requirement that samples $\omega_j$ consumed shall not be older
than $\varepsilon$ units of time. So, $\varepsilon$ is the relative Expiry requirement
imposed by $\Omega_i$ on $\Omega_j$. Note that the elements type, region,
t0, and tf are abstracted here since the source datum and
target datum are explicit in the definition of the relationship
as $\Omega_i$ and $\Omega_j$. Moreover, the mode attribute is expected to be
adjusted to match the idea of at least one to cover for data
replication, while TT-ED-N Hybrid modes are
treated as TT for the sake of performance estimation
on the system. The set $\Psi$ contains the definition of all data
dependencies in the system. In this way, we can formally
define the abstraction of the system as:
Definition 1 (Formal SmartData Representation of the System):
Given a finite set of data properties $\Pi = \{\Omega_1, \ldots, \Omega_n\}$,
we define a SmartData Representation of a System as a tuple
$S = (I, T, X, \Psi)$, such that $I \subset \Pi$, $T \subset \Pi$, $X \subset \Pi$, and
$I \cap T = I \cap X = T \cap X = \emptyset$, where:
• $I \neq \emptyset$ are data originating from sensors and input
components of the System, i.e., a system with no data
in the input layer produces nothing;
• $T$ are data resulting from transformations of other data
in the system;
• $X \neq \emptyset$ are data used for actuation commands, i.e., a
system with no output data does not actuate;
• $\exists \psi_{\Omega_i,\_} \in \Psi\ \forall \Omega_i \in T \cup X$, i.e., every transformation and
actuation depends on at least one other datum;
• $\exists \psi_{\_,\Omega_i} \in \Psi\ \forall \Omega_i \in I$, i.e., every sensor in the system is
used by at least one other datum;
• $\psi_{\Omega_i,\_} \notin \Psi\ \forall \Omega_i \in I$, i.e., a sensor shall not depend on
another datum to produce a sample;
• $\psi_{\_,\Omega_i} \notin \Psi\ \forall \Omega_i \in X$, i.e., an actuation shall not be used
to produce another datum;
• $\exists \psi_{\Omega_i,\Omega_j} \in \Psi .\ \psi_{\Omega_i,\Omega_j}.\varepsilon \leq \Omega_i.\rho\ \ \forall \Omega_i \in T \cup X$, i.e., at
least one of the dependencies of a datum $\Omega_i \in T \cup X$
has an Expiry smaller than or equal to the Period of $\Omega_i$.
Otherwise, it would be possible for two consecutive $\Omega_i$
instances (in consecutive periods) to be produced using
the same set of inputs.
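The conditions of Definition 1 lend themselves to an automated check at design time. The following sketch assumes Python sets for the layers and a dictionary mapping (consumer, producer) pairs to (ρ, ε) tuples; the function name and the wording of the reported violations are hypothetical.

```python
def check_representation(inputs, transformers, outputs, psi, period):
    """Return the list of Definition 1 violations (empty when the model is valid).

    psi maps (consumer, producer) to (rho, epsilon); period maps a datum to its rho.
    """
    errors = []
    if not inputs:
        errors.append("input layer I is empty")
    if not outputs:
        errors.append("output layer X is empty")
    if inputs & transformers or inputs & outputs or transformers & outputs:
        errors.append("layers are not disjoint")
    consumers = {consumer for consumer, _ in psi}
    producers = {producer for _, producer in psi}
    for datum in sorted(transformers | outputs):
        expiries = [eps for (consumer, _), (_, eps) in psi.items() if consumer == datum]
        if not expiries:
            errors.append(f"{datum} depends on no datum")
        elif min(expiries) > period[datum]:
            errors.append(f"{datum} has no dependency with expiry <= its period")
    for datum in sorted(inputs):
        if datum not in producers:
            errors.append(f"{datum} is used by no datum")
        if datum in consumers:
            errors.append(f"{datum} is an input but depends on another datum")
    for datum in sorted(outputs & producers):
        errors.append(f"{datum} is an output but is used by another datum")
    return errors


# An actuator interested in a single sensor with rho = epsilon = 100 ms.
psi = {("actuator", "sensor"): (100, 100)}
errors = check_representation({"sensor"}, set(), {"actuator"}, psi,
                              {"actuator": 100, "sensor": 100})
```

An empty list means the model satisfies every condition; relaxing the expiry to 200 ms in this example would trigger the last condition of the definition.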
We remark that data persistency is considered to be set as
data properties in the output layer. Moreover, for the sake of
the production of feedback loops over actuation data, the view
of a previous value of actuation can be seen as another sensor.
To ease readability, from here onward, ψ used without referring
to a specific $\Omega_i$ and $\Omega_j$ represents a general
data dependency. Moreover, other SmartData attributes can
also be referred to as attributes of ψ for the sake of simplicity,
to demonstrate the integration between the general concept
presented here and SmartData.
D. SMARTDATA STEREOTYPES
A data-driven critical system is designed around the data it
handles: from physical data sources, such as sensors and storage,
through transformational data (e.g., aggregations and fusions)
that may be secured along the way, to the data that drives
actuators or is persisted. These three layers, Input,
Transformational, and Output, are modeled here through
six SmartData stereotypes, namely, Sensor, Storage, Transformer,
Actuator, Secure, and Persistent, depicted in Figure 2.
The Problem domain is decomposed into entities representing
the data produced and consumed by the system.
They are represented as classes that implement the
SmartData interface presented in Section II, tagged with
either the «Storage», «Sensor», «Transformer», or
«Actuator» stereotype, and optionally tagged with
«Secure» and «Persistent». Any data can be associated
with security requirements and therefore be tagged
with the «Secure» stereotype. Moreover, whenever data
needs to persist for any reason, it will be tagged with
the «Persistent» stereotype, except for data originating
from «Storage», which is already persistent by nature.
The decomposition starts with identifying the actuation
that will be envisioned for the system, followed by the Smart-
Data the actuators are interested in, up to the sensors. For in-
stance, in an autonomous vehicle, one may need to actuate, at
a given rate, over throttle, brake, and steering. Each actuation
is associated with a specific data input, which must be pro-
vided with a specific freshness constraint to avoid consuming
expired data. This data dependency will generate Interest in
other SmartData, resulting from a transformation or a sensing
process. This Interest relation will then carry the timing and
security requirements associated with the actuation. If more
than one actuation is interested in a SmartData, this SmartData
must adapt its period to supply all its consumers accordingly
(Eq. 2). The idea of the domain decomposition is then
to identify each of the entities that compose the system and
map their relationships according to data dependency, thus
defining the set of data properties Π and data dependencies
Ψ as in Definition 1, and subsequently the sets I, T,
and X, or input, transformer, and output data, respectively.
For instance, $\Pi = \{\Omega_1, \Omega_2\}$, $X = \{\Omega_1\}$, $I = \{\Omega_2\}$,
and $\Psi = \{\psi_{\Omega_1,\Omega_2} = (100\,\mathrm{ms}, 100\,\mathrm{ms})\}$ define a system
over an «Actuator» SmartData $\Omega_1$ that is interested in
a «Sensor» SmartData $\Omega_2$ with temporal requirements of
period ρ = 100 ms and expiry ε = 100 ms. The remaining
characteristics of $\Omega_1$ and $\Omega_2$ are defined using the remaining
SmartData attributes, such as type and origin, while mode and
region are defined as additional attributes of ψ.
The SmartData Graph of Data-Driven systems proposed in
this paper also encompasses an abstraction of data criticality.
The criticality attribute represents the importance of the data
and the impact of errors on data communication, enabling a
design-time analysis of the critical paths of communication
and the level of data correctness expected at that specific
communication channel. For instance, communication errors
(e.g., bit inversions) have a far higher impact on the final
actuation data, like throttle intensity, than the same error
occurring while communicating an image larger than 1 MB. The
criticality attribute is abstracted in [0, 1], where 0 is non-critical
and 1 is the highest criticality.
1) Secure
Security and Privacy are major concerns when modeling con-
temporary complex critical systems [37]. Thus, the security
regarding data and the variable latency added by security
algorithms must be acknowledged in the early stages of a
system design. In essence, SmartData security builds from a
produce-consume cycle with no external interactions. Such
configuration is intricate, requiring a careful design of data
communication and the possible entry points for security
flaws and physical attacks. In this way, SmartData not tagged
with the «Secure» stereotype require formal proof of their
isolation to guarantee security.
Nevertheless, interacting with a non-isolated environment
is a common non-functional requirement that can lead to
a breach in security, like communicating with the cloud,
interfacing V2V communication, intra-vehicle wireless communication,
and handling persistence. SmartData treats
such interactions as entry points for external attacks
on the system, so they must be denoted with an
explicit secure feature, the «Secure» stereotype. SmartData
tagged with the «Secure» stereotype are expected to
handle security requirements over non-isolated environments
through security protocols, which should manage such interfaces
whenever interacting with the SmartData update() method or
Value() operator. Likewise, SmartData that consume data
tagged with the «Secure» stereotype must consider the
same security concerns. Thus, the «Secure» stereotype marks
the division between the isolated environment and the
non-isolated one, a critical definition when modeling contemporary
complex critical systems. As baseline attributes of the
«Secure» stereotype, we include the overhead added to communication
in terms of payload size (e.g., adding a signature)
and the respective encryption and decryption complexity, to
be accounted for as additional delay based on the data size (i.e.,
ψ.type.UNIT.size). To further describe the associated security
protocol, UML annotations can be included to account for
state machines of such protocols, which can later be used
to produce safety properties for formal verification as an
expansion of the method defined in [32] for safety models.
Finally, the «Secure» stereotype can be tagged alongside
other SmartData stereotypes.
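To illustrate how the baseline attributes of the «Secure» stereotype could enter a design-time estimate, the sketch below uses a deliberately simple linear cost model. The model, the function name, and the nanoseconds-per-byte parameter are assumptions made for illustration; real ciphers and signature schemes have fixed setup costs and block-size effects that are ignored here.

```python
def secure_message_cost(payload_bytes: int, signature_bytes: int, cipher_ns_per_byte: int):
    """Return (bytes on the wire, added delay in ns) under a linear cost model."""
    wire_bytes = payload_bytes + signature_bytes
    # One encryption at the producer plus one decryption at the consumer,
    # each assumed proportional to the payload size.
    added_delay_ns = 2 * cipher_ns_per_byte * payload_bytes
    return wire_bytes, added_delay_ns


# Hypothetical numbers: 1 KiB payload, 64-byte signature, 5 ns per byte.
wire_bytes, delay_ns = secure_message_cost(1024, 64, 5)
```

The larger payload feeds the bandwidth estimate of the channel, while the added delay must fit within the expiry imposed by the consumer.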
[Figure 2: UML class diagram of the SmartData stereotypes arranged in three layers. Input: «Sensor» and «Storage», each associated with a Transducer that defines UNIT. Transformational: «Transformer», associated with a Transformation that defines UNIT. Output: «Actuator», associated with a Transducer that defines UNIT and ACTUATION. «Secure» and «Persistent» annotate SmartData in any layer; «Persistent» defines a Monitoring Rate. The classes expose update() : Value and operator Value() : Value.]
FIGURE 2. The novel SmartData stereotypes: Sensor, Storage, Transformer, Secure, Persistent, and Actuator.
2) Persistent
Persistency is a natural stage of a data lifetime. The
concept of building time-series goes beyond the conventional
producer-consumer cycle of Input-Transform-Output loops,
enabling offline analysis and online learning, a demand
of several data-driven systems involving AI [38]–[41]. The
«Persistent» stereotype aims to represent such capabil-
ities during design time, interacting with the platform mon-
itoring capabilities in order to provide the necessary means
for data persistence, not limiting the data consumption to im-
mediate usage by actuation. The Monitoring Rate tag of
«Persistent» SmartData is derived from the monitoring
configurations specified for this system. For instance, fol-
lowing the non-intrusive monitoring design proposed in [33],
the monitoring configuration encompasses the rate at which
data will be captured by a Monitor component at specific
data collection points, handled through different monitoring
policies [42], the data structures, and the necessary process-
ing for data persistency. The Monitoring Rate speci-
fied must be considered when defining the period associated
with the SmartData sampling to supply the monitoring. The
«Persistent» stereotype must be tagged together
with either «Transformer», «Sensor», or «Actuator».
It can be tagged alongside the «Secure» stereotype as well.
However, it cannot be tagged alongside «Storage», since
«Storage» SmartData is already persistent.
The «Persistent» stereotype adds to the regular
update() method of SmartData the necessary means for
monitoring and the storage of current reads. Similarly, the
Value() operator is empowered with access to the stored
time-series in a similar fashion to the update() method
for the «Storage» stereotype. For instance, a SmartData
interested in a SmartData tagged with «Persistent»
stereotype can express interest in previous moments in time
through the Interest mode and interval, which will lead to
responses encompassing the time-series values instead. Thus,
modeling data persistency enables the realization of time-
series for online and offline usage, providing means to es-
timate the necessary demands at run-time to support such
features (e.g., memory usage for data structures, bandwidth
estimation for time-series communication, and connection to
cloud databases).
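The behavior described for the «Persistent» stereotype can be sketched as a wrapper that stores reads at the Monitoring Rate and answers interval queries with the stored time-series. All names below are hypothetical, time is an abstract integer, and Python has no Value() operator, so a value() method stands in for it.

```python
class Doubler:
    """Stand-in source for this sketch: any object exposing update(now) works."""

    def update(self, now):
        return now * 2


class PersistentSmartData:
    """Wraps a source and keeps its reads as a time-series."""

    def __init__(self, source, monitoring_period):
        self.source = source
        self.monitoring_period = monitoring_period
        self.series = []          # stored (timestamp, value) pairs
        self._last_stored = None
        self._value = None

    def update(self, now):
        self._value = self.source.update(now)
        due = self._last_stored is None or now - self._last_stored >= self.monitoring_period
        if due:
            self.series.append((now, self._value))
            self._last_stored = now
        return self._value

    def value(self, t0=None, tf=None):
        """Latest read, or the stored series within [t0, tf] when an interval is given."""
        if t0 is None or tf is None:
            return self._value
        return [(ts, v) for ts, v in self.series if t0 <= ts <= tf]


persistent = PersistentSmartData(Doubler(), monitoring_period=20)
for now in range(0, 50, 10):
    persistent.update(now)
history = persistent.value(t0=10, tf=40)
```

The length of the stored series per unit of time, multiplied by the sample size, gives the memory and bandwidth demand mentioned above.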
3) Sensor
Sensors are the primordial data sources in data-driven sys-
tems. They mediate the analog, continuous, physical world
with the digital, discrete, computing system. A SmartData
tagged with the «Sensor» stereotype encompasses the def-
inition of a Transducer, which makes available the real-world
sensing through the update() method, updating its expiry.
Values read through update() can be accessed by the
Value() operator. When associated with multiple Trans-
ducers, we can address it as a Multi-SmartData, a concept of
multiple data sampled synchronously at the same location and
represented as a collection of such sensors. Multi-SmartData
is a relevant concept, especially in the industry when using
third-party sensing technology as a black-box, and must be
captured by the design due to its interactions with other
components. Moreover, whenever a fuser is available, the
update method applies the fuser before returning the real-
world sampling. A fuser is meant to return a combination
of multiple sensors that answer to the same Interest ψ(i.e.,
ψ.type = Ω.type ∧Ω.origin ⊆ψ.region), a fusion of
Multi-SmartData readings, and multiple readings from the
same sensor inside a period. Finally, the region parameter
specifies the Space-Time region of SmartData.
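A minimal sketch of the «Sensor» behavior described above follows. The class is hypothetical: transducers are modeled as plain callables, time is an abstract number passed in by the caller, and a value() method stands in for the Value() operator.

```python
from statistics import mean


class SensorSmartData:
    def __init__(self, transducers, expiry, fuser=None):
        self.transducers = list(transducers)  # callables returning one reading each
        self.expiry = expiry
        self.fuser = fuser                    # optional function over the readings
        self.expires_at = None
        self._value = None

    def update(self, now):
        """Sample every transducer, apply the fuser if any, and refresh the expiry."""
        readings = [read() for read in self.transducers]
        if self.fuser is not None:
            self._value = self.fuser(readings)
        elif len(readings) == 1:
            self._value = readings[0]
        else:
            self._value = readings            # Multi-SmartData: keep the collection
        self.expires_at = now + self.expiry
        return self._value

    def value(self):
        return self._value


# Two hypothetical temperature transducers fused by their mean.
sensor = SensorSmartData([lambda: 20.0, lambda: 22.0], expiry=100, fuser=mean)
fused = sensor.update(now=0)
```

Without a fuser, the same object would return both readings as a collection, which corresponds to the Multi-SmartData case.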
4) Storage
Calibration tables, configuration files, static information on
the system properties, and offline maps are some examples
of previously stored data that is constantly required for the
initialization and operation of cyber-physical systems. Moreover,
data-driven control operations, especially those based on AI
models, constantly require access to historical data to perform
predictions and retraining. Thus, a Data-Driven System
may require access to stored data. SmartData tagged with
the «Storage» stereotype represents historical data of a
specific Transducer at a previous moment in time. Different
from «Sensor», a «Storage» Transducer does not repre-
sent a sensing element. Instead, it provides access to specific
storage through queries (e.g., an SQL database connection,
a comma-separated-values (CSV) file, or any historical data
structure). An interest relation ψto a «Storage» SmartData
Ωmust provide a space-time or signature-based Ω.origin ⊆
ψ.region. The update() method returns historical data by
iterating on the stored data following the interest ψ.period
at each invocation, starting at the data point nearest to
ψ.region.t0. The sample read through update() can be accessed
by the Value() operator. Similar to the «Sensor»
stereotype, SmartData tagged with the «Storage» stereotype
can also include a fuser with the same semantics.
Due to the incompatibility regarding the Transducer interactions,
a SmartData cannot accommodate the «Storage»
and «Sensor» stereotypes simultaneously.
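The replay semantics of update() for «Storage» SmartData can be sketched as a cursor over stored records. The class below is a hypothetical illustration: the storage is an in-memory list of (timestamp, value) pairs sorted by timestamp instead of a database or CSV file, and returning None once the cursor passes the last record is an assumption of this sketch.

```python
class StorageSmartData:
    def __init__(self, records, t0, period):
        self.records = records    # (timestamp, value) pairs, sorted by timestamp
        self.period = period
        # Start at the stored data point nearest to the Interest's region.t0.
        self.cursor = min(records, key=lambda rec: abs(rec[0] - t0))[0]
        self._value = None

    def update(self):
        """Return the record nearest to the cursor, then advance by one period."""
        if self.cursor > self.records[-1][0]:
            return None           # assumed behavior once the history is exhausted
        _, self._value = min(self.records, key=lambda rec: abs(rec[0] - self.cursor))
        self.cursor += self.period
        return self._value

    def value(self):
        return self._value


history = [(0, "a"), (10, "b"), (20, "c"), (30, "d")]
storage = StorageSmartData(history, t0=12, period=20)
replayed = [storage.update(), storage.update(), storage.update()]
```

With t0 = 12 the replay starts at the record stamped 10 and then advances in steps of the Interest period, skipping the records in between.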
5) Transformer
Any data transformation promoted by AI (e.g., classifiers
and predictors) or traditional transformations (e.g., aggre-
gation, filters, FFT) is modeled on the second layer of the
SDG through the «Transformer» stereotype. SmartData
tagged with this stereotype does not encompass a Transducer.
Instead, the SmartData encompasses a Transformation that
defines the UNIT and the set of SmartData necessary to
perform it, to which interests will be issued to perform the
Transformation accordingly. The update() method
executes the Transformation at each invocation and
updates the relative expiry. Values transformed by Smart-
Data tagged with «Transformer» are made available
by the Value() operator. Since the Transformation
can modify the UNIT of the SmartData used as input
(e.g., a converter or a classifier), SmartData tagged with
the «Transformer» stereotype cannot be tagged with a
stereotype from other SmartData layers simultaneously ex-
cept «Secure» and «Persistent».
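A «Transformer» can be sketched in the same style: it holds no Transducer, only a Transformation and the SmartData it is interested in. The classes below are hypothetical, the inputs are reduced to stand-ins exposing value(), and issuing the Interests themselves is not modeled.

```python
class Constant:
    """Stand-in for a SmartData of interest; only value() is needed here."""

    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value


class TransformerSmartData:
    def __init__(self, inputs, transformation, expiry):
        self.inputs = list(inputs)            # SmartData this transformation consumes
        self.transformation = transformation  # function over the input values
        self.expiry = expiry
        self.expires_at = None
        self._value = None

    def update(self, now):
        """Run the Transformation on the current inputs and refresh the expiry."""
        self._value = self.transformation(*(src.value() for src in self.inputs))
        self.expires_at = now + self.expiry
        return self._value

    def value(self):
        return self._value


# Distance (m) and elapsed time (s) transformed into speed (m/s): the UNIT changes.
speed = TransformerSmartData([Constant(100.0), Constant(4.0)], lambda d, t: d / t, expiry=50)
speed_value = speed.update(now=10)
```

The example also shows why the stereotype cannot be combined with «Sensor» or «Actuator»: the output carries a different UNIT than any of its inputs.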
6) Actuator
The final cyber-physical interaction in a data flow is the
actuation. SmartData tagged with the «Actuator» stereotype
encompasses the definition of a Transducer, which defines
the UNIT for the actuation data and makes available the
cyber-physical actuation itself. This stereotype also defines
the set of SmartData necessary to build the respective
actuation following the period, expiry, and mode parameters.
The period specifies the actuation period expected by the
system, usually derived from the actuation rate. The period,
alongside the expiry and the mode (see Section II), specifies
the timing demands imposed on the interest relations defined
by this SmartData. These demands must be back-propagated to
the SmartData of interest so that they can adapt their sampling
to supply this actuation. The actuation is triggered by the
update() method every period (or at every event in the case of
event-driven sampling modes, ψ.ρ = 0). The «Actuator»
stereotype can be applied to a SmartData together with the
«Persistent» and «Secure» stereotypes.
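The compatibility rules stated for the stereotypes in this section can be checked mechanically at design time. The following Python sketch is illustrative only: the function name stereotype_conflicts and the encoding of stereotypes as strings are assumptions, and only the combinations explicitly ruled out in the text are encoded.

```python
# Pairs of stereotypes that the text rules out on a single SmartData.
# Combinations not listed here are simply not checked by this sketch.
FORBIDDEN_PAIRS = {
    frozenset({"Sensor", "Storage"}),
    frozenset({"Transformer", "Sensor"}),
    frozenset({"Transformer", "Storage"}),
    frozenset({"Transformer", "Actuator"}),
}


def stereotype_conflicts(stereotypes):
    """Return the forbidden pairs present in a set of stereotype names."""
    tags = set(stereotypes)
    return sorted(tuple(sorted(pair)) for pair in FORBIDDEN_PAIRS if pair <= tags)
```

An empty result means that no stated rule is violated, as is the case for an «Actuator» that is also «Persistent» and «Secure».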
V. BUILDING A SMARTDATA GRAPH
The SmartData Graph (SDG) is a straightforward visual
representation of the design requirements using the
SmartData concepts. We formally define an SDG as a
graph SDG = (V, E), where V is the set of vertices and E
is the set of oriented edges. Each vertex v ∈ V represents
a SmartData encompassing the stereotypes presented in this
work, including the definitions of criticality, security,
persistency, region, and the information derived from its UNIT,
like size and format. Visually, the criticality attribute is
represented as the color of the SmartData vertices, which makes
critical paths easier to identify, ranging from
blue (0 - not critical) to red (1 - highest criticality). Attributes
derived from the UNIT, like size and format, are represented
textually inside the vertex. Other attributes like persistency
and security are displayed as annotations of the respective
vertex and customized according to each configuration.
Tagging a SmartData as secure shall be represented by adding an
annotation with the symbol of a padlock, where the index of
the padlock represents which type of security is assumed
for this SmartData (i.e., it maps into a dictionary of padlocks).
Tagging a SmartData as persistent shall be represented
by adding an annotation with the symbol of a database (i.e., a
stacked disks icon) alongside the respective monitoring rate,
represented by an integer value and the timing unit (i.e., s
for seconds, ms for milliseconds, and µs for microseconds).
Moreover, the SDG can be enriched with other meaningful
information to help designers by adding annotations to the
respective vertex, such as state machines or UML diagrams
describing complex transformations or actuation flows.
Each oriented edge e ∈ E is an interest relationship
represented by ψΩi,Ωj. The interest relation from Ωi
to Ωj is represented in the SDG as an oriented edge
from Ωj to Ωi, following the direction of the data flow.
Moreover, this relation implies following the requirements
described by the interest ψ. Considering the characteristics of
a region (i.e., (x, y, z, r, t0, tf) for static data and (sig, t0, tf)
for mobile data), an Interest can be matched by more than
one SmartData. Since we are considering the design of each
entity in a critical system, this notion leads to the definition
of an oriented edge from each SmartData that meets
the interest requirements to the SmartData vertex issuing
the interest, thus corroborating the explicit representation of
ψ as ψΩi,Ωj while omitting type and region. Each oriented
edge e ∈ E is visually represented with the attribute tuple
(ψ.mode, ψ.period, ψ.expiry, ψ.interval), since type and
region are already represented by the existence of the
edge itself.
The logical order to build an SDG is to start from the Actuator
and Persistent SmartData, followed by the SmartData they
are interested in, up to the sensors. This practice follows
the principle that all data composing an SDG is meant for
actuation or persistency: every datum in the system is either
the actuation or persistent data itself, is of interest to them,
or is of interest to the data required to produce them. Thus,
the construction prohibits including data that is neither from
the Output layer nor of Interest to any SmartData.
In this way, we present Algorithm 1 as an algorithmic
solution to build an SDG. The algorithm requires as input a set
of SmartData Π ≠ ∅ and the respective sets X, T, and I. The
algorithm starts with an empty SDG and populates
the vertex set V with the «Actuator» and «Persistent»
SmartData (lines 5 to 7). The algorithm also uses
an auxiliary FIFO queue Θ to store SmartData added to the
model whose interests have not yet been added to SDG.E. The
algorithm consumes the elements in the FIFO queue until it is
empty, adding the corresponding interests to the SDG.E set
(lines 8 to 33).
We start by adding an edge for all interests issued to the
SmartData Ωi under analysis (i.e., ∀j ∈ [0, |Π|]. ψΩj,Ωi ∈ Ψ)
(lines 13 to 24). Note that each Ωi ∈ X has its period defined
according to the actuation rate or monitoring rate; therefore,
we skip lines 10 to 12 for these data. For each interest a consumer
issued to this SmartData, we update its period according to
eq. (2) (lines 15 to 21). For TT configurations, we update
the period using the greatest common divisor (GCD) between
the current period and the new interest period, accounting
for the N requirement of the TT mode configuration (line
21). The use of the GCD to update the SmartData period
aims to ensure the freshness of data whenever an interest
response is issued, since a new sample will be generated to
compose the interest response. Without the GCD, there is
no guarantee that a response will be issued with a new data
sample, which can impact the scheduling and lead to the usage
of expired data. Because the GCD of non-harmonic periods is
smaller than any of them, harmonic periods are recommended
for SmartData interested in the same SmartData to avoid
oversampling. In this way, with the set of edges E computed
for all interests in the system (∀ψ ∈ Ψ), we guarantee that
every SmartData in the system (Ωi ∈ SDG.V) has a period
(Ωi.ρ) sufficient to supply every interest issued to
it (i.e., ∀j ∈ [0, |Π|]. ψΩj,Ωi ∈ Ψ).
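The effect of the GCD update on harmonic and non-harmonic periods can be illustrated with a short worked example. The sketch below is a simplification and not the full update rule: it assumes N = 1 for every interest and ignores the expiry term, and the function name producer_period is hypothetical.

```python
from math import gcd


def producer_period(interest_periods_ms):
    """Fold consumer periods (ms, all > 0) into a single producer period."""
    period = 0                      # 0 means "not yet defined"
    for p in interest_periods_ms:
        period = p if period == 0 else gcd(period, p)
    return period
```

With harmonic consumer periods of 100ms and 50ms, the producer ends up sampling every 50ms, which is the rate of its most demanding consumer. With non-harmonic periods of 100ms and 40ms, the producer has to sample every 20ms, twice as often as any consumer requested.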
For scenarios with the ED interest mode (ψ.ρ = 0),
the number of messages issued for each interest must be
accounted for as a bandwidth reservation (line 23), which is
accumulated in a reserve attribute. Moreover, as mentioned
in Section II, for ED interests that do not specify a distribution
for the event or are specified with an infinite interval, the
reservation in line 23 can be replaced with an approximate
bandwidth reservation. In this way, when a SmartData
is of Interest to both Periodic and Aperiodic Interests, it
acts on behalf of both configurations, considering its periodic
behavior and reserving the necessary capacity for the
event-driven interests.
Algorithm 1 Build SmartData Graph
Require: Π ≠ ∅, X, T, I
 1: V ← ∅
 2: E ← ∅
 3: SDG ← (V, E)
 4: Θ = FIFO()
 5: for Ωi ∈ X do
 6:     SDG.V ← SDG.V ∪ Ωi
 7:     Θ.enqueue(Ωi)
 8: while not Θ.empty() do
 9:     Ωi ← Θ.dequeue()
10:     if not Ωi ∈ X then
11:         Ωi.ρ ← 0
12:         Ωi.ε ← ∞
13:     for j ∈ [0, |Π|]. ψΩj,Ωi ∈ Ψ do
14:         SDG.E ← SDG.E ∪ ψΩj,Ωi
15:         if ψΩj,Ωi.ρ > 0 then
16:             ρaux ← ⌈ψΩj,Ωi.ρ / ψΩj,Ωi.mode.N⌉
17:             PψΩj,Ωi ← max(ρaux, ψΩj,Ωi.ε)
18:             if Ωi.ρ = 0 then
19:                 Ωi.ρ ← PψΩj,Ωi
20:             else
21:                 Ωi.ρ ← GCD(Ωi.ρ, PψΩj,Ωi)
22:         else
23:             Ωi.reserve ← Ωi.reserve + ψΩj,Ωi.mode.N * Ωi.type.UNIT.size
24:         Ωi.ε ← min(Ωi.ε, ψΩj,Ωi.ε)
25:     ψmin_expiry.ε ← 0
26:     for j ∈ [0, |Π|]. ψΩi,Ωj ∈ Ψ do
27:         if Ωj ∉ SDG.V then
28:             SDG.V ← SDG.V ∪ Ωj
29:             Θ.enqueue(Ωj)
30:         if ψΩi,Ωj.ε < ψmin_expiry.ε then
31:             ψmin_expiry ← ψΩi,Ωj
32:     if ψmin_expiry.ε > Ωi.ε then
33:         ψmin_expiry.ε ← Ωi.ε
34: return SDG
Expiry and sampling time are highly correlated when
addressing the temporal validity of data. The expiry parameter
expressed by the interest relation, and handled when building
an SDG, is a relative expiry to be tagged with the data when
sampled. Since the sources of data in an SDG are «Sensor»
and «Storage» SmartData, their generation influences the
age of data in the entire flow. As «Storage» SmartData do
not perform any new sampling, expiry does not apply based
on their sample timing. Instead, the expiry of a «Storage»
SmartData sample is set only to accommodate the prioritization
of the interested SmartData. In this way, a «Sensor»
or «Storage» SmartData Ω always tags a new sample
with the current timestamp of its generation, ts = now().
The expiry notion represented by Ω.expiry is derived from
the requirements of all the interests issued to this SmartData
according to eq. (3) (line 24), as the minimum expiry among
all interests issued by its consumers.
We add each SmartData Ωj that Ωi is interested in (i.e.,
∀j ∈ [0, |Π|]. ψΩi,Ωj ∈ Ψ) to the vertices of the SDG if
it has not been added yet, while also adding it to
the auxiliary FIFO Θ to adjust its period and expiry and
include its own dependencies in the SDG (lines 26 to
29). Using the minimum expiry among the interests does
not negatively affect SmartData, since the expiry is only used
for prioritization in communication (the lower the expiry,
the higher the priority). Since different SmartData can apply
different expiry constraints, the expiry verification is done
individually using the current time, the sample generation
time t, and the specific SmartData interest requirement.
Finally, we update the minimum expiry requirement issued
by Ωi, represented by ψmin_expiry, to match the minimum
expiry of its consumers (Ωi.ε), whenever ψmin_expiry.ε > Ωi.ε
(lines 30 to 33).
Since Θ is a FIFO that was first populated with
all «Actuator» and «Persistent» SmartData, we
guarantee that all consumers of Ωi have already been
added to the SDG, and thus that their periods have
already been updated accordingly. In this
sense, the timeliness requirements are propagated from the
«Actuator»/«Persistent» SmartData across the entire
interest chain down to the Sensor and Storage SmartData,
adapting sampling to supply the system’s actuation and
persistence by generating data at a period that satisfies the
required periodicity and forwarding it with a prioritization
based on data expiration.
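The backward propagation performed by Algorithm 1 can be sketched in Python as follows. This is a simplified reading under stated assumptions, not the authors' implementation: the Interest record, the build_sdg function, and the dictionary-based attributes are hypothetical names; periods and expiries are integers in the same time unit; and the final adjustment of the minimum expiry (lines 25 and 30 to 33) is omitted.

```python
from collections import deque
from dataclasses import dataclass
from math import ceil, gcd, inf


@dataclass
class Interest:
    consumer: str      # SmartData issuing the interest
    producer: str      # SmartData the interest is issued to
    period: int        # 0 denotes an event-driven interest
    expiry: int
    n: int = 1         # N parameter of the interest mode (assumed 1 by default)


def build_sdg(outputs, interests, unit_size):
    """outputs: {name: period} for Actuator/Persistent SmartData.
    unit_size: {name: size in bytes}. Returns (vertices, edges, attrs)."""
    attrs = {name: {"period": p, "expiry": inf, "reserve": 0}
             for name, p in outputs.items()}
    vertices, edges = list(outputs), []
    queue = deque(outputs)
    while queue:
        node = queue.popleft()
        a = attrs[node]
        # Interests issued TO this node: add edges, update period/expiry.
        for psi in (i for i in interests if i.producer == node):
            edges.append((psi.producer, psi.consumer))   # data-flow direction
            if psi.period > 0:
                p = max(ceil(psi.period / psi.n), psi.expiry)
                a["period"] = p if a["period"] == 0 else gcd(a["period"], p)
            else:
                a["reserve"] += psi.n * unit_size[node]
            a["expiry"] = min(a["expiry"], psi.expiry)
        # Interests issued BY this node: enqueue producers not yet in the graph.
        for psi in (i for i in interests if i.consumer == node):
            if psi.producer not in attrs:
                attrs[psi.producer] = {"period": 0, "expiry": inf, "reserve": 0}
                vertices.append(psi.producer)
                queue.append(psi.producer)
    return vertices, edges, attrs
```

For example, a producer with two periodic consumers requesting (period, expiry) pairs of (100, 50) and (50, 50) receives a period of 50 and an expiry of 50, while an event-driven interest with N = 3 on a 4-byte datum adds 12 bytes to its reserve.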
A. INTEREST ON PERSISTENT SMARTDATA
An interest in an «Actuator» SmartData does not fit the
concepts presented in this work, since this interest would
imply new timing constraints on the actuation.
Nevertheless, an interest in the persistent time-series built
from previous results of this actuation SmartData is possible
(e.g., as needed by Reinforcement Learning algorithms). To
accommodate the interest in persistent data, a
constraint on the initial interval of such an interest must
be established so that the data is produced with
no additional constraints on the actuation itself, fitting the
«Persistent» period. The constraint is set as follows:
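\[
\Big( \psi_{\Omega_j,\Omega_i}.mode.N \cdot \Omega_i.\rho
  + \max\big( \psi_{\Omega_j,\Omega_i}.t_0 \;\; \forall j \in [0, |\Pi|].\ \psi_{\Omega_j,\Omega_i} \in \Psi \big)
  \le \psi_{\Omega_j,\Omega_i}.t_0 \Big)
\wedge \big( \psi_{\Omega_j,\Omega_i}.\rho \ge \Omega_i.\rho \big)
\tag{4}
\]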
In this way, the initial interval of the interest must be
greater than or equal to the time it takes for Ωi to generate
the N data points composing the requested time-series.
Moreover, after the first N elements of the time-series are
produced, the period of the interest must be greater than or
equal to the period of the datum Ωi, such that at every period
of the interest at least one new sample is available.
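The two clauses of constraint (4) can be evaluated with a small helper. The sketch below is illustrative: the function and parameter names are hypothetical, and it assumes that the maximum in the first clause is taken over the start times of the interests already issued to Ωi, passed as existing_t0s.

```python
def persistent_interest_ok(n, producer_period, interest_t0, interest_period,
                           existing_t0s):
    """Check both clauses of the constraint for a new interest on persistent data."""
    # Earliest instant at which N samples of the time-series exist.
    earliest_full_series = n * producer_period + max(existing_t0s, default=0)
    return (earliest_full_series <= interest_t0
            and interest_period >= producer_period)
```

For instance, with N = 10, a producer period of 100ms, and existing interests starting at 0ms and 300ms, a new interest is accepted only if it starts at 1300ms or later and has a period of at least 100ms.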
B. VISUAL DESIGN NOTATION OF AN SDG
Each vertex in an SDG is represented visually by a circle.
The name representing the data properties and the semantics
of the respective unambiguous identifier is presented above or
below the circle (e.g., RGB images generated with 1920x1080
pixels resolution can have multiple unambiguous identifiers,
such as 0, 1, 2, and 3, representing images generated from
cameras placed on the front, left, back, and right sides of a
vehicle). An icon can be used to illustrate the type of the datum.
If an icon is not available, a textual description can
replace it. The UNIT (either digital or SI) of the SmartData
is presented below the icon. The size of the datum (Bytes,
Kilobytes, Megabytes, and so on), or the formula used to
calculate it, is presented below the UNIT. The color of the
vertex represents its criticality, on a color range from
blue (0) to red (1). Oriented edges connecting two vertices
are annotated with the parameters of the interest relationship
ψ. The first value of the tuple is the mode, followed by period,
expiry, and interval. For the sake of simplicity, ψ.interval
and ψ.mode are omitted when ψ.interval = [−∞, ∞] and
ψ.mode = Single.
C. DIDACTIC EXAMPLE OF A SDG
FIGURE 3. A depiction of a Data-Driven Design of a CPS control, including
Sensor, Transformers, and Actuator SmartData.
Figure 3 presents an example of the SDG representation. In
this scenario, three «Actuator» SmartData are defined: A1
is a periodic actuator with a period of 100ms (i.e., frequency
of 10Hz) that produces Volume information used
for actuation (SI UNIT m3), represented in 4 Bytes; A2 is a
periodic actuator with a period of 50ms (i.e., frequency of
20Hz) that produces a Boolean signal used for actuation;
and A3 is an «Actuator» that produces a Boolean
that controls a switch. A3 issues an event-driven Interest
to the acceleration SmartData, represented by S3, with 100ms
expiry. Moreover, 3 event occurrences have been estimated
for the length of this interest (i.e., A3.Dep = {ψA3,S3 =
(0, 100, ED-N = 3)}). Both A1 and A2 are interested
in the «Transformer» SmartData T1, but with different
expiry and periods, A1.Dep = {ψA1,T1 = (100, 50)} and
A2.Dep = {ψA2,T1 = (50, 50)}, and T1 adapts its sampling
following Algorithm 1. T1 is a machine learning algorithm
that builds upon S1 and S2, two temperature sensors. T1
forwards the new constraint T1.ρ = 50ms, with an expiry
of 100ms for S2 and 50ms for S1; thus, T1.Dep = {ψT1,S1 =
(50, 50), ψT1,S2 = (50, 100)}. Moreover, A3 and S3 are
considered critical SmartData, while S1, S2, T1, A1, and A2
are considered less critical, which would imply, for instance,
the need for an isolated communication channel to avoid
collisions with the less critical data.
D. MAXIMUM DATA AGE
In the worst case, a new SmartData sample will be available every
Ωi.ρ. Thus, we can assume its maximum age is equal to
Ωi.ρ. Nevertheless, data consumption is linked to expiry. If
all interests issued by a SmartData have an expiry smaller than
the period (i.e., ∀j ∈ [0, |Π|]. ψΩi,Ωj ∈ Ψ. ψΩi,Ωj.ε < Ωi.ρ),
the maximum age of Ωi is in fact given by the maximum
expiry value (i.e., a sample of Ωi will be consumed at most
ψ.ε units of time after its production). For ED interests, since
ψ.ρ = 0, the maximum age is given solely by the expiry. Finally,
for the ED Urgent mode, where ψ.ρ = ψ.ε = 0, the
maximum age is given by ψ.tf.
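The case analysis above can be summarized in a few lines of code. The following Python sketch is one possible reading of these rules, given as an illustration: the function name max_data_age and its parameters are hypothetical, and at least one expiry value is assumed to be provided.

```python
def max_data_age(period, expiries, tf=None):
    """Worst-case age of a sample, following the case analysis in the text.

    period:   sampling period (0 for event-driven interests)
    expiries: expiry values of the interests involved (non-empty)
    tf:       end of the interest interval, used only in Urgent mode
    """
    if period == 0:                          # event-driven (ED) interests
        worst = max(expiries)
        return tf if worst == 0 else worst   # ED Urgent: bounded by tf
    if all(e < period for e in expiries):
        return max(expiries)                 # consumed before a new sample
    return period                            # a new sample every period
```

For a period of 100ms, expiries of 50ms and 80ms give a maximum age of 80ms, whereas expiries of 50ms and 150ms leave the maximum age at the period of 100ms.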
E. DERIVING NETWORK CAPACITY AND BANDWIDTH
WITH A SMARTDATA GRAPH
We can estimate a lower bound for bandwidth at design
time by combining the resulting sampling rate (Ωi.ρ), the
bandwidth reservation Ωi.reserve (event-driven), and the data
size Ωi.type.UNIT.size, extracted from the SDG
generated using Algorithm 1. Thus, we can estimate the
bandwidth of a SmartData Ωi as Ωi.bandwidth = Ωi.reserve +
Ωi.period * Ωi.type.UNIT.size¹. Based on the attributes of the
«Secure» stereotype, Ωi.type.UNIT.size must be increased
to account for additional bytes related to security attached to
the communication payload (e.g., the size of signatures). We
can derive the total lower-bound bandwidth for each
communication medium by summing up the lower-bound bandwidth of
the SmartData sharing the same medium.
In our previous work [36], we demonstrated that, through
the SmartData principles and the definition of period and
expiry, we can obtain a design-time schedulability analysis of
time-triggered wireless sensor networks. The method
is based on a list of interests issued to the network, a list of
hops to a sink to which information flows, a Margin of Safety
for communication (reserved bandwidth), and the Mac Rate
of the network.
In the context of the SDG, this schedulability analysis
only envisions communication defined over a given
bandwidth. The algorithm ignores the latency associated with data
generation and dependency, and the respective adaptation to
account for such definitions will be addressed in future work.
¹ For bandwidth to be represented in bytes per second, Ωi.type.UNIT.size
is expected to be given in bytes and Ωi.ρ in seconds.
To adapt the SDG built by Algorithm 1, we must first
define the hops from source to destination for each interest
in the system. To include shared memory in the scheduling
analysis, we must define the number of hops as 1 and the
Mac Rate as the period of the SmartData sharing a
communication medium. The algorithm proposed in [36] is based on a
route(source, destination) method that routes the path from
source to destination according to the number of hops from
each node to the sink. In this case, this list shall be replaced
by a matrix representing the connection, in hops, between
all nodes. The absence of a connection is represented by ∞.
Therefore, a system designer can define the route(source, destination)
method as a static mapping or as the shortest path between
each pair of nodes, for instance, using Dijkstra’s Shortest Path
Algorithm over the aforementioned matrix. Finally, given
a Mac Rate for communication and the interest definition based
on source, destination, period, and expiry, we are able
to check for communication schedulability and, therefore,
system capacity.
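A route(source, destination) method based on the hop matrix could be sketched as follows. This is a minimal Python illustration of Dijkstra's algorithm over such a matrix, not the method of [36]: the function signature, the use of node indices, and the return value (a list of node indices, or None when the destination is unreachable) are assumptions.

```python
import heapq
from math import inf


def route(hops, source, destination):
    """Shortest path (in hops) over a matrix where inf means no connection."""
    n = len(hops)
    dist = [inf] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale queue entry
        if u == destination:
            break
        for v in range(n):
            w = hops[u][v]
            if v != u and w != inf and d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (dist[v], v))
    if dist[destination] == inf:
        return None                       # destination unreachable
    # Walk the predecessor chain back from the destination.
    path = [destination]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]
```

A static mapping can replace this function whenever the designer prefers fixed routes, since the schedulability check only needs the resulting path for each interest.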
VI. CASE STUDY: IMITATION LEARNING-BASED
AUTONOMOUS DRIVING AGENT IN CARLA
The use case of choice is an Autonomous Driving
agent based on the conditional imitation learning algorithm
proposed by Codevilla et al. [43]. The autonomous driving agent
is built over a benchmark that includes autonomous driving
simulations built using CARLA [44], an open-source urban
driving simulator implemented using Unreal Engine 4, which
provides professionally designed towns with buildings,
vegetation, and traffic signs, as well as vehicular and pedestrian
traffic.
We chose this example due to the complexity associated
with the design of an autonomous driving agent and how it
can be more easily designed with a data-driven approach. An
autonomous driving agent periodically evaluates the vehicle
condition and correlates it with its goal following a specific
model. In the scenario proposed by Codevilla et al. [43], the
autonomous driving agent expects images and other
measurements alongside navigational commands to produce the next
set of inputs to control the autonomous vehicle. The agent
is powered by an Imitation Learning algorithm built over the
main inputs a human driver uses when driving a vehicle:
images from the current scenario, encompassing frontal and
sideways cameras, the current vehicle speed, and the path to
reach its destination.
The authors collected data from an expert driver who was
controlling a vehicle in the simulation environment. The
driver is presented with several simulation scenarios on a
first-person view of the environment (i.e., central camera)
at a resolution of 800 x 600 pixels. The driver follows the
requested path while avoiding collisions and respecting speed
limits but ignoring traffic lights and stop signs. The authors
periodically sampled the available sensors and the control
variables during the simulations. The authors also applied
data augmentation techniques to add noise to the training
dataset. Moreover, images are preprocessed after sampling to
remove unnecessary information, by cropping the sky and
resizing. The Imitation Learning model is trained
on the sensor readings and the respective actions taken
by the expert driver.
In this sense, the Imitation Learning inputs comprise the
preprocessed images, the normalized vehicle speed, the speed
limit, and a directional command. The speed limit is stored
information extracted from the path the vehicle must follow
in order for the vehicle to obey previously established speed
limitations. The direction command is a categorical feature
extracted from the vehicle’s path to reach its destination and
stored before the simulation starts. Their ML model com-
prises four commands: continue (follow the road), left (turn
left at the next intersection), straight (go straight at the next
intersection), and right (turn right at the next intersection).
The CARLA simulator also provides measurements like dis-
tance traveled, current position, and occurrence of infractions.
However, they are not used in this model.
The model outputs an action command that guides the
vehicle during the simulation at each step. In this benchmark,
the actuation must be performed with a frequency of 10Hz,
thus defining a period of 100ms. An action comprises 3
control data: steering wheel angle (in radians) and throttle and
brake intensity (in m/s2). On the other hand, the controller
expects five control data: the three outputted by the ML
model and two on/off switches. The switches represent the
hand brake and the reverse gear usage, which are always off
during the simulations. Lastly, the output action goes through
a correction step before being returned to the simulator. The
throttle intensity is adjusted based on the speed limit, and the
brake is adjusted by zeroing negligible values (<0.1).
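The correction step can be sketched as a small post-processing function. In the Python sketch below, only the brake rule (zeroing values below 0.1) comes from the text. The throttle rule, which cuts the throttle once the speed limit is reached, is an assumption made for illustration, as are the function and parameter names.

```python
def correct_action(throttle, brake, speed, speed_limit, brake_threshold=0.1):
    """Post-process the model output before returning it to the simulator."""
    # Assumed rule (not given in the text): cut the throttle once the
    # vehicle has reached the speed limit.
    if speed >= speed_limit:
        throttle = 0.0
    # Stated rule: brake values below the threshold are treated as zero.
    if brake < brake_threshold:
        brake = 0.0
    return throttle, brake
```

The steering angle is left untouched, since the text does not describe any correction for it.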
A. DECOMPOSITION INTO SMARTDATA
First, we decompose the system into functional blocks for the
autonomous driving agent, including sensing, perception,
decision making, and control. The system is defined over 5
Control signals, namely throttle, steer, brake, hand brake, and
reverse gear. These high-level signals need to be updated
every 100ms according to the system requirements. Therefore,
the decision-making layer that produces these values must
follow this period. The inputs required by this driving
agent to make a decision include the preprocessed images
and speed, from the perception layer, and the speed limit and
direction command, coming from local storage (i.e., sensing
layer). Finally, the preprocessing requires data
from the sensing layer, namely the speed and the images from
three positions: front, left, and right.
1) Timing
The Autonomous Driving Agent must actuate at a
frequency of 10Hz. Thus, actuation in the system follows a
period of 100ms. The expiry for this model can be defined
as a dynamic requirement of the system since, in reality, the
temporal validity of a data sample is relative to the vehicle
speed [26]. The faster the vehicle is running, the faster the
data expires. In other words, at higher speeds, even when
following a period of 100ms, the temporal validity of the data
used for the decision is crucial to achieving correct
decision-making, as using data that does not faithfully represent
the current state of the vehicle can lead to a fatality in a
critical scenario. In this way, whenever an actuation is taken,
it is possible to estimate the vehicle speed for the next period
based on the current speed and the throttle and brake data issued,
and use this information to update the expiry requirement
and, thus, the priority of data communication and
transformation.
Nevertheless, even in extreme scenarios, an expiry requirement
shall never be smaller than the latency of one actuation round
(the estimated latency from the first sampling to the end of
the actuation). Moreover, even though expiry can be dynamic,
it is bounded by known values derived from the vehicle speed
bounds, from 0m/s (stopped) up to the current value of the Speed
Limit of a road segment. With this information available at
design time, alongside requirements like security, the
system’s designers can better select the infrastructure
necessary to guarantee timeliness and robustness, such as assigning
parallel communication channels, increasing bandwidth, or
improving the capacity of processing units, both in terms of
processing power and available memory.
2) Interest Modes
For the Driving Agent case study, all interests are given with
the mode Single. For the specific scenarios of interests issued
to the Remove Sky and Resize Image SmartData, where three
images are produced, each with a unique disambiguation
Id for its UNIT, three interests must be issued, one for each
disambiguation Id.
3) Data Criticality
The criticality attribute represents the importance of the data
and the impact of errors on data communication. For instance,
communication errors (e.g., bit inversions) have a far higher
impact on the actuation results like brake and throttle intensity
and steering wheel angle than if the same error occurred
while communicating one of the 1.84MB images. Hand Brake
and Reverse Gear data are not covered by the model under
investigation in this example and are set as a fixed value
(always off). Thus, no communication is required to produce
such data, leading to the lowest criticality in the system. In
this sense, the criticality of the data in such an initial state
of the design can yield a notion of robustness requirement
for specific communication paths, raising possible risks over
the final model that must be attended to when selecting the
communication technology and which data will share com-
munication channels.
4) Actuator Stereotype
The system is defined over 5 «Actuator» SmartData, each
of which must actuate at the frequency of 10Hz. Thus, actu-
ation in the system follows a period of 100ms. In this sense,
the actuators are the data that compose a control command as
depicted in Figure 4, namely, the Steering Wheel, Throttle
Intensity, Brake Intensity, Hand Brake, and Reverse Gear.
Each «Actuator» SmartData is tagged with its respective
SI or DIGITAL UNIT, giving the data semantics over format
and size. Brake and Throttle Intensities are accelerations in
m/s2 with a size of 4B. The Steering Wheel is represented as
an angle in radians with a size of 4B. Hand Brake and Reverse
Gear are DIGITAL data representing an on/off switch, with a
size of 1B (Boolean). Hand Brake and Reverse Gear are static
data, always set to off. However, the simulator input expects
these values. Thus, bandwidth reservation is necessary for
these actuators. Each «Actuator» SmartData (except for
Reverse Gear and Hand Brake) is interested in the Imitation
Learning Model «Transformer» SmartData.
FIGURE 4. A depiction of the «Actuator» SmartData composing the
Autonomous Driving Agent case study.
5) Transformer Stereotype
FIGURE 5. A depiction of the «Transformer» SmartData composing the
Autonomous Driving Agent case study.
The Autonomous Driving Agent system is composed of 4
«Transformer» SmartData, depicted in Figure 5 and described below:
• The first «Transformer» SmartData to be defined
are those required directly by the Actuator SmartData:
in this scenario, the Imitation Learning Model. The
Imitation Learning Model is interested in the Direction
Command, the preprocessed images (Resize Image),
the Speed Normalization, and the Speed Limit
SmartData. The Imitation Learning Model outputs the
commands for steering angle and for brake and throttle
intensity. These data are represented using the SI UNIT for
angle (radians) and acceleration (m/s2), respectively.
• Speed Normalization «Transformer» SmartData.
This Transformer provides speed information in the
range [0, 1] based on the road/benchmark configuration.
It is represented as a velocity SI UNIT with 4B. This
SmartData is interested in the Speed «Sensor» SmartData.
• Resize Image «Transformer» SmartData. This
Transformer provides 3 RGB images resulting from
resizing the 3 RGB images provided by the Remove
Sky «Transformer» SmartData. These images are
represented by a Digital UNIT representing an RGB
image with a maximum size of 68.75KB, each with
a unique disambiguation Id for this UNIT inside the
system.
• Remove Sky «Transformer» SmartData. This
Transformer provides 3 RGB images resulting from the
removal of the sky portion of the original images output
by the Camera «Sensor» SmartData. These
images are represented by a Digital UNIT representing
an RGB image with a maximum size of 925.8KB, each
with a unique disambiguation Id for this UNIT inside the
system.
Timing constraints associated with the Transformer Smart-
Data are derived from the requirements of the SmartData
interested in this SmartData and will be discussed later in
Section V. The same will apply to the remaining SmartData in
this model since they are set to supply (directly or indirectly)
the same SmartData from the output layer.
6) Sensors Stereotype
The Autonomous Driving Agent system is composed of 4
sensors, depicted in Figure 6 and described below:
• 3 RGB Camera «Sensor» SmartData. Each camera
outputs an RGB image represented by a Digital UNIT
with a maximum size of 1.84MB. The cameras cover the
Frontal, Left, and Right views from the vehicle. Each
camera is defined with a unique disambiguation Id for
this UNIT inside the system.
• Speed «Sensor» SmartData. This SmartData provides
a vehicle speed measure and is represented by the
velocity SI UNIT (m/s).
7) Storage Stereotype
Direction Command and Speed Limit are «Storage»
SmartData, as previously described, extracted from the path
the vehicle is expected to follow. The two «Storage»
SmartData composing this case study are depicted in Figure
6. The Speed Limit is represented by a velocity SI UNIT
(m/s) with 4B. The Imitation Learning Model uses the Speed
Limit to dictate the throttle output when the vehicle is already
at the maximum allowed speed. The Direction Command is
a categorical feature represented as a Digital UNIT of a 4B
integer. The Imitation Learning Model uses the Direction
Command to dictate the direction the vehicle should take at
the next intersection to reach its destination.
FIGURE 6. A depiction of the «Sensor» and «Storage» SmartData
composing the Autonomous Driving Agent case study.
8) Secure and Persistent Stereotypes
In this case study, all SmartData used by the Machine Learning
process (Imitation Learning «Transformer» SmartData), along
with the «Actuator» SmartData, are tagged with the
«Persistent» stereotype (except for the Storage SmartData,
which are already persistent by definition). Moreover, since
they interact with storage facilities, the «Secure» stereotype is
also applied to these SmartData to represent the entry point to
interactions with non-isolated environments. These stereotypes
are represented in the SDG by annotations with illustrations of
storage (a hard-disk drive) and security (a lock). For the
«Persistent» stereotype, a monitoring rate of 10Hz is set,
resulting in a monitoring period of 100ms, the same as the
actuation period. For the «Secure» stereotype, an
application-specific configuration named Sec1 is set.
Persistency is modeled here as monitoring the actuation signals
and the respective inputs for offline analysis (e.g., retraining,
simulation validation, or auditing). The Sec1 configuration, for
this example, is taken as the need to encrypt data for
communication and storage, considering confidentiality and
authenticity. This need shall be handled by the secure
communication protocol (e.g., TSTP/FT-TSTP [45], [46],
including secure bootstrap procedures, key exchange,
authentication, and encryption) and by the storage mechanism
of choice (e.g., cloud storage or internal logging in
non-volatile memory).
To build a dataset to train the Imitation Learning model,
the authors performed a round of simulations in which an
expert driver was in control of the vehicle. The data later
used as input were collected alongside the expert driver's
actuation, namely steering wheel angle and brake and
throttle intensity. In this sense, the persistency modeled here
also applies if the Imitation Learning model is removed from
the flow and the driver commands are taken instead, with the
actuators modeled as «Sensor» SmartData.
B. SMARTDATA GRAPH
Following the description presented in the previous sections,
we can see that complex cyber-physical systems are them-
selves driven by data, especially those using AI-based solu-
tions. The SDG that represents the system requirements at
design time is depicted in Figure 7.
In this figure, the Brake Intensity, Steering Wheel, and
Throttle Intensity «Actuator» SmartData issue an interest in
their respective command data produced by the Imitation
Learning Model, with ρ = 100ms and E = E1. These three
actuators are tagged for persistency, with ρ = 100ms, and for
security, with configuration 1 (Sec1). As previously described
in Section VI-A, Sec1 requires encryption of data for
communication and storage, considering confidentiality and
authenticity, which the secure communication protocol shall
handle. As discussed in Section VI-A, the Hand Brake and the
Reverse Gear «Actuator» SmartData are static and have
no interest in other SmartData.
The Imitation Learning Model «Transformer» SmartData
requires the Speed Limit, Direction Command, Resize Image,
and Speed Normalization SmartData. The interests issued
to Remove Sky and Resize Image are each represented
as a single arrow to avoid overloading the image. The size
of these SmartData is represented as 3∗imagesize, since
three images are communicated. Moreover, the
Imitation Learning Model is also tagged with the persistent and
security stereotypes in the same configuration as the Brake
Intensity, Steering Wheel, and Throttle Intensity «Actuator»
SmartData. Following Algorithm 1, the interests issued by the
Imitation Learning Model shall follow the greatest common
divisor of the periods and expiries of its Interested set (i.e.,
Brake Intensity, Steering Wheel, and Throttle Intensity), thus
yielding ρ = 100ms and E = E1.
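The combination rule just described can be sketched in a few lines of C++17. The Interest struct, the combine function, and the numeric value used for the expiry in the usage note are illustrative assumptions (the text gives E1 only symbolically); the sketch is not the implementation of Algorithm 1, only the greatest-common-divisor step applied to an Interested set.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

using Microsecond = std::uint64_t;

// Hypothetical, simplified view of one Interest: only timing is kept.
struct Interest {
    Microsecond period;
    Microsecond expiry;
};

// Combine the requirements of all interested SmartData by taking the
// greatest common divisor of their periods and of their expiries.
// Returns {0, 0} for an empty Interested set.
Interest combine(const std::vector<Interest>& interested) {
    Interest result{0, 0};
    for (const Interest& i : interested) {
        result.period = std::gcd(result.period, i.period);  // gcd(0, x) == x
        result.expiry = std::gcd(result.expiry, i.expiry);
    }
    return result;
}
```

For three actuators that all request a period of 100,000 microseconds, the combined period remains 100,000 microseconds, matching the ρ = 100ms derived in the text. If the requests differed, for instance 100ms and 150ms, the sketch would return 50ms, so that a single production rate serves both.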
The Speed Normalization «Transformer» SmartData
normalizes the data of the Speed sensor. As previously described
in Section VI-A, this SmartData is also tagged with persistency
and security stereotypes in the same way as the Brake Intensity,
Steering Wheel, and Throttle Intensity «Actuator» SmartData.
Accordingly, the interest issued by the Speed Normalization
shall follow ρ = 100ms and E = E1.
The Resize Image «Transformer» SmartData requires
the Remove Sky SmartData. As previously described in Sec-
tion VI-A, this «Transformer» SmartData is also tagged
with persistency and security stereotypes in the same way
as Brake Intensity, Steering Wheel, and Throttle Intensity
«Actuator» SmartData. Accordingly, the interest issued
by Resize Image shall follow ρ= 100ms and E=E1.
The Remove Sky «Transformer» SmartData requires
the Left Camera, Frontal Camera, and Right Camera SmartData.
As previously described in Section VI-A, this
«Transformer» SmartData is also tagged with persistency
and security stereotypes in the same way as the Brake
Intensity, Steering Wheel, and Throttle Intensity «Actuator»
SmartData. Accordingly, the interest issued by Remove Sky
shall follow ρ = 100ms and E = E1.
[Figure 7 content: Frontal Camera, Left Camera, and Right Camera (RGB, 800x600p, 1.84MB each); Remove Sky (Crop; RGB, 395x600p, 3*925.8KB); Resize Image (Resize; RGB, 88x200p, 3*68.75KB); Speed (m/s, 4B); Speed Normalization; Direction Command (int, 4B); Speed Limit (labeled m/s2, 4B); Imitation Learning Model (action*, 12B; * action = [rad, m/s2, m/s2]); Brake Intensity (m/s2, 4B); Steering Wheel (rad, 4B); Throttle Intensity; Hand Brake (Boolean, 1B); Reverse Gear. Interest edges are labeled 100ms,E1; annotated nodes carry persistency (100ms) and security (Sec1) marks; a Data Criticality scale ranges from 0 to 1.]
FIGURE 7. A depiction of the SDG of the Imitation Learning-based Autonomous Driving Agent.
«Sensor» SmartData, namely Left Camera, Frontal
Camera, Right Camera, and Speed, have an empty Interest
set. They should follow their Interested set requirements for
data sampling, i.e., ρ= 100ms and E=E1. The same applies
to the Direction Command and Speed Limit «Storage»
SmartData. Moreover, these two «Storage» SmartData
are also tagged with the security stereotype to ensure their
authenticity.
C. EXPERIMENTAL RESULTS
The integration of SmartData with simulators is proposed
in [47] (the source code is available at
https://gitlab.lisha.ufsc.br/iot/smartdata-linux/-/tree/Transparent_Integration_of_AV-DAEM).
The integration consists of externalizing all sensors
inside the simulator to the SmartData API through a
SmartData_Handler, which communicates the data to
the corresponding SmartData, acting as a Transducer.
A SmartData_Service holds the Sensor wrapper
implementation and forwards the data into a SmartData
network, a shared Message Bus that communicates all
interest requests and the respective responses whenever a
sensor is updated. Similarly, the SmartData_Handler
also connects Actuator SmartData back into the
simulation. Each SmartData_Service is implemented
according to the SDG presented in Figure 7. To connect
the CARLA simulation with the SmartData network, the
SmartData_Handler was implemented as a client
connected to the simulator using the CARLA Client API.
The CARLA Client API enables the customization of the
simulation during execution, for instance, adding new vehicles,
adding new sensors, and collecting information regarding all
objects inside the simulation (see https://carla.readthedocs.io/
for more details on the CARLA Client API).
The same case study explored here was also explored in
[47]. The experimental results are based on a CARLA server
running the driving benchmark based on the CoRL2017
experiment suite [44], including tasks with straight navigation,
one turn, and others. The SmartData_Handler starts the
simulation and receives periodic interruptions from the simulator
to read the relevant data and return the control commands
for the AV. Communication inside the SmartData network
follows the Trustful Space-Time Protocol (TSTP) [45].
TSTP encompasses the notions of
timing, semantics, and security expressed by SmartData. In
this sense, TSTP promotes issuing Interest, Response, and
Advertise messages, which cover the online registration of
Interest relations.
[Figure 8 plot; horizontal axis: Scenarios.]
FIGURE 8. Comparison of end-to-end latency of CARLA Client and the
addition of SmartData communication layer [47].
The average end-to-end latency for the simulation is
presented in Figure 8. The baseline CARLA client interacting
with the simulation (index 0) presented an average
end-to-end latency of 18.839ms with
a standard deviation of 1.228ms. The SmartData
implementation interacting with the simulator (index 1) presented
an average end-to-end latency of 33.298ms, with a standard
deviation of 5.941ms. The increase in end-to-end latency reflects
the implications of a real-world application, in contrast with
the regular CARLA client, which runs all applications inside
a single program and thus ignores the latency associated with
data communication between system components (e.g., sensors
to ECU, and ECU to actuators).
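To put the two averages in perspective, the overhead can be worked out directly from the reported values and set against the 100ms actuation period used in the SDG. The comparison with the period is a reading aid added here under the assumption that the measured latency is to be accommodated within one actuation period; it is not a result reported in [47].

```latex
\begin{align*}
\Delta_{\text{latency}} &= 33.298\,\text{ms} - 18.839\,\text{ms} = 14.459\,\text{ms}, \\
\frac{33.298\,\text{ms}}{18.839\,\text{ms}} &\approx 1.77, \\
\frac{33.298\,\text{ms}}{100\,\text{ms}} &\approx 0.33 .
\end{align*}
```

That is, the SmartData communication layer adds about 14.5ms on average (roughly a 77% increase over the baseline), and the resulting average latency occupies about one third of the 100ms period.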
In terms of coding, a SmartData is defined according
to the parameters presented in Section IV. Algorithm 2
presents a code snippet defining a local «Sensor» SmartData
(one of the cameras from the example) and
a «Transformer» SmartData, the Imitation Learning
Model, which uses remote SmartData (and will therefore
generate Interest messages) to produce a new instance.
A «Sensor» SmartData specifies which transducer will be
employed (line 2), and the transducer implementation handles
the interaction with the physical world and defines the UNIT
of the SmartData. The SmartData base class (Responsive
SmartData, constructor presented in line 1) handles
communication and timing, while extracting location from the
system parameters, all transparently to the final user. The
instantiation of the SmartData object in the example specifies
the disambiguation identifier named device, the baseline expiry
(for local usage), whether the SmartData is private or will be
advertised in the communication bus, and the baseline period
(for local usage) (line 3). A «Transformer» SmartData specifies
which transformation will be employed (line 17), encapsulated
in a Transformer, which declares interests (space-time region,
device, and associated temporal constraints) in all data
necessary to compute the transformation. First, we define a proxy
through the definition of an Interested_SmartData type for
each input based on its UNIT (line 5 for the example of the
Resized_Image_Proxy). The constructor of the Transformer,
in this example the Imitation_Learning_Model_Transformer,
declares the interest in each input (lines 7 to 9), namely, the
resized images, direction command, and normalized speed,
all generated within a sphere with diameter 2 meters
centered at the center of the vehicle, from the current timestamp
until the end of the system execution, following a given
expiry and period with mode SINGLE and the respective
disambiguation device. The Transformer attaches itself to the
remote SmartData, so that any update to their values notifies
it (Concurrent-Observer) (lines 10 to 12). Finally,
it declares its outputs as private (lines 13 to 15). In this
example, the outputs will compose a Multi-Unit SmartData
that will be advertised at the declaration of the new instance
of the transformer (line 17 presents the declaration of the
respective Responsive_SmartData type, and the new instance
is presented in line 18).
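The attach-and-notify mechanism described above can be illustrated independently of the SmartData code base. The sketch below is a self-contained mock: the classes Proxy and Sum_Transformer, the receive() method, and the use of std::function callbacks are all assumptions made for illustration, and none of them belongs to the SmartData API (which, as Algorithm 2 shows, uses attach(this) on Interested_SmartData proxies). It only shows how an update of any input can trigger the recomputation of a transformer's output.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a remote input (an Interested SmartData proxy).
class Proxy {
public:
    using Observer = std::function<void()>;
    void attach(Observer obs) { _observers.push_back(std::move(obs)); }
    // Called when a response carrying a new value arrives.
    void receive(float value) {
        _value = value;
        for (auto& obs : _observers) obs();  // notify attached transformers
    }
    float value() const { return _value; }
private:
    float _value = 0.0f;
    std::vector<Observer> _observers;
};

// Hypothetical transformer: recomputes its output on every input update.
class Sum_Transformer {
public:
    Sum_Transformer(Proxy& a, Proxy& b) : _a(a), _b(b) {
        _a.attach([this] { update(); });
        _b.attach([this] { update(); });
    }
    void update() { _out = _a.value() + _b.value(); ++_updates; }
    float output() const { return _out; }
    int updates() const { return _updates; }
private:
    Proxy& _a;
    Proxy& _b;
    float _out = 0.0f;
    int _updates = 0;
};
```

In this mock, each call to receive() on an input triggers exactly one update() of the transformer, which mirrors the role of lines 10 to 12 of Algorithm 2. Because the callbacks capture this, the transformer in the sketch must outlive its proxies' notifications and must not be copied or moved.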
A «Storage» SmartData is similar to a «Sensor»
SmartData, except that its transducer interacts with a database
or a local file. An «Actuator» SmartData is implemented as a
Sensor SmartData whose Transducer is defined as an
actuator instead of a sensor. Finally, adding «Secure» or
«Persistent» leads to the aggregation of the respective
operations in the update() method.
VII. DISCUSSIONS
A Data-Driven Design of a Critical System can be built
directly from the system data requirements in early stages. With
the notion of how actuation is expected for this system (e.g., at
a given rate or through an event detection), we can capture
the system complexity by modeling the dataflows necessary
to perform this actuation. By enriching the data properties
with semantics, timing, security, and criticality, we enable
a straightforward representation of the system's complexity
that can provide notions regarding the lower bounds of the
system. For instance, in the SmartData Graph of the system,
the size of SmartData is derived from their UNIT. By building
the data relations from the actuators to the transformers
and sensors, we make the SDG adapt its sampling based
on Interest requests. Thus, we can estimate the number of
messages SmartData will generate from the resulting s.period
obtained ∀s ∈ SDG.V from the SDG returned by
Algorithm 1.
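The estimate mentioned above can be sketched as follows. The Vertex struct and both functions are hypothetical helpers introduced for illustration; they take only the size derived from the UNIT and the resulting s.period, and they ignore protocol headers, retransmissions, and fragmentation, which a real analysis would have to add.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Microsecond = std::uint64_t;

// Hypothetical, reduced view of an SDG vertex: payload size and period only.
struct Vertex {
    std::size_t size_bytes;  // derived from the UNIT
    Microsecond period;      // s.period resulting from the SDG construction
};

// Messages per second generated by one vertex (0 if it is not periodic).
double messages_per_second(const Vertex& v) {
    return v.period == 0 ? 0.0 : 1e6 / static_cast<double>(v.period);
}

// Payload bytes per second over all vertices, ignoring protocol overhead.
double payload_bytes_per_second(const std::vector<Vertex>& vertices) {
    double total = 0.0;
    for (const Vertex& v : vertices)
        total += static_cast<double>(v.size_bytes) * messages_per_second(v);
    return total;
}
```

For example, the Speed SmartData (4B) and the 12B action output of the Imitation Learning Model, both with a 100ms period, each generate 10 messages per second and together account for 160 B/s of payload under these simplifications.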
With the notion of error tolerance for specific data, we can
also identify the critical paths in the system. Thus, designers
can choose a more reliable technology for communication
and processing in critical data flows while allocating
resources based on performance or energy consumption for
low-criticality paths. For instance, image data requires a faster
communication speed due to its size but can also tolerate
higher noise in the data than control commands. In addition,
the notion of expiry makes available a new definition of
priority for communication and processing in a data-driven
system, tailoring the system to handle strict timing
requirements by prioritizing the communication and consumption
of data with shorter expiry. Moreover, tasks in between
data sources and actuators, which exist ephemerally to perform
transformations, can derive their timing requirements and
priorities directly from the expiry [48].
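One way to picture expiry-driven prioritization is a queue that always serves the datum closest to expiring and discards data that are already stale. The sketch below is an assumption-laden illustration, not the mechanism of [48] or of the SmartData implementation: the Pending struct, the Later_Expiry comparator, and next_valid are hypothetical names, and expiry is represented as an absolute instant in microseconds.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using Microsecond = std::uint64_t;

// Hypothetical pending item: a datum waiting to be sent or consumed.
struct Pending {
    int id;                       // illustrative identifier
    Microsecond absolute_expiry;  // instant after which the datum is stale
};

// Orders the queue so that the item with the earliest expiry is on top.
struct Later_Expiry {
    bool operator()(const Pending& a, const Pending& b) const {
        return a.absolute_expiry > b.absolute_expiry;
    }
};

using Expiry_Queue =
    std::priority_queue<Pending, std::vector<Pending>, Later_Expiry>;

// Pops the most urgent item that is still valid at time `now`;
// expired items are discarded. Returns false if none is left.
bool next_valid(Expiry_Queue& q, Microsecond now, Pending& out) {
    while (!q.empty()) {
        Pending p = q.top();
        q.pop();
        if (p.absolute_expiry > now) {
            out = p;
            return true;
        }
    }
    return false;
}
```

With this ordering, a control command with a short expiry is served before a bulk datum with a longer one, and a datum whose expiry has passed is never delivered to a consumer.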
Data has size. Physical resources have capacity estimates
to handle data. Networks have communication protocol
definitions (e.g., packet size, concurrency, duty-cycling).
Stream processing units have throughput estimates. Sensors
and actuators have response time estimates. Combining the
proposed design with these estimates allows a system-wide
latency analysis to be derived more promptly. For instance,
in previous work [36], we demonstrated that by specifying
the data communication through interests with periods and
expiries, we can estimate at design time the schedulability of
a time-triggered wireless sensor network while encompassing
a margin-of-safety notion as a resource reservation metric to
enable fault tolerance.
From embedded systems to IoT and Cloud Platforms,
SmartData have been featured as a developing concept for
multiple applications in the past, even before its formal
definition in 2018 [11]. In [49], [50], SmartData were used
as the API for developing a SmartBuilding at the Federal
University of Santa Catarina. By using SmartData, the
proposed approach was capable of intelligent energy
management in the building while providing data persistence through
monitoring integrated with LISHA's IoT Platform
(https://iot.lisha.ufsc.br). Similarly, SmartData have also been
applied in the WSN context for hydrology monitoring, including
precipitation, water level, and water flow [51].
Algorithm 2 SmartData Examples in C++
// Responsive SmartData constructor
1: Responsive_SmartData(Device_Id dev, Time expiry, Mode mode = PRIVATE, Microsecond period = 0)
...
// Specifying a Responsive SmartData type for the Camera using its Transducer via C++ template
2: using Camera = Responsive_SmartData<Camera_Transducer>;
// Declaration of a new SmartData instance that advertises its datum in the network
3: Camera cam(17, expiry, ADVERTISED);
...
// Interested SmartData constructor
4: Interested_SmartData(const Region & region, Time expiry, Microsecond period = 0, Mode mode = SINGLE, Device_Id device = 0)
...
// Specifying a new Interested SmartData type using the Resized Image Transformer UNIT via C++ template
5: using Resized_Image_Proxy = Interested_SmartData<Resized_Image_Transformer::UNIT>;
...
// Declaration of Interests in the Imitation Learning Model Transformer constructor
6: Imitation_Learning_Model_Transformer(const Device_Id & dev) {
7: _ip = new Resized_Image_Proxy(Region(0, 0, 0, 2, now(), INFINITE), EXPIRY, PERIOD, SINGLE, 19);
8: _dir = new Direction_Proxy(Region(0, 0, 0, 2, now(), INFINITE), EXPIRY, PERIOD, SINGLE, 1);
9: _sp = new Speed_Norm_Proxy(Region(0, 0, 0, 2, now(), INFINITE), EXPIRY, PERIOD, SINGLE, 2);
// Attach this Transformer to its inputs (Concurrent Observer) to trigger update() on reception
10: _ip->attach(this);
11: _dir->attach(this);
12: _sp->attach(this);
// Declaration of outputs that will be updated at every update()
13: _ang = new Angle(3, EXPIRY, PRIVATE);
14: _at = new Acceleration(4, EXPIRY, PRIVATE);
15: _br = new Acceleration(5, EXPIRY, PRIVATE);
16: }
...
// Specifying a Responsive SmartData type for the Imitation Learning Model using its Transformer via C++ template
17: using Imitation_Learning_Model = Responsive_SmartData<Imitation_Learning_Model_Transformer>;
// Declaration of a new Imitation_Learning_Model SmartData instance that advertises its datum in the network (Multi-Unit SmartData)
18: Imitation_Learning_Model ilm(12, expiry, ADVERTISED);
More recent developments include the Intelligent Acqui-
sition and Analysis System for ECUs (IASE) project [52], a
joint effort of LISHA and Renault do Brasil, which developed
a low-cost data acquisition system based on SmartData to
collect ECU data from vehicles and promote near real-time
data upload and storage in a cloud platform. Based on this
platform, we have investigated the definition of the safety
properties of SmartData time-series based on the monitor-
ing specification, without unveiling the internal relations of
data. Cloud-based [53] and embedded-based verification [54]
have been proposed. An integration of SmartData and Formal
Methods for the online verification of safety properties based
on Safety Models has also been explored in [32], [55],
following the design principles of expiry and period to promote
the definition of a global Safety Enforcement Unit.
VIII. CONCLUSION
The knowledge acquired during the last decades of develop-
ment culminated in SmartData as a tool for designing and im-
plementing data-driven systems. In this paper, we presented
a novel design approach for Data-Driven Critical Systems,
where, instead of relying on the definition of tasks, our design
focuses on the specification of the system over the data it
produces and handles. In this way, our design promotes a
solution for modeling the system over data dependencies and
their respective requirements, like timing, security, criticality,
and communication. Data is modeled broadly, from physical
data sources to data that drive actuators or must persist. Data
is associated with the concept of temporal validity, expressed
through an expiry. Expiry enables a measure to ensure validity
in the data processing, where only the most up-to-date data is
considered for decision-making.
In this paper, we focused on a case study of an autonomous
vehicle, since its complexity summarizes the challenges of the
previous case studies, including security, location, persistency,
timing, and safety. At the same time, the abstraction of image
processing, path planning, and motion planning steps in the
autonomous vehicle approach described by Codevilla et al.
in [43] enabled a succinct example of the proposed SDG.
Moreover, in [47] we have also covered the modeling of
autonomous vehicle components considering a driving agent
not based on Imitation Learning, but on specific modules,
including Vision-based Perception, Dynamics sensor, Object
Recognition and Tracking, V2X communication, Path
Planning, Motion Planning, and High-Level Control. That paper
defines the UNITs and data dependencies in order
to model components as SmartData services in a framework
for simulator integration with Hardware- and
Software-in-the-loop capabilities. Nevertheless, the complete modeling of
such components altogether in an SDG, evaluating its timing
properties, is still on the roadmap for future work.
REFERENCES
[1] Junping Zhang, Fei-Yue Wang, Kunfeng Wang, Wei-Hua Lin, Xin Xu, and
Cheng Chen. Data-driven intelligent transportation systems: A survey.
IEEE Transactions on Intelligent Transportation Systems, 12(4):1624–
1639, 2011.
[2] Hao Ye, Le Liang, Geoffrey Ye Li, JoonBeom Kim, Lu Lu, and May Wu.
Machine learning for vehicular networks: Recent advances and application
examples. IEEE Vehicular Technology Magazine, 13(2):94–101, 2018.
[3] Yuchen Jiang, Shen Yin, and Okyay Kaynak. Data-driven monitoring and
safety control of industrial cyber-physical systems: Basics and beyond.
IEEE Access, 6:47374–47384, 2018.
[4] Daniel E. O’Leary. Artificial intelligence and big data. IEEE Intelligent
Systems, 28(2):96–99, 2013.
[5] Iqbal H. Sarker. Data science and analytics: An overview from data-
driven smart computing, decision-making and applications perspective. SN
Computer Science, 2(5):377, Jul 2021.
[6] Christopher J. Turner, John Oyekan, Lampros Stergioulas, and David
Griffin. Utilizing industry 4.0 on the construction site: Challenges and
opportunities. IEEE Transactions on Industrial Informatics, 17(2):746–
756, February 2021.
[7] Alessandro Biondi, Federico Nesti, Giorgiomaria Cicero, Daniel Casini,
and Giorgio Buttazzo. A safe, secure, and predictable software architecture
for deep learning in safety-critical systems. IEEE Embedded Systems
Letters, 12(3):78–82, September 2020.
[8] H. Kopetz and G. Bauer. The time-triggered architecture. Proceedings of
the IEEE, 91(1):112–126, January 2003.
[9] Mohammad Ashjaei, Lucia Lo Bello, Masoud Daneshtalab, Gaetano Patti,
Sergio Saponara, and Saad Mubeen. Time-sensitive networking in auto-
motive embedded systems: State of the art and research opportunities. In
Journal of Systems Architecture, volume 117, page 102137, 2021.
[10] Miguel Alcon, Hamid Tabani, Leonidas Kosmidis, Enrico Mezzetti, Jaume
Abella, and Francisco J. Cazorla. Timing of autonomous driving software:
Problem analysis and prospects for future solutions. In 2020 IEEE Real-
Time and Embedded Technology and Applications Symposium (RTAS),
pages 267–280, 2020.
[11] Antônio Augusto Fröhlich. SmartData: an IoT-ready API for sensor
networks. International Journal of Sensor Networks, 28(3):202, 2018.
[12] IEEE Instrumentation and Measurements Society. Ieee standard for a smart
transducer interface for sensors and actuators - common functions, com-
munication protocols, and transducer electronic data sheet (teds) formats.
IEEE Std 1451.0-2007, pages 1–335, 2007.
[13] M. K. Ludwich and A. A. Frohlich. Proper handling of interrupts in cyber-
physical systems. In 2015 International Symposium on Rapid System
Prototyping (RSP), pages 83–89, 2015.
[14] R. M. Scheffel and A. A. Fröhlich. Wsn data confidence attribution using
predictors. In 2018 Eighth Latin-American Symposium on Dependable
Computing (LADC), pages 145–154, 2018.
[15] D. Resner and A. A. Fröhlich. Tstp mac: A foundation for the trustful
space-time protocol. In 2016 IEEE Intl Conference on Computational
Science and Engineering (CSE) and IEEE Intl Conference on Embedded
and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed
Computing and Applications for Business Engineering (DCABES), pages
40–47, 2016.
[16] Mateus Lucena, Roberto Milton Scheffel, and Antonio Augusto Frohlich.
IoT gateway integrity checking protocol. In 2019 IX Brazilian Symposium
on Computing Systems Engineering (SBESC), pages 1–8. IEEE, November
2019.
[17] Kay Klobedanz, Christoph Kuznik, Andreas Thuy, and Wolfgang Müller.
Timing modeling and analysis for autosar-based software development -
a case study. 2010 Design, Automation & Test in Europe Conference &
Exhibition (DATE 2010), pages 642–645, 2010.
[18] J. Kim, I. Kang, Sungwon Kang, and A. Boudjadar. A process algebraic
approach to resource-parameterized timing analysis of automotive software
architectures. In IEEE Transactions on Industrial Informatics, volume 12,
pages 655–671, 2016.
[19] Héctor Posadas, Javier Merino, and Eugenio Villar. Data flow analysis
from uml/marte models based on binary traces. In 2020 XXXV Conference
on Design of Circuits and Integrated Systems (DCIS), pages 1–6, 2020.
[20] Matthias Becker, Dakshina Dasari, Saad Mubeen, Moris Behnam, and
Thomas Nolte. End-to-end timing analysis of cause-effect chains in
automotive embedded systems. Journal of Systems Architecture, 80:104–
113, 2017.
[21] Andrés Paz, Ghizlane El Boussaidi, and Mili Hafedh. checsdm: A method
for ensuring consistency in heterogeneous safety-critical system design.
In IEEE Transactions on Software Engineering, volume 47, pages 2713–
2739, December 2020.
[22] Bülent Sari and Hans-Christian Reuss. A model-driven approach for
the development of safety-critical functions using modified architecture
description language (adl). In 2016 International Conference on Electrical
Systems for Aircraft, Railway, Ship Propulsion and Road Vehicles Inter-
national Transportation Electrification Conference (ESARS-ITEC), pages
1–5, 2016.
[23] Lukas Krawczyk, Mahmoud Bazzal, Harald Mackamul, Raphael Weber,
and Carsten Wolff. Complex event models for automotive embedded
systems. In Journal of Systems Architecture, page 102343, 2021.
[24] S. Bateni and C. Liu. Predictable data-driven resource management: an
implementation using autoware on autonomous platforms. In 2019 IEEE
Real-Time Systems Symposium (RTSS), pages 339–352, 2019.
[25] Kai Liu, Victor C. S. Lee, Joseph K. Y. Ng, Sang H. Son, and Edwin H.-
M. Sha. Scheduling temporal data with dynamic snapshot consistency
requirement in vehicular cyber-physical systems. ACM Trans. Embed.
Comput. Syst., 13(5s), October 2014.
[26] G. R. Goud, N. Sharma, K. Ramamritham, and S. Malewar. Efficient real-
time support for automotive applications: A case study. In 12th IEEE
International Conference on Embedded and Real-Time Computing Systems
and Applications (RTCSA’06), pages 335–341, 2006.
[27] Grady Booch. Object oriented design with applications. Benjamin-
Cummings Publishing Co., Inc., USA, 1990.
[28] Antônio Fröhlich and Medeiros Frohlich. Application-Oriented Operat-
ing Systems. GMD — Forschungszentrum Informationstechnik GmbH.,
Germany, 03 2002.
[29] Daniel Jiang and Luca Delgrossi. Ieee 802.11p: Towards an international
standard for wireless access in vehicular environments. In VTC Spring
2008 - IEEE Vehicular Technology Conference, pages 2036–2040, 2008.
[30] Cheolkyu Shin, Emad Farag, Hyunseok Ryu, Miao Zhou, and Younsun
Kim. Vehicle-to-everything (v2x) evolution from 4g to 5g in 3gpp:
Focusing on resource allocation aspects. IEEE Access, 11:18689–18703,
2023.
[31] Jiangtao Li, Zhaoheng Song, Zihou Zhang, Yufeng Li, and Chenhong Cao.
In-vehicle digital forensics for connected and automated vehicles with
public auditing. IEEE Internet of Things Journal, 11(4):6368–6383, 2024.
[32] José Luis Conradi Hoffmann and Antônio Augusto Fröhlich. Smartdata
safety: Online safety models for data-driven cyber-physical systems. In
IECON 2022 – 48th Annual Conference of the IEEE Industrial Electronics
Society, pages 1–6, 2022.
[33] L. P. Horstmann, J. L. C. Hoffmann, and A. A. Fröhlich. A framework
to design and implement real-time multicore schedulers using machine
learning. In 2019 24th IEEE International Conference on Emerging
Technologies and Factory Automation (ETFA), pages 251–258, 2019.
[34] Dea Angelia Kamil, Wahyono, Agus Harjoko, and Kang-Hyun Jo. Vehicle
speed estimation using consecutive frame approaches and deep image
homography for image rectification on monocular videos. IEEE Access,
12:181937–181952, 2024.
[35] Pui Yee Leong and Nur Syazreen Ahmad. Lidar-based obstacle avoid-
ance with autonomous vehicles: A comprehensive review. IEEE Access,
12:164248–164261, 2024.
[36] César Huegel Richa, Mateus M. de Lucena, Leonardo Passig Horstmann,
José Luis Conradi Hoffmann, and Antônio Augusto Fröhlich. Modeling
time requirements of CPS in wireless networks. Sensors, 20(7):1818,
March 2020.
[37] Jameela Al-Jaroodi, Nader Mohamed, Imad Jawhar, and Sanja Lazarova-
Molnar. Software engineering issues for cyber-physical systems. In 2016
IEEE International Conference on Smart Computing (SMARTCOMP),
pages 1–6, 2016.
[38] T. Hinrichs and B. Buth. Can ai-based components be part of dependable
systems? In 2020 IEEE Intelligent Vehicles Symposium (IV), pages 226–
231, 2020.
[39] Mariano De Paula and Gerardo G. Acosta. Trajectory tracking algorithm
for autonomous vehicles using adaptive reinforcement learning. In
OCEANS 2015 - MTS/IEEE Washington, pages 1–8. IEEE, October 2015.
[40] Weizi Li, David Wolinski, and Ming C. Lin. ADAPS: Autonomous driving
via principled simulations. In 2019 International Conference on Robotics
and Automation (ICRA), pages 7625–7631. IEEE, May 2019.
[41] Jose Luis Conradi Hoffmann and Antonio Augusto Frohlich. Online
machine learning for energy-aware multicore real-time embedded systems.
IEEE Transactions on Computers, pages 1–1, 2021.
[42] Leonardo Passig Horstmann, José Luís Conradi Hoffmann, and
Antônio Augusto Fröhlich. Performance monitoring features in EPOS. In 2021
XI Brazilian Symposium on Computing Systems Engineering (SBESC),
pages 1–8, 2021.
[43] Felipe Codevilla, Matthias Müller, Antonio López, Vladlen Koltun, and
Alexey Dosovitskiy. End-to-end driving via conditional imitation learning.
In 2018 IEEE International Conference on Robotics and Automation
(ICRA), pages 4693–4700, 2018.
[44] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and
Vladlen Koltun. CARLA: An open urban driving simulator. In Proceedings
of the 1st Annual Conference on Robot Learning, pages 1–16, 2017.
[45] Davi Resner and Antonio Augusto Frohlich. Design rationale of a cross-
layer, trustful space-time protocol for wireless sensor networks. In 2015
IEEE 20th Conference on Emerging Technologies & Factory Automation
(ETFA). IEEE, September 2015.
[46] Roberto Milton Scheffel and Antonio Augusto Frohlich. FT-TSTP: A
multi-gateway fully reactive geographical routing protocol to improve
WSN reliability. In 2018 IEEE International Conference on Advanced
Networks and Telecommunications Systems (ANTS), pages 1–8. IEEE,
December 2018.
[47] José Luis Conradi Hoffmann, Leonardo Passig Horstmann, and
Antônio Augusto Fröhlich. Transparent integration of autonomous vehicles
simulation tools with a data-centric middleware. Design Automation for
Embedded Systems, January 2024.
[48] Antonio A. Frohlich and Davi Resner. Data-centric cyber-physical
systems design with SmartData. In 2018 Winter
Simulation Conference (WSC), pages 1274–1285. IEEE, December 2018.
[49] Arliones Hoeller Jr. and Antônio Augusto Fröhlich. Smartbuildings as
embedded distributed systems. In 2014 Brazilian Symposium on Computing
Systems Engineering (SBESC), pages 1–2, 2014.
[50] Antônio Augusto Fröhlich, Eduardo Augusto Bezerra, and
Leonardo Kessler Slongo. Experimental analysis of solar energy
harvesting circuits efficiency for low power applications. Computers &
Electrical Engineering, 45:143–154, 2015.
[51] Simone Malutta, Giovani Gracioli, Jhonatan Cristian Pscheidt,
Tiago Guizoni Neto, Allan Thiesen, Cauê Val Arruda, Cesar Augusto
Pompêo, Antônio Augusto Fröhlich, and Nádia Bernardi Bonumá.
Monitoramento hidrológico da bacia hidrográfica no campus da UFSC em
Joinville utilizando a plataforma EPOSMote III. In Proceedings of the XXII
Simpósio Brasileiro de Recursos Hídricos, pages 1–8, 2017.
[52] João Paulo Bedretchuk, Sergio Arribas García, Thiago Nogiri Igarashi,
Rafael Canal, Anderson Wedderhoff Spengler, and Giovani Gracioli.
Low-cost data acquisition system for automotive electronic control units.
Sensors, 23(4):2319, February 2023.
[53] José Luis Conradi Hoffmann, Leonardo Passig Horstmann, and
Antonio Augusto Frohlich. Using formal methods for on-the-fly time series
verification. In Proceedings of the 12th Latin-American Symposium on
Dependable and Secure Computing, LADC ’23, pages 21–29, New York,
NY, USA, 2023. Association for Computing Machinery.
[54] José Luis Conradi Hoffmann and Antônio Augusto Fröhlich. Data-centric
design for formal verification of vehicle monitoring. In 2023 XIII Brazilian
Symposium on Computing Systems Engineering (SBESC), pages 1–6, 2023.
[55] José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Matheus Wagner,
Felipe Vieira, Mateus Martínez de Lucena, and Antônio Augusto Fröhlich.
Using formal methods to specify data-driven cyber-physical systems.
In 2022 IEEE 31st International Symposium on Industrial Electronics
(ISIE), pages 643–648, 2022.
JOSÉ L. CONRADI HOFFMANN received an
M.Sc. degree from the Federal University of
Santa Catarina (UFSC), Brazil, in 2020, and is currently a
Ph.D. student at UFSC, where he has been a member of
the Software/Hardware Integration Lab (LISHA)
since 2018. His research interests include real-time
embedded systems, multicore processors,
formal methods, security, and safety.
ANTÔNIO A. FRÖHLICH is a Full Professor
at UFSC, where he has led LISHA since 2001.
With a Ph.D. in Computer Engineering from TU-Berlin,
he has coordinated several R&D projects
on embedded systems. Significant contributions
from these projects materialized within the Brazilian
Digital Television System and IoT technology
for Energy Distribution, Smart Cities, and
Autonomous Systems. He is a senior member of
ACM, IEEE, and SBC.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2025.3548542
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/