
Dynamic IoT Choreographies



The Internet of Things is growing at a dramatic rate and extending into various application domains. We have designed, implemented, and evaluated a resilient and decentralized system to enable dynamic IoT choreographies. We applied it to maintaining the functionality of building automation systems so that new devices can appear and vanish on-the-fly.
FEATURE ARTICLE: IoT Communications
Dynamic IoT
Managing Discovery, Distribution, Failure and
Pervasive connectivity in machine-to-machine communication and advances in sensor and actuator
technology have given rise to the Internet of Things (IoT). For companies such as Siemens, the
IoT concept plays a major role in their ongoing evolution towards full digitalization. Smart
cities, eHealth systems, and Industry 4.0-enabled manufacturing plants are transforming into IoT
environments. In this paper, we use an application from the building automation domain to
illustrate our solution. While building automation systems (BAS) have certain specific
constraints and characteristics, the solutions presented in this paper are applicable to other
IoT domains.
In today’s building infrastructures, we find heterogeneous devices such as lights, switches,
window shutters, air conditioners, light sensors, or thermostats, which are being IoT-enabled.
That is, communication with such components is based on Internet and Web technologies: HTTP or
CoAP as protocols, REST interfaces, and JSON as the data exchange format. In the future, such
IoT environments will become very dynamic. Devices will appear and vanish on-the-fly during the
lifetime of the environment, or change their properties.
As an example, imagine a reconfigurable multi-purpose room that users can partition into
multiple rooms by movable walls. When users move walls in the room, the installed equipment
should automatically adapt to the new room configuration, which requires dynamic
reconfiguration of the building system.
The operation of such a dynamic room today requires the involvement of technicians at each
reconfiguration stage, or provides a suboptimal user experience, with light switches not
operating the correct lights, or illuminating only a portion of the room. For reasons of
brevity, we have confined our example in this paper to a simple switch-light system which might
appear in such a dynamic meeting room, where a switch operates an indeterminate number of lights
that may change dynamically.
Jan Seeger, Siemens AG / TU Munich
Rohit A. Deshmukh, TU Darmstadt
Vasil Sarafov, TU Munich
Arne Bröring, Siemens AG
For these reasons, we have chosen to focus on the domain of building automation systems for
our application example. BAS have additional constraints that are not present in general IoT sys-
tems, but the solutions presented in this paper are applicable to other domains.
We approach these challenges by extending semantic application descriptions (called recipes)
with constraints to enable dynamic and automatic reconfiguration of applications. These recipes
are executed as a distributed and autonomous choreography that behaves according to constraints
configured at design time. To ensure reliability, we provide a novel mechanism based on accrual
failure detection that detects failures in the distributed system and recovers dynamically
using the defined constraint system. Finally, we evaluate our implementation of the approach
through an application demonstration. Further, we test the performance of the failure detection
mechanism and compare it to two related approaches. This work builds on our previous work on
semantically-enabled IoT device composition [1], [2] and marks a further milestone on our
continued research path.
When new devices are added to an IoT environment such as a building automation system (BAS),
they need to be connected physically, and the software on the controller needs to be
reparameterized and reconfigured. Today, this task requires substantial effort and profound
expertise. Therefore, it is not possible to enable dynamic behavior in such a traditional BAS.
Thuluva et al. [3] employ Semantic Web technologies to enable low-effort engineering of auto-
mation systems. However, in their Industry 4.0 approach, they do not focus on the runtime as-
pects and dynamic reconfiguration or failure detection yet. Using our integration of failure detec-
tion and rule-based interaction, the system can be dynamically reconfigured at runtime without
user interaction.
The “scope” concept described by Mottola et al. in [4] is similar to our concept of offering selec-
tion rules. However, the system does not support dynamic scopes, whereas our system supports
the modification of device descriptions on-the-fly during the operation of the system and thus the
dynamic reconfiguration of devices.
The DS2OS system by Pahl et al. [5] supports dynamic addition and removal of services through
the use of the blackboard pattern for communication, but no implicit mechanism is provided to
deal with newly appearing devices, as this must be implemented manually by the user. Our sys-
tem does not require special handling of dynamic behavior, only a specification of the require-
ments of the service.
Our system composes different services offered by web-enabled IoT offerings to form an
application. Web service composition can be classified into two types, service orchestration
and service choreography, based on the way the participant services interact (for more
information, see Sheng et al. [6]). Applications in the web service composition field typically
follow the orchestration approach and do not cover reliability facets. Khan et al. [7] propose
a reliable IoT infrastructure, but focus solely on communication links. They do not examine the
aggregation of nodes into larger applications as provided by our recipe model.
A basis for reliability in IoT environments is the detection of failures of involved devices and
services. A high level approach for such failure detection for living spaces with emphasis on sys-
tem maintenance is presented in [8]. Kodeswaran et al. suggest detecting activities of daily living
in IoT home environments which are used to reason about the expected future degradation of the
connected devices. This approach is valid only for smart home applications and is not suitable
for explicit failure detection in general purpose IoT systems. The failure detector described in
this work is not restricted to any specific application areas.
In [9], Chetan et al. present the Gaia middleware for pervasive distributed systems which imple-
ments a fault-tolerant mechanism. However, the failure detection in this system is centralized,
which means that the controller is required to be available during the operation of the services. In
our recipe system, this is not the case, as nodes can detect the failure of other nodes without
communicating with the controller.
In [10], De Moraes Rossetto et al. present the design of an unreliable failure detector for
ubiquitous environments, which supports grouping and assigning of different impact factors of
nodes.
A distributed fault detection algorithm in a sensor application based on trust management is pre-
sented by Guclu et al. in [11]. They propose to use a method that relies on evaluating statements
about trustworthiness of aggregated data from neighbors. This approach is valid only for homo-
geneous networks where data of the same kind is collected. Thus, it is not applicable in our rec-
ipe system, where we use a more general failure detector.
Today, IoT devices are typically composed by a central entity as a service orchestration. The
central entity, which providers typically host in the cloud, schedules and calls different device
and service functions. Such an orchestration entails a single point of failure, as well as latency,
network and privacy disadvantages. Our system instead forms the components into a service
choreography, where, during operation of the service, devices speak to each other directly, with-
out going through a third party. This takes advantage of the growing “smartness” of automation
devices and aims to partly mitigate the dependency on a single centralized orchestration point
during operation of the service. A central controller is still required, but only during
reconfiguration. We describe a possible solution to this problem in the “Conclusions” section.
At the heart of our system, we use the recipe concept to represent the abstract structure of IoT
device interactions [1]. A recipe describes the dataflow between devices through ingredients and
interactions, as shown in Figure 1. Ingredients represent a class of devices or services through a
semantic category and a number of inputs and outputs that carry data type information. The cate-
gory describes the kind of thing that this ingredient represents. Using semantic concepts, the cat-
egory description can be made as fine- or coarse-grained as desired. The inputs and outputs de-
scribe the type of data that this ingredient requires for operation, and the results of its operation.
Interactions describe the data flow between offering inputs and outputs, and must take place be-
tween offering inputs and outputs with matching types. When an offering receives a set of com-
plete inputs, a computation or measurement is executed. The result of this computation is sent to
the outputs, and along the interaction edge to the next offering’s inputs.
This model is also suitable to express “part-of” relationships for devices consisting of several
ingredients. To model a lamp with color temperature and brightness inputs, the device would be
modeled as two separate ingredients that have a “device” non-functional property. In the recipe,
the author would then constrain the ingredients to be part of the same device via OSRs.
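As an illustration, the recipe model described above might be sketched as a small data structure. This is not the implementation used in this work; the class names (`Ingredient`, `Interaction`, `Recipe`), the field names, and the plain-string type labels are assumptions made for this example only. The sketch shows the one rule stated in the text, namely that an interaction must connect an output and an input with matching types.

```python
from dataclasses import dataclass, field


@dataclass
class Ingredient:
    """Placeholder for a class of devices or services (hypothetical structure)."""
    name: str
    category: str                                  # semantic category, e.g. "Switch"
    inputs: dict = field(default_factory=dict)     # input name -> data type
    outputs: dict = field(default_factory=dict)    # output name -> data type


@dataclass
class Interaction:
    """Directed data flow from one ingredient's output to another's input."""
    source: str
    output: str
    target: str
    input: str


@dataclass
class Recipe:
    ingredients: dict      # ingredient name -> Ingredient
    interactions: list     # list of Interaction

    def type_errors(self):
        """Return all interactions whose output and input types do not match."""
        errors = []
        for i in self.interactions:
            out_type = self.ingredients[i.source].outputs.get(i.output)
            in_type = self.ingredients[i.target].inputs.get(i.input)
            if out_type is None or out_type != in_type:
                errors.append(i)
        return errors


switch = Ingredient("switch", "Switch", outputs={"state": "boolean"})
light = Ingredient("light", "Light", inputs={"on": "boolean"})
recipe = Recipe(
    ingredients={"switch": switch, "light": light},
    interactions=[Interaction("switch", "state", "light", "on")],
)
print(recipe.type_errors())
```

A real system would compare semantic type annotations rather than strings, so the equality check here stands in for a semantic match.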
Figure 1: Instantiating a recipe of ingredients to running choreography of offerings.
Recall that a recipe only represents a template for a system. Ingredients represent placeholders
for concrete devices or services. To operate, a recipe needs to be instantiated, which means that
these placeholders need to be replaced with actual services or devices. We call these services or
devices offerings and the process of replacing ingredients with offerings instantiating a recipe.
Offerings are described similarly to ingredients (category, inputs and outputs), but correspond
to concrete device functionality. Additionally, offerings contain non-functional information
about their current state (such as location or administrative information), as opposed to
functional information (inputs, outputs, category and implementation).
To extend the applicability of the recipe concept to other domains, we have introduced two new
concepts into the system, offering selection rules and recipe runtime configurations.
Offering selection rules (OSRs) describe additional requirements on an offering’s non-functional
properties that must be met in order for the offering to be considered for an ingredient’s replace-
ment [2]. All non-functional properties of offerings can be restricted, and restrictions can be
combined with Boolean operators. Also, cardinality restrictions allow limiting the number of of-
ferings that replace an ingredient between a lower and upper bound. Offering selection rules al-
low the expression of more complex system requirements than through semantic matching on
functional properties alone.
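To make the idea of offering selection rules more concrete, the following sketch expresses restrictions on non-functional properties as composable predicates with a cardinality bound. All names (`prop_equals`, `all_of`, `any_of`, `negate`, `SelectionRule`) and the dictionary representation of offerings are hypothetical; the actual rule language of the system is not shown in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def prop_equals(key, value):
    """Restriction: a non-functional property must equal the given value."""
    return lambda offering: offering.get(key) == value


def all_of(*rules):
    """Boolean AND of several restrictions."""
    return lambda offering: all(rule(offering) for rule in rules)


def any_of(*rules):
    """Boolean OR of several restrictions."""
    return lambda offering: any(rule(offering) for rule in rules)


def negate(rule):
    """Boolean NOT of a restriction."""
    return lambda offering: not rule(offering)


@dataclass
class SelectionRule:
    predicate: Callable[[dict], bool]
    min_count: int = 0
    max_count: Optional[int] = None    # None means "no upper bound"

    def select(self, offerings):
        """Return the chosen offerings, or None if the lower bound is not met."""
        matches = [o for o in offerings if self.predicate(o)]
        if self.max_count is not None:
            matches = matches[: self.max_count]
        return matches if len(matches) >= self.min_count else None


offerings = [
    {"id": "s1", "location": "Room A"},
    {"id": "s2", "location": "Room A"},
    {"id": "s3", "location": "Room B"},
]
rule = SelectionRule(prop_equals("location", "Room A"), min_count=1, max_count=1)
selected = rule.select(offerings)
print([o["id"] for o in selected])
```

In this sketch, the upper bound simply keeps the first matching offerings; how the real system chooses among candidates is not specified here.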
Recipe runtime configurations (RRCs) contain information about a specific instance of a recipe.
Recipes may be instantiated multiple times, possibly with different OSRs, and each instantiation
forms a new RRC. The process of selecting offerings that fulfill the requirements to be part of a
recipe is called offering discovery. In this process, the matching of category as well as input and
output types of an ingredient in the RRC is determined. This process strongly relies on the se-
mantic annotation of the descriptions of offerings and recipes. The recipe runtime configuration
allows the instantiation of a single recipe multiple times with varying OSRs. Currently, this is
done through repeated instantiation of predefined static templates, and RRCs replace this mecha-
nism with a more flexible approach.
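Offering discovery, as described above, matches the category and the input and output types of each ingredient against the available offerings. A minimal sketch of this functional matching step is given below. The hard-coded `SUBCLASS_OF` hierarchy is a stand-in for the semantic database, and the dictionary layout of ingredients and offerings is an assumption for illustration.

```python
# Hypothetical category hierarchy (child -> parent). A real system would query
# a semantic database instead of a hard-coded dictionary.
SUBCLASS_OF = {"DimmableLight": "Light", "Light": "Actuator"}


def is_a(category, required):
    """True if `category` equals `required` or is a transitive subclass of it."""
    while category is not None:
        if category == required:
            return True
        category = SUBCLASS_OF.get(category)
    return False


def matches(ingredient, offering):
    """Functional match: compatible category plus identical input/output types."""
    return (
        is_a(offering["category"], ingredient["category"])
        and offering["inputs"] == ingredient["inputs"]
        and offering["outputs"] == ingredient["outputs"]
    )


def discover(ingredients, offerings):
    """Map each ingredient name to the ids of all functionally matching offerings."""
    return {
        name: [o["id"] for o in offerings if matches(ing, o)]
        for name, ing in ingredients.items()
    }


ingredients = {
    "light": {"category": "Light", "inputs": {"on": "boolean"}, "outputs": {}},
}
offerings = [
    {"id": "hue-1", "category": "DimmableLight",
     "inputs": {"on": "boolean"}, "outputs": {}},
    {"id": "pi-switch", "category": "Switch",
     "inputs": {}, "outputs": {"state": "boolean"}},
]
print(discover(ingredients, offerings))
```

The candidates found this way would then be filtered further by the OSRs of the RRC.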
Using these concepts, we have built a system that enables (1) the specification of recipes in a
graphical recipe editor, (2) browsing and viewing of recipes, (3) creating and modifying RRCs,
and (4) building executable choreographies from RRCs. The relevant components are the
controller, which hosts the graphical interface (described in more detail in Thuluva et al.
[1]) and enables the creation of choreographies from recipes; a semantic database for
persistence and semantic operations; and finally the engine. These components and their
interaction are shown in Figure 2.
The engine is a piece of software hosted on smart devices that allows the running of choreogra-
phies. Each engine carries a description of the offering it provides (the so-called offering de-
scription) and provides an interface to realize the operation of choreographies. The engine uses
the concepts described by [12] to adapt heterogeneous devices to the standardized recipe inter-
Controller and engine work together to enable the running of distributed choreographies
generated from RRCs. When an engine is added to the network or the configuration of an engine
changes, it registers with the controller using its offering description. The controller then
runs the offering discovery process, finding all RRCs that the newly added engine should be
part of. From the recipe corresponding to the RRC, the controller derives all offerings that
the engine should communicate with. This communication information is encoded into a so-called
interaction descriptor (InDes) for each engine that is part of the RRC and distributed to these
engines.
An interaction descriptor contains information about the component’s communication behavior
derived from the recipe. It specifies which inputs should be mapped to which outputs, and which
failure detector configuration to use for each communication link. After receiving an
interaction descriptor, the engine can perform service execution and failure detection
autonomously, without communication with the controller.
Using the mechanism described above, a choreography is created and run. The engine receives
input data (sent via HTTP PUT requests encoded as JSON data) and transfers that data to the im-
plementing service or device. This communication is shown by black arrows in Figure 2. The
service creates new output that the engine again encodes as JSON and sends to the REST end-
points described in the output section of its interaction descriptor. Once the choreography is cre-
ated, the controller is no longer required, and the functionality described by the recipe is pro-
vided by offerings without centralized coordination.
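The forwarding step performed by an engine can be sketched as follows. The layout of the interaction descriptor (`INDES`), its field names, the example URLs, and the `{"value": ...}` payload shape are all hypothetical; only the use of HTTP PUT with JSON bodies is taken from the description above.

```python
import json
import urllib.request

# Hypothetical interaction descriptor; the field names are illustrative only.
INDES = {
    "recipe_run": "rrc-1",
    "outputs": {
        "state": [
            "http://light-1.example/inputs/on",
            "http://light-2.example/inputs/on",
        ],
    },
    "failure_detector": {"threshold": 0.9},
}


def build_requests(indes, output_name, value):
    """Create one HTTP PUT request per endpoint registered for an output."""
    body = json.dumps({"value": value}).encode("utf-8")
    return [
        urllib.request.Request(
            url,
            data=body,
            method="PUT",
            headers={"Content-Type": "application/json"},
        )
        for url in indes["outputs"].get(output_name, [])
    ]


def forward(indes, output_name, value, timeout=2.0):
    """Send the value to all downstream engines (performs network I/O)."""
    for request in build_requests(indes, output_name, value):
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()


requests_to_send = build_requests(INDES, "state", True)
print([r.full_url for r in requests_to_send])
```

Separating request construction from sending keeps the mapping from outputs to endpoints testable without a network. Note that the controller does not appear anywhere in this data path.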
Figure 2: Failure detection in a distributed IoT environment. Red arrows indicate failure detection
communication, black lines indicate application communication.
An example of this approach for a light control recipe is shown in Figure 2: The recipe consists
of two switches (O1 and O3) controlling lights (O2 and O4). Two offerings are currently not part
of a recipe (O5, O6), and thus do not exchange recipe data. Each switch controls a single light.
The straight black arrows show recipe data flow, while the red arrows show monitoring links
created from this recipe data flow. It can be seen that all recipe data flows result in exactly one
monitoring link being created with additional backlinks to initial nodes of the recipe. Unused
offerings that are not part of a recipe are connected into a ring by monitoring links to maintain
readiness. When failure is detected by a node according to the recipe-specific failure detection
parameters, a notification is sent to the controller (represented by the red arrow to the controller)
and the controller reruns the offering selection mechanism to replace the failed offering.
As outlined above, choreographies have clear advantages over centralized compositions. However,
when minimizing the role of a centralized component, detection of component failures becomes
more challenging. In the case of running choreographies for automation systems or the IoT, a
mechanism for failure detection is especially important, as failures have a more severe impact
and are more likely compared to traditional web services.
Hence, our aim is to ensure that the detector notices the failure of a choreography, and the con-
troller recovers the failed choreography, so that our system can provide services with minimal
We have designed a failure detection algorithm (FD) that takes these factors into account and
provides the engine with high-level information on the reliability of nodes it communicates with.
The algorithm is self-adaptive and flexible and thus can be applied not only in building automa-
tion systems, but also in other IoT applications. Additionally, the failure detector uses constant
memory over the full range of parameters, which is important for embedded and constrained de-
vices. The failure detector is based on the concept of accrual failure detection described by
Hayashibara et al. [13]. We have named our approach iota-FD (“Internet of Things Accrual Fail-
ure Detector”).
The failure detection algorithm is deployed on each node in a system. A client is a node that is
monitored by the algorithm. A server is a node that monitors clients. If needed, a node can be
both a client and a server.
An accrual function as defined in [13] is a suspicion function whose output value represents the
confidence of the failure detector that a monitored device has crashed. Values close to 0 imply
that the device is working correctly, while greater values indicate higher confidence that a
monitored device has crashed. In our system, we use a per-service suspicion threshold, above
which we consider a device crashed. When suspicion approaches this value, our system can take
mitigating action such as migrating data or finding a suitable replacement for a service.
We implement the accrual suspicion function using the one-sided Chebyshev inequality to
approximate the probability that a device has crashed. Using a recursive computation of the
mean and variance of heartbeat inter-arrival times, we can efficiently calculate and update the
suspicion level over time while using constant memory. This is an advantage over
implementations using the empirical distribution function, which requires storage of the
complete list of timestamps.
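The following sketch illustrates how a constant-memory suspicion function can be derived from the one-sided Chebyshev (Cantelli) inequality, which states that P(X - μ ≥ k) ≤ σ² / (σ² + k²) for k > 0. It is one plausible reading of the approach and not the iota-FD implementation: the class name `ChebyshevAccrual`, the use of floating-point running sums, and the choice of one minus the bound as the suspicion value are assumptions.

```python
class ChebyshevAccrual:
    """Constant-memory accrual detector sketch based on Cantelli's inequality."""

    def __init__(self):
        self.count = 0         # number of inter-arrival samples seen
        self.total = 0.0       # running sum of inter-arrival times
        self.total_sq = 0.0    # running sum of squared inter-arrival times
        self.last = None       # timestamp of the most recent heartbeat

    def heartbeat(self, now):
        """Record a heartbeat received at time `now`."""
        if self.last is not None:
            delta = now - self.last
            self.count += 1
            self.total += delta
            self.total_sq += delta * delta
        self.last = now

    def suspicion(self, now):
        """Return a value in [0, 1]; values near 0 mean 'probably alive'."""
        if self.count == 0:
            return 0.0
        mean = self.total / self.count
        variance = max(self.total_sq / self.count - mean * mean, 0.0)
        overdue = (now - self.last) - mean
        if overdue <= 0:
            return 0.0    # heartbeat is not yet later than the average gap
        # Cantelli: P(X - mean >= overdue) <= variance / (variance + overdue^2)
        return 1.0 - variance / (variance + overdue * overdue)


detector = ChebyshevAccrual()
for timestamp in [0.0, 1.0, 2.5, 3.0, 4.5]:
    detector.heartbeat(timestamp)
print(detector.suspicion(5.0), detector.suspicion(6.0), detector.suspicion(10.0))
```

Only three numbers and the last timestamp are stored, regardless of how many heartbeats have been observed.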
In addition, iota-FD provides measurements about the critical resources left in a monitored de-
vice (e.g. battery level). With the help of the collected measurements, the FD builds a Lagrange
polynomial, which estimates the depletion/augmentation of the resources. This information can
be used to indicate when devices need to be serviced.
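As a hedged illustration of the resource estimation, the function below evaluates the Lagrange polynomial through a handful of (time, resource level) samples at a future point in time. The function name `lagrange_predict` and the sample format are assumptions; how many samples iota-FD keeps and how it evaluates the polynomial are not stated in the text.

```python
def lagrange_predict(samples, t):
    """Evaluate the Lagrange polynomial through (time, level) samples at time t."""
    result = 0.0
    for i, (t_i, rho_i) in enumerate(samples):
        term = rho_i
        for j, (t_j, _) in enumerate(samples):
            if i != j:
                term *= (t - t_j) / (t_i - t_j)
        result += term
    return result


# Battery level in percent, reported with three consecutive heartbeats.
battery = [(0.0, 100.0), (1.0, 98.0), (2.0, 96.0)]
print(lagrange_predict(battery, 5.0))
```

Polynomial extrapolation becomes unstable as the number of samples grows, so a sketch like this is only sensible with a few recent measurements.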
Finally, iota-FD measures the quality of the end-to-end communication link between a server and
a client. This is achieved by estimating the current packet drop rate. Because the estimate is
an exponentially weighted moving average, the calculated drop rate changes over time and does
not remain static, i.e., iota-FD is able to “forget” past disruptions in the network and
“learn” new ones once they have occurred. This information can be used to differentiate between
crashed devices and devices whose heartbeats are lost because of bad network conditions.
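An exponentially weighted moving average of this kind can be sketched in a few lines. The sketch assumes that heartbeats carry consecutive sequence numbers, so that gaps reveal lost packets; this detail, the class name `DropRateEstimator`, and the default value of `alpha` are assumptions and are not taken from the iota-FD design.

```python
class DropRateEstimator:
    """Exponentially weighted moving average of the heartbeat drop rate."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha       # weight of the newest observation
        self.rate = 0.0          # current drop rate estimate in [0, 1]
        self.last_seq = None     # sequence number of the last heartbeat seen

    def heartbeat(self, seq):
        """Update the estimate with a received heartbeat numbered `seq`."""
        if self.last_seq is not None:
            missed = max(seq - self.last_seq - 1, 0)
            for _ in range(missed):
                self._observe(1.0)    # each gap counts as one lost heartbeat
        self._observe(0.0)            # the heartbeat that did arrive
        self.last_seq = seq

    def _observe(self, lost):
        # New information is "learned" with weight alpha, while the old
        # estimate is "forgotten" with weight (1 - alpha).
        self.rate = self.alpha * lost + (1.0 - self.alpha) * self.rate


estimator = DropRateEstimator(alpha=0.5)
for seq in [1, 2, 5]:        # heartbeats 3 and 4 were lost
    estimator.heartbeat(seq)
print(estimator.rate)
```

A larger `alpha` makes the estimate react faster to new bursts and forget old ones sooner.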
Summarizing the above, the suspicion function s(t), the packet loss predictor p(t) and the re-
source predictor r(t) are shown in the following definition:
Definition: With the last n heartbeats received at times $t_0 < t_1 < t_2 < \dots < t_n$ and
the current timestamp $t$, iota-FD computes:

The suspicion function $s(t)$, where $X$ is the time delay until a new heartbeat will be
received in the future and $\mu$ and $\sigma^2$ are respectively the mean and variance of the
distribution of heartbeat inter-arrival times. $\mu$ is calculated from the current sum of the
inter-arrival times of timestamps divided by the number of received timestamps $n$, while
$\sigma^2$ is calculated from the current sum of the squares of the inter-arrival times of all
timestamps and using the equality $\sigma^2 = E[X^2] - (E[X])^2$. Both $\mu$ and $\sigma^2$ are
reset when $\omega_{max}$ timestamps have been received. To reduce the impact of this reset on
the quality of the period estimation, old estimations are used until a certain minimum number
of timestamps $\omega_{min}$ has been received. Thus, the calculation of the suspicion function
requires enough storage for two unsigned integers large enough to not overflow within the
chosen learning window, and one integer to store the current number of timestamps. The memory
required for failure detection is thus independent of the learning window, an important
advantage over competing implementations.

The packet loss predictor $p(t)$, where $\alpha$ is a parameter that determines the speed of
learning (first term) and forgetting (second term) of future and past bursts respectively.

The resource predictor $r(t)$, where $\rho$ is received with every heartbeat and expresses the
level of resources currently left at a client device. No assumptions are made about how $\rho$
is calculated by the client.
Iota-FD provides a wide variety of parameters to configure failure detection on a per-node and
per-service basis: The learning window size used for the estimation of heartbeat timings of the
algorithm can be adjusted freely. The estimator for heartbeat timings is self-adapting, allowing
clients to adjust their heartbeat frequency dynamically based on external factors such as battery
charge without requiring explicit configuration. Alternatively, the heartbeat period can be ad-
justed based on the “importance” of the service in the recipe or other factors depending on the
recipe definition.
Additionally, the information computed through p(t) and r(t) allows the definition of
fine-grained policies for the application (such as “if suspicion is medium-high and no packet
loss bursts were registered, migrate service data in preparation for a device crash” or “if the
resources will be below a certain threshold in 2 hours, notify the administration to schedule a
servicing routine”). Service definitions can include such policies to define QoS requirements
on the service, and continually evaluate the current quality of the service. We plan an
implementation of this functionality in the future.
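Since this functionality is not yet implemented, the following is only a sketch of how the two example policies might be evaluated. The function name `evaluate_policies`, the action labels, and every numeric bound (what counts as "medium-high" suspicion or as "no packet loss bursts") are hypothetical choices made for illustration.

```python
def evaluate_policies(suspicion, drop_rate, hours_until_threshold):
    """Return the actions triggered by the two example policies.

    suspicion: current value of s(t), assumed to lie in [0, 1]
    drop_rate: current value of p(t), assumed to lie in [0, 1]
    hours_until_threshold: predicted time until the resource level falls
        below its threshold according to r(t), or None if no such time is
        predicted
    """
    actions = []
    # "Medium-high" suspicion without packet loss bursts: prepare for a crash.
    if 0.5 <= suspicion < 0.9 and drop_rate < 0.05:
        actions.append("migrate_service_data")
    # Resources predicted to run low within two hours: schedule servicing.
    if hours_until_threshold is not None and hours_until_threshold <= 2.0:
        actions.append("notify_administration")
    return actions


print(evaluate_policies(suspicion=0.7, drop_rate=0.01, hours_until_threshold=1.5))
```

With a high drop rate, the first policy stays silent, which reflects the idea that missing heartbeats are then more likely caused by the network than by a crash.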
To assess our failure detector against other state-of-the-art failure detectors suitable for
distributed IoT systems, we compared its behavior with that of two other failure detectors:
First, we compare iota-FD against the “Phi-Accrual” failure detector defined by Hayashibara et
al. [13], which uses a different implementation of the suspicion function compared to iota-FD. It
is used in the Apache Cassandra distributed database and Akka distributed programming frame-
work. The Phi-Accrual detector always assumes that the inter-arrival times of the heartbeats are
normally distributed. It calculates the above-mentioned probability using the same estimators but
applies them to the normal cumulative distribution function instead of the Chebyshev inequality
as iota-FD does.
Second, we evaluate the “Adaptive” failure detector proposed by Satzger in [14] that uses a
different definition of the suspicion function, where s(t) is the probability that a node has
crashed. The value of s(t) is computed using the empirical distribution function applied to a list
of stored heartbeat inter-arrival times.
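For comparison, the two alternative suspicion functions can be sketched as follows. Both functions are simplified illustrations: the function names are hypothetical, the estimators for mean and standard deviation are passed in as arguments, and the Adaptive sketch omits that detector's additional tuning parameter.

```python
import math


def phi_suspicion(elapsed, mean, std):
    """Phi accrual: -log10 of the normal tail probability beyond `elapsed`."""
    tail = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2.0)))
    return -math.log10(max(tail, 1e-300))    # clamp to avoid log10(0)


def adaptive_suspicion(elapsed, history):
    """Empirical distribution: share of stored inter-arrival times <= elapsed."""
    return sum(1 for gap in history if gap <= elapsed) / len(history)


print(phi_suspicion(1.0, mean=1.0, std=3.0))
print(adaptive_suspicion(1.2, [0.5, 1.0, 1.5, 2.0]))
```

The second function makes the memory difference visible: it needs the full list of stored inter-arrival times, while the first needs only two estimates.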
Figure 3: Mistake rate vs. detection time (left), query accuracy probability vs. detection time (right)
As the related work focuses on pure accrual failure detection, we will not benchmark the re-
source and packet drop estimators but focus on the accrual functions. In this evaluation, we will
evaluate the tradeoffs between one performance (detection time) and two accuracy QoS metrics
(average mistake rate and query accuracy probability) defined by Chen et al. [15] for the three
failure detectors with comparable parameterizations. Detection time is the time a failure detector
needs to permanently start suspecting a crashed device, average mistake rate measures how often
a correct device is wrongly suspected to have crashed and query accuracy probability measures
the probability that when queried at a random time, a failure detector will answer correctly
whether a device is faulty or correct.
These parameters are the threshold U, the learning window size ωmax, the refresh duration ωmin
(only iota-FD), and the smoothness parameter α (only Adaptive).
The learning window size ωmax was fixed to 500 for all failure detectors as a configuration that is
suitable for the types of constrained devices that are targeted. The rest of the parameters were
varied to generate the QoS graphs in Figure 3.
The simulations were run as follows: Heartbeat times were generated using a normal distribution
with a mean of 1 and a variance of 9. This distribution reflects application- and network-level
variance in the generation of heartbeats. It models a case where the majority of heartbeats are
received with delays between zero and five time units, most of them close to one time unit.
Additionally, the burst packet loss model from ns-3 was used to model packet loss in a wide
area network. The failure detector algorithms were applied to the resulting heartbeat times,
and the values of the suspicion function were sampled. From this data, mistake rate, detection
time and query accuracy probability were computed for different thresholds U. The values for
corresponding U are joined to illustrate the tradeoffs between accuracy and performance
metrics.
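The two accuracy metrics can be computed from sampled suspicion values along the following lines. This is a simplified sketch and not the evaluation code used for Figure 3: it assumes that the monitored device is alive for the whole trace, that samples are taken at a fixed interval, and that a value above the threshold counts as "suspected". The function name and the example values are hypothetical.

```python
def accuracy_metrics(samples, threshold, duration):
    """Compute (mistake rate, query accuracy probability) for an alive device.

    samples: suspicion values sampled at a fixed interval
    threshold: suspicion threshold U above which the device is suspected
    duration: length of the observed period in time units
    """
    suspected = [value > threshold for value in samples]
    # A mistake starts whenever the detector switches from trusting to suspecting.
    mistakes = sum(
        1 for prev, cur in zip([False] + suspected, suspected) if cur and not prev
    )
    mistake_rate = mistakes / duration
    # Fraction of queries that would have been answered correctly ("alive").
    query_accuracy = suspected.count(False) / len(suspected)
    return mistake_rate, query_accuracy


trace = [0.1, 0.95, 0.97, 0.2, 0.1, 0.92, 0.3, 0.1]
print(accuracy_metrics(trace, threshold=0.9, duration=8.0))
```

Repeating this computation for a range of thresholds yields one point per threshold on an accuracy-versus-detection-time curve.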
As shown in Figure 3, the Adaptive failure detector is a special case: Its suspicion function
is a discrete function, and the learning buffer was not large enough to generate a smooth
curve. A small learning window of 500 heartbeats was chosen as IoT devices are constrained.
This resulted in a very low detection time for the Adaptive failure detector (lower than both
other failure detectors), but also in a correspondingly high mistake rate. Additionally, it is
not possible to adjust the tradeoff between both parameters by varying U. On the other hand,
both the Phi and iota-FD approaches allow this tradeoff, with iota-FD performing better in both
tradeoffs, which results in a lower detection time for a certain mistake rate or query accuracy
probability and vice versa.
In addition to the above evaluation of the failure detection mechanism, we have built a small
application demonstration using two “switch” offerings (Raspberry Pi single-board computers
with attached pushbuttons) and two “light” offerings (one smart office light accessible via
CoAP, and one Philips Hue light accessible via HTTP REST). We created a recipe that allows
control of offerings of the category “Light” located in “Room A” from all offerings with the
category “Switch” in “Room A”. We also added a “maximum cardinality of 1” constraint to the
switch ingredient. The two switch offerings were connected to the network, and one switch was
selected for control of the lights. To see the reliability functionality in action, we then
removed power to the switch currently controlling the light and observed failover to the second
switch within 15 seconds. A recording of this application example’s operation accompanied by
narration is available at
This work proposes an architecture that allows dynamic reconfiguration of IoT choreographies
based on the iota-FD mechanism for failure detection. We applied our approach in a building
automation scenario and evaluated the iota-FD mechanism in comparison to two other
approaches. We found the recipe system represents a suitable programming approach for dy-
namic automation systems and the iota-FD is well suited for IoT use cases where small learning
buffers are assumed due to constrained devices. Iota-FD also provides a wide array of parame-
ters that can be adjusted on a per-application basis. Using recipes, dynamic choreographies can
be created that self-adapt to changing device states without user interaction.
Having implemented and evaluated our reliability approach for managing dynamic IoT choreog-
raphies, we are planning to extend both the evaluation and implementation. On the implementa-
tion level, we plan to take advantage of software defined networking (SDN) technology to be
able to replace offerings in a network-aware fashion. Additionally, we will evaluate the imple-
mentation of a distributed controller using replication and leader selection.
Beyond reliability, we will extend the offering description so that it covers not only
choreographies of web services but also computation offerings, with features comparable to fog
computing technology, i.e., being able to deploy computation offerings to a local “cloud” and
to seamlessly migrate such computations between nodes. Further, an important aspect of future
work will be the validation of a reconfigured choreography after a recovered failure. This
challenge can be tackled either in advance of execution, by introducing further semantics to
device descriptions, or at runtime, by performing consistency checks after mediation.
[1] A. S. Thuluva, A. Bröring, G. P. Medagoda, H. Don, D. Anicic, and J. Seeger, “Recipes for IoT
Applications,” in Proceedings of the Seventh International Conference on the Internet of Things,
New York, NY, USA, 2017, pp. 10:1–10:8.
[2] J. Seeger, R. A. Deshmukh, and A. Bröring, “Running Distributed and Dynamic IoT Choreographies,”
in 2018 IEEE Global Internet of Things Summit (GIoTS) Proceedings, Bilbao, Spain, 2018, vol. 2, pp.
[3] A. S. Thuluva, K. Dorofeev, M. Wenger, D. Anicic, and S. Rudolph, “Semantic-Based Approach for
Low-Effort Engineering of Automation Systems,” in On the Move to Meaningful Internet Systems.
OTM 2017 Conferences, 2017, pp. 497–512.
[4] L. Mottola, A. Pathak, A. Bakshi, V. K. Prasanna, and G. P. Picco, “Enabling Scope-Based
Interactions in Sensor Network Macroprogramming,” in 2007 IEEE International Conference on Mobile
Adhoc and Sensor Systems, 2007, pp. 1–9.
[5] M.-O. Pahl, “Data-centric service-oriented management of things,” in IFIP/IEEE International
Symposium on Integrated Network Management, IM 2015, Ottawa, ON, Canada, 11-15 May, 2015,
pp. 484–490.
[6] Q. Z. Sheng, X. Qiao, A. V. Vasilakos, C. Szabo, S. Bourne, and X. Xu, “Web services composition:
A decade’s overview,” Inf. Sci., vol. 280, pp. 218–238, Oct. 2014.
[7] W. Z. Khan, M. Y. Aalsalem, M. K. Khan, M. S. Hossain, and M. Atiquzzaman, “A reliable Internet
of Things based architecture for oil and gas industry,” in 2017 19th International Conference on
Advanced Communication Technology (ICACT), 2017, pp. 705–710.
[8] P. A. Kodeswaran, R. Kokku, S. Sen, and M. Srivatsa, “Idea: A System for Efficient Failure
Management in Smart IoT Environments,” in Proceedings of the 14th Annual International Conference
on Mobile Systems, Applications, and Services, New York, NY, USA, 2016, pp. 43–56.
[9] S. Chetan, A. Ranganathan, and R. Campbell, “Towards fault tolerance pervasive computing,” IEEE
Technol. Soc. Mag., vol. 24, no. 1, pp. 38–44, Spring 2005.
[10] A. G. De Moraes Rossetto et al., “A new unreliable failure detector for self-healing in ubiquitous
environments,” in Proceedings - International Conference on Advanced Information Networking and
Applications, AINA, 2015.
[11] S. O. Guclu, T. Ozcelebi, and J. Lukkien, “Distributed Fault Detection in Smart Spaces Based on
Trust Management,” Procedia Comput. Sci., vol. 83, pp. 66–73, Jan. 2016.
[12] A. Bröring et al., “Enabling IoT Ecosystems through Platform Interoperability,” IEEE Softw., vol. 34,
no. 1, pp. 54–61, Jan. 2017.
[13] N. Hayashibara, X. Défago, R. Yared, and T. Katayama, “The Φ Accrual Failure Detector,” in
Reliable Distributed Systems, IEEE Symposium on (SRDS), 2004, pp. 66–78.
[14] B. Satzger, A. Pietzowski, W. Trumler, and T. Ungerer, “A New Adaptive Accrual Failure Detector
for Dependable Distributed Systems,” in Proceedings of the 2007 ACM Symposium on Applied
Computing, New York, NY, USA, 2007, pp. 551–555.
[15] W. Chen, S. Toueg, and M. K. Aguilera, “On the quality of service of failure detectors,” IEEE Trans.
Comput., vol. 51, no. 1, pp. 13–32, Jan. 2002.
Jan Seeger is a PhD researcher at Siemens AG as well as at the Technical University of Munich.
His work covers Internet of Things and automation research, in particular how semantic
technologies can improve the engineering of automation systems. He holds an M.Sc. in computer
science from the TU Munich. He can be reached at
Rohit A. Deshmukh holds an M.Sc. in Distributed Software Systems from the Technische
Universität Darmstadt and was with Siemens’ corporate research unit in Munich while
contributing to this work. His research interests include distributed software systems, the
Internet of Things, peer-to-peer systems, and the Semantic Web.
Vasil Sarafov is an M.Sc. student in computer science at TU Munich. His interests include
distributed and embedded systems, algorithms, and data structures. He can be reached at
Arne Bröring is a senior researcher at Siemens’ corporate research unit in Munich and the
technical coordinator of the BIG IoT project. His research interests include the Internet of
Things, Sensor Web, and the Semantic Web. He received a Ph.D. in geoinformatics from the
University of Twente (NL).