Learning Recovery Strategies for Dynamic
Self-healing in Reactive Systems
Mateo Sanabria,¹ Ivana Dusparic,² Nicolás Cardozo¹
¹ Systems and Computing Engineering Department, Universidad de los Andes, Colombia
² School of Computer Science and Statistics, Trinity College Dublin, Ireland
¹ {m.sanabriaa,n.cardozo}@uniandes.edu.co, ² ivana.dusparic@tcd.ie
Abstract—Self-healing systems depend on following
a set of predefined instructions to recover from a known
failure state. Failure states are generally detected based
on domain specific specialized metrics. Failure fixes
are applied at predefined application hooks that are
not sufficiently expressive to manage different failure
types. Self-healing is usually applied in the context
of distributed systems, where the detection of failures
is constrained to communication problems, and resolu-
tion strategies often consist of replacing complete com-
ponents. However, current complex systems may reach
failure states at a fine granularity not anticipated by
developers (for example, value range changes for data
streaming in IoT systems), making such failures unmanageable by existing self-healing techniques. To counter these
problems, in this paper we propose a new self-healing
framework that learns recovery strategies for healing
fine-grained system behavior at run time. Our proposal
targets complex reactive systems, defining monitors as
predicates specifying satisfiability conditions of system
properties. Such monitors are functionally expressive
and can be defined at run time to detect failure states
at any execution point. Once failure states are detected,
we use a Reinforcement Learning-based technique to
learn a recovery strategy based on users’ corrective
sequences. Finally, to execute the learned strategies,
we extract them as Context-oriented Programming vari-
ations that activate dynamically whenever the failure
state is detected, overwriting the base system behavior
with the recovery strategy for that state. We validate the
feasibility and effectiveness of our framework through
a prototypical reactive application for tracking mouse
movements, and the DeltaIoT exemplar for self-healing
systems. Our results demonstrate that, with just the definition of monitors, the system is effective in detecting and recovering from failures in 55% to 92% of the cases in the first application, and performs at par with the predefined strategies in the second application.
Index Terms—Self-healing systems, Context-oriented
Programming, Functional-reactive programming, RL
I. Introduction
Self-healing properties enable systems to au-
tonomously recover from failure states. To enable
self-healing in a software system there are four
challenges [22] to address: (1) build the means to
monitor the system state, (2) use fault analysis to de-
tect the root cause of faults based on the monitored
state, (3) make a decision about the detected fault (i.e.,
decide whether the fault is blocking), and (4) exe-
cute the recovery actions to take the system back
to a correct state. To execute the aforementioned
process, software systems must be equipped with
the tools and capabilities for each of the tasks. To
capture the system state and to detect faults therein,
self-healing systems specify hooks in the system
as control points to observe the system state. Such
hooks are also used to plug the corrective actions
(e.g., alternative execution paths, introductions of
new/fixed modules) [22]. Such characteristics of
self-healing systems are problematic, as to diagnose
and heal from failures, we require them to be antic-
ipated by developers at design time. This implies
that self-healing systems require a pre-defined set of instructions to recover from prescribed failures at defined points in the system's execution. This characteristic limits the flexibility of self-healing systems, offering a solution only for known
unknowns [33]. Unknown unknowns are not cov-
ered by self-healing, leading to system failures.
Moreover, it is a concern that the resolution of local
changes cannot assure global healing [9], hindering
the applicability of self-healing systems.
We note that self-healing systems have been ap-
plied mostly in distributed systems, taking advan-
tage of the characteristics of such systems with
defined metrics that point to possible failure states
during the execution. However, the use of healing
strategies in other systems is still lacking [19],
[28]. In line with this observation, we recognize
there is a wide variety of research and application
opportunities of self-healing systems. For example,
data streaming and big data [10], complex reconfig-
urable network environments [31], ensuring service
quality in web services [18], or avoiding faults requiring an operating system restart [8].
This paper proposes an alternative to deal with
the challenges of defining and designing self-
healing systems. Our proposal breaks the assump-
tions previously made for monitoring, fault analy-
sis, and recovery action execution, about the pre-
definition of healing strategies (Section III). To do
this, we first introduce flexible dynamic monitors,
that can be defined and modified at run time.
Moreover, monitors express different conditions as
predicates, evaluating any system property, which
avoids fixed monitoring points. Second, upon de-
tecting failures, we use an algorithm based on
Reinforcement Learning (RL) options [20] to learn
the self-healing strategies based on corrective ac-
tions taken by a human actor, in order to restore
the system from the failing state. Once a resolution
strategy is learned, using Context-oriented Progra-
mming (COP) [25], the system dynamically adapts
the execution path whenever the state leading to
the error is detected again by the monitor. The
adapted behavior corresponds to the learned re-
covery strategy. Furthermore, our proposal seeks to
broaden the application of self-healing systems to
a new domain, reactive systems [32]. The objective
behind this is to evaluate the feasibility of applying the self-healing process in a domain in which systems are in continuous execution, and in which there is no pre-defined set of metrics to assess system performance across multiple systems.
Our proposal is realized on top of three main con-
cepts to achieve run-time learning and definition of
recovery strategies (Section II): (1) reactive systems,
(2) RL options, and (3) COP.
To illustrate the idea of learned self-healing
strategies, consider the running example of a mouse
move tracking reactive application. In this applica-
tion we track the mouse position (x,y) as it moves
on a 100×100 GUI, displaying an ASCII value based
on the (x,y) position. In the example, we define
mouse prohibited areas in the GUI by tessellation
predicates. Whenever the mouse hovers over these
areas, it is considered as a fault state.
Given that we always want to display the ASCII
code of the mouse’s position, we want to leave the
fault state as soon as possible. Therefore, a recovery
strategy of this behavior is to move the mouse
away from the fault areas, to a correct execution
area. In the example, we assume that the fault areas are unknown beforehand; consequently, it
is not possible to define a recovery action for each
possible position in the GUI. For each point in the
fault areas we must define mouse movements to
go to a safe area. Fig. 1 shows the setting of our
running example. In Fig. 1a we show two possible
situations in which the mouse is in a fault area,
Fig. 1b shows possible resolution strategies as paths
from the fault states to correct states. The best resolution strategy depends on the specific application; however, knowing this information beforehand may not always be possible. Our approach
lets us learn the best option at run time.
Fig. 1: Mouse position tracking running example: (a) mouse on a fault state; (b) possible recovery strategy movements.
We validate the feasibility and effectiveness of
our approach by means of two case studies (Sec-
tion IV). First we use our running example taken
from the reactive systems literature. For this appli-
cation, we validate the feasibility of our solution
by introducing monitors to detect fault areas. The
healing strategy is to move the mouse out of the
fault area. The results of the evaluation show that,
using our framework, we are able to detect on
average over 70% of all failure states, effectively
generating and learning a healing strategy for each
of them. Our second case study reuses the DeltaIoT
exemplar [15] that dynamically adapts an Internet
of Things (IoT) network configuration to heal from
errors. We use this case study to validate the ef-
fectiveness of our proposal to find appropriate so-
lutions. We equip the servers in the exemplar with
reactive characteristics to define predicate monitors
to learn healing strategies. Our results demonstrate
that indeed, the learned healing strategies corre-
spond to the adaptations originally proposed in the
system for the pre-conceived situations. Moreover,
we are able to heal from situations previously
unknown, from which the original system cannot re-
cover, demonstrating the effectiveness of our work.
The main contributions of our work are:
•Introduction of flexible monitors to detect fail-
ures at any moment in time, and any point in
the execution, rather than at specified program
hooks.
•Learning of recovery strategies from failure
states, realized as modular context-dependent
adaptations, rather than as predefined behav-
ior.
•Application of self-healing techniques to the
domain of reactive systems.
II. Background
To learn self-healing sequences in reactive sys-
tems we follow the ideas of adaptation generation
from sequences of known behavior as achieved
by a technique called Auto-COP [3], [4]. In this
section, we provide the background necessary for
presenting our solution, specifically Reactive Sys-
tems and the two main concepts that Auto-COP
encompasses: COP and RL options.
A. Reactive Systems
Reactive systems are based on the prompt reac-
tion to internal (i.e., system) and external (i.e., in the
surrounding system environment) events. There are
two main approaches to address the complexity of
reactive applications: event-based languages, and
languages with direct representation of reactive val-
ues. Hereinafter we focus on the second approach.
Reactive values (or behaviors) [11] are introduced
in reactive programming languages to abstract val-
ues that vary continuously over time. Behaviors can
also be defined as objects that dynamically depend on
other time-varying entities. In this case, behaviors
update in response to updates from behaviors they
depend on. The added value of reactive progra-
mming is that now programmers can avoid explicit
value updates; instead, values are updated auto-
matically, as the language runtime takes care of the
complexity of updating values asynchronously.
REScala [27] is a reactive language built on top
of Scala that provides a robust event system with
seamless integration support for reactive values,
promoting a mixture of Object-Oriented Progra-
mming (OOP) and Functional programming (FP).
REScala supports Signals to express functional de-
pendencies among values in a declarative way,
and events for continuous or discrete-time changing
values. A Signal represents the state in the ap-
plication, whereas events hold values that change
when fired, representing actions in the application.
REScala exposes a series of abstractions for signals
and events, which apply uniformly over them.
To propagate changes through a reactive system,
REScala provides observers which attach a handler
function to the event. Every time an event fires,
its handler function is applied to the current event
value.
The following snippet shows an observer exam-
ple. Line 1 associates the signal counting with a
handler function. counting is an integer signal that
holds the monitor’s learning steps so far. Note that
the handler function (Lines 2-3) changes the value
of the learning flag variable when the learning steps
limit is reached.
1 val learningStageObserver = counting observe
2   (count => { if (count > learningStepsLimit)
3       learningFlag = false })
Observers are the base for failure detection inside
monitors in our framework. As we explain in Sec-
tion III-A, the handle function corresponds to user-
defined predicates that specify failure states. When-
ever the predicate evaluates to true, the monitor
triggers the corresponding learned healing strategy.
B. Context-Oriented Programming
COP is a programming paradigm which enables
modeling of the variability required by complex
adaptive systems based on behavioral variations
that depend on the surrounding execution con-
text [14]. Behavioral variations are defined by layers
associated with partial method definitions. Layers
dynamically activate/deactivate depending on the
context currently executing, composing/withdraw-
ing the behavioral variations associated to them
with the base system behavior.
Complex real-world self-adaptive systems con-
sider several requirements. For instance, distribu-
tion leads to various contexts in different con-
current components. Correspondingly, if several
components create different contexts, they may
trigger behavioral changes in others. Thus, be-
havioral variation activation performed by asyn-
chronous communication is desirable. Considering
highly dynamic environments, performing behav-
ioral changes without inconsistent/erroneous be-
havior is also a primary concern.
ContextErlang [26] is a COP language based on
context-aware agents. Agents have behavioral units
that can be dynamically activated, i.e., variations.
ContextErlang uses message passing as a variation
activation mechanism. This mechanism leads to
the asynchronous activation required for real-world
applications.
ContextErlang defines a core calculus semantics
based on the actor concurrency model. The advan-
tage of the language semantics is that the language
capabilities can be easily replicated in other actor
languages, for example in Scala. ContextScala [26]
is a COP language implemented on top of the
Akka framework in Scala, based on ContextErlang.
In ContextScala variations are defined reusing the
main modularity abstraction of Scala: classes. As
an illustration, we show an example of variation definitions from our running example, with two vari-
ations of the move behavior to manage the direction
of the movement, Up (Lines 1-3), and Down (Lines 4-
6).
1 class Up extends Variation[Up] {
2   def moved(): Unit = println("Movement Up")
3 }
4 class Down extends Variation[Down] {
5   def moved(): Unit = println("Movement Down")
6 }
In ContextScala variation activation is managed
by an instance of the ContextAgent (an Akka actor),
which sends activation messages to the correspond-
ing actor. The ContextAgent manages context varia-
tions through the definition of the variation list,
given to the SetActiveVariations function as shown
in the following snippet (Line 1). Upon a method
call (the moved method in our example), the imple-
mentation corresponding to the activated variation
is executed. In our example, Line 2 executes the
definition of the Up variation (activated in Line
1), and Line 4 executes the definition of the Down
variation (activated in Line 3).
1 ContextAgent ! SetActiveVariations(List(Up()))
2 ContextAgent ! moved()  // result: Movement Up
3 ContextAgent ! SetActiveVariations(List(Down()))
4 ContextAgent ! moved()  // result: Movement Down
The use of layers and behavioral adaptations
posits a modular way to define dynamic program
adaptations, based on the surrounding execution
context. The fine-grained partial behavior defini-
tions in COP enable us to adapt any system be-
havior at any moment in time, offering greater
flexibility to realize self-healing systems without
pre-defined strategies.
C. Reinforcement Learning Options
RL [23] is a technique to learn optimal actions
for specific environmental conditions by trial-and-
error, based on interactions with the environment.
Q-learning is a common implementation of RL agents that, at each time step, perceives the environment and maps it to a state s_i from its state space S. It then selects an action a_i from its action set A and executes it. The agent receives a reward r_i from the environment when it transitions to the next state, based on which it updates the suitability of taking action a_i in state s_i. The agent's goal is to learn a policy (i.e., the most suitable action for each state) that maximizes the long-term cumulative reward. The learning rate α determines the extent to which new experiences overwrite previously learned ones, and the discount factor γ determines how much future rewards are discounted, so that agents prioritize immediate actions while still being able to plan the best long-term actions. At each timestep, the Q-value of an action a_t taken in state s_t is updated using the Q-learning equation below:
Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where Q(s_t, a_t) is the Q-value of the action taken, α the learning rate, r_{t+1} the reward received, γ the discount factor, and max_a Q(s_{t+1}, a) the maximum Q-value in the next state.
Early environment interactions are focused on
exploration, i.e., actions are picked randomly and
uniformly from the actions available in a given
state. In turn, after an agent has had a chance to learn the quality of actions, later inter-
actions focus on exploitation, i.e., mainly executing
those actions known to lead to the highest long-
term rewards.
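To make the update rule and the exploration/exploitation trade-off concrete, the following is a minimal tabular Q-learning sketch in Scala. The class and method names are ours for illustration only and are not part of the framework's implementation; it keeps a Q-table, selects actions ε-greedily, and applies the update above on every transition.

import scala.util.Random
import scala.collection.mutable

// Minimal tabular Q-learning sketch (illustrative names, not the framework's API).
class QLearner[S, A](actions: Seq[A],
                     alpha: Double = 0.1,     // learning rate
                     gamma: Double = 0.9,     // discount factor
                     epsilon: Double = 0.2) { // exploration probability
  private val q = mutable.Map.empty[(S, A), Double].withDefaultValue(0.0)

  // epsilon-greedy selection: explore early, exploit learned values later
  def selectAction(s: S): A =
    if (Random.nextDouble() < epsilon) actions(Random.nextInt(actions.size))
    else actions.maxBy(a => q((s, a)))

  // Q-learning update for a single transition (s, a, r, s')
  def update(s: S, a: A, reward: Double, next: S): Unit = {
    val maxNext = actions.map(a2 => q((next, a2))).max
    q((s, a)) = q((s, a)) + alpha * (reward + gamma * maxNext - q((s, a)))
  }
}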
Whenever sequences of actions are frequently found together, RL options package such sequences together as a means to optimize the agent execution, reducing the number of decision points from one per action to a single one at the entry point of the complete sequence [29] or behavior history [13].
Options are used in RL to speed up learning or
to minimize the periods of suboptimal performance
during exploratory interaction with the environ-
ment. We consider RL-based option techniques [30],
to be suitable for learning and integrating se-
quences of recovery actions. Packaging options as
COP adaptations can be used to autonomously
trigger sequences of recovery actions in self-healing
systems in response to context changes (i.e., failure
detection), akin to the way actions in RL are learned
and taken in response to observed environment
conditions.
III. Learning to Heal in Unannounced Situations
This section introduces our proposal for a
self-healing framework designed to: (1) detect
faults flexibly using declarative monitor definitions,
(2) learn and extract recovery strategies online from
user/learned behavior to avoid pre-defined behav-
ior, and (3) dynamically adapt behavior to heal
failures, according to the system's execution context. In the following, we present the complete self-healing process posited by our framework.
The objective of our framework is to recover from
unknown situations at run time. To achieve this, we
position monitors at the heart of the solution, as in
Fig. 2.
Fig. 2: Monitor architecture. Each monitor is composed of a fault detection component, a learning model, and a variation manager, and observes the system and its method variations.
Monitors are used to handle fault detection. When a point of failure is first encountered, it triggers the process of learning recovery strategies from actions executed by external agents (e.g., users, automated agents). If multiple possible strategies exist, the learning model selects the best suited corrective behavior from all learned options. Once the learning stage finishes, the variation manager generates the recovery strategy from the learned action sequences, defined as a variation to execute at run time. The fault detection system then triggers the appropriate learned variations as faults are detected.
The main contribution of the framework is that, as opposed to existing self-healing systems, monitors
are not constrained to specific application points
(i.e., hooks) where a known healing strategy can be
applied. Rather, the declarative definition of moni-
tors enables us to evaluate different system proper-
ties at any point during the execution. Furthermore,
we are not restricted to applying pre-defined re-
covery strategies. Learning recovery strategies from
combinations of actions in the base system behavior
helps us recover from unannounced situations, a
property unique to our approach.
A. Fault Detection
Monitors are defined to observe the system exe-
cution. Monitors are instantiated as needed in the
application, given their declarative definition, spec-
ifying the variables or functions to observe, rather
than specifying fixed application points. Fig. 2
shows monitor’s internal structure and interaction
with the environment. Monitors are composed of:
the fault detection system, the learning model, and
the variation manager.
Monitor instances require the specification of a
set of relevant variables (i.e., observable variables),
and a predicate verifying such variables. Monitors
verify the system behavior based on REScala’s
observers; thus, the monitor verification is trig-
gered with every update to the variables under
observation. For our running example, we define a
monitor through the orFaults predicate in Snippet 1,
evaluating the ASCII code of the mouse position
given by the x() signal.
Snippet 1: Monitor predicate declaration
val orFaults = (x: Signal[Int]) => Signal[Boolean] {
  (40 < x() and x() < 50) or 200 < x()
}
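To illustrate how such a predicate is wired into a monitor, the following is a minimal sketch, not the framework's actual API: it attaches the orFaults predicate to an observer so that every change of the position signal is checked. Here, sum is the signal introduced in Section IV-A that converts the mouse position into an integer; the import path and triggerRecoveryStrategy() are assumptions, the latter standing for a hypothetical hook into the monitor's variation manager.

import rescala.default._  // assumed REScala import; adjust to the project's setup

// Illustrative sketch: check the fault predicate on every position update.
val faultSignal: Signal[Boolean] = orFaults(sum)  // sum: (mouseX + mouseY) % 255, see Section IV-A
val monitorObserver = faultSignal observe { isFault =>
  if (isFault) triggerRecoveryStrategy()  // hypothetical hook into the monitor
}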
Moreover, monitors can refine the predicates dy-
namically, for example, if the acceptable conditions
of the behavior change at run time. This is achieved
by defining new predicates on top of the existing
ones to account for the new conditions. New predi-
cates can overwrite (as in Snippet 2) or complement
the behavior of existing predicates. In our example,
the primeFaults predicate is used to overwrite the
orFaults predicate.
Snippet 2: Primality check predicate declaration
val primeFaults = (x: Signal[Int]) => Signal[Boolean] {
  isPrime(x())
}
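Alternatively, a new predicate can complement an existing one instead of replacing it. A minimal sketch in the Signal-expression style of Snippets 1 and 2 (the combined signal name is ours, and sum is the position signal from Section IV-A) marks a state as faulty whenever either condition holds:

val orFaultSignal    = orFaults(sum)     // range-based fault predicate (Snippet 1)
val primeFaultSignal = primeFaults(sum)  // primality-based fault predicate (Snippet 2)

// A state is a fault if either of the existing predicates flags it
val combinedFaults = Signal[Boolean] { orFaultSignal() || primeFaultSignal() }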
B. Recovery Strategy Generation
The second step in generating recovery strategies
involves learning corrective action sequences to
transition from a fault state to a valid state. Based
on RL, learning first undergoes a training phase, in
which the system executes (atomic) system actions
for a set number of steps, dependent on the state
and action space of the system. During each step,
whenever a monitor detects a fault state, it initiates
the learning process. The learning process requires exploring actions and, based on the obtained reward, deciding whether they effectively correct the system behavior. Once the reward on the actions taken converges, we exploit these learned actions as a recovery strategy for the detected fault state. Beyond this, our framework is designed for continuous learning, where new recovery actions
can be learned even after a recovery strategy has
been generated already.
In particular, we use a Q-learning agent to learn
atomic action sequences –that is, behavior (e.g.,
functions, methods) defined in the base system. The
monitors continuously capture the system state for
each event, evaluating their corresponding predicates. Once failure states are detected, as events satisfying the monitor's fault predicate, we use the learning model to record the set of atomic actions used to bring the system back to a valid state. To drive the learning process, each atomic action must be associated with a reward; the learning agent then seeks to maximize the attainable reward.
After the learning process concludes, the learning model generates a map that associates the fault states with a list of learned recovery actions, containing those atomic actions with the highest reward from those states, as illustrated in Snippet 3. Depending on the context, these highest-reward atomic actions can be composed to construct complex healing strategies. Whenever the monitor detects a fault in an already learned state in the map, the system automatically executes the healing strategy until it reaches a valid state.
Snippet 3: Learning atomic actions
val DetectionObservable = observable observe { State =>
  if (learningFlag) {
    step(State.oldPosition, State.newPosition)
  } else if (tableVariationCreationFlag) {
    agentData.Q.foreach { case (key, value) =>
      StateOfVariations.updateState(key,
        updateCurrentVariationState(key, value, List()))
    }
    tableVariationCreationFlag = false
  }
}
Note that the effectiveness of the generated heal-
ing strategies is contingent on both the expres-
siveness of the predicates and the exploration of
possible states during the learning stage. If the
monitor only learns from local states, there won’t be
any generated healing strategies for states not ex-
plored during learning. Moreover, if the predicate’s
expressiveness falls short in identifying desired
fault states, the learning stage may overlook them,
leading to incorrect reactions.
C. Healing From Error States
The final stage in our process consists of au-
tonomously applying the healing strategy at any
given application point of the system (definition or
execution) rather than relying on predefined actions
for particular program points.
Once a fault state is detected, and the association
between the state and its sequence of atomic actions
learned, the system should be capable of enacting the
sequence at the fault state, rather than falling into
it. The dynamic composition and withdrawal of
behavior of COP is used for this purpose. That is,
whenever a fault state is reached and flagged by
the monitor’s predicate, the system should auto-
matically execute the learned sequence of atomic
actions to recover from said failure.
To do this, whenever a predicate is triggered, if its state is defined in the reaction map, the monitor interrupts the system's base execution and proceeds to execute the sequence of recovery actions, as shown in Snippet 4. Once the reaction finishes, control returns to the system's expected base behavior.
Snippet 4: Automated application of recovery
strategies
def takeAction(ctx: Boolean, e: MouseMoved): Unit =
  if (ctx) {
    val agent = AgentLocation(e.point.x, e.point.y)
    val variationList: List[QMove] =
      StateOfVariations.getState.getOrElse(agent, List())
    if (variationList.nonEmpty) {
      val castedVariationList =
        CastingDefinition.QMoveListToDefinitionList(variationList)
      actor ! SetActiveVariations(castedVariationList)
      actor ! moved(e.point.x, e.point.y)
    }
  } else
    currentObservable.mouse.mouseMovedE.fire(e.point)
Given that the recovery strategy corresponds to a
list of atomic actions, the execution of the strategy
consists of passing the list of variations to the variation
manager and then calling the system behavior,
which now corresponds to that of the first variation
in the list; as each variation finishes, it calls upon
the next variation, leading to the execution of all
the variations in the sequence. Finally, after the
execution of the healing strategy the system reverts
to the base system behavior.
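Conceptually, this chained execution can be sketched in plain Scala, independently of ContextScala's actual activation mechanism (the function and parameter names are illustrative): each recovery action runs in order and, once the list is exhausted, control returns to the base behavior.

// Sketch: run the learned recovery actions in order, then hand control back.
def runStrategy(strategy: List[() => Unit], baseBehavior: () => Unit): Unit = {
  strategy.foreach(action => action()) // each variation executes and yields to the next
  baseBehavior()                       // revert to the base system behavior
}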
In our running example, reaching state (17, 28) triggers the monitor, since the orFaults predicate in Snippet 1 is satisfied (40 < 17 + 28 < 50). During
exploration, our framework learns to recover from
this state by generating sequences of atomic actions
(i.e., mouse movements).
IV. Validation
This section shows the feasibility and effective-
ness of our framework for the self-healing of reac-
tive programs, without having to specify program
hooks or strategies beforehand. The validation con-
sists of two different applications from different
domains and knowledge areas. First, we present
our running example of a prototypical application
for reactive systems, taken from the REScala ex-
emplars.¹ This application demonstrates the effec-
tiveness of our approach to heal systems in the
reactive systems application domain. Second, we
use the DeltaIoT self-healing exemplar to evaluate
the effectiveness of our approach in generating
appropriate healing strategies (i.e., in line with pre-
defined strategies) without previous specification,
to manage unforeseen situations.
A. ASCII Mapping Tessellation
We validate the feasibility of our framework us-
ing the mouse movement tracking application. In
particular, we demonstrate it is possible to: (1) de-
fine monitors for different system behavior/states,
(2) learn strategies that recover from failure states
autonomously.
1) Description: The mouse position is represented
as a signal that updates whenever the mouse moves
on a 100 ×100 GUI, as illustrated in Fig. 1. Using
the mouse position signal, we define two evalua-
tion scenarios according to different distributions
of faulty states.
¹ https://github.com/rescala-lang/REScala/tree/master/Code/Examples
Fault states are defined as tessellation patterns
based on the number given by the mouse’s po-
sition in the GUI. Tessellations are defined using
the predicates to detect failures. The predicates are
formulated based on REScala’s built-in functional-
ity and user behavior. Our scenarios are based on
two predicates, orFaults (Snippet 1), and primeFaults
(Snippet 2), using the logical or operator and a
predicate to determine primality, respectively. The
sum signal converts the (x,y) coordinates into an
integer that is input to the two predicates.
val sum: default.Signal[Int] = Signal[Int] {
  (mouseX() + mouseY()) % 255
}
The tessellations generated by the predicates are
respectively shown in Fig. 3, where the blue regions
represent the failure GUI states. Mouse movements
trigger the signals. Failure detection is then a reac-
tion associated with the monitor. Monitors actively
observe mouse movements to detect faults (blue
regions) using the defined predicates (orFaults or
primeFaults in our case), then start corrective actions
to move the mouse to a white region. The tessella-
tions in the two scenarios generate different fault
distributions in the GUI states, as a case where
no predefined corrective action fits both scenarios.
In Fig. 3b, failure states are more uniformly dis-
tributed than in Fig. 3a where faults are concen-
trated in three regions.
As discussed in Section II-C, a predefined set of
possible actions must be established. In this context,
the actions correspond to mouse movements: Up,
Down, Left, and Right. These actions are the atomic actions available to the system to correct the mouse position between blue and white regions. Additionally, we define a reward function that assigns a value of 1 when the mouse reaches a valid position (i.e., the fault predicate is not satisfied) and 0 otherwise. Therefore, each mouse position (x, y) maps to a collection of Q-values, one for each possible action in that position: the Q-value map.
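A minimal sketch of this action set and reward in Scala follows; the type and value names are illustrative only, and the reward simply mirrors the description above.

// Sketch of the action set and reward for the mouse example (illustrative names).
sealed trait MouseAction
case object MoveUp    extends MouseAction
case object MoveDown  extends MouseAction
case object MoveLeft  extends MouseAction
case object MoveRight extends MouseAction

val atomicActions: Seq[MouseAction] = Seq(MoveUp, MoveDown, MoveLeft, MoveRight)

// Reward of 1 when the resulting position is valid (the fault predicate does not hold), 0 otherwise.
def reward(isFault: Boolean): Double = if (isFault) 0.0 else 1.0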
The execution process of the system is as follows:
(1) the mouse moves through the GUI in a random
path ensuring that all positions have the same
probability of being visited. (2) During the initial
part of the process, actions moving from failure
states to valid states are recorded with a positive
reward. (3) Once learning concludes, the Q-value
map encapsulates optimal atomic actions for each
GUI state. (4) We systematically concatenate indi-
vidual optimal atomic actions in failure states, until
a valid state is reached, effectively generating a
healing strategy for the initial failure state. (5) Fi-
nally, whenever a failure state is detected again,
the learned corrective action sequences are applied autonomously, taking the mouse to a valid state, after which it continues its movement (step 1).
Note that loops may appear in the construction of healing strategies. The process must account for loops and transitive relations within the path to improve the resulting strategy. This formalized approach ensures a methodical exploration of potential action sequences and their efficacy in guiding the system towards valid states.
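A sketch of this extraction step is shown below, under the simplifying assumptions that the highest-reward action per state and a simulated transition function are available; all names are illustrative and not part of the framework. It follows the best action from the fault state, tracks visited states to avoid loops, and stops when a valid state is reached.

// Illustrative sketch: build a healing strategy by concatenating optimal atomic
// actions from a fault state until a valid state is reached, guarding against loops.
def extractStrategy[S, A](start: S,
                          bestAction: S => A,       // highest Q-value action per state
                          transition: (S, A) => S,  // simulated effect of an action
                          isFault: S => Boolean,
                          maxLength: Int = 100): List[A] = {
  var state    = start
  var visited  = Set(start)
  val strategy = scala.collection.mutable.ListBuffer.empty[A]
  while (isFault(state) && strategy.size < maxLength) {
    val action = bestAction(state)
    val next   = transition(state, action)
    if (visited(next)) return strategy.toList // loop detected: stop with a partial strategy
    strategy += action
    visited  += next
    state     = next
  }
  strategy.toList
}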
2) Experimental Design: We consider two tessella-
tions to evaluate the effective generation of strate-
gies in different scenarios. The scenarios correspond
to two predicates (orPredicate, primePredicate), used by the monitors. In both scenarios, an external automatic agent moves the mouse uniformly through the whole GUI, assuring all states are visited, running for 100000 steps, where a step is a movement of the mouse. Each experiment runs for five iterations.

Fig. 3: Errors arise in the blue areas according to the two predicates: (a) tessellation using the orPredicate, with three specific fault regions; (b) tessellation using the primePredicate, with multiple fault regions spread through the GUI.
The first experiment counts the number of times
the monitor detects a failure, to compare the effec-
tiveness in detecting failures against the theoretical
value. Once learning finishes, the second experi-
ment takes place. The agent moves the mouse on
failure regions to test how many healing strategies
generate reactions leading to valid states.
3) Results: TABLE I shows the result of the ex-
periments where:
Faults detected shows the number of failure states
that the monitor detected through the learning
phase. That is, out of the 100000 executed steps
how many triggered the monitor’s predicate.
Fault proportion shows the proportion of detected
faults to the number of steps (100000).
Correct strategies shows the number of healing
strategies that lead to a correct state, once the
learning phase ends. To get this value, for every
fault state in the tessellation, we check whether or
not the associated healing strategy leads to correct
state.
Healing effectiveness shows the proportion of
correct strategies to the number of states to heal
(i.e., how well the system heals after the learning
phase). The number of states to heal in each tessellation is 1174 for primePredicate and 2090 for orPredicate.
The first two rows in TABLE I measure the
monitor failure detection. For the primePredicate tes-
sellation, 13.3% of states are failure states, and for the orPredicate, 26.4% are failure states.
The last two rows in TABLE I measure the effectiveness of the generated healing strategies. For in-
stance, the mean value for correct strategies in the
tessellation of the primePredicate has a high preci-
sion, with a low standard deviation. This leads us
to conclude that the monitor is precise and accurate
when generating and executing healing strategies.
At first glance, the result for the orPredicate tessel-
lation seems contradictory. For example, iteration
3 shows a healing effectiveness of 23%, with a
mean value of 55.7% for the five iterations, and
a standard deviation of 16.98%. A possible reason for these values comes from the density of failure states in each scenario. Fig. 3 shows the GUI covered by tessellations with different distributions of failure states. The orPredicate tessellation has three specific regions containing all failure states; in such failure-dense regions, learning corrective strategies takes longer. In contrast, the primePredicate tessellation has a better propagation of the reward due to the many correct regions surrounding the failure regions. Therefore, the number of learning steps for the system should be defined by users, based on the system complexity and the interaction with failure states observed by the monitors.
B. DeltaIoT Exemplar
The DeltaIoT exemplar is a multi-hop LoRa communication network composed of 25 IoT motes deployed in different physical locations [15]. DeltaIoT is used as an exemplar to evaluate different self-adaptation strategies to manage the motes' trade-off between battery consumption and packet delivery.²
1) Experimental Design: The DeltaIoT project³ provides a simulation environment of a LoRa network, with simulated behavior for the battery consumption and packet handling of the different motes, as a subset of the real-world implementation. In the simulation, there are 96 execution steps; in each step, every (Mote, Link) combination is traversed. During each step, the system uses
predefined adaptations to improve the PacketLoss
and EnergyConsumption metrics for the motes.
To apply our approach in the DeltaIoT exemplar,
we remove the prescribed adaptations and define
the execution steps of the simulation using reactive
signals. The main signal represents the execution of
the simulation, in which an event is a (Mote,Link)
combination. Building upon signals, we establish
the monitoring system. Monitors leverage the ac-
tivation strategy embedded in the simulation. In
particular, we create a predicate, analyzeLinkSettings, measuring the links' state with respect to the two metrics. When the predicate is satisfied, it trig-
gers the deployment of the variations defining the
learned healing strategies. The monitor, guided by
a learning model, efficiently manages healing adap-
tation strategies, ensuring effective surveillance of
² undisclosed repo url
³ https://people.cs.kuleuven.be/~danny.weyns/software/DeltaIoT
TABLE I: Experimental iterations results

                        orPredicate                              primePredicate
Iteration               1       2       3       4       5        1       2       3       4       5
Faults detected         18122   18390   18127   18404   18048    20786   20652   20594   20712   20805
Faults proportion       18.1%   18.3%   18.1%   18.4%   18.0%    20.7%   20.6%   20.4%   20.7%   20.8%
Correct strategies      649     797     309     686     843      1937    1786    1677    1808    1842
Healing effectiveness   55%     67.8%   23.6%   58.4%   71.8%    92.6%   85.4%   80.2%   86.5%   88.1%
the simulation, as described in Section III. While
we reimplemented the use of monitors and simu-
lation events, our experiment mirrors the original
simulation. The Scala extension to the project is
modularly independent from the original DeltaIoT
implementation.
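As an illustration of the predicate described above, the following is a hypothetical sketch of an analyzeLinkSettings-style predicate in the Signal-expression style of Snippet 1; the signal names (packetLoss, energyConsumption) and the thresholds are illustrative assumptions, not values taken from the DeltaIoT exemplar.

// Hypothetical sketch: flag a link whose packet loss or energy consumption
// exceeds a threshold; names and thresholds are illustrative only.
val analyzeLinkSettings =
  (packetLoss: Signal[Double], energyConsumption: Signal[Double]) => Signal[Boolean] {
    packetLoss() > 0.10 || energyConsumption() > 12.0
  }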
In the learning phase, every time a monitor’s
predicate is triggered, the user chooses the cor-
rective actions for the energy use or path in the
network configuration. We run DeltaIoT for 96
execution steps, as training of our RL model (the
original execution steps of the simulation). Once variations are learned, we run the simulation for 54 additional steps, which do not depend on the specific learning hooks. During these steps, the learning model drives the execution of healing strategies in the system based on the learned behavior. In this experiment, managing mote energy and packet loss at par with the prescribed adaptations from DeltaIoT is considered a success for our solution.
2) Results: The results show that the strategies learned from the application usage coincide with the prescribed adaptations in the original DeltaIoT simulation [15]. Figures 4 and 5 display the results for the packet loss and power consumption adaptation strategies, respectively. The behavior of our approach with respect to packet loss (Fig. 4) matches that of the original simulation; the mean packet loss in our solution is slightly higher than that in the original implementation. This might be
due to the higher rate of lost packets in the first
steps of the simulation, while the algorithm is still
learning. However, towards the last steps, once
learned variations are in place, we observe a far
lower rate of lost packets. The energy consumption case (Fig. 5) presents a similar situation, with a slight improvement seen using our approach.
The results allow us to conclude that learning the
adaptation strategies is as effective as using the
predefined adaptations in reaching the system’s
objective, with the added value of not having to
provide the pre-defined adaptations or the hooks to
monitor and apply them. Additionally, Fig. 6 shows
the life-long learning and continuous behavior of
our proposal. Here, the learned strategies continue
working without the need for any particular predefined self-healing strategy, and the system can continue learning more strategies.
TABLE II shows the behavior of both implemen-
tations from the perspective of detected faults and
the corrective actions applied. These results show that the system behaves as expected, without the need for particular predefined hooks in the application.
Fig. 4: Packet loss values for both simulation executions
Fig. 5: Energy consumption values for both simulation executions
Fig. 6: Power consumption values for the new implementation over a long run
TABLE II: Event detection, DeltaIoT vs. our framework

                       Proposed solution   DeltaIoT
Total events           603                 603
Correct strategies     411                 420
V. Related Work
This section presents a comparison between re-
lated self-healing approaches, existing COP solu-
tions using RL, and our proposed approach.
A. Self-Healing
Early work in this field focused on a self-
stabilizing system where the system reaches a le-
gitimate state in a finite number of steps regardless
of its initial state [9]. Following similar defini-
tions, several approaches have been proposed to
define self-healing systems. For example, a classifi-
cation by research area is presented by Psaier et al.
[19]. Here the authors identify various implemen-
tations of self-healing systems according to their
application domain: embedded systems, operating
systems, reflective middleware, and web services,
among others. Koopman [17] presents a set of
general concepts that all self-healing systems have
in common, regardless of their application domain.
In this sense, the system must have a fault model with complete knowledge of the faults it is expected to self-heal from, including some information regarding fault duration, fault manifestation, and fault source. Besides, the system must be complex enough to respond to the fault, meaning that proper mechanisms for fault detection, response, and recovery must be considered. Furthermore, self-healing approaches must evaluate system and architectural completeness: they must account for how the system behaves over time and handle the various changes in the system architecture that may occur at run time. Finally, Koopman [17] proposes that the design context shapes the self-healing capabilities in a particular way.
Concerning architectural completeness, before any self-healing can occur, a significant
infrastructure must be put in place to support fault
detection and repair [7]. Specifically, the system
must be built using a framework that provides: run-
time adaptation, a language to express the repair
plan, and an agent to execute the repair.
Usually, self-healing systems are tightly integrated with the application, granting the systems the ability to deal with failures upon detection. Nonetheless, externalization mechanisms have been proposed [12], where the self-healing system is untangled from the base application using an architectural model approach to design the monitoring, problem detection, and repair; i.e., the self-healing system works on top of the base application, understanding what the running system is doing in high-level terms. Following this idea, Al-Zawi et al. [34] propose that these systems are suitable for
continuous learning. The system could learn from
its run-time behavior to improve its capabilities.
Specifically, the case study of that work presents a client-server application in which the self-healing system is based on a PRNN (Pipelined Recurrent Neural Network), where the parameters are the server status and the bandwidth level. Another
example of learning capacity for self-healing sys-
tems [6] uses a Multivariate Diagram and a Naive
Bayes Classifier to determine severity levels and
infer possible consequences. In this approach, the
model continuously changes its parameters based
on the healing process. RL has also been used for this purpose. For instance, Schneider et al. [28] show self-healing systems' capabilities in the area of LTE, implemented using an RL scheme. Similarly, Razavi et al. [21] propose a
self-optimization solution of coverage and capacity
in LTE networks using fuzzy RL while operating completely autonomously in a fully distributed
environment. Additional research efforts exist to
define self-healing systems over networks [1], [16],
[24], using similar techniques to the ones described
before.
Moreover, Zhao et al. [36] propose a reinforcement
learning-based framework for the generation and
evolution of adaptation rules in software-intensive
systems. The framework involves an offline learn-
ing phase, similarly to our proposal, to generate
adaptation rules for various goal settings and an
online adaptation phase to use and dynamically
evolve these rules. It addresses the limitations of
traditional rule-based adaptation, such as guaran-
teeing optimal adaptation and supporting rule evo-
lution to cope with non-stationary environments
and changing user goals at runtime. Similar to our
approach of learning actions at run time to heal
the system through adaptations, Zhang et al. [35]
introduce self-learning adaptive systems (SLAS)
and underscore the significance of acquiring high-
performance adaptation policies for dynamic sce-
narios. They propose a new method called meta reinforcement learning adaptive planning (MeRAP) for the online adaptation of SLAS, which sepa-
rates concerns related to adaptation policy, mod-
els environment-system dynamics, and employs a
meta reinforcement learning algorithm for offline
training and online adaptation.
B. Combining COP and Learning Approaches
Our self-healing approach incorporates the Q-
learning reinforcement learning approach, enabling
the system to define healing decisions automati-
cally. Then, when the system detects a failure state,
this healing strategy is executed automatically us-
ing context-aware variations of the defined meth-
ods.
Similarly, Cardozo et al. [2] propose to have no predefined adaptations, using a proof-of-concept to illustrate this idea while showing the challenges
of implementing such systems. For instance, the
systems should be capable of integrating new el-
ements, data sources, atomic actions, and goals at
run time. Building on that, Auto-COP [4] uses RL to
build action sequences based on previous instances
of the system execution.
As shown in Section II-B, COP is used to generate
dynamic behavioral responses to the system execu-
tion context; nevertheless, due to different simul-
taneous sensed situations, many adaptations could
be applicable, which leads to conflicts in the system's execution. An automated conflict resolution mechanism has been proposed [5], where W-Learning (an RL algorithm) is used to capture the relationships between simultaneously proposed adaptations over time, updating their appropriateness as the system progresses.
Based on the review of the current state of the
art, it is possible to conclude that:
•All self-healing systems must present the same
general properties.
•Learning approaches exist to manage self-healing systems, but there is no consensus on a technique, though RL looks promising.
•The application domain seems to be restricted to distributed systems.
We note that all current self-healing systems must have previous knowledge of the possible fault states – that is, they only deal with known unknowns. Finally, to the authors' best knowledge, there is no work done on self-healing for centralized systems or reactive applications.
VI. Conclusion and Future Work
There are two principal concerns when design-
ing a self-healing system: detection and healing.
The main challenges behind these concerns are the
detection of fault states, the definition of healing
strategies, and the definition of hooks for adapta-
tion to introduce correct(ive) behavior. This paper proposes a framework to deal with the complexity of real-world self-healing reactive applications. The framework introduces monitors, which are component abstractions defined to deal with the challenges presented above. Monitors encapsulate the com-
plexity of fault detection and adaptation definition
and introduction in self-healing systems. Monitors
enable developers to customize detection strategies
and modularly observe the system at different gran-
ularity levels throughout its evolution. With this, the
design and implementation of self-healing systems
becomes more straightforward.
The two key features of the proposed framework
are: (1) A self-healing process driven by flexible
monitors defining fault detection at any granularity
level, and applicable at any program location or
moment in execution without predefined hooks.
(2) No need for predefined healing strategies. Our
framework is able to learn healing strategies from
the system’s execution, generating sequences of
actions that take the system from fault states to valid
ones.
The decrease in the complexity and difficulty of detecting and handling failures in massive reactive systems, such as those of streaming companies, makes this proposal appealing, since monitors could be placed inside any existing service, automating the on-line recovery from errors while avoiding the process of defining and managing complex healing strategies.
We demonstrate the feasibility of our approach
with a prototypical reactive application for tracking
mouse movements in which several signals are
generated using the mouse movement information.
We introduce monitors to dictate fault and valid states of the system, automatically generating action sequences to move from fault states to valid ones. With this, we expand the applicability of self-
healing systems to the domain of reactive systems.
We further demonstrate the effectiveness of our
framework by applying it to the DeltaIoT self-
adaptation exemplar. In this application we demon-
strate how the system is able to generate healing
strategies that behave at par with those originally
prescribed in the system. Furthermore, our model
can continue learning to generate healing strategies
for situations not previously conceived.
As avenues for future work, we propose defin-
ing communication strategies between monitors
to enhance global healing strategies using local
strategies, all without the need for centralized or-
chestration. In alignment with this objective, we
suggest designing distributed reactive monitors as
a technique for generating comprehensive global
healing strategies.
Additionally, the learning phase is a time- and resource-intensive process. However, in this work, our primary focus is not on addressing these resource concerns, but rather on highlighting the capability to generate effective healing strategies without the need for predefined hooks. For future
work, we propose validating the efficiency of this
solution against predefined self-healing strategies,
taking into account the resource implications of the
training phase as a determining factor.
Acknowledgments
This work was sponsored, in part, by the Sci-
ence Foundation Ireland (SFI) under Grant No.
18/CRT/6223 (Centre for Research Training in Arti-
ficial Intelligence), and SFI Frontiers for the Future
project Clearway Grant No. 21/FFP-A/8957.
References
[1] T. Angskun, G. Fagg, G. Bosilca, J. Pješivac-
Grbović, and J. Dongarra, Self-healing net-
work for scalable fault-tolerant runtime envi-
ronments, Future Generation Computer Sys-
tems, vol. 26, pp. 479–485, 2010.
[2] N. Cardozo and I. Dusparic, Generating soft-
ware adaptations using machine learning,
Workshop on Machine Learning for Progra-
mming Languages, 2018, pp. 1–2.
[3] N. Cardozo and I. Dusparic, Next genera-
tion context-oriented programming: Embrac-
ing dynamic generation of adaptations, Jour.
of Object Technology, vol. 21, pp. 1–6, 2022.
[4] N. Cardozo and I. Dusparic, Auto-COP:
Adaptation generation in context-oriented
programming using reinforcement learning
options, Information and Software Technol-
ogy, vol. 164, 2023.
[5] N. Cardozo, I. Dusparic, and J. H. Castro,
Peace corp: Learning to solve conflicts between
contexts, Proc. of the 9th Intl. Workshop on
Context-Oriented Programming, 2017, pp. 1–
6.
[6] Y. Dai, Y. Xiang, and G. Zhang, Self-healing
and hybrid diagnosis in cloud computing,
Cloud Computing, Springer Berlin Heidel-
berg, 2009, pp. 45–56, isbn: 978-3-642-10665-1.
[7] E. M. Dashofy, A. Van der Hoek, and
R. N. Taylor, Towards architecture-based self-
healing systems, Proc. of the first workshop
on Self-healing systems, 2002, pp. 21–26.
[8] F. M. David and R. H. Campbell, Building
a self-healing operating system, Third IEEE
Intl. Symp. on Dependable, Autonomic and
Secure Computing (DASC 2007), IEEE, 2007,
pp. 3–10.
[9] E. W. Dijkstra, Self-stabilizing systems in
spite of distributed control, Commun. ACM,
vol. 17, pp. 643–644, 1974.
[10] B. Dundar, M. Astekin, and M. S. Aktas, A
big data processing framework for self-healing
internet of things applications, Intl. Conf. on
Semantics, Knowledge and Grids, IEEE, 2016,
pp. 62–68.
[11] C. Elliott and P. Hudak, Functional reactive
animation, Intl. Conf. on Functional Progra-
mming, 1997.
[12] D. Garlan and B. Schmerl, Model-based adap-
tation for self-healing systems, Proc. of the
First Workshop on Self-Healing Systems, As-
sociation for Computing Machinery, 2002,
pp. 27–32, isbn: 1581136099.
[13] S. Girgin and F. Polat, Option discovery in re-
inforcement learning using frequent common
subsequences of actions, Intl. Conf. on Com-
putational Intelligence for Modelling, Control
and Automation and Intl. Conf. on Intelli-
gent Agents, Web Technologies and Internet
Commerce (CIMCA-IAWTIC’06), IEEE, 2005,
pp. 371–376.
[14] R. Hirschfeld, P. Costanza, and O. Nierstrasz,
Context-oriented programming, Jour. of Ob-
ject technology, vol. 7, pp. 125–151, 2008.
[15] M. U. Iftikhar, G. S. Ramachandran, P. Bol-
lansée, D. Weyns, and D. Hughes, DeltaIoT:
A self-adaptive Internet of Things exemplar,
2017 IEEE/ACM 12th Intl. Symp. on Software
Engineering for Adaptive and Self-Managing
Systems (SEAMS), IEEE, 2017, pp. 76–82.
[16] R. Kawamura, K.-I. Sato, and I. Tokizawa,
Self-healing atm networks based on virtual
path concept, IEEE Jour. on Selected Areas in
Communications, vol. 12, pp. 120–127, 1994.
[17] P. Koopman, Elements of the self-healing sys-
tem problem space, 2003.
[18] H. Naccache and G. C. Gannod, A self-
healing framework for web services, IEEE Intl.
Conf. on Web Services (ICWS 2007), IEEE,
2007, pp. 398–345.
[19] H. Psaier and S. Dustdar, A survey on self-
healing systems: Approaches and systems,
Computing, vol. 91, pp. 43–73, 2010.
[20] J. Randlov, Learning macro-actions in rein-
forcement learning, Advances in Neural In-
formation Processing Systems, vol. 11, 1998.
[21] R. Razavi, S. Klein, and H. Claussen, Self-
optimization of capacity and coverage in lte
networks using a fuzzy reinforcement learn-
ing approach, IEEE Intl. Symp. on Personal,
Indoor and Mobile Radio Communications,
2010, pp. 1865–1870.
[22] G. D. Rodosek, K. Geihs, H. Schmeck, and B.
Stiller, Self-healing systems: Foundations and
challenges., Self-Healing and Self-Adaptive
Systems, 2009.
[23] R. S. Sutton and A. G. Barto, Reinforcement Learning:
An Introduction. A Bradford Book, The MIT
Press, Cambridge, Massachusetts, 1998.
[24] A. Saeed, O. G. Aliu, and M. A. Imran, Con-
trolling self healing cellular networks using
fuzzy logic, IEEE Wireless Communications
and Networking Conf., 2012, pp. 3080–3084.
[25] G. Salvaneschi, C. Ghezzi, and M. Pradella,
Context-oriented programming: A software
engineering perspective, Jour. of Systems and
Software, vol. 85, pp. 1801–1817, 2012.
[26] G. Salvaneschi, C. Ghezzi, and M. Pradella,
Contexterlang: A language for distributed
context-aware self-adaptive applications, Sci-
ence of Computer Programming, vol. 102,
pp. 20–43, 2015.
[27] G. Salvaneschi, G. Hintz, and M. Mezini,
Rescala: Bridging between object-oriented and
functional style in reactive applications, Proc.
of the 13th Intl. Conf. on Modularity, 2014,
pp. 25–36.
[28] C. Schneider, A. Barker, and S. Dobson, A
survey of self-healing systems frameworks,
Software: Practice and Experience, vol. 45,
pp. 1375–1398, 2015.
[29] M. Stolle and D. Precup, Learning options
in reinforcement learning, Intl. Symp. on ab-
straction, reformulation, and approximation,
Springer, 2002, pp. 212–223.
[30] R. S. Sutton, D. Precup, and S. P. Singh, Intra-
option learning about temporally abstract ac-
tions., ICML, 1998, pp. 556–564.
[31] A. Trehan, Algorithms for self-healing net-
works, CoRR, vol. abs/1305.4675, 2013.
[32] Z. Wan and P. Hudak, Functional reactive
programming from first principles, Proc. of
the Conf. on Programming language design
and implementation, 2000, pp. 242–252.
[33] D. Weyns, Software engineering of self-
adaptive systems: An organised tour and
future challenges, Chapter in Handbook of
Software Engineering, p. 2, 2017.
[34] M. M. Al-Zawi, A. Hussain, D. Al-Jumeily,
and A. Taleb-Bendiab, Using adaptive neural
networks in self-healing systems, 2009 Second
Intl. Conf. on Developments in eSystems En-
gineering, 2009, pp. 227–232.
[35] M. Zhang, J. Li, H. Zhao, K. Tei, S. Honiden,
and Z. Jin, A meta reinforcement learning-
based approach for self-adaptive system, 2021
IEEE Intl. Conf. on Autonomic Computing
and Self-Organizing Systems (ACSOS), IEEE,
2021, pp. 1–10.
[36] T. Zhao, W. Zhang, H. Zhao, and Z. Jin, A
reinforcement learning-based framework for
the generation and evolution of adaptation
rules, 2017 IEEE Intl. Conf. on Autonomic
Computing (ICAC), IEEE, 2017, pp. 103–112.