Task Allocation in Industrial Edge Networks with Particle Swarm
Optimization and Deep Reinforcement Learning
Philippe Buschmann
philippe.buschmann@siemens.com
Siemens AG
Munich, Bavaria, Germany
Technical University of Munich
Garching, Bavaria, Germany
Mostafa H. M. Shorim
shorim@net.in.tum.de
Siemens AG
Munich, Bavaria, Germany
Technical University of Munich
Garching, Bavaria, Germany
Max Helm
helm@net.in.tum.de
Technical University of Munich
Garching, Bavaria, Germany
Arne Bröring
arne.broering@siemens.com
Siemens AG
Munich, Bavaria, Germany
Georg Carle
carle@net.in.tum.de
Technical University of Munich
Garching, Bavaria, Germany
ABSTRACT
To avoid the disadvantages of a cloud-centric infrastructure, next-generation industrial scenarios focus on using distributed edge networks. Task allocation in distributed edge networks with regard to minimizing the energy consumption is NP-hard and requires considerable computational effort to obtain optimal results with conventional algorithms like Integer Linear Programming (ILP). We extend an existing ILP problem, including an ILP heuristic, for multi-workflow allocation and propose a Particle Swarm Optimization (PSO) and a Deep Reinforcement Learning (DRL) algorithm. PSO and DRL outperform the ILP heuristic with median optimality gaps of 7.7 % and 35.9 % against 100.4 %. DRL has the lowest upper bound for the optimality gap. It performs better than PSO for problem sizes of more than 25 tasks, and PSO fails to find a feasible solution for more than 60 tasks. The execution time of DRL is significantly faster, with a maximum of 1 s in comparison to PSO with a maximum of 361 s. In conclusion, our experiments indicate that PSO is more suitable for smaller and DRL for larger sized task allocation problems.
CCS CONCEPTS
• Theory of computation → Scheduling algorithms; • Computing methodologies → Heuristic function construction; Model verification and validation.
KEYWORDS
Edge Computing, Internet of Things (IoT), Task Allocation, Integer Linear Programming, Deep Reinforcement Learning, Particle Swarm Optimization
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
IoT ’22, November 7–10, 2022, Delft, Netherlands
©2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9665-3/22/11. . . $15.00
https://doi.org/10.1145/3567445.3571114
ACM Reference Format:
Philippe Buschmann, Mostafa H. M. Shorim, Max Helm, Arne Bröring,
and Georg Carle. 2022. Task Allocation in Industrial Edge Networks with
Particle Swarm Optimization and Deep Reinforcement Learning. In Proceed-
ings of the 12th International Conference on the Internet of Things (IoT ’22),
November 7–10, 2022, Delft, Netherlands. ACM, New York, NY, USA, 9 pages.
https://doi.org/10.1145/3567445.3571114
1 INTRODUCTION
Today’s prevailing cloud-centric Internet of Things (IoT) model has limitations [10], e.g., (i) unreliable cloud connectivity, (ii) limited bandwidth, and (iii) high round-trip times. Therefore, next-generation IoT applications, such as autonomous guided vehicles, AR/VR, or industrial robots/drones, require advanced IoT environments that comprise heterogeneous devices, which collaboratively execute these IoT applications [15].
In this work, we focus on the industrial edge computing infrastructure as an advanced IoT environment. Users of industrial edge networks can scale resources horizontally on multiple local network nodes. This allows moving workflows and applications from the cloud to local nodes, which can reduce latency and can increase throughput and reliability [2]. Further, this can enhance the privacy and security of the workflow [5].
One disadvantage of the industrial edge is the complexity of placing workflows on a distributed network. A workflow can be constrained by Quality of Service (QoS) objectives like latency or energy consumption and consists of one or multiple tasks that communicate with each other. Thus, a random placement of the tasks of a workflow on network nodes might not satisfy the QoS constraints of the workflow. To optimize the QoS constraints while adhering to the physical boundaries of the network, it is necessary to design specific algorithms for solving the task allocation problem. This problem is known to be NP-hard [1] and can be solved by either optimal or heuristic approaches.
Optimal Approach. To allocate tasks optimally on the nodes of an edge network, we can use well-known optimization techniques like ILP and Optimization Modulo Theories (OMT). These methods achieve an optimal solution [1, 19], but scale poorly in large networks with many applications [19]. For example, the ILP model in [19] can take up to a week of computational time to find the
optimal allocation of 50 tasks on 20 nodes in a network with the
objective of minimizing the total energy-consumption [19].
Heuristic Approach. Heuristics and meta-heuristics only approximate the optimal solution and usually scale better than optimal approaches [20, 22]. Some well-known heuristics are the Genetic Algorithm (GA) and PSO [20, 22]. Since these algorithms only approximate the optimal solution, they may not satisfy the QoS constraints perfectly or may perform worse than the optimal solution; the difference is generally known as the optimality gap [1].
In this work, we investigate both orchestration methods for optimal and approximated workflow allocation. We extend an existing ILP model to optimize the allocation of multiple workflows with regard to the capabilities of tasks and nodes, and we implement and evaluate a PSO and a DRL approach for the approximated allocation of tasks. Since the energy consumption of networked IoT devices and its environmental impact gain more awareness [16], our objective is to minimize the energy consumption of the overall network. In other words, we focus on the energy consumption of all allocated and executed tasks on the devices and their energy consumption for communication in between.
The main contributions are as follows:
• We extend an ILP model (Section 4) which allocates tasks of multiple workflows optimally on a network with energy consumption as the cost function. In contrast to previous work [19], this eliminates the bias towards previously allocated workflows when placing one workflow at a time. Furthermore, we implement a constraint which limits the allocation of tasks on nodes based on their capabilities.
• We define and implement a PSO approach for task allocation. Then, we evaluate the approach against the ILP. We show that PSO outperforms the previously proposed heuristic in [19], but with a trade-off in time consumption.
• We implement, train, and evaluate a DRL model using Proximal Policy Optimization (PPO). The trained model has a lower optimality gap in comparison to PSO and the extended version of the heuristic in [19].
In Section 2, we define the task allocation problem in edge computing networks and explain some optimization approaches in detail. Next, we identify similarities and differences in related work in Section 3. In Section 4, we describe our approaches and techniques in our methodology and evaluate them in Section 5. Last, we conclude our work in Section 6.
2 BACKGROUND
First, we define the task allocation problem used in this work. To optimize this problem, we use Integer Linear Programming (ILP), Particle Swarm Optimization (PSO), and Deep Reinforcement Learning (DRL). We introduce each method and algorithm and explain the approaches to find a solution for the optimization problem. We start with a description of ILP problems and explain the algorithms which allow solving ILP problems. Then, we describe the PSO algorithm. Last, we give an overview of DRL and Proximal Policy Optimization (PPO).
2.1 Problem Definition
In this work, we focus on optimizing the allocation of multiple workflows on a network. Therefore, we define all parts of this optimization problem. We visualize the problem in Figure 1. In the context of this work, a workflow is a connected and directed acyclic graph whose nodes we call tasks. A task can be, e.g., an application, a Docker container, or a micro-service. In our model, tasks cannot be split across multiple nodes in a network; they run indefinitely and do not terminate. They require a static amount of resources like Central Processing Unit (CPU) cycles, Random Access Memory (RAM), and storage, which are visualized as squares. Tasks communicate with other tasks in the workflow and transmit data at a specific rate (transmission output) to other tasks. A task can require a specific capability, e.g., a graphical user interface or access to sensors and actuators, which further limits the allocation possibilities in a network.
Another component in our model is the network, which is a bidirectional and connected graph. The network consists of one or more nodes which are connected to other nodes. Each node provides resources like CPU, RAM, and storage, which can be zero if no resources are available anymore. In Figure 1, Node 4 accommodates Task 3 and has no resources left for other tasks. Other nodes can still provide all their resources. Each node can offer capabilities which can be used by tasks. These can be, e.g., a temperature sensor, a specific machine, or an API which is required by a task.
Our first objective is to allocate the tasks of multiple workflows to the nodes of the network. Tasks can only be placed on nodes with enough resources to accommodate them. One node can accommodate multiple tasks if enough resources are available on that node. Tasks can only be allocated and executed on nodes that provide the required capabilities.
Our second objective is to allocate tasks on the network with regard to a specific cost function. In general, this cost function can be any QoS constraint. Like [19], we focus on minimizing the energy consumption of the network. Thus, we define the total energy consumption as the sum of the utilization of nodes (if they accommodate tasks) and the utilization of connections between the nodes (if used to transmit data between tasks). To minimize the energy consumption, we can place tasks on nodes that use their resources more efficiently and allocate tasks on nodes close to each other. We show this possibility in Figure 1, where Task 4 can be placed on either Node 2 or 3. In this case, Node 3 offers a lower energy consumption per resource.
In summary, we need to allocate tasks on a network of nodes so that we use the least amount of energy summed over all devices and network connections while finding a feasible solution with regard to resource consumption and capabilities.
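To make the problem definition concrete, the following minimal sketch models tasks, nodes, and workflows as plain data structures; the class and field names are illustrative and not the exact representation used in our implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    cpu: int                       # required CPU cycles
    ram: int                       # required RAM
    storage: int                   # required storage
    transmission_output: float     # data rate sent to successor tasks
    capability: str | None = None  # e.g., "temperature-sensor", if required

@dataclass
class Node:
    cpu: int                       # available CPU cycles
    ram: int                       # available RAM
    storage: int                   # available storage
    energy_per_cycle: float        # energy consumption per CPU cycle
    capabilities: set[str] = field(default_factory=set)

@dataclass
class Workflow:
    tasks: list[Task]
    edges: list[tuple[int, int]]   # directed acyclic graph over task indices
```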
2.2 Integer Linear Programming
In this work, we define the problem of Section 2.1 as an ILP problem. In general, Linear Programming (LP), Integer Linear Programming (ILP), and Mixed Integer Linear Programming (MILP) are methods to formulate an optimization problem. If it is possible to formulate the problem as a (mixed) (integer) linear program, we can apply
the respective algorithm for optimization.

Figure 1: Visualization of the task allocation problem. (Workflows of tasks with resource requirements allocated onto a network of nodes that provide resources and consume energy.)
$$\max \; cx \quad \text{subject to} \quad Ax \le b, \quad x \ge 0 \text{ integral} \tag{1}$$

Based on Conforti et al. [3], Equation 1 shows a pure integer linear program with row vector $c = (c_1, \dots, c_n)$, $m \times n$ matrix $A = (a_{ij})$, column vector $b = (b_1, \dots, b_m)^T$, and column vector $x = (x_1, \dots, x_n)^T$, where $x$ is integral if $x \in \mathbb{Z}^n$. The goal is to optimize each element of $x$ so that the result $cx$ is maximized.
If $x \in \mathbb{R}^n$, the problem is called a linear program. If, as shown in Equation 2, we add a real-valued vector $y \ge 0$ next to the integral vector $x$, the solution combines components in $\mathbb{Z}$ and $\mathbb{R}$, and the problem is therefore called a mixed integer linear program. Solving a (mixed) (integer) linear program returns the optimal solution for the problem.

$$\max \; cx + dy \quad \text{subject to} \quad Ax + By \le b, \quad x \ge 0 \text{ integral}, \quad y \ge 0 \tag{2}$$
It is important to specify whether a problem can be formulated as an integer linear program or a linear program, because solving integer linear programs is generally known to be difficult in comparison to linear programs [3]. If we remove the integer constraint and allow $x \in \mathbb{R}^n$, the problem is numerically easier to solve with LP [3]. This is known as linear relaxation. However, even though linear relaxations facilitate solving (mixed) integer linear programs, they only approximate the (integer-bound) solution because they may find solutions which are not feasible for the integer-bound problem [3].
The task allocation problem in this work is formulated as an ILP problem. Therefore, we apply solvers that use the branch-and-bound and/or the cutting plane algorithm. The branch-and-bound algorithm solves linear programs based on the linear relaxation of a (mixed) integer linear program. The bound part of the algorithm removes infeasible solutions and solutions with a worse objective value. If the solution is feasible and improves the current best optimum, the algorithm uses the branch part to explore linear programs with different bounds. The cutting plane algorithm (iteratively) adds cuts (e.g., upper bounds) to the solution space to strengthen a linear program. The strengthened linear program may result in an integer solution. The cutting plane and branch-and-bound algorithms can be combined, which is known as the branch-and-cut algorithm. We refer to Conforti et al. [3] for a more detailed explanation.

Symbol       Description
$v_i^t$      Velocity of particle $i$ at iteration $t$
$x_i^t$      Position of particle $i$ at iteration $t$
$w$          Decay of velocity
$c_1, c_2$   Time-varying coefficients
$r_1, r_2$   Random values $\in [0, 1]$
$pbest_i$    Personal best solution of particle $i$
$gbest$      Global best solution

Table 1: Description of symbols in the PSO equations.
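To make the branch-and-cut workflow above concrete, the following minimal sketch (a toy problem, unrelated to the task allocation model of Section 4) formulates and solves a two-variable integer program with the Python library pulp, whose default CBC backend applies branch-and-cut:

```python
import pulp

# Toy integer program: maximize c*x subject to A*x <= b with x >= 0 integral.
prob = pulp.LpProblem("toy_ilp", pulp.LpMaximize)
x1 = pulp.LpVariable("x1", lowBound=0, cat="Integer")
x2 = pulp.LpVariable("x2", lowBound=0, cat="Integer")

prob += 3 * x1 + 2 * x2            # objective c*x
prob += 2 * x1 + x2 <= 10          # one row of A*x <= b
prob += x1 + 3 * x2 <= 15          # another row of A*x <= b

prob.solve()                       # CBC applies branch-and-cut by default
print(pulp.LpStatus[prob.status], x1.value(), x2.value())
```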
2.3 Particle Swarm Optimization
In contrast to finding the optimal solution with ILPs, we can approach the problem with the help of meta-heuristics like PSO. Even though the solution may not be optimal, we can reduce the time and energy consumption, as shown in Section 5. The PSO algorithm, originally developed by Kennedy et al. [13] in 1995, is a meta-heuristic which searches a solution space for local and global optima. It is population-based and therefore similar to the GA [20], the Ant Colony Optimization (ACO) algorithm [7], and the Artificial Bee Colony (ABC) algorithm [12]. As a meta-heuristic, it does not guarantee that it finds a feasible or optimal solution (see results in Section 5).
The PSO algorithm uses a number of particles to iteratively search a solution space. Each particle $i$ has a position vector $x_i$ and a velocity vector $v_i$, which are both updated in each iteration. At the start, all particles are initialized at a random position $x_i^0$ and with a random velocity $v_i^0$. We show the calculation of the velocity and the position in Equations 3 and 4 and describe the symbols in Table 1.

$$v_i^{t+1} = w \cdot v_i^t + c_1 \cdot r_1 \cdot (pbest_i - x_i^t) + c_2 \cdot r_2 \cdot (gbest - x_i^t) \tag{3}$$

$$x_i^{t+1} = x_i^t + v_i^{t+1} \tag{4}$$
Figure 2: Concept of DRL [14]. (The agent's DNN maps observations to a policy, the policy selects an action that influences the environment, and the resulting observation and reward feed back into training.)

We separate the velocity calculation in Equation 3 of particle $i$ at iteration step $t+1$ into three segments. The first segment $w \cdot v_i^t$ introduces the velocity $v_i^t$ of the previous iteration $t$ and multiplies it with the decay coefficient $w$. Coefficient $w$ limits the influence of the velocity of previous iterations. The second segment $c_1 \cdot r_1 \cdot (pbest_i - x_i^t)$ introduces the direction towards the personal best value of the particle, $pbest_i$, multiplied with a random value $r_1$ and a time-varying coefficient $c_1$. The third segment $c_2 \cdot r_2 \cdot (gbest - x_i^t)$ introduces the direction towards the global best value of all particles, multiplied with a random value $r_2$ and a time-varying coefficient $c_2$. The time-varying coefficients $c_1$ and $c_2$ influence the focus on the personal best values and on the global best values. We add the velocity $v_i^{t+1}$ to the previous position $x_i^t$ to get the new position $x_i^{t+1}$. This position is then used for the velocity update in the next iteration.
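As a small illustration of Equations 3 and 4, the following sketch performs one velocity and position update with NumPy; the array shapes and parameter values are only examples, not the implementation evaluated later.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1, c2, rng):
    """One PSO update for a single particle (Equations 3 and 4)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)   # random values in [0, 1]
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = x + v_new
    return x_new, v_new

rng = np.random.default_rng(0)
x, v = rng.random(5), rng.random(5)                      # random initial position and velocity
x, v = pso_step(x, v, pbest=x.copy(), gbest=x.copy(), w=0.9, c1=2.5, c2=0.5, rng=rng)
```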
The personal and global best values are calculated by the fitness function $F(x) = f(x)$ with the position vector $x$ as a possible solution. $f(x)$ is the objective function of the optimization problem, which returns the value of solution $x$. In addition to the objective function, we can introduce constraints to $F(x)$ by using Deb's approach [6], which is the most common approach in the literature [11]. We show this in Equation 5. We define the constraints as inequality constraints $g_j(x) \ge 0$ for $j = 1, \dots, n$, with $n$ being the total number of constraints. If the solution $x$ does not violate any constraint ($g_j(x) \ge 0$), PSO returns the objective function $f(x)$. If $x$ violates at least one constraint, we return the worst objective value $f_{\text{worst}}$ (e.g., the maximum energy consumed) plus the amount by which the constraints are violated ($g_j(x)$).

$$F(x) = \begin{cases} f(x), & \text{if } g_j(x) \ge 0 \;\; \forall j \in \{1, \dots, n\} \\ f_{\text{worst}} + \sum_{j=1}^{n} |g_j(x)|, & \text{otherwise} \end{cases} \tag{5}$$
2.4 Deep Reinforcement Learning
Another, more recent approach to solving optimization problems is DRL. We choose DRL because we can approach it similarly to PSO, we do not need labeled data as for supervised learning [4], and the neural network does not have to find a hidden structure as for unsupervised learning [14].
DRL can be split into agent, environment, action, observation, and reward [14]. Figure 2 shows the dependencies between these entities. We initialize an environment with a state that can be observed. This observation is used by the Deep Neural Network (DNN) to return a probability distribution over actions. The policy selects one action, which is then applied to the environment. Then, the agent can observe the environment again and use the policy to select the next action. This is repeated until a termination condition is met.
From the observation, we can calculate a reward value which can be used to train the DNN. We can calculate a reward by using, e.g., the objective function of the ILP problem and/or the fitness function of the PSO algorithm. In general, a DRL agent should aim to receive the highest reward for its actions.
Selecting an action that yields the highest reward depends highly on the policy [14]. We use PPO in this work. PPO was proposed by Schulman et al. [18] as a simpler alternative to Actor-Critic with Experience Replay (ACER) [21] with similar performance, and it outperforms the synchronous Advantage Actor-Critic (A2C) [17]. In this work, we use the MaskablePPO implementation, which disallows the use of invalid actions [9].
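For illustration, the interaction loop of Figure 2 can be sketched as follows; env and policy are placeholders for an environment with the classic gym-style interface and a trained policy, not our actual implementation.

```python
def run_episode(env, policy):
    """Generic agent-environment loop of Figure 2: observe, act, collect reward."""
    observation = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = policy(observation)                      # DNN + policy select an action
        observation, reward, done, info = env.step(action)  # action influences the environment
        total_reward += reward                            # reward signal used for training
    return total_reward
```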
3 RELATED WORK
Our work builds upon the previous work of Seeger et al. [19]. Seeger et al. focus on minimizing the energy consumption of a network by optimally allocating the tasks of a workflow on the edge. For that, they extend the framework defined by Cardellini et al. [1] and propose an ILP model for the optimal allocation and a second ILP model as a heuristic which approximates the network energy consumption. The heuristic reduces the complexity of the optimization to a non-quadratic assignment problem but leads to worse results in terms of network energy consumption. In our work, we extend both models to allow the optimization of the allocation of multiple workflows simultaneously. This removes the bias towards already allocated workflows when allocating them one by one. In addition, we add capabilities of nodes and tasks to avoid, e.g., mapping a sensor-reading task onto a non-sensor device.
In contrast to our ILP model, which is designed for an edge infrastructure, Skarlat et al. [20] define an ILP for a cloud and fog computing infrastructure with application deadlines. Their ILP objective function reduces the cost of running services on the cloud by placing as many services as possible on the fog network. In addition to the ILP model, Skarlat et al. use and evaluate an implementation of the GA as a heuristic. They find that the GA utilizes fewer fog resources than the optimal solution, which leads to an increase in cost of about 2.5. This is similar to the findings of You et al. [22], who compare the GA, simulated annealing, and PSO for task offloading in edge computing networks. Their model focuses on the task execution delay and computation performance and penalizes higher energy consumption. They identify that, in comparison to the other two algorithms, PSO converges faster, finds a solution with a lower energy consumption, and has a lower task execution delay, especially for a large number of nodes.
In addition to well-known heuristics, we also focus on machine learning approaches like DRL. Gao et al. [8] propose a DRL approach to offload multiple workflows on edge servers and user equipment. They minimize the energy consumption and completion time of the workflows with a multi-agent deep deterministic policy gradient algorithm. This algorithm yields the best values and terminates fastest in comparison to random offloading and DQN-based offloading. Zheng et al. [23] implement a DQN-based task offloading algorithm which focuses on minimizing the task failure rate by balancing the offloading between multiple edge servers.
4 METHODOLOGY
In our methodology, we show how we use and implement the
methods described in Section 2. First, we explain the extended
ILP
problem. Then, we propose parameters for
PSO
and use parts of
242
Task Allocation in Industrial Edge Networks with PSO and DRL IoT ’22, November 7–10, 2022, Del, Netherlands
Figure 3: Biased / suboptimal workow allocation when allo-
cating one by one.
the
ILP
problem denition as tness function. Last, we describe our
DRL approach with PPO.
4.1 Integer Linear Programming
We base our ILP problem on the work of Seeger et al. [19] and extend the ILP model to allow the allocation of multiple workflows (the set of workflows $W$). By allocating workflows one by one, we would have a bias towards previously allocated workflows or might, in the worst case, not find an existing feasible solution. Figure 3 shows a simplified example of such a worst case. We start with three simple workflows with one task each in the upper rectangle and a network of three nodes below. The required and available resources of the tasks and nodes are shown as the number of squares. Tasks cannot be split across multiple nodes. We assume that the node with four available resources has the lowest energy consumption.
If we start allocating the tasks from left to right (one by one), the algorithm allocates the task with three required resources to the node with the lowest energy consumption (in our simplified example the node with four available resources). If it does so, the allocation of all tasks is not feasible anymore, since the task with four required resources can only be placed on the node with four available resources. However, this node would be blocked by the task with three resources. If we solve the ILP problem for all workflows simultaneously, we avoid the scenario described in Figure 3.
The following equations describe our ILP model:

$$\min \; E_{\text{total}} \tag{6}$$

subject to:

$$\forall w \in W: \forall t_1, t_2 \in w: \forall n_1, n_2 \in N: \quad Y[t_1, t_2, n_1, n_2] \le X[t_1, n_1] \tag{7}$$

$$\forall w \in W: \forall t_1, t_2 \in w: \forall n_1, n_2 \in N: \quad Y[t_1, t_2, n_1, n_2] \le X[t_2, n_2] \tag{8}$$

$$\forall w \in W: \forall t_1, t_2 \in w: \forall n_1, n_2 \in N: \quad Y[t_1, t_2, n_1, n_2] \ge X[t_1, n_1] + X[t_2, n_2] - 1 \tag{9}$$

$$\forall w \in W: \forall t \in w: \quad \sum_{n \in N} X[t, n] = 1 \tag{10}$$

$$\forall w \in W: \forall t \in w: \forall n \in N: \quad X[t, n] \le F[t, n] \tag{11}$$

$$\forall n \in N: \quad \sum_{w \in W} \sum_{t \in w} X[t, n] \cdot R^{\text{RAM}}_t \le R^{\text{RAM}}_n \tag{12}$$

$$\forall n \in N: \quad \sum_{w \in W} \sum_{t \in w} X[t, n] \cdot R^{\text{CPU}}_t \le R^{\text{CPU}}_n \tag{13}$$

$$\forall n \in N: \quad \sum_{w \in W} \sum_{t \in w} X[t, n] \cdot R^{\text{Storage}}_t \le R^{\text{Storage}}_n \tag{14}$$

$$\sum_{w \in W} \sum_{t \in w} \sum_{n \in N} C_n \cdot (S_t / P_n) \cdot X[t, n] \le E_{\text{device}} \tag{15}$$

$$\sum_{w \in W} \sum_{t_1, t_2 \in w} \sum_{n_1, n_2 \in N} O_{t_1} \cdot D_{n_1, n_2} \cdot Y[t_1, t_2, n_1, n_2] \le E_{\text{network}} \tag{16}$$

$$E_{\text{device}} + E_{\text{network}} \le E_{\text{total}} \tag{17}$$
Our extended ILP model starts with the minimization of the total energy consumption as objective in Equation 6. We use $X$ and $Y$ as binary decision variables, as shown in Table 2, which indicate whether a task is allocated on a node and whether two tasks on two nodes communicate with each other. If task $t_1$ is allocated to node $n_1$, $X[t_1, n_1] = 1$, otherwise 0. If both $X[t_1, n_1]$ and $X[t_2, n_2]$ are equal to 1, $Y[t_1, t_2, n_1, n_2] = 1$ ($\ge 1 + 1 - 1$). Given only $X[t_1, n_1] = 1$, $Y[t_1, t_2, n_1, n_2]$ could be zero or one depending on $X[t_2, n_2]$. Therefore, $Y[t_1, t_2, n_1, n_2]$ is always smaller than or equal to both $X$ values. We use Equations 7-9 to define $Y$. We use Equation 10 to allocate a task $t$ exactly once on the network. This can be changed if we need redundancy of tasks as a QoS objective.
Equation 11 limits the placement of tasks to nodes which meet their required capabilities. If node $n$ offers the capability that is needed by task $t$, $F[t, n] = 1$, and 0 otherwise. If $F[t, n] = 0$, the task cannot be allocated on node $n$ and $X[t, n] = 0$.
Equations 12-14 constrain the sum of the required resources of all tasks on a node to at most the available resources of that node. We differentiate between CPU cycles, RAM, and storage as resources of our edge infrastructure.
Equation 15 defines the energy consumption of the devices, depending on the computation size of each task, the processing power of the node, and the energy consumption per CPU cycle. Equation 16 defines the network energy consumption, depending on the transmission output between two tasks and the energy cost of the path between the nodes to which the tasks are allocated. Equation 17 models the total energy consumption used as the minimization objective in this ILP. Table 2 explains the variables used in the ILP model in more detail.
Since our model is an extended version of the model of Seeger et al. [19], we use their approach to create an ILP heuristic. The main difference in this work is the addition that allows us to allocate multiple workflows with the heuristic. Further, we added the resource and capability constraints to the ILP heuristic.
To solve the ILP problem, we use the Python library pulp in combination with the IBM CPLEX solver.
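As an illustration of how the binary decision variable $X$ and the constraints of Equations 10 and 11 can be expressed with pulp, consider the following sketch; tasks, nodes, and the capability matrix $F$ are assumed to be given as hashable identifiers and nested dictionaries, and the full model additionally needs $Y$, the resource constraints, and the energy objective.

```python
import pulp

def build_assignment_model(tasks, nodes, F):
    """X[t][n] = 1 iff task t is allocated on node n (Equations 10 and 11 only)."""
    prob = pulp.LpProblem("task_allocation", pulp.LpMinimize)
    X = pulp.LpVariable.dicts("X", (tasks, nodes), cat="Binary")
    for t in tasks:
        prob += pulp.lpSum(X[t][n] for n in nodes) == 1   # allocate each task exactly once (Eq. 10)
        for n in nodes:
            prob += X[t][n] <= F[t][n]                     # capability constraint (Eq. 11)
    return prob, X
```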
Next, we transfer the ILP model and the constraints to the other approaches.
4.2 Particle Swarm Optimization
As described in Section 2, we use Deb's approach [6] to implement the constraints of the ILP problem. As a result, $f(x) = E_{\text{device}} + E_{\text{network}}$ from Equations 15 and 17. For $f_{\text{worst}}$, we calculate the maximum total energy consumption of the nodes and the network ($f_{\text{worst}} = f_{\max}$), as shown in Equations 18-20. In Equation 19, $tasks$ is the sum of the number of tasks over all workflows. In Equation 20, $edges$ is the sum of all links between tasks over all workflows.

Symbol                                      Description
$E_{\{\text{device},\text{network},\text{total}\}}$   Energy consumption of {all devices, all network links, or both}
$W$                                         Set of one or more workflows, each consisting of one or more tasks
$N$                                         Set of one or more nodes in a network
$X[t, n]$                                   1 iff task $t$ is allocated on node $n$; 0 otherwise
$Y[t_1, t_2, n_1, n_2]$                     1 iff the communication of tasks $t_1, t_2$ uses the network link between nodes $n_1, n_2$; 0 otherwise
$F[t, n]$                                   1 iff node $n$ provides the required capability of task $t$; 0 otherwise
$R^{\{\text{RAM},\text{CPU},\text{Storage}\}}_t$       Required amount of resources of task $t$
$R^{\{\text{RAM},\text{CPU},\text{Storage}\}}_n$       Available amount of resources on node $n$
$C_n$                                       Energy consumption of node $n$ per CPU cycle
$S_t$                                       Computation size of task $t$
$P_n$                                       Processing power of node $n$
$O_t$                                       Transmission rate / output of task $t$
$D_{n_1, n_2}$                              Energy consumption from node $n_1$ to node $n_2$

Table 2: Definitions of the variables used for the ILP.
$$f_{\max} = E^{\max}_{\text{device}} + E^{\max}_{\text{network}} \tag{18}$$

$$E^{\max}_{\text{device}} = tasks \cdot \max_{\forall n} C_n \cdot \left(\max_{\forall t} S_t / \min_{\forall n} P_n\right) \tag{19}$$

$$E^{\max}_{\text{network}} = edges \cdot \max_{\forall t} O_t \cdot \max_{\forall n_1, n_2} D_{n_1, n_2} \tag{20}$$
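A small sketch of this worst-case value, assuming lists S, O, C, P with the task computation sizes, task transmission outputs, per-cycle energy costs, and node processing powers, and a nested list D with the pairwise link energy costs:

```python
def f_worst(num_tasks, num_edges, S, O, C, P, D):
    """Worst-case total energy consumption used as f_worst (Equations 18-20)."""
    e_device_max = num_tasks * max(C) * (max(S) / min(P))
    e_network_max = num_edges * max(O) * max(max(row) for row in D)
    return e_device_max + e_network_max
```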
We use the same approach as described in Section 2 to calculate the velocity and position of the particles. We start with 50 particles and $tasks$ as the number of dimensions. We stop PSO when there is no improvement with regard to a relative error of 0.000001 over $tasks \cdot 100$ iterations. We iterate at most 20,000 times to find an optimal solution. Our out-of-bounds strategy for particles is reflective.
We use a version of the PSO algorithm which allows us to change the decay of velocity $w$ and the influence of the personal best values ($c_1$) and global best values ($c_2$). We start with the values $c_1 = 2.5$, $c_2 = 0.5$, and $w = 0.9$. With each iteration, these values linearly converge to $c_1 = 0.5$, $c_2 = 2.5$, and $w = 0.4$, which they reach in the end. As a result, we focus on the personal best values in the beginning, with a small influence of the global best values, and focus more and more on the global best values, with a small influence of the personal best values, towards the end. In addition, we use the decay of velocity as a tool to slow the particles down towards the end.
For the implementation, we use the Python library pyswarm and implement an extended version of the GlobalBestPSO class to allow the linear change of $w$, $c_1$, and $c_2$.
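The linear schedule of $w$, $c_1$, and $c_2$ described above can be sketched as a simple interpolation helper; this only illustrates the schedule and is not the extended GlobalBestPSO class itself.

```python
def pso_coefficients(iteration, max_iterations,
                     w=(0.9, 0.4), c1=(2.5, 0.5), c2=(0.5, 2.5)):
    """Linearly interpolate each coefficient from its start to its end value."""
    frac = min(iteration / max_iterations, 1.0)
    lerp = lambda pair: pair[0] + frac * (pair[1] - pair[0])
    return lerp(w), lerp(c1), lerp(c2)

# Example: coefficients halfway through the run.
w_t, c1_t, c2_t = pso_coefficients(10_000, 20_000)   # -> (0.65, 1.5, 1.5)
```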
4.3 Deep Reinforcement Learning
For our DRL implementation, we use the Python library stable_baselines3 with a custom environment based on the OpenAI gym library. We tested multiple models with different observation spaces and present the best-performing DRL model. Our custom environment allocates tasks one by one. As a result, the action is the number of the node on which the task should be allocated. We use the same variables and calculations $S_t$, $P_n$, $E_{\text{device}}$, and $E_{\text{network}}$ as defined in the ILP problem. The observation space is a dictionary which consists of:
• A list of the normalized (between zero and one) computation sizes $S_t$ of each task $t$.
• A list of the processing powers $P_n$ of each node $n$. The value is zero for invalid nodes.
• A list of values that show the increase in network energy consumption depending on the node on which the task is going to be allocated. If the node is invalid because of constraints, the value is two times the maximum network energy.
For the reward, we calculate the difference in total energy consumption $\Delta E_i$ between the current allocation and the previous allocation and multiply it by $-1$, as shown in Equations 21-24. The multiplication of the energy cost by $-1$ is necessary because the DRL agent maximizes the reward. Therefore, the agent now aims to minimize the energy cost.

$$\Delta E_0 = 0 \tag{21}$$

$$\Delta E_i = E^i_{\text{total}} - E^{i-1}_{\text{total}} \tag{22}$$

$$E^i_{\text{total}} = E^i_{\text{device}} + E^i_{\text{network}} \tag{23}$$

$$reward = -1 \cdot \Delta E_i \tag{24}$$
To avoid violating constraints, we use MaskablePPO in our implementation to only allow the indices of valid network nodes as actions.
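A condensed sketch of how such a custom environment and MaskablePPO can be wired together is shown below. The observation is heavily simplified compared to the dictionary described above, the environment and its reward are toy stand-ins for the real energy model, the class and parameter names are illustrative, and the sketch assumes the classic OpenAI gym step/reset interface used by stable_baselines3 with gym environments.

```python
import numpy as np
import gym
from gym import spaces
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

class AllocationEnv(gym.Env):
    """Toy environment: allocate tasks one by one; the action is the node index."""

    def __init__(self, task_sizes, node_power, node_slots):
        super().__init__()
        self.task_sizes, self.node_power, self.node_slots = task_sizes, node_power, node_slots
        self.action_space = spaces.Discrete(len(node_power))
        self.observation_space = spaces.Box(0.0, 1e6, shape=(len(node_power) + 1,), dtype=np.float32)

    def _obs(self):
        nxt = self.task_sizes[self.i] if self.i < len(self.task_sizes) else 0.0
        return np.array([nxt, *self.free], dtype=np.float32)

    def reset(self):
        self.i, self.free = 0, list(self.node_slots)
        return self._obs()

    def step(self, action):
        delta_e = self.task_sizes[self.i] / self.node_power[action]  # toy per-task device energy
        self.free[action] -= 1
        self.i += 1
        done = self.i >= len(self.task_sizes)
        return self._obs(), -delta_e, done, {}                       # reward = -ΔE (cf. Eq. 24)

    def action_masks(self):
        return [slots > 0 for slots in self.free]                    # mask out full nodes

env = ActionMasker(AllocationEnv([2.0, 1.0, 3.0], [2.0, 1.0], [2, 2]), lambda e: e.action_masks())
model = MaskablePPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=2_048)   # illustrative budget; the real model trains far longer
```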
5 EVALUATION
In this section, we show the results of the optimal and heuristic
approaches in comparison to each other. In addition, we compare
the run-time of each approach.
5.1 Experiment Setup
For our experiments, we use a machine running Debian bullseye with an Intel(R) Xeon(R) Silver 4116 CPU and 160 GB of RAM. We execute all experiments on the same machine. To compare simple task allocation problems with the optimal solution and the scalability of the heuristic algorithms, we split our experiments into two datasets. For the first dataset, we use simplified versions of the ILP, the ILP heuristic, PSO, and DRL, where we omit the capability, CPU, and storage constraints. For the second dataset, we include all constraints but omit the optimal results of the ILP due to the computational complexity and time consumption.

Figure 4: Metadata of each example for the first dataset. (Number of tasks, workflow length, workflow width, and number of workflows per example ID.)
We implemented a Python module which generates an example from the parameters workflow length, workflow width, and workflow count. An example consists of one or multiple workflows and a network. The network is scaled with the number of tasks to allow a feasible solution. The parameters allow a pseudo-random workflow structure with some degrees of freedom, e.g., in the total number of tasks. Each example is identified by a number. We increase the length, width, and workflow count for each example as shown in Figure 4 for both datasets.
We trained the DRL model with one million episodes for the simplified DRL version (first dataset) and with 80 million episodes for the full version (second dataset).
We solve the ILP problem once for all examples of the first dataset. To calculate a mean value and a standard deviation, we execute the ILP heuristic, PSO, and DRL ten times on each example for both datasets. We sort the results of the examples in the next sections by the total number of tasks.
5.2 Comparison of Optimal and Heuristic Algorithms
Figures 5a and 5b show the results of the first dataset. The x-axis shows the total number of tasks and the ID of the examples. The y-axis shows the cost, which is the total energy consumption of the devices and the network. The ILP yields the optimal value. Since the other approaches only approximate these optima, their results are worse than or equal to the ILP results.
In our experiments, the ILP heuristic and DRL are deterministic. Thus, we only include the standard deviation of PSO in the figures. Figure 5 shows that, in the simplified version, PSO and DRL overall outperform the ILP heuristic. The best approach is PSO. However, the PSO algorithm yields worse results than DRL for more than 25 tasks. For a higher number of tasks, the best approximation is provided by DRL. In addition, PSO is the only algorithm which can terminate with an infeasible solution, and it does so for example 34 with 35 tasks.

Figure 5: Energy consumption of ILP, ILP heuristic, PSO and DRL (first dataset). (Panel (a): number of tasks from 2 to 12; panel (b): number of tasks from 15 to 40; x-axis: tasks and example ID; y-axis: cost.)
With increasing problem size, solving the ILP becomes computationally infeasible. Due to this, we were not able to include the optimal solutions of examples 27, 28, 33, 34, 35, and 47 despite running the solver for longer than two weeks. We show the time usage over all examples in Figure 6a. The maximum required time to solve an ILP problem is over thirteen days for example 21, if we do not consider the examples terminated after two weeks. In the worst-case scenario, PSO needs 15.9 s to find a solution, while DRL and the ILP heuristic need 188 ms and 174 ms, respectively. In general, the ILP heuristic has the lowest execution time for the examples in the first dataset.

Figure 6: Execution time and optimality gap of the algorithms (first dataset). (Panel (a): execution time in seconds; panel (b): optimality gap in %.)
We use the optimal value of the ILP to calculate the percent error $\delta = 100\,\% \times (approximation - optimal) / optimal$. This is also known as the optimality gap. Figure 6b depicts the optimality gap of each algorithm. Since PSO provides the best results for examples with fewer than 25 tasks, its median optimality gap is the lowest. The ILP heuristic has a median optimality gap of 100.4 %, PSO has a median optimality gap of 7.7 %, and DRL has a median optimality gap of 35.9 %. Furthermore, DRL yields the smallest upper bound of 115.9 % (ILP heuristic: 288.7 %; PSO: 172.8 %), which indicates that DRL may be more suitable for problems with a high number of tasks.
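As a small worked example of this formula (purely illustrative values):

```python
def optimality_gap(approximation, optimal):
    """Percent error of an approximate solution relative to the optimum."""
    return 100.0 * (approximation - optimal) / optimal

optimality_gap(20.0, 10.0)   # -> 100.0 %, i.e., twice the optimal energy consumption
```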
5.3 Scalability Analysis
To check the performance of each algorithm for larger example sizes and in their extended versions, we compare them using the second dataset. Since the dataset includes 120 examples and larger problem sizes, we omit the extended version of the ILP. The overall results are similar to the first dataset. The extended PSO algorithm yields the best results for fewer than 34 tasks. For over 34 tasks, DRL results in the lowest energy consumption. Overall, the ILP heuristic yields the highest energy consumption for all but two small examples with 4 and 5 tasks. The distribution of the execution time is comparable with the first dataset, with a maximum execution time of 2.07 s for the extended ILP heuristic, 360.70 s for the extended PSO algorithm, and 0.99 s for the extended DRL model.

Figure 7: Failed attempts of PSO (second dataset). (x-axis: number of tasks from 20 to 104; y-axis: failed attempts out of ten runs.)
For examples with over 20 tasks, we notice an increase in failed attempts for PSO, as shown in Figure 7. A failed attempt is an execution which terminates without finding a feasible solution. We ran each example ten times. For over 60 tasks, PSO failed in all ten attempts and was not able to find any feasible solution.
6 CONCLUSION
In this work, we extend the existing ILP problem definition of Seeger et al. [19] to minimize the energy consumption of the task allocation problem with multiple workflows. We propose a PSO and a DRL approach, which achieve median optimality gaps of 7.7 % and 35.9 %. DRL achieves the lowest upper bound of the optimality gap with 115.9 %, in comparison to 288.7 % for the ILP heuristic and 172.8 % for PSO. Our results show that PSO sometimes fails to allocate tasks past a size of 25-34 tasks. Furthermore, DRL generally scales better for larger problem sizes. In addition, our results indicate that the extended PSO algorithm is unreliable for the allocation of more than 20 tasks and unusable for the allocation of more than 60 tasks. Our measurements determine the extended DRL model to be the fastest algorithm, with an execution time of under 1 s in comparison to 2 s for the extended ILP heuristic and 360 s for the extended PSO. In conclusion, our results show that PSO is more suitable for smaller and DRL for larger sized task allocation problems.
As future work, we plan to analyze the trade-off between speed and optimality in ILP problems when adding a bias towards previously allocated tasks. In addition, we identify possible future research in the direction of heterogeneous networks, e.g., by introducing a cloud connection or the limitations and opportunities of 5G and TSN.
ACKNOWLEDGMENTS
This work has received funding from the European Union’s Horizon
2020 research and innovation programme under grant agreement
No. 957218 (Project IntellIoT).
REFERENCES
[1] Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2016. Optimal operator placement for distributed stream processing applications. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, Irvine, California, 69–80. https://doi.org/10.1145/2933267.2933312
[2]
Baotong Chen, Jiafu Wan, Antonio Celesti, Di Li, Haider Abbas, and Qin Zhang.
2018. Edge Computing in IoT-Based Manufacturing. IEEE Communications Mag-
azine 56, 9 (Sept. 2018), 103–109. https://doi.org/10.1109/MCOM.2018.1701231
Conference Name: IEEE Communications Magazine.
[3] Michele Conforti, Gérard Cornuéjols, and Giacomo Zambelli. 2014. Integer Programming. Graduate Texts in Mathematics, Vol. 271. Springer International Publishing, Cham. https://doi.org/10.1007/978-3-319-11008-0
[4] Matthieu Cord and Sarah Jane Delany. 2008. Chapter 2 Supervised Learning.
[5] Wenbin Dai, Hiroaki Nishi, Valeriy Vyatkin, Victor Huang, Yang Shi, and Xinping Guan. 2019. Industrial Edge Computing: Enabling Embedded Intelligence. IEEE Industrial Electronics Magazine 13, 4 (Dec. 2019), 48–56. https://doi.org/10.1109/MIE.2019.2943283 Conference Name: IEEE Industrial Electronics Magazine.
[6] Kalyanmoy Deb. 2000. An efficient constraint handling method for genetic algorithms. Computer Methods in Applied Mechanics and Engineering 186, 2-4 (June 2000), 311–338. https://doi.org/10.1016/S0045-7825(99)00389-8
[7]
Marco Dorigo, Mauro Birattari, and Thomas Stutzle. 2006. Ant colony opti-
mization. IEEE Computational Intelligence Magazine 1, 4 (Nov. 2006), 28–39.
https://doi.org/10.1109/MCI.2006.329691 Conference Name: IEEE Computational
Intelligence Magazine.
[8] Yongqiang Gao and Yanping Wang. 2022. Multiple Workflows Offloading Based on Deep Reinforcement Learning in Mobile Edge Computing. In Algorithms and Architectures for Parallel Processing (Lecture Notes in Computer Science), Yongxuan Lai, Tian Wang, Min Jiang, Guangquan Xu, Wei Liang, and Aniello Castiglione (Eds.). Springer International Publishing, Cham, 476–493. https://doi.org/10.1007/978-3-030-95384-3_30
[9] Shengyi Huang and Santiago Ontañón. 2022. A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The International FLAIRS Conference Proceedings 35 (May 2022). https://doi.org/10.32473/flairs.v35i.130584 arXiv:2006.14171 [cs, stat].
[10]
Mohammad Manzurul Islam, Sarwar Morshed, and Parijat Goswami. 2013. Cloud
Computing: A Survey on its limitations and Potential Solutions. International
Journal of Computer Science Issues 10 (July 2013), 159–163.
[11] A. Rezaee Jordehi. 2015. A review on constraint handling strategies in particle swarm optimisation. Neural Computing and Applications 26, 6 (Aug. 2015), 1265–1275. https://doi.org/10.1007/s00521-014-1808-5
[12] Dervis Karaboga and Bahriye Akay. 2009. A comparative study of Artificial Bee Colony algorithm. Appl. Math. Comput. 214, 1 (Aug. 2009), 108–132. https://doi.org/10.1016/j.amc.2009.03.090
[13]
J. Kennedy and R. Eberhart. 1995. Particle swarm optimization. In Proceedings of
ICNN’95 - International Conference on Neural Networks, Vol. 4. 1942–1948 vol.4.
https://doi.org/10.1109/ICNN.1995.488968
[14]
Maxim Lapan. 2020. Deep Reinforcement Learning Hands-On - Second Edition (2nd
edition. ed.). Packt Publishing.
[15] Maren Lesche. 2022. Framework. https://intelliot.eu/framework
[16] Chrysi K. Metallidou, Kostas E. Psannis, and Eugenia Alexandropoulou Egyptiadou. 2020. Energy Efficiency in Smart Buildings: IoT Approaches. IEEE Access 8 (2020), 63679–63699. https://doi.org/10.1109/ACCESS.2020.2984461 Conference Name: IEEE Access.
[17] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. https://doi.org/10.48550/arXiv.1602.01783 arXiv:1602.01783 [cs].
[18]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
2017. Proximal Policy Optimization Algorithms. http://arxiv.org/abs/1707.06347
arXiv:1707.06347 [cs].
[19]
Jan Seeger, Arne Bröring, and Georg Carle. 2019. Optimally Self-Healing IoT
Choreographies. http://arxiv.org/abs/1907.04611 arXiv:1907.04611 [cs].
[20] Olena Skarlat and Stefan Schulte. 2021. FogFrame: a framework for IoT application execution in the fog. PeerJ Computer Science 7 (July 2021), e588. https://doi.org/10.7717/peerj-cs.588
[21] Ziyu Wang, Victor Bapst, Nicolas Heess, Volodymyr Mnih, Remi Munos, Koray Kavukcuoglu, and Nando de Freitas. 2017. Sample Efficient Actor-Critic with Experience Replay. https://doi.org/10.48550/arXiv.1611.01224 arXiv:1611.01224 [cs].
[22] Qian You and Bing Tang. 2021. Efficient task offloading using particle swarm optimization algorithm in edge computing for industrial internet of things. Journal of Cloud Computing 10, 1 (July 2021), 41. https://doi.org/10.1186/s13677-021-00256-4
[23] Tao Zheng, Jian Wan, Jilin Zhang, and Congfeng Jiang. 2022. Deep Reinforcement Learning-Based Workload Scheduling for Edge Computing. Journal of Cloud Computing 11, 1 (Jan. 2022), 3. https://doi.org/10.1186/s13677-021-00276-0