Deep Reinforcement Learning for Dynamic Task
Scheduling in Edge-Cloud Environments
Original Scientic Paper
Abstract – With the advent of the Internet of Things (IoT) and its use cases, there is a necessity for improved latency, which has led to edge computing technologies. IoT applications need a cloud environment and appropriate scheduling based on the underlying requirements of a given workload. Due to the mobile nature of IoT devices, resource constraints and resource heterogeneity, IoT application tasks need more efficient scheduling, which is a challenging problem. The existing conventional and deep learning scheduling techniques have limitations such as lack of adaptability, issues with their synchronous nature and inability to deal with temporal patterns in the workloads. To address these issues, we propose a learning-based framework known as the Deep Reinforcement Learning Framework (DRLF). It is designed in such a way that it exploits Deep Reinforcement Learning (DRL) with underlying mechanisms and an enhanced deep network architecture based on a Recurrent Neural Network (RNN). We also propose an algorithm named Reinforcement Learning based Dynamic Scheduling (RLbDS) which exploits different hyperparameters and DRL-based decision-making for efficient scheduling. Real-time traces of edge-cloud infrastructure are used for the empirical study. We implemented our framework by defining new classes for the CloudSim and iFogSim simulation frameworks. Our empirical study has revealed that RLbDS outperforms many existing scheduling methods.
Keywords: Task Scheduling, Edge-Cloud Environment, Recurrent Neural Network, Edge Computing, Cloud Computing,
Deep Reinforcement Learning
Volume 15, Number 10, 2024
D. Mamatha Rani*
TGSWRAFPDC(W), Bhongir
Department of Computer Science, Bhongir, Telangana- 508126, India
mamatha3004@gmail.com
Supreethi K P
Jawaharlal Nehru Technological University
Department of Computer Science and Engineering, Hyderabad, Telangana- 500085, India
supreethi.pujari@jntuh.ac.in
Bipin Bihari Jayasingh
CVR College of Engineering/IT Department
Hyderabad, Telangana- 501510, India
bipinbjayasingh@cvr.ac.in
Received: November 17, 2023; Received in revised form: June 12, 2024; Accepted: June 17, 2024
*Corresponding author
1. INTRODUCTION
Unprecedented growth of cloud-assisted use cases has compelled Cloud Service Providers (CSPs) to optimize resource usage in the presence of Service Level Agreements (SLAs). The ubiquitous adoption of technological innovations such as the Internet of Things (IoT) has led to the emergence of fog and edge computing, which reduce latency. In the presence of IoT
applications, the scheduling of tasks is challenging for
many reasons such as network hierarchy, heterogeneity
of resources, mobility of devices, resource-constrained
devices and stochastic behaviour of nodes [1]. Traditional cloud scheduling algorithms are not sufficient to harness the power of the dynamic computing environment made up of cloud, fog and edge resources. To overcome this problem, different scheduling algorithms came into existence. Reinforcement learning, a machine learning approach, is one such technique [2]. Many
learning-based task scheduling approaches came into
existence. Their merits and demerits are summarized in
Table 1 and Table 2 in Section 2. The advantages of the
research in [1] include consideration of dynamic envi-
ronments and heterogeneous cores. However, it does
not consider adaptive QoS, edge cloud, decentralized
environment and presence of stochastic workloads.
The work in [3] considered the edge cloud and also heterogeneous cores for its task scheduling research. However, it does not support adaptive QoS, dynamic and decentralized environments, or stochastic workloads. The merits of [4] include the consideration of dynamic environments, stochastic workloads and heterogeneous cores. But it lacks adaptive QoS and support for edge-cloud and decentralized environments. The research in [5] and [6] has similar findings.
Their method has provision for considering dynamic
environments, heterogeneous cores, adaptive QoS
and stochastic workloads. But it is not designed for edge
cloud and decentralized environments. In [7], there
is consideration of dynamic environment, stochastic
workloads, adaptive QoS and heterogeneous cores, but it
does not support decentralized and edge-cloud envi-
ronments. The work in [8] supports dynamic environ-
ments and stochastic workloads. However, it has limitations in dealing with heterogeneous cores, adaptive QoS,
edge cloud and decentralized environments. There is a
similarity in the task scheduling methods proposed in
[9] and [10].
Their methods are dynamic, supporting adaptive QoS
and stochastic workloads besides dealing with hetero-
geneous cores. However, they do not support decen-
tralized and edge cloud environments. The scheduling
research in [11] supports dynamic environments along
with stochastic workloads. They also deal with hetero-
geneous cores and adaptive QoS. However, the draw-
back is that those methods do not consider decentral-
ized and edge-cloud environments.
Concerning optimization parameters, Table 2 pro-
vides research gaps in existing solutions. Research in
[1] is based on a heuristics approach and considers energy and SLA violation parameters. Their research lacks a study of the response time and cost of scheduling
which are crucial for task scheduling. The work in [3]
is also based on the heuristics method but considers
cost and energy parameters. It does not throw light on
response time and SLA violations. In [4], their method
is based on Gaussian process regression and considers
two parameters such as energy and SLAs. It has no sup-
port for optimization of cost and response time.
The task scheduling research in [5] and [6] is based
on the Deep Q-Network (DQN) method
and supports cost and energy parameters for optimiza-
tion. However, they have no optimization of SLAs and
response time. In [7], a Q-learning-based approach is used, considering energy and cost dynamics for optimization. However, it lacks optimization of response time
and SLAs. Deep Neural Network (DNN) is the scheduling
method used in [8] and it has support for optimization
of cost and SLA parameters. It lacks support for energy
and response time optimizations. The work in [9] and
[10] is based on the Double DQN (DDQN) method and
it supports only energy parameters for optimization. It
lacks support for response time, cost and SLA optimiza-
tions. In [11], a DRL method is used for task scheduling by
considering response time for optimization. However,
it does not support the optimization of SLAs, cost and
energy. From the literature, it is observed that there is a
need for a more comprehensive methodology in edge-
cloud environments for task scheduling. Our contributions in this paper are as follows.
1. We proposed a learning-based framework known
as the Deep Reinforcement Learning Framework
(DRLF). This is designed in such a way that it ex-
ploits Deep Reinforcement Learning (DRL) with un-
derlying mechanisms and enhanced deep network
architecture based on Recurrent Neural Network
(RNN).
2. We proposed an algorithm named Reinforcement
Learning Dynamic Scheduling (RLbDS) which ex-
ploits dierent hyperparameters and DRL-based
decision-making for ecient scheduling.
3. Our simulation study has revealed that the pro-
posed RLbDS outperforms many existing schedul-
ing methods.
The remainder of the paper is structured as follows.
Section 2 reviews prior works on existing task schedul-
ing methods for cloud and edge-cloud environments.
Section 3 presents details of the proposed system in-
cluding the system model, DRL mechanisms and the un-
derlying algorithm. Section 4 presents the results of the
empirical study while Section 5 concludes our work and
provides directions for the future scope of the research.
2. RELATED WORK
This section reviews prior works on existing task
scheduling methods for cloud and edge-cloud environ-
ments. VM plays a vital role in cloud infrastructure for
resource provisioning. Beloglazov and Buyya [1] proposed a
method for improving resource utilization in the cloud
through VM migration and consolidation. They found
that VM live migration has the potential to exploit idle
nodes in cloud data centres to optimize resource utili-
zation and reduce energy consumption. They consid-
ered the dynamic environment and presence of het-
erogeneous cores for their task scheduling study. Their
method is based on a heuristics approach. It considers
SLA negotiations and algorithms designed to support
optimizations such as energy eciency and SLAs. Their
algorithm monitors VMs and their resource usage. By
considering VM consolidation and VM live migration,
their method is aimed at reducing energy consump-
tion and adherence to SLAs. This method lacks adap-
tive QoS and support for dynamic workloads. Pham
and Huh [3] proposed a task scheduling method based
on a heuristics approach for such an environment.
It is designed to work for heterogeneous cores in fog-
cloud. They considered optimizations such as energy
eciency and cost reduction by scheduling tasks in an
edge-cloud environment. Their algorithm is based on
heuristics towards reducing cost and energy consump-
tion. It is based on graph representation. Towards this,
their method exploits the task graph and processor
graph. Given the two graphs representing tasks and re-
sources, their method nds appropriate resource allo-
cation for given tasks. It has a provision for determining
task priority and then choosing the most suitable node
for the execution of the task.
Bui et al. [4] proposed an optimization framework for
the cloud with a predictive approach. They could pre-
dict the dynamics of resource utilization for schedul-
ing by employing a method named Gaussian process
regression. The prediction result helped them to mini-
mize the number of servers to be used to process the
requests leading to a reduction of energy usage. Their
method is, however, based on heuristics and is not suit-
able for dynamic workloads and edge-cloud environ-
ments. Cheng et al. [2] explored a DRL-based approach
towards task scheduling and resource provisioning
in the cloud. They further optimized the Q-learning
method to reduce the task rejection rate and improve
energy eciency. Huang et al. [5] and Mao et al. [6] fol-
lowed the DRL approach for improving task scheduling
performance in a cloud computing environment.
In [5] DRL based online ooading method is pro-
posed based on deep neural networks. It is a scalable
solution since it is a learning-based approach. In [6]
DeepRM is the framework proposed for task schedul-
ing considering ecient resource management. Both
methods are based on the DQN approach rather than
heuristics. Both methods considered optimization pa-
rameters such as energy and cost. In other words, they
are designed to reduce energy consumption and also
the cost incurred for task execution in cloud environ-
ments. They support stochastic workloads and adap-
tive QoS. However, they do not support edge-cloud
environments and do not optimize SLA and response
time parameters.
Basu et al. [7] focused on the problem of live migra-
tion of VMs based on the RL-based Q-learning process.
Their methodology improves upon existing live migration and heuristics-based approaches. Towards this end,
their method exploits Megh, an RL-based model, for continuous adaptation to runtime situations towards improving energy efficiency. Xu et al.
[8] dened a DNN approach named LASER to support
deadline-critical jobs with replication and speculative
execution. Their implementation of the framework is
designed for the Hadoop framework. Zhang et al. [9]
dened a DDQN method towards energy eciency in
edge computing. It is based on the Q-learning process
and also the dynamic voltage frequency scaling (DVFS)
method that has the potential to reduce energy usage.
As Q-learning is not able to recognize continuous sys-
tem states, they extended it to have double-deep Q-
learning. Table 1 provides a summary of findings
among existing scheduling methods.
Table 1. Merits and demerits of existing scheduling methods compared with the proposed method
| Reference | Dynamic | Stochastic Workload | Decentralized | Edge Cloud | Adaptive QoS | Heterogeneous |
|---|---|---|---|---|---|---|
| [1] | Yes | No | No | No | No | Yes |
| [3] | No | No | No | Yes | No | Yes |
| [4] | Yes | Yes | No | No | No | Yes |
| [5], [6] | Yes | Yes | No | No | Yes | Yes |
| [7] | Yes | Yes | No | No | Yes | Yes |
| [8] | Yes | Yes | No | No | No | No |
| [9], [10] | Yes | Yes | No | No | Yes | Yes |
| [11] | Yes | Yes | No | No | Yes | Yes |
| [18] | Yes | No | No | No | Yes | Yes |
| [19] | Yes | No | No | Yes | Yes | No |
| [20] | Yes | No | No | No | Yes | Yes |
| [21] | Yes | No | No | No | Yes | Yes |
| [22] | Yes | No | No | No | Yes | Yes |
| [23] | Yes | Yes | No | No | Yes | Yes |
| [25] | Yes | No | No | No | No | No |
| [26] | Yes | No | No | No | Yes | Yes |
| [27] | Yes | No | Yes | No | Yes | Yes |
| Proposed (RLbDS) | Yes | Yes | Yes | Yes | Yes | Yes |
Similar to the work of [2], Mao et al. [6] employed DDQN
for ecient resource management. This kind of work is
also found in Li et al. [10]. Both have employed the DRL
technique towards job scheduling over diversied re-
sources. However, these learning-based methods are not
able to withstand stochastic environments. Mao et al. [6]
and Rjoub et al. [11] investigated a DRL-based approach for task scheduling in the edge-cloud. However, they considered
only response time in their research. Its drawback is that
they could not exploit asynchronous methods for optimi-
zation of their methods towards robustness and adapt-
ability. There is a need to improve it by considering the dy-
namic optimization of parameters in the presence of sto-
chastic workloads. Skarlat et al. [12] explored IoT service
placement dynamics in fog computing resources while
Pham et al. [13] focused on cost and performance towards
proposing a novel method for task scheduling. Brogi and
Forti [14] investigated the deployment of QoS-aware IoT
tasks in fog infrastructure. Task prioritization [15], DRL for
resource provisioning [4, 7], energy-ecient scheduling
using Q-learning [16] and DRL usage in 5G networks [17]
are other important contributions.
As presented in Table 1, we summarize our ndings
leading to important research gaps. The summary is
made in terms of dierent parameters such as dynamic
environment, presence of stochastic workload, decen-
tralized environment, usage of edge cloud, consider-
ation for adaptive QoS and presence of heterogeneous
cores for task scheduling. Table 1 also provides the pro-
posed method and its merits over existing methods.
Almutairi and Aldossary [18] proposed a novel method for offloading IoT tasks in the edge-cloud ecosystem.
It is designed to serve latency-sensitive applications in
a better way. It has a fuzzy logic-based approach for
inferring knowledge towards decision-making in the presence of dynamic resource utilization. Ding et al. [19] considered an edge-cloud
environment to investigate stateful data stream appli-
cations. They proposed a method to judge state migra-
tion overhead and make partitioning decisions based
on the dynamically changing network bandwidth
availability. Murad et al.
[20] proposed an improved version of the min-min
task scheduling method to deal with scientic work-
ows in cloud computing. It could reduce the mini-
mum completion time besides optimizing resource
utilization. Bulej et al. [21] did their research on the
management of latency in the edge-cloud ecosystem
towards better performance in task scheduling in the
presence of dynamic workloads. It is designed to ex-
plore the upper bound of response time and optimize
the performance further. Almutairi and Aldossary [22]
proposed an edge-cloud system architecture to investigate a modelling methodology for task offloading. It has offloading latency models along with various offloading schemes. Their simulations are made using
EdgeCloudSim. They intend to improve it in the future
with fuzzy logic.
Zhang and Shi [23] explored workow scheduling in
an edge-cloud environment. They analyzed dierent
possibilities in workow scheduling in such an eco-
system. They opined that workow applications need
novel approaches in the scheduling process. Zhao et al.
[24] focused on task scheduling along with security to
prevent intrusions in edge computing environments.
They considered low-rate intrusions and focused on
preventing them along with task scheduling. It is a Q-
learning-based approach designed to meet runtime
requirements based on the learning process. Zhang
et al. [25] proposed a time-sensitive algorithm that dy-
namically caters to the needs of deadline-aware tasks
in edge-cloud environments. It considers job size and
server capability in a given dynamic and hierarchical
scenario. It is a multi-objective approach considering execution time, cost and reduction of SLA violations. Lakhan et al. [26]
proposed a task scheduling approach for IoT tasks con-
sidering a hybrid mechanism consisting of task scheduling and task offloading. Singh and Bhushan [27] pro-
posed a method for task scheduling based on Cuckoo
Search Optimization (CSO). It has an integrated local
search strategy. From these recent works, it is found that they target IoT-style workflows in edge-cloud environments. Q-learning is used in one of the papers. However, deep reinforcement learning is not found in the latest works. Service placement in edge
resources using DRL [28], dynamic scheduling [29] and
task ooading [30] are other important contributions.
Table 2 provides a summary of ndings among exist-
ing scheduling methods in terms of optimization pa-
rameters. Magotra [41] focused on energy-ecient ap-
proaches in cloud infrastructures by developing adap-
tive solutions that could help the system towards prop-
er VM consolidation, leading to better performance.
Table 2. Optimization parameters considered by existing scheduling methods

| Reference | Method | SLA Violations | Cost | Response Time | Energy |
|---|---|---|---|---|---|
| [1] | Heuristics | Yes | No | No | Yes |
| [3] | Heuristics | No | Yes | No | Yes |
| [4] | Gaussian Process Regression | Yes | No | No | Yes |
| [5], [6] | DQN | No | Yes | No | Yes |
| [7] | Q-Learning | No | Yes | No | Yes |
| [8] | DNN | Yes | Yes | No | No |
| [9], [10] | DDQN | No | No | No | Yes |
| [11] | DRL (REINFORCE) | No | No | Yes | No |
| [18] | SJF | No | No | Yes | Yes |
| [19] | Cloud Computing | No | No | Yes | No |
| [21] | Cloud computing | No | Yes | Yes | Yes |
| [23] | CSA | No | Yes | No | No |
| [24] | Cloud computing | No | Yes | Yes | Yes |
| [25] | Cloud computing | No | Yes | No | No |
| [27] | CSP | No | Yes | Yes | No |
As presented in Table 2, we summarized the existing
methods in terms of optimization parameters and the
approach considered in the task scheduling research.
The optimization parameters considered for the com-
parative study of existing methods are SLA violations,
cost, response time and energy.
Table 2 also provides the proposed method and its
merits over existing methods. Table 1 and Table 2 provide very useful insights reflecting gaps in the research.
Our work in this paper is based on such research gaps as
those tables reveal the merits of the proposed system.
3. PROPOSED SYSTEM
We proposed a DRL-based framework for dynamic
task scheduling in an edge-cloud environment. This sec-
tion presents the framework and proposed algorithm
besides DRL mechanisms.
3.1. PROBLEM DEFINITION
Considering an edge-cloud environment, let Hosts be a collection of hosts denoted as {H1, H2, H3, ..., Hn}, where n indicates the maximum number of hosts. A task T can be assigned to a host H, and scheduling is considered as the assignment of T to H. In terms of RL, the system state is mapped to an action; here, an action means the allocation of T to H. T may be an active task that could be migrated to a new host or a newly arrived task. At the beginning of an interval, denoted as SIi, the system state is denoted as Statei, which reflects the hosts and their parameters, the tasks left unallocated from the prior interval, denoted as (ai-1 \ li), and the newly arrived tasks, denoted as ni. For each task in ai (= (ai-1 ∪ ni) \ li), the scheduler needs to take an action, denoted as Actioni, for the scheduling interval SIi, either allocating it to a host or migrating it to a new host. A task mi ∈ ai-1 \ li is considered a migratable task. A scheduler can be understood as a model which realizes a decision-making function Statei → Actioni. The loss function associated with the model for a given interval, denoted as Lossi, is computed based on task allocations. Therefore, the problem of realizing an optimal model is expressed in Eq. 1.
(1)
Dierent notations used in our work are presented
in Table 3.
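To make the interval bookkeeping above concrete, the following minimal Python sketch (with hypothetical names such as the score function standing in for the learned decision model; this is not the authors' implementation) shows how the active set ai is derived from ai-1, li and ni, and how a scheduler acts as a Statei → Actioni mapping.

```python
def next_active_set(a_prev: set, l_i: set, n_i: set) -> set:
    """a_i = (a_{i-1} \\ l_i) union n_i: the tasks that must be placed in interval SI_i."""
    return (a_prev - l_i) | n_i

def schedule_interval(pending_tasks, hosts, score):
    """The scheduler as a decision function State_i -> Action_i: each pending task
    is mapped to the host that maximizes a (learned) placement score.
    `score(task, host)` is a stand-in for the DRL model described later."""
    action = {}
    for task in pending_tasks:
        action[task] = max(hosts, key=lambda h: score(task, h))
    return action
```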
3.2. OUR SYSTEM MODEL
We considered infrastructure or resources for sched-
uling in an edge-cloud environment. The resources are
heterogeneous. Edge resources are nearby while cloud
resources reside in a remote data centre. Therefore, each
host in the infrastructure is dierent in response time
and computational power. Edge resources are closer
and exhibit low response times but they do have limited
resources and computational power. Cloud resources
take more response time but they do have high compu-
tational power. Our system model is presented in Fig. 1.
The edge and cloud nodes are part of computing re-
sources. These resources are managed by the resource
management module. This module has several compo-
nents or sub-modules to deal with resource manage-
ment either directly or indirectly. The scheduler module
is responsible for either scheduling a task T to a host H or
migrating a task from one host to another host based on
runtime dynamics. The dynamic workload is generated
by IoT devices being used by dierent users. The work-
load contains several tasks with varied requirements.
The resource management module takes the workload and follows a DRL-based (learning-based) approach to task allocation or task migration. These decisions are based on
the ideal objective functions and the requirements asso-
ciated with tasks. The requirements may include dead-
line, bandwidth, RAM and CPU.
Fig. 1. Our system model
The workload is generated automatically to evaluate
the functionality of the proposed system. Our system has a DRL model which influences the scheduler mod-
ule in decision-making. There are multiple schedulers
to be used at runtime to serve dynamically generated
workloads. In the process, there is the distribution of
workload among hosts leading to faster convergence.
Each resource in edge-cloud accumulates local gradi-
ents associated with corresponding schedulers besides
synchronizing them to update models. The DRL module
follows asynchronous updates. The constraint satisfaction module takes suggestions as input from the DRL model and finds whether they are valid. Here, valid means that a suggested migration is feasible and that the host's capacity is not exceeded.
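As an illustration of this constraint-satisfaction check, the hedged Python sketch below validates a suggested placement against a host's remaining CPU, RAM and bandwidth; the field names and the simple additive capacity model are assumptions, not the paper's actual classes.

```python
from dataclasses import dataclass

@dataclass
class Host:
    cpu_mips: float
    ram_mb: float
    bw_mbps: float

@dataclass
class Task:
    cpu_mips: float
    ram_mb: float
    bw_mbps: float
    deadline_s: float

def placement_is_valid(task: Task, host: Host, used: dict) -> bool:
    """Accept a suggested placement only if the host's remaining CPU, RAM and
    bandwidth (given current usage `used`) can accommodate the task."""
    return (used["cpu"] + task.cpu_mips <= host.cpu_mips
            and used["ram"] + task.ram_mb <= host.ram_mb
            and used["bw"] + task.bw_mbps <= host.bw_mbps)
```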
3.3. WORKLOAD GENERATION
We generate workload programmatically to evaluate
the proposed system. Since IoT devices and users' demands are dynamic, there is a change in the bandwidth
and computational requirements of tasks. The whole
execution time in our system is divided into several
scheduling intervals. Each interval is assumed to have
the same duration. SIi denotes the ith scheduling inter-
val. This interval has a start time and end time denoted
as ti and ti+1 respectively. Each interval has active tasks
associated with it. They are the tasks being executed
and denoted as ai. The tasks that have been completed
at the beginning of the interval are denoted as li while
newly arrived tasks that are dynamically generated by
the workload generator are denoted as ni.
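A minimal Python sketch of such a programmatic workload generator is given below; the task-attribute distributions and ranges are illustrative assumptions, not those used in the actual experiments (which draw tasks from the Bitbrains traces described in Section 4).

```python
import random

def generate_interval_workload(interval_index: int, max_new_tasks: int = 20):
    """Produce the new tasks n_i arriving in scheduling interval SI_i with
    varying CPU, RAM and bandwidth demands (illustrative distributions)."""
    n_i = random.randint(0, max_new_tasks)
    return [
        {
            "id": f"t{interval_index}_{k}",
            "cpu_mips": random.uniform(100, 2000),
            "ram_mb": random.choice([256, 512, 1024, 2048]),
            "bw_mbps": random.uniform(1, 100),
        }
        for k in range(n_i)
    ]
```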
3.4. OUR LEARNINGBASED APPROACH FOR
SCHEDULING
We proposed a framework known as the Deep Re-
inforcement Learning Framework (DRLF), as shown in
Fig. 2, which exploits a learning-based approach using
the DRL model for dynamic task scheduling in an edge-
cloud environment. The framework supports several
scheduling intervals. The framework has a workload
generator which generates tasks (ni) and gives them
to the scheduling and migration module. The tasks
given to the scheduler are in turn given to the resource
monitoring module which schedules new tasks and
migrates existing tasks if required to ensure optimal
resource utilization, load balancing and low latency in task
completion. The scheduler activity changes the state of
the edge-cloud environment.
Fig. 2. Proposed Deep Reinforcement Learning
Framework (DRLF) for task scheduling in edge-cloud
environment
Every time Statei is updated by the resource monitor-
ing module it is given to the DRL model. The state infor-
mation consists of hosts' feature vectors, new tasks ni and
the rest of the tasks associated with the previous interval
and denoted by (ai-1\li). The resource monitoring module
also gives Lossi data to the DRL model. The DRL model suggests an action, denoted as Actioni^PG, based on the state information to the constraint satisfaction module and updates parameters as expressed in Eq. 2. This module then returns Penaltyi to the DRL model.
(2)
This process continues iteratively. Once the constraints are satisfied, the constraint satisfaction module gives the action (Actioni) suggested by the DRL model to the resource management module. It then computes Penaltyi+1 for SIi+1, the next scheduling interval.
Table 3. Notations used in our work
| Notation | Description |
|---|---|
| ai | Indicates the set of active tasks linked to SIi |
| Hi | Indicates the ith host in a given set of hosts |
| li | Indicates the set of tasks leaving at the start of SIi |
| mi | Indicates a decision for task migration |
| ni | Indicates a task allocation decision |
| Actioni^PG | Scheduling actions at the beginning of SIi |
| Lossi^PG | Loss function at the beginning of SIi |
| SIi | Denotes the ith scheduling interval |
| Ti | Indicates the ith task in a given set of tasks |
| {T} | Indicates the host to which task T has been assigned |
| AEC | Average Energy Consumption |
| AMT | Average Migration Time |
| ART | Average Response Time |
| Hosts | Indicates the collection of hosts in the edge-cloud environment |
| n | Indicates the maximum number of hosts |
| T | Denotes a task to be executed |
Based on the action received from the constraint sat-
isfaction module, the resource management module
either allocates a new task to a specic host or migrates
tasks, denoted as (ai-1\li), of the preceding interval. This
will result in an update from ai-1 to ai. Then the tasks
associated with ai are execute for SIi and the cycle con-
tinues for SIi+1.
3.5. DEEP LEARNING ARCHITECTURE
The DRL model is built based on an enhanced Re-
current Neural Network (RNN) architecture. It has the
functionality to achieve reinforcement learning. In the
process, it approximates Statei towards Actioni^PG, which is the action passed from the DRL model to the constraint satisfaction module for a given scheduling interval. The enhanced RNN can ascertain temporal relationships between the input space and the output space. This deep learning architecture is shown in Fig. 3. After each interval, the cumulative loss and the policy are predicted by a single network.
The network has two fully connected layers, denoted as fc1 and fc2. These are followed by three recurrent layers, denoted as r1, r2 and r3, with skip connections. The given 2D input is flattened and sent to the dense layers. The output of r3 is given to two fully connected layers denoted as fc3 and fc4. The fc4 layer outputs a 2D vector of size 100x100, which means that the model can deal with 100 tasks allocated to 100 hosts in the infrastructure. Eventually, a softmax function is applied along the second dimension so that values lie in [0, 1] and each row sums to 1. For interpretation, Ojk, an entry of the resulting probability map, indicates the probability of a task Tj ∈ ai being assigned to host Hk. At fc4, a cumulative loss Lossi+1^PG is computed. The recurrent layers are made up of Gated Recurrent Units (GRU) that have the capacity to model the temporal dimension of a given task and also the
characteristics of the host, comprising bandwidth, RAM and CPU. The GRU layers tend to have an increased number of network parameters, leading to complexity. This problem is addressed by exploiting skip connections for faster gradient propagation.
Fig. 3. Architecture of an RNN variant used to
realize the DRL model
This model takes Statei as input, which is represented in the form of a 2D vector. This vector contains a continuous element FVi^Hosts and another continuous element FVi^ni, while FVi^(ai-1\li) holds categorical host indices. Therefore, pre-processing is required to transform the host indices into one-hot vectors with a maximum size of n. Then all feature vectors are concatenated. Afterwards, each element in the resultant vector is normalized to the range [0, 1]. Each element has a feature denoted as fe, while minfe and maxfe denote its minimum and maximum values respectively. These values are computed from the dataset with the help of two heuristics, namely local regression and maximum migration time. Afterwards, standardization is carried out feature-wise using the expression in Eq. 3.
(3)
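A small Python sketch of this feature-wise scaling is shown below, assuming Eq. 3 takes the usual min-max form (fe - minfe) / (maxfe - minfe); the clipping to [0, 1] is an added safeguard and not necessarily part of the original formulation.

```python
def normalize_features(vectors, min_fe, max_fe):
    """Feature-wise min-max scaling of each feature vector to [0, 1];
    values outside the observed range are clipped."""
    eps = 1e-9  # guard against max_fe == min_fe
    return [
        [min(max((fe - lo) / (hi - lo + eps), 0.0), 1.0)
         for fe, lo, hi in zip(vec, min_fe, max_fe)]
        for vec in vectors
    ]
```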
Once pre-processing of the given input is carried out, it is fed to the network (Fig. 3), which first flattens the pre-processed input before sending it through the dense layers. The output of these layers is transformed into Actioni^PG. We employed a backpropagation algorithm to ascertain the biases and weights of the network. The learning rate is kept adaptive: it starts at 10^-2 and is later reduced to 1/10th of its value when the reward change over the preceding 10 iterations is not greater than 0.1. Automatic differentiation is exploited to modify the parameters of the network using Lossi^PG as a reward signal.
Gradients of local networks are accumulated across the
edge nodes periodically in an asynchronous fashion
towards the update of global network parameters. To-
wards this end, a gradient accumulation rule expressed
in Eq. 4 is followed.
(4)
Here, the local and global network parameters are denoted as θ' and θ respectively; the rule has a log term that indicates the direction of change in the parameters, and the (Lossi^PG + C·Lossi+1^pred) term denotes the cumulative loss predicted in a given episode that begins with state s. The gradient associated with the predicted cumulative loss is computed using the Mean Square Error (MSE). Finally, the output is transformed from Actioni^PG to Actioni by the constraint satisfaction module and given to the resource management module.
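The following PyTorch sketch illustrates one plausible realization of the architecture in Fig. 3: two dense layers, three GRU layers with a skip connection, two output dense layers and a row-wise softmax over a 100x100 task-to-host probability map. The hidden sizes, the treatment of the 2D state as a sequence, and the omission of the cumulative-loss prediction head are assumptions made here for brevity, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class SchedulerNet(nn.Module):
    """Sketch of the RNN variant in Fig. 3 (fc1, fc2, r1-r3 with skip, fc3, fc4)."""
    def __init__(self, in_features: int, hidden: int = 256, tasks: int = 100, hosts: int = 100):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.r1 = nn.GRU(hidden, hidden, batch_first=True)  # recurrent layers r1..r3
        self.r2 = nn.GRU(hidden, hidden, batch_first=True)
        self.r3 = nn.GRU(hidden, hidden, batch_first=True)
        self.fc3 = nn.Linear(hidden, hidden)
        self.fc4 = nn.Linear(hidden, tasks * hosts)
        self.tasks, self.hosts = tasks, hosts

    def forward(self, state_seq):                 # state_seq: (batch, seq_len, in_features)
        x = torch.relu(self.fc2(torch.relu(self.fc1(state_seq))))
        h1, _ = self.r1(x)
        h2, _ = self.r2(h1)
        h3, _ = self.r3(h2 + h1)                  # skip connection for faster gradient flow
        x = torch.relu(self.fc3(h3[:, -1]))       # use the last time step
        logits = self.fc4(x).view(-1, self.tasks, self.hosts)
        return torch.softmax(logits, dim=2)       # each task row sums to 1 over hosts
```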
3.6. ALGORITHM DESIGN
We proposed an algorithm to realize the optimal
scheduling of given tasks in the edge-cloud ecosystem.
It is presented in Algorithm 1.
Algorithm: Reinforcement Learning based Dynamic
Scheduling (RLbDS)
Inputs:
Size of batch B
Maximum intervals for scheduling N
1. Begin
2. For each interval n in N
3. If n % B == 0 and n > 1 Then
4.     Compute the loss function
5.     Lossi^PG = Lossi + Penaltyi
6.     Use Lossi^PG in the network (Fig. 3) for backpropagation
7. End If
8. Statei ← PreProcess(Statei)
9. Feed Statei to the network (Fig. 3)
10. pMap ← Output of the RL model (network as in Fig. 3)
11. (Actioni, Penaltyi+1) ← ConSatMod(pMap)
12. The resource monitoring module takes Actioni
13. The DRL model takes Penaltyi+1
14. ResourceMonitoring(Actioni) migrates active tasks
15. Execution of all tasks in interval n in edge-cloud
16. End For
17. End
Algorithm 1. Reinforcement Learning based
Dynamic Scheduling (RLbDS)
The algorithm takes the size of batch B and maxi-
mum intervals for scheduling N and performs optimal
scheduling of given tasks of every interval in edge-
cloud resources. The algorithm exploits the enhanced
RNN network (Fig. 3) to update the model from time to
time towards making DRL-based decisions for schedul-
ing. At each interval of scheduling, there is an iterative
process for taking care of pre-processing and feeding
the state to the DRL model. Based on the action suggested by the DRL model, the constraint satisfaction module specifies a penalty with respect to an ideal scheduling decision, which is notified to the resource monitoring module; the latter schedules new tasks and also performs migration of active tasks based on the decisions rendered.
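The Python sketch below mirrors the control flow of Algorithm 1. The names env, model and optimizer and their methods are hypothetical stand-ins introduced only for illustration; the actual framework is realized as CloudSim/iFogSim extensions rather than this loop.

```python
def rlbds(batch_size_b, max_intervals_n, env, model, optimizer):
    """Illustrative sketch of Algorithm 1 (RLbDS) under assumed interfaces."""
    penalty = 0.0
    for n in range(max_intervals_n):
        if n % batch_size_b == 0 and n > 1:
            loss_pg = env.interval_loss() + penalty      # Loss_i^PG = Loss_i + Penalty_i
            optimizer.zero_grad()
            loss_pg.backward()                           # backpropagate through the network (Fig. 3)
            optimizer.step()
        state = env.preprocess(env.current_state())      # State_i <- PreProcess(State_i)
        p_map = model(state)                             # task-to-host probability map
        action, penalty = env.constraint_satisfaction(p_map)
        env.apply(action)                                # allocate new tasks / migrate active ones
        env.run_interval()                               # execute all tasks of interval n
```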
3.7. LOSS FUNCTION COMPUTATION
In the proposed learning model, we want to obtain a minimal Lossi in each interval. The model is also designed to adapt to the dynamically changing state while mapping Statei to Actioni. Towards this end, Lossi is the metric defined to update the model parameters. In addition, different metrics whose values are normalized to [0, 1] are defined. Average energy consumption is one such metric, defined because edge-cloud resources have different sources of energy, as discussed in [32]. The energy consumed by a host h ∈ Hosts is multiplied by a factor αh ∈ [0, 1] associated with the edge-cloud deployment strategy. The normalized AEC is computed as in Eq. 5.
(5)
where the power function of host h over time is denoted as ph(t) and its maximum possible power is denoted as ph^max.
Average response time is another metric dened to
be used for interval SIi. ART for all tasks is normalized by
maximum response time. ART is computed as in Eq. 6.
(6)
The average migration time metric is dened for a
given SIi. It reects all tasks’ average migration time in
the interval normalized by maximum migration time.
AMT is computed as in Eq. 7.
(7)
Cost (C) is yet another metric dened for SIi. It indi-
cates the total incurred cost in the interval and is com-
puted as in Eq. 8.
(8)
Average SLA violation is another metric for SIi. It reflects the SLA violation dynamics as expressed in Eq. 9.
(9)
To minimize the resultant value for all the aforementioned metrics, as used in [16] and [33], the Lossi metric is defined as expressed in Eq. 10.
(10)
such that α, β, γ, δ, ε ≥ 0 and α + β + γ + δ + ε = 1.
Dierent users can have varied QoS needs and hyper-
parameters (α, β, γ, δ, ) need to be set with dierent
values. As discussed in [33], [34] and [35] it is important
to optimize energy consumption in cloud infrastructure.
Therefore, it is essential to optimize loss. Even when
other metrics are compromised, it is possible to opti-
mize loss. In such a case, the loss can have α = 1 while
the other metrics' weights can be 0. As discussed in [36], traffic management and healthcare monitoring are sensitive to response time; in such cases, the loss can have β = 1 while the other weights are 0. In the same fashion, setting the hyperparameters is application-specific.
As specified in works such as [37] and [38], a penalty term is to be included in neural network models. With the penalty, the model can update its parameters towards minimizing Lossi while ensuring constraint satisfaction. Therefore, the neural network loss function is defined as in Eq. 11.
(11)
4. RESULTS AND DISCUSSION
This section presents our simulation environment,
the dataset used and the results of experiments.
4.1. SIMULATION SETUP
We built a simulation application using Java lan-
guage. The IDE used for development is the IntelliJ Idea
2022 version. CloudSim [39] and iFogSim [40] libraries
are used to have a simulation environment. Scheduling
intervals are considered equal to be compatible with
other existing works [4, 7, 41]. Cloudlets or tasks are
generated programmatically from the Bitbrain dataset
collected from [42].
The two simulation tools, iFogSim and CloudSim, are extended with the required classes to facilitate the usage of the cost, response time and power parameters associated with edge nodes. New modules are created to simulate IoT devices with mobility, delayed task execution, variations in bandwidth and communication with the deep learning model. Additional classes are defined to implement the constraint satisfaction module and to take care of input formats, output formats and pre-processing. Based on the provision in CloudSim,
a loss function is implemented. The dataset collected
from [43] has traces of real workload run on Bitbrain
infrastructure. This dataset contains logs of workloads
of more than 1000 VMs associated with host machines.
The workload information contains time-stamp, RAM
usage, CPU usage, CPU cores requested, disk, network
and bandwidth details. This dataset is available at [43] to reproduce our experiments. The dataset is divided into 75% and 25% of the VM workloads for training and testing respectively. The deep learning model is trained with the former, while the latter is used to test the network and analyse the results.
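A minimal sketch of such a 75%/25% split over VM identifiers is shown below; the actual partitioning procedure (for example, whether it is random or trace-ordered) is not specified in the paper, so this is only an assumption.

```python
import random

def split_vm_workloads(vm_ids, train_fraction=0.75, seed=42):
    """Shuffle the VM workload identifiers and split them into training
    and testing subsets according to the given fraction."""
    rng = random.Random(seed)
    ids = list(vm_ids)
    rng.shuffle(ids)
    cut = int(train_fraction * len(ids))
    return ids[:cut], ids[cut:]
```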
4.2. ANALYSIS OF RESULTS
We evaluated the performance of the proposed algo-
rithm named RLbDS by comparing it with state-of-the-art
methods such as Local Regression and Minimum Migra-
tion Time (LR-MMT) [41], Median Absolute Deviation and
Maximum Correlation Policy (MAD-MC) [41], DDQN [44]
and REINFORCE [9]. LR-MMT works for dynamic workloads
considering minimum migration time and local regres-
sion. It uses heuristics for task selection and overload detection. MAD-MC is also a dynamic scheduler, based on the maximum correlation and median absolute deviation heuristics. DDQN is a deep learning-based approach that exploits RL to schedule tasks. REINFORCE is also an RL-based method, relying on the policy gradient. The results reveal the sensitivity of the proposed RLbDS to the hyperparameters (α, β, γ, δ, ε) in terms of model learning and their impact on different performance metrics. Model training uses 10 days of simulation time, while testing is carried out with 1 day of simulation time.
4.2.1. Impact of Hyperparameters on RLbDS
The performance of the proposed RLbDS algorithm is analysed with the loss function under different settings of the hyperparameters (α, β, γ, δ, ε). Experiments are made by setting each of the hyperparameters, in turn, to the value 1 (the remaining weights then being 0, given the sum-to-1 constraint). The rationale behind this is that setting a weight to 1 focuses the loss entirely on the corresponding metric.
Table 4. Performance of RLbDS with dierent hyper parameters
| Model Parameters | Total Energy (×10^8 Watts) | Avg. Response Time (ms) | Fraction of SLA Violations | Total Cost (USD) | Avg. Task Completion Time (×10^6 s) | Number of Completed Tasks |
|---|---|---|---|---|---|---|
| α = 1 | 1.37 | 8.5 | 0.17 | 6305.5 | 4.45 | 815 |
| β = 1 | 1.43 | 8.18 | 0.17 | 6306.5 | 4.3 | 830 |
| γ = 1 | 1.51 | 8.8 | 0.148 | 6307.5 | 3.65 | 845 |
| δ = 1 | 1.38 | 8.78 | 0.178 | 6304.5 | 4.15 | 810 |
| ε = 1 | 1.44 | 8.22 | 0.134 | 6307.8 | 3.75 | 850 |
As presented in Table 5, the performance of RLbDS is compared against existing algorithms in terms of a number of performance metrics.
Table 5. Performance of RLbDS compared against existing algorithms
| Models | Total Energy (×10^8 Watts) | Avg. Response Time (ms) | Fraction of SLA Violations | Total Cost (USD) | Avg. Task Completion Time (×10^6 s) | Number of Completed Tasks |
|---|---|---|---|---|---|---|
| LR-MMT | 0.959 | 8.58 | 0.06 | 6325 | 4.5 | 700 |
| MAD-MC | 0.95 | 8.4 | 0.13 | 6325 | 4.3 | 800 |
| DDQN | 0.85 | 8.8 | 0.07 | 6325 | 4 | 850 |
| REINFORCE | 0.82 | 8.35 | 0.06 | 6300 | 3.8 | 850 |
| RLbDS | 0.73 | 7.7 | 0.04 | 6000 | 3.3 | 1000 |
Loss function with dierent hyperparameters has its
inuence on the performance of the RLbDS algorithm
as presented in Fig. 4. The network learning process
diers with changes in hyperparameters. Energy con-
sumption diered when the loss function used dier-
ent hyperparameters. With α=1 RLbDS consumed 1.37
watts, with β=1 it needed 1.43 watts, with γ=1 the algo-
rithm consumed 1.51 watts, with δ=1 it required 1.38
watts and with ε=1 RLbDS consumed 1.44 watts. The
least energy is consumed when α=1 (all energy consumption values are given in units of 1×10^8 watts). The average response time of the RLbDS algorithm is influenced
by each hyperparameter. With α=1 RLbDS required 8.5
milliseconds, with β=1 it needed 8.18 milliseconds,
with γ=1 the algorithm needed 8.8 milliseconds, with
δ=1 it required 8.78 milliseconds and with ε=1 RLbDS
required 8.22 milliseconds. The least response time is
recorded when β=1.
SLA violations are also studied with these hyperparameters. It is observed that they influence the fraction of SLA violations. With α=1 the fraction of SLA violations caused by RLbDS is 0.17, with β=1 it is also 0.17, with γ=1 the algorithm showed 0.148, with δ=1 it is 0.178, and with ε=1 RLbDS caused 0.134. The least fraction of SLA violations is recorded when ε=1. The total cost is
of SLA violations is recorded when ε=1. The total cost is
also analysed in terms of USD (as per the pricing calcu-
lator of Microsoft Azure [45]).
It was observed earlier that hyperparameters have
an impact on energy consumption. Since energy consumption drives the cost of execution in the cloud,
obviously these parameters have an impact on the cost
incurred. With α=1 the total cost exhibited by RLbDS is
6305.5, with β=1 it is 6306.5, with γ=1 the algorithm showed 6307.5, with δ=1 it is 6304.5, and with ε=1 RLbDS incurred 6307.8. The least cost is recorded when δ=1.
Average task completion time is also analysed with different hyperparameters. With α=1 the average task completion time exhibited by RLbDS is 4.45 seconds, with β=1 it is 4.3, with γ=1 the algorithm showed 3.65, with δ=1 it is 4.15, and with ε=1 RLbDS showed 3.75. The least average task completion time is recorded when γ=1 (all average task completion values are given in units of 1×10^6 seconds). The total number of tasks completed
with scheduling done by RLbDS is also inuenced by
hyperparameters. With α=1 the number of completed
tasks achieved by RLbDS is 815, β=1 it is 830, γ=1 the
algorithm showed 845, with δ=1 it is 810, and with ε=1
RLbDS showed 850 tasks to be completed. The least
number of completed tasks is recorded when δ=1.
Fig. 4. Performance dynamics of the proposed RLbDS algorithm with different model parameters associated with the loss function
4.2.2. Performance Comparison
with State of the Art
Our algorithm RLbDS is compared against several
existing algorithms, as presented in Fig. 5. Total energy consumption values are provided in units of 1×10^8 watts. The LR-MMT algorithm consumed 0.959, MAD-MC
0.95, DDQN 0.85, REINFORCE 0.82 and the proposed
RLbDS consumed 0.73. The energy consumption of
RLbDS is found to be the least among the scheduling
algorithms. Average response time is another met-
ric used for comparison. LR-MMT algorithm exhibited
an average response time of 8.58 milliseconds, MAD-
MC 8.4, DDQN 8.8, REINFORCE 8.35 and the proposed
RLbDS required 7.7 milliseconds. The average response
time of RLbDS is found to be the least among the sched-
uling algorithms. SLA violations are another important
metric used for comparison. LR-MMT algorithm exhib-
ited a fraction of SLA violations as 0.06, MAD-MC 0.13,
DDQN 0.07, REINFORCE 0.06 and the proposed RLbDS
exhibited 0.04. The fraction of SLA violations of RLbDS
is found to be the least among the scheduling algorithms.
Total cost in terms of USD is another metric used
for comparison. This metric is inuenced by energy
consumption. LR-MMT algorithm needs 6325 USD,
MAD-MC 6325, DDQN 6325, REINFORCE 6300 and the
proposed RLbDS needed 6000 USD. The total cost of
RLbDS is found to be the least among the scheduling algorithms.
Concerning average task completion time, the LR-MMT
algorithm needs 4.5 seconds, MAD-MC 4.3, DDQN 4,
REINFORCE 3.8 and the proposed RLbDS requires 3.3
seconds. The average task completion time of RLbDS
is found to be the least among the scheduling algo-
rithms (average task completion time is given in units of 1×10^6 seconds). The number of completed tasks is an-
other observation made in our empirical study.
Fig. 5. Performance of proposed RLbDS algorithm compared with the state of the art
LR-MMT completed 700 tasks, MAD-MC 800, DDQN 850, REINFORCE 850 and the proposed RLbDS completed 1000 tasks. The number of completed tasks of RLbDS is found to be the highest among the scheduling algorithms.
4.2.3. Performance with Number of
Recurrent Layers
Considering optimal values for the hyperparameters, the scheduling overhead and loss dynamics are analysed against the number of recurrent layers. The overhead is computed as the ratio of the time taken for scheduling to the total duration of execution. The empirical study has revealed that the number of recurrent layers in the proposed architecture (Fig. 3) influences the loss and the overhead.
As presented in Table 6 and Fig. 6, the loss value and the scheduling overhead are observed against the number of recurrent layers.
Table 6. Performance against the number of
recurrent layers
| Number of recurrent layers | Loss value | Scheduling overhead (%) |
|---|---|---|
| 0 | 3.69 | 0.009 |
| 1 | 3.4 | 0.010 |
| 2 | 2.9 | 0.010 |
| 3 | 2.6 | 0.010 |
| 4 | 2.5 | 0.019 |
| 5 | 2.4 | 0.029 |
Fig. 6. Performance analysis with the number of
recurrent layers
Several layers inuence the loss value. Loss value de-
creases (performance increases) as the number of lay-
ers is increased. However, the scheduling overhead is
increased with the number of recurrent layers.
4.2.4. Scalability Analysis
The scalability of the proposed algorithm is analysed in terms of speedup and efficiency. The analysis
is made against the number of hosts. As presented in
Table 7, the performance of the proposed algorithm in
terms of its scalability is provided.
Table 7. Scalability analysis
| Number of hosts | Speed-up | Efficiency |
|---|---|---|
| 1 | 1 | 1 |
| 5 | 5 | 0.8 |
| 10 | 9 | 0.785 |
| 15 | 13 | 0.775 |
| 20 | 17 | 0.765 |
| 25 | 19 | 0.725 |
| 30 | 21 | 0.7 |
| 35 | 23 | 0.650 |
| 40 | 25 | 0.630 |
| 45 | 26 | 0.570 |
| 50 | 27 | 0.525 |
Fig. 7. Scalability analysis in terms of speedup and efficiency
There is a trade-o observed between scalability and
eciency as presented in Figure 7. When the number
of hosts is increased, there is a gradual decrease in ef-
ciency while there is a gradual increase in speedup.
From the experimental results, it is observed that the
proposed RLbDS is found to be dynamic and can adapt
to runtime situations as it is a learning-based approach.
Its asynchronous approach helps it in faster conver-
gence. In the presence of dynamic workloads and device
characteristics, RLbDS adapts to changes with ease.
5. CONCLUSION AND FUTURE WORK
We proposed a learning-based framework known as
the Deep Reinforcement Learning Framework (DRLF).
This is designed in such a way that it exploits Deep
Reinforcement Learning (DRL) with underlying mecha-
nisms and enhanced deep network architecture based
on Recurrent Neural Network (RNN). We also proposed
an algorithm named Reinforcement Learning Dynamic
Scheduling (RLbDS) which exploits dierent hyperpa-
rameters and DRL-based decision-making for ecient
scheduling. Real-time traces of edge-cloud infrastructure
are used for empirical study. We implemented our frame-
work by dening new classes for CloudSim and iFogSim
simulation frameworks. We evaluated the performance of
the proposed algorithm named RLbDS by comparing it
with state-of-the-art methods such as LR-MMT, MAD-MC,
DDQN and REINFORCE. The results reveal the sensitivity of the proposed RLbDS to the hyperparameters (α, β, γ, δ, ε) in terms of model learning and their impact on different performance metrics. Our empirical study has re-
vealed that RLbDS outperforms many existing scheduling
methods. In future, we intend to improve our framework
for container scheduling and load balancing.
6. REFERENCES
[1] A. Beloglazov, R. Buyya, “Optimal online determin-
istic algorithms and adaptive heuristics for energy
and performance ecient dynamic consolidation
of virtual machines in cloud data centres”, Concur-
rency and Computation: Practice and Experience,
Vol. 24, No. 13, 2012, pp. 1397–1420.
[2] M. Cheng, J. Li, S. Nazarian, “DRL-cloud: Deep rein-
forcement learning-based resource provisioning
and task scheduling for cloud service providers”,
Proceedings of the 23rd Asia and South Pacic De-
sign Automation Conference, Jeju, Korea, 22-25
January 2018, pp. 129-134.
[3] X.-Q. Pham, E.-N. Huh, “Towards task scheduling in
a cloud-fog computing system”, Proceedings of the
18th Asia-Pacic Network Operations and Manage-
ment Symposium, Kanazawa, Japan, 5-7 October
2016, pp. 1–4.
[4] D.-M. Bui, Y. Yoon, E.-N. Huh, S. Jun, S. Lee, "Energy
eciency for a cloud computing system based on
predictive optimization", Journal of Parallel and Dis-
tributed Computing, Vol. 102, 2017, pp. 103-114.
[5] L. Huang, S. Bi, Y. J. Zhang, “Deep reinforcement
learning for online computation ooading in wire-
less powered mobile-edge computing networks”,
IEEE Transactions on Mobile Computing, Vol. 19, No.
11, 2020, pp. 2581-2593.
[6] H. Mao, M. Alizadeh, I. Menache, S. Kandula, “Re-
source management with deep reinforcement
learning”, Proceedings of the 15th ACM Workshop
on Hot Topics in Networks, Atlanta, GA, USA, 9-10
November 2016, pp. 50-56.
[7] D. Basu, X. Wang, Y. Hong, H. Chen, S. Bressan, “Learn-
as-you-go with Megh: Ecient live migration of vir-
tual machines”, IEEE Transactions on Parallel and Dis-
tributed Systems, Vol. 30, No. 8, 2019, pp. 1786-1801.
[8] M. Xu, S. Alamro, T. Lan, S. Subramaniam, "Laser: A
deep learning approach for speculative execution
and replication of deadline-critical jobs in the cloud”,
Proceedings of the 26th International Conference on
Computer Communication and Networks, Vancou-
ver, BC, Canada, 31 July - 3 August 2017, pp. 1-8.
[9] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, S. U. Khan, P.
Li, “A double deep Q-learning model for energy-
ecient edge scheduling”, IEEE Transactions on Ser-
vices Computing, Vol. 12, No. 5, 2019, pp. 739-749.
[10] F. Li, B. Hu, “Deepjs: Job scheduling based on deep
reinforcement learning in the cloud data centre,
Proceedings of the 4th International Conference on
Big Data and Computing, Guangzhou, China, 10-12
May 2019, pp. 48-53.
[11] G. Rjoub, J. Bentahar, O. A. Wahab, A. S. Bataineh,
“Deep and reinforcement learning for automated
task scheduling in large-scale cloud computing sys-
tems”, Concurrency and Computation: Practice and
Experience, Vol. 33, No. 23, 2020, pp.1-14.
[12] O. Skarlat, M. Nardelli, S. Schulte, M. Borkowski, P.
Leitner, “Optimized IoT service placement in the
fog”, Service Oriented Computing and Applications,
Vol. 11, No. 4, 2017, pp. 427-443.
[13] X.-Q. Pham, N. D. Man, N. D. T. Tri, N. Q. Thai, E.-N.
Huh, “A cost-and performance-eective approach
for task scheduling based on collaboration between
cloud and fog computing”, International Journal of
Distributed Sensor Networks, Vol. 13, No. 11, 2017,
pp. 1-16.
[14] A. Brogi, S. Forti, “QoS-aware deployment of IoT ap-
plications through the fog”, IEEE Internet of Things
Journal, Vol. 4, No. 5, 2017, pp. 1185-1192.
[15] T. Choudhari, M. Moh, T.-S. Moh, “Prioritized task
scheduling in fog computing”, Proceedings of the
ACMSE Conference, New York, NY, USA, March 2018,
pp. 22:1-22:8.
[16] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, P. Li, “Energy-
ecient scheduling for real-time systems based on
deep learning model”, IEEE Transactions on Sustain-
able Computing, Vol. 4, No. 1, 2017, pp. 132-141.
[17] Z. Xiong, Y. Zhang, D. Niyato, R. Deng, P. Wang, L.-C.
Wang, "Deep reinforcement learning for mobile 5G and beyond: Fundamentals, applications, and chal-
lenges”, IEEE Vehicular Technology Magazine, Vol.
14, No. 2, 2019, pp. 44-52.
[18] J. Almutairi, M. Aldossary, “A novel approach for IoT
tasks ooading in edge-cloud environments. Jour-
nal of Cloud Computing”, Journal of Cloud Comput-
ing, Vol. 10, 2021, p. 28.
[19] S. Ding, L. Yang, J. Cao, W. Cai, M. Tan, Z. Wang, “Parti-
tioning Stateful Data Stream Applications in Dynamic
Edge Cloud Environments”, IEEE Transactions on Ser-
vices Computing, Vol. 15, No. 4, 2021, pp. 2368-2381.
[20] S. S. Murad, R. Badeel, N. S. A. Alsandi, Ra, “Opti-
mized Min-Min Task Scheduling Algorithm for Scientific Workflows in a Cloud Environment”, Journal
of Theoretical and Applied Information Technology,
Vol. 100, No. 2, 2022, pp. 480-506.
[21] L. Bulej et al. “Managing latency in edge cloud envi-
ronment”, Journal of Systems and Software, Vol. 172,
2021, pp. 1-15.
[22] J. Almutairi, M. Aldossary, “Investigating and Model-
ling of Task Ooading Latency in Edge-Cloud Envi-
ronment. Computers”, Materials & Continua, Vol. 68,
No. 3, 2021, pp. 1-18.
[23] R. Zhang, W. Shi, “Research on Workow Task Sched-
uling Strategy in Edge Computer Environment”, Jour-
nal of Physics: Conference Series, Vol. 1744, 2021, pp.
1-6.
[24] X. Zhao, G. Huang, L. Gao, M. Li, Q. Gao, “Low load
DIDS task scheduling based on Q-learning in an
edge computing environment”, Journal of Network
and Computer Applications, Vol. 188, 2021, pp. 1-12.
[25] Y. Zhang, B. Tang, J. Luo, J. Zhang, “Deadline-Aware
Dynamic Task Scheduling in Edge-Cloud Collabora-
tive Computing”, Electronics, Vol. 11, 2022, pp. 1-24.
[26] A. Lakhan et al. “Delay Optimal Schemes for Internet
of Things Applications in Heterogeneous Edge Cloud
Computing Networks”, Sensors, Vol. 22, pp. 1-30.
[27] M. Singh, S. Bhushan, “CS Optimized Task Sched-
uling for Cloud Data Management”, International
Journal of Engineering Trends and Technology, Vol.
70, No. 6, 2022, pp. 114-121.
[28] Y. Hao, M. Chen, H. Gharavi, Y. Zhang, K. Hwang,
“Deep Reinforcement Learning for Edge Service
Placement in Softwarized Industrial Cyber-Physical
System”, IEEE Transactions on Industrial Informatics,
Vol. 17, No. 8, 2021, pp. 5552-5561.
[29] S. Tuli et al. “Dynamic Scheduling for Stochastic
Edge-Cloud Computing Environments using A3C
learning and Residual Recurrent Neural Networks”,
IEEE Transactions on Mobile Computing, Vol. 21, No.
3, 2022, pp. 1-15.
[30] Q. Zhang, L. Gui, S. Zhu, X. Lang, “Task Ooading
and Resource Scheduling in Hybrid Edge-Cloud
Networks”, IEEE Access, Vol. 9, 2021, pp. 940-954.
[31] L. Roselli, C. Mariotti, P. Mezzanotte, F. Alimenti, G.
Orecchini, M. Virili, N. Carvalho, "Review of the pres-
ent technologies concurrently contributing to the
implementation of the Internet of things (IoT) para-
digm: RFID, green electronics, WPT and energy har-
vesting”, Proceedings of the Topical Conference on
Wireless Sensors and Sensor Networks, San Diego,
CA, USA, 25-28 January 2015, pp. 1-3.
[32] S. Tuli, N. Basumatary, S. S. Gill, M. Kahani, R. C. Arya,
G. S. Wander, R. Buyya, “Healthfog: An ensemble
deep learning based smart healthcare system for
automatic diagnosis of heart diseases in integrated
IoT and fog computing environments”, Future Gen-
eration Computer Systems, Vol. 104, 2020, pp. 187-
200.
[33] S. Sarkar, S. Misra, “Theoretical modelling of fog
computing: a green computing paradigm to sup-
port IoT applications”, IET Networks, Vol. 5, No. 2,
2016, pp. 23-29.
[34] Z. Abbas, W. Yoon, “A survey on energy conserving
mechanisms for the Internet of things: Wireless net-
working aspects”, Sensors, Vol. 15, No. 10, 2015, pp.
24818-24847.
[35] P. Kamalinejad, C. Mahapatra, Z. Sheng, S. Mirabbasi,
V. C. Leung, Y. L. Guan, "Wireless energy harvesting
for the Internet of things”, IEEE Communications
Magazine, Vol. 53, No. 6, 2015, pp. 102-108.
[36] A. M. Rahmani, T. N. Gia, B. Negash, A. Anzanpour,
I. Azimi, M. Jiang, P. Liljeberg, “Exploiting smart e-
Health gateways at the edge of healthcare Internet-
of-Things: A fog computing approach”, Future Gen-
eration Computer Systems, Vol. 78, 2018, pp. 641-
658.
[37] J. Achiam, D. Held, A. Tamar, P. Abbeel, “Constrained policy optimization”, Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6-11 August 2017, pp. 22-31.
[38] R. Doshi, K.-W. Hung, L. Liang, K.-H. Chiu, “Deep learning neural networks optimization using hardware cost penalty”, Proceedings of the IEEE International Symposium on Circuits and Systems, Montreal, QC, Canada, 22-25 May 2016, pp. 1954-1957.
[39] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De Rose, R. Buyya, “CloudSim: a toolkit for modelling and simulation of cloud computing environments and evaluation of resource provisioning algorithms”, Software: Practice and Experience, Vol. 41, No. 1, 2011, pp. 23-50.
[40] H. Gupta, A. Vahid Dastjerdi, S. K. Ghosh, R. Buyya, “iFogSim: A toolkit for modelling and simulation of resource management techniques in the internet of things, edge and fog computing environments”, Software: Practice and Experience, Vol. 47, No. 9, 2017, pp. 1275-1296.
[41] B. Magotra, “Adaptive Computational Solutions to Energy Efficiency in Cloud Computing Environment Using VM Consolidation”, Archives of Computational Methods in Engineering, 2023, pp. 1790-1818.
[42] S. Shen, V. van Beek, A. Iosup, “Statistical charac-
terization of business-critical workloads hosted in
cloud datacenters”, Proceedings of the 15th IEEE/
ACM International Symposium on Cluster, Cloud
and Grid Computing, Shenzhen, China, 4-7 May
2015, pp. 465-474.
[43] Bitbrain Dataset, http://gwa.ewi.tudelft.nl/datasets/
gwa-t-12-bitbrains (accessed: 2024)
[44] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, P. Li, “Energy efficient scheduling for real-time systems based on the deep q-learning model”, IEEE Transactions on Sustainable Computing, Vol. 4, No. 1, 2017, pp. 132-141.
[45] Microsoft Azure Pricing Calculator, https://azure.
microsoft.com/en-au/pricing/calculator/ (accessed:
2024)