Deep Reinforcement Learning for Dynamic Task
Scheduling in Edge-Cloud Environments
Original Scientific Paper
Abstract – With the advent of the Internet of Things (IoT) and its use cases, there is a need for improved latency, which has led to edge computing technologies. IoT applications need a cloud environment and appropriate scheduling based on the underlying requirements of a given workload. Due to the mobility of IoT devices, resource constraints and resource heterogeneity, IoT application tasks need more efficient scheduling, which is a challenging problem. The existing conventional and deep learning scheduling techniques have limitations such as lack of adaptability, issues with their synchronous nature and inability to deal with temporal patterns in the workloads. To address these issues, we propose a learning-based framework known as the Deep Reinforcement Learning Framework (DRLF). It is designed to exploit Deep Reinforcement Learning (DRL) with underlying mechanisms and an enhanced deep network architecture based on a Recurrent Neural Network (RNN). We also propose an algorithm named Reinforcement Learning based Dynamic Scheduling (RLbDS) which exploits different hyperparameters and DRL-based decision-making for efficient scheduling. Real-time traces of edge-cloud infrastructure are used for the empirical study. We implemented our framework by defining new classes for the CloudSim and iFogSim simulation frameworks. Our empirical study reveals that RLbDS outperforms many existing scheduling methods.
Keywords: Task Scheduling, Edge-Cloud Environment, Recurrent Neural Network, Edge Computing, Cloud Computing,
Deep Reinforcement Learning
Volume 15, Number 10, 2024
D. Mamatha Rani*
TGSWRAFPDC(W), Bhongir
Department of Computer Science, Bhongir, Telangana- 508126, India
mamatha3004@gmail.com
Supreethi K P
Jawaharlal Nehru Technological University
Department of Computer Science and Engineering, Hyderabad, Telangana- 500085, India
supreethi.pujari@jntuh.ac.in
Bipin Bihari Jayasingh
CVR College of Engineering/IT Department
Hyderabad, Telangana- 501510, India
bipinbjayasingh@cvr.ac.in
Received: November 17, 2023; Received in revised form: June 12, 2024; Accepted: June 17, 2024
*Corresponding author
1. INTRODUCTION
Unprecedented growth of cloud-assisted use cases
has compelled Cloud Service Providers (CSPs) to
optimize resource usage in the presence of Service Level
Agreements (SLAs). Ubiquitous adoption of technologi-
cal innovations such as the Internet of Things (IoT) has
led to the emergence of the fog and edge computing phe-
nomena, which help reduce latency. In the presence of IoT
applications, the scheduling of tasks is challenging for
many reasons such as network hierarchy, heterogeneity
of resources, mobility of devices, resource-constrained
devices and stochastic behaviour of nodes [1]. Tradition-
al cloud scheduling algorithms are not sufficient to har-
ness the power of the dynamic computing environment
made up of cloud, fog and edge resources. To overcome
this problem, different scheduling algorithms came into
existence. Reinforcement learning is one such technique
used with the machine learning approach [2]. Many
learning-based task scheduling approaches came into
existence. Their merits and demerits are summarized in
Table 1 and Table 2 in Section 2. The advantages of the
research in [1] include consideration of dynamic envi-
ronments and heterogeneous cores. However, it does
not consider adaptive QoS, edge cloud, decentralized
environment and presence of stochastic workloads.
The work in [3] considered edge cloud and also het-
erogeneous cores for their task scheduling research.
However, it does not support adaptive QoS, dynamic
and decentralized environments and stochastic workloads.
The merits of [4] include the consideration of a dynamic
environment, stochastic workloads and heterogeneous
cores. But it lacks adaptive QoS and support for edge cloud
and decentralized environments. The research in [5] and [6] has similar findings.
Their method has provision for considering dynamic
environments, heterogeneous cores, adaptive QoS
and stochastic workloads. But is not designed for edge
cloud and decentralized environments. In [7], there
is consideration of dynamic environment, stochastic
workloads, adaptive QoS and heterogeneous cores but
does not support decentralized and edge-cloud envi-
ronments. The work in [8] supports dynamic environ-
ments and stochastic workloads. However, it has limita-
tions to deal with heterogeneous cores, adaptive QoS,
edge cloud and decentralized environments. There is a
similarity in the task scheduling methods proposed in
[9] and [10].
Their methods are dynamic supporting adaptive QoS
and stochastic workloads besides dealing with hetero-
geneous cores. However, they do not support decen-
tralized and edge cloud environments. The scheduling
research in [11] supports dynamic environments along
with stochastic workloads. They also deal with hetero-
geneous cores and adaptive QoS. However, the draw-
back is that those methods do not consider decentral-
ized and edge-cloud environments.
Concerning optimization parameters, Table 2 pro-
vides research gaps in existing solutions. Research in
[1] is based on a heuristics approach and considers en-
ergy and SLA violation parameters. Their research lacks
in the study of response time and cost of scheduling
which are crucial for task scheduling. The work in [3]
is also based on the heuristics method but considers
cost and energy parameters. It does not throw light on
response time and SLA violations. In [4], their method
is based on Gaussian process regression and considers
two parameters such as energy and SLAs. It has no sup-
port for optimization of cost and response time.
The task scheduling research in [5] and [6] is based
on the Deep Q-Network (DQN) method
and supports cost and energy parameters for optimiza-
tion. However, they have no optimization of SLAs and
response time. In [7], a Q-learning-based approach is
used considering energy and cost dynamics for optimi-
zation. However, it lacks optimization of response time
and SLAs. Deep Neural Network (DNN) is the scheduling
method used in [8] and it has support for optimization
of cost and SLA parameters. It lacks support for energy
and response time optimizations. The work in [9] and
[10] is based on the Double DQN (DDQN) method and
it supports only energy parameters for optimization. It
lacks support for response time, cost and SLA optimiza-
tions. In [11] DRL method is used for task scheduling by
considering response time for optimization. However,
it does not support the optimization of SLAs, cost and
energy. From the literature, it is observed that there is a
need for a more comprehensive methodology in edge-
cloud environments for task scheduling. Our contribu-
tions to this paper are as follows.
1. We proposed a learning-based framework known
as the Deep Reinforcement Learning Framework
(DRLF). This is designed in such a way that it ex-
ploits Deep Reinforcement Learning (DRL) with un-
derlying mechanisms and enhanced deep network
architecture based on Recurrent Neural Network
(RNN).
2. We proposed an algorithm named Reinforcement
Learning based Dynamic Scheduling (RLbDS) which ex-
ploits different hyperparameters and DRL-based
decision-making for efficient scheduling.
3. Our simulation study has revealed that the pro-
posed RLbDS outperforms many existing schedul-
ing methods.
The remainder of the paper is structured as follows.
Section 2 reviews prior works on existing task schedul-
ing methods for cloud and edge-cloud environments.
Section 3 presents details of the proposed system in-
cluding the system model, DRL mechanisms and the un-
derlying algorithm. Section 4 presents the results of the
empirical study while Section 5 concludes our work and
provides directions for the future scope of the research.
2. RELATED WORK
This section reviews prior works on existing task
scheduling methods for cloud and edge-cloud environ-
ments. VM plays a vital role in cloud infrastructure for
resource provisioning. Beloglazov and Buyya [1] proposed a
method for improving resource utilization in the cloud
through VM migration and consolidation. They found
that VM live migration has the potential to exploit idle
nodes in cloud data centres to optimize resource utili-
zation and reduce energy consumption. They consid-
ered the dynamic environment and presence of het-
erogeneous cores for their task scheduling study. Their
method is based on a heuristics approach. It considers
SLA negotiations and algorithms designed to support
optimizations such as energy efficiency and SLAs. Their
algorithm monitors VMs and their resource usage. By
considering VM consolidation and VM live migration,
their method is aimed at reducing energy consump-
tion and adherence to SLAs. This method lacks adap-
tive QoS and support for dynamic workloads. Pham
and Huh [3] proposed a task scheduling method based
on a heuristics approach for such an environment.
It is designed to work for heterogeneous cores in fog-
cloud. They considered optimizations such as energy
efficiency and cost reduction by scheduling tasks in an
edge-cloud environment. Their algorithm is based on
heuristics towards reducing cost and energy consump-
tion. It is based on graph representation. Towards this,
their method exploits the task graph and processor
graph. Given the two graphs representing tasks and re-
sources, their method finds appropriate resource allo-
cation for given tasks. It has a provision for determining
task priority and then choosing the most suitable node
for the execution of the task.
Bui et al. [4] proposed an optimization framework for
the cloud with a predictive approach. They could pre-
dict the dynamics of resource utilization for schedul-
ing by employing a method named Gaussian process
regression. The prediction result helped them to mini-
mize the number of servers to be used to process the
requests leading to a reduction of energy usage. Their
method is, however, based on heuristics and is not suit-
able for dynamic workloads and edge-cloud environ-
ments. Cheng et al. [2] explored DRL based approach
towards task scheduling and resource provisioning
in the cloud. They further optimized the Q-learning
method to reduce the task rejection rate and improve
energy efficiency. Huang et al. [5] and Mao et al. [6] fol-
lowed the DRL approach for improving task scheduling
performance in a cloud computing environment.
In [5], a DRL-based online offloading method is proposed
based on deep neural networks. It is a scalable
solution since it is a learning-based approach. In [6],
DeepRM is the framework proposed for task scheduling
considering efficient resource management. Both
methods are based on the DQN approach rather than
heuristics. Both methods considered optimization pa-
rameters such as energy and cost. In other words, they
are designed to reduce energy consumption and also
the cost incurred for task execution in cloud environ-
ments. They support stochastic workloads and adap-
tive QoS. However, they do not support edge-cloud
environments and do not optimize SLA and response
time parameters.
Basu et al. [7] focused on the problem of live migration
of VMs based on the RL-based Q-learning process.
Their methodology improves upon existing heuristics-based
live migration approaches. Towards this end,
their method exploits Megh, an RL-based model,
to adapt continuously to runtime situations
and improve energy efficiency. Xu et al.
[8] defined a DNN approach named LASER to support
deadline-critical jobs with replication and speculative
execution. Their implementation of the framework is
designed for the Hadoop framework. Zhang et al. [9]
defined a DDQN method towards energy efficiency in
edge computing. It is based on the Q-learning process
and also the dynamic voltage frequency scaling (DVFS)
method that has the potential to reduce energy usage.
As Q-learning is not able to recognize continuous system
states, they extended it to double-deep Q-learning.
Table 1 provides a summary of findings among existing scheduling methods.
among existing scheduling methods.
Table 1. Merits and demerits of existing scheduling methods compared with the proposed method

Reference | Dynamic | Stochastic Workload | Decentralized | Edge Cloud | Adaptive QoS | Heterogeneous
[1] | Yes | No | No | No | No | Yes
[3] | No | No | No | Yes | No | Yes
[4] | Yes | Yes | No | No | No | Yes
[5], [6] | Yes | Yes | No | No | Yes | Yes
[7] | Yes | Yes | No | No | Yes | Yes
[8] | Yes | Yes | No | No | No | No
[9], [10] | Yes | Yes | No | No | Yes | Yes
[11] | Yes | Yes | No | No | Yes | Yes
[18] | Yes | No | No | No | Yes | Yes
[19] | Yes | No | No | Yes | Yes | No
[20] | Yes | No | No | No | Yes | Yes
[21] | Yes | No | No | No | Yes | Yes
[22] | Yes | No | No | No | Yes | Yes
[23] | Yes | Yes | No | No | Yes | Yes
[25] | Yes | No | No | No | No | No
[26] | Yes | No | No | No | Yes | Yes
[27] | Yes | No | Yes | No | Yes | Yes
Proposed (RLbDS) | Yes | Yes | Yes | Yes | Yes | Yes
Similar to the work of [2], Mao et al. [6] employed DDQN
for efficient resource management. This kind of work is
also found in Li et al. [10]. Both have employed the DRL
technique towards job scheduling over diversified resources.
However, these learning-based methods are not
able to withstand stochastic environments. Mao et al. [6]
and Rjoub et al. [11] investigated a DRL-based approach for
task scheduling in edge-cloud. However, they considered
only response time in their research. Its drawback is that
they could not exploit asynchronous methods for optimi-
zation of their methods towards robustness and adapt-
ability. There is a need to improve it by considering the dy-
namic optimization of parameters in the presence of sto-
chastic workloads. Skarlat et al. [12] explored IoT service
placement dynamics in fog computing resources while
Pham et al. [13] focused on cost and performance towards
proposing a novel method for task scheduling. Brogi and
Forti [14] investigated the deployment of QoS-aware IoT
tasks in fog infrastructure. Task prioritization [15], DRL for
resource provisioning [4, 7], energy-efficient scheduling
using Q-learning [16] and DRL usage in 5G networks [17]
are other important contributions.
As presented in Table 1, we summarize our findings
leading to important research gaps. The summary is
made in terms of different parameters such as dynamic
environment, presence of stochastic workload, decen-
tralized environment, usage of edge cloud, consider-
ation for adaptive QoS and presence of heterogeneous
cores for task scheduling. Table 1 also provides the pro-
posed method and its merits over existing methods.
Almutairi and Aldossary [18] proposed a novel method
for offloading IoT tasks in the edge-cloud ecosystem.
It is designed to serve latency-sensitive applications in
a better way. It has a fuzzy logic-based approach for
inferring knowledge towards decision-making in the
presence of dynamic resource
utilization. Ding et al. [19] considered an edge-cloud
environment to investigate stateful data stream appli-
cations. They proposed a method to judge state migra-
tion overhead and make partitioning decisions based
on the dynamically changing network bandwidth
availability. Murad et al.
[20] proposed an improved version of the min-min
task scheduling method to deal with scientific workflows
in cloud computing. It could reduce the mini-
mum completion time besides optimizing resource
utilization. Bulej et al. [21] did their research on the
management of latency in the edge-cloud ecosystem
towards better performance in task scheduling in the
presence of dynamic workloads. It is designed to ex-
plore the upper bound of response time and optimize
the performance further. Almutairi and Aldossary [22]
proposed an edge-cloud system architecture to in-
vestigate modelling methodology on task offloading.
It has offloading latency models along with various
offloading schemes. Their simulations are made using
Edge CloudSim. They intend to improve it in future
with fuzzy logic.
Zhang and Shi [23] explored workflow scheduling in
an edge-cloud environment. They analyzed different
possibilities in workflow scheduling in such an ecosystem.
They opined that workflow applications need
novel approaches in the scheduling process. Zhao et al.
[24] focused on task scheduling along with security to
prevent intrusions in edge computing environments.
They considered low-rate intrusions and focused on
preventing them along with task scheduling. It is a Q-
learning-based approach designed to meet runtime
requirements based on the learning process. Zhang
et al. [25] proposed a time-sensitive algorithm that dy-
namically caters to the needs of deadline-aware tasks
in edge-cloud environments. It considers job size and
server capability in a given dynamic and hierarchical
scenario. It is a multi-objective approach considering execution
time, cost and reduction of SLA violations. Lakhan et al. [26]
proposed a task scheduling approach for IoT tasks con-
sidering a hybrid mechanism consisting of task sched-
uling and task offloading. Singh and Bhushan [27] pro-
posed a method for task scheduling based on Cuckoo
Search Optimization (CSO). It has an integrated local
search strategy. From these recent works, it is found
that they targeted IoT-style workflows in edge-cloud
environments. Q-learning is used in one of the
papers. However, deep reinforcement learning is not
found in the latest works. Service placement in edge
resources using DRL [28], dynamic scheduling [29] and
task offloading [30] are other important contributions.
Table 2 provides a summary of findings among existing
scheduling methods in terms of optimization
parameters. Magotra [41] focused on energy-efficient ap-
proaches in cloud infrastructures by developing adap-
tive solutions that could help the system towards prop-
er VM consolidation, leading to better performance.
Table 2. Optimization parameters considered by existing scheduling methods

Reference | Method | SLA Violations | Cost | Response Time | Energy
[1] | Heuristics | Yes | No | No | Yes
[3] | Heuristics | No | Yes | No | Yes
[4] | Gaussian Process Regression | Yes | No | No | Yes
[5], [6] | DQN | No | Yes | No | Yes
[7] | Q-Learning | No | Yes | No | Yes
[8] | DNN | Yes | Yes | No | No
[9], [10] | DDQN | No | No | No | Yes
[11] | DRL (REINFORCE) | No | No | Yes | No
[18] | SJF | No | No | Yes | Yes
[19] | Cloud Computing | No | No | Yes | No
[21] | Cloud Computing | No | Yes | Yes | Yes
[23] | CSA | No | Yes | No | No
[24] | Cloud Computing | No | Yes | Yes | Yes
[25] | Cloud Computing | No | Yes | No | No
[27] | CSP | No | Yes | Yes | No
As presented in Table 2, we summarized the existing
methods in terms of optimization parameters and the
approach considered in the task scheduling research.
The optimization parameters considered for the com-
parative study of existing methods are SLA violations,
cost, response time and energy.
Table 2 also provides the proposed method and its
merits over existing methods. Table 1 and Table 2 pro-
vide very useful insights reflecting gaps in the research.
Our work in this paper is based on such research gaps as
those tables reveal the merits of the proposed system.
3. PROPOSED SYSTEM
We proposed a DRL-based framework for dynamic
task scheduling in an edge-cloud environment. This sec-
tion presents the framework and proposed algorithm
besides DRL mechanisms.
3.1. PROBLEM DEFINITION
Considering an edge-cloud environment, let H be a
collection of hosts denoted as {H1, H2, H3, …, Hn} where
n indicates the maximum number of hosts. A task T can be
assigned to a host H. Scheduling is considered as the assignment
of T to H. In terms of RL, the system
state is mapped to an action, where an action means the
allocation of T to H. T may be an active task that could be
migrated to a new host or a newly arrived task. At the beginning
of an interval, denoted as SIi, the system state is
denoted as Statei, which reflects the hosts and their
parameters, the tasks yet to be allocated from the prior interval,
denoted as (ai-1 \ li), besides newly arrived tasks denoted as
ni. For each task in ai (= ai-1 ∪ ni \ li), the scheduler
needs to take an action, denoted as Actioni, for the scheduling
interval SIi, in terms of either allocating it to a host or
migrating it to a new host. Let mi ⊆ ai-1 \ li denote the set of
migratable tasks. A scheduler can be understood
as a model which reflects a decision-making function
Statei → Actioni. The loss function associated with the
model for a given interval, denoted as Lossi, is computed
based on task allocations. Therefore, the problem of
realizing an optimal model is expressed in Eq. 1.
(1)
Different notations used in our work are presented
in Table 3.
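The exact expression of Eq. 1 is not reproduced above. Based on the surrounding description (a model realizing the mapping Statei → Actioni, with a per-interval loss Lossi computed from task allocations), one plausible way to write the objective, stated here only as an illustrative assumption rather than the authors' formula, is:

model^{*} = \arg\min_{\text{model}} \sum_{i=1}^{N} Loss_i\big(\text{model}(State_i)\big)

where N is the maximum number of scheduling intervals and Lossi depends on the allocation and migration decisions taken in SIi.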
3.2. OUR SYSTEM MODEL
We considered infrastructure or resources for sched-
uling in an edge-cloud environment. The resources are
heterogeneous. Edge resources are nearby while cloud
resources reside in a remote data centre. Therefore, each
host in the infrastructure is different in response time
and computational power. Edge resources are closer
and exhibit low response times but they do have limited
resources and computational power. Cloud resources
take more response time but they do have high compu-
tational power. Our system model is presented in Fig. 1.
The edge and cloud nodes are part of computing re-
sources. These resources are managed by the resource
management module. This module has several compo-
nents or sub-modules to deal with resource manage-
ment either directly or indirectly. The scheduler module
is responsible for either scheduling a task T to a host H or
migrating a task from one host to another host based on
runtime dynamics. The dynamic workload is generated
by IoT devices being used by different users. The work-
load contains several tasks with varied requirements.
Resource management module takes the workload and
follows DRL based (learning-based) approach in task al-
location or task migration. These decisions are based on
the ideal objective functions and the requirements asso-
ciated with tasks. The requirements may include dead-
line, bandwidth, RAM and CPU.
Fig. 1. Our system model
The workload is generated automatically to evaluate
the functionality of the proposed system. Our system
has a DRL model which influences the scheduler mod-
ule in decision-making. There are multiple schedulers
to be used at runtime to serve dynamically generated
workloads. In the process, there is the distribution of
workload among hosts leading to faster convergence.
Each resource in edge-cloud accumulates local gradi-
ents associated with corresponding schedulers besides
synchronizing them to update models. The DRL module
follows asynchronous updates. The constraint satisfac-
tion module takes suggestions as input from DRL and
finds whether they are valid. Here, valid means that a task is
migratable or that the host's capacity is being used optimally.
3.3. WORKLOAD GENERATION
We generate workload programmatically to evaluate
the proposed system. Since IoT devices and users' demands
are dynamic, there is a change in the bandwidth
and computational requirements of tasks. The whole
execution time in our system is divided into several
scheduling intervals. Each interval is assumed to have
the same duration. SIi denotes the ith scheduling inter-
val. This interval has a start time and end time denoted
as ti and ti+1 respectively. Each interval has active tasks
associated with it. They are the tasks being executed
and denoted as ai. The tasks that have been completed
at the beginning of the interval are denoted as li while
newly arrived tasks that are dynamically generated by
the workload generator are denoted as ni.
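To make the interval-based workload model concrete, the following minimal Java sketch shows a programmatic generator that emits the set ni of newly arrived tasks for one scheduling interval. The class and field names (TaskRequest, WorkloadGenerator) and the demand ranges are illustrative assumptions introduced here, not the paper's implementation; in the actual experiments the tasks come from Bitbrain traces fed through CloudSim/iFogSim cloudlets.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical task descriptor; field names are illustrative, not taken from the paper.
class TaskRequest {
    final int id;
    final double cpuMips;       // requested CPU capacity
    final double ramMb;         // requested RAM
    final double bandwidthMbps; // requested bandwidth
    final double deadlineSec;   // relative deadline within the interval

    TaskRequest(int id, double cpuMips, double ramMb, double bandwidthMbps, double deadlineSec) {
        this.id = id;
        this.cpuMips = cpuMips;
        this.ramMb = ramMb;
        this.bandwidthMbps = bandwidthMbps;
        this.deadlineSec = deadlineSec;
    }
}

// Generates the set n_i of newly arrived tasks at the start of scheduling interval SI_i.
class WorkloadGenerator {
    private final Random rng = new Random(42); // fixed seed keeps runs repeatable
    private int nextId = 0;

    List<TaskRequest> newTasksForInterval() {
        int arrivals = 1 + rng.nextInt(10);      // stochastic number of arrivals per interval
        List<TaskRequest> tasks = new ArrayList<>();
        for (int k = 0; k < arrivals; k++) {
            tasks.add(new TaskRequest(
                    nextId++,
                    500 + rng.nextDouble() * 1500,  // CPU demand varies per task
                    256 + rng.nextDouble() * 1792,  // RAM demand
                    10 + rng.nextDouble() * 90,     // bandwidth demand
                    60 + rng.nextDouble() * 240));  // deadline
        }
        return tasks;
    }
}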
3.4. OUR LEARNING-BASED APPROACH FOR
SCHEDULING
We proposed a framework known as the Deep Re-
inforcement Learning Framework (DRLF), as shown in
Fig. 2, which exploits a learning-based approach using
the DRL model for dynamic task scheduling in an edge-
cloud environment. The framework supports several
scheduling intervals. The framework has a workload
generator which generates tasks (ni) and gives them
to the scheduling and migration module. The tasks
given to the scheduler are in turn given to the resource
monitoring module which schedules new tasks and
migrates existing tasks if required to ensure optimal
resource utilization, load balancing and latency in task
completion. The scheduler activity changes the state of
the edge-cloud environment.
Fig. 2. Proposed Deep Reinforcement Learning
Framework (DRLF) for task scheduling in edge-cloud
environment
Every time Statei is updated by the resource monitoring
module, it is given to the DRL model. The state information
consists of hosts' feature vectors, new tasks ni and
the rest of the tasks associated with the previous interval,
denoted by (ai-1 \ li). The resource monitoring module
also gives Lossi data to the DRL model. The DRL model
suggests an action, denoted as Actioni^PG, based on the
state information to the constraint satisfaction module
and updates parameters as expressed in Eq. 2. This module
then returns Penaltyi to the DRL model.
(2)
This process continues iteratively. Once the constraint
is satisfied, the constraint satisfaction module
gives the suggested action (Actioni) by the DRL module
to the resource management module. It then computes
Penaltyi+1 for SIi+1, the next scheduling interval.
Table 3. Notations used in our work

Notation | Description
ai | Indicates a set of active tasks linked to SIi
Hi | Indicates the ith host in a given set of hosts
li | Indicates the initial set of tasks of SIi
mi | Indicates a decision for task migration
ni | Indicates a task allocation decision
Actioni^PG | Scheduling actions at the beginning of SIi
Lossi^PG | Loss function at the beginning of SIi
SIi | Denotes the ith scheduling interval
Ti | Indicates the ith task in a given set of tasks
{T} | Indicates the host to which task T has been assigned
AEC | Average Energy Consumption
AMT | Average Migration Time
ART | Average Response Time
Hosts | Indicates a collection of hosts in the edge-cloud environment
N | Indicates the maximum number of hosts
T | Denotes a task to be executed
Based on the action received from the constraint sat-
isfaction module, the resource management module
either allocates a new task to a specific host or migrates
tasks, denoted as (ai-1 \ li), of the preceding interval. This
will result in an update from ai-1 to ai. Then the tasks
associated with ai are executed for SIi and the cycle con-
tinues for SIi+1.
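As a rough illustration of this cycle, the Java sketch below shows one pass of the loop: observe Statei, obtain a suggestion from the DRL model, validate it through constraint satisfaction, and apply the resulting allocation or migration. All interfaces and method names are hypothetical placeholders introduced for clarity; they are not the CloudSim/iFogSim classes used in the paper.

import java.util.Arrays;

// Minimal interfaces standing in for the framework modules; all names are illustrative.
interface DrlModel {
    int[] suggestPlacement(double[][] stateVector); // task index -> suggested host index
}

interface ConstraintSatisfaction {
    int[] validate(int[] suggestedPlacement);       // corrected placement after constraint checks
    double penalty();                               // penalty fed back to the DRL model
}

interface ResourceManager {
    double[][] observeState();                      // hosts' feature vectors + pending/new tasks
    void applyPlacement(int[] placement);           // allocate new tasks, migrate active ones
    double lossOfLastInterval();                    // Loss_i computed after execution
}

// One pass of the scheduling cycle described in Section 3.4.
class SchedulingCycle {
    static void runInterval(ResourceManager rm, DrlModel drl, ConstraintSatisfaction cs) {
        double[][] state = rm.observeState();            // State_i
        int[] suggestion = drl.suggestPlacement(state);  // Action_i^PG
        int[] action = cs.validate(suggestion);          // Action_i after constraint checking
        rm.applyPlacement(action);                       // allocation / migration for SI_i
        double loss = rm.lossOfLastInterval();           // used, with the penalty, to update the model
        System.out.printf("placement=%s loss=%.4f penalty=%.4f%n",
                Arrays.toString(action), loss, cs.penalty());
    }
}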
3.5. DEEP LEARNING ARCHITECTURE
The DRL model is built based on an enhanced Re-
current Neural Network (RNN) architecture. It has the
functionality to achieve reinforcement learning. In the
process, it approximates Statei towards Actioni^PG, which
is the action passed from the DRL model to the constraint
satisfaction module for a given scheduling interval.
The enhanced RNN can ascertain temporal relationships
between input space and output space. This
deep learning architecture is shown in Fig. 3. After each
interval, cumulative loss and policy are predicted by a
single network.
The network has two fully connected layers, denoted
as fc1 and fc2. These are followed by three recurrent
layers, denoted as r1, r2 and r3, with skip connections.
The given 2D input is flattened and sent to the dense
layers. The output of r3 is given to two fully connected
layers denoted as fc3 and fc4. The fc4 layer outputs a 2D
vector of size 100x100, which means that the model can
deal with 100 tasks allocated to 100 hosts in the cloud
infrastructure. Eventually, a softmax function is applied
to the second dimension so that values lie in [0, 1] and
each row sums to 1. For interpretation, Ojk, denoting a
probability map, indicates the probability of a task
Tj ∈ ai being assigned to Hk. At fc4, a cumulative loss
function Lossi+1^PG is computed. The recurrent layers in
the network are made up of Gated Recurrent Units (GRU)
that have the capacity to model the temporal dimension
of a given task and also the characteristics of the host
comprising bandwidth, RAM and CPU. The GRU layers
tend to have increased network parameters, leading to
complexity. This problem is addressed by exploiting
skip connections, which allow gradients to propagate faster.
Fig. 3. Architecture of an RNN variant used to
realize the DRL model
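The row-wise softmax over the fc4 output can be pictured with the plain-Java sketch below. It only illustrates how a probability map Ojk could be produced and read out; the greedy argmax selection is our assumption, since the paper does not state how a host is finally picked from the map.

// Row-wise softmax: each row j of the fc4 output becomes a distribution over hosts.
class ProbabilityMap {
    static double[][] softmaxRows(double[][] logits) {
        double[][] probs = new double[logits.length][];
        for (int j = 0; j < logits.length; j++) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : logits[j]) max = Math.max(max, v);   // subtract max for numerical stability
            double sum = 0.0;
            double[] row = new double[logits[j].length];
            for (int k = 0; k < row.length; k++) {
                row[k] = Math.exp(logits[j][k] - max);
                sum += row[k];
            }
            for (int k = 0; k < row.length; k++) row[k] /= sum;  // each row now sums to 1
            probs[j] = row;
        }
        return probs;
    }

    // Greedy reading of O_jk: assign task j to the host with the highest probability.
    static int[] argmaxHosts(double[][] probs) {
        int[] hostOf = new int[probs.length];
        for (int j = 0; j < probs.length; j++) {
            int best = 0;
            for (int k = 1; k < probs[j].length; k++) {
                if (probs[j][k] > probs[j][best]) best = k;
            }
            hostOf[j] = best;
        }
        return hostOf;
    }
}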
This model takes Statei as input, represented in the form
of a 2D vector. This vector contains a continuous element
FVi^Hosts, another continuous element FVi^ni, and
FVi^(ai-1 \ li), which has categorical host indices.
Therefore, pre-processing is required to transform host
indices into a one-hot vector with a maximum size of n.
Then all feature vectors are concatenated. Afterwards,
each element in the resultant vector is normalized to
the range of values [0, 1]. Each element has a feature
denoted as fe, while minfe and maxfe denote its minimum
and maximum values respectively. These values are
computed from the dataset with the help of two heuristics,
namely local regression and maximum migration time.
Afterwards, standardization is carried out feature-wise
using the expression in Eq. 3.
fe' = (fe − minfe) / (maxfe − minfe)    (3)
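A minimal Java sketch of this feature-wise scaling follows; it assumes the usual min-max form of Eq. 3, and the guard against a zero range is an added safeguard rather than part of the paper.

// Feature-wise min-max normalization to [0, 1], as in Eq. 3 (illustrative sketch).
class FeatureScaler {
    static double[] normalize(double[] feature, double min, double max) {
        double[] out = new double[feature.length];
        double range = max - min;
        for (int j = 0; j < feature.length; j++) {
            // A degenerate feature (min == max) is mapped to 0 instead of dividing by zero.
            out[j] = range == 0 ? 0.0 : (feature[j] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] cpu = {200, 800, 1400, 2000};
        double[] scaled = normalize(cpu, 200, 2000);   // -> 0.0, 0.333..., 0.666..., 1.0
        for (double v : scaled) System.out.println(v);
    }
}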
Once pre-processing of the given input is carried out,
it is fed to the network (Fig. 3), which first flattens the
pre-processed input before sending it through the dense
layers. The output of these layers is transformed into
Actioni^PG. We employed a backpropagation algorithm
to ascertain the biases and weights of the network. The
learning rate is kept adaptive, starting at 10^-2 and later
reduced to 1/10th of its value when the reward change associated
with the preceding 10 iterations is not greater than 0.1.
Automatic differentiation is exploited to modify the
parameters of the network using Lossi^PG as a reward.
Gradients of local networks are accumulated across the
edge nodes periodically in an asynchronous fashion
towards the update of global network parameters. Towards
this end, the gradient accumulation rule expressed
in Eq. 4 is followed.
(4)
where the local and global network parameters are
denoted as θ' and θ respectively. The rule has a log term to
indicate the direction of change in the parameters, and the
(Lossi^PG + C·Lossi+1^pred) term denotes the cumulative loss
predicted in a given episode that begins with state s.
The Mean Square Error (MSE) gradient is associated with
the predicted cumulative loss. Finally, the output is transformed
from Actioni^PG to Actioni by the
constraint satisfaction module and the same is given to
the resource management module.
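The asynchronous folding of local gradients into the global parameters can be pictured with the following plain-Java sketch. It is conceptual only: the gradient computation itself (backpropagation of Lossi^PG through the GRU network) is omitted, and the class and method names are ours rather than the framework's.

import java.util.Arrays;

// Global parameter vector shared by all edge-node schedulers.
class GlobalParameters {
    private final double[] theta;
    private final double learningRate;

    GlobalParameters(int size, double learningRate) {
        this.theta = new double[size];
        this.learningRate = learningRate;
    }

    // Called asynchronously by worker threads; synchronization keeps each update atomic.
    synchronized void applyAccumulatedGradient(double[] localGradient) {
        for (int j = 0; j < theta.length; j++) {
            theta[j] -= learningRate * localGradient[j];
        }
    }

    synchronized double[] snapshot() {
        return Arrays.copyOf(theta, theta.length);
    }
}

// An edge node that periodically pushes its accumulated local gradient to the global model.
class EdgeWorker implements Runnable {
    private final GlobalParameters global;
    private final double[] localGradient;

    EdgeWorker(GlobalParameters global, double[] localGradient) {
        this.global = global;
        this.localGradient = localGradient;
    }

    @Override
    public void run() {
        // In the framework this gradient would come from back-propagating Loss_i^PG locally.
        global.applyAccumulatedGradient(localGradient);
    }
}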
3.6. ALGORITHM DESIGN
We proposed an algorithm to realize the optimal
scheduling of given tasks in the edge-cloud ecosystem.
It is presented in Algorithm 1.
Algorithm: Reinforcement Learning based Dynamic
Scheduling (RLbDS)
Inputs:
Size of batch B
Maximum intervals for scheduling N
1. Begin
2. For each interval n in N
3. If n % B == 0 and n > 1 Then
4. Compute loss function
5. Lossi^PG = Lossi + Penaltyi
6. Use Lossi^PG in the network (Fig. 3) for backpropagation
7. End If
8. Statei ← PreProcess(Statei)
9. Feed Statei to the network (Fig. 3)
10. pMap ← Output of the DRL model (network as in Fig. 3)
11. (Actioni, Penaltyi+1) ← ConSatMod(pMap)
12. Resource monitoring module takes Actioni
13. DRL model takes Penaltyi+1
14. ResourceMonitoring(Actioni) migrates active tasks
15. Execute all tasks of interval n in the edge-cloud
16. End For
17. End
Algorithm 1. Reinforcement Learning based Dynamic Scheduling (RLbDS)
The algorithm takes the size of batch B and maxi-
mum intervals for scheduling N and performs optimal
scheduling of given tasks of every interval in edge-
cloud resources. The algorithm exploits the enhanced
RNN network (Fig. 3) to update the model from time to
time towards making DRL-based decisions for schedul-
ing. At each interval of scheduling, there is an iterative
process for taking care of pre-processing and feeding
the state to the DRL model. Based on the action sug-
gested by DRL, the constraint satisfaction module
specifies a penalty and, when there is an ideal scheduling
decision, it is notified to the resource monitoring module, which
schedules new tasks and also performs migration
of active tasks based on the decisions rendered.
3.7. LOSS FUNCTION COMPUTATION
In the proposed learning model, we want to optimize
each interval with minimal Lossi. The model
is also designed to adapt to the dynamically changing state
while mapping Statei to Actioni. Towards
this end, Lossi is a metric defined to update model
parameters. Besides, different metrics that result in
normalized values in [0, 1] are defined. Average energy
consumption is a metric defined because edge-cloud resources
have different sources of energy, as discussed
in [32]. The consumed energy by host h ∈ Hosts is multiplied
by a factor αh ∈ [0, 1] that is associated with the edge-cloud
deployment strategy. The normalized AEC is
computed as in Eq. 5.
(5)
where the power function of host h over time is denoted by
ph(t) and its maximum possible power is
denoted as ph^max.
Average response time is another metric defined to
be used for interval SIi. ART for all tasks is normalized by
maximum response time. ART is computed as in Eq. 6.
(6)
The average migration time metric is defined for a
given SIi. It reflects all tasks' average migration time in
the interval normalized by maximum migration time.
AMT is computed as in Eq. 7.
(7)
Cost (C) is yet another metric defined for SIi. It indi-
cates the total incurred cost in the interval and is com-
puted as in Eq. 8.
(8)
Average SLA violation is another metric for SIi. It reflects
SLA violation dynamics as expressed in Eq. 9.
(9)
To minimize the resultant value for all the aforementioned
metrics, as used in [16] and [33], the Lossi metric
is defined as expressed in Eq. 10.
(10)
such that α, β, γ, δ, ε ≥ 0 ∧ α + β + γ + δ + ε = 1.
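Because Eq. 10 is not reproduced here, the Java sketch below only illustrates the kind of weighted combination the text describes: the five normalized interval metrics scaled by the hyperparameters (α, β, γ, δ, ε), which must be non-negative and sum to 1. The method and variable names are ours.

// Illustrative weighted loss over the normalized interval metrics of Section 3.7.
class IntervalLoss {
    static double loss(double aec, double art, double amt, double cost, double slaViolation,
                       double alpha, double beta, double gamma, double delta, double epsilon) {
        double sum = alpha + beta + gamma + delta + epsilon;
        if (alpha < 0 || beta < 0 || gamma < 0 || delta < 0 || epsilon < 0
                || Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("weights must be non-negative and sum to 1");
        }
        // Each metric is assumed to be normalized to [0, 1] before being combined.
        return alpha * aec + beta * art + gamma * amt + delta * cost + epsilon * slaViolation;
    }

    public static void main(String[] args) {
        // Example: a response-time-sensitive deployment (beta = 1, all other weights 0).
        System.out.println(loss(0.4, 0.2, 0.1, 0.3, 0.05, 0.0, 1.0, 0.0, 0.0, 0.0)); // prints 0.2
    }
}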
Different users can have varied QoS needs, and the hyperparameters
(α, β, γ, δ, ε) need to be set with different
values. As discussed in [33], [34] and [35], it is important
to optimize energy consumption in cloud infrastructure.
Therefore, it is essential to optimize loss. Even when
other metrics are compromised, it is possible to optimize
loss. In such a case, the loss can have α = 1 while
the other metrics can have 0. As discussed in [36], traffic
management and healthcare monitoring are sensitive to
response time. In such cases, the loss can have β = 1 while
the other measures can have 0. In the same fashion, setting
hyperparameters is application-specific.
As specified in works such as [37] and [38], a penalty is
to be included in neural network models. With the penalty,
the model can update parameters towards minimizing Lossi
and ensure constraint satisfaction. Therefore, the loss function
for the neural network is defined as in Eq. 11.

Lossi^PG = Lossi + Penaltyi    (11)
4. RESULTS AND DISCUSSION
This section presents our simulation environment,
the dataset used and the results of experiments.
4.1. SIMULATION SETUP
We built a simulation application using Java lan-
guage. The IDE used for development is the IntelliJ IDEA
2022 version. CloudSim [39] and iFogSim [40] libraries
are used to have a simulation environment. Scheduling
intervals are considered equal to be compatible with
other existing works [4, 7, 41]. Cloudlets or tasks are
generated programmatically from the Bitbrain dataset
collected from [42].
The two simulation tools, iFogSim and CloudSim, are
extended with required classes to facilitate the
usage of cost, response time and power parameters as-
sociated with edge nodes. New modules are created to
incorporate simulation of IoT devices with mobility,
delayed task execution, variations in bandwidth and
communication with the deep learning model. Additional
classes are defined to have constraint satisfaction mod-
ules and also take care of input formats, output formats
and pre-processing. Based on the provision in CloudSim,
a loss function is implemented. The dataset collected
from [43] has traces of real workload run on Bitbrain
infrastructure. This dataset contains logs of workloads
of more than 1000 VMs associated with host machines.
The workload information contains time-stamp, RAM
usage, CPU usage, CPU cores requested, disk, network
and bandwidth details. This dataset is available at [43]
to reproduce our experiments. The dataset is divided
into 75% and 25% VM workloads for training and testing
respectively. Training of the deep learning model is done with
the former while the latter is used to test the network
and analyse results.
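A trivial sketch of the 75/25 split of VM traces into training and test sets is shown below; the shuffling, the fixed seed and the helper names are our assumptions, since the paper does not state how the split was drawn.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative 75/25 split of VM workload traces into training and test sets.
class TraceSplit {
    static List<List<String>> split(List<String> vmTraceFiles) {
        List<String> shuffled = new ArrayList<>(vmTraceFiles);
        Collections.shuffle(shuffled, new Random(7));             // fixed seed for reproducibility
        int cut = (int) Math.round(shuffled.size() * 0.75);       // 75% for training
        return List.of(shuffled.subList(0, cut),                  // training traces
                       shuffled.subList(cut, shuffled.size()));   // test traces
    }
}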
4.2. ANALYSIS OF RESULTS
We evaluated the performance of the proposed algo-
rithm named RLbDS by comparing it with state-of-the-art
methods such as Local Regression and Minimum Migra-
tion Time (LR-MMT) [41], Median Absolute Deviation and
Maximum Correlation Policy (MAD-MC) [41], DDQN [44]
and REINFORCE [9]. LR-MMT works for dynamic workloads
considering minimum migration time and local regres-
sion. It has heuristics to have task selection and overhead
detection. MAD-MC is also a dynamic scheduler which
is based on maximum correlation and median absolute
deviation heuristics. DDQN is a deep learning-based ap-
proach that exploits RL to schedule tasks. The DRL (REINFORCE)
method is also RL-based and relies on the policy gradient. The
results reveal the sensitivity of the proposed RLbDS to the
hyperparameters (α, β, γ, δ, ε) in terms of model
learning and their impact on different performance metrics.
Model training is given 10 days of simulation time, while
testing is carried out with 1 day of simulation time.
4.2.1. Impact of Hyperparameters on RLbDS
The performance of the proposed algorithm RLbDS is
analysed with the loss function associated with the
hyperparameters (α, β, γ, δ, ε). Experiments are made
with the value 1 set to each of the hyperparameters in
turn. The rationale behind this is that when the
value is set to 1, it could provide optimal performance.
Table 4. Performance of RLbDS with different hyperparameters

Model Parameters | Total Energy (Watts) | Average Response Time (ms) | Fraction of SLA Violations | Total Cost (USD) | Average Task Completion Time (s) | Number of Completed Tasks
α=1 | 1.37 | 8.5 | 0.17 | 6305.5 | 4.45 | 815
β=1 | 1.43 | 8.18 | 0.17 | 6306.5 | 4.3 | 830
γ=1 | 1.51 | 8.8 | 0.148 | 6307.5 | 3.65 | 845
δ=1 | 1.38 | 8.78 | 0.178 | 6304.5 | 4.15 | 810
ε=1 | 1.44 | 8.22 | 0.134 | 6307.8 | 3.75 | 850
As presented in Table 5, the performance of RLbDS is provided in terms of a number of performance metrics.
Table 5. Performance of RLbDS compared against existing algorithms

Models | Total Energy (Watts) | Average Response Time (ms) | Fraction of SLA Violations | Total Cost (USD) | Average Task Completion Time (s) | Number of Completed Tasks
LR-MMT | 0.959 | 8.58 | 0.06 | 6325 | 4.5 | 700
MAD-MC | 0.95 | 8.4 | 0.13 | 6325 | 4.3 | 800
DDQN | 0.85 | 8.8 | 0.07 | 6325 | 4 | 850
REINFORCE | 0.82 | 8.35 | 0.06 | 6300 | 3.8 | 850
RLbDS | 0.73 | 7.7 | 0.04 | 6000 | 3.3 | 1000
The loss function with different hyperparameters has its
influence on the performance of the RLbDS algorithm
as presented in Fig. 4. The network learning process
differs with changes in hyperparameters. Energy consumption
differed when the loss function used different
hyperparameters. With α=1 RLbDS consumed 1.37
watts, with β=1 it needed 1.43 watts, with γ=1 the algo-
rithm consumed 1.51 watts, with δ=1 it required 1.38
watts and with ε=1 RLbDS consumed 1.44 watts. The
least energy is consumed when α=1 (all energy consumption
values are given in units of 1×10^8 watts). The average
response time of the algorithm RLbDS is influenced
by each hyperparameter. With α=1 RLbDS required 8.5
milliseconds, with β=1 it needed 8.18 milliseconds,
with γ=1 the algorithm needed 8.8 milliseconds, with
δ=1 it required 8.78 milliseconds and with ε=1 RLbDS
required 8.22 milliseconds. The least response time is
recorded when β=1.
SLA violations are also studied with these hyperpa-
rameters. It is observed that they influence the fraction of
SLA violations. With α=1 the fraction of SLA violations
caused by RLbDS is 0.17, with β=1 it is also 0.17, with
γ=1 the algorithm shows 0.148, with δ=1 it is 0.178,
and with ε=1 RLbDS caused 0.134. The least fraction
of SLA violations is recorded when ε=1. The total cost is
also analysed in terms of USD (as per the pricing calcu-
lator of Microsoft Azure [45]).
It was observed earlier that hyperparameters have
an impact on energy consumption. Since energy consumption
contributes to the cost of execution in the cloud,
these parameters obviously have an impact on the cost
incurred. With α=1 the total cost exhibited by RLbDS is
6305.5, with β=1 it is 6306.5, with γ=1 the algorithm showed
6307.5, with δ=1 it is 6304.5, and with ε=1 RLbDS
incurred 6307.8. The least cost is recorded when δ=1.
Average task completion time is also analysed with
different hyperparameters. With α=1 the average task
completion time exhibited by RLbDS is 4.45 seconds,
with β=1 it is 4.3, with γ=1 the algorithm showed 3.65,
with δ=1 it is 4.15, and with ε=1 RLbDS showed 3.75. The
least average task completion time is recorded when
γ=1 (all average task completion values are given in
units of 1×10^6). The total number of tasks completed
with scheduling done by RLbDS is also influenced by
hyperparameters. With α=1 the number of completed
tasks achieved by RLbDS is 815, with β=1 it is 830, with γ=1 the
algorithm showed 845, with δ=1 it is 810, and with ε=1
RLbDS completed 850 tasks. The least
number of completed tasks is recorded when δ=1.
Fig. 4. Performance dynamics of the proposed RLbDS algorithm with different model parameters associated
with the loss function
4.2.2. Performance Comparison
with State of the Art
Our algorithm RLbDS is compared against several
existing algorithms as presented in Fig. 5. Total en-
ergy consumption values are provided in units of 1×10^8
watts. The LR-MMT algorithm consumed 0.959, MAD-MC
0.95, DDQN 0.85, REINFORCE 0.82 and the proposed
RLbDS consumed 0.73. The energy consumption of
RLbDS is found to be the least among the scheduling
algorithms. Average response time is another met-
ric used for comparison. LR-MMT algorithm exhibited
an average response time of 8.58 milliseconds, MAD-
MC 8.4, DDQN 8.8, REINFORCE 8.35 and the proposed
RLbDS required 7.7 milliseconds. The average response
time of RLbDS is found to be the least among the sched-
uling algorithms. SLA violations are another important
metric used for comparison. LR-MMT algorithm exhib-
ited a fraction of SLA violations as 0.06, MAD-MC 0.13,
DDQN 0.07, REINFORCE 0.06 and the proposed RLbDS
exhibited 0.04. The fraction of SLA violations of RLbDS
is found to be the least among the scheduling algorithms.
Total cost in terms of USD is another metric used
for comparison. This metric is inuenced by energy
consumption. LR-MMT algorithm needs 6325 USD,
MAD-MC 6325, DDQN 6325, REINFORCE 6300 and the
proposed RLbDS needed 6000 USD. The total cost of
RLbDS is found to be the least among the scheduling algorithms.
Concerning average task completion time, the LR-MMT
algorithm needs 4.5 seconds, MAD-MC 4.3, DDQN 4,
REINFORCE 3.8 and the proposed RLbDS requires 3.3
seconds. The average task completion time of RLbDS
is found to be the least among the scheduling algorithms
(average task completion time is given in units of 1×10^6
seconds). The number of completed tasks is an-
other observation made in our empirical study.
Fig. 5. Performance of proposed RLbDS algorithm compared with the state of the art
LR-MMT completed 700 tasks, MAD-MC 800, DDQN
850, REINFORCE 850 and the proposed RLbDS completed 1000
tasks. The number of tasks completed by RLbDS is
found to be the highest among the scheduling algorithms.
4.2.3. Performance with Number of
Recurrent Layers
Considering optimal values for the hyperparameters, scheduling
overhead and loss dynamics against the number of
recurrent layers are analysed. Overhead is computed as
the ratio between the time taken for scheduling and the
total duration of execution. The empirical study has revealed
that the number of recurrent layers in the proposed archi-
tecture (Fig. 3) influences the loss and overhead.
As presented in Table 6, the loss value and scheduling overhead
are observed against the number of recurrent layers. The loss
value and scheduling overhead are also analysed against the
number of recurrent layers as presented in Fig. 6.
Table 6. Performance against the number of recurrent layers

Number of recurrent layers | Loss value | Scheduling overhead (%)
0 | 3.69 | 0.009
1 | 3.4 | 0.010
2 | 2.9 | 0.010
3 | 2.6 | 0.010
4 | 2.5 | 0.019
5 | 2.4 | 0.029
Fig. 6. Performance analysis with the number of
recurrent layers
The number of recurrent layers influences the loss value.
The loss value decreases (performance increases) as the
number of layers is increased. However, the scheduling
overhead increases with the number of recurrent layers.
4.2.4. Scalability Analysis
The scalability of the proposed algorithm is analysed
in terms of speedup and efficiency. The analysis
is made against the number of hosts. As presented in
Table 7, the performance of the proposed algorithm in
terms of its scalability is provided.
Table 7. Scalability analysis

Number of hosts | Speed-up | Efficiency
1 | 1 | 1
5 | 5 | 0.8
10 | 9 | 0.785
15 | 13 | 0.775
20 | 17 | 0.765
25 | 19 | 0.725
30 | 21 | 0.7
35 | 23 | 0.650
40 | 25 | 0.630
45 | 26 | 0.570
50 | 27 | 0.525
Fig. 7. Scalability analysis in terms of speedup and efficiency
There is a trade-off observed between scalability and
efficiency as presented in Fig. 7. When the number
of hosts is increased, there is a gradual decrease in
efficiency while there is a gradual increase in speedup.
From the experimental results, it is observed that the
proposed RLbDS is found to be dynamic and can adapt
to runtime situations as it is a learning-based approach.
Its asynchronous approach helps it in faster conver-
gence. In the presence of dynamic workloads and device
characteristics, RLbDS adapts to changes with ease.
5. CONCLUSION AND FUTURE WORK
We proposed a learning-based framework known as
the Deep Reinforcement Learning Framework (DRLF).
This is designed in such a way that it exploits Deep
Reinforcement Learning (DRL) with underlying mecha-
nisms and enhanced deep network architecture based
on Recurrent Neural Network (RNN). We also proposed
an algorithm named Reinforcement Learning based Dynamic
Scheduling (RLbDS) which exploits different hyperparameters
and DRL-based decision-making for efficient
scheduling. Real-time traces of edge-cloud infrastructure
are used for the empirical study. We implemented our framework
by defining new classes for the CloudSim and iFogSim
simulation frameworks. We evaluated the performance of
the proposed algorithm RLbDS by comparing it
with state-of-the-art methods such as LR-MMT, MAD-MC,
DDQN and REINFORCE. The results reveal the sensitivity
of the proposed RLbDS to the hyperparameters (α, β, γ, δ, ε)
in terms of model learning and their impact on
different performance metrics. Our empirical study has re-
vealed that RLbDS outperforms many existing scheduling
methods. In future, we intend to improve our framework
for container scheduling and load balancing.
6. REFERENCES
[1] A. Beloglazov, R. Buyya, “Optimal online determin-
istic algorithms and adaptive heuristics for energy
and performance efficient dynamic consolidation
of virtual machines in cloud data centres”, Concur-
rency and Computation: Practice and Experience,
Vol. 24, No. 13, 2012, pp. 1397–1420.
[2] M. Cheng, J. Li, S. Nazarian, “DRL-cloud: Deep rein-
forcement learning-based resource provisioning
and task scheduling for cloud service providers”,
Proceedings of the 23rd Asia and South Pacific De-
sign Automation Conference, Jeju, Korea, 22-25
January 2018, pp. 129-134.
[3] X.-Q. Pham, E.-N. Huh, “Towards task scheduling in
a cloud-fog computing system”, Proceedings of the
18th Asia-Pacific Network Operations and Manage-
ment Symposium, Kanazawa, Japan, 5-7 October
2016, pp. 1–4.
[4] D.-M. Bui, Y. Yoon, E.-N. Huh, S. Jun, S. Lee, "Energy
efficiency for a cloud computing system based on
predictive optimization", Journal of Parallel and Dis-
tributed Computing, Vol. 102, 2017, pp. 103-114.
[5] L. Huang, S. Bi, Y. J. Zhang, “Deep reinforcement
learning for online computation offloading in wire-
less powered mobile-edge computing networks”,
IEEE Transactions on Mobile Computing, Vol. 19, No.
11, 2020, pp. 2581-2593.
[6] H. Mao, M. Alizadeh, I. Menache, S. Kandula, “Re-
source management with deep reinforcement
learning”, Proceedings of the 15th ACM Workshop
on Hot Topics in Networks, Atlanta, GA, USA, 9-10
November 2016, pp. 50-56.
[7] D. Basu, X. Wang, Y. Hong, H. Chen, S. Bressan, “Learn-
as-you-go with Megh: Efficient live migration of vir-
tual machines”, IEEE Transactions on Parallel and Dis-
tributed Systems, Vol. 30, No. 8, 2019, pp. 1786-1801.
[8] M. Xu, S. Alamro, T. Lan, S. Subramaniam, "Laser: A
deep learning approach for speculative execution
and replication of deadline-critical jobs in the cloud”,
Proceedings of the 26th International Conference on
Computer Communication and Networks, Vancou-
ver, BC, Canada, 31 July - 3 August 2017, pp. 1-8.
[9] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, S. U. Khan, P.
Li, “A double deep Q-learning model for energy-
efficient edge scheduling”, IEEE Transactions on Ser-
vices Computing, Vol. 12, No. 5, 2019, pp. 739-749.
[10] F. Li, B. Hu, “Deepjs: Job scheduling based on deep
reinforcement learning in the cloud data centre”,
Proceedings of the 4th International Conference on
Big Data and Computing, Guangzhou, China, 10-12
May 2019, pp. 48-53.
[11] G. Rjoub, J. Bentahar, O. A. Wahab, A. S. Bataineh,
“Deep and reinforcement learning for automated
task scheduling in large-scale cloud computing sys-
tems”, Concurrency and Computation: Practice and
Experience, Vol. 33, No. 23, 2020, pp.1-14.
[12] O. Skarlat, M. Nardelli, S. Schulte, M. Borkowski, P.
Leitner, “Optimized IoT service placement in the
fog”, Service Oriented Computing and Applications,
Vol. 11, No. 4, 2017, pp. 427-443.
[13] X.-Q. Pham, N. D. Man, N. D. T. Tri, N. Q. Thai, E.-N.
Huh, “A cost- and performance-effective approach
for task scheduling based on collaboration between
cloud and fog computing”, International Journal of
Distributed Sensor Networks, Vol. 13, No. 11, 2017,
pp. 1-16.
[14] A. Brogi, S. Forti, “QoS-aware deployment of IoT ap-
plications through the fog”, IEEE Internet of Things
Journal, Vol. 4, No. 5, 2017, pp. 1185-1192.
[15] T. Choudhari, M. Moh, T.-S. Moh, “Prioritized task
scheduling in fog computing”, Proceedings of the
ACMSE Conference, New York, NY, USA, March 2018,
pp. 22:1-22:8.
[16] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, P. Li, “Energy-
efficient scheduling for real-time systems based on
deep learning model”, IEEE Transactions on Sustain-
able Computing, Vol. 4, No. 1, 2017, pp. 132-141.
[17] Z. Xiong, Y. Zhang, D. Niyato, R. Deng, P. Wang, L.-C.
Wang, "Deep reinforcement learning for mobile 5g
and beyond Fundamentals, applications, and chal-
lenges”, IEEE Vehicular Technology Magazine, Vol.
14, No. 2, 2019, pp. 44-52.
[18] J. Almutairi, M. Aldossary, “A novel approach for IoT
tasks offloading in edge-cloud environments”, Journal
of Cloud Computing, Vol. 10, 2021, p. 28.
[19] S. Ding, L. Yang, J. Cao, W. Cai, M. Tan, Z. Wang, “Parti-
tioning Stateful Data Stream Applications in Dynamic
Edge Cloud Environments”, IEEE Transactions on Ser-
vices Computing, Vol. 15, No. 4, 2021, pp. 2368-2381.
[20] S. S. Murad, R. Badeel, N. S. A. Alsandi, Ra, “Opti-
mized Min-Min Task Scheduling Algorithm For Sci-
entific Workflows In A Cloud Environment”, Journal
of Theoretical and Applied Information Technology,
Vol. 100, No. 2, 2022, pp. 480-506.
[21] L. Bulej et al. “Managing latency in edge cloud envi-
ronment”, Journal of Systems and Software, Vol. 172,
2021, pp. 1-15.
[22] J. Almutairi, M. Aldossary, “Investigating and Model-
ling of Task Offloading Latency in Edge-Cloud Environment”,
Computers, Materials & Continua, Vol. 68,
No. 3, 2021, pp. 1-18.
[23] R. Zhang, W. Shi, “Research on Workflow Task Sched-
uling Strategy in Edge Computer Environment”, Jour-
nal of Physics: Conference Series, Vol. 1744, 2021, pp.
1-6.
[24] X. Zhao, G. Huang, L. Gao, M. Li, Q. Gao, “Low load
DIDS task scheduling based on Q-learning in an
edge computing environment”, Journal of Network
and Computer Applications, Vol. 188, 2021, pp. 1-12.
[25] Y. Zhang, B. Tang, J. Luo, J. Zhang, “Deadline-Aware
Dynamic Task Scheduling in Edge-Cloud Collabora-
tive Computing”, Electronics, Vol. 11, 2022, pp. 1-24.
[26] A. Lakhan et al. “Delay Optimal Schemes for Internet
of Things Applications in Heterogeneous Edge Cloud
Computing Networks”, Sensors, Vol. 22, pp. 1-30.
[27] M. Singh, S. Bhushan, “CS Optimized Task Sched-
uling for Cloud Data Management”, International
Journal of Engineering Trends and Technology, Vol.
70, No. 6, 2022, pp. 114-121.
[28] Y. Hao, M. Chen, H. Gharavi, Y. Zhang, K. Hwang,
“Deep Reinforcement Learning for Edge Service
Placement in Softwarized Industrial Cyber-Physical
System”, IEEE Transactions on Industrial Informatics,
Vol. 17, No. 8, 2021, pp. 5552-5561.
[29] S. Tuli et al. “Dynamic Scheduling for Stochastic
Edge-Cloud Computing Environments using A3C
learning and Residual Recurrent Neural Networks”,
IEEE Transactions on Mobile Computing, Vol. 21, No.
3, 2022, pp. 1-15.
[30] Q. Zhang, L. Gui, S. Zhu, X. Lang, “Task Offloading
and Resource Scheduling in Hybrid Edge-Cloud
Networks”, IEEE Access, Vol. 9, 2021, pp. 940-954.
[31] L. Roselli, C. Mariotti, P. Mezzanotte, F. Alimenti, G.
Orecchini, M. Virili, N. Carvalho, "Review of the pres-
ent technologies concurrently contributing to the
implementation of the Internet of things (IoT) para-
digm: RFID, green electronics, WPT and energy har-
vesting”, Proceedings of the Topical Conference on
Wireless Sensors and Sensor Networks, San Diego,
CA, USA, 25-28 January 2015, pp. 1-3.
[32] S. Tuli, N. Basumatary, S. S. Gill, M. Kahani, R. C. Arya,
G. S. Wander, R. Buyya, “Healthfog: An ensemble
deep learning based smart healthcare system for
automatic diagnosis of heart diseases in integrated
IoT and fog computing environments”, Future Gen-
eration Computer Systems, Vol. 104, 2020, pp. 187-
200.
[33] S. Sarkar, S. Misra, “Theoretical modelling of fog
computing: a green computing paradigm to sup-
port IoT applications”, IET Networks, Vol. 5, No. 2,
2016, pp. 23-29.
[34] Z. Abbas, W. Yoon, “A survey on energy conserving
mechanisms for the Internet of things: Wireless net-
working aspects”, Sensors, Vol. 15, No. 10, 2015, pp.
24818-24847.
[35] P. Kamalinejad, C. Mahapatra, Z. Sheng, S. Mirabbasi,
V. C. Leung, Y. L. Guan, "Wireless energy harvesting
for the Internet of things”, IEEE Communications
Magazine, Vol. 53, No. 6, 2015, pp. 102-108.
[36] A. M. Rahmani, T. N. Gia, B. Negash, A. Anzanpour,
I. Azimi, M. Jiang, P. Liljeberg, “Exploiting smart e-
Health gateways at the edge of healthcare Internet-
of-Things: A fog computing approach”, Future Gen-
eration Computer Systems, Vol. 78, 2018, pp. 641-
658.
[37] J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Con-
strained policy optimization”, Proceedings of the
34th International Conference on Machine Learning,
Sydney, NSW, Australia, 6-11 August 2017, pp. 22-31.
[38] R. Doshi, K.-W. Hung, L. Liang, K.-H. Chiu, “Deep
learning neural networks optimization using hard-
ware cost penalty”, Proceedings of the IEEE Interna-
tional Symposium on Circuits and Systems, Montre-
al, QC, Canada, 22-25 May 2016, pp. 1954-1957
[39] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. De
Rose, R. Buyya, “Cloudsim: a toolkit for modelling
and simulation of cloud computing environments
and evaluation of resource provisioning algorithms”,
Software: Practice and Experience, Vol. 41, No. 1,
2011, pp. 23-50.
[40] H. Gupta, A. Vahid Dastjerdi, S. K. Ghosh, R. Buyya,
“ifogsim: A toolkit for modelling and simulation of
resource management techniques in the internet
of things, edge and fog computing environments”,
Software: Practice and Experience, Vol. 47, No. 9,
2017, pp. 1275-1296.
[41] B. Magotra, “Adaptive Computational Solutions to
Energy Efficiency in Cloud Computing Environment
Using VM Consolidation”, Archives of Computational
Methods in Engineering, 2023, pp. 1790-1818.
[42] S. Shen, V. van Beek, A. Iosup, “Statistical charac-
terization of business-critical workloads hosted in
cloud datacenters”, Proceedings of the 15th IEEE/
ACM International Symposium on Cluster, Cloud
and Grid Computing, Shenzhen, China, 4-7 May
2015, pp. 465-474.
[43] Bitbrain Dataset, http://gwa.ewi.tudelft.nl/datasets/
gwa-t-12-bitbrains (accessed: 2024)
[44] Q. Zhang, M. Lin, L. T. Yang, Z. Chen, P. Li, “Energy
efficient scheduling for real-time systems based on the
deep q-learning model”, IEEE Transactions on Sus-
tainable Computing, Vol. 4, No. 1, 2017, pp. 132-141.
[45] Microsoft Azure Pricing Calculator, https://azure.
microsoft.com/en-au/pricing/calculator/ (accessed:
2024)