
Energy-Efficient Task Offloading Under E2E Latency Constraints

Mohsen Tajallifar, Sina Ebrahimi, Mohammad Reza Javan, Nader Mokari, and Luca Chiaraviglio

Abstract

In this paper, we propose a novel resource management scheme that jointly allocates the transmit power and computational resources in a centralized radio access network architecture. The network comprises a set of computing nodes to which the requested tasks of different users are offloaded. The optimization problem minimizes the energy consumption of task offloading while taking the end-to-end latency, i.e., the transmission, execution, and propagation latencies of each task, into account. We aim to allocate the transmit power and computational resources such that the maximum acceptable latency of each task is satisfied. Since the optimization problem is non-convex, we divide it into two sub-problems, one for transmit power allocation and another for task placement and computational resource allocation. Transmit power is allocated via the convex-concave procedure. In addition, a heuristic algorithm is proposed to jointly manage computational resources and task placement. We also propose a feasibility analysis that finds a feasible subset of tasks. Furthermore, a disjoint method that separately allocates the transmit power and the computational resources is proposed as the baseline of comparison. A lower bound on the optimal solution of the optimization problem is also derived based on an exhaustive search over task placement decisions and the Karush-Kuhn-Tucker conditions. Simulation results show that the joint method outperforms the disjoint method in terms of acceptance ratio. Simulations also show that the optimality gap of the joint method is less than 5%.

Index Terms

Mobile edge computing, task offloading, resource allocation, end-to-end latency, task placement.

I. INTRODUCTION

A. Background

In order to fulfill the requirements of 5G mobile networks, key enabling technologies such as network function virtualization (NFV) and multi-access/mobile edge computing (MEC) are

M. Tajallifar, S. Ebrahimi, and N. Mokari are with the Department of Electrical and Computer Engineering, Tarbiat Modares University, Tehran 14115-111, Iran (e-mail: nader.mokari@modares.ac.ir). M. Javan is with Shahrood University of Technology. L. Chiaraviglio is with the University of Rome Tor Vergata.

Fig. 1: A typical task offloading example and system model. (a) A typical task offloading example, showing the uplink/downlink between UE and base station ($T^{\text{tx}}$), the transport network ($T^{\text{prop}}$), and the execution server ($T^{\text{exe}}$). (b) System model, comprising UEs 1 to $K$, RRHs 1 to $U$ with fronthaul links to the BBU pool, and NFV-enabled nodes $n$ connected by links $(m, m')$.

introduced. With NFV, the network functions (NFs) that traditionally used dedicated hardware are implemented in applications running on top of commodity servers [1]. On the other hand, MEC aims to support low-latency mobile services by bringing remote servers closer to the mobile users [2], [3]. Moreover, MEC enables the offloading of the computational burden of users' tasks to reduce the impact of the limited battery power of user equipment (UE). Note that when executing servers are NFV-enabled, they are able to process various types of tasks. As a result, there is no restriction on offloading a task to a predetermined server.

A typical task offloading example is shown in Fig. 1(a). In task offloading, the non-processed data of a task is sent from the UE to an executing server, which shifts the computational burden of task execution onto that server. As Fig. 1(a) shows, the user transmits the non-processed data of the task over the wireless link to its serving base station, which results in the transmission latency $T^{\text{tx}}$. Then, the received data is transmitted to an executing server. Executing servers are placed at the base station and at distant nodes in the transport network. The data transmission through the transport network adds the propagation latency $T^{\text{prop}}$ to the offloading process. Finally, the received data is processed at the executing server with execution latency $T^{\text{exe}}$ and then is sent back to the user over the downlink. Therefore, the end-to-end (E2E) latency of task offloading is equal to the summation of $T^{\text{tx}}$, $T^{\text{prop}}$, and $T^{\text{exe}}$ in both uplink and downlink.
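As a toy numerical illustration of this decomposition (all values are hypothetical, not from the paper):

```python
# Minimal sketch of the E2E offloading latency model: the task meets its
# deadline only if transmit, propagation, and execution latencies sum below
# the maximum acceptable latency. All numbers below are hypothetical.

def e2e_latency(data_bits, rate_bps, path_delay_s, load_cycles, cpu_cycles_per_s):
    t_tx = data_bits / rate_bps             # uplink radio transmission latency
    t_prop = 2 * path_delay_s               # transport network, uplink + downlink
    t_exe = load_cycles / cpu_cycles_per_s  # execution at the server
    return t_tx + t_prop + t_exe

latency = e2e_latency(data_bits=1e6, rate_bps=20e6,
                      path_delay_s=2e-3, load_cycles=1e9,
                      cpu_cycles_per_s=50e9)
print(f"E2E latency: {latency * 1e3:.1f} ms")  # 0.05 + 0.004 + 0.02 s = 74.0 ms
```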

B. Related Works

We classify the related works on task ofﬂoading into four categories and discuss their appli-

cability in practical scenarios.

1) Task offloading to multiple executing servers: In task offloading, a UE decides whether to offload a task to a single executing server or to select an executing server out of multiple servers. Offloading a task to one server in a set of executing servers in a multi-tier heterogeneous network is considered in [4], [5]. Moreover, the authors in [6]-[8] propose to offload a user's task to one of the executing servers at base stations in a multi-cell network. Note that in the aforementioned works, the executing servers are located at the edge of the radio access network and the computational resources in the non-radio part of the network are not considered. In contrast, it is possible to offload a task to any server in the network in [9], i.e., servers in the radio access and non-radio parts of the network. However, radio resources are not allocated in [9]. Note that ignoring computational resources in the non-radio part of the network or ignoring radio resource allocation results in inefficient task offloading.

2) Task placement and computational resource allocation: Task offloading comprises two steps: i) task placement, which selects an executing server, and ii) computational resource allocation, which allocates the resources of the executing server to each task. In this context, various works only focus on task placement with given computational resources for each task [4]-[6], [9]-[12], while others include resource allocation as well [7], [13]-[21]. Note that the servers in the non-radio part of the network are not involved in these works. As a result, computationally intensive tasks with moderate sensitivity to latency may occupy the capacity of executing servers in the radio part of the network while high-capacity servers in the non-radio part of the network are underutilized.

3) Joint Radio and Computational Resource Allocation: Extensive research has been conducted on joint radio and computational resource allocation [7], [11], [13]-[27]. In these works, radio resources, including transmit power and/or bandwidth, as well as computational resources are allocated to each task. Energy-efficient resource allocation is performed in [11], [13], [17], [19], [21], [24], [26], [27], and a weighted combination of consumed energy and latency is optimized in [7], [14]-[16], [20], [22], [25]. Moreover, the impact of radio link quality without radio resource allocation is taken into account by [5], [6], [8], [10], [28], [29]. In these works, the latency of data transmission over radio links is taken into account, which impacts the optimal task placement. Note that although joint optimization of radio and computational resources increases the degrees of freedom in task offloading, the available computational resources in the radio access network are very limited, which limits the acceptance ratio of the network.

4) Feasibility Analysis: When task offloading is subject to a maximum acceptable latency, sufficient resources are required in various parts of the network. In case of insufficient resources, a feasibility analysis is needed to determine a feasible subset of the requested tasks. One approach to face infeasibility is making simplifying assumptions, e.g., assuming sufficient available resources for task offloading [9], [22] or offloading a task only when it is beneficial, i.e., when offloading results in less energy consumption or latency [10]. In practice, however, the resources are limited and tasks are subject to execution deadlines. As a result, a feasibility analysis is inevitable. The feasibility analysis is performed by introducing a binary optimization variable, which is one when the task is accepted and zero when the task is rejected [4], [7], [12], [13], [15], [19]-[21], [27]. Note that finding the optimal binary variables results in combinatorial optimization problems that are challenging and of high complexity.

C. Motivation

The performance of a task offloading method is mainly measured by its latency and energy consumption. In practice, the E2E latency comes from the radio links, the transport network links, and the execution at the servers, while the energy consumption is determined by the consumed transmit power and computational resources.

Optimizing the performance of task offloading necessitates a joint optimization of all available resources in the network. However, existing works optimize a subset of resources and focus only on one part of the whole network. Moreover, the impact of E2E latency is not considered in the literature. As a result, existing methods may not perform well in practice.

In this paper, we propose a task offloading method that optimizes the energy consumption in terms of transmit power and computational resources under E2E latency constraints. Throughout the paper, task offloading refers to the process of transmit power allocation over radio links, task placement, i.e., selecting an executing server and its path, and computational resource allocation. The proposed method jointly allocates the required transmit power to tasks, places each task on a proper NFV-enabled node, and allocates sufficient computational resources to the tasks. With this joint method, the high latency of radio links caused by weak radio channels is compensated by a proper task placement and computational resource allocation. Conversely, the high execution latency caused by limited computational resources is compensated by consuming more transmit power on the radio links. As a result, more tasks are served compared to a disjoint method wherein transmit power allocation is independent of task placement and computational resource allocation.

NFV enables a general-purpose server to execute various tasks without needing a specialized server for each task. Therefore, various tasks are dynamically offloaded to general-purpose executing servers in a network of NFV-enabled nodes instead of offloading each task to a respective specialized server. As a result, a task placement method is needed to determine an executing server and its route for each task. In contrast to conventional routing methods that choose a route to a predetermined server, our task placement method jointly determines an executing server, the associated route to the executing server, and the required computational resources at the executing server.

We assume a deadline for offloading each task, i.e., sending the task from the UE to the executing server and sending it back to the UE are performed under a maximum acceptable latency constraint. As a result, the sum of the latencies on the radio link, the transport network links, and the execution at the executing server must be less than the maximum acceptable latency. The feasibility of this E2E offloading method depends on the available resources and the location of the executing servers in the network. For example, when the available transmit power is low, the radio link latency is large, which may violate the E2E latency constraint. Likewise, when the available computational resources at the executing server are low, the execution latency is large, which may also violate the E2E latency constraint. Therefore, our task offloading method includes a feasibility analysis that finds a set of feasible tasks.

The infeasibility of task offloading depends on the values of the maximum acceptable latencies, i.e., lower values of maximum acceptable latencies result in a larger number of infeasible tasks and higher values result in a smaller number of infeasible tasks. Inspired by this fact and in contrast to the existing works, we add a non-negative variable to each maximum acceptable latency. The non-negative variables are zero for feasible tasks and positive for infeasible tasks. Therefore, the set of feasible tasks is obtained by solving an optimization problem that minimizes the sum of the non-negative variables, i.e., maximizes the number of feasible tasks.

Joint task offloading results in a non-convex problem due to the coupling of optimization variables. Moreover, task placement is performed by obtaining binary variables, which further complicates the optimization problem. To deal with the optimization problem, we decouple transmit power allocation from task placement and computational resource allocation. Transmit power allocation is performed via the well-known convex-concave procedure (CCP), and a heuristic algorithm is proposed for task placement and computational resource allocation. CCP and the heuristic algorithm are applied alternately until convergence. Note that both CCP and the heuristic algorithm preserve the monotonicity of convergence.

We also develop two baseline methods to evaluate the efficiency of our joint task offloading method. The first is a disjoint method in which transmit power allocation is performed independent of task placement. In doing so, the maximum acceptable E2E latency of each task is divided into a radio latency constraint and a non-radio latency constraint. We allocate transmit power under the radio latency constraint. Then, the task placement and computational resource allocation are performed under the non-radio latency constraint.

The second baseline method achieves a lower bound on the optimal solution of the joint task offloading optimization problem. The lower bound is obtained by relaxing some constraints in the optimization problem, which comes from leveraging practical assumptions such as the orthogonality of wireless channels in large-scale antenna array systems. The optimal solution is then found by an exhaustive search over all feasible task placement candidates, finding the optimal computational resource allocation for each placement candidate, and choosing the placement candidate that results in the lowest objective value.

D. Contributions

In this paper, we develop an energy-efficient task offloading method that offloads the computational burden of a task from a UE to one of the executing servers in a network of NFV-enabled nodes. In doing so, a task is offloaded by sending the non-processed data of the task from the UE to a remote radio head (RRH) over a radio link, sending the data from the RRH toward the executing server through a transport network, and sending the processed data back from the executing server to the UE. We assume that each task is offloaded under a respective deadline, i.e., the E2E latency of task offloading is less than the maximum acceptable latency of the task. The main contributions and achievements of this paper are as follows:

• We develop a joint task offloading method in a practical scenario, i.e., the proposed method allocates the transmit power, finds an executing server and the route to it, and allocates the computational resources in an energy-efficient manner. Moreover, the proposed method takes the E2E latency of task offloading into account. With the proposed method, the impact of weak radio links is compensated by placing the tasks on servers closer to the UEs and consuming more computational resources. Conversely, limited computational resources are compensated by allocating more transmit power, resulting in an efficient and adaptive task offloading method.

• We propose a novel method for task placement and computational resource allocation. While conventional routing methods find a route to a predetermined node, our proposed method jointly finds the executing server, its associated route, and the required computational resources in an energy-efficient manner.

• We find a lower bound on the objective function of the optimization problem in the feasibility analysis, i.e., an upper bound on the acceptance ratio of the proposed method. The lower bound is obtained by relaxing some of the constraints in the optimization problem, performing an exhaustive search over all feasible task placement candidates, and finding the optimal computational resource allocation by utilizing the Karush-Kuhn-Tucker conditions.

• Simulation results show that the proposed joint method outperforms its disjoint counterpart in terms of acceptance ratio. Moreover, the lower bound on the optimal solution is almost tight because the joint method attains the lower bound in practical scenarios. Specifically, the optimality gap of the joint method is less than 5%.

E. Organization

The rest of the paper is organized as follows. Section II introduces the system model. Section III describes the optimization problem formulation. In Section IV, we propose joint task offloading, while disjoint task offloading and a lower bound on optimal task offloading are proposed in Sections V and VI, respectively. Simulation results are presented in Section VII and the paper is concluded in Section VIII.

F. Notation

The notation used in this paper is given as follows. Vectors are denoted by bold lowercase symbols. The operators $\|\cdot\|$ and $|\cdot|$ are the vector norm and the absolute value of a scalar, respectively. $(\mathbf{a})^T$ is the transpose of $\mathbf{a}$ and $[a]^+ = \max(a, 0)$. $\mathcal{A}\setminus\{a\}$ discards the element $a$ from the set $\mathcal{A}$. Finally, $\mathbf{a}\sim\mathcal{CN}(\mathbf{0},\boldsymbol{\Sigma})$ is a complex Gaussian vector with zero mean and covariance matrix $\boldsymbol{\Sigma}$.

II. SYSTEM MODEL

The structure of the radio access network, channel model, and signaling scheme as well as

NFV-enabled network, computational resources, and capacity of network links are described in

this section.

A. Radio Access Network (RAN)

We consider a centralized RAN architecture with a baseband unit (BBU) pool, which serves a set of $U$ RRHs, each equipped with $M$ antennas. The set of all users is denoted by $\mathcal{K}$. Each user is equipped with a single antenna and the total number of users is $K=|\mathcal{K}|$. The considered model is shown in Fig. 1(b). It is assumed that each RRH is connected to the BBU pool through a fronthaul link.

We assume that each user requests a single task. Task $k$ is represented by a triplet $\langle L_k, D_k, T_k\rangle$, where $L_k$ is the load of task $k$ (i.e., the required CPU cycles), $D_k$ is the data size of task $k$ (in bits), and $T_k$ is the maximum acceptable latency of task $k$.

Each UE transmits the non-processed data of its task to its serving RRH through a wireless link. We assume that each UE is served by a single RRH. The set of users served by RRH $u$ is $\mathcal{K}_u=\{k\in\mathcal{K}\mid J^k_u=1\}$, where $J^k_u$ is an indicator which equals 1 if UE $k$ is connected to RRH $u$ (0 otherwise). In this paper, we assume that the UE-RRH assignment is given and fixed.

Focusing on the wireless link, we assume a narrow-band block fading channel model [21]. The channel vector between UE $k$ and RRH $u$ is denoted by $\mathbf{h}_{u,k}$, where $\mathbf{h}_{u,k}=\sqrt{Q_{u,k}}\,\tilde{\mathbf{h}}_{u,k}$, in which $Q_{u,k}$ represents the path loss between RRH $u$ and UE $k$ and the small-scale fading is modeled as $\tilde{\mathbf{h}}_{u,k}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_M)$. Similar to [16], [17], we assume that the channel state information (CSI) is constant over the offloading time. As we show through simulations, this assumption is non-restrictive in practical scenarios in sub-6 GHz bands. UE $k$ transmits a symbol $x_k\sim\mathcal{CN}(0,1)$ with transmit power $\rho_k$ toward its serving RRH. The transmit power of UE $k$ is constrained to a maximum value, i.e., $\rho_k\le P^{\max}_k,\ \forall k$. The received signal vector at RRH $u$ is:
$$\mathbf{y}_u=\sum_{k\in\mathcal{K}}\mathbf{h}_{u,k}\sqrt{\rho_k}\,x_k,\quad\forall u. \tag{1}$$

We assume maximum ratio combining (MRC) at the RRHs because of its simplicity. Nevertheless, MRC is asymptotically optimal in massive MIMO systems [30]. Therefore, the combined signal is:
$$\mathbf{z}_u=\mathbf{F}^H_u\mathbf{y}_u,\quad\forall u, \tag{2}$$
where $\mathbf{F}_u=[\mathbf{f}_k],\ \forall k\in\mathcal{K}_u$, and $\mathbf{f}_k=\frac{\mathbf{h}_{u,k}}{\|\mathbf{h}_{u,k}\|}$. The estimated signal of UE $k$ is:
$$z_k=\mathbf{f}^H_k\mathbf{h}_{u,k}\sqrt{\rho_k}\,x_k+\sum_{j\in\mathcal{K}\setminus\{k\}}\mathbf{f}^H_k\mathbf{h}_{u,j}\sqrt{\rho_j}\,x_j+\mathbf{f}^H_k\mathbf{n}_u,\quad\forall k\in\mathcal{K}_u,$$
where $\mathbf{n}_u\sim\mathcal{CN}(\mathbf{0},\sigma^2_n\mathbf{I}_M)$ is the received noise vector at RRH $u$. Thus, the signal-to-interference-plus-noise ratio (SINR) of UE $k$ is:
$$\mathrm{SINR}_k=\frac{\|\mathbf{h}_{u,k}\|^2\rho_k}{\sum_{j\in\mathcal{K}\setminus\{k\}}\frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{\|\mathbf{h}_{u,k}\|^2}\rho_j+\sigma^2_n},\quad\forall k\in\mathcal{K}_u. \tag{3}$$

Hence, the achievable data rate of UE $k$ is $R_k=W\log_2(1+\mathrm{SINR}_k)$ bits per second (bps)¹, where $W$ is the radio access network bandwidth. The radio transmission latency of task $k$ in the uplink is $T^{\text{tx}}_k=\frac{D_k}{R_k}$². The sum of the data rates of the UEs served by RRH $u$ must not exceed the capacity of its fronthaul link, i.e., $\sum_{k\in\mathcal{K}_u}R_k\le B_{f,u},\ \forall u$. In this paper, similar to [10], [29], and [11], we assume that the processed data size of task $k$ is small. Moreover, since the power budget of the RRHs is generally large, the radio transmission latency in the downlink is assumed negligible.
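The rate and uplink latency expressions above can be sketched as follows, with randomly drawn channels and illustrative parameter values (not from the paper's simulations):

```python
import numpy as np

# Sketch of the MRC SINR of Eq. (3), the achievable rate R_k = W log2(1+SINR),
# and the uplink transmission latency T_tx = D_k / R_k, for one RRH.
# M, K, W, sigma2, the channels, and the powers are all hypothetical values.
rng = np.random.default_rng(0)
M, K = 32, 4                   # antennas, users served by this RRH
W = 20e6                       # bandwidth in Hz
sigma2 = 1e-13                 # noise power
H = rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))  # columns h_{u,k}
rho = np.full(K, 0.1)          # transmit powers in W

def mrc_rate(H, rho, W, sigma2, k):
    hk = H[:, k]
    # interference after MRC combining: |h_k^H h_j|^2 / ||h_k||^2 * rho_j
    interf = sum(abs(hk.conj() @ H[:, j]) ** 2 / np.linalg.norm(hk) ** 2 * rho[j]
                 for j in range(H.shape[1]) if j != k)
    sinr = np.linalg.norm(hk) ** 2 * rho[k] / (interf + sigma2)
    return W * np.log2(1 + sinr)

D_k = 1e6                      # task data size in bits
R_k = mrc_rate(H, rho, W, sigma2, k=0)
print(f"rate = {R_k / 1e6:.1f} Mbps, T_tx = {D_k / R_k * 1e3:.2f} ms")
```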

B. NFV-enabled Network

The NFV-enabled network is modeled by a graph $G=(\mathcal{N},\mathcal{E})$, where $\mathcal{N}$ and $\mathcal{E}$ are the sets of nodes and edges (or links), respectively. A typical node in $\mathcal{N}$ is denoted by $n$, while the BBU pool is indicated by $\bar{n}$ (which is also a node in $\mathcal{N}$). The link between two nodes $m$ and $m'$ is denoted by $(m,m')$. Each NFV-enabled node is comprised of an executing server and a routing device. The processing capacity (i.e., the maximum CPU cycles per second that can be carried out) of the executing server in NFV-enabled node $n$ is indicated by $\Upsilon_n$. Moreover, the capacity of link $(m,m')$ is indicated by $B_{(m,m')}$ in bps.

In this paper, we assume the full offloading scheme, i.e., the task of each user is completely executed on an executing server in the NFV-enabled network. Therefore, each task needs to be placed on a proper executing server. A task placement decision consists of selecting an NFV-enabled node $n$ and an associated path from $\bar{n}$ to $n$. We denote the $b$th path between nodes $\bar{n}$ and $n$ as $p^b_n$, where $b\in\mathcal{B}_n=\{1,\cdots,B_n\}$ and $B_n$ is the total number of paths between nodes $\bar{n}$ and $n$. Note that a path between $\bar{n}$ and $n$ may comprise some intermediate nodes, which only forward the tasks' data via their routing devices and do not deliver the data to their executing servers. We define the decision variable $\xi^k_{p^b_n}$, which equals 1 when task $k$ is offloaded to node $n$ and sent over path $p^b_n$ (0 otherwise). Each task is offloaded to exactly one node and path when we have:
$$\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}\xi^k_{p^b_n}=1,\quad\forall k. \tag{4}$$

The indicator $I^{(m,m')}_{p^b_n}$ determines whether a link contributes to a path: it is equal to 1 when link $(m,m')$ contributes to path $p^b_n$ (0 otherwise). Moreover, the set of all links that contribute to path $p^b_n$ is $\mathcal{E}_{p^b_n}=\{(m,m')\in\mathcal{E}\mid I^{(m,m')}_{p^b_n}=1\}$. The amount of computational resources allocated to task $k$ is denoted by $\upsilon_k$ (in CPU cycles per second). Note that the execution of each task is performed at only one node. To ensure that the allocated computational resources do not violate the processing capacity of that node, we should have:
$$\sum_{k\in\mathcal{K}}\sum_{b\in\mathcal{B}_n}\upsilon_k\xi^k_{p^b_n}\le\Upsilon_n,\quad\forall n. \tag{5}$$
Since the data of task $k$ is sent over the network with rate $R_k$, the aggregated rate of all tasks that pass through a link should not exceed its capacity, which is guaranteed by the following constraint:
$$\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}I^{(m,m')}_{p^b_n}\xi^k_{p^b_n}R_k\le B_{(m,m')},\quad\forall(m,m')\in\mathcal{E}. \tag{6}$$

¹For the wide-band channel model, the data rate of UE $k$ is the sum rate over all sub-carriers allocated to UE $k$.

²No buffering is assumed in the transport network routing. Therefore, the transmission time of the tasks' data over the transport network links is not taken into account.

The execution latency of task $k$ is $T^{\text{exe}}_k=\frac{L_k}{\upsilon_k}$. The processed data of task $k$ is sent back toward the BBU pool (i.e., node $\bar{n}$). In this paper, we assume the uplink and downlink paths are the same. Therefore, the overall propagation latency of task $k$ over path $p^b_n$ is twice the propagation latency of path $p^b_n$. Thus, the propagation latency of task $k$ is $T^{\text{prop}}_k=2\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}\sum_{(m,m')\in\mathcal{E}_{p^b_n}}\xi^k_{p^b_n}\delta_{(m,m')}$, where $\delta_{(m,m')}$ is the propagation latency of link $(m,m')$.
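The path bookkeeping above can be made concrete with a small sketch (a toy topology with hypothetical node names, capacities, and delays; not from the paper):

```python
# Sketch: propagation latency of a placement and per-link capacity checks on a
# toy NFV-enabled network. Node names, capacities, and delays are hypothetical.
links = {                      # (m, m'): (capacity B in bps, delay delta in s)
    ("bbu", "n1"): (1e9, 1e-3),
    ("n1", "n2"): (5e8, 2e-3),
}
paths = {"n2": [("bbu", "n1"), ("n1", "n2")]}  # one path from the BBU pool to n2

def prop_latency(path):
    # uplink and downlink use the same path, so the one-way delay is doubled
    return 2 * sum(links[e][1] for e in path)

def path_feasible(path, rate):
    # every link on the path must have enough capacity for rate R_k
    return all(links[e][0] >= rate for e in path)

path = paths["n2"]
print(prop_latency(path))        # 2 * (1 + 2) ms = 0.006 s
print(path_feasible(path, 4e8))  # True: both links exceed 400 Mbps
print(path_feasible(path, 8e8))  # False: the (n1, n2) link is only 500 Mbps
```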

Table I summarizes the notation used in the paper.

TABLE I: Main Notation.

$U, M, K$: Number of RRHs, antennas, and users
$W$: Radio access network bandwidth
$\mathcal{K},\mathcal{N},\mathcal{E}$: Sets of all users, nodes, and links
$\mathcal{K}_u$: Set of users served by RRH $u$
$P^{\max}_k$: Power budget of UE $k$
$\mathbf{h}_{u,k}$: Channel vector between user $k$ and RRH $u$
$L_k, D_k, T_k$: Load, data size, and maximum acceptable latency of task $k$
$\xi^k_{p^b_n}$: Decision variable for assignment of node $n$ and its associated path $p^b_n$ to task $k$
$\Upsilon_n$: Processing capacity of node $n$
$B_{f,u}$: Capacity of the fronthaul link of RRH $u$
$B_{(m,m')}, \delta_{(m,m')}$: Capacity and propagation latency of link $(m,m')$
$\Lambda_n$: Computational energy efficiency coefficient of node $n$
$p^b_n$: $b$th path between nodes $\bar{n}$ and $n$
$\upsilon_k$: Computational resources allocated to task $k$
$\mathcal{B}_n$: Set of paths between nodes $\bar{n}$ and $n$
$\rho_k$: Transmit power allocated to UE $k$
$\mathcal{E}_{p^b_n}$: Set of all links that contribute to path $p^b_n$
$\alpha_k$: Non-negative variable of task $k$
$I^{(m,m')}_{p^b_n}$: Indicator determining whether link $(m,m')$ contributes to path $p^b_n$
$R_k$: Data rate of task $k$
$J^k_u$: Indicator determining whether UE $k$ is assigned to RRH $u$
$T^{\text{exe}}_k$: Execution latency of task $k$
$T^{\text{tx}}_k$: Radio transmission latency of task $k$
$T^{\text{prop}}_k$: Propagation latency of task $k$

III. PROBLEM FORMULATION

In this section, we formulate the optimization problem of joint task offloading. Each task is offloaded under its E2E latency constraint and in an energy-efficient manner. The objective function is $E(\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho})=\sum_{k\in\mathcal{K}}\rho_k+\eta\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}\sum_{b\in\mathcal{B}_n}\Lambda_n\xi^k_{p^b_n}\upsilon_k$³, where $\boldsymbol{\xi}=[\xi^1_{p^1_1},\cdots,\xi^K_{p^{B_N}_N}]^T$, $\boldsymbol{\upsilon}=[\upsilon_1,\cdots,\upsilon_K]^T$, and $\boldsymbol{\rho}=[\rho_1,\cdots,\rho_K]^T$ are the vectors of all $\xi^k_{p^b_n}$, $\upsilon_k$, and $\rho_k$, respectively; $\Lambda_n$ denotes the computational energy efficiency coefficient of node $n$ [18], and $\eta$ is a weight. Note that the first term in $E$ is the transmit power consumption and the second term is the power consumption of the executing servers. Therefore, the joint task offloading optimization problem is:

$$\begin{aligned}
\min_{\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho}}\ & E(\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho})\\
\text{s.t.}\ &\text{C1: } T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k\le T_k,\ \forall k,\\
&\text{C2: } \sum_{k\in\mathcal{K}}\sum_{b\in\mathcal{B}_n}\upsilon_k\xi^k_{p^b_n}\le\Upsilon_n,\ \forall n,\\
&\text{C3: } \sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}I^{(m,m')}_{p^b_n}\xi^k_{p^b_n}R_k\le B_{(m,m')},\ \forall(m,m')\in\mathcal{E},\\
&\text{C4: } \sum_{k\in\mathcal{K}_u}R_k\le B_{f,u},\ \forall u,\\
&\text{C5: } \rho_k\le P^{\max}_k,\ \forall k,\\
&\text{C6: } \sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}\xi^k_{p^b_n}=1,\ \forall k,
\end{aligned} \tag{7}$$
under the variables $\boldsymbol{\xi}\in\{0,1\}$, $\boldsymbol{\upsilon}\ge\mathbf{0}$, $\boldsymbol{\rho}\ge\mathbf{0}$. Constraint C1 guarantees that the maximum acceptable latency of task offloading is respected. Constraints C2 and C3 make sure that all tasks are offloaded without violating the processing capacity of the nodes and the capacity of the links, respectively. Constraint C4 ensures the capacity of the fronthaul links is respected. Constraint C5 guarantees the power budget of the UEs, while constraint C6 makes sure that each task is offloaded to exactly one node and path.
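For illustration, the following sketch evaluates the objective $E(\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho})$ and constraint C2 for a hand-picked candidate solution; all names and values are hypothetical:

```python
# Sketch: evaluate the objective E of problem (7) and check constraint C2 for
# a toy instance with 2 tasks and 2 nodes. All values are hypothetical.
eta = 1e-10                      # weight on computational power
Lambda = {"n1": 1.0, "n2": 0.8}  # energy-efficiency coefficient per node
Upsilon = {"n1": 5e9, "n2": 10e9}

# placement[k] = node chosen for task k (each task on exactly one node: C6)
placement = {0: "n1", 1: "n2"}
upsilon = {0: 2e9, 1: 4e9}       # CPU cycles/s allocated to each task
rho = {0: 0.1, 1: 0.2}           # transmit powers in W

def objective(placement, upsilon, rho):
    tx_power = sum(rho.values())
    cpu_power = sum(Lambda[placement[k]] * upsilon[k] for k in placement)
    return tx_power + eta * cpu_power

def c2_ok(placement, upsilon):
    # C2: the allocations on each node must respect its processing capacity
    used = {n: 0.0 for n in Upsilon}
    for k, n in placement.items():
        used[n] += upsilon[k]
    return all(used[n] <= Upsilon[n] for n in Upsilon)

print(objective(placement, upsilon, rho))  # 0.3 + 1e-10 * (2e9 + 0.8 * 4e9)
print(c2_ok(placement, upsilon))           # True
```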

IV. JOINT TASK OFFLOADING (JTO)

In this section, we solve optimization problem (7). This problem is non-convex due to the integer variable $\boldsymbol{\xi}$ and the coupling of variables in C1-C4. Therefore, we solve (7) by decoupling transmit power allocation from task placement and computational resource allocation. In doing so, the transmit power is allocated given the task placement and the allocated computational resources. Then, we perform task placement and computational resource allocation given the allocated transmit powers. The proposed approach needs a feasible initialization. However, constraint C1 is likely to make (7) infeasible. Thus, we need a feasibility analysis to find a feasible subset of tasks.

A. Feasibility Analysis

The feasible set of (7) is extended by adding a non-negative variable $\alpha_k$ to the maximum acceptable latency of task $k$. Thus, the feasibility problem is constructed by replacing the objective function of (7) with the sum of the non-negative variables, i.e., $\sum_{k=1}^{K}\alpha_k$ [31]. The constraints which cause infeasibility are found by solving the feasibility problem and determining the constraints with positive values of the non-negative variables. The feasibility problem is:
$$\begin{aligned}
\min_{\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho},\boldsymbol{\alpha}}\ &\sum_{k\in\mathcal{K}}\alpha_k\\
\text{s.t.}\ &\text{C1-a: } T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k\le T_k+\alpha_k,\ \forall k\in\mathcal{K},\\
&\text{C2-C6},
\end{aligned} \tag{8}$$

under the variables $\boldsymbol{\xi}\in\{0,1\}$, $\boldsymbol{\upsilon}\ge\mathbf{0}$, $\boldsymbol{\rho}\ge\mathbf{0}$, $\boldsymbol{\alpha}\ge\mathbf{0}$. Note that the non-negative variables are added only to C1 because when C1 is eliminated, optimization problem (7) is always feasible. Thus, we seek the tasks whose maximum acceptable latencies are violated and eliminate them one by one until a subset of feasible tasks remains. The solution to (8) not only provides the infeasible constraints but also determines the level of infeasibility, i.e., constraints with larger values of the non-negative variables need more resources to become feasible. Therefore, we first eliminate the tasks with larger values of the non-negative variables.
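The elimination rule can be illustrated with a toy sketch: with the other variables fixed, the optimal slack reduces to $\alpha_k=[T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k-T_k]^+$, and the tasks with the largest slacks are rejected first (the latencies below are hypothetical):

```python
# Sketch: slack variables alpha_k = max(0, T_e2e - T_k) flag infeasible tasks;
# tasks with the largest slack (deepest deadline violation) are dropped first.
tasks = {                     # k: (achieved E2E latency, deadline T_k), in s
    0: (0.040, 0.050),
    1: (0.120, 0.080),
    2: (0.090, 0.085),
}

alpha = {k: max(0.0, t_e2e - t_max) for k, (t_e2e, t_max) in tasks.items()}
infeasible = sorted((k for k in alpha if alpha[k] > 0),
                    key=lambda k: alpha[k], reverse=True)
print(alpha)       # approximately {0: 0.0, 1: 0.04, 2: 0.005}
print(infeasible)  # [1, 2]: task 1 has the largest slack, so it is rejected first
```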

Without loss of equivalence, we add the summation of the inequalities in C1-a as a new constraint C7. Therefore, optimization problem (8) is restated as:
$$\begin{aligned}
\min_{\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho},\boldsymbol{\alpha}}\ &\sum_{k\in\mathcal{K}}\alpha_k\\
\text{s.t.}\ &\text{C1-a: } T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k\le T_k+\alpha_k,\ \forall k,\\
&\text{C2-C6},\\
&\text{C7: } \sum_{k\in\mathcal{K}}\left(T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k-T_k\right)\le\sum_{k\in\mathcal{K}}\alpha_k.
\end{aligned} \tag{9}$$

This optimization problem is equivalent to:
$$\begin{aligned}
\min_{\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho},\boldsymbol{\alpha}}\ &\sum_{k\in\mathcal{K}}\left(T^{\text{exe}}_k+T^{\text{prop}}_k+T^{\text{tx}}_k\right)\\
\text{s.t.}\ &\text{C1-a, C2-C6},
\end{aligned} \tag{10}$$
in which the term $\sum_{k\in\mathcal{K}}T_k$ is removed from the objective because it is constant. We solve (10) by decoupling transmit power allocation from task placement and computational resource allocation. In other words, we solve (10) over the variables $\boldsymbol{\upsilon},\boldsymbol{\xi},\boldsymbol{\alpha}$ with $\boldsymbol{\rho}$ fixed, and vice versa. To perform task placement and computational resource allocation, we need an initial $\boldsymbol{\rho}=\boldsymbol{\rho}_0$ that satisfies C3 and C4; these are satisfied with small values of $R_k$, i.e., small values of $\rho_k$.

Next, we solve the following optimization problem:
$$\begin{aligned}
\min_{\boldsymbol{\alpha},\boldsymbol{\upsilon},\boldsymbol{\xi}}\ &\sum_{k\in\mathcal{K}}\left(T^{\text{exe}}_k+T^{\text{prop}}_k\right)\\
\text{s.t.}\ &\text{C1-a, C2, C3, C6},
\end{aligned} \tag{11}$$

by a heuristic method. As in Algorithm 1, we find the variables $\boldsymbol{\xi}$ and $\boldsymbol{\upsilon}$ that minimize the objective of (11). Then, we set the non-negative variables so that C1 is feasible. In doing so, for task $k$, we calculate the amount of unused computational resources at all nodes, formally expressed as $\tilde{\Upsilon}^k_n=\Upsilon_n-\sum_{j\in\mathcal{K}\setminus\{k\}}\sum_{b\in\mathcal{B}_n}\upsilon_j\xi^j_{p^b_n}$. Moreover, the available capacity of link $(m,m')$ is $\tilde{B}^k_{(m,m')}=B_{(m,m')}-\sum_{j\in\mathcal{K}\setminus\{k\}}\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n}I^{(m,m')}_{p^b_n}\xi^j_{p^b_n}R_j$. A task is placed on node $n$ only when there is a feasible path between $\bar{n}$ and $n$, i.e., a path with sufficient capacity on all of its links. The set of all such nodes is $\mathcal{N}_k$. For each $n\in\mathcal{N}_k$, we calculate $T^{\text{exe}}_k+T^{\text{prop}}_k$ when $\upsilon_k=\tilde{\Upsilon}^k_n$. Next, we find the node and feasible path that give the smallest $T^{\text{exe}}_k+T^{\text{prop}}_k$, denoted by $n^\star$ and $b^\star$, respectively. Note that from C1, the sufficient computational resource allocation for task $k$ is $\upsilon_{\text{temp}}=\frac{L_k}{T_k-T^{\text{tx}}_k-T^{\text{prop}}_k}$. When $\tilde{\Upsilon}^k_{n^\star}\ge\upsilon_{\text{temp}}$, C1 is satisfied by setting $\upsilon_k=\upsilon_{\text{temp}}$ and $\alpha_k=0$. Otherwise, we set $\upsilon_k=\tilde{\Upsilon}^k_{n^\star}$ and $\alpha_k=T^{\text{tx}}_k+T^{\text{exe}}_k+T^{\text{prop}}_k-T_k$. Next, the available computational resources of the nodes and the available capacity of the links are updated, and this process is repeated for all tasks. Note that we begin with the tasks that require fewer resources, i.e., tasks with lower values of $T_k$.

Algorithm 1: Heuristic Algorithm for Solving (11).

Input: $\boldsymbol{\rho}$
1: Sort the tasks such that $T_{[1]}\le T_{[2]}\le\cdots\le T_{[|\mathcal{K}|]}$
2: for $k=[1]:[|\mathcal{K}|]$ do
   % Find the feasible nodes according to the capacity of the paths terminating at each node
3:   $\mathcal{N}_k=\{n\in\mathcal{N}\mid\exists b: R_k\le\tilde{B}^k_{(m,m')}\ \forall(m,m')\in\mathcal{E}_{p^b_n}\}$
4:   $\tilde{\Upsilon}^k_n=\Upsilon_n-\sum_{j\in\mathcal{K}\setminus\{k\}}\sum_{b\in\mathcal{B}_n}\upsilon_j\xi^j_{p^b_n},\ \forall n$
   % Find the best node and its associated path
5:   $(n^\star,b^\star)=\arg\min_{n\in\mathcal{N}_k,\,b\in\mathcal{B}_n} T^{\text{exe}}(\tilde{\Upsilon}^k_n)+T^{\text{prop}}(p^b_n)$
6:   Set $\xi^k_{p^{b^\star}_{n^\star}}=1$ and $\xi^k_{p^b_n}=0,\ \forall(n,b)\ne(n^\star,b^\star)$
   % Update the computational resource allocation and the non-negative variables
7:   $\upsilon_{\text{temp}}=\frac{L_k}{T_k-T^{\text{tx}}_k-T^{\text{prop}}_k}$
8:   if $\tilde{\Upsilon}^k_{n^\star}\ge\upsilon_{\text{temp}}$ then
9:     Set $\upsilon^\star_k=\upsilon_{\text{temp}}$ and $\alpha^\star_k=0$
10:  else
11:    Set $\upsilon^\star_k=\tilde{\Upsilon}^k_{n^\star}$ and $\alpha^\star_k=T^{\text{tx}}_k+T^{\text{exe}}_k+T^{\text{prop}}_k(p^{b^\star}_{n^\star})-T_k$
Output: $\boldsymbol{\alpha}^\star,\boldsymbol{\xi}^\star,\boldsymbol{\upsilon}^\star$
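A compact Python sketch of Algorithm 1, under simplifying assumptions: each node is reached by a single one-link path, $T^{\text{tx}}_k$ is precomputed per task, and the topology and numbers are hypothetical:

```python
# Sketch of Algorithm 1: greedy task placement and CPU allocation. Each "path"
# here is a single link with a capacity and a one-way delay; all hypothetical.

def place_tasks(tasks, nodes):
    """tasks: list of dicts with L (cycles), R (bps), T (deadline s), T_tx (s).
    nodes: dict n -> {"cpu": residual cycles/s, "cap": link bps, "delay": s}."""
    tasks = sorted(tasks, key=lambda t: t["T"])       # line 1: ascending deadlines
    out = []
    for t in tasks:
        # line 3: nodes whose (single-link) path can carry rate R_k
        feas = [n for n, v in nodes.items() if v["cap"] >= t["R"] and v["cpu"] > 0]
        if not feas:
            out.append((t["id"], None, 0.0, float("inf")))
            continue
        # line 5: node minimizing T_exe (with all residual CPU) + T_prop
        n_star = min(feas, key=lambda n: t["L"] / nodes[n]["cpu"] + 2 * nodes[n]["delay"])
        t_prop = 2 * nodes[n_star]["delay"]
        budget = t["T"] - t["T_tx"] - t_prop          # time left for execution
        v_temp = t["L"] / budget if budget > 0 else float("inf")   # line 7
        if nodes[n_star]["cpu"] >= v_temp:            # lines 8-9: deadline met
            v, alpha = v_temp, 0.0
        else:                                         # lines 10-11: positive slack
            v = nodes[n_star]["cpu"]
            alpha = t["T_tx"] + t["L"] / v + t_prop - t["T"]
        nodes[n_star]["cpu"] -= v                     # update residual resources
        out.append((t["id"], n_star, v, alpha))
    return out

nodes = {"edge": {"cpu": 4e9, "cap": 1e9, "delay": 1e-3},
         "core": {"cpu": 40e9, "cap": 1e9, "delay": 5e-3}}
tasks = [{"id": 0, "L": 2e9, "R": 2e7, "T": 0.11, "T_tx": 0.01},
         {"id": 1, "L": 8e9, "R": 2e7, "T": 0.20, "T_tx": 0.01}]
for row in place_tasks(tasks, nodes):
    print(row)
```

In this toy run the first task fits its deadline on the core node ($\alpha=0$), while the second exhausts the residual CPU and receives a positive slack.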

After solving (11), we allocate the transmit power by solving:
$$\begin{aligned}
\min_{\boldsymbol{\rho}}\ &\sum_{k=1}^{K}T^{\text{tx}}_k\\
\text{s.t.}\ &\text{C1-a, C3-C5}.
\end{aligned} \tag{12}$$
Note that in the heuristic method, we have $T^{\text{tx}}_k+T^{\text{exe}}_k+T^{\text{prop}}_k=T_k+\alpha_k$. As a result, any feasible solution to (12) does not increase $T^{\text{tx}}_k$ because (12) is infeasible for larger values of $T^{\text{tx}}_k$. Hence, replacing (12) with its feasibility problem counterpart does not impact the decreasing monotonicity of the objective function in (10). The feasibility problem of (12) is:
$$\begin{aligned}
\text{find}\ &\boldsymbol{\rho}\\
\text{s.t.}\ &\text{C1-a, C3-C5}.
\end{aligned} \tag{13}$$

In solving (13), we note that the constraints C1-a, C3 and C4 are non-convex. Therefore, we

need to ﬁnd a convexiﬁed version of (13).We use CCP [32] to convexify (13). In doing so, we

reformulate C1-a as:

Rk≥Dk

Tk+αk−Tprop,i

k−Texe,i

k

.(14)

where Texe,i

kand Tprop,i

kare the execution latency and propagation latency of task kobtained

from the heuristic method in ith iteration, respectively. In order to convexify (14), we need a

concave approximation of Rkwith respect to ρ. The rate Rkis:

\[
R_k = W \log_2 \frac{\sum_{j\in\mathcal{K}} \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n}{\sum_{j\in\mathcal{K}\setminus\{k\}} \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n}, \quad k \in \mathcal{K}_u, \tag{15}
\]
which is equivalent to:
\[
R_k = \underbrace{W \log_2\!\Big(\sum_{u=1}^{U}\sum_{j\in\mathcal{K}_u} \tfrac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n\Big)}_{h_k(\boldsymbol{\rho})} - \underbrace{W \log_2\!\Big(\sum_{u=1}^{U}\sum_{j\in\mathcal{K}_u\setminus\{k\}} \tfrac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n\Big)}_{g_k(\boldsymbol{\rho})}. \tag{16}
\]
Both $h_k(\boldsymbol{\rho})$ and $g_k(\boldsymbol{\rho})$ are concave functions of $\boldsymbol{\rho}$. Thus, we need to find a linear approximation of $g_k(\boldsymbol{\rho})$, which is $\hat{g}_k(\boldsymbol{\rho}) = g_k(\boldsymbol{\rho}_0) + \nabla g_k(\boldsymbol{\rho}_0)^T(\boldsymbol{\rho}-\boldsymbol{\rho}_0)$, where:
\[
[\nabla g_k(\boldsymbol{\rho})]_i =
\begin{cases}
\dfrac{W \sum_{u=1}^{U} I^i_u \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,i}|^2}{|\mathbf{h}_{u,i}|^2}}{\ln(2)\Big(\sum_{u=1}^{U}\sum_{j\in\mathcal{K}_u\setminus\{k\}} \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n\Big)}, & i \in \mathcal{K}\setminus\{k\},\\[3mm]
0, & i = k.
\end{cases} \tag{17}
\]


Next, we focus on the convex approximation of C3 and C4. To this aim, we find a convex approximation of $R_k$, which is obtained by a linear approximation of $h_k(\boldsymbol{\rho})$. Thus, we have $\hat{h}_k(\boldsymbol{\rho}) = h_k(\boldsymbol{\rho}_0) + \nabla h_k(\boldsymbol{\rho}_0)^T(\boldsymbol{\rho}-\boldsymbol{\rho}_0)$, where:
\[
[\nabla h_k(\boldsymbol{\rho})]_i = \frac{W \sum_{u=1}^{U} I^i_u \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,i}|^2}{|\mathbf{h}_{u,i}|^2}}{\ln(2)\Big(\sum_{u=1}^{U}\sum_{j\in\mathcal{K}_u} \frac{|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}|^2}{|\mathbf{h}_{u,j}|^2}\rho_j + \sigma^2_n\Big)}, \quad i \in \mathcal{K}. \tag{18}
\]
Finally, the convexified version of (13) is:
\[
\begin{aligned}
\text{find}\ & \boldsymbol{\rho}\\
\text{s.t.}\ & \text{C1-b: } h_k(\boldsymbol{\rho}) - \hat{g}_k(\boldsymbol{\rho}) \ge \frac{D_k}{T_k + \alpha_k - T^{\mathrm{prop},i}_k - T^{\mathrm{exe},i}_k},\ \forall k\in\mathcal{K}\\
& \text{C3-a: } \sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\sum_{b\in\mathcal{B}_n} I^{(m,m')}_{p^b_n}\,\xi^k_{p^b_n}\big(\hat{h}_k(\boldsymbol{\rho}) - g_k(\boldsymbol{\rho})\big) \le B_{(m,m')},\ \forall (m,m')\in\mathcal{E}\\
& \text{C4-a: } \sum_{k\in\mathcal{K}_u}\big(\hat{h}_k(\boldsymbol{\rho}) - g_k(\boldsymbol{\rho})\big) \le B_{f,u},\ \forall u\\
& \text{C5: } \rho_k \le P^{\max}_k,\ \forall k,
\end{aligned} \tag{19}
\]

under variable $\boldsymbol{\rho}\ge 0$. Note that, based on CCP, any feasible solution of (19) is also feasible in (13) [32]. The feasibility problem (8) is solved by alternately solving (11) and (19). Then, we reject the task that makes (7) infeasible. According to Algorithm 2, we find the value of the maximum non-negative variable. If the value is positive, its associated task is rejected, the set of served tasks is updated, and (8) is solved for the updated set of tasks. This procedure continues until all non-negative variables are zero. The output of Algorithm 2 is the feasible subset of tasks $\mathcal{K}^\star$ as well as the solution of (8), i.e., the values of $\boldsymbol{\xi}_{\mathrm{ini}}$, $\boldsymbol{\rho}_{\mathrm{ini}}$, and $\boldsymbol{\upsilon}_{\mathrm{ini}}$, which are utilized as the initialization for solving (7).
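The majorization property that CCP relies on here, namely that the linearization $\hat{g}_k$ of the concave $g_k$ upper-bounds it everywhere, so that $h_k - \hat{g}_k$ is a conservative lower bound on $R_k$, can be checked numerically. The sketch below uses made-up scalar gains and a single-RRH simplification of (16)-(17); it illustrates the property rather than reproducing the paper's exact model.

```python
# Numerical check that the first-order expansion of a concave interference
# term majorizes it (the key fact behind the CCP surrogate above).
import numpy as np

W, sigma2 = 20e6, 1e-3
G = np.array([[1.0, 0.2], [0.3, 1.5]])   # G[k, j]: illustrative effective gains

def g(k, rho):
    # Interference-plus-noise term of user k (concave in rho: log of affine).
    mask = np.ones(len(rho), dtype=bool); mask[k] = False
    return W * np.log2(G[k, mask] @ rho[mask] + sigma2)

def g_hat(k, rho, rho0):
    # Linear (first-order) expansion of g around rho0, as in (17).
    mask = np.ones(len(rho0), dtype=bool); mask[k] = False
    denom = np.log(2) * (G[k, mask] @ rho0[mask] + sigma2)
    grad = np.zeros(len(rho0)); grad[mask] = W * G[k, mask] / denom
    return g(k, rho0) + grad @ (rho - rho0)

rho0 = np.array([0.1, 0.2])
for rho in (np.array([0.4, 0.05]), np.array([0.01, 0.5])):
    assert g_hat(0, rho, rho0) >= g(0, rho) - 1e-9   # tangent majorizes g
```

Because the tangent of a concave function lies above it, any $\boldsymbol{\rho}$ satisfying the surrogate constraint C1-b also satisfies the original C1-a, which is exactly why feasibility of (19) implies feasibility of (13).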

B. Optimization

Given the feasible solution $\boldsymbol{\xi}_{\mathrm{ini}}$, $\boldsymbol{\rho}_{\mathrm{ini}}$, $\boldsymbol{\upsilon}_{\mathrm{ini}}$, and the set of accepted tasks $\mathcal{K}^\star$, we seek the solution of (7). Similar to Algorithm 2, we decouple power allocation from task placement and computational resource allocation. The optimization problem of task placement and computational resource allocation is:
\[
\min_{\boldsymbol{\upsilon},\boldsymbol{\xi}}\ \sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}\sum_{b\in\mathcal{B}_n} \Lambda_n \xi^k_{p^b_n} \upsilon^3_k \quad \text{s.t. C1-C3, and C6,} \tag{20}
\]
which is non-convex. Note that the objective of (20) is an increasing function of $\upsilon_k$, and allocating fewer computational resources to task $k$ decreases the power consumption. However, allocating fewer computational resources increases the execution latency and may violate the E2E latency constraint. As


Algorithm 2: JTO Feasibility Analysis for Solving (8).
Initialize: $\mathcal{K}=\{1,\cdots,K\}$, $\boldsymbol{\xi}=\mathbf{0}$, $\boldsymbol{\rho}^0$: very small
1: repeat
2:   $i = 0$
3:   repeat
       % Allocate the transmit power and computational resources, and place the tasks
4:     Solve (11) via Algorithm 1 and return $\boldsymbol{\upsilon}^{i+1}$, $\boldsymbol{\xi}^{i+1}$, and $\boldsymbol{\alpha}^{i+1}$
5:     Solve (19) and return $\boldsymbol{\rho}^{i+1}$
6:     $i = i + 1$
7:   until $\sum_{k\in\mathcal{K}} \alpha^i_k - \sum_{k\in\mathcal{K}} \alpha^{i+1}_k \le \epsilon$ or $i \ge I_{\max}$
     % Discard the infeasible task
8:   $k^\star = \arg\max_{k\in\mathcal{K}} \alpha_k$
9:   if $\alpha_{k^\star} > 0$ then
10:    $\mathcal{K} = \mathcal{K}\setminus\{k^\star\}$
11: until $\sum_{k\in\mathcal{K}} \alpha_k = 0$
Output: $\boldsymbol{\xi}_{\mathrm{ini}} = \boldsymbol{\xi}^{i+1}$, $\boldsymbol{\rho}_{\mathrm{ini}} = \boldsymbol{\rho}^{i+1}$, $\boldsymbol{\upsilon}_{\mathrm{ini}} = \boldsymbol{\upsilon}^{i+1}$, and $\mathcal{K}^\star = \mathcal{K}$
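The outer rejection loop of Algorithm 2 can be sketched as follows; `solve_feasibility` is a hypothetical stand-in for the inner alternation between (11) and (19), returning the non-negative variables $\alpha_k$ for the current task set.

```python
# Illustrative sketch of the outer loop of Algorithm 2 (our own naming): keep
# rejecting the task with the largest latency violation until all slacks are 0.
def admit_tasks(tasks, solve_feasibility):
    served = set(tasks)
    while served:
        alphas = solve_feasibility(served)   # {task_id: alpha_k}
        k_star = max(served, key=lambda k: alphas[k])
        if alphas[k_star] <= 0:              # all E2E budgets met
            break
        served.remove(k_star)                # discard the worst violator
    return served
```

With a toy solver in which the slacks vanish as soon as the aggregate load fits the capacity, the loop removes exactly the tasks needed to restore feasibility.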

a result, we need to find nodes with smaller propagation latency to compensate for the increased execution latency. In doing so, we find a subset of nodes with smaller propagation latency than the current executing server and with sufficient capacity on the links terminating at those nodes. This set of nodes is $\mathcal{N}'_k = \{ n' \in \mathcal{N} \mid \exists b' : R_k \le \tilde{B}^k_{(m,m')}\ \forall (m,m') \in \mathcal{E}_{p^{b'}_{n'}} \text{ and } T^{\mathrm{prop}}_k(p^{b'}_{n'}) \le T^{\mathrm{prop}}_k(p^b_n) \}$, where we assume task $k$ was previously placed through path $p^b_n$. For each node in $\mathcal{N}'_k$, we calculate the minimum computational resources that satisfy the E2E latency constraint, i.e., $\upsilon_{\mathrm{temp}} = \frac{L_k}{T_k - T^{\mathrm{tx}}_k - T^{\mathrm{prop}}_k(p^{b'}_{n'})}$. When $\tilde{\Upsilon}^k_{n'} \ge \upsilon_{\mathrm{temp}}$ and $\Lambda_{n'}\upsilon^3_{\mathrm{temp}} \le \Lambda_n\upsilon^3_k$, we ensure that task placement through $p^{b'}_{n'}$ and computational resource allocation $\upsilon_{\mathrm{temp}}$ are feasible and result in lower power consumption. Therefore, we set $\upsilon_k = \upsilon_{\mathrm{temp}}$. Otherwise, we retain the previous $\upsilon_k$ for task $k$. Algorithm 3 begins with the tasks with larger power consumption, i.e., $\Lambda_{n_k}\upsilon^3_k$, where $n_k$ denotes the executing server of task $k$. This procedure is repeated for all accepted tasks.

The sub-problem of transmit power allocation, after convexification, is:
\[
\begin{aligned}
\min_{\boldsymbol{\rho}}\ & \sum_{k\in\mathcal{K}} \rho_k\\
\text{s.t.}\ & \text{C1-c: } h_k(\boldsymbol{\rho}) - \hat{g}_k(\boldsymbol{\rho}) \ge \frac{D_k}{T_k - T^{\mathrm{prop},i}_k - T^{\mathrm{exe},i}_k},\ \forall k\in\mathcal{K}\\
& \text{C3-a, C4-a, and C5.}
\end{aligned} \tag{21}
\]

Based on CCP in Algorithm 4 and starting from $\boldsymbol{\rho}^0 = \boldsymbol{\rho}_{\mathrm{ini}}$, an iterative solution of (21) provides a sub-optimal transmit power allocation. Finally, optimization problem (7) is solved


Algorithm 3: Heuristic Algorithm for Solving (20).
Input: $\boldsymbol{\xi}_{\mathrm{ini}}, \boldsymbol{\rho}_{\mathrm{ini}}, \boldsymbol{\upsilon}_{\mathrm{ini}}$
1: sort: $\Lambda_{[1]}\upsilon^3_{[1]} \le \Lambda_{[2]}\upsilon^3_{[2]} \le \cdots \le \Lambda_{[K]}\upsilon^3_{[K]}$
2: for $k = [1] : [|\mathcal{K}|]$ do
     % Find feasible nodes according to the capacity of the paths terminating at each node
3:   $\mathcal{N}'_k = \{ n'\in\mathcal{N} \mid \exists b' : R_k \le \tilde{B}^k_{(m,m')}\ \forall (m,m')\in\mathcal{E}_{p^{b'}_{n'}} \text{ and } T^{\mathrm{prop}}_k(p^{b'}_{n'}) \le T^{\mathrm{prop}}_k(p^b_n) \}$
4:   for $n' \in \mathcal{N}'_k$ do
5:     $\upsilon_{\mathrm{temp}} = L_k / (T_k - T^{\mathrm{tx}}_k - T^{\mathrm{prop}}_k(p^{b'}_{n'}))$
6:     $\tilde{\Upsilon}^k_{n'} = \Upsilon_{n'} - \sum_{j\in\mathcal{K}\setminus\{k\}}\sum_{b\in\mathcal{B}_{n'}} \upsilon_j \xi^j_{p^b_{n'}}$
7:     if $\tilde{\Upsilon}^k_{n'} \ge \upsilon_{\mathrm{temp}}$ and $\Lambda_{n'}\upsilon^3_{\mathrm{temp}} \le \Lambda_n\upsilon^3_k$ then
8:       set $\upsilon^\star_k = \upsilon_{\mathrm{temp}}$
9:       set $\xi^{k\star}_{p^{b'}_{n'}} = 1$ and $\xi^{k\star}_{p^b_n} = 0,\ \forall (n,b)\neq(n',b')$
10:      break
Output: $\boldsymbol{\xi}^\star, \boldsymbol{\upsilon}^\star$

Algorithm 4: Power Allocation in JTO.
Input: $\boldsymbol{\rho}^0 = \boldsymbol{\rho}_{\mathrm{ini}}$, $i=0$, $\epsilon = 10^{-3}$, $I^\rho_{\max} = 10^2$
1: repeat
     % Allocate power to users
2:   Solve (21) and return $\boldsymbol{\rho}^{i+1}$
3:   $i = i+1$
4: until $\sum_{k\in\mathcal{K}} \rho^i_k - \sum_{k\in\mathcal{K}} \rho^{i+1}_k \le \epsilon$ or $i \ge I^\rho_{\max}$
Output: $\boldsymbol{\rho}^\star = \boldsymbol{\rho}^{i+1}$

via Algorithm 5, which alternately solves optimization problem (20) via Algorithm 3 and optimization problem (21) via Algorithm 4.

From the implementation point of view, the BBU is responsible for gathering the required information, performing resource allocation, and sending the decisions to the associated entities. Specifically, in JTO, the BBU needs to acquire the CSI of the UEs and the available computational resources in the NFV-enabled nodes. The CSI of each UE is estimated at its serving RRH and is forwarded through the fronthaul links with negligible latency. In addition, each NFV-enabled node sends its available computational resources to the BBU through the transport network. After performing JTO, the BBU transmits the values of the allocated powers to the RRHs. Next, the BBU forwards the received data of the tasks as well as the obtained computational resources to the associated NFV-enabled nodes based on the task placement variables. In the downlink, the processed data of the tasks are sent to the BBU,


which in turn transmits the UEs' processed data to their serving RRHs.

Algorithm 5: JTO Optimization Algorithm for Solving (7).
Input: $\boldsymbol{\xi}^0=\boldsymbol{\xi}_{\mathrm{ini}}$, $\boldsymbol{\rho}^0=\boldsymbol{\rho}_{\mathrm{ini}}$, $\boldsymbol{\upsilon}^0=\boldsymbol{\upsilon}_{\mathrm{ini}}$, $\mathcal{K}^\star$, $i=0$
1: repeat
     % Place the tasks and allocate the computational resources
2:   Solve (20) via Algorithm 3 and return $\boldsymbol{\upsilon}^{i+1}$ and $\boldsymbol{\xi}^{i+1}$
     % Allocate the transmit power
3:   Solve (21) via CCP in Algorithm 4 and return $\boldsymbol{\rho}^{i+1}$
4:   $i=i+1$
5: until $\mathcal{E}(\boldsymbol{\xi}^i,\boldsymbol{\upsilon}^i,\boldsymbol{\rho}^i) - \mathcal{E}(\boldsymbol{\xi}^{i+1},\boldsymbol{\upsilon}^{i+1},\boldsymbol{\rho}^{i+1}) \le \epsilon$ or $i\ge I_{\max}$
Output: $\boldsymbol{\xi}^\star, \boldsymbol{\rho}^\star, \boldsymbol{\upsilon}^\star$

C. Convergence analysis

In this subsection, we prove the convergence of Algorithms 2 and 5.

Theorem 1. Algorithm 2 is convergent.

Proof. We show that the objective value of (8), i.e., $\sum_{k\in\mathcal{K}}\alpha_k$, is non-increasing in each step of Algorithm 2; since the objective value is lower bounded by zero, Algorithm 2 is convergent. In the $i$th iteration of Algorithm 2, Algorithm 1 sets $\alpha^{i+1}_k$ either equal to $0$, when the E2E latency of task $k$ is guaranteed, or equal to $T^{\mathrm{tx}}_k + T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k - T_k$, when the E2E latency is larger than its maximum acceptable value. Therefore, we have $\alpha^{i+1}_k = [T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k + T^{\mathrm{tx}}_k - T_k]^+$. Hence, we need to show that $\sum_{k\in\mathcal{K}}(T^{\mathrm{tx}}_k + T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k)$ does not increase after the $i$th iteration. Algorithm 1 offloads task $k$ so that $T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k$ in the objective of (11) is minimized (line 5 in Algorithm 1). As a result, Algorithm 1 does not increase the objective value of (11), i.e., $\sum_{k\in\mathcal{K}}\big(T^{\mathrm{prop}}_k(\boldsymbol{\xi}^{i+1}) + T^{\mathrm{exe}}_k(\boldsymbol{\upsilon}^{i+1})\big) \le \sum_{k\in\mathcal{K}}\big(T^{\mathrm{prop}}_k(\boldsymbol{\xi}^{i}) + T^{\mathrm{exe}}_k(\boldsymbol{\upsilon}^{i})\big)$. Moreover, as discussed in subsection IV-A, Algorithm 1 makes C1-a active, i.e., $T^{\mathrm{tx}}_k(\boldsymbol{\rho}^i) = T_k + \alpha^{i+1}_k - T^{\mathrm{exe}}_k(\boldsymbol{\upsilon}^{i+1}) - T^{\mathrm{prop}}_k(\boldsymbol{\xi}^{i+1})$, and therefore, any feasible solution to (13) does not increase the objective value of (12), i.e., $\sum_{k\in\mathcal{K}} T^{\mathrm{tx}}_k(\boldsymbol{\rho}^{i+1}) \le \sum_{k\in\mathcal{K}} T^{\mathrm{tx}}_k(\boldsymbol{\rho}^{i})$, which gives $\sum_{k\in\mathcal{K}}\big(T^{\mathrm{exe}}_k(\boldsymbol{\upsilon}^{i+1}) + T^{\mathrm{prop}}_k(\boldsymbol{\xi}^{i+1}) + T^{\mathrm{tx}}_k(\boldsymbol{\rho}^{i+1})\big) \le \sum_{k\in\mathcal{K}}\big(T^{\mathrm{exe}}_k(\boldsymbol{\upsilon}^{i}) + T^{\mathrm{prop}}_k(\boldsymbol{\xi}^{i}) + T^{\mathrm{tx}}_k(\boldsymbol{\rho}^{i})\big)$. As a result, we have $\sum_{k\in\mathcal{K}}\alpha^{i+1}_k \le \sum_{k\in\mathcal{K}}\alpha^{i}_k$, that is, Algorithm 2 is convergent.

Note that Algorithm 2 may eliminate the task with the maximum non-negative variable. This elimination is equivalent to removing the constraints of (8) associated with the eliminated task. Note that eliminating a task increases the available capacity of the links in the transport network and the available computational resources in the NFV-enabled nodes. As a result, the search space of Algorithm 1 increases, which may result in lower propagation and execution latencies. Moreover, eliminating a task extends the feasible set of (13). Therefore, the data rate of users may increase, which in turn may decrease $\sum_{k\in\mathcal{K}} T^{\mathrm{tx}}_k$. As a result, eliminating the task with the maximum non-negative variable does not increase the objective of (8).

Theorem 2. Algorithm 5 is convergent.

Proof. Algorithm 5 solves (7) by alternating minimization of (20) and (21). Therefore, we need to show that Algorithm 3 (which solves (20)) and Algorithm 4 (which solves (21)) do not increase the objective value of (7). According to line 7 of Algorithm 3, computational resource allocation and task placement do not increase the objective value of (20). In addition, based on [32], the convergence of Algorithm 4 is guaranteed and CCP does not increase the objective of (21). As a result, the objective value of (7) is non-increasing in each iteration, and since $\Psi(\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho})$ is lower bounded by zero, Algorithm 5 is convergent.
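The monotonicity argument above is the standard guarantee of block-coordinate (alternating) minimization. The toy sketch below, with a made-up two-variable objective standing in for (7), checks the non-increase at every iteration, mirroring the structure of Theorem 2.

```python
# Generic alternating-minimization pattern: each exact block update can only
# decrease the objective, so the sequence is non-increasing and bounded below.
def alternate_minimize(x, y, eps=1e-9, max_iter=1000):
    f = lambda x, y: (x - y) ** 2 + 0.1 * (x - 3) ** 2   # toy objective
    prev = f(x, y)
    for _ in range(max_iter):
        x = (y + 0.3) / 1.1          # exact argmin over x with y fixed
        y = x                        # exact argmin over y with x fixed
        cur = f(x, y)
        assert cur <= prev + 1e-12   # monotone non-increase, as in Theorem 2
        if prev - cur <= eps:        # stopping rule as in Algorithm 5
            break
        prev = cur
    return x, y, cur
```

Starting from an arbitrary point, the iterates converge to the joint minimizer $(3, 3)$ of the toy objective, and the objective sequence never increases.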

D. Summary of JTO

Herein, we summarize JTO. We obtain a set of feasible tasks by solving (8). In doing so, we decouple the power allocation from task placement and computational resource allocation, which are performed by solving (13) and via Algorithm 1, respectively. Then, we solve (7) for the feasible tasks via Algorithm 5, which consists of the alternating minimization of (20) and (21) via Algorithm 3 and Algorithm 4, respectively. The computational complexity (CC) analysis of JTO is provided in [33] (not included here due to space limitations). Our analysis indicates that JTO is of polynomial complexity, the same complexity order as state-of-the-art task offloading schemes.

V. DISJOINT TASK OFFLOADING (DTO)

In DTO, transmit power allocation is independent of task placement and computational resource allocation. The transmit power is allocated under a radio latency constraint, i.e., $T^{\mathrm{tx}}_k \le T^{\mathrm{RAN}}_k$. Then, task placement and computational resource allocation are performed so that $T^{\mathrm{prop}}_k + T^{\mathrm{exe}}_k \le T_k - T^{\mathrm{RAN}}_k$. The convexified sub-problem of the transmit power allocation is:
\[
\begin{aligned}
\min_{\boldsymbol{\rho}}\ & \sum_{k\in\mathcal{K}} \rho_k\\
\text{s.t.}\ & \text{C1-d: } h_k(\boldsymbol{\rho}) - \hat{g}_k(\boldsymbol{\rho}) \ge \frac{D_k}{T^{\mathrm{RAN}}_k},\ \forall k\in\mathcal{K}\\
& \text{C4-a, and C5.}
\end{aligned} \tag{22}
\]


According to the discussion on (7), a feasibility analysis is needed for (22). Similar to JTO, the feasibility problem of (22) is:
\[
\begin{aligned}
\text{find}\ & \boldsymbol{\rho}\\
\text{s.t.}\ & \text{C1-e: } h_k(\boldsymbol{\rho}) - \hat{g}_k(\boldsymbol{\rho}) \ge \frac{D_k}{T^{\mathrm{RAN}}_k + \alpha_k},\ \forall k\in\mathcal{K}\\
& \text{C4-a, and C5,}
\end{aligned} \tag{23}
\]
which is solved via CVX. Next, the non-negative variables are updated as $\alpha_k = [T^{\mathrm{tx}}_k - T^{\mathrm{RAN}}_k]^+$, and the task with the maximum non-negative variable is eliminated. This procedure is repeated until a feasible subset of tasks for transmit power allocation is obtained. After this step, (22) is solved with the feasible subset of tasks. The transmit power allocation phase of DTO is provided in Algorithm 6.

Algorithm 6: DTO Transmit Power Allocation.
Input: $\mathcal{K}=\{1,\cdots,K\}$, $\boldsymbol{\alpha}^0$: very large, $\boldsymbol{\rho}^0$: very small, $T^{\mathrm{RAN}}_k \in (0, T_k)$
1: repeat
2:   $i = 0$
3:   repeat
       % Allocate the transmit power to users
4:     Solve (23) via CVX and set $\boldsymbol{\rho}^{i+1} = \boldsymbol{\rho}^\star$
       % Update the non-negative variables
5:     $\alpha^{i+1}_k = [T^{\mathrm{tx}}_k - T^{\mathrm{RAN}}_k]^+,\ \forall k\in\mathcal{K}$
6:     $i = i+1$
7:   until $\sum_{k\in\mathcal{K}}\alpha^i_k - \sum_{k\in\mathcal{K}}\alpha^{i+1}_k \le \epsilon$ or $i \ge I_{\max}$
8:   $k^\star = \arg\max_{k\in\mathcal{K}} \alpha_k$
     % Discard the infeasible task
9:   if $\alpha_{k^\star} > 0$ then
10:    $\mathcal{K} = \mathcal{K}\setminus\{k^\star\}$
11: until $\sum_{k\in\mathcal{K}}\alpha_k = 0$
     % Minimize the transmit power
12: Solve (22) via CCP in Algorithm 4 and return $\boldsymbol{\rho}^\star$
Output: $\boldsymbol{\rho}^\star$, $\mathcal{K}_{\mathrm{RAN}} = \mathcal{K}$

Having obtained the transmit power $\boldsymbol{\rho}$, task placement and computational resource allocation are performed, whose associated sub-problem is:
\[
\begin{aligned}
\min_{\boldsymbol{\xi},\boldsymbol{\upsilon}}\ & \sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}\sum_{b\in\mathcal{B}_n} \Lambda_n \xi^k_{p^b_n}\upsilon^3_k\\
\text{s.t.}\ & \text{C1-f: } T^{\mathrm{prop}}_k + T^{\mathrm{exe}}_k \le T_k - T^{\mathrm{RAN}}_k,\ \forall k\in\mathcal{K},\\
& \text{C2, C3, and C6.}
\end{aligned} \tag{24}
\]

A feasibility analysis is also needed for solving (24). Similar to the transmit power allocation, we introduce a set of non-negative variables. The resulting sub-problem is similar to (11) with C1-a replaced by C1-f, and it is solved by Algorithm 1. After obtaining a set of feasible tasks, (24) is solved via Algorithm 3. The feasibility analysis and optimization of DTO are provided in Algorithm 7. The CC of DTO is also analyzed in [33]. Our analysis shows that the CC of DTO is less than that of JTO; however, both are of the same complexity order.

Algorithm 7: DTO Computational Resource Allocation and Task Placement.
Input: $\mathcal{K}_{\mathrm{RAN}}$, $\boldsymbol{\xi}=\mathbf{0}$
1: repeat
2:   $i = 0$
3:   repeat
       % Allocate the computational resources and place the tasks
4:     Solve (24) via Algorithm 1 given $\boldsymbol{\upsilon}^i$, $\boldsymbol{\xi}^i$, and $\boldsymbol{\alpha}^i$, and return $\boldsymbol{\upsilon}^{i+1}$, $\boldsymbol{\xi}^{i+1}$, and $\boldsymbol{\alpha}^{i+1}$
5:     $i = i+1$
6:   until $\sum_{k\in\mathcal{K}}\alpha^i_k - \sum_{k\in\mathcal{K}}\alpha^{i+1}_k \le \epsilon$ or $i\ge I_{\max}$
     % Find the task with the maximum non-negative variable
7:   $k^\star = \arg\max_{k\in\mathcal{K}}\alpha_k$
8:   if $\alpha_{k^\star} > 0$ then
       % Discard the infeasible task
9:     $\mathcal{K} = \mathcal{K}\setminus\{k^\star\}$
10: until $\sum_{k\in\mathcal{K}}\alpha_k = 0$
11: $i = 0$
12: repeat
      % Allocate the computational resources and place the tasks
13:   Given $\boldsymbol{\upsilon}^i$ and $\boldsymbol{\xi}^i$, solve (24) via Algorithm 3 and return $\boldsymbol{\upsilon}^{i+1}$ and $\boldsymbol{\xi}^{i+1}$
14: until $\Psi(\boldsymbol{\xi}^i,\boldsymbol{\upsilon}^i,\boldsymbol{\rho}^\star) - \Psi(\boldsymbol{\xi}^{i+1},\boldsymbol{\upsilon}^{i+1},\boldsymbol{\rho}^\star) \le \epsilon$ or $i\ge I_{\max}$
Output: $\boldsymbol{\xi}^\star, \boldsymbol{\upsilon}^\star$

VI. LOWER BOUND ON OPTIMAL SOLUTION (LTO)

Since the optimization problem (8) is non-convex, without loss of optimality, we make some assumptions to resolve the non-convexity of (8). First, we note that it is very likely for the fiber-optic links to have sufficient capacity for carrying the traffic of UEs, which is the case for fronthaul links and any wired link in the transport network. As a result, we relax the constraints C3 and C4 of (8). Note that the relaxation of C3 and C4 extends the feasible set of (8), resulting in a lower bound on the optimal solution to (8). In addition, with a large number of antenna elements at the RRHs, the channel vectors between different RRHs and a specific user are approximately orthogonal, i.e., $|\mathbf{h}^H_{u,k}\mathbf{h}_{u,j}| \approx 0$ for all $j \neq k$ [30]. Therefore, the interference in the wireless channels is negligible and (15) becomes:
\[
R_k = W \log_2\!\Big(1 + \frac{|\mathbf{h}_{u,k}|^2}{\sigma^2_n}\rho_k\Big), \quad k\in\mathcal{K}_u. \tag{25}
\]

The elimination of the interference increases $R_k$ for the same amount of power allocated to each UE, which again results in a lower bound on the optimal solution to (8). Based on the fact that $\min_{\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\upsilon},\boldsymbol{\rho}} \sum_{k\in\mathcal{K}}\alpha_k = \min_{\boldsymbol{\alpha},\boldsymbol{\xi},\boldsymbol{\upsilon}} \min_{\boldsymbol{\rho}} \sum_{k\in\mathcal{K}}\alpha_k$, the optimal power allocation is the solution to:
\[
\begin{aligned}
\min_{\boldsymbol{\rho}}\ & \sum_{k\in\mathcal{K}} \alpha_k\\
\text{s.t.}\ & \text{C1: } T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k + T^{\mathrm{tx}}_k \le T_k + \alpha_k,\ \forall k,\\
& \text{C5: } \rho_k \le P^{\max}_k,\ \forall k.
\end{aligned} \tag{26}
\]

The data rate in (25) removes the cross-coupling impact of the power allocated to different users. Hence, without loss of optimality, (26) is solved for each $\rho_k$ independently. The associated power allocation problem is:
\[
\begin{aligned}
\min_{\rho_k}\ & T^{\mathrm{tx}}_k\\
\text{s.t.}\ & \text{C1: } T^{\mathrm{exe}}_k + T^{\mathrm{prop}}_k + T^{\mathrm{tx}}_k \le T_k + \alpha_k,\ \forall k,\\
& \text{C5: } \rho_k \le P^{\max}_k,\ \forall k,
\end{aligned} \tag{27}
\]
in which $\alpha_k$ in the objective is replaced with $T^{\mathrm{tx}}_k$. Note that minimizing $T^{\mathrm{tx}}_k$ is equivalent to maximizing $\frac{R_k}{D_k}$. Since $R_k$ in (25) is increasing in $\rho_k$, the optimal solution of (27) is $\rho^\star_k = P^{\max}_k$. Note that the feasibility of C1 is ensured by optimizing the other variables.

Next, we deal with the binary optimization variable $\boldsymbol{\xi}$. We propose an exhaustive search over all possible values of $\boldsymbol{\xi}$ to avoid any performance loss due to the non-convexity of (8) stemming from the binary $\boldsymbol{\xi}$. The number of all possible combinations of task placement decisions equals $|\mathcal{B}|^{|\mathcal{K}|}$, where $|\mathcal{B}| = \sum_n |\mathcal{B}_n|$. Thus, we solve (8) for $\boldsymbol{\alpha}$ and $\boldsymbol{\upsilon}$ for each task placement decision and select the decision that results in the lowest $\sum_k \alpha_k$ as the optimal decision. Note that the exhaustive search may impose an excessive computational complexity. However, LTO is developed as a baseline for performance evaluation and it is not supposed to work in real-time.

The optimization problem for solving $\boldsymbol{\alpha}$ and $\boldsymbol{\upsilon}$ is:
\[
\begin{aligned}
\min_{\boldsymbol{\upsilon},\boldsymbol{\alpha}}\ & \sum_{k\in\mathcal{K}} \alpha_k\\
\text{s.t.}\ & \text{C1-a: } \frac{L_k}{\upsilon_k} \le \tilde{T}_k + \alpha_k,\ \forall k\in\mathcal{K}\\
& \text{C2: } \sum_{k\in\mathcal{K}_n} \upsilon_k \le \Upsilon_n,\ \forall n,
\end{aligned} \tag{28}
\]
where $\tilde{T}_k = T_k - T^{\mathrm{prop}}_k - T^{\mathrm{tx}}_k$ and $\mathcal{K}_n$ is the set of tasks to be executed at executing server $n$.

Problem (28) is convex in both $\boldsymbol{\alpha}$ and $\boldsymbol{\upsilon}$. As a result, the KKT conditions determine the optimal solution. To derive the KKT conditions, we first write the Lagrangian function as follows:
\[
\mathcal{L} = \sum_{k\in\mathcal{K}} \Big[\alpha_k + \gamma_k\Big(\frac{L_k}{\upsilon_k} - \tilde{T}_k - \alpha_k\Big) - \eta_k\alpha_k - \mu_k\upsilon_k\Big] + \sum_{n\in\mathcal{N}} \lambda_n\Big(\sum_{k\in\mathcal{K}_n} \upsilon_k - \Upsilon_n\Big). \tag{29}
\]

By differentiating the Lagrangian function with respect to $\alpha_k$ and $\upsilon_k$, we have:
\[
\frac{\partial\mathcal{L}}{\partial\alpha_k} = 1 - \gamma_k - \eta_k = 0,\ \forall k\in\mathcal{K}, \tag{30}
\]
and
\[
\frac{\partial\mathcal{L}}{\partial\upsilon_k} = -\gamma_k\frac{L_k}{\upsilon^2_k} - \mu_k + \lambda_n = 0,\ \forall k\in\mathcal{K}_n. \tag{31}
\]
In addition, the complementary slackness conditions are:
\[
\gamma_k\Big(\frac{L_k}{\upsilon_k} - \tilde{T}_k - \alpha_k\Big) = 0,\ \forall k\in\mathcal{K}, \tag{32}
\]
\[
\lambda_n\Big(\sum_{k\in\mathcal{K}_n} \upsilon_k - \Upsilon_n\Big) = 0,\ \forall n\in\mathcal{N}, \tag{33}
\]
\[
\eta_k\alpha_k = 0,\ \forall k\in\mathcal{K}, \tag{34}
\]
\[
\mu_k\upsilon_k = 0,\ \forall k\in\mathcal{K}. \tag{35}
\]

Constraint C1-a implies $\upsilon_k > 0$. Hence, from (35) we have $\mu_k = 0$, and condition (31) results in:
\[
\upsilon_k = \sqrt{\frac{L_k}{\lambda_n}},\ \forall k\in\mathcal{K}_n, \tag{36}
\]
which implies $\lambda_n > 0$. Thus, (33) gives:
\[
\sum_{k\in\mathcal{K}_n} \upsilon_k = \Upsilon_n,\ \forall n\in\mathcal{N}. \tag{37}
\]
On the other hand, when (7) is infeasible, we get $\alpha_k > 0$. Thus, (34) leads to $\eta_k = 0$ and condition (30) results in $\gamma_k = 1$. As a result, from (32) we get:
\[
\alpha_k = \frac{L_k}{\upsilon_k} - \tilde{T}_k,\ \forall k\in\mathcal{K}. \tag{38}
\]
Having $\alpha_k \ge 0$ and (36), the optimal non-negative variable is:
\[
\alpha_k = \big[\sqrt{L_k\lambda_n} - \tilde{T}_k\big]^+,\ \forall k\in\mathcal{K}_n, \tag{39}
\]
wherein $\lambda_n$ is found such that:
\[
\sum_{k\in\mathcal{K}_n} \frac{L_k}{\tilde{T}_k + \alpha_k} = \Upsilon_n,\ \forall n\in\mathcal{N}. \tag{40}
\]
Then, the optimal values of $\alpha_k$ and $\upsilon_k$ are found as in (39) and (36), respectively. Having the optimal solution of (28) for all possible $\boldsymbol{\xi}$, the optimal solution of (8) is the solution with the lowest objective of (28).
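The per-node multiplier $\lambda_n$ in (39)-(40) can be recovered numerically, e.g., by bisection, since the left-hand side of (40) is non-increasing in $\lambda_n$. The sketch below is our own illustration, not the paper's code; it also covers the case where C2 is inactive, in which the bisection drives $\lambda_n$ toward zero and all $\alpha_k$ vanish.

```python
# Bisection on lambda_n for one node: alpha_k(lam) = [sqrt(L_k*lam) - Ttilde_k]^+
# and v_k = L_k / (Ttilde_k + alpha_k); the allocated sum of v_k decreases in
# lam, so the multiplier satisfying (40) is found by a log-scale bisection.
import math

def kkt_allocate(L, Ttilde, Upsilon, iters=200):
    def used(lam):
        a = [max(math.sqrt(Lk * lam) - Tt, 0.0) for Lk, Tt in zip(L, Ttilde)]
        return sum(Lk / (Tt + ak) for Lk, Tt, ak in zip(L, Ttilde, a)), a
    lo, hi = 1e-18, 1e18
    for _ in range(iters):
        lam = math.sqrt(lo * hi)        # bisect on a log scale
        total, _ = used(lam)
        if total > Upsilon:
            lo = lam                    # need a larger multiplier to shrink v
        else:
            hi = lam
    _, alpha = used(hi)                 # hi is always on the feasible side
    v = [Lk / (Tt + ak) for Lk, Tt, ak in zip(L, Ttilde, alpha)]
    return alpha, v
```

When the total demand $\sum_k L_k/\tilde{T}_k$ exceeds the node capacity $\Upsilon_n$, the returned allocation saturates the capacity and every $\alpha_k$ is positive; otherwise all slacks are zero, consistent with the complementary slackness conditions.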

VII. SIMULATION RESULTS

In this section, we evaluate the performance of JTO³. The setup of the simulation is presented in Table II. We assume that $U = 4$ RRHs are placed with an inter-site distance of 100 m and all users are served in an area of 100 m radius with a given user-RRH assignment. The nodes in the transport network are divided into three tiers based on their distance from the UEs: the local tier, the regional tier, and the national tier. Although the number of serving nodes is very large, there are some distant nodes in each tier that impose a large propagation latency. Hence, we only incorporate the nodes with reasonable propagation latency in the transport network [7]. The network graph $G$ consists of $N = 6$ nodes: $\bar{n}$ at the local tier with zero propagation latency, three nodes at the regional tier with relatively low propagation latency, and two distant nodes at the national tier. For simplicity of comparison, we assume that all nodes have the same computational capacity and all tasks are of the same size, load, and maximum acceptable latency, i.e., $D_k = D$, $L_k = L$, and $T_k = T$, $\forall k$. Moreover, we assume equal propagation latency and capacity for the network links. Note that the relatively low value of link capacity (0.4 Gbps) is the amount of capacity solely reserved for task offloading. Finally, the simulations are performed on a 3.30 GHz Core i5 CPU and 16 GB RAM.

Fig. 2 (a) reports the performance of the feasibility analysis in JTO, showing the acceptance ratio versus $T$. The acceptance ratio is defined as the ratio of the tasks accepted by the feasibility analysis over the total number of requested tasks. Note that the acceptance ratio increases with increasing $T$. This is due to the fact that tasks with higher $T$ need less transmit power and computational resources to be served. Moreover, for higher $T$, a larger number of nodes are

³The simulation files are available online at IEEE DataPort with DOI: 10.21227/w5tv-yz53.


TABLE II: Simulation Setup.

Parameter    | Value                           | Parameter    | Value
L_k          | 10^6 CPU cycles                 | δ_(m,m')     | 10 ms
M            | 32 antennas                     | Λ_n          | 10^{-28} [18]
D_k          | 0.1 Mbits                       | Path loss    | 128.1 + 37.6 log Q [21]
Υ_n          | 10^9 CPU cycles per second [6]  | U            | 4
P_k^max      | 0.5 Watt                        | ISD          | 100 m
B_(m,m')     | 0.4 Gbps                        | W            | 20 MHz [21]
B_{f,u}      | 0.6 Gbps                        | Noise power  | −150 dBm/Hz [21]

available for task offloading. In addition, we solve (8) by the alternate search method (ASM), in which (8) is alternately solved for each variable. Note that the sub-problem of $\boldsymbol{\upsilon}$ is solved by CVX and the sub-problem of $\boldsymbol{\xi}$ is solved by MOSEK (details are not provided due to space limitations). The effectiveness of JTO against ASM is also shown in Fig. 2 (a). Note that for latencies smaller than 75 ms, JTO outperforms ASM. Moreover, the performance of both methods is identical for low values of $T$. This is due to the fact that the set of accessible NFV-enabled nodes for low values of $T$ is restricted to $\bar{n}$, and therefore, JTO is not able to offload the tasks to more distant NFV-enabled nodes because their propagation latencies violate the E2E latency constraints.

The acceptance ratio of JTO for different numbers of tasks is shown in Fig. 2 (b). Since the amount of available resources is limited, the acceptance ratio decreases with the increase in the total number of tasks. Again, the superiority of JTO over ASM is observed.

Fig. 2: Acceptance ratio vs. T and K: (a) acceptance ratio vs. T for K = 30; (b) acceptance ratio vs. K for T = 40 ms. [Figure omitted: curves for JTO (Alg. 2) and ASM.]

The convergence of Algorithm 2 is shown in Fig. 3 (a). As expected, the sum of the non-negative variables decreases in each iteration. Furthermore, Algorithm 2 converges faster than ASM, which stems from the higher acceptance ratio of JTO.


Fig. 3: Convergence and acceptance ratio of the proposed methods: (a) convergence of the admission control algorithm for T = 20 ms and K = 30; (b) acceptance ratio of the joint vs. disjoint methods in terms of TRAN for T = 30 ms and K = 30 users. [Figure omitted: curves for JTO (Alg. 2), ASM, DTO (Alg. 6), and DTO (Alg. 7).]

The acceptance ratio of JTO is compared with that of DTO in Fig. 3 (b) for $T = 30$ ms. For DTO, we obtain the acceptance ratio for different values of $T^{\mathrm{RAN}} \in (0, T)$. Moreover, the acceptance ratio of the feasibility analysis in the transmit power allocation phase of DTO, i.e., Algorithm 6, is depicted. The acceptance ratio of DTO is increasing for small values of $T^{\mathrm{RAN}}$; that is, small values of $T^{\mathrm{RAN}}$ impose high rates on the users, which is impossible due to either insufficient bandwidth or limited fronthaul capacity. On the other hand, for larger values of $T^{\mathrm{RAN}}$, the acceptance ratio of Algorithm 6 is 1, but the task placement and computational resource allocation restrict the number of accepted tasks. Furthermore, JTO outperforms DTO for all values of $T^{\mathrm{RAN}}$.

Fig. 4 (a) shows the average radio transmission latency, i.e., $\frac{1}{K}\sum_{k\in\mathcal{K}} T^{\mathrm{tx}}_k$, and the average execution latency of the tasks, i.e., $\frac{1}{K}\sum_{k\in\mathcal{K}} T^{\mathrm{exe}}_k$, for different values of $D$ given $T = 20$ ms. The average radio transmission latency increases with increasing $D$, and the average execution latency subsequently decreases to maintain the maximum acceptable latency. Therefore, it is inferred that JTO efficiently manages the transmit power and the computational resources. Similarly, according to Fig. 4 (b), the average execution latency increases with increasing $L$, and this increase is subsequently compensated by a lower radio transmission latency.

In Fig. 5, we assume there are three classes of tasks (each including 10 tasks) with three different maximum acceptable latencies, i.e., $T^{(1)} = 10$ ms, $T^{(2)} = 50$ ms, and $T^{(3)} = 100$ ms. Classes (1), (2), and (3) are considered as the sets of tasks with low, medium, and high latency requirements, respectively. Moreover, we assume there are three nodes (shown by rectangles): a local node (i.e., $\bar{n}$) with zero propagation latency, a regional node with 20 ms propagation latency, and a national node with 40 ms propagation latency. The propagation latencies are the


Fig. 4: Average radio transmission and execution latencies in JTO: (a) vs. D for T = 20 ms and K = 30; (b) vs. L for T = 20 ms and K = 30. [Figure omitted: average Tx. delay and average Exe. delay curves.]

Fig. 5: Placement of the different classes of tasks at three different tiers of nodes for K = 30, for (a) C = 1, (b) C = 10, (c) C = 20, and (d) C = 50, where C is scaled by 10^9 cycles per second. [Figure omitted: per-tier task counts for low-latency (T(1) = 10 ms), medium-latency (T(2) = 50 ms), and high-latency (T(3) = 100 ms) tasks.]

summation of the uplink and downlink propagation latencies. Fig. 5 shows the task placement for different values of the processing capacity of the nodes, $C = \Upsilon_n, \forall n$. When $C = 1$, none of the nodes is able to serve the tasks in class (1) due to their high resource utilization. However, the tasks in class (2) are mainly served at the local node, and class (3) tasks are placed at the regional and national nodes. When $C = 10$, some of the tasks in class (1) are placed at the local node. Moreover, some tasks in classes (2) and (3) are served at the local node as well. Furthermore, the national node does not serve any task because JTO places the tasks at the nearest nodes in order to reduce the transmit power. When $C = 20$, more tasks in class (1) are served at the local node and the acceptance ratio reaches 1. Finally, when $C = 50$, almost all of the tasks are placed at the local node to reduce the transmit power consumption. Table III shows the acceptance ratio of each class for different values of $C$. Note that the acceptance ratio of all classes increases with increasing $C$. Moreover, the acceptance ratio of class (1) is lower than that of classes (2) and (3). The reason is twofold: one is the high resource utilization by the tasks of this class, and the other is the limited number of available nodes for tasks with low latency requirements (only node $\bar{n}$ in this example).

TABLE III: Acceptance ratio of JTO for different task classes vs. processing capacity of nodes.

Computational capacity (10^9 CPU cycles/sec) | T(1) = 10 ms | T(2) = 50 ms | T(3) = 100 ms
C = 1                                        | 0            | 0.5          | 0.9
C = 10                                       | 0.5          | 0.9          | 1
C = 20                                       | 1            | 1            | 1

Fig. 6 shows the acceptance ratio of LTO and JTO for different values of the maximum acceptable latency $T$. Due to the high computational complexity of the exhaustive search in LTO, we consider a simple network graph comprised of two nodes connected by a single link. Moreover, the total number of tasks $|\mathcal{K}|$ is 20. The acceptance ratio of both JTO and LTO is lower for larger computational loads. Meanwhile, the acceptance ratio of JTO is almost the same as that of LTO for different values of $T$ and $L$.

Fig. 6: Acceptance ratio of LTO and JTO vs. maximum acceptable latency. [Figure omitted: curves for LTO and JTO with L = 10 and L = 20.]

VIII. CONCLUSIONS AND FUTURE WORK

In this paper, we considered an energy-efficient task offloading problem under E2E latency constraints. We investigated the joint impact of radio transmission, the propagation of tasks through the transport network, and the execution of tasks on the experienced latency of the tasks. Due to the non-convexity of the optimization problem, we decoupled the transmit power allocation from task placement and computational resource allocation. The transmit power allocation was solved by adopting CCP to convexify the sub-problem. The task placement and computational resource allocation were solved via our proposed heuristic method, which minimizes the sum of the propagation and execution latencies. Furthermore, to ensure the feasibility of the optimization problem, we proposed a feasibility analysis that eliminates the tasks causing infeasibility. Simulation results showed the superiority of JTO over both DTO and ASM. The performance of DTO depended on the part of the latency required to be met in the radio access network, i.e., TRAN; however, JTO showed higher acceptance ratios for all values of TRAN. As future work, we plan to incorporate task scheduling into JTO. Moreover, the investigation of an innovative solution that divides the required computational load of each task among several nodes will be an interesting future research activity.

REFERENCES

[1] B. Yi, X. Wang, K. Li, S. K. Das, and M. Huang, "A comprehensive survey of network function virtualization," Computer Networks, vol. 133, pp. 212-262, 2018.
[2] P. Mach and Z. Becvar, "Mobile edge computing: A survey on architecture and computation offloading," IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1628-1656, 2017.
[3] ETSI, "Mobile Edge Computing (MEC); Framework and reference architecture," ETSI Group Specification MEC 003, 2016.
[4] H. Guo, J. Liu, and J. Zhang, "Computation offloading for multi-access mobile edge computing in ultra-dense networks," IEEE Communications Magazine, vol. 56, no. 8, pp. 14-19, 2018.
[5] W. Almughalles, R. Chai, J. Lin, and A. Zubair, "Task execution latency minimization-based joint computation offloading and cell selection for MEC-enabled HetNets," in 2019 28th Wireless and Optical Communications Conference (WOCC), pp. 1-5, IEEE, 2019.
[6] L. Yang, H. Zhang, M. Li, J. Guo, and H. Ji, "Mobile edge computing empowered energy efficient task offloading in 5G," IEEE Transactions on Vehicular Technology, vol. 67, no. 7, pp. 6398-6409, 2018.
[7] T. X. Tran and D. Pompili, "Joint task offloading and resource allocation for multi-server mobile-edge computing networks," IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 856-868, 2019.
[8] T. Q. Dinh, J. Tang, Q. D. La, and T. Q. Quek, "Offloading in mobile edge computing: Task allocation and computational frequency scaling," IEEE Transactions on Communications, vol. 65, no. 8, pp. 3571-3584, 2017.
[9] B. Yang, W. K. Chai, Z. Xu, K. V. Katsaros, and G. Pavlou, "Cost-efficient NFV-enabled mobile edge-cloud for low latency mobile applications," IEEE Transactions on Network and Service Management, vol. 15, no. 1, pp. 475-488, 2018.
[10] X. Chen, L. Jiao, W. Li, and X. Fu, "Efficient multi-user computation offloading for mobile-edge cloud computing," IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795-2808, 2015.
[11] K. Zhang, Y. Mao, S. Leng, Q. Zhao, L. Li, X. Peng, L. Pan, S. Maharjan, and Y. Zhang, "Energy-efficient offloading for mobile edge computing in 5G heterogeneous networks," IEEE Access, vol. 4, pp. 5896-5907, 2016.
[12] T. Li, C. S. Magurawalage, K. Wang, K. Xu, K. Yang, and H. Wang, "On efficient offloading control in cloud radio access network with mobile edge computing," in 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 2258-2263, IEEE, 2017.
[13] P. Zhao, H. Tian, C. Qin, and G. Nie, "Energy-saving offloading by jointly allocating radio and computational resources for mobile edge computing," IEEE Access, vol. 5, pp. 11255-11268, 2017.
[14] X. Zhang, Y. Mao, J. Zhang, and K. B. Letaief, "Multi-objective resource allocation for mobile edge computing systems," in 2017 IEEE 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), pp. 1-5, IEEE, 2017.
[15] J. Zhang, X. Hu, Z. Ning, E. C.-H. Ngai, L. Zhou, J. Wei, J. Cheng, and B. Hu, "Energy-latency tradeoff for energy-aware offloading in mobile edge computing networks," IEEE Internet of Things Journal, vol. 5, no. 4, pp. 2633-2645, 2017.
[16] C. Wang, F. R. Yu, C. Liang, Q. Chen, and L. Tang, "Joint computation offloading and interference management in wireless cellular networks with mobile edge computing," IEEE Transactions on Vehicular Technology, vol. 66, no. 8, pp. 7432-7445, 2017.
[17] C. You, K. Huang, H. Chae, and B.-H. Kim, "Energy-efficient resource allocation for mobile-edge computation offloading," IEEE Transactions on Wireless Communications, vol. 16, no. 3, pp. 1397-1411, 2016.
[18] F. Zhou, Y. Wu, R. Q. Hu, and Y. Qian, "Computation rate maximization in UAV-enabled wireless-powered mobile-edge computing systems," IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 1927-1941, 2018.
[19] A. Khalili, S. Zarandi, and M. Rasti, "Joint resource allocation and offloading decision in mobile edge computing," IEEE Communications Letters, vol. 23, no. 4, pp. 684-687, 2019.
[20] J. Zhang, W. Xia, F. Yan, and L. Shen, "Joint computation offloading and resource allocation optimization in heterogeneous networks with mobile edge computing," IEEE Access, vol. 6, pp. 19324-19337, 2018.
[21] W. Xia, J. Zhang, T. Q. Quek, S. Jin, and H. Zhu, "Power minimization-based joint task scheduling and resource allocation in downlink C-RAN," IEEE Transactions on Wireless Communications, vol. 17, no. 11, pp. 7268-7280, 2018.
[22] M.-H. Chen, M. Dong, and B. Liang, "Resource sharing of a computing access point for multi-user mobile cloud offloading with delay constraints," IEEE Transactions on Mobile Computing, vol. 17, no. 12, pp. 2868-2881, 2018.
[23] S. Li, N. Zhang, S. Lin, L. Kong, A. Katangur, M. K. Khan, M. Ni, and G. Zhu, "Joint admission control and resource allocation in edge computing for internet of things," IEEE Network, vol. 32, no. 1, pp. 72-79, 2018.
[24] J. Guo, Z. Song, Y. Cui, Z. Liu, and Y. Ji, "Energy-efficient resource allocation for multi-user mobile edge computing," in GLOBECOM 2017-2017 IEEE Global Communications Conference, pp. 1-7, IEEE, 2017.
[25] M.-H. Chen, B. Liang, and M. Dong, "Joint offloading decision and resource allocation for multi-user multi-task mobile

cloud,” in 2016 IEEE International Conference on Communications (ICC), pp. 1–6, IEEE, 2016.

[26] A. Al-Shuwaili and O. Simeone, “Energy-efﬁcient resource allocation for mobile edge computing-based augmented reality

applications,” IEEE Wireless Communications Letters, vol. 6, no. 3, pp. 398–401, 2017.

[27] Y. Yu, J. Zhang, and K. B. Letaief, “Joint subcarrier and cpu time allocation for mobile edge computing,” in 2016 IEEE

Global Communications Conference (GLOBECOM), pp. 1–6, IEEE, 2016.

[28] J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, “Delay-optimal computation task scheduling for mobile-edge computing

systems,” in 2016 IEEE International Symposium on Information Theory (ISIT), pp. 1451–1455, IEEE, 2016.

[29] J. Li, H. Gao, T. Lv, and Y. Lu, “Deep reinforcement learning based computation ofﬂoading and resource allocation for

MEC,” in 2018 IEEE Wireless Communications and Networking Conference (WCNC), pp. 1–6, IEEE, 2018.

[30] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An overview of massive MIMO: Beneﬁts and

challenges,” IEEE journal of selected topics in signal processing, vol. 8, no. 5, pp. 742–758, 2014.

[31] J. W. Chinneck, Feasibility and Infeasibility in Optimization. Algorithms and Computational Methods, Springer, 2008.

[32] T. Lipp and S. Boyd, “Variations and extension of the convex–concave procedure,” Optimization and Engineering, vol. 17,

no. 4, pp. 263–287, 2016.

[33] M. Tajallifar, S. Ebrahimi, M. R. Javan, N. Mokari, and L. Chiaraviglio, “Energy-efﬁcient task ofﬂoading under E2E

latency constraints.” arXiv preprint arXiv:1912.00187, June 2021.