Article

Online Learning for Orchestration of Inference in Multi-User End-Edge-Cloud Networks


Abstract

Deep-learning-based intelligent services have become prevalent in cyber-physical applications such as smart cities and healthcare. Deploying deep-learning-based intelligence near the end-user enhances privacy protection, responsiveness, and reliability. Resource-constrained end-devices must be carefully managed to meet the latency and energy requirements of computationally intensive deep learning services. Collaborative end-edge-cloud computing for deep learning provides a range of performance and efficiency options that can address application requirements through computation offloading. The decision to offload computation is a communication-computation co-optimization problem that varies with both system parameters (e.g., network conditions) and workload characteristics (e.g., inputs). Deep learning model optimization, in turn, offers a further trade-off between latency and model accuracy. An end-to-end decision-making solution that considers this computation-communication co-optimization is therefore required to synergistically find the optimal offloading policy and model for deep learning services. To this end, we propose a reinforcement-learning-based computation offloading solution that learns the optimal offloading policy, jointly considering deep learning model selection techniques, to minimize response time while providing sufficient accuracy. We demonstrate the effectiveness of our solution for edge devices in an end-edge-cloud system and evaluate it with a real-setup implementation using multiple AWS and ARM core configurations. Our solution provides a 35% speedup in average response time compared to the state-of-the-art with less than 0.9% accuracy reduction, demonstrating the promise of our online learning framework for orchestrating DL inference in end-edge-cloud systems.
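As a rough illustration of the kind of decision process the abstract describes, the sketch below implements a tabular Q-learning agent that picks an (execution target, model variant) pair per inference request, trading response time against an accuracy floor. The state space, reward weights, and simulated latency/accuracy numbers are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch: tabular Q-learning over (target, model) offloading actions.
import random
from collections import defaultdict

TARGETS = ["end", "edge", "cloud"]          # where to run inference
MODELS = ["small", "large"]                 # compressed vs. full model
ACTIONS = [(t, m) for t in TARGETS for m in MODELS]

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = defaultdict(float)                      # Q[(state, action)]

def reward(latency_s, accuracy, acc_floor=0.7, w_acc=2.0):
    # Negative latency, with a penalty if accuracy drops below the floor.
    return -latency_s - w_acc * max(0.0, acc_floor - accuracy)

def choose(state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, r, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])

def simulate(state, action):
    # Toy latency/accuracy model: network state "good"/"bad" scales transfer time.
    target, model = action
    compute = {"end": 0.30, "edge": 0.08, "cloud": 0.03}[target]
    compute *= 0.5 if model == "small" else 1.0
    transfer = 0.0 if target == "end" else (0.05 if state == "good" else 0.25)
    accuracy = 0.68 if model == "small" else 0.76
    return compute + transfer, accuracy, random.choice(["good", "bad"])

state = "good"
for _ in range(5000):
    a = choose(state)
    lat, acc, nxt = simulate(state, a)
    update(state, a, reward(lat, acc), nxt)
    state = nxt
```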


... Most current solutions are based on design-time optimization, without consideration of varying system dynamics at run-time [5, 11-13, 34, 41, 45, 51, 64, 79, 80]. A complex system that runs a variety of applications in uncertain environmental conditions requires dynamic control to offer high-performance or low-power guarantees [32,58,60,61,63]. Considering the run-time variation of system dynamics and making an optimal orchestration choice requires intelligent monitoring, analysis, and decision making. ...
... Orchestration Framework: Figure 9 shows an overview of a generic intelligent orchestration framework for multi-layered end-edge-cloud architectures [63]. This framework uses system-wide information for intelligent orchestration through virtual system layers that include application, platform, network, and hardware layers. ...
... The x-axis represents the number of active users. Each bar represents a different orchestration decision made by the corresponding orchestration strategy [61,63]. In the RL-optimized approach, the average response time remains constant while the number of users is fewer than three. ...
Full-text available
Preprint
Smart eHealth applications deliver personalized and preventive digital healthcare services to clients through remote sensing, continuous monitoring, and data analytics. Smart eHealth applications sense input data from multiple modalities, transmit the data to edge and/or cloud nodes, and process the data with compute-intensive machine learning (ML) algorithms. Run-time variations in the continuous stream of noisy input data, unreliable network connections, the computational requirements of ML algorithms, and the choice of compute placement among sensor-edge-cloud layers affect the efficiency of ML-driven eHealth applications. In this chapter, we present edge-centric techniques for optimized compute placement, exploration of accuracy-performance trade-offs, and cross-layered sense-compute co-optimization for ML-driven eHealth applications. We demonstrate the practical use cases of smart eHealth applications in everyday settings through a sensor-edge-cloud framework for an objective pain assessment case study.
... Having multiple and/or heterogeneous processing systems, which is typical for deep learning, adds significant complexity to the computing solutions [8]. Recently, using online learning for IIoT systems has gained attention [10]. ...
... As complete analysis is often unfeasible, methods based on instrumentation and learning offer a potential approach to implement self-stabilizing middleware. The research in [10] demonstrates how online learning can adopt a holistically optimal execution strategy for DL tasks. ...
Full-text available
Preprint
We study machine learning systems for real-time industrial quality control. In many factory systems, production processes must be continuously controlled to maintain product quality. Especially challenging are systems that must balance, in real time, stringent resource consumption constraints against the risk of a defective end-product. There is a need for automated quality control systems, as human control is tedious and error-prone. We see machine learning as a viable choice for developing automated quality control systems, but integrating such a system with existing factory automation remains a challenge. In this paper, we propose introducing a new fog computing layer to the standard hierarchy of automation control to meet the needs of machine-learning-driven quality control.
... • In many practical applications, complete non-causal knowledge or even statistical knowledge of the system dynamics might not be available [11,12]. This is especially the case with energy harvesting processes when they are non-stationary or from sources with unknown distributions. ...
Full-text available
Article
We investigate an energy-harvesting IoT device transmitting (delay/jitter)-sensitive data over a wireless fading channel. The sensory module on the device injects captured event packets into its transmission buffer and relies on the random supply of the energy harvested from the environment to transmit them. Given the limited harvested energy, our goal is to compute optimal transmission control policies that decide on how many packets of data should be transmitted from the buffer’s head-of-line at each discrete timeslot such that a long-run criterion involving the average delay/jitter is either minimized or never exceeds a pre-specified threshold. We realistically assume that no advance knowledge is available regarding the random processes underlying the variations in the channel, captured events, or harvested energy dynamics. Instead, we utilize a suite of Q-learning-based techniques (from the reinforcement learning theory) to optimize the transmission policy in a model-free fashion. In particular, we come up with three Q-learning algorithms: a constrained Markov decision process (CMDP)-based algorithm for optimizing energy consumption under a delay constraint, an MDP-based algorithm for minimizing the average delay under the limitations imposed by the energy harvesting process, and finally, a variance-penalized MDP-based algorithm to minimize a linearly combined cost function consisting of both delay and delay variation. Extensive numerical results are presented for performance evaluation.
Full-text available
Article
With the rapid development of Internet-of-Things (IoT) devices and mobile communication technologies, Multi-access Edge Computing (MEC) has emerged as a promising paradigm to extend cloud computing and storage capabilities to the edge of cellular networks, near to IoT devices. MEC enables IoT devices with limited battery capacity and computation/storage capabilities to execute their computation-intensive and latency-sensitive applications at the edge of the networks. However, to efficiently execute these applications in MEC systems, each task must be properly offloaded and scheduled onto the MEC servers. Additionally, the MEC servers may intelligently balance and share their computing resources to satisfy the application QoS and QoE. Therefore, effective resource allocation (RA) mechanisms in MEC are vital for ensuring its foreseen advantages. Recently, Machine Learning (ML) and Deep Learning (DL) have emerged as key methods for many challenging aspects of MEC. Particularly, ML and DL play a crucial role in addressing the challenges of RA in MEC. This paper presents a comprehensive survey of ML/DL-based RA mechanisms in MEC. We first present tutorials that demonstrate the advantages of applying ML and DL in MEC. Then, we present enabling technologies for quickly running ML/DL training and inference in MEC. Afterward, we provide an in-depth survey of recent works that used ML/DL methods for RA in MEC from three aspects: (1) ML/DL-based methods for task offloading; (2) ML/DL-based methods for task scheduling; and (3) ML/DL-based methods for joint resource allocation. Finally, we discuss key challenges and future research directions of applying ML/DL for resource allocation in MEC networks.
Full-text available
Article
In edge computing, edge devices can offload their overloaded computing tasks to an edge server, taking full advantage of the edge server's computing and storage capabilities to execute those tasks efficiently. However, if all devices offload their overloaded computing tasks to the same edge server, it can itself become overloaded, resulting in high processing delays for many computing tasks and unexpectedly high energy consumption. On the other hand, the resources of idle edge devices may be wasted and resource-rich cloud centers may be underutilized. Therefore, it is essential to explore a collaborative task-scheduling mechanism across an edge server, a cloud center, and edge devices according to task characteristics, optimization objectives, and system status, enabling efficient collaborative scheduling and precise execution of all computing tasks. This work analyzes and summarizes the edge computing scenarios in an edge computing paradigm. It then classifies the computing tasks in edge computing scenarios. Next, it formulates the optimization problem of computation offloading for an edge computing system. According to the problem formulation, collaborative scheduling methods for computing tasks are then reviewed. Finally, future research issues for advanced collaborative scheduling in the context of edge computing are indicated.
Full-text available
Article
Mobile edge computing (MEC) has recently emerged as a promising solution to relieve resource-limited mobile devices of computation-intensive tasks, enabling devices to offload workloads to nearby MEC servers and improve the quality of the computation experience. In this paper, an MEC-enabled multi-user multi-input multi-output (MIMO) system with stochastic wireless channels and task arrivals is considered. To minimize the long-term average computation cost in terms of power consumption and buffering delay at each user, a deep reinforcement learning (DRL)-based dynamic computation offloading strategy is investigated to build a scalable system with limited feedback. Specifically, a continuous-action-space DRL approach named deep deterministic policy gradient (DDPG) is adopted to learn decentralized computation offloading policies at all users, where local execution and task offloading powers are adaptively allocated according to each user's local observation. Numerical results demonstrate that the proposed DDPG-based strategy helps each user learn an efficient dynamic offloading policy, and also verify the superiority of its continuous power allocation capability, at reduced computation cost, over policies learned by conventional discrete-action-space reinforcement learning approaches such as deep Q-network (DQN) as well as some other greedy strategies. In addition, the power-delay tradeoff for computation offloading is analyzed for both the DDPG-based and DQN-based strategies.
Full-text available
Article
A large number of connected sensors and devices in the Internet of Things (IoT) can generate large amounts of computing data and cause massive energy consumption. Real-time state monitoring and data processing of IoT nodes are of great significance, but the processing power of IoT devices is limited. Using the emerging mobile edge computing (MEC), IoT devices can offload computing tasks to an MEC server associated with small or macro base stations. Moreover, the use of renewable energy harvesting capabilities in base stations or IoT nodes may reduce energy consumption. As wireless channel conditions and the arrival rates of renewable energy vary with time and computing tasks are stochastic, data offloading and renewable-energy awareness for IoT devices in a dynamic and unknown environment are major challenges. In this work, we design a data offloading and renewable-energy-aware model considering an MEC server performing multiple stochastic computing tasks over time-varying wireless channels. To jointly optimize data transmission delay, energy consumption, and bandwidth allocation, and to avoid the curse of dimensionality caused by the complexity of the action space, we propose a joint optimization method for data offloading, renewable-energy awareness, and bandwidth allocation for IoT devices based on deep reinforcement learning (JODRBRL), which can handle continuous action spaces. JODRBRL can minimize the total system cost (including data buffer delay cost, energy consumption cost, and bandwidth cost) and obtain an efficient solution by adaptively learning from the dynamic IoT environment. The numerical results demonstrate that JODRBRL can effectively learn the optimal policy, outperforming Dueling DQN, Double DQN (DDQN), and a greedy policy in stochastic environments.
Full-text available
Article
Multi-access Edge Computing (MEC) has emerged as a flexible and cost-effective paradigm, enabling resource-constrained mobile devices to offload, either partially or completely, computationally intensive tasks to a set of servers at the edge of the network. Given that the shared nature of the servers' resources introduces high computation and communication uncertainty, in this paper we consider users' risk-seeking or loss-aversion behavior in their final decisions regarding the portion of their computing tasks to be offloaded to each server in a multi-MEC-server environment, while executing the rest locally. This is achieved by capitalizing on the power and principles of Prospect Theory and the Tragedy of the Commons, treating each MEC server as a Common Pool of Resources available to all the users, while being rivalrous and subtractable, and thus potentially failing if over-exploited by the users. The goal of each user is to maximize its perceived satisfaction, as expressed through a properly formulated prospect-theoretic utility function, by offloading a portion of its computing tasks to the different MEC servers. To address this problem and derive the optimal allocation strategy, a non-cooperative game among the users is formulated and the corresponding Pure Nash Equilibrium (PNE), i.e., the optimal data offloading, is determined, while a distributed low-complexity algorithm that converges to the PNE is introduced. The performance and key principles of the proposed framework are demonstrated through modeling and simulation, while useful insights about the users' data offloading decisions under realistic conditions and behaviors are presented.
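For readers unfamiliar with the prospect-theoretic utilities this framework builds on, the snippet below sketches the classic Tversky-Kahneman value function, in which losses weigh more heavily than equal-sized gains. The parameter values and the latency interpretation of the outcome are illustrative assumptions, not the paper's exact utility function.

```python
# Hypothetical sketch of a prospect-theoretic value function (Tversky-Kahneman form).
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Perceived value of an outcome x relative to a reference point (x = 0)."""
    if x >= 0:
        return x ** alpha                  # diminishing sensitivity to gains
    return -lam * ((-x) ** beta)           # losses loom larger than gains

# Example: offloading saves 0.4 units of latency if the server is free, but costs
# 0.4 extra if it is congested; the perceived loss outweighs the equal-sized gain.
print(prospect_value(0.4), prospect_value(-0.4))
```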
Full-text available
Article
Fueled by the availability of more data and computing power, recent breakthroughs in cloud-based machine learning (ML) have transformed every aspect of our lives from face recognition and medical diagnosis to natural language processing. However, classical ML exerts severe demands in terms of energy, memory and computing resources, limiting their adoption for resource constrained edge devices. The new breed of intelligent devices and high-stake applications (drones, augmented/virtual reality, autonomous systems, etc.), requires a novel paradigm change calling for distributed, low-latency and reliable ML at the wireless network edge (referred to as edge ML). In edge ML, training data is unevenly distributed over a large number of edge nodes, which have access to a tiny fraction of the data. Moreover training and inference are carried out collectively over wireless links, where edge devices communicate and exchange their learned models (not their private data). In a first of its kind, this article explores key building blocks of edge ML, different neural network architectural splits and their inherent tradeoffs, as well as theoretical and technical enablers stemming from a wide range of mathematical disciplines. Finally, several case studies pertaining to various high-stake applications are presented demonstrating the effectiveness of edge ML in unlocking the full potential of 5G and beyond.
Full-text available
Conference Paper
In the era of fog computing, where one can decide to compute certain time-critical tasks at the edge of the network, designers often face the question of whether the sensor layer, the fog layer, or a combination of the two provides the optimal response time for a service. In this context, minimizing the total response time using computation migration is a communication-computation co-optimization problem, as the response time does not depend only on the computational capacity of each side. In this paper, we aim to investigate this question and address it in certain situations. We formulate it as a static or dynamic computation migration problem, depending on whether certain communication and computation characteristics of the underlying system are known at design time. We first propose a static approach to find the optimal computation migration strategy using models known at design time. We then make the more realistic assumption that several sources of variation can affect the system's response latency (e.g., changes in computation time, bandwidth, transmission channel reliability, etc.), and propose a dynamic computation migration approach which can adaptively identify the latency-optimal computation layer at runtime. We evaluate our solution using a case study of artificial-neural-network-based arrhythmia classification in a simulation environment as well as on a real test-bed.
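The static side of this decision can be illustrated with a back-of-the-envelope response-time comparison: offload only if upload plus remote compute plus download is expected to beat local execution. The linear model and all numbers below are assumptions for illustration; the paper's dynamic approach additionally adapts these estimates at run-time.

```python
# Hypothetical sketch of a static offload/local decision based on response time.
def local_time(cycles, f_local_hz):
    return cycles / f_local_hz

def offload_time(in_bytes, out_bytes, cycles, f_remote_hz, bw_up_bps, bw_down_bps):
    return (8 * in_bytes / bw_up_bps) + (cycles / f_remote_hz) + (8 * out_bytes / bw_down_bps)

def should_offload(in_bytes, out_bytes, cycles, f_local_hz, f_remote_hz, bw_up_bps, bw_down_bps):
    return offload_time(in_bytes, out_bytes, cycles, f_remote_hz,
                        bw_up_bps, bw_down_bps) < local_time(cycles, f_local_hz)

# Illustrative numbers: a ~4 kB ECG window, 50M cycles of NN inference,
# 100 MHz sensor MCU vs. 2 GHz fog node, 1 Mb/s link -> offloading wins.
print(should_offload(4_000, 100, 50e6, 100e6, 2e9, 1e6, 1e6))
```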
Full-text available
Article
With the Internet of Things (IoT) becoming part of our daily life and our environment, we expect rapid growth in the number of connected devices. IoT is expected to connect billions of devices and humans to bring promising advantages for us. With this growth, fog computing, along with its related edge computing paradigms, such as multi-access edge computing (MEC) and cloudlet, are seen as promising solutions for handling the large volume of security-critical and time-sensitive data that is being produced by the IoT. In this paper, we first provide a tutorial on fog computing and its related computing paradigms, including their similarities and differences. Next, we provide a taxonomy of research topics in fog computing, and through a comprehensive survey, we summarize and categorize the efforts on fog computing and its related computing paradigms. Finally, we provide challenges and future directions for research in fog computing.
Full-text available
Article
This paper introduces a framework of device-to-device edge computing and networks (D2D-ECN), a new paradigm for computation offloading and data processing with a group of resource-rich devices, aimed at collaborative optimization between communication and computation. However, the computation process of task-intensive applications would be interrupted when the capacity-limited battery energy runs out. To tackle this issue, D2D-ECN with energy harvesting technology is applied to provide a green computation network and guarantee service continuity. Specifically, we design a reinforcement learning framework in a point-to-point offloading system to overcome the challenges posed by the dynamic nature and uncertainty of renewable energy, channel state, and task generation rates. Furthermore, to cope with the high dimensionality and continuous-valued actions of an offloading system with multiple cooperating devices, we propose an online approach based on Lyapunov optimization for computation offloading and resource management without a priori energy and network information. Numerical results demonstrate that our proposed scheme can reduce the system operation cost with low task execution time in D2D-ECN.
Full-text available
Article
The recent ground-breaking advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-limited embedded devices. Offloading the computation into the cloud is often infeasible due to privacy concerns, high latency, or the lack of connectivity. As such, there is a critical need to find a way to effectively execute DNN models locally on the devices. This paper presents an adaptive scheme to determine which DNN model to use for a given input, by considering the desired accuracy and inference time. Our approach employs machine learning to develop a predictive model that quickly selects a pre-trained DNN to use for a given input and optimization constraint. We achieve this by first training a predictive model off-line, and then using the learnt model to select a DNN model for new, unseen inputs. We apply our approach to the image classification task and evaluate it on a Jetson TX2 embedded deep learning platform using the ImageNet ILSVRC 2012 validation dataset. We consider a range of influential DNN models. Experimental results show that our approach achieves a 7.52% improvement in inference accuracy and a 1.8x reduction in inference time over the most capable single DNN model.
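A minimal sketch of this kind of "premodel" idea follows: a cheap classifier, trained offline on simple input features, predicts which pre-trained DNN to run for a new input. The features, labels, and the use of a k-nearest-neighbour classifier are illustrative assumptions, not the paper's actual predictive model.

```python
# Hypothetical sketch: a cheap offline-trained selector picks which DNN to run.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Offline: features of training images (e.g. brightness, edge density, contrast)
# and the index of the cheapest DNN (0 = small, 1 = medium, 2 = largest) that
# classified each image correctly under the accuracy constraint.
X_train = np.array([[0.7, 0.2, 0.5], [0.3, 0.8, 0.6], [0.5, 0.9, 0.9], [0.8, 0.1, 0.4]])
y_train = np.array([0, 1, 2, 0])

premodel = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

def select_dnn(features):
    """Return the index of the DNN to run for a new, unseen input."""
    return int(premodel.predict(np.asarray(features).reshape(1, -1))[0])

print(select_dnn([0.4, 0.85, 0.7]))   # a "harder" input maps to a larger model
```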
Full-text available
Conference Paper
Deep learning shows great promise in providing more intelligence to augmented reality (AR) devices, but few AR apps use deep learning due to lack of infrastructure support. Deep learning algorithms are computationally intensive, and front-end devices cannot deliver sufficient compute power for real-time processing. In this work, we design a framework that ties together front-end devices with more powerful backend "helpers" (e.g., home servers) to allow deep learning to be executed locally or remotely in the cloud/edge. We consider the complex interaction between model accuracy, video quality, battery constraints, network data usage, and network conditions to determine an optimal offloading strategy. Our contributions are: (1) extensive measurements to understand the tradeoffs between video quality, network conditions, battery consumption, processing delay, and model accuracy; (2) a measurement-driven mathematical framework that efficiently solves the resulting combinatorial optimization problem; (3) an Android application that performs real-time object detection for AR applications, with experimental results that demonstrate the superiority of our approach.
Full-text available
Article
To improve the quality of the computation experience for mobile devices, mobile-edge computing (MEC) is a promising paradigm that provides computing capabilities in close proximity within a sliced radio access network (RAN), which supports both traditional communication and MEC services. Nevertheless, the design of computation offloading policies for a virtual MEC system remains challenging. Specifically, whether to execute a computation task at the mobile device or to offload it for MEC server execution should adapt to the time-varying network dynamics. This paper considers MEC for a representative mobile user (MU) in an ultra-dense sliced RAN, where multiple base stations (BSs) are available for computation offloading. The problem of finding an optimal computation offloading policy is modelled as a Markov decision process, where the objective is to maximize the long-term utility performance, with an offloading decision made based on the task queue state, the energy queue state, and the channel qualities between the MU and the BSs. To break the curse of high dimensionality in the state space, we first propose a double deep Q-network (DQN) based strategic computation offloading algorithm to learn the optimal policy without a priori knowledge of the network dynamics. Then, motivated by the additive structure of the utility function, a Q-function decomposition technique is combined with the double DQN, which leads to a novel learning algorithm for solving the stochastic computation offloading problem. Numerical experiments show that our proposed learning algorithms achieve a significant improvement in computation offloading performance compared with the baseline policies.
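The double-DQN target used by learning algorithms of this kind can be sketched in a few lines: the online network selects the next action while the target network evaluates it, which reduces the overestimation bias of plain DQN. The array shapes and toy numbers below are assumptions for illustration.

```python
# Hypothetical numpy sketch of double-DQN target computation.
import numpy as np

def double_dqn_targets(rewards, q_next_online, q_next_target, dones, gamma=0.99):
    """rewards, dones: shape (B,); q_next_online, q_next_target: shape (B, num_actions)."""
    best_actions = np.argmax(q_next_online, axis=1)                 # action selection
    q_eval = q_next_target[np.arange(len(rewards)), best_actions]   # action evaluation
    return rewards + gamma * q_eval * (1.0 - dones)

r = np.array([1.0, 0.0])
q_on = np.array([[0.2, 0.9], [0.5, 0.1]])
q_tg = np.array([[0.3, 0.7], [0.4, 0.2]])
print(double_dqn_targets(r, q_on, q_tg, np.array([0.0, 1.0])))
```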
Full-text available
Article
Deep neural networks are among the most influential architectures in deep learning, being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants (IPAs), autonomous cars, and smart home services, often employ either simple local models or complex remote models on the cloud. Mobile-only and cloud-only computation are currently the status quo approaches. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side, but also benefits the cloud server by reducing its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves up to 18x and 32x reductions in the latency and mobile energy consumption of querying DNNs, respectively.
Full-text available
Conference Paper
In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This article reviews the recent advances in deep reinforcement learning with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have successfully been combined with the reinforcement learning framework.
Full-text available
Article
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of deep neural network to improve energy-efficiency and throughput without sacrificing performance accuracy or increasing hardware cost are critical to enabling the wide deployment of DNNs in AI systems. This article aims to provide a comprehensive tutorial and survey about the recent advances towards the goal of enabling efficient processing of DNNs. Specifically, it will provide an overview of DNNs, discuss various platforms and architectures that support DNNs, and highlight key trends in recent efficient processing techniques that reduce the computation cost of DNNs either solely via hardware design changes or via joint hardware design and network algorithm changes. It will also summarize various development resources that can enable researchers and practitioners to quickly get started on DNN design, and highlight important benchmarking metrics and design considerations that should be used for evaluating the rapidly growing number of DNN hardware designs, optionally including algorithmic co-design, being proposed in academia and industry. The reader will take away the following concepts from this article: understand the key design considerations for DNNs; be able to evaluate different DNN hardware implementations with benchmarks and comparison metrics; understand trade-offs between various architectures and platforms; be able to evaluate the utility of various DNN design techniques for efficient processing; and understand of recent implementation trends and opportunities.
Article
The Internet of Things (IoT) paradigm raises challenges for devising efficient strategies that offload applications to the fog or the cloud layer while ensuring the optimal response time for a service. Traditional computation offloading policies assume the response time is dominated only by the execution time. However, the response time is a function of many factors, including contextual parameters and application characteristics that can change over time. For the computation offloading problem, the majority of existing literature presents efficient solutions considering a limited number of parameters (e.g., computation capacity and network bandwidth), neglecting the effect of application characteristics and dataflow configuration. In this paper, we explore the impact of computation offloading on total application response time in three-layer IoT systems considering more realistic parameters, e.g., application characteristics, system complexity, communication cost, and dataflow configuration. This paper also highlights the impact of a new application characteristic parameter defined as the Output-Input Data Generation (OIDG) ratio, together with dataflow configuration, on system behavior. In addition, we present a proof-of-concept end-to-end dynamic computation offloading technique, implemented in a real hardware setup, that observes the aforementioned parameters to perform real-time decision-making.
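The role of the OIDG ratio can be seen in a simple additive response-time model, sketched below: the data returned from the remote layer is modeled as OIDG times the input size, so stages that inflate their output pay a large download cost when offloaded. The parameter names and the purely additive model are simplifying assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of a response-time model including the OIDG ratio.
def response_time(in_bytes, oidg, cycles, f_hz, bw_up_bps, bw_down_bps, offloaded=True):
    compute = cycles / f_hz
    if not offloaded:
        return compute
    upload = 8 * in_bytes / bw_up_bps
    download = 8 * (oidg * in_bytes) / bw_down_bps   # output modeled as OIDG * input
    return upload + compute + download

# A feature-extraction stage with OIDG << 1 is cheap to offload; a stage that
# inflates its output (OIDG >> 1) pays a large download cost.
print(response_time(1_000_000, 0.01, 2e9, 10e9, 20e6, 20e6))
print(response_time(1_000_000, 10.0, 2e9, 10e9, 20e6, 20e6))
```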
Article
Smart mobile devices (SMDs) can meet users’ high expectations by executing computational intensive applications but they only have limited resources including CPU, memory, battery power and wireless medium. To tackle this limitation, partial computation offloading can be used as a promising method to schedule some tasks of applications from resource-limited SMDs to high-performance edge servers. However, it brings communication overhead issues caused by limited bandwidth, and inevitably increases latency of tasks offloaded to edge servers. Therefore, it is highly challenging to achieve the balance between high-resource consumption in SMDs and high communication cost for providing energy-efficient and latency-low services to users. This work proposes a partial computation offloading method to minimize the total energy consumed by SMDs and edge servers by jointly optimizing offloading ratio of tasks, CPU speeds of SMDs, allocated bandwidth of available channels and transmission power of each SMD in each time slot. It jointly considers execution time of tasks performed in SMDs and edge servers, and transmission time of data. It also jointly considers latency limits, CPU speeds, transmission power limits, available energy of SMDs, and maximum number of CPU cycles and memories in edge servers. Considering these factors, a nonlinear constrained optimization problem is formulated and solved by a novel hybrid meta-heuristic algorithm named Genetic Simulated-annealing-based Particle swarm optimization (GSP) to produce a close-to-optimal solution. GSP achieves joint optimization of computation offloading between a cloud data center and the edge, and resource allocation in the data center. Real-life data-based experimental results prove that it achieves lower energy consumption in less convergence time than its three typical peers.
Article
Edge computing is a new architecture to provide computing, storage, and networking resources for achieving the Internet of Things. It brings computation to the network edge in close proximity to users. However, nodes in the edge have limited energy and resources. Completely running tasks in the edge may cause poor performance. Cloud data centers (CDCs) have rich resources for executing tasks, but they are located in places far away from users. CDCs lead to long transmission delays and large financial costs for utilizing resources. Therefore, it is essential to smartly offload users’ tasks between a CDC layer and an edge computing layer. This work proposes a cloud and edge computing system, which has a terminal layer, edge computing layer, and CDC layer. Based on it, this work designs a profit-maximized collaborative computation offloading and resource allocation algorithm to maximize the profit of systems and guarantee that response time limits of tasks are strictly met. In each time slot, this work jointly considers CPU, memory, and bandwidth resources, load balance of all heterogeneous nodes in the edge layer, maximum amount of energy, maximum number of servers, and task queue stability in the CDC layer. Considering the abovementioned factors, a single-objective constrained optimization problem is formulated and solved by a proposed simulated-annealing-based migrating birds optimization procedure to obtain a close-to-optimal solution. The proposed method achieves joint optimization of computation offloading between CDC and edge, and resource allocation in CDC. Realistic data-based simulation results demonstrate that it realizes higher profit than its peers. Note to Practitioners —This work considers the joint optimization of computation offloading between Cloud data center (CDC) and edge computing layers, and resource allocation in CDC. It is important to maximize the profit of distributed cloud and edge computing systems by optimally scheduling all tasks between them given user-specific response time limits of tasks. It is challenging to execute them in nodes in the edge computing layer because their computation resources and battery capacities are often constrained and heterogeneous. Current offloading methods fail to jointly optimize computation offloading and resource allocation for nodes in the edge and servers in CDC. They are insufficient and coarse-grained to schedule arriving tasks. In this work, a novel algorithm is proposed to maximize the profit of distributed cloud and edge computing systems while meeting response time limits of tasks. It explicitly specifies the task service rate and the selected node for each task in each time slot by considering resource limits, load balance requirement, and processing capacities of nodes in the edge, and server and energy constraints in CDC. Real-life data-driven simulations show that the proposed method realizes a larger profit than several typical offloading strategies. It can be readily implemented and incorporated into large-scale industrial computing systems.
Article
With the maturity of 5G technology and the popularity of intelligent terminal devices, the traditional cloud computing service model cannot deal with the explosive growth of business data quickly. Therefore, the purpose of mobile edge computing (MEC) is to effectively solve problems such as latency and network load. In this paper, deep reinforcement learning (DRL) is first proposed to solve the offloading problem of multiple service nodes for the cluster and multiple dependencies for mobile tasks in large-scale heterogeneous MEC. Then the paper uses the LSTM network layer and the candidate network set to improve the DQN algorithm in combination with the actual environment of the MEC. Finally, the task offloading problem is simulated by using iFogSim and Google Cluster Trace. The simulation results show that the offloading strategy based on the improved IDRQN algorithm has better performance in energy consumption, load balancing, latency and average execution time than other algorithms.
Conference Paper
Smartphones and wearable devices, such as smart watches, can act as mobile gateways and sensor nodes in IoT applications, respectively. In conventional IoT systems, wearable devices gather and transmit data to mobile gateways where most of computations are performed. However, the improvement of wearable devices, in recent years, has decreased the gap in terms of computation capability with mobile gateways. For this reason, some recent works present offloading schemes to utilize wearable devices and hence reducing the burden of mobile gateways for specific applications. However, a comprehensive study of offloading methods on wearable devices has not been conducted. In this paper, nine applications from the LOCUS's benchmark have been utilized and tested on different boards having hardware specification close to wearable devices and mobile gateways. The execution time and energy consumption results of running the benchmark on the boards are measured. The results are then used for providing insights for system designers when designing and choosing a suitable computation method for IoT systems to achieve a high quality of service (QoS). The results show that depending on the application, offloading methods can be used for achieving certain improvements in energy efficiency. In addition, the paper compares energy consumption of a mobile gateway when running the applications in both serial and multi-threading fashions.
Article
Due to their on-body and ubiquitous nature, wearables can generate a wide range of unique sensor data creating countless opportunities for deep learning tasks. We propose DeepWear, a deep learning (DL) framework for wearable devices to improve the performance and reduce the energy footprint. DeepWear strategically offloads DL tasks from a wearable device to its paired handheld device through local network connectivity such as Bluetooth. Compared to the remote-cloud-based offloading, DeepWear requires no Internet connectivity, consumes less energy, and is robust to privacy breach. DeepWear provides various novel techniques such as context-aware offloading, strategic model partition, and pipelining support to efficiently utilize the processing capacity from nearby paired handhelds. Deployed as a user-space library, DeepWear offers developer-friendly APIs that are as simple as those in traditional DL libraries such as TensorFlow. We have implemented DeepWear on the Android OS and evaluated it on COTS smartphones and smartwatches with real DL models. DeepWear brings up to 5.08X and 23.0X execution speedup, as well as 53.5 and 85.5 percent energy saving compared to wearable-only and handheld-only strategies, respectively.
Article
Internet of Things (IoT) devices can apply mobile edge computing (MEC) and energy harvesting (EH) to provide high level experiences for computational intensive applications and concurrently to prolong the lifetime of the battery. In this paper, we propose a reinforcement learning (RL) based offloading scheme for an IoT device with EH to select the edge device and the offloading rate according to the current battery level, the previous radio transmission rate to each edge device and the predicted amount of the harvested energy. This scheme enables the IoT device to optimize the offloading policy without knowledge of the MEC model, the energy consumption model and the computation latency model. Further, we present a deep RL based offloading scheme to further accelerate the learning speed. Their performance bounds in terms of the energy consumption, computation delay and utility are provided and verified via simulations for an IoT device that uses wireless power transfer for energy harvesting. Simulation results show that the proposed RL based offloading scheme reduces the energy consumption, computation delay and task drop rate and thus increases the utility of the IoT device in the dynamic MEC in comparison with the benchmark offloading schemes.
Article
Mobile Edge Computing (MEC) has recently emerged as a promising paradigm to meet the increasing computation demands in Internet of Things (IoT). However, due to the limited computation capacity of the MEC server, an efficient computation offloading scheme, which means the IoT device decides whether to offload the generated data to the MEC server, is needed. Considering the limited battery capacity of IoT devices, energy harvesting (EH) is introduced to enhance the lifetime of the IoT systems. However, due to the unpredictability nature of the generated data and the harvested energy, it is a challenging problem when designing an effective computation offloading scheme for the EH MEC system. To cope with this problem, we model the computation offloading process as a Markov Decision Process (MDP) so that no prior statistic information is needed. Then, reinforcement learning algorithms can be adopted to derive the optimal offloading policy. To address the large time complexity challenge of learning algorithms, we first introduce an after-state for each state-action pair so that the number of states in the formulated MDP is largely decreased. Then, to deal with the continuous state space challenge, a polynomial value function approximation method is introduced to accelerate the learning process. Thus, an after-state reinforcement learning algorithm for the formulated MDP is proposed to obtain the optimal offloading policy. To provide efficient instructions for real MEC systems, several analytical properties of the offloading policy are also presented. Our simulation results validate the great performance of our proposed algorithm, which significantly improves the achieved system reward under a reasonable complexity.
Conference Paper
Current wisdom to run computation-intensive deep neural network (DNN) on resource-constrained mobile devices is allowing the mobile clients to make DNN queries to central cloud servers, where the corresponding DNN models are pre-installed. Unfortunately, this centralized, cloud-based DNN offloading is not appropriate for emerging decentralized cloud infrastructures (e.g., cloudlet, edge/fog servers), where the client may send computation requests to any nearby server located at the edge of the network. To use such a generic edge server for DNN execution, the client should first upload its DNN model to the server, yet it can seriously delay query processing due to long uploading time. This paper proposes IONN (Incremental Offloading of Neural Network), a partitioning-based DNN offloading technique for edge computing. IONN divides a client's DNN model into a few partitions and uploads them to the edge server one by one. The server incrementally builds the DNN model as each DNN partition arrives, allowing the client to start offloading partial DNN execution even before the entire DNN model is uploaded. To decide the best DNN partitions and the uploading order, IONN uses a novel graph-based algorithm. Our experiments show that IONN significantly improves query performance in realistic hardware configurations and network conditions.
Article
Computation offloading is a prominent solution for resource-constrained mobile devices to accomplish processes demanding high computation capability. The mobile cloud is the well-known existing offloading platform, usually a far-end network solution, for leveraging computation beyond the resource-constrained mobile devices. Because of the far-end network solution, user devices experience higher latency or network delay, which negatively affects real-time mobile Internet of Things (IoT) applications. Therefore, this paper proposes a near-end network solution for computation offloading in the mobile edge/fog. The mobility, heterogeneity, and geographical distribution of mobile devices pose several challenges for computation offloading in the mobile edge/fog. To handle the computation resource demand from massive numbers of mobile devices, a deep Q-learning based autonomic management framework is proposed. The distributed edge/fog network controller (FNC) scavenges the available edge/fog resources, i.e., processing, memory, and network, to enable edge/fog computation services. The randomness in the availability of resources and the numerous options for allocating those resources for offloading computation make the problem appropriate for modeling through a Markov decision process (MDP) and solution through reinforcement learning. The proposed model is simulated in MATLAB considering oscillating resource demands and mobility of end-user devices. The proposed autonomic deep Q-learning based method significantly improves the performance of computation offloading by minimizing the latency of service computing. The total power consumption under different offloading decisions is also studied for comparison, showing the proposed approach to be energy efficient with respect to state-of-the-art computation offloading solutions.
Conference Paper
Limited processing power and memory prevent the realization of state-of-the-art algorithms at the edge level. Offloading computations to the cloud comes with tradeoffs, as compression techniques employed to conserve transmission bandwidth and energy adversely impact the accuracy of the algorithm. In this paper, we propose collaborative processing to actively guide the output of the sensor to improve performance on the end application. We apply this methodology to smart surveillance, specifically the task of object detection from video. Perceptual quality and object detection performance are characterized and improved under a variety of channel conditions.
Article
Deep convolutional neural networks (CNNs) have recently achieved dramatic accuracy improvements in many visual recognition tasks. However, existing deep convolutional neural network models are computationally expensive and memory intensive, hindering their deployment on devices with low memory resources or in applications with strict latency requirements. A natural thought, therefore, is to perform model compression and acceleration in deep CNNs without significantly decreasing classification accuracy. During the past few years, tremendous progress has been made in this area. In this paper, we survey the recently developed advanced techniques for compacting and accelerating CNN models. These techniques are roughly categorized into four schemes: parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and sharing are described in detail at the beginning, and all the others are introduced afterwards. For the methods of each scheme, we provide insightful analysis regarding the performance, related applications, advantages, and drawbacks. We then go through a few very recent additional successful methods, for example, dynamic networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating model performance, and recent benchmarking efforts. Finally, we conclude the paper and discuss remaining challenges and possible directions in this topic.
Article
We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed computing hierarchy, a DDNN can scale up in neural network size and scale out in geographical span. Due to its distributed nature, DDNNs enhance sensor fusion, system fault tolerance and data privacy for DNN applications. In implementing a DDNN, we map sections of a DNN onto a distributed computing hierarchy. By jointly training these sections, we minimize communication and resource usage for devices and maximize usefulness of extracted features which are utilized in the cloud. The resulting system has built-in support for automatic sensor fusion and fault tolerance. As a proof of concept, we show a DDNN can exploit geographical diversity of sensors to improve object recognition accuracy and reduce communication cost. In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.
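The local/remote split in such hierarchical DNNs typically hinges on an early-exit rule like the one sketched below: if the shallow local classifier is confident (low normalized entropy of its softmax output), answer at the end/edge device; otherwise forward the sample up the hierarchy. The threshold value is an illustrative assumption.

```python
# Hypothetical sketch of an entropy-based early-exit decision for a local classifier.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def normalized_entropy(probs):
    return -np.sum(probs * np.log(probs + 1e-12)) / np.log(len(probs))

def exit_locally(local_logits, threshold=0.3):
    return normalized_entropy(softmax(np.asarray(local_logits))) < threshold

print(exit_locally([8.0, 0.1, 0.2]))   # confident -> True, answer at the end/edge
print(exit_locally([1.0, 0.9, 1.1]))   # uncertain -> False, forward to the cloud
```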
Article
The computation for today's intelligent personal assistants such as Apple Siri, Google Now, and Microsoft Cortana, is performed in the cloud. This cloud-only approach requires significant amounts of data to be sent to the cloud over the wireless network and puts significant computational pressure on the datacenter. However, as the computational resources in mobile devices become more powerful and energy efficient, questions arise as to whether this cloud-only processing is desirable moving forward, and what are the implications of pushing some or all of this compute to the mobile devices on the edge. In this paper, we examine the status quo approach of cloud-only processing and investigate computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption, and high datacenter throughput for this class of intelligent applications. Our study uses 8 intelligent applications spanning computer vision, speech, and natural language domains, all employing state-of-the-art Deep Neural Networks (DNNs) as the core machine learning technique. We find that given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach. Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. Neurosurgeon does not require per-application profiling. It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, intelligently partitioning computation for best latency or best mobile energy. We evaluate Neurosurgeon on a state-of-the-art mobile development platform and show that it improves end-to-end latency by 3.1X on average and up to 40.7X, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5X on average and up to 6.7X.
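The layer-granularity partitioning idea can be sketched as a simple search over split points: run the first k layers on the device, transfer that layer's output, and run the rest on the server, picking the k with the lowest total latency. The per-layer latencies and output sizes below are made up for illustration; the real system predicts such values from layer configurations, hardware, and current server load.

```python
# Hypothetical sketch of layer-level DNN partition-point selection.
def best_partition(device_ms, server_ms, sizes_kb, bw_mbps):
    """device_ms[i] / server_ms[i]: latency (ms) of layer i on the device / server.
    sizes_kb[0]: input size in KB; sizes_kb[i + 1]: output size of layer i.
    Split at k: layers [0, k) run on the device, layers [k, n) on the server."""
    n = len(device_ms)
    best = None
    for k in range(n + 1):
        # k == n means device-only (result stays on the device); the size of the
        # result returned to the device for k < n is assumed negligible.
        transfer_ms = 0.0 if k == n else sizes_kb[k] * 8.0 / bw_mbps
        total = sum(device_ms[:k]) + transfer_ms + sum(server_ms[k:])
        if best is None or total < best[1]:
            best = (k, total)
    return best

# Toy 4-layer CNN: early layers shrink the data but are slow on the device.
device = [40.0, 35.0, 30.0, 25.0]
server = [4.0, 3.5, 3.0, 2.5]
sizes = [600.0, 300.0, 60.0, 10.0, 1.0]     # KB: input, then each layer's output
print(best_partition(device, server, sizes, bw_mbps=10.0))   # -> split after layer 3
```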
Article
The technological evolution of mobile user equipment (UE), such as smartphones or laptops, goes hand-in-hand with the evolution of new mobile applications. However, running computationally demanding applications at the UE is constrained by its limited battery capacity and energy consumption. A suitable solution for extending the battery lifetime of the UE is to offload applications demanding heavy processing to a conventional centralized cloud (CC). Nevertheless, this option introduces a significant execution delay, consisting of the delivery of the offloaded applications to the cloud and back, plus the computation time at the cloud. Such delay is inconvenient and makes offloading unsuitable for real-time applications. To cope with the delay problem, a new emerging concept, known as mobile edge computing (MEC), has been introduced. MEC brings computation and storage resources to the edge of the mobile network, enabling highly demanding applications to run at the UE while meeting strict delay requirements. The MEC computing resources can also be exploited by operators and third parties for specific purposes. In this paper, we first describe major use cases and reference scenarios where MEC is applicable. After that, we survey existing concepts integrating MEC functionalities into mobile networks and discuss current advancements in the standardization of MEC. The core of this survey then focuses on the user-oriented use case in MEC, i.e., computation offloading. In this regard, we divide the research on computation offloading into three key areas: i) the decision on computation offloading, ii) the allocation of computing resources within MEC, and iii) mobility management. Finally, we highlight lessons learned in the area of MEC and discuss open research challenges yet to be addressed in order to fully exploit the potential offered by MEC.
Article
Mobile edge computing (a.k.a. fog computing) has recently emerged to enable in-situ processing of delay-sensitive applications at the edge of mobile networks. Providing grid power supply in support of mobile edge computing, however, is costly and even infeasible (in certain rugged or under-developed areas), thus mandating on-site renewable energy as a major or even sole power supply in increasingly many scenarios. Nonetheless, the high intermittency and unpredictability of renewable energy make it very challenging to deliver a high quality of service to users in renewable-powered mobile edge computing systems. In this paper, we address the challenge of incorporating renewables into mobile edge computing and propose an efficient reinforcement learning-based resource management algorithm, which learns on-the-fly the optimal policy of dynamic workload offloading (to centralized cloud) and edge server provisioning to minimize the long-term system cost (including both service delay and operational cost). Our online learning algorithm uses a decomposition of the (offline) value iteration and (online) reinforcement learning, thus achieving a significant improvement of learning rate and run-time performance when compared to standard reinforcement learning algorithms such as Q-learning.
Article
Deep Neural Networks (DNN) have achieved state-of-the-art results in a wide range of tasks, with the best results obtained with large training sets and large models. In the past, GPUs enabled these breakthroughs because of their greater computational speed. In the future, faster computation at both training and test time is likely to be crucial for further progress and for consumer applications on low-power devices. As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL). Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks. We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated. Like other dropout schemes, we show that BinaryConnect acts as regularizer and we obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN.
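The core BinaryConnect mechanism can be sketched in a few lines of numpy: weights are binarized to +/-1 for the forward and backward pass, but updates accumulate in the real-valued "master" weights. The single-neuron regression task below is a toy assumption, not the paper's training setup.

```python
# Hypothetical single-neuron sketch of BinaryConnect-style training.
import numpy as np

rng = np.random.default_rng(0)
w_real = rng.normal(0, 0.1, size=3)           # real-valued "master" weights

def binarize(w):
    return np.where(w >= 0, 1.0, -1.0)

def train_step(x, y, lr=0.01):
    global w_real
    w_bin = binarize(w_real)                  # binary weights used in the forward pass
    y_hat = x @ w_bin
    grad = (y_hat - y) * x                    # gradient of 0.5 * (y_hat - y)^2
    w_real -= lr * grad                       # the update lands on the real weights
    w_real = np.clip(w_real, -1.0, 1.0)       # keep master weights in [-1, 1]

for _ in range(200):
    x = rng.normal(size=3)
    train_step(x, x @ np.array([1.0, -1.0, 1.0]))
print(binarize(w_real))                       # typically recovers [ 1., -1.,  1.]
```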
Article
Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG16 found that the network as a whole can be reduced 6.8x just by pruning the fully-connected layers, again with no loss of accuracy.
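The pruning step itself reduces to a magnitude threshold and a mask, as sketched below; retraining (not shown) would then update only the surviving connections by applying the same mask to the gradients. The 90% sparsity level is an illustrative choice.

```python
# Hypothetical sketch of magnitude-based weight pruning.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Return (pruned_weights, mask) with the smallest |w| set to zero."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 128))
w_pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(1.0 - mask.mean())          # ~0.9 of the connections removed

# During retraining, gradients would be masked the same way: w -= lr * grad * mask
```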
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Article
In recent years, deep neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.
Article
To date, reinforcement learning has mostly been studied solving simple learning tasks. Reinforcement learning methods that have been studied so far typically converge slowly. The purpose of this work is thus two-fold: 1) to investigate the utility of reinforcement learning in solving much more complicated learning tasks than previously studied, and 2) to investigate methods that will speed up reinforcement learning. This paper compares eight reinforcement learning frameworks: adaptive heuristic critic (AHC) learning due to Sutton, Q-learning due to Watkins, and three extensions to both basic methods for speeding up learning. The three extensions are experience replay, learning action models for planning, and teaching. The frameworks were investigated using connectionism as an approach to generalization. To evaluate the performance of different frameworks, a dynamic environment was used as a testbed. The environment is moderately complex and nondeterministic. This paper describes these frameworks and algorithms in detail and presents empirical evaluation of the frameworks.
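Of the extensions compared above, experience replay is the simplest to sketch: transitions are stored in a bounded buffer and learning draws random mini-batches from it rather than only the most recent step. The capacity and batch size below are illustrative choices.

```python
# Hypothetical sketch of an experience-replay buffer.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        return list(zip(*batch))               # (states, actions, rewards, next_states, dones)

buf = ReplayBuffer()
buf.push("s0", 1, 0.0, "s1", False)
buf.push("s1", 0, 1.0, "s2", True)
states, actions, rewards, next_states, dones = buf.sample(2)
print(rewards)
```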
To offload or not to offload? The bandwidth and energy costs of mobile cloud computing
Marco V. Barbera, Sokol Kosta, Alessandro Mei, and Julinda Stefa. 2013. To offload or not to offload? The bandwidth and energy costs of mobile cloud computing. In 2013 Proceedings IEEE INFOCOM. IEEE, 1285-1293.
Dynamic Computation Offloading Based on Deep Reinforcement Learning
Baichuan Cheng, Zhilong Zhang, and Danpu Liu. 2019. Dynamic Computation Offloading Based on Deep Reinforcement Learning. In 12th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2019. European Alliance for Innovation (EAI).
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
Sicong Liu, Junzhao Du, Kaiming Nan, Atlas Wang, Yingyan Lin, et al. 2020. AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for Enabling Ubiquitous Intelligent Mobiles. arXiv preprint arXiv:2006.04432 (2020).
Bradley McDanel, Surat Teerapittayanon, and H. T. Kung. 2017. Embedded binarized neural networks. arXiv preprint arXiv:1709.02260 (2017).
Phil Winder. 2020. Reinforcement Learning. O'Reilly Media.