Fig 2 - uploaded by Laurent Prosperi
A hybrid infrastructure where Planner is used to deploy parts of the computation on edge devices. The connectors are small pieces of software used to plug Planner into other cloud-based analytics systems (e.g., Apache Flink).

Source publication
Article
Full-text available
Stream processing applications handle unbounded and continuous flows of data items which are generated from multiple geographically distributed sources. Two approaches are commonly used for processing: Cloud-based analytics and Edge analytics. The first one routes the whole data set to the Cloud, incurring significant costs and late results from th...

Context in source publication

Context 1
... automatically and transparently delegates a light part of the computation to Edge devices (e.g., running embedded edge processing engines) in order to minimize the network cost and the end-to-end processing time of Cloud-based stream processing. It does so as a thin extension of a traditional cloud-based SPE (e.g., Apache Flink in our case) to support hybrid deployments, as shown in Figure 2. ...

Similar publications

Preprint
Full-text available
In this paper, we consider the IoT data discovery problem in very large and growing scale networks. Specifically, we investigate in depth the routing table summarization techniques to support effective and space-efficient IoT data discovery routing. Novel summarization algorithms, includi...

Citations

... So partitioning of computation between edge devices and the cloud is also a requirement. Planner [23] is a middleware for cost-effective, transparent and unified stream analytics on edge and cloud, which can effectively and automatically partition computation between edge and cloud; this merger is called hybrid stream processing. By adding flexibility to improve system performance, while data is filtered and collected locally, it is very easy to process live data using a hybrid approach. Planner follows two models, a resource model and a network cost model: the resource model is used for stream processing, and the network cost model is used for communication and flow between edge and cloud resources. ...
Article
Edge technology aims to bring cloud resources (specifically, the computation, storage, and network) into close proximity of the edge devices, i.e., smart devices where the data are produced and consumed. Embedding computing and applications in edge devices leads to the emergence of two new concepts in edge technology: edge computing and edge analytics. Edge analytics uses some techniques or algorithms to analyse the data generated by the edge devices. With the emergence of edge analytics, the edge devices have become a complete set. Currently, edge analytics is unable to provide full support to the analytic techniques. The edge devices cannot execute advanced and sophisticated analytic algorithms owing to various constraints such as limited power supply, small memory size, limited resources, etc. This article aims to provide a detailed discussion on edge analytics. The key contributions of the paper are as follows: a clear explanation to distinguish between the three concepts of edge technology (edge devices, edge computing, and edge analytics), along with their issues. In addition, the article discusses the implementation of edge analytics to solve many problems and applications in various areas such as retail, agriculture, industry, and healthcare. Moreover, the research papers on state-of-the-art edge analytics are rigorously reviewed in this article to explore the existing issues, emerging challenges, research opportunities and their directions, and applications.
... Scheduling operators has largely been studied in the literature for DSPA applications deployed in the Cloud or distributed on peer nodes of a data center, where the computational resources are abundant [19,20]. On the other hand, related work on processing data streams at the IoT network edge mostly focuses on reducing the required network bandwidth and resulting delays to reach the Cloud by exploiting to the maximum the available computational resources at the Edge/Fog layers [21-24]. Given that the Edge/Fog nodes come with heterogeneous and limited computational resources, using their available computational resources to the maximum may in turn impair the DSPA application performance, i.e., violate the time constraint [14]. ...
... operator placement In order to deploy a physical DSPA DAG on our computing infrastructure, we need to select the nodes on which operators will be executed to respect the QoS objective of the DSPA application and eventually an overall optimal resource usage. Placement decisions are usually made once at deployment time [22-24]. Some placement algorithms [8,46,47] continue to be active also at run-time, in order to respond to changes in the DSPA application workload or changes in the availability of the allocated resources. ...
... Similarly, works like [23, 110-113] consider a hierarchical resource network where IoT devices are placed at the bottom (Edge) and the Cloud is placed at the top of the hierarchy. A hybrid network architecture is proposed in [114,114,115] including a P2P local area network (LAN) ...
Thesis
Data stream processing and analytics (DSPA) applications are widely used to process the ever increasing amounts of data streams produced by highly geographically distributed data sources, such as fixed and mobile IoT devices, in order to extract valuable information in a timely manner for actuation. DSPA applications are typically deployed in the Cloud to benefit from practically unlimited computational resources on demand. However, such centralized and distant computing solutions may suffer from limited network bandwidth and high network delay. Additionally, data propagation to the Cloud may compromise the privacy of sensitive data. To effectively handle this volume of data streams, the emerging Edge/Fog computing paradigm is used as the middle-tier between the Cloud and the IoT devices to process data streams closer to their sources and to reduce the network resource usage and network delay to reach the Cloud. However, Edge/Fog computing comes with limited computational resource capacities and requires deciding which part of the DSPA application should be performed in the Edge/Fog layers while satisfying the application response time constraint for timely actuation. Furthermore, the computational and network resources across the Edge-Fog-Cloud architecture can be shareable among multiple DSPA (and other) applications, which calls for efficient resource usage.
In this PhD research, we propose a new model for assessing the usage cost of resources across the Edge-Fog-Cloud architecture. Our model addresses both computational and network resources and enables dealing with the trade-offs that are inherent to their joint usage. It precisely characterizes the usage cost of resources by distinguishing between abundant and constrained resources as well as by considering their dynamic availability, hence covering both resources dedicated to a single DSPA application and shareable resources. We complement our system modeling with a response time model for DSPA applications that takes into account their windowing characteristics. Leveraging these models, we formulate the problem of scheduling streaming operators over a hierarchical Edge-Fog-Cloud resource architecture. Our target problem presents two distinctive features. First, it aims at jointly optimizing the resource usage cost for computational and network resources, while few existing approaches have taken computational resources into account in their optimization goals. More precisely, our aim is to schedule a DSPA application in a way that it uses available resources in the most efficient manner. This enables saving valuable resources for other DSPA (and non DSPA) applications that share the same resource architecture.
Second, it is subject to a response time constraint, while few works have dealt with such a constraint; most approaches for scheduling time-critical (DSPA) applications include the response time in their optimization goals. To solve our formulated problem, we introduce several heuristic algorithms that deal with different versions of the problem: static resource-aware scheduling that each time calculates a new system deployment from the outset, time-aware and resource-aware scheduling, dynamic scheduling that takes into account the current deployment. Finally, we extensively and comparatively evaluate our algorithms with realistic simulations against several baselines that either we introduce or that originate from or are inspired by the existing literature. Our results demonstrate that our solutions advance the current state of the art in scheduling DSPA applications.
... Similarly to our approach, Planner [8] introduces a heuristic for deploying DSPA applications between the Cloud and the Edge/Fog in order to minimize the network resource usage for reaching the Cloud. Planner relies on a minimum edge-cut algorithm to split separately each data stream processing path. ...
... Similarly, the edge-cloud-based query processing is also playing a vital role in the effort of reducing the query execution latency [1, 3, 15-20]. Edge and fog computing have emerged as promising paradigms for meeting stringent processing demands of latency-sensitive applications [20-25]. ...
Article
Full-text available
IoT (Internet of Things) streaming data has increased dramatically over the recent years and continues to grow rapidly due to the exponential growth of connected IoT devices. For many IoT applications, fast stream query processing is crucial for correct operations. To achieve better query performance and quality, researchers and practitioners have developed various types of query execution models—purely cloud-based, geo-distributed, edge-based, and edge-cloud-based models. Each execution model presents unique challenges and limitations of query processing optimizations. In this work, we provide a comprehensive review and analysis of query execution models within the context of the query execution latency optimization. We also present a detailed overview of various query execution styles regarding different query execution models and highlight their contributions. Finally, the paper concludes by proposing promising future directions towards advancing the query executions in the edge and cloud environment.
... Thus, partitioning of computation between Edge devices and Cloud is also a requirement. Planner [22] is a middleware for cost-efficient, transparent, and uniform stream analytics on Edge and Cloud. It effectively and automatically partitions the computations between Edge and Cloud. ...
Preprint
Full-text available
Edge technology aims to bring Cloud resources (specifically, the compute, storage, and network) into close proximity of the Edge devices, i.e., smart devices where the data are produced and consumed. Embedding computing and applications in Edge devices leads to the emergence of two new concepts in Edge technology, namely, Edge computing and Edge analytics. Edge analytics uses some techniques or algorithms to analyze the data generated by the Edge devices. With the emergence of Edge analytics, the Edge devices have become a complete set. Currently, Edge analytics is unable to provide full support for the execution of the analytic techniques. The Edge devices cannot execute advanced and sophisticated analytic algorithms owing to various constraints such as limited power supply, small memory size, limited resources, etc. This article aims to provide a detailed discussion on Edge analytics. It gives a clear explanation to distinguish between the three concepts of Edge technology, namely, Edge devices, Edge computing, and Edge analytics, along with their issues. Furthermore, the article discusses the implementation of Edge analytics to solve many problems in various areas such as retail, agriculture, industry, and healthcare. In addition, the research papers on state-of-the-art edge analytics are rigorously reviewed in this article to explore the existing issues, emerging challenges, research opportunities and their directions, and applications.
... On the contrary, in our work we aim at minimizing both the Fog computational and Fog to Cloud network resource usage. On the other hand, related work on operator placement and scheduling mostly focuses on optimizing network usage and ensuring system availability without optimizing computational resource usage [3, 9, 13-15, 17]. A centralized framework to place DSPA operators over a set of distributed nodes is proposed in [3]. ...
... Unfortunately, SpanEdge does not optimize the computational resource usage. On the other hand, Planner [14] proposes a uniform approach for deploying a DSPA application between the Cloud and the Edge. It relies on a minimum edge-cut algorithm to split separately each data stream processing path: the first part of the cut is deployed at the Edge by considering constraints involving data locality and computational resource usage, while the other is deployed on the Cloud. ...
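The per-path split described in this excerpt can be illustrated with a minimal sketch. It is not Planner's actual implementation: the `Operator` fields, the single `edge_cpu` budget, and the rule that the source stays at the Edge while the sink stays in the Cloud are all assumptions made for illustration. On a linear path, a minimum edge cut reduces to picking the feasible link with the lowest data rate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Operator:
    name: str
    cpu: float            # hypothetical CPU demand of the operator
    out_rate: float       # hypothetical data rate on the link to the next operator
    edge_ok: bool = True  # hypothetical flag: may this operator run on an edge device?


def split_path(path: List[Operator], edge_cpu: float) -> Tuple[List[str], List[str]]:
    """Split one linear path (source first, sink last) into (edge_part, cloud_part).

    Cutting after path[i] sends path[i].out_rate over the edge-to-cloud link,
    so the cheapest feasible cut is the one with the smallest rate.
    Assumes the path has at least a source and a sink.
    """
    best_idx, best_rate = 0, path[0].out_rate  # default: cut right after the source
    used = 0.0
    for i, op in enumerate(path[1:-1], start=1):  # the sink always stays in the cloud
        used += op.cpu
        if not op.edge_ok or used > edge_cpu:
            break  # operators beyond this point cannot be hosted at the edge
        if op.out_rate < best_rate:
            best_idx, best_rate = i, op.out_rate
    names = [op.name for op in path]
    return names[: best_idx + 1], names[best_idx + 1:]


path = [
    Operator("source", cpu=0.0, out_rate=10.0),
    Operator("filter", cpu=1.0, out_rate=4.0),
    Operator("map", cpu=1.0, out_rate=4.0),
    Operator("window_agg", cpu=5.0, out_rate=0.5),
    Operator("sink", cpu=0.0, out_rate=0.0),
]
small_edge = split_path(path, edge_cpu=3.0)
large_edge = split_path(path, edge_cpu=10.0)
```

With the small CPU budget the cut falls after `filter`; with the larger budget the aggregation also fits at the Edge, so only its low-rate output crosses the network.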
... There is, however, a lack of consensus regarding the definition of edge computing. Existing work considers edge computing as numerous IoT devices collecting data at the edges of the network [21,31,32], other works consider MD, routers etc. [14,33,34], and part of the existing work, like in this thesis, consider edge computing as a two-layered infrastructure with one layer containing IoT devices, and another layer with MD, routers and gateways [35,36,37]. ...
... Second, it selects resources that host sources or sinks (lines 13-17). Third, CPU and memory requirements from the operators that are neither sources nor sinks are summed to ReqCPU and ReqMem, respectively (line 20). When the evaluated layer is IoT, the set of resources is sorted by CPU and memory capacity in descending order and ReqCPU and ReqMem are used to select a subset of computing resources whose combined capacity meets the requirements (lines 29-34). For the other two layers, the function iterates through the list of operators selecting a worst-fit resource that supports the operator's requirements. ...
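The two selection rules mentioned in this excerpt can be sketched as follows. This is an illustrative reading of the prose, not the cited algorithm: the `Resource` type, the function names, the tie-breaking order, and the behaviour when nothing fits are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Resource:
    name: str
    cpu: float
    mem: float


def select_subset(resources: List[Resource], req_cpu: float, req_mem: float) -> List[Resource]:
    """Take resources in descending (cpu, mem) order until their combined
    capacity covers req_cpu and req_mem; return [] if that is impossible."""
    ordered = sorted(resources, key=lambda r: (r.cpu, r.mem), reverse=True)
    chosen: List[Resource] = []
    cpu = mem = 0.0
    for r in ordered:
        if cpu >= req_cpu and mem >= req_mem:
            break
        chosen.append(r)
        cpu += r.cpu
        mem += r.mem
    return chosen if cpu >= req_cpu and mem >= req_mem else []


def worst_fit(operators: Dict[str, Tuple[float, float]],
              resources: List[Resource]) -> Dict[str, Optional[str]]:
    """Place each operator (cpu, mem) on the resource with the most free
    capacity that can still host it; None marks an operator that fits nowhere."""
    free = {r.name: [r.cpu, r.mem] for r in resources}
    placement: Dict[str, Optional[str]] = {}
    for op, (cpu, mem) in operators.items():
        fits = [n for n, (c, m) in free.items() if c >= cpu and m >= mem]
        if not fits:
            placement[op] = None
            continue
        target = max(fits, key=lambda n: (free[n][0], free[n][1]))
        free[target][0] -= cpu
        free[target][1] -= mem
        placement[op] = target
    return placement


pool = [Resource("a", 2, 4), Resource("b", 4, 8), Resource("c", 1, 2)]
subset = [r.name for r in select_subset(pool, req_cpu=5, req_mem=10)]
placement = worst_fit({"o1": (2, 2), "o2": (2, 2), "o3": (3, 3)}, pool)
```

Worst-fit deliberately picks the emptiest resource, which spreads load but can leave no single resource large enough for a later, bigger operator (`o3` in this example).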
Thesis
Technology has evolved to a point where applications and devices are highly connected and produce ever-increasing amounts of data used by organizations and individuals to make daily decisions. For the collected data to become information that can be used in decision making, it requires processing. The speed at which information is extracted from data generated by a monitored system or environment affects how fast organizations and individuals can react to changes. One way to process the data under short delays is through Data Stream Processing (DSP) applications. DSP applications can be structured as directed graphs, where the vertexes are data sources, operators, and data sinks, and the edges are streams of data that flow throughout the graph. A data source is an application component responsible for data ingestion. Operators receive a data stream, apply some transformation or user-defined function over the data stream and produce a new output stream, until the latter reaches a data sink, where the data is stored, visualized or provided to another application.
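The graph structure described in this abstract can be made concrete with a small sketch. The adjacency-list representation and the vertex names are assumptions chosen for illustration; the roles follow from the description: a source has no incoming stream, a sink has no outgoing stream, and every other vertex is an operator.

```python
from typing import Dict, List, Set, Tuple


def classify(graph: Dict[str, List[str]]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (sources, operators, sinks) for a DSP graph given as
    vertex -> list of downstream vertices."""
    targets = {v for outs in graph.values() for v in outs}
    vertices = set(graph) | targets
    sources = {v for v in vertices if v not in targets}  # no incoming stream
    sinks = {v for v in vertices if not graph.get(v)}    # no outgoing stream
    operators = vertices - sources - sinks
    return sources, operators, sinks


# Hypothetical pipeline: two sensors feed a filter, whose output is averaged
# and then shown on a dashboard.
pipeline = {
    "sensor_a": ["filter"],
    "sensor_b": ["filter"],
    "filter": ["average"],
    "average": ["dashboard"],
    "dashboard": [],
}
sources, operators, sinks = classify(pipeline)
```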
Conference Paper
In Internet of Things (IoT) scenarios, a huge number of sensors and mobile devices generate large volumes of data at high velocity, which need real-time processing (monitoring or decision support and analytics) reflecting the data when it is still fresh, meaningful, and most valuable. There are multiple data stream processing engines for distributed stream processing in cloud-based, fog-based, and cloud-fog hybrid environments. One of the important challenges in distributed data stream processing is to determine task allocation on the available resources. This paper reviews research on data stream processing task placement in hybrid cloud and fog environments and compares the results. Finally, the challenges raised in this context are discussed.
Chapter
Stream Processing (SP), i.e., the processing of data in motion, as soon as it becomes available, is a hot topic in cloud computing. Various SP stacks exist today, with applications ranging from IoT analytics to processing of payment transactions. The backbone of said stacks are Stream Processing Engines (SPEs), software packages offering a high-level programming model and scalable execution of data stream processing pipelines. SPEs have been traditionally developed to work inside a single datacenter, and optimised for speed. With the advent of Fog computing, however, the processing of data streams needs to be carried out over multiple geographically distributed computing sites: data are typically pre-processed close to where they are generated, then aggregated at intermediate nodes, and finally globally and persistently stored in the Cloud. SPEs were not designed to address these new scenarios. In this paper, we argue that large scale Fog-based stream processing should rely on the coordinated composition of geographically dispersed SPE instances. We propose an architecture based on the composition of multiple SPE instances and their communication via distributed message brokers. We introduce SpecK, a tool to automate the deployment and adaptation of pipelines over a Fog computing platform. Given a description of the pipeline, SpecK covers all the operations needed to deploy a stream processing computation over the different SPE instances targeted, using their own APIs and establishing the required communication channels to forward data among them. A prototypical implementation of SpecK is presented, and its performance is evaluated over Grid’5000, a large-scale, distributed experimental facility.