
Abstract

Due to the pervasive diffusion of personal mobile and IoT devices, many “smart environments” (e.g., smart cities and smart factories) will, among other things, generate huge amounts of data. To provide value-added services in these environments, data will have to be analysed to extract knowledge. Currently, this is typically achieved through centralised cloud-based data analytics services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, as well as wireless network capacity. One possibility to cope with these shortcomings is to move data analytics closer to where data is generated. In this paper we tackle this issue by proposing and analysing a distributed learning framework, whereby data analytics are performed at the edge of the network, i.e., at locations very close to where data is generated. Specifically, in our framework, partial data analytics are performed directly on the nodes that generate the data, or on nodes close by (e.g., some of the data generators can take this role on behalf of subsets of other nodes nearby). Then, nodes exchange partial models and refine them accordingly. Our framework is general enough to host different analytics services. In the specific case analysed in the paper we focus on a learning task, considering two distributed learning algorithms. Using an activity recognition and a pattern recognition task, both on reference datasets, we compare the two learning algorithms with each other and with a central cloud solution (i.e., one that has access to the complete datasets). Our results show that, using distributed machine learning techniques, it is possible to drastically reduce the network overhead while obtaining performance comparable to the cloud solution in terms of learning accuracy. The analysis also shows when each distributed learning approach is preferable, based on the specific distribution of the data on the nodes.
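As a rough illustration of the workflow described in the abstract (partial models trained on disjoint local partitions, then exchanged and combined), the Python sketch below uses scikit-learn. The dataset, the model type, and the majority-vote combination are illustrative assumptions, not the two distributed learning algorithms evaluated in the paper.

```python
# Minimal sketch (not the paper's exact algorithms): each node trains a partial
# model on its local data partition, partial models are exchanged, and a simple
# ensemble of the exchanged models acts as the "refined" global predictor.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Simulate data generated at 4 edge nodes: disjoint local partitions.
partitions = np.array_split(np.random.permutation(len(X_train)), 4)

# Each node trains a partial model only on its own data.
partial_models = [
    LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
    for idx in partitions
]

# Nodes exchange partial models; prediction is a majority vote over the
# exchanged models (only models travel over the network, never the raw data).
votes = np.stack([m.predict(X_test) for m in partial_models])
combined = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", (combined == y_test).mean())
```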
... In general, with HTL, instead of training a model on the whole training set in a centralised way, multiple parallel models are trained on disjoint subsets, and then the partial models are combined to obtain a single final model. We have already successfully applied HTL to distributed learning in IoT environments in [9] and [10], where we presented an activity recognition solution and showed that it is able to drastically cut the network traffic required to perform the learning task, with an affordable reduction in accuracy with respect to a conventional solution in which data is transferred to a global cloud platform. ...
... Contrary to [9,10], in this paper we consider that data generated by IoT devices need to be moved either to an edge gateway (and possibly to the cloud) or to a number of mobile nodes passing by the IoT devices. Both the gateway and the mobile nodes take the role of Data Collectors, and HTL is used to compute and exchange partial models between them. ...
... Precisely, in that work we find the optimal operating point of data aggregation on the nodes that guarantees a certain target model accuracy. Differently, in this paper, we consider an approach based on Hypothesis Transfer Learning; given the results in [10,9,11], we can expect qualitatively similar results when the same analysis is performed on other datasets. ...
Preprint
Full-text available
Due to the pervasive diffusion of personal mobile and IoT devices, many "smart environments" (e.g., smart cities and smart factories) will be generators of huge amounts of data. Currently, analysis of this data is typically achieved through centralised cloud-based services. However, according to many studies, this approach may present significant issues from the standpoint of data ownership, as well as wireless network capacity. In this paper, we exploit the fog computing paradigm to move computation close to where data is produced. We exploit a well-known distributed machine learning framework (Hypothesis Transfer Learning), and perform data analytics on mobile nodes passing by IoT devices, in addition to fog gateways at the edge of the network infrastructure. We analyse the performance of different configurations of the distributed learning framework, in terms of (i) accuracy obtained in the learning task and (ii) energy spent to send data between the involved nodes. Specifically, we consider reference wireless technologies for communication between the different types of nodes involved, e.g., LTE, NB-IoT, 802.15.4, 802.11, etc. Our results show that collecting data through the mobile nodes and executing the distributed analytics using short-range communication technologies, such as 802.15.4 and 802.11, makes it possible to reduce the energy consumption of the system by up to 94%, with a loss in accuracy with respect to a centralised cloud solution of up to 2%.
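The energy trade-off mentioned in this abstract (shipping raw samples to the cloud over a cellular link vs. shipping a compact model over a short-range link) can be sketched with a back-of-envelope calculation. All per-byte energy costs and payload sizes below are placeholder assumptions for illustration, not the measurements or figures reported in the paper.

```python
# Hedged sketch: compares transmission energy of sending raw data vs. a model.
# ENERGY_PER_BYTE_J values are PLACEHOLDERS, not the figures used in the paper;
# plug in measured values for a real comparison.
ENERGY_PER_BYTE_J = {          # hypothetical per-byte radio costs (J/byte)
    "LTE": 5e-6,
    "802.11": 1e-6,
    "802.15.4": 3e-7,
}

raw_dataset_bytes = 50_000 * 64        # e.g. 50k samples x 64 bytes (assumed)
model_bytes = 10_000 * 4               # e.g. 10k float32 parameters (assumed)

def tx_energy(n_bytes: int, tech: str) -> float:
    """Energy (J) to transmit n_bytes over the given technology."""
    return n_bytes * ENERGY_PER_BYTE_J[tech]

cloud = tx_energy(raw_dataset_bytes, "LTE")     # raw data up to the cloud
edge = tx_energy(model_bytes, "802.15.4")       # partial model to a nearby node
print(f"cloud upload: {cloud:.2f} J, edge model exchange: {edge:.4f} J")
print(f"energy saving: {100 * (1 - edge / cloud):.1f}%")
```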
... Phenomena such as the digital age, Industry 4.0 (Lin et al., 2018; Pilloni, 2018; Vianna, Graeml, & Peinado, 2020), and smart environments (Valerio, Passarella, & Conti, 2017; Ullo & Sinha, 2020) have turned crowdsourcing platforms into essential tools for organizations, since they are the means for companies to reach data and improve problem solving, product development, and process innovation (Hartmann & Henkel, 2020; Vianna et al., 2020). Thus, trying to better understand the characteristics of these platforms and their functioning, while seeking to consolidate the elements that comprise them, represents a relevant research effort. ...
... With the advent of the digital age and phenomena such as Industry 4.0 (Lin et al., 2018; Pilloni, 2018; Vianna et al., 2020) and smart environments (Ullo & Sinha, 2020; Valerio et al., 2017), there has been an increase in the use of data as an input for organizational processes (Hartmann & Henkel, 2020; Vianna et al., 2020). This data is processed through ubiquitous, distributed, and innovation-enabling digital systems that provide services and products (Reuver et al., 2018; Hein et al., 2020; Rolland, Mathiassen, & Rai, 2018) to their users, often referred to as digital platforms. ...
Article
Full-text available
This article intends to categorize different classifications used in the literature to distinguish among crowdsourcing platform types, based on their characteristics and intents. This is performed by means of a systematic literature review. The search for texts combining the terms 'crowdsourcing' and 'taxonomy' on the Google Scholar platform resulted in 61 potential articles for inclusion in the corpus of the research, which were reduced to 13 after additional filtering. The study shows that taxonomies and classifications of platforms differ from author to author, each adopting his/her own criteria and terminology. The 65 different crowdsourcing classifications found in the reviewed studies were reorganized into 16 groups, based on their characteristics. We believe that the current work contributes to the standardization of the terminology and categorizations adopted in the literature and, therefore, to a better understanding of the crowdsourcing phenomenon.
... Peer-to-peer is another collaborative training architecture, in which participants are equal. Valerio et al. [144] adopt such a training architecture for data analysis. Specifically, participants first perform partial analytics tasks separately on their own data. ...
... Federated learning is a kind of distributed learning [144], [484], [485], which allows training sets and models to be located in different, non-centralized positions, so that learning can occur independently of time and place. This learning paradigm was first proposed by Google; it allows smartphones to collaboratively learn a shared model from their local training data, instead of uploading all data to a central cloud server [39]. ...
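The excerpt above summarizes federated learning at a high level. A minimal sketch of the core server-side aggregation step (a FedAvg-style weighted average of client model parameters) is given below; the client parameter vectors and dataset sizes are hypothetical placeholders standing in for locally trained models.

```python
# Minimal FedAvg-style aggregation sketch: the server averages client model
# weights, weighted by local dataset size, for one communication round.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client parameter vectors (one round of aggregation)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

rng = np.random.default_rng(0)
clients = [rng.normal(size=128) for _ in range(5)]   # hypothetical local models
sizes = [1200, 300, 800, 450, 950]                   # local training-set sizes

global_model = fedavg(clients, sizes)
print(global_model.shape)   # (128,) -- the shared model sent back to clients
```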
Article
Edge intelligence refers to a set of connected systems and devices for data collection, caching, processing, and analysis in proximity to where data are captured, based on artificial intelligence. Edge intelligence aims at enhancing data processing and protecting the privacy and security of the data and users. Although it emerged only recently, spanning the period from 2011 to now, this field of research has shown explosive growth over the past five years. In this article, we present a thorough and comprehensive survey of the literature surrounding edge intelligence. We first identify four fundamental components of edge intelligence, i.e., edge caching, edge training, edge inference, and edge offloading, based on theoretical and practical results pertaining to proposed and deployed systems. We then aim for a systematic classification of the state of the solutions by examining research results and observations for each of the four components and present a taxonomy that includes practical problems, adopted techniques, and application goals. For each category, we elaborate, compare, and analyze the literature from the perspectives of adopted techniques, objectives, performance, advantages and drawbacks, and so on. This article provides a comprehensive survey of edge intelligence and its application areas. In addition, we summarize the development of the emerging research fields and the current state of the art and discuss the important open issues and possible theoretical and technical directions.
... Users’ personal and IoT devices already outnumber core devices, and this trend is not going to stop anytime soon ([N16][C17]). A complementary trend coupled with the expansion of the Internet at the edge is the migration of network and computing functionalities towards the edge ([SPC09], [BMZ12], [FLR2013], [KOM14], [HAHZ15], [LMED15], [YLL15], [BBCM16], [CZ16], [MC2016], [AD17], [MMG17], [RMMS17], [VPC17]). ...
Preprint
The cyber-physical convergence, the fast expansion of the Internet at its edge, and tighter interactions between human users and their personal mobile devices push towards a data-centric Internet where the human user becomes more central than ever. We argue that this will profoundly impact the way data should be handled in the Next Generation Internet. It will require a radical change of the Internet data-management paradigm, from the current platform-centric to a human-centric model. In this paper we present a new paradigm for Internet data management that we name Internet of People (IoP) because it embeds human behavior models in its algorithms. To this end, IoP algorithms exploit quantitative models of humans' individual and social behavior, drawn from sociology, anthropology, psychology, economics, and physics. IoP is not a replacement of the current Internet networking infrastructure, but it exploits legacy Internet services as (reliable) primitives to achieve end-to-end connectivity on a global scale. In this opinion paper, we first discuss the key features of the IoP paradigm along with the underlying research issues and challenges. Then, we present emerging data-management paradigms that are anticipating IoP.
... More sophisticated agents can also be implemented in this framework, notably by adapting existing works in the literature that describe solutions for distributing learning algorithms over a network of machines [40]. While activity recognition itself is out of the scope of this paper, we refer the reader to [30] for a comprehensive and recent tour of the subject. ...
Article
Full-text available
In this paper, a new architecture is proposed for continuously generating, propagating, and delivering information by using event-based communication between independent agents. The resulting system can both handle heterogeneous smart environments and compute information in multiple places. With a communication method working as an abstraction layer, the proposed solution enables the use of multiple technologies at once. Additionally, different options for delivering the resulting data to client applications are explored. The implementation of this design as a platform written in Java with the Spring Framework is also presented, along with its handling of ten housing facilities equipped with various sensors (electromagnetic contacts, smart plugs, motion detectors, humidity, temperature, and light sensors). This paper is then concluded by an analysis of the platform workloads incurred by the tracking of a set of low-level activities. Finally, the code is distributed online for the benefit of the community.
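A minimal sketch of the event-based, agent-to-agent communication pattern described above is given below as a tiny in-process publish/subscribe bus. The topics and agents are invented for illustration, and the actual platform is implemented in Java with the Spring Framework; this Python snippet only illustrates the pattern.

```python
# Illustrative in-process publish/subscribe bus showing event-based
# communication between independent agents, as described in the abstract.
# Topics and agents below are invented; the real platform is Java/Spring.
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        for handler in self._subscribers[topic]:
            handler(event)

bus = EventBus()

# An "aggregator" agent reacts to raw sensor events and re-publishes
# higher-level information, without knowing which agent produced the event.
def motion_aggregator(event):
    if event["value"]:
        bus.publish("activity/presence", {"room": event["room"], "present": True})

bus.subscribe("sensor/motion", motion_aggregator)
bus.subscribe("activity/presence", lambda e: print("presence detected:", e))

bus.publish("sensor/motion", {"room": "kitchen", "value": True})
```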
Chapter
The pervasiveness of mobile devices is a common phenomenon nowadays, and with the emergence of the Internet of Things (IoT), an increasing number of connected devices are being deployed. In Smart Cities, data collection, processing, and distribution play critical roles in everyday quality of life and in city planning and development. The use of Cloud computing to support the massive amounts of data generated and consumed in Smart Cities has some limitations, such as increased latency and substantial network traffic, hampering support for a variety of applications that need low response times. In this chapter, we introduce and discuss aspects of distributed multi-tiered Mobile Edge Computing (MEC) architectures, which offer data storage and processing capabilities closer to data sources and data consumers, taking into account how mobility impacts the management of such infrastructure. The main goal is to address how such infrastructure can be used to support content distribution from and to mobile users, how to optimize resource allocation in such infrastructure, and how an intelligent layer can be added to the MEC/Fog infrastructure. Furthermore, a multifaceted literature review is given, and the open issues and challenging aspects of resource and application management are also discussed in this chapter.
Article
Integrating distance English teaching with intelligent learning from a big data perspective is a theoretical innovation in teaching concepts. Acquiring distance English teaching data requires automatic collection, so that data gathering becomes systematic, standardized, sustainable, and real-time. The early-stage design and storage of data resources are subject to strict requirements. Analysing distance English teaching data provides a guarantee for future intelligent learning, but it is challenging because teaching data are complex, continuously changing, and dynamic, the laws of change are difficult to mine, and the data are highly volatile due to individual differences. Exploring the inherent laws and value of distance English teaching data will effectively improve learners' learning efficiency and the overall level of distance education, further expand educational opportunities, enhance employability, and provide reference value for other related fields. Therefore, it is important and urgent to study scientific methods suitable for distance English intelligent learning from a big data perspective.
Article
The Quick Service Restaurant industry is a massive sector with a huge and ever-increasing share of the global food market. Efficient resource management is crucial to optimize service and avoid massive amounts of waste in such a huge domain. This requires fully automated intelligence using the power of the Internet of Things (IoT), instead of human-based methods that are inefficient and prone to errors. In an IoT platform, edge computing is a vital technology to provide low latency, less redundancy, resource utilization, extra security, and real-time decisions. In this paper, an edge-oriented IoT architecture for Quick Service Restaurants is proposed. In the proposed architecture, data is collected from a variety of wireless sensor nodes and data sources and processed at the edge to make predictions, to create timely and meaningful alerts, and to make intelligent decisions with the aim of waste management and reduction. For this purpose, the work mainly focuses on anomaly/outlier detection and production service level estimation by incorporating lightweight clustering and classification techniques. Several experiments are performed in a real restaurant environment, and the results show that the IoT-based automation system is capable of correctly deciding on production levels in advance, as well as triggering alerts in case of anomalous waste conditions.
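As a rough sketch of the kind of lightweight, clustering-based anomaly detection mentioned above (not the paper's actual pipeline), one can flag readings that lie far from their nearest cluster centroid; the feature values, cluster count, and threshold below are invented for illustration.

```python
# Hedged sketch of lightweight, clustering-based anomaly detection at the edge:
# fit k-means on normal sensor readings, then flag new readings whose distance
# to the nearest centroid exceeds a threshold. Data and threshold are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# e.g. (temperature, waste rate) readings under normal operation (synthetic)
normal = rng.normal(loc=[20.0, 0.5], scale=[1.0, 0.05], size=(500, 2))

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(normal)
dists = np.min(km.transform(normal), axis=1)
threshold = np.percentile(dists, 99)          # tolerate 1% of normal readings

def is_anomalous(reading: np.ndarray) -> bool:
    """True if the reading is far from every learned cluster centroid."""
    return np.min(km.transform(reading.reshape(1, -1))) > threshold

print(is_anomalous(np.array([20.3, 0.52])))   # likely False (normal operation)
print(is_anomalous(np.array([35.0, 2.0])))    # likely True (anomalous waste)
```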
Chapter
Present cloud training (or cloud–edge training) faces challenges in AI services requiring continuous learning and data privacy. Naturally, the edge architecture, which consists of a large number of edge nodes with modest computing resources, can help alleviate the pressure on networks and protect data privacy by processing data or training models locally. Training at the edge, or potentially across “end–edge–cloud,” treating the edge as the core architecture of training, is called “AI Training at Edge.” Such training may require significant resources to digest distributed data and exchange updates within the hierarchical structure. In particular, FL is an emerging distributed learning setting that is promising for addressing these issues. For devices with diverse capabilities and limited network conditions in edge computing, FL can protect privacy while handling non-IID training data, and has promising scalability in terms of efficient communication, resource optimization, and security. As the principal content of this chapter, some selected works on FL are listed in the first table of this chapter.
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
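The pipeline summarized above (fine-scale gradients, fine orientation binning, coarser spatial binning, block-normalized descriptors fed to a linear SVM) can be sketched with scikit-image and scikit-learn as below. The parameter values mirror typical HOG defaults rather than the exact settings of the original paper, and the image arrays and labels are random placeholders for real detection windows.

```python
# Hedged sketch of a HOG + linear SVM detector in the spirit of the abstract:
# compute block-normalized HOG descriptors and train a linear SVM on them.
# Random "images" and labels are placeholders; parameters approximate common
# HOG settings, not necessarily those of the original paper.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
windows = rng.random((40, 128, 64))        # 40 grayscale 128x64 windows (fake)
labels = rng.integers(0, 2, size=40)       # 1 = person, 0 = background (fake)

features = np.array([
    hog(w,
        orientations=9,                    # fine orientation binning
        pixels_per_cell=(8, 8),            # relatively coarse spatial binning
        cells_per_block=(2, 2),            # overlapping descriptor blocks
        block_norm="L2-Hys")               # local contrast normalization
    for w in windows
])

clf = LinearSVC(C=0.01).fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```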
Article
As On-line Social Networks are nowadays largely used by mobile users and their posts potentially reveal - either explicitly or implicitly - much sensitive information about users, privacy control becomes a fundamental issue in such Mobile Social Networks (MSNs). In this paper we advocate that situational computing is the key ingredient for the development of effective mechanisms for privacy control in MSNs. We first describe an on-line survey carried out in order to understand users' requirements regarding privacy when using MSNs. The results suggest that users have dynamic and context-dependent privacy requirements, and also pinpoint which types of context data are more relevant to the decision about the user's willingness to share MSN content. Based on these findings, we propose SelPri, a solution developed as a proof of concept in the form of an Android mobile social application integrated with Facebook. SelPri employs Fuzzy Logic to autonomously and dynamically adapt the privacy settings of posts in MSNs according to the user's current situation, freeing the user from the hassle of manually configuring the privacy settings whenever his/her situation changes. We also describe evaluations of the user experience with SelPri, conducted to assess its accuracy in identifying user situations, and its usability and effectiveness in meeting the user's dynamic and contextual privacy requirements.
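A toy illustration of the context-driven, fuzzy-style decision described above might look like the following; it is not SelPri's actual rule base, and the context variables, membership functions, and rules are invented for illustration.

```python
# Toy fuzzy-style decision sketch inspired by the abstract: map context inputs
# to a post-privacy level via simple membership functions and hand-written
# rules. Variables, memberships, and rules are invented, not SelPri's own.
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def privacy_level(location_sensitivity: float, audience_closeness: float) -> str:
    """Both inputs are in [0, 1]; returns a privacy setting for the post."""
    sensitive = tri(location_sensitivity, 0.4, 1.0, 1.6)   # "sensitive place"
    close = tri(audience_closeness, 0.4, 1.0, 1.6)         # "close friends"
    # Rule 1: sensitive place AND distant audience -> restrict the post
    restrict = min(sensitive, 1.0 - close)
    # Rule 2: non-sensitive place OR close audience -> share the post
    share = max(1.0 - sensitive, close)
    return "only me" if restrict > share else "friends"

print(privacy_level(0.9, 0.2))   # sensitive place, distant audience -> "only me"
print(privacy_level(0.1, 0.8))   # casual place, close friends -> "friends"
```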
Article
Mobile capped plans are being increasingly adopted by mobile operators due to exponential data traffic growth. Users then often suffer high data consumption costs as well as poor quality of experience. In this paper, we introduce a novel content access scheme, Crowd-Cache, which enables mobile networking in proximity by exploiting the transient co-location of devices, the epidemic nature of content popularity, and the capabilities of smart mobile devices. Crowd-Cache provides mobile users with cheap, low-latency access to popular content while improving the overall quality of experience. We model the Crowd-Cache system in a probabilistic framework using a real-life dataset of video content access. The simulation results show that, in a public transportation scenario, more than 80% of the passengers can save at least 40% of their cellular data usage during a typical city bus commute of 10 minutes. Finally, we show the practical viability of the system by implementing and evaluating it on Android devices.
Conference Paper
Many of today’s popular online social networks are disconnected from their users’ immediate social and physical context, which makes them poorly suited for supporting transient, on-purpose social communities of co-located users. We introduce the idea of a local dataspace that can mediate social interactions via freely user-modifiable shared content. We demonstrate this concept via an opportunistic experience-sharing application, Here&Now.
Conference Paper
It is commonly assumed that in a smart city there will be thousands of mostly mobile/wireless smart devices (e.g. sensors, smart-phones, etc.) that will continuously generate big amounts of data. Data will have to be collected and processed in order to extract knowledge out of it, to feed users' and smart city applications. A typical approach to process such big amounts of data is to i) gather all the collected data on the cloud through wireless pervasive networks, and ii) perform data analysis operations exploiting machine learning techniques. However, according to many studies, this centralised cloud-based approach may not be sustainable from a networking point of view. The joint effect of data-intensive users' multimedia applications and smart city monitoring and control applications may result in severe network congestion, making applications hardly usable. To cope with this problem, in this paper we propose a distributed machine learning approach that does not require moving data to a centralised cloud platform, but processes it directly where it is collected. Specifically, we exploit Hypothesis Transfer Learning (HTL) to build a distributed machine learning framework. In our framework we train a series of partial models, each "residing" in a location where a subset of the dataset is generated. We then refine the partial models by exchanging them between locations, thus obtaining a unique complete model. Using an activity classification task on a reference dataset as a concrete example, we show that the classification accuracy of the HTL model is comparable with that of a model built out of the complete dataset, but the cost in terms of network overhead is dramatically reduced. We then perform a sensitivity analysis to characterise how the overhead depends on key parameters. It is also worth noticing that the HTL approach is suitable for applications dealing with privacy-sensitive data, as data can stay where it is generated and does not need to be transferred to third parties, i.e., to a cloud provider, to extract knowledge from it.
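To make the network-overhead argument concrete, a simple comparison of the bytes a location would send under the two approaches (uploading its full local dataset vs. exchanging a serialized partial model) can be sketched as follows; the dataset dimensions and model size are illustrative assumptions, not the figures reported in the paper.

```python
# Hedged sketch of the overhead comparison in the abstract: bytes sent when
# uploading raw local data to the cloud vs. exchanging partial models.
# Sample counts, feature counts, and model size are illustrative assumptions.
N_LOCATIONS = 10
SAMPLES_PER_LOCATION = 20_000
FEATURES = 50                      # float32 features per sample (assumed)
CLASSES = 6                        # e.g. activity classes (assumed)
BYTES_PER_FLOAT = 4

# Centralised approach: every location ships its raw samples to the cloud.
raw_bytes = N_LOCATIONS * SAMPLES_PER_LOCATION * FEATURES * BYTES_PER_FLOAT

# HTL-like approach: every location ships a linear partial model
# (one weight vector per class plus a bias term) to the other locations.
model_bytes = (FEATURES + 1) * CLASSES * BYTES_PER_FLOAT
exchanged_bytes = N_LOCATIONS * (N_LOCATIONS - 1) * model_bytes  # all-to-all

print(f"raw data upload:      {raw_bytes / 1e6:.1f} MB")
print(f"model exchange total: {exchanged_bytes / 1e6:.3f} MB")
print(f"overhead reduction:   {100 * (1 - exchanged_bytes / raw_bytes):.2f}%")
```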
Article
The Internet of Things (IoT) is a novel paradigm relying on the interaction of smart objects (things) among each other and with physical and/or virtual resources through the Internet. Despite the recent advances that have made IoT a reality, there are several challenges to be addressed towards exploiting its full potential and promoting tangible benefits to society, the environment, the economy, and individual citizens. Recently, Cloud Computing has been advocated as a promising approach to tackle some of the existing challenges in IoT while leveraging its adoption and bringing new opportunities. With the combination of IoT and Cloud Computing, the cloud becomes an intermediate layer between smart objects and the applications that make use of data and resources provided by these objects. On the one hand, IoT can benefit from the almost unlimited resources of Cloud Computing to implement management and composition of services related to smart objects and their provided data. On the other hand, the cloud can benefit from IoT by broadening its operation scope to deal with real-world objects. In spite of this synergy, the literature still lacks a broad, comprehensive overview of what has been investigated on the integration of IoT and Cloud Computing and of the open issues to be addressed in future research and development. The goal of this work is to fill this gap by systematically collecting and analyzing studies available in the literature aiming to: (i) obtain a comprehensive understanding of the integration of the IoT and Cloud Computing paradigms; (ii) provide an overview of the current state of research on this topic; and (iii) identify important gaps in the existing approaches as well as promising research directions. To achieve this goal, a systematic mapping study was performed covering papers recently published in journals, conferences, and workshops, available in five relevant electronic databases. As a result, 35 studies were selected presenting strategies and solutions on how to integrate IoT and Cloud Computing, as well as scenarios, research challenges, and opportunities in this context. Besides confirming the increasing interest in the integration of IoT and Cloud Computing, this paper reports the main outcomes of the performed systematic mapping by both presenting an overview of the state of the art on the investigated topic and shedding light on important challenges and potential directions for future research.