
Computing (2021) 103:1859–1878
https://doi.org/10.1007/s00607-020-00900-y
REGULAR PAPER
The evolution ofdistributed computing systems:
fromfundamental tonew frontiers
DominicLindsay1 · SukhpalSinghGill2 · DariaSmirnova1·
PeterGarraghan1
Received: 6 February 2020 / Accepted: 28 December 2020 / Published online: 30 January 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature 2021
Abstract
Distributed systems have been an active field of research for over 60 years and have played a crucial role in computer science, enabling the invention of the Internet that underpins all facets of modern life. Through technological advancements and their changing role in society, distributed systems have undergone a perpetual evolution, with each change resulting in the formation of a new paradigm. Each new distributed system paradigm, of which cloud computing, Fog computing, and the Internet of Things (IoT) are prominent modern examples, allows for new forms of commercial and artistic value, yet also ushers in new research challenges that must be addressed in order to realize and enhance its operation. It is therefore necessary to identify precisely which factors drive the formation and growth of a paradigm, and how unique the research challenges within modern distributed systems are in comparison to prior generations of systems. The objective of this work is to study and evaluate the key factors that have influenced and driven the evolution of distributed system paradigms, from early mainframes and the inception of the global inter-network to contemporary systems such as edge computing, Fog computing, and the IoT. Our analysis highlights that the assumptions which have driven distributed systems appear to be changing, including (1) an accelerated fragmentation of paradigms driven by commercial interests and the physical limitations imposed by the end of Moore's law, (2) a transition away from generalized architectures and frameworks towards increasing specialization, and (3) a pivot within each paradigm architecture between centralized and decentralized coordination. Finally, we discuss present-day and future challenges of distributed systems research, pertaining to studying complex phenomena at scale and the role of distributed systems research in the context of climate change.
Keywords Distributed computing · Computing systems · Evolution · Green computing
* Sukhpal Singh Gill
s.s.gill@qmul.ac.uk
... Due to its revolutionary impact on the industry, distributed computing has continued to be one of the most significant developments in the computing world this century. The application of multiuser, multiprogramming, and multi-agent systems has expanded (Dominic et al., 2021). A distributed computing approach shares the parts of a software system among multiple computers, or nodes. ...
Article
Cyberspace has witnessed a surge in different forms of attack, with the user now at the losing end. The need to improve the performance of cryptosystems has driven the development of systems that are almost impenetrable to attackers and that improve the confidentiality of data transmissions over the network, especially in distributed systems. A lot of research has recently been conducted on several neural-network-based encryption techniques using single-layer or multilayer perceptron models. Hybrid combinations of conventional cryptosystems and neural cryptosystems have been found effective in curbing incessant network data breaches. In this paper, a framework for the development of a hybrid data encryption system is presented. Some of the known techniques are analysed and the steps for the development of the hybrid algorithm are presented. Finally, this work develops an algorithm for an enhanced cryptosystem combining the IDEA (International Data Encryption Algorithm) and Artificial Feed Forward Neural Network (AFFNN) algorithms.
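By way of illustration only, and not as the authors' actual algorithm, the sketch below shows the general shape of such a hybrid pipeline in Python: a conventional symmetric stage followed by a keystream derived from a small feed-forward network. A repeating-key XOR stands in for IDEA so the example stays self-contained, and every function name and parameter here is hypothetical.

```python
import numpy as np

def conventional_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Stand-in for the conventional stage (IDEA in the paper); here a
    repeating-key XOR so the sketch stays self-contained."""
    return bytes(p ^ key[i % len(key)] for i, p in enumerate(plaintext))

def neural_keystream(seed: int, length: int) -> bytes:
    """Hypothetical feed-forward network (two dense layers with fixed random
    weights derived from a shared seed) used as a keystream generator."""
    rng = np.random.default_rng(seed)
    w1, w2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 1))
    stream = bytearray()
    for i in range(length):
        x = np.array([(i >> b) & 1 for b in range(8)], dtype=float)
        y = np.tanh(np.tanh(x @ w1) @ w2)[0]          # forward pass
        stream.append(int((y + 1.0) * 127.5) & 0xFF)  # squash output to a byte
    return bytes(stream)

def hybrid_encrypt(plaintext: bytes, key: bytes, seed: int) -> bytes:
    stage1 = conventional_encrypt(plaintext, key)
    ks = neural_keystream(seed, len(stage1))
    return bytes(a ^ b for a, b in zip(stage1, ks))

def hybrid_decrypt(ciphertext: bytes, key: bytes, seed: int) -> bytes:
    ks = neural_keystream(seed, len(ciphertext))
    stage1 = bytes(a ^ b for a, b in zip(ciphertext, ks))
    return conventional_encrypt(stage1, key)  # XOR stand-in is its own inverse

msg = b"distributed systems"
assert hybrid_decrypt(hybrid_encrypt(msg, b"k3y", 42), b"k3y", 42) == msg
```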
... represents not merely a technical evolution but a fundamental reimagining of computational models that will define digital infrastructure for decades to come" [1]. As Bakshi and colleagues note in their comprehensive historical analysis, this evolution can be traced through distinct architectural epochs, each representing a 3-5× increase in system complexity and a corresponding expansion of capabilities [3]. ...
Article
Full-text available
Distributed systems form the backbone of our modern digital infrastructure, powering everything from global financial transactions to streaming services and real-time communications. As these systems evolve from simple client-server architectures to complex interconnected components spanning multiple geographic regions, they face transformative trends and significant challenges. Edge computing, serverless architectures, AI-driven operations, data mesh principles, chaos engineering, and sustainability initiatives are reshaping how distributed systems are designed and operated. Meanwhile, organizations must navigate expanding security concerns, complex regulatory requirements, growing operational complexity, and interoperability challenges. This article explores the future landscape of large-scale distributed systems, offering insights into how forward-thinking organizations can leverage emerging patterns and technologies to deliver more responsive, reliable, and efficient services while maintaining security and environmental responsibility in an increasingly connected world.
... The establishment of such systems was made possible by several crucial technological advances, such as advanced networking, specialized hardware, and development of the abovementioned supercomputers. Fast data transfer speeds between nodes were made available by low-latency networking technologies, while specialized hardware such as Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) significantly enhanced the efficiency and performance of the computations [26][27][28][29][30][31][32]. ...
Preprint
Full-text available
High-Performance Computing (HPC) revolutionized the field of computational science by enabling it to process vast amounts of data and execute complex simulations at remarkable speed and efficiency. This paper describes various paradigms of computing, namely client-server computing, distributed computing, cloud computing, and edge computing, and elaborates on their technology drivers. Particular focus is given to HPC, its architecture, programming models, performance metrics, scalability, and applications. The article highlights the significant impact of HPC on science in the fields of medicine, biophysics, business, and engineering by facilitating paradigm-altering scientific discoveries, economic forecasting, and simulations of complex engineering. In addition, the article discusses challenges to deploying HPC, such as scalability, resource management, and the integration of multi-core processors. Through comparative analysis and benchmarking techniques, the research points to the necessity of continuous hardware and software innovation to keep HPC systems efficient and sustainable. The study brings out the revolutionary potential of HPC in modern computing and its central role in solving the most complex computational problems of the modern age.
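As a small, hedged companion to the scalability discussion above, the Python sketch below evaluates Amdahl's law, a standard upper bound on parallel speedup; the 95% parallel fraction is an arbitrary illustrative value rather than a figure from the paper.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)

# With 95% of the work parallelizable, speedup saturates well below the
# processor count, illustrating the scalability ceiling of HPC codes.
for n in (8, 64, 1024):
    print(f"{n:5d} processors -> speedup {amdahl_speedup(0.95, n):6.2f}")
# The speedup approaches 1 / 0.05 = 20 no matter how many processors are added.
```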
... Instead of concentrating data processing and resource management in clusters, this approach spreads the tasks across a broader set of fog nodes. Each fog node within the network can independently process data, execute applications, and manage resources (Lindsay et al., 2021;Dastjerdi et al., 2016). The core operational methodologies underpinning distributed architecture in fog computing encompass the following (Jin et al., 2018;Naha et al., 2018): (i) Decentralized Processing: Fog nodes are equipped to independently handle data processing and application execution, reducing the need for centralized cluster heads. ...
Article
Full-text available
Undoubtedly, energy efficiency forms a fundamental pillar of the fog computing model. Processing data at the network's edge leads to a substantial reduction in energy consumption when contrasted with the alternative of transmitting all data to remote data centers, typically associated with cloud computing. This energy-saving approach not only promotes a more environmentally friendly footprint but also serves to prolong the operational life of battery-powered IoT devices, a particularly critical aspect in remote or challenging-to-access environments. Thus, fog computing plays a crucial role in the operation of massive energy-saving IoT, or green IoT, networks. This study offers a comprehensive survey of recent research endeavors focused on achieving energy-efficient and eco-friendly fog computing solutions for IoT networks. The article begins with an introductory overview of fog computing and subsequently delves into an in-depth exploration of various energy-conservation techniques tailored for fog computing environments. These techniques encompass energy-conscious architectural designs, data aggregation and compression strategies, low-power hardware implementations, energy-aware scheduling methods, task offloading mechanisms, resource utilization optimization, virtualization techniques, and energy harvesting approaches. In addition, this investigation introduces novel methodologies and outlines prospective research pathways to bolster the energy efficiency of fog computing. Moreover, practical applications are presented to highlight the potential advantages and obstacles associated with deploying energy-conscious strategies, providing insights into their effectiveness and practical implications in real-world scenarios. Essentially, this article can be considered a roadmap towards the realization of a sustainable fog computing ecosystem for extensive IoT networks; it also opens the door for interested researchers to follow and continue the vision of energy-efficient computing.
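To make the task-offloading idea concrete, here is a minimal sketch under assumed device parameters (the CPU frequency, power draws, and link bandwidth are invented for illustration): it compares the estimated energy of executing a task locally against the energy of transmitting its input for remote execution, using the common cycles-per-frequency and bits-per-bandwidth models.

```python
from dataclasses import dataclass

@dataclass
class Task:
    cpu_cycles: float   # cycles needed to process the task
    input_bits: float   # size of the data to transmit if offloaded

def local_energy(task: Task, freq_hz: float, active_power_w: float) -> float:
    """Energy (J) of running the task on the local node: power * time."""
    return active_power_w * (task.cpu_cycles / freq_hz)

def offload_energy(task: Task, bandwidth_bps: float, tx_power_w: float) -> float:
    """Energy (J) spent transmitting the task input to a fog/cloud node."""
    return tx_power_w * (task.input_bits / bandwidth_bps)

def should_offload(task: Task) -> bool:
    # Illustrative device parameters: 1 GHz CPU at 0.9 W, 10 Mbit/s link at 0.3 W.
    return offload_energy(task, 10e6, 0.3) < local_energy(task, 1e9, 0.9)

heavy_small = Task(cpu_cycles=5e9, input_bits=2e6)  # compute-heavy, small input
light_large = Task(cpu_cycles=1e8, input_bits=8e7)  # light compute, large input
print(should_offload(heavy_small))  # True: transmission costs less energy than computing
print(should_offload(light_large))  # False: sending the data costs more energy
```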
... In the last decade, two factors have altered the way software is architected: containerization and service-oriented architecture [1,21,22,23]. Building software to run in containers is not a new concept. Containerization has been around for decades; it was created to keep the kernel safe by isolating the running software so that it cannot take all of the host's resources and compromise its stability. ...
Preprint
Full-text available
The modern datacenter's computing capabilities have far outstripped the applications running within it and have become a hidden cost of doing business due to how software is architected and deployed. Resources are over-allocated to monolithic applications that sit idle for large parts of the day. If applications were architected and deployed differently, shared services could be used for multiple applications as needed. When combined with powerful orchestration software, containerized microservices can both deploy and dynamically scale applications from very small to very large within moments, scaling an application not only across a single datacenter but across all datacenters where the application is deployed. In this paper, we analyze data from an application deployed both as a single monolithic codebase and as a containerized application using a microservice-based architecture, to compare performance and quantify the computing resources wasted under each way of architecting and deploying it. A modern approach is offered as a path from a monolithic codebase to a more efficient, reliable, scalable, and less costly deployment model.
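The kind of waste calculation described above can be sketched as follows; the utilization traces and the 1.2x autoscaling headroom are assumptions for illustration, not the paper's measurements.

```python
def wasted_core_hours(allocated_cores: float, used_cores: list[float]) -> float:
    """Idle capacity accumulated over hourly utilization samples."""
    return sum(max(allocated_cores - u, 0.0) for u in used_cores)

# Hypothetical day of hourly CPU usage for the same workload.
monolith_usage = [2.0] * 8 + [10.0] * 8 + [2.0] * 8        # fixed 16-core VM
microservice_usage = [2.5] * 8 + [10.5] * 8 + [2.5] * 8    # autoscaled containers

monolith_waste = wasted_core_hours(16.0, monolith_usage)
# Containers are assumed to scale their allocation to ~1.2x current demand each hour.
microservice_waste = sum(max(1.2 * u - u, 0.0) for u in microservice_usage)

print(f"monolith waste:      {monolith_waste:.1f} core-hours")      # 272.0
print(f"microservices waste: {microservice_waste:.1f} core-hours")  # 24.8
```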
Article
The paper introduces TinyThunder, an asynchronous Byzantine fault tolerance (BFT) protocol designed to minimize communication overhead during inter-node message transmission. Regardless of the original transaction sizes, TinyThunder optimizes the acknowledgment overhead for a transaction to a constant size (e.g. 8 bytes). This optimization is based on a key observation in BFT systems: each transaction is redundantly stored by at least one honest node. Instead of transmitting the original transaction, TinyThunder only needs to send a specific feature value to confirm a transaction, leading to the development of our new compact reliable broadcast protocol. Additionally, we introduce a novel block compensation protocol that ensures the consistency of recovering these feature values and enables TinyThunder to achieve the desirable property of strong validity. The implementation and evaluation of TinyThunder in large-scale wide-area network environments demonstrate its superiority over the well-known HoneyBadgerBFT, with higher throughput (increased by 122%) and lower latency (reduced by 54%). Notably, TinyThunder also exhibits significant bandwidth savings for larger individual transaction sizes. For transactions of 250B in size, TinyThunder reduces bandwidth consumption by 56% compared to HoneyBadgerBFT.
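The constant-size acknowledgment idea can be pictured with the simplified sketch below. It is an illustration of the general principle rather than TinyThunder's actual protocol: an 8-byte digest stands in for its feature values, and a local dictionary stands in for the redundant transaction storage held by honest nodes.

```python
import hashlib

def feature_value(tx: bytes) -> bytes:
    """Constant 8-byte identifier sent in place of the full transaction."""
    return hashlib.sha256(tx).digest()[:8]

class Node:
    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}  # transactions this node already holds

    def receive_tx(self, tx: bytes) -> None:
        self.store[feature_value(tx)] = tx

    def confirm(self, fv: bytes):
        """Reconstruct the transaction from its 8-byte feature value if held
        locally; a real protocol would otherwise run a compensation round."""
        return self.store.get(fv)

honest = Node()
tx = b"transfer 5 coins from A to B"
honest.receive_tx(tx)             # at least one honest node stores the full tx
ack = feature_value(tx)           # only 8 bytes cross the network
assert honest.confirm(ack) == tx  # full payload recovered locally
```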
Chapter
Cloud computing has been a driving force for many technological innovations and transformations in various domains and industries. It offers scalable and cost-effective storage and processing capabilities suitable for handling large volumes of structured and unstructured data. Cloud computing has enabled organizations to leverage data-driven insights and decision making, as well as to create new products and services based on data analysis.
Article
Full-text available
Cloud computing has emerged as a dominant platform for computing for the foreseeable future. A key factor in the adoption of this technology is its security and reliability. This article addresses a key challenge: the secure allocation of resources. The authors propose a security-based resource allocation model for the execution of cloud workloads called STARK. The solution is designed to ensure security against probing, User to Root (U2R), Remote to Local (R2L) and Denial of Service (DoS) attacks during the execution of heterogeneous cloud workloads. Further, this paper highlights promising directions for future research.
Article
Full-text available
Current cloud computing frameworks host millions of physical servers that utilize cloud computing resources in the form of different virtual machines. Cloud Data Center (CDC) infrastructures require significant amounts of energy to deliver large-scale computational services. Moreover, computing nodes generate large volumes of heat, requiring cooling units in turn to eliminate the effect of this heat. Thus, the overall energy consumption of the CDC increases tremendously for servers as well as for cooling units. However, current workload allocation policies do not take the effect on temperature into account, and it is challenging to simulate the thermal behaviour of CDCs. There is a need for a thermal-aware framework to simulate and model the behaviour of nodes and measure the important performance parameters which can be affected by their temperature. In this paper, we propose a lightweight framework, ThermoSim, for modelling and simulation of thermal-aware resource management for cloud computing environments. This work presents a Recurrent Neural Network based deep learning temperature predictor for CDCs, which is utilized by ThermoSim for lightweight resource management in constrained cloud environments. ThermoSim extends the CloudSim toolkit, helping to analyse the performance of various key parameters such as energy consumption, service level agreement violation rate, number of virtual machine migrations and temperature during the management of cloud resources for the execution of workloads. Further, different energy-aware and thermal-aware resource management techniques are tested using the proposed ThermoSim framework in order to validate it against an existing framework (Thas). The experimental results demonstrate that the proposed framework is capable of modelling and simulating the thermal behaviour of a CDC and that ThermoSim is better than Thas in terms of energy consumption, cost, time, memory usage and prediction accuracy.
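A heavily simplified, hedged sketch of the kind of decision such a framework supports is shown below: a placeholder temperature predictor (a linear utilization model, not the paper's recurrent neural network) gates virtual machine placement against a thermal threshold. The coefficients and the threshold are invented for illustration.

```python
def predict_temperature(ambient_c: float, cpu_util: float) -> float:
    """Placeholder predictor: linear in utilization. ThermoSim itself uses a
    trained recurrent neural network; the 30-degree span here is arbitrary."""
    return ambient_c + 30.0 * cpu_util

def can_place_vm(host_util: float, vm_util: float,
                 ambient_c: float = 22.0, threshold_c: float = 45.0) -> bool:
    """Thermal-aware admission: accept the VM only if the host's predicted
    temperature after placement stays under the threshold."""
    return predict_temperature(ambient_c, host_util + vm_util) < threshold_c

print(can_place_vm(host_util=0.40, vm_util=0.20))  # True: 40 C predicted
print(can_place_vm(host_util=0.70, vm_util=0.20))  # False: 49 C predicted
```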
Article
Full-text available
Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.
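One widely used detection heuristic covered by this line of work flags tasks whose progress rate falls well behind their peers and launches speculative copies elsewhere. The sketch below shows that rule with an illustrative 50% threshold; the threshold and the sample progress rates are assumptions, not values from the review.

```python
from statistics import median

def find_stragglers(progress_rates: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Flag tasks progressing at less than `threshold` times the median rate,
    the classic trigger for launching a speculative duplicate elsewhere."""
    med = median(progress_rates.values())
    return [task for task, rate in progress_rates.items() if rate < threshold * med]

rates = {"map-01": 0.98, "map-02": 1.02, "map-03": 0.95, "map-04": 0.31}
print(find_stragglers(rates))  # ['map-04'] -> candidate for speculative re-execution
```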
Chapter
Full-text available
The growing number of IoT devices brings challenges to existing centralised computing systems. Existing security protocols are unable to protect the security and privacy of user data, and current IoT systems rely on a centralised model. A decentralised IoT system would not only reduce infrastructure cost but also provide a standardised peer-to-peer communication model for massive numbers of transactions. However, the peer-to-peer communication model faces a big security challenge. Blockchain technology ensures transparent interactions between different parties in a more secure and trusted way using a distributed ledger and the proof-of-work (PoW) consensus algorithm. Blockchain enables trustless, peer-to-peer communication and has already proven its worth in the world of financial services. The idea of blockchain can be applied to IoT systems to deal with the issues of scale, trustworthiness and decentralisation, thereby allowing billions of devices to share the same network without the need for additional resources. However, the limited processing power, storage size and energy budget of IoT devices are a major point of concern for blockchain cryptographic functions. Moreover, efficiency, reliability and interoperability among blockchains still need to be addressed. This chapter presents basic concepts of blockchain and investigates the feasibility of blockchain in Internet of Things settings.
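For readers unfamiliar with the proof-of-work consensus mentioned here, the minimal sketch below shows the core mining loop: search for a nonce whose hash has a required number of leading zeros. It is purely illustrative, and even small difficulties like this hint at why PoW is costly for constrained IoT devices.

```python
import hashlib

def mine(block_data: bytes, difficulty: int = 4) -> tuple[int, str]:
    """Find a nonce so that SHA-256(block_data || nonce) starts with
    `difficulty` hex zeros -- the proof-of-work puzzle in miniature."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, digest = mine(b"device-42 reading: 21.5C")
print(nonce, digest)  # verification needs one hash; finding the nonce took ~16**4 tries
```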
Conference Paper
Full-text available
Containerized clusters of machines at scale that provision Cloud services are encountering substantive difficulties with stragglers -- whereby a small subset of task execution negatively degrades system performance. Stragglers are an unsolved challenge due to a wide variety of root-causes and stochastic behavior. While there have been efforts to mitigate their effects, few works have attempted to empirically ascertain how system operational scenarios precisely influence straggler occurrence and severity. This challenge is further compounded with the difficulties of conducting experiments within real-world containerized clusters. System maintenance and experiment design are often error-prone and time-consuming processes, and a large portion of tools created for workload submission and straggler injection are bespoke to specific clusters, limiting experiment reproducibility. In this paper we propose PRISM, a framework that automates containerized cluster setup, experiment design, and experiment execution. Our framework is capable of deployment, configuration, execution, performance trace transformation and aggregation of containerized application frameworks, enabling scripted execution of diverse workloads and cluster configurations. The framework reduces time required for cluster setup and experiment execution from hours to minutes. We use PRISM to conduct automated experimentation of system operational conditions and identify straggler manifestation is affected by resource contention, input data size and scheduler architecture limitations.
Article
Full-text available
Cloud computing plays a critical role in modern society and enables a range of applications from infrastructure to social media. Such systems must cope with varying load and evolving usage, reflecting society's interaction with and dependency on automated computing systems, whilst satisfying Quality of Service (QoS) guarantees. Enabling these systems is a cohort of conceptual technologies, synthesized to meet the demands of evolving computing applications. In order to understand the current and future challenges of such systems, there is a need to identify the key technologies enabling future applications. In this study, we explore how three emerging paradigms (Blockchain, IoT and Artificial Intelligence) will influence future cloud computing systems. Further, we identify several technologies driving these paradigms and invite international experts to discuss the current status and future directions of cloud computing. Finally, we propose a conceptual model for cloud futurology to explore the influence of emerging paradigms and technologies on the evolution of cloud computing.
Article
It is a long-standing challenge to achieve a high degree of resource utilization in cluster scheduling. Resource oversubscription has become a common practice for improving resource utilization and reducing cost. However, current centralized approaches to oversubscription suffer from resource mismatch and fail to take into account other performance requirements, e.g., tail latency. In this paper we present ROSE, a new resource management platform capable of conducting performance-aware resource oversubscription. ROSE allows latency-sensitive long-running applications (LRAs) to co-exist with computation-intensive batch jobs. Instead of waiting for resource allocation to be confirmed by the centralized scheduler, job managers in ROSE can independently request to launch speculative tasks on specific machines according to their suitability for oversubscription. Node agents on those machines can nevertheless avoid excessive resource oversubscription through an admission control mechanism using multi-resource threshold control and performance-aware resource throttling. Experiments show that in the case of mixed co-location of batch jobs and latency-sensitive LRAs, CPU utilization and disk utilization can reach 56.34% and 43.49%, respectively, while the 95th percentile of read latency in YCSB workloads increases by only 5.4% compared with executing the LRAs alone.
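The multi-resource threshold control described above can be pictured with the hedged sketch below; the ceilings and usage figures are illustrative assumptions rather than ROSE's actual configuration. A node agent admits a speculative task only if every resource dimension stays under its oversubscription ceiling.

```python
def admit_speculative_task(current: dict[str, float], request: dict[str, float],
                           thresholds: dict[str, float]) -> bool:
    """Admission control on an oversubscribed node: every resource (expressed
    as a fraction of physical capacity) must stay under its ceiling."""
    return all(current[r] + request.get(r, 0.0) <= thresholds[r] for r in thresholds)

node_usage = {"cpu": 0.55, "memory": 0.60, "disk": 0.40}
ceilings   = {"cpu": 0.90, "memory": 0.80, "disk": 0.85}  # oversubscription limits

print(admit_speculative_task(node_usage, {"cpu": 0.20, "memory": 0.10}, ceilings))  # True
print(admit_speculative_task(node_usage, {"cpu": 0.20, "memory": 0.30}, ceilings))  # False: memory would hit 0.90
```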
Article
Fog computing aims at extending the cloud towards the Internet of things so to achieve improved quality of service and to empower latency‐sensitive and bandwidth‐hungry applications. The fog calls for novel models and algorithms to distribute multiservice applications in such a way that data processing occurs wherever it is best placed, based on both functional and nonfunctional requirements. This survey reviews the existing methodologies to solve the application placement problem in the fog, while pursuing three main objectives. First, it offers a comprehensive overview on the currently employed algorithms, on the availability of open‐source prototypes and on the size of test use cases. Second, it classifies the literature based on the application and fog infrastructure characteristics that are captured by available models, with a focus on the considered constraints and the optimized metrics. Finally, it identifies some open challenges in application placement in the fog.
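As a toy illustration of the placement problem this survey classifies, the sketch below performs a greedy first-fit placement that honours capacity and latency constraints; the nodes, services, and figures are all invented, and real fog placement algorithms are considerably more sophisticated.

```python
from dataclasses import dataclass

@dataclass
class FogNode:
    name: str
    free_cpu: float        # available capacity (arbitrary unit)
    latency_ms: float      # latency from the node to the data source

@dataclass
class Service:
    name: str
    cpu: float             # required capacity (functional requirement)
    max_latency_ms: float  # nonfunctional requirement

def greedy_placement(services: list[Service], nodes: list[FogNode]) -> dict[str, str]:
    """First-fit placement honouring capacity and latency constraints."""
    mapping: dict[str, str] = {}
    for svc in services:
        for node in nodes:
            if node.free_cpu >= svc.cpu and node.latency_ms <= svc.max_latency_ms:
                node.free_cpu -= svc.cpu
                mapping[svc.name] = node.name
                break
    return mapping

nodes = [FogNode("edge-gw", 500, 5.0), FogNode("micro-dc", 4000, 20.0)]
apps = [Service("stream-filter", 300, 10.0), Service("analytics", 2000, 50.0)]
print(greedy_placement(apps, nodes))
# {'stream-filter': 'edge-gw', 'analytics': 'micro-dc'}
```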
Article
With an ever‐increasing variety and complexity of Internet of Things (IoT) applications delivered by increasing numbers of service providers, there is a growing demand for an automated mechanism that can monitor and regulate the interaction between the parties involved in IoT service provision and delivery. This mechanism needs to take the form of a contract, which, in this context, is referred to as a service level agreement (SLA). As a first step toward SLA monitoring and management, an SLA specification is essential. We believe that current SLA specification formats are unable to accommodate the unique characteristics of the IoT domain, such as its multilayered nature. Therefore, we propose a grammar for a syntactical structure of an SLA specification for IoT. The grammar is built based on a proposed conceptual model that considers the main concepts that can be used to express the requirements for hardware and software components of an IoT application on an end‐to‐end basis. We followed the goal question metric approach to evaluate the generality and expressiveness of the proposed grammar by reviewing its concepts and their predefined lists of vocabularies against two use cases with a considerable number of participants whose research interests are mainly related to IoT. The results of the analysis show that the proposed grammar achieved 91.70% of its generality goal and 93.43% of its expressiveness goal.
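To suggest what such a multi-layered specification might look like in practice, here is a hedged sketch of an end-to-end IoT SLA as nested data with a structural check; the field names are hypothetical and do not follow the paper's actual grammar.

```python
# A hypothetical end-to-end IoT SLA expressed as nested data; the field names
# are illustrative and are not taken from the proposed grammar.
sla = {
    "application": "smart-parking",
    "layers": {
        "device": {"sampling_rate_hz": 1, "battery_life_months": 12},
        "edge":   {"max_latency_ms": 50, "availability_pct": 99.0},
        "cloud":  {"max_latency_ms": 300, "availability_pct": 99.9},
    },
}

REQUIRED_LAYERS = ("device", "edge", "cloud")

def validate_sla(spec: dict) -> list[str]:
    """Check the multi-layer structure the grammar is meant to capture:
    every layer must be present and every metric must be non-negative."""
    errors = [f"missing layer: {layer}"
              for layer in REQUIRED_LAYERS if layer not in spec["layers"]]
    for layer, metrics in spec["layers"].items():
        errors += [f"{layer}.{name} is negative"
                   for name, value in metrics.items() if value < 0]
    return errors

print(validate_sla(sla))  # [] -> the specification is structurally well formed
```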
Article
Minimizing the energy consumption of servers within cloud computing systems is of utmost importance to cloud providers for reducing operational costs and enhancing service sustainability by consolidating services onto fewer active servers. Moreover, providers must also provision high levels of availability and reliability, hence cloud services are frequently replicated across servers, which subsequently increases server energy consumption and resource overhead. These two objectives present a potential conflict within cloud resource management decision making, which must balance service consolidation against replication to minimize energy consumption whilst maximizing server availability and reliability. In this paper, we propose a cuckoo-optimization-based energy-reliability-aware resource scheduling technique (CRUZE) for holistic management of cloud computing resources including servers, networks, storage, and cooling systems. CRUZE clusters and executes heterogeneous workloads on provisioned cloud resources and enhances energy efficiency and reduces the carbon footprint of datacenters without adversely affecting cloud service reliability. We evaluate the effectiveness of CRUZE against existing state-of-the-art solutions using the CloudSim toolkit. Results indicate that our proposed technique is capable of reducing energy consumption by 20.1% whilst improving reliability and CPU utilization by 17.1% and 15.7% respectively, without affecting other Quality of Service parameters.
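The trade-off CRUZE navigates can be pictured with the hedged sketch below: a scheduling objective that combines normalized energy with a reliability penalty, of the kind a cuckoo-search metaheuristic would minimize. The weights and candidate figures are illustrative assumptions, not the paper's formulation.

```python
def fitness(energy_kwh: float, reliability: float,
            max_energy_kwh: float = 100.0, w_energy: float = 0.5) -> float:
    """Illustrative scheduling objective: minimize normalized energy while
    rewarding reliability (a probability in [0, 1]). The weights are arbitrary
    and CRUZE's actual cuckoo-search formulation may differ."""
    return w_energy * (energy_kwh / max_energy_kwh) + (1 - w_energy) * (1 - reliability)

candidate_allocations = {
    "consolidate-heavily": (55.0, 0.92),   # fewer active servers, some risk
    "replicate-everything": (80.0, 0.99),  # very reliable but energy hungry
}
best = min(candidate_allocations, key=lambda k: fitness(*candidate_allocations[k]))
print(best)  # 'consolidate-heavily' under these weights
```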