... Additionally, Zhang et al. (2022) [45] suggested a specific strategy for storing data copies in hyperconverged data centers based on the perspective that conventional data storage algorithms do not directly adapt to the inherent characteristics of hyperconverged infrastructures. The proposed method involves the use of multiple neural networks to analyze node reliability based on historical data, thereby facilitating the selection of storage locations. ...
... Although storage has been the main driver of hyperconvergence, a hyperconverged architecture is typically characterized by the integration of computing, storage, and networking into a single unified system [45]. From a hardware perspective, this implies the combination of computing (CPU and memory) and storage (hard drives and SSDs) into a single server. ...
Virtual Trusted Platform Modules (vTPMs) are widely adopted in commercial cloud platforms such as VMware Cloud, Google Cloud, Microsoft Azure, and Amazon AWS. However, as software-based components, vTPMs do not provide the same security guarantees as hardware TPMs. Existing solutions attempt to mitigate this limitation by anchoring vTPMs to physical TPMs, but such approaches often face challenges in heterogeneous environments and in failure recovery or migration scenarios. Meanwhile, the evolution of data center architectures toward hyperconverged infrastructures introduces new opportunities for security mechanisms by integrating compute, storage, and networking into a single system. This work proposes a novel mechanism to securely anchor vTPMs in hyperconverged environments. The proposed approach introduces a unified software layer capable of aggregating and managing the physical TPMs available in the data center, establishing a root of trust for vTPM anchoring. It supports scenarios where hardware TPMs are not uniformly available and enables anchoring replication for critical systems. The solution was implemented and evaluated in terms of its performance impact. The results show low computational overhead, albeit with an increase in anchoring time due to the remote anchoring process.
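The abstract does not include implementation details, but the core idea of a unified layer that pools the data center's physical TPMs and selects anchors (with optional replication for critical vTPMs) can be sketched as follows. All names (PhysicalTpm, TpmPool, anchor_vtpm) and the per-TPM capacity limit are hypothetical assumptions, not the paper's implementation.

```python
# Minimal, hypothetical sketch of a TPM-pooling layer for vTPM anchoring.
# Names and structure are illustrative assumptions, not the paper's design.
from dataclasses import dataclass, field

@dataclass
class PhysicalTpm:
    host: str            # node that exposes the hardware TPM
    capacity: int        # how many vTPM anchors this TPM may hold (assumed limit)
    anchored: set = field(default_factory=set)

    def has_room(self) -> bool:
        return len(self.anchored) < self.capacity

class TpmPool:
    """Aggregates the physical TPMs available in the data center."""
    def __init__(self, tpms):
        self.tpms = list(tpms)

    def anchor_vtpm(self, vtpm_id: str, replicas: int = 1):
        """Anchor a vTPM to `replicas` distinct physical TPMs.
        replicas > 1 models anchoring replication for critical systems."""
        candidates = [t for t in self.tpms if t.has_room()]
        # Prefer the least-loaded TPMs so anchors spread across hosts.
        candidates.sort(key=lambda t: len(t.anchored))
        chosen = candidates[:replicas]
        if len(chosen) < replicas:
            raise RuntimeError("not enough physical TPMs with free capacity")
        for tpm in chosen:
            tpm.anchored.add(vtpm_id)
        return [t.host for t in chosen]

# Hosts without a hardware TPM simply do not appear in the pool, which models
# the non-uniform TPM availability mentioned in the abstract.
pool = TpmPool([PhysicalTpm("node-a", 4), PhysicalTpm("node-b", 4), PhysicalTpm("node-c", 4)])
print(pool.anchor_vtpm("vtpm-critical-db", replicas=2))  # e.g. ['node-a', 'node-b']
```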
Large-scale data stores are an increasingly important component of cloud datacenter services. However, cloud storage systems often experience data loss, which undermines data durability. Three-way random replication is commonly used to improve data durability in cloud storage systems, but it cannot effectively handle correlated machine failures to prevent data loss. Although Copyset Replication and Tiered Replication can reduce data loss under correlated and independent failures and enhance data durability, they fail to leverage different data popularities to substantially reduce the storage cost and bandwidth cost caused by replication. To address these issues, we present a popularity-aware multi-failure resilient and cost-effective replication (PMCR) scheme for high data durability in cloud storage. PMCR splits the cloud storage system into a primary tier and a backup tier, and classifies data into hot data, warm data, and cold data based on data popularity. To handle both correlated and independent failures, PMCR stores the three replicas of the same data in one copyset formed by two servers in the primary tier and one server in the backup tier. For the third replicas of warm data and cold data in the backup tier, PMCR uses compression to reduce storage cost and bandwidth cost. Extensive numerical results based on trace parameters and experimental results from real-world Amazon S3 show that PMCR achieves high data durability, a low probability of data loss, and low storage and bandwidth costs compared to previous replication schemes.
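As a rough illustration of the placement rule described above (two primary-tier servers plus one backup-tier server per copyset, with the backup replica of warm and cold data compressed), the sketch below is one plausible reading of the scheme, not the authors' code; the popularity thresholds and the choice of zlib compression are assumptions.

```python
# Illustrative sketch of PMCR-style replica placement (assumptions noted inline).
import zlib

HOT, WARM, COLD = "hot", "warm", "cold"

def classify(read_rate, hot_thresh=100.0, warm_thresh=10.0):
    """Classify data by popularity; thresholds are arbitrary assumptions."""
    if read_rate >= hot_thresh:
        return HOT
    if read_rate >= warm_thresh:
        return WARM
    return COLD

def place(data_id, payload, read_rate, primary_copyset, backup_server):
    """Place three replicas: two on a primary-tier copyset, one on the backup tier.
    The backup replica of warm/cold data is compressed to save storage/bandwidth."""
    popularity = classify(read_rate)
    primary_replicas = [(srv, payload) for srv in primary_copyset[:2]]
    backup_payload = payload if popularity == HOT else zlib.compress(payload)
    return {
        "popularity": popularity,
        "primary": primary_replicas,
        "backup": (backup_server, backup_payload),
    }

layout = place("obj-42", b"x" * 4096, read_rate=3.5,
               primary_copyset=["p-srv-1", "p-srv-2"], backup_server="b-srv-7")
print(layout["popularity"], len(layout["backup"][1]))  # cold data: backup replica is compressed
```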
As the devices that make up the Internet become more powerful, the algorithms that orchestrate cloud systems are on the verge of putting more responsibility for computation and storage on these devices. In the current age of Big Data, disseminating and storing data across end cloud devices is becoming a prominent problem as this expansion continues. In this paper, we propose a distributed data dissemination approach that relies on the dynamic creation, replacement, and removal of replicas, guided by continuous monitoring of the data requests coming from edge nodes of the underlying network. Our algorithm exploits the geographical locality of data during the dissemination process, owing to the abundance of common data requests from clients in close proximity. Our results using both real-world and synthetic data demonstrate that a decentralized replica placement approach provides significant cost benefits compared to the client-side caching that is widely used in traditional distributed systems.
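The paper's algorithm is not spelled out in the abstract; the sketch below only illustrates the general idea of creating and removing replicas from continuously monitored per-region request counts. The thresholds, window length, and region names are arbitrary assumptions.

```python
# Toy sketch of request-driven replica creation/removal per edge region.
import time
from collections import defaultdict, deque

class ReplicaManager:
    """Creates/removes replicas per edge region based on recent request rates."""
    def __init__(self, create_thresh=50, remove_thresh=5, window_s=60.0):
        self.window_s = window_s            # sliding time window in seconds
        self.create_thresh = create_thresh  # requests/window that trigger creation
        self.remove_thresh = remove_thresh  # requests/window below which we remove
        self.requests = defaultdict(deque)  # region -> timestamps of recent requests
        self.replicas = set()               # regions currently holding a replica

    def observe(self, region, now=None):
        now = time.time() if now is None else now
        q = self.requests[region]
        q.append(now)
        while q and now - q[0] > self.window_s:   # drop requests outside the window
            q.popleft()
        self._rebalance(region, len(q))

    def _rebalance(self, region, recent):
        if recent >= self.create_thresh:
            self.replicas.add(region)        # place a replica close to the demand
        elif recent <= self.remove_thresh:
            self.replicas.discard(region)    # reclaim storage when demand fades

mgr = ReplicaManager(create_thresh=3, remove_thresh=0, window_s=10.0)
for t in (0.0, 1.0, 2.0):
    mgr.observe("region-eu-west", now=t)
print(mgr.replicas)   # {'region-eu-west'} once the request rate crosses the threshold
```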
Data management is the core module of a cloud storage system. Based on a constructed data center network topology, a data management model using a Recursion-based N-regular Polygon Topology (RNPT) was established with reference to file systems such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The model combines a central-server mode with the RNPT network structure to ensure system scalability. In addition to achieving high replica availability and reliability, replicas are allocated so as to reduce user access time and communication delay and to support an effective cooperative load-balancing strategy. System resources are thereby used more fully to improve cloud storage performance and service quality. Simulation experiments in CloudSim comparing the model with HDFS show that the RNPT-based data management model improves data access performance and makes reasonable use of network bandwidth, achieving load balancing.
Storage allocation, meaning the way that a chunk of data is stored over a set of storage nodes, affects different performance measures of a distributed storage system (DSS). In this work, we study the storage allocation problem for a DSS where nodes have different storage capacities. To this end, we first introduce the notion of k-guaranteed allocations referring to allocations where the data can be recovered by accessing any arbitrary set of k storage nodes. We then find the necessary conditions for an allocation to be k-guaranteed considering the limit on the individual node capacities and the overall storage budget. Using these conditions, an iterative algorithm is developed to find k-guaranteed allocations, if feasible.
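The abstract states the necessary conditions and an iterative algorithm without giving them. Under the common coded-storage model in which data of normalized size 1 is recoverable whenever the amounts accessed sum to at least 1, an allocation is k-guaranteed exactly when the k smallest node shares already sum to 1. The check and the simple fill-level heuristic below are a sketch under that assumption, not the paper's algorithm.

```python
# Sketch of k-guaranteed storage allocation under the assumed coded-storage model:
# normalized file size 1, node i stores x_i <= c_i, total budget T, and the data is
# recoverable from any k nodes iff the k smallest shares sum to >= 1.

def is_k_guaranteed(x, k):
    """True if any k nodes jointly hold at least one full file."""
    return sum(sorted(x)[:k]) >= 1.0

def allocate(capacities, budget, k):
    """Heuristic sketch: raise a common fill level t, store min(c_i, t) on each node,
    and stop at the smallest t that makes the allocation k-guaranteed.
    Returns None if no such fill level is k-guaranteed within the budget."""
    lo, hi = 0.0, max(capacities)
    for _ in range(60):                      # binary search on the fill level
        t = (lo + hi) / 2
        if is_k_guaranteed([min(c, t) for c in capacities], k):
            hi = t
        else:
            lo = t
    x = [min(c, hi) for c in capacities]
    if not is_k_guaranteed(x, k) or sum(x) > budget:
        return None
    return x

caps = [0.5, 0.5, 0.5, 0.2]
x = allocate(caps, budget=1.6, k=3)
print(x)                       # roughly [0.4, 0.4, 0.4, 0.2]
print(is_k_guaranteed(x, 3))   # True: any 3 nodes jointly store >= 1 file
```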
Caching and replication of popular data objects contribute significantly to the reduction of the network bandwidth usage and the overall access time to data. Our focus is to improve the efficiency of object replication within a given distributed replication group. Such a group consists of servers that dedicate a certain amount of memory for replicating objects requested by their clients. The content replication problem we are solving is defined as follows: given the request rates for the objects and the server capacities, find the replica allocation that minimizes the access time over all servers and objects. We design a distributed approximation algorithm that solves this problem and prove that it provides a 2-approximation solution. We also show that the communication and computational complexity of the algorithm is polynomial with respect to the number of servers, the number of objects, and the sum of the capacities of all servers. Finally, we perform simulation experiments to investigate the performance of our algorithm. The experiments show that our algorithm outperforms the best existing distributed algorithm that solves the replica placement problem.
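The abstract defines the problem (given per-object request rates and per-server capacities, choose a replica allocation minimizing total access time) but not the algorithm. The sketch below is only a naive centralized greedy baseline for that problem statement, not the paper's distributed 2-approximation; the access-cost constants are assumptions.

```python
# Naive greedy baseline for the replication-group placement problem
# (not the paper's distributed 2-approximation algorithm).
# Assumed cost model: an object costs LOCAL if replicated on the requesting
# server, REMOTE if replicated elsewhere in the group, ORIGIN otherwise.
LOCAL, REMOTE, ORIGIN = 1.0, 5.0, 20.0

def greedy_place(request_rates, capacities):
    """request_rates[s][o]: request rate of server s for object o.
    capacities[s]: number of object slots on server s.
    Greedily place the replica with the largest estimated access-time saving."""
    servers, objects = range(len(capacities)), range(len(request_rates[0]))
    placement = {s: set() for s in servers}
    replicated = set()                     # objects already held somewhere
    free = list(capacities)
    while any(free):
        best = None
        for s in servers:
            if free[s] == 0:
                continue
            for o in objects:
                if o in placement[s]:
                    continue
                base = REMOTE if o in replicated else ORIGIN
                gain = request_rates[s][o] * (base - LOCAL)
                if best is None or gain > best[0]:
                    best = (gain, s, o)
        if best is None or best[0] <= 0:
            break
        _, s, o = best
        placement[s].add(o)
        replicated.add(o)
        free[s] -= 1
    return placement

rates = [[9.0, 1.0, 0.5], [0.5, 8.0, 2.0]]     # 2 servers, 3 objects
print(greedy_place(rates, capacities=[1, 1]))  # each server caches its hottest object
```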
In this paper, we discuss and compare several policies to place replicas in tree networks, subject to server capacity and Quality of Service (QoS) constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The standard approach in the literature is to enforce that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. In the first policy, all requests from a given client are still processed by the same server, but this server can be located anywhere in the path from the client to the root. In the second policy, the requests of a given client can be processed by multiple servers. One major contribution of this paper is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity. In this paper, we establish several new complexity results, and provide several efficient polynomial heuristics for NP-complete instances of the problem. The absolute performance of these heuristics is assessed by comparison with the optimal solution provided by the formulation of the problem in terms of the solution of an integer linear program.
The maximum possible throughput (or the rate of job completion) of a multi-server system is typically the sum of the service rates of individual servers. Recent work shows that launching multiple replicas of a job and canceling them as soon as one copy finishes can boost the throughput, especially when the service time distribution has high variability. This means that redundancy can, in fact, create synergy among servers such that their overall throughput is greater than the sum of the individual servers' throughputs. This work seeks to find the fundamental limit of the throughput boost achieved by job replication and the optimal replication policy to achieve it. While most previous works consider upfront replication policies, we expand the set of possible policies to the delayed launch of replicas. The search for the optimal adaptive replication policy can be formulated as a Markov Decision Process, using which we propose two myopic replication policies, MaxRate and AdaRep, to adaptively replicate jobs. In order to quantify the optimality gap of these and other policies, we derive upper bounds on the service capacity, which provide fundamental limits on the throughput of queueing systems with redundancy.
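The service-capacity bounds and the MaxRate/AdaRep policies are beyond what the abstract gives, but the basic effect it describes, that replicating a job on two servers and cancelling the loser can more than double throughput when service times are highly variable, is easy to reproduce with a Monte Carlo estimate. The two-point service-time distribution below is an arbitrary choice for illustration.

```python
# Monte Carlo illustration of the throughput boost from job replication with
# cancel-on-completion under a highly variable (two-point) service time.
# This is a toy estimate, not the paper's policies or capacity bounds.
import random

def sample_service():
    # Service time is 1 with prob. 0.99 and 1000 with prob. 0.01 (assumed mix).
    return 1.0 if random.random() < 0.99 else 1000.0

def estimate(n=200_000):
    no_rep = sum(sample_service() for _ in range(n)) / n
    with_rep = sum(min(sample_service(), sample_service()) for _ in range(n)) / n
    # Two servers without replication complete 2 / E[S] jobs per unit time;
    # with full replication every job occupies both servers for min(S1, S2).
    return 2.0 / no_rep, 1.0 / with_rep

random.seed(0)
thr_no_rep, thr_rep = estimate()
print(f"throughput without replication ~ {thr_no_rep:.3f} jobs/unit time")
print(f"throughput with replication    ~ {thr_rep:.3f} jobs/unit time")
# Expected outcome: roughly 0.18 vs 0.9, i.e. replication boosts the service capacity.
```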
In recent years, fog computing has emerged as a new paradigm for future Internet-of-Things (IoT) applications, but it also brings new challenges. The geographically vast, distributed architecture of fog computing offers an almost unlimited number of choices for service orchestration. How to properly arrange the service replicas (or service instances) among the nodes remains a critical problem. Specifically, in this article, we investigate a generalized service replica placement problem that has the potential to be applied to various industrial scenarios. We formulate the problem as a multiobjective model with two scheduling objectives, deployment cost and service latency. To solve the problem, we propose an ant colony optimization-based solution, called multireplicas Pareto ant colony optimization (MRPACO). We have conducted extensive experiments on MRPACO. The experimental results show that the solutions obtained by our strategy are of good quality in terms of both diversity and accuracy, which are the main evaluation metrics of a multiobjective algorithm.
Many NoSQL databases support quorum-based protocols, which require that a subset of replicas (called a quorum) respond to each write/read operation. These systems configure the quorum size to tune operation latency and support multiple consistency levels. Recent work illustrates that using probability models to quantify the chance of reading the last update is important because it can avoid returning stale values under eventual consistency. There are two challenging issues: (1) given inconsistent replicas, how to determine the minimum quorum size (i.e., the lowest access latency) needed to read the newest data with a specified probability; (2) since node failures frequently happen in large-scale systems, how to guarantee probability-based consistent reads. This paper presents Probabilistic Consistency Guarantee (PCG), the first dynamic quorum decision and failure-aware quantification model. The PCG model quantifies both the server-side consistency after the latest write, which reflects an object's time-varying update progress, and the possibility of reading this update when responding to end-users. Our theoretical analysis derives several formulas to determine the size of a read quorum such that the consensus result selected from this quorum is, with the user-specified probability, the data updated by the last write. When some replicas are unavailable, our model can rescale the quorum so that reading values from the surviving replicas reduces the stale reads caused by node failures. The experimental results in Cassandra demonstrate that the PCG model can achieve up to 77.7% more accurate predictions and reduce read latency by up to 48.9% compared to the previous model.
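The paper's PCG formulas are not reproduced in the abstract; the snippet below only computes the textbook hypergeometric probability that a read quorum of size R, drawn from N replicas of which W already hold the latest write, returns that write, and how the quorum can be chosen against a target probability when the replica pool shrinks after a failure. It is a simplified stand-in for the model, not the PCG derivation.

```python
# Simplified probabilistic-read illustration (not the paper's PCG formulas):
# with N replicas, W of which hold the latest write, a read quorum of R replicas
# chosen uniformly at random sees the latest value with probability
# 1 - C(N - W, R) / C(N, R).
from math import comb

def p_fresh_read(n_replicas, n_updated, read_quorum):
    if read_quorum > n_replicas - n_updated:
        return 1.0                       # every possible quorum hits an updated replica
    return 1.0 - comb(n_replicas - n_updated, read_quorum) / comb(n_replicas, read_quorum)

def min_quorum(n_replicas, n_updated, target):
    """Smallest read quorum whose fresh-read probability reaches `target`."""
    for r in range(1, n_replicas + 1):
        if p_fresh_read(n_replicas, n_updated, r) >= target:
            return r
    return n_replicas

print(round(p_fresh_read(5, 2, 2), 3))   # 0.7: a 2-replica read may still return a stale value
print(min_quorum(5, 2, 0.99))            # quorum size needed with all 5 replicas alive
print(min_quorum(4, 2, 0.99))            # rescaled quorum after a non-updated replica fails
```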
In geo-distributed cloud storage systems, data replication has been widely used to serve the ever-growing number of users around the world with high data reliability and availability. How to optimize data replica placement has become one of the fundamental problems in reducing inter-node traffic and the system overhead of accessing associated data items. In the big data era, traditional solutions may face the challenges of long running times and large overheads when handling the increasing scale of data items with time-varying user requests. Therefore, novel offline community discovery and online community adjustment schemes are proposed to solve the replica placement problem in a scalable and adaptive way. The offline scheme can find a replica placement solution based on the average read/write rates for a certain period. Scalability is achieved because 1) the computational complexity is linear in the number of data items and 2) the data-node communities can evolve in parallel for distributed replica placement. Furthermore, the online scheme adapts to bursty data requests without the need to completely override the existing replica placement. Driven by real-world data traces, extensive performance evaluations demonstrate the effectiveness of our design in handling large-scale datasets.
Edge caching has attracted great attention recently due to its potential for reducing service delays. One of the key performance metrics in caching is storage efficiency. To achieve high storage efficiency, we present an edge caching strategy with time-domain buffer sharing in this paper. More specifically, our scheme determines not only which content items should be pushed by the core network, but also how long they should be cached in the buffer of the base station. To this end, we formulate a queueing model in which the storage cost and the maximum caching time are bridged via Little's Law. Based on this model, we present a probabilistic edge caching strategy with random maximum caching time to strike the optimal tradeoff between the storage cost and the overall hit ratio of content items. For content items with different user demand preferences, we further formulate a nonconvex optimization problem to jointly allocate the transmission and storage resources. An efficient two-layer searching algorithm is presented to achieve an optimal solution. Moreover, we also present the analytical solution to the joint transmission and storage allocation problem in the special scenario where all content items have been cached in the core network.
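The paper's optimization is more involved, but the bridge via Little's Law that the abstract mentions is simple to state: if content items are pushed into the buffer at rate lambda and each is kept for an average caching time W, the average buffer occupancy (the storage cost) is L = lambda * W. The numbers below are invented for illustration.

```python
# Little's Law illustration for time-domain buffer sharing (numbers are invented).
# L = lambda * W: the average number of items in the buffer equals the push rate
# times the average time an item is kept.

def buffer_occupancy(push_rate_items_per_s, mean_caching_time_s):
    return push_rate_items_per_s * mean_caching_time_s

def max_caching_time(storage_budget_items, push_rate_items_per_s):
    """Average caching time the buffer can afford under a given storage budget."""
    return storage_budget_items / push_rate_items_per_s

print(buffer_occupancy(0.5, 120.0))   # pushing 0.5 items/s, kept 2 min each -> ~60 items stored
print(max_caching_time(200, 0.5))     # a 200-item budget allows ~400 s of caching per item
```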
Many massive data-processing applications today need long, continuous, and uninterrupted data access. Distributed file systems are used as the back-end storage to provide global namespace management and reliability guarantees. Due to increasing hardware failures and software issues at growing system scales, metadata service reliability has become a critical issue, as it affects file system operations. Existing metadata management mechanisms can provide fault tolerance to some degree but are inadequate. This paper introduces a novel highly reliable metadata service to address these issues. Unlike traditional strategies, the reliable metadata service adopts a new active-standby architecture for fault tolerance and uses a holistic approach to improve file system availability. A new shared storage pool (SSP) is designed for transparent metadata synchronization and replication between active and standby servers. Based on the SSP, a new policy called multiple actives multiple standbys (MAMS) is presented to perform metadata service recovery in case of failures. A new global state recovery strategy and a smart client fault tolerance mechanism are introduced to maintain the continuity of the metadata service. Experimental results confirm that it can significantly improve file system reliability with fast failover in different failure scenarios while having negligible influence on performance. Compared with typical reliability designs in the Hadoop Avatar, Hadoop HA, and Boom-FS file systems, the mean time to recovery (MTTR) with the highly reliable metadata service was reduced by 80.23%, 65.46%, and 28.13%, respectively.
Erasure codes, in the recent past, have emerged as an alternative to data replication-based systems for storing big data. Efficient choice of code and data nodes from the numerous available storage nodes is the key to the performance of any storage system. This paper presents the Storage Node Allocation Problem for selecting the suitable set of nodes for holding data and code blocks by representing the storage systems as a complete bipartite graph. Additionally, the paper formally proves that the problem is NP-hard and proposes approximate solutions using greedy, ant colony optimization and clustering-based methods. The solutions accomplish efficient choice of storage nodes by utilizing parameters like bandwidth availability, distance between the nodes, computational load and disk space availability.
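The paper's greedy, ant colony optimization, and clustering solutions are not detailed in the abstract; the sketch below only shows one plausible greedy scoring of candidate nodes by the parameters the abstract lists (bandwidth availability, distance, computational load, and disk space). The weights and node records are assumptions.

```python
# Hypothetical greedy selection of storage nodes for data and code blocks,
# scored by the parameters named in the abstract; weights are assumptions.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    bandwidth_mbps: float
    distance_ms: float       # network distance (RTT) to the writer
    cpu_load: float          # 0.0 .. 1.0
    free_gb: float

def score(n: Node, w_bw=1.0, w_dist=2.0, w_load=1.5, w_disk=0.5):
    # Higher bandwidth and free space help; distance and load are penalties.
    return (w_bw * n.bandwidth_mbps / 1000.0
            + w_disk * n.free_gb / 1000.0
            - w_dist * n.distance_ms / 100.0
            - w_load * n.cpu_load)

def select_nodes(nodes, n_data, n_code, block_gb):
    eligible = [n for n in nodes if n.free_gb >= block_gb]
    ranked = sorted(eligible, key=score, reverse=True)
    if len(ranked) < n_data + n_code:
        raise RuntimeError("not enough eligible nodes for the requested stripe")
    chosen = ranked[:n_data + n_code]
    return chosen[:n_data], chosen[n_data:]     # data nodes, code nodes

nodes = [Node("n1", 900, 5, 0.2, 500), Node("n2", 400, 40, 0.7, 800),
         Node("n3", 800, 10, 0.3, 100), Node("n4", 600, 20, 0.1, 900)]
data_nodes, code_nodes = select_nodes(nodes, n_data=2, n_code=1, block_gb=64)
print([n.name for n in data_nodes], [n.name for n in code_nodes])
```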
In the power industry, processing business big data from geographically distributed locations, such as online line-loss analysis, has emerged as an important application. Achieving highly efficient big data storage that meets the requirements of low-latency processing applications is quite challenging. In this paper, we propose a novel adaptive power storage replica management system, named PARMS, based on stochastic configuration networks (SCNs), in which the network traffic and the geo-distribution of data centers (DCs) are taken into consideration to improve real-time data processing. First, the SCN model, a fast learning model with a low computational burden and sound prediction performance, is employed to estimate the traffic state of power data networks. Then, a series of data replica management algorithms is proposed to mitigate the effects of limited bandwidth and a fixed underlying infrastructure. Finally, the proposed PARMS is implemented using data-parallel computing frameworks (DCFs) for the power industry. Experiments are carried out at CSG, an electric power corporation serving 230 million users, and the results show that our proposed solution handles power big data storage efficiently and reduces job completion times across geo-distributed DCs by 12.19% on average.
Resource management is a key factor in the performance and efficient utilization of cloud systems, and many research works have proposed efficient policies to optimize such systems. However, these policies have traditionally managed the resources individually, neglecting the complexity of cloud systems and the interrelation between their elements. To illustrate this situation, we present an approach focused on virtualized Hadoop for a simultaneous and coordinated management of virtual machines and file replicas. Specifically, we propose determining the virtual machine allocation, virtual machine template selection, and file replica placement with the objective of minimizing the power consumption, physical resource waste, and file unavailability. We implemented our solution using the non-dominated sorting genetic algorithm-II, which is a multi-objective optimization algorithm. Our approach obtained important benefits in terms of file unavailability and resource waste, with overall improvements of approximately 400% and 170% compared to three other optimization strategies. The benefits for the power consumption were smaller, with an improvement of approximately 1.9%.
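The full NSGA-II pipeline is outside the scope of this abstract, but the core building block it relies on, Pareto dominance over the three objectives mentioned (power consumption, resource waste, file unavailability), is easy to illustrate. The candidate placements and objective values below are made up.

```python
# Pareto-dominance building block used by NSGA-II style optimizers.
# Objectives (all minimized): power consumption, resource waste, file unavailability.
# Candidate placements and their objective values are invented for illustration.

def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated candidates (first front of non-dominated sorting)."""
    return {name: objs for name, objs in candidates.items()
            if not any(dominates(o, objs) for n, o in candidates.items() if n != name)}

placements = {
    "plan-A": (120.0, 0.30, 0.010),   # (power kW, resource waste, unavailability)
    "plan-B": (100.0, 0.45, 0.012),
    "plan-C": (130.0, 0.25, 0.011),
    "plan-D": (125.0, 0.35, 0.015),   # dominated by plan-A
}
print(sorted(pareto_front(placements)))   # ['plan-A', 'plan-B', 'plan-C']
```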
Data-intensive applications need to address the problem of properly placing a set of data items in geo-distributed storage nodes. Traditional techniques use hashing to achieve load balance among nodes, as in Hadoop and Cassandra, but they are not efficient for requests that read multiple data items in one transaction, especially when the source locations of the requests are also distributed. Some recent papers proposed managed data placement schemes for online social networks, but these have a limited scope of application due to their narrow focus. We propose a general hypergraph-based data placement framework, which considers both the performance metrics related to the co-location of associated data and those related to the exact location at which each requested data item is served. In the framework, we present methods to convert the optimization objectives into hypergraph models and employ hypergraph partitioning to efficiently partition the set of data items and place them in distributed nodes. Furthermore, we extend the scheme to replica placement, where we need to find multiple locations to place the replicas of the same data item. Through extensive experiments based on trace-based datasets, we evaluate the performance of the proposed framework and demonstrate its effectiveness.
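As a sketch of the hypergraph model only (not the paper's objective conversion or partitioner): data items become vertices, each multi-item request becomes a hyperedge weighted by its rate, and a partitioner then splits the vertex set across storage nodes so that little weighted hyperedge cut remains. The naive assignments here are placeholders for a real hypergraph partitioner such as hMETIS or PaToH.

```python
# Hypergraph construction for multi-item request placement (sketch only).
# Vertices = data items; hyperedges = sets of items read together, weighted by
# request rate. A real system would hand this to a hypergraph partitioner.
from collections import defaultdict

def build_hypergraph(request_log):
    """request_log: iterable of (items_tuple, rate). Returns hyperedge weights."""
    edges = defaultdict(float)
    for items, rate in request_log:
        edges[tuple(sorted(items))] += rate
    return edges

def cut_weight(edges, assignment):
    """Total rate of requests whose items span more than one storage node."""
    return sum(w for items, w in edges.items()
               if len({assignment[i] for i in items}) > 1)

log = [(("a", "b"), 10.0), (("b", "c"), 6.0), (("c", "d"), 8.0), (("a",), 3.0)]
edges = build_hypergraph(log)
round_robin = {item: i % 2 for i, item in enumerate(sorted({x for e in edges for x in e}))}
better = {"a": 0, "b": 0, "c": 1, "d": 1}       # co-locates the heaviest hyperedge
print(cut_weight(edges, round_robin), cut_weight(edges, better))   # 24.0 vs 6.0
```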
Current storage systems face a performance bottleneck due to the gap between fast CPU computing speeds and the slow response time of hard disks. Recently, a multitier hybrid storage system (MTHS), which uses fast flash devices such as solid-state drives (SSDs) as one of the high-performance storage tiers, has been proposed to boost storage system performance. To maintain the overall performance of the MTHS, an optimal disk storage assignment must be designed so that the data migrated to a high-performance tier such as the SSD is the optimal set of data. In this paper, we propose an optimal data allocation algorithm for disk storage in the MTHS. The data allocation problem (DAP) is to find the optimal lists of data files for each storage tier in the MTHS that achieve maximal benefit values without exceeding the available size of each tier. We formulate the DAP as a special multiple-choice knapsack problem (MCKP) and propose multiple-stage dynamic programming (MDP) to find the optimal solutions. The results show that the MDP can achieve improvements of up to 6 times compared with existing greedy algorithms.
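The paper's MDP handles the general multi-tier, multiple-choice case; as a minimal illustration, consider the two-tier special case (one SSD tier with limited capacity and the HDD tier as the default), where the multiple-choice structure collapses to a 0/1 knapsack: each file is either migrated to the SSD for some benefit and size, or left on the HDD for no benefit. The files and numbers below are invented.

```python
# Two-tier special case of the data allocation problem as a 0/1 knapsack DP
# (the paper's MDP addresses the general multi-tier, multiple-choice case).
# Each file can be migrated to the SSD (benefit, size) or stay on the HDD.

def best_migration(files, ssd_capacity):
    """files: list of (name, size, benefit). Returns (best benefit, chosen names)."""
    # dp[c] = (best total benefit using at most c units of SSD capacity, chosen set)
    dp = [(0, frozenset())] * (ssd_capacity + 1)
    for name, size, benefit in files:
        new_dp = list(dp)
        for c in range(size, ssd_capacity + 1):
            cand = dp[c - size][0] + benefit
            if cand > new_dp[c][0]:
                new_dp[c] = (cand, dp[c - size][1] | {name})
        dp = new_dp
    return dp[ssd_capacity]

files = [("logs", 40, 10), ("db-index", 30, 60), ("vm-image", 50, 45), ("cache", 20, 30)]
print(best_migration(files, ssd_capacity=80))  # db-index + vm-image: benefit 105 within 80 GB
```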
High-performance, highly reliable, and energy-efficient storage systems are essential for mobile data-intensive applications such as remote surgery and mobile data centers. Compared with conventional stationary storage systems, mobile disk-array-based storage systems are more prone to disk failures due to their severe application environments, and they have a very limited power supply. Therefore, data reconstruction algorithms, which are executed in the presence of disk failures, must be performance-driven, reliability-aware, and energy-efficient for mobile storage systems. Unfortunately, existing reconstruction schemes cannot fulfill these three goals simultaneously because they largely overlook the fact that mobile disks have much higher failure rates than stationary disks, and they normally ignore energy saving. In this paper, we develop a novel reconstruction strategy, called multi-level caching-based reconstruction optimization (MICRO), which can be applied to RAID-structured mobile storage systems to noticeably shorten reconstruction times and user response times while saving energy. MICRO collaboratively utilizes the storage cache and the disk-array controller cache to reduce the number of physical disk accesses caused by reconstruction. Experimental results demonstrate that, compared with two representative algorithms, DOR and PRO, MICRO reduces reconstruction times by 20.22% and 9.34% on average, while saving no less than 30.4% and 13% of energy, respectively.