Fig 1 - uploaded by Zhen Huang

# An example for storage allocation problem.

Source publication
Article
Cloud storage systems provide reliable service to users by widely deploying redundancy schemes, which bring high reliability to data storage but in turn introduce significant overhead, consisting of storage cost and energy consumption. The core of this issue is how to leverage the relationship between data...

## Contexts in source publication

Context 1
... where $i \in \{1, \ldots, |S|\}$. Note that $r_i$ means that with this probability all blocks that $N_i$ hosts are usable for data recovery, and vice versa. Then we have $n = \sum_{i=1}^{|S|} l_i$, and the storage allocation scheme is mapped into a set of partitions $SA = \{l_1, \ldots, l_{|S|}\}$. Now we illustrate the problem by a simple example. As depicted in Fig. 1(a), the original file with $k$ original blocks is encoded into five redundant blocks in total (in this case, $n = 5$), where the original file can be reconstructed using any subset of $k$ redundant blocks. How can we allocate these five redundant blocks to a given set of three storage nodes whose reliabilities are $r_1$, $r_2$ and $r_3$ ...
Context 2
... redundant blocks in total (in this case, $n = 5$), where the original file can be reconstructed using any subset of $k$ redundant blocks. How can we allocate these five redundant blocks to a given set of three storage nodes whose reliabilities are $r_1$, $r_2$ and $r_3$ respectively? Two possible storage allocation schemes for this example are given in Fig. 1(b). Then the question is: which one is better and ...
Context 3
... deriving the model based on generating function, we first take the scheme $SA_1$ in Fig. 1(b) as an example to illustrate: there are three nodes $N_1$, $N_2$ and $N_3$, with reliability probabilities of $r_1$, $r_2$ and $r_3$ respectively. Nodes $N_1$ and $N_2$ each host two redundant blocks, whereas $N_3$ hosts only one. We define $dr(p)$ as the probability that $p$ blocks are reliable and the other $5 - p$ blocks are unreliable, where $p$ ...
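The definition in this excerpt can be sketched by direct enumeration: each node is either available (all of its blocks count as reliable) or not (none do). The allocation (2, 2, 1) comes from the excerpt; the reliabilities (0.9, 0.85, 0.8) are an assumption, not stated here, chosen because they reproduce the data reliability values quoted in the later excerpts. The function name is hypothetical.

```python
from itertools import product


def block_count_distribution(allocation, reliabilities):
    """Return {p: probability that exactly p blocks are reliable}."""
    dist = {}
    # Every node is either up (all its blocks usable) or down (none usable).
    for states in product([True, False], repeat=len(allocation)):
        prob = 1.0
        blocks = 0
        for up, l_i, r_i in zip(states, allocation, reliabilities):
            prob *= r_i if up else 1.0 - r_i
            if up:
                blocks += l_i
        dist[blocks] = dist.get(blocks, 0.0) + prob
    return dist


SA1 = (2, 2, 1)        # blocks hosted by N1, N2, N3 (from the excerpt)
R = (0.9, 0.85, 0.8)   # ASSUMED node reliabilities, not given in the excerpt

dr = block_count_distribution(SA1, R)
for p in sorted(dr):
    print(p, round(dr[p], 4))
```

Summing the entries with at least three reliable blocks gives roughly 0.941 under these assumed reliabilities.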
Context 4
... $SA_1$ of Fig. 1(b) as an example, we can obtain $GFDR(SA, b)$ as follows: Note that the expanded form of $GFDR(SA, b)$ is a polynomial, where in each term, the exponent (of the power) denotes the number of reliable blocks (the same meaning as $p$ above) and the coefficient is the sum of the probabilities for the cases that $p$ blocks are reliable and ...
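The excerpt elides the actual expression for the generating function. A minimal sketch, assuming each node contributes a factor (1 - r_i) + r_i * b^(l_i), which matches the description of exponents and coefficients above but is not confirmed by the excerpt, could look like this. The reliabilities are again assumed values and the function names are hypothetical.

```python
def gfdr_coefficients(allocation, reliabilities):
    """Expand prod_i ((1 - r_i) + r_i * b**l_i).

    Returns a list where entry p is the coefficient of b**p, i.e. the
    probability that exactly p blocks are reliable.
    """
    coeffs = [1.0]  # the constant polynomial 1
    for l_i, r_i in zip(allocation, reliabilities):
        new = [0.0] * (len(coeffs) + l_i)
        for p, c in enumerate(coeffs):
            new[p] += c * (1.0 - r_i)   # node down: exponent unchanged
            new[p + l_i] += c * r_i     # node up: exponent grows by l_i
        coeffs = new
    return coeffs


def data_reliability(allocation, reliabilities, k):
    """Probability that at least k blocks are reliable."""
    return sum(gfdr_coefficients(allocation, reliabilities)[k:])


SA1 = (2, 2, 1)        # from the excerpt
R = (0.9, 0.85, 0.8)   # ASSUMED reliabilities, not given in the excerpt

coeffs = gfdr_coefficients(SA1, R)
print([round(c, 4) for c in coeffs])
print(round(data_reliability(SA1, R, 3), 3))
```

Multiplying node by node costs time proportional to the number of nodes times the polynomial length, rather than enumerating all 2^|S| node states.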
Context 5
... Lemma 1 and Eq. (3), we know that $DR(SA, k) - DR(SA', k) \le 0$. So the allocation $SA'$ is better than $SA$. This leads to a contradiction with the assumption. Thus Theorem 1 holds. □ For example, considering the scheme $SA_2$ given in Fig. 1(b), and a new scheme $SA_3$ in which nodes $N_1$, $N_2$ and $N_3$ store one, three and one block respectively, we have $DR(SA_2) = 0.9 > DR(SA_3) = 0.85$. Theorem 1 suggests that the more reliable a node is, the more data blocks should be assigned to and stored in it. Based on this logic, is it feasible that we give all the data ...
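As a worked check of the figures quoted above, assume $k = 3$ (the value used in a later excerpt) and node reliabilities $(r_1, r_2, r_3) = (0.9, 0.85, 0.8)$; the reliabilities are not stated in these excerpts and are an assumption chosen because they reproduce the quoted numbers. In $SA_3 = (1, 3, 1)$, nodes $N_1$ and $N_3$ together hold only two blocks, fewer than $k$, so the file is recoverable exactly when $N_2$ is available:

$$
DR(SA_3) = r_2 \cdot 1 + (1 - r_2) \cdot 0 = r_2 = 0.85 .
$$

The quoted $DR(SA_2) = 0.9$ equals $r_1$ under the same assumption, which is what any allocation placing at least three of the five blocks on $N_1$ would yield; the excerpt itself does not show the layout of $SA_2$.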
Context 6
... use the example in Fig. 1(b) for demonstration. By enumeration, it is easy to see that the allocation scheme $SA_1$ is optimal, and $DR(SA_1) = 0.941$. Suppose that $k = 3$; then the maximum data reliability is 0.9 ($< DR(SA_1)$) if we assign more than 3 blocks to any one node. Therefore, Theorem 2 manifests that each node can host at most $k$ redundant data ...
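The enumeration mentioned in this excerpt can be sketched as a brute-force search over every way of splitting five blocks across three nodes. The values n = 5 and k = 3 come from the excerpts; the reliabilities (0.9, 0.85, 0.8) are an assumption that happens to reproduce the quoted 0.941 and 0.9. Function names are hypothetical.

```python
from itertools import product


def data_reliability(allocation, reliabilities, k):
    """Probability that the available nodes hold at least k blocks."""
    total = 0.0
    for states in product([True, False], repeat=len(allocation)):
        prob, blocks = 1.0, 0
        for up, l_i, r_i in zip(states, allocation, reliabilities):
            prob *= r_i if up else 1.0 - r_i
            blocks += l_i if up else 0
        if blocks >= k:
            total += prob
    return total


def allocations(n, nodes):
    """Yield every split of n blocks over `nodes` nodes (zeros allowed)."""
    if nodes == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in allocations(n - first, nodes - 1):
            yield (first,) + rest


R = (0.9, 0.85, 0.8)   # ASSUMED reliabilities, not given in the excerpt
N, K = 5, 3            # n = 5 and k = 3 as in the excerpts

scores = {sa: data_reliability(sa, R, K) for sa in allocations(N, len(R))}
best = max(scores.values())
optimal = sorted(sa for sa, dr in scores.items() if abs(dr - best) < 1e-12)
# Best value reachable when some node holds more than k blocks.
overloaded = max(dr for sa, dr in scores.items() if max(sa) > K)

print(round(best, 3), optimal, round(overloaded, 3))
```

Under these assumed reliabilities the maximum of about 0.941 is reached by (2, 2, 1) and also by its two rearrangements, since in all three cases the file survives exactly when at least two nodes are available. Exhaustive search grows quickly with n and |S|, which is presumably why the paper's theorems aim to shrink the search space.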
Context 7
... in $S \setminus N_m$ hold less than $k$ blocks in total. So the data reliability completely depends on the reliability of $N_m$. Then we have to satisfy case (ii) by assigning $k$ blocks to $N_m$ and zero blocks to other nodes, which allows us to use a minimum redundancy with a maximum data reliability $r_m$. So Theorem 3 holds. □ For the example given in Fig. 1(b), we know that $SA_1$ is optimal and $DR(SA_1) = 0.941$. Then we observe that $\forall l_j \in SA_1,\ l_j \le n - k = 2$. By Theorem 3, if the maximum data reliability $r_m$ among all nodes (for distributed storage) is smaller than the given (global) data reliability $ADR$, then a node is assigned at most $(n - k)$ redundant data ...
Context 8
... From Theorems 4 and 5, we know that one only needs to keep the terms containing powers whose exponent is smaller than $k$ for each multiplication. And finally, we can obtain the data reliability by subtracting the coefficient of the result of the multiplication from 1. We then use the example given before, i.e. the allocation scheme $SA_1$ in Fig. 1, to illustrate the ...
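The truncation described in this excerpt can be sketched as follows: during each multiplication only terms with exponent below k are kept, and the remaining coefficients are summed and subtracted from 1. The per-node factor (1 - r_i) + r_i * b^(l_i) and the reliabilities (0.9, 0.85, 0.8) are assumptions, since the excerpts do not show the paper's exact expression or parameter values. The function name is hypothetical.

```python
def data_reliability_truncated(allocation, reliabilities, k):
    """1 minus the probability that fewer than k blocks are reliable."""
    if k <= 0:
        return 1.0  # zero blocks are always enough
    # low[p] = probability that exactly p (< k) blocks are reliable so far.
    low = [1.0] + [0.0] * (k - 1)
    for l_i, r_i in zip(allocation, reliabilities):
        new = [0.0] * k
        for p, c in enumerate(low):
            new[p] += c * (1.0 - r_i)      # node down: exponent unchanged
            if p + l_i < k:                # drop terms with exponent >= k
                new[p + l_i] += c * r_i    # node up: exponent grows by l_i
        low = new
    return 1.0 - sum(low)


R = (0.9, 0.85, 0.8)   # ASSUMED reliabilities, not given in the excerpt
print(round(data_reliability_truncated((2, 2, 1), R, 3), 3))
```

Because the working polynomial never exceeds k terms, each multiplication costs O(k) regardless of how many blocks a node hosts.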
Context 9
... then have $DR(SA, k_1) \le DR(SA, k_2)$. Hence, Theorem 6 holds. For the scheme $SA_1$ given in Fig. 1(b [7]. □ Proposition 1. Given the total number of redundant blocks $n$ and a data reliability $ADR$ to achieve, if $\forall k$ ...

## Similar publications

Preprint
The surge in demand for cost-effective, durable long-term archival media, coupled with density limitations of contemporary magnetic media, has resulted in synthetic DNA emerging as a promising new alternative. Today, the limiting factor for DNA-based data archival is the cost of writing (synthesis) and reading (sequencing) DNA. Newer techniques tha...

## Citations

... Indeed, the Grid application to the municipality scale can help understand power balances among municipalities, thus providing an in-depth knowledge of its dynamics. On the other hand, it is oriented to reduce the Grid's redundancy, intended as the presence of more than one indicator providing the same piece of information (Huang et al. 2015). More in detail, this task mainly addresses temporal redundancy and, thus, rejects indicators occurring twice with different time horizons when their simultaneous presence doesn't pitch in understanding the ongoing territorial dynamics (Table 2). ...
Article
The ongoing forced reflection on the leading urbanization models' crisis has led to greater attention to marginal areas. In Italy, the scientific and media debate has focused on inner areas that, since 2014, have represented the target of an innovative national cohesion policy aimed at tackling their shrinking dynamics: the National Strategy for Inner Areas (SNAI). Indeed, Italian inner areas are endowed with extraordinary natural capital and settlement models far from urban density. Thus, they seem to respond perfectly to the new raised living needs. However, leaving aside the optimistic rhetoric, strong political and administrative choices are necessary to trigger a `return process' based on this broader attention toward inner areas, thus countering humankind's natural tendency to concentrate on urban realities. In this light, the paper proposes a tool to support SNAI in designing and implementing heritage-based local development strategies to address inner areas' real needs. After a critical reading of the new challenges for planning posed by the pandemic and SNAI's role within them, the contribution moves to frame the THEMA (Tool for Heritage-based Enhancement of Marginal Areas) tool, focusing on specificities of the inner areas as cultural heritage. Finally, the tool's application to a case study, an inner area in Campania Region, allows to outline and discuss its possible benefits for SNAI implementation and its limits.
... On the other hand, to reduce the redundancy of the Grid, intended as the presence of the same piece of information in two different indicators [27,28]. In more detail, this is mainly conducted by addressing temporal redundancy and, thus, rejecting the indicators occurring twice with a different time horizon, when their combined use does not pitch in the interpretation of the phenomena under investigation. ...
Article
The National Strategy for Inner Areas (SNAI) is a public policy designed to tackle depopulation in inner areas, defined according to the distance from centers offering essential services. Such a policy’s success is crucial to address the new challenges for planning brought to light by the COVID-19 pandemic. In this sense, there is a need to adequately support its implementation by providing handy decision support tools, understanding the power balances among municipalities, and defining proper interventions. The Indicator Grid, already used by the SNAI for project areas selection, can answer this need. However, the Grid’s application to support public policy at the municipality level requires reviewing some of its features, such as the indicators’ large number and the impossibility of defining some of them at the municipal scale. Based on these premises, this paper aims at supporting inner areas policies by carrying out a critical analysis of the current SNAI Grid, aimed at improving its effectiveness. It relies on a hybrid methodology that merges qualitative data interpretations and statistical analyses. Thanks to this method, defining a parsimonious Grid by leaving its complexity and information level untouched is possible. The so-defined set of indicators can represent a valuable reference tool in pinpointing priorities for actions or selecting further territorial scopes from the SNAI perspective, even if it still brings some criticalities to be faced.
... The researchers working in the domain of IDS can use this as a guideline by selecting the dataset carefully. The storage allocation system on cloud has been presented by Huang et al. [11] to reduce the data redundancy problem. The redundant storage of data on cloud may lead to the problems like system overhead, overall increase in cost and energy consumption at the cost of increasing reliability. ...
Article
Database is used for storing the data in an easy and efficient format. In recent days large size of data has been generated through number of applications and same has been stored in the database. Considering the importance of data in every sector of digitized world, it is foremost important to secure the data. Hence, database security has been given a prime importance in every organization. Redundant data entries may stop the functioning of the database. Redundant data entries may be inserted in the database because of the absence of primary key or due to incorrect spelled data. This article addresses the solution for database security by protecting the database from redundant data entries based on the concept of Bloom filter. This database security has been obtained by correcting the incorrect spelled data from query values with the help of edit distance algorithm followed by the data redundancy check. This article also presents the performance comparison between proposed technique and MongoDB database for document search functionality.
... IIoT, a cutting-edge industry of huge commercial value, is widely used in design, production, management, and service [16]. IIoT realizes flexible allocation of raw materials, execution of manufacturing process on demand, reasonable optimization of production process through network interconnection and rapid adaptation to the manufacturing environment, and data exchange and system interoperability of industrial resources to achieve efficient utilization of resources, in order to build a new service-driven industrial ecosystem [18,19]. ...
... (3) if e of the "test.txt" (4) break (5) end if (6) for i = 1 to N (7) read the data from "data.txt" to T[i + 1] (8) aver = sum(T)/i + 1; (9) end (10) if (aver < 0) (11) for i = 1 to k (12) aver < aver + T[i] (13) aver < aver/k (14) end (15) else (16) for i = 2 to n (17) temp < aver (18) for j = 0 to k − 1 (19) if i + j ≥ n (20) temp < −1 (21) aver < aver + T[i + j] (22) end if (23) end (24) end if (25) end (26) end if (27) aver < aver/k (28) if |aver − temp| ≥ e (29) put i + j − 1 to "out1.txt" (30) put T[i + j − 1] to "out2.txt" ...
Article
Aiming at solving network delay caused by large chunks of data in industrial Internet of Things, a data compression algorithm based on edge computing is creatively put forward in this paper. The data collected by sensors need to be handled in advance and are then processed by different single packet quantity K and error threshold e for multiple groups of comparative experiments, which greatly reduces the amount of data transmission under the premise of ensuring the instantaneity and effectiveness of data. On the basis of compression processing, an outlier detection algorithm based on isolated forest is proposed, which can accurately identify the anomaly caused by gradual change and sudden change and control and adjust the action of equipment, in order to meet the control requirement. As is shown by experimental simulation, the isolated forest algorithm based on partition outperforms box graph and K-means clustering algorithm based on distance in anomaly detection, which verifies the feasibility and advantages of the former in data compression and detection accuracy.
... A lot of works have been done to study the optimal configuration of a distributed data storage system considering different factors. In [9], the minimal number of homogeneous hardware redundancies (storage devices) achieving a pre-determined system reliability was studied. Since the homogeneous redundancy scheme may increase the cost of data copies, some research works were devoted to analyzing the optimal non-homogeneous redundancy scheme in order to minimize the cost [10][11]. ...
... and 11 / jj t rx a  . (9) and are the costs per hacker's and defender's effort unit in data destruction/protection contest. The probability that computer is not intruded by the hacker is, ...
Article
Distributed data storage systems are widely used to store data with the development of big data and cloud computing. Due to the complicated network environment, the data may risk being destroyed or stolen by an intentional hacker. In order to mitigate the risk of data destruction and data theft, this paper studies the joint optimization of data parts allocation and computers protections to defense a distributed data storage system in order to maximize the system reliability. The whole data is divided into multiple data parts, where copies of each part can be made and allocated onto different computers. Two different cases are considered, where all data parts on a computer will be destroyed in the first case and all data parts on a computer will not only be destroyed but also stolen in the second case, if the computer is intruded by the hacker. The system is reliable if the defender still has all data parts and the hacker does not obtain all data parts. For both cases, the system reliability is evaluated by extending the universal generating function technique. Numerical examples are carried out to illustrate the applications.
... For example, Gonçalves et al. [8] investigate the workload of cloud storage services using Dropbox workloads. Huang et al. [9] study a model to minimize the data redundancy in cloud storage systems, which reduces the amount of data stored. Boullery et al. [10] optimize the utilization of network bandwidth, and Xia et al. [11] use a Markov decision process to decide the storage schedule. ...
... We will now transform Eqs. (9) and (10) to generating functions. Define therefore ...
... That is, 9 ...
Article
Cloud infrastructures are becoming a common platform for storage and workload operations for industries. With increasing rate of data generation, the cloud storage industry has already grown into a multi-billion dollar industry. This industry offers services with very strict service level agreements (SLAs) to insure a high Quality of Service (QoS) for its clients. A breach of these SLAs results in a heavy economic loss for the service provider. We study a queueing model of data backup systems with a focus on the age of data. The age of data is roughly defined as the time for which data has not been backed up and is therefore a measure of uncertainty for the user. We precisely define the performance measure and compute the generating function of its distribution. It is critical to ensure that the tail probabilities are small so that the system stays within SLAs with a high probability. Therefore, we also analyze the tail distribution of the age of data by performing dominant singularity analysis of its generating function. Our formulas can help the service providers to set the system parameters adequately.
... In year 2015, Zhen Huang et al. [7] have developed a storage allocation scheme that not only achieves good data reliability but also minimizes the data redundancy as well. They come with a practical and efficient storage allocation scheme for effective storage of health records based on generating function, which is able to minimize the data redundancy. ...
... Zhen Huang et al. 2015 Developed a storage allocation scheme that provides data reliability and minimizes the data redundancy too. ...
Article
Nowadays as the growth in the use of recent information and communication technologies increased, there is persistently growing healthcare data over the Internet. There are several challenges in web based systems like scalability, availability, etc. There are several categories of services offered on demand over the web. Cloud computing is internet-based computing paradigm, where shared servers can provides storage, computing power, development stages and software to computers and other devices if required. Cloud healthcare, interestingly, tries to change the healthcare delivery model: from doctor-centric to patient-centric, from acute reactive to nonstop preventive, and from inspecting to monitoring.
... Therefore, there is a trade-off between the cost of storage and safety of data. Thus, the optimal cost of storage is the minimum acceptable occupied capacity that will ensure the safety of data [2] [26] [27]. ...
Conference Paper
Cloud database services are used to reduce the cost of storage in information technology fields and provide other benefits such as data accessibility through internet. The single cloud is defined as a group of servers whether one or multiple data centres offered by a single provider. However, moving from single cloud to multi-clouds is reasonable and important for many reasons. The services of single clouds are still subject to outage which affects the availability of the database. In the case of disaster event, the single cloud is subject to data lost partially or fully. The single cloud is predicted to become less popular with customers due to risks of database service availability failure and the possibility of malicious insiders in the single cloud. With Disaster Recovery (DR) in cloud, resources of multiple cloud service providers can be utilized cooperatively by the DR service provider. Therefore, there is a necessity to develop a practical multi-cloud based DR framework with the aim of minimizing backup cost with respect to Recovery Time Objective (RTO) and Recovery Point Objective (RPO) in order to reduce the risk of data loss. The framework attempts to maintain the availability of data by achieving high data reliability, low backup cost, and short recovery and ensure continuity for business before, during and after the disaster incident. This paper aims at proposing a multi-cloud framework maintaining high availability of data before, during and after the disaster occurrence. Besides, ensures the continuity of the database services during and after the disaster.
... The most directly related work to this replication work is complicated process on clouds server. The data replication and request response on cloud server as a static optimization problem on user access [7]. They show that this problem is NP-hard and request delay, which means that present, is no polynomial algorithm that provides an accurate solution. ...
Article
Cloud computing is tweaked suddenly in recent years of the distributed computing paradigm which works "on demand" or "Pay per Use" concept. In cloud computing all the computational resources (like storage, data) are shared among the users. Service Level Agreement (SLA) is connecting the user and the service provider. This agreement defines QoS parameters (like availability, reliability, scalability, memory and cost etc.) Now, most of people and organizations are migrating to use cloud storage. Recently, the popular Internet companies, such as Google, Yahoo, and Microsoft offers more services for millions of users every day. These services are hosted in data centers that contain thousands of servers, as well as power delivery (and backup) and networking infrastructures. Because users demand high availability and low response times, each service is mirrored by multiple data centers that are geographically distributed. Providers' and user' perspective to investigate the issues and to provide cost-effective optimized cloud storage while meeting the reliability and availability requirement throughout the whole Cloud storage process. This research proposal presents RAAES framework which has 3 major component to optimize and to provide efficient cloud Storage. The research issue of this work is significant and has practical value to the Cloud computing technology. Especially, this research could significantly reduce the occupied space and cost jointly with meeting the reliability assurance requirements. Similarly, it could ultimately reduce the request-response time delay while enhancing the availability requirements, hence has a positive impact on promoting the development of Cloud by an efficient Storage.
... The work confirms that automatic reconfiguration can yield substantial benefits which are realizable in practice. Zhen et al. (2015) [16] designed a scheme to minimize data redundancy for highly reliable cloud storage systems. Their evaluations through analysis and experiments validate their claims: the optimum storage allocation scheme can significantly reduce the search space, the calculation process can be simplified and accelerated, and redundancy can be efficiently saved by the scheme. ...
Article
The number of cloud storage users has improved abundantly at recent times. The reason behind is, the Cloud Storage system minimizes the burden of maintenance and it has less storage cost compare with other storage methods. It provides high availability, reliability and it is most suitable for high volume of data storage. In order to provide high availability and reliability, the systems introduce redundancy. In replicated system, the cloud storage services are hosted by multiple geographically distributed data centres. But the file Replication is rendering little bit threat about the Cloud Storage System for the users and for the providers it is a big challenge to offer efficient Data Storage. Since the increasingly expanded utility of Cloud storage, the improvement of resources management in the shortest time to respond to the user's requests and the geographical constraints are of prime importance to both the Cloud service providers and the users. The data replication helps in attractive the data availability which reduces the overall access time of the files, but at the same time it occupies more storage space and storage cost. In order to rectify the above mentioned problems, need to identify the popularity of the file. So this paper proposed new ranking algorithm which lists the most often accessed files and less frequently accessed files. In future the least accessed a file's replications going to be reduced likewise most accessed file's replications going to be increased based on their SLA.