Article

Abstract

Cloud infrastructures are becoming a common platform for storage and workload operations in industry. With the increasing rate of data generation, the cloud storage industry has already grown into a multi-billion dollar business. Providers offer services with strict service level agreements (SLAs) to ensure a high Quality of Service (QoS) for their clients, and a breach of these SLAs results in heavy economic losses for the service provider. We study a queueing model of data backup systems with a focus on the age of data. The age of data is roughly defined as the time for which data has not been backed up and is therefore a measure of uncertainty for the user. We precisely define this performance measure and compute the generating function of its distribution. It is critical to ensure that the tail probabilities are small so that the system stays within its SLAs with high probability. Therefore, we also analyze the tail distribution of the age of data by performing a dominant singularity analysis of its generating function. Our formulas can help service providers set the system parameters adequately.
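A minimal sketch (not the paper's exact result) of how a dominant singularity translates into an SLA check: assume the age-of-data PGF $A(z)=\sum_{n\ge 0}\Pr[\mathrm{Age}=n]\,z^n$ has a dominant real singularity at $z_0>1$, so that $\Pr[\mathrm{Age}=n]\sim C\,z_0^{-n}$ for some constant $C>0$ (both $C$ and $z_0$ would be read off from the generating function). Then
$$\Pr[\mathrm{Age}>n]\;\sim\;\sum_{k>n} C\,z_0^{-k}\;=\;\frac{C\,z_0^{-n}}{z_0-1},$$
and an SLA of the form $\Pr[\mathrm{Age}>n_{\mathrm{SLA}}]\le\varepsilon$ is met asymptotically whenever
$$n_{\mathrm{SLA}}\;\ge\;\frac{\ln\bigl(C/(\varepsilon\,(z_0-1))\bigr)}{\ln z_0}.$$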

... We show how ensuring a low age of data helps ensure that the recovery point objective (RPO) limits are always met. The work done in this chapter resulted in a journal publication [47] and a conference presentation [48]. Once an exact expression for the probability generating function of the distribution of the age of data is obtained, we analyze its tail distribution. ...
Article
Full-text available
IT technologies related to Industry 4.0 facilitate the implementation of a framework for sustainable manufacturing. At the same time, Industry 4.0 integrates the IT processes and systems of production companies with the IT solutions of cooperating companies that support the complete life cycle of a manufactured product. The implementation of sustainable manufacturing therefore implies a rapid increase in the number of interfaces between the IT solutions of cooperating companies, which in turn raises security concerns among manufacturing company executives. The lack of a recognized methodology supporting the decision-making process of choosing the right methods and means of cybersecurity is, in effect, a significant barrier to the development of sustainable manufacturing; as a result, the propagation of Industry 4.0 technologies and the implementation of the sustainable manufacturing framework in companies are slowing down significantly. The main novelty of this article, which addresses the above deficiencies, is a ranking, created with the combined DEMATEL and ANP (DANP) and PROMETHEE II methods, of the proposed three groups of measures, seven dimensions and twenty criteria to be implemented in companies to ensure cybersecurity in Industry 4.0 and to facilitate the implementation of sustainable production principles. The article also indicates the contribution of Industry 4.0 components and the proposed cybersecurity scheme to achieving the Sustainable Development Goals, reducing the carbon footprint of companies and introducing circular economy elements. Using DANP and PROMETHEE II, it can be concluded that: (i) the major criterion of cybersecurity in companies is the validation and maintenance of electronic signatures and seals; (ii) the most crucial area of cybersecurity is network security; and (iii) the most significant group of measures in this regard are technological measures.
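The ranking step described above can be illustrated with a minimal Python sketch of PROMETHEE II using the "usual" preference function; the alternatives, criteria and weights below are purely illustrative assumptions (the DANP step that would produce the weights is not shown).

import numpy as np

def promethee_ii(scores, weights, maximize):
    """Rank alternatives by PROMETHEE II net outranking flows.

    scores:   (n_alternatives x n_criteria) performance matrix
    weights:  criterion weights summing to 1 (e.g. obtained from DANP)
    maximize: per-criterion flag, True if larger values are preferred
    Uses the 'usual' preference function: P(d) = 1 if d > 0 else 0.
    """
    n, m = scores.shape
    pi = np.zeros((n, n))                    # aggregated preference indices
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            pref = 0.0
            for j in range(m):
                d = scores[a, j] - scores[b, j]
                if not maximize[j]:
                    d = -d
                pref += weights[j] * (1.0 if d > 0 else 0.0)
            pi[a, b] = pref
    phi_plus = pi.sum(axis=1) / (n - 1)      # positive (leaving) flows
    phi_minus = pi.sum(axis=0) / (n - 1)     # negative (entering) flows
    return phi_plus - phi_minus              # net flows: higher = better

# Illustrative data only: 3 cybersecurity measures scored on 3 criteria.
scores = np.array([[0.8, 0.6, 0.7],
                   [0.5, 0.9, 0.6],
                   [0.7, 0.7, 0.9]])
weights = np.array([0.5, 0.3, 0.2])          # e.g. DANP-derived weights
net = promethee_ii(scores, weights, maximize=[True, True, True])
print(sorted(zip(net, ["M1", "M2", "M3"]), reverse=True))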
Article
Full-text available
Information and Communication Technology (ICT) has become critical to business success, so organizations need to adopt additional measures to ensure the availability of their services. However, such services are often not planned, analyzed and monitored, which affects the quality assurance given to customers. Backup is the service addressed in this study, with the automated data backup systems in operation at the Federal University of Itajubá, Brazil, as the object of study. The main objective of this research was to present a logical sequence of steps to obtain short-term forecast models that estimate the point at which each recording medium reaches its storage capacity limit. The input data was collected from the metadata generated by the backup system over a two-year window. For the implementation of the models, simple univariate linear regression was employed, combined in some cases with simple segmented linear regression; to find the breakpoint, an approach based on residual analysis was applied. The results obtained by the iterative implementation of the proposed algorithm showed good adherence to the characteristics of the analyzed series in terms of accuracy measures, regression significance, residual normality checked through control charts, and model fit, among others. As a result, an algorithm was developed for integration into automated backup systems using the methodology described in this study.
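The forecasting idea can be sketched in a few lines of Python (an illustration of the general technique, not the authors' algorithm; the synthetic usage series and the capacity value are assumptions).

import numpy as np

# Illustrative series: cumulative storage used (TB) on one recording medium,
# sampled daily from the backup system's metadata.
days = np.arange(0, 120)
used_tb = 2.0 + 0.05 * days + np.random.normal(0.0, 0.2, size=days.size)

# Simple univariate linear regression: used = a + b * day.
b, a = np.polyfit(days, used_tb, deg=1)

# Forecast the day on which the medium reaches its capacity limit.
capacity_limit_tb = 12.0                      # assumed nominal capacity
days_to_full = (capacity_limit_tb - a) / b if b > 0 else float("inf")
print(f"Estimated day the medium fills up: {days_to_full:.0f}")

# The study also uses segmented regression when the residuals reveal a
# breakpoint (e.g. a change in backup policy); one simple option is to
# refit the model using only the observations after the detected breakpoint.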
Article
Full-text available
Cloud storage systems are currently very popular with many companies offering services, including worldwide providers such as Dropbox, Microsoft and Google. These companies as well as providers entering the market could greatly benefit from a deep understanding of typical workload patterns their services have to face in order to develop cost-effective solutions. Yet, despite recent studies of usage and performance of these systems, the underlying processes that generate workload to the system have not been deeply studied.
Article
Full-text available
When comparing Cloud and non-Cloud storage it can be difficult to ensure that the comparison is fair. In this paper we examine the process of setting up such a comparison and the metric used. Performance comparisons of Cloud and non-Cloud systems, deployed for biomedical scientists, have been conducted to identify improvements in efficiency and performance. Prior to the experiments, network latency, file size and job failures were identified as factors which degrade performance, and experiments were conducted to understand their impacts. Organizational Sustainability Modeling (OSM) is used before, during and after the experiments to ensure fair comparisons are achieved. OSM defines the actual and expected execution time and risk control rates, and is used to understand key outputs related to both Cloud and non-Cloud experiments. Forty experiments on both Cloud and non-Cloud systems were undertaken with two case studies. The first case study focused on transferring and backing up 10,000 files of 1 GB each, and the second on transferring and backing up 1000 files of 10 GB each. Results showed that, first, the actual and expected execution time on the Cloud was lower than on the non-Cloud system. Second, there was more than 99% consistency between the actual and expected execution time on the Cloud, while no comparable consistency was found on the non-Cloud system. Third, the improvement in efficiency was higher on the Cloud than on the non-Cloud system. OSM is the metric used to analyze the collected data and provided synthesis and insights for the data analysis and visualization of the two case studies.
Article
Full-text available
Cloud storage systems provide reliable service to users by widely deploying redundancy schemes, which bring high reliability to data storage but also introduce significant overhead in terms of storage cost and energy consumption. The core issue is how to balance data redundancy against data reliability. Optimizing both concurrently is difficult, so the common approach is to fix one as a constraint and optimize the other. In this paper we pursue a storage allocation scheme that minimizes data redundancy while achieving a given (high) data reliability. For this purpose, we provide a novel model based on generating functions. With this model, we propose a practical and efficient storage allocation scheme, which is proved to minimize the data redundancy. We analytically demonstrate that the suggested solution brings several advantages, in particular a reduction of the search space and an acceleration of the computation. We also assess the savings in data redundancy experimentally, using availability traces collected from the real world, which show that the reduction of data redundancy achieved by our solution can exceed 30% compared to a heuristic method recently proposed in the research community.
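A minimal sketch of the generating-function view of availability (illustrative only, not the paper's allocation scheme): if a data object is stored as n blocks on nodes with independent availabilities p_i, and any k blocks suffice to reconstruct it, the product of the per-node factors (q_i + p_i z) gives the distribution of the number of available blocks, from which the achieved reliability can be read off and compared against the target.

import numpy as np

def availability_distribution(node_probs):
    """Return the distribution of the number of available blocks.

    Multiplies the per-node generating functions (q_i + p_i * z); the
    coefficient of z^j in the product is P(exactly j blocks available).
    """
    coeffs = np.array([1.0])                      # the constant polynomial 1
    for p in node_probs:
        coeffs = np.convolve(coeffs, np.array([1.0 - p, p]))
    return coeffs

def reliability(node_probs, k):
    """P(at least k of the stored blocks are available)."""
    dist = availability_distribution(node_probs)
    return dist[k:].sum()

# Illustrative placement: 6 blocks on nodes with these availabilities,
# any 4 of which suffice to reconstruct the data (e.g. a (6, 4) erasure code).
print(reliability([0.9, 0.9, 0.85, 0.95, 0.8, 0.9], k=4))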
Article
In recent years, the Internet of Things (IoT) has emerged as a new opportunity. Devices such as smartphones, transportation facilities, public services, and home appliances act as data-creating devices, and the electronic devices around us support our daily life: wrist watches, emergency alarms, garage doors, and home appliances such as refrigerators, microwaves, air conditioning, and water heaters are connected to an IoT network and controlled remotely. Methods such as big data and data mining can be used to improve the efficiency of the IoT and to address the challenges of storing, transmitting, analyzing, and processing the large data volumes generated on the IoT. The aim of this study is to survey the research on IoT that uses big data and data mining methods, in order to identify subjects that must be emphasized more in current and future research. To this end, the article follows the conference and journal articles published in the IoT-big data and IoT-data mining areas between 2010 and August 2017. A combination of systematic mapping and literature review was used to create the intended review article, and 44 articles were studied. These articles are divided into three categories: architecture and platform, framework, and application. A summary of the methods used in the IoT-big data and IoT-data mining areas is presented in these three categories to provide a starting point for future researchers.
Article
ScanZoom, an application developed by Scanbuy, allows mobile camera phones to launch services and applications simply by taking a photo of a barcode. The large database required for the development of ScanZoom is provided by Amazon's freely available web service API with its vast amount of information. ScanZoom gives users instant access not only to the prices that Amazon offers for products in online shopping, but also to the prices offered by retailers in the Amazon network. Specific product information is accompanied by the item's packaging information, such as a DVD movie's run time, a CD's release date, and its track list.
Conference Paper
The explosive growth of data generation and the increasing reliance of business analysis on massive data make data loss more damaging than ever before. Nowadays many organizations rely on cloud services for keeping their valuable data, and it is a critical issue for cloud service providers to protect the data of individual users securely and effectively. To protect the data in a system with multiple data sources, the backup schedule plays an important role in achieving the desired level of data protection while minimizing the impact on system operation. In this paper we investigate the use of a Markov Decision Process (MDP) to guide the scheduling of data backup operations and propose a framework to automatically generate an MDP instance from system specifications and data protection requirements. We then demonstrate the benefits of the MDP approach.
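A minimal value-iteration sketch of the idea (the states, costs and probabilities below are illustrative assumptions, not the framework's generated MDP): the state tracks how long a data source has gone without a backup, the action is whether to back up now, and the optimal policy trades the operational cost of a backup against an expected loss that grows with the age of the unprotected data.

import numpy as np

# Illustrative MDP: state s = periods since last backup (capped at S_MAX),
# action 0 = do nothing, action 1 = run a backup this period.
S_MAX = 20
BACKUP_COST = 5.0        # operational impact of running a backup
LOSS_RATE = 0.02         # per-period probability of a data-loss event
LOSS_COST_PER_AGE = 3.0  # damage grows with the age of unprotected data
GAMMA = 0.95             # discount factor

def cost(s, a):
    expected_loss = LOSS_RATE * LOSS_COST_PER_AGE * s
    return expected_loss + (BACKUP_COST if a == 1 else 0.0)

def next_state(s, a):
    return 0 if a == 1 else min(s + 1, S_MAX)

# Value iteration for the discounted expected cost.
V = np.zeros(S_MAX + 1)
for _ in range(500):
    V = np.array([min(cost(s, a) + GAMMA * V[next_state(s, a)]
                      for a in (0, 1)) for s in range(S_MAX + 1)])

policy = [min((0, 1), key=lambda a: cost(s, a) + GAMMA * V[next_state(s, a)])
          for s in range(S_MAX + 1)]
print("backup when age >=", policy.index(1) if 1 in policy else "never")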
Article
Parallel Discrete Event Simulation (PDES) is based on the partitioning of the simulation model into distinct Logical Processes (LPs), each one modeling a portion of the entire system, which are allowed to execute simulation events concurrently. This ...
Article
Random quantities of interest in operations research models can often be determined conveniently in the form of transforms. Hence, numerical transform inversion can be an effective way to obtain desired numerical values of cumulative distribution functions, probability density functions and probability mass functions. However, numerical transform inversion has not been widely used. This lack of use seems to be due, at least in part, to good simple numerical inversion algorithms not being well known. To help remedy this situation, in this paper we present a version of the Fourier-series method for numerically inverting probability generating functions. We obtain a simple algorithm with a convenient error bound from the discrete Poisson summation formula. The same general approach applies to other transforms.
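A short Python sketch of such a Fourier-series inversion for PGFs (a standard trapezoidal-rule version of the technique; the geometric PGF used as a check is just an illustration): the 2k-point trapezoidal rule is applied to Cauchy's coefficient formula on a circle of radius r < 1, with r chosen so that the aliasing error is roughly 10^-gamma.

import cmath

def invert_pgf(G, k, gamma=8):
    """Recover p_k from its PGF G(z) = sum_j p_j z^j.

    Trapezoidal rule with 2k points for Cauchy's coefficient formula on a
    circle of radius r < 1; taking r = 10**(-gamma/(2k)) keeps the aliasing
    error near 10**(-gamma).
    """
    if k == 0:
        return G(0.0).real
    r = 10.0 ** (-gamma / (2.0 * k))
    s = G(r).real + ((-1) ** k) * G(-r).real
    s += 2.0 * sum(((-1) ** j) * G(r * cmath.exp(1j * cmath.pi * j / k)).real
                   for j in range(1, k))
    return s / (2.0 * k * r ** k)

# Check against a PGF with known coefficients: a geometric distribution on
# {0, 1, 2, ...} with success probability p has G(z) = p / (1 - (1 - p) z)
# and p_k = p (1 - p)**k.
p = 0.3
G = lambda z: p / (1.0 - (1.0 - p) * z)
print(invert_pgf(G, 5), p * (1.0 - p) ** 5)   # should agree to roughly 1e-8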
Article
Traffic with self-similar and heavy-tailed characteristics has been widely reported in communication networks, yet, the state-of-the-art of analytically predicting the delay performance of such networks is lacking. This work addresses heavy-tailed traffic that has a finite first moment, but no second moment, and presents end-to-end delay bounds for such traffic. The derived performance bounds are non-asymptotic in that they do not assume a steady state, large buffer, or many sources regime. The analysis follows a network calculus approach where traffic is characterized by envelope functions and service is described by service curves. The system model is a multi-hop path of fixed-capacity links with heavy-tailed self-similar cross traffic at each node. A key contribution of the paper is a probabilistic sample-path bound for heavy-tailed arrival and service processes, which is based on a scale-free sampling method. The paper explores how delay bounds scale as a function of the length of the path, and compares them with lower bounds. A comparison with simulations illustrates pitfalls when simulating self-similar heavy-tailed traffic, providing further evidence for the need of analytical bounds.
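One such pitfall can be illustrated with a small Python experiment (a toy example under assumed parameters, not taken from the paper): for Pareto-distributed burst sizes with tail index alpha = 1.5 the mean is finite but the variance is infinite, so sample means converge slowly and the empirical variance keeps growing with the run length, which makes short simulation runs unreliable.

import numpy as np

rng = np.random.default_rng(1)
alpha, x_m = 1.5, 1.0                    # tail index in (1, 2): finite mean, infinite variance
true_mean = alpha * x_m / (alpha - 1.0)  # = 3.0 for these parameters

for n in (10**3, 10**5, 10**7):
    # Pareto(alpha) samples via inverse transform: X = x_m * U**(-1/alpha), U in (0, 1]
    u = 1.0 - rng.random(n)
    x = x_m * u ** (-1.0 / alpha)
    print(f"n={n:>8}: sample mean={x.mean():7.3f} (true {true_mean}), "
          f"sample variance={x.var():12.1f}")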
How the cloud enables rapid growth in SMBs
  • Deloitte
Deloitte, Small business, big technology: How the cloud enables rapid growth in SMBs, 2014, (https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Technology-Media-Telecommunications/gx-tmt-small-business-big-technology.pdf).
Data age 2025: The evolution of data to life-critical. Don't focus on big data
  • D Reinsel
  • J Gantz
  • J Rydning
D. Reinsel, J. Gantz, J. Rydning, Data age 2025: The evolution of data to life-critical. Don't focus on big data, 2017, (https://www.seagate.com/our-story/data-age-2025/).
IT as a service: from build to consume, McKinsey & Company
  • I S Elumalai
  • S Tandon
I.S. Arul Elumalai, S. Tandon, IT as a service: from build to consume, McKinsey & Company, September 2016, (https://www.mckinsey.com/industries/high-tech/our-insights/it-as-a-service-from-build-to-consume).
Analytic Combinatorics
  • P Flajolet
  • R Sedgewick
P. Flajolet, R. Sedgewick, Analytic Combinatorics, 1, Cambridge University Press, New York, NY, USA, 2009.