I suggest you try the real data provided by Google. The Google workload traces were collected from a large cloud system (more than 12,500 compute nodes) over 29 days. The traces contain more than 25,000,000 tasks of different types, belonging to about 930 users. Real workload traces can provide a very high level of realism when used directly in performance evaluation experiments.
More details about this data, and how to download it, are available in the report by C. Reiss and J. Wilkes: "Google Cluster-Usage Traces: Format and Schema", Google Inc., version 2, 2013.
I need the workload of VMs. The Google workload traces were collected from PMs. Is that correct? Indeed, I need the resource consumption of VMs. How do I compute resource consumption based on tasks?
To the best of my knowledge, the Google workload was not collected from PMs.
Each row in the workload represents one task, and a set of parameters is associated with each task (as I remember, 20 parameters per task, e.g. cycles per instruction (CPI) and so on). So the parameters relate to tasks, not to PMs.
I can send you a sample if you are interested in this workload.
Thank you very much. I would be grateful if you could send me a sample. Could you explain to me how I should treat tasks as VMs and compute their resource consumption?
Please find attached a sample of the Google trace. As you will see, it is an Excel file: each row represents a task, and each column represents a feature or parameter of the task. A detailed description of the task parameters is available in the report I mentioned to you before.
Let me explain the relation among task, VM, and PM:
To execute or serve a task in the cloud, the first process to be performed is VM allocation, which is the process of allocating (mapping) a VM with a specific configuration to the task. Many algorithms and policies have been proposed to solve the VM allocation problem. The allocated VM must meet the task's QoS.
After VM allocation, the next step is VM placement, which is the process of placing (mapping) the VM on its best-fit PM. Many algorithms and strategies have also been proposed to solve the VM placement problem and meet certain goals.
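As a rough illustration of the row-per-task, column-per-parameter layout, here is a minimal Python sketch that attaches names to the columns of a headerless CSV export. The column names and the sample values are made up for illustration; the real names and their order should be taken from the format and schema report.

```python
import csv
import io

# Hypothetical column names, for illustration only. The real names and
# their order must come from the trace's format and schema report.
COLUMNS = ["job_id", "task_index", "cpu_request", "memory_request"]

# Made-up rows standing in for a headerless export of the trace sample.
SAMPLE = """\
1,0,0.25,0.10
1,1,0.50,0.20
2,0,0.125,0.05
"""


def load_tasks(text):
    """Return one dict per row (task), keyed by the column names above."""
    tasks = []
    for row in csv.reader(io.StringIO(text)):
        task = dict(zip(COLUMNS, row))
        # Convert the numeric parameters from strings to floats.
        task["cpu_request"] = float(task["cpu_request"])
        task["memory_request"] = float(task["memory_request"])
        tasks.append(task)
    return tasks


tasks = load_tasks(SAMPLE)
total_cpu = sum(task["cpu_request"] for task in tasks)
```

With the rows in this form, each task's parameters can be looked up by name instead of by column position.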
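The two steps above can be sketched in a few lines of Python. This is only one simple choice among the many algorithms that exist: allocation picks the smallest VM type that covers the task's requirements, and placement uses best fit on CPU. All class names, fields, and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    cpu: float  # required CPU cores (illustrative unit)
    mem: float  # required memory in GB (illustrative unit)


@dataclass
class VMType:
    name: str
    cpu: float
    mem: float


@dataclass
class PM:
    name: str
    free_cpu: float
    free_mem: float
    vms: list = field(default_factory=list)


def allocate_vm(task, vm_types):
    """Step 1, VM allocation: smallest VM type that meets the task's needs."""
    fitting = [v for v in vm_types if v.cpu >= task.cpu and v.mem >= task.mem]
    if not fitting:
        return None
    return min(fitting, key=lambda v: (v.cpu, v.mem))


def place_vm(vm, pms):
    """Step 2, VM placement: best fit, i.e. least CPU left over on the PM."""
    candidates = [p for p in pms if p.free_cpu >= vm.cpu and p.free_mem >= vm.mem]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: p.free_cpu - vm.cpu)
    best.free_cpu -= vm.cpu
    best.free_mem -= vm.mem
    best.vms.append(vm.name)
    return best


vm_types = [VMType("small", 1, 2), VMType("medium", 2, 4), VMType("large", 4, 8)]
pms = [PM("pm-a", free_cpu=8, free_mem=16), PM("pm-b", free_cpu=2, free_mem=4)]

vm = allocate_vm(Task(cpu=1.5, mem=3), vm_types)
host = place_vm(vm, pms)
```

Here the task is mapped to the "medium" VM type, and best fit places that VM on "pm-b" because it leaves the least CPU unused.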
Please do not hesitate to ask if you have any further questions.
Thank you very much for the attachment and your nice explanation. According to some assumptions in different papers, could we assume that each task is mapped to one VM? In other words, could we consider a task as a VM?
Thank you for your good explanation. I now understand how VMs are allocated to tasks. So, I can assume that one VM is allocated (or mapped) to one task without loss of generality.
I think the important difference between the two VM allocation policies is in resource consumption. In most papers, the space-shared policy is usually considered for simplicity. Could I assume that VMs are allocated based on the space-shared policy? Which policy is more common?
Yes, this assumption is valid. You can assume that your work is based on the space-shared policy.
What do you mean by "resource consumption"? To my knowledge, there is no such term in the field.
However, in a real virtualized cloud computing environment, both policies are used, depending on the task's requirements. Both are applied at the same time, and a VM may switch from one policy to another during execution.
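To make the difference between the two policies concrete, here is a minimal Python sketch. It assumes the usual simulator-style meaning of the terms, which the thread does not spell out: under space-shared, tasks run one after another with the full capacity; under time-shared, all unfinished tasks split the capacity equally. The function names, the single-core setting, and the numbers are illustrative.

```python
def space_shared_finish_times(lengths, mips):
    """Tasks run one at a time, in the given order, at full speed."""
    finish = []
    clock = 0.0
    for length in lengths:
        clock += length / mips
        finish.append(clock)
    return finish


def time_shared_finish_times(lengths, mips):
    """All unfinished tasks share the capacity equally at every instant."""
    finish = [0.0] * len(lengths)
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    clock = 0.0
    done = 0.0  # work already completed by every task still running
    remaining = len(lengths)
    for i in order:
        # Each running task progresses at mips / remaining, so finishing
        # the next-shortest task takes (its leftover work) * remaining / mips.
        clock += (lengths[i] - done) * remaining / mips
        done = lengths[i]
        finish[i] = clock
        remaining -= 1
    return finish


lengths = [1000, 2000]  # task lengths in million instructions (made up)
mips = 1000             # capacity in million instructions per second

space = space_shared_finish_times(lengths, mips)  # [1.0, 3.0]
time_ = time_shared_finish_times(lengths, mips)   # [2.0, 3.0]
```

Both policies finish all work at the same moment, but the short task completes later under time-shared because it shares capacity with the long one.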
Good luck in your study, and please feel free to ask.
The main focus of my research is predicting the resource usage of VMs. Based on the predicted results, we should allocate appropriate resources to VMs so that QoS is satisfied and SLA violations are avoided.
I have another question about disk and network I/O: given the disk read/write throughput (or the network received/transmitted throughput) of VMs, how can I compute the disk (or network) utilization of each VM?
When scarce cloud resources such as virtual machines (VMs) are being utilized, the cloud users who intend to use them can be referred to as resource consumers. Therefore, the efficient mapping of cloudlets or tasks to VMs can be called resource consumption.
Hi, I'm working on a failure-aware virtual machine scheduling technique in cloud computing, and I'm going to use Google's data set. In addition, I want to use machine learning techniques.
But after reading a lot, I have a lot of questions.
1) Why aren't the data columns named? I don't understand what each column is about.
2) Does this data have a label? Or do I need to tag failure data first?
3) Is it possible to implement this whole project in MATLAB?
Or should I use CloudSim, and does CloudSim have a predictive module?
If not, how do I transfer the prediction output from MATLAB to CloudSim?
4) Do you have a suggestion for a video tutorial on working with CloudSim?
I have not worked with Google's data set. If you want to focus on VM scheduling, you can assume that the results have been predicted precisely. That is, you have a black-box predictor that predicts whatever you need.