University of Sousse
Question
Asked 1 October 2016
Where can I find real workload traces of cloud VMs?
I am working on workload prediction for cloud VMs and need real workload traces.
The traces should include the CPU, memory, and network demand/usage of each VM.
It would be great if the traces also included:
1) Power/energy consumption per VM and host
2) The total capacity of each resource type (CPU, memory, ...) of the servers that host these VMs, and the server specifications (number of CPUs, installed RAM, ...).
I urgently need such traces and would appreciate it if anyone could help or direct me to where I can find such data. I cannot use simulated data.
Most recent answer
Can you please guide me on how to use the Google 2019 traces in CloudSim?
Popular answers (1)
Mälardalen University and Dalarna University
Dear Maryam,
I suggest trying the real data provided by Google. The Google workload traces were collected from a large cloud system (more than 12,500 compute nodes) over 29 days. The traces contain more than 25,000,000 tasks of different types, belonging to about 930 users. Real workload traces provide a very high level of realism when used directly in performance evaluation experiments.
More details about this data, and how to download it, are available in the report by C. Reiss and J. Wilkes: "Google Cluster-Usage Traces: Format and Schema", Google Inc., version 2, 2013.
5 Recommendations
All Answers (27)
Intel
Here's one from Delft that looks like it can meet your needs. They request crediting them when publishing results.
1 Recommendation
Arak University
Dear Auday,
I need the workload of VMs. The Google workload traces were collected from PMs. Is that correct? What I actually need is the resource consumption of VMs. How do I compute resource consumption based on tasks?
Best
Mälardalen University and Dalarna University
Dear Maryam,
To the best of my knowledge, the Google workload was not collected from PMs.
Each row in the workload represents one task, and a set of parameters is associated with each task (as I remember, 20 parameters, e.g. cycles per instruction (CPI) and so on). So the parameters relate to tasks, not to PMs.
I can send you a sample if you are interested in this workload.
Regards.
1 Recommendation
Arak University
Dear Auday,
Thank you very much. I would be grateful if you could send me a sample. Could you explain how I should treat tasks as VMs and compute their resource consumption?
Mälardalen University and Dalarna University
Dear Maryam,
Please find attached a sample of the Google trace. As you will see, it is an Excel file: each row represents a task, and each column represents a feature or parameter of the task. A detailed description of the task parameters is available in the report I mentioned before.
Let me explain the relation among task, VM, and PM:
To execute or serve a task in the cloud, the first step is VM allocation, which is the process of allocating (mapping) a VM with a specific configuration to the task. Many algorithms and policies have been proposed to solve the VM allocation problem. The allocated VM must meet the task's QoS.
After VM allocation, the next step is VM placement, which is the process of placing (mapping) the VM onto its best-fit PM. Many algorithms and strategies have also been proposed to solve the VM placement problem and meet certain goals.
Please do not hesitate to ask if you have any further questions.
Auday
2 Recommendations
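To make the placement step described above concrete, here is a minimal sketch of one possible best-fit rule. It is an illustration only, not the method used by Google or by any particular simulator. The `Host` and `VM` classes, the host names, and the tie-breaking rule (least leftover CPU, then least leftover RAM) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine (PM) with its remaining free capacity (hypothetical model)."""
    name: str
    cpu_free: float  # free CPU cores
    ram_free: float  # free RAM in GB
    vms: list = field(default_factory=list)


@dataclass
class VM:
    """A VM request: the configuration already chosen in the allocation step."""
    name: str
    cpu: float
    ram: float


def place_best_fit(vm, hosts):
    """Place vm on the feasible host that is left with the least spare capacity.

    Returns the chosen host, or None if no host can fit the VM.
    """
    feasible = [h for h in hosts if h.cpu_free >= vm.cpu and h.ram_free >= vm.ram]
    if not feasible:
        return None
    # Assumed tie-breaking: smallest leftover CPU first, then smallest leftover RAM.
    best = min(feasible, key=lambda h: (h.cpu_free - vm.cpu, h.ram_free - vm.ram))
    best.cpu_free -= vm.cpu
    best.ram_free -= vm.ram
    best.vms.append(vm.name)
    return best


hosts = [Host("pm-a", cpu_free=8, ram_free=32), Host("pm-b", cpu_free=4, ram_free=16)]
placements = {}
for vm in [VM("vm-1", 2, 4), VM("vm-2", 4, 8), VM("vm-3", 4, 8)]:
    host = place_best_fit(vm, hosts)
    placements[vm.name] = host.name if host else None
print(placements)
```

Other placement goals (for example, minimizing energy or balancing load) would only change the key used to rank the feasible hosts.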
Arak University
Dear Auday,
Thank you very much for the attachment and your nice explanation. Following the assumptions made in several papers, could we assume that each task is mapped to one VM? In other words, could we consider a task as a VM?
Mälardalen University and Dalarna University
Dear Maryam,
Your two assumptions are totally different.
For simplicity and without loss of generality, you can assume that one VM is allocated (or mapped) to one task. This is what is called VM allocation.
But you cannot consider the VM as a task. Tasks request VMs in order to be executed.
I think you may need to know the following related to VM allocation:
There are two main policies for allocating VMs to tasks in cloud computing environments:
- Space-shared policy: The result is a VM with one or more cores
- Time-shared policy: The result is a core that holds two or more VMs
Regards,
Auday
2 Recommendations
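The two policies listed above can be sketched as follows. This is one simplified reading of them, not the API of CloudSim or any other tool: under the space-shared sketch each VM receives dedicated cores (VMs that do not fit wait with zero capacity), and under the time-shared sketch all VMs share the cores and are scaled down proportionally when demand exceeds capacity. The function names, the MIPS figures, and the proportional-scaling rule are assumptions for illustration.

```python
def space_shared(total_cores, core_mips, vm_core_requests):
    """Give each VM dedicated cores in arrival order.

    VMs that do not fit in the remaining cores get 0 MIPS (they have to wait).
    """
    free = total_cores
    shares = {}
    for vm, cores in vm_core_requests.items():
        if cores <= free:
            free -= cores
            shares[vm] = cores * core_mips
        else:
            shares[vm] = 0.0
    return shares


def time_shared(total_cores, core_mips, vm_core_requests):
    """Run all VMs concurrently on the same cores.

    If total demand exceeds capacity, every VM is scaled down by the same factor.
    """
    capacity = total_cores * core_mips
    demand = sum(vm_core_requests.values()) * core_mips
    scale = min(1.0, capacity / demand) if demand else 0.0
    return {vm: cores * core_mips * scale for vm, cores in vm_core_requests.items()}


# Hypothetical host with 4 cores of 1000 MIPS each, and three VM requests (in cores).
requests = {"vm-1": 2, "vm-2": 1, "vm-3": 2}
space = space_shared(4, 1000.0, requests)
shared = time_shared(4, 1000.0, requests)
print(space)
print(shared)
```

With these numbers, the space-shared sketch leaves vm-3 waiting, while the time-shared sketch runs all three VMs at 80% of their requested capacity.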
Arak University
Dear Auday,
Thank you for your good explanation. I now fully understand how VMs are allocated to tasks. So, I can assume that one VM is allocated (or mapped) to one task without loss of generality.
I think the important difference between the two VM allocation policies lies in resource consumption. In most papers, for simplicity, the space-shared policy is usually considered. Could I assume that VMs are allocated based on the space-shared policy? Which policy is more common?
Best
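Under the one-VM-per-task assumption discussed above, a per-VM usage series can be obtained by grouping the task usage records by task identity. The sketch below is only an illustration: the field names (`job_id`, `task_index`, `start_time`, `cpu_rate`, `mem_usage`) and the sample values are hypothetical and should be checked against the trace's schema document before use.

```python
from collections import defaultdict


def per_vm_series(rows):
    """Group usage records by (job_id, task_index), treating each task as one VM.

    Returns {vm_id: [(start_time, cpu_rate, mem_usage), ...]} sorted by time.
    """
    series = defaultdict(list)
    for r in rows:
        vm_id = f"{r['job_id']}-{r['task_index']}"
        series[vm_id].append((r["start_time"], r["cpu_rate"], r["mem_usage"]))
    return {vm: sorted(points) for vm, points in series.items()}


# Made-up records standing in for rows of a task usage table.
rows = [
    {"job_id": 42, "task_index": 0, "start_time": 900, "cpu_rate": 0.05, "mem_usage": 0.02},
    {"job_id": 42, "task_index": 0, "start_time": 600, "cpu_rate": 0.03, "mem_usage": 0.02},
    {"job_id": 42, "task_index": 1, "start_time": 600, "cpu_rate": 0.10, "mem_usage": 0.04},
]
series = per_vm_series(rows)
for vm_id, points in series.items():
    print(vm_id, points)
```

Each resulting series can then be fed to a prediction model as the usage history of one VM.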
Mälardalen University and Dalarna University
Yes, this assumption is valid now. You can assume that your work is based on the space-shared policy.
What do you mean by "resource consumption"? To my knowledge, there is no such term in the field.
However, in a real virtualized cloud computing environment, both policies are used, depending on the task's requirements. Both are applied at the same time, and a VM may be switched from one policy to the other during execution.
Good luck in your study, and please feel free to ask.
Regards,
Auday
2 Recommendations
Arak University
Dear Auday,
The main focus of my research is on predicting the resource usage of VMs. Based on the predicted results, we should allocate appropriate resources to VMs in such a way that QoS is satisfied and SLA violations are avoided.
Best
1 Recommendation
Arak University
Dear Auday,
I have another question about disk and network I/O: if the disk read/write throughput (or the network received/transmitted throughput) of each VM is available, how can I compute the disk (or network) utilization of each VM?
Thanks a lot
Best,
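One common way to approach the question above is to divide the observed throughput by the capacity available to the VM. The sketch below assumes that this capacity is known (for example, a provisioned bandwidth limit), which the traces may not provide. The function name, the units, and the sample values are hypothetical.

```python
def utilization(read_kbps, write_kbps, capacity_kbps):
    """Fraction of the device bandwidth in use, clipped to the range [0, 1].

    capacity_kbps is an assumed, externally known limit for the VM.
    """
    if capacity_kbps <= 0:
        raise ValueError("capacity must be positive")
    return min(1.0, max(0.0, (read_kbps + write_kbps) / capacity_kbps))


# Made-up (read, write) throughput samples in KB/s for one VM.
samples = [(1200.0, 300.0), (5000.0, 2500.0), (0.0, 0.0)]
disk_util = [utilization(r, w, capacity_kbps=10_000.0) for r, w in samples]
print(disk_util)
```

For a full-duplex network link, it may be more appropriate to normalize received and transmitted throughput separately, each against the link capacity in its own direction.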
Mälardalen University and Dalarna University
Great,
I suggest using another term instead of "resource consumption". Usually, in our field, "consumption" relates to energy rather than to resources.
Best of luck ..
2 Recommendations
Singapore Institute of Technology
You may have a look at the Google Cluster Dataset!
I hope the dataset includes some information you need.
2 Recommendations
University of Edinburgh
Hi, I have another question related to the topic of this thread: are there any workload traces that include deadline information?
Any hint or help will be very much appreciated.
When scarce cloud resources such as virtual machines (VMs) are being utilized, the prospective users of those resources can be referred to as resource consumers. The efficient mapping of cloudlets or tasks to VMs can therefore be called resource consumption.
GOVERNMENT ENGINEERING COLLEGE MODASA MODASA GUJARAT INDIA
How can I use the Google cluster trace in the CloudSim simulator? Step-by-step guidelines, please.
Golestan University
Hi, I'm working on a failure-aware virtual machine scheduling technique in cloud computing, and I'm going to use Google's dataset. In addition, I want to use machine learning techniques.
But after reading a lot, I still have many questions.
1) Why aren't the data columns named? I don't understand what each column is about.
2) Is this data labeled? Or do I need to tag the failure data first?
3) Is it possible to implement the whole project in MATLAB? Or should I use CloudSim, and does CloudSim have a prediction module? If not, how do I transfer the prediction output from MATLAB to CloudSim?
4) Can you suggest a video tutorial on working with CloudSim?
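On question 1: when trace files come as CSV without a header row, the column names have to be supplied from the accompanying schema document. Below is a minimal sketch of attaching names while reading such a file. The `COLUMNS` list is a hypothetical subset used for illustration, and the sample rows are made up; the real names and column order must be taken from the schema document of the trace version in use.

```python
import csv
import io

# Hypothetical column names for illustration only; replace them with the
# names and order given in the trace's schema document.
COLUMNS = ["start_time", "end_time", "job_id", "task_index", "machine_id", "cpu_rate"]


def read_headerless(stream, columns):
    """Read a header-less CSV stream into a list of dicts keyed by column name.

    Empty or missing fields are stored as None.
    """
    rows = []
    for values in csv.reader(stream):
        padded = values + [""] * (len(columns) - len(values))
        rows.append({name: (v if v != "" else None) for name, v in zip(columns, padded)})
    return rows


# Two made-up lines standing in for a trace file; the second has an empty last field.
sample = io.StringIO("600,900,42,0,7,0.031\n600,900,42,1,9,\n")
rows = read_headerless(sample, COLUMNS)
print(rows[0]["cpu_rate"], rows[1]["cpu_rate"])
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle, and values would be converted to numeric types after reading.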
Arak University
Hello Maryam,
I have not worked with Google's dataset. If you want to focus on VM scheduling, you can assume that the results have been predicted precisely. That is, you assume a black-box predictor that delivers the predictions you need.
Golestan University
Hello dear Amiri, thanks.
These questions came up during my study, and unfortunately I could not answer them in order to start the implementation.
Golestan University
Hello
I need a virtual machine scheduling simulation in CloudSim to use as a basis for my work.
Can anyone help me?
Concordia University
Hi,
I am looking for a real-world dataset from a cloud/edge environment.
The traces should include both normal data and fault data.
They should also record CPU usage, memory usage, and packet loss.
I need these traces for my project.
I would appreciate it if anyone could help or direct me to where I can find such data.
Thank you
SRBIAU
If you are able to use BigQuery, try Google ClusterData 2019; if you need to work offline with the dataset, try the 2011 version.