Question
Asked 1st Oct, 2016

Where can I find real workload traces of cloud VMs?

I am working on workload prediction for cloud VMs, and I need real workload traces of VMs.
The traces should include the demand/usage of CPU, memory, and network for each VM.
It would be great if the traces also included:
1) Power/energy consumption per VM and per host
2) The total capacity of each resource type (CPU, memory, ...) of the servers hosting these VMs, and the server specifications (number of CPUs, installed RAM, ...).
I urgently need such traces and would appreciate it if anyone could help or direct me to where I can find such data. I can't use simulated data.

Most recent answer

26th Mar, 2022
Mawa Thabet
University of Sousse
Can you please guide me on how to use the Google 2019 traces in CloudSim?

All Answers (27)

1st Oct, 2016
Jeff Sedayao
Intel
Here's one from Delft that looks like it can meet your needs. They ask that you credit them when publishing results.
1 Recommendation
5th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Dear Maryam,
I suggest you try the real data provided by Google. The Google workload traces were collected from a large cloud system (over 12,500 compute nodes) over 29 days. The traces consist of over 25,000,000 tasks of different types belonging to about 930 users. Real workload traces provide a very high level of realism when used directly in performance evaluation experiments.
More details about this data, and how to download it, are available in the report by C. Reiss and J. Wilkes, "Google Cluster-Usage Traces: Format And Schema", Google Inc., version 2, 2013.
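Summary figures like the task and user counts above can in principle be recomputed from the trace itself. A minimal sketch, assuming the 2011 (version 2) task_events layout in which the job ID, task index, and user are the 3rd, 4th, and 7th fields of a headerless CSV row; verify these positions against the Reiss and Wilkes schema report. The sample rows are made up, and `summarize_task_events` is a hypothetical helper name.

```python
import csv
import io

# Field positions in a task_events row (assumed from the version 2 schema report;
# the CSV files carry no header row, so columns are identified by position).
JOB_ID, TASK_INDEX, USER = 2, 3, 6

def summarize_task_events(lines):
    """Count distinct tasks, identified by (job ID, task index), and distinct users."""
    tasks, users = set(), set()
    for row in csv.reader(lines):
        tasks.add((row[JOB_ID], row[TASK_INDEX]))
        users.add(row[USER])
    return {"tasks": len(tasks), "users": len(users)}

# Made-up rows: task (100, 0) appears twice (two events), task (200, 3) once.
sample = io.StringIO(
    "0,,100,0,,0,userA,2,9,0.1,0.05,0.0,0\n"
    "5,,100,0,42,1,userA,2,9,0.1,0.05,0.0,0\n"
    "7,,200,3,,0,userB,1,0,0.2,0.1,0.0,0\n"
)
print(summarize_task_events(sample))  # {'tasks': 2, 'users': 2}
```

A task appears once per event (submit, schedule, finish, ...), which is why tasks are deduplicated by key rather than counted by rows.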
5 Recommendations
5th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
I need the workload of VMs. The Google workload traces were collected from PMs, is that correct? I need the resource consumption of VMs. How do I compute resource consumption based on tasks?
Best
5th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Dear Maryam,
To the best of my knowledge, the Google workload was not collected from PMs.
Each row in the workload represents one task, and a set of parameters is associated with each task (as I remember, 20 parameters, e.g. cycles per instruction (CPI) and so on). So the parameters relate to tasks, not to PMs.
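In the public 2011 trace, these per-task measurements live in the task_usage table, whose CSV files have no header row, so names have to be attached by position. A minimal sketch, assuming the 20-column order below matches the version 2 schema report (please check it against the report); the row in the example is made up, and `parse_task_usage` is a hypothetical helper.

```python
import csv

# Assumed column order of the 2011 task_usage table (version 2 schema report).
TASK_USAGE_COLUMNS = [
    "start_time", "end_time", "job_id", "task_index", "machine_id",
    "cpu_rate", "canonical_memory_usage", "assigned_memory_usage",
    "unmapped_page_cache", "total_page_cache", "maximum_memory_usage",
    "disk_io_time", "local_disk_space_usage", "maximum_cpu_rate",
    "maximum_disk_io_time", "cycles_per_instruction",
    "memory_accesses_per_instruction", "sample_portion",
    "aggregation_type", "sampled_cpu_usage",
]
# Timestamps and identifiers are parsed as integers, everything else as float.
INT_COLUMNS = {"start_time", "end_time", "job_id", "task_index", "machine_id"}

def _convert(name, value):
    """Empty fields (missing measurements) become None."""
    if value == "":
        return None
    return int(value) if name in INT_COLUMNS else float(value)

def parse_task_usage(lines):
    """Yield one dict per headerless CSV row, keyed by column name."""
    for row in csv.reader(lines):
        yield {name: _convert(name, value) for name, value in zip(TASK_USAGE_COLUMNS, row)}

row = "600000000,900000000,42,0,7,0.03,0.05,0.06,0.001,0.002,0.07,0.0,0.0002,0.1,0.0,1.5,0.007,1.0,0,0.02"
rec = next(parse_task_usage([row]))
print(rec["job_id"], rec["cpu_rate"], rec["cycles_per_instruction"])  # 42 0.03 1.5
```

Each record describes one task over one measurement window, which matches the point that the parameters belong to tasks rather than to PMs (the machine ID only says where the task ran).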
I can send you a sample if you are interested in this workload.
Regards.
1 Recommendation
6th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
Thank you very much. I would be grateful if you could send me a sample. Could you explain how I should treat tasks as VMs and compute their resource consumption?
7th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Dear Maryam,
Please find attached a sample of the Google trace. As you will see, it is an Excel file: each row represents a task, and each column represents a feature or parameter of the task. A detailed description of the tasks' parameters is available in the report I mentioned before.
Let me explain the relation among tasks, VMs, and PMs:
   To execute or serve a task in the cloud, the first step is VM allocation: allocating (mapping) a VM with a specific configuration to the task. Many algorithms and policies have been proposed to solve the VM allocation problem. The allocated VM must meet the task's QoS requirements.
   After VM allocation, the next step is VM placement: placing (mapping) the VM onto its best-fit PM. Many algorithms and strategies have also been proposed to solve the VM placement problem and meet certain goals.
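The two steps can be illustrated with a toy pipeline. A minimal sketch under stated assumptions: allocation here simply creates one VM per task sized exactly to the task's CPU and memory request, and placement uses first-fit. Both are illustrative choices swapped in for the many published policies mentioned above, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VM:
    task_id: str
    cpu: float   # cores, copied from the task's request (assumption)
    mem: float   # GB, copied from the task's request (assumption)

@dataclass
class PM:
    name: str
    cpu: float
    mem: float
    vms: list = field(default_factory=list)

    def fits(self, vm):
        """True if the VM fits in the PM's remaining CPU and memory."""
        used_cpu = sum(v.cpu for v in self.vms)
        used_mem = sum(v.mem for v in self.vms)
        return used_cpu + vm.cpu <= self.cpu and used_mem + vm.mem <= self.mem

def allocate(tasks):
    """Step 1, VM allocation: one VM per task, sized to the task's request."""
    return [VM(tid, cpu, mem) for tid, cpu, mem in tasks]

def place_first_fit(vms, pms):
    """Step 2, VM placement: put each VM on the first PM with enough spare capacity."""
    unplaced = []
    for vm in vms:
        host = next((pm for pm in pms if pm.fits(vm)), None)
        if host is None:
            unplaced.append(vm)
        else:
            host.vms.append(vm)
    return unplaced

pms = [PM("pm1", cpu=4, mem=8), PM("pm2", cpu=4, mem=8)]
vms = allocate([("t1", 2, 4), ("t2", 2, 2), ("t3", 1, 4), ("t4", 4, 8)])
left = place_first_fit(vms, pms)
print({pm.name: [v.task_id for v in pm.vms] for pm in pms}, [v.task_id for v in left])
# {'pm1': ['t1', 't2'], 'pm2': ['t3']} ['t4']
```

Replacing `place_first_fit` with best-fit or an energy-aware heuristic changes only step 2, which is why the two problems are usually studied separately.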
Please do not hesitate to ask if you have any further questions.
Auday
2 Recommendations
7th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
Thank you very much for the attachment and your nice explanation. Based on assumptions made in several papers, could we assume that each task is mapped to one VM? In other words, could we consider a task as a VM?
7th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Dear Maryam,
Your two assumptions are totally different.
For simplicity and without loss of generality, you can assume that one VM is allocated (mapped) to one task. This is what is called VM allocation.
But you cannot consider the VM as a task: tasks request VMs in order to be executed.
I think you may also need to know the following about VM allocation.
There are two main policies for allocating VMs to tasks in cloud computing environments:
  1. Space-shared policy: each VM gets one or more dedicated cores.
  2. Time-shared policy: a single core is shared by two or more VMs.
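The difference between the two policies can be made concrete with a simplified model of one host. A minimal sketch, assuming whole-core requests and proportional time slicing when the host is oversubscribed; this is a toy illustration of the idea, not CloudSim's scheduler implementation, and the function names are hypothetical.

```python
def space_shared(host_cores, vm_cores):
    """Each running VM gets dedicated cores; a VM that does not fit must wait.

    VMs are considered in the given order, so a later, smaller VM may still
    start after an earlier, larger one has been put on the waiting list.
    """
    running, waiting, free = {}, [], host_cores
    for name, cores in vm_cores.items():
        if cores <= free:
            running[name] = float(cores)
            free -= cores
        else:
            waiting.append(name)
    return running, waiting

def time_shared(host_cores, vm_cores):
    """All VMs run; if total demand exceeds the cores, shares are scaled down proportionally."""
    demand = sum(vm_cores.values())
    scale = min(1.0, host_cores / demand) if demand else 1.0
    return {name: cores * scale for name, cores in vm_cores.items()}

running, waiting = space_shared(4, {"vm1": 2, "vm2": 2, "vm3": 2})
print(running, waiting)  # {'vm1': 2.0, 'vm2': 2.0} ['vm3']
shares = time_shared(4, {"vm1": 2, "vm2": 2, "vm3": 2})
print({k: round(v, 2) for k, v in shares.items()})  # {'vm1': 1.33, 'vm2': 1.33, 'vm3': 1.33}
```

Under space sharing, the third VM waits but the others get full speed; under time sharing, all three run but each receives only part of the CPU it asked for.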
Regards,
Auday
2 Recommendations
7th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
Thank you for your good explanation. I now understand how VMs are allocated to tasks. So I can assume, without loss of generality, that one VM is allocated (mapped) to one task.
I think the important difference between the two VM allocation policies lies in resource consumption. In most papers, for simplicity, the space-shared policy is considered. Could I assume that VMs are allocated under the space-shared policy? Which policy is more common?
Best
7th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Yes, now this assumption is valid. You can assume that your work is based on the space-shared policy.
What do you mean by "resource consumption"? To my knowledge, there is no such term in the field.
However, in real virtualized cloud computing environments, both policies are used depending on the task's requirements. Both are applied at the same time, and a VM may switch from one policy to the other during execution.
Good luck in your study, and please feel free to ask.
Regards,
Auday
2 Recommendations
8th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
The main focus of my research is predicting the resource usage of VMs. Based on the predicted results, we should allocate appropriate resources to VMs so that QoS is satisfied and SLA violations are avoided.
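The predict-then-provision loop described here can be sketched in a few lines. A minimal sketch, assuming simple exponential smoothing as the predictor and a fixed safety margin as the SLA guard; both are illustrative stand-ins (not the method of this research), and `alpha`, `headroom`, and the function names are hypothetical.

```python
def ewma_forecast(history, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing."""
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def provision(history, capacity, headroom=0.2, alpha=0.5):
    """Reserve the forecast plus a safety margin, but never more than capacity."""
    return min(capacity, ewma_forecast(history, alpha) * (1 + headroom))

cpu_percent = [40, 50, 60]           # past CPU usage samples of one VM (made up)
print(ewma_forecast(cpu_percent))    # 52.5
print(provision(cpu_percent, capacity=100))
```

A larger `headroom` lowers the risk of SLA violations at the cost of idle capacity, which is exactly the trade-off a better predictor tries to improve.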
Best
1 Recommendation
8th Oct, 2016
Maryam Amiri
Arak University
Dear Auday,
I have another question about disk and network I/O: if the disk read/write throughput (or network receive/transmit throughput) of VMs is available, how can I compute the disk (network) utilization of each VM?
Thanks a lot.
Best,
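One common way to turn throughput into utilization is to divide it by the maximum throughput of the resource. A minimal sketch under stated assumptions: the capacity (disk bandwidth, or NIC link speed or per-VM bandwidth cap) has to come from outside the traces; reads and writes are assumed to share one disk bandwidth budget; and on a full-duplex link, receive and transmit are assumed to have separate budgets, so the busier direction is reported. The function names are hypothetical.

```python
def io_utilization(read_bps, write_bps, capacity_bps):
    """Disk utilization as a fraction of the device's (or VM's) maximum throughput."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return min(1.0, (read_bps + write_bps) / capacity_bps)

def net_utilization(rx_bps, tx_bps, link_bps, full_duplex=True):
    """Network utilization; on a full-duplex link each direction has its own budget."""
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    if full_duplex:
        return min(1.0, max(rx_bps, tx_bps) / link_bps)
    return min(1.0, (rx_bps + tx_bps) / link_bps)

print(io_utilization(30e6, 20e6, 200e6))   # 0.25 (50 MB/s on a 200 MB/s disk)
print(net_utilization(400e6, 100e6, 1e9))  # 0.4 (busier direction on a 1 Gb/s link)
```

The result depends heavily on which capacity you pick: dividing by the host's NIC speed gives the VM's share of the host, while dividing by the VM's own bandwidth limit gives how close the VM is to its cap.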
8th Oct, 2016
Auday Al-Dulaimy
Mälardalen University
Great,
I suggest using another term instead of "resource consumption". Usually, in our field, "consumption" refers to energy rather than resources.
Best of luck.
2 Recommendations
11th Oct, 2016
Tram Truong-Huu
Singapore Institute of Technology (SIT)
You may want to have a look at the Google Cluster Dataset!
I hope the dataset includes some of the information you need.
2 Recommendations
13th Jun, 2018
Mateusz Ochal
The University of Edinburgh
Hi, I have another question related to this thread: are there any workload traces that include deadline information?
Any hint or help would be very much appreciated.
24th Apr, 2019
Zubair A. A. Oziada
Universiti Teknologi Malaysia
Since they try to use scarce cloud resources such as virtual machines (VMs), prospective cloud users can be referred to as resource consumers. Therefore, the efficient mapping of cloudlets or tasks to VMs can be called resource consumption.
26th Nov, 2019
Hirenkumar Ramanbhai Patel
GOVERNMENT ENGINEERING COLLEGE MODASA MODASA GUJARAT INDIA
How do I use the Google cluster trace in the CloudSim simulator? Step-by-step guidelines, please.
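One possible route, sketched under several assumptions: CloudSim's PlanetLab examples read per-VM workload files containing one integer CPU-utilization percentage per line, sampled every 5 minutes (288 lines per day); check this against your CloudSim version. The sketch converts one task's task_usage samples (start time in microseconds, CPU rate) into such lines, treating the task's CPU request as the VM's capacity, in line with the one-task-per-VM assumption discussed in this thread. That normalization, the slot handling, and the function name `to_planetlab_lines` are illustrative assumptions, not an established conversion.

```python
from collections import defaultdict

SLOT_US = 300 * 1_000_000  # 5-minute slots; trace timestamps are in microseconds
SLOTS = 288                # one day of 5-minute slots (assumed PlanetLab-style length)

def to_planetlab_lines(samples, cpu_request, day_start_us=0):
    """Convert (start_time_us, cpu_rate) samples of one task into integer
    CPU-utilization percentages, one per 5-minute slot (0 where no sample exists).

    Utilization = mean cpu_rate in the slot / cpu_request (assumed VM size), capped at 100.
    """
    if cpu_request <= 0:
        raise ValueError("cpu_request must be positive")
    buckets = defaultdict(list)
    for start_us, cpu_rate in samples:
        slot = (start_us - day_start_us) // SLOT_US
        if 0 <= slot < SLOTS:
            buckets[slot].append(cpu_rate)
    lines = []
    for slot in range(SLOTS):
        rates = buckets.get(slot)
        pct = 0 if not rates else round(100 * (sum(rates) / len(rates)) / cpu_request)
        lines.append(str(min(100, pct)))
    return lines

lines = to_planetlab_lines([(0, 0.01), (300_000_000, 0.02), (310_000_000, 0.04)], cpu_request=0.05)
print(lines[:3])  # ['20', '60', '0']
```

Writing `"\n".join(lines)` to one file per task would then give a directory of per-VM workload files in the same shape as the PlanetLab inputs.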
25th Apr, 2020
Maryam Heidary Pak
Golestan University
Hi, I'm working on a failure-aware virtual machine scheduling technique for cloud computing, and I'm going to use Google's dataset. I also want to use machine learning techniques.
But after reading a lot, I still have many questions.
1) Why aren't the data columns named? I don't understand what each column contains.
2) Does this data have labels, or do I need to label the failure data first?
3) Is it possible to implement this whole project in MATLAB?
Or should I use CloudSim, and does CloudSim have a prediction module?
If not, how do I transfer the prediction output from MATLAB to CloudSim?
4) Do you have a suggestion for a video tutorial on working with CloudSim?
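On questions 1 and 2: the trace CSV files have no header row; each column's meaning is defined by its position in the schema report. The trace does not appear to carry a separate label column, but the task_events table records an event type per row, so failure labels can be derived. A minimal sketch, assuming the version 2 event codes (2 = EVICT, 3 = FAIL, 4 = FINISH, 5 = KILL, 6 = LOST) and field positions; verify both against the report. Counting only FAIL as a failure is a modeling choice, and `label_tasks` is a hypothetical helper.

```python
import csv

# Assumed field positions and event codes from the version 2 schema report.
JOB_ID, TASK_INDEX, EVENT_TYPE = 2, 3, 5
FAILURE_EVENTS = {3}  # widen to {2, 3, 5, 6} if evictions, kills, and losses should count too

def label_tasks(lines, failure_events=FAILURE_EVENTS):
    """Return {(job_id, task_index): 1 if the task ever had a failure event, else 0}."""
    labels = {}
    for row in csv.reader(lines):
        key = (row[JOB_ID], row[TASK_INDEX])
        failed = int(row[EVENT_TYPE]) in failure_events
        labels[key] = labels.get(key, 0) | int(failed)
    return labels

# Made-up events: task (1, 0) is submitted, scheduled, then fails;
# task (2, 0) is submitted, scheduled, then finishes.
rows = [
    "0,,1,0,,0,u,0,0,0,0,0,0", "1,,1,0,5,1,u,0,0,0,0,0,0", "2,,1,0,5,3,u,0,0,0,0,0,0",
    "0,,2,0,,0,v,0,0,0,0,0,0", "1,,2,0,6,1,v,0,0,0,0,0,0", "3,,2,0,6,4,v,0,0,0,0,0,0",
]
print(label_tasks(rows))  # {('1', '0'): 1, ('2', '0'): 0}
```

The resulting dictionary can be joined with per-task features (for example from task_usage) to build a supervised training set.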
25th Apr, 2020
Maryam Amiri
Arak University
Hello Maryam,
I have not worked on Google's dataset. If you want to focus on VM scheduling, you can assume that the results have been predicted precisely. That is, you have a black-box predictor that predicts as you want.
27th Apr, 2020
Maryam Heidary Pak
Golestan University
Hello dear Amiri, thanks.
These questions came up during my study, and unfortunately I could not answer them, so I have not been able to start the implementation.
22nd Jun, 2020
Maryam Heidary Pak
Golestan University
Hello,
I need a virtual machine scheduling simulation in CloudSim to base my work on.
Can anyone help me?
30th Jun, 2020
Raha Abbasi
Concordia University Montreal
Hi,
I am looking for a real-world dataset from a cloud/edge environment.
The traces/dataset should include both normal and fault data.
They should also record CPU usage, memory usage, and packet loss.
I need these traces/dataset for my project.
I would appreciate it if anyone could help or direct me to where I can find such data.
Thank you
3rd Sep, 2020
Ali Rezaee
SRBIAU
If you are able to use BigQuery, try Google ClusterData 2019; if you need to work offline with the dataset, try the 2011 version.
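For the offline route, the 2011 trace is distributed as one directory per table, each split into many gzipped CSV parts (named like part-00000-of-00500.csv.gz, as far as I recall; check your download). A minimal sketch that streams rows from all parts of one table without decompressing them to disk; `iter_table` is a hypothetical helper, and the demo writes a tiny fake table to a temporary directory.

```python
import csv
import glob
import gzip
import os
import tempfile

def iter_table(table_dir, pattern="part-*-of-*.csv.gz"):
    """Stream rows from every gzipped CSV part of one trace table, in file-name order."""
    for path in sorted(glob.glob(os.path.join(table_dir, pattern))):
        with gzip.open(path, mode="rt", newline="") as fh:
            yield from csv.reader(fh)

# Demo: build a fake two-part table, then read it back as one stream of rows.
with tempfile.TemporaryDirectory() as tmp:
    for i, text in enumerate(["1,a\n2,b\n", "3,c\n"]):
        name = f"part-{i:05d}-of-00002.csv.gz"
        with gzip.open(os.path.join(tmp, name), "wt", newline="") as fh:
            fh.write(text)
    demo_rows = list(iter_table(tmp))
print(demo_rows)  # [['1', 'a'], ['2', 'b'], ['3', 'c']]
```

Because the generator reads one part at a time, memory use stays small even though the full tables are very large; filter or aggregate inside the loop rather than building a list of all rows.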
23rd Aug, 2021
Neha Garg
Sant Longowal Institute of Engineering and Technology
Hi,
I am looking for a comparison between the PlanetLab dataset and the Google cluster data.
