Figure 1 - uploaded by Georgios Andreadis
Capelin, a new, data-based capacity planning process for datacenters, compared against the current approach.

Source publication
Preprint
Cloud datacenters provide a backbone to our digital society. Inaccurate capacity procurement for cloud datacenters can lead to significant performance degradation, denser targets for failure, and unsustainable energy consumption. Although this activity is core to improving cloud infrastructure, relatively few comprehensive approaches and support to...

Contexts in source publication

Context 1
... minimize operational risks, many such industry approaches currently lead to significant overprovisioning [25], or miscalculate the balance between underprovisioning and overprovisioning [49]. In this work, as Figure 1 depicts, we approach the problem of capacity planning for mid-tier cloud datacenters with a semi-automated, specialized, data-driven tool for decision making. ...
Context 2
... linear programming [63], game theory [57], stochastic search [24], and other optimization techniques work well on simplistic capacity-planning problems, they do not address the multidisciplinary, multi-dimensional nature of the problem. As Figure 1 (left) depicts, without adequate capacity planning tools and techniques, practitioners need to rely on rules-of-thumb calibrated with casual visual interpretation of the complex data provided by datacenter monitoring. This state-of-practice likely results in overprovisioning of cloud datacenters, to avoid operational risks [26]. ...
Context 3
... propose in this work Capelin, a data-driven, scenario-based alternative to current capacity planning approaches. Figure 1 visualizes our approach (right column of the figure) and compares it to current practice (left column). Both approaches start with inputs such as workloads, current topology, and large volumes of monitoring data (step 1 in the figure). ...
Context 4
... is a large factor, suggesting that vertically scaled topologies are more susceptible to overcommission, and thus lead to higher risk of performance degradation. The decrease in performance observed in this metric is mirrored by the granted CPU cycles metric in Figure 16b (Appendix D), which decreases for vertically scaled topologies. Among replaced topologies (all combinations including ), the horizontally scaled, homogeneous topology ( ) yields the best performance, and in particular the lowest median overcommitted CPU. ...
Context 5
... scaling is correlated not only with worse performance, but also with higher failure counts. We see that vertical scaling leads to a significant increase in the maximum number of deployed images per physical host (Figure 17d), which leads to larger failure domains and thus potentially higher failure counts. The effect is less pronounced when making heterogeneous compared to homogeneous procurement. ...
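The link between vertical scaling and larger failure domains can be illustrated with a minimal sketch. It assumes a placement is simply a mapping from VM name to host name; the function `max_vms_per_host`, the host names, and the toy placements are hypothetical and are not taken from the publication or its simulator.

```python
from collections import Counter


def max_vms_per_host(placement):
    """Return the largest number of VMs placed on any single host.

    `placement` maps a VM name to the name of the host it runs on
    (a hypothetical representation chosen for this sketch).
    """
    counts = Counter(placement.values())
    return max(counts.values()) if counts else 0


# Hypothetical toy data: the same 8 VMs spread over 4 small hosts
# (horizontal scaling) versus 2 large hosts (vertical scaling).
horizontal = {f"vm{i}": f"small-host{i % 4}" for i in range(8)}
vertical = {f"vm{i}": f"large-host{i % 2}" for i in range(8)}

print(max_vms_per_host(horizontal))
print(max_vms_per_host(vertical))
```

With fewer, larger hosts, each host carries more VMs, so a single host failure affects more of them. This is the sense in which the failure domain grows in this toy setting.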
Context 6
... Other metrics show very similar distributions. Small differences may be attributed to the number of VMs being slightly smaller in the "replay experiments" due to missing placement data (Figure 10). ...
Context 7
... we may at one point trust the simulator to produce correct outputs, the addition or modification of functionality in subsequent versions of the simulator may ... [Figure 10: Validation with a replay policy, copying the exact cluster assignment of the original deployment.] ...
Context 8
... differentiate between select overviews (depicting only the results of a subset of workloads) and summary overviews (aggregating over all workloads). [Caption fragments: Table 2; continued in Figure 18; continued in Figure 20.] ...
Context 9
... aggregated across the full set of workloads, including workloads not displayed in the more detailed figure. For a legend of topologies, see Table 2. Continued in Figure 21. (f) Total number of time slices in which a VM is failed, aggregated across VMs. [Figure 31.] ...
Context 10
... a legend of topologies, see Table 2. Continued in Figure 21. (f) Total number of time slices in which a VM is failed, aggregated across VMs. [Figure 31: Impact of operational phenomena and different allocation policies on the base topology.] ...
Context 11
... with a replay policy, copying the exact cluster assignment of the original deployment. For a legend of topologies, see Table 2. Continued in Figure 41. Validation with a replay policy, copying the exact cluster assignment of the original deployment. ...