Fig 5 - uploaded by Kalyan Vaidyanathan
Traces of fan speed with the dynamic CPU load and noise (standard deviation is set to 0.04). TABLE III: COMPARISONS OF THE PERFORMANCE AND THE POWER CONSUMPTION

Source publication
Conference Paper
Full-text available
Time lag and quantization in temperature sensors in enterprise servers lead to stability concerns on existing variable fan speed control schemes. Stability challenges become further aggravated when multiple local controllers are running together with the fan control scheme. In this paper, we present a global control scheme which tackles the concern...

Context in source publication

Context 1
... further validate the stability of the proposed global coordination scheme, we performed a simulation while running the proposed fan speed control scheme along with the CPU load controller (in Section III) under time-varying CPU utilization. Fig. 5 shows the varying CPU utilization (solid line and left Y-axis) and the fan speed (dotted line and right Y-axis). As shown in the figure, even with time-varying CPU utilization, the proposed control solution provides stable fan speed ...
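The kind of experiment described above can be approximated with a toy simulation. The sketch below is not the authors' scheme: it assumes a plain PI fan controller, a first-order thermal model, a sensor that reports a lagged and quantized temperature, and a square-wave CPU utilization with Gaussian noise (standard deviation 0.04, as in the figure caption). All gains, thermal constants, and function names (`quantize`, `simulate`) are hypothetical, and whether the fan trace settles or oscillates depends entirely on these made-up values.

```python
import random
from collections import deque


def quantize(value, step=1.0):
    """Round a reading to the sensor resolution `step` (degrees C)."""
    return round(value / step) * step


def simulate(steps=600, lag=10, step_c=1.0, noise_std=0.04, seed=0):
    """Toy closed loop: PI fan control on a lagged, quantized temperature."""
    rng = random.Random(seed)
    t_amb, t_ref = 25.0, 70.0            # ambient and target temperature (assumed)
    kp, ki, bias = 0.02, 0.001, 0.6      # hypothetical PI gains and bias
    fan_min, fan_max = 0.2, 1.0          # fan speed as a fraction of full speed
    temp, integral = 60.0, 0.0
    # delayed[0] is always the temperature from `lag` steps ago.
    delayed = deque([temp] * (lag + 1), maxlen=lag + 1)
    utils, fans, temps = [], [], []
    for k in range(steps):
        # Time-varying utilization: square wave plus Gaussian noise.
        base = 0.8 if (k // 150) % 2 == 0 else 0.3
        util = min(1.0, max(0.0, base + rng.gauss(0.0, noise_std)))
        # The controller only sees a delayed, quantized reading.
        measured = quantize(delayed[0], step_c)
        error = measured - t_ref
        candidate = integral + error
        raw = bias + kp * error + ki * candidate
        fan = min(fan_max, max(fan_min, raw))
        if raw == fan:                   # simple anti-windup: integrate only when unsaturated
            integral = candidate
        # Assumed first-order thermal model: more airflow, lower thermal resistance.
        power = 40.0 + 80.0 * util
        resistance = 0.15 + 0.25 / fan
        steady = t_amb + power * resistance
        temp += 0.05 * (steady - temp)
        delayed.append(temp)
        utils.append(util)
        fans.append(fan)
        temps.append(temp)
    return utils, fans, temps


if __name__ == "__main__":
    utils, fans, temps = simulate()
    print("last fan speeds:", [round(f, 3) for f in fans[-5:]])
```

Varying `lag` and `step_c` in this sketch is one way to explore how sensor delay and quantization affect the fan speed trace.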

Similar publications

Conference Paper
Full-text available
The shipping industry plays an important role in the development of nations, especially those with long sea coasts such as Vietnam, Japan, South Korea, and the Philippines. For this reason, the main propulsion plant on a ship needs to be maintained and kept in good condition both while operating and while lying at the shore. To keep the main engine rotational...

Citations

... This work, however, does not take into account other system settings, such as number of cores and core frequency. Finally, Kim et al. [25] propose a fan speed controller to guarantee server operation stability without directly considering optimization of fan power. ...
... Lee et al. [229] designed a fuzzy fan controller to maintain the component temperature while suppressing the fan speed fluctuation. Kim et al. [230] presented an adaptive Proportional-Integral-Derivative (PID) server fan controller which eliminated the fan speed oscillation induced by time lags and quantization in temperature sensors. ...
Article
Recently, the rapid growth in both data center power density and scale poses great challenges to the cooling system. On one hand, data center operators try to over-provision cooling resources for fear of server failures induced by accumulated heat. On the other hand, they also want to reduce the energy cost as the cooling system takes up a significant portion of overall energy consumption. Among all available cooling solutions, air cooling dominates the data center industry due to its simplicity. However, its cooling efficiency has been questioned due to the low density and specific heat of air. In this paper, we provide an overview of current endeavours to improve air cooling efficiency. We group existing research according to the locations where it can be applied from the perspective of the air flow cycle. We also discuss thermal measurement issues. We hope this paper can help researchers and engineers to design and control their data center air cooling systems.
... In this method, the effects of fan's vibrations should be considered as increasing the fan speed will increase the fan's vibrations. This vibration can drastically reduce the I/O throughput of hard disk drives while processing intensive workloads [34]. ...
Article
The growing global demand for services offered by data centers (DCs) has increased their total power consumption and carbon emissions. Recent figures revealed that DCs account for around 2% of total US electrical power consumption, approximately 40% of which is for powering their cooling systems. A large portion of the energy spent on cooling is typically due to the inherent inefficiency of the heat removal process, which exists at multiple levels from the microchip to the cooling infrastructure. Depending on the type of cooling system, air-cooled or liquid-cooled, this inefficiency can be significantly improved upon by utilizing various thermal management and efficiency enhancement techniques at each level. This paper reviews the state-of-the-art of multi-level thermal management techniques for both air- and liquid-cooled DCs. The main focus is on the sources of inefficiencies and the improvement methods with their configuration features and performances at each level. For the air-cooled DCs, various advanced methods for the chip, server, rack, plenum and room levels have been reviewed. Recent advances in thermal modelling of air-cooled DCs and their energy optimization methods have also been broadly reviewed. Furthermore, the performance of various methods such as pool boiling, jet impingement, and spray cooling for direct liquid-cooled DCs and single-phase, two-phase and heat pipe cooling for indirect liquid-cooled DCs has been compared. Finally, free cooling as an energy efficient method of reducing the total power consumption of DCs’ cooling systems has been reviewed in this paper. The advancements in two main types of free cooling methods, air-side and water-side economizers, are discussed and their performance characteristics are compared.
... The authors compare their results to the reactive fan controller, showing the reduction of fan energy consumption by up to 20%. Similarly, Kim et al. [12] proposed the fan speed control scheme reducing the performance degradation and the power usage. They present Proportional-Integral-Derivative controllers, which are immune to non-ideal temperature effects. ...
Conference Paper
Thermal management and cooling are essential parts that have significant influence on the energy efficiency of data centers, as the cost of cooling can exceed 50% of the whole data center energy consumption. Optimized thermal management of a data center also affects its reliability and availability by preventing the creation of so-called hot spots. In this paper, we present a model and optimisation method for thermal management of a server platform, developed within the M2DC project, equipped with a high number of heterogeneous hardware. We also show how the management of the individual servers and chassis influences efficiency of the whole data center. First, we present how this affects the commonly used PUE metric and how this approach can be misleading in evaluation of the data center effectiveness. Secondly, we show how intelligent fan management may influence energy used for cooling, change of IT systems energy consumption and the overall gain.
... The control/optimization policies in the MANGO resource management will be able to evaluate the QoS and non-functional requirements of the applications, in a hierarchical, system-wide multi-objective optimization going beyond electronics to mechanical aspects (e.g., liquid cooling pump control) [18]. The collected information from monitors enables the prediction of temperatures in the different parts of the servers and racks, which will be passed to the hierarchic runtime manager, structured in a hierarchical architecture exploiting both OS and hypervisor levels, which will be able to tune the system knobs (Pstates, fan control, tasks assignments, etc.), to mitigate performance variability [19]. Overall, we will target to exploit the run-time thermal-power predictions to mitigate performance variability due to thermal emergencies under highly dynamic workload variations [20], as it is one of the key challenges in the highly heterogeneous MANGO computing server architecture. ...
... Various techniques assume readily available temperature metrics from servers while exploring thermal management techniques. There are non-ideal temperature measurements in servers due to time lag in measurements and temperature signal quantization [5]. ADCs (analog-to-digital converters) have quantization errors because their resolution is bounded by an 8-bit data size. ...
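To make the 8-bit quantization error concrete, here is a minimal sketch. It assumes a hypothetical sensor range of 0 to 127.5 °C mapped linearly onto the 256 codes of an 8-bit converter, which gives a step of 0.5 °C and a worst-case rounding error of 0.25 °C inside that range. The range and the function name `adc_quantize` are illustrative assumptions, not values from the cited work.

```python
def adc_quantize(temp_c, lo=0.0, hi=127.5, bits=8):
    """Map a temperature to an ADC code and back to the reported value.

    Assumes a linear converter covering [lo, hi] with 2**bits codes.
    Readings outside the range are clamped.
    """
    levels = 2 ** bits - 1                      # highest code (255 for 8 bits)
    step = (hi - lo) / levels                   # resolution in degrees C per code
    clamped = min(hi, max(lo, temp_c))
    code = int((clamped - lo) / step + 0.5)     # round to the nearest code
    return code, lo + code * step


if __name__ == "__main__":
    for t in (70.2, 70.3, 200.0):
        code, reported = adc_quantize(t)
        print(f"true={t} code={code} reported={reported}")
```

With a coarser range or fewer bits the step grows, so a slowly rising temperature appears to the controller as a sequence of sudden jumps.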
... Multiple controllers acting independently could potentially cause instability in the system. Reference [5] proposed a global thermal management controller to coordinate the actions of multiple low level controllers. It would benefit to extend the solution of [1, 5] leakage power controller by considering the thermal profile of all the servers in the data center. ...
... Reference [5] proposed a global thermal management controller to coordinate the actions of multiple low level controllers. It would benefit to extend the solution of [1, 5] leakage power controller by considering the thermal profile of all the servers in the data center. Workload, ambient conditions and user priority (energy efficiency or high performance) would decide the CPU utilization and hence fan speed. ...
Technical Report
Full-text available
Data centers comprise storage devices that have surpassed Moore’s law and computing devices that are close to the limit set by the laws of physics and thermodynamics. Increased chip package densities and reduced feature size have resulted in significant leakage power that cannot be neglected. Typical PWM-based fan controllers tend to adjust fan speeds to provide cooling according to workload and ambient conditions, but contribute to vibrations that degrade disk throughput. In this paper, we examine the relationship between these unobserved characteristics (leakage power and degraded disk throughput) and their effect on data center performance. We extend the problem by including the effects of ambient temperature on leakage power, cooling needs and efficiency metrics. We then discuss the challenges faced in leakage power aware thermal management techniques and propose recommendations.
... where $T_{\mathrm{cpu}}(t)$ is the temperature of the processor at a given time $t$ and $T_{\mathrm{cpu}}(0)$ is the initial temperature of this processor. An equation in the given or a similar form prevails in most of the encountered approaches [3][4][5]. ...
Article
In recent years, we have faced the evolution of high-performance computing (HPC) systems towards higher scale, density and heterogeneity. In particular, hardware vendors along with software providers, HPC centers, and scientists are struggling with the exascale computing challenge. As the density of both computing power and heat is growing, proper energy and thermal management becomes crucial in terms of overall system efficiency. Moreover, an accurate and relatively fast method to evaluate such large scale computing systems is needed. In this paper we present a way to model the energy and thermal behavior of computing systems. The proposed model can be used to effectively estimate system performance, energy consumption, and energy-efficiency metrics. We evaluate their accuracy by comparing the values calculated based on these models against the measurements obtained on real hardware. Finally, we show how the proposed models can be applied to workload scheduling and resource management in large scale computing systems by integrating them in the DCworms simulation framework.
... For example, one of the studies made by Vasan et al. showed that the fans consumed about 220.8 W (18%) while the CPU power consumption was measured at 380 W (32%) [269], out of a total system power consumption of approximately 1203 W. In another study, Lefurgy et al. observed that fan power dominated the power envelope of the small configuration of the IBM p670 server (51%), while in the large configuration it represented a considerable portion (28%) of the server power envelope [270]. Furthermore, fan power is a cubic function of fan speed ($P_{\mathrm{fan}} \propto s_{\mathrm{fan}}^{3}$) [16], [271]. Hence over-provisioning of cold air into the servers can easily lead to energy inefficiencies [272]. ...
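The figures quoted above can be checked with a few lines of arithmetic, and the cubic law shows why small speed reductions matter. This is a sketch under one loud assumption: it treats the 220.8 W fan figure as the power at full speed, which the excerpt does not state. The function name `fan_power` is hypothetical.

```python
def fan_power(speed_fraction, rated_power_w):
    """Fan power under the cubic law P_fan ∝ s_fan^3.

    `speed_fraction` is the speed relative to full speed (0..1) and
    `rated_power_w` is the assumed power draw at full speed.
    """
    return rated_power_w * speed_fraction ** 3


# Shares of total system power from the quoted measurements.
fan_w, cpu_w, total_w = 220.8, 380.0, 1203.0
fan_share = fan_w / total_w    # about 0.18
cpu_share = cpu_w / total_w    # about 0.32

if __name__ == "__main__":
    print(f"fan share: {fan_share:.1%}, CPU share: {cpu_share:.1%}")
    # Assumption: 220.8 W is the draw at full speed.
    for s in (1.0, 0.8, 0.5):
        print(f"speed {s:.0%}: {fan_power(s, fan_w):.1f} W")
```

Under that assumption, running the fans at half speed would draw one eighth of the full-speed power, which is why over-provisioned airflow is costly.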
Article
Full-text available
Data centers are critical, energy-hungry infrastructures that run large-scale Internet-based services. Energy consumption models are pivotal in designing and optimizing energy-efficient operations to curb excessive energy consumption in data centers. In this paper, we survey the state-of-the-art techniques used for energy consumption modeling and prediction for data centers and their components. We conduct an in-depth study of the existing literature on data center power modeling, covering more than 200 models. We organize these models in a hierarchical structure with two main branches focusing on hardware-centric and software-centric power models. Under hardware-centric approaches we start from the digital circuit level and move on to describe higher-level energy consumption models at the hardware component level, server level, data center level, and finally systems of systems level. Under the software-centric approaches we investigate power models developed for operating systems, virtual machines and software applications. This systematic approach allows us to identify multiple issues prevalent in power modeling of different levels of data center systems, including i) few modeling efforts targeted at power consumption of the entire data center, ii) many state-of-the-art power models are based on a few CPU or server metrics, and iii) the effectiveness and accuracy of these power models remain open questions. Based on these observations, we conclude the survey by describing key challenges for future research on constructing effective and accurate data center power models.
... The thermal resistance of the CPU case was assumed to be constant under steady-state conditions [38]. Also, the heat sink's thermal resistance according to its convective heat transfer is a function of the heat sink surface wind speed, which is determined by the cooling fan's revolution speed (i.e., revolutions per minute, or RPM) (Eq. ...
Article
This paper proposes a server model that can be applied to simulating the annual cooling energy consumption of data centers and an analysis of its impact. Recently, aisle containment architecture and the variable air volume system have been widely adopted in the data center industry because of their superior indoor thermal management and cooling energy savings. However, the current simulation method, which fixes the server heat generation and the supply and return air temperature difference of the computer room air handler (CRAH), does not correctly reflect current cooling systems. In this study, a server model was developed that accounts for server thermal characteristics - including server power, fan airflow, and exhaust temperature - according to the given CPU utilization and temperature. After the proposed server model was validated, the annual cooling energy consumption was simulated for modular data centers with an air-side economizer and high ambient temperature. Unlike the conventional method, the proposed model showed that the cooling energy consumption increased when the CRAH supply air temperature was higher than 19 °C because of the increase in fan energy consumption.